Exploring Bivalvia Phylogenomics with UltraConserved Elements

Testing ultraconserved elements (UCEs) for phylogenetic inference across bivalves (Mollusca: Bivalvia)

Sara Gonzalez-Delgado, Paula C. Rodríguez-Flores, and Gonzalo Giribet

In 1980, the difficulty of assembling DNA sequences meant that data sets had limited taxonomic breadth, even for single-gene analyses based on Sanger sequencing. The advent of Next Generation Sequencing (NGS) has had a huge impact on the entire field, dramatically increasing the throughput of sequence acquisition while decreasing its cost. By the mid 2000s, as commercial NGS operations appeared, the amount of sequence data was growing exponentially and the cost of sequence acquisition had dropped by orders of magnitude. As a disruptive technology, NGS changed the limiting step for progress in systematics and evolutionary biology: for the first time, discovery became more dependent on the rate of data analysis than on data acquisition. The existence of CIPRES is in large measure due to the increased need for access to high-end computational resources driven by the explosion of sequence data during this period.

This trend has continued to the present day, when the entire genome sequence of a eukaryotic organism can be obtained in a fairly short time at low cost. This in turn has made the field of phylogenomics (formally born around 1998) flourish. It is now possible to assemble taxon-rich genome sequence data sets for comparison in a human-relevant time frame, moving towards the ability to analyze the evolutionary history of an entire genome in context. The assembly and analysis of whole-genome data sets is a heavy lift, both computationally and logistically (e.g., making orthology assignments). The question becomes: how much information can we glean from a subset of the genome, and what strategies can be used to ease the logistical and computational burden of analyzing genomic data?

The state of the field suggests that, for many zoologists, assembling full genomes remains labor-intensive and expensive, and requires fresh samples. The alternative, mRNA sequencing for transcriptomics, can generate hundreds of loci but incurs its own monetary and computational costs, and can only be done using RNA-suitable tissues. Recently, genome-subsampling techniques have been used as an alternative. Sequence capture by hybridization is cost-efficient and can succeed with DNA that has not been preserved under ideal conditions.

In this article, Gonzalez-Delgado et al. explore the possibility that Ultraconserved Elements (UCEs) may be informative in resolving family relationships across Bivalvia. Bivalves were chosen for study owing to their long history as a subject of phylogenetic analysis, the availability of genomic resources for many commercially important species, and the instability of the relationships between bivalve families. Conflicting results between mitogenomic analyses and those based on Sanger amplicon sequencing or phylotranscriptomics have so far made it difficult to establish the history of the clade with confidence.

UCEs are highly conserved short regions within the genome that are shared with little modification among evolutionarily distant taxa. UCEs and their flanking regions can be used to reconstruct the evolutionary history of taxa at various time scales, from deep to shallow phylogenetic divergences. Moreover, UCEs work well with degraded DNA as well as high quality DNA, which allowed the authors to make use of sub-optimally preserved deep-sea tissues already available as museum specimens across all bivalve clades.

The authors designed a probe set representing major bivalve taxonomic groups. The resulting universal probe set used 19,588 baits derived from genomes available at NCBI and captured 1,513 UCE loci. When tested in silico with 15 bivalve genomes, between 1,493 and 592 (98.87–41.23%) UCEs were captured. Nine genomes of other major groups of Mollusca captured 1,453 (97.71%) of UCEs, while gastropods, cephalopods, and other molluscs had between 605 and 267 (41.52–25.26%).

In vitro testing of the probe set on 98 sequenced samples from all major bivalve groups yielded an average of 348,319.37 contigs per sample after assembly of the Illumina reads (range 1,094 to 1,360,104) and an average UCE locus recovery of 528.23 (range 9 to 1,275). A museum sample from 1872 yielded 255 UCEs, showing the potential advantage of using UCEs to examine older and more degraded samples.

Phylogenetic reconstruction was accomplished using a set of tools available in CIPRES. After alignment with MAFFT, tree inference was conducted using IQ-TREE v. 2.2.6 with ModelFinder and ExaBayes v. 1.5.1 under a GTRGAMMA model. ASTRAL-III 5.15.4 was then used to infer a species tree from all individual unrooted gene trees under the multispecies coalescent model.
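As a rough illustration of how the alignment, maximum-likelihood, and species-tree steps chain together, the sketch below only builds the command lines and does not run them. It is not the authors' actual workflow or the CIPRES job configuration: the function name, file names, output directory, jar path, and the bootstrap replicate count are all assumptions made for the example, and the ExaBayes step is omitted.

```python
from pathlib import Path
from typing import List


def build_pipeline_commands(
    locus_fasta: str,
    workdir: str = "uce_run",            # hypothetical output directory
    astral_jar: str = "astral.jar",      # hypothetical path to the ASTRAL jar
    gene_trees: str = "gene_trees.tre",  # hypothetical file holding one gene tree per locus
) -> List[List[str]]:
    """Return argv lists for alignment, ML inference, and species-tree summary."""
    out = Path(workdir)
    stem = Path(locus_fasta).stem
    aligned = out / f"{stem}.aln.fasta"

    # MAFFT prints the alignment to stdout; the caller is expected to
    # redirect that output into `aligned` before running IQ-TREE.
    mafft = ["mafft", "--auto", locus_fasta]

    # IQ-TREE 2: -s is the alignment, -m MFP runs ModelFinder and then the
    # tree search, -B sets ultrafast bootstrap replicates (1000 is an assumption).
    iqtree = [
        "iqtree2", "-s", str(aligned), "-m", "MFP",
        "-B", "1000", "--prefix", str(out / stem),
    ]

    # ASTRAL: -i is the file of input gene trees, -o the output species tree.
    astral = [
        "java", "-jar", astral_jar,
        "-i", gene_trees, "-o", str(out / "species_tree.tre"),
    ]
    return [mafft, iqtree, astral]


if __name__ == "__main__":
    for cmd in build_pipeline_commands("locus1.fasta"):
        print(" ".join(cmd))
```

In practice the first two commands would be repeated for every UCE locus, and the resulting per-locus trees concatenated into the single file passed to ASTRAL.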

Main Take Away Points:

  • The authors created a probe set of ultraconserved elements that efficiently captures sequence data across Bivalvia.
  • The probe set efficiently captures outgroups, in silico and in vitro, using an array of samples.
  • The probe set is quite efficient for samples ranging from fresh tissues collected and preserved for molecular use to museum samples.
  • The probe set is also efficient for old DNA samples stored in freezers, stored dry as well as in EtOH.
  • An important difficulty was detected for DNA yields for museum samples collected between 1949 and 1993, apparently due to the use of formalin.
  • The probe set is confirmed to be useful at different taxonomic levels, as results were consistent with existing bivalve phylogenies based on fewer loci and with phylotranscriptomic analyses.
  • The phylogenetic trees gave robust results in key clades down to species relationships.
  • On the other hand, the approach left some issues in bivalve phylogenetics unresolved.
  • The probe set provides a useful tool for meaningful exploration when the resources to sequence full genomes are not at hand.