How does RAxML-Light work in the CIPRES Science Gateway?
Our RAxML-Light interface lets users run RAxML-Light to infer very large trees. It is implemented as a script that combines RAxML 7.2.8, Parsimonator, and RAxML-Light. This makes it possible to run RAxML-Light on large compute resources without writing the Perl scripts that RAxML-Light alone would require. RAxML-Light is brought to you by the iPlant Collaborative. To use it, visit the iPlant Discovery Environment and register.

Why would I use RAxML-Light instead of regular RAxML?
RAxML-Light is designed to decrease the memory footprint of regular tree searches. This makes it possible to analyze very large data sets without exceeding the available memory. Data sets appropriate for RAxML-Light have more than 10,000 taxa and 10-20 genes, or more than about 200 taxa and roughly 1,000 genes. For smaller data sets, you should probably stick with regular RAxML. The largest trees computed using RAxML-Light alone include a tree with almost 120,000 taxa and 2 genes, which ran nicely on a single 48-core node with 128 GB of memory under the CAT model. Data sets with 1,481 taxa and 20,000,000 sites have also been analyzed using 672 cores and almost 1 TB of RAM under the CAT model. The implementation available through the CIPRES Science Gateway runs on a single 32-core node with 64 GB of memory. If you feel your data set requires more resources, please let us know. RAxML-Light implements only the CAT and GAMMA models of rate heterogeneity for DNA and protein data. At present we support only DNA data, but we expect to support protein data in the near future.

What features does RAxML-Light offer that allow reconstruction of huge trees?
Please Note:
What can RAxML-Light compute?
RAxML-Light infers very large trees under Maximum Likelihood. Unlike standard RAxML, RAxML-Light must be given a comprehensive (containing all taxa) bifurcating starting tree. The script used in the CIPRES Gateway obtains the starting tree from standard RAxML or Parsimonator. See Figure 1 below to see how these components work together. RAxML-Light program options are explained in the information sections of the interface and in the manual. Many of these options are similar to the standard RAxML options.
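Because RAxML-Light needs a starting tree that contains every taxon and is fully bifurcating, a quick count-based sanity check on a candidate tree can be useful. The following is a minimal sketch and not part of the gateway script: the function names (`phylip_taxa`, `count_char`, `check_start_tree`) are hypothetical, and it assumes a PHYLIP alignment whose first line begins with the taxon count and a single Newick tree whose labels contain no commas or parentheses. It compares counts only, so passing it is a necessary condition, not a proof. It does not confirm that the leaf names match the alignment.

```shell
#!/usr/bin/env bash
# Sketch only: coarse, count-based check of a Newick starting tree.
# Assumption: labels contain no commas or parentheses (no quoted labels).

# Number of taxa declared on the header line of a PHYLIP alignment.
phylip_taxa() {
    awk 'NR == 1 { print $1; exit }' "$1"
}

# Count occurrences of a single character in a file.
count_char() {  # count_char CHAR FILE
    tr -cd "$1" < "$2" | wc -c | tr -d ' '
}

check_start_tree() {  # check_start_tree ALIGNMENT TREE
    local taxa leaves internal
    taxa=$(phylip_taxa "$1")
    # A Newick tree with n leaves always contains n - 1 commas.
    leaves=$(( $(count_char ',' "$2") + 1 ))
    # Each "(" opens one internal node.
    internal=$(count_char '(' "$2")

    if [ "$leaves" -ne "$taxa" ]; then
        echo "not comprehensive: $leaves leaves, $taxa taxa"
        return 1
    fi
    # A binary tree on n leaves has n - 2 internal nodes when written
    # unrooted (three-way split at the top) or n - 1 when written rooted.
    if [ "$internal" -ne $(( taxa - 2 )) ] && [ "$internal" -ne $(( taxa - 1 )) ]; then
        echo "not bifurcating: $internal internal nodes for $taxa taxa"
        return 1
    fi
    echo "ok"
}
```

For example, `check_start_tree dna.phy RAxML_parsimonyTree.PR0.0` would print `ok` when the counts are consistent (the tree file name here is only illustrative). A tree that passes can still be rejected by RAxML-Light for other reasons, such as mismatched taxon names.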
Figure 1. Workflow for the RAxML-Light interface.
How it Works
To compute an ML tree on data set dna.phy, one need only upload the data set to the CIPRES Gateway and configure the run.

If you request bootstraps, the script first generates a set of replica alignment files using the -f j option of standard RAxML:

    $RAXMLSERIAL -s $sequence_file -m $substitution_model -n BS -f j -b $bseed -N $bsearches

This creates replica alignments called infile.BSn, where n is the number of each bootstrap. The number of bootstraps is user-specified. It can be 0, in which case this step is omitted.

Next, parsimony starting trees are created for the input file and for the replica alignments using Parsimonator:

    $PARSIMONATOR -s ${sequence_file}.BS\$i -n PB\$i -p \$seed

This creates a parsimony starting tree called RAxML_parsimonyTree.PBn for each replica alignment, and a parsimony tree called RAxML_parsimonyTree.PRn for the input file (alternatively, the best likelihood tree from all the bootstrap searches can serve as the starting tree for the input file).

Next, RAxML-Light does rapid bootstrap searches on the replica alignments, and regular tree searches on the input data set.

For bootstrap searches:

    $RAXMLLIGHT -s ${sequence_file}.BS\$i -m $substitution_model -n LB\$i -D $save_memory -t RAxML_parsimonyTree.PB\${i}.0 -T $searchcores &

This uses the replica alignment .BSn and the starting tree RAxML_parsimonyTree.PBn to infer a likelihood tree for each replica data set. The tree is written to RAxML_Tree.LBn, where n is the bootstrap number. These likelihood trees are used to measure convergence and provide support values. The best likelihood tree can also be used as a starting tree for regular searches.

For regular searches:

    $RAXMLLIGHT -s $sequence_file -m $substitution_model -t \$start_tree -n LR\$i -T $searchcores $save_memory

This uses the input alignment and either the parsimony tree or the best bootstrap likelihood tree to infer a likelihood tree for the input data set. This command is repeated n times, where n is the number of regular searches specified by the user.
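To make the order of these steps easier to follow, here is a dry-run sketch that prints the command sequence instead of executing it. It is an illustration under stated assumptions, not the gateway's actual script: the function name `print_workflow`, the default binary names, and the seed values are placeholders. The 0-based replica numbering, the Parsimonator call for the input file (`-n PR$i`), and the `.0` suffix on the PR starting tree are inferred by analogy with the bootstrap commands above. The content of `$save_memory` is not given in the text, so it defaults to empty here.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands in the order described above.
# Binary names and seeds are placeholders, not the gateway's settings.
RAXMLSERIAL=${RAXMLSERIAL:-raxmlHPC}
PARSIMONATOR=${PARSIMONATOR:-parsimonator}
RAXMLLIGHT=${RAXMLLIGHT:-raxmlLight}

print_workflow() {  # print_workflow ALIGNMENT MODEL BOOTSTRAPS SEARCHES CORES
    local sequence_file=$1 substitution_model=$2 bsearches=$3 searches=$4 searchcores=$5
    local bseed=${BSEED:-12345} seed=${SEED:-12345} save_memory=${SAVE_MEMORY:-}
    local i

    if [ "$bsearches" -gt 0 ]; then
        # Step 1: replica alignments from standard RAxML (-f j).
        echo "$RAXMLSERIAL -s $sequence_file -m $substitution_model -n BS -f j -b $bseed -N $bsearches"
        # Step 2: one parsimony starting tree per replica (0-based index assumed).
        i=0
        while [ "$i" -lt "$bsearches" ]; do
            echo "$PARSIMONATOR -s ${sequence_file}.BS$i -n PB$i -p $seed"
            i=$((i + 1))
        done
        # Step 3: bootstrap searches (the original runs these in the background with &).
        i=0
        while [ "$i" -lt "$bsearches" ]; do
            echo "$RAXMLLIGHT -s ${sequence_file}.BS$i -m $substitution_model -n LB$i -D${save_memory:+ $save_memory} -t RAxML_parsimonyTree.PB${i}.0 -T $searchcores"
            i=$((i + 1))
        done
    fi

    # Step 4: regular searches on the input alignment, each from a parsimony tree
    # (assumed naming; the best bootstrap tree could be substituted as start tree).
    i=0
    while [ "$i" -lt "$searches" ]; do
        echo "$PARSIMONATOR -s $sequence_file -n PR$i -p $seed"
        echo "$RAXMLLIGHT -s $sequence_file -m $substitution_model -t RAxML_parsimonyTree.PR${i}.0 -n LR$i -T $searchcores${save_memory:+ $save_memory}"
        i=$((i + 1))
    done
}
```

Calling `print_workflow dna.phy GTRCAT 2 1 32` lists the commands for two bootstrap replicates and one regular search on 32 cores. Printing rather than executing keeps the sketch safe to run on a machine where none of the three programs is installed.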
The inferred tree is written to a file .LRn, where n is the regular search replicate number. These likelihood trees are used to measure convergence and provide support values. The best likelihood tree can also be used as a starting tree for regular searches.

REFERENCES
[Ott2007] M. Ott, J. Zola, S. Aluru, A. Stamatakis: "Large-scale Maximum Likelihood-based Phylogenetic Analysis on the IBM BlueGene/L". In Proceedings of IEEE/ACM Supercomputing (SC2007) conference, Reno, Nevada, November 2007.
[Stamatakis2010] A. Stamatakis: "Phylogenetic Search Algorithms for Maximum Likelihood". In M. Elloumi, A.Y. Zomaya, editors. Algorithms in Computational Biology: Techniques, Approaches and Applications, John Wiley and Sons, 2010.

If you use RAxML-Light, please cite: RAxML-Light version 1.0.5 by Alexandros Stamatakis.
Alexandros Stamatakis affiliation is:
If there is a tool or a feature you need, please let us know.