GARLI (Genetic Algorithm for Rapid Likelihood Inference) performs phylogenetic searches (tree inference) on aligned sequence datasets using the maximum-likelihood criterion. Tips for running on XSEDE here. Version 0.96 of the program added amino acid and codon-based models, in addition to the standard nucleotide models. Available substitution models include:
Nucleotide models: All models are nested within the General Time Reversible (GTR) model, optionally with discrete gamma distributed rate heterogeneity and/or an inferred proportion of invariable sites.
Amino acid models: Many well known fixed amino acid rate matrices (Dayhoff, Jones, WAG, mtRev, mtmam) are supported, with either fixed or observed (aka “+F”) amino acid frequencies, and discrete gamma distributed rate heterogeneity and/or an inferred proportion of invariable sites.
Note: GARLI does not accept the ambiguous amino acid characters X, B, and Z. Derrick reports that this will be fixed in a minor bug-fix update soon. For unknown states the default character is "?", although with Nexus you can define it to be whatever you want. Thus, a Nexus dataset that uses only X and not ? can be read by adding missing=X to the Format line. If the dataset is in non-Nexus format (usually Phylip), then the only options are ? and -. In all cases the gap character, by default "-", is treated identically to a missing character.
Codon models: The basic Goldman and Yang (1994) model and other related models are supported, with a number of options for codon frequencies (equal, “F1x4”, “F3x4”, observed) and one or more estimated non-synonymous rate categories (aka dN/dS or ω parameters). If you are new to this kind of analysis, please check our Codon model FAQ page.
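Before submitting an amino acid dataset, it can help to check it for the ambiguous characters mentioned in the note above. The following is a minimal sketch, not part of GARLI: the function name `find_ambiguous` and the dict-of-strings input are hypothetical, and it flags X regardless of whether the Nexus Format line declares missing=X.

```python
# Characters that the note above says GARLI does not accept in amino acid data.
AMBIGUOUS_AA = set("XBZ")


def find_ambiguous(sequences):
    """Return {taxon: [(column, char), ...]} for ambiguous amino acid codes.

    `sequences` is a hypothetical dict mapping a taxon name to its aligned
    sequence string. Columns are reported 1-based.
    """
    hits = {}
    for taxon, seq in sequences.items():
        found = [(col, char)
                 for col, char in enumerate(seq.upper(), start=1)
                 if char in AMBIGUOUS_AA]
        if found:
            hits[taxon] = found
    return hits


if __name__ == "__main__":
    example = {"taxon_A": "MKT-LLV?", "taxon_B": "MKXBLLVZ"}
    for taxon, found in find_ambiguous(example).items():
        print(taxon, found)
```

Gap ("-") and missing ("?") characters are left alone, since the note says both are accepted.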
GARLI is loosely based on the program GAML (Lewis 1998). It uses a genetic algorithm approach to simultaneously find the topology, branch lengths and model parameters that maximize the lnL. This involves the evolution of a population of solutions termed individuals, with each individual encoding a tree topology, a set of branch lengths and a set of model parameters. Each individual is assigned a fitness based on its lnL score. Each generation random mutations are applied to some of the components of the individuals, and their fitnesses are recalculated. The individuals are then chosen to be the parents of the individuals of the next generation, in proportion to their fitnesses. This process is repeated many times, and the population of individuals evolves toward higher fitness solutions. Note that the highest fitness individual is automatically maintained in the population, ensuring that it is not lost due to chance (genetic drift).
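The generational loop described above can be illustrated with a toy example. This is a sketch of the general technique only (mutation, fitness-proportional parent selection, and retention of the best individual), not GARLI's implementation: the individuals here are plain numbers rather than trees, and `evolve`, `toy_fitness`, and `toy_mutate` are hypothetical names. How GARLI converts lnL scores (which are negative) into fitnesses is not described here, so the toy uses a positive fitness directly.

```python
import random


def toy_fitness(x):
    """Positive fitness that peaks at x = 3 (a stand-in for a likelihood score)."""
    return 1.0 / (1.0 + (x - 3.0) ** 2)


def toy_mutate(x, rng):
    """Perturb an individual with a small random step."""
    return x + rng.gauss(0.0, 0.5)


def evolve(fitness, mutate, population, generations, rng):
    """Run a toy genetic algorithm and return the best individual found."""
    # Keep the current best individual in slot 0.
    population = sorted(population, key=fitness, reverse=True)
    for _ in range(generations):
        elite = population[0]
        # The elite is carried over unmutated; each other individual is
        # mutated with probability 0.5 (an arbitrary choice for this sketch).
        candidates = [elite] + [
            mutate(ind, rng) if rng.random() < 0.5 else ind
            for ind in population[1:]
        ]
        scores = [fitness(ind) for ind in candidates]
        best = candidates[scores.index(max(scores))]
        # Parents of the next generation are drawn in proportion to fitness.
        parents = rng.choices(candidates, weights=scores,
                              k=len(population) - 1)
        population = [best] + parents
    return population[0]


if __name__ == "__main__":
    rng = random.Random(42)
    start = [rng.uniform(-10.0, 10.0) for _ in range(8)]
    winner = evolve(toy_fitness, toy_mutate, start, 200, rng)
    print(winner, toy_fitness(winner))
```

Because the best individual is always copied forward unmutated, the best fitness in the population can never decrease from one generation to the next, which is the point of the safeguard against genetic drift.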
The mutations used by GARLI fall into three types: topological mutations, model parameter mutations and branch-length mutations. Topological mutations consist of the standard NNI and SPR rearrangement types, as well as a localized form of SPR in which the pruned subtree may only be reattached to branches within a certain radius of its former location. Topological mutations are followed by some degree of rough branch-length optimization. Model mutations simply choose one of the model parameters and multiply it by a gamma-distributed variable with mean 1.0. When branch-length mutations are performed, a number of branches are chosen and each has its current length multiplied by a different gamma-distributed variable.
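The multiplicative mutations can be sketched as follows. A gamma distribution with shape k and scale θ has mean kθ, so setting θ = 1/k gives a multiplier with mean 1.0. The shape value, the number of branches mutated, and the function names below are assumptions made for illustration; the text above does not state the values GARLI uses.

```python
import random


def gamma_multiplier(shape, rng):
    """Draw a gamma-distributed multiplier with mean 1.0.

    random.gammavariate(alpha, beta) has mean alpha * beta, so a scale of
    1 / shape fixes the mean at 1.0. Larger shapes give smaller changes.
    """
    return rng.gammavariate(shape, 1.0 / shape)


def mutate_branch_lengths(branch_lengths, n_mutations, shape, rng):
    """Return a copy with n_mutations randomly chosen branches rescaled.

    Each chosen branch is multiplied by its own independent draw.
    """
    mutated = list(branch_lengths)
    for i in rng.sample(range(len(mutated)), n_mutations):
        mutated[i] *= gamma_multiplier(shape, rng)
    return mutated


if __name__ == "__main__":
    rng = random.Random(7)
    lengths = [0.10, 0.05, 0.20, 0.01]
    print(mutate_branch_lengths(lengths, n_mutations=2, shape=100.0, rng=rng))
```

A multiplier keeps branch lengths and rate parameters positive, and a mean of 1.0 means the mutation does not systematically inflate or shrink the value.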
Output files: The file garli_run.run00.boot.tre contains the best scoring tree in the set of replicates (2 by default) for bootstrap 00. Each bootstrap requested produces a corresponding file, with the number 00 incremented by one per bootstrap, up to 99 if you ran 100 bootstraps. Each such file contains the best tree found among the replicates of the respective bootstrap run.
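If you download the per-bootstrap files, a small helper can recover the bootstrap number from each file name and put the files in order. This is a sketch based only on the garli_run.runNN.boot.tre pattern shown above; the function names are hypothetical, and the pattern accepts any number of digits because the zero-padding beyond "00" is not stated here.

```python
import re

# Matches names such as garli_run.run00.boot.tre and captures the number.
BOOT_FILE = re.compile(r"garli_run\.run(\d+)\.boot\.tre")


def bootstrap_index(filename):
    """Return the bootstrap number encoded in a file name, or None."""
    match = BOOT_FILE.fullmatch(filename)
    return int(match.group(1)) if match else None


def sorted_boot_files(filenames):
    """Keep only per-bootstrap tree files, ordered by bootstrap number."""
    boot_files = [name for name in filenames
                  if bootstrap_index(name) is not None]
    return sorted(boot_files, key=bootstrap_index)


if __name__ == "__main__":
    listing = ["garli_run.run10.boot.tre", "allBootTrees.tre",
               "garli_run.run00.boot.tre", "garli_run.run00.screen.log"]
    print(sorted_boot_files(listing))
```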
The file “allBootTrees.tre” assembles the best tree from each bootstrap into a single Nexus file for convenience of downloading. You must take the trees assembled in allBootTrees.tre and calculate the consensus tree outside of GARLI. The author suggests SumTrees, PAUP*, CONSENSE from the PHYLIP package, or Phyutility. Other options exist. You can use Consense in the CIPRES Science Gateway to perform this operation. First, save the allBootTrees.tre file to your current folder using the button provided at the top of the page when you view that file. Next, convert the allBootTrees.tre file to Phylip format using NCL converter, specifying the input format as Nexus and the output format as Phylip. When the job completes, save the output file out.tre to your folder with the button provided. Finally, run Consense on this file.
If you have a local version of PAUP*, the GARLI wiki provides further help on how to accomplish this using PAUP*.
Dr. Zwickl is currently a researcher at NESCENT.
GARLI home page here.
INPUT = DNA or protein matrices in Nexus or non-interleaved Phylip format
The tables below show the input files used and the kinds of results returned by the CIPRES Science Gateway.
Input File Names | Sample File from a Test |
input file | garli_input.nex |
parameter file | garli.conf |
Sample Output File Type | File Name |
screen_dump bootstrap 00 | garli_run.run00.screen.log |
best score every x generations, bootstrap 00 | garli_run.run00.log00.log |
best tree found in each set of replicates bootstrap 00 | garli_run.run00.boot.tre |
best trees from all bootstraps in one file | allBootTrees.tre |
If you use GARLI, please cite:
Zwickl, D. J. (2006) Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. dissertation, The University of Texas at Austin. (pdf)
If there is a tool or a feature you need, please let us know.