GARLI extended CODON MODEL FAQS.

How do I use the CODON Model?

The codon models are built with three components:

  • Parameters describing the process of individual nucleotide substitutions.
  • Equilibrium codon frequencies.
  • Parameters describing the relative rate of nonsynonymous to synonymous substitutions.

    The nucleotide substitution parameters within the codon models are exactly the same as those in the standard nucleotide models found in GARLI, and are specified by the ratematrix parameter. The ratematrix parameter allows models of the 2rate variety (inferring different rates for transitions and transversions, K2P or HKY-like), the 6rate variety (inferring different rates for all nucleotide pairs, GTR-like) or any other sub-model of GTR.

    The options for codon frequencies are specified with the statefrequencies parameter. The options are to use equal frequencies (not a good option), the frequencies observed in your dataset (termed empirical in GARLI), or the codon frequencies implied by the F1x4 or F3x4 methods (using PAML's terminology). These last two options calculate the codon frequencies as the product of the frequencies of the three nucleotides that make up each codon. In the F1x4 case the nucleotide frequencies are those observed in the dataset across all codon positions, while the F3x4 option uses the nucleotide frequencies observed in the data at each codon position separately.
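
    The difference between F1x4 and F3x4 can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not GARLI's or PAML's code. The function names are hypothetical, and the sketch assumes the standard genetic code, in-frame sequences, and that stop codons are dropped and the remaining frequencies rescaled to sum to 1.

```python
from itertools import product

BASES = "ACGT"
STANDARD_STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def position_counts(seqs):
    """Count A/C/G/T at each of the three codon positions.

    Gaps and ambiguity codes are skipped. Sequences are assumed to be
    aligned and to start on the first base of a codon.
    """
    counts = [dict.fromkeys(BASES, 0) for _ in range(3)]
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if base in BASES:
                counts[i % 3][base] += 1
    return counts


def normalize(values):
    """Rescale a dict of non-negative numbers so that it sums to 1."""
    total = sum(values.values())
    return {key: value / total for key, value in values.items()}


def codon_frequencies(seqs, method="F3x4", stops=STANDARD_STOPS):
    """Codon frequencies as products of nucleotide frequencies."""
    counts = position_counts(seqs)
    if method == "F1x4":
        # One set of nucleotide frequencies pooled over all codon positions.
        pooled = {b: sum(c[b] for c in counts) for b in BASES}
        per_position = [normalize(pooled)] * 3
    elif method == "F3x4":
        # A separate set of nucleotide frequencies for each codon position.
        per_position = [normalize(c) for c in counts]
    else:
        raise ValueError("method must be 'F1x4' or 'F3x4'")

    raw = {}
    for a, b, c in product(BASES, repeat=3):
        codon = a + b + c
        if codon not in stops:
            raw[codon] = per_position[0][a] * per_position[1][b] * per_position[2][c]
    # Assumption: rescale over the sense codons so the frequencies sum to 1.
    return normalize(raw)
```

    Under F1x4, codons made of the same three bases in a different order (for example ATG and GTA) get the same frequency. Under F3x4 they generally do not, because each codon position has its own nucleotide frequencies.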

    The final component of the codon models is the set of relative nonsynonymous/synonymous rate parameters (aka dN/dS or omega parameters). The default is to infer a single dN/dS value. Alternatively, a model can be specified that infers a given number of dN/dS categories, estimating both the dN/dS value of each category and the proportion of sites falling in it (ratehetmodel = nonsynonymous). This is the discrete or M3 model in PAML's terminology.

    What is the role of reading frames in GARLI codon analysis?

    Codon models for tree inference require that the protein-coding sequences be aligned in the correct reading frame (for example, gaps of 1 or 2 bases will impair the analysis). Maintaining the reading frame also makes alignment much easier, even if your analysis will be at the nucleotide level. Running sequences through an alignment program without inspecting the result is almost guaranteed to return an alignment that will not work for codon-based inference.
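
    A rough pre-check for frame problems can be scripted before running GARLI. The Python sketch below is a heuristic and not part of GARLI: the function name frame_problems and the alignment layout (a dict mapping sequence name to aligned sequence) are assumptions made for illustration. It flags sequences whose length is not a multiple of 3 and gap runs whose length is not a multiple of 3.

```python
import re


def frame_problems(alignment, gap_chars="-"):
    """Return (name, message) pairs for likely reading-frame problems.

    alignment: dict mapping sequence name -> aligned sequence string.
    This is a heuristic only; it cannot tell where the true frame starts.
    """
    problems = []
    gap_run = re.compile("[" + re.escape(gap_chars) + "]+")
    for name, seq in alignment.items():
        if len(seq) % 3 != 0:
            problems.append((name, f"length {len(seq)} is not a multiple of 3"))
        for match in gap_run.finditer(seq):
            run_length = len(match.group())
            if run_length % 3 != 0:
                # Sites are reported 1-based, as in NEXUS.
                problems.append(
                    (name, f"gap of {run_length} starting at site {match.start() + 1}")
                )
    return problems
```

    An empty result does not prove that the alignment is in frame. It only means that none of these simple warning signs were found, so the alignment should still be inspected by eye.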

    What are the restrictions on the start of a coding sequence when using the CODON amino acid model?

    GARLI expects the alignment to begin on the first base of a codon. It cannot work out where the reading frame is, so if the alignment starts with a partial codon, that partial codon must be removed or excluded from the alignment.

    Can I put my sequences in register by excluding parts of sequences using standard Nexus syntax when using the CODON amino acid model?

    Version 0.96 does allow normal NEXUS exclusions through an assumptions block, so the following would work to exclude the first two bases of an alignment and tell GARLI that the reading frame starts on the third.


    begin assumptions;
    exset * myexset = 1 2;
    end;


    I got an error message saying a stop codon was encountered when using the CODON:AminoAcid Model.

    Stop codons under the selected genetic code are not allowed. These are as follows: standard code (TAG, TAA and TGA); vertebrate mitochondria (TAG, TAA, AGA and AGG); invertebrate mitochondria (TAG and TAA). While stop codons could just be ignored and treated as missing, this can be dangerous. Sometimes they come from a sequencing error, but more often from an alignment problem, an incorrectly chosen genetic code, or a sequence that is not really coding (e.g. an intron). In any case, the error should be examined and resolved consciously by the user.
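
    To locate the offending codons before rerunning GARLI, the alignment can be scanned with a short script. The Python sketch below is an illustration only, not GARLI's own check. The function name find_stop_codons, the dictionary keys used to label the genetic codes, and the skip argument (the number of leading bases excluded to reach the reading frame) are all assumptions made for this example. The stop codon sets are the ones listed above.

```python
# Stop codons for each genetic code, as listed in this FAQ.
# The keys are labels chosen for this sketch.
STOP_CODONS = {
    "standard": {"TAG", "TAA", "TGA"},
    "vertmito": {"TAG", "TAA", "AGA", "AGG"},
    "invertmito": {"TAG", "TAA"},
}


def find_stop_codons(alignment, code="standard", skip=0):
    """Return (name, site, codon) for every in-frame stop codon.

    alignment: dict mapping sequence name -> aligned sequence string.
    skip: number of leading bases excluded so that the frame starts
          on the first base of a codon.
    Sites are 1-based positions of the first base of the codon.
    """
    stops = STOP_CODONS[code]
    hits = []
    for name, seq in alignment.items():
        seq = seq.upper()
        # Step through complete codons only; a trailing partial codon is ignored.
        for start in range(skip, len(seq) - 2, 3):
            codon = seq[start:start + 3]
            if codon in stops:
                hits.append((name, start + 1, codon))
    return hits
```

    If many sequences report stop codons at scattered positions, the reading frame or the chosen genetic code is the more likely culprit. A single hit in one sequence points more towards a sequencing or alignment error in that sequence.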

     

    If you have further questions, please let us know.