Clustal Format

Files in CLUSTAL format usually carry the suffix ".aln".

From the EBI Site: ALN format originated in the alignment program ClustalW. The file starts with the word "CLUSTAL", followed by information about which Clustal program was run and which version was used, e.g. "CLUSTAL W (2.1) multiple sequence alignment". Here the type of Clustal program is "W" and the version is 2.1. The alignment is written in blocks of 60 residues. Each line in a block starts with the sequence name, obtained from the input sequence, and ends with a cumulative count of the residues written for that sequence so far. The information about which residues match is shown below each block of residues:

"*" means that the residues or nucleotides in that column are identical in all sequences in the alignment.
":" means that conserved substitutions are observed in that column.
"." means that semi-conserved substitutions are observed in that column.

An example is shown below.

CLUSTAL W 2.1 multiple sequence alignment      


FOSB_MOUSE      ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPAVDPYDMPGTSYSTPGLSAYSTGGASGS 60  
FOSB_HUMAN      ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPVVDPYDMPGTSYSTPGMSGYSSGGASGS 60
                ********************************.***************:*.**:******  
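
The conservation line in this example can be reproduced with a short Python sketch. The strong and weak residue groups that decide between ":" and "." are not listed in this text; the groups below are the ones commonly documented for ClustalW protein alignments and should be read as an assumption, as should the choice to leave a column blank whenever it contains a gap. The function name conservation_line is hypothetical.

```python
# Assumed residue groups (not taken from the text above): the "strong" and
# "weak" groups commonly documented for ClustalW protein alignments.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_line(sequences):
    """Return one conservation symbol per column of equal-length aligned rows."""
    if not sequences:
        return ""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must all have the same length")

    symbols = []
    for column in zip(*sequences):
        residues = {residue.upper() for residue in column}
        if "-" in residues:
            # Assumption: any gap in the column means no symbol is written.
            symbols.append(" ")
        elif len(residues) == 1:
            symbols.append("*")   # identical in all sequences
        elif any(residues <= set(group) for group in STRONG_GROUPS):
            symbols.append(":")   # conserved substitution
        elif any(residues <= set(group) for group in WEAK_GROUPS):
            symbols.append(".")   # semi-conserved substitution
        else:
            symbols.append(" ")
    return "".join(symbols)


# The two sequences from the example block.
mouse = "ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPAVDPYDMPGTSYSTPGLSAYSTGGASGS"
human = "ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPVVDPYDMPGTSYSTPGMSGYSSGGASGS"
print(conservation_line([mouse, human]))
```

With these groups, the four columns where the two sequences differ (A/V, L/M, A/G and T/S) receive ".", ":", "." and ":", matching the line printed under the block.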

Under a stricter definition of the format, each block of sequence lines is followed by:

    • A line showing the degree of conservation for the columns of the alignment in this block.
    • One or more empty lines.
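
Based on the layout described above, a minimal reader can be sketched in Python. This is an illustration under stated assumptions, not the parser used by any Clustal tool: parse_clustal is a hypothetical name, conservation lines are recognised by their leading whitespace, and the optional residue count at the end of each sequence line is ignored.

```python
def parse_clustal(text):
    """Return (header_line, {name: aligned_sequence}) for CLUSTAL-format text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError('the first line must start with "CLUSTAL"')
    header = lines[0].strip()

    sequences = {}  # dicts keep insertion order, so input order is preserved
    for line in lines[1:]:
        # Empty lines separate blocks. Lines that begin with whitespace are
        # treated as conservation lines (an assumption of this sketch).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, residues = fields[0], fields[1]
        # A third field, if present, is the residue count and is not needed.
        sequences[name] = sequences.get(name, "") + residues
    return header, sequences


sample = """CLUSTAL W 2.1 multiple sequence alignment


FOSB_MOUSE      ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPAVDPYDMPGTSYSTPGLSAYSTGGASGS 60
FOSB_HUMAN      ITTSQDLQWLVQPTLISSMAQSQGQPLASQPPVVDPYDMPGTSYSTPGMSGYSSGGASGS 60
                ********************************.***************:*.**:******

"""

header, alignment = parse_clustal(sample)
for name, sequence in alignment.items():
    print(name, len(sequence))
```

Because each sequence name repeats in every block, appending to the entry for that name joins the blocks of a longer alignment back into full-length rows.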
