Cipres Portal2 Help | Hennig86 Format

A simple Hennig86 file looks like this:

xread
'an optional text string in single quote chars'
10 5
TaxonA   0000000000
TaxonB   0010111000
TaxonC   1011110000
TaxonD   1111111000
TaxonE   1111111000

;

The XREAD and NSTATES commands are case-insensitive.

XREAD defines a set of taxa and their character data. If no NSTATES command is given before the XREAD command, the program assumes the characters are discrete data. In the absence of other commands, XREAD allows up to 16 states: the symbols 0-9 indicate states 0-9, and a-f indicate states 10-15. This can be modified by NSTATES (see below), up to a maximum of 32 states.
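The default symbol-to-state mapping can be sketched in Python as follows. This is only an illustration of the rule stated above; the function name symbol_to_state is hypothetical, and accepting upper-case A-F as well as a-f is an assumption, not something this page specifies.

```python
import string

# Default XREAD symbols: 0-9 for states 0-9, a-f for states 10-15.
DEFAULT_SYMBOLS = string.digits + "abcdef"


def symbol_to_state(symbol: str) -> int:
    """Return the numeric state (0-15) for one default XREAD symbol."""
    # Assumption: upper-case letters are accepted and folded to lower case.
    index = DEFAULT_SYMBOLS.find(symbol.lower())
    if len(symbol) != 1 or index < 0:
        raise ValueError(f"not a valid state symbol: {symbol!r}")
    return index
```

For example, symbol_to_state("7") gives 7 and symbol_to_state("c") gives 12.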

The XREAD command is followed by modifiers as follows:

' ... '	- a title for the data set, enclosed in single quote characters. This argument is optional, but it must be the first argument if present.
nchar	- a positive integer indicating the number of characters in each data sequence. This must be the second argument if a title was given, the first otherwise.
ntax	- a positive integer indicating the number of taxa in the data set. This argument must be the third argument if a title was given, the second otherwise.
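As a rough sketch of how these modifiers could be read, the Python below pulls the optional title, nchar, and ntax out of the start of a file. The names HEADER and parse_xread_header are hypothetical, and the pattern assumes the title contains no embedded single quote characters.

```python
import re

# xread, an optional 'title', then nchar and ntax (in that order).
HEADER = re.compile(r"\s*xread\s+(?:'([^']*)'\s+)?(\d+)\s+(\d+)", re.IGNORECASE)


def parse_xread_header(text: str):
    """Return (title, nchar, ntax); title is None when no title is given."""
    match = HEADER.match(text)
    if match is None:
        raise ValueError("no XREAD header found")
    title = match.group(1)
    nchar, ntax = int(match.group(2)), int(match.group(3))
    # Both counts are described as positive integers.
    if nchar < 1 or ntax < 1:
        raise ValueError("nchar and ntax must be positive integers")
    return title, nchar, ntax
```

Applied to the simple file at the top of this page, the function would return the title string together with nchar 10 and ntax 5.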

For non-interleaved data, the integers specifying nchar and ntax are followed by a carriage return and then a series of rows, one per taxon, each giving the taxon name and its character data. Each row has the following format:

NAME CHARACTERS

where NAME is the name of the taxon and CHARACTERS is the associated data sequence.

The first character in the name of a taxon must be an alphabetical character, and subsequent characters are restricted to alphabetical characters, numeric characters, underscores, or periods. There is no limit on the length of a taxon name. Any number of spaces or carriage returns can be inserted between taxon names and character states, but the character string may not contain a space or a carriage return.
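The naming rule can be checked with a short regular expression. The sketch below is illustrative only; is_valid_taxon_name is a hypothetical helper, and it assumes that "alphabetical" means the ASCII letters A-Z and a-z.

```python
import re

# First character: a letter. Remaining characters: letters, digits,
# underscores, or periods. No length limit is imposed.
TAXON_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")


def is_valid_taxon_name(name: str) -> bool:
    """Return True if name follows the taxon naming rule described above."""
    return TAXON_NAME.fullmatch(name) is not None
```

Under this check, "TaxonA" and "Taxon_1.b" are accepted, while "1Taxon" and "Taxon A" are rejected.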

Polymorphisms in the data sequence are indicated by enclosing the states of a character in square brackets.
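One way to handle bracketed polymorphisms when reading a data sequence is to split it into one entry per character, as in the Python sketch below. The function name split_characters is hypothetical, and the sketch assumes brackets are never nested.

```python
def split_characters(sequence: str):
    """Split a data sequence into one tuple of state symbols per character.

    A plain symbol yields a one-element tuple; a bracketed group such as
    [01] yields a tuple holding every state listed inside the brackets.
    """
    cells = []
    i = 0
    while i < len(sequence):
        if sequence[i] == "[":
            close = sequence.find("]", i)
            if close == -1:
                raise ValueError("unclosed '[' in data sequence")
            cells.append(tuple(sequence[i + 1:close]))
            i = close + 1
        elif sequence[i] == "]":
            raise ValueError("']' without a matching '['")
        else:
            cells.append((sequence[i],))
            i += 1
    return cells
```

With this approach the sequence 01[01]1 counts as four characters, the third of which is polymorphic for states 0 and 1.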

For interleaved data, the data is split into blocks, and each block must be preceded by input that specifies the type of data it contains. This is done using an ampersand and a specification enclosed in square brackets, as follows:

&[TYPE]

where TYPE is one of the following: NUMERIC, DNA, PROTEINS, or CONTINUOUS, indicating the type of character data in the block. A valid file using these commands looks like this:

xread
'an optional text string in single quote chars'
20 5

&
TaxonA 0000000000
TaxonB 0010111000
TaxonC 1011110000
TaxonD 1111111000
TaxonE 1111111000

&[dna]
TaxonA TGAGCAGGAA
TaxonB GTTGGAACAT
TaxonC TCTTTAAGTC
TaxonD TGAGCCGGTA
TaxonE GGAACTTCTC
;

A single ampersand character with no [TYPE] statement indicates that the data is of the default type, which is NUMERIC unless a previous NSTATES command has specified a different one. If the data in the block differs from the default type, then the ampersand must be followed by the type enclosed in square brackets.

Taxa must have exactly the same names in every block (names are case sensitive). The order of taxa within a block does not matter, but all taxa in a block must have the same number of characters. Each row of characters must end with a carriage return.
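The block rules above can be sketched in Python as follows. This is a simplified illustration, not the portal's parser: parse_interleaved and join_blocks are hypothetical names, each taxon name is assumed to share a line with its characters, and the row-length check assumes there are no bracketed polymorphisms.

```python
def parse_interleaved(body: str, default_type: str = "NUMERIC"):
    """Parse the text after the 'nchar ntax' line into (type, rows) blocks."""
    blocks = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line == ";":
            continue
        if line.startswith("&"):
            # '&' alone means the default type; '&[dna]' names a type.
            declared = line[1:].strip().strip("[]")
            blocks.append((declared.upper() or default_type, {}))
            continue
        if not blocks:
            raise ValueError("data row found before the first '&' marker")
        name, sequence = line.split(None, 1)
        blocks[-1][1][name] = sequence
    return blocks


def join_blocks(blocks):
    """Check the block rules and concatenate each taxon's rows."""
    if not blocks:
        raise ValueError("no data blocks found")
    expected = set(blocks[0][1])
    joined = {name: "" for name in expected}
    for kind, rows in blocks:
        # Names are compared exactly, so 'taxona' does not match 'TaxonA'.
        if set(rows) != expected:
            raise ValueError(f"{kind} block lists different taxa")
        if len({len(seq) for seq in rows.values()}) != 1:
            raise ValueError(f"{kind} block has rows of unequal length")
        for name, seq in rows.items():
            joined[name] += seq
    return joined
```

For the interleaved example above, this would produce one NUMERIC block and one DNA block, and each taxon would end up with 20 characters after joining, matching the nchar value of 20.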

NSTATES

A file that uses the NSTATES command looks like this:

nstates dna;
xread
'an optional text string in single quote chars'
10 5
TaxonA TGAGCAGGAA
TaxonB GTTGGAACAT
TaxonC TCTTTAAGTC
TaxonD TGAGCCGGTA
TaxonE GGAACTTCTC

;

XREAD alone specifies the default data type: discrete characters. The NSTATES command is used to modify the default. NSTATES determines how character data in subsequent commands are to be interpreted. The NSTATES command is followed by a data specification term, which may be one of the following:

DNA - characters are DNA, represented by IUPAC nucleotide symbols
PROT - characters are amino acids, represented by IUPAC amino acid symbols
NUM N - characters are unspecified discrete data, with N being a positive integer indicating the maximum number of possible states.

For NUM N, states are represented by the numerals 0-9 for states 0-9 and the letters A-V for states 10-31. The maximum number of states allowed is 32. Note that the NUM keyword is optional; 'NSTATES 16' would set the data type to unspecified discrete data and the maximum number of states to 16.

The NSTATES command must be followed by a semicolon, and then followed by XREAD, as described above.
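The NSTATES forms described above can be recognized with a small Python sketch like the one below. The names parse_nstates and allowed_symbols are hypothetical, and the sketch covers only the DNA, PROT, and NUM N terms listed on this page.

```python
import re
import string

# Symbols for NUM N: 0-9 for states 0-9, then A-V for states 10-31.
NUM_SYMBOLS = string.digits + string.ascii_uppercase[:22]

NSTATES = re.compile(
    r"\s*nstates\s+(?:(dna|prot)|(?:num\s+)?(\d+))\s*;", re.IGNORECASE
)


def parse_nstates(command: str):
    """Return (data_type, max_states) for an NSTATES command."""
    match = NSTATES.match(command)
    if match is None:
        raise ValueError("not a valid NSTATES command")
    if match.group(1):
        return match.group(1).upper(), None
    n = int(match.group(2))
    if not 1 <= n <= 32:
        raise ValueError("NUM N requires N between 1 and 32")
    return "NUM", n


def allowed_symbols(n: int) -> str:
    """Return the symbols usable when the maximum number of states is n."""
    return NUM_SYMBOLS[:n]
```

Here both "nstates num 16;" and "NSTATES 16;" are treated the same way, since the NUM keyword is optional.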
