clustal
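In brief, the multiple alignment is carried out in 3 stages: all pairs of sequences are aligned to compute a distance matrix, a "guide tree" is built from these distances, and the sequences are then aligned progressively following the branching order of the guide tree.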
Instead of letting the program compute pairwise alignments, you can supply an existing tree for the program to use as the "guide tree" (options -usertree -usertreefile=<file with "guide tree">; see Output file formats for the format). For example, you could use a tree file generated by a previous run of clustal and modified with a tree editor such as njplot.
To increase the chance of finding the correct alignment, clustal uses a number of tricks, especially for proteins. For more details, see below.
For nucleic acids, the scores for transitions (A<-->G or C<-->T, i.e. purine-purine or pyrimidine-pyrimidine substitutions) in the nucleotide comparison matrix are multiplied by a Transition Weight between 0 and 1 (option -transitionw). A weight of zero means that transitions are scored as mismatches, while a weight of 1 gives transitions the match score. For distantly related nucleic acid sequences, the weight should be near zero; for closely related sequences it can be useful to assign a higher weight. The default is 0.5.
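As a rough illustration of the transition weight, the sketch below scores a pair of unambiguous bases. The function name `base_pair_score`, the match and mismatch values of 1.0 and 0.0, and the linear interpolation between them are assumptions made for this example; they are not taken from the program's source.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_pair_score(x, y, transition_weight=0.5, match=1.0, mismatch=0.0):
    """Score two unambiguous bases, with transitions weighted between
    the mismatch score (weight 0) and the match score (weight 1)."""
    if not 0.0 <= transition_weight <= 1.0:
        raise ValueError("transition_weight must lie between 0 and 1")
    x, y = x.upper(), y.upper()
    if x == y:
        return match
    # A transition stays within purines (A, G) or within pyrimidines (C, T).
    is_transition = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    if is_transition:
        return mismatch + transition_weight * (match - mismatch)
    return mismatch  # transversion


if __name__ == "__main__":
    for pair in ("AA", "AG", "CT", "AC"):
        print(pair, base_pair_score(pair[0], pair[1]))
```

With the default weight of 0.5, a transition scores halfway between a mismatch and a match under these assumptions.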
For proteins different amino acid comparison matrices are used depending on the mean percent identity of the sequences to be aligned. Although the input matrix can contain positive as well as negative values, the values are rescaled to all positive unless you switch this off with -norescale (this sometimes gives better results if the sequences are of uneven length).
The default or user selected gap penalties are not used as such, but are
adapted. It has been shown that varying the gap penalties used with
different weight matrices can improve the accuracy of sequence
alignments. The average score for two mismatched residues is used as
a scaling factor. Furthermore, the percent identity of the two (groups of)
sequences to be aligned is used to increase the gap penalty for closely
related sequences and decrease it for more divergent sequences. Also,
the scores for both true and false sequence alignments grow with the
length of the sequences. So, the logarithm of the length of the shorter
sequence is used to increase the gap penalty:
<corrected gap penalty> = (<gap penalty> + log(min(N,M))) * <average residue mismatch score> * <percent identity scaling factor>
where N and M are the lengths of the two sequences.
The penalty is also modified depending on the difference between the
lengths of the two sequences to be aligned. If one sequence is much
shorter than the other, it is increased to inhibit too many long gaps in
the shorter sequence:
<corrected gap length penalty> = <gap length penalty> * (1.0 + |log(N/M)|)
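The two corrections above can be sketched as plain functions. This is only an illustration: the text does not say which logarithm base is used (natural log is assumed here), nor how the average mismatch score and the percent identity scaling factor are computed, so both are simply passed in by the caller. The function names are hypothetical.

```python
import math


def corrected_gap_penalty(gap_penalty, n, m, avg_mismatch_score, identity_scale):
    """(<gap penalty> + log(min(N, M))) * <average residue mismatch score>
    * <percent identity scaling factor>, with a natural log (assumption)."""
    return (gap_penalty + math.log(min(n, m))) * avg_mismatch_score * identity_scale


def corrected_gap_length_penalty(gap_length_penalty, n, m):
    """<gap length penalty> * (1.0 + |log(N / M)|), with a natural log (assumption)."""
    return gap_length_penalty * (1.0 + abs(math.log(n / m)))


if __name__ == "__main__":
    # Equal lengths leave the gap length penalty unchanged.
    print(corrected_gap_length_penalty(0.2, 300, 300))
    # A much shorter sequence raises it.
    print(corrected_gap_length_penalty(0.2, 50, 300))
    # Scaling factors of 1.0 are placeholders, not program defaults.
    print(corrected_gap_penalty(10.0, 50, 300, 1.0, 1.0))
```

Because of the absolute value, the length correction is the same whichever of the two sequences is the shorter one.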
If terminal gaps are penalised like internal gaps, then

acgtacgtacgtacgt                            acgtacgtacgtacgt
a----cgtacgtacgt   gets the same score as   ----acgtacgtacgt

Now, terminal gaps are free. This is better on average and stops silly effects like single residues jumping to the edge of the alignment. However, it is not perfect. It does mean that if there should be a gap near the end of the alignment, the program may be reluctant to insert it, i.e.

cccccgggccccc                                                cccccgggccccc
ccccc---ccccc   may be considered worse (lower score) than   cccccccccc---

In the right hand case above, the terminal gap is free and may score higher than the left hand alignment. This can be prevented by lowering the gap and gap length penalties. It is difficult to get this right all the time. Please watch the ends of your alignments.
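To make the effect of free terminal gaps concrete, here is a small scoring sketch for two already aligned rows. It uses the penalty form <Gap penalty> + <Gap length penalty> * n quoted in the qualifier descriptions, but the match, mismatch and penalty values are arbitrary choices for the example and the function names are hypothetical.

```python
def gap_runs(row):
    """Return (start, end) spans, end exclusive, of each run of '-' in row."""
    runs, start = [], None
    for i, ch in enumerate(row):
        if ch == "-":
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def score_pair(top, bottom, match=1.0, mismatch=0.0,
               gap_open=2.0, gap_extend=0.5, free_end_gaps=True):
    """Score two aligned rows; optionally leave terminal gaps unpenalised."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0.0
    for a, b in zip(top, bottom):
        if a != "-" and b != "-":
            score += match if a == b else mismatch
    for row in (top, bottom):
        for start, end in gap_runs(row):
            terminal = start == 0 or end == len(row)
            if terminal and free_end_gaps:
                continue
            score -= gap_open + gap_extend * (end - start)
    return score


if __name__ == "__main__":
    top = "acgtacgtacgtacgt"
    for bottom in ("a----cgtacgtacgt", "----acgtacgtacgt"):
        print(bottom,
              score_pair(top, bottom, free_end_gaps=False),
              score_pair(top, bottom, free_end_gaps=True))
```

With terminal gaps penalised, both rows get the same score; with free terminal gaps the second row scores higher. The same function also shows the drawback: with these example values `cccccccccc---` outscores `ccccc---ccccc` against `cccccgggccccc`.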
The alignment of the most distantly related sequences is delayed until after the most closely related sequences have been aligned. The Max. % identity required to delay the addition of a sequence can be set (option -delay); sequences that are less identical than this level to any other sequence will be aligned later.
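One way to read the delay rule is sketched below: a sequence is delayed when its best percent identity to every other sequence is below the threshold. That reading, the nested-dictionary input and the function name are assumptions for illustration; the program may decide this differently.

```python
def sequences_to_delay(identity, max_identity=30):
    """Return names whose highest percent identity to any other sequence
    is below max_identity (one interpretation of the delay rule)."""
    delayed = []
    for name, row in identity.items():
        others = [pct for other, pct in row.items() if other != name]
        if others and max(others) < max_identity:
            delayed.append(name)
    return delayed


if __name__ == "__main__":
    # Hypothetical pairwise percent identities, not real data.
    identity = {
        "seqA": {"seqB": 80, "seqC": 20},
        "seqB": {"seqA": 80, "seqC": 25},
        "seqC": {"seqA": 20, "seqB": 25},
    }
    print(sequences_to_delay(identity))
```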
Hydrophilic gap penalties are used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions where gaps are more common. The residues that are "considered" to be hydrophilic can be entered as -hgapresidues=GPSNDQEKR.
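The runs in question can be located with a short scan like the following. It only finds the stretches of 5 or more hydrophilic residues; how the penalty is lowered inside them is not described here and is not modelled. The function name is hypothetical.

```python
def hydrophilic_runs(sequence, residues="GPSNDQEKR", min_run=5):
    """Return (start, end) spans, end exclusive, of runs of at least
    min_run consecutive residues drawn from the hydrophilic set."""
    hydrophilic = set(residues.upper())
    runs, start = [], None
    for i, aa in enumerate(sequence.upper()):
        if aa in hydrophilic:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(sequence) - start >= min_run:
        runs.append((start, len(sequence)))
    return runs


if __name__ == "__main__":
    # Made-up sequence with one long hydrophilic stretch.
    print(hydrophilic_runs("MKVLAGPSNDQEKRLLV"))
```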
The Gap Separation Distance (option -gapdist) tries to decrease the chances of gaps being too close to each other. Gaps that are less than this distance apart are penalised more than other gaps. This does not prevent close gaps; it makes them less frequent, promoting a block-like appearance of the alignment. By default, end gaps are ignored for this purpose, but you can request that end gaps be treated just like internal gaps (option -endgaps).
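As a simple illustration of what "too close" means, the sketch below flags neighbouring gaps that are fewer than a given number of columns apart. Representing each gap by a single column index and measuring distance as the index difference are assumptions for the example, as is the function name.

```python
def gaps_too_close(gap_positions, gap_dist=4):
    """Return pairs of neighbouring gap positions less than gap_dist apart."""
    ordered = sorted(gap_positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < gap_dist]


if __name__ == "__main__":
    # Hypothetical gap columns in an alignment.
    print(gaps_too_close([10, 12, 30]))
```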
> clustal
Global multiple alignment of sequences
Input sequence(s): list::ADH.list
Multiple sequence alignment USA [yahk_ecoli.fasta]: msf::ADH.msf
Guide tree output filename [yahk_ecoli.dnd]: ADH.dnd
Standard (Mandatory) qualifiers (* if not always prompted):
[-seqs] seqall Sequence(s) filename and optional format, or
reference (input USA)
* -usertreefile infile User provided guide tree. Only required if
you put -usertree
[-outseqs] seqoutall [<sequence>.<format>] Multiple sequence
alignment USA
* -treefile outfile [*.clustal] Guide tree output filename
Additional (Optional) qualifiers (* if not always prompted):
* -pwa menu [slow] Algorithm for pairwise alignments
(Values: fast (fast - approximate
(Wilbur-Lipman)); slow (slow - accurate
(Needleman-Wunsch)))
* -pwdnamatrix menu [IUB] Nucleotide comparison matrix for
pairwise alignment (Values: IUB (1.9/0.0
matrix with handling of ambiguities);
CLUSTALW (1.0/0.0 matrix); own (user
provided matrix))
* -pwmatrix menu [Gonnet] Amino acid comparison matrix for
pairwise alignment (Values: BLOSUM (BLOSUM
series 80, 62, 45, 30); PAM (PAM series 20,
60, 120, 350); Gonnet (Gonnet series 80,
120, 160, 250 and 350); id (identity matrix
10.0/0.0); own (user provided matrix))
* -pwusermatrix infile User provided symbol comparison matrix (in
BLAST format) for pairwise alignment
* -pwgappenalty float [15.0 for nucleic, 10.0 for protein] Gap
penalty for pairwise alignment (Number from
0.000 to 100.000)
* -pwgaplength float [6.66 for nucleic, 0.1 for protein] Gap
length penalty for pairwise alignment.
CLUSTAL subtracts from the similarity score
for each gap a penalty of type <Gap penalty>
+ <Gap length penalty> * n (Number from
0.000 to 10.000)
* -ktuple integer [2 for nucleic, 1 for protein] Wilbur-Lipman
ktup size. Decrease for sensitivity,
increase for speed (Integer from 1 to 4 for
nucleic, 2 for protein)
* -topdiags integer [4 for nucleic, 5 for protein] Wilbur-Lipman
number of best diagonals to consider
(Integer from 1 to 50)
* -window integer [4 for nucleic, 5 for protein] Wilbur-Lipman
window size for looking at diagonals around
best diagonals (Integer from 1 to 50)
* -joinw integer [5 for nucleic, 3 for protein] Wilbur-Lipman
penalty for joining different diagonals
(Integer from 1 to 500)
* -dnamatrix menu [IUB] Nucleotide comparison matrix for
multiple alignment (Values: IUB (1.9/0.0
matrix with handling of ambiguities);
CLUSTALW (1.0/0.0 matrix); own (user
provided matrix))
* -matrix menu [Gonnet] Amino acid comparison matrix for
multiple alignment (Values: BLOSUM (BLOSUM
series 80, 62, 45, 30); PAM (PAM series 20,
60, 120, 350); Gonnet (Gonnet series 80,
120, 160, 250 and 350); id (identity matrix
10.0/0.0); own (user provided matrix))
* -usermatrix infile User provided symbol comparison matrix (in
BLAST format) for multiple alignment
* -[no]rescale boolean [Y] Rescale amino acid comparison matrix to
all positive values or use negative values
(for proteins only). Option -norescale could
be useful if proteins are of very uneven
length.
* -transitionw float [0.5] Transition weight : proportion between
score of AG or CT pair and pair of
identical bases (Number from 0.000 to 1.000)
-gappenalty float [15.0 for nucleic, 10.0 for protein] Gap
penalty for multiple alignment (Number from
0.000 to 100.000)
-gaplength float [6.66 for nucleic, 0.2 for protein] Gap
length penalty for multiple alignment.
CLUSTAL subtracts from the similarity score
for each gap a penalty of type <Gap penalty>
+ <Gap length penalty> * n (Number from
0.000 to 10.000)
-delay integer [30] Max. % identity for delay of divergent
sequences (Integer from 0 to 100)
* -[no]pgap boolean [Y] Use gap penalties dependant on amino
acids at edge (for proteins only)
* -[no]hgap toggle [Y] Use lower gap penalties in strings of at
least 5 hydrophylic amino acids (for
proteins only)
* -hgapresidues string [GPSNDQEKR] Hydrophylic amino acids (Any
string is accepted)
* -gapdist integer [4] Gap Separation Distance. Use higher gap
penalty for gaps separated by less than n
amino acids (for proteins only) (Integer
from 0 to 100)
* -endgaps boolean [N] Use higher gap penalty also for gaps at
ends
-outorder menu [aligned] Order of sequences in output
(Values: input (same as input); aligned
(according to order of progressive
alignment))
Advanced (Unprompted) qualifiers:
-usertree toggle [N] Use user provided guide tree
Associated qualifiers:
"-seqs" associated qualifiers
-sbegin1 integer Start of each sequence to be used
-send1 integer End of each sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-sformat1 string Input sequence format
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-outseqs" associated qualifiers
-osformat2 string Output seq format
-osextension2 string File name extension
-osname2 string Base file name
-osdirectory2 string Output directory
-osdbname2 string Database name to add
-ossingle2 boolean Separate file for each entry
-oufo2 string UFO features
-offormat2 string Features format
-ofname2 string Features file name
-ofdirectory2 string Output directory
"-treefile" associated qualifiers
-odirectory string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
| Standard (Mandatory) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| [-seqs] (Parameter 1) | Sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
| -usertreefile | User provided guide tree. Only required if you put -usertree | Input file | Required |
| [-outseqs] (Parameter 2) | Multiple sequence alignment USA | Writeable sequence(s) | <sequence>.format |
| -treefile | Guide tree output filename | Output file | <sequence>.dnd |
| Additional (Optional) qualifiers | Description | Allowed values | Default |
| -pwa | Algorithm for pairwise alignments | fast (fast - approximate (Wilbur-Lipman)); slow (slow - accurate (Needleman-Wunsch)) | slow |
| -pwdnamatrix | Nucleotide comparison matrix for pairwise alignment | IUB (1.9/0.0 matrix with handling of ambiguities); CLUSTALW (1.0/0.0 matrix); own (user provided matrix) | IUB |
| -pwmatrix | Amino acid comparison matrix for pairwise alignment | BLOSUM (BLOSUM series 80, 62, 45, 30); PAM (PAM series 20, 60, 120, 350); Gonnet (Gonnet series 80, 120, 160, 250 and 350); id (identity matrix 10.0/0.0); own (user provided matrix) | Gonnet |
| -pwusermatrix | User provided symbol comparison matrix (in BLAST format) for pairwise alignment | Input file | Required |
| -pwgappenalty | Gap penalty for pairwise alignment | Number from 0.000 to 100.000 | 15.0 for nucleic, 10.0 for protein |
| -pwgaplength | Gap length penalty for pairwise alignment. CLUSTAL subtracts from the similarity score for each gap a penalty of type <Gap penalty> + <Gap length penalty> * n | Number from 0.000 to 10.000 | 6.66 for nucleic, 0.1 for protein |
| -ktuple | Wilbur-Lipman ktup size. Decrease for sensitivity, increase for speed | Integer from 1 to 4 for nucleic, 2 for protein | 2 for nucleic, 1 for protein |
| -topdiags | Wilbur-Lipman number of best diagonals to consider | Integer from 1 to 50 | 4 for nucleic, 5 for protein |
| -window | Wilbur-Lipman window size for looking at diagonals around best diagonals | Integer from 1 to 50 | 4 for nucleic, 5 for protein |
| -joinw | Wilbur-Lipman penalty for joining different diagonals | Integer from 1 to 500 | 5 for nucleic, 3 for protein |
| -dnamatrix | Nucleotide comparison matrix for multiple alignment | IUB (1.9/0.0 matrix with handling of ambiguities); CLUSTALW (1.0/0.0 matrix); own (user provided matrix) | IUB |
| -matrix | Amino acid comparison matrix for multiple alignment | BLOSUM (BLOSUM series 80, 62, 45, 30); PAM (PAM series 20, 60, 120, 350); Gonnet (Gonnet series 80, 120, 160, 250 and 350); id (identity matrix 10.0/0.0); own (user provided matrix) | Gonnet |
| -usermatrix | User provided symbol comparison matrix (in BLAST format) for multiple alignment | Input file | Required |
| -[no]rescale | Rescale amino acid comparison matrix to all positive values or use negative values (for proteins only). Option -norescale could be useful if proteins are of very uneven length. | Boolean value Yes/No | Yes |
| -transitionw | Transition weight : proportion between score of AG or CT pair and pair of identical bases | Number from 0.000 to 1.000 | 0.5 |
| -gappenalty | Gap penalty for multiple alignment | Number from 0.000 to 100.000 | 15.0 for nucleic, 10.0 for protein |
| -gaplength | Gap length penalty for multiple alignment. CLUSTAL subtracts from the similarity score for each gap a penalty of type <Gap penalty> + <Gap length penalty> * n | Number from 0.000 to 10.000 | 6.66 for nucleic, 0.2 for protein |
| -delay | Max. % identity for delay of divergent sequences | Integer from 0 to 100 | 30 |
| -[no]pgap | Use gap penalties dependant on amino acids at edge (for proteins only) | Boolean value Yes/No | Yes |
| -[no]hgap | Use lower gap penalties in strings of at least 5 hydrophylic amino acids (for proteins only) | Toggle value Yes/No | Yes |
| -hgapresidues | Hydrophylic amino acids | Any string is accepted | GPSNDQEKR |
| -gapdist | Gap Separation Distance. Use higher gap penalty for gaps separated by less than n amino acids (for proteins only) | Integer from 0 to 100 | 4 |
| -endgaps | Use higher gap penalty also for gaps at ends | Boolean value Yes/No | No |
| -outorder | Order of sequences in output | input (same as input); aligned (according to order of progressive alignment) | aligned |
| Advanced (Unprompted) qualifiers | Description | Allowed values | Default |
| -usertree | Use user provided guide tree | Toggle value Yes/No | No |
SW:YAHK_ECOLI
SW:ADH2_BACST
SW:ADHC_MYCTU
SW:ADH7_YEAST
SW:CADH2_EUCGU
SW:MTDH2_ARATH
The multiple sequence alignment is written as a standard EMBOSS sequence
file. Note that by default the output format is fastA (with '-' gap
characters introduced). Note that if you wish to visualize the alignment
it can be profitable to request a format better suited for this purpose,
like MSF format.
By default the sequences are written according to the order
determined by the "guide tree", that is, the most similar sequences are
written adjacent to each other. You can request that the sequences are
instead written in the same order as in the input file with
-outorder=input.
The "guide tree" is written in "nested parentheses" format. This format is taken as input by a lot of software, like NJplot, TreeView or the PHYLIP programs drawgram and drawtree. Note however that you should not use this "guide tree" as a phylogenetic tree.
!!AA_MULTIPLE_ALIGNMENT 1.0
ADH.list MSF: 371 Type: P 04/03/05 CompCheck: 357 ..
Name: YAHK_ECOLI Len: 371 Check: 5277 Weight: 15.70
Name: ADHC_MYCTU Len: 371 Check: 819 Weight: 14.60
Name: CADH2_EUCGU Len: 371 Check: 5012 Weight: 16.20
Name: MTDH2_ARATH Len: 371 Check: 2155 Weight: 14.60
Name: ADH7_YEAST Len: 371 Check: 8445 Weight: 19.60
Name: ADH2_BACST Len: 371 Check: 8649 Weight: 19.10
//
1 50
YAHK_ECOLI ~~~~~~~MKIKAVGAYSAKQPLEPMDITRREPGPNDVKIEIAYCGVCHSD
ADHC_MYCTU ~~~~~~MSTVAAYAAMSATEPLTKTTITRRDPGPHDVAIDIKFAGICHSD
CADH2_EUCGU MGSLEKERTTTGWAARDPSGVLSPYTYSLRNTGPEDLYIKVLSCGVCHSD
MTDH2_ARATH MGKVLQ.KEAFGLAAKDNSGVLSPFSFTRRETGEKDVRFKVLFCGICHSD
ADH7_YEAST ~MLYPEKFQGIGISNAKDWKHPKLVSFDPKPFGDHDVDVEIEACGICGSD
ADH2_BACST ~~~~~~~~~MKAAVVNEFKKALEIKEVERPKLEEGEVLVKIEACGVCHTD
51 100
YAHK_ECOLI LHQVRSEWAGT.VYPCVPGHEIVGRVVAVGDQVEK.YAPGDLVGVGCIVD
ADHC_MYCTU IHTVKAEWGQP.NYPVVPGHEIAGVVTAVGSEVTK.YRQGDRVGVGCFVD
CADH2_EUCGU IHQIKNDLGMS.HYPMVPGHEVVGEVLEVGSEVTK.YRVGDRVGTGIVVG
MTDH2_ARATH LHMVKNEWGMS.TYPLVPGHEIVGVVTEVGAKVTK.FKTGEKVGVGCLVS
ADH7_YEAST FHIAVGNWGPV.PENQILGHEIIGRVVKVGSKCHTGVKIGDRVGVGAQAL
ADH2_BACST LHAAHGDWPIKPKLPLIPGHEGVGIVVEVAKGVKS.IKVGDRVGIPWLYS
101 150
YAHK_ECOLI SCKHCEECEDGLENYCDHMTG.TYNSPTPDEPGHTLGGYSQQIVVHERYV
ADHC_MYCTU SCRECNSCTRGIEQYCKPGANFTYNSIGKDGQ.PTQGGYSEAIVVDENYV
CADH2_EUCGU CCRSCSPCNSDQEQYCNKKIW.NYNDVYTDGK.PTQGGFAGEIVVGERFV
MTDH2_ARATH SCGSCDSCTEGMENYCPKSIQ.TYGFPYYDNT.ITYGGYSDHMVCEEGFV
ADH7_YEAST ACFECERCKSDNEQYCTNDHVLTMWTPYKDGY.ISQGGFASHVRLHEHFA
ADH2_BACST ACGECEYCLTGQETLCPHQLN.........GGYSVDGGYAEYCKAPADYV
151 200
YAHK_ECOLI LRIRHPQEQLAAVAPLLCAGITTYSPLRHWQAG.PGKKVGVVGIGGLGHM
ADHC_MYCTU LRIPDVLP.LDVAAPLLCAGITLYSPLRHWNAG.ANTRVAIIGLGGLGHM
CADH2_EUCGU VKIPDGLE.SEQAAPLMCAGVTVYSPLVRFGLKQSGLRGGILGLGGVGHM
MTDH2_ARATH IRIPDNLP.LDAAAPLLCAGITVYSPMKYHGLDKPGMHIGVVGLGGLGHV
ADH7_YEAST IQIPENIP.SPLAAPLLCGGITVFSPLLRNGCG.PGKRVGIVGIGGIGHM
ADH2_BACST AKIPDNLD.PVEVAPILCAGVTTYKALKVSGAR.PGEWVAIYGIGGLGHI
201 250
YAHK_ECOLI GIKLAHAMGAHVVAFTTSEAKR.EAAKALGADEVVNSRNADEMAAHLK..
ADHC_MYCTU GVKLGAAMGADVTVLSQSLKKM.EDGLRLGAKSYYATADPDTFRKLRG..
CADH2_EUCGU GVKIAKAMGHHVTVISSSDKKRTEALEHLGADAYLVSSDENGMKEATD..
MTDH2_ARATH GVKFAKAMGTKVTVISTSEKKRDEAINRLGADAFLVSRDPKQIKDAMG..
ADH7_YEAST GILLAKAMGAEVYAFSRGHSKR.EDSMKLGADHYIAMLEDKGWTEQYSNA
ADH2_BACST ALQYAKAMGLNVVAVDISDEKS.KLAKDLGADIAINGLKEDPVKAIHDQV
251 300
YAHK_ECOLI .SFDFILNTVAAPHNLDDFTTLLKRDGTMTLVGAPATPHKSPEVFNLIMK
ADHC_MYCTU .GFDLILNTVSANLDLGQYLNLLDVDGTLVELGIPEHPMAVP.AFALALM
CADH2_EUCGU .SLDYIFDTIPVVHPLEPYLALLKLDGKLILTGVINAPLQFI.SPMVMLG
MTDH2_ARATH .TMDGIIDTVSATHSLLPLLGLLKHKGKLVMVGAPEKPLELP.VMPLIFE
ADH7_YEAST LDLLVVCSSSLSKVNFDSIVKIMKIGGSIVSIAAPEVNEKLV.LKPLGLM
ADH2_BACST GGVHAAISVAVNKKAFEQAYQSVKRGGTLVVVGLPNADLPIP.IFDTVLN
301 350
YAHK_ECOLI RRAIAGSMIGGIPETQEMLDFCAEHGIVADIEMIRADQ..INEAYERMLR
ADHC_MYCTU RRSLAGSNIGGIAETQEMLNFCAEHGVTPEIELIEPDY..INDAYERVLA
CADH2_EUCGU RKSITGSFIGSMKETEEMLEFCKEKGLTSQIEVIKMDY..VNTALERLEK
MTDH2_ARATH RKMVMGSMIGGIKETQEMIDMAGKHNITADIELISADY..VNTAMERLEK
ADH7_YEAST GVSISSSAIGSRKEIEQLLKLVSEKNVKIWVEKLPISEEGVSHAFTRMES
ADH2_BACST GVSVKGSIVGTRKDMQEALDFAARGKVRPIVETAELEE..INEVFERMEK
351 371
YAHK_ECOLI GDVKYRFVIDNRTLTD~~~~~
ADHC_MYCTU SDVRYRFVIDISAL~~~~~~~
CADH2_EUCGU NDVRYRFVVDVVGSKLD~~~~
MTDH2_ARATH ADVRYRFVIDVANTLKPNPNL
ADH7_YEAST GDVKYRFTLVDYDKKFHK~~~
ADH2_BACST GKINGRIVLKLKED~~~~~~~
( ( ( YAHK_ECOLI:0.25360, ADHC_MYCTU:0.23773) :0.02527, ( ADH2_BACST:0.35494, ADH7_YEAST:0.35302) :0.05239) :0.02435, CADH2_EUCGU:0.26941, MTDH2_ARATH:0.23340);
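If you only need the leaf names and their branch lengths from a guide tree like the one above, a regular expression is often enough. The sketch below is not a full nested-parentheses parser: it assumes leaf names contain only letters, digits and underscores and that lengths are plain decimals, as in this example. The function name is hypothetical.

```python
import re

GUIDE_TREE = (
    "( ( ( YAHK_ECOLI:0.25360, ADHC_MYCTU:0.23773) :0.02527, "
    "( ADH2_BACST:0.35494, ADH7_YEAST:0.35302) :0.05239) :0.02435, "
    "CADH2_EUCGU:0.26941, MTDH2_ARATH:0.23340);"
)


def leaf_branch_lengths(tree_text):
    """Map each leaf name to its branch length.

    A leaf is taken to be a name immediately followed by ':<number>';
    internal branch lengths follow ')' and are therefore skipped.
    """
    pairs = re.findall(r"([A-Za-z0-9_]+):([0-9.]+)", tree_text)
    return {name: float(length) for name, length in pairs}


if __name__ == "__main__":
    for name, length in leaf_branch_lengths(GUIDE_TREE).items():
        print(name, length)
```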
# Matrix made by matblas from blosum62.iij
# * column uses minimum score
# BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
# Blocks Database = /data/blocks_5.0/blocks.dat
# Cluster Percentage: >= 62
# Entropy = 0.6979, Expected = -0.5209
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1 -4
Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1 -4
* -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4  1
CLUSTAL_SERIES
MATRIX 81 100 blosum80
MATRIX 61 80 blosum62
MATRIX 31 60 blosum45
MATRIX 0 30 blosum30
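The series file above can be read as a lookup from percent identity to matrix name. The sketch below assumes that the two numbers on each MATRIX line are an inclusive percent identity range, and it rounds the identity to the nearest integer before the lookup; both points are assumptions, and the function names are hypothetical.

```python
SERIES = """CLUSTAL_SERIES
MATRIX 81 100 blosum80
MATRIX 61 80 blosum62
MATRIX 31 60 blosum45
MATRIX 0 30 blosum30
"""


def parse_series(text):
    """Return a list of (low, high, matrix_name) from the MATRIX lines."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "MATRIX":
            ranges.append((int(fields[1]), int(fields[2]), fields[3]))
    return ranges


def matrix_for_identity(ranges, percent_identity):
    """Pick the matrix whose inclusive range contains the rounded identity."""
    pct = round(percent_identity)
    for low, high, name in ranges:
        if low <= pct <= high:
            return name
    raise ValueError(f"no matrix covers {percent_identity}% identity")


if __name__ == "__main__":
    ranges = parse_series(SERIES)
    for pct in (20, 45, 65, 95):
        print(pct, matrix_for_identity(ranges, pct))
```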
You should not use the "guide tree" as a phylogenetic tree. Give the alignment produced by clustal to a phylogeny program, e.g. clustalnj or the programs of the PHYLIP package.
No two sequences should have the same name. Only the first 30 characters of the sequence name are used. Therefore no name should be longer than 30 characters, or at least the first 30 characters should be different.
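Before running the program, you could check your sequence names against this limit with a few lines like the following. The function name is hypothetical, and the check only mirrors the rule stated above (the first 30 characters must be unique).

```python
def name_clashes(names, significant=30):
    """Group names that share the same first `significant` characters.

    Returns a dict mapping each clashing prefix to the list of full names.
    """
    seen = {}
    for name in names:
        seen.setdefault(name[:significant], []).append(name)
    return {prefix: group for prefix, group in seen.items() if len(group) > 1}


if __name__ == "__main__":
    names = ["YAHK_ECOLI", "ADH2_BACST", "x" * 30 + "_copy1", "x" * 30 + "_copy2"]
    for prefix, group in name_clashes(names).items():
        print("clash on", prefix, "->", group)
```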
| Program name | Description |
|---|---|
| edialign | Local multiple alignment of sequences |
| infoalign | Information on a multiple sequence alignment |
| mkdom | Local multiple alignment of proteins, makes file for XDOM |
| mse | Multiple Sequence Editor |
| muscle | Multiple alignment of sequences by global optimization |
| plotcon | Plot quality of conservation of a sequence alignment |
| prettyplot | Displays aligned sequences, with colouring and boxing |
| showalign | Displays a multiple sequence alignment |
| tranalign | Align nucleic coding regions given the aligned proteins |
| clustalnj | Neighbor-Joining phylogenetic tree from multiple alignment |
The program clustalw itself was written by:
Julie Thompson (Thompson@EMBL-Heidelberg.DE)
Toby Gibson (Gibson@EMBL-Heidelberg.DE)
European Molecular Biology Laboratory, Meyerhofstrasse 1, D 69117 Heidelberg, Germany
Des Higgins (Higgins@ucc.ie)
University College Cork, Cork, Ireland