fasts

 

Function

Protein identification from peptides using fastA algorithm

Description

fasts is an EMBOSS "wrapper" program for a number of programs from Pearson's fastA package version 3.
fasts/tfasts
Compares a set of (presumably non-contiguous) short peptide fragments, as would be obtained from mass spectrometry analysis of a protein, against a protein (fasts) or DNA (tfasts) database.
fastf/tfastf
Compares an ordered peptide mixture, as would be obtained by Edman degradation of a CNBr cleavage of a protein, against a protein (fastf) or DNA (tfastf) database.

Algorithm

The programs under the "wrapper" fasts use broadly the same algorithm as the programs under the "wrapper" fasta, with a few differences.

Usage

Here is a sample session with fasts

> fasts
Protein identification from peptides using fastA algorithm
         1 : fasts (peptides against prot db)
         2 : tfasts (peptides against nuc db translated)
         3 : fastf (mixed peptide against prot db)
         4 : tfastf (mixed peptide against nuc db translated)
Select type of search you want to run [1]:
Peptides file: mgstm1.pep
         1 : standard set
         2 : user defined set
         3 : user provided fastA databank
Select search set type [1]:
        sw : SwissProt (highly annotated protein databank)
        up : UniProt (SwissProt + TrEMBL, EMBL ORF translations)
  uniref100 : UniRef100 (UniProt nonredundant subset)
  uniref90 : UniRef90 (UniRef100 subset with no more than 90% identity)
  uniref50 : UniRef50 (UniRef100 subset with no more than 50% identity)
      remt : REM-TrEMBL (old EMBL ORF translations not incl. in UniProt)
       pir : PIR (old general protein databank)
        gp : GenPept (GenBank ORF translations)
   refseqp : RefSeq (NCBI reference protein sequences)
       pdb : PDB (proteins with known 3D structure)
    gpcrdb : G protein coupled receptors
Standard protein search set [up]: sw
E() value cutoff [5.0]:
Output file [mgstm1.fasts]:
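
The interactive session above can also be expressed as a single prompt-free command built from the qualifiers documented under "Command line arguments". The following Python sketch only assembles the argument list and does not run anything. The helper name build_fasts_command is hypothetical, and the choice of -protdb for fasts/fastf versus -nucdb for tfasts/tfastf is an assumption based on the qualifier descriptions.

```python
import shlex

# Program menu values as listed in the sample session.
PROGRAMS = {"fasts": 1, "tfasts": 2, "fastf": 3, "tfastf": 4}


def build_fasts_command(peps, program="fasts", db="sw", expect=None, outfile=None):
    """Return an argument list for a prompt-free search of a standard set."""
    number = PROGRAMS[program]
    # Assumption: tfasts/tfastf read their standard set from -nucdb,
    # fasts/fastf from -protdb.
    db_qualifier = "-nucdb" if program.startswith("t") else "-protdb"
    args = ["fasts", "-program", str(number), "-peps", peps,
            "-dbtype", "1", db_qualifier, db]
    if expect is not None:
        args += ["-expect", str(expect)]
    if outfile is not None:
        args += ["-outfile", outfile]
    args.append("-auto")  # general qualifier: turn off prompts
    return args


command = build_fasts_command("mgstm1.pep", db="sw", expect=5.0,
                              outfile="mgstm1.fasts")
print(shlex.join(command))
```

Leaving expect and outfile unset omits the corresponding qualifiers, so the wrapper's own defaults apply.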


Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
   -program            menu       [1] Search type : separate or mixed
                                  peptides, prot. or nuc. db (Values: 1 (fasts
                                  (peptides against prot db)); 2 (tfasts
                                  (peptides against nuc db translated)); 3
                                  (fastf (mixed peptide against prot db)); 4
                                  (tfastf (mixed peptide against nuc db
                                  translated)))
  [-peps]              infile     Peptides file. See on-line manual for
                                  format.
   -dbtype             menu       [1] Search set type : public databank or
                                  databank provided by user (Values: 1
                                  (standard set); 2 (user defined set); 3
                                  (user provided fastA databank))
*  -nucdb              menu       [emblnontags] Standard nucleic acid search
                                  set (Values: em (EMBL (general nucleic acid
                                  databank)); emblnontags (EMBL without EST
                                  and GSS); hum (EMBL humans); mus (EMBL
                                  mice); rod (EMBL other rodents); mam (EMBL
                                  other mammals); vrt (EMBL other
                                  vertebrates); inv (EMBL invertebrates); pln
                                  (EMBL plants); fun (EMBL fungi); pro (EMBL
                                  bacteria); phg (EMBL bacteriophages); vrl
                                  (EMBL other viruses); est (EMBL Expressed
                                  Sequence Tags); gss (EMBL Genome Survey
                                  Sequences); sts (EMBL Sequence Tagged
                                  Sites); htg (EMBL High Throughput Genomic);
                                  htc (EMBL High Throughput cDNA); env (EMBL
                                  environmental samples); pat (EMBL patents);
                                  tgn (EMBL transgenic); syn (EMBL synthetic);
                                  unc (EMBL unclassified); new (EMBL updates
                                  since last release); wgs (EMBL Whole Genome
                                  Shotgun); refseq (RefSeq (NCBI reference
                                  sequences)); refseqwgs (RefSeq Whole Genome
                                  Shotgun); refseqgen (RefSeq other genomic);
                                  refseqrna (RefSeq transcripts); vec
                                  (Intelligenetics vector databank); emvec
                                  (EMBL vector subset); epd (Eukaryotic
                                  Promoter Database); ligm (ImMunoGeneTics
                                  databank Igg. + TcR genes); hla
                                  (ImMunoGeneTics databank human MHC genes);
                                  pdbn (PDB (nucleic acids with known 3D
                                  structure)))
*  -protdb             menu       [up] Standard protein search set (Values: sw
                                  (SwissProt (highly annotated protein
                                  databank)); up (UniProt (SwissProt + TrEMBL,
                                  EMBL ORF translations)); uniref100
                                  (UniRef100 (UniProt nonredundant subset));
                                  uniref90 (UniRef90 (UniRef100 subset with no
                                  more than 90% identity)); uniref50
                                  (UniRef50 (UniRef100 subset with no more
                                  than 50% identity)); remt (REM-TrEMBL (old
                                  EMBL ORF translations not incl. in
                                  UniProt)); pir (PIR (old general protein
                                  databank)); gp (GenPept (GenBank ORF
                                  translations)); refseqp (RefSeq (NCBI
                                  reference protein sequences)); pdb (PDB
                                  (proteins with known 3D structure)); gpcrdb
                                  (G protein coupled receptors))
*  -userdb             seqall     User defined search set
*  -userfastadb        infile     User provided fastA format databank (you can
                                  make one using seqret)
   -expect             float      [5.0 for fasts or fastf, 2.0 for tfasts or
                                  tfastf] E() value = number of databank
                                  sequences with same or higher Z-score that
                                  you expect to find by chance. fastA lists
                                  sequences with an E() value lower than the
                                  cutoff. (Number 0.000 or more)
  [-outfile]           outfile    [*.fasts] Output file name

   Additional (Optional) qualifiers (* if not always prompted):
*  -[no]reverse        boolean    [Y] Search also complementary strand (is
                                  default). If you switch this off fasts will
                                  search only the forward strand of the search
                                  set sequences.
   -matrix             menu       [M20 for fasts or fastf, M10 for tfasts or
                                  tfastf] Amino acid comparison matrix
                                  (Values: BL50 (BLOSUM50); BL62 (BLOSUM62);
                                  BL80 (BLOSUM80); P120 (PAM120); P250
                                  (PAM250); M10 (Jones, Taylor, Thornton
                                  PAM10); M20 (Jones, Taylor, Thornton PAM20);
                                  M40 (Jones, Taylor, Thornton PAM40); VT160
                                  (Vingron resolvent PAM160); OPT5 (OPTIMA 5))
*  -gencode            menu       [1] Genetic code for translating sequences
                                  (Values: 1 (Standard); 2 (Vertebrate
                                  Mitochondrial); 3 (Yeast Mitochondrial); 4
                                  (Mold, Protozoan, Coelenterate Mitochondrial
                                  and Mycoplasma/Spiroplasma); 5
                                  (Invertebrate Mitochondrial); 6 (Ciliate,
                                  Dasycladacean and Hexamita); 9
                                  (Echinodermate Mitochondrial); 10
                                  (Euplotid); 11 (Eubacterial); 12
                                  (Alternative Yeast); 13 (Ascidian
                                  Mitochondrial); 14 (Flatworm Mitochondrial);
                                  15 (Blepharisma); 16 (Chlorophycean
                                  Mitochondrial); 21 (Trematode
                                  Mitochondrial); 22 (Scenedesmus obliquus
                                  Mitochondrial); 23 (Thraustochytrium
                                  Mitochondrial))
   -format             menu       [0] Alignment format (Values: 0 (standard);
                                  1 (x = conservative replacements, X =
                                  non-conservative substitutions); 2 (show
                                  only residues in sequence 2 that differ from
                                  sequence 1); 9 (long format best scores
                                  report); 10 (write alignments in parsable
                                  format))

   Advanced (Unprompted) qualifiers:
   -zscore             boolean    write Z-score instead of bit score in list
   -[no]histogram      boolean    [Y] Show histogram (is default)
   -listsize           integer    [0] Show only the n best scoring sequences
                                  that satisfy E() cutoff (Integer 0 or more)
   -align              integer    [0] Show only alignments for the n first
                                  sequences (Integer 0 or more)
   -linesize           integer    [60] Number of residues per line of the
                                  alignment (Integer from 10 to 200)

   Associated qualifiers:

   "-userdb" associated qualifiers
   -sbegin             integer    Start of each sequence to be used
   -send               integer    End of each sequence to be used
   -sreverse           boolean    Reverse (if DNA)
   -sask               boolean    Ask for begin/end/reverse
   -snucleotide        boolean    Sequence is nucleotide
   -sprotein           boolean    Sequence is protein
   -slower             boolean    Make lower case
   -supper             boolean    Make upper case
   -sformat            string     Input sequence format
   -sdbname            string     Database name
   -sid                string     Entryname
   -ufo                string     UFO features
   -fformat            string     Features format
   -fopenfile          string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers

-program
    Search type : separate or mixed peptides, prot. or nuc. db
    Allowed values:
        1 (fasts (peptides against prot db))
        2 (tfasts (peptides against nuc db translated))
        3 (fastf (mixed peptide against prot db))
        4 (tfastf (mixed peptide against nuc db translated))
    Default: 1

[-peps] (Parameter 1)
    Peptides file. See on-line manual for format.
    Allowed values: Input file
    Default: Required

-dbtype
    Search set type : public databank or databank provided by user
    Allowed values:
        1 (standard set)
        2 (user defined set)
        3 (user provided fastA databank)
    Default: 1

-nucdb
    Standard nucleic acid search set
    Allowed values:
        em (EMBL (general nucleic acid databank))
        emblnontags (EMBL without EST and GSS)
        hum (EMBL humans)
        mus (EMBL mice)
        rod (EMBL other rodents)
        mam (EMBL other mammals)
        vrt (EMBL other vertebrates)
        inv (EMBL invertebrates)
        pln (EMBL plants)
        fun (EMBL fungi)
        pro (EMBL bacteria)
        phg (EMBL bacteriophages)
        vrl (EMBL other viruses)
        est (EMBL Expressed Sequence Tags)
        gss (EMBL Genome Survey Sequences)
        sts (EMBL Sequence Tagged Sites)
        htg (EMBL High Throughput Genomic)
        htc (EMBL High Throughput cDNA)
        env (EMBL environmental samples)
        pat (EMBL patents)
        tgn (EMBL transgenic)
        syn (EMBL synthetic)
        unc (EMBL unclassified)
        new (EMBL updates since last release)
        wgs (EMBL Whole Genome Shotgun)
        refseq (RefSeq (NCBI reference sequences))
        refseqwgs (RefSeq Whole Genome Shotgun)
        refseqgen (RefSeq other genomic)
        refseqrna (RefSeq transcripts)
        vec (Intelligenetics vector databank)
        emvec (EMBL vector subset)
        epd (Eukaryotic Promoter Database)
        ligm (ImMunoGeneTics databank Igg. + TcR genes)
        hla (ImMunoGeneTics databank human MHC genes)
        pdbn (PDB (nucleic acids with known 3D structure))
    Default: emblnontags

-protdb
    Standard protein search set
    Allowed values:
        sw (SwissProt (highly annotated protein databank))
        up (UniProt (SwissProt + TrEMBL, EMBL ORF translations))
        uniref100 (UniRef100 (UniProt nonredundant subset))
        uniref90 (UniRef90 (UniRef100 subset with no more than 90% identity))
        uniref50 (UniRef50 (UniRef100 subset with no more than 50% identity))
        remt (REM-TrEMBL (old EMBL ORF translations not incl. in UniProt))
        pir (PIR (old general protein databank))
        gp (GenPept (GenBank ORF translations))
        refseqp (RefSeq (NCBI reference protein sequences))
        pdb (PDB (proteins with known 3D structure))
        gpcrdb (G protein coupled receptors)
    Default: up

-userdb
    User defined search set
    Allowed values: Readable sequence(s)
    Default: Required

-userfastadb
    User provided fastA format databank (you can make one using seqret)
    Allowed values: Input file
    Default: Required

-expect
    E() value = number of databank sequences with same or higher Z-score that you expect to find by chance. fastA lists sequences with an E() value lower than the cutoff.
    Allowed values: Number 0.000 or more
    Default: 5.0 for fasts or fastf, 2.0 for tfasts or tfastf

[-outfile] (Parameter 2)
    Output file name
    Allowed values: Output file
    Default: <sequence>.<program>

Additional (Optional) qualifiers

-[no]reverse
    Search also complementary strand (is default). If you switch this off fasts will search only the forward strand of the search set sequences.
    Allowed values: Boolean value Yes/No
    Default: Yes

-matrix
    Amino acid comparison matrix
    Allowed values:
        BL50 (BLOSUM50)
        BL62 (BLOSUM62)
        BL80 (BLOSUM80)
        P120 (PAM120)
        P250 (PAM250)
        M10 (Jones, Taylor, Thornton PAM10)
        M20 (Jones, Taylor, Thornton PAM20)
        M40 (Jones, Taylor, Thornton PAM40)
        VT160 (Vingron resolvent PAM160)
        OPT5 (OPTIMA 5)
    Default: M20 for fasts or fastf, M10 for tfasts or tfastf

-gencode
    Genetic code for translating sequences
    Allowed values:
        1 (Standard)
        2 (Vertebrate Mitochondrial)
        3 (Yeast Mitochondrial)
        4 (Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma)
        5 (Invertebrate Mitochondrial)
        6 (Ciliate, Dasycladacean and Hexamita)
        9 (Echinodermate Mitochondrial)
        10 (Euplotid)
        11 (Eubacterial)
        12 (Alternative Yeast)
        13 (Ascidian Mitochondrial)
        14 (Flatworm Mitochondrial)
        15 (Blepharisma)
        16 (Chlorophycean Mitochondrial)
        21 (Trematode Mitochondrial)
        22 (Scenedesmus obliquus Mitochondrial)
        23 (Thraustochytrium Mitochondrial)
    Default: 1

-format
    Alignment format
    Allowed values:
        0 (standard)
        1 (x = conservative replacements, X = non-conservative substitutions)
        2 (show only residues in sequence 2 that differ from sequence 1)
        9 (long format best scores report)
        10 (write alignments in parsable format)
    Default: 0

Advanced (Unprompted) qualifiers

-zscore
    write Z-score instead of bit score in list
    Allowed values: Boolean value Yes/No
    Default: No

-[no]histogram
    Show histogram (is default)
    Allowed values: Boolean value Yes/No
    Default: Yes

-listsize
    Show only the n best scoring sequences that satisfy E() cutoff
    Allowed values: Integer 0 or more
    Default: 0

-align
    Show only alignments for the n first sequences
    Allowed values: Integer 0 or more
    Default: 0

-linesize
    Number of residues per line of the alignment
    Allowed values: Integer from 10 to 200
    Default: 60

Input file format

fasts searches a set of peptides against a search sequence set. Unlike the traditional fastA search, which uses a protein or DNA sequence, fasts and tfasts work with an input file of the form:

> fragments from mgstm1
MLLE,
MILGYW,
MGADP,
MLCYNP

The file contains sequences or partial sequences of peptides. Note the comment line preceded by a ">". The commas "," are required to indicate the number of fragments in the mixture, but there should be no comma after the last residue.
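
As an illustration of the format described above, the following Python sketch produces the text of a peptides file with a ">" comment line and a comma after every fragment except the last. The function name format_peptides is hypothetical, and the 50-fragment limit is the one quoted in this manual for the current fastA package.

```python
def format_peptides(title, fragments, max_peptides=50):
    """Return the text of a fasts/tfasts peptides file."""
    if not fragments:
        raise ValueError("at least one fragment is required")
    if len(fragments) > max_peptides:
        raise ValueError("more than %d fragments" % max_peptides)
    for fragment in fragments:
        if not fragment.isalpha():
            raise ValueError("fragment %r is not a plain residue string" % fragment)
    lines = ["> " + title]
    # Every fragment but the last is terminated by a comma.
    lines += [fragment.upper() + "," for fragment in fragments[:-1]]
    lines.append(fragments[-1].upper())
    return "\n".join(lines) + "\n"


text = format_peptides("fragments from mgstm1",
                       ["MLLE", "MILGYW", "MGADP", "MLCYNP"])
print(text, end="")
```

Writing the returned text to a file such as mgstm1.pep gives an input suitable for the -peps parameter.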

fastf and tfastf work with an input file of the form:

> N_terminal sequence from mgstm1
MGCEN,
MIDYP,
MLLAY,
MLLGY

This indicates that a mixture of four peptides has been found, with "M" in the first position of each one, in the second position "G", "I", or "L" (twice), at the third position "C", "D", or "L" (twice), etc. Note that the sequences are required to have the same length.
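
The position-by-position reading of a mixture can be reproduced with a few lines of Python. This is only a sketch of how the file is interpreted, not part of the fastA package; the function name residues_by_position is hypothetical.

```python
from collections import Counter


def residues_by_position(peptides):
    """Count the residues seen at each position of an equal-length mixture."""
    lengths = {len(peptide) for peptide in peptides}
    if len(lengths) != 1:
        raise ValueError("fastf peptides must all have the same length")
    # zip(*peptides) yields one column of residues per sequence position.
    return [Counter(column) for column in zip(*peptides)]


mixture = residues_by_position(["MGCEN", "MIDYP", "MLLAY", "MLLGY"])
for position, counts in enumerate(mixture, start=1):
    print(position, dict(counts))
```

For the example file, position 1 contains only "M", while positions 2 and 3 each contain "L" twice alongside two other residues.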

The current version of the fastA package allows for a set of up to 50 peptides.

You can select your search set in three different ways :

  1. a standard set is a copy of a public databank or a local databank installed by the managers of the server computer and "visible" to all users. You must choose from a selector.
  2. a user defined set is a set of sequences (public and/or private) that you can specify using a normal sequence USA. It is convenient to use a List File. The set will be transformed on-the-fly into a temporary fastA format databank.
  3. a user provided fastA databank is a databank in fastA format (usually private), that you can specify by the name of the file.
Note that if you want to search a selected set of sequences taken from the public databanks and/or a set of your own private sequences, you can choose between options 2 and 3. If your set is small or if you want to search it just once, "a user defined set" is a good choice; otherwise, to save time, it is recommended to make a databank in fastA format using the program seqret and choose "a user provided fastA databank".

Output file format

Output files for usage example

File: mgstm1.fasts

# /opt/sw/fasta/bin/fasts -q -L -T 2 -l /opt/sw/fasta/fastlibs -E 5 -s M20 -m 0 -w 60 mgstm1.pep %+sw
FASTS compares linked peptides to a protein data bank
 version 35.04 Oct. 7, 2008
Please cite:
 Mackey et al. Mol. Cell. Proteomics  (2002) 1:139-147

Query: mgstm1.pep
  1>>> fragments from mgstm1 - 24 aa
Library: SwissProt (highly annotated protein databank) 159870284 residues in 424932 sequences

159870284 residues in 424932 sequences
Statistics: scaled Tatusov statistics (55900): tat_a: 1.0320 tat_b: 2.5239 tat_c: -1.3342
Algorithm: FASTS (4.32 Feb 2007)
Parameters: MD20 matrix (18:-29) ktup=1
 Scan time: 13.750

The best scores are:                             initn init1 bits E(424932) sn  sl
sw:GSTM1_MOUSE RecName: Full=Glutathione  ( 218)  229  229 58.4 5.8e-09  4  21
sw:GSTMU_CRILO RecName: Full=Glutathione  ( 218)  212  212 52.9 2.6e-07  4  21
sw:GSTM1_RAT RecName: Full=Glutathione S- ( 218)  212  212 52.9 2.6e-07  4  21
sw:GSTM1_HUMAN RecName: Full=Glutathione  ( 218)  198  198 48.7 4.8e-06  4  21
sw:GSTM5_MOUSE RecName: Full=Glutathione  ( 224)  156  156 42.1 0.00047  3  16
sw:GSTM5_RAT RecName: Full=Glutathione S- ( 225)  156  156 42.1 0.00048  3  16
sw:GSTMU_CAVPO RecName: Full=Glutathione  ( 217)  153  153 40.9  0.0011  3  17
sw:GSTM4_RAT RecName: Full=Glutathione S- ( 218)  170  170 40.8  0.0012  4  21
sw:GSTM2_PONAB RecName: Full=Glutathione  ( 218)  170  170 40.8  0.0012  4  21
sw:GSTM2_MOUSE RecName: Full=Glutathione  ( 218)  167  167 40.0  0.0021  4  21
sw:GSTM2_HUMAN RecName: Full=Glutathione  ( 218)  158  158 37.5   0.011  4  21
sw:GSTMU_RABIT RecName: Full=Glutathione  ( 218)  157  157 37.3   0.013  4  21
sw:GSTM2_MACFA RecName: Full=Glutathione  ( 218)  156  156 37.0   0.016  4  21
sw:GSTM2_MACFU RecName: Full=Glutathione  ( 218)  156  156 37.0   0.016  4  21
sw:GSTM2_RAT RecName: Full=Glutathione S- ( 218)  138  138 36.9   0.018  3  17
sw:GSTM1_BOVIN RecName: Full=Glutathione  ( 218)  136  136 36.6   0.021  3  16
sw:GSTM5_HUMAN RecName: Full=Glutathione  ( 218)  134  134 36.1   0.031  3  16
sw:GSTMU_MESAU RecName: Full=Glutathione  ( 218)  150  150 35.4   0.048  4  21
sw:GSTM6_MOUSE RecName: Full=Glutathione  ( 218)  148  148 34.9   0.069  4  21
upv:Q03013-2 GSTM4_HUMAN Isoform 2 of Glu ( 195)  141  141 33.6    0.15  4  21
sw:GSTM3_HUMAN RecName: Full=Glutathione  ( 225)  125  125 33.6    0.18  3  16
sw:GSTM3_MACFU RecName: Full=Glutathione  ( 225)  125  125 33.6    0.18  3  16
sw:GSTM7_MOUSE RecName: Full=Glutathione  ( 218)  124  124 33.3    0.22  3  17
sw:GSTM1_MACFA RecName: Full=Glutathione  ( 218)  141  141 33.1    0.24  4  21
sw:GSTM4_HUMAN RecName: Full=Glutathione  ( 218)  141  141 33.1    0.24  4  21
sw:GSTM2_CHICK RecName: Full=Glutathione  ( 220)  117  117 31.5    0.72  3  16
upv:Q5TIE3-5 YA019_HUMAN Isoform 5 of Put (1189)  113  113 33.2     1.2  2  11
upv:Q5TIE3-2 YA019_HUMAN Isoform 2 of Put (1214)  113  113 33.2     1.3  2  11
sw:YA019_HUMAN RecName: Full=Putative VWF (1220)  113  113 33.2     1.3  2  11
sw:CC2H2_TRYBB RecName: Full=Cell divisio ( 345)  100  100 31.0     1.6  2  12
sw:GSTM4_MOUSE RecName: Full=Glutathione  ( 218)  115  115 29.9     2.3  3  15
sw:RPAB5_SCHPO RecName: Full=DNA-directed (  71)   77   77 27.8     3.2  2  12

>>sw:GSTM1_MOUSE RecName: Full=Glutathione S-transferase Mu 1; EC=2.5.1.18; AltName: Full=GST class-mu 1; AltName: Full=Glutathione S-transferase GT8.7; AltName: Full=pmGT10; AltName: Full=GST 1-1; (218 aa)
 initn: 229 init1:  81 opt: 229  Z-score: 46.1  bits: 58.4 E(): 5.8e-09
global/local score: 229; 90.5% identity (90.5% similar) in 21 aa overlap (1-21:3-118)

                           10                                      
         MILGYW----------MLLE------------MGADP---------------------
         ::::::          ::::            ::  :                     
sw:GST MPMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKFKLGLDFPNL
               10        20        30        40        50        60

                                                              20   
       ----------------------------------------------------MLCYNP  
                                                           ::::::  
sw:GST PYLIDGSHKITQSNAILRYLARKHHLDGETEEERIRADIVENQVMDTRMQLIMLCYNPDF
               70        80        90       100       110       120

sw:GST EKQKPEFLKTIPEKMKLYSEFLGKRPWFAGDKVTYVDFLAYDILDQYRMFEPKCLDAFPN
              130       140       150       160       170       180

>>sw:GSTMU_CRILO RecName: Full=Glutathione S-transferase Y1; EC=2.5.1.18; AltName: Full=Chain 3; AltName: Full=GST class-mu; (218 aa)
 initn: 212 init1:  81 opt: 212  Z-score: 40.6  bits: 52.9 E(): 2.6e-07
global/local score: 212; 85.7% identity (85.7% similar) in 21 aa overlap (1-21:3-118)

                           10                                      
         MILGYW----------MLLE------------MGADP---------------------
         ::::::           :::            ::  :                     
sw:GST MPMILGYWNVRGLTNPIRLLLEYTDSSYEEKKYTMGDAPDSDRSQWLNEKFKLGLDFPNL
               10        20        30        40        50        60

                                                              20   
       ----------------------------------------------------MLCYNP  
                                                           ::::::  
sw:GST PYLIDGSHKITQSNAILRYIARKHNLCGETEEERIRVDIVENQAMDTRMQLIMLCYNPDF
               70        80        90       100       110       120

sw:GST EKQKPEFLKTIPEKMKMYSEFLGKRPWFAGDKVTLCGFLAYDVLDQYQMFEPKCLDPFPN
              130       140       150       160       170       180


  [Part of this file has been deleted for brevity]


>>sw:RPAB5_SCHPO RecName: Full=DNA-directed RNA polymerases I, II, and III subunit RPABC5; Short=RNA polymerases I, II, and III subunit ABC5; AltName: Full=DNA-directed RNA polymerases I, II, and III 8.3 kDa polypeptide; AltName: Full=ABC10-beta; AltName: Full=RPC8; (71 aa)
 initn:  77 init1:  64 opt:  77  Z-score: 17.0  bits: 27.8 E():  3.2
global/local score: 77; 62.5% identity (68.8% similar) in 16 aa overlap (1-16:23-64)

                                                          10       
                             MLLE---------------------MILGYW-----ML
                              ::                      ::: .       :
sw:RPA MIIPIRCFSCGKVIGDKWDTYLTLLQEDNTEGEALDKLGLQRYCCRRMILTHVDLIEKLL
               10        20        30        40        50        60

                  
       CYNP       
       ::::       
sw:RPA CYNPLSKQKNL
               70 



24 residues in 1 query   sequences
159870284 residues in 424932 library sequences
 Tcomplib [35.04] (2 proc)
 start: Wed Oct 15 16:04:28 2008 done: Wed Oct 15 16:04:35 2008
 Total Scan time: 13.750 Total Display time:  0.010

Function used was FASTS [version 35.04 Oct. 7, 2008]
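
For scripted use, the rows of "The best scores are:" table can be pulled out of the standard (format 0) report with a regular expression. The Python sketch below is based only on the layout of the sample output shown above; other fastA versions or output formats may differ. The field names are copied from the table header, and the function name parse_best_scores is hypothetical.

```python
import re

# One row of the best-scores table: identifier, truncated description,
# (length), then initn, init1, bits, E(), sn and sl.
ROW = re.compile(
    r"^(?P<id>\S+)\s+(?P<description>.*?)\s*\(\s*(?P<length>\d+)\)\s+"
    r"(?P<initn>\d+)\s+(?P<init1>\d+)\s+(?P<bits>[\d.]+)\s+"
    r"(?P<evalue>[\d.eE+-]+)\s+(?P<sn>\d+)\s+(?P<sl>\d+)\s*$"
)
INT_FIELDS = ("length", "initn", "init1", "sn", "sl")
FLOAT_FIELDS = ("bits", "evalue")


def parse_best_scores(text):
    """Return one dict per row of the best-scores table found in text."""
    hits = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, alignment or summary line
        hit = match.groupdict()
        for field in INT_FIELDS:
            hit[field] = int(hit[field])
        for field in FLOAT_FIELDS:
            hit[field] = float(hit[field])
        hits.append(hit)
    return hits


sample = """\
The best scores are:                             initn init1 bits E(424932) sn  sl
sw:GSTM1_MOUSE RecName: Full=Glutathione  ( 218)  229  229 58.4 5.8e-09  4  21
upv:Q5TIE3-5 YA019_HUMAN Isoform 5 of Put (1189)  113  113 33.2     1.2  2  11
"""
hits = parse_best_scores(sample)
for hit in hits:
    print(hit["id"], hit["length"], hit["bits"], hit["evalue"])
```

If the output is meant to be parsed routinely, the parsable alignment format selected with -format 10 may be a more robust starting point than the human-readable report.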

Data files

The amino acid comparison matrices used to compare proteins are hard coded in the program and cannot be changed.

Notes

None.

References

Mackey, A. J., Haystead, T. A. and Pearson, W.R. (2002). Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences. Mol. Cell. Proteomics 1(2), 139-147.

see also references for fasta.

Warnings

None.

Diagnostic Error Messages

If a "User provided fastA format databank" that is supposed to be a nucleic acid databank contains characters only allowed in protein sequences, the program exits with the error message:
  <databank> contains non-nucleic acid characters ! The first one occurs in sequence <sequence name>

Exit status

It exits prematurely with status 255 and an error message if a "User provided fastA format databank" that is supposed to be a nucleic acid databank contains characters only allowed in protein sequences, otherwise it exits with status 0.
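
A databank can be checked for this condition before a search is started. The Python sketch below is an approximation only: this manual does not state which characters the wrapper accepts, so the IUPAC nucleotide alphabet used here is an assumption, and the function name first_non_nucleic is hypothetical.

```python
# Assumption: the IUPAC nucleotide codes, including ambiguity codes.
NUCLEIC = set("ACGTURYSWKMBDHVN")


def first_non_nucleic(fasta_text):
    """Return (sequence name, character) for the first offending residue,
    or None if every letter belongs to the nucleotide alphabet."""
    name = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            continue
        for char in line.strip().upper():
            # Non-letters such as gap symbols are ignored.
            if char.isalpha() and char not in NUCLEIC:
                return name, char
    return None


databank = ">seq1\nACGTNN\n>seq2 a protein by mistake\nACGEL\n"
print(first_non_nucleic(databank))
```

A result of None means the check found nothing that this sketch considers protein-only; it does not guarantee that the wrapper itself will accept the file.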

Known bugs

None.

See also

Program name    Description
backtranambig Back translate a protein sequence to ambiguous codons
backtranseq Back translate a protein sequence
blast BLAST search of query sequence(s) against sequence search set
charge Protein charge plot
checktrans Reports STOP codons and ORF statistics of a protein
compseq Count composition of dimer/trimer/etc words in a sequence
ebi_blast WU-BLAST search of query sequence against sequence databank using EBI Web Services
ebi_fasta fastA search of query sequence against sequence databank using EBI Web Services
emowse Protein identification by mass spectrometry
fasta fastA search of query sequence(s) against sequence search set
freak Residue/base frequency table or plot
iep Calculates the isoelectric point of a protein
mwcontam Shows molwts that match across a set of files
mwfilter Filter noisy molwts from mass spec output
octanol Displays protein hydropathy
pepinfo Plots simple amino acid properties in parallel
pepstats Protein statistics
pepwindow Displays protein hydropathy
pepwindowall Displays protein hydropathy of a set of sequences
phiblast Search protein sequence set combining matching of pattern with local alignment of a query sequence surrounding the match
psiblast Iterative BLAST search with generation of profile of protein sequence against protein sequence set
lfasta Finds local alignments between two sequences, using fastA

Author(s)

The wrapper application fasts was written by Guy Bottu (gbottu@vub.ac.be)
BEN, ULB, Brussels, Belgium

The programs fasts,... themselves were written by
William R. Pearson
Department of Biochemistry
Box 440, Jordan Hall
U. of Virginia
Charlottesville, VA

wrp@virginia.EDU

History

Completed 28 August 2002
Modified 19 March 2003 - adapted to fastA version 3.4t21
Modified 20 February 2004 - adapted to fastA version 3.4t23 and added option to search only forward strand
Modified 19 September 2007 - adapted to fastA version 34.26.4
Modified 24 April 2008 - adapted to fastA version 35.1.6
Modified 15 October 2008 - adapted to fastA version 35.4.1 and added complete description lines in output

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.