fasts
> fasts
Protein identification from peptides using fastA algorithm
1 : fasts (peptides against prot db)
2 : tfasts (peptides against nuc db translated)
3 : fastf (mixed peptide against prot db)
4 : tfastf (mixed peptide against nuc db translated)
Select type of search you want to run [1]:
Peptides file: mgstm1.pep
1 : standard set
2 : user defined set
3 : user provided fastA databank
Select search set type [1]:
sw : SwissProt (highly annotated protein databank)
up : UniProt (SwissProt + TrEMBL, EMBL ORF translations)
uniref100 : UniRef100 (UniProt nonredundant subset)
uniref90 : UniRef90 (UniRef100 subset with no more than 90% identity)
uniref50 : UniRef50 (UniRef100 subset with no more than 50% identity)
remt : REM-TrEMBL (old EMBL ORF translations not incl. in UniProt)
pir : PIR (old general protein databank)
gp : GenPept (GenBank ORF translations)
refseqp : RefSeq (NCBI reference protein sequences)
pdb : PDB (proteins with known 3D structure)
gpcrdb : G protein coupled receptors
Standard protein search set [up]: sw
E() value cutoff [5.0]:
Output file [mgstm1.fasts]:
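The answers given in the session above can also be supplied as qualifiers on a single command line. The following is a minimal sketch: `build_fasts_command` is a hypothetical helper, not part of the package, and it assumes that the program accepts `-qualifier value` pairs and `-auto` in the usual EMBOSS manner. Only qualifiers documented on this page are used.

```python
import shlex


def build_fasts_command(peps, outfile, program=1, dbtype=1, protdb="sw", expect=5.0):
    """Return the argument list for a non-interactive fasts run.

    Hypothetical helper: the qualifier names come from this page, the
    function itself is only an illustration.
    """
    return [
        "fasts",
        "-program", str(program),  # 1 = fasts (peptides against prot db)
        "-peps", peps,             # peptides file
        "-dbtype", str(dbtype),    # 1 = standard set
        "-protdb", protdb,         # standard protein search set
        "-expect", str(expect),    # E() value cutoff
        "-outfile", outfile,       # output file name
        "-auto",                   # turn off prompts
    ]


command = build_fasts_command("mgstm1.pep", "mgstm1.fasts")
print(shlex.join(command))
```

The list form can be handed directly to `subprocess.run` on a system where the program is installed; `shlex.join` is used here only to show the equivalent shell command.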
Standard (Mandatory) qualifiers (* if not always prompted):
-program menu [1] Search type : separate or mixed
peptides, prot. or nuc. db (Values: 1 (fasts
(peptides against prot db)); 2 (tfasts
(peptides against nuc db translated)); 3
(fastf (mixed peptide against prot db)); 4
(tfastf (mixed peptide against nuc db
translated)))
[-peps] infile Peptides file. See on-line manual for
format.
-dbtype menu [1] Search set type : public databank or
databank provided by user (Values: 1
(standard set); 2 (user defined set); 3
(user provided fastA databank))
* -nucdb menu [emblnontags] Standard nucleic acid search
set (Values: em (EMBL (general nucleic acid
databank)); emblnontags (EMBL without EST
and GSS); hum (EMBL humans); mus (EMBL
mice); rod (EMBL other rodents); mam (EMBL
other mammals); vrt (EMBL other
vertebrates); inv (EMBL invertebrates); pln
(EMBL plants); fun (EMBL fungi); pro (EMBL
bacteria); phg (EMBL bacteriophages); vrl
(EMBL other viruses); est (EMBL Expressed
Sequence Tags); gss (EMBL Genome Survey
Sequences); sts (EMBL Sequence Tagged
Sites); htg (EMBL High Throughput Genomic);
htc (EMBL High Throughput cDNA); env (EMBL
environmental samples); pat (EMBL patents);
tgn (EMBL transgenic); syn (EMBL synthetic);
unc (EMBL unclassified); new (EMBL updates
since last release); wgs (EMBL Whole Genome
Shotgun); refseq (RefSeq (NCBI reference
sequences)); refseqwgs (RefSeq Whole Genome
Shotgun); refseqgen (RefSeq other genomic);
refseqrna (RefSeq transcripts); vec
(Intelligenetics vector databank); emvec
(EMBL vector subset); epd (Eukaryotic
Promoter Database); ligm (ImMunoGeneTics
databank Igg. + TcR genes); hla
(ImMunoGeneTics databank human MHC genes);
pdbn (PDB (nucleic acids with known 3D
structure)))
* -protdb menu [up] Standard protein search set (Values: sw
(SwissProt (highly annotated protein
databank)); up (UniProt (SwissProt + TrEMBL,
EMBL ORF translations)); uniref100
(UniRef100 (UniProt nonredundant subset));
uniref90 (UniRef90 (UniRef100 subset with no
more than 90% identity)); uniref50
(UniRef50 (UniRef100 subset with no more
than 50% identity)); remt (REM-TrEMBL (old
EMBL ORF translations not incl. in
UniProt)); pir (PIR (old general protein
databank)); gp (GenPept (GenBank ORF
translations)); refseqp (RefSeq (NCBI
reference protein sequences)); pdb (PDB
(proteins with known 3D structure)); gpcrdb
(G protein coupled receptors))
* -userdb seqall User defined search set
* -userfastadb infile User provided fastA format databank (you can
make one using seqret)
-expect float [5.0 for fasts or fastf, 2.0 for tfasts or
tfastf] E() value = number of databank
sequences with same or higher Z-score that
you expect to find by chance. fastA lists
sequences with an E() value lower than the
cutoff. (Number 0.000 or more)
[-outfile] outfile [*.fasts] Output file name
Additional (Optional) qualifiers (* if not always prompted):
* -[no]reverse boolean [Y] Search also complementary strand (is
default). If you switch this off fasts will
search only the forward strand of the search
set sequences.
-matrix menu [M20 for fasts or fastf, M10 for tfasts or
tfastf] Amino acid comparison matrix
(Values: BL50 (BLOSUM50); BL62 (BLOSUM62);
BL80 (BLOSUM80); P120 (PAM120); P250
(PAM250); M10 (Jones, Taylor, Thornton
PAM10); M20 (Jones, Taylor, Thornton PAM20);
M40 (Jones, Taylor, Thornton PAM40); VT160
(Vingron resolvent PAM160); OPT5 (OPTIMA 5))
* -gencode menu [1] Genetic code for translating sequences
(Values: 1 (Standard); 2 (Vertebrate
Mitochondrial); 3 (Yeast Mitochondrial); 4
(Mold, Protozoan, Coelenterate Mitochondrial
and Mycoplasma/Spiroplasma); 5
(Invertebrate Mitochondrial); 6 (Ciliate,
Dasycladacean and Hexamita); 9
(Echinodermate Mitochondrial); 10
(Euplotid); 11 (Eubacterial); 12
(Alternative Yeast); 13 (Ascidian
Mitochondrial); 14 (Flatworm Mitochondrial);
15 (Blepharisma); 16 (Chlorophycean
Mitochondrial); 21 (Trematode
Mitochondrial); 22 (Scenedesmus obliquus
Mitochondrial); 23 (Thraustochytrium
Mitochondrial))
-format menu [0] Alignment format (Values: 0 (standard);
1 (x = conservative replacements, X =
non-conservative substitutions); 2 (show
only residues in sequence 2 that differ from
sequence 1); 9 (long format best scores
report); 10 (write alignments in parsable
format))
Advanced (Unprompted) qualifiers:
-zscore boolean write Z-score instead of bit score in list
-[no]histogram boolean [Y] Show histogram (is default)
-listsize integer [0] Show only the n best scoring sequences
that satisfy E() cutoff (Integer 0 or more)
-align integer [0] Show only alignments for the n first
sequences (Integer 0 or more)
-linesize integer [60] Number of residues per line of the
alignment (Integer from 10 to 200)
Associated qualifiers:
"-userdb" associated qualifiers
-sbegin integer Start of each sequence to be used
-send integer End of each sequence to be used
-sreverse boolean Reverse (if DNA)
-sask boolean Ask for begin/end/reverse
-snucleotide boolean Sequence is nucleotide
-sprotein boolean Sequence is protein
-slower boolean Make lower case
-supper boolean Make upper case
-sformat string Input sequence format
-sdbname string Database name
-sid string Entryname
-ufo string UFO features
-fformat string Features format
-fopenfile string Features file name
"-outfile" associated qualifiers
-odirectory2 string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
| Standard (Mandatory) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -program | Search type : separate or mixed peptides, prot. or nuc. db | | 1 |
| [-peps] (Parameter 1) | Peptides file. See on-line manual for format. | Input file | Required |
| -dbtype | Search set type : public databank or databank provided by user | | 1 |
| -nucdb | Standard nucleic acid search set | | emblnontags |
| -protdb | Standard protein search set | | up |
| -userdb | User defined search set | Readable sequence(s) | Required |
| -userfastadb | User provided fastA format databank (you can make one using seqret) | Input file | Required |
| -expect | E() value = number of databank sequences with same or higher Z-score that you expect to find by chance. fastA lists sequences with an E() value lower than the cutoff. | Number 0.000 or more | 5.0 for fasts or fastf, 2.0 for tfasts or tfastf |
| [-outfile] (Parameter 2) | Output file name | Output file | <sequence>.<program> |

| Additional (Optional) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -[no]reverse | Search also complementary strand (is default). If you switch this off fasts will search only the forward strand of the search set sequences. | Boolean value Yes/No | Yes |
| -matrix | Amino acid comparison matrix | | M20 for fasts or fastf, M10 for tfasts or tfastf |
| -gencode | Genetic code for translating sequences | | 1 |
| -format | Alignment format | | 0 |

| Advanced (Unprompted) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -zscore | write Z-score instead of bit score in list | Boolean value Yes/No | No |
| -[no]histogram | Show histogram (is default) | Boolean value Yes/No | Yes |
| -listsize | Show only the n best scoring sequences that satisfy E() cutoff | Integer 0 or more | 0 |
| -align | Show only alignments for the n first sequences | Integer 0 or more | 0 |
| -linesize | Number of residues per line of the alignment | Integer from 10 to 200 | 60 |
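Two of the defaults listed above, for -expect and -matrix, depend on the search type chosen with -program. As a small illustration, the lookup below returns the documented defaults for each menu value. `default_settings` is a hypothetical name used only for this sketch.

```python
# Keys are the -program menu values documented on this page.
PROGRAMS = {1: "fasts", 2: "tfasts", 3: "fastf", 4: "tfastf"}


def default_settings(program):
    """Return the documented default E() cutoff and matrix for a -program value."""
    name = PROGRAMS[program]  # raises KeyError for values outside 1..4
    # tfasts and tfastf search a translated nucleic acid databank.
    translated = name.startswith("t")
    return {
        "program": name,
        "expect": 2.0 if translated else 5.0,
        "matrix": "M10" if translated else "M20",
    }


for value in sorted(PROGRAMS):
    print(value, default_settings(value))
```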
> fragments from mgstm1
MLLE,
MILGYW,
MGADP,
MLCYNP
The file contains sequences or partial sequences of peptides. Note the comment line preceded by a ">". The commas "," are required to indicate the number of fragments in the mixture, but there should be no comma after the last residue.
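The rules just described (a comment line starting with ">", a comma after every fragment except the last) can be applied mechanically when preparing a peptides file. The sketch below is a hypothetical helper, not part of the package; writing one fragment per line is an assumption made here for readability.

```python
def format_fasts_query(description, peptides):
    """Format peptides as a fasts/tfasts query: a '>' comment line, then
    comma-separated fragments with no comma after the last one."""
    if not peptides:
        raise ValueError("at least one peptide is required")
    for peptide in peptides:
        if not peptide.isalpha():
            raise ValueError(f"unexpected characters in peptide: {peptide!r}")
    lines = [f"> {description}"]
    # Every fragment except the last is followed by a comma.
    lines += [peptide.upper() + "," for peptide in peptides[:-1]]
    lines.append(peptides[-1].upper())
    return "\n".join(lines) + "\n"


query = format_fasts_query("fragments from mgstm1", ["MLLE", "MILGYW", "MGADP", "MLCYNP"])
print(query, end="")
```

The returned string can be written to a file such as mgstm1.pep and passed to the program with -peps.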
fastf and tfastf work with an input file of the form:
> N_terminal sequence from mgstm1
MGCEN,
MIDYP,
MLLAY,
MLLGY
This indicates that a mixture of four peptides has been found, with "M" in the first position of each one, in the second position "G", "I", or "L" (twice), at the third position "C", "D", or "L" (twice), etc. Note that the sequences are required to have the same length.
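The reading of the mixture given above can be reproduced by counting the residues found at each position. This is only an illustration of how to interpret the input file, not of how fastf works internally; `residues_by_position` is a hypothetical name.

```python
from collections import Counter


def residues_by_position(peptides):
    """Count the residues seen at each position of an equal-length peptide mixture."""
    lengths = {len(peptide) for peptide in peptides}
    if len(lengths) != 1:
        raise ValueError("all peptides in a mixture must have the same length")
    # zip(*peptides) yields one tuple of residues per position.
    return [Counter(column) for column in zip(*peptides)]


mixture = ["MGCEN", "MIDYP", "MLLAY", "MLLGY"]
columns = residues_by_position(mixture)
for position, counts in enumerate(columns, start=1):
    print(position, dict(counts))
```

For the example mixture, the first position contains only "M", and the second contains "G" once, "I" once and "L" twice, as stated in the text.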
The current version of the fastA package allows for a set of up to 50 peptides.
You can select your search set in three different ways: a standard set, a user defined set, or a user provided fastA databank.
# /opt/sw/fasta/bin/fasts -q -L -T 2 -l /opt/sw/fasta/fastlibs -E 5 -s M20 -m 0 -w 60 mgstm1.pep %+sw
FASTS compares linked peptides to a protein data bank
version 35.04 Oct. 7, 2008
Please cite:
Mackey et al. Mol. Cell. Proteomics (2002) 1:139-147
Query: mgstm1.pep
1>>> fragments from mgstm1 - 24 aa
Library: SwissProt (highly annotated protein databank) 159870284 residues in 424932 sequences
159870284 residues in 424932 sequences
Statistics: scaled Tatusov statistics (55900): tat_a: 1.0320 tat_b: 2.5239 tat_c: -1.3342
Algorithm: FASTS (4.32 Feb 2007)
Parameters: MD20 matrix (18:-29) ktup=1
Scan time: 13.750
The best scores are: initn init1 bits E(424932) sn sl
sw:GSTM1_MOUSE RecName: Full=Glutathione ( 218) 229 229 58.4 5.8e-09 4 21
sw:GSTMU_CRILO RecName: Full=Glutathione ( 218) 212 212 52.9 2.6e-07 4 21
sw:GSTM1_RAT RecName: Full=Glutathione S- ( 218) 212 212 52.9 2.6e-07 4 21
sw:GSTM1_HUMAN RecName: Full=Glutathione ( 218) 198 198 48.7 4.8e-06 4 21
sw:GSTM5_MOUSE RecName: Full=Glutathione ( 224) 156 156 42.1 0.00047 3 16
sw:GSTM5_RAT RecName: Full=Glutathione S- ( 225) 156 156 42.1 0.00048 3 16
sw:GSTMU_CAVPO RecName: Full=Glutathione ( 217) 153 153 40.9 0.0011 3 17
sw:GSTM4_RAT RecName: Full=Glutathione S- ( 218) 170 170 40.8 0.0012 4 21
sw:GSTM2_PONAB RecName: Full=Glutathione ( 218) 170 170 40.8 0.0012 4 21
sw:GSTM2_MOUSE RecName: Full=Glutathione ( 218) 167 167 40.0 0.0021 4 21
sw:GSTM2_HUMAN RecName: Full=Glutathione ( 218) 158 158 37.5 0.011 4 21
sw:GSTMU_RABIT RecName: Full=Glutathione ( 218) 157 157 37.3 0.013 4 21
sw:GSTM2_MACFA RecName: Full=Glutathione ( 218) 156 156 37.0 0.016 4 21
sw:GSTM2_MACFU RecName: Full=Glutathione ( 218) 156 156 37.0 0.016 4 21
sw:GSTM2_RAT RecName: Full=Glutathione S- ( 218) 138 138 36.9 0.018 3 17
sw:GSTM1_BOVIN RecName: Full=Glutathione ( 218) 136 136 36.6 0.021 3 16
sw:GSTM5_HUMAN RecName: Full=Glutathione ( 218) 134 134 36.1 0.031 3 16
sw:GSTMU_MESAU RecName: Full=Glutathione ( 218) 150 150 35.4 0.048 4 21
sw:GSTM6_MOUSE RecName: Full=Glutathione ( 218) 148 148 34.9 0.069 4 21
upv:Q03013-2 GSTM4_HUMAN Isoform 2 of Glu ( 195) 141 141 33.6 0.15 4 21
sw:GSTM3_HUMAN RecName: Full=Glutathione ( 225) 125 125 33.6 0.18 3 16
sw:GSTM3_MACFU RecName: Full=Glutathione ( 225) 125 125 33.6 0.18 3 16
sw:GSTM7_MOUSE RecName: Full=Glutathione ( 218) 124 124 33.3 0.22 3 17
sw:GSTM1_MACFA RecName: Full=Glutathione ( 218) 141 141 33.1 0.24 4 21
sw:GSTM4_HUMAN RecName: Full=Glutathione ( 218) 141 141 33.1 0.24 4 21
sw:GSTM2_CHICK RecName: Full=Glutathione ( 220) 117 117 31.5 0.72 3 16
upv:Q5TIE3-5 YA019_HUMAN Isoform 5 of Put (1189) 113 113 33.2 1.2 2 11
upv:Q5TIE3-2 YA019_HUMAN Isoform 2 of Put (1214) 113 113 33.2 1.3 2 11
sw:YA019_HUMAN RecName: Full=Putative VWF (1220) 113 113 33.2 1.3 2 11
sw:CC2H2_TRYBB RecName: Full=Cell divisio ( 345) 100 100 31.0 1.6 2 12
sw:GSTM4_MOUSE RecName: Full=Glutathione ( 218) 115 115 29.9 2.3 3 15
sw:RPAB5_SCHPO RecName: Full=DNA-directed ( 71) 77 77 27.8 3.2 2 12
>>sw:GSTM1_MOUSE RecName: Full=Glutathione S-transferase Mu 1; EC=2.5.1.18; AltName: Full=GST class-mu 1; AltName: Full=Glutathione S-transferase GT8.7; AltName: Full=pmGT10; AltName: Full=GST 1-1; (218 aa)
initn: 229 init1: 81 opt: 229 Z-score: 46.1 bits: 58.4 E(): 5.8e-09
global/local score: 229; 90.5% identity (90.5% similar) in 21 aa overlap (1-21:3-118)
10
MILGYW----------MLLE------------MGADP---------------------
:::::: :::: :: :
sw:GST MPMILGYWNVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKFKLGLDFPNL
10 20 30 40 50 60
20
----------------------------------------------------MLCYNP
::::::
sw:GST PYLIDGSHKITQSNAILRYLARKHHLDGETEEERIRADIVENQVMDTRMQLIMLCYNPDF
70 80 90 100 110 120
sw:GST EKQKPEFLKTIPEKMKLYSEFLGKRPWFAGDKVTYVDFLAYDILDQYRMFEPKCLDAFPN
130 140 150 160 170 180
>>sw:GSTMU_CRILO RecName: Full=Glutathione S-transferase Y1; EC=2.5.1.18; AltName: Full=Chain 3; AltName: Full=GST class-mu; (218 aa)
initn: 212 init1: 81 opt: 212 Z-score: 40.6 bits: 52.9 E(): 2.6e-07
global/local score: 212; 85.7% identity (85.7% similar) in 21 aa overlap (1-21:3-118)
10
MILGYW----------MLLE------------MGADP---------------------
:::::: ::: :: :
sw:GST MPMILGYWNVRGLTNPIRLLLEYTDSSYEEKKYTMGDAPDSDRSQWLNEKFKLGLDFPNL
10 20 30 40 50 60
20
----------------------------------------------------MLCYNP
::::::
sw:GST PYLIDGSHKITQSNAILRYIARKHNLCGETEEERIRVDIVENQAMDTRMQLIMLCYNPDF
70 80 90 100 110 120
sw:GST EKQKPEFLKTIPEKMKMYSEFLGKRPWFAGDKVTLCGFLAYDVLDQYQMFEPKCLDPFPN
130 140 150 160 170 180
[Part of this file has been deleted for brevity]
>>sw:RPAB5_SCHPO RecName: Full=DNA-directed RNA polymerases I, II, and III subunit RPABC5; Short=RNA polymerases I, II, and III subunit ABC5; AltName: Full=DNA-directed RNA polymerases I, II, and III 8.3 kDa polypeptide; AltName: Full=ABC10-beta; AltName: Full=RPC8; (71 aa)
initn: 77 init1: 64 opt: 77 Z-score: 17.0 bits: 27.8 E(): 3.2
global/local score: 77; 62.5% identity (68.8% similar) in 16 aa overlap (1-16:23-64)
10
MLLE---------------------MILGYW-----ML
:: ::: . :
sw:RPA MIIPIRCFSCGKVIGDKWDTYLTLLQEDNTEGEALDKLGLQRYCCRRMILTHVDLIEKLL
10 20 30 40 50 60
CYNP
::::
sw:RPA CYNPLSKQKNL
70
24 residues in 1 query sequences
159870284 residues in 424932 library sequences
Tcomplib [35.04] (2 proc)
start: Wed Oct 15 16:04:28 2008 done: Wed Oct 15 16:04:35 2008
Total Scan time: 13.750 Total Display time: 0.010
Function used was FASTS [version 35.04 Oct. 7, 2008]
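The list of best scores in the report above has a regular layout (identifier, description, sequence length in parentheses, then initn, init1, bits, E(), sn and sl), so it can be read back with a short script. The parser below is a sketch under the assumption that the columns are whitespace-separated as shown here; for machine processing, the parsable alignment format (-format 10) may be a more robust choice.

```python
import re

# One row of the "The best scores are:" table:
# id, description, (length), initn, init1, bits, E(), sn, sl
HIT_LINE = re.compile(
    r"^(?P<id>\S+)\s+(?P<description>.*?)\s*\(\s*(?P<length>\d+)\)\s+"
    r"(?P<initn>\d+)\s+(?P<init1>\d+)\s+(?P<bits>[\d.]+)\s+"
    r"(?P<evalue>[\d.eE+-]+)\s+(?P<sn>\d+)\s+(?P<sl>\d+)\s*$"
)


def parse_best_scores(text):
    """Return one dict per hit line found in a fasts report."""
    hits = []
    for line in text.splitlines():
        match = HIT_LINE.match(line.strip())
        if match is None:
            continue  # headers, alignments and statistics are skipped
        hit = match.groupdict()
        for key in ("length", "initn", "init1", "sn", "sl"):
            hit[key] = int(hit[key])
        hit["bits"] = float(hit["bits"])
        hit["evalue"] = float(hit["evalue"])
        hits.append(hit)
    return hits


sample = """The best scores are: initn init1 bits E(424932) sn sl
sw:GSTM1_MOUSE RecName: Full=Glutathione ( 218) 229 229 58.4 5.8e-09 4 21
sw:RPAB5_SCHPO RecName: Full=DNA-directed ( 71) 77 77 27.8 3.2 2 12
"""
hits = parse_best_scores(sample)
for hit in hits:
    print(hit["id"], hit["bits"], hit["evalue"])
```

Once parsed, the hits can be filtered, for example by keeping only those with an E() value below a stricter threshold than the one used for the search.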
See also the references for fasta.
<databank> contains non-nucleic acid characters ! The first one occurs in sequence <sequence name>
| Program name | Description |
|---|---|
| backtranambig | Back translate a protein sequence to ambiguous codons |
| backtranseq | Back translate a protein sequence |
| blast | BLAST search of query sequence(s) against sequence search set |
| charge | Protein charge plot |
| checktrans | Reports STOP codons and ORF statistics of a protein |
| compseq | Count composition of dimer/trimer/etc words in a sequence |
| ebi_blast | WU-BLAST search of query sequence against sequence databank using EBI Web Services |
| ebi_fasta | fastA search of query sequence against sequence databank using EBI Web Services |
| emowse | Protein identification by mass spectrometry |
| fasta | fastA search of query sequence(s) against sequence search set |
| freak | Residue/base frequency table or plot |
| iep | Calculates the isoelectric point of a protein |
| mwcontam | Shows molwts that match across a set of files |
| mwfilter | Filter noisy molwts from mass spec output |
| octanol | Displays protein hydropathy |
| pepinfo | Plots simple amino acid properties in parallel |
| pepstats | Protein statistics |
| pepwindow | Displays protein hydropathy |
| pepwindowall | Displays protein hydropathy of a set of sequences |
| phiblast | Search protein sequence set combining matching of pattern with local alignment of a query sequence surrounding the match |
| psiblast | Iterative BLAST search with generation of profile of protein sequence against protein sequence set |
| lfasta | Finds local alignments between two sequences, using fastA |
The programs fasts,... themselves were written by
William R. Pearson
Department of Biochemistry
Box 440, Jordan Hall
U. of Virginia
Charlottesville, VA
wrp@virginia.EDU