fasta
When all of the search set sequences have been compared to the query,
the list of best scores is printed. If alignments were requested, the alignments
are also printed. For searches with a protein query sequence against a
protein search set, a full Smith-Waterman local alignment (not restricted
to a band, and therefore allowing unlimited gap lengths) is performed,
and a Smith-Waterman score is reported along with the other scores and
the alignment itself. (This alignment may not be the same alignment that
the "local alignment in a band" algorithm used to calculate the opt score
during the search.) By default, the alignment for nucleic acid searches
and TFastA is the same "local alignment in a band" that was performed to
calculate the opt score.
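The difference between a full Smith-Waterman alignment and a "local alignment in a band" can be illustrated with a small scoring routine. This is only a sketch, not FASTA's implementation: the function name `local_score` and its `band` parameter are hypothetical, the band is simply centered on the main diagonal (FASTA places its band around the best initial region), and a gap of n residues is assumed to cost gap penalty + gap length penalty * n. The default scores are the nucleic acid defaults listed on this page.

```python
def local_score(a, b, match=5, mismatch=-4, gap_open=14, gap_ext=4, band=None):
    """Best local alignment score of a against b.

    A gap of n residues costs gap_open + gap_ext * n.
    band=None fills the whole matrix (full Smith-Waterman);
    band=k only fills cells with |i - j| <= k (simplified band).
    """
    neg = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]    # best score ending at (i, j)
    e = [[neg] * cols for _ in range(rows)]  # ... ending in a gap in a
    f = [[neg] * cols for _ in range(rows)]  # ... ending in a gap in b
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            if band is not None and abs(i - j) > band:
                h[i][j] = neg  # outside the band: cell cannot be used
                continue
            e[i][j] = max(h[i][j - 1] - gap_open - gap_ext, e[i][j - 1] - gap_ext)
            f[i][j] = max(h[i - 1][j] - gap_open - gap_ext, f[i - 1][j] - gap_ext)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + sub, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best


# Two matching blocks separated by an 8-residue insertion in the second sequence.
query = "AAAAACCCCC"
target = "AAAAAGGGGGGGGCCCCC"
print(local_score(query, target, gap_open=2, gap_ext=1))          # 40: the long gap is allowed
print(local_score(query, target, gap_open=2, gap_ext=1, band=3))  # 25: the gap does not fit in the band
```

With the band in place the long gap cannot be bridged, so the two scores differ. This is why the Smith-Waterman alignment that is printed may differ from the one used to calculate the opt score.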
In evaluating the E() scores, the following rules of thumb can
be used: for searches of a protein database of 10,000 sequences, sequences
with E() less than 0.01 are almost always found to be homologous. Sequences
with E() between 1 and 10 frequently turn out to be related as well.
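The rules of thumb above can be written down as a small helper. This is a sketch only: the function name `interpret_evalue` is hypothetical, the thresholds are the ones quoted above for a protein database of about 10,000 sequences, and the text gives no guidance for E() values between 0.01 and 1 or above 10, so those ranges are reported as not covered.

```python
def interpret_evalue(e_value):
    """Map an E() value onto the rule-of-thumb categories quoted in the text."""
    if e_value < 0:
        raise ValueError("E() cannot be negative")
    if e_value < 0.01:
        return "almost always homologous"
    if e_value < 1:
        # Assumption: the text does not classify this range.
        return "not covered by the rule of thumb; inspect the alignment"
    if e_value <= 10:
        return "frequently related; inspect the alignment"
    return "above the range covered by the rule of thumb"


for e in (6.4e-22, 0.1, 1.5):
    print(e, "->", interpret_evalue(e))
```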
> fasta
fastA search of query sequence(s) against sequence search set
1 : fasta nuc against nuc
2 : fasta prot against prot
3 : fastx (nuc against prot with codon/aa alignment)
4 : fasty (idem + allowing intracodon gaps)
5 : tfastx (prot against nuc with codon/aa alignment)
6 : tfasty (idem + allowing intracodon gaps)
7 : ssearch (using SW instead) nuc against nuc
8 : ssearch (using SW instead) prot against prot
9 : ggsearch (NW global/global alignment) nuc against nuc
10 : ggsearch (NW global/global alignment) prot against prot
11 : glsearch (global/local alignment) nuc against nuc
12 : glsearch (global/local alignment) prot against prot
Select type of search you want to run [1]:
Query sequence(s): embl:m25165
1 : standard set
2 : user defined set
3 : user provided fastA databank
Select search set type [1]:
em : EMBL (general nucleic acid databank)
emblnontags : EMBL without EST and GSS
hum : EMBL humans
mus : EMBL mice
rod : EMBL other rodents
mam : EMBL other mammals
vrt : EMBL other vertebrates
inv : EMBL invertebrates
pln : EMBL plants
fun : EMBL fungi
pro : EMBL bacteria
phg : EMBL bacteriophages
vrl : EMBL other viruses
est : EMBL Expressed Sequence Tags
gss : EMBL Genome Survey Sequences
sts : EMBL Sequence Tagged Sites
htg : EMBL High Throughput Genomic
htc : EMBL High Throughput cDNA
env : EMBL environmental samples
pat : EMBL patents
tgn : EMBL transgenic
syn : EMBL synthetic
unc : EMBL unclassified
new : EMBL updates since last release
wgs : EMBL Whole Genome Shotgun
refseq : RefSeq (NCBI reference sequences)
refseqwgs : RefSeq Whole Genome Shotgun
refseqgen : RefSeq other genomic
refseqrna : RefSeq transcripts
vec : Intelligenetics vector databank
emvec : EMBL vector subset
epd : Eukaryotic Promoter Database
ligm : ImMunoGeneTics databank Igg. + TcR genes
hla : ImMunoGeneTics databank human MHC genes
pdbn : PDB (nucleic acids with known 3D structure)
Standard nucleic acid search set [emblnontags]: phg
Word (ktup) size [6]:
E() value cutoff [2.0]:
Output file [m25165.fasta]:
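The answers given in the session above can also be passed as qualifiers on the command line. The sketch below only builds the argument list and does not run anything. It assumes the wrapper is installed as `fasta` on the PATH; the helper name `build_fasta_argv` is hypothetical, and deriving the default output name from the entry name is a guess based on the prompt shown above. The qualifier names themselves are taken from the qualifier list on this page.

```python
import shlex


def build_fasta_argv(query, nucdb="emblnontags", wordsize=6, expect=2.0, outfile=None):
    """Return an argument list for a nucleic-against-nucleic fasta search."""
    if outfile is None:
        # Assumption: mimic the prompted default "<entry name>.fasta".
        outfile = query.split(":")[-1] + ".fasta"
    return [
        "fasta",
        "-program", "1",          # 1 : fasta nuc against nuc
        "-seqs", query,
        "-dbtype", "1",           # 1 : standard set
        "-nucdb", nucdb,
        "-wordsize", str(wordsize),
        "-expect", str(expect),
        "-outfile", outfile,
        "-auto",                  # general qualifier: turn off prompts
    ]


argv = build_fasta_argv("embl:m25165", nucdb="phg")
print(" ".join(shlex.quote(arg) for arg in argv))
```

The resulting list could be handed to `subprocess.run(argv, check=True)` on a system where the wrapper and its databanks are available.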
Standard (Mandatory) qualifiers (* if not always prompted):
-program menu [1] Search type : fastA or optimal
alignment, nuc. or prot. (Values: 1 (fasta
nuc against nuc); 2 (fasta prot against
prot); 3 (fastx (nuc against prot with
codon/aa alignment)); 4 (fasty (idem +
allowing intracodon gaps)); 5 (tfastx (prot
against nuc with codon/aa alignment)); 6
(tfasty (idem + allowing intracodon gaps));
7 (ssearch (using SW instead) nuc against
nuc); 8 (ssearch (using SW instead) prot
against prot); 9 (ggsearch (NW global/global
alignment) nuc against nuc); 10 (ggsearch
(NW global/global alignment) prot against
prot); 11 (glsearch (global/local alignment)
nuc against nuc); 12 (glsearch
(global/local alignment) prot against prot))
[-seqs] seqall Query sequence(s)
-dbtype menu [1] Search set type : public databank or
databank provided by user (Values: 1
(standard set); 2 (user defined set); 3
(user provided fastA databank))
* -nucdb menu [emblnontags] Standard nucleic acid search
set (Values: em (EMBL (general nucleic acid
databank)); emblnontags (EMBL without EST
and GSS); hum (EMBL humans); mus (EMBL
mice); rod (EMBL other rodents); mam (EMBL
other mammals); vrt (EMBL other
vertebrates); inv (EMBL invertebrates); pln
(EMBL plants); fun (EMBL fungi); pro (EMBL
bacteria); phg (EMBL bacteriophages); vrl
(EMBL other viruses); est (EMBL Expressed
Sequence Tags); gss (EMBL Genome Survey
Sequences); sts (EMBL Sequence Tagged
Sites); htg (EMBL High Throughput Genomic);
htc (EMBL High Throughput cDNA); env (EMBL
environmental samples); pat (EMBL patents);
tgn (EMBL transgenic); syn (EMBL synthetic);
unc (EMBL unclassified); new (EMBL updates
since last release); wgs (EMBL Whole Genome
Shotgun); refseq (RefSeq (NCBI reference
sequences)); refseqwgs (RefSeq Whole Genome
Shotgun); refseqgen (RefSeq other genomic);
refseqrna (RefSeq transcripts); vec
(Intelligenetics vector databank); emvec
(EMBL vector subset); epd (Eukaryotic
Promoter Database); ligm (ImMunoGeneTics
databank Igg. + TcR genes); hla
(ImMunoGeneTics databank human MHC genes);
pdbn (PDB (nucleic acids with known 3D
structure)))
* -protdb menu [up] Standard protein search set (Values: sw
(SwissProt (highly annotated protein
databank)); up (UniProt (SwissProt + TrEMBL,
EMBL ORF translations)); uniref100
(UniRef100 (UniProt nonredundant subset));
uniref90 (UniRef90 (UniRef100 subset with no
more than 90% identity)); uniref50
(UniRef50 (UniRef100 subset with no more
than 50% identity)); remt (REM-TrEMBL (old
EMBL ORF translations not incl. in
UniProt)); pir (PIR (old general protein
databank)); gp (GenPept (GenBank ORF
translations)); refseqp (RefSeq (NCBI
reference protein sequences)); pdb (PDB
(proteins with known 3D structure)); gpcrdb
(G protein coupled receptors))
* -userdb seqall User defined search set
* -userfastadb infile User provided fastA format databank (you can
make one using seqret)
* -wordsize integer [6 for fasta nucleic, 2 for other search
types] Word (ktup) size (1 to 6 for fasta
nucleic, 1 or 2 for other search types)
-expect float [2.0 for nucleic, 10.0 for protein, 5.0 for
mixed] E() value = number of databank
sequences with same or higher Z-score that
you expect to find by chance. fastA lists
sequences with an E() value lower than the
cutoff. (Number 0.000 or more)
[-outfile] outfile [*.fasta] Output file name
Additional (Optional) qualifiers (* if not always prompted):
* -[no]reverse boolean [Y] Search also complementary strand (is
default). If you switch this off fasta will
search only the forward strand of the query
sequence or of the search set sequences.
* -match integer [5] Nucleotide match reward (Integer 0 or
more)
* -mismatch integer [-4] Nucleotide mismatch penalty (Integer up
to 0)
* -matrix menu [BL50] Amino acid comparison matrix (Values:
BL50 (BLOSUM50); BL62 (BLOSUM62); BL80
(BLOSUM80); P120 (PAM120); P250 (PAM250);
M10 (Jones, Taylor, Thornton PAM10); M20
(Jones, Taylor, Thornton PAM20); M40 (Jones,
Taylor, Thornton PAM40); VT160 (Vingron
resolvent PAM160); OPT5 (OPTIMA 5))
* -intercodon integer [20] Frame shift penalty between codons
(Integer 0 or more)
* -intracodon integer [24] Frame shift penalty inside codons
(Integer 0 or more)
-gappenalty integer [14 for nucleic, 10 for protein, 12 for
fastx/y, 14 for tfastx/y] Gap penalty
(Integer 0 or more)
-gaplength integer [4 for nucleic, 2 for protein, 2 for mixed]
Gap length penalty. fastA subtracts from the
similarity score for each gap a penalty of
type <Gap penalty> + <Gap length penalty> *
n (Integer 0 or more)
* -gencode menu [1] Genetic code for translating sequences
(Values: 1 (Standard); 2 (Vertebrate
Mitochondrial); 3 (Yeast Mitochondrial); 4
(Mold, Protozoan, Coelenterate Mitochondrial
and Mycoplasma/Spiroplasma); 5
(Invertebrate Mitochondrial); 6 (Ciliate,
Dasycladacean and Hexamita); 9
(Echinodermate Mitochondrial); 10
(Euplotid); 11 (Eubacterial); 12
(Alternative Yeast); 13 (Ascidian
Mitochondrial); 14 (Flatworm Mitochondrial);
15 (Blepharisma); 16 (Chlorophycean
Mitochondrial); 21 (Trematode
Mitochondrial); 22 (Scenedesmus obliquus
Mitochondrial); 23 (Thraustochytrium
Mitochondrial))
-format menu [0] Alignment format (Values: 0 (standard);
1 (x = conservative replacements, X =
non-conservative substitutions); 2 (show
only residues in sequence 2 that differ from
sequence 1); 4 (alignment map); 5 (standard
with added alignment map); 9 (long format
best scores report); 10 (write alignments in
parsable format))
Advanced (Unprompted) qualifiers:
-[no]stat boolean [Y] Compute statistics (is default). If you
switch this off, fastA will sort on opt
instead off bit score and report no more
than 20 best scoring sequences.
-shuffle boolean Use shuffled databank sequences for
statistics. Useful if search set contains no
sequences unrelated to query sequence.
-hide float [0.000] Do not show sequences with E() value
lower than f. (Number 0.000 or more)
-zscore boolean write Z-score instead of bit score in list
-[no]histogram boolean [Y] Show histogram (is default)
-listsize integer [0] Show only the n best scoring sequences
that satisfy E() cutoff (Integer 0 or more)
-align integer [0] Show only alignments for the n first
sequences (Integer 0 or more)
-linesize integer [60] Number of residues per line of the
alignment (Integer from 10 to 200)
Associated qualifiers:
"-seqs" associated qualifiers
-sbegin1 integer Start of each sequence to be used
-send1 integer End of each sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-sformat1 string Input sequence format
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-userdb" associated qualifiers
-sbegin integer Start of each sequence to be used
-send integer End of each sequence to be used
-sreverse boolean Reverse (if DNA)
-sask boolean Ask for begin/end/reverse
-snucleotide boolean Sequence is nucleotide
-sprotein boolean Sequence is protein
-slower boolean Make lower case
-supper boolean Make upper case
-sformat string Input sequence format
-sdbname string Database name
-sid string Entryname
-ufo string UFO features
-fformat string Features format
-fopenfile string Features file name
"-outfile" associated qualifiers
-odirectory2 string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
| Standard (Mandatory) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -program | Search type: fastA or optimal alignment, nuc. or prot. | 1 to 12 (menu values listed above) | 1 |
| [-seqs] (Parameter 1) | Query sequence(s) | Readable sequence(s) | Required |
| -dbtype | Search set type: public databank or databank provided by user | 1 (standard set); 2 (user defined set); 3 (user provided fastA databank) | 1 |
| -nucdb | Standard nucleic acid search set | em, emblnontags, hum, mus, rod, mam, vrt, inv, pln, fun, pro, phg, vrl, est, gss, sts, htg, htc, env, pat, tgn, syn, unc, new, wgs, refseq, refseqwgs, refseqgen, refseqrna, vec, emvec, epd, ligm, hla, pdbn | emblnontags |
| -protdb | Standard protein search set | sw, up, uniref100, uniref90, uniref50, remt, pir, gp, refseqp, pdb, gpcrdb | up |
| -userdb | User defined search set | Readable sequence(s) | Required |
| -userfastadb | User provided fastA format databank (you can make one using seqret) | Input file | Required |
| -wordsize | Word (ktup) size | 1 to 6 for fasta nucleic, 1 or 2 for other search types | 6 for fasta nucleic, 2 for other search types |
| -expect | E() value = number of databank sequences with same or higher Z-score that you expect to find by chance. fastA lists sequences with an E() value lower than the cutoff. | Number 0.000 or more | 2.0 for nucleic, 10.0 for protein, 5.0 for mixed |
| [-outfile] (Parameter 2) | Output file name | Output file | <sequence>.<program> |

| Additional (Optional) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -[no]reverse | Search also complementary strand (is default). If you switch this off fasta will search only the forward strand of the query sequence or of the search set sequences. | Boolean value Yes/No | Yes |
| -match | Nucleotide match reward | Integer 0 or more | 5 |
| -mismatch | Nucleotide mismatch penalty | Integer up to 0 | -4 |
| -matrix | Amino acid comparison matrix | BL50, BL62, BL80, P120, P250, M10, M20, M40, VT160, OPT5 | BL50 |
| -intercodon | Frame shift penalty between codons | Integer 0 or more | 20 |
| -intracodon | Frame shift penalty inside codons | Integer 0 or more | 24 |
| -gappenalty | Gap penalty | Integer 0 or more | 14 for nucleic, 10 for protein, 12 for fastx/y, 14 for tfastx/y |
| -gaplength | Gap length penalty. fastA subtracts from the similarity score for each gap a penalty of type <Gap penalty> + <Gap length penalty> * n | Integer 0 or more | 4 for nucleic, 2 for protein, 2 for mixed |
| -gencode | Genetic code for translating sequences | 1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16, 21, 22, 23 | 1 |
| -format | Alignment format | 0, 1, 2, 4, 5, 9, 10 | 0 |

| Advanced (Unprompted) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -[no]stat | Compute statistics (is default). If you switch this off, fastA will sort on opt instead of bit score and report no more than 20 best scoring sequences. | Boolean value Yes/No | Yes |
| -shuffle | Use shuffled databank sequences for statistics. Useful if search set contains no sequences unrelated to query sequence. | Boolean value Yes/No | No |
| -hide | Do not show sequences with E() value lower than f. | Number 0.000 or more | 0.000 |
| -zscore | Write Z-score instead of bit score in list | Boolean value Yes/No | No |
| -[no]histogram | Show histogram (is default) | Boolean value Yes/No | Yes |
| -listsize | Show only the n best scoring sequences that satisfy E() cutoff | Integer 0 or more | 0 |
| -align | Show only alignments for the n first sequences | Integer 0 or more | 0 |
| -linesize | Number of residues per line of the alignment | Integer from 10 to 200 | 60 |
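You can select your search set in three different ways (qualifier -dbtype): a standard set (-nucdb or -protdb), a user defined set (-userdb), or a user provided fastA format databank (-userfastadb).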
The first part of the output file contains a histogram showing the distribution
of the opt scores between the query and search set sequences. (See the
algorithm topic for an explanation of opt score.) The histogram is composed of bins
of size 2 that are labeled according to the higher score for that bin (the
leftmost column of the histogram). For example, the bin labeled 24 stores
the number of sequence pairs that had scores of 23 or 24.
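The binning rule can be sketched as follows. The helper name `bin_label` is hypothetical. The handling of the two edge bins is an assumption inferred from the sample output on this page, where the first row is labeled "< 20" and the last ">120" and no rows labeled 20 or 120 appear; it is not taken from the FASTA source.

```python
def bin_label(opt_score, low=20, high=120):
    """Return the histogram label of the size-2 bin holding opt_score."""
    label = opt_score + (opt_score % 2)  # odd scores share the next even label
    if label <= low:
        return "< %d" % low   # assumed: everything up to `low` is pooled
    if label >= high:
        return ">%d" % high   # assumed: everything from `high` - 1 up is pooled
    return str(label)


for score in (12, 23, 24, 25, 470):
    print(score, "->", bin_label(score))
```

Scores of 23 and 24 both land in the bin labeled 24, as described above.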
The next two columns of the histogram list the number of opt scores
that fell within each bin. The second column lists the number of opt scores
observed in the search and the third column lists the number of opt scores
that were expected.
The body of the histogram displays a graphical representation of the
score distributions. Equal signs (=) indicate the number of scores of that
magnitude that were observed during the search, while asterisks (*) plot
the number of scores of that magnitude that were expected.
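How one row of the histogram body is drawn can be sketched like this. The function name `histogram_row` is hypothetical, and the rounding rule (round up to whole symbols, with the asterisk overwriting the symbol at its position) is inferred from short rows of the sample output on this page, such as `66 86 70:=======*==` at 9 library sequences per symbol. It is not taken from the FASTA source.

```python
import math


def histogram_row(observed, expected, per_symbol):
    """Draw one histogram row: '=' for observed counts, '*' marking expected."""
    bar = ["="] * math.ceil(observed / per_symbol)
    star = math.ceil(expected / per_symbol)  # 1-based column of the asterisk
    if star > 0:
        if star > len(bar):
            # Pad with blanks when the expected count exceeds the observed one.
            bar.extend([" "] * (star - len(bar)))
        bar[star - 1] = "*"
    return "".join(bar)


print(histogram_row(86, 70, 9))  # =======*==
print(histogram_row(37, 26, 9))  # ==*==
print(histogram_row(10, 22, 9))  # ==*
```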
At the bottom of the histogram is a list of some of the parameters
pertaining to the search.
Below the histogram, FastA displays a listing of the best scores. [r] after
the sequence name in this list indicates that the match was found between
the search set sequence and the reverse complement of the query sequence;
[f] indicates a match with the query sequence as given.
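A line from the list of best scores can be split into its fields with a regular expression. This is a sketch under assumptions: the pattern and the name `parse_best_score` are hypothetical and were written against the nucleic acid example output on this page (sequence length in parentheses, then the strand flag, opt, bits and E()). Other search types or output formats may need a different pattern.

```python
import re

BEST_SCORE = re.compile(
    r"^(?P<id>\S+)\s+(?P<description>.*?)\s*"
    r"\(\s*(?P<length>\d+)\)\s+\[(?P<strand>[fr])\]\s+"
    r"(?P<opt>\d+)\s+(?P<bits>[\d.]+)\s+(?P<evalue>\S+)\s*$"
)


def parse_best_score(line):
    """Return the fields of one best-scores line as a dict, or None if no match."""
    m = BEST_SCORE.match(line)
    if m is None:
        return None
    return {
        "id": m.group("id"),
        "description": m.group("description"),
        "length": int(m.group("length")),
        "reverse": m.group("strand") == "r",  # [r] = reverse complement of query
        "opt": int(m.group("opt")),
        "bits": float(m.group("bits")),
        "evalue": float(m.group("evalue")),
    }


hit = parse_best_score(
    "embl:M25165 Bacteriophage lambda P(R) promoter ( 94) [r] 126 31.5 0.96"
)
print(hit)
```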
Following the list of best scores, FastA displays the alignments of
the regions of best overlap between the query and search sequences. Note
that, for clarity, the "wrapper" makes sure the "naked" program outputs
complete description lines rather than truncated ones. To keep automated
output parsers from breaking, the "wrapper" also puts each description on
a single line rather than letting it wrap over several lines.
# /opt/sw/fasta/bin/fasta -q -L -T 2 -l /opt/sw/fasta/fastlibs -n -E 2 -r 5/-4 -f 14 -g 4 -m 0 -w 60 embl-id:M25165 %+phg 6
FASTA searches a protein or DNA sequence data bank
version 35.04 Oct. 7, 2008
Please cite:
W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
Query: embl-id:M25165
1>>>M25165 M25165.1 Bacteriophage lambda P(R) promoter, RNA polymerase bin - 94 nt
Library: EMBL bacteriophages 31460588 residues in 4076 sequences
opt E()
< 20 86 0:==========
22 0 0: one = represents 9 library sequences
24 0 0:
26 0 0:
28 0 1:*
30 0 6:*
32 10 22:==*
34 22 59:=== *
36 49 121:====== *
38 154 200:================== *
40 206 279:======================= *
42 291 341:================================= *
44 524 376:=========================================*=================
46 384 383:==========================================*
48 376 367:========================================*=
50 440 335:=====================================*===========
52 309 294:================================*==
54 245 251:===========================*
56 209 210:=======================*
58 170 172:===================*
60 137 140:===============*
62 75 112:========= *
64 69 89:======== *
66 86 70:=======*==
68 61 55:======*
70 33 43:====*
72 22 34:===*
74 37 26:==*==
76 24 21:==*
78 19 16:=*=
80 6 12:=*
82 10 9:*=
84 10 8:*=
86 2 6:*
88 2 5:* inset = represents 1 library sequences
90 1 3:*
92 4 3:* :==*=
94 0 2:* : *
96 1 2:* :=*
98 0 1:* :*
100 0 1:* :*
102 0 1:* :*
104 0 1:* :*
106 1 0:= *=
108 0 0: *
110 2 0:= *==
112 1 0:= *=
114 0 0: *
116 0 0: *
118 0 0: *
>120 22 0:=== *======================
31462844 residues in 4100 sequences
Statistics: Expectation_n fit: rho(ln(x))= 4.7414+/- 0.001; mu= 14.3669+/- 0.070
mean_var=89.9386+/-13.435, 0's: 0 Z-trim: 108 B-trim: 80 in 2/84
Lambda= 0.135239
Kolmogorov-Smirnov statistic: 0.0738 (N=29) at 42
Algorithm: FASTA (3.5 Sept 2006) [optimized]
Parameters: 5/-4 matrix (5:-4) ktup: 6
join: 46, opt: 31, open/ext: -14/-4, width: 16
Scan time: 2.770
The best scores are: opt bits E(4100)
em:J02459 [Enterobacteria phage lambda] Entero (48502) [f] 470 101.9 6.4e-22
embl:DJ344380 Method for preparation of artifi (48502) [f] 470 101.9 6.4e-22
embl:DJ347897 Method for preparation of artifi (48502) [f] 470 101.9 6.4e-22
embl:J02459 Enterobacteria phage lambda, compl (48502) [f] 470 101.9 6.4e-22
embl:EF120455 Enterobacteria phage HK544 restr (3564) [f] 470 100.5 1.6e-21
embl:AF069529 Bacteriophage HK97, complete gen (39732) [f] 461 100.0 2.3e-21
embl:EF120461 Enterobacteria phage HK106 conse (4202) [f] 461 98.8 5.2e-21
embl:EF120460 Enterobacteria phage mEp234 cons (4202) [f] 461 98.8 5.2e-21
embl:M25165 Bacteriophage lambda P(R) promoter ( 94) [f] 470 98.6 6e-21
embl:EF120459 Enterobacteria phage CL707 IS10 (5055) [f] 452 97.2 1.6e-20
embl:EF120458 Enterobacteria phage HK542 hypot (3972) [f] 452 97.1 1.8e-20
embl:EF120457 Enterobacteria phage HK244 hypot (3921) [f] 452 97.1 1.8e-20
embl:DQ372059 Enterobacteria phage lambda clon (1206) [f] 445 95.1 7.1e-20
embl:DQ372056 Enterobacteria phage lambda clon (1206) [f] 445 95.1 7.1e-20
embl:DQ372058 Enterobacteria phage lambda clon (1206) [f] 445 95.1 7.1e-20
embl:DQ372057 Enterobacteria phage lambda clon (1206) [f] 445 95.1 7.1e-20
embl:DQ372060 Enterobacteria phage lambda clon (1206) [f] 436 93.3 2.4e-19
embl:EF120456 Enterobacteria phage mEp332 rest (4251) [f] 429 92.6 3.9e-19
embl:M29179 Bacteriophage lambda O-R region pr ( 88) [f] 400 84.9 7.9e-17
embl:X70116 Bacteriophage lambda promoter regi ( 86) [f] 395 84.0 1.6e-16
embl:M25081 Bacteriophage lambda O-R operator, ( 76) [f] 380 81.0 1.3e-15
em:J02459 [Enterobacteria phage lambda] Entero (48502) [r] 126 34.8 0.1
embl:DJ347897 Method for preparation of artifi (48502) [r] 126 34.8 0.1
embl:DJ344380 Method for preparation of artifi (48502) [r] 126 34.8 0.1
embl:J02459 Enterobacteria phage lambda, compl (48502) [r] 126 34.8 0.1
embl:AF069529 Bacteriophage HK97, complete gen (39732) [r] 126 34.6 0.11
embl:EF120459 Enterobacteria phage CL707 IS10 (5055) [r] 126 33.6 0.23
embl:EF120460 Enterobacteria phage mEp234 cons (4202) [r] 126 33.5 0.25
embl:EF120461 Enterobacteria phage HK106 conse (4202) [r] 126 33.5 0.25
embl:EF120458 Enterobacteria phage HK542 hypot (3972) [r] 126 33.5 0.25
embl:EF120457 Enterobacteria phage HK244 hypot (3921) [r] 126 33.5 0.25
embl:EF120455 Enterobacteria phage HK544 restr (3564) [r] 126 33.4 0.26
embl:EF120456 Enterobacteria phage mEp332 rest (4251) [r] 123 32.9 0.37
embl:DQ372056 Enterobacteria phage lambda clon (1206) [r] 126 32.8 0.38
embl:DQ372060 Enterobacteria phage lambda clon (1206) [r] 126 32.8 0.38
embl:DQ372057 Enterobacteria phage lambda clon (1206) [r] 126 32.8 0.38
embl:DQ372058 Enterobacteria phage lambda clon (1206) [r] 126 32.8 0.38
embl:DQ372059 Enterobacteria phage lambda clon (1206) [r] 126 32.8 0.38
embl:M25165 Bacteriophage lambda P(R) promoter ( 94) [r] 126 31.5 0.96
embl:M29179 Bacteriophage lambda O-R region pr ( 88) [r] 126 31.5 0.98
embl:X70116 Bacteriophage lambda promoter regi ( 86) [r] 126 31.5 0.99
embl:M25081 Bacteriophage lambda O-R operator, ( 76) [r] 126 31.4 1
embl:V00638 Lambda genome from map unit 74 bac (3400) [r] 113 30.8 1.5
>>em:J02459 [Enterobacteria phage lambda] Enterobacteria phage lambda, complete genome. (48502 nt)
initn: 470 init1: 470 opt: 470 Z-score: 476.5 bits: 101.9 E(): 6.4e-22
banded Smith-Waterman score: 470; 100.0% identity (100.0% similar) in 94 nt overlap (1-94:37946-38039)
10 20 30
M25165 AAATCTATCACCGCAAGGGATAAATATCTA
::::::::::::::::::::::::::::::
em:J02 TTAATGGTTTCTTTTTTGTGCTCATACGTTAAATCTATCACCGCAAGGGATAAATATCTA
37920 37930 37940 37950 37960 37970
40 50 60 70 80 90
M25165 ACACCGTGCGTGTTGACTATTTTACCTCTGGCGGTGATAATGGTTGCATGTACTAAGGAG
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
em:J02 ACACCGTGCGTGTTGACTATTTTACCTCTGGCGGTGATAATGGTTGCATGTACTAAGGAG
37980 37990 38000 38010 38020 38030
M25165 GTTG
::::
em:J02 GTTGTATGGAACAACGCATAACCCTGAAAGATTATGCAATGCGCTTTGGGCAAACCAAGA
38040 38050 38060 38070 38080 38090
>>embl:DJ344380 Method for preparation of artificial positive control DNAs used for multiplex PCR. (48502 nt)
initn: 470 init1: 470 opt: 470 Z-score: 476.5 bits: 101.9 E(): 6.4e-22
banded Smith-Waterman score: 470; 100.0% identity (100.0% similar) in 94 nt overlap (1-94:37946-38039)
10 20 30
M25165 AAATCTATCACCGCAAGGGATAAATATCTA
::::::::::::::::::::::::::::::
embl:D TTAATGGTTTCTTTTTTGTGCTCATACGTTAAATCTATCACCGCAAGGGATAAATATCTA
37920 37930 37940 37950 37960 37970
40 50 60 70 80 90
M25165 ACACCGTGCGTGTTGACTATTTTACCTCTGGCGGTGATAATGGTTGCATGTACTAAGGAG
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
embl:D ACACCGTGCGTGTTGACTATTTTACCTCTGGCGGTGATAATGGTTGCATGTACTAAGGAG
37980 37990 38000 38010 38020 38030
M25165 GTTG
::::
embl:D GTTGTATGGAACAACGCATAACCCTGAAAGATTATGCAATGCGCTTTGGGCAAACCAAGA
38040 38050 38060 38070 38080 38090
[Part of this file has been deleted for brevity]
>>embl:V00638 Lambda genome from map unit 74 backward to map unit 67. (3400 nt)
rev-comp initn: 96 init1: 96 opt: 113 Z-score: 113.3 bits: 30.8 E(): 1.5
banded Smith-Waterman score: 113; 69.6% identity (69.6% similar) in 56 nt overlap (72-17:3086-3138)
90 80 70 60 50
M2516- CAACCTCCTTAGTACATGCAACCATTATCACCGCCAGAGGTAAAATAGTCAA
:: :::::::::::: :::: :::::
embl:V ATGGTGGTCAGTGCGTCCTGCTGATGTGCTCAGTATCACCGCCAGTGGTATTTATGTCAA
3060 3070 3080 3090 3100 3110
40 30 20 10
M2516- CACGCACGGTGTTAGATATTTATCCCTTGCGGTGATAGATTT
::: : : : :: ::::::: :
embl:V CACCGCCAGAGATA---ATTTATCACCGCAGATGGTTATCTGTATGTTTTTTATATGAAT
3120 3130 3140 3150 3160 3170
94 residues in 1 query sequences
31460588 residues in 4076 library sequences
Tcomplib [35.04] (2 proc)
start: Wed Oct 15 16:18:06 2008 done: Wed Oct 15 16:18:08 2008
Total Scan time: 2.770 Total Display time: 0.030
Function used was FASTA [version 35.04 Oct. 7, 2008]
If the search set contains no sequences that are unrelated to the query sequence, the statistics will be flawed, because fasta uses low-scoring databank sequences as a model for false positives. You can remedy this by requesting that shuffled versions of the databank sequences be used instead (option -shuffle).
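The idea behind shuffling can be shown in a few lines: a shuffled sequence keeps the length and composition of the original but loses its residue order, so it can stand in for an unrelated sequence. This is an illustration only; the name `shuffled_copy` is hypothetical, and the shuffling procedure that fasta itself applies may differ.

```python
import random


def shuffled_copy(sequence, seed=None):
    """Return a random permutation of the residues of `sequence`."""
    residues = list(sequence)
    random.Random(seed).shuffle(residues)  # seed makes the result repeatable
    return "".join(residues)


original = "AAATCTATCACCGCAAGGGATAAATATCTA"
print(shuffled_copy(original, seed=1))
```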
The sensitivity of the search depends on the chosen scoring scheme, and it is important to choose an appropriate gap penalty. Prof. Pearson recommends the following gap penalties for different protein scoring matrices:
| scoring matrix | recommended gap penalty | recommended gap length penalty |
|---|---|---|
| BLOSUM50 | 12 | 2 |
| BLOSUM62 | 12 | 1 |
| BLOSUM80 | 12 | 2 |
| PAM120 | 20 | 3 |
| PAM250 | 12 | 2 |
| Jones, Taylor, Thornton PAM10 | 27 | 4 |
| Jones, Taylor, Thornton PAM20 | 26 | 4 |
| Jones, Taylor, Thornton PAM40 | 25 | 4 |
| Vingron resolvent PAM160 | 12 | 2 |
| OPTIMA 5 | 20 | 2 |
Note that it is important to choose a gap penalty that is adapted to the chosen scoring matrix. For protein searches, see the Notes for recommended gap penalties.
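The recommended penalties can be kept in a lookup table and combined with the gap formula given for -gaplength (gap penalty + gap length penalty * n). This is a sketch: the names `RECOMMENDED_GAP_PENALTIES` and `gap_cost` are hypothetical, the keys are the -matrix menu codes from this page, and n is assumed to be the number of residues in the gap.

```python
# -matrix code -> (recommended gap penalty, recommended gap length penalty)
RECOMMENDED_GAP_PENALTIES = {
    "BL50": (12, 2),
    "BL62": (12, 1),
    "BL80": (12, 2),
    "P120": (20, 3),
    "P250": (12, 2),
    "M10": (27, 4),
    "M20": (26, 4),
    "M40": (25, 4),
    "VT160": (12, 2),
    "OPT5": (20, 2),
}


def gap_cost(n, gap_penalty, gap_length_penalty):
    """Score subtracted for one gap of n residues (n assumed to be the gap length)."""
    return gap_penalty + gap_length_penalty * n


gap_penalty, gap_length_penalty = RECOMMENDED_GAP_PENALTIES["BL62"]
print(gap_cost(3, gap_penalty, gap_length_penalty))  # 12 + 1 * 3 = 15
```

The two values for the chosen matrix would then be passed as -gappenalty and -gaplength.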
<databank> contains non-nucleic acid characters ! The first one occurs in sequence <sequence name>
| Program name | Description |
|---|---|
| blast | BLAST search of query sequence(s) against sequence search set |
| ebi_blast | WU-BLAST search of query sequence against sequence databank using EBI Web Services |
| ebi_fasta | fastA search of query sequence against sequence databank using EBI Web Services |
| fasts | Protein identification from peptides using fastA algorithm |
| phiblast | Search protein sequence set combining matching of pattern with local alignment of a query sequence surrounding the match |
| psiblast | Iterative BLAST search with generation of profile of protein sequence against protein sequence set |
| lfasta | Finds local alignments between two sequences, using fastA |
The programs fasta,... themselves were written by
William R. Pearson
Department of Biochemistry
Box 440, Jordan Hall
U. of Virginia
Charlottesville, VA
wrp@virginia.EDU