blast
To describe the algorithm briefly, BLAST compares a query sequence with a database sequence by first locating two non-overlapping sequence segments in common within a certain distance of each other, and then attempting to extend these so-called "hits" into locally optimal alignments between the sequences being compared. We provide a more detailed description below.
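The "two hits" idea can be illustrated with a rough sketch. It finds exact word matches of length w (the -wordsize qualifier). It then reports pairs of matches that lie on the same diagonal, do not overlap, and are at most `window` positions apart (the -window qualifier, 40 by default for search types other than blastn). This is a simplification. blastp, for instance, also counts "neighboring" words that score at least a threshold against the query (the example output reports a neighboring words threshold of 11), and the extension step itself is not shown. The function names are hypothetical.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """All exact word matches of length w, as (query_pos, subject_pos) pairs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def two_hit_seeds(query: str, subject: str, w: int = 3,
                  window: int = 40) -> list[tuple[int, int, int]]:
    """Pairs of non-overlapping hits on the same diagonal at most `window` apart.

    Returns (first_query_pos, second_query_pos, diagonal) triples; each one
    marks a place where an extension into an alignment would be attempted.
    """
    last_hit: dict[int, int] = {}  # diagonal (j - i) -> query pos of last kept hit
    seeds = []
    # Walk each diagonal in order of increasing query position.
    for i, j in sorted(word_hits(query, subject, w), key=lambda h: (h[1] - h[0], h[0])):
        diag = j - i
        prev = last_hit.get(diag)
        if prev is None or i - prev > window:
            last_hit[diag] = i               # first hit, or too far away: start over
        elif i - prev >= w:
            seeds.append((prev, i, diag))    # second, non-overlapping hit in window
            last_hit[diag] = i
        # otherwise the hit overlaps the previous one and is ignored
    return seeds


print(two_hit_seeds("ACDEFGHIK", "ACDEFGHIK"))  # [(0, 3, 0), (3, 6, 0)]
```

A single isolated word match produces no seed, which is why the method saves time. Most chance matches never get extended.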
It is also possible to request "soft filtering" (-seqsoftfilter). With soft filtering, masking is applied only when searching for the initial "hits", and growing alignments are allowed to extend into regions of low compositional complexity.
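The difference between ordinary and soft filtering can be sketched as follows. Here the low-complexity regions are given by hand, because the filter that actually finds them is not shown. Hard masking replaces residues with X, and soft masking is represented by lowercase letters. Both representations are assumptions made for illustration, as are the function names. Word seeding skips masked positions in both cases, but only with soft masking does the extension step get the original residues back.

```python
def hard_mask(seq: str, regions: list[tuple[int, int]], mask_char: str = "X") -> str:
    """Replace residues in each half-open region [start, end) by mask_char."""
    chars = list(seq)
    for start, end in regions:
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


def soft_mask(seq: str, regions: list[tuple[int, int]]) -> str:
    """Mark residues in each region by lowercasing them instead of replacing them."""
    chars = list(seq)
    for start, end in regions:
        chars[start:end] = [c.lower() for c in chars[start:end]]
    return "".join(chars)


def seed_words(masked: str, w: int) -> list[tuple[int, str]]:
    """Words allowed to start a hit: none may touch a masked position."""
    return [(i, masked[i:i + w]) for i in range(len(masked) - w + 1)
            if masked[i:i + w].isupper() and "X" not in masked[i:i + w]]


def extension_view(masked: str) -> str:
    """What the extension step scores: soft-masked residues are restored."""
    return masked.upper()


seq = "MKAAAAAALVW"   # hypothetical protein with a poly-A stretch
region = [(2, 8)]     # assume the filter flagged positions 2..7
for masked in (hard_mask(seq, region), soft_mask(seq, region)):
    print(masked, seed_words(masked, 3), extension_view(masked))
```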






A given class of alignments is best distinguished from chance by the substitution matrix whose target frequencies characterize the class.
> blast
BLAST search of query sequence(s) against sequence search set
1 : blastn (nuc against nuc)
2 : blastp (prot against prot)
3 : blastx (nuc translated against prot)
4 : tblastn (prot against nuc translated)
5 : tblastx (nuc translated against nuc translated)
Select type of search you want to run [2]:
Query sequence(s): sw:papa1_carpa
1 : standard set
2 : user defined set
3 : user provided BLAST databank
Select search set type [1]:
sw : SwissProt (highly annotated protein databank)
up : UniProt (SwissProt + TrEMBL, EMBL ORF translations)
uniref100 : UniRef100 (UniProt nonredundant subset)
uniref90 : UniRef90 (UniRef100 subset with no more than 90% identity)
uniref50 : UniRef50 (UniRef100 subset with no more than 50% identity)
remt : REM-TrEMBL (old EMBL ORF translations not incl. in UniProt)
pir : PIR (old general protein databank)
gp : GenPept (GenBank ORF translations)
refseqp : RefSeq (NCBI reference protein sequences)
pdb : PDB (proteins with known 3D structure)
gpcrdb : G protein coupled receptors
Standard protein search set [up]: sw
Word size [3]:
E() value cutoff [10.0]:
Output file [papa1_carpa.blastp]:
Standard (Mandatory) qualifiers (* if not always prompted):
-program menu [2] Search type : nuc. or prot. (Values: 1
(blastn (nuc against nuc)); 2 (blastp (prot
against prot)); 3 (blastx (nuc translated
against prot)); 4 (tblastn (prot against nuc
translated)); 5 (tblastx (nuc translated
against nuc translated)))
[-seqs] seqall Query sequence(s)
-dbtype menu [1] Search set type : public databank or
databank provided by user (Values: 1
(standard set); 2 (user defined set); 3
(user provided BLAST databank))
* -nucdb menu [emblnontags] Standard nucleic acid search
set (Values: em (EMBL (general nucleic acid
databank)); emblnontags (EMBL without EST
and GSS); hum (EMBL humans); mus (EMBL
mice); rod (EMBL other rodents); mam (EMBL
other mammals); vrt (EMBL other
vertebrates); inv (EMBL invertebrates); pln
(EMBL plants); fun (EMBL fungi); pro (EMBL
bacteria); phg (EMBL bacteriophages); vrl
(EMBL other viruses); est (EMBL Expressed
Sequence Tags); gss (EMBL Genome Survey
Sequences); sts (EMBL Sequence Tagged
Sites); htg (EMBL High Throughput Genomic);
htc (EMBL High Throughput cDNA); env (EMBL
environmental samples); pat (EMBL patents);
tgn (EMBL transgenic); syn (EMBL synthetic);
unc (EMBL unclassified); new (EMBL updates
since last release); wgs (EMBL Whole Genome
Shotgun); refseq (RefSeq (NCBI reference
sequences)); refseqwgs (RefSeq Whole Genome
Shotgun); refseqgen (RefSeq other genomic);
refseqrna (RefSeq transcripts); vec
(Intelligenetics vector databank); emvec
(EMBL vector subset); epd (Eukaryotic
Promoter Database); ligm (ImMunoGeneTics
databank Igg. + TcR genes); hla
(ImMunoGeneTics databank human MHC genes);
pdbn (PDB (nucleic acids with known 3D
structure)))
* -protdb menu [up] Standard protein search set (Values: sw
(SwissProt (highly annotated protein
databank)); up (UniProt (SwissProt + TrEMBL,
EMBL ORF translations)); uniref100
(UniRef100 (UniProt nonredundant subset));
uniref90 (UniRef90 (UniRef100 subset with no
more than 90% identity)); uniref50
(UniRef50 (UniRef100 subset with no more
than 50% identity)); remt (REM-TrEMBL (old
EMBL ORF translations not incl. in
UniProt)); pir (PIR (old general protein
databank)); gp (GenPept (GenBank ORF
translations)); refseqp (RefSeq (NCBI
reference protein sequences)); pdb (PDB
(proteins with known 3D structure)); gpcrdb
(G protein coupled receptors))
* -userdb seqall User defined search set
* -userblastdb infile User provided BLAST format databank (you can
make one using makeblastdb)
-wordsize integer [11 for blastn, 3 for other search types]
Word size (7 or more for blastn, 2 or 3 for
other search types)
-expect float [10.0] E() value = number of databank
sequences with same or higher bit score that
you expect to find by chance. BLAST lists
sequences with an E() value lower than the
cutoff. (Number 0.000 or more)
[-outfile] outfile [*.blast] Output file name
Additional (Optional) qualifiers (* if not always prompted):
* -strand selection [both] Strand to search. By default BLAST
searches both strands, but for blastn and
(t)blastx you can choose to search only the
top or bottom strand of the databank
respectively query sequence.
* -match integer [1] Nucleotide match reward (Integer 0 or
more)
* -mismatch integer [-3] Nucleotide mismatch penalty (Integer up
to 0)
* -matrix selection [3] Amino acid comparison matrix
* -gappenalty integer [5 for blastn, 11 for other search types]
Gap penalty (Integer 0 or more)
* -gaplength integer [2 for blastn, 1 for other search types] Gap
length penalty. BLAST subtracts from the
similarity score for each gap a penalty of
type <Gap penalty> + <Gap length penalty> *
n. Only certain combinations of matrix and
gap penalty are allowed, see on-line manual.
(Integer 0 or more)
* -seqgencode menu [1] Genetic code for translating query
sequence(s) (Values: 1 (Standard); 2
(Vertebrate Mitochondrial); 3 (Yeast
Mitochondrial); 4 (Mold, Protozoan,
Coelenterate Mitochondrial and
Mycoplasma/Spiroplasma); 5 (Invertebrate
Mitochondrial); 6 (Ciliate, Dasycladacean
and Hexamita); 9 (Echinoderm Mitochondrial);
10 (Euplotid); 11 (Bacterial); 12
(Alternative Yeast); 13 (Ascidian
Mitochondrial); 14 (Flatworm Mitochondrial);
15 (Blepharisma); 16 (Chlorophycean
Mitochondrial); 21 (Trematode
Mitochondrial); 22 (Scenedesmus obliquus
mitochondrial); 23 (Thraustochytrium
mitochondrial))
* -dbgencode menu [1] Genetic code for translating databank
sequences (Values: 1 (Standard); 2
(Vertebrate Mitochondrial); 3 (Yeast
Mitochondrial); 4 (Mold, Protozoan,
Coelenterate Mitochondrial and
Mycoplasma/Spiroplasma); 5 (Invertebrate
Mitochondrial); 6 (Ciliate, Dasycladacean
and Hexamita); 9 (Echinoderm Mitochondrial);
10 (Euplotid); 11 (Bacterial); 12
(Alternative Yeast); 13 (Ascidian
Mitochondrial); 14 (Flatworm Mitochondrial);
15 (Blepharisma); 16 (Chlorophycean
Mitochondrial); 21 (Trematode
Mitochondrial); 22 (Scenedesmus obliquus
mitochondrial); 23 (Thraustochytrium
mitochondrial))
* -compstats menu [0] The E() value can be computed more
accurately if the composition of the
sequences being compared is taken into
account. For blastp and tblastn you can
choose to adjust or rescale the scoring
scheme, as is done for PSI-BLAST (Values: 0
(none); 1 (scale); 2 (adjust conditionally,
otherwise scale); 3 (adjust))
-format menu [0] Alignment format (Values: 0 (pairwise);
1 (query-anchored, showing identities); 2
(query-anchored, no identities); 3 (flat
query-anchored, show identities); 4 (flat
query-anchored, no identities); 5
(query-anchored, no identities and blunt
ends); 6 (flat query-anchored, no identities
and blunt ends); 7 (XML Blast output); 8
(tabular); 9 (tabular, with comment lines);
10 (ASN.1))
Advanced (Unprompted) qualifiers:
-[no]gaps toggle [Y] Make gapped alignments (is default)
-[no]seqfilter boolean [Y] Filter low complexity segments out of
query sequence(s) (is default)
-seqcoilfilter boolean Filter coiled coils out of query sequence(s)
-seqsoftfilter boolean Use soft filtering, that is, filter only at
initial hit searching, not at hit extension
-[no]doublehit boolean [Y] Try to extend hit only if there is a
second hit (not for blastn, is default for
other search types)
-window integer [not for blastn, 40 for other search types]
Multiple hits window size (Integer 0 or
more)
-effdbsize float [0.000] Effective databank size for
statistical calculations (Number 0.000 or
more)
-keep integer [0] Keep only n best hits from same region.
Default is to show them all. If you use this
option, a value of 100 is recommended.
(Integer 0 or more)
-listsize integer [50] Show only the n best scoring sequences
that satisfy E() cutoff (Integer 0 or more)
-align integer [25] Show only alignments for the n first
sequences (Integer 0 or more, but not >
listsize)
Associated qualifiers:
"-seqs" associated qualifiers
-sbegin1 integer Start of each sequence to be used
-send1 integer End of each sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-sformat1 string Input sequence format
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-userdb" associated qualifiers
-sbegin integer Start of each sequence to be used
-send integer End of each sequence to be used
-sreverse boolean Reverse (if DNA)
-sask boolean Ask for begin/end/reverse
-snucleotide boolean Sequence is nucleotide
-sprotein boolean Sequence is protein
-slower boolean Make lower case
-supper boolean Make upper case
-sformat string Input sequence format
-sdbname string Database name
-sid string Entryname
-ufo string UFO features
-fformat string Features format
-fopenfile string Features file name
"-outfile" associated qualifiers
-odirectory2 string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
| Standard (Mandatory) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -program | Search type: nuc. or prot. | 1 (blastn), 2 (blastp), 3 (blastx), 4 (tblastn), 5 (tblastx) | 2 |
| [-seqs] (Parameter 1) | Query sequence(s) | Readable sequence(s) | Required |
| -dbtype | Search set type: public databank or databank provided by user | 1 (standard set), 2 (user defined set), 3 (user provided BLAST databank) | 1 |
| -nucdb | Standard nucleic acid search set | em, emblnontags, hum, mus, rod, mam, vrt, inv, pln, fun, pro, phg, vrl, est, gss, sts, htg, htc, env, pat, tgn, syn, unc, new, wgs, refseq, refseqwgs, refseqgen, refseqrna, vec, emvec, epd, ligm, hla, pdbn | emblnontags |
| -protdb | Standard protein search set | sw, up, uniref100, uniref90, uniref50, remt, pir, gp, refseqp, pdb, gpcrdb | up |
| -userdb | User defined search set | Readable sequence(s) | Required |
| -userblastdb | User provided BLAST format databank (you can make one using makeblastdb) | Input file | Required |
| -wordsize | Word size | 7 or more for blastn, 2 or 3 for other search types | 11 for blastn, 3 for other search types |
| -expect | E() value = number of databank sequences with same or higher bit score that you expect to find by chance. BLAST lists sequences with an E() value lower than the cutoff. | Number 0.000 or more | 10.0 |
| [-outfile] (Parameter 2) | Output file name | Output file | <sequence>.<program> |
| **Additional (Optional) qualifiers** | **Description** | **Allowed values** | **Default** |
| -strand | Strand to search. By default BLAST searches both strands, but for blastn and (t)blastx you can choose to search only the top or bottom strand of the databank respectively query sequence. | Choose from selection list of values | both |
| -match | Nucleotide match reward | Integer 0 or more | 1 |
| -mismatch | Nucleotide mismatch penalty | Integer up to 0 | -3 |
| -matrix | Amino acid comparison matrix | Choose from selection list of values | BLOSUM62 |
| -gappenalty | Gap penalty | Integer 0 or more | 5 for blastn, 11 for other search types |
| -gaplength | Gap length penalty. BLAST subtracts from the similarity score for each gap a penalty of type <Gap penalty> + <Gap length penalty> * n. Only certain combinations of matrix and gap penalty are allowed, see on-line manual. | Integer 0 or more | 2 for blastn, 1 for other search types |
| -seqgencode | Genetic code for translating query sequence(s) | Genetic code number: 1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16, 21, 22 or 23 | 1 |
| -dbgencode | Genetic code for translating databank sequences | Genetic code number: 1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16, 21, 22 or 23 | 1 |
| -compstats | The E() value can be computed more accurately if the composition of the sequences being compared is taken into account. For blastp and tblastn you can choose to adjust or rescale the scoring scheme, as is done for PSI-BLAST | 0 (none), 1 (scale), 2 (adjust conditionally, otherwise scale), 3 (adjust) | 0 |
| -format | Alignment format | 0 (pairwise), 1 to 6 (query-anchored variants), 7 (XML Blast output), 8 (tabular), 9 (tabular, with comment lines), 10 (ASN.1) | 0 |
| **Advanced (Unprompted) qualifiers** | **Description** | **Allowed values** | **Default** |
| -[no]gaps | Make gapped alignments (is default) | Toggle value Yes/No | Yes |
| -[no]seqfilter | Filter low complexity segments out of query sequence(s) (is default) | Boolean value Yes/No | Yes |
| -seqcoilfilter | Filter coiled coils out of query sequence(s) | Boolean value Yes/No | No |
| -seqsoftfilter | Use soft filtering, that is, filter only at initial hit searching, not at hit extension | Boolean value Yes/No | No |
| -[no]doublehit | Try to extend hit only if there is a second hit (not for blastn, is default for other search types) | Boolean value Yes/No | Yes |
| -window | Multiple hits window size | Integer 0 or more | not for blastn, 40 for other search types |
| -effdbsize | Effective databank size for statistical calculations | Number 0.000 or more | 0.000 |
| -keep | Keep only n best hits from same region. Default is to show them all. If you use this option, a value of 100 is recommended. | Integer 0 or more | 0 |
| -listsize | Show only the n best scoring sequences that satisfy E() cutoff | Integer 0 or more | 50 |
| -align | Show only alignments for the n first sequences | Integer 0 or more, but not > listsize | 25 |
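The -gaplength description gives the cost of a gap of length n as <Gap penalty> + <Gap length penalty> * n. The sketch below applies that rule to an already aligned nucleotide pair, using the blastn defaults listed above (match 1, mismatch -3, gap penalty 5, gap length penalty 2). The function name and the use of '-' as the gap character are our own choices. The sketch covers only the scoring arithmetic, not how BLAST finds the alignment.

```python
def affine_gap_score(query: str, subject: str, match: int = 1, mismatch: int = -3,
                     gap_penalty: int = 5, gap_length_penalty: int = 2) -> int:
    """Score two aligned rows of equal length; '-' marks a gap position."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    score = 0
    open_gap = None  # which row the current run of gaps is in, or None
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            row = "query" if q == "-" else "subject"
            if row != open_gap:
                score -= gap_penalty  # charged once per gap
                open_gap = row
            score -= gap_length_penalty  # charged per gap position (the "* n" term)
        else:
            open_gap = None
            score += match if q.upper() == s.upper() else mismatch
    return score


print(affine_gap_score("ACGTACGTAC", "ACGT-CGTAC"))  # 9 matches - (5 + 2 * 1) = 2
print(affine_gap_score("AAAA", "A--A"))              # 2 matches - (5 + 2 * 2) = -7
```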
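You can select your search set (-dbtype) in three different ways: a standard public databank (-nucdb or -protdb), a user defined set of sequences (-userdb), or a user provided BLAST format databank (-userblastdb), which you can make using makeblastdb.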
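The qualifier that selects the search set depends on both the search type and -dbtype. The sketch below encodes that choice as read from the help text. The databank is nucleic acid for blastn, tblastn and tblastx, and protein for blastp and blastx. The function name is hypothetical, and the program's own prompting logic may differ in details.

```python
# Second half of each -program description: what the databank contains.
DATABANK_KIND = {
    1: "nuc",   # blastn  (nuc against nuc)
    2: "prot",  # blastp  (prot against prot)
    3: "prot",  # blastx  (nuc translated against prot)
    4: "nuc",   # tblastn (prot against nuc translated)
    5: "nuc",   # tblastx (nuc translated against nuc translated)
}


def search_set_qualifier(program: int, dbtype: int) -> str:
    """Name the qualifier that selects the search set for -program/-dbtype."""
    if program not in DATABANK_KIND:
        raise ValueError(f"unknown -program value: {program}")
    if dbtype == 1:  # standard set
        return "-nucdb" if DATABANK_KIND[program] == "nuc" else "-protdb"
    if dbtype == 2:  # user defined set
        return "-userdb"
    if dbtype == 3:  # user provided BLAST databank
        return "-userblastdb"
    raise ValueError(f"unknown -dbtype value: {dbtype}")


print(search_set_qualifier(2, 1))  # -protdb, as in the blastp example session
```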
BLASTP 2.2.18 [Mar-02-2008]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Query= PAPA1_CARPA P00784 Papain precursor (EC 3.4.22.2) (Papaya
proteinase I) (PPI) (Allergen Car p 1).
(345 letters)
Database: SwissProt (manually annotated part of UniProt, including
splice variants)
387,470 sequences; 145,667,171 total letters
Score E
Sequences producing significant alignments: (bits) Value
sw:PAPA1_CARPA Papain precursor (EC 3.4.22.2) (Papaya proteinase... 721 0.0
sw:PAPA3_CARPA Caricain precursor (EC 3.4.22.30) (Papaya protein... 514 e-145
sw:PAPA4_CARPA Papaya proteinase 4 precursor (EC 3.4.22.25) (Pap... 487 e-137
sw:PAPA2_CARPA Chymopapain precursor (EC 3.4.22.6) (Papaya prote... 451 e-126
sw:XCP1_ARATH Xylem cysteine proteinase 1 precursor (EC 3.4.22.-... 333 1e-90
sw:XCP2_ARATH Xylem cysteine proteinase 2 precursor (EC 3.4.22.-... 322 2e-87
sw:RD21A_ARATH Cysteine proteinase RD21a precursor (EC 3.4.22.-)... 280 1e-74
[Part of this file has been deleted for brevity]
sw:CATL1_RAT Cathepsin L1 precursor (EC 3.4.22.15) (Major excret... 192 2e-48
sw:BROM2_ANACO Stem bromelain (EC 3.4.22.32). 192 3e-48
sw:CATL_SARPE Cathepsin L precursor (EC 3.4.22.15) [Contains: Ca... 191 4e-48
sw:CATK_HUMAN Cathepsin K precursor (EC 3.4.22.38) (Cathepsin O)... 191 5e-48
sw:CATH_RAT Cathepsin H precursor (EC 3.4.22.16) (Cathepsin B3) ... 191 5e-48
sw:CATK_RABIT Cathepsin K precursor (EC 3.4.22.38) (OC-2 protein). 189 2e-47
sw:CYSP2_HOMAM Digestive cysteine proteinase 2 precursor (EC 3.4... 189 2e-47
>sw:PAPA1_CARPA Papain precursor (EC 3.4.22.2) (Papaya proteinase I)
(PPI) (Allergen Car p 1).
Length = 345
Score = 721 bits (1861), Expect = 0.0
Identities = 345/345 (100%), Positives = 345/345 (100%)
Query: 1 MAMIPSISKLLFVAICLFVYMGLSFGDFSIVGYSQNDLTSTERLIQLFESWMLKHNKIYK 60
MAMIPSISKLLFVAICLFVYMGLSFGDFSIVGYSQNDLTSTERLIQLFESWMLKHNKIYK
Sbjct: 1 MAMIPSISKLLFVAICLFVYMGLSFGDFSIVGYSQNDLTSTERLIQLFESWMLKHNKIYK 60
Query: 61 NIDEKIYRFEIFKDNLKYIDETNKKNNSYWLGLNVFADMSNDEFKEKYTGSIAGNYTTTE 120
NIDEKIYRFEIFKDNLKYIDETNKKNNSYWLGLNVFADMSNDEFKEKYTGSIAGNYTTTE
Sbjct: 61 NIDEKIYRFEIFKDNLKYIDETNKKNNSYWLGLNVFADMSNDEFKEKYTGSIAGNYTTTE 120
Query: 121 LSYEEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNE 180
LSYEEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNE
Sbjct: 121 LSYEEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNE 180
Query: 181 YSEQELLDCDRRSYGCNGGYPWSALQLVAQYGIHYRNTYPYEGVQRYCRSREKGPYAAKT 240
YSEQELLDCDRRSYGCNGGYPWSALQLVAQYGIHYRNTYPYEGVQRYCRSREKGPYAAKT
Sbjct: 181 YSEQELLDCDRRSYGCNGGYPWSALQLVAQYGIHYRNTYPYEGVQRYCRSREKGPYAAKT 240
Query: 241 DGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVAAVGYG 300
DGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVAAVGYG
Sbjct: 241 DGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVAAVGYG 300
Query: 301 PNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN 345
PNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN
Sbjct: 301 PNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN 345
>sw:PAPA3_CARPA Caricain precursor (EC 3.4.22.30) (Papaya proteinase
omega) (Papaya proteinase III) (PPIII) (Papaya peptidase
A).
Length = 348
Score = 514 bits (1325), Expect = e-145
Identities = 253/350 (72%), Positives = 286/350 (81%), Gaps = 7/350 (2%)
Query: 1 MAMIPSISKLLFVAICLFVYMGLSFGDFSIVGYSQNDLTSTERLIQLFESWMLKHNKIYK 60
MAMIPSISKLLFVAICLFV+M +SFGDFSIVGYSQ+DLTSTERLIQLF SWML HNK Y+
Sbjct: 1 MAMIPSISKLLFVAICLFVHMSVSFGDFSIVGYSQDDLTSTERLIQLFNSWMLNHNKFYE 60
Query: 61 NIDEKIYRFEIFKDNLKYIDETNKKNNSYWLGLNVFADMSNDEFKEKYTGSIAGNYTTTE 120
N+DEK+YRFEIFKDNL YIDETNKKNNSYWLGLN FAD+SNDEF EKY GS+ T E
Sbjct: 61 NVDEKLYRFEIFKDNLNYIDETNKKNNSYWLGLNEFADLSNDEFNEKYVGSLID--ATIE 118
Query: 121 LSY-EEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLN 179
SY EE +N+ VN+PE VDWR+KGAVTPV++QGSCGSCWAFSAV T+EGI KIRTG L
Sbjct: 119 QSYDEEFINEDTVNLPENVDWRKKGAVTPVRHQGSCGSCWAFSAVATVEGINKIRTGKLV 178
Query: 180 EYSEQELLDCDRRSYGCNGGYPWSALQLVAQYGIHYRNTYPYEGVQRYCRSREKGPYAAK 239
E SEQEL+DC+RRS+GC GGYP AL+ VA+ GIH R+ YPY+ Q CR+++ G K
Sbjct: 179 ELSEQELVDCERRSHGCKGGYPPYALEYVAKNGIHLRSKYPYKAKQGTCRAKQVGGPIVK 238
Query: 240 TDGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVAAVGY 299
T GV +VQP NEG LL +IA QPVSVV+E+ G+ FQLY+GGIF GPCG KVDHAV AVGY
Sbjct: 239 TSGVGRVQPNNEGNLLNAIAKQPVSVVVESKGRPFQLYKGGIFEGPCGTKVDHAVTAVGY 298
Query: 300 GPN----YILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN 345
G + YILIKNSWGT WGE GYIRIKR GNS GVCGLY SS+YP KN
Sbjct: 299 GKSGGKGYILIKNSWGTAWGEKGYIRIKRAPGNSPGVCGLYKSSYYPTKN 348
[Part of this file has been deleted for brevity]
>sw:CYSP1_ORYSJ Cysteine protease 1 precursor (EC 3.4.22.-) (OsCP1).
Length = 490
Score = 236 bits (603), Expect = 1e-61
Identities = 131/317 (41%), Positives = 180/317 (56%), Gaps = 23/317 (7%)
Query: 48 FESWMLKHNKIYKN------IDEKIYRFEIFKDNLKYIDETNKK---NNSYWLGLNVFAD 98
++ W+ +H + I E RF +F DNLK++D N + + LG+N FAD
Sbjct: 62 YDLWLARHRRGGGGGSRNGFIGEHERRFRVFWDNLKFVDAHNARADERGGFRLGMNRFAD 121
Query: 99 MSNDEFKEKYTGSI-AGNYTTTELSYEEVLNDGDVNIPEYVDWRQKGAVT-PVKNQGSCG 156
++N EF+ Y G+ AG +Y +DG +P+ VDWR KGAV PVKNQG CG
Sbjct: 122 LTNGEFRATYLGTTPAGRGRRVGEAYR---HDGVEALPDSVDWRDKGAVVAPVKNQGQCG 178
Query: 157 SCWAFSAVVTIEGIIKIRTGNLNEYSEQELLDCDR--RSYGCNGGYPWSALQLVAQYG-I 213
SCWAFSAV +EGI KI TG L SEQEL++C R ++ GCNGG A +A+ G +
Sbjct: 179 SCWAFSAVAAVEGINKIVTGELVSLSEQELVECARNGQNSGCNGGIMDDAFAFIARNGGL 238
Query: 214 HYRNTYPYEGVQRYCRSREKGPYAAKTDGVRQVQPYNEGALLYSIANQPVSVVLEAAGKD 273
YPY + C ++ DG V +E +L ++A+QPVSV ++A G++
Sbjct: 239 DTEEDYPYTAMDGKCNLAKRSRKVVSIDGFEDVPENDELSLQKAVAHQPVSVAIDAGGRE 298
Query: 274 FQLYRGGIFVGPCGNKVDHAVAAVGYGPN------YILIKNSWGTGWGENGYIRIKRGTG 327
FQLY G+F G CG +DH V AVGYG + Y ++NSWG WGENGYIR++R
Sbjct: 299 FQLYDSGVFTGRCGTNLDHGVVAVGYGTDAATGAAYWTVRNSWGPDWGENGYIRMERNVT 358
Query: 328 NSYGVCGLYTSSFYPVK 344
G CG+ + YP+K
Sbjct: 359 ARTGKCGIAMMASYPIK 375
Database: SwissProt (manually annotated part of UniProt, including
splice variants)
Posted date: Apr 23, 2008 4:36 PM
Number of letters in database: 145,667,171
Number of sequences in database: 387,470
Lambda K H
0.318 0.138 0.428
Gapped
Lambda K H
0.267 0.0410 0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 387470
Number of Hits to DB: 107,330,994
Number of extensions: 4951578
Number of successful extensions: 11493
Number of sequences better than 10.0: 226
Number of HSP's gapped: 11018
Number of HSP's successfully gapped: 243
Length of query: 345
Length of database: 145,667,171
Length adjustment: 117
Effective length of query: 228
Effective length of database: 100,333,181
Effective search space: 22875965268
Effective search space used: 22875965268
Neighboring words threshold: 11
Window for multiple hits: 40
X1: 16 ( 7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.7 bits)
S2: 69 (31.2 bits)
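The statistics at the end of the output are enough to reproduce the reported numbers by hand. The sketch below assumes the standard Karlin-Altschul relations used by NCBI BLAST: bit score S' = (lambda * S - ln K) / ln 2, and E = m' * n' * 2^(-S'). It uses the gapped lambda and K printed above. The effective search space m' * n' is recomputed from the length adjustment: 345 - 117 = 228 for the query, and 145,667,171 - 387,470 * 117 = 100,333,181 for the databank. Computed bit scores agree with the printed ones to within about one bit. BLAST's exact rounding is not reproduced.

```python
import math

# Values copied from the example output above.
LAMBDA_GAPPED = 0.267
K_GAPPED = 0.0410
QUERY_LENGTH = 345
DB_LETTERS = 145_667_171
DB_SEQUENCES = 387_470
LENGTH_ADJUSTMENT = 117


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalise a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def effective_search_space(query_len: int, db_letters: int, db_seqs: int,
                           adjustment: int) -> int:
    """m' * n', after subtracting the length adjustment from the query
    and from every databank sequence."""
    eff_query = query_len - adjustment
    eff_db = db_letters - db_seqs * adjustment
    return eff_query * eff_db


def expect_value(bits: float, search_space: int) -> float:
    """E = m' * n' * 2^(-S')."""
    return search_space * 2.0 ** (-bits)


space = effective_search_space(QUERY_LENGTH, DB_LETTERS, DB_SEQUENCES, LENGTH_ADJUSTMENT)
print(space)                                                # 22875965268
print(round(bit_score(1861, LAMBDA_GAPPED, K_GAPPED), 1))  # 721.5 (reported: 721)
print(f"{expect_value(514, space):.1e}")                   # 4.3e-145 (reported: e-145)
```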
If you want to search a short peptide against a protein databank for nearly exact matches, you can:
- set "Word size" to 2
- set E() value to 1000
- choose PAM30 as the scoring matrix, with gap penalty 9 and gap length penalty 1.

If you want to search a short oligonucleotide against a nucleic acid databank for nearly exact matches, you can:
- set "Word size" to 7
- set E() value to 1000

If you want to search a nucleic acid sequence against a nucleic acid databank just to check whether it is already in the databank, you can (for increased speed):
- set "Word size" to a higher value (e.g. 28)
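The three recipes above can be kept as named presets. This sketch makes two assumptions. First, it uses the qualifier=value form that this page uses for -keep=n. Second, it assumes -matrix accepts a matrix name such as PAM30; the help output shows the -matrix default as a selection number ([3]), so the program may expect a number instead. The preset names are hypothetical.

```python
# Hypothetical preset names; the qualifier settings come from the tips above.
PRESETS = {
    "short_peptide": {"wordsize": 2, "expect": 1000, "matrix": "PAM30",
                      "gappenalty": 9, "gaplength": 1},
    "short_oligo": {"wordsize": 7, "expect": 1000},
    "nuc_presence_check": {"wordsize": 28},
}


def preset_args(name: str) -> list[str]:
    """Turn a preset into command-line qualifiers written as -name=value."""
    return [f"-{key}={value}" for key, value in PRESETS[name].items()]


# blastp (-program=2) search for a short peptide
print(" ".join(["blast", "-program=2", *preset_args("short_peptide")]))
```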
Suppose the query sequence contains one region that gives a "hit" with a great number of databank sequences, and a second region that gives weaker "hits". The second region can then be overlooked. Its hits either appear far down the list, or they are not reported at all because of the built-in limit of 500 reported "hits" (the default). You can remedy this with the -keep=n option, which limits the number of "hits" reported against the same region of the query sequence.
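The effect of -keep can be illustrated with a small simulation. This page does not describe how the program decides that two hits cover "the same region". The sketch assumes that overlapping query ranges count as the same region. To keep the output readable, it uses a reporting limit of 5 and n = 3 instead of the real 500 and the recommended 100. Names and data are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    qstart: int   # first query position covered by the hit
    qend: int     # last query position covered by the hit
    score: float


def keep_best_per_region(hits: list[Hit], n: int) -> list[Hit]:
    """Keep at most n best-scoring hits against any one region of the query.

    Assumption: a hit is in the "same region" as an already kept hit
    when their query ranges overlap.
    """
    kept: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        overlapping = sum(1 for k in kept
                          if k.qstart <= hit.qend and hit.qstart <= k.qend)
        if overlapping < n:
            kept.append(hit)
    return kept


# Hypothetical data: ten strong hits on query 1-100, two weak ones on 200-260.
strong = [Hit(f"strong{i}", 1, 100, 500 - i) for i in range(10)]
weak = [Hit("weak1", 200, 260, 50), Hit("weak2", 200, 260, 40)]
limit = 5  # stands in for the built-in reporting limit

by_score = sorted(strong + weak, key=lambda h: h.score, reverse=True)
print([h.subject for h in by_score[:limit]])                                 # only strong hits
print([h.subject for h in keep_best_per_region(strong + weak, 3)[:limit]])  # weak region survives
```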
For searches that use an amino acid matrix (all search types except blastn), only certain combinations of scoring matrix, gap penalty and gap length penalty are supported. The recommended combination for each matrix is marked with *.

| scoring matrix | gap penalty | gap length penalty | recommended |
|---|---|---|---|
| BLOSUM90 | 6 | 2 | |
| | 7 | 2 | |
| | 8 | 2 | |
| | 9 | 1 | |
| | 9 | 2 | |
| | 10 | 1 | * |
| | 11 | 1 | |
| BLOSUM80 | 6 | 2 | |
| | 7 | 2 | |
| | 8 | 2 | |
| | 9 | 1 | |
| | 9 | 2 | |
| | 10 | 1 | * |
| | 11 | 1 | |
| | 13 | 2 | |
| | 25 | 2 | |
| BLOSUM62 | 6 | 2 | |
| | 7 | 2 | |
| | 8 | 2 | |
| | 9 | 1 | |
| | 9 | 2 | |
| | 10 | 1 | |
| | 10 | 2 | |
| | 11 | 1 | * (the default) |
| | 11 | 2 | |
| | 12 | 1 | |
| | 13 | 1 | |
| BLOSUM50 | 9 | 3 | |
| | 10 | 3 | |
| | 11 | 3 | |
| | 12 | 2 | |
| | 12 | 3 | |
| | 13 | 2 | * |
| | 13 | 3 | |
| | 14 | 2 | |
| | 15 | 1 | |
| | 15 | 2 | |
| | 16 | 1 | |
| | 16 | 2 | |
| | 17 | 1 | |
| | 18 | 1 | |
| | 19 | 1 | |
| BLOSUM45 | 10 | 3 | |
| | 11 | 3 | |
| | 12 | 2 | |
| | 12 | 3 | |
| | 13 | 2 | |
| | 13 | 3 | |
| | 14 | 2 | * |
| | 15 | 2 | |
| | 16 | 1 | |
| | 16 | 2 | |
| | 17 | 1 | |
| | 18 | 1 | |
| | 19 | 1 | |
| PAM30 | 5 | 2 | |
| | 6 | 2 | |
| | 7 | 2 | |
| | 8 | 1 | |
| | 9 | 1 | * |
| | 10 | 1 | |
| PAM70 | 6 | 2 | |
| | 7 | 2 | |
| | 8 | 2 | |
| | 9 | 1 | |
| | 10 | 1 | * |
| | 11 | 1 | |
| PAM250 | 11 | 3 | |
| | 12 | 3 | |
| | 13 | 2 | |
| | 13 | 3 | |
| | 14 | 2 | |
| | 14 | 3 | |
| | 15 | 2 | * |
| | 15 | 3 | |
| | 16 | 2 | |
| | 17 | 1 | |
| | 17 | 2 | |
| | 18 | 1 | |
| | 19 | 1 | |
| | 20 | 1 | |
| | 21 | 1 | |
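A lookup that checks a matrix and its gap penalties against the table above could look like the following sketch. To keep it short, only four matrices from the table are transcribed; the others can be added in the same way. The function name and messages are hypothetical.

```python
# A subset of the table above: matrix -> {(gap penalty, gap length penalty)}.
SUPPORTED_GAP_COSTS = {
    "BLOSUM62": {(6, 2), (7, 2), (8, 2), (9, 1), (9, 2), (10, 1), (10, 2),
                 (11, 1), (11, 2), (12, 1), (13, 1)},
    "BLOSUM80": {(6, 2), (7, 2), (8, 2), (9, 1), (9, 2), (10, 1), (11, 1),
                 (13, 2), (25, 2)},
    "PAM30": {(5, 2), (6, 2), (7, 2), (8, 1), (9, 1), (10, 1)},
    "PAM70": {(6, 2), (7, 2), (8, 2), (9, 1), (10, 1), (11, 1)},
}
RECOMMENDED = {"BLOSUM62": (11, 1), "BLOSUM80": (10, 1), "PAM30": (9, 1), "PAM70": (10, 1)}


def check_gap_costs(matrix: str, gap: int, gap_length: int) -> str:
    """Report whether a matrix/gap cost combination appears in the table."""
    matrix = matrix.upper()
    if matrix not in SUPPORTED_GAP_COSTS:
        raise KeyError(f"{matrix} is not transcribed in this sketch")
    if (gap, gap_length) not in SUPPORTED_GAP_COSTS[matrix]:
        rec_gap, rec_len = RECOMMENDED[matrix]
        return f"unsupported; try gap penalty {rec_gap}, gap length penalty {rec_len}"
    if (gap, gap_length) == RECOMMENDED[matrix]:
        return "supported (recommended)"
    return "supported"


print(check_gap_costs("PAM30", 9, 1))      # the short-peptide settings
print(check_gap_costs("BLOSUM62", 11, 3))
```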
Similarly, blastn supports only certain combinations of match reward, mismatch penalty and gap penalties. If both the gap penalty and the gap length penalty are above the maximum values in the following table, blastn switches to statistics for gapless alignments.
| match reward/mismatch penalty | gap penalty | gap length penalty |
|---|---|---|
| 2 / -7 | 0 | 4 |
| | 2 | 2 |
| | 2 | 4 |
| | 4 | 2 |
| | 4 | 4 |
| 1 / -3 | 0 | 2 |
| | 1 | 1 |
| | 1 | 2 |
| | 2 | 1 |
| | 2 | 2 |
| 2 / -5 | 0 | 4 |
| | 2 | 2 |
| | 2 | 4 |
| | 4 | 2 |
| | 4 | 4 |
| 1 / -2 | 0 | 2 |
| | 1 | 1 |
| | 1 | 2 |
| | 2 | 1 |
| | 2 | 2 |
| | 3 | 1 |
| 2 / -3 | 0 | 4 |
| | 2 | 2 |
| | 2 | 4 |
| | 3 | 3 |
| | 4 | 2 |
| | 4 | 4 |
| | 5 | 2 |
| | 6 | 2 |
| | 6 | 4 |
| 4 / -5 | 3 | 5 |
| | 4 | 5 |
| | 5 | 5 |
| | 6 | 5 |
| | 12 | 8 |
| 1 / -1 | 0 | 2 |
| | 1 | 2 |
| | 2 | 1 |
| | 2 | 2 |
| | 3 | 1 |
| | 3 | 2 |
| | 4 | 1 |
| | 4 | 2 |
| 5 / -4 | 8 | 6 |
| | 10 | 6 |
| | 25 | 10 |
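A check of a blastn scoring scheme against the table above might look like the following sketch. It reads "the maximum" as the largest gap penalty and gap length penalty listed for the chosen match/mismatch pair. That reading, the returned labels and the function name are assumptions. The text does not say how the program treats combinations that are neither listed nor above both maxima, so the sketch simply reports them as not listed.

```python
# match reward/mismatch penalty -> {(gap penalty, gap length penalty)}, from the table above.
BLASTN_GAP_COSTS = {
    (2, -7): {(0, 4), (2, 2), (2, 4), (4, 2), (4, 4)},
    (1, -3): {(0, 2), (1, 1), (1, 2), (2, 1), (2, 2)},
    (2, -5): {(0, 4), (2, 2), (2, 4), (4, 2), (4, 4)},
    (1, -2): {(0, 2), (1, 1), (1, 2), (2, 1), (2, 2), (3, 1)},
    (2, -3): {(0, 4), (2, 2), (2, 4), (3, 3), (4, 2), (4, 4), (5, 2), (6, 2), (6, 4)},
    (4, -5): {(3, 5), (4, 5), (5, 5), (6, 5), (12, 8)},
    (1, -1): {(0, 2), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (4, 1), (4, 2)},
    (5, -4): {(8, 6), (10, 6), (25, 10)},
}


def blastn_statistics(match: int, mismatch: int, gap: int, gap_length: int) -> str:
    """Classify a blastn scoring scheme against the table of supported values."""
    combos = BLASTN_GAP_COSTS.get((match, mismatch))
    if combos is None:
        return "match/mismatch pair not supported"
    if (gap, gap_length) in combos:
        return "gapped"
    max_gap = max(g for g, _ in combos)
    max_length = max(length for _, length in combos)
    if gap > max_gap and gap_length > max_length:
        return "gapless statistics"
    return "combination not listed"


print(blastn_statistics(1, -2, 3, 1))   # gapped
print(blastn_statistics(1, -3, 10, 5))  # gapless statistics
```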
You may want to use a protein "user provided BLAST databank" when the same directory also contains a nucleic acid databank with the same basename. In that case, point to the databank with one of its file names rather than the basename. Otherwise the program assumes that you mean the nucleic acid databank and gives an error message.
<databank> cannot be accessed or is not BLAST format databank
<databank> is not a nucleic acid databank !
<databank> is not a protein databank !
| Program name | Description |
|---|---|
| ebi_blast | WU-BLAST search of query sequence against sequence databank using EBI Web Services |
| ebi_fasta | fastA search of query sequence against sequence databank using EBI Web Services |
| fasta | fastA search of query sequence(s) against sequence search set |
| fasts | Protein identification from peptides using fastA algorithm |
| phiblast | Search protein sequence set combining matching of pattern with local alignment of a query sequence surrounding the match |
| psiblast | Iterative BLAST search with generation of profile of protein sequence against protein sequence set |
| blast2seq | Finds local alignments between two sequences, using BLAST |
| makeblastdb | Make BLAST format sequence database |
The program blastall itself was written by a team of developers working at the National Center for Biotechnology Information, Bethesda MD, U.S.A., comprising among others Stephen Altschul, David Lipman, Tom Madden, Alex Schaffer, Sergei Shavirin and Jinghui Zhang.
You can contact the BLAST development team at blast-help@ncbi.nlm.nih.gov