|
|
psiblast |
The blastpgp program can do an iterative search in which sequences found in one round of searching are used to build a score model for the next round of searching. In this usage, the program is called Position-Specific Iterated BLAST, or PSI-BLAST. The BLAST algorithm is not tied to a specific score matrix. Traditionally, it has been implemented using an AxA substitution matrix where A is the alphabet size. PSI-BLAST instead uses a QxA matrix, where Q is the length of the query sequence; at each position the cost of a letter depends on the position w.r.t. the query and the letter in the subject sequence.
The position-specific matrix for round i+1 is built from a constrained multiple alignment among the query and the sequences found with sufficiently low e-value in round i. The top part of the output for each round distinguishes the sequences into sequences found previously and used in the score model and sequences not used in the score model. PSI-BLAST "converges" and stops if all sequences found at round i+1 below the e-value threshold were already in the model at the beginning of the round. To prevent the program from running indefinitively, the psiblast "wrapper" has a built-in limit of by default 10 rounds, but you can change this.
The -makecheckout -checkout=xxx and -checkin=xxx options provide a "checkpointing" facility whereby a score model can be stored and later reused. When using -checkin, it is required that the query sequence specified on the command line match exactly the query sequence in the restart file. You can however search a different databank.
PSI-TBLASTN searches a protein query sequence against a nucleotide sequence database using a position specific matrix created by PSI-BLAST. The nucleotide sequence database is dynamically translated in all reading frames during PSI-TBLASTN search. Using a position specific matrix may enable finding more distantly related sequences. You can access this facility by running the "wrapper" program psiblast with options -psitblastn and -checkin=xxx.
| (1) PSI-BLAST takes as an input a single protein sequence and compares it to a protein database, using the gapped BLAST program. |
| (2) The program constructs a multiple alignment, and then a profile, from any significant local alignments found. The original query sequence serves as a template for the multiple alignment and profile, whose lengths are identical to that of the query. Different numbers of sequences can be aligned in different template positions. |
| (3) The profile is compared to the protein database, again seeking local alignments. After a few minor modifications, the BLAST algorithm can be used for this directly. |
| (4) PSI-BLAST estimates the statistical significance of the local alignments found. Because profile substitution scores are constructed to a fixed scale, and gap scores remain independent of position, the statistical theory and parameters for gapped BLAST alignments remain applicable to profile alignments. |
| (5) Finally, PSI-BLAST iterates, by returning to step (2), an arbitrary number of times or until convergence. |



> psiblast
Iterative BLAST search with generation of profile of protein sequence
against protein sequence set
Query sequence: sw:papa1_carpa
1 : standard set
2 : user defined set
3 : user provided BLAST databank
Select search set type [1]:
sw : SwissProt (highly annotated protein databank)
up : UniProt (SwissProt + TrEMBL, EMBL ORF translations)
uniref100 : UniRef100 (UniProt nonredundant subset)
uniref90 : UniRef90 (UniRef100 subset with no more than 90% identity)
uniref50 : UniRef50 (UniRef100 subset with no more than 50% identity)
remt : REM-TrEMBL (old EMBL ORF translations not incl. in UniProt)
pir : PIR (old general protein databank)
gp : GenPept (GenBank ORF translations)
refseqp : RefSeq (NCBI reference protein sequences)
pdb : PDB (proteins with known 3D structure)
gpcrdb : G protein coupled receptors
Standard protein search set [up]: sw
Word size [3]:
E() value cutoff for display [10.0]:
Maximum number of rounds [10]:
E() value threshold for inclusion in model [0.002]:
Output file [papa1_carpa.psiblast]:
PSI-BLAST converged after 4 rounds
|
Go to the input files for this example
Go to the output files for this example
Standard (Mandatory) qualifiers (* if not always prompted):
[-seq] sequence Query sequence
-dbtype menu [1] Search set type : public databank or
databank provided by user (Values: 1
(standard set); 2 (user defined set); 3
(user provided BLAST databank))
* -nucdb menu [emblnontags] Standard nucleic acid search
set (Values: em (EMBL (general nucleic acid
databank)); emblnontags (EMBL without EST
and GSS); hum (EMBL humans); mus (EMBL
mice); rod (EMBL other rodents); mam (EMBL
other mammals); vrt (EMBL other
vertebrates); inv (EMBL invertebrates); pln
(EMBL plants); fun (EMBL fungi); pro (EMBL
bacteria); phg (EMBL bacteriophages); vrl
(EMBL other viruses); est (EMBL Expressed
Sequence Tags); gss (EMBL Genome Survey
Sequences); sts (EMBL Sequence Tagged
Sites); htg (EMBL High Throughput Genomic);
htc (EMBL High Throughput cDNA); env (EMBL
environmental samples); pat (EMBL patents);
tgn (EMBL transgenic); syn (EMBL synthetic);
unc (EMBL unclassified); new (EMBL updates
since last release); wgs (EMBL Whole Genome
Shotgun); refseq (RefSeq (NCBI reference
sequences)); refseqwgs (RefSeq Whole Genome
Shotgun); refseqgen (RefSeq other genomic);
refseqrna (RefSeq transcripts); vec
(Intelligenetics vector databank); emvec
(EMBL vector subset); epd (Eukaryotic
Promoter Database); ligm (ImMunoGeneTics
databank Igg. + TcR genes); hla
(ImMunoGeneTics databank human MHC genes);
pdbn (PDB (nucleic acids with known 3D
structure)))
* -protdb menu [up] Standard protein search set (Values: sw
(SwissProt (highly annotated protein
databank)); up (UniProt (SwissProt + TrEMBL,
EMBL ORF translations)); uniref100
(UniRef100 (UniProt nonredundant subset));
uniref90 (UniRef90 (UniRef100 subset with no
more than 90% identity)); uniref50
(UniRef50 (UniRef100 subset with no more
than 50% identity)); remt (REM-TrEMBL (old
EMBL ORF translations not incl. in
UniProt)); pir (PIR (old general protein
databank)); gp (GenPept (GenBank ORF
translations)); refseqp (RefSeq (NCBI
reference protein sequences)); pdb (PDB
(proteins with known 3D structure)); gpcrdb
(G protein coupled receptors))
* -userdb seqall User defined search set
* -userblastdb infile User provided BLAST format databank (you can
make one using makeblastdb)
-wordsize integer [3] Word size (Integer from 2 to 3)
-expect float [10.0] E() value = number of databank
sequences with same or higher bit score that
you expect to find by chance. BLAST lists
sequences with an E() value lower than the
cutoff. (Number 0.000 or more)
* -rounds integer [10] Maximum number of rounds (Integer 1 or
more)
* -include float [0.002] PSI-BLAST uses sequences with an E()
value lower than this threshold for
generating a profile. (Number 0.000 or more)
[-outfile] outfile [*.psiblast] Output file name
* -asciiout outfile [*.psiblast] Name output file with profile
in ASCII format
* -checkout outfile [*.psiblast] Name output file with profile
in BLAST-readable format
Additional (Optional) qualifiers (* if not always prompted):
-matrix selection [3] Amino acid comparison matrix
-gappenalty integer [11] Gap penalty (Integer 0 or more)
-gaplength integer [1] Gap length penalty. BLAST subtracts from
the similarity score for each gap a penalty
of type <Gap penalty> + <Gap length
penalty> * n. Only certain combinations of
matrix and gap penalty are allowed, see
on-line manual. (Integer 0 or more)
* -dbgencode menu [1] Genetic code for translating databank
sequences (Values: 1 (Standard); 2
(Vertebrate Mitochondrial); 3 (Yeast
Mitochondrial); 4 (Mold, Protozoan,
Coelenterate Mitochondrial and
Mycoplasma/Spiroplasma); 5 (Invertebrate
Mitochondrial); 6 (Ciliate, Dasycladacean
and Hexamita); 9 (Echinoderm Mitochondrial);
10 (Euplotid); 11 (Bacterial); 12
(Alternative Yeast); 13 (Ascidian
Mitochondrial); 14 (Flatworm Mitochondrial);
15 (Blepharisma); 16 (Chlorophycean
Mitochondrial); 21 (Trematode
Mitochondrial); 22 (Scenedesmus obliquus
mitochondrial); 23 (Thraustochytrium
mitochondrial))
* -compstats menu [2] The E() value can be computed more
accurately if the composition of the
sequences being compared is taken into
account. By default the scoring scheme is
adjusted as explained in Bioinformatics
21:902 when an empiric test suggests this to
be beneficial, otherwise the scoring scheme
is rescaled as explained in NAR 29:2994.
You can force psiblast to always use any of
both methods or none. (Values: 0 (none); 1
(scale); 2 (adjust conditionally, otherwise
scale); 3 (adjust))
-format menu [0] Alignment format (Values: 0 (pairwise);
1 (query-anchored, showing identities); 2
(query-anchored, no identities); 3 (flat
query-anchored, show identities); 4 (flat
query-anchored, no identities); 5
(query-anchored, no identities and blunt
ends); 6 (flat query-anchored, no identities
and blunt ends); 7 (XML Blast output); 8
(tabular); 9 (tabular, with comment lines);
10 (ASN.1))
Advanced (Unprompted) qualifiers:
-psitblastn toggle Run instead search of protein profile
against translated nuc. db
-checkin infile Restart search with profile from previous
search. If you do this your query sequence
must be the same.
-seqfilter boolean Filter low complexity segments out of query
sequence
-seqcoilfilter boolean Filter coiled coils out of query sequence
-seqsoftfilter boolean Use soft filtering, that is, filter only at
initial hit searching, not at hit extension
-[no]doublehit boolean [Y] Try to extend hit only if there is a
second hit (is default)
-window integer [40] Multiple hits window size (Integer 0 or
more)
-effdbsize float [0.000] Effective databank size for
statistical calculations (Number 0.000 or
more)
-keep integer [0] Keep only n best hits from same region.
Default is to show them all. If you use this
option, a value of 100 is recommended.
(Integer 0 or more)
-listsize integer [500] Show only the n best scoring sequences
that satisfy E() cutoff (Integer 0 or more)
-align integer [250] Show only alignments for the n first
sequences (Integer 0 or more, but not >
listsize)
-fulloutput boolean Show output of all rounds
-makeasciiout toggle Make output file with profile in ASCII
format (can only be done if number of rounds
is 2 or more)
-makecheckout toggle Make output file with profile in
BLAST-readable format for restarting search
(can only be done if number of rounds is 2
or more)
Associated qualifiers:
"-seq" associated qualifiers
-sbegin1 integer Start of the sequence to be used
-send1 integer End of the sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-sformat1 string Input sequence format
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-userdb" associated qualifiers
-sbegin integer Start of each sequence to be used
-send integer End of each sequence to be used
-sreverse boolean Reverse (if DNA)
-sask boolean Ask for begin/end/reverse
-snucleotide boolean Sequence is nucleotide
-sprotein boolean Sequence is protein
-slower boolean Make lower case
-supper boolean Make upper case
-sformat string Input sequence format
-sdbname string Database name
-sid string Entryname
-ufo string UFO features
-fformat string Features format
-fopenfile string Features file name
"-outfile" associated qualifiers
-odirectory2 string Output directory
"-asciiout" associated qualifiers
-odirectory string Output directory
"-checkout" associated qualifiers
-odirectory string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
|
| Standard (Mandatory) qualifiers | Allowed values | Default | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [-seq] (Parameter 1) |
Query sequence | Readable sequence | Required | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -dbtype | Search set type : public databank or databank provided by user |
|
1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -nucdb | Standard nucleic acid search set |
|
emblnontags | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -protdb | Standard protein search set |
|
up | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -userdb | User defined search set | Readable sequence(s) | Required | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -userblastdb | User provided BLAST format databank (you can make one using makeblastdb) | Input file | Required | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -wordsize | Word size | Integer from 2 to 3 | 3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -expect | E() value = number of databank sequences with same or higher bit score that you expect to find by chance. BLAST lists sequences with an E() value lower than the cutoff. | Number 0.000 or more | 10.0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -rounds | Maximum number of rounds | Integer 1 or more | 10 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -include | PSI-BLAST uses sequences with an E() value lower than this threshold for generating a profile. | Number 0.000 or more | 0.002 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| [-outfile] (Parameter 2) |
Output file name | Output file | <sequence>.<psiblast or tpsiblastn> | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -asciiout | Name output file with profile in ASCII format | Output file | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -checkout | Name output file with profile in BLAST-readable format | Output file | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Additional (Optional) qualifiers | Allowed values | Default | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -matrix | Amino acid comparison matrix | Choose from selection list of values | BLOSUM62 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -gappenalty | Gap penalty | Integer 0 or more | 11 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -gaplength | Gap length penalty. BLAST subtracts from the similarity score for each gap a penalty of type <Gap penalty> + <Gap length penalty> * n. Only certain combinations of matrix and gap penalty are allowed, see on-line manual. | Integer 0 or more | 1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -dbgencode | Genetic code for translating databank sequences |
|
1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -compstats | The E() value can be computed more accurately if the composition of the sequences being compared is taken into account. By default the scoring scheme is adjusted as explained in Bioinformatics 21:902 when an empiric test suggests this to be beneficial, otherwise the scoring scheme is rescaled as explained in NAR 29:2994. You can force psiblast to always use any of both methods or none. |
|
2 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -format | Alignment format |
|
0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Advanced (Unprompted) qualifiers | Allowed values | Default | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -psitblastn | Run instead search of protein profile against translated nuc. db | Toggle value Yes/No | No | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -checkin | Restart search with profile from previous search. If you do this your query sequence must be the same. | Input file | Required | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -seqfilter | Filter low complexity segments out of query sequence | Boolean value Yes/No | No | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -seqcoilfilter | Filter coiled coils out of query sequence | Boolean value Yes/No | No | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -seqsoftfilter | Use soft filtering, that is, filter only at initial hit searching, not at hit extension | Boolean value Yes/No | No | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -[no]doublehit | Try to extend hit only if there is a second hit (is default) | Boolean value Yes/No | Yes | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -window | Multiple hits window size | Integer 0 or more | 40 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -effdbsize | Effective databank size for statistical calculations | Number 0.000 or more | 0.000 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -keep | Keep only n best hits from same region. Default is to show them all. If you use this option, a value of 100 is recommended. | Integer 0 or more | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -listsize | Show only the n best scoring sequences that satisfy E() cutoff | Integer 0 or more | 500 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -align | Show only alignments for the n first sequences | Integer 0 or more, but not > listsize | 250 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -fulloutput | Show output of all rounds | Boolean value Yes/No | No | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -makeasciiout | Make output file with profile in ASCII format (can only be done if number of rounds is 2 or more) | Toggle value Yes/No | No | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| -makecheckout | Make output file with profile in BLAST-readable format for restarting search (can only be done if number of rounds is 2 or more) | Toggle value Yes/No | No | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
BLASTP 2.2.18 [Mar-02-2008]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.
Reference for compositional score matrix adjustment: Altschul, Stephen F.,
John C. Wootton, E. Michael Gertz, Richa Agarwala, Aleksandr Morgulis,
Alejandro A. Schaffer, and Yi-Kuo Yu (2005) "Protein database searches
using compositionally adjusted substitution matrices", FEBS J. 272:5101-5109.
Reference for composition-based statistics starting in round 2:
Schaffer, Alejandro A., L. Aravind, Thomas L. Madden,
Sergei Shavirin, John L. Spouge, Yuri I. Wolf,
Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids Res. 29:2994-3005.
Query= PAPA1_CARPA P00784 Papain precursor (EC 3.4.22.2) (Papaya
proteinase I) (PPI) (Allergen Car p 1).
(345 letters)
Database: SwissProt (manually annotated part of UniProt, including
splice variants)
387,470 sequences; 145,667,171 total letters
Results from round 4
Score E
Sequences producing significant alignments: (bits) Value
Sequences used in model and found again:
sw:PAPA1_CARPA Papain precursor (EC 3.4.22.2) (Papaya proteinase... 476 e-133
sw:XCP1_ARATH Xylem cysteine proteinase 1 precursor (EC 3.4.22.-... 455 e-127
sw:RD21A_ARATH Cysteine proteinase RD21a precursor (EC 3.4.22.-)... 454 e-127
sw:PAPA3_CARPA Caricain precursor (EC 3.4.22.30) (Papaya protein... 450 e-126
sw:PAPA2_CARPA Chymopapain precursor (EC 3.4.22.6) (Papaya prote... 445 e-124
[Part of this file has been deleted for brevity]
sw:CYSP1_CARCN Cysteine proteinase 1 (EC 3.4.22.-) (Cysteine pro... 78 7e-14
sw:THPA_THADA Thaumatopain (EC 3.4.22.-) (Fragment). 53 3e-06
sw:MDO2_PSEMR Macrodontain-2 (EC 3.4.22.-) (Macrodontain II) (Fr... 50 2e-05
sw:PHIG1_SARGI Philibertain g 1 (EC 3.4.22.-) (Philibertain g I)... 47 1e-04
sw:CATB_COTJA Cathepsin B (EC 3.4.22.1) (Cathepsin B1) (Fragments). 44 0.002
Sequences not found previously or not previously below threshold:
sw:PEPG_LACDL Aminopeptidase G (EC 3.4.22.-). 43 0.003
sw:PEPW_LACDL Aminopeptidase W (EC 3.4.22.-). 41 0.011
sw:PEPC_LISMO Aminopeptidase C (EC 3.4.22.40) (Bleomycin hydrola... 40 0.019
sw:PEPC_LISIN Aminopeptidase C (EC 3.4.22.40) (Bleomycin hydrola... 40 0.019
sw:PEPE_LACHE Aminopeptidase E (EC 3.4.22.-). 39 0.055
[Part of this file has been deleted for brevity]
sw:YOW4_SCHPO Uncharacterized RNA-binding protein C1861.04c. 32 6.8
sw:BLMH_RAT Bleomycin hydrolase (EC 3.4.22.40) (BLM hydrolase) (... 32 7.0
sw:K2C8_POTTR Keratin, type II cytoskeletal 8 (Cytokeratin-8) (C... 32 7.1
sw:BLMH_MOUSE Bleomycin hydrolase (EC 3.4.22.40) (BLM hydrolase)... 32 7.2
sw:MYBD_DICDI Myb-like protein D. 31 8.4
CONVERGED!
>sw:PAPA1_CARPA Papain precursor (EC 3.4.22.2) (Papaya proteinase I)
(PPI) (Allergen Car p 1).
Length = 345
Score = 476 bits (1225), Expect = e-133, Method: Composition-based stats.
Identities = 345/345 (100%), Positives = 345/345 (100%)
Query: 1 MAMIPSISKLLFVAICLFVYMGLSFGDFSIVGYSQNDLTSTERLIQLFESWMLKHNKIYK 60
MAMIPSISKLLFVAICLFVYMGLSFGDFSIVGYSQNDLTSTERLIQLFESWMLKHNKIYK
Sbjct: 1 MAMIPSISKLLFVAICLFVYMGLSFGDFSIVGYSQNDLTSTERLIQLFESWMLKHNKIYK 60
Query: 61 NIDEKIYRFEIFKDNLKYIDETNKKNNSYWLGLNVFADMSNDEFKEKYTGSIAGNYTTTE 120
NIDEKIYRFEIFKDNLKYIDETNKKNNSYWLGLNVFADMSNDEFKEKYTGSIAGNYTTTE
Sbjct: 61 NIDEKIYRFEIFKDNLKYIDETNKKNNSYWLGLNVFADMSNDEFKEKYTGSIAGNYTTTE 120
Query: 121 LSYEEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNE 180
LSYEEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNE
Sbjct: 121 LSYEEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNE 180
Query: 181 YSEQELLDCDRRSYGCNGGYPWSALQLVAQYGIHYRNTYPYEGVQRYCRSREKGPYAAKT 240
YSEQELLDCDRRSYGCNGGYPWSALQLVAQYGIHYRNTYPYEGVQRYCRSREKGPYAAKT
Sbjct: 181 YSEQELLDCDRRSYGCNGGYPWSALQLVAQYGIHYRNTYPYEGVQRYCRSREKGPYAAKT 240
Query: 241 DGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVAAVGYG 300
DGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVAAVGYG
Sbjct: 241 DGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVAAVGYG 300
Query: 301 PNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN 345
PNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN
Sbjct: 301 PNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN 345
>sw:XCP1_ARATH Xylem cysteine proteinase 1 precursor (EC 3.4.22.-)
(AtXCP1).
Length = 355
Score = 455 bits (1171), Expect = e-127, Method: Composition-based stats.
Identities = 164/354 (46%), Positives = 233/354 (65%), Gaps = 9/354 (2%)
Query: 1 MAMI-PSISKL-LFVAICLFVYMGLSFG-DFSIVGYSQNDLTSTERLIQLFESWMLKHNK 57
MA PS+SK L VAI + +F DFSIVGY+ LT+T++L++LFESWM +H+K
Sbjct: 1 MAFSAPSLSKFSLLVAISASALLCCAFARDFSIVGYTPEHLTNTDKLLELFESWMSEHSK 60
Query: 58 IYKNIDEKIYRFEIFKDNLKYIDETNKKNNSYWLGLNVFADMSNDEFKEKYTGSIAGNYT 117
YK+++EK++RFE+F++NL +ID+ N + NSYWLGLN FAD++++EFK +Y G ++
Sbjct: 61 AYKSVEEKVHRFEVFRENLMHIDQRNNEINSYWLGLNEFADLTHEEFKGRYLGLAKPQFS 120
Query: 118 TTELSYEEVLNDGDVNIPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGN 177
++P+ VDWR+KGAV PVK+QG CGSCWAFS V +EGI +I TGN
Sbjct: 121 RKRQPSANFRYRDITDLPKSVDWRKKGAVAPVKDQGQCGSCWAFSTVAAVEGINQITTGN 180
Query: 178 LNEYSEQELLDCDRR-SYGCNGGYPWSALQLVAQ-YGIHYRNTYPYEGVQRYCRSREKGP 235
L+ SEQEL+DCD + GCNGG A Q + G+H + YPY + C+ +++
Sbjct: 181 LSSLSEQELIDCDTTFNSGCNGGLMDYAFQYIISTGGLHKEDDYPYLMEEGICQEQKEDV 240
Query: 236 YAAKTDGVRQVQPYNEGALLYSIANQPVSVVLEAAGKDFQLYRGGIFVGPCGNKVDHAVA 295
G V ++ +L+ ++A+QPVSV +EA+G+DFQ Y+GG+F G CG +DH VA
Sbjct: 241 ERVTISGYEDVPENDDESLVKALAHQPVSVAIEASGRDFQFYKGGVFNGKCGTDLDHGVA 300
Query: 296 AVGY----GPNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN 345
AVGY G +Y+++KNSWG WGE G+IR+KR TG G+CG+ + YP K
Sbjct: 301 AVGYGSSKGSDYVIVKNSWGPRWGEKGFIRMKRNTGKPEGLCGINKMASYPTKT 354
[Part of this file has been deleted for brevity]
>sw:CATB_COTJA Cathepsin B (EC 3.4.22.1) (Cathepsin B1) (Fragments).
Length = 48
Score = 43.6 bits (102), Expect = 0.002, Method: Composition-based stats.
Identities = 14/70 (20%), Positives = 23/70 (32%), Gaps = 28/70 (40%)
Query: 134 IPEYVD----WRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNEYSEQELLDC 189
+P+ D W ++ +++QGS E S ++LL C
Sbjct: 1 LPDTFDSRKQWPNCPTISEIRDQGS----------------------VSVEVSAEDLLSC 38
Query: 190 D--RRSYGCN 197
GCN
Sbjct: 39 CGFECGMGCN 48
>sw:PEPG_LACDL Aminopeptidase G (EC 3.4.22.-).
Length = 437
Score = 42.8 bits (100), Expect = 0.003, Method: Composition-based stats.
Identities = 12/42 (28%), Positives = 19/42 (45%), Gaps = 5/42 (11%)
Query: 289 KVDHAVAAVGYGPN-----YILIKNSWGTGWGENGYIRIKRG 325
+V HA+ VG + ++NSWG GE G+ +
Sbjct: 358 EVSHAMTLVGVDEDKGDIRQWKVENSWGDKSGEKGFFVMSHN 399
[Part of this file has been deleted for brevity]
>sw:MYBD_DICDI Myb-like protein D.
Length = 595
Score = 31.3 bits (70), Expect = 8.4, Method: Composition-based stats.
Identities = 8/44 (18%), Positives = 19/44 (43%), Gaps = 2/44 (4%)
Query: 65 KIYRFEIF--KDNLKYIDETNKKNNSYWLGLNVFADMSNDEFKE 106
K + I+ +++ K NK S+ + F D + ++ +
Sbjct: 436 KHRKNAIWTQEEDEKMAQLYNKYGKSWKAIHSHFDDKTREQVQS 479
Database: SwissProt (manually annotated part of UniProt, including
splice variants)
Posted date: Apr 23, 2008 4:36 PM
Number of letters in database: 130,497,792
Number of sequences in database: 362,782
Database: /dbfb/uniprot/uniprot_sprot_varsplic
Posted date: Apr 23, 2008 4:36 PM
Number of letters in database: 15,169,379
Number of sequences in database: 24,688
Lambda K H
0.313 0.165 0.533
Lambda K H
0.267 0.0505 0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 502,717,500
Number of Sequences: 387470
Number of extensions: 26812597
Number of successful extensions: 64988
Number of sequences better than 10.0: 246
Number of HSP's better than 10.0 without gapping: 889
Number of HSP's successfully gapped in prelim test: 72
Number of HSP's that attempted gapping in prelim test: 59305
Number of HSP's gapped (non-prelim): 1151
length of query: 345
length of database: 145,667,171
effective HSP length: 117
effective length of query: 228
effective length of database: 100,333,181
effective search space: 22875965268
effective search space used: 22875965268
T: 11
A: 40
X1: 16 ( 7.2 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (21.1 bits)
S2: 70 (31.3 bits)
|
| scoring matrix | gap penalty | gap length penalty | recommended |
|---|---|---|---|
| BLOSUM90 | 6 | 2 | |
| 7 | 2 | ||
| 8 | 2 | ||
| 9 | 1 | ||
| 2 | |||
| 10 | 1 | * | |
| 11 | 1 | ||
| BLOSUM80 | 6 | 2 | |
| 7 | 2 | ||
| 8 | 2 | ||
| 9 | 1 | ||
| 2 | |||
| 10 | 1 | * | |
| 11 | 1 | ||
| 13 | 2 | ||
| 25 | 2 | ||
| BLOSUM62 | 6 | 2 | |
| 7 | 2 | ||
| 8 | 2 | ||
| 9 | 1 | ||
| 2 | |||
| 10 | 1 | ||
| 2 | |||
| 11 | 1 | * (the default) | |
| 2 | |||
| 12 | 1 | ||
| 13 | 1 | ||
| BLOSUM50 | 9 | 3 | |
| 10 | 3 | ||
| 11 | 3 | ||
| 12 | 2 | ||
| 3 | |||
| 13 | 2 | * | |
| 3 | |||
| 14 | 2 | ||
| 15 | 1 | ||
| 2 | |||
| 16 | 1 | ||
| 2 | |||
| 17 | 1 | ||
| 18 | 1 | ||
| 1 | |||
| BLOSUM45 | 10 | 3 | |
| 11 | 3 | ||
| 12 | 2 | ||
| 3 | |||
| 13 | 2 | ||
| 3 | |||
| 14 | 2 | * | |
| 15 | 2 | ||
| 16 | 1 | ||
| 2 | |||
| 17 | 1 | ||
| 18 | 1 | ||
| 19 | 1 | ||
| PAM30 | 5 | 2 | |
| 6 | 2 | ||
| 7 | 2 | ||
| 8 | 1 | ||
| 9 | 1 | * | |
| 10 | 1 | ||
| PAM70 | 6 | 2 | |
| 7 | 2 | ||
| 8 | 2 | ||
| 9 | 1 | ||
| 10 | 1 | * | |
| 11 | 1 | ||
| PAM250 | 11 | 3 | |
| 12 | 3 | ||
| 13 | 2 | ||
| 3 | |||
| 14 | 2 | ||
| 3 | |||
| 15 | 2 | * | |
| 3 | |||
| 16 | 2 | ||
| 17 | 1 | ||
| 2 | |||
| 18 | 1 | ||
| 19 | 1 | ||
| 20 | 1 | ||
| 21 | 1 |
If you want to use a protein "user provided BLAST databank" and you have in the same directory a nucleic acid databank with same basename you should not use the basename but one of the file names to point to the databank, because otherwise the program would assume you point to the nucleic acid databank and give an error message.
If you want to use the "checkpointing" facility by giving a "check" file produced by phiblast.html or by a previous run of psiblast as input to psiblast, make sure that you have the "filter" settings the same as when you made the "check" file, otherwise the sequence that is actually used might not match.
PSI-BLAST converged after <number> roundsor
PSI-BLAST did not converge after <number> rounds
There are error messages specific to this program, which are issued when the user chooses to provide a databank in BLAST format of his own, but there are problems :
<databank> cannot be accessed or is not BLAST format databank
<databank> is not a nucleic acid databank !
<databank> is not a protein databank !
| Program name | Description |
|---|---|
| blast | BLAST search of query sequence(s) against sequence search set |
| ebi_blast | WU-BLAST search of query sequence against sequence databank using EBI Web Services |
| ebi_fasta | fastA search of query sequence against sequence databank using EBI Web Services |
| fasta | fastA search of query sequence(s) against sequence search set |
| fasts | Protein identification from peptides using fastA algorithm |
| iprscan | Scans proteins or nucleic acids for conserved motifs using Interpro tools |
| pfmake | Creates PROSITE style profile from protein multiple alignment |
| phiblast | Search protein sequence set combining matching of pattern with local alignment of a query sequence surrounding the match |
| profit | Scan a sequence or database with a matrix or profile |
| prophecy | Creates matrices/profiles from multiple alignments |
| prophet | Gapped alignment for profiles |
| ps_scan | Scans proteins for conserved motifs using PROSITE (patterns and profiles) |
| blast2seq | Finds local alignments between two sequences, using BLAST |
| makeblastdb | Make BLAST format sequence database |
The program blastpgp itself was written by by a team of developers working at the National Center for Biotechnology Information, Bethesda MD, U.S.A., comprising among others Stephen Altschul, David Lipman, Tom Madden, Alex Schaffer, Sergei Shavirin and Jinghui Zhang.
You can contact the BLAST development team at blast-help@ncbi.nlm.nih.gov