bscan
The rationale behind searching a database of blocks is that information from multiply aligned sequences is present in a concentrated form, which reduces background and increases sensitivity to distant relationships. This information is represented in a position-specific scoring table, or "profile", in which each column of the alignment is converted to a column of a table representing the frequency of occurrence of each of the 20 amino acids.

To search a database of blocks, the first position of the sequence is aligned with the first position of the first block, and a score for that amino acid is obtained from the profile column corresponding to that position. Scores are summed over the width of the alignment, and the block is then aligned with the next position of the sequence. This procedure is carried out exhaustively for all positions of the sequence and all blocks in the database, and the best alignments between the sequence and entries in the BLOCKS database are noted.

If a particular block scores highly, the sequence may be related to the group of sequences the block represents. Typically, a group of proteins has more than one region in common, and their relationship is represented as a series of blocks separated by unaligned regions. If a second block for a group also scores highly in the search, the evidence that the sequence is related to the group is strengthened; it is strengthened further if a third block also scores highly, and so on.
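The scanning procedure described above can be illustrated with a minimal Python sketch. It is not the BLIMPS implementation: the function names (`build_profile`, `scan`, `best_hit`) are hypothetical, the profile holds plain column frequencies as the text describes (without the sequence weighting or score transformation a real searcher may apply), and only placements where the block lies fully inside the sequence are scored, which is an assumption about how sequence ends are treated.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(block_rows):
    """Convert aligned block segments (all of equal width) into a profile:
    one dict per alignment column giving the frequency of each amino acid."""
    width = len(block_rows[0])
    profile = []
    for col in range(width):
        counts = Counter(row[col] for row in block_rows)
        total = sum(counts.values())
        profile.append({aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS})
    return profile


def scan(sequence, profile):
    """Slide the profile along the sequence and sum the column scores at
    every placement. Returns (1-based start position, score) pairs."""
    width = len(profile)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues absent from the profile column contribute 0 (assumption).
        score = sum(profile[i].get(aa, 0.0) for i, aa in enumerate(window))
        scores.append((start + 1, score))
    return scores


def best_hit(sequence, profile):
    """Return the highest-scoring placement of the profile on the sequence."""
    return max(scan(sequence, profile), key=lambda hit: hit[1])
```

For example, `best_hit("AACQGDSAA", build_profile(["CQGDS", "CQGDS", "CKGDS"]))` places the toy block at position 3 of the sequence. Repeating the scan for every block in a database and keeping the top-scoring placements corresponds to the exhaustive search described in the text.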
> bscan
Scans proteins or nucleic acids for conserved motifs using Blocks
Input sequence(s): sw:tpa_human
Output file [tpa_human.bscan]:
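For scripted use, the interactive session can be replaced by a fully specified command line. The following Python sketch only assembles the argument list; it uses qualifiers documented on this page (`-seqs`, `-outfile`, `-blocks`, `-expect`, `-format`, `-auto`), while the helper names `build_bscan_command` and `run_bscan` are hypothetical. Actually running the command assumes `bscan` is installed and on the `PATH`.

```python
import subprocess


def build_bscan_command(seqs, outfile, expect=None, output_format=None, blocks=None):
    """Assemble a non-interactive bscan command line as an argument list."""
    cmd = ["bscan", "-seqs", seqs, "-outfile", outfile]
    if blocks is not None:
        # Personal file with protein motifs in blocks format.
        cmd += ["-blocks", blocks]
    if expect is not None:
        # Documented range for -expect is 0.000 to 100.000.
        if not 0.0 <= expect <= 100.0:
            raise ValueError("expect must be between 0 and 100")
        cmd += ["-expect", str(expect)]
    if output_format is not None:
        # 1 = standard (summary and alignments), 2 = summary only, 3 = GFF.
        if output_format not in (1, 2, 3):
            raise ValueError("output_format must be 1, 2 or 3")
        cmd += ["-format", str(output_format)]
    cmd.append("-auto")  # turn off prompts
    return cmd


def run_bscan(cmd):
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

Calling `build_bscan_command("sw:tpa_human", "tpa_human.bscan", expect=0.01, output_format=2)` yields the arguments for a summary-only search with a stricter combined E-value cutoff than the default of 1.0.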
Standard (Mandatory) qualifiers:
[-seqs] seqall Sequence(s) filename and optional format, or
reference (input USA)
[-outfile] outfile [*.bscan] Output file name
Additional (Optional) qualifiers (* if not always prompted):
-blocks infile [/opt/sw/blocks/blocks.dat] Blocks file.
Default is the Blocks databank, but you can
choose a personal file with protein motifs
in blocks format instead.
-expect float [1.0] Combined E() value = number of
multiple block hits that you expect (Number
from 0.000 to 100.000)
* -gencode menu [0] Genetic code for translating sequences
(Values: 0 (Standard); 1 (Vertebrate
Mitochondrial); 2 (Yeast Mitochondrial); 3
(Mold Mitochondrial and Mycoplasma); 4
(Invertebrate Mitochondrial); 5 (Ciliate
Nuclear); 6 (Echinoderm Mitochondrial); 7
(Euplotid Nuclear); 8 (Bacterial and Plant
Plastid); 9 (Alternative Yeast Nuclear); 10
(Ascidian Mitochondrial); 11 (Flatworm
Mitochondrial); 12 (Blepharisma
Macronuclear); 13 (Chlorophycean
Mitochondrial); 14 (Trematode
Mitochondrial); 15 (Scenedesmus obliquus
mitochondrial); 16 (Thraustochytrium
mitochondrial code))
-format menu [1] Output format (Values: 1 (Standard
(summary and alignments)); 2 (Summary only);
3 (GFF))
Advanced (Unprompted) qualifiers:
-raw boolean Store raw data of single block hits
-histogram boolean Show histogram in raw data
Associated qualifiers:
"-seqs" associated qualifiers
-sbegin1 integer Start of each sequence to be used
-send1 integer End of each sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-sformat1 string Input sequence format
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-outfile" associated qualifiers
-odirectory2 string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| Standard (Mandatory) qualifiers | | | |
| [-seqs] (Parameter 1) | Sequence(s) filename and optional format, or reference (input USA) | Readable sequence(s) | Required |
| [-outfile] (Parameter 2) | Output file name | Output file | <sequence>.bscan |
| Additional (Optional) qualifiers | | | |
| -blocks | Blocks file. Default is the Blocks databank, but you can choose a personal file with protein motifs in blocks format instead. | Input file | /opt/sw/blocks/blocks.dat |
| -expect | Combined E() value = number of multiple block hits that you expect | Number from 0.000 to 100.000 | 1.0 |
| -gencode | Genetic code for translating sequences | 0 (Standard); 1 (Vertebrate Mitochondrial); 2 (Yeast Mitochondrial); 3 (Mold Mitochondrial and Mycoplasma); 4 (Invertebrate Mitochondrial); 5 (Ciliate Nuclear); 6 (Echinoderm Mitochondrial); 7 (Euplotid Nuclear); 8 (Bacterial and Plant Plastid); 9 (Alternative Yeast Nuclear); 10 (Ascidian Mitochondrial); 11 (Flatworm Mitochondrial); 12 (Blepharisma Macronuclear); 13 (Chlorophycean Mitochondrial); 14 (Trematode Mitochondrial); 15 (Scenedesmus obliquus mitochondrial); 16 (Thraustochytrium mitochondrial code) | 0 |
| -format | Output format | 1 (Standard (summary and alignments)); 2 (Summary only); 3 (GFF) | 1 |
| Advanced (Unprompted) qualifiers | | | |
| -raw | Store raw data of single block hits | Boolean value Yes/No | No |
| -histogram | Show histogram in raw data | Boolean value Yes/No | No |
BLKPROB Version 12/23/06.1
Database=/opt/sw/blocks/blocks.dat
Here are your search results. The database searched was BLOCKS 14.3 (Apr 2007)
consisting of 29,068 blocks representing 5900 nonredundant entries documented
in InterPro 14.0 keyed to Swiss-Prot 51.3 and TrEMBL 34.3.
If you found the Blocks Searcher useful, please cite:
S Henikoff & JG Henikoff, "Protein family classification
based on searching a database of blocks", Genomics 19:97-107 (1994).
==============================================================================
Each numbered result consists of one or more blocks from a InterPro entry found
in the query sequence. One set of the highest-scoring blocks that are in the
correct order and separated by distances comparable to the Blocks database is
selected for analysis. If this set includes multiple blocks the probability
that the lower scoring blocks support the highest scoring block is reported.
Maps of the database blocks and query sequence are shown:
AAA represents the first block roughly in proportion to its width.
: represents the minimum distance between blocks in the database.
. represents the maximum distance between blocks in the database.
< > indicate the sequence has been truncated to fit the page.
The query map is aligned on the highest scoring block. Multiple block hits
that are consistent with the highest scoring block are separated by colons.
Block hits that are not consistent are mapped below. The alignment of the
query sequence with the sequence closest to it in the BLOCKS database is
shown. The distance between detected blocks is listed as (min, max): for the
database entry followed by the distance in the query. Upper case in the query
indicates at least one occurrence of the residue in that column of the block.
For interpretation of block hits, you might find it worthwhile to obtain the
full set of blocks and documentation for an entry. For this you can use the
MRS server of BEN and "Search" in the Blocks database "for" e.g.
"sac:IPB000104".
=============================================================================
Note: For searches using DNA queries, "Location" refers to the position
in the query in base pairs from 5' to 3' on the + strand, whereas the map and
alignment show the translated position in amino acid residues as before.
=============================================================================
Query=TPA_HUMAN P00750 Tissue-type plasminogen activator precursor (EC 3.4.21.68) (tP
Size=562 Amino Acids
Blocks Searched=29068
Alignments Done= 17081713
Cutoff combined expected value for hits= 1
Cutoff block expected value for repeats/other= 1
==============================================================================
Combined
Family Strand Blocks E-value
IPB000177 Apple domain 1 4 of 15 4.7e-32
IPB003014 N/apple PAN 1 3 of 8 1.6e-31
IPB000083 Fibronectin, type I 1 3 of 3 3.4e-26
IPB001314 Chymotrypsin serine protease family 1 3 of 3 4.1e-24
IPB000001 Kringle 1 2 of 2 9.7e-22
IPB001254 Serine protease, trypsin family 1 2 of 2 1.4e-18
IPB002049 Laminin-type EGF-like domain 1 1 of 2 0.019
IPB003966 Prothrombin signature 1 1 of 8 0.099
IPB001438 Type II EGF-like signature 1 1 of 4 0.14
IPB001169 Integrin beta, C-terminal 1 1 of 8 0.73
==============================================================================
>IPB000177 4/15 blocks Combined E-value= 4.7e-32: Apple domain
Block Frame Location (aa) Block E-value
IPB000177K 0 344-376 0.0008
IPB000177L 0 377-415 0.62
IPB000177N 0 499-533 3.4e-12
IPB000177O 0 534-562 4.5e-09
Other reported alignments:
|--- 252 amino acids---|
IPB000177 AAAABBBCCCCDDDDDEEEEFFFGGGHHHHIIIIJJJJ:.KKKLLLLMMM:::NNNOOO
TPA_HUMAN ::::::::::::::::::::::::::::::::::KKKLLLL::::::::NNNOOO
IPB000177K <->K (368,432):343
Q5NTB3|FA11_BOVIN 418 GAIIGNQWILTAAHCFNEVKSPNVLRVYSGILN
| | ||| ||||| | | | | |
TPA_HUMAN 344 GiLISscWILsAAHCFqErfpPhhLtVilGrty
IPB000177L K<->L (-1,0):0
Q6AZS7|Q6AZS7_XENLA456 ILNITKSTPFSELEKIIIHPHYTGAGNGSDIALLKLKTP
| || | | ||||| ||
TPA_HUMAN 377 rvvpgEEeqkFEVEKyIVHkEfdddtydnDIALLqLKsd
IPB000177N L<->N (69,71):83
KLKB1_MOUSE|P26262 564 AGYKEGGTDACKGDSGGPLVCKHSGRWQLVGITSW
| ||| ||||||||| || |||| ||
TPA_HUMAN 499 gGpqanlhDACqGDSGGPLVClnDGRmtLVGIiSW
IPB000177O N<->O (-1,0):0
KLKB1_MOUSE|P26262 599 GEGCGRKDQPGVYTKVSEYMDWILEKTQS
| ||| || ||||||| | |||
TPA_HUMAN 534 GlGCGQKDvPGVYTKVtnYLDWIrdnmrp
------------------------------------------------------------------------------
>IPB003014 3/8 blocks Combined E-value= 1.6e-31: N/apple PAN
Block Frame Location (aa) Block E-value
IPB003014D 0 340-358 2.1e-07
IPB003014G 0 509-519 6.3e-07
IPB003014H 0 529-556 2.6e-13
Other reported alignments:
|--- 359 amino acids---|
IPB003014 AA::::...............B:::::::::..............CD:::::EF:G:HH
TPA_HUMAN ::::::::::::::::::::::::D::::::::::G:HH
IPB003014D <->D (247,721):339
Q5NTB3|FA11_BOVIN 414 HLCGGAIIGNQWILTAAHC
|||| | ||| ||||
TPA_HUMAN 340 fLCGGiLISScWILSAAHC
IPB003014G D<->G (125,139):150
P06868|PLMN_BOVIN 758 CQGDSGGPLVC
|||||||||||
TPA_HUMAN 509 CQGDSGGPLVC
IPB003014H G<->H (8,9):9
P26262|KLKB1_MOUSE 594 GITSWGEGCGRKDQPGVYTKVSEYMDWI
|| ||| ||| || ||||||| | |||
TPA_HUMAN 529 GIISWGLGCGQKDvPGVYTKVtnYLDWI
------------------------------------------------------------------------------
>IPB000083 3/3 blocks Combined E-value= 3.4e-26: Fibronectin, type I
Block Frame Location (aa) Block E-value
IPB000083A 0 41-58 1.4e-07
IPB000083B 0 342-361 1.8e-11
IPB000083C 0 510-519 0.004
Other reported alignments:
|--- 265 amino acids---|
IPB000083 AA:..........................BB:............................C
TPA_HUMAN AA:::::::::::::::::::::::::::BB::::::::::::::C
IPB000083A <->A (3,2300):40
TPA_HUMAN|P00750 41 CRDEKTQMIYQQHQSWLR
||||||||||||||||||
TPA_HUMAN 41 CRDEKTQMIYQQHQSWLR
IPB000083B A<->B (10,286):283
TPA_HUMAN|P00750 342 CGGILISSCWILSAAHCFQE
||||||||||||||||||||
TPA_HUMAN 342 CGGILISSCWILSAAHCFQE
IPB000083C B<->C (8,302):148
FA12_BOVIN|P98140 538 QGDSGGPLVC
||||||||||
TPA_HUMAN 510 QGDSGGPLVC
------------------------------------------------------------------------------
[Part of this file has been deleted for brevity]
------------------------------------------------------------------------------
>IPB001169 1/8 blocks Combined E-value= 0.73: Integrin beta, C-terminal
Block Frame Location (aa) Block E-value
IPB001169F 0 99-120 0.74
Other reported alignments:
|--- 382 amino acids---|
IPB001169 A::::...BBB..CC..DD:...EEEE:::::::......F:......G........H
TPA_HUMAN ::::::F
IPB001169F <->F (381,705):98
ITB1B_XENLA|P12607 480 GNGTFECGACRCNEGRIGKECE
| | | || || ||
TPA_HUMAN 99 qAlyFSdfVCQCpEGFAGKcCE
------------------------------------------------------------------------------
10 possible hits reported
Heading
Query = Description line from query sequence
Size = Number of amino acids for protein query or base pairs for DNA query.
Be sure this number is correct before interpreting your results.
Blocks searched = Number of blocks searched with query.
Alignments done = Number of alignments done between query and blocks searched.
This number is used to determine the expected value for each hit.
Cutoff expected value = Maximum combined E-value reported.
This is the number of matches expected to be found merely by chance.
Summary
One line is printed per hit, where a hit consists of blocks belonging
to a protein family represented in the database of blocks searched with
combined E-value less than or equal to the cutoff.
Details
Detailed information is printed for each hit, including alignments with
the most similar sequence in each block.
Note: For searches using DNA queries, "Location" refers to the position in the query in base pairs from 5' to 3' on the forward strand, whereas the map and alignments show the translated position in amino acid residues.
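For downstream processing, the summary lines can be extracted from an output file with a small parser. The Python sketch below is an illustration only: the names `SummaryHit`, `parse_summary` and `filter_hits` are hypothetical, and the regular expression assumes the whitespace-separated layout seen in the example output (family accession, description, strand, "N of M" blocks, combined E-value). Other output formats or layouts would need a different pattern.

```python
import re
from dataclasses import dataclass


@dataclass
class SummaryHit:
    family: str
    description: str
    strand: int
    blocks_hit: int
    blocks_total: int
    evalue: float


# One summary line, e.g. "IPB000177 Apple domain 1 4 of 15 4.7e-32"
SUMMARY_LINE = re.compile(
    r"^(?P<family>\S+)\s+(?P<description>.+?)\s+(?P<strand>-?\d+)\s+"
    r"(?P<hit>\d+) of (?P<total>\d+)\s+(?P<evalue>\S+)$"
)


def parse_summary(text):
    """Collect every line that looks like a summary entry; skip all others."""
    hits = []
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if not match:
            continue
        try:
            evalue = float(match.group("evalue"))
        except ValueError:
            continue  # last field is not a number, so not a summary line
        hits.append(SummaryHit(
            family=match.group("family"),
            description=match.group("description"),
            strand=int(match.group("strand")),
            blocks_hit=int(match.group("hit")),
            blocks_total=int(match.group("total")),
            evalue=evalue,
        ))
    return hits


def filter_hits(hits, max_evalue):
    """Keep hits whose combined E-value is at or below the cutoff."""
    return [hit for hit in hits if hit.evalue <= max_evalue]
```

Applying `filter_hits(parse_summary(text), 1e-10)` to the example output would keep the hits with very small combined E-values and drop weak single-block matches, such as those with E-values near the cutoff of 1.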
| Program name | Description |
|---|---|
| antigenic | Finds antigenic sites in proteins |
| digest | Protein proteolytic enzyme or reagent cleavage digest |
| emast | Motif detection |
| ememe | Motif detection |
| epestfind | Finds PEST motifs as potential proteolytic cleavage sites |
| fuzzpro | Protein pattern search |
| fuzztran | Protein pattern search after translation |
| genemark | Finds potential genes using a species specific HMM |
| getorf | Finds and extracts open reading frames (ORFs) |
| helixturnhelix | Report nucleic acid binding motifs |
| iprscan | Scans proteins or nucleic acids for conserved motifs using Interpro tools |
| marscan | Finds MAR/SAR sites in nucleic sequences |
| oddcomp | Find protein sequence regions with a biased composition |
| patmatdb | Search a protein sequence with a motif |
| patmatmotifs | Search a PROSITE motif database with a protein sequence |
| pepcoil | Predicts coiled coil regions |
| phiblast | Search protein sequence set combining matching of pattern with local alignment of a query sequence surrounding the match |
| plotorf | Plot potential open reading frames |
| preg | Regular expression search of a protein sequence |
| pscan | Scans proteins for conserved motifs using PRINTS |
| ps_scan | Scans proteins for conserved motifs using PROSITE (patterns and profiles) |
| showorf | Pretty output of DNA translations |
| sigcleave | Reports protein signal cleavage sites |
| sixpack | Display a DNA sequence with 6-frame translation and ORFs |
| syco | Synonymous codon usage Gribskov statistic plot |
| tcode | Fickett TESTCODE statistic to identify protein-coding DNA |
| wobble | Wobble base plot |
| ehmmpfam | Scans sequences using Pfam-A or other HMM database |
The BLIMPS software suite itself was written by Bill Alford (billa@willapabay.org).

For BLIMPS questions, please contact:
Jorja Henikoff (jorja@fhcrc.org)
Fred Hutchinson Cancer Research Center
1100 Fairview AV N, A1-162, PO Box 19024
Seattle, WA 98109-1024
FAX: 206-667-5889