blastz
BLASTZ allows a wider trade-off between speed and sensitivity than NCBI BLAST. The main changes in the algorithm are:
A C G T
A 91 -114 -31 -123
C -114 100 -125 -31
G -31 -125 100 -114
T -123 -31 -114 91
and a gap penalty that by default equals 400 + 30 * n, where n is the length of the gap
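As a minimal sketch of how these default scores combine, the Python snippet below encodes the matrix shown above and the default gap penalty. The function names (`substitution_score`, `gap_penalty`, `ungapped_score`) are hypothetical helpers introduced only for illustration; they are not part of blastz.

```python
# Substitution scores as listed above; rows and columns are in the order A, C, G, T.
BASES = "ACGT"
SCORES = [
    [91, -114, -31, -123],
    [-114, 100, -125, -31],
    [-31, -125, 100, -114],
    [-123, -31, -114, 91],
]


def substitution_score(a, b):
    """Score for aligning base a against base b (illustrative helper)."""
    return SCORES[BASES.index(a)][BASES.index(b)]


def gap_penalty(n, gap_open=400, gap_extend=30):
    """Default penalty 400 + 30 * n for a gap of length n."""
    return gap_open + gap_extend * n


def ungapped_score(ref, test):
    """Sum of substitution scores over two equal-length segments."""
    if len(ref) != len(test):
        raise ValueError("segments must have the same length")
    return sum(substitution_score(a, b) for a, b in zip(ref, test))


print(ungapped_score("ACGT", "ACGT"))  # 91 + 100 + 100 + 91 = 382
print(gap_penalty(5))                  # 400 + 30 * 5 = 550
```

With these defaults a single gap of length 5 costs more than four matching bases gain, which illustrates why short matching stretches separated by gaps may not be joined into one alignment.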
> blastz -secondscore=2200 -lat=S
Nonintersecting best local alignments, makes LAJ file
Reference sequence: embl:m13792
Test sequence(s): embl:u73107
Output file name [M13792.blastz]: ada.blastz
1 : none
2 : PostScript
3 : PDF
Graphic output format for dotplot and PIP files [3]:
Standard (Mandatory) qualifiers:
[-refseq] sequence Reference sequence
[-testseqs] seqall Test sequence(s)
[-outfile] string [$(refseq.name).blastz] Output file name
(Any string is accepted)
-pip selection [3] Write graphic files with dotplot and
Percentage Identity Plot. These provide an
alternative to viewing the output with LAJ.
Additional (Optional) qualifiers (* if not always prompted):
* -maskrefseq range [(full sequence)] Regions of reference
sequence that must be masked (displayed for
sequence continuity but not aligned). Is not
used if you set option -masklowrefseq.
-chain boolean [N] Report only matching regions that have
same order and orientation in both sequences
-wordtype menu [1] Word type for initial search (Values: 0
(simple matching words); 1 (template
1110100110010101111, 1 transition allowed);
2 (template 1110100110010101111); 3
(template 1110101100110010101111, 1
transition allowed); 4
(1110101100110010101111))
-score integer [3000] (K) Threshold for accepting HSP
(Integer 1 or more)
-gapscore integer [equal to Threshold for accepting HSP] (L)
Threshold for starting HSP extension into
gapped alignment (Any integer value)
-gappenalty integer [400] (O) Gap penalty (Integer 0 or more)
-gaplength integer [30] (E) Gap length penalty. BLASTZ
subtracts from the similarity score for each
gap a penalty of type <Gap penalty> + <Gap
length penalty> * n (Integer 0 or more)
-secondscore integer [0] (H) Threshold for accepting HSP at
second pass. If you fill in a value BLASTZ
will perform a more sensitive second search
(using simple matching words) in regions
between adjacent matches (Integer 0 or more)
* -wordsize integer [8] Word size for initial search with simple
matching words and for second pass search
(Integer 1 or more)
-maskbase integer [0] (M) Mask regions in test sequence that
give at least n matches (Integer 0 or more)
-lat menu [0] Write second output file with alignments
in text format (Values: 0 (none); A (with
ticks relative to alignment); S (with ticks
relative to sequences))
* -awidth integer [50] Alignment width (Integer 1 or more)
Advanced (Unprompted) qualifiers:
-masklowrefseq toggle Use lowercase to mask reference sequence
-masklowtestseqs boolean Use lowercase to mask test sequence(s)
-[no]showsequences boolean [Y] Show sequences in output
-[no]showfeatures boolean [Y] Show exons and repeats in output
-[no]laj boolean [Y] Open output with LAJ
Associated qualifiers:
"-refseq" associated qualifiers
-sbegin1 integer Start of the sequence to be used
-send1 integer End of the sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-sformat1 string Input sequence format
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-testseqs" associated qualifiers
-sbegin2 integer Start of each sequence to be used
-send2 integer End of each sequence to be used
-sreverse2 boolean Reverse (if DNA)
-sask2 boolean Ask for begin/end/reverse
-snucleotide2 boolean Sequence is nucleotide
-sprotein2 boolean Sequence is protein
-slower2 boolean Make lower case
-supper2 boolean Make upper case
-sformat2 string Input sequence format
-sdbname2 string Database name
-sid2 string Entryname
-ufo2 string UFO features
-fformat2 string Features format
-fopenfile2 string Features file name
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
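For scripted use, the qualifiers listed above can be assembled into a non-interactive command line. The sketch below only builds and prints the argument list; it does not run anything. The helper name `blastz_command` is hypothetical, and writing every qualifier in the `-name=value` form is an assumption based on the example session, which uses that form for -secondscore and -lat.

```python
import shlex


def blastz_command(refseq, testseqs, outfile, **qualifiers):
    """Build an argument list for the blastz wrapper (illustrative helper)."""
    argv = [
        "blastz",
        f"-refseq={refseq}",
        f"-testseqs={testseqs}",
        f"-outfile={outfile}",
    ]
    # Optional qualifiers such as secondscore=2200 or lat="S".
    for name, value in qualifiers.items():
        argv.append(f"-{name}={value}")
    # -auto turns off prompts, so unspecified qualifiers keep their defaults.
    argv.append("-auto")
    return argv


cmd = blastz_command("embl:m13792", "embl:u73107", "ada.blastz",
                     secondscore=2200, lat="S")
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` on a system where the wrapper is installed.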
| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| Standard (Mandatory) qualifiers | | | |
| [-refseq] (Parameter 1) | Reference sequence | Readable sequence | Required |
| [-testseqs] (Parameter 2) | Test sequence(s) | Readable sequence(s) | Required |
| [-outfile] (Parameter 3) | Output file name | Any string is accepted | $(refseq.name).blastz |
| -pip | Write graphic files with dotplot and Percentage Identity Plot. These provide an alternative to viewing the output with LAJ. | 1 (none); 2 (PostScript); 3 (PDF) | 3 |
| Additional (Optional) qualifiers | | | |
| -maskrefseq | Regions of reference sequence that must be masked (displayed for sequence continuity but not aligned). Is not used if you set option -masklowrefseq. | Sequence range | Not set (full sequence) |
| -chain | Report only matching regions that have same order and orientation in both sequences | Boolean value Yes/No | No |
| -wordtype | Word type for initial search | 0 (simple matching words); 1 (template 1110100110010101111, 1 transition allowed); 2 (template 1110100110010101111); 3 (template 1110101100110010101111, 1 transition allowed); 4 (1110101100110010101111) | 1 |
| -score | (K) Threshold for accepting HSP | Integer 1 or more | 3000 |
| -gapscore | (L) Threshold for starting HSP extension into gapped alignment | Any integer value | Equal to threshold for accepting HSP |
| -gappenalty | (O) Gap penalty | Integer 0 or more | 400 |
| -gaplength | (E) Gap length penalty. BLASTZ subtracts from the similarity score for each gap a penalty of type <Gap penalty> + <Gap length penalty> * n | Integer 0 or more | 30 |
| -secondscore | (H) Threshold for accepting HSP at second pass. If you fill in a value BLASTZ will perform a more sensitive second search (using simple matching words) in regions between adjacent matches | Integer 0 or more | 0 |
| -wordsize | Word size for initial search with simple matching words and for second pass search | Integer 1 or more | 8 |
| -maskbase | (M) Mask regions in test sequence that give at least n matches | Integer 0 or more | 0 |
| -lat | Write second output file with alignments in text format | 0 (none); A (with ticks relative to alignment); S (with ticks relative to sequences) | 0 |
| -awidth | Alignment width | Integer 1 or more | 50 |
| Advanced (Unprompted) qualifiers | | | |
| -masklowrefseq | Use lowercase to mask reference sequence | Toggle value Yes/No | No |
| -masklowtestseqs | Use lowercase to mask test sequence(s) | Boolean value Yes/No | No |
| -[no]showsequences | Show sequences in output | Boolean value Yes/No | Yes |
| -[no]showfeatures | Show exons and repeats in output | Boolean value Yes/No | Yes |
| -[no]laj | Open output with LAJ | Boolean value Yes/No | Yes |
If the Reference Sequence is in EMBL or GenBank format and contains feature annotation, the blastz "wrapper" extracts the information about coding sequences and repeat regions and writes it to the <output file name>.exons and <output file name>.repeats files, which LAJ uses to display these features graphically.
Only features with the Feature Key "CDS", "exon", "gene", "mRNA", "prim_transcript" or "repeat_region" are used. You can find more information about the exact syntax of the feature table in the DDBJ/EMBL/GenBank Feature Table document.
In both the Reference Sequence and the Test Sequences you can use lowercase/uppercase coding. Put the regions you want to "mask" (see Description) in lowercase, then set the optional parameter -masklowrefseq or -masklowtestseqs, respectively.
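One way to prepare such a lowercase-masked sequence is sketched below. The helper `mask_region` is hypothetical, and the 1-based inclusive coordinates are an assumption chosen for the example, not a convention taken from blastz.

```python
def mask_region(sequence, start, end):
    """Return the sequence in uppercase with positions start..end in lowercase.

    Coordinates are 1-based and inclusive (an assumption made for this sketch).
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("region must lie within the sequence")
    seq = sequence.upper()
    return seq[:start - 1] + seq[start - 1:end].lower() + seq[end:]


print(mask_region("ACGTACGT", 3, 5))  # ACgtaCGT
```

The masked sequence would then be saved to a file and used as input together with -masklowrefseq or -masklowtestseqs.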
The <output file name>.dotplot.<extension> and the <output file name>.pip.<extension> files provide a static picture corresponding to what LAJ displays. They can be useful for documentation or publication and also provide an alternative for users who cannot work under X Windows. You can choose between PostScript and Adobe PDF format, or have no picture files created.
The <output file name>.lat file can be read directly by the user. Since it can be very large, it is only created if you request it explicitly.
#:lav
d {
"blastz.v7 ada.blastz.refseq ada.blastz.testseqs T=1 K=3000 L=3000 O=400 E=30 H=2200 W=8
A C G T
91 -114 -31 -123
-114 100 -125 -31
-31 -125 100 -114
-123 -31 -114 91
O = 400, E = 30, K = 3000, L = 3000, M = 0"
}
#:lav
s {
"ada.blastz.refseq" 1 36741 0 1
"ada.blastz.testseqs" 1 29807 0 1
}
h {
">HSADAG M13792.1 Human adenosine deaminase (ADA) gene, complete cds."
">MMU73107 U73107.1 Mus musculus adenosine deaminase (ADA) gene, complete cds."
}
a {
s 7377
b 17 1939
e 898 2638
l 17 1939 25 1947 78
l 26 1951 36 1961 100
l 46 1962 79 1995 74
l 86 1996 96 2006 73
l 123 2007 144 2028 55
l 156 2029 206 2079 61
l 210 2080 217 2087 63
l 220 2088 225 2093 83
l 227 2094 246 2113 60
l 247 2115 251 2119 100
l 270 2120 279 2129 60
l 298 2130 348 2180 61
l 365 2181 374 2190 60
l 375 2192 414 2231 73
l 417 2232 455 2270 51
l 456 2272 479 2295 58
l 482 2296 499 2313 50
l 505 2314 532 2341 71
l 548 2342 583 2377 67
l 587 2378 616 2407 67
l 635 2408 638 2411 100
l 646 2412 681 2447 61
l 689 2448 698 2457 60
l 699 2459 707 2467 56
l 729 2468 760 2499 69
l 761 2508 790 2537 67
l 792 2538 827 2573 58
l 831 2574 856 2599 65
l 859 2600 877 2618 68
l 879 2619 898 2638 80
}
a {
s 7957
b 1612 2789
e 2347 3398
l 1612 2789 1635 2812 63
l 1637 2813 1657 2833 67
[Part of this file has been deleted for brevity]
l 36671 28561 36679 28569 89
l 36684 28570 36713 28599 73
l 36715 28600 36736 28621 77
}
x {
n 0
}
m {
n 0
}
#:eof
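The alignment stanzas (`a { ... }`) of the listing above can be read with a few lines of Python. This is a minimal sketch, not a full LAV parser: the function name `parse_lav_alignments` is hypothetical, and reading each `l` line as start in sequence 1, start in sequence 2, end in sequence 1, end in sequence 2, percent identity is an interpretation of the listing.

```python
def parse_lav_alignments(text):
    """Collect score, begin, end and ungapped segments from each 'a {' stanza."""
    alignments = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "a {":
            current = {"score": None, "begin": None, "end": None, "segments": []}
        elif line == "}":
            if current is not None:
                alignments.append(current)
            current = None
        elif current is not None and line:
            fields = line.split()
            tag = fields[0]
            if tag not in ("s", "b", "e", "l"):
                continue  # skip anything that is not a known record
            values = tuple(int(v) for v in fields[1:])
            if tag == "s":
                current["score"] = values[0]
            elif tag == "b":
                current["begin"] = values
            elif tag == "e":
                current["end"] = values
            else:
                current["segments"].append(values)
    return alignments


sample = """a {
  s 7377
  b 17 1939
  e 898 2638
  l 17 1939 25 1947 78
  l 26 1951 36 1961 100
}
"""

first = parse_lav_alignments(sample)[0]
# Number of aligned (ungapped) columns covered by the two segments above.
columns = sum(end1 - start1 + 1 for start1, _, end1, _, _ in first["segments"])
print(first["score"], first["begin"], first["end"], columns)
```

Stanzas other than `a` (such as `d`, `s` and `h`) are ignored by this sketch.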
>M13792 M13792.1 Human adenosine deaminase (ADA) gene, complete cds.
GATCTGGGTAAAGGGTTTTCCAGGTGTCAGGATGGAAGTGACTAAGGTGCAGAGGCTGGA
GGGCTGGGGCAGGTAGAAGCAAGCATTCCTGTTACCTACTGCTGTGTGACAATCTCCCCC
[Part of this file has been deleted for brevity]
TCCATATCTGCTGAAAAAAGGTTTAAAATTTTTAAAAAGTTTAAAAGTGTTTTCTAAAAA
AGGGACAAGCAGGTCTGGACC

>U73107 U73107.1 Mus musculus adenosine deaminase (ADA) gene, complete cds.
GCCGACTTTAGATGTTCCTAAACTACATTTCCCAGCCCATTCCACCCCTCTCTGTCTCTG
TGACCCCTGATCCAGCTCTACCCTACTACAATGACCCCTACTGACTTAAATGTGCTTCTT
[Part of this file has been deleted for brevity]
CAGCAATGCAGTCTCAAGCCCCAGGGATCTTGGTGCCACTGAAGGTGCATGTCATTCTGC
CATTAGGCCTGTCTTTAAGTCTATGTTGAGCTCTGGGTCTGGAATTC

M13792
# 1 genes; 12 exons; 1 cds; 0 mrna; 1 prim_transcript;
# gene ADA 4031..35664
# exon <4031..4063
# exon 19230..19291
# exon 26344..26466
# exon 28908..29051
# exon 29823..29938
# exon 31176..31303
# exon 32425..32496
# exon 32573..32674
# exon 32851..32915
# exon 34354..34483
# exon 35100..35202
# exon 35651..>35664
# cds join(4031..4063,19230..19291,26344..26466,28908..29051,29823..29938,31176..31303,32425..32496,32573..32674,32851..32915,34354..34483,35100..35202,35651..35664)
> 4031 35664 ADA
+ 4031 35664
4031 4063
19230 19291
26344 26466
28908 29051
29823 29938
31176 31303
32425 32496
32573 32674
32851 32915
34354 34483
35100 35202
35651 35664
# stray mRNA
# stray CDs

%:repeats
1362 1672 Right Other
2357 2903 Right Other
4907 5227 Right Other
5606 5908 Right Other
7582 8001 Right Other
8179 8484 Right Other
10005 10204 Right Other
10257 10534 Right Other
13452 13777 Right Other
14837 15386 Right Other
15806 16106 Right Other
16913 17224 Right Other
18414 18717 Right Other
19605 19902 Right Other
22523 22829 Right Other
24481 24773 Right Other
25143 25453 Right Other
26949 27269 Right Other
28032 28333 Right Other
31460 31867 Right Other
------------------------------------------------------------
Seq 2 = ">U73107 U73107.1 Mus musculus adenosine deaminase (ADA) gene, complete cds."
Description:
"blastz.v7 /home/demo/ada.blastz.refseq /home/demo/ada.blastz.testseqs T=1 K=3000 L=3000 O=400 E=30 H=2200 W=8
A C G T
91 -114 -31 -123
-114 100 -125 -31
-31 -125 100 -114
-123 -31 -114 91
O = 400, E = 30, K = 3000, L = 3000, M = 0"
Local Alignment Number 1
Similarity Score: 7377
Match Percentage: 49 %
Number of Matches: 446
Number of Mismatches: 239
Total Length of Gaps: 212
Begins at (17,1939) and Ends at (898,2638)
: . : . : . : . :
17 TTTCCAGGT GTCAGGATGGAAGTGACTAAGGTGCAGAGGCTGGAGGG
|||||| :|---|||||||||||---------| ::|| | ||| ||||
1,939 TTTCCACATCTGGTCAGGATGGA GTCACACATTCTGCAGGG
: . : . : . : .
. : . : . : . : . :
64 CTGGGGCAGGTAGAAGCAAGCATTCCTGTTACCTACTGCTGTGTGACAAT
||| |||||||:||||------|:||||::|||-----------------
1,980 CTGTGGCAGGTGGAAG TCCCTGCCACC
: . : . : .
. : . : . : . : . :
114 CTCCCCCTAAAACACAATGGCTTAAAATAACATCCATTTCATTACATATC
---------|:|:||::|:||| | :| || -----------:|| |::|
2,007 AGATACGGTAGCTGATGAGAAA CACCTGCC
: . : . : .
. : . : . : . : . :
164 TCAATACTATAGGTCAGGAATTTGGGCTGGGCTTACTTGGGTAATTCTTC
||:: :|:||:|| ||| || :|||||||||| |:|||::|---||
2,037 ACAGCTTTGTAAGTAAGGTATGAAGGCTGGGCTTCCCTGGACA CTGA
: . : . : . : . :
. : . : . : . : . :
214 TGTCCCACATGGCATTGACCAAAGCCTGGTTTT CAGTGGGCAGCTGGGC
|||--|| |||-:||| : ||:| |||::|||-|||||-----------
2,084 AGTC ACCTGG GTTGCTGAAGGGCTGACTTTACAGTG
. : . : . : .
. : . : . : . : . :
263 TGGATGGCCCAACACAGCTTCGCTAACATGATTGCTGTCTTCGTAGGGAT
-------:|::|||:||------------------|| |: |||||:|
2,120 TCTGACATAG TGGGATTTTAGGGGT
: . : . :
. : . : . : . : . :
313 GGTGGAAGCCTGGGCTCAGTGGGACTGTCAACTGGAATGGCCATATGTGG
||: |||: |||||||::|: ::| ::||||||||--------------
2,145 GGCTGAAAGCTGGGCTTGGCTCAGCAACCAACTGGA
. : . : . : . :
. : . : . : . : . :
363 ACTCTCTTAGCA TGATGGTCTCTTCTAGAAGCTTGGGTTCCCAGAGAGA
--|:||::| ||-|||:||| |||||||| :|||:::|||||||||| :|
2,181 TTTCCCACCAGTGACGGTGTCTTCTAGTGGCTCAAGTTCCCAGAGCAA
. : . : . : . : .
. : . : . : . : . :
412 ATGTTCAAGAGGCCCCAAAGGACACCACAAAGCTTCTTTATGAC CAAGG
| :--|||||||:||||: ||| :::: ||:: ::|: || |-| ||:
2,229 AAA CAAGAGGTCCCAGTGGAAGTTGAAAGAGCCCCAAATCCCTCCAGA
: . : . : . : . : .
. : . : . : . : . :
461 CTCGGAAATCCAGGAAGCTTGCTCCCATCACGCTCTATTACTCCAACAAG
||||:|:||| |: ||--|::|:::|||::|||| -----|:||||
2,277 GAAGGAAGTTCAGTAGCCT CCTCTGCCACATTCTAAG AGCAAG
: . : . : . : .
. : . : . : . : . :
511 TCACTCAGGCCAGCCCAGGTCCAAGAGGAGGAAACCTAGACTCCATCTTG
|| :||||| | |||||| | ---------------|||||||||| |
2,320 TCCTTCAGGACTTCCCAGGGCA AGACTCCATCAGG
: . : . : . :
. : . : . : . : . :
561 CAATGTGAAGAATTGCAAATAATTTGTGTCACCCTTAAGCAACCAGCAAC
||| | |:|| : ||||| : |---||||||::||:: :||||| ||
2,355 CAAAGACAGGATCAGCAAAGGCT TGTCACTTTTGGCGGACCAGGAAG
. : . : . : . : . :
. : . : . : . : . :
611 TCATCTAGGTTGATTGGCATTTCAGCAATGTGGTGGGAAGTGGTGGGACT
||||:|------------------||||-------:||| :|| |||:
2,402 TCATTT GCAA AGAATGAGTTGGATG
. : . : .
. : . : . : . : .
661 GATGTTGAAGAGGGACTTGAATGTCATGAGAGGCTGGG GAGGCAATAAG
||: |||:|||::||:|||| -------| || |::||- :|| |||--
2,427 GACTTTGGAGAAAGATTTGAC ACAGCCCAGGCCCAGCTATA
: . : . : . : .
: . : . : . : . : .
710 GTGGGGAGTGAAGTTTCTCGAGTCAGATTCAAATTTAAACCCCAGTTTTG
-------------------|||||||| |||||||:||:: ||| ::|:
2,468 GAGTCAGAATCAAATTCAAGTGCCACCCTCC
: . : . : .
: . : . : . : . :
760 C CACTTACAACCCATGAGCCAAGCAGGCTGTCTCTCTATCTG
|--------|||| |: :||| ||||| ||::| ||||- |:|| |: |
2,499 CTTGCATTGCACTAATCTTCCAGGAGCCCAGTGGCCTGT GCCCTCTTGG
: . : . : . : . : .
. : . : . : . : . :
802 AACCTCAGTGTCCTCATCTGTAAAATGAGGAGAACACCTCCTACATCTGA
::||||:|: |:||||:||||:| |---|| |||| :|:||:| || |
2,548 GGCCTCGGCCACTTCATTTGTAGACT GATAACAGTTTCTGCTTCACA
: . : . : . : . :
. : . : . : . : .
852 GGATGACTGTAAAGATGAAATGGGATGGGTGCTTATAAAGTGCTTCC
||:||--|: | |||| :|||||:||-|:||:|| :|||||||||||
2,595 GGGTG TAGACAGATTGAATGGAAT GATGTTTTCAAAGTGCTTCC
. : . : . : . : .
Local Alignment Number 2
Similarity Score: 7957
Match Percentage: 48 %
Number of Matches: 370
Number of Mismatches: 217
Total Length of Gaps: 172
Begins at (1612,2789) and Ends at (2347,3398)
. : . : . : . : . :
1,612 ATCCTCCTGCCTTGGCCTCCCAAAGTCCTGGGATTACAGGCATGAACCAC
|||:|||||:||::||:| ::||-| ||::||||||||:||::|:----
2,789 ATCTTCCTGTCTCAGCTTGGTGAA TACTAAGATTACAGACACAAG
: . : . : . : . :
. : . : . : . : . :
1,662 TGCGCCCAGGCTCGGGTATGTCTTCATCAGTAGCATGAAAATAATGGACT
------------------|:|:::|||| |: | |::||:|| :: :|
2,834 TATTCCCATCCCTGTCTTAGAAGTACCATTTT
. : . : . : .
. : . : . : . : . :
1,712 AATACAGCCACCCTCTCCCTCACTCCCACATACAACCAAACCCCAAATCC
:||----||||||||||:|||||||||||| |||||::||:|||||||--
2,866 GAT CCACCCTCTCTCTCACTCCCACAAACAACTGAATCCCAAAT
: . : . : . : .
. : . : . : . : . :
1,762 AGCTGATTTTACACCCTAAATGCAGCTTGAATATGAGTTTCTCCACTTCC
--: |||:|:| |:|| | |:||| |:|| |||||:: ||| |||-
2,910 TGTATTCTGCTGCTTACAAGTAGCAGGGATCTGAGTCCATCCTCTT G
: . : . : . : . : .
. : . : . : . : . :
1,812 CCCACTGACATCACTATGCCCTACCCAGACCATGGCAGTTGCCTCCTTCC
|||| | |::||| :| :| :|||::||--------------||:||||
2,957 CCCAGAGTCGCCACGGTCTCACACCTGGA TCTTTCC
: . : . : . :
. : . : . : . : . :
1,862 TGGTATCCTGTCCTCCCTCACCCCCGCTGGCCCCCTGTAATGCCCTCCCC
||||:| |::: ||::||----|| |||-----||| | || :: |||||
2,993 TGGTGTACCACGCTTTCT CCAGCT CCTTTCATCTTGTCCCC
. : . : . : . :
. : . : . : . : .
1,912 TCACAGCAGGGAGCCCAGGCTT CTCAAAGTGCCCTGTGGGTGCG
|||:: ||||: :|||:||| |------|||||:||||-------|::|:
3,034 TCATGTCAGGACACCCGGGCATCCCCACCTCAAGGTGC GCACA
. : . : . : . : .
: . : . : . : .
1,956 AACCACCTGGGGGTCCTGTTTGTATAAAATACAGATTCT A
|:::||:|::||: |:||:| ||| |:: | | ||:----------|
3,077 AGTTACTTAAGGAACTTGCTACAATATAGCCCTGCATCCCGCCCCCAAAA
: . : . : . : . : .
: . : . : . : . : .
1,996 CTTCAGTAGGTCTGGGATGGGGTCTGAAAGTCTGCATTTGTAGTCAGCTC
:::|| :|:::||:|::|: |||:: |||---------------||||||
3,127 TCCCACCAAACCTAGAGTATGGTTCTAAA CAGCTC
: . : . : . :
: . : . : . : .
2,046 CCAGGTGATGTGGGTGCTGATGATCCCTGGAT CACACTTTCAG
| |||| ||----------- ||||||| |-------:| |||| ::
3,162 ACCTGTGAAGT CTCCCTGGCTAAATTTCTAGACTTGGGA
. : . : . : . :
: . : . : . : . : .
2,089 TAGCTGGAGAATATTTTTTCCAAATAAAAGGGTGATTTTGTCTCGCCTCC
:||||||||------|||||||||:|:||| ||||| :||| ::| ||
3,201 CAGCTGGAG TTTTCCAAACAGAAGTGTGATGCTGTACTGAAACC
. : . : . : . :
: . : . : . : . : .
2,139 ACTTAAAACACTCCACTGACTTCCTAGGAATCCCACACCATCGCTGGGTC
||| | || :|:|| |||| ||---------------------------
3,245 ACTGACAAACTTTCAGTGACATC
. : . : .
: . : . : . : . : .
2,189 CCACATCCCTGGCAGGATTCAGCTCCCATCAGACCTTCTAGCCCCTTGCT
-----||:|||---------||||:|||:| :||| ||:|||:|- |||
3,268 TCTCTG AGCTTCCACCTCGCCTGCTGGCCTC AGCT
: . : . : . :
: . : . : . : . : .
2,239 CTCCACTCTCCCACTCTCTCTTTCCCCCTTGTTTATGGGTTTGTTAATTT
|:|| |||:||: |:| ::|::|||:|:||:|||:|::||||||| | ||
3,303 CCCCTCTCCCCTTCCCATCCCCTCCTCTTTATTTGTAAGTTTGTTTAGTT
. : . : . : . : . :
: . : . : . : . : .
2,289 ATTTATGATGAAATGAAATGAAGCTACCATCCACCCCAGTACTGGAACAT
-------------| |:||||:: | ||||: ||:||| ||||:|:|:
3,353 TCAGATGAGAATTGCATCTCACCTAGTTCTGGGATAC
. : . : . : .
: .
2,339 TATCAATAA
:|:|| |||
3,390 CACCATTAA
: .
[Part of this file has been deleted for brevity]
Local Alignment Number 16
Similarity Score: 31563
Match Percentage: 47 %
Number of Matches: 1622
Number of Mismatches: 718
Total Length of Gaps: 1066
Begins at (33834,25779) and Ends at (36736,28621)
. : . : . : . : . :
33,834 AAGAAACAGTCAACAGTGTGAAATTCTGCTATGCAAGTCGATTATGGTCA
|| ||: | |::| ||:|| :|| ||||| | |:||-|:| | |||::
25,779 AATAAGAATTTGAAAGCGTTGAAAACTGCTCTCCGAG CAAGGAAGGTTG
: . : . : . : . : .
. : . : . : . : . :
33,884 GAGCTAGGAAAGATCCATTAGATACAACAAGATGGTGGTCAGGGATCGTG
|||||: |:|:|:| :|:|::| ||||||||||| || |||||-|:|||
25,828 GAGCTGCCAGAAACCAGTCAAGTTCAACAAGATGGAGGGCAGGG TTGTG
: . : . : . : . : .
. : . : . : . : . :
33,934 CCAAGAACAGCTTCCATGGTATGTTGGAGTAGCCAGCTCCCAGTGGGACT
|||:|||--|:||||||| |:|:|||||: |||| | ||:-||||||
25,877 CCAGGAA GTTTCCATGCTGTATTGGAAGAGCCTTGTGGCAA GGGACT
: . : . : . : . :
. : . : .
33,984 GAGGAACAA GCAGGGTAGGGTGC
||||| ::|--------| ||||||||:|||-------------------
25,924 GAGGACTGACTAGGAGAGGAGGGTAGGATGCTAGCCATGCTCCACCACTG
. : . : . : . : . :
: . : . : . : . : .
34,007 AGAGGGGAAGGCTGGAGAGGGTGGCAGCCGGAGGGGGATGTTGCTTTCT
-:::||||||||||:||||| :|||:|||: :|||---------------
25,974 AGAGGGGGAAGGCTAGAGAGCATGGTAGCTTAAGG
. : . : . : .
: . : . : . : . : .
34,056 TGGCTCCCACCCCCACGCCCCCACCGGCTGCCATTCTGCCTGGTTCCCAT
: |:: ||::|:|| :|:| || | |||||---||||||||---------
26,009 CTGTCACCGTCTCCCTGTCACCTCAGGCTG TTCTGCCT
: . : . : . : .
: . : . : . : . : .
34,106 GTCTGGCCCCTCTGCTGCCTTTGCCCAGCTCTGGTCTTCAGGATGGGCTG
---: |||:|| |||| |||| |||||:|:|------------------
26,047 CTGCCTCTGAGCTGGCTTTTCCCAGTTTT
: . : . : .
: . : . : . : . : .
34,156 GATTCTGGACTTTCTGGTTACATAGACTTGAACAAGTCACCTAAGTTCTG
---------------------|||:|------------------------
26,076 ATAAA
:
: . : . : . : . : .
34,206 AATTTATTTCCCCCTCTGCACAAGGATCAGATCTTTCAGATCTGTTTGAG
----|:|||||:|:|: ::||:||:|| :|:|:||:: :|-|||:|:||
26,081 TGTTTCCTCTTTAATACGAGAATGCAACCCTTTGTGT TGTCTAAG
. : . : . : . : .
: . : . : . : . : .
34,256 GCTGCTGTGAGGATCAAAGGCGGGTGAACGTCAATGTGTTCTGACTATTT
|:||-|:|:|:||| :|||:-||| |: || :| | |: |||: :||:
26,126 GTTG TATAAAGATGGAAGA GGGAGGTGGTGGAAGGGCAGTGATGGTTC
: . : . : . : . :
: . : . : . :
34,306 ATGTAAGAGTAAAAGGAGGCTGATTCTCTCCTCCTC
||---||||:||--|||||| :||||||:::||:--------------
26,174 TTG GAGTGAA GAGGCTCTCTCTCTCTCTCTTTTCTTCCTGCCTGG
. : . : . : . : .
. : . : . : . : . :
34,342 CCTCTTCTGCAGGCTCAAAAATGACCAGGCTAACTACTCGCTCAACACAG
||:||:|: ||| :||||:|||||: ||||:||||||||:||||||||||
26,219 CCCCTCCCCCAGCTTCAAGAATGATAAGGCCAACTACTCACTCAACACAG
: . : . : . : . : .
. : . : . : . : . :
34,392 ATGACCCGCTCATCTTCAAGTCCACCCTGGACACTGATTACCAGATGACC
|:||||| ||||||||||||||||||||:||||||||:||||||||||||
26,269 ACGACCCCCTCATCTTCAAGTCCACCCTAGACACTGACTACCAGATGACC
: . : . : . : . : .
. : . : . : . : . :
34,442 AAACGGGACATGGGCTTTACTGAAGAGGAGTTTAAAAGGCTGGTGAGTGG
||: ::|||||||||||:|||||:||||||||:||: |:|||||||||:
26,319 AAGAAAGACATGGGCTTCACTGAGGAGGAGTTCAAGCGACTGGTGAGTAT
: . : . : . : . : .
. : . : . : . : .
34,492 GTGTGAGCCATA CTGGCCTTGACTCGGGTTTGGGAGTATG GTATCT
||||||||:||:---|||:| :||:|:|:||| || | ||:||-|||| |
26,369 GTGTGAGCTATGAGCCTGACACTGGCCCAGGTGTGTGTGTGTGTGTATAT
: . : . : . : . : .
: . :
34,538 ACAGGTCCA GTCCGGG
::: || ::-|| || |---------------------------------
26,419 GTGTGTGTGTGTGCGCGCGCGCGCGCGTGCACACACACGCACGTGAGTGC
: . : . : . : . : .
. : . : . : .
34,554 GCCTGGAATCTTTGGAGAGAGGGAGTGAGTCT
------------------||||:|||:||:|||||| ::||:||:||:
26,469 ATATATTGTGTGTAGATTGCCTAGAACCTCTGGAGACCAAGAATGGGTTG
: . : . : . : . : .
: . : . : . : .
34,586 GCCTCAACAGTCCAAGACAAGCCCAACCTAG ACACTTTCCACAG
|||| |------|| | |||||:||:|||||------| :|:||||||||
26,519 GCCTGA CATGCCAAGCTCAGCCTAGTACCAAAAGCCTTCCACAG
: . : . : . : . :
: . : . : . : . : .
34,630 AGAAGACATCTTTGTGTTGACGTCCTGACCTA GGACCAGGTTTTTGATC
||| |---:||| |:: |::|:| |||:|: |-|||| |||: :| | |
26,563 AGATG CCTTGGCAGTAGCATGCTGGCTGAGGGACGAGGCAACTTAGC
. : . : . : . : .
: . : . : . : . : .
34,679 CTTTGCTTGGGTTGAGTGCCTTTAAAGAATCCAGTGAAAGCTGTCAACCC
||||||||||||||||: |||| :|||------------------| |||
26,610 CTTTGCTTGGGTTGAGCCCCTTAGAAG ACCCC
: . : . : . :
: . : . : . : . : .
34,729 TCTCCCCAGAAAGGTGTGTGCAGCAGCTATGAA GTCTTGC ACACTCT
|||||| ||| |||| ||-------:||:||:|-- |:|||:-|||----
26,642 TCTCCCAAGACAGGTCTG ACTGTGGACGTTTTTGTGACA
. : . : . : . :
: . : . : . : . : .
34,776 CTTCAGGTTGTTCTTAAATCCCAGGCTGAATAAGTCCATTCCTGCACGTG
-||:|||||| |: ||::-|:|||:|--------------|:|| | ||
26,681 TTTAGGTTGGTTGTAGG CTCAGAC CTTGGTCTTG
. : . : . :
: . : . : . : . :
34,826 TCTGCGAGGTGTCTCTGGCCCCCTACATGCCACCCTGTCTCTCAAAG
----------| :||||:: :|||: :|| |||||-----|||:|---
26,715 GGTTCTGATGTCCTGAGTGAGTCCCTG CAAGGATC
. : . : . : .
. : . : . : . : . :
34,873 GTTTCTCCAACTTCCTTCTCACAGCCCTTTTTCATGTAATGACAAATTA
-|||||:|| |||--------------------------||||-------
26,750 TGTTTCCCCCACT ATGA
: . : .
. : . : . : . : . :
34,922 AGAACACGACCTCATGGTCTCTACTCTGGCACTTGCTGCCGTGTGACAGT
---------------------|:|:|| || ||||||:------------
26,767 TGCCCTTGCCCTTGCTA
: . :
. : . : . : . : . :
34,972 GGACAAATCCTTCCCCCTCTAAGCGTATCTGCCCATGTTGAGTGAAGAGG
--------------------------------------------------
26,784
. : . : . : . : . :
35,022 ATGGACTATCACTACATTGCTAAGAGCTGCCTTCTTTGTTCTCTGGTTCC
-------------||| |||----||| |||||:||||--:|||::|||
26,784 ACAGGGCT GCTTCCTTCCTTGT CCTGACTCC
. : . : . :
. : . : . : . : . :
35,072 ATGTTGTCTGCCATTCTGGCCTTTCCAGAACATCAATGCGGCCAAATCTA
||||| :|-----------||:|||:||||||||||:||:|| ||:|| |
26,815 ATGTTTCC CCCTTCTAGAACATCAACGCAGCGAAGTCAA
. : . : . : . :
. : . : . : . : . :
35,122 GTTTCCTCCCAGAAGATGAAAAGAGGGAGCTTCTCGACCTGCTCTATAAA
|:|||||||||||:|| ||:||||:|||:||||| || | ||||||:|:|
26,854 GCTTCCTCCCAGAGGAAGAGAAGAAGGAACTTCTGGAACGGCTCTACAGA
. : . : . : . : . :
. : . : . : .
35,172 GCCTATGGGATGCCACC TTCAGCCTCTGCAGGTAG
| ||: :: ||||||----------- | ||:| |||||| |----
26,904 GAATACCAATAGCCACCACAGACTGACGGTACGCTTGTGCAGGGCGCAAT
. : . : . : . : . :
: . :
35,207 GTTCCT GTCTGGGC
-----------------||:|||------ |||| ||-------------
26,954 AACCACCCCACCACACTGTCCCTCCTTAACTCTGTGCGATTGTGGCAGAA
. : . : . : . : . :
. : . :
35,221 TTCTGGGCAGTTG CCTGTCC
--:::||||||| |----------------------------|||:| |
27,004 GTCCTTGGGCAGGAGCACACCTCTGCAGGGTTACAGCCACCACCCTATGC
. : . : . : . : . :
. : . : . : . : . :
35,241 TGGCCCCAGTGTGGCTTTCTGTGGGACTTCTAGCAAGATGCCCTTCCATT
|| |||: -:::::||||||||| |:||| |||||::| |::|:|||
27,054 TGTCCCTC CACAACTTTCTGTGTGGCTTGTAGCAGAAATCTTTCCCAGA
. : . : . : . : . :
. : . : . : . : .
35,291 CTTGGG CAGCGCATGAATGTGTGATGACTCCCTGGTTTCTGGGCCCTGG
| |||-||||-| |||| : :||: |||| :||||:||||||||----:
27,103 GTAGGGACAGC CCTGAAAAAATGGAGACTGTCTGGCTTCTGGGC A
. : . : . : . : .
: . : . : . : . : .
35,340 CTGGGAGCAGCGTCTCATTAGATCGGTTTGTTTTCTATAAAAGTTCTTGA
| | | |||| |||::|:||: :||:| ||::: || | :| | ||
27,148 CAGTGCTCAGCTTCTTGTCAGGGTGGCTATTTCCTGATCATTATGCAGGA
: . : . : . : . : .
: . : . : . : .
35,390 GAGGCT GTTCTAAGGGGAGACTTTCTGAA GCCCAGT
|||||-----: :||:::| | ||||---||||---------|||||||
27,198 TAGGCTCGGGCAGCCTGGAGTGTGACT TGAACAGCAGTTGGCCCAGT
: . : . : . : . :
: . : . : . : . : .
35,426 CCCAAAGGTCTGGGCAGTTGGGGACACCTCCATGGCTGCCCAAAGCCAAG
||||:----------------------:||||||:---------------
27,245 CCCAG TTCCATGA
. : .
: . : . : . : . : .
35,476 GGCAGGGAGAGGGGCCCAGGCTGTTCTGCTCCTTTCTTCCTATGTGGTCT
---------------------------|::||||| ||:|: |:|||:||
27,258 GTCCCTTTATTTCCTTATGGCCT
: . : . :
: . : . : .
35,526 TGGCAAGGCA TCTTCTTGCCATCATAGGAA
||||||| ||--------------------:| ||||||::|| ||||
27,281 TGGCAAGCCACACATTCCCTTGCTTGAAGCCCGTCTTGCTGTCTATGGAA
. : . : . : . : . :
: . : . : . : . :
35,556 GGA GTTCCTTTCTGGTTCTGGTGTTCTATGATTTTTACAACATCCT
|||----:|| ||:|||||::| |:|-||||:|| |||:||| |:::|:|
27,331 GGAAGTTATTACTCTCTGGCCCGGAT TTCTGTGCTTTCTACCATGCCTT
. : . : . : . : .
. : . : . : . : . :
35,602 GGGTACTACAAGTTGCCTGATCTTTTTGCTTCTCTGAACCAACGAGCAGG
: :|:::|::||--:|||||:||||:|::|||||||| :::|| ||||||
27,380 ACATGTCATGAG ACCTGACCTTTCTATTTCTCTGACTTGACCAGCAGG
: . : . : . : . : .
. : . : . : . : .
35,652 GCAGAACCTCTGAAGAC GCCACTCCTCCAAGCCTTCACCCTGTG
||:|: ||:|||||||:------||||||:|||::||||-|||:||||||
27,428 GCGGGTCCCCTGAAGATGGCAAGGCCACTTCTCTGAGCC TCATCCTGTG
: . : . : . : . : .
: . : . : . : . :
35,696 G AGTCACCCCAACTCTGTGGGGCTGAGCAACATTTTTACATTTATT
|----|||| :: ||||||||------------|||| || || ||:|||
27,477 GATAAAGTCTTTACAACTCTG ACATATTGACCTTCATT
: . : . : . :
. : . : . : . : . :
35,742 CCTTCCAAGAAGACCATGATCTCAATAGTCAGTTACTGATGCTCCTGAAC
|||||||:------------------------------------------
27,515 CCTTCCAG
. :
. : . : . : . : . :
35,792 CCTATGTGTCCATTTCTGCACACACGTATACCTCGGCATGGCCGCGTCAC
-----------------------------||||:|| : ||||: |||
27,523 ACCTTGGAGAGGCCAGGTCTG
. : . :
. : . : . : . : .
35,842 TTCTCTGATTATGTGCCCTGGCCAGGGACCAGCGCCCTTG CA CATGG
|:||||||||: :|::||||||:||| |||| | ||||-||--||||
27,544 TCCTCTGATTGGATATCCTGGCTAGGTCCCAGGGGACTTGACAATCATGC
. : . : . : . : . :
: . : . : . : . : .
35,889 GCATGGTTGAATCTGAAACCCTCCTTCTGTGGCAACTTGTACTGAAAATC
:||----|||||: :|||||:|||||||: :|---------||:||| |
27,594 ACA TGAATTGAAAACCTTCCTTCTAAAG CTAAAATTA
. : . : . : . :
: . : . : . : . : .
35,939 TGGTGCTCAATAAAGAAGCCCATGGCTGGTGGCATGCAGCAGGTGGCATG
|||||:||||||||| |||: :||:|||||: | ||||||| :|||:: :
27,631 TGGTGTTCAATAAAGCAGCTGGTGACTGGTATCTTGCAGCACATGGTGAA
. : . : . : . : . :
: . : . : . : . : .
35,989 TAATTTGGTGGTCTTGGGCGGGCCGATGTGGGCAGGATG AGCATGGA
||------||||||:||| ||-----||| :||||||---|| | |||
27,681 TA TGGTCTCGGGGCTGC TGGCTAGGATGCTAAGAAAGGA
. : . : . : .
: . : . : . : .
36,036 GGGAGCTGGGTCAGCCTGCTCAGCAGCAGG GCCTGAGCCT
||:: |: || | : :||| ||:: ||||-------||| |:|:||---
27,720 GGAGCCCTGGGCCCTACGCTGAGTGTCAGGCTGGGGAGCCAGGGTCTCTT
: . : . : . : . : .
: . : .
36,076 AAGGGTGGCTGT GAATGCCAGG
---------------------- ||:| ||||||------::||||:---
27,770 TCCTGCAGAAGCGATTCTTTCCCAGAGGGGCTGTTGGAGCAGATGCT
: . : . : . : . : .
: . : . :
36,098 CCAGAGATC CCAATGCTGTGGGCC
|| ||: ||-------------|||:| || |||:::-------------
27,817 CCTGAACTCTCCGCCCCTTTAACCAGTCCTTTGGATTTATTTTTATTATT
: . : . : . : . : .
. : . :
36,122 AAGAGGGGTCCAGAGGCTGT
------------------------ | | |||| : | |||:------
27,867 TTTAAATATTTAATTATGTTTATGTATATGGGTGTTTTGCCTGCTTGTAT
: . : . : . : . : .
. : . :
36,142 CCTCCTTCCAGAAG AAATAAGG C
------------------||| | |||||:|--|:| ||||-------|
27,917 GTATATGCATCATGTGTGCCTGGTGCCAGAGGTCAGAAAAGGGTACCACC
: . : . : . : . : .
. : . : . : .
36,165 TTCTCTGGT TGTTGCTCAAACATTCCCTGAACTC
|:|:|||||----------- || |:|| ||| : || || :|:|-----
27,967 TCCCCTGGTACTGGAATTTAGGTAGTTCTAACCCACCATGTGCCCATGTG
: . : . : . : . : .
: . :
36,199 TCAGC CCCTCCTA
----------- ||||--------------------------|||| ::|
28,017 CCCACCAGTGGACAGCAAGTGAGAGCCGACTCTCTCTTCCTACCCTGTCA
: . : . : . : . : .
. : . : .
36,212 ACTCTAGGTTTTAA GGAGTAAAGCT
:|:|:||| | ||------------------| |||||: ||-------
28,067 GCCCCAGGGATGAACTCAGGCTGCCAGGCTGGGTAGTAAGTCTCTACCCA
: . : . : . : . : .
: . : . : .
36,237 TCCTTTTGGGTTCCTGAAGCTGGCAGTTGGGGT
-----------------:|:|||::||:| : | || ||-----|||| |
28,117 CTGAGCATCTCACAGGCCCTTTTCAGGCTGTGGCAGGTG TGGGCT
: . : . : . : . :
: . : . : . : . : .
36,270 GAGAGCAGATGAGATGGAAGAGGGCTCATCAGACACTGGCCTTGGAGG
| |:|::|:||||:||::::|||:||||| |||::|::--| ||::||--
28,162 GTGGGTGGGTGAGGTGAGGAAGGACTCATGAGATGCCA CATGAGGGGT
. : . : . : . : .
: . : . : . : . : .
36,318 GTGCTGGCCTCTGCAGAACGCCAGCATCTTCTCAGAATCGTATGTTCTA
-|||||||:| |::|||:|-----------------------------||
28,210 TGTGCTGGTCACCACAGGA TA
: . : . :
: . : . : . : . : .
36,367 GAAGCCTG GGCGAAGTCCGGCTAATTGTGGACTTGGGGAAAATAAGGCC
||:| | |-:|:|::|:|| || :: || |||: |||||-| ::|::|||
28,231 GAGGGCAGCAGTGGGGCCCTGCAGGGTGAGGATGTGGGG ATGCAGAGCC
. : . : . : . : .
: . : . : . : . : .
36,416 CAACCCCTGTTTTTGCAAGGTTAAGGAGAAATAATCTTAAACCAGTCACA
||:-|: ||: ||| || | ||--||||| :||| |||:|||||||||
28,280 CAG CTGTGCAGTTGATAGTTAAA AGAAAAGATCATAAGCCAGTCACA
: . : . : . : . : .
: . : . : . : . : .
36,466 CAAATCATCGGCATTTATTTCCTGGGTCCTAGGTGTCACTTATCCTGGTG
||||:||:|:||||||||||||||||-||||:||:||| |:-||:||||:
28,327 CAAACCACCAGCATTTATTTCCTGGG CCTAAGTATCAGTC TCTTGGTA
: . : . : . : . :
: . : . : . : . :
36,516 GACAGGGCAGAGG TGGTCAGATCGTTTTGAGCCAAAATCCCTTCCCTA
| ||||||:||:--||||||| | ||||||||:| ||||||:|||||-|
28,375 CAAAGGGCAAAGAATTGGTCAGTTAGTTTTGAGTCCAAATCCTTTCCC A
. : . : . : . : . :
. : . : . : . : .
36,564 AAAATGGATCTGTGGAGCTCCATGAGGGAACCTCAGAGATGCACA A
||| ||||||||||::|||||||||||:||-||:|||| ||||:|----|
28,424 AAACTGGATCTGTGAGGCTCCATGAGGAAA CTTAGAGCTGCATAGATCA
. : . : . : . : . :
: . : . :
36,610 TGACAGTTTAGC TAAAATGGCTT
| | |::| |||-| |||:|||||--------------------------
28,473 TCAGAACTGAGCTTTAAACGGCTTTCAAAACAAAACCAAAACCAAAAACC
. : . : . : . : . :
. : . : . : . : . :
36,633 AAAAAATGTGAATTGATTGTCAGCTCTCTCCATATCTGCTGAAAAAAG
--|| |:| | :|| |||||||||||||:|::|-| |||-||||||||:-
28,523 AAAACAGAAGAAAAGTGATTGTCAGCTCCCCTC TCTCT CTGAAAAAG
. : . : . : . : .
. : . : . : . : . :
36,681 GTTTAAAATTTTTAAAAAGTTTAAAAGTGTTTTCTAAAAAAGGGACAAGC
---| | :||||||||||:| |||||| | :||-|||||||||| :|:|
28,570 TTATGTTTTTAAAAAATGTAAAAGAGGCTT TAAAAAAGGGCTAGGA
: . : . : . : . : .
.
36,731 AGGTCT
||:|||
28,616 AGATCT
:
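The summary lines of the three alignments shown above are consistent with a Match Percentage equal to the number of matches divided by the sum of matches, mismatches and total gap length, rounded down. This relation is inferred from the listing and is not a documented formula; the sketch below simply checks it against the reported figures. The function name `match_percentage` is hypothetical.

```python
def match_percentage(matches, mismatches, gap_length):
    """Matches as a whole-number percentage of all alignment columns (inferred)."""
    return 100 * matches // (matches + mismatches + gap_length)


# (matches, mismatches, total gap length) for alignments 1, 2 and 16 above.
reported = [(446, 239, 212), (370, 217, 172), (1622, 718, 1066)]
print([match_percentage(m, mm, g) for m, mm, g in reported])  # [49, 48, 47]
```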
| Program name | Description |
|---|---|
| blast2seq | Finds local alignments between two sequences, using BLAST |
| lfasta | Finds local alignments between two sequences, using FASTA |
| matcher | Finds the best local alignments between two sequences |
| seqmatchall | All-against-all comparison of a set of sequences |
| sim_lav | Nonintersecting best local alignments, makes LALNVIEW file |
| supermatcher | Match large sequences against one or more other sequences |
| water | Smith-Waterman local alignment |
| wordfinder | Match large sequences against one or more other sequences |
| wordmatch | Finds all exact matches of a given size between 2 sequences |
| maskseq | Mask off regions of a sequence. |
| est2genome | Align EST and genomic DNA sequences |
| sim4 | Align an mRNA to a genomic DNA sequence |
| sim4_lav | Same as sim4, but by default makes a file for LALNVIEW |
The program blastz itself, the Java programs LAJ and LAT, the PipTools suite (from which genbank2exons and genbank2repeats are used) and the PipServer distribution (which is used to make the PostScript/PDF output) were all developed by the group of Webb Miller (webb@bio.cse.psu.edu) at the Penn State University Center for Comparative Genomics and Bioinformatics.