lfasta

 

Function

Finds local alignments between two sequences, using fastA

Description

lfasta is an EMBOSS "wrapper" program for the programs lfasta and plfasta from Pearson's fastA package, version 2. lfasta and plfasta compare two protein or DNA sequences for local similarity. fasta reports only the best alignment between the query sequence and a library sequence. lfasta and plfasta instead report all alignments between the two sequences whose scores exceed a cut-off value. lfasta shows the actual local alignments and their scores, while plfasta produces a plot of the alignments that resembles a 'dotmatrix' homology plot.

By default, the lfasta wrapper produces text output. With the option -psplot it instead runs plfasta and produces a graphic in PostScript format. If you work under the X Window System, the PostScript file is automatically opened with ghostview; use -noghostview to prevent this.

Algorithm

Usage

Here is a sample session with lfasta:

> lfasta
Finds local alignments between two sequences, using fastA
Input sequence: embl:x00066
Second sequence: embl:k00153
Word (ktup) size [6]: 4
Output file [x00066.lfasta]:


Here is a second example, with graphical output:

> lfasta -psplot
Finds local alignments between two sequences, using fastA
Input sequence: embl:x00066
Second sequence: embl:k00153
Word (ktup) size [6]: 4
Output file [x00066.lfasta.ps]:


Command line arguments

   Standard (Mandatory) qualifiers:
  [-asequence]         sequence   Sequence filename and optional format, or
                                  reference (input USA)
  [-bsequence]         sequence   Sequence filename and optional format, or
                                  reference (input USA)
   -wordsize           integer    [2 for protein, 6 for nucleic] Word (ktup)
                                  size (Integer 1 or more)
  [-outfile]           outfile    [*.lfasta] Output file name

   Additional (Optional) qualifiers (* if not always prompted):
*  -matrix             menu       [BL50] Amino acid comparison matrix (Values:
                                  BL50 (BLOSUM50); BL62 (BLOSUM62); 250
                                  (PAM250))
   -gapopen            integer    [12 for protein, 16 for nucleic] Gap opening
                                  penalty (Integer 0 or more)
   -gapextend          integer    [2 for protein, 4 for nucleic] Gap extension
                                  penalty. fastA subtracts from the
                                  similarity score for each gap a penalty of
                                  type <Gap opening penalty> + <Gap extension
                                  penalty> * (n - 1) (Integer 0 or more)
*  -format             menu       [0] Alignment format (Values: 0 (default); 1
                                  (x = conservative replacements, X =
                                  non-conservative substitutions); 2 (show
                                  only residues in sequence 2 that differ from
                                  sequence 1); 10 (write alignments in
                                  parsable format))

   Advanced (Unprompted) qualifiers:
   -psplot             boolean    Make PostScript file with dotplot-like
                                  graphic instead of writing alignment
   -linesize           integer    [60] Number of residues per line of the
                                  alignment (Integer from 10 to 200)
   -[no]ghostview      boolean    [Y] Open PostScript file with Ghostview

   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2            integer    Start of the sequence to be used
   -send2              integer    End of the sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-outfile" associated qualifiers
   -odirectory3        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers

  [-asequence] (Parameter 1)
      Sequence filename and optional format, or reference (input USA)
      Allowed values: Readable sequence
      Default: Required

  [-bsequence] (Parameter 2)
      Sequence filename and optional format, or reference (input USA)
      Allowed values: Readable sequence
      Default: Required

  -wordsize
      Word (ktup) size
      Allowed values: Integer 1 or more
      Default: 2 for protein, 6 for nucleic

  [-outfile] (Parameter 3)
      Output file name
      Allowed values: Output file
      Default: <sequence>.lfasta (<sequence>.lfasta.ps with -psplot)

Additional (Optional) qualifiers

  -matrix
      Amino acid comparison matrix
      Allowed values: BL50 (BLOSUM50); BL62 (BLOSUM62); 250 (PAM250)
      Default: BL50

  -gapopen
      Gap opening penalty
      Allowed values: Integer 0 or more
      Default: 12 for protein, 16 for nucleic

  -gapextend
      Gap extension penalty. For each gap of n residues, fastA subtracts
      from the similarity score a penalty of
      <Gap opening penalty> + <Gap extension penalty> * (n - 1)
      Allowed values: Integer 0 or more
      Default: 2 for protein, 4 for nucleic

  -format
      Alignment format
      Allowed values: 0 (default); 1 (x = conservative replacements,
      X = non-conservative substitutions); 2 (show only residues in
      sequence 2 that differ from sequence 1); 10 (write alignments in
      parsable format)
      Default: 0

Advanced (Unprompted) qualifiers

  -psplot
      Make PostScript file with dotplot-like graphic instead of writing alignment
      Allowed values: Boolean value Yes/No
      Default: No

  -linesize
      Number of residues per line of the alignment
      Allowed values: Integer from 10 to 200
      Default: 60

  -[no]ghostview
      Open PostScript file with Ghostview
      Allowed values: Boolean value Yes/No
      Default: Yes
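
The -wordsize (ktup) qualifier controls the first, word-matching step of the fastA method. In that step, identical words of length ktup shared by the two sequences are located through a lookup table and grouped by diagonal. The Python sketch below is a simplified illustration of that idea under our own assumptions. The function name diagonal_word_hits is hypothetical, and the sketch is not the lfasta implementation: it omits the later rescoring and alignment steps. It shows why a larger word size yields fewer, more specific hits.

```python
from collections import Counter, defaultdict


def diagonal_word_hits(seq_a: str, seq_b: str, ktup: int) -> Counter:
    """Count identical ktup-long words per diagonal (offset i - j).

    Simplified illustration of the word-lookup step only (assumption).
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    # Lookup table: word -> start positions in seq_a.
    table = defaultdict(list)
    for i in range(len(seq_a) - ktup + 1):
        table[seq_a[i:i + ktup]].append(i)
    # Scan seq_b and tally each shared word on its diagonal.
    hits = Counter()
    for j in range(len(seq_b) - ktup + 1):
        for i in table.get(seq_b[j:j + ktup], ()):
            hits[i - j] += 1
    return hits


# Opening residues of the first X00066 / K00153 alignment in the example output.
a = "GCGCCCGTAGCTCAGCTGGATAGAGCG"
b = "GCATCCGTAGCTCAGCTGGATAGAGTA"
hits4 = diagonal_word_hits(a, b, 4)
hits6 = diagonal_word_hits(a, b, 6)
print(hits4[0], hits6[0])  # hits on the main diagonal; prints: 18 16
```

Every 6-residue word match also contains a 4-residue word match. The total number of hits can therefore only shrink as ktup grows, which is why a larger word size makes the comparison faster but less sensitive.
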

Input file format

lfasta needs two input sequences of the same type (both nucleic acid or both protein), which you can specify with any valid USA (Uniform Sequence Address). There is, however, a built-in limit of 20,000 bases or amino acids per sequence.
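
Because of the 20,000-residue limit and the same-type requirement, a script may want to check its inputs before calling lfasta. The minimal sketch below assumes the sequences are already available as plain strings; reading USAs is not shown. The names check_lfasta_inputs, looks_nucleic and MAX_RESIDUES are hypothetical. The nucleotide test (at least 90% A/C/G/T/U/N) is our own heuristic, not EMBOSS behaviour, and could misclassify unusual proteins.

```python
MAX_RESIDUES = 20_000  # built-in lfasta limit stated in this document

NUCLEOTIDE_CHARS = set("ACGTUN")


def looks_nucleic(seq: str, threshold: float = 0.9) -> bool:
    """Heuristic (assumption): mostly A/C/G/T/U/N means nucleic acid."""
    seq = seq.upper()
    if not seq:
        return False
    return sum(c in NUCLEOTIDE_CHARS for c in seq) / len(seq) >= threshold


def check_lfasta_inputs(seq_a: str, seq_b: str) -> list[str]:
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = []
    for name, seq in (("first", seq_a), ("second", seq_b)):
        if len(seq) > MAX_RESIDUES:
            problems.append(
                f"{name} sequence has {len(seq)} residues (limit {MAX_RESIDUES})"
            )
    if looks_nucleic(seq_a) != looks_nucleic(seq_b):
        problems.append("sequences appear to be of different types")
    return problems
```

For example, check_lfasta_inputs("ACGTACGT", "MKVLHEW") reports a type mismatch, while two short DNA strings pass.
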

Output file format

By default lfasta writes its output in text format; with the option -psplot it produces a graphic in PostScript format instead.

Note that the local alignment of a sequence with itself (or with an identical sequence) produces "mirror-image" alignments as well as a full identity alignment. In the text output lfasta reports only one half of the local alignments. In the graphical output (option -psplot) it draws all of them, including the full diagonal.
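
The mirror-image effect follows from symmetry. If the word at position i of a sequence matches the word at position j of the same sequence, then the word at j also matches the one at i. Every off-diagonal match therefore appears twice, once on each side of the main diagonal. The small Python check below illustrates that geometry only. The function self_word_matches is hypothetical and is not lfasta's code.

```python
def self_word_matches(seq: str, ktup: int) -> set[tuple[int, int]]:
    """All (i, j) with identical ktup-words at i and j of the same sequence."""
    words = [seq[k:k + ktup] for k in range(len(seq) - ktup + 1)]
    return {
        (i, j)
        for i, wi in enumerate(words)
        for j, wj in enumerate(words)
        if wi == wj
    }


seq = "GGTTCGAATCCGGTTCGAATCC"  # an 11-residue segment repeated twice
matches = self_word_matches(seq, 4)
off_diagonal = {(i, j) for i, j in matches if i != j}
# Every off-diagonal match has a mirror image across the main diagonal.
assert all((j, i) in matches for i, j in off_diagonal)
# Keeping only i < j gives one half, as in the text output.
upper_half = {(i, j) for i, j in off_diagonal if i < j}
print(len(off_diagonal), len(upper_half))
```
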

Output files for usage example

File: x00066.lfasta

 LFASTA compares two sequences
 v2.1u00 Mar, 2001
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

 searching embl-id:K00153 library
 Comparison of:
(A) embl-id:X00066 X00066 X00066.1 Salmonella typhimurium hisR gene   - 972 nt
(B) embl-id:K00153 K00153 K00153.1 E.coli Arg-tRNA-2.                 - 77 nt
 using matrix file DNA

 74.026% identity in 77 nt overlap; init:  203, opt:  205

        310       320       330       340       350       360      
X00066 GCGCCCGTAGCTCAGCTGGATAGAGCGCTGCCCTCCGGAGGCAGAGGTCTCAGGTTCGAA
       ::  X::::::::::::::::::::  ::   :: :: :   :: ::::  :::::::::
K00153 GCATCCGTAGCTCAGCTGGATAGAGTACTCGGCTGCGAACCGAGCGGTCGGAGGTTCGAA
               10        20        30        40        50        60

        370       380   
X00066 TCCTGTCGGGCGTACCA
       ::::  :::  : :::X
K00153 TCCTCCCGGATGCACCA
               70       

----------

 72.131% identity in 61 nt overlap; init:  152, opt:  152

      670       680       690       700       710       720        
X00066 GTAGCGCAGCTTGGTAGCGCAACTGGTTTGGGACCAGTGGGTCGGAGGTTCGAATCCTCT
       X:::: ::::: : ::: : :   :: :  : :::    :::::::::::::::::::: 
K00153 GTAGCTCAGCTGGATAGAGTACTCGGCTGCGAACCGAGCGGTCGGAGGTTCGAATCCTCC
         10        20        30        40        50        60      

        
X00066 C
       X
K00153 C
        

----------

 73.214% identity in 56 nt overlap; init:   67, opt:  133

          450        460       470       480       490         
X00066 TAGCTCAGTTGG-TAGAGCCCTGGATTGTGATTCCAGTTGTCGTGGGTTCGAATCC
       :::::::: ::: :::::  :: :  :: ::  : ::  X:::  ::::::::::X
K00153 TAGCTCAGCTGGATAGAGTACTCGGCTGCGAACCGAGCGGTCGGAGGTTCGAATCC
        10        20        30        40        50        60   

----------

 71.429% identity in 28 nt overlap; init:   62, opt:   68

            600       610       620
X00066 GGGGGTTCAAGTCCCCCCCCTCGCACCA
       :: X:::: : ::: :::    :::::X
K00153 GGAGGTTCGAATCCTCCCGGATGCACCA
      50        60        70       

----------
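
When lfasta text output is post-processed in a script, the summary line that heads each alignment (for example "74.026% identity in 77 nt overlap; init:  203, opt:  205") is the easiest part to extract. The minimal Python sketch below is written against the sample output above only. The regular expression is therefore an assumption about the format, which may differ in other fastA versions or with -format 10. The names SUMMARY and parse_summaries are hypothetical.

```python
import re

# Pattern written against the example output in this document (assumption).
SUMMARY = re.compile(
    r"(?P<identity>\d+(?:\.\d+)?)% identity in (?P<overlap>\d+) "
    r"(?P<unit>\w+) overlap; init:\s*(?P<init>\d+), opt:\s*(?P<opt>\d+)"
)


def parse_summaries(text: str) -> list[dict]:
    """Return one dict per alignment summary line found in lfasta text output."""
    results = []
    for m in SUMMARY.finditer(text):
        identity = float(m["identity"])
        overlap = int(m["overlap"])
        results.append({
            "identity": identity,
            "overlap": overlap,
            "unit": m["unit"],
            "init": int(m["init"]),
            "opt": int(m["opt"]),
            # Identical positions implied by the percentage (74.026% of 77 -> 57).
            "identical": round(identity * overlap / 100),
        })
    return results


sample = """
 74.026% identity in 77 nt overlap; init:  203, opt:  205
 72.131% identity in 61 nt overlap; init:  152, opt:  152
"""
for row in parse_summaries(sample):
    print(row["identical"], row["overlap"], row["opt"])
```
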

Output files for usage example 2

Graphics File: x00066.lfasta.ps

[lfasta -psplot result]

Data files

The amino acid comparison matrices used to compare proteins are hard coded in the program and cannot be changed; you can only choose among the built-in matrices BL50, BL62 and 250 (PAM250) with -matrix.

Notes

None.

References

See the references for fasta.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.
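
Because the exit status is always 0, a script calling lfasta cannot rely on the return code to detect failure. Checking that the output file was written and is non-empty is a more useful test. The minimal Python sketch below uses only qualifiers listed under Command line arguments (-asequence, -bsequence, -wordsize, -outfile, -auto). The helper names lfasta_command and run_lfasta are hypothetical, and the sketch assumes lfasta is on the PATH.

```python
import subprocess
from pathlib import Path
from typing import Optional


def lfasta_command(asequence: str, bsequence: str, outfile: str,
                   wordsize: Optional[int] = None) -> list[str]:
    """Build a non-interactive lfasta command line (-auto turns off prompts)."""
    cmd = ["lfasta", "-asequence", asequence, "-bsequence", bsequence,
           "-outfile", outfile, "-auto"]
    if wordsize is not None:
        cmd += ["-wordsize", str(wordsize)]
    return cmd


def run_lfasta(asequence: str, bsequence: str, outfile: str,
               wordsize: Optional[int] = None) -> Path:
    """Run lfasta and check the output file, since the exit status is always 0."""
    out = Path(outfile)
    if out.exists():
        out.unlink()  # avoid mistaking a stale file for fresh output
    subprocess.run(lfasta_command(asequence, bsequence, outfile, wordsize),
                   check=False)
    if not out.exists() or out.stat().st_size == 0:
        raise RuntimeError(f"lfasta produced no output in {outfile}")
    return out


# Non-interactive equivalent of the first usage example (requires EMBOSS):
# run_lfasta("embl:x00066", "embl:k00153", "x00066.lfasta", wordsize=4)
```
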

Known bugs

None.

See also

Program name   Description
blast2seq      Finds local alignments between two sequences, using BLAST
blastz         Nonintersecting best local alignments, makes LAJ file
matcher        Finds the best local alignments between two sequences
seqmatchall    All-against-all comparison of a set of sequences
sim_lav        Nonintersecting best local alignments, makes LALNVIEW file
supermatcher   Match large sequences against one or more other sequences
water          Smith-Waterman local alignment
wordfinder     Match large sequences against one or more other sequences
wordmatch      Finds all exact matches of a given size between 2 sequences
fasta          fastA search of query sequence(s) against sequence search set
fasts          Protein identification from peptides using fastA algorithm

Author(s)

The wrapper application lfasta was written by Guy Bottu (gbottu@vub.ac.be)
BEN, ULB, Brussels, Belgium

The programs lfasta and plfasta themselves were written by
William R. Pearson
Department of Biochemistry
Box 440, Jordan Hall
U. of Virginia
Charlottesville, VA

wrp@virginia.EDU

History

Completed 23 August 2002
Modified 20 March 2003 - adapted to fastA version 2.0u66
Modified 24 October 2005 - adapted to fastA version 2.1u1

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.