ebi_blast

 

Function

WU-BLAST search of query sequence against sequence databank using EBI Web Services

Description

ebi_blast uses the Web services of the EBI to submit a sequence for a similarity search by BLAST against one of the databanks available at the EBI.

At the EMBL-EBI you will find a collection of databanks different from that at the BEN site. Among other things, you can search the Alternative Splicing Database and a collection of parasite genomes.

The BLAST used at the EBI is not NCBI BLAST but the BLAST of Washington University (WU-BLAST), which has some differences (see below). The implementation at the EBI does not let users set the individual parameters that affect speed and sensitivity; instead it offers a -sensitivity selector with a series of pre-set parameter combinations labelled "very low", "low", "medium", "normal" and "high".

Algorithm

ebi_blast relies on SOAP-based interaction between the Perl client wublast.pl and the Web server at http://www.ebi.ac.uk/Tools/webservices/wsdl/WSWUBlast.wsdl, which provides access to programs from the WU-BLAST package.

You can find a detailed explanation of the algorithm and statistics of NCBI BLAST in the on-line help for blast. WU-BLAST differs from it in a number of ways. The most notable differences are :

WU-BLAST at the EBI allows the users to choose the following settings that go from low sensitivity + high speed to high sensitivity + low speed :

for nucleic acid searches (blastn) :
sensitivity                     very low  low       medium    normal    high
word size (W)                   14        12        11        11        9
word offset                     2         1         1         1         1
match score (M)                 +1        +1        +1        +5        +5
mismatch score (N)              -3        -3        -3        -4        -4
distance for two hit algorithm  30        40        (off)     (off)     (off)
maximum number of HSPs stored   1000      1000      1000      1000      no limit
gap opening penalty (Q)         3         3         3         10        10
gap extension penalty (R)       3         3         3         10        10
bandwidth for gapped extension  16        16        16        16        24


for protein searches :
sensitivity                     very low  low       medium    normal    high
word size (W)                   5         3         3         3         3
word offset                     2         1         1         1         1
neighbouring threshold (T)      (off)     (off)     11        11        11
distance for two hit algorithm  30        40        40        (off)     (off)
maximum number of HSPs stored   1000      1000      1000      1000      no limit
(for proteins the bandwidth for gapped extension is always 32)
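
For scripts that need these presets as data, the blastn table above can be transcribed into a lookup structure. The Python sketch below is only an illustration: the dictionary name, the key names and the use of None for "(off)" and "no limit" are assumptions of this example and not part of ebi_blast. The outer keys are the values accepted by the -sensitivity qualifier.

```python
# Transcription of the blastn sensitivity presets from the table above.
# Key names are illustrative; None stands for "(off)" or "no limit".
BLASTN_PRESETS = {
    "vlow":   {"W": 14, "word_offset": 2, "M": 1, "N": -3,
               "two_hit_distance": 30, "max_hsps": 1000,
               "Q": 3, "R": 3, "bandwidth": 16},
    "low":    {"W": 12, "word_offset": 1, "M": 1, "N": -3,
               "two_hit_distance": 40, "max_hsps": 1000,
               "Q": 3, "R": 3, "bandwidth": 16},
    "medium": {"W": 11, "word_offset": 1, "M": 1, "N": -3,
               "two_hit_distance": None, "max_hsps": 1000,
               "Q": 3, "R": 3, "bandwidth": 16},
    "normal": {"W": 11, "word_offset": 1, "M": 5, "N": -4,
               "two_hit_distance": None, "max_hsps": 1000,
               "Q": 10, "R": 10, "bandwidth": 16},
    "high":   {"W": 9, "word_offset": 1, "M": 5, "N": -4,
               "two_hit_distance": None, "max_hsps": None,
               "Q": 10, "R": 10, "bandwidth": 24},
}


def describe(sensitivity):
    """Return a one-line summary of the scoring parameters of a preset."""
    preset = BLASTN_PRESETS[sensitivity]
    return "W={W} M={M} N={N} Q={Q} R={R}".format(**preset)


print(describe("normal"))  # prints: W=11 M=5 N=-4 Q=10 R=10
```

The "normal" values agree with the parameters reported in the example output further down (W 11, matrix +5,-4, Q=10, R=10).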

The submission and retrieval procedure

ebi_blast submits the request in asynchronous mode and receives a job ID from the server at the EBI. On the server side the job is placed in a waiting queue, and once it has completed the results are kept for some time. ebi_blast waits 1 minute and then asks the server for the status of the submitted job (RUNNING/ERROR/DONE). As long as the status is "RUNNING", ebi_blast repeats the check at increasing intervals of time (up to 29 hours and 55 min). Once the status is "DONE", ebi_blast sends a request to retrieve the result.
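
The polling procedure can be sketched in Python as follows. This is not the code of ebi_blast: the function name, the growth factor of 1.5 and the exact cut-off are assumptions chosen for illustration, since the actual schedule of intervals is not documented here. The status check is passed in as a function so that the sketch runs without contacting any server.

```python
import time

MAX_WAIT_SECONDS = 30 * 3600  # assumed overall limit of about 30 hours


def poll_until_done(check_status, first_wait=60, growth=1.5,
                    max_total=MAX_WAIT_SECONDS, sleep=time.sleep):
    """Poll check_status() until it no longer returns "RUNNING".

    Returns the final status ("DONE" or "ERROR"), or "TIMEOUT" when the
    next wait would push the total waiting time beyond max_total seconds.
    """
    waited = 0.0
    wait = float(first_wait)
    while True:
        if waited + wait > max_total:
            return "TIMEOUT"
        sleep(wait)
        waited += wait
        status = check_status()
        if status != "RUNNING":
            return status
        wait *= growth  # hypothetical schedule: lengthen each interval


if __name__ == "__main__":
    # Demonstration with a fake server and no real sleeping.
    fake_statuses = iter(["RUNNING", "RUNNING", "DONE"])
    print(poll_until_done(lambda: next(fake_statuses), sleep=lambda s: None))
```

Lengthening the interval keeps the number of status requests small for long-running jobs while still returning quickly for short ones.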

Usage

Here is a sample session with ebi_blast

> ebi_blast
WU-BLAST search of query sequence against sequence databank using EBI Web
Services
         1 : blastn (nuc against nuc)
         2 : blastp (prot against prot)
         3 : blastx (nuc translated against prot)
         4 : tblastn (prot against nuc translated)
         5 : tblastx (nuc translated against nuc translated)
Select type of search you want to run [2]: 1
Query sequence: embl:x15320
   general : general databank
  parasite : parasite genome databank (nucleic acid db only !)
       ASD : alternative splicing databank
Databank type [general]: parasite
       nem : Nematoda except C. elegans
       fil : Filaria
    brugia : Brugia malayi
      oncv : Onchocerca volvulus
    apicom : Apicomplexa
      plas : Plasmodium
     plasf : Plasmodium falciparum
      toxo : Toxoplasma gondii
    crypto : Cryptosporidium parvum
   eimeria : Eimeria
    kineto : Kinetoplastids
   schisto : Schistosoma
    schunq : David Johnson's unique Schistosoma
   mansoni : Schistosoma mansoni
       jap : Schistosoma japonicum
    japunq : David Johnson' unique Schistosoma japonicum
   cercunq : David Johnson's unique Schistosoma cercarial stage
  entamoeba : entamoeba
   bgESTnr : nonredundant big EST databank
Parasite genome databank [nem]: plasf
Output file [x15320.ebi_blastn]
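
The same search can presumably be started without prompts by giving the answers as qualifiers. The Python sketch below only assembles and prints such a command line from qualifiers documented on this page (-program, -sequence, -dbtype, -paranucdb, -outfile, -auto); the helper name build_ebi_blast_command is hypothetical, and the command is not executed.

```python
import shlex


def build_ebi_blast_command(sequence, program, dbtype, db_qualifier, db,
                            outfile):
    """Assemble an ebi_blast argument list; -auto turns off prompts."""
    return ["ebi_blast",
            "-program", str(program),
            "-sequence", sequence,
            "-dbtype", dbtype,
            db_qualifier, db,
            "-outfile", outfile,
            "-auto"]


cmd = build_ebi_blast_command("embl:x15320", 1, "parasite",
                              "-paranucdb", "plasf", "x15320.ebi_blastn")
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Which databank qualifier applies depends on -dbtype and on the search type, for example -gennucdb or -genprotdb for the general databanks.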


Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
   -program            menu       [2] Search type : nuc. or prot. (Values: 1
                                  (blastn (nuc against nuc)); 2 (blastp (prot
                                  against prot)); 3 (blastx (nuc translated
                                  against prot)); 4 (tblastn (prot against nuc
                                  translated)); 5 (tblastx (nuc translated
                                  against nuc translated)))
  [-sequence]          sequence   Query sequence
   -dbtype             menu       [general] Databank type (Values: general
                                  (general databank); parasite (parasite
                                  genome databank (nucleic acid db only !));
                                  ASD (alternative splicing databank))
*  -gennucdb           menu       [em_rel] General nucleic acid sequence
                                  databank (Values: em_rel (EMBL databank last
                                  release without EST, GSS, HTG, HTC);
                                  em_rel_hum (EMBL human sequences);
                                  em_rel_mus (EMBL mouse sequences);
                                  em_rel_rod (EMBL other rodents sequences);
                                  em_rel_mam (EMBL other mammal sequences);
                                  em_rel_vrt (EMBL other vertebrate
                                  sequences); em_rel_inv (EMBL invertebrate
                                  sequences); em_rel_pln (EMBL plant
                                  sequences); em_rel_fun (EMBL fungi
                                  sequences); em_rel_pro (EMBL prokaryote
                                  sequences); em_rel_phg (EMBL bacteriophage
                                  sequences); em_rel_vrl (EMBL other viral
                                  sequences); em_rel_env (EMBL environmental
                                  sample sequences); em_rel_tgn (EMBL
                                  transgenic sequences); em_rel_syn (EMBL
                                  synthetic sequences); em_rel_unc (EMBL
                                  unclassified sequences); em_rel_std (EMBL
                                  standard sequences); em_rel_htg (EMBL High
                                  Throughput Genome sequences); em_rel_htc
                                  (EMBL High Throughput cDNA sequences);
                                  em_rel_pat (EMBL patent sequences);
                                  em_rel_sts (EMBL Sequence Tagged Site
                                  sequences); em_rel_est (EMBL Expressed
                                  Sequence Tags); em_rel_gss (EMBL Genome
                                  Survey Sequences); em_rel_tsa (EMBL shotgun
                                  Transcriptome Assembly); em_rel_tpa (EMBL
                                  Third Party Annotations); em_rel_std_hum
                                  (EMBL human standard sequences);
                                  em_rel_std_mus (EMBL mouse standard
                                  sequences); em_rel_std_rod (EMBL other

  [some lines have been deleted for brevity]

                                  em_rel_tpa_syn (EMBL synthetic Third Party
                                  Annotations); em_rel_tpa_unc (EMBL
                                  unclassified Third Party Annotations); emcds
                                  (EMBL coding sequences); emvec (EMBL vector
                                  subset); imgtligm (IMGT/LIGM databank);
                                  imgthla (IMGT/HLA databank); hgvbase
                                  (HGVBASE (European SNP databank)))
*  -genprotdb          menu       [uniprot] General protein sequence databank
                                  (Values: uniprot (UniProt databank);
                                  uniref100 (UniRef100); uniref100_seg
                                  (UniRef100 SEG filtered); uniref90
                                  (UniRef90); uniref50 (UniRef50); uniparc
                                  (UniParc); swissprot (UniProt/SwissProt);
                                  ipi (IPI (International Protein Index));
                                  prints (sequences from PRINTS); pdb
                                  (sequences from PDB); sgt (MSD structural
                                  genomics targets); intact (sequences from
                                  IntAct); imgthlap (IMGT/HLA proteins); epop
                                  (European patent sequences); jpop (Japanese
                                  patent sequences); kpop (Korean patent
                                  sequences); uspop (U.S.A. patent sequences))
*  -paranucdb          menu       [nem] Parasite genome databank (Values: nem
                                  (Nematoda except C. elegans); fil (Filaria);
                                  brugia (Brugia malayi); oncv (Onchocerca
                                  volvulus); apicom (Apicomplexa); plas
                                  (Plasmodium); plasf (Plasmodium falciparum);
                                  toxo (Toxoplasma gondii); crypto
                                  (Cryptosporidium parvum); eimeria (Eimeria);
                                  kineto (Kinetoplastids); schisto
                                  (Schistosoma); schunq (David Johnson's
                                  unique Schistosoma); mansoni (Schistosoma
                                  mansoni); jap (Schistosoma japonicum);
                                  japunq (David Johnson' unique Schistosoma
                                  japonicum); cercunq (David Johnson's unique
                                  Schistosoma cercarial stage); entamoeba
                                  (entamoeba); bgESTnr (nonredundant big EST
                                  databank))
*  -paraprotdb         menu       [X] Parasite proteome databank (currently
                                  nothing available !)
*  -asdnucdb           menu       [altsgen] Alternatively spliced nucleic acid
                                  databank (Values: altsgen (AltSplice
                                  confirmed genes); altsiso (AltSplice
                                  confimed isoforms); aedb (AEDB alternative
                                  exons))
*  -asdprotdb          menu       [apdb] Alternatively spliced protein
                                  databank (Values: apdb (ASD peptides))
  [-outfile]           outfile    [*.ebi_blast] Output file name

   Additional (Optional) qualifiers (* if not always prompted):
*  -strand             selection  [both] Strand to search. By default BLAST
                                  searches both strands, but for blastn and
                                  (t)blastx you can choose to search only the
                                  top or bottom strand of the databank
                                  respectively query sequence.
   -sensitivity        menu       [normal] Sensitivity. This selector sets a
                                  pre-selected set of parameters, see on-line
                                  manual. Note the trade-off between
                                  sensitivity and speed. (Values: vlow (very
                                  low); low (low); medium (medium); normal
                                  (normal); high (high))
   -expect             float      [10.0] E() value = number of databank
                                  sequences with same or higher score that you
                                  expect to find by chance. BLAST lists
                                  sequences with an E() value lower than the
                                  cutoff. (Number from 0.000 to 1000.000)
*  -matrix             selection  [6] Amino acid comparison matrix
*  -nucfilter          selection  [none] Algorithm for removing low complexity
                                  segments out of nucleic acid query sequence
*  -protfilter         selection  [none] Algorithm for removing low complexity
                                  segments out of protein query sequence
                                  (eventually after translation)
   -statistics         menu       [sump] Algorithm for assessing statistical
                                  significance of multiple hits (Values: sump
                                  (sum statistics); poisson (Poisson
                                  statistics); kap (none (individual
                                  Karlin-Altschul score)))
   -listsize           integer    [100] Show only the n best scoring sequences
                                  that satisfy E() cutoff (Integer 0 or more)
   -align              integer    [50] Show only alignments for the n first
                                  sequences (Integer 0 or more, but not >
                                  listsize)
   -sort               menu       [pvalue] Method for sorting reported
                                  databank sequences in output (by default
                                  according to combined P-value for all hits)
                                  (Values: pvalue (P-value); count (number of
                                  hits); highscore (score of best hit);
                                  totalscore (sum of scores))
   -topcombon          integer    [0] Report for each databank sequence only
                                  up to n compatible top scoring combinations
                                  of multiple hits (default is to report all
                                  hits unsorted) (Integer 0 or more)

   Advanced (Unprompted) qualifiers:
   -xml                boolean    Use XML formatting

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers Allowed values Default
-program Search type : nuc. or prot.
1 (blastn (nuc against nuc))
2 (blastp (prot against prot))
3 (blastx (nuc translated against prot))
4 (tblastn (prot against nuc translated))
5 (tblastx (nuc translated against nuc translated))
2
[-sequence]
(Parameter 1)
Query sequence Readable sequence Required
-dbtype Databank type
general (general databank)
parasite (parasite genome databank (nucleic acid db only !))
ASD (alternative splicing databank)
general
-gennucdb General nucleic acid sequence databank
em_rel (EMBL databank last release without EST, GSS, HTG, HTC)
em_rel_hum (EMBL human sequences)
em_rel_mus (EMBL mouse sequences)
em_rel_rod (EMBL other rodents sequences)
em_rel_mam (EMBL other mammal sequences)
em_rel_vrt (EMBL other vertebrate sequences)
em_rel_inv (EMBL invertebrate sequences)
em_rel_pln (EMBL plant sequences)
em_rel_fun (EMBL fungi sequences)
em_rel_pro (EMBL prokaryote sequences)
em_rel_phg (EMBL bacteriophage sequences)
em_rel_vrl (EMBL other viral sequences)
em_rel_env (EMBL environmental sample sequences)
em_rel_tgn (EMBL transgenic sequences)
em_rel_syn (EMBL synthetic sequences)
em_rel_unc (EMBL unclassified sequences)
em_rel_std (EMBL standard sequences)
em_rel_htg (EMBL High Throughput Genome sequences)
em_rel_htc (EMBL High Throughput cDNA sequences)
em_rel_pat (EMBL patent sequences)
em_rel_sts (EMBL Sequence Tagged Site sequences)
em_rel_est (EMBL Expressed Sequence Tags)
em_rel_gss (EMBL Genome Survey Sequences)
em_rel_tsa (EMBL shotgun Transcriptome Assembly)
em_rel_tpa (EMBL Third Party Annotations)
em_rel_std_hum (EMBL human standard sequences)
em_rel_std_mus (EMBL mouse standard sequences)
em_rel_std_rod (EMBL other rodents standard sequences)
em_rel_std_mam (EMBL other mammal standard sequences)
em_rel_std_vrt (EMBL other vertebrate standard sequences)
em_rel_std_inv (EMBL invertebrate standard sequences)
em_rel_std_pln (EMBL plant standard sequences)
em_rel_std_fun (EMBL fungi standard sequences)
em_rel_std_pro (EMBL prokaryote standard sequences)
em_rel_std_phg (EMBL bacteriophage standard sequences)
em_rel_std_vrl (EMBL other viral standard sequences)
em_rel_std_env (EMBL environmental sample sequences)
em_rel_std_tgn (EMBL transgenic standard sequences)
em_rel_std_syn (EMBL synthetic standard sequences)
em_rel_std_unc (EMBL unclassified standard sequences)
em_rel_htg_hum (EMBL human HTG sequences)
em_rel_htg_mus (EMBL mouse HTG sequences)
em_rel_htg_rod (EMBL other rodent HTG sequences)
em_rel_htg_mam (EMBL other mammal HTG sequences)
em_rel_htg_vrt (EMBL other vertebrate HTG sequences)
em_rel_htg_inv (EMBL invertebrate HTG sequences)
em_rel_htg_pln (EMBL plant HTG sequences)
em_rel_htg_fun (EMBL fungi HTG sequences)
em_rel_htg_pro (EMBL prokaryote HTG sequences)
em_rel_htg_phg (EMBL bacteriophage HTG sequences)
em_rel_htg_vrl (EMBL other viral HTG sequences)
em_rel_htg_env (EMBL environmental sample HTG sequences)
em_rel_htc_hum (EMBL human HTC sequences)
em_rel_htc_mus (EMBL mouse HTC sequences)
em_rel_htc_rod (EMBL other rodent HTC sequences)
em_rel_htc_mam (EMBL other mammal HTC sequences)
em_rel_htc_vrt (EMBL other vertebrate HTC sequences)
em_rel_htc_inv (EMBL invertebrate HTC sequences)
em_rel_htc_pln (EMBL plant HTC sequences)
em_rel_htc_fun (EMBL fungi HTC sequences)
em_rel_htc_pro (EMBL prokaryote HTC sequences)
em_rel_pat_hum (EMBL human patent sequences)
em_rel_pat_mus (EMBL mouse patent sequences)
em_rel_pat_rod (EMBL other rodents patent sequences)
em_rel_pat_mam (EMBL other mammal patent sequences)
em_rel_pat_vrt (EMBL other vertebrate patent sequences)
em_rel_pat_inv (EMBL invertebrate patent sequences)
em_rel_pat_pln (EMBL plant patent sequences)
em_rel_pat_fun (EMBL fungi patent sequences)
em_rel_pat_pro (EMBL prokaryote patent sequences)
em_rel_pat_phg (EMBL bacteriophage patent sequences)
em_rel_pat_vrl (EMBL other viral patent sequences)
em_rel_pat_env (EMBL environmental sample patent sequences)
em_rel_pat_syn (EMBL synthetic patent sequences)
em_rel_pat_unc (EMBL unclassified patent sequences)
em_rel_sts_hum (EMBL human STS sequences)
em_rel_sts_mus (EMBL mouse STS sequences)
em_rel_sts_rod (EMBL other rodents STS sequences)
em_rel_sts_mam (EMBL other mammal STS sequences)
em_rel_sts_vrt (EMBL other vertebrate STS sequences)
em_rel_sts_inv (EMBL invertebrate STS sequences)
em_rel_sts_pln (EMBL plant STS sequences)
em_rel_sts_fun (EMBL fungi STS sequences)
em_rel_sts_pro (EMBL prokaryote STS sequences)
em_rel_est_hum (EMBL human EST)
em_rel_est_mus (EMBL mouse EST)
em_rel_est_rod (EMBL other rodents EST)
em_rel_est_mam (EMBL other mammals EST)
em_rel_est_vrt (EMBL other vertebrate EST)
em_rel_est_inv (EMBL invertebrate EST)
em_rel_est_pln (EMBL plant EST)
em_rel_est_fun (EMBL fungi EST)
em_rel_est_pro (EMBL prokaryote EST)
em_rel_est_env (EMBL environmental sample EST)
em_rel_gss_hum (EMBL human GSS)
em_rel_gss_mus (EMBL mouse GSS)
em_rel_gss_rod (EMBL other rodents GSS)
em_rel_gss_mam (EMBL other mammals GSS)
em_rel_gss_vrt (EMBL other vertebrate GSS)
em_rel_gss_inv (EMBL invertebrate GSS)
em_rel_gss_pln (EMBL plant GSS)
em_rel_gss_fun (EMBL fungi GSS)
em_rel_gss_pro (EMBL prokaryote GSS)
em_rel_gss_phg (EMBL bacteriophage GSS)
em_rel_gss_vrl (EMBL other viral GSS)
em_rel_gss_env (EMBL environmental sample GSS)
em_rel_gss_tgn (EMBL transgenic GSS)
em_rel_tsa_vrt (EMBL other vertebrate Transcriptome Assembly)
em_rel_tsa_inv (EMBL invertebrate Transcriptome Assembly)
em_rel_tsa_pln (EMBL plant Transcriptome Assembly)
em_rel_tsa_fun (EMBL fungi Transcriptome Assembly)
em_rel_tpa_hum (EMBL human Third Party Annotations)
em_rel_tpa_mus (EMBL mouse Third Party Annotations)
em_rel_tpa_rod (EMBL other rodents Third Party Annotations)
em_rel_tpa_mam (EMBL other mammal Third Party Annotations)
em_rel_tpa_vrt (EMBL other vertebrate Third Party Annotations)
em_rel_tpa_inv (EMBL invertebrate Third Party Annotations)
em_rel_tpa_pln (EMBL plant Third Party Annotations)
em_rel_tpa_fun (EMBL fungi Third Party Annotations)
em_rel_tpa_pro (EMBL prokaryote Third Party Annotations)
em_rel_tpa_phg (EMBL bacteriophage Third Party Annotations)
em_rel_tpa_vrl (EMBL other viral Third Party Annotations)
em_rel_tpa_syn (EMBL synthetic Third Party Annotations)
em_rel_tpa_unc (EMBL unclassified Third Party Annotations)
emcds (EMBL coding sequences)
emvec (EMBL vector subset)
imgtligm (IMGT/LIGM databank)
imgthla (IMGT/HLA databank)
hgvbase (HGVBASE (European SNP databank))
em_rel
-genprotdb General protein sequence databank
uniprot (UniProt databank)
uniref100 (UniRef100)
uniref100_seg (UniRef100 SEG filtered)
uniref90 (UniRef90)
uniref50 (UniRef50)
uniparc (UniParc)
swissprot (UniProt/SwissProt)
ipi (IPI (International Protein Index))
prints (sequences from PRINTS)
pdb (sequences from PDB)
sgt (MSD structural genomics targets)
intact (sequences from IntAct)
imgthlap (IMGT/HLA proteins)
epop (European patent sequences)
jpop (Japanese patent sequences)
kpop (Korean patent sequences)
uspop (U.S.A. patent sequences)
uniprot
-paranucdb Parasite genome databank
nem (Nematoda except C. elegans)
fil (Filaria)
brugia (Brugia malayi)
oncv (Onchocerca volvulus)
apicom (Apicomplexa)
plas (Plasmodium)
plasf (Plasmodium falciparum)
toxo (Toxoplasma gondii)
crypto (Cryptosporidium parvum)
eimeria (Eimeria)
kineto (Kinetoplastids)
schisto (Schistosoma)
schunq (David Johnson's unique Schistosoma)
mansoni (Schistosoma mansoni)
jap (Schistosoma japonicum)
japunq (David Johnson's unique Schistosoma japonicum)
cercunq (David Johnson's unique Schistosoma cercarial stage)
entamoeba (entamoeba)
bgESTnr (nonredundant big EST databank)
nem
-paraprotdb Parasite proteome databank. CURRENTLY THERE IS NOTHING AVAILABLE ! This parameter is only present for consistency and forward compatibility.
-asdnucdb Alternatively spliced nucleic acid databank
altsgen (AltSplice confirmed genes)
altsiso (AltSplice confirmed isoforms)
aedb (AEDB alternative exons)
altsgen
-asdprotdb Alternatively spliced protein databank
apdb (ASD peptides)
apdb
[-outfile]
(Parameter 2)
Output file name Output file <sequence>.ebi_<program>
Additional (Optional) qualifiers Allowed values Default
-strand Strand to search. By default BLAST searches both strands, but for blastn and (t)blastx you can choose to search only the top or bottom strand of the databank respectively query sequence. Choose from selection list of values both
-sensitivity Sensitivity. This selector sets a pre-selected set of parameters, see on-line manual. Note the trade-off between sensitivity and speed.
vlow (very low)
low (low)
medium (medium)
normal (normal)
high (high)
normal
-expect E() value = number of databank sequences with same or higher score that you expect to find by chance. BLAST lists sequences with an E() value lower than the cutoff. Number from 0.000 to 1000.000 10.0
-matrix Amino acid comparison matrix Choose from selection list of values BLOSUM62
-nucfilter Algorithm for removing low complexity segments out of nucleic acid query sequence none
dust
none
-protfilter Algorithm for removing low complexity segments out of protein query sequence (eventually after translation) none
SEG
XNU
SEG+XNU
none
-statistics Algorithm for assessing statistical significance of multiple hits
sump (sum statistics)
poisson (Poisson statistics)
kap (none (individual Karlin-Altschul score))
sump
-listsize Show only the n best scoring sequences that satisfy E() cutoff Integer 0 or more 100
-align Show only alignments for the n first sequences Integer 0 or more, but not > listsize 50
-sort Method for sorting reported databank sequences in output (by default according to combined P-value for all hits)
pvalue (P-value)
count (number of hits)
highscore (score of best hit)
totalscore (sum of scores)
pvalue
-topcombon Report for each databank sequence only up to n compatible top scoring combinations of multiple hits (default is to report all hits unsorted) Integer 0 or more 0
Advanced (Unprompted) qualifiers Allowed values Default
-xml Use XML formatting Boolean value Yes/No No

Input file format

ebi_blast searches a query sequence against a search sequence set. To avoid overloading the EBI Web server, you can submit only one query sequence at a time. You can use any normal sequence USA (Uniform Sequence Address).

You can select the search set from the list of databanks available at the EBI. Because of the great number of choices available the list has been split into three submenus.

Output file format

By default the output is standard WU-BLAST output. With the -xml qualifier you can obtain the output in XML format instead.
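
Scripts that post-process the default text output often need the scores of each hit. The Python sketch below extracts them from an HSP header line of the form shown in the usage example output ("Score = ..., Expect = ..., P = ..."). The pattern is based only on that one line; other variants of the line may occur and would need a broader pattern. The names HSP_RE and parse_hsp_header are illustrative.

```python
import re

# Pattern derived from the single HSP header line in the example output.
HSP_RE = re.compile(
    r"Score = (?P<score>\d+) \((?P<bits>[\d.]+) bits\), "
    r"Expect = (?P<expect>[\d.eE+-]+), P = (?P<p>[\d.eE+-]+)"
)


def parse_hsp_header(line):
    """Return score, bits, expect and p from an HSP header line, or None."""
    match = HSP_RE.search(line)
    if match is None:
        return None
    return {
        "score": int(match.group("score")),
        "bits": float(match.group("bits")),
        "expect": float(match.group("expect")),
        "p": float(match.group("p")),
    }


example = " Score = 369 (61.4 bits), Expect = 4.2e-08, P = 4.2e-08"
print(parse_hsp_header(example))
```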

Output files for usage example

File: x15320.ebi_blastn

BLASTN 2.0MP-WashU [04-May-2006] [linux26-x64-I32LPF64 2006-05-10T17:22:28]

Copyright (C) 1996-2006 Washington University, Saint Louis, Missouri USA.
All Rights Reserved.

Reference:  Gish, W. (1996-2006) http://blast.wustl.edu

Notice:  this program and its default parameter settings are optimized to find
nearly identical sequences rapidly.  To identify weak protein similarities
encoded in nucleic acid, use BLASTX, TBLASTN or TBLASTX.

Query=  Sequence
        (2372 letters)

Database:  plasf
           23,720 sequences; 18,755,443 total letters.

WARNING:  Use of the hspsepSmax parameter should be considered with long
          database sequences, to improve the biological relevance of the HSP
          groups that are assembled and to improve the statistical
          discrimination of these groups from random background.
Searching....10....20....30....40....50....60....70....80....90....100% done

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

EMBL:AC006280  Plasmodium falciparum chromosome 12 clone ...   369  4.2e-08   1


>EMBL:AC006280  Plasmodium falciparum chromosome 12 clone 3D7, *** SEQUENCING
            IN PROGRESS ***, 1 ordered pieces.
        Length = 163,443

  Plus Strand HSPs:

 Score = 369 (61.4 bits), Expect = 4.2e-08, P = 4.2e-08
 Identities = 199/333 (59%), Positives = 199/333 (59%), Strand = Plus / Plus

Query:    595 TGGTTGACTACTCTGCGCCAAACGTGGCGAAAGAGATGCATGTCGGTCACCTGCGCTCTA 654
              ||||||| |  ||| | |||||  | || |||||||||||||| |||||  | ||||| |
Sbjct: 148864 TGGTTGATTTTTCTTCACCAAATATAGCTAAAGAGATGCATGTTGGTCATTTACGCTCAA 148923

Query:    655 CCATTATTGGTGAC-GCAGCAGT-GCGTACTCTGGAGTTCCTCGG-TCACAAAGTGATTC 711
              | || || |||||| | |   || | ||| | || | ||       | | | |   || |
Sbjct: 148924 CTATAATAGGTGACAGTATATGTAGAGTATT-TGAATTTTTAAAAATTA-ATACCCAT-C 148980

Query:    712 GCGCAAACCACGTCGGCGACTGGGGCACTCAGTTCGGTATGCTGATTGCATGGCTGGAAA 771
              | |  || || || || || ||||| ||||| || |||||| | ||    |   |  |||
Sbjct: 148981 GAGTTAATCATGTAGGTGATTGGGGTACTCAATTTGGTATGATTATAAATTATATAAAAA 149040

Query:    772 AGCAGCAGCAGGAAAACGCCGGTGAAATGG-AGCTGGCTG-ACCTTGA-AGGTTTCTACC 828
                ||  | | | |       |  ||||||  || |    | |  || | | |||| || |
Sbjct: 149041 CACATTATCCGAATTTTAAAGAAGAAATGCCAGATTTAAGTAATTTAACAAGTTTATATC 149100

Query:    829 GCGATGCGAAAAAGCAT-TACGATGAAGATGAAGAGTTCGCCGAG-CG-CGCACGTAACT 885
                ||  | |||||  || || |||| |||| |||| || |   |  |  |  | | || |
Sbjct: 149101 AAGAATCTAAAAAA-ATGTATGATGCAGATAAAGAATTTGAAAAATCATCTAAAGAAAAT 149159

Query:    886 ACGTGGTAAAACTGCAAAGCGGTGACGAATATT 918
               |    ||||  | ||||    ||| ||| |||
Sbjct: 149160 GCAAT-TAAAT-TACAAAATAATGATGAAGATT 149190


Parameters:
  E=10
  B=250
  V=500
  mformat="7,/ebi/extserv/blast-work/interactive/blast-20090218-16030748_app.xml"
  mformat=1
  sump
  filter=none
  sort_by_pvalue
  putenv="WUBLASTMAT=/ebi/extserv/bin/wu-blast/matrix"
  putenv="WUBLASTDB=/ebi/services/idata/v2422/blastdb"
  putenv="WUBLASTFILTER=/ebi/extserv/bin/wu-blast/filter"

  ctxfactor=2.00

  Query                        -----  As Used  -----    -----  Computed  ----
  Strand MatID Matrix name     Lambda    K       H      Lambda    K       H
   +1      0   +5,-4           0.192   0.182   0.357    same    same    same
               Q=10,R=10       0.104   0.0151  0.0600    n/a     n/a     n/a
   -1      0   +5,-4           0.192   0.182   0.357    same    same    same
               Q=10,R=10       0.104   0.0151  0.0600    n/a     n/a     n/a

  Query
  Strand MatID  Length  Eff.Length     E     S  W   T   X   E2      S2
   +1      0     2372      2372       9.6  179 11 n/a  73  0.042    83
                                                      134  0.047   124
   -1      0     2372      2372       9.6  179 11 n/a  73  0.042    83
                                                      134  0.047   124


Statistics:

  Database:  /ebi/services/idata/v2422/blastdb/plasf
   Title:  plasf
   Posted:  1:57:36 PM GMT Nov 9, 2004
   Created:  1:57:36 PM GMT Nov 9, 2004
   Format:  XDF-1
   # of letters in database:  18,755,443
   # of sequences in database:  23,720
   # of database sequences satisfying E:  1
  No. of states in DFA:  256 (512 KB)
  Total size of DFA:  709 KB (2138 KB)
  Time to generate neighborhood:  0.00u 0.00s 0.00t   Elapsed:  00:00:00
  No. of threads or processors used:  4
  Search cpu time:  0.08u 0.05s 0.13t   Elapsed:  00:00:00
  Total cpu time:  0.08u 0.05s 0.13t   Elapsed:  00:00:01
  Start:  Wed Feb 18 16:03:08 2009   End:  Wed Feb 18 16:03:09 2009
WARNINGS ISSUED:  1
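
As a cross-check on the output above, the bit score of the hit can be recomputed from its raw score with the standard Karlin-Altschul conversion, bits = (Lambda * S - ln K) / ln 2. The Python sketch below assumes that the gapped values from the "Q=10,R=10" row (Lambda 0.104, K 0.0151) are the ones that apply; the function name is illustrative.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score into a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Raw score 369 with the gapped Lambda and K reported in the output.
print(round(bit_score(369, 0.104, 0.0151), 1))  # prints: 61.4
```

The result agrees with the "369 (61.4 bits)" shown for the hit.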

Data files

None.

Notes

Since it can take some time before the server at the EBI has processed the job and returned the result, it is preferable to start ebi_blast "in batch". If you work under wEMBOSS you can do that by entering your e-mail address in the box at the bottom of the page.

References

  1. Pillai S., Silventoinen V., Kallio K., Senger M., Sobhany S., Tate J., Velankar S., Golovin A., Henrick K., Rice P., Stoehr P., Lopez R. SOAP-based services provided by the European Bioinformatics Institute. Nucleic Acids Res. 33(1):W25-W28 (2005)

Warnings

The configuration file of ebi_blast was last updated on 17 September 2007. Because the collection of databanks available at the EBI changes over time, there could be some differences between the list of databanks in the menus and the list of databanks actually available.

Diagnostic Error Messages

The submission of the job can fail, or the job ID may for some other reason not be successfully retrieved and stored. In that case ebi_blast gives up and issues the message :
  ERROR !!  EBI Web Server failed to return job ID

There are various error messages related to the retrieval of the result of a submitted job :

  EBI Web Server failed to respond on <date+time>
  you can later try manual check with command :
  /opt/sw/EBIWS/wublast.pl  --status --jobid <jobid>

  ERROR !!  some error occurred on the EBI Web Server

  ERROR !!  EBI Web Server could not retrieve job result

  ERROR !!  EBI Web Server executed job but failed to retrieve output

  job still not finished after more than 30 h. I QUIT.
  you can try manual check with command :
  /opt/sw/EBIWS/wublast.pl  --status --jobid <jobid>

Exit status

It always exits with status 0.
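
Because the exit status is always 0, a calling script cannot use it to detect a failed search. One possible approach, sketched here in Python, is to scan the text produced by ebi_blast for the messages listed under "Diagnostic Error Messages". Where these messages are written (terminal or output file) is not stated on this page, so the function simply takes the text as an argument; the function name and the list of markers are assumptions of this example.

```python
# Fragments of the diagnostic messages documented above.
FAILURE_MARKERS = (
    "ERROR !!",
    "EBI Web Server failed to respond",
    "job still not finished",
)


def find_failures(text):
    """Return the lines of text that contain a known failure message."""
    return [line.strip() for line in text.splitlines()
            if any(marker in line for marker in FAILURE_MARKERS)]


sample = "submitting job\n  ERROR !!  EBI Web Server failed to return job ID\n"
print(find_failures(sample))
```

An empty list means that none of the known messages was found; it does not by itself prove that a result file was written.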

Known bugs

None.

See also

Program name Description
blast BLAST search of query sequence(s) against sequence search set
ebi_fasta fastA search of query sequence against sequence databank using EBI Web Services
fasta fastA search of query sequence(s) against sequence search set
fasts Protein identification from peptides using fastA algorithm
phiblast Search protein sequence set combining matching of pattern with local alignment of a query sequence surrounding the match
psiblast Iterative BLAST search with generation of profile of protein sequence against protein sequence set
ebi_tmhmm Reports membrane spanning regions using EBI Web Services

Author(s)

The application ebi_blast was written by Guy Bottu (gbottu@vub.ac.be)
BEN, ULB, Brussels, Belgium

The program blasta from the WU-BLAST suite itself was written by Warren Gish (Washington University in St. Louis School of Medicine, Missouri 63110 USA) and the SOAP-based Web services client and server were developed at the EMBL-EBI (Hinxton, UK).

History

Completed 19 May 2006
Modified 22 May 2007 - adapted to changes in EBI Web Services
Modified 9 April 2009 - adapted to changes in EBI Web Services

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.