ebi_tmhmm

 

Function

Reports membrane spanning regions using EBI Web Services

Description

ebi_tmhmm uses the Web services of the EBI to search a protein sequence or the translated open reading frames of a nucleic acid sequence for potential transmembrane alpha-helices, using the TMHMM program of Krogh et al. The method is based upon a Hidden Markov Model (HMM) that has been trained on a set of membrane proteins with helical membrane spanning regions.

ebi_tmhmm converts the result returned by the EBI Web server into a file in GFF format, which can be given as input to other EMBOSS programs or to any software that supports GFF. ebi_tmhmm also automatically starts the EMBOSS program showfeat, which gives an overview of where the predicted transmembrane segments lie along the sequence. For an input protein sequence it also automatically starts the EMBASSY program topo to draw a graphical representation.

Note that the EBI sometimes disables nucleic acid searches. In that case a nucleic acid query may return no result even though the sequence does encode transmembrane segments. You can check on the InterProScan home page whether this is currently the case.

Algorithm

ebi_tmhmm relies on SOAP-based communication between the Perl client interproscan.pl and the Web service described at http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl, which gives access to the EBI InterProScan server. By default, InterProScan submits a sequence to a whole series of motif-searching tools in parallel; ebi_tmhmm, however, sets a parameter so that only TMHMM is used.

TMHMM is based on a Hidden Markov Model (HMM) architecture made up of 7 types of states: the core of the transmembrane helix, the helix caps on either side, the cytoplasmic loop, short and long non-cytoplasmic loops, and a globular domain in the middle of each loop. Transmembrane helices are predicted with the N-best algorithm, which uses the model to find the best labeling of the sequence into helix and loop regions, given the model.
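TMHMM's trained model and its N-best decoding are not reproduced here. As a much simpler illustration of how an HMM assigns a label to each residue, the Python sketch below decodes a hypothetical two-state model (loop "." versus membrane helix "M") with the Viterbi algorithm. Note that Viterbi returns the single most probable state path, which is not the same as the most probable labeling sought by the N-best algorithm; all probabilities are made-up values chosen only for the example.

```python
import math

# Hypothetical toy model, NOT TMHMM's trained parameters.
# '.' = loop (non-membrane), 'M' = membrane helix.
HYDROPHOBIC = set("AVILMFWC")
STATES = (".", "M")
START = {".": 0.9, "M": 0.1}
TRANS = {
    ".": {".": 0.95, "M": 0.05},
    "M": {".": 0.05, "M": 0.95},
}
# Probability that each state emits a hydrophobic residue.
P_HYDROPHOBIC = {".": 0.3, "M": 0.9}


def emission(state, residue):
    """Emission probability of a residue, reduced to hydrophobic or not."""
    p = P_HYDROPHOBIC[state]
    return p if residue.upper() in HYDROPHOBIC else 1.0 - p


def viterbi_labels(seq):
    """Return the most probable state path as a string of '.' and 'M'."""
    if not seq:
        return ""
    # score[s]: best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(emission(s, seq[0]))
             for s in STATES}
    backpointers = []
    for residue in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(emission(s, residue)))
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

For example, viterbi_labels("KDE" * 3 + "L" * 20 + "KDE" * 3) labels the nine charged flanking residues on each side as loop and the twenty leucines as membrane. A real topology model such as TMHMM also constrains helix length and distinguishes the cytoplasmic from the non-cytoplasmic side, which this toy model does not.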

Usage

Here is a sample session with ebi_tmhmm:

> ebi_tmhmm
Reports membrane spanning regions using EBI Web Services
Input sequence: sw:opsd_human
Output file [opsd_human.ebi_tmhmm]:
Graph type [x11]: cps

  Starting topo with parameter -sections=39-59,74-96,115-133,154-174,202-222,254-276,286-306

Created topo.ps


Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-sequence]          sequence   Sequence filename and optional format, or
                                  reference (input USA)
  [-outfile]           outfile    [*.ebi_tmhmm] Output file name
*  -graph              graph      [$EMBOSS_GRAPHICS value, or x11] Graph type
                                  (ps, hpgl, hp7470, hp7580, meta, cps, x11,
                                  tekt, tek, none, data, xterm, png, gif)

   Additional (Optional) qualifiers (* if not always prompted):
*  -orfminsize         integer    [100] Minimum open reading frame size. Open
                                  reading frames of at least this size
                                  (default = 100) are translated and analyzed.
                                  (Integer 0 or more, but not > sequence
                                  length)
*  -gencode            menu       [0] Genetic code for translating sequence
                                  (Values: 0 (Standard); 1 (Standard (with
                                  alternative initiation codons)); 2
                                  (Vertebrate Mitochondrial); 3 (Yeast
                                  Mitochondrial); 4 (Mold, Protozoan,
                                  Coelenterate Mitochondrial and
                                  Mycoplasma/Spiroplasma); 5 (Invertebrate
                                  Mitochondrial); 6 (Ciliate Macronuclear and
                                  Dasycladacean); 9 (Echinoderm
                                  Mitochondrial); 10 (Euplotid Nuclear); 11
                                  (Bacterial); 12 (Alternative Yeast Nuclear);
                                  13 (Ascidian Mitochondrial); 14 (Flatworm
                                  Mitochondrial); 15 (Blepharisma
                                  Macronuclear); 16 (Chlorophycean
                                  Mitochondrial); 21 (Trematode
                                  Mitochondrial); 22 (Scenedesmus obliquus);
                                  23 (Thraustochytrium Mitochondrial))

   Advanced (Unprompted) qualifiers:
   -[no]showfeat       boolean    [Y] Call showfeat to show position membrane
                                  spanning regions in sequence
   -[no]topo           boolean    [Y] Open output with topo (only for protein)

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of the sequence to be used
   -send1              integer    End of the sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   "-graph" associated qualifiers
   -gprompt            boolean    Graph prompting
   -gdesc              string     Graph description
   -gtitle             string     Graph title
   -gsubtitle          string     Graph subtitle
   -gxtitle            string     Graph x axis title
   -gytitle            string     Graph y axis title
   -goutfile           string     Output file for non interactive displays
   -gdirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers

  [-sequence] (Parameter 1)
      Sequence filename and optional format, or reference (input USA)
      Allowed values: Readable sequence
      Default: Required

  [-outfile] (Parameter 2)
      Output file name
      Allowed values: Output file
      Default: <sequence>.ebi_tmhmm

  -graph
      Graph type
      Allowed values: EMBOSS has a list of known devices, including ps, hpgl, hp7470, hp7580, meta, cps, x11, tekt, tek, none, data, xterm, png, gif
      Default: EMBOSS_GRAPHICS value, or x11

Additional (Optional) qualifiers

  -orfminsize
      Minimum open reading frame size. Open reading frames of at least this size (default = 100) are translated and analyzed.
      Allowed values: Integer 0 or more, but not > sequence length
      Default: 100

  -gencode
      Genetic code for translating sequence
      Allowed values:
        0 (Standard)
        1 (Standard (with alternative initiation codons))
        2 (Vertebrate Mitochondrial)
        3 (Yeast Mitochondrial)
        4 (Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma)
        5 (Invertebrate Mitochondrial)
        6 (Ciliate Macronuclear and Dasycladacean)
        9 (Echinoderm Mitochondrial)
        10 (Euplotid Nuclear)
        11 (Bacterial)
        12 (Alternative Yeast Nuclear)
        13 (Ascidian Mitochondrial)
        14 (Flatworm Mitochondrial)
        15 (Blepharisma Macronuclear)
        16 (Chlorophycean Mitochondrial)
        21 (Trematode Mitochondrial)
        22 (Scenedesmus obliquus)
        23 (Thraustochytrium Mitochondrial)
      Default: 0

Advanced (Unprompted) qualifiers

  -[no]showfeat
      Call showfeat to show position membrane spanning regions in sequence
      Allowed values: Boolean value Yes/No
      Default: Yes

  -[no]topo
      Open output with topo (only for protein)
      Allowed values: Boolean value Yes/No
      Default: Yes
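The -orfminsize and -gencode qualifiers control how a nucleic acid query is turned into protein sequences before the transmembrane search. The exact ORF definition used by ebi_tmhmm or the EBI server is not documented here, so the following Python sketch makes its own assumptions: an ORF is a stretch between stop codons (or a sequence end) in any of the six reading frames, its size is counted in nucleotides, and only the Standard code (-gencode 0) is implemented. The function name orf_peptides is hypothetical.

```python
BASES = "TCAG"
# NCBI translation table 1 (Standard code, -gencode 0). Codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with each base cycling in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X")
                   for p in range(0, len(dna) - 2, 3))


def orf_peptides(dna, min_size=100):
    """Yield stop-to-stop translations of at least min_size nucleotides.

    Assumption: size is measured in nucleotides and all six frames
    (three per strand) are searched.
    """
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    for strand in (dna, reverse_complement):
        for frame in range(3):
            for peptide in translate(strand[frame:]).split("*"):
                if 3 * len(peptide) >= min_size:
                    yield peptide
```

For instance, list(orf_peptides("ATG" + "GCT" * 40 + "TAA")) includes the 41-residue peptide "M" followed by forty alanines, while a short query such as "ATGTAA" yields nothing at the default minimum of 100.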

Input file format

ebi_tmhmm reads any normal sequence USA.

Output file format

Output files for usage example

File: opsd_human.ebi_tmhmm

OPSD_HUMAN	TMHMM	transmembrane_regions	39	59	.	.	.
OPSD_HUMAN	TMHMM	transmembrane_regions	74	96	.	.	.
OPSD_HUMAN	TMHMM	transmembrane_regions	115	133	.	.	.
OPSD_HUMAN	TMHMM	transmembrane_regions	154	174	.	.	.
OPSD_HUMAN	TMHMM	transmembrane_regions	202	222	.	.	.
OPSD_HUMAN	TMHMM	transmembrane_regions	254	276	.	.	.
OPSD_HUMAN	TMHMM	transmembrane_regions	286	306	.	.	.
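The GFF lines above have eight tab-separated columns (sequence name, source, feature, start, end, score, strand, frame), with "." marking empty fields. As an illustration, the following Python sketch parses them and rebuilds the start-end list passed to topo in the usage session. That ebi_tmhmm builds its -sections parameter exactly this way is an assumption, although the numbers match; the function names are hypothetical.

```python
# Output of the usage example (tab-separated GFF, eight columns).
GFF_TEXT = """\
OPSD_HUMAN\tTMHMM\ttransmembrane_regions\t39\t59\t.\t.\t.
OPSD_HUMAN\tTMHMM\ttransmembrane_regions\t74\t96\t.\t.\t.
OPSD_HUMAN\tTMHMM\ttransmembrane_regions\t115\t133\t.\t.\t.
OPSD_HUMAN\tTMHMM\ttransmembrane_regions\t154\t174\t.\t.\t.
OPSD_HUMAN\tTMHMM\ttransmembrane_regions\t202\t222\t.\t.\t.
OPSD_HUMAN\tTMHMM\ttransmembrane_regions\t254\t276\t.\t.\t.
OPSD_HUMAN\tTMHMM\ttransmembrane_regions\t286\t306\t.\t.\t.
"""


def parse_regions(gff_text):
    """Return (seqname, start, end) for each transmembrane_regions line."""
    regions = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comment lines
        fields = line.split("\t")
        if len(fields) >= 5 and fields[2] == "transmembrane_regions":
            regions.append((fields[0], int(fields[3]), int(fields[4])))
    return regions


def topo_sections(regions):
    """Join regions into a comma-separated list of start-end pairs."""
    return ",".join(f"{start}-{end}" for _, start, end in regions)


regions = parse_regions(GFF_TEXT)
print(topo_sections(regions))  # 39-59,74-96,115-133,154-174,202-222,254-276,286-306
print([end - start + 1 for _, start, end in regions])  # [21, 23, 19, 21, 21, 23, 21]
```

GFF coordinates are 1-based and inclusive, which is why each segment length is end - start + 1.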

File: opsd_human.ebi_tmhmm.showfeat

OPSD_HUMAN
Rhodopsin (Opsin-2).
|==========================================================| 348
     |---| |---|  |--|   |---|   |---|    |---| |--|         transmembrane_regions

Graphics File: topo.1.cps

[topo result]

Data files

None.

Notes

Because it can take some time for the EBI server to process the job and return the result, it is preferable to run ebi_tmhmm "in batch". If you work under wEMBOSS, you can do this by entering your e-mail address in the box at the bottom of the page.

You can give the GFF file written by ebi_tmhmm as input to other EMBOSS programs with a command line of the form:
<program> <sequence> -ufo=<GFF file>
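When such a command is launched from a script, building it as an argument list avoids shell-quoting problems. The Python sketch below uses showfeat and the file names from the usage example purely as an illustration; the helper name ufo_command is hypothetical.

```python
import shlex


def ufo_command(program, sequence, gff_file):
    """Return the argument list for: <program> <sequence> -ufo=<GFF file>."""
    return [program, sequence, f"-ufo={gff_file}"]


argv = ufo_command("showfeat", "sw:opsd_human", "opsd_human.ebi_tmhmm")
print(shlex.join(argv))  # showfeat sw:opsd_human -ufo=opsd_human.ebi_tmhmm

# To actually run it (requires EMBOSS on the PATH):
# import subprocess; subprocess.run(argv, check=True)
```

Passing the list directly to subprocess.run, rather than a single string with shell=True, keeps file names containing spaces or shell metacharacters intact.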

References

  1. Pillai S., Silventoinen V., Kallio K., Senger M., Sobhany S., Tate J., Velankar S., Golovin A., Henrick K., Rice P., Stoehr P., Lopez R. SOAP-based services provided by the European Bioinformatics Institute. Nucleic Acids Res. 33(1):W25-W28 (2005)
  2. Krogh A., Larsson B., von Heijne G., Sonnhammer E. L. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305(3):567-580 (2001)

Warnings

None.

Diagnostic Error Messages

If an input nucleic acid sequence is longer than 5000 nucleotides, the program issues the error message:
  The length of the sequence is longer than the maximum of 5000 allowed for nucleic acid sequences
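A minimal Python sketch of this length check, assuming the limit of 5000 quoted in the message and the exit status 255 described under Exit status. The function name is hypothetical and this is not the wrapper's actual code.

```python
import sys

MAX_NUCLEIC_ACID_LENGTH = 5000  # limit quoted in the error message


def check_nucleic_acid_length(sequence):
    """Exit with status 255 if a nucleic acid sequence exceeds the limit."""
    if len(sequence) > MAX_NUCLEIC_ACID_LENGTH:
        sys.stderr.write(
            "The length of the sequence is longer than the maximum of 5000 "
            "allowed for nucleic acid sequences\n")
        sys.exit(255)
```

A sequence of exactly 5000 nucleotides passes; one of 5001 raises SystemExit with code 255.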

It can happen that the job submission fails, or that the job ID is not successfully retrieved and stored for some other reason. In that case ebi_tmhmm gives up and issues the message:

  ERROR !!  EBI Web Server failed to return job ID

Several error messages relate to retrieving the result of a submitted job:

  EBI Web Server failed to respond on <date+time>
  you can later try manual check with command :
  /opt/sw/EBIWS/interproscan.pl  --status --jobid <jobid>

  ERROR !!  some error occurred on the EBI Web Server

  ERROR !!  EBI Web Server could not retrieve job result

  ERROR !!  EBI Web Server executed job but failed to retrieve output

  job still not finished after more than 30 h. I QUIT.
  you can try manual check with command :
  /opt/sw/EBIWS/interproscan.pl  --status --jobid <jobid>
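These messages suggest that ebi_tmhmm repeatedly checks the job status and gives up after 30 hours. The Python sketch below shows such a polling loop under stated assumptions: check_status is a hypothetical callable (for example, one wrapping interproscan.pl --status --jobid), and the status strings "PENDING" and "RUNNING" are assumed values. The actual poll interval and status values used by ebi_tmhmm are not documented here.

```python
import time

# Assumed status values meaning "not finished yet"; the real EBI
# service may use different strings.
UNFINISHED = {"PENDING", "RUNNING"}


def wait_for_job(job_id, check_status, poll_interval=60.0,
                 timeout=30 * 3600.0, clock=time.monotonic, sleep=time.sleep):
    """Poll check_status(job_id) until it returns a finished status.

    Raises TimeoutError once `timeout` seconds (30 hours by default)
    have elapsed without the job finishing.
    """
    deadline = clock() + timeout
    while True:
        status = check_status(job_id)
        if status not in UNFINISHED:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job_id} still not finished; check it manually")
        sleep(poll_interval)
```

The clock and sleep parameters are injectable so the loop can be exercised without actually waiting; in normal use the defaults from the time module apply.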

If the algorithm finds no likely transmembrane segments in the sequence, you will get the message:

  No transmembrane regions were found !

Exit status

It exits prematurely with status 255 and an error message if an input nucleic acid sequence is longer than 5000 nucleotides.

Known bugs

None.

See also

Program name      Description
garnier           Predicts protein secondary structure
helixturnhelix    Report nucleic acid binding motifs
hmoment           Hydrophobic moment calculation
jpred             Predicts protein secondary structure using neural networks
pepcoil           Predicts coiled coil regions
pepnet            Displays proteins as a helical net
pepwheel          Shows protein sequences as helices
proftmb           Reports transmembrane beta barrels
tmap              Displays membrane spanning regions
topo              Draws an image of a transmembrane protein
ebi_blast         WU-BLAST search of query sequence against sequence databank using EBI Web Services
ebi_fasta         fastA search of query sequence against sequence databank using EBI Web Services

Author(s)

The wrapper application ebi_tmhmm was written by Guy Bottu (gbottu@vub.ac.be)
BEN, ULB, Brussels, Belgium

The program TMHMM itself was written by A. Krogh, E. L. L. Sonnhammer and G. von Heijne, Center for Biological Sequence Analysis, Technical University of Denmark. The SOAP-based Web services client and server were developed at EMBL-EBI (Hinxton, UK).

History

Completed 31 May 2006
Modified 24 May 2007 - adapted to changes in EBI Web Services

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.