unafold
unafold can find structures containing GC, AU (or AT) and GU (or GT) base pairs. It relies on Gibbs free enthalpy of formation calculations and has tables with energy values for base stacking and for the formation of stems, hairpin loops, bulge loops and multibranch loops. It does not, however, take into account tertiary interactions, modified bases or the possible formation of "pseudo-knots".
By default unafold considers the molecule to be linear RNA at 37°C. You can, however, specify that the molecule is circular, that it is (single-stranded) DNA and/or that it is at another temperature. The current version (3.6) of UNAFOLD has three different built-in sets of energy tables :
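unafold can operate in 3 different modes : energy minimization (mode 1), energy minimization with suboptimal foldings (mode 2, the default, which corresponds to the old MFOLD software) and partition function calculation (mode 3).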
The chance of finding the correct folding can be greatly improved by adding a number of "constraints", that is, by providing directives (based on experimental evidence) about which bases cannot or must pair (see input).
Thus base triples are deliberately excluded from the definition of secondary structure. The reason is that base triples pose an even greater challenge, because the exact nature of the triple cannot be predicted in advance, and even if it could, we have no data for assigning free energies.
If UNAFOLD is working in partition function calculation mode it stochastically samples a number of likely foldings, as set by the -maxfold=n parameter; note that the same folding can be sampled several times. If UNAFOLD is working in MFOLD mode it will compute a number of foldings as set by the -suboptimality=P, -window=W and -maxfold=MAX parameters. W may be thought of as a distance parameter. The distance between 2 base pairs i.j and i'.j' may be defined as max{|i-i'|,|j-j'|}. Then if k-1 foldings have already been predicted, the kth will have at least W base pairs that are at least a distance W from any of the base pairs in the first k-1 foldings. UNAFOLD continues computing foldings until k = MAX or until there are no more foldings with a Gibbs free enthalpy of formation within P% of the minimum.
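The distance definition and the window rule described above can be sketched in Python. This is only an illustration of the rule as worded here, not UNAFOLD's own code: the function names are hypothetical, and a folding is assumed to be represented as a set of (i, j) base-pair tuples.

```python
def pair_distance(p, q):
    """Distance between base pairs p = (i, j) and q = (i', j'): max(|i-i'|, |j-j'|)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def count_novel_pairs(candidate, previous_foldings, w):
    """Count the pairs of `candidate` lying at distance >= w from every
    base pair of the foldings that were already predicted."""
    seen = [q for folding in previous_foldings for q in folding]
    return sum(
        1 for p in candidate
        if all(pair_distance(p, q) >= w for q in seen)
    )


def is_distinct_enough(candidate, previous_foldings, w):
    """Window rule: the candidate needs at least w such novel base pairs."""
    return count_novel_pairs(candidate, previous_foldings, w) >= w
```

With W = 0 every candidate passes the test, which matches the default of 0 for very short sequences.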
If W is not specified, UNAFOLD will choose its value from this table based on sequence length. The user is encouraged to experiment with this parameter.
| Sequence length | Default window size |
|---|---|
| 1-29 | 0 |
| 30-49 | 1 |
| 50-119 | 2 |
| 120-199 | 3 |
| 200-299 | 5 |
| 300-399 | 7 |
| 400-499 | 8 |
| 500-599 | 10 |
| 600-699 | 11 |
| 700-799 | 12 |
| 800-1199 | 15 |
| 1200-1999 | 20 |
| 2000 or more | 25 |
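As a rough sketch, the lookup in this table can be written as a small Python function. The name `default_window` is hypothetical, and the lower bound of 2000 for the last row is inferred from the range that precedes it.

```python
from bisect import bisect_right

# Lower bound of each sequence-length range, and the matching default W,
# transcribed from the table. The final bound (2000) is inferred.
LENGTH_LOWER_BOUNDS = [1, 30, 50, 120, 200, 300, 400, 500, 600, 700, 800, 1200, 2000]
DEFAULT_WINDOWS = [0, 1, 2, 3, 5, 7, 8, 10, 11, 12, 15, 20, 25]


def default_window(sequence_length):
    """Return the default window size W for a sequence of the given length."""
    if sequence_length < 1:
        raise ValueError("sequence length must be at least 1")
    # bisect_right gives the number of lower bounds <= sequence_length,
    # so subtracting 1 yields the index of the matching range.
    row = bisect_right(LENGTH_LOWER_BOUNDS, sequence_length) - 1
    return DEFAULT_WINDOWS[row]
```

For the 76-base sequence of the usage example this returns 2.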
> unafold -storeall
Prediction of optimal and suboptimal RNA or DNA secondary structure
Input nucleotide sequence: embl:k00152
1 : energy minimization
2 : energy minimization with suboptimal foldings
3 : partition function calculation
Mode [2]:
Molecule is single-stranded DNA rather than RNA [N]:
Molecule is circular [N]:
Base name for output files [K00152]:
png : PNG
jpg : JPEG
Graphic output format [png]:
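The same run can be prepared without prompts from a script. The sketch below only assembles the argument list from qualifiers documented on this page (-sequence, -mode, -dna, -circular, -temperature, -constraints, -storeall, -auto); the helper name `build_unafold_command` is hypothetical, and actually running the command assumes that unafold is installed and on the PATH.

```python
import shlex


def build_unafold_command(sequence, mode=2, dna=False, circular=False,
                          temperature=None, constraints=None, storeall=False):
    """Assemble a non-interactive unafold command line as a list of arguments."""
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    if temperature is not None and not -5 <= temperature <= 105:
        raise ValueError("temperature must be an integer from -5 to 105")

    # -auto turns off the interactive prompts.
    cmd = ["unafold", f"-sequence={sequence}", f"-mode={mode}", "-auto"]
    if dna:
        cmd.append("-dna")
    if circular:
        cmd.append("-circular")
    if temperature is not None:
        cmd.append(f"-temperature={temperature}")
    if constraints is not None:
        cmd.append(f"-constraints={constraints}")
    if storeall:
        cmd.append("-storeall")
    return cmd


# Print the command for the usage example (requires Python 3.8+ for shlex.join).
print(shlex.join(build_unafold_command("embl:k00152", storeall=True)))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; building a list rather than a single string avoids shell quoting problems with file names.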
Standard (Mandatory) qualifiers (* if not always prompted):
[-sequence] sequence Nucleotide sequence filename and optional
format, or reference (input USA)
-mode menu [2] Mode. The default mode 2 corresponds to
the old MFOLD software. (Values: 1 (energy
minimization); 2 (energy minimization with
suboptimal foldings); 3 (partition function
calculation))
-dna toggle Molecule is single-stranded DNA rather than
RNA
* -circular boolean Molecule is circular
-basename string [$(sequence.name)] Base name for output
files (Any string is accepted)
-graph menu [png] Graphic output format (Values: png
(PNG); jpg (JPEG))
Additional (Optional) qualifiers (* if not always prompted):
-temperature integer [37] Temperature (Integer from -5 to 105)
* -na float [1.0] Na+ molar concentration (for DNA only)
(Number 0.000 or more)
* -mg float [0.0] Mg++ molar concentration (for DNA
only) (Number 0.000 or more)
-minstem integer [2] Minimum stem size. The default implies
that a helix of size 1, that is a base pair
that does not stack on another base pair on
either side, is not considered (Integer 1 or
more)
-maxloop integer [30] Maximum interior or bulge loop size
(Integer 0 or more)
-maxbp integer [No limit] Maximum distance between pairing
bases (Integer 0 or more)
* -suboptimality integer [5] Maximum percent free energy
suboptimality for computing suboptimal
structures when UNAFOLD is working in MFOLD
mode (Integer from 0 to 100)
* -window integer [depends on sequence length] Base pair
distance window for suboptimal foldings when
UNAFOLD is working in MFOLD mode. A
negative value sets sequence length
dependent value (see on-line manual). (Any
integer value)
* -maxfold integer [maximum 100 suboptimal foldings or 10
stochastically sampled foldings] Number of
predicted foldings. If UNAFOLD is working in
MFOLD mode this corresponds to the maximum
number of foldings within the limits set by
the -suboptimality and -window parameters,
if UNAFOLD is working in partition function
calculation mode this corresponds to the
number of stochastically sampled foldings.
(Integer 1 or more)
* -annotation menu [none] Structure annotation mode. If UNAFOLD
is working in MFOLD mode 'ann' and
'ss-count' stand for colouring the bases
according to the number of alternative base
pairings respectively the number of
single-stranded foldings. If UNAFOLD is
working in partition function calculation
mode 'ann' and 'ss-count' stand for
colouring the bases according to base
pairing probability respectively
single-stranded probability. (Values: none
(No annotation); ann (Base pairing
annotation); ss-count (Single stranded
annotation))
Advanced (Unprompted) qualifiers:
-constraints infile File with constraints that force or prohibit
base pairings. See on-line manual for
syntax
-cutoff float [1e-6] Cutoff for display in basepair
probability plot (Number from 0.000 to
1.000)
-display menu [auto] Structure display mode (Values: auto
(write base symbols if sequence is shorter
than 800); bases (always write base
symbols); lines (do not write base symbols))
-numbering integer [depends on sequence length] Base numbering
frequency. A value of 0 triggers the default
(10 for seq. shorter than 50, 50 for seq.
longer than 300, otherwise 20). (Integer 0
or more)
-storeall boolean Store all UNAFOLD output files
Associated qualifiers:
"-sequence" associated qualifiers
-sbegin1 integer Start of the sequence to be used
-send1 integer End of the sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-sformat1 string Input sequence format
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
| Standard (Mandatory) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| [-sequence] (Parameter 1) | Nucleotide sequence filename and optional format, or reference (input USA) | Readable sequence | Required |
| -mode | Mode. The default mode 2 corresponds to the old MFOLD software. | 1 (energy minimization); 2 (energy minimization with suboptimal foldings); 3 (partition function calculation) | 2 |
| -dna | Molecule is single-stranded DNA rather than RNA | Toggle value Yes/No | No |
| -circular | Molecule is circular | Boolean value Yes/No | No |
| -basename | Base name for output files | Any string is accepted | <sequence name> |
| -graph | Graphic output format | png (PNG); jpg (JPEG) | png |

| Additional (Optional) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -temperature | Temperature | Integer from -5 to 105 | 37 |
| -na | Na+ molar concentration (for DNA only) | Number 0.000 or more | 1.0 |
| -mg | Mg++ molar concentration (for DNA only) | Number 0.000 or more | 0.0 |
| -minstem | Minimum stem size. The default implies that a helix of size 1, that is a base pair that does not stack on another base pair on either side, is not considered | Integer 1 or more | 2 |
| -maxloop | Maximum interior or bulge loop size | Integer 0 or more | 30 |
| -maxbp | Maximum distance between pairing bases | Integer 0 or more | No limit |
| -suboptimality | Maximum percent free energy suboptimality for computing suboptimal structures when UNAFOLD is working in MFOLD mode | Integer from 0 to 100 | 5 |
| -window | Base pair distance window for suboptimal foldings when UNAFOLD is working in MFOLD mode. A negative value sets sequence length dependent value (see on-line manual). | Any integer value | depends on sequence length |
| -maxfold | Number of predicted foldings. If UNAFOLD is working in MFOLD mode this corresponds to the maximum number of foldings within the limits set by the -suboptimality and -window parameters; if UNAFOLD is working in partition function calculation mode this corresponds to the number of stochastically sampled foldings. | Integer 1 or more | maximum 100 suboptimal foldings or 10 stochastically sampled foldings |
| -annotation | Structure annotation mode. If UNAFOLD is working in MFOLD mode 'ann' and 'ss-count' stand for colouring the bases according to the number of alternative base pairings respectively the number of single-stranded foldings. If UNAFOLD is working in partition function calculation mode 'ann' and 'ss-count' stand for colouring the bases according to base pairing probability respectively single-stranded probability. | none (No annotation); ann (Base pairing annotation); ss-count (Single stranded annotation) | none |

| Advanced (Unprompted) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -constraints | File with constraints that force or prohibit base pairings. See on-line manual for syntax | Input file | Required |
| -cutoff | Cutoff for display in basepair probability plot | Number from 0.000 to 1.000 | 1e-6 |
| -display | Structure display mode | auto (write base symbols if sequence is shorter than 800); bases (always write base symbols); lines (do not write base symbols) | auto |
| -numbering | Base numbering frequency. A value of 0 triggers the default (10 for seq. shorter than 50, 50 for seq. longer than 300, otherwise 20). | Integer 0 or more | depends on sequence length |
| -storeall | Store all UNAFOLD output files | Boolean value Yes/No | No |
You can optionally give mfold a "constraints" file (-constraints=infile) with information (based on laboratory evidence) about which bases cannot or must pair. This can greatly improve the chance of finding the correct structure. An example of such a file, which works for the usage example (it makes mfold find the correct structure instead of the one you get by default), is :
P 16 0 1
P 35 0 3
P 48 0 1
P 73 0 1
P 75 0 2
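To make the effect of such a file concrete, the following Python sketch expands the records of the example into the base positions they affect. It rests on an assumption that is not spelled out in the text above: in the mfold convention, a record `P i 0 k` prohibits the k consecutive bases starting at position i from pairing. The function name `prohibited_bases` is hypothetical, and records of any other form are simply ignored.

```python
def prohibited_bases(constraint_lines):
    """Expand 'P i 0 k' records into the set of base positions barred from pairing.

    Assumes the mfold convention that 'P i 0 k' covers bases i .. i+k-1.
    Records of any other form are ignored by this sketch.
    """
    bases = set()
    for line in constraint_lines:
        fields = line.split()
        if len(fields) == 4 and fields[0] == "P" and fields[2] == "0":
            start, count = int(fields[1]), int(fields[3])
            bases.update(range(start, start + count))
    return bases


example = ["P 16 0 1", "P 35 0 3", "P 48 0 1", "P 73 0 1", "P 75 0 2"]
print(sorted(prohibited_bases(example)))  # [16, 35, 36, 37, 48, 73, 75, 76]
```

Under that assumption the example file keeps eight bases of the 76-base sequence single-stranded.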
The complete set of syntax rules for the "constraints" file is as follows :
In this case, no base pairs are allowed between r_i, r_i+1, ..., r_j and r_k, r_k+1, ..., r_l. Note that the 2 segments need not be distinct. For example, the command :
level length i j energy
1 5 49 65 -297
1 6 35 57 -287
1 3 38 47 -292
1 1 29 49 -293
1 6 30 48 -296
1 7 1 72 -297
1 4 10 44 -297
1 1 13 34 -294
1 5 14 33 -297
1 1 19 28 -291
1 3 19 27 -297
1 2 10 35 -284
1 4 10 25 -296
This file contains instructions for drawing the energy dotplot. The first record is a header, and each subsequent record describes a single helix: its level, its length, and the positions i and j of its first and last base. The energy is the smallest free energy change (dG) of any folding that still contains the helix; in this example it appears to be written in tenths of kcal/mol (-297 corresponds to the dG = -29.7 of the optimal folding).
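A file of this kind is easy to read back. The Python sketch below parses the five-column records and lists the base pairs of a helix. The names `Helix`, `parse_plot` and `helix_pairs` are hypothetical, and the expansion assumes that a helix of length n starting at i.j consists of the stacked pairs (i, j), (i+1, j-1), ..., (i+n-1, j-n+1); this agrees with the example, where the helix `1 7 1 72` starts with the pair 1.72.

```python
from typing import NamedTuple


class Helix(NamedTuple):
    level: int
    length: int
    i: int
    j: int
    energy: int  # raw integer exactly as written in the file


def parse_plot(text):
    """Parse the header and helix records of a dotplot file into Helix tuples."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0] != ["level", "length", "i", "j", "energy"]:
        raise ValueError("unexpected header record")
    return [Helix(*(int(field) for field in row)) for row in rows[1:]]


def helix_pairs(helix):
    """List the base pairs of a helix, assuming it stacks inward from i.j."""
    return [(helix.i + n, helix.j - n) for n in range(helix.length)]
```

The energy is kept as the raw integer so that no unit conversion is assumed by the parser.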
76 dG = -29.7 K00152 K00152.1 E.coli Arg-tRNA-1.
1 G 0 2 72 1 0 2
2 C 1 3 71 2 1 3
3 A 2 4 70 3 2 4
4 T 3 5 69 4 3 5
5 C 4 6 68 5 4 6
6 C 5 7 67 6 5 7
7 G 6 8 66 7 6 8
[Part of this file has been deleted for brevity]
70 T 69 71 3 70 69 71
71 G 70 72 2 71 70 72
72 C 71 73 1 72 71 73
73 A 72 74 0 73 72 0
74 C 73 75 0 74 0 0
75 C 74 76 0 75 0 0
76 A 75 0 0 76 0 0
This file contains instructions for drawing the secondary structure above (see below for exact syntax).
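The base pairs of a folding can be recovered from such a file with a few lines of Python. The sketch below assumes the usual .ct convention that the first field of a record is the base position and the fifth field is the position of its pairing partner (0 when the base is unpaired), and that the first field of the header is the sequence length. The function name `parse_ct` is hypothetical.

```python
def parse_ct(text):
    """Return (number_of_bases, sorted list of (i, j) base pairs with i < j)."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_bases = int(lines[0].split()[0])  # header record: length, energy, name
    pairs = set()
    for line in lines[1:]:
        fields = line.split()
        # Skip anything that is not a base record, e.g. an elision note.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        index, partner = int(fields[0]), int(fields[4])
        if partner > index:  # record each pair once, from its 5' base
            pairs.add((index, partner))
    return n_bases, sorted(pairs)
```

Applied to the excerpt above, the first records yield the pairs 1.72, 2.71, 3.70 and so on, which is the 7-pair helix also listed in the dotplot file.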
[The files K00152.ann, K00152.dG, and the structures 2, 3 and 4 have been omitted for brevity]
19 G 18 20 27 19 18 20
There are as many lines as bases in the sequence and the 8 columns have the following meaning :
1 6 35 57 -287
with :
No foldings were found
If you add -annotation=ann or -annotation=ss-count to the command line, while the program is running in energy minimization mode, it issues the following warning :
annotation reset to 'none' ! If only one folding is computed,
counting alternative base pairings is pointless.
| Program name | Description |
|---|---|
| cmsearchrfam | Scans nucleic acids for RNA genes and conserved motifs using Rfam |
| einverted | Finds DNA inverted repeats |
| hybridize | Prediction of hybridization between nucleic acid sequences, with consideration of secondary structure |
The UNAFOLD suite itself was written by Michael Zuker (zukerm@rpi.edu) and Nicholas Markham (markhn@rpi.edu) at Rensselaer Polytechnic Institute (Troy, New York, USA).
Completed 23 October 2008.