hybridize
Note that the hybridization between sequences is concentration dependent and that the user must provide the concentration(s) of the involved sequence(s) (no default is provided). hybridize can operate in 3 different modes : single sequence, double-stranded sequence, or two sequences.
hybridize uses built-in sets of energy tables to compute the complete partition function and, from it, the base pair probabilities (see also unafold). It does this for the different folded species (Af, Bf, AA, BB, AB) and at different temperatures (set by tmin, tmax and tinc). The temperature range actually used extends 5° below tmin and 5° above tmax, because of the need to compute the derivative.
The contribution of unfolded molecules to the ensemble is estimated assuming that a single-stranded molecule "melts" (cooperatively goes from "stacked" to "unstacked") at 50°C and that the "stacked" state has an enthalpy that corresponds to 10% of the enthalpy of a fully double-stranded molecule.
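As an illustration of the estimate described above, the following sketch treats the single strand as a two-state system. It is not the code used by hybridize: the function names, the assumption that the free energy of the "stacked" state is zero at the melting temperature (so that S = H / Tm), and the example duplex enthalpy are all illustrative assumptions. Only the 10% fraction and the 50°C melting temperature come from the text.

```python
def stacked_state_thermo(duplex_enthalpy, fraction=0.1, tmelt_celsius=50.0):
    """Return (H, S) of the hypothetical "stacked" single-stranded state.

    duplex_enthalpy is the enthalpy of the fully double-stranded molecule
    (for example in kcal/mol); fraction and tmelt_celsius mirror the
    -fraction and -tmelt qualifiers.
    """
    tmelt_kelvin = tmelt_celsius + 273.15
    enthalpy = fraction * duplex_enthalpy
    # Assumption: two-state transition, so G = H - T*S = 0 at the melting point.
    entropy = enthalpy / tmelt_kelvin
    return enthalpy, entropy


def stacked_state_free_energy(duplex_enthalpy, temp_celsius,
                              fraction=0.1, tmelt_celsius=50.0):
    """Free energy of the "stacked" state relative to "unstacked"."""
    enthalpy, entropy = stacked_state_thermo(duplex_enthalpy, fraction, tmelt_celsius)
    return enthalpy - (temp_celsius + 273.15) * entropy


if __name__ == "__main__":
    # -150.0 is a made-up duplex enthalpy, used only to show the sign change.
    for temp in (25.0, 50.0, 75.0):
        print(temp, stacked_state_free_energy(-150.0, temp))
```

Under these assumptions the free energy is negative below the melting temperature (stacked state favoured) and positive above it.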
hybridize uses the above to compute the concentration of the various species present and the free energies (G) for the ensemble. It computes :
enthalpy : $H = G - T \, \dfrac{dG}{dT}$

entropy : $S = - \dfrac{dG}{dT}$

heat capacity : $C_p = - T \, \dfrac{d^2G}{dT^2}$

It also computes the molar extinction coefficient using values for dinucleotides.
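The relations between G and its temperature derivatives can be sketched numerically with central finite differences. This is only an illustration under stated assumptions: the function name, the step size and the toy free energy function are hypothetical, and the actual numerical scheme used by hybridize is not described here. It does show why values of G are needed on both sides of each temperature, which is consistent with the temperature range being extended beyond tmin and tmax.

```python
def thermo_from_free_energy(g, temp_kelvin, step=1.0):
    """Estimate (H, S, Cp) at temp_kelvin from a free energy function g(T).

    Central differences need g at temp_kelvin - step and temp_kelvin + step.
    """
    g_minus = g(temp_kelvin - step)
    g_zero = g(temp_kelvin)
    g_plus = g(temp_kelvin + step)

    dg_dt = (g_plus - g_minus) / (2.0 * step)                      # dG/dT
    d2g_dt2 = (g_plus - 2.0 * g_zero + g_minus) / (step * step)    # d2G/dT2

    entropy = -dg_dt                          # S  = -dG/dT
    enthalpy = g_zero - temp_kelvin * dg_dt   # H  = G - T * dG/dT
    heat_capacity = -temp_kelvin * d2g_dt2    # Cp = -T * d2G/dT2
    return enthalpy, entropy, heat_capacity


def toy_free_energy(temp_kelvin):
    """Made-up quadratic G(T), used only to exercise the formulas."""
    return -50.0 + 0.14 * temp_kelvin - 0.0001 * temp_kelvin ** 2


if __name__ == "__main__":
    print(thermo_from_free_energy(toy_free_energy, 300.0))
```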
> hybridize
Prediction of hybridization between nucleic acid sequences, with
consideration of secondary structure
1 : single sequence
2 : double-stranded sequence
3 : two sequences
Operation mode and input type [3]:
Input nucleotide sequence: asis::CAACCTCGATCGGGAGATTG
Second sequence: asis::GCTTCTCCAGATCCAGGTTG
Molecule is DNA rather than RNA [N]:
Concentration of first sequence [0.0]: 0.0000001
Concentration of second sequence [0.0]: 0.0000001
Base name for output files [asis]: oligo1
Base name for output files, second part [asis]: oligo2
ps : PostScript
pdf : PDF
Graphic output format [ps]:
Standard (Mandatory) qualifiers (* if not always prompted):
-mode menu [3] Operation mode and input type (Values: 1
(single sequence); 2 (double-stranded
sequence); 3 (two sequences))
[-asequence] sequence Nucleotide sequence filename and optional
format, or reference (input USA)
* -bsequence sequence Nucleotide sequence filename and optional
format, or reference (input USA)
-dna toggle Molecule is DNA rather than RNA
-aconc float [no default !] Concentration of first
sequence. Entering value is required !
(Number 0.000 or more)
* -bconc float [no default !] Concentration of second
sequence. Entering value is required when
hybridize is running in mode 3 ! (Number
0.000 or more)
-abasename string [$(asequence.name)] Base name for output
files (Any string is accepted)
* -bbasename string [$(bsequence.name)] Base name for output
files, second part (Any string is accepted)
-graph menu [ps] Graphic output format (Values: ps
(PostScript); pdf (PDF))
Additional (Optional) qualifiers (* if not always prompted):
-tmin integer [0] Minimum temperature (Integer from -5 to
105)
-tmax integer [100] Maximum temperature (Integer from -5
to 105)
-tinc integer [1] Temperature increment (Integer 1 or
more)
* -na float [1.0] Na+ molar concentration (for DNA only)
(Number 0.000 or more)
* -mg float [0.0] Mg++ molar concentration (for DNA
only) (Number 0.000 or more)
-minstem integer [2] Minimum stem size. The default implies
that a helix of size 1, that is a base pair
that does not stack on another base pair on
either side, is not considered (Integer 1 or
more)
-maxloop integer [30] Maximum interior or bulge loop size
(Integer 0 or more)
-maxbp integer [No limit] Maximum distance between pairing
bases (Integer 0 or more)
-[no]unfolded toggle [Y] Estimate enthalpy and entropy unfolded
sequence. If you unset this the unfolded
sequences are not taken into account for
calculating the ensemble.
* -fraction float [0.1] Fraction of double strand stacking
enthalpy used for estimating enthalpy
unfolded sequence (Number from 0.000 to
1.000)
* -tmelt integer [50] Melting temperature used for estimation
of entropy unfolded sequence (Integer from
-5 to 105)
Advanced (Unprompted) qualifiers:
-storeall boolean Store all UNAFOLD output files
Associated qualifiers:
"-asequence" associated qualifiers
-sbegin1 integer Start of the sequence to be used
-send1 integer End of the sequence to be used
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-sformat1 string Input sequence format
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-bsequence" associated qualifiers
-sbegin integer Start of the sequence to be used
-send integer End of the sequence to be used
-sreverse boolean Reverse (if DNA)
-sask boolean Ask for begin/end/reverse
-snucleotide boolean Sequence is nucleotide
-sprotein boolean Sequence is protein
-slower boolean Make lower case
-supper boolean Make upper case
-sformat string Input sequence format
-sdbname string Database name
-sid string Entryname
-ufo string UFO features
-fformat string Features format
-fopenfile string Features file name
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
| Standard (Mandatory) qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -mode | Operation mode and input type | 1 (single sequence); 2 (double-stranded sequence); 3 (two sequences) | 3 |
| [-asequence] (Parameter 1) | Nucleotide sequence filename and optional format, or reference (input USA) | Readable sequence | Required |
| -bsequence | Nucleotide sequence filename and optional format, or reference (input USA) | Readable sequence | Required |
| -dna | Molecule is DNA rather than RNA | Toggle value Yes/No | No |
| -aconc | Concentration of first sequence. Entering value is required ! | Number 0.000 or more | no default ! |
| -bconc | Concentration of second sequence. Entering value is required when hybridize is running in mode 3 ! | Number 0.000 or more | no default ! |
| -abasename | Base name for output files | Any string is accepted | <first sequence name> |
| -bbasename | Base name for output files, second part | Any string is accepted | <second sequence name> |
| -graph | Graphic output format | ps (PostScript); pdf (PDF) | ps |
| Additional (Optional) qualifiers | Description | Allowed values | Default |
| -tmin | Minimum temperature | Integer from -5 to 105 | 0 |
| -tmax | Maximum temperature | Integer from -5 to 105 | 100 |
| -tinc | Temperature increment | Integer 1 or more | 1 |
| -na | Na+ molar concentration (for DNA only) | Number 0.000 or more | 1.0 |
| -mg | Mg++ molar concentration (for DNA only) | Number 0.000 or more | 0.0 |
| -minstem | Minimum stem size. The default implies that a helix of size 1, that is a base pair that does not stack on another base pair on either side, is not considered | Integer 1 or more | 2 |
| -maxloop | Maximum interior or bulge loop size | Integer 0 or more | 30 |
| -maxbp | Maximum distance between pairing bases | Integer 0 or more | No limit |
| -[no]unfolded | Estimate enthalpy and entropy unfolded sequence. If you unset this the unfolded sequences are not taken into account for calculating the ensemble. | Toggle value Yes/No | Yes |
| -fraction | Fraction of double strand stacking enthalpy used for estimating enthalpy unfolded sequence | Number from 0.000 to 1.000 | 0.1 |
| -tmelt | Melting temperature used for estimation of entropy unfolded sequence | Integer from -5 to 105 | 50 |
| Advanced (Unprompted) qualifiers | Description | Allowed values | Default |
| -storeall | Store all UNAFOLD output files | Boolean value Yes/No | No |
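For scripted use, the qualifiers listed above can be assembled into an argument list before the program is launched. The sketch below only builds and checks such a list; it does not run hybridize. The helper name is hypothetical, and the `-qualifier value` layout is an assumption based on the qualifier names shown here, so the result should be compared with the output of `hybridize -help` before it is relied upon. Concentrations are passed through as given, so they can be supplied as strings in the same form as in the interactive session.

```python
def build_hybridize_command(aseq, aconc, bseq=None, bconc=None, mode=3,
                            dna=False, tmin=0, tmax=100, tinc=1, graph="ps"):
    """Return an argument list for hybridize; nothing is executed here."""
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    if float(aconc) < 0:
        raise ValueError("aconc must be 0 or more")
    for name, value in (("tmin", tmin), ("tmax", tmax)):
        if not -5 <= value <= 105:
            raise ValueError(f"{name} must be between -5 and 105")
    if tinc < 1:
        raise ValueError("tinc must be 1 or more")
    if graph not in ("ps", "pdf"):
        raise ValueError("graph must be 'ps' or 'pdf'")

    cmd = ["hybridize", "-mode", str(mode),
           "-asequence", aseq, "-aconc", str(aconc)]
    if mode == 3:
        # The second sequence and its concentration are required in mode 3.
        if bseq is None or bconc is None:
            raise ValueError("mode 3 needs a second sequence and its concentration")
        if float(bconc) < 0:
            raise ValueError("bconc must be 0 or more")
        cmd += ["-bsequence", bseq, "-bconc", str(bconc)]
    if dna:
        cmd.append("-dna")
    cmd += ["-tmin", str(tmin), "-tmax", str(tmax), "-tinc", str(tinc),
            "-graph", graph, "-auto"]  # -auto turns off prompts
    return cmd


if __name__ == "__main__":
    print(" ".join(build_hybridize_command(
        "asis::CAACCTCGATCGGGAGATTG", "0.0000001",
        bseq="asis::GCTTCTCCAGATCCAGGTTG", bconc="0.0000001")))
```

The returned list could be handed to `subprocess.run` on a system where hybridize is installed.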
Other output files you can obtain are :
If a required concentration is not provided, the program issues one of these messages :
You must provide a value for the concentration of the first sequence !
You must provide a value for the concentration of the second sequence !
If one or both of the sequences are longer than 100 bases, the program issues one of these messages :
The sequence should not be longer than 100 bases.
The sequences should not be longer than 100 bases.
If in "two sequences" mode the two input sequences are identical, the program issues the message :
The two sequences should be different!
If in "double-stranded sequence" mode it turns out that the two strands of the sequence are identical, the program reverts to functioning in "single sequence" mode and issues the message :
The sequence is a palindrome, hence only one unfolded species !
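The palindrome and length checks described above can be sketched as follows. This is a minimal illustration, not the program's own code: the function names are hypothetical, and the comparison assumes an unambiguous sequence made only of A, C, G and T (or U for RNA).

```python
MAX_LENGTH = 100  # limit quoted in the messages above


def reverse_complement(seq, dna=True):
    """Return the reverse complement of an unambiguous nucleotide sequence."""
    pairs = {"A": "T" if dna else "U", "C": "G", "G": "C", "T": "A", "U": "A"}
    # A KeyError is raised for any character outside A, C, G, T, U.
    return "".join(pairs[base] for base in reversed(seq.upper()))


def is_self_complementary(seq, dna=True):
    """True when both strands of the duplex have the same sequence."""
    return seq.upper() == reverse_complement(seq, dna)


def effective_mode(seq, mode, dna=True):
    """Mode actually used: a palindrome in mode 2 falls back to mode 1."""
    if len(seq) > MAX_LENGTH:
        raise ValueError("The sequence should not be longer than 100 bases.")
    if mode == 2 and is_self_complementary(seq, dna):
        return 1
    return mode


if __name__ == "__main__":
    print(effective_mode("GAATTC", 2))
    print(effective_mode("CAACCTCGATCGGGAGATTG", 2))
```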
Error messages from the "naked" programs are displayed. For example, it can happen that the UNAFOLD suite programs fail to compute some values, with the consequence that gnuplot cannot make the plot. You will then see messages like (note : "nan" means "not a number") :
Warning: at 105 degrees the relative error of [B]+2[BB]+[AB] is nan
line 16: undefined variable: nan
If such error messages are issued it can be useful to re-run the program, setting tmin and tmax so that the temperatures at which the errors occur are at least 5° outside the computation range.
If a sequence is shorter than 12 bases, you can see :
Note: for sequences of 11 bases or less, hybrid-ss-noml is functionally equivalent to hybrid-ss, and significantly faster
| Program name | Description |
|---|---|
| cmsearchrfam | Scans nucleic acids for RNA genes and conserved motifs using Rfam |
| einverted | Finds DNA inverted repeats |
| unafold | Prediction of optimal and suboptimal RNA or DNA secondary structure |
The UNAFOLD suite itself was written by Michael Zuker (zukerm@rpi.edu) and Nicholas Markham (markhn@rpi.edu) at Rensselaer Polytechnic Institute (Troy, New York, USA).