codehop

 

Function

Select degenerate primers from set of related proteins

Description

codehop is an EMBOSS "wrapper" program for the program CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primers), which is distributed together with the BLIMPS software suite of the FHCRC (Fred Hutchinson Cancer Research Center). It designs PCR (Polymerase Chain Reaction) primers from protein multiple sequence alignments. The program is intended for cases where the protein sequences are distantly related and degenerate primers are needed.

A CODEHOP primer is degenerate in its 3' core region, which is 11-12 bp long and spans four codons of highly conserved amino acids, and non-degenerate in its 5' consensus clamp region, whose length depends on the desired annealing temperature, typically between 20 and 30 bp:

5'                            3'
--------------------===========

non-degenerate      degenerate
consensus clamp     core
#bases from temp    11-12 bases

The hybrid structure (5' consensus clamp and 3' degenerate core) of CODEHOP primers allows the PCR amplification to be specific during the early cycles, when the primers anneal to the original source DNA, and selective during the later cycles, when they anneal to the PCR-synthesized products:

CODEHOP diagram

Schematic comparison of standard degenerate PCR (left) with the CODEHOP (right), illustrating regions of mismatch in primer-to-template annealing during early PCR cycles and in primer-to-product annealing during subsequent cycles. Vertical lines indicate nucleotide matches between primer (arrow) and template or synthesized product. The overall degeneracy is the product of degeneracies at each nucleotide position, so that the fraction of precisely hybridizing primers is 1/degeneracy.
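The degeneracy rule in the caption can be made concrete with a short Python sketch. It assumes primers written with the standard IUPAC nucleotide ambiguity codes (as in the codehop report, where the core is in lower case); the function names are hypothetical and not part of codehop.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Product of the number of possible bases at each position."""
    total = 1
    for code in primer.upper():
        total *= len(IUPAC[code])
    return total


def exact_match_fraction(primer):
    """Share of the primer pool matching one given template exactly: 1/degeneracy."""
    return 1 / degeneracy(primer)


# Primer reported for block 4codehopD in the usage example of this page.
print(degeneracy("GGGCGTCAAGTTCGCCvmngcnatggg"))  # 96
print(exact_match_fraction("GGGCGTCAAGTTCGCCvmngcnatggg"))
```

Only the lower-case core contributes to the degeneracy, because every clamp position is a single base.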

The "naked" CODEHOP program takes as input a file in Blocks format containing a series of local multiple protein sequence alignments without gaps. The current implementation of the codehop "wrapper" program can operate in three modes:

  1. The default is to take as input a multiple sequence alignment (with gaps). The "wrapper" uses the mablock program of the FHCRC to slice the alignment into a series of blocks (local alignments without gaps). The user can set the minimum and maximum block width.
  2. The user can provide a local alignment without gaps. In this case, however, several such alignments have to be provided, so the program has to be run several times.
  3. The user can optionally provide a file in Blocks format containing a set of local alignments without gaps. Note that this could be a file generated by a previous run of the program.

Besides an alignment of proteins belonging to the family, you must also provide a genetic code and a codon usage table for the organism that you suspect contains an unknown member of the family and with which you want to perform the PCR experiment.

CODEHOP does not select primer pairs. It selects a series of primers from each block. By default at most 3 primers per block are reported, but you can change this limit or ask for all primers to be shown. It can, of course, happen that no primer is found. It is the responsibility of the user to select from the output two sets of degenerate primers with opposite orientations, at an appropriate distance from each other to allow amplification.
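Because pairing is left to the user, a small helper can list compatible combinations. The following is a minimal sketch, assuming each reported primer has been assigned a strand ("+" for a primer from a block, "-" for a primer from its complement) and nucleotide coordinates on the sense strand, for example derived from the amino acid offsets in the .blks file multiplied by 3. The Primer class, the function name and the 100-1500 bp product range are hypothetical choices, not codehop behavior.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Primer:
    name: str
    strand: str  # "+" = primer from a block, "-" = primer from its complement
    start: int   # 0-based start on the sense-strand nucleotide coordinates
    end: int     # exclusive end on the same coordinates


def candidate_pairs(primers, min_product=100, max_product=1500):
    """(forward, reverse, product_size) tuples, smallest product first."""
    forward = [p for p in primers if p.strand == "+"]
    reverse = [p for p in primers if p.strand == "-"]
    pairs = []
    for f, r in product(forward, reverse):
        size = r.end - f.start  # amplicon runs from the forward 5' end to the reverse 5' end
        if f.end <= r.start and min_product <= size <= max_product:
            pairs.append((f, r, size))
    return sorted(pairs, key=lambda t: t[2])
```

A forward primer is only paired with a reverse primer that lies downstream of it and does not overlap it; the product range should be adapted to the experiment.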

Tips for designing primers

First run the program in its default mode. If you don't get predictions, or if you don't like what you get, we suggest that you first raise the maximum core degeneracy (-coredegeneracy) to 256 or higher (if you dare) and retry. Next, you might raise the core strictness (-corestrictness), for example to 0.1 or 0.25. If you have one or more favored sequences, you can, rather than raising the degeneracy or strictness, raise the weight of your favored sequence(s) to bias the primer: edit the xxx.blks output file (modify the number to the right of the sequence) and give it as input to codehop in Blocks format file mode. You can also effectively remove individual sequences by down-weighting them to 0 if they are so divergent or misaligned that they prevent finding a solution.
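The weight edit described above can also be scripted. This is a minimal sketch, assuming the sequence-line layout shown in the ADH_human.codehop.blks example on this page (name, offset in parentheses, segment, weight); the function name set_weight is hypothetical.

```python
import re

# Sequence line of a Blocks file: name, "(offset)", segment, weight.
SEQ_LINE = re.compile(r"^(\S+)(\s+\(\s*\d+\)\s+\S+\s+)(\S+)\s*$")


def set_weight(blocks_text, seq_name, new_weight):
    """Return blocks_text with the weight of seq_name replaced in every block."""
    out = []
    for line in blocks_text.splitlines():
        match = SEQ_LINE.match(line)
        if match and match.group(1) == seq_name:
            # Keep name, offset and segment; replace only the trailing weight.
            line = f"{match.group(1)}{match.group(2)}{new_weight:g}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, set_weight(text, "yeast", 0) removes the yeast sequence from consideration; the result can be saved to a new file and passed to codehop in Blocks format file mode.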

Cycling conditions

For amplification, we recommend using AmpliTaq Gold with a 9-minute preheat (this provides an automatic hot start; a hot start of some kind is important). We have had success using the time-release feature with the addition of 15-20 extra cycles. The CODEHOP strategy is different from the usual degenerate PCR, and it is desirable to keep annealing temperatures high: even 60 °C may be OK if you have a clamp with a melting temperature above 60 °C. We recommend trying the highest temperature that yields a clean PCR product. We have used "touchdown" PCR down to Tm - 3 °C or lower, say from 63 °C down to a good clamp annealing temperature in steps of 0.5 to 1 °C, with the remaining cycles carried out at the clamp annealing temperature of 53-57 °C for a 60 °C clamp. The intent of the touchdown is to give the correct product a head start, because it is likely to anneal at a higher temperature than any failure product. Once the clamp 'takes over', all primed products, whether correct or not, are on an even footing, so we try to keep the stringency high in all cycles. With luck, it should not be necessary to gel-purify the product; rather, you may clone directly from the reaction mix if a single band of the expected size is obtained.
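The touchdown arithmetic can be laid out per cycle with a small sketch. The 63 °C start, 0.5 °C step and 57 °C hold come from the paragraph above; the total of 40 cycles and the function name are hypothetical values chosen for illustration.

```python
def touchdown_schedule(start_c, hold_c, step_c, total_cycles):
    """Annealing temperature (deg C) for each cycle of a touchdown program.

    The temperature drops from start_c by step_c per cycle while it is still
    above hold_c; all remaining cycles are run at hold_c.
    """
    if step_c <= 0:
        raise ValueError("step_c must be positive")
    temps = []
    t = start_c
    while t > hold_c and len(temps) < total_cycles:
        temps.append(round(t, 2))
        t -= step_c
    temps.extend([hold_c] * (total_cycles - len(temps)))
    return temps


for cycle, temp in enumerate(touchdown_schedule(63.0, 57.0, 0.5, 40), start=1):
    print(cycle, temp)
```

With these values the first 12 cycles step down from 63 °C to 57.5 °C and the remaining 28 cycles run at 57 °C.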

Algorithm

The CODEHOP program designs a pool of primers containing all possible 11- or 12-mers for the 3' degenerate core region and having the most probable nucleotide predicted for each position in the 5' non-degenerate clamp region.

The program consists of the following steps (see the scheme below):
1) A set of blocks is input, where a block is an aligned array of amino acid sequence segments without gaps that represents a highly conserved region of homologous proteins. A weight is provided for each sequence segment, which can be increased to favor the contribution of selected sequences in designing the primer. A codon usage table is chosen for the target genome.

2) An amino acid position-specific scoring matrix is computed for each block using the odds ratio method.

3) A consensus amino acid residue is selected for each position of the block as the highest scoring amino acid in the matrix.

4) For each position of the block, the most common codon corresponding to the amino acid chosen in step 3 is selected utilizing the user-selected codon usage table. This selection is used for the default 5' consensus clamp in step 8.

5) A DNA PSSM is calculated from the amino acid matrix (step 2), genetic code table and codon usage table. The DNA matrix has three positions for each position of the amino acid matrix. The score for each amino acid is divided among its codons in proportion to their relative weights from the codon usage table, and the scores for each of the four different nucleotides are combined in each DNA matrix position. Nucleotide positions are treated independently when the scores are combined. As an option, the highest scoring nucleotide residue from each position can replace the most common codons from step 4 that are used in the consensus clamp.

6) The degeneracy is determined at each position of the DNA matrix based on the number of bases found there. As an option, a weight threshold can be specified such that bases that contribute less than a minimum weight are ignored in determining degeneracy.

7) Possible degenerate core regions are identified by scanning the DNA matrix in the 3' to 5' direction. A core region must start on an invariant 3' nucleotide position, have a length of 11 or 12 positions ending on a codon boundary, and have a maximum degeneracy of 128 (current default). The degeneracy of a region is the product of the number of possible bases in each position.

8) Candidate degenerate core regions are extended by addition of a 5' consensus clamp from step 4 or 5. The length of the clamp is controlled by a melting temperature calculation (current default = 60 °C) and is usually ~20 nucleotides.

9) Steps 7 and 8 are repeated on the reverse complement of the DNA matrix from step 5 for primers corresponding to the opposite DNA strand.
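Steps 2 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the FHCRC implementation: the uniform background amino acid frequencies, the scaling of each column so that its scores sum to 100, and all function names are assumptions, and the real BLIMPS scoring may include refinements not shown here. The codon usage argument is a plain dictionary mapping each amino acid to (codon, fraction) pairs.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_odds(column, weights, background):
    """Step 2 (sketch): weighted odds ratios for one column, scaled to sum to 100."""
    counts = defaultdict(float)
    for residue, weight in zip(column, weights):
        counts[residue] += weight
    total = sum(weights)
    odds = {aa: (counts[aa] / total) / background[aa] for aa in AMINO_ACIDS}
    norm = sum(odds.values()) or 1.0  # guard against columns with no standard residues
    return {aa: 100.0 * value / norm for aa, value in odds.items()}


def block_pssm(segments, weights, background=None):
    """Amino acid PSSM: one {amino_acid: score} dict per block column."""
    if background is None:
        # Assumption: uniform background frequencies as a placeholder.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    width = len(segments[0])
    return [column_odds([s[i] for s in segments], weights, background) for i in range(width)]


def consensus(pssm):
    """Step 3: highest-scoring amino acid in each column."""
    return "".join(max(col, key=col.get) for col in pssm)


def consensus_codons(protein, usage):
    """Step 4: most frequent codon for each consensus residue."""
    return "".join(max(usage[aa], key=lambda pair: pair[1])[0] for aa in protein)


def back_translate(pssm, usage):
    """Step 5: three DNA columns ({base: score}) per amino acid column."""
    dna = []
    for col in pssm:
        triplet = [dict.fromkeys("ACGT", 0.0) for _ in range(3)]
        for aa, score in col.items():
            codons = usage.get(aa, [])
            total = sum(fraction for _, fraction in codons)
            if score == 0 or total == 0:
                continue
            for codon, fraction in codons:
                share = score * fraction / total  # split the score by codon usage
                for pos, base in enumerate(codon):
                    triplet[pos][base] += share   # codon positions handled independently
        dna.extend(triplet)
    return dna


# Columns 3-6 of block 4codehopB from the usage example, equal weights.
segments = ["PGHE", "PGHE", "PGHE", "LGHE", "PGHE", "PGHE"]
print(consensus(block_pssm(segments, [1.0] * 6)))  # PGHE
```

Raising a sequence weight shifts the column scores toward that sequence's residues, which is the mechanism the step 1 weights rely on.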

           CODEHOP program scheme

1) input   -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 1  Protein sequence block
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 2
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 3
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 4
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   seq 5
           -  -  -  -  -  -  -  -  -  -  -  -  -  -  -   etc.

  |
  | 2) transformation to AA PSSM
  V
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Ala    AA PSSM
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Cys
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Asp
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Glu
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Phe
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Gly
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   His
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Ile
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   Lys
           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |   etc.

     |  |
     |  | 3) calculation of AA consensus sequence
     |  V
     |     -  -  -  -  -  -  -  -  -  -  -  -  -  -  -          AA consensus sequence
     |
     |  |
     |  | 4) transformation to DNA consensus sequence
     |  V
     |     -------------------------------------------          DNA consensus sequence
 |   |
 |   | 5) back-translation to DNA PSSM
 |   V
 |         |||||||||||||||||||||||||||||||||||||||||||   A      DNA PSSM
 |         |||||||||||||||||||||||||||||||||||||||||||   C
 |         |||||||||||||||||||||||||||||||||||||||||||   G
 |         |||||||||||||||||||||||||||||||||||||||||||   T
 | | |
 | | | 6) calculation of degeneracies
 | | V
 | |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~          position degeneracy values
 | |
 | | 7) identify degenerate regions ("===")
 | |
 | 8) identify consensus regions for degenerate regions ("---")
 | |
 V V
                   5'  -------====           3'                 CODEHOP primers
   output          3'         ====---------  5'
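Steps 7 to 9 of the scheme can likewise be sketched. Assumptions: degeneracies are given per DNA position with position 0 at the first base of a codon; the 5' end of a core is placed on a codon boundary (our reading of step 7); and the melting temperature is the simple Wallace rule (2 °C per A/T, 4 °C per G/C), a crude stand-in only, since judging from the report parameters CODEHOP's own calculation also depends on DNA and salt concentration. All function names are hypothetical.

```python
from math import prod

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def find_cores(degen, max_degeneracy=128, lengths=(12, 11), invariant_3prime=True):
    """Step 7 (sketch): candidate cores as (start, end, degeneracy), 0-based inclusive."""
    cores = []
    for end in range(len(degen) - 1, -1, -1):         # scan 3' -> 5'
        if invariant_3prime and degen[end] != 1:       # 3' base must be invariant
            continue
        for length in lengths:
            start = end - length + 1
            if start < 0 or start % 3 != 0:            # 5' end of core on a codon boundary
                continue
            d = prod(degen[start:end + 1])
            if d <= max_degeneracy:
                cores.append((start, end, d))
    return cores


def wallace_tm(seq):
    """Wallace rule: 2 deg C per A/T and 4 deg C per G/C (stand-in only)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def extend_clamp(consensus, core_start, target_tm=60.0, tm=wallace_tm):
    """Step 8 (sketch): grow the clamp 5'-ward until tm(clamp) >= target_tm.

    Returns (clamp, reached); reached is False when the block border is hit
    first, the situation reported as CLAMP NEEDS EXTENSION.
    """
    for begin in range(core_start - 1, -1, -1):
        clamp = consensus[begin:core_start]
        if tm(clamp) >= target_tm:
            return clamp, True
    return consensus[:core_start], False


def reverse_complement_pssm(pssm):
    """Step 9: reverse the columns and swap A<->T, C<->G for the other strand."""
    return [{COMPLEMENT[b]: s for b, s in col.items()} for col in reversed(pssm)]
```

Running find_cores on the degeneracies of the reverse-complemented matrix yields the primers for the opposite strand, which is how step 9 reuses steps 7 and 8.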

Terms and parameters

Degeneracy strictness

Degeneracy strictness specifies how to count nucleotides with low occurrence. A nucleotide is counted if the ratio of its frequency value to the highest (maximal) value in that position is greater than or equal to the strictness. Strictness can take values between 0 and 1. A strictness of 0 causes all the nucleotides that actually appear in the position to be counted. A strictness of 1 means that only the nucleotide(s) with the highest value in a position are counted. Intermediate strictness values give intermediate behavior.

Examples:
          1             2             3             4             5
base  value  ratio  value  ratio  value  ratio  value  ratio  value  ratio
A         0  0         25* 1         40  0.67      30  0.67      45* 1
C         0  0         25* 1          0  0          5  0.11      11  0.24
G       100* 1         25* 1         60* 1         45* 1         17  0.38
T         0  0         25* 1          0  0         20  0.44      27  0.6

Strictness   Degeneracy
             1   2   3   4   5
0            1   4   2   4   4
0.33         1   4   2   3   3
0.5          1   4   2   2   2
0.67         1   4   2   2   1
1            1   4   1   1   1

The highest base frequency value(s) in each of the 5 examples are marked with an asterisk (*).
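The counting rule can be checked against the five examples with a short sketch. It implements the rule as stated above (a base counts if it occurs at all and its ratio to the maximum is at least the strictness); the function names are hypothetical. Note that the 0.67 row corresponds to a strictness of 2/3: the ratios in the table are rounded, and with a literal 0.67 the ratio 40/60 would fall just below the threshold.

```python
def counted_bases(values, strictness):
    """Bases counted at one DNA position for a strictness between 0 and 1."""
    top = max(values.values())
    if top <= 0:
        return []
    # A base counts if it occurs at all and value / max >= strictness.
    return [base for base, v in values.items() if v > 0 and v / top >= strictness]


def degeneracy_at(values, strictness):
    return len(counted_bases(values, strictness))


# Example 4 from the table: A=30, C=5, G=45, T=20.
example4 = {"A": 30, "C": 5, "G": 45, "T": 20}
for s in (0, 0.33, 0.5, 2 / 3, 1):
    print(s, degeneracy_at(example4, s))
```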

Usage

Here is a sample session with codehop

> codehop
Select degenerate primers from set of related proteins
         1 : global alignment
         2 : local alignment without gaps
         3 : Blocks format file
Operation mode and input type [1]: 
Input sequence(s): ADH.msf
Codon usage file: Ehuman.cut
Output file [e.codehop]: ADH_human.codehop
6 blocks found with minimum width 10 and maximum with 55


Command line arguments

   Standard (Mandatory) qualifiers (* if not always prompted):
   -mode               menu       [1] Operation mode and input type. The
                                  algorithm needs local alignments without
                                  gaps (blocks) and the software will slice a
                                  global alignment into blocks. (Values: 1
                                  (global alignment); 2 (local alignment
                                  without gaps); 3 (Blocks format file))
*  -seqs               seqall     Sequence(s) filename and optional format, or
                                  reference (input USA)
*  -blocksfile         infile     Blocks file
   -cfile              codon      Codon usage table name
  [-outfile]           outfile    [*.codehop] Output file name

   Additional (Optional) qualifiers (* if not always prompted):
*  -minwidth           integer    [10] Minimum block width (Integer 8 or more)
*  -maxwidth           integer    [55] Maximum block width (Integer from
                                  minimum block width to sequences length)
   -gencode            menu       [0] Genetic code for backtranslaton (Values:
                                  0 (Standard); 1 (Vertebrate Mitochondrial);
                                  2 (Yeast Mitochondrial); 3 (Mold
                                  Mitochondrial and Mycoplasma); 4
                                  (Invertebrate Mitochondrial); 5 (Ciliate
                                  Nuclear); 6 (Echinoderm Mitochondrial); 7
                                  (Euplotid Nuclear); 8 (Bacterial and Plant
                                  Plastid); 9 (Alternative Yeast Nuclear); 10
                                  (Ascidian Mitochondrial); 11 (Flatworm
                                  Mitochondrial); 12 (Blepharisma
                                  Macronuclear); 13 (Chlorophycean
                                  Mitochondrial); 14 (Trematode
                                  Mitochondrial); 15 (Scenedesmus obliquus
                                  mitochondrial); 16 (Thraustochytrium
                                  mitochondrial))
   -coredegeneracy     float      [128.0] Maximum core degeneracy. The core
                                  degeneracy is the number of alternative
                                  primer sequences. (Number 1.0 or more)
   -corestrictness     float      [0.0] Core strictness (Number from 0.000 to
                                  1.000)
   -most               toggle     Use the most common codons in the clamp
*  -clampstrictness    float      [1.0] Clamp strictness (Number from 0.000 to
                                  1.000)
   -clamptm            float      [60.0] Clamp melting temperature (Any
                                  numeric value)
   -clamppolynuc       integer    [5] Clamp maxiumum number of consecutive
                                  nucleotides of same type (Integer 1 or more)
   -dnaconc            float      [50.0] Primer DNA concentration (nM) (Number
                                  0.000 or more)
   -rose               boolean    Force core/clamp boundary to be codon
                                  boundary
   -[no]invar          boolean    [Y] 3' base of primer must be invariant
   -showall            toggle     Show all degenerate primers in output
*  -maxshow            integer    [3] Maximum number degenerate primers per
                                  block in output (Integer 1 or more)

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-seqs" associated qualifiers
   -sbegin             integer    Start of each sequence to be used
   -send               integer    End of each sequence to be used
   -sreverse           boolean    Reverse (if DNA)
   -sask               boolean    Ask for begin/end/reverse
   -snucleotide        boolean    Sequence is nucleotide
   -sprotein           boolean    Sequence is protein
   -slower             boolean    Make lower case
   -supper             boolean    Make upper case
   -sformat            string     Input sequence format
   -sdbname            string     Database name
   -sid                string     Entryname
   -ufo                string     UFO features
   -fformat            string     Features format
   -fopenfile          string     Features file name

   "-cfile" associated qualifiers
   -format             string     Data format

   "-outfile" associated qualifiers
   -odirectory1        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers

-mode: Operation mode and input type. The algorithm needs local alignments without gaps (blocks) and the software will slice a global alignment into blocks.
  Allowed values: 1 (global alignment); 2 (local alignment without gaps); 3 (Blocks format file). Default: 1
-seqs: Sequence(s) filename and optional format, or reference (input USA).
  Allowed values: readable sequence(s). Default: required
-blocksfile: Blocks file.
  Allowed values: input file. Default: required
-cfile: Codon usage table name.
  Allowed values: codon usage file in EMBOSS data path. Default: required
[-outfile] (Parameter 1): Output file name.
  Allowed values: output file. Default: <sequence>.codehop

Additional (Optional) qualifiers

-minwidth: Minimum block width.
  Allowed values: integer 8 or more. Default: 10
-maxwidth: Maximum block width.
  Allowed values: integer from minimum block width to sequence length. Default: 55
-gencode: Genetic code for back-translation.
  Allowed values: 0 (Standard); 1 (Vertebrate Mitochondrial); 2 (Yeast Mitochondrial); 3 (Mold Mitochondrial and Mycoplasma); 4 (Invertebrate Mitochondrial); 5 (Ciliate Nuclear); 6 (Echinoderm Mitochondrial); 7 (Euplotid Nuclear); 8 (Bacterial and Plant Plastid); 9 (Alternative Yeast Nuclear); 10 (Ascidian Mitochondrial); 11 (Flatworm Mitochondrial); 12 (Blepharisma Macronuclear); 13 (Chlorophycean Mitochondrial); 14 (Trematode Mitochondrial); 15 (Scenedesmus obliquus mitochondrial); 16 (Thraustochytrium mitochondrial). Default: 0
-coredegeneracy: Maximum core degeneracy. The core degeneracy is the number of alternative primer sequences.
  Allowed values: number 1.0 or more. Default: 128.0
-corestrictness: Core strictness.
  Allowed values: number from 0.000 to 1.000. Default: 0.0
-most: Use the most common codons in the clamp.
  Allowed values: toggle value Yes/No. Default: No
-clampstrictness: Clamp strictness.
  Allowed values: number from 0.000 to 1.000. Default: 1.0
-clamptm: Clamp melting temperature.
  Allowed values: any numeric value. Default: 60.0
-clamppolynuc: Clamp maximum number of consecutive nucleotides of the same type.
  Allowed values: integer 1 or more. Default: 5
-dnaconc: Primer DNA concentration (nM).
  Allowed values: number 0.000 or more. Default: 50.0
-rose: Force core/clamp boundary to be codon boundary.
  Allowed values: boolean value Yes/No. Default: No
-[no]invar: 3' base of primer must be invariant.
  Allowed values: boolean value Yes/No. Default: Yes
-showall: Show all degenerate primers in output.
  Allowed values: toggle value Yes/No. Default: No
-maxshow: Maximum number of degenerate primers per block in output.
  Allowed values: integer 1 or more. Default: 3

Advanced (Unprompted) qualifiers: (none)

Input file format

If codehop is working in global alignment mode or in local alignment without gaps mode, it reads one or more protein sequences from any normal sequence USA. The sequences should already be properly aligned. In the first case, the complete sequences or their most conserved parts should be aligned by the inclusion of gap symbols at appropriate positions. In the second case, the sequences should represent a local region of similarity, wide enough to allow the selection of primers, and contain no gap symbols.

codehop can also work in Blocks format file mode on a set of different blocks (local alignments without gaps) in Blocks format provided by the user.

Output file format

If codehop is working in global alignment mode or in local alignment without gaps mode, it creates, besides the output file with the CODEHOP report containing the proposed primers, a file with the extension .blks that contains the alignment(s) in Blocks format (created by the mablock program) on which CODEHOP has operated.
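To inspect or reuse a .blks file programmatically, a minimal reader can be sketched. It assumes only the layout visible in the ADH_human.codehop.blks example on this page (ID, AC, DE and BL lines, then one line per sequence with name, offset in parentheses, segment and weight, with "//" closing each block); parse_blocks is a hypothetical name, and the reader ignores any Blocks-format fields that do not appear in that example.

```python
import re

# Sequence line: name, "(offset)", segment, weight.
SEQ_LINE = re.compile(r"^(\S+)\s+\(\s*(\d+)\)\s+(\S+)\s+(\S+)\s*$")


def parse_blocks(text):
    """List of blocks: {"ac": accession, "seqs": [(name, offset, segment, weight), ...]}."""
    blocks, current = [], None
    for line in text.splitlines():
        if line.startswith("ID "):
            current = {"ac": None, "seqs": []}
        elif line.startswith("AC ") and current is not None:
            current["ac"] = line.split()[1].rstrip(";")
        elif line.startswith("//"):
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            match = SEQ_LINE.match(line)
            if match:
                name, offset, segment, weight = match.groups()
                current["seqs"].append((name, int(offset), segment, float(weight)))
    return blocks
```

The offsets are amino acid positions, so multiplying them by 3 gives approximate nucleotide positions when judging the distance between two blocks.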

Output files for usage example

File: ADH_human.codehop

CODEHOP Version 10/14/04.1
COPYRIGHT 1997-2004, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

Parameters:
 Amino acids PSSM calculated with odds ratios normalized to 100
 and back-translated with Standard genetic code
 and codon usage table "Ehuman.cut"
 Maximum core degeneracy 128    Core strictness 0.00
 Clamp strictness 1.00   Target clamp temperature 60.00 C
 DNA Concentration 50.00 nM   Salt Concentration 50.00 mM
 Codon boundary 0   Most common codon 0
 Verbose 0   Output 3
 Begin 1   PolyX 5
Suggested CODEHOPS: The degenerate region (core) is printed in lower case,
the non-degenerate region (clamp) is printed in upper case.

Processing Block 4codehopA
M  K  G  W  A  A  M  D  F  W  K  H  L  E  P  M  T  F  T  R  R  E  P  G  P  H  D  V  Y  I  K  I  E  F  C  G  I  C  H  S  D  I  H  Q  V  H  N  E  W  G  M  S  H  Y  P  
No suggested primers found.

Processing Complement of Block 4codehopA
M  K  G  W  A  A  M  D  F  W  K  H  L  E  P  M  T  F  T  R  R  E  P  G  P  H  D  V  Y  I  K  I  E  F  C  G  I  C  H  S  D  I  H  Q  V  H  N  E  W  G  M  S  H  Y  P  
No suggested primers found.

Processing Block 4codehopB
C  V  P  G  H  E  I  V  G  R  V  V  E  V  G  S  K  V  H  K  Y  K  V  G  D  R  V  G  V  G  C  Q  V  D  C  C  R  E  C  E  Y  C  T  S  G  Q  E  Q  Y  C  P  H  M  H  W  
TTGGTCyynggncayga -3'  Core: degen=128 len=11  Clamp: score=58, len=6 temp=-25.7 *** CLAMP NEEDS EXTENSION

Processing Complement of Block 4codehopB
C  V  P  G  H  E  I  V  G  R  V  V  E  V  G  S  K  V  H  K  Y  K  V  G  D  R  V  G  V  G  C  Q  V  D  C  C  R  E  C  E  Y  C  T  S  G  Q  E  Q  Y  C  P  H  M  H  W  
         ccngtrctyymGCACCCGCACCACC -5'  Core: degen=64 len=11  Clamp: score=65, len=14 temp= 60.2

Processing Block 4codehopC
G  Y  H  T  Q  G  G  Y  A  E  H  C  V  C  H  E  H  Y  V  I  R  I  P  D  N  L  P  L  D  A  A  A  P  L  L  C  A  G  I  T  V  Y  S  P  L  K  H  W  G  C  G  P  G  M  W  
                                                                                           CCGCCCCTCTGCTGtgygsnggnrt -3'  Core: degen=128 len=11  Clamp: score=70, len=14 temp= 62.7

Processing Complement of Block 4codehopC
G  Y  H  T  Q  G  G  Y  A  E  H  C  V  C  H  E  H  Y  V  I  R  I  P  D  N  L  P  L  D  A  A  A  P  L  L  C  A  G  I  T  V  Y  S  P  L  K  H  W  G  C  G  P  G  M  W  
                                                                                                         acrcsnccnyaGTGGCACATGTGGGGAGAC -5'  Core: degen=128 len=11  Clamp: score=64, len=19 temp= 62.0

Processing Block 4codehopD
V  G  I  V  G  I  G  G  L  G  H  M  G  V  K  Y  A  K  A  M  G  H  H  V  T  V  F  S  T  S  H  K  K  R  E  D  A  M  H  L  G  A  D  H  Y  I  N  M  R  D  P  D  G  W  K  
                                   GGGCGTCAAGTTCGCCvmngcnatggg -3'  Core: degen=96 len=11  Clamp: score=66, len=16 temp= 61.4

Processing Complement of Block 4codehopD
V  G  I  V  G  I  G  G  L  G  H  M  G  V  K  Y  A  K  A  M  G  H  H  V  T  V  F  S  T  S  H  K  K  R  E  D  A  M  H  L  G  A  D  H  Y  I  N  M  R  D  P  D  G  W  K  
                           ccngtryancsGCAGTTCAAGCGGTTCCG -5'  Core: degen=128 len=11  Clamp: score=62, len=18 temp= 62.9

Processing Block 4codehopE
E  H  H  D  G  F  D  Y  I  C  N  T  V  S  A  K  H  N  F  D  Q  Y  Y  Q  L  M  K  H  D  G  T  L  V  M  V  G  A  P  E  H  P  H  K  F  P  V  F  M  L  M  L  M  R  V  S  
No suggested primers found.

Processing Complement of Block 4codehopE
E  H  H  D  G  F  D  Y  I  C  N  T  V  S  A  K  H  N  F  D  Q  Y  Y  Q  L  M  K  H  D  G  T  L  V  M  V  G  A  P  E  H  P  H  K  F  P  V  F  M  L  M  L  M  R  V  S  
No suggested primers found.

Processing Block 4codehopF
I  M  G  S  M  I  G  G  R  K  E  T  Q  E  M  L  D  F  C  A  E  H  N  V  T  P  W  I  E  M  I  E  M  D  Y  I  N  H  A  F  E  R  M  E  K  G  D  V  R  Y  R  F  V  I  D  
No suggested primers found.

Processing Complement of Block 4codehopF
I  M  G  S  M  I  G  G  R  K  E  T  Q  E  M  L  D  F  C  A  E  H  N  V  T  P  W  I  E  M  I  E  M  D  Y  I  N  H  A  F  E  R  M  E  K  G  D  V  R  Y  R  F  V  I  D  
No suggested primers found.

The 5' consensus clamp region is written in upper case and the 3' degenerate core region in lower case. Note that a primer such as GGGCGTCAAGTTCGCCvmngcnatggg with degeneracy 96 means that you must actually synthesize and use a mixture of 96 primers, in which v stands for A, C or G, m for A or C, and n for any of the 4 bases, in all possible combinations (3 × 2 × 4 × 4 = 96).

A CLAMP NEEDS EXTENSION message means that the program found an appropriate degenerate core but could not extend the consensus clamp far enough to reach a sufficiently high melting temperature, because it hit the block border. You can try to remedy this by running the program again on an extended alignment (try removing columns with gaps and/or ambiguities) or by adding the most plausible codons yourself.
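The mixture of primers behind a degenerate primer can be enumerated with a short sketch, assuming standard IUPAC ambiguity codes; expand_primer is a hypothetical name. The output is in upper case, so the clamp/core distinction of the report is not preserved.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Every concrete sequence in the pool described by a degenerate primer."""
    choices = (IUPAC[code] for code in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


pool = expand_primer("GGGCGTCAAGTTCGCCvmngcnatggg")
print(len(pool))  # 96
print(pool[0])    # GGGCGTCAAGTTCGCCAAAGCAATGGG
```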

File: ADH_human.codehop.blks

ID   4codehop; BLOCK
AC   4codehopA; distance from previous block=(0,0)
DE   block derived from ADH.msf
BL   UNK motif;  width=55; seqs=6; 99.5%=0; strength=0 
E.coli             (   0) IKAVGAYSAKQPLEPMDITRREPGPNDVKIEIAYCGVCHSDLHQVRSEWAGTVYP 8.080556
B.stearoth         (   0) MKAAVVNEFKKALEIKEVERPKLEEGEVLVKIEACGVCHTDLHAAHGDWPIKKLP 10.830556
M.tubercul         (   0) VAAYAAMSATEPLTKTTITRRDPGPHDVAIDIKFAGICHSDIHTVKAEWGQPNYP 8.486111
yeast              (   0) GIGISNAKDWKHPKLVSFDPKPFGDHDVDVEIEACGICGSDFHIAVGNWGPVPEN 11.441667
cider_tree         (   0) TTGWAARDPSGVLSPYTYSLRNTGPEDLYIKVLSCGVCHSDIHQIKNDLGMSHYP 8.841667
A.thaliana         (   0) AFGLAAKDNSGVLSPFSFTRRETGEKDVRFKVLFCGICHSDLHMVKNEWGMSTYP 7.319444
//
ID   4codehop; BLOCK
AC   4codehopB; distance from previous block=(0,0)
DE   block derived from ADH.msf
BL   UNK motif;  width=55; seqs=6; 99.5%=0; strength=0 
E.coli             (  55) CVPGHEIVGRVVAVGDQVEKYAPGDLVGVGCIVDSCKHCEECEDGLENYCDHMTG 8.947222
B.stearoth         (  55) LIPGHEGVGIVVEVAKGVKSIKVGDRVGIPWLYSACGECEYCLTGQETLCPHQLN 10.477778
M.tubercul         (  55) VVPGHEIAGVVTAVGSEVTKYRQGDRVGVGCFVDSCRECNSCTRGIEQYCKPGAN 7.966667
yeast              (  55) QILGHEIIGRVVKVGSKCHTVKIGDRVGVGAQALACFECERCKSDNEQYCTNDHV 10.411111
cider_tree         (  55) MVPGHEVVGEVLEVGSEVTKYRVGDRVGTGIVVGCCRSCSPCNSDQEQYCNKKIW 8.922222
A.thaliana         (  55) LVPGHEIVGVVTEVGAKVTKFKTGEKVGVGCLVSSCGSCDSCTEGMENYCPKSIQ 8.275000
//
ID   4codehop; BLOCK
AC   4codehopC; distance from previous block=(0,0)
DE   block derived from ADH.msf
BL   UNK motif;  width=55; seqs=6; 99.5%=0; strength=0 
E.coli             ( 110) EPHTLGGYSQQIVVHERYVLRIRHPQELAAVAPLLCAGITTYSPLRHWQAGPGKK 8.808333
B.stearoth         ( 110) GGSVDGGYAEYCKAPADYVAKIPDNLDPVEVAPILCAGVTTYKALKVSGARPGEW 11.061111
M.tubercul         ( 110) GQPTQGGYSEAIVVDENYVLRIPDVLPLDVAAPLLCAGITLYSPLRHWNAGANTR 8.036111
yeast              ( 110) GYISQGGFASHVRLHEHFAIQIPENIPSPLAAPLLCGGITVFSPLLRNGCGPGKR 10.030556
cider_tree         ( 110) GKPTQGGFAGEIVVGERFVVKIPDGLESEQAAPLMCAGVTVYSPLVRFGLKSGLR 8.652778
A.thaliana         ( 110) NTITYGGYSDHMVCEEGFVIRIPDNLPLDAAAPLLCAGITVYSPMKYHGLDPGMH 8.411111
//
ID   4codehop; BLOCK
AC   4codehopD; distance from previous block=(0,0)
DE   block derived from ADH.msf
BL   UNK motif;  width=55; seqs=6; 99.5%=0; strength=0 
E.coli             ( 165) VGVVGIGGLGHMGIKLAHAMGAHVVAFTTSEAKREAAKALGADEVVNSRNADEMA 8.488889
B.stearoth         ( 165) VAIYGIGGLGHIALQYAKAMGLNVVAVDISDEKSKLAKDLGADIAINGLKEDPVK 10.852778
M.tubercul         ( 165) VAIIGLGGLGHMGVKLGAAMGADVTVLSQSLKKMEDGLRLGAKSYYATADPDTFR 9.716667
yeast              ( 165) VGIVGIGGIGHMGILLAKAMGAEVYAFSRGHSKREDSMKLGADHYIAMLEDKGWT 9.441667
cider_tree         ( 165) GGILGLGGVGHMGVKIAKAMGHHVTVISSSDKKREALEHLGADAYLVSSDENGMK 8.283333
A.thaliana         ( 165) IGVVGLGGLGHVGVKFAKAMGTKVTVISTSEKKREAINRLGADAFLVSRDPKQIK 8.216667
//
ID   4codehop; BLOCK
AC   4codehopE; distance from previous block=(0,0)
DE   block derived from ADH.msf
BL   UNK motif;  width=55; seqs=6; 99.5%=0; strength=0 
E.coli             ( 220) AHLKSFDFILNTVAAPHNLDDFTTLLKRDGTMTLVGAPATPHKSPVFNLIMKRRA 8.430556
B.stearoth         ( 220) AIHDGVHAAISVAVNKKAFEQAYQSVKRGGTLVVVGLPNADLPIPIFDTVLNGVS 9.994444
M.tubercul         ( 220) KLRGGFDLILNTVSANLDLGQYLNLLDVDGTLVELGIPEHPMAVPAFALALMRRS 8.241667
yeast              ( 220) EQYSDLLVVCSSSLSKVNFDSIVKIMKIGGSIVSIAAPEVNEKLVLKPLGLMGVS 11.219444
cider_tree         ( 220) EATDSLDYIFDTIPVVHPLEPYLALLKLDGKLILTGVINAPLQFISPMVMLGRKS 9.044444
A.thaliana         ( 220) DAMGTMDGIIDTVSATHSLLPLLGLLKHKGKLVMVGAPEKPLELPVMPLIFERKM 8.069444
//
ID   4codehop; BLOCK
AC   4codehopF; distance from previous block=(0,0)
DE   block derived from ADH.msf
BL   UNK motif;  width=55; seqs=6; 99.5%=0; strength=0 
E.coli             ( 275) IAGSMIGGIPETQEMLDFCAEHGIVADIEMIRADQINEAYERMLRGDVKYRFVID 7.197222
B.stearoth         ( 275) VKGSIVGTRKDMQEALDFAARGKVRPIVETAELEEINEVFERMEKGKINGRIVLK 12.213889
M.tubercul         ( 275) LAGSNIGGIAETQEMLNFCAEHGVTPEIELIEPDYINDAYERVLASDVRYRFVID 7.700000
yeast              ( 275) ISSSAIGSRKEIEQLLKLVSEKNVKIWVEKLPISEVSHAFTRMESGDVKYRFTLV 11.833333
cider_tree         ( 275) ITGSFIGSMKETEEMLEFCKEKGLTSQIEVIKMDYVNTALERLEKNDVRYRFVVD 8.263889
A.thaliana         ( 275) VMGSMIGGIKETQEMIDMAGKHNITADIELISADYVNTAMERLEKADVRYRFVID 7.791667
//

Data files

The codehop "wrapper" program takes as input a codon usage file in EMBOSS format. The EMBOSS distribution contains a set of codon usage files. If you run the program in a terminal, you can find out which file you need by executing the command:
more $EMBOSS_DATA/cut.list
If you work under wEMBOSS, you can choose the file from a selector.

At the BEN site you can find, besides the codon usage files from the EMBOSS distribution, files from the CUTG databank. You can search the CUTG databank using MRS. For example, if you want a table for bacteriophage lambda, a search for "*lambda*" will show that there is a table with ID Bacteriophage_lambda; you can then run codehop with the file Bacteriophage_lambda.cutg.
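To see which codon a table prefers for each amino acid (the codon used in the default consensus clamp), a codon usage file can be read with a small sketch. It assumes the EMBOSS codon usage layout of '#' header lines followed by whitespace-separated columns Codon, AA, Fraction, Frequency, Number; the function names are hypothetical, and the numbers in the example below are made up rather than taken from any real table.

```python
def parse_cut(text):
    """Map amino acid -> [(codon, fraction), ...] from EMBOSS-style codon usage text."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and header/comment lines
        codon, aa, fraction = fields[0].upper(), fields[1], float(fields[2])
        usage.setdefault(aa, []).append((codon, fraction))
    return usage


def preferred_codons(usage):
    """Most frequent codon for each amino acid."""
    return {aa: max(pairs, key=lambda pair: pair[1])[0] for aa, pairs in usage.items()}


# Made-up values for illustration only.
sample = """#Codon AA Fraction Frequency Number
GGA    G     0.250    16.000   100
GGC    G     0.340    22.000   140
CAC    H     0.580    15.000    90
CAT    H     0.420    11.000    70
"""
print(preferred_codons(parse_cut(sample)))
```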

Notes

None.

References

  1. "Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences" by T.M. Rose, E.R. Schultz, J.G. Henikoff, S. Pietrokovski, C.M. McCallum and S. Henikoff, Nucleic Acids Research (1998), 26(7):1628-1635.
  2. "CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design" by T.M. Rose, J.G. Henikoff and S. Henikoff, Nucleic Acids Research (2003), 31(13):3763-3766.

Warnings

Remember that codehop needs already aligned sequences as input. If you give as input sequences of unequal length (or, in local alignment without gaps mode, sequences containing gaps), the program will run without a warning message but will yield no results or meaningless results.

Note that, in order to get a valid result, it is important to choose an appropriate codon usage table and genetic code.

Note that you should run the PCR under the conditions assumed by the program: 50 mM K+, the primer DNA concentration given as input (50 nM by default) and the working temperature given as input (60 °C by default).

Diagnostic Error Messages

There is an error message specific to this program that is issued if the number of input sequences is higher than 400:

 The number of sequences n is higher than the maximum of 400

There is also an error message specific to this program that is issued if the "naked" CODEHOP program produces an empty output file:

  No output, codehop program crashed !

Exit status

It exits prematurely with status 255 and an error message if there are more than 400 input sequences.

Known bugs

None.

See also

Program name     Description
eprimer3         Picks PCR primers and hybridization oligos
primersearch     Searches DNA sequences for matches with primer pairs
stssearch        Search a DNA database for matches with a set of STS primers

Author(s)

The wrapper application codehop was written by Guy Bottu (gbottu@vub.ac.be)
BEN, ULB, Brussels, Belgium

The program CODEHOP itself was written by a team of developers working at the FHCRC. For any questions, please contact:

Steven Henikoff                                 steveh@fhcrc.org

Fred Hutchinson Cancer Research Center          FAX: 206-667-5889
1100 Fairview AV N, A1-162, PO Box 19024        Seattle, WA 98109-1024

History

Completed 6 April 2005

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.