ID   short_identifier; BLOCK
AC   block_number; distance from previous block = (min,max)
DE   description
BL   XXX motif; width=w; seqs=s; 99.5%=n1; strength=n2
sequence_id (offset) sequence_segment sequence_weight
.
.
.
//

ID line starts a block entry and contains a short identifier for the group of sequences from which the block was made. If the block was taken from InterPro, it will be the InterPro group ID. The identifier is terminated by a semi-colon, and the word "BLOCK" indicates the entry type.
AC line contains the block number, a seven-character group number for
sequences from which the block was made, followed by a letter (A-Z)
indicating the order of the block in the sequences.
If the group has only one block, the letter is omitted.
If the block was made from InterPro group IPRnnnnnn, the block number
is IPBnnnnnnA.
If the block was converted from Terri Attwood's Prints Database, the
block number is PRnnnnnA.
min,max = minimum,maximum number of amino acids from the previous block
for sequences in this block. For the first block in the group, this is the
distance from the beginning of the sequences. If these values are unknown,
0,0 can be filled in. Note that the codehop program does not need or use
these values.
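The AC line layout described above can be read with a small regular expression. The following is only a sketch: the function name parse_ac_line, the returned field names, and the example line are assumptions made for illustration and are not part of the Blocks tools or of codehop.

```python
import re

# Pattern for the AC line layout described above:
#   AC   block_number; distance from previous block = (min,max)
AC_LINE = re.compile(
    r"^AC\s+(?P<block>[^;\s]+)\s*;\s*"
    r"distance from previous block\s*=\s*"
    r"\(\s*(?P<min>-?\d+)\s*,\s*(?P<max>-?\d+)\s*\)\s*$"
)


def parse_ac_line(line):
    """Return the block number, block letter, and (min, max) distance."""
    match = AC_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognizable AC line: {line!r}")
    block = match.group("block")
    # The trailing letter (A-Z) gives the block's order within the group;
    # it is omitted when the group has only one block.
    letter = block[-1] if block[-1].isalpha() else None
    return {
        "block_number": block,
        "letter": letter,
        "min_distance": int(match.group("min")),
        "max_distance": int(match.group("max")),
    }


# Hypothetical AC line, invented for this example.
example = parse_ac_line("AC   IPB000001A; distance from previous block=(12,35)")
print(example)
```

Because unknown distances may be written as 0,0, a caller that needs real distances should treat that pair as "not available" rather than as a measured value.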
DE line contains a description of the group of sequences from which the block was made. If the block was taken from InterPro, it will be a slightly edited version of the InterPro description.
BL line contains information about the block:
XXX = the amino acids in the spaced triplet found by MOTIF upon
which the block is based. If this is undefined because the block was not
made by the MOTIF program, it can be set to UNK.
w = width of the sequence segments (columns) in the block.
s = number of sequence segments (rows) in the block.
n1 = raw calibration score; the 99.5th percentile score of true
negative sequences. Raw search scores are normalized by
dividing by this score and multiplying by 1000.
n2 = median normalized score of known true positive sequences as
documented in InterPro.
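As an illustration of the BL fields and of the normalization rule (raw score divided by n1, multiplied by 1000), here is a minimal sketch. The names parse_bl_line and normalize_score, the dictionary keys, and the example values are assumptions chosen for this example; it also assumes that the numeric fields are written as integers.

```python
def parse_bl_line(line):
    """Split a BL line into motif, width, seqs, calibration and strength."""
    line = line.strip()
    if not line.startswith("BL"):
        raise ValueError(f"not a BL line: {line!r}")
    # Fields are separated by semi-colons; the first one is "XXX motif".
    parts = [p.strip() for p in line[2:].split(";") if p.strip()]
    motif = parts[0].split()[0]
    values = {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        values[key.strip()] = value.strip()
    return {
        "motif": motif,                        # XXX, or UNK if undefined
        "width": int(values["width"]),         # w: columns in the block
        "seqs": int(values["seqs"]),           # s: rows in the block
        "calibration": int(values["99.5%"]),   # n1: raw calibration score
        "strength": int(values["strength"]),   # n2: median normalized score
    }


def normalize_score(raw_score, calibration):
    """Normalize a raw search score: divide by n1 and multiply by 1000."""
    return 1000.0 * raw_score / calibration


# Hypothetical BL line and raw score, invented for this example.
info = parse_bl_line("BL   ADG motif; width=20; seqs=15; 99.5%=800; strength=1200")
print(info["motif"], info["width"], info["seqs"])
print(normalize_score(400, info["calibration"]))
```

With this scaling, a raw score equal to n1 normalizes to 1000, so normalized scores above 1000 exceed the 99.5th percentile of the true negative sequences.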
Following the BL line are lines for each sequence with a segment in the block. The segments may be clustered, with clusters separated by blank lines. Each segment line contains a sequence identifier, the offset from the beginning of the sequence to the block in parentheses, the sequence segment, and a weight for the segment. The weights are normalized so that the most distant segment has a weight of 100.
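The segment lines can be read into clusters as sketched below. The function name parse_segments, the tuple layout, and the sequence identifiers, segments and weights in the example are all invented for illustration; the sketch also assumes that identifiers and segments contain no spaces and that weights are integers.

```python
import re

# sequence_id (offset) sequence_segment sequence_weight
SEGMENT_LINE = re.compile(r"^(\S+?)\s*\(\s*(\d+)\s*\)\s+(\S+)\s+(\d+)$")


def parse_segments(lines):
    """Group segment lines into clusters separated by blank lines.

    Returns a list of clusters; each cluster is a list of
    (sequence_id, offset, segment, weight) tuples.
    """
    clusters, current = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # A blank line closes the current cluster, if any.
            if current:
                clusters.append(current)
                current = []
            continue
        match = SEGMENT_LINE.match(stripped)
        if match is None:
            raise ValueError(f"not a segment line: {line!r}")
        seq_id, offset, segment, weight = match.groups()
        current.append((seq_id, int(offset), segment, int(weight)))
    if current:
        clusters.append(current)
    return clusters


# Hypothetical segment lines, invented for this example.
example_lines = [
    "SEQ_A ( 10) ACDEFGHIKL  55",
    "SEQ_B ( 12) ACDEFGHIKM  60",
    "",
    "SEQ_C (  7) ACDQFGHLKL 100",
]
clusters = parse_segments(example_lines)
print(len(clusters), [len(c) for c in clusters])
```

A simple consistency check on a parsed block is that every segment has the width given on the BL line and that the largest weight is 100.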
// line terminates a block entry. A Blocks format file can contain several blocks, one after the other.
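Since each entry ends with a // line, a file holding several blocks can be split into entries before the individual lines are interpreted. This is a minimal sketch under that assumption; split_block_entries is a hypothetical helper name, and the two entries in the example are invented.

```python
def split_block_entries(text):
    """Split the text of a Blocks format file into entries.

    Each entry is returned as a list of lines, without the terminating
    "//" line. Blank lines inside an entry are kept, because they
    separate clusters of segments. Text after the last "//" line is
    ignored, since it does not form a terminated entry.
    """
    entries, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            if current:
                entries.append(current)
            current = []
        elif current or line.strip():
            # Skip blank lines between entries, keep them inside one.
            current.append(line)
    return entries


# Two hypothetical entries, invented for this example.
example_text = """\
ID   EXAMPLE_ONE; BLOCK
AC   IPB000001A; distance from previous block=(12,35)
DE   Hypothetical example family
BL   ADG motif; width=10; seqs=3; 99.5%=800; strength=1200
SEQ_A ( 10) ACDEFGHIKL  55
SEQ_B ( 12) ACDEFGHIKM  60

SEQ_C (  7) ACDQFGHLKL 100
//
ID   EXAMPLE_TWO; BLOCK
AC   IPB000002; distance from previous block=(0,0)
DE   Second hypothetical example family
BL   UNK motif; width=5; seqs=1; 99.5%=700; strength=1100
SEQ_D (  3) ACDEF 100
//
"""
entries = split_block_entries(example_text)
print(len(entries), [entry[0] for entry in entries])
```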
You can find more information about the Blocks databank and tools for making blocks at the FHCRC site. Note that when using the codehop program under EMBOSS, you can use the <output>.blks file generated by a previous run of codehop.