indexsearch
You can start the program by typing :
> indexsearch<enter>
search in which databanks ? (output format : List File with descriptions)
a) embl general nucleic acid databank
b) uniprot general protein databank (= SwissProt + TrEMBL)
c) swissprot manually annotated part of UniProt
d) trembl EMBL ORF translations not yet in SwissProt
e) remtrembl other EMBL ORF translations till October 2003
f) uniprot_varsplic SwissProt splice variants
g) uniref100 UniProt subset without redundant fragments
h) uniref90 UniProt subset with no more than 90% identity
i) uniref50 UniProt subset with no more than 50% identity
j) pir old general protein databank (December 2004)
k) genpept GenBank ORF translations
l) refseq NCBI "reference" genes and transcripts
m) refseqp NCBI "reference" proteins
n) vector Intelligenetics vector databank (January 1996)
o) emvec EMBL vector subset
p) imgt LIGM databank of Igg. and TcR genes
q) hla databank of human MHC genes
r) genomereviews complete microbial genomes from EMBL
s) gpcrdb UniProt G protein coupled receptors subset
t) epd Eukaryotic Promoter Database
Select databank by typing lowercase letter.
Type F to toggle output format or Q to quit.
Please make your choice : a
The first thing you will see is the DatabaseScreen. You must choose the databank you want to search by typing a lowercase letter. Note that the letters can change when the collection of databanks available at the BEN site changes.
search in embl (output format : List File with descriptions)
a) all text fields : globin AND duplication
b) entry name (ID) :
c) accession number :
d) organism (species) :
e) organism classification (taxon) :
f) organelle :
g) description :
h) keywords :
i) comments :
j) references :
k) features :
l) sequence length :
m) entry creation date :
interfield operator : AND append * after text fields : NO
press lowercase letter to edit search field content,
M to type instead a complete query in MRS language,
L to toggle interfield logic, W to toggle wildcarding, R to reset query,
When you have selected the databank, you get into the QueryScreen. Note at the top of the screen the name of the databank you have selected. You must now compose your query. You can press M to get a prompt where you can type in a query in MRS query language, as you would in the query box of the MRS WWW interface (see the on-line help). To make your work easier, indexsearch lets you type in query words for specific fields, without the need to type in the field name. For each field you can type in several words and connect them with the logical operators AND/OR/NOT. Note that indexsearch will to a certain extent reformat your query ; if e.g. you press a, type globin & duplication and press <enter>, you will see that globin AND duplication is now written after a) all text fields :. If you have made a typing mistake, you can reset a field by typing the appropriate hot key and then just pressing <enter>, and you can reset all fields by typing uppercase R. You can start the search by holding down the control key while typing d.
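The reformatting step described above can be pictured with a small Python sketch. It is only an illustration, not the actual indexsearch code: the text confirms that `&` becomes `AND`, while the `|` and `!` mappings and the function name `normalize_query` are assumptions made for this example.

```python
import re

# Only "&" -> "AND" is documented above; "|" and "!" are assumed for illustration.
SYMBOLS = {"&": "AND", "|": "OR", "!": "NOT"}


def normalize_query(text):
    """Rewrite symbolic operators as the words AND/OR/NOT (hypothetical helper)."""
    # Put spaces around each symbol so "globin&duplication" splits into tokens.
    spaced = re.sub(r"([&|!])", r" \1 ", text)
    # Replace operator symbols, leave ordinary query words untouched.
    tokens = [SYMBOLS.get(token, token) for token in spaced.split()]
    return " ".join(tokens)


print(normalize_query("globin & duplication"))
```

With the example from the text, `normalize_query("globin & duplication")` gives `globin AND duplication`, the form shown after a) all text fields.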
50 entries were found.
Do you want to :
S) save the entries (and quit)
Z) save the entries but do not quit
P) preview the result on the screen
R) refine the query
C) change the set of selected databanks
Q) quit
make your choice by pressing key :
How should I call the output List File (* search.list *) : globin.list
When the search has finished, you get into the OutputScreen.
Note at the top of the screen how many entries were found. Here you can
choose what to do next using a menu with hot keys. Before saving, you can
preview the result on your screen (option P). If you are not satisfied, you
can go back to the DatabaseScreen to change the databank to be
searched (option C) or to the QueryScreen to change the query
(option R). If you intend to perform several complex queries that differ
only slightly from each other, you can save the result without
quitting the program (option Z).
In the section QUERY/FIELDSQUERY you can find a list of text boxes for the various fields. Type your search terms there. If you want to type in a query in MRS query language instead, you can do this in the "Do instead this query in MRS language" box in the QUERY/FIELDSGENERAL section (usually at the bottom of the page).
If you type search terms in more than one field, the logic between the fields is by default "and". You can change this : working interactively at the command line you can in the QueryScreen use the "hot key" L to toggle between AND, OR and NOT ; under a graphical interface you will find an "Interfield logic" check box or selector. Note that for "not" the query is not symmetric : it is the query_term_on_top but_not the_query_term_below. Often the field boxes will not let you express the query you want. In this case you must resort to typing a query in MRS language.
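The interfield logic can be modelled with Python sets, as in the sketch below. This is a model of the behaviour described above, not the way MRS computes results internally; the function name `combine_fields` and the hit sets are invented for the illustration.

```python
from functools import reduce


def combine_fields(field_hits, operator="AND"):
    """Combine per-field hit sets, listed in on-screen order (top field first)."""
    hits = [set(h) for h in field_hits]
    if not hits:
        return set()
    if operator == "AND":
        return reduce(set.intersection, hits)   # entries matching every field
    if operator == "OR":
        return reduce(set.union, hits)          # entries matching any field
    if operator == "NOT":
        return reduce(set.difference, hits)     # top field but_not the fields below
    raise ValueError("operator must be AND, OR or NOT")


# Made-up hit sets for two fields, top field first.
organism_hits = {"J00153", "J00176", "U01317"}
description_hits = {"J00176", "U01317", "V00489"}

print(sorted(combine_fields([organism_hits, description_hits], "AND")))
print(sorted(combine_fields([organism_hits, description_hits], "NOT")))
print(sorted(combine_fields([description_hits, organism_hits], "NOT")))
```

The last two calls differ only in field order and return different entries, which is the asymmetry of "not" mentioned above.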
You can use the wild cards ? and *. ? stands for any character, * for any string of characters (including nothing). You can let MRS/indexsearch append automatically a wild card * at the end of every query term : you can do this at the command line with the "hot key" W in the QueryScreen and under a graphical user interface by setting "Append wild card * after each query string?" to "y".
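To get a feeling for the two wild cards, the sketch below imitates them with Python's standard `fnmatch` module. It rests on several assumptions: that ? matches exactly one character, that matching is done on whole words, and that a term already ending in * is left alone when wildcarding is switched on. The helper names are invented, and `fnmatch` additionally gives `[...]` a special meaning that is not described here.

```python
from fnmatch import fnmatchcase


def append_wildcard(term):
    """Mimic the 'append *' toggle: add * unless the term already ends with it."""
    return term if term.endswith("*") else term + "*"


def matches(word, pattern):
    """True if the whole word fits the pattern (? = one character, * = any string)."""
    return fnmatchcase(word, pattern)


for word in ("globin", "globins", "hemoglobin"):
    print(word, matches(word, "glob?n"), matches(word, append_wildcard("globin")))
```

Under these assumptions `globin*` covers both globin and globins but not hemoglobin, for which a leading wild card (`*globin*`) would be needed.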
# indexsearch output (EMBOSS List File)
# Mon Oct 9 17:36:14 2006
# MRS database(s) searched : embl_release|embl_updates
# query : globin AND duplication
# 50 entries found
embl:AL590842 # Yersinia pestis CO92 complete genome
embl:CP000075 # Pseudomonas syringae pv. syringae B728a, complete genome.
embl:J00153 # Homo sapiens HBAP1 pseudogene, complete cds; and hemoglobin alpha 2 (HBA2) and hemoglobin alpha 1 (HBA1) genes, complete cds.
embl:J00176 # Human A-gamma-globin gene on chromosome 11, allele B.
embl:K01898 # Human beta globin deletion mutation promoting Indian thalassemia.
embl:M91036 # Homo sapiens G-gamma globin (G-gamma globin) and A-gamma globin (A-gamma globin) genes, complete cds.
embl:U01317 # Human beta globin region on chromosome 11.
embl:V00489 # Human alpha-globin gene with flanks.
embl:V00490 # Human pseudogene for alpha-2 globin.
embl:V00576 # Human repetitive sequence fragment located approximately 1300 base pairs 5' to the capping site of the human beta globin gene.
embl:CR932181 # Paramecium tetraurelia, globin, putative, complete gene.
embl:CR932184 # Paramecium tetraurelia, globin, putative, complete gene.
embl:CR932185 # Paramecium tetraurelia, globin, putative, complete gene.
embl:CR932197 # Paramecium tetraurelia, globin, putative, complete gene.
embl:CR932204 # Paramecium tetraurelia, globin, putative, complete gene.
embl:AY450927 # Macropus eugenii epsilon globin gene, complete cds.
embl:AY450928 # Macropus eugenii beta globin gene, complete cds.
embl:AY459589 # Macropus eugenii alpha globin gene, complete cds.
embl:AY459590 # Macropus eugenii theta globin gene, complete cds.
embl:J00047 # Goat beta-x-globin (psi-beta-x) pseudogene, complete cds with 3'flank.
embl:J00048 # Goat beta-x-globin (psi-beta-x) pseudogene 3' flanking region.
embl:J05174 # Gibbon gamma-1 and gamma-2 globin genes, complete cds.
embl:K01671 # Goat germline-like beta-globin gene epsilon III, 5' end and flank.
embl:K01672 # Goat germline beta-globin gene epsilon IV, 5' end and flank.
embl:K02437 # Goat embryonic beta-globin epsilon-V pseudogene, complete sequence.
embl:M15844 # Rabbit alpha-like globin gene cluster, zeta-1 region.
embl:M15845 # Rabbit alpha-like globin gene cluster, theta-1 region.
embl:M15846 # Rabbit alpha-like globin gene cluster, zeta-3 region.
embl:M15847 # Rabbit alpha-like globin gene cluster alpha-1 globin gene, partial cds.
embl:M91454 # Orangutan alpha-globin gene duplicate region.
embl:M94631 # Hylobates lar (clone LambdaGialphaG1) 3'alpha1Alu1 D, 3'alpha1Alu1 E and 3'alpha1Alu1 F Alu repeat regions.
embl:M94634 # Hylobates lar alpha2-globin and alpha1-globin genes, complete cds.
embl:V00154 # Goat pseudogene psi-beta-Z for a beta-globin.
embl:X53419 # M.mulatta gamma-globin-1(G), gamma-globin-2(A) genes and L1 LINE element
embl:X53420 # A.geoffroyi gamma-globin gene and L1 LINE element
embl:AE017042 # Yersinia pestis biovar Microtus str. 91001, complete genome.
embl:AE017282 # Methylococcus capsulatus str. Bath, complete genome.
embl:AL939104 # Streptomyces coelicolor A3(2) complete genome; segment 1/29
embl:AM286690 # Alcanivorax borkumensis SK2, complete genome
embl:BX640428 # Bordetella parapertussis strain 12822, complete genome; segment 6/14
embl:CP000031 # Silicibacter pomeroyi DSS-3, complete genome.
embl:CP000089 # Dechloromonas aromatica RCB, complete genome.
embl:CP000090 # Ralstonia eutropha JMP134 chromosome 1, complete sequence.
embl:CP000113 # Myxococcus xanthus DK 1622, complete genome.
embl:CP000352 # Ralstonia metallidurans CH34, complete genome.
embl:CP000377 # Silicibacter sp. TM1040, complete genome.
embl:CP000378 # Burkholderia cenocepacia AU 1054 chromosome 1, complete sequence.
embl:L44128 # Mus caroli L1Mc1, LINE-1 interspersed repetitive DNA, complete sequence.
embl:L44129 # Mus caroli L1Mc1, LINE-1 interspersed repetitive DNA, complete sequence.
embl:L44130 # Mus caroli L1Mc3, LINE-1 interspersed repetitive DNA, complete sequence.
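If you want to post-process such a List File with a script, a minimal Python sketch could look like the one below. It assumes that the file holds one entry per line in the form databank:accession # description and that lines starting with # are header comments; the function name `parse_list_file` is invented for this example.

```python
def parse_list_file(text):
    """Split a List File into header comments and (databank, accession, description)."""
    header, entries = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header comment such as the query or the number of entries found.
            header.append(line[1:].strip())
            continue
        # Everything before the first "#" is the entry, the rest its description.
        entry, _, description = line.partition("#")
        databank, _, accession = entry.strip().partition(":")
        entries.append((databank, accession, description.strip()))
    return header, entries


sample = """\
# indexsearch output (EMBOSS List File)
# query : globin AND duplication
embl:AL590842 # Yersinia pestis CO92 complete genome
embl:J00176 # Human A-gamma-globin gene on chromosome 11, allele B.
"""

header, entries = parse_list_file(sample)
for databank, accession, description in entries:
    print(databank, accession, description)
```

A List File saved without descriptions should pass through the same function, with an empty description for each entry.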
Note that the internal logic of indexsearch demands that you select the output format before you start the search. Sometimes you might want to preview a List with descriptions but save a List without descriptions or a fastA format databank. The only way is to change output format and then repeat the search ; note that working interactively at the command line you can from the OutputScreen return to the DatabaseScreen or the QueryScreen while retaining your query.
When you have performed a search with indexsearch, you can give the output as input to textsearch for further refinement. textsearch (which searches the description/definition line for strings of characters) is too slow to search a big databank such as EMBL but can be useful for smaller sets. You can also combine the results of different indexsearch runs using listor.
| Program name | Description |
|---|---|
| textsearch | Search sequence documentation. Slow, use SRS and Entrez! |
MRS is being developed by Maarten Hekkelman (CMBI, Radboud University, Toernooiveld 1, 6525 ED NIJMEGEN, The Netherlands).
MRS is currently distributed via berliOS. Questions and remarks should be mailed to the list mrs-user@lists.berlios.de.
Completed 20 December 2006