Molecular sequence analysis: EMBOSS
Molecular sequence analysis covers many different tasks, each of which needs its own computational tools. EMBOSS is an integrated package of programs that covers many of the main tasks encountered when analyzing molecular sequence data.
Identifying available databases
- showdb
Searching for sequences
- textsearch
Information on sequences
- infoseq
- entret
Example: analysis of rhodopsin sequences
- textsearch: search 'genbank' for 'rhodopsin'; output only the USA (Uniform Sequence Address) of each hit, save in 'rhodopsin.usa'
genbank-id:AB009620
genbank-id:AB009621
genbank-id:AB009622
genbank-id:AB009623
genbank-id:AB029320
genbank-id:AB059748
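The entries listed above have one USA per line, each in the form database-field:entry. The following is a minimal Python sketch, assuming only that form, of how such a list could be read and checked before it is passed to later programs as '@rhodopsin.usa'. The names Usa and parse_usa_list are hypothetical helpers for illustration and are not part of EMBOSS.

```python
from typing import List, NamedTuple


class Usa(NamedTuple):
    database: str  # e.g. "genbank"
    field: str     # e.g. "id"; empty if the line has no "-field" part
    entry: str     # e.g. "AB009620"


def parse_usa_list(text: str) -> List[Usa]:
    """Parse lines of the form 'database-field:entry' or 'database:entry'.

    Assumption: only database-style USAs are handled; file-based USAs
    (for example 'format::file') are outside the scope of this sketch.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # this sketch skips blank lines and '#' lines
        prefix, sep, entry = line.partition(":")
        if not sep or not entry:
            raise ValueError(f"not a database USA: {line!r}")
        database, _, field = prefix.partition("-")
        records.append(Usa(database, field, entry))
    return records


example = """genbank-id:AB009620
genbank-id:AB009621
"""
for usa in parse_usa_list(example):
    print(usa.database, usa.field, usa.entry)
```

A check like this can catch an empty or malformed list file before the later steps are run on it.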
- infoseq: sequence filename '@rhodopsin.usa'
- entret: sequence filename '@rhodopsin.usa'
- showfeat: sequence filename '@rhodopsin.usa'
- coderet: sequence filename '@rhodopsin.usa', output filename 'rhodopsin.cds', output sequence name 'rhodopsin'; save sequences in 'rhodopsin.fasta'
- clustalx (not part of EMBOSS): load sequences, do alignment, save in Phylip format 'rhodopsin.phy'
- fdnapars: multiple sequence filename 'rhodopsin.phy'
- drawtree (not part of EMBOSS): treefile 'rhodopsin.tree', font file '/usr/pkg/share/fonts/phylip/font1'; output file is 'plotfile'
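The information steps in the example (infoseq, entret, showfeat) all take the same list file, written with a leading '@'. The sketch below only assembles the corresponding command lines in Python; it does not run them. It assumes the list file is given as the first positional argument and that the general EMBOSS qualifier '-auto' is used to suppress interactive prompts, so output options keep their defaults. The function name build_commands is hypothetical.

```python
from typing import List


def build_commands(list_file: str) -> List[List[str]]:
    """Return one argument list per program, each reading '@list_file'."""
    usa = "@" + list_file  # '@' tells EMBOSS to read USAs from a list file
    programs = ["infoseq", "entret", "showfeat"]
    # '-auto' turns off prompting; all other options are left at defaults
    return [[program, usa, "-auto"] for program in programs]


for argv in build_commands("rhodopsin.usa"):
    print(" ".join(argv))
    # To actually run a step (requires EMBOSS on the PATH):
    # import subprocess; subprocess.run(argv, check=True)
```

Keeping each command as an argument list, rather than a single string, avoids shell quoting problems if a step is later executed with subprocess.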
Additional resources
- EMBOSS home page
- EMBOSS tutorial