Example of Multiple Sequence Input
One of the simplest examples of multiple sequence input is sequence data
in the Pearson/Fasta format. This format looks like:
>locus1
GCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGC
GCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGC
GCGCGCGCGC
>locus2
ATATATATATATATATATATATATATATATATATATATATATATATATAT
ATATATATATATATATATATATATATATATATATATATATATATATATAT
ATATATATAT
>locus3
ACGTACGTACGT
The format can be described as follows. The first line should begin
with a greater-than character ('>'), which indicates the start
of a sequence. The first word after the '>' is the name of the
sequence; any other words on this line are considered to be a
comment/title for the sequence.
All lines until the next line beginning with a '>' (or until there are
no more lines) should contain the
raw sequence data (no spaces or numbers) of the first sequence. Lines
should be no longer than 80 characters, although 50 or 60 characters
is a common default line length in program output. The next line beginning with a '>'
(if there are any more lines) marks the start of the second sequence.
Again, the first word after the '>' is the name of the second sequence, and any
additional words are a comment/title for the second sequence. This
second line beginning with a '>' should be followed by lines containing
the sequence data up until another line beginning with a '>' (or until
there are no more lines). Additional sequences can follow the second
sequence in the same fashion.
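The rules above can be sketched as a small reader and writer in Python. This is an illustrative sketch, not a reference implementation: the names FastaRecord, read_fasta and write_fasta are hypothetical, and it assumes that blank lines may be skipped, that stray whitespace inside sequence lines should be dropped, and that output lines are wrapped at 60 characters by default (one of the common defaults mentioned above).

```python
import io
from typing import Iterable, Iterator, List, NamedTuple, Optional, TextIO


class FastaRecord(NamedTuple):
    name: str      # first word after the '>'
    comment: str   # remaining words on the '>' line (may be empty)
    sequence: str  # all sequence lines joined together


def read_fasta(handle: TextIO) -> Iterator[FastaRecord]:
    """Yield one record per '>' line (hypothetical helper, lenient parser)."""
    name: Optional[str] = None
    comment = ""
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new '>' line ends the previous sequence, if any.
            if name is not None:
                yield FastaRecord(name, comment, "".join(chunks))
            words = line[1:].split(None, 1)  # name, then the rest as comment
            name = words[0] if words else ""
            comment = words[1] if len(words) > 1 else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            # assumption: drop any whitespace that slipped into the data
            chunks.append("".join(line.split()))
    # Running out of lines ends the last sequence.
    if name is not None:
        yield FastaRecord(name, comment, "".join(chunks))


def write_fasta(records: Iterable[FastaRecord], handle: TextIO,
                width: int = 60) -> None:
    """Write records, wrapping sequence lines at `width` characters."""
    for rec in records:
        header = f">{rec.name} {rec.comment}" if rec.comment else f">{rec.name}"
        handle.write(header + "\n")
        for i in range(0, len(rec.sequence), width):
            handle.write(rec.sequence[i:i + width] + "\n")


if __name__ == "__main__":
    # Small demo using the last entry of the example above.
    for rec in read_fasta(io.StringIO(">locus3\nACGTACGTACGT\n")):
        print(rec.name, len(rec.sequence))
```

Because the reader only treats a '>' at the start of a line as special, a sequence ends either at the next '>' line or at the end of the input, matching the description above. Reading the output of write_fasta back with read_fasta should give the same records.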
Kim C. Worley, Human Genome Center, Baylor College of Medicine (kworley@bcm.tmc.edu)
Last modified: Mon Jul 1 15:59:16 CDT 1996