Example of Multiple Sequence Input

One of the simplest examples of multiple sequence input is sequence data in the Pearson/Fasta format. This format looks like:

>locus1 GCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGC GCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGC GCGCGCGCGC >locus2 ATATATATATATATATATATATATATATATATATATATATATATATATAT ATATATATATATATATATATATATATATATATATATATATATATATATAT ATATATATAT >locus3 ACGTACGTACGT
The format can be described as follows. The first line should begin with a greater-than character ('>'), which indicates the start of a sequence. The first word after the '>' is the name of the sequence; any other words on this line are considered to be a comment/title for the sequence. All lines until the next line beginning with a '>' (or until there are no more lines) should contain the raw sequence data (no spaces or numbers) of the first sequence. Lines should be no longer than 80 characters, but 50 or 60 is often the default line length for output.

The next line beginning with a '>' (if there are any more lines) marks the start of the second sequence. Again, the first word after the '>' is the name of the second sequence, and any additional words are a comment/title for it. This second line beginning with a '>' should be followed by lines containing the sequence data, up until another line beginning with a '>' (or until there are no more lines). Additional sequences can follow the second sequence in the same fashion.
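
The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the parser used by any particular server: the function names parse_fasta and format_fasta are hypothetical, blank lines are skipped, and any text before the first '>' line is ignored. The 60-character output width is one of the common defaults mentioned above.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (name, comment, sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one (name, comment, sequence) tuple per '>' record."""
    name = None
    comment = ""
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, comment, "".join(chunks)
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            comment = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if name is not None:
        yield name, comment, "".join(chunks)


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Write records back out, wrapping sequence data at `width` characters."""
    out: List[str] = []
    for name, comment, seq in records:
        out.append(">" + name + (" " + comment if comment else ""))
        for start in range(0, len(seq), width):
            out.append(seq[start:start + width])
    return "\n".join(out) + "\n"


# Small made-up input for illustration (not taken from a real database).
example = """\
>locus1 first example sequence
GCGCGCGCGC
GCGCGC
>locus2
ATATATATAT
>locus3
ACGTACGTACGT
"""
records = list(parse_fasta(example.splitlines()))
for name, comment, seq in records:
    print(name, len(seq), comment)
```

Keeping the name, the comment, and the joined sequence separate mirrors the description: the header line is split once on whitespace, and everything up to the next '>' is treated as one sequence regardless of how it was wrapped.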


Kim C. Worley, Human Genome Center, Baylor College of Medicine (kworley@bcm.tmc.edu)
Last modified: Mon Jul 1 15:59:16 CDT 1996
