FASTA format
FASTA is a text-based format for storing data about nucleotide or amino acid sequences. Each nucleotide or amino acid is represented by a single ASCII letter.
Sequences can be stored in a text file in FASTA format in the following way. A line beginning with > indicates that the sequence will start on the following line. (The line with the > may contain a name or unique identifier for the sequence.) The sequences themselves consist of single-letter codes, many of which are familiar. For example, A indicates the presence of adenine in the sequence.
Here is an example sequence in FASTA format:
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
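To make the layout concrete, here is a minimal Python sketch of how a file like this might be read. It is an illustration under simple assumptions, not a full FASTA parser: the function name parse_fasta is made up for this post, blank lines are skipped, and any text before the first > line is ignored. It joins the wrapped sequence lines that follow each > line into a single string.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from lines of FASTA text.

    Hypothetical helper for illustration; not part of any library.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines; collect them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


example = """\
>gi|186681228|ref|YP_001864424.1| phycoerythrobilin:ferredoxin oxidoreductase
MNSERSDVTLYQPFLDYAIAYMRSRLDLEPYPIPTGFESNSAVVGKGKNQEEVVTTSYAFQTAKLRQIRA
AHVQGGNSLQVLNFVIFPHLNYDLPFFGADLVTLPGGHLIALDMQPLFRDDSAYQAKYTEPILPIFHAHQ
QHLSWGGDFPEEAQPFFSPAFLWTRPQETAVVETQVFAAFKDYLKAYLDFVEQAEAVTDSQNLVAIKQAQ
LRYLRYRAEKDPARGMFKRFYGAEWTEEYIHGFLFDLERKLTVVK
"""

records = list(parse_fasta(example.splitlines()))
for header, sequence in records:
    print(header)
    print(len(sequence), "residues")
```

Because the function only needs an iterable of lines, an open file object could be passed in place of example.splitlines(). For real work, an established library such as Biopython is likely a better choice than a hand-rolled reader.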
This post is part of a series. The most recent post in the series is “Nanopore sequencing”.