Examples
Worked examples are often the easiest way to learn how to use a library. This page provides worked examples for each major feature of protfasta.
read_fasta examples
Example 1 - Simple read
import protfasta
sequences = protfasta.read_fasta('inputfile.fasta')
# sequences is a dict: {header: sequence}
Example 2 - Allow duplicate FASTA records and return a list
import protfasta
sequences = protfasta.read_fasta('inputfile.fasta',
                                 expect_unique_header=False,
                                 return_list=True,
                                 duplicate_record_action='ignore')
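Returning a list matters once headers are allowed to repeat: a dictionary keyed on the header can hold only one record per header. The plain-Python sketch below (no protfasta calls; the records are made up) shows the difference.

```python
# Two made-up records that share the header 'seq1'
records = [('seq1', 'MEEPQ'), ('seq1', 'DEAPR')]

as_dict = {}
for header, seq in records:
    as_dict[header] = seq  # the second 'seq1' silently overwrites the first

as_list = [[header, seq] for header, seq in records]  # both records are kept

print(len(as_dict), len(as_list))  # 1 2
```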
Example 3 - Correct invalid residues using the standard table
import protfasta
sequences = protfasta.read_fasta('inputfile.fasta',
                                 invalid_sequence_action='convert')
Example 4 - Correct invalid residues using a custom dictionary
import protfasta
CD = {'U': 'G', '-': ''}
sequences = protfasta.read_fasta('inputfile.fasta',
                                 invalid_sequence_action='convert',
                                 correction_dictionary=CD)
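To make the effect of a correction dictionary concrete, here is a minimal sketch that applies CD character by character. It assumes each key is a single character and that characters absent from the dictionary are left unchanged; apply_corrections is a hypothetical helper, not part of protfasta, and protfasta's own conversion may differ in detail.

```python
CD = {'U': 'G', '-': ''}  # same dictionary as in Example 4

def apply_corrections(seq, table):
    """Hypothetical helper: replace each character found in `table`, keep all others."""
    return ''.join(table.get(char, char) for char in seq)

print(apply_corrections('MKU-LV-U', CD))  # MKGLVG
```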
Example 5 - Convert what can be converted, drop the rest
import protfasta
sequences = protfasta.read_fasta('inputfile.fasta',
                                 invalid_sequence_action='convert-remove')
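The 'convert-remove' behaviour can be pictured as two steps: convert what can be converted, then drop any sequence that still contains an invalid residue. The sketch below is a hypothetical illustration only; the valid alphabet (the 20 standard amino acids), the correction table and convert_then_remove are assumptions, not protfasta's internals.

```python
# Assumed valid alphabet: the 20 standard amino acids
VALID = set('ACDEFGHIKLMNPQRSTVWY')
# Hypothetical correction table, for illustration only
TABLE = {'U': 'C', '-': ''}

def convert_then_remove(records, table):
    """Convert each sequence, then keep it only if every residue is now valid."""
    kept = {}
    for header, seq in records.items():
        fixed = ''.join(table.get(c, c) for c in seq)
        if set(fixed) <= VALID:
            kept[header] = fixed
    return kept

records = {'a': 'MKU-L', 'b': 'MKJL'}  # 'J' has no entry in TABLE, so 'b' is dropped
print(convert_then_remove(records, TABLE))  # {'a': 'MKCL'}
```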
Example 6 - Parse an alignment (keep gap characters)
import protfasta
aln = protfasta.read_fasta('alignment.fasta',
                           alignment=True,
                           invalid_sequence_action='convert')
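Gap characters are what keep the rows of an alignment in register: every aligned sequence has the same length, so position i refers to the same column in each row. This plain-Python sketch uses a made-up two-row alignment to show that stripping the gaps recovers the raw sequences but loses that equal-length property.

```python
# A made-up two-row alignment; '-' marks a gap
aln = {'s1': 'MK-LV', 's2': 'MKALV'}

# Every aligned row has the same length, so column i lines up across rows
assert len({len(s) for s in aln.values()}) == 1

# Removing gaps recovers the raw sequences, which now differ in length
ungapped = {header: seq.replace('-', '') for header, seq in aln.items()}
print(ungapped)  # {'s1': 'MKLV', 's2': 'MKALV'}
```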
Example 7 - Custom header parser
Extract only the UniProt accession from a structured header such as >sp|P12345|NAME_HUMAN ...:
import protfasta
def get_accession(header):
    # 'sp|P12345|NAME_HUMAN ...' -> 'P12345'
    return header.split('|')[1]
sequences = protfasta.read_fasta('uniprot.fasta',
                                 header_parser=get_accession)
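Because the header parser is an ordinary Python function, you can test it on a sample header before passing it to read_fasta. The variant below is a hypothetical, more defensive version of get_accession that returns headers without a '|' unchanged instead of raising IndexError; it assumes the parser receives the header without its leading '>'.

```python
def get_accession(header):
    """Return the accession from 'db|ACCESSION|ENTRY ...' style headers.

    Hypothetical defensive variant: a header with no '|' is returned unchanged.
    """
    parts = header.split('|')
    return parts[1] if len(parts) >= 2 else header

print(get_accession('sp|P12345|NAME_HUMAN Example protein'))  # P12345
print(get_accession('plain_header'))  # plain_header
```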
Example 8 - Remove duplicate sequences and write directly to disk
import protfasta
protfasta.read_fasta('inputfile.fasta',
                     duplicate_sequence_action='remove',
                     output_filename='unique.fasta')
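One plausible way to picture duplicate-sequence removal is "keep the first record seen for each distinct sequence". The sketch below implements that policy in plain Python; both the first-occurrence rule and the helper name remove_duplicate_sequences are assumptions for illustration, not a description of protfasta's code.

```python
def remove_duplicate_sequences(records):
    """Keep the first [header, seq] pair for each distinct sequence (assumed policy)."""
    seen = set()
    unique = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append([header, seq])
    return unique

records = [['a', 'MKL'], ['b', 'MKV'], ['c', 'MKL']]
print(remove_duplicate_sequences(records))  # [['a', 'MKL'], ['b', 'MKV']]
```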
Example 9 - Fast read with no sanity checking
By default, protfasta performs extensive sanity checking. If you already trust the input file, you can switch these checks off for a faster read:
import protfasta
sequences = protfasta.read_fasta('inputfile.fasta',
                                 invalid_sequence_action='ignore',
                                 duplicate_record_action='ignore',
                                 duplicate_sequence_action='ignore',
                                 expect_unique_header=False)
iter_fasta examples (large files)
For files that are too large to hold in memory, use iter_fasta to stream records one at a time.
Example 10 - Stream a huge FASTA file
import protfasta
long_seq_count = 0
# iter_fasta yields one (header, sequence) pair at a time
for header, seq in protfasta.iter_fasta('metagenome.fasta'):
    if len(seq) > 1000:
        long_seq_count += 1
print(long_seq_count)
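What makes streaming memory-friendly is that a generator holds only the record currently being assembled. The sketch below is a hypothetical, minimal generator with the same (header, sequence) shape that Example 10 consumes; it is not protfasta's implementation and performs no validation. An in-memory io.StringIO stands in for a real file handle such as open('metagenome.fasta').

```python
import io

def iter_records(handle):
    """Hypothetical sketch: yield (header, sequence) pairs one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)  # emit the previous record
            header, chunks = line[1:], []      # drop the leading '>'
        else:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        yield header, ''.join(chunks)  # emit the final record

handle = io.StringIO(">s1\nMKL\n>s2\nMK\nVL\n")
print(list(iter_records(handle)))  # [('s1', 'MKL'), ('s2', 'MKVL')]
```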
Example 11 - Streaming filter + write
import protfasta
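out = []
for header, seq in protfasta.iter_fasta('huge.fasta'):
    # keep sequences of 100 to 500 residues (inclusive)
    if 100 <= len(seq) <= 500:
        out.append([header, seq])
# only the filtered records are held in memory before writing
protfasta.write_fasta(out, 'filtered.fasta')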
write_fasta examples
Example 12 - Write from a dictionary
import protfasta
sequence_in = {'seq1': 'MEEPQSDPSVEPPLS',
               'seq2': 'DEAPRMPEAAPPVAPA'}
protfasta.write_fasta(sequence_in, 'example.fasta')
Example 13 - Write from a list
import protfasta
sequence_in = [['seq1', 'MEEPQSDPSVEPPLS'],
               ['seq2', 'DEAPRMPEAAPPVAPA']]
protfasta.write_fasta(sequence_in, 'example.fasta')
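Examples 12 and 13 write the same two records: the dictionary and the list of [header, sequence] pairs carry the same information. The hypothetical helper below normalizes either shape to a list, relying on dictionaries preserving insertion order (guaranteed since Python 3.7). It is an illustration, not protfasta's own input handling.

```python
def to_record_list(fasta_data):
    """Hypothetical helper: normalize dict or list-of-pairs input to [[header, seq], ...]."""
    if isinstance(fasta_data, dict):
        return [[header, seq] for header, seq in fasta_data.items()]
    return [list(pair) for pair in fasta_data]

dict_input = {'seq1': 'MEEPQSDPSVEPPLS', 'seq2': 'DEAPRMPEAAPPVAPA'}
list_input = [['seq1', 'MEEPQSDPSVEPPLS'], ['seq2', 'DEAPRMPEAAPPVAPA']]
print(to_record_list(dict_input) == to_record_list(list_input))  # True
```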
Example 14 - Single-line sequences
import protfasta
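sequence_in = {'seq1': 'MEEPQSDPSVEPPLS', 'seq2': 'DEAPRMPEAAPPVAPA'}
# linelength=None writes each sequence on a single line instead of wrapping it
protfasta.write_fasta(sequence_in, 'example.fasta', linelength=None)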
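Line wrapping itself is simple slicing. The sketch below formats a single record either wrapped at a fixed width or, with linelength=None, on one line. format_record is a hypothetical helper, and its default of 60 characters per line is an assumption rather than a documented protfasta value.

```python
def format_record(header, seq, linelength=60):
    """Hypothetical helper: format one FASTA record; linelength=None means no wrapping."""
    if linelength is None:
        lines = [seq]
    else:
        # slice the sequence into chunks of at most `linelength` residues
        lines = [seq[i:i + linelength] for i in range(0, len(seq), linelength)]
    return '>' + header + '\n' + '\n'.join(lines) + '\n'

print(format_record('seq1', 'MEEPQSDPSVEPPLS', linelength=5), end='')
# >seq1
# MEEPQ
# SDPSV
# EPPLS
print(format_record('seq1', 'MEEPQSDPSVEPPLS', linelength=None), end='')
# >seq1
# MEEPQSDPSVEPPLS
```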
Example 15 - Append to an existing FASTA file
import protfasta
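sequence_in = {'seq1': 'MEEPQSDPSVEPPLS', 'seq2': 'DEAPRMPEAAPPVAPA'}
# append_to_fasta=True adds the records to the end of archive.fasta rather than replacing its contents
protfasta.write_fasta(sequence_in, 'archive.fasta',
                      append_to_fasta=True)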
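Appending corresponds to opening the file in Python's 'a' mode, which creates the file if needed and otherwise writes after the existing content. The sketch below (hypothetical append_records helper, temporary file) writes twice and shows that both records survive; it is not protfasta's implementation.

```python
import os
import tempfile

def append_records(records, filename):
    """Hypothetical helper: append [header, seq] pairs; 'a' mode keeps existing content."""
    with open(filename, 'a') as fh:
        for header, seq in records:
            fh.write('>' + header + '\n' + seq + '\n')

path = os.path.join(tempfile.mkdtemp(), 'archive.fasta')
append_records([['seq1', 'MEEPQ']], path)  # creates the file
append_records([['seq2', 'DEAPR']], path)  # adds to it instead of overwriting
with open(path) as fh:
    print(fh.read(), end='')
# >seq1
# MEEPQ
# >seq2
# DEAPR
```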