Identification of potential open reading frames
This example script searches for potential open reading frames (ORFs) in the Porcine circovirus genome.
First, we download and read the Porcine circovirus genome. For translation we use the default codon table (eukaryotes), since domestic pigs are the host of the virus.
Since we want to perform a six-frame translation, we also have to translate the reverse complement of the genome.
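To make the idea of a six-frame search concrete, the following is a minimal pure-Python sketch. It is not Biotite's implementation, and the helper names `reverse_complement`, `find_orfs` and `six_frame_orfs` are hypothetical. It assumes that only ATG counts as a start codon, that an ORF ends at the first in-frame stop codon (TAA, TAG or TGA), and that an ORF without a stop codon runs to the last complete codon. Positions are reported as 0-based start and exclusive stop, with the stop codon included, which is consistent with the output of this example (e.g. `367 - 382` spans 15 nucleotides for the 5 symbols `MWSS*`).

```python
# Assumption: standard stop codons, ATG as the only start codon
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq):
    """Return (start, stop) pairs for every ATG-initiated ORF.

    `start` is 0-based, `stop` is exclusive and includes the stop codon.
    Scanning every start index covers all three reading frames.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        stop = None
        # Walk codon by codon in the frame set by the start codon
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                stop = pos + 3
                break
        if stop is None:
            # No in-frame stop codon: end at the last complete codon
            stop = start + (len(seq) - start) // 3 * 3
        orfs.append((start, stop))
    return orfs


def six_frame_orfs(seq):
    """Search both strands, i.e. all six reading frames."""
    return {
        "forward": find_orfs(seq),
        "reverse": find_orfs(reverse_complement(seq)),
    }


result = six_frame_orfs("CCATGAAATGGTAACC")
print(result["forward"])  # [(2, 14), (7, 16)]
print(result["reverse"])  # [(12, 15)]
```

Because every ATG opens its own ORF in this sketch, several ORFs can share one stop codon. The same pattern appears in the output of this example, where `50 - 995` and `527 - 995` end at the same position.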
Forward strand:
50 - 995: MPSKKNGRSGPQPHKRWVFTLNNPSEDERKKIRELPISLFDYFIVGEEGNEEGRTPHLQGFANFVKKQTFNKVKWYFGARCHIEKAKGTDQQNKEYCSKEGNLLIECGAPRSQGQRSDLSTAVSTLLESGSLVTVAEQHPVTFVRNFRGLAELLKVSGKMQKRDWKTNVHVIVGPPGCGKSKWAANFADPETTYWKPPRNKWWDGYHGEEVVVIDDFYGWLPWDDLLRLCDRYPLTVETKGGTVPFLARSILITSNQTPLEWYSSTAVPAVEALYRRITSLVFWKNATEQSTEEGGQFVTLSPPCPEFPYEINY*
66 - 114: MEEADPNHTKGGCSR*
198 - 246: MRKDEHPTSRGSLIL*
271 - 319: MVFRCPLPHRESERN*
367 - 382: MWSS*
527 - 995: MQKRDWKTNVHVIVGPPGCGKSKWAANFADPETTYWKPPRNKWWDGYHGEEVVVIDDFYGWLPWDDLLRLCDRYPLTVETKGGTVPFLARSILITSNQTPLEWYSSTAVPAVEALYRRITSLVFWKNATEQSTEEGGQFVTLSPPCPEFPYEINY*
552 - 732: MYTSLWGHLGVVKANGLLILQTRKPHTGNHLETSGGMVTMVKKWLLLMTFMAGCPGMIY*
595 - 607: MGC*
660 - 732: MVTMVKKWLLLMTFMAGCPGMIY*
669 - 732: MVKKWLLLMTFMAGCPGMIY*
693 - 732: MTFMAGCPGMIY*
702 - 732: MAGCPGMIY*
720 - 732: MIY*
832 - 970: MVLLNCCPSCRSSLSEDYFLGILEECYRTIHGGRGPVRHPFPPMP*
906 - 987: MLQNNPRRKGASSSPFPPHALNFHMK*
961 - 970: MP*
978 - 987: MK*
1015 - 1036: MVFIIH*
1084 - 1192: MVTRILYSWSYILFSNAVPRPTWSTFPVVCSLSHS*
1285 - 1306: MVWREE*
1290 - 1362: MAGGVVYIGVIGEGCGLRDKVII*
1522 - 1630: MSTAQEGVLTVVFLIVYPKVRERRVLKMPFFLLQR*
1603 - 1630: MPFFLLQR*
1680 - 1740: MAAGAVSSSPVTPPWIRHI*
Reverse strand:
33 - 735: MTYPRRRYRRRRHRPRSHLGQILRRRPWLVHPRHRYRWRRKNGIFNTRLSRTFGYTIKKTTVRTPSWAVDMMRFNINDFLPPGGGSNPRSVPFEYYRIRKVKVEFWPCSPITQGDRGVGSSAVILDDNFVTKATALTYDPYVNYSSRHTITQPFSYHSRYFTPKPVLDSTIDYFQPNNKRNQLWLRLQTTGNVDHVGLGTAFENSIYDQEYNIRVTMYVQFREFNLKDPPLNP*
157 - 247: MASSTPASPAPSDILSRKPQSERPPGRWT*
243 - 735: MMRFNINDFLPPGGGSNPRSVPFEYYRIRKVKVEFWPCSPITQGDRGVGSSAVILDDNFVTKATALTYDPYVNYSSRHTITQPFSYHSRYFTPKPVLDSTIDYFQPNNKRNQLWLRLQTTGNVDHVGLGTAFENSIYDQEYNIRVTMYVQFREFNLKDPPLNP*
246 - 735: MRFNINDFLPPGGGSNPRSVPFEYYRIRKVKVEFWPCSPITQGDRGVGSSAVILDDNFVTKATALTYDPYVNYSSRHTITQPFSYHSRYFTPKPVLDSTIDYFQPNNKRNQLWLRLQTTGNVDHVGLGTAFENSIYDQEYNIRVTMYVQFREFNLKDPPLNP*
262 - 328: MTFFPQEGAQTPARCPLNTTE*
409 - 460: MITLSRRPQPSPMTPM*
445 - 460: MTPM*
454 - 460: M*
607 - 613: M*
681 - 735: MYVQFREFNLKDPPLNP*
685 - 742: MYNSENLILKTPHLTLNE*
734 - 779: MNNKNHYEVIKRTQ*
789 - 822: MEIQGMGGKG*
804 - 822: MGGKG*
1014 - 1080: MDIDHTVSVDHPRAASHKSHQ*
1096 - 1411: MVTIPPLVSRWFPVCGFRVCKISSPFAFTTPRWPHNDVYISLPITLLHFPAHFQKFSQPAEISDKRYRVLLCNGHQTPALQQGTHSSRQVTPLSLRSRSSTFNQ*
1137 - 1164: MWFPGLQN*
1202 - 1382: MTCTLVFQSRFCIFPLTFKSSASPRKFLTNVTGCCSATVTRLPLSSKVLTAVDRSLRCP*
1469 - 1595: MWQRAPKYHFTLLNVCFFTKLANPWRWGVRPSSLPSSPTIK*
1509 - 1533: MFASSQN*
1715 - 1766: MLLLRCCRGAAAAEVRW
# Code source: Patrick Kunzmann
# License: BSD 3 clause
import biotite.database.entrez as entrez
import biotite.sequence.io.fasta as fasta
# Download Porcine circovirus genome
file = entrez.fetch("KP282147", None, "fa", "nuccore", "fasta")
fasta_file = fasta.FastaFile.read(file)
genome = fasta.get_sequence(fasta_file)
# Perform translation for forward strand
proteins, positions = genome.translate()
print("Forward strand:")
for i in range(len(proteins)):
    print(
        "{:4d} - {:4d}: {:}".format(
            positions[i][0], positions[i][1], str(proteins[i])
        )
    )
print("\n")
# Perform translation for reverse complementary strand
genome_rev = genome.reverse().complement()
proteins, positions = genome_rev.translate()
print("Reverse strand:")
for i in range(len(proteins)):
    print(
        "{:5d} - {:5d}: {:}".format(
            positions[i][0], positions[i][1], str(proteins[i])
        )
    )
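The positions printed for the reverse strand are indices into `genome_rev`, not into the original genome. If forward-strand coordinates are needed, they can be derived from the sequence length. The sketch below assumes 0-based start and exclusive stop positions and treats the sequence as linear; `to_forward_coords` is a hypothetical helper, not part of Biotite, and plain strings stand in for the sequence objects.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def to_forward_coords(start, stop, genome_length):
    """Map a half-open interval on the reverse complement
    to the corresponding half-open interval on the forward strand.
    """
    return genome_length - stop, genome_length - start


# Toy sequence used only for illustration
forward = "ATGCCGTAAG"
reverse = reverse_complement(forward)

start, stop = 2, 5
fwd_start, fwd_stop = to_forward_coords(start, stop, len(forward))
print(fwd_start, fwd_stop)  # 5 8

# The mapped forward slice is the reverse complement of the original slice
print(reverse[start:stop])                               # TAC
print(reverse_complement(forward[fwd_start:fwd_stop]))   # TAC
```

In the example script, the same call would take `len(genome)` as `genome_length` and one entry of `positions` from the reverse-strand translation as `start` and `stop`.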