.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/gallery/sequence/misc/orf_identification.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_gallery_sequence_misc_orf_identification.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_gallery_sequence_misc_orf_identification.py:


Identification of potential open reading frames
===============================================

This example script searches for potential open reading frames (ORFs) in
the Porcine circovirus genome.

First, we download and read the Porcine circovirus genome.
For translation we use the default codon table (eukaryotes), since
domestic pigs are the host of the virus.
Because we want to perform a six-frame translation, we also have to look at
the complementary strand of the genome.

.. GENERATED FROM PYTHON SOURCE LINES 15-46

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Forward strand:
      50 -  995: MPSKKNGRSGPQPHKRWVFTLNNPSEDERKKIRELPISLFDYFIVGEEGNEEGRTPHLQGFANFVKKQTFNKVKWYFGARCHIEKAKGTDQQNKEYCSKEGNLLIECGAPRSQGQRSDLSTAVSTLLESGSLVTVAEQHPVTFVRNFRGLAELLKVSGKMQKRDWKTNVHVIVGPPGCGKSKWAANFADPETTYWKPPRNKWWDGYHGEEVVVIDDFYGWLPWDDLLRLCDRYPLTVETKGGTVPFLARSILITSNQTPLEWYSSTAVPAVEALYRRITSLVFWKNATEQSTEEGGQFVTLSPPCPEFPYEINY*
      66 -  114: MEEADPNHTKGGCSR*
     198 -  246: MRKDEHPTSRGSLIL*
     271 -  319: MVFRCPLPHRESERN*
     367 -  382: MWSS*
     527 -  995: MQKRDWKTNVHVIVGPPGCGKSKWAANFADPETTYWKPPRNKWWDGYHGEEVVVIDDFYGWLPWDDLLRLCDRYPLTVETKGGTVPFLARSILITSNQTPLEWYSSTAVPAVEALYRRITSLVFWKNATEQSTEEGGQFVTLSPPCPEFPYEINY*
     552 -  732: MYTSLWGHLGVVKANGLLILQTRKPHTGNHLETSGGMVTMVKKWLLLMTFMAGCPGMIY*
     595 -  607: MGC*
     660 -  732: MVTMVKKWLLLMTFMAGCPGMIY*
     669 -  732: MVKKWLLLMTFMAGCPGMIY*
     693 -  732: MTFMAGCPGMIY*
     702 -  732: MAGCPGMIY*
     720 -  732: MIY*
     832 -  970: MVLLNCCPSCRSSLSEDYFLGILEECYRTIHGGRGPVRHPFPPMP*
     906 -  987: MLQNNPRRKGASSSPFPPHALNFHMK*
     961 -  970: MP*
     978 -  987: MK*
    1015 - 1036: MVFIIH*
    1084 - 1192: MVTRILYSWSYILFSNAVPRPTWSTFPVVCSLSHS*
    1285 - 1306: MVWREE*
    1290 - 1362: MAGGVVYIGVIGEGCGLRDKVII*
    1522 - 1630: MSTAQEGVLTVVFLIVYPKVRERRVLKMPFFLLQR*
    1603 - 1630: MPFFLLQR*
    1680 - 1740: MAAGAVSSSPVTPPWIRHI*


    Reverse strand:
       33 -   735: MTYPRRRYRRRRHRPRSHLGQILRRRPWLVHPRHRYRWRRKNGIFNTRLSRTFGYTIKKTTVRTPSWAVDMMRFNINDFLPPGGGSNPRSVPFEYYRIRKVKVEFWPCSPITQGDRGVGSSAVILDDNFVTKATALTYDPYVNYSSRHTITQPFSYHSRYFTPKPVLDSTIDYFQPNNKRNQLWLRLQTTGNVDHVGLGTAFENSIYDQEYNIRVTMYVQFREFNLKDPPLNP*
      157 -   247: MASSTPASPAPSDILSRKPQSERPPGRWT*
      243 -   735: MMRFNINDFLPPGGGSNPRSVPFEYYRIRKVKVEFWPCSPITQGDRGVGSSAVILDDNFVTKATALTYDPYVNYSSRHTITQPFSYHSRYFTPKPVLDSTIDYFQPNNKRNQLWLRLQTTGNVDHVGLGTAFENSIYDQEYNIRVTMYVQFREFNLKDPPLNP*
      246 -   735: MRFNINDFLPPGGGSNPRSVPFEYYRIRKVKVEFWPCSPITQGDRGVGSSAVILDDNFVTKATALTYDPYVNYSSRHTITQPFSYHSRYFTPKPVLDSTIDYFQPNNKRNQLWLRLQTTGNVDHVGLGTAFENSIYDQEYNIRVTMYVQFREFNLKDPPLNP*
      262 -   328: MTFFPQEGAQTPARCPLNTTE*
      409 -   460: MITLSRRPQPSPMTPM*
      445 -   460: MTPM*
      454 -   460: M*
      607 -   613: M*
      681 -   735: MYVQFREFNLKDPPLNP*
      685 -   742: MYNSENLILKTPHLTLNE*
      734 -   779: MNNKNHYEVIKRTQ*
      789 -   822: MEIQGMGGKG*
      804 -   822: MGGKG*
     1014 -  1080: MDIDHTVSVDHPRAASHKSHQ*
     1096 -  1411: MVTIPPLVSRWFPVCGFRVCKISSPFAFTTPRWPHNDVYISLPITLLHFPAHFQKFSQPAEISDKRYRVLLCNGHQTPALQQGTHSSRQVTPLSLRSRSSTFNQ*
     1137 -  1164: MWFPGLQN*
     1202 -  1382: MTCTLVFQSRFCIFPLTFKSSASPRKFLTNVTGCCSATVTRLPLSSKVLTAVDRSLRCP*
     1469 -  1595: MWQRAPKYHFTLLNVCFFTKLANPWRWGVRPSSLPSSPTIK*
     1509 -  1533: MFASSQN*
     1715 -  1766: MLLLRCCRGAAAAEVRW


|

.. code-block:: Python


    # Code source: Patrick Kunzmann
    # License: BSD 3 clause

    import biotite.database.entrez as entrez
    import biotite.sequence.io.fasta as fasta

    # Download Porcine circovirus genome
    file = entrez.fetch("KP282147", None, "fa", "nuccore", "fasta")
    fasta_file = fasta.FastaFile.read(file)
    genome = fasta.get_sequence(fasta_file)
    # Perform translation for forward strand
    proteins, positions = genome.translate()
    print("Forward strand:")
    for i in range(len(proteins)):
        print(
            "{:4d} - {:4d}: {:}".format(
                positions[i][0], positions[i][1], str(proteins[i])
            )
        )
    print("\n")
    # Perform translation for complementary strand
    genome_rev = genome.reverse().complement()
    proteins, positions = genome_rev.translate()
    print("Reverse strand:")
    for i in range(len(proteins)):
        print(
            "{:5d} - {:5d}: {:}".format(
                positions[i][0], positions[i][1], str(proteins[i])
            )
        )


.. _sphx_glr_download_examples_gallery_sequence_misc_orf_identification.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: orf_identification.ipynb <orf_identification.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: orf_identification.py <orf_identification.py>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_