Bioinformatics in Python: DNA Toolkit. Part 2: Transcription, Reverse Complement


🚀 [DESCRIPTION]:

In this lesson, we continue enhancing our DNA Toolkit. You'll add two more crucial functions: transcription and reverse_complement. We'll also dive into restructuring and re-formatting output, implementing nucleotide colored output for better readability, and exploring the power of docstrings for clean code documentation.
#DNAToolkit #Bioinformatics #PythonProgramming #Transcription #ReverseComplement #Docstrings #PythonColors #CodingTutorial

🔹🔹🔹🔹🔹

💻 [GITHUB REPO]:

🔹🔹🔹🔹🔹

🔗 [VIDEO LINKS]:

Amazing Python lessons by Corey Schafer:

Python colorized output:

colored() function code:


Comments

Dear Rebel,
Please note that you need the complement only, not the reverse complement, to pair with the original DNA string. "A" should pair with "T" and "G" with "C", and a quick look at your output shows that this is not the case. I think you have to remove [::-1] from the reverse_complement function to get the expected result.

dlearhasan

One of the users left a very good comment here, which is no longer visible for some odd reason. He brought up a very good point. At 7:20 we print out the following:

[5] + DNA String + Reverse Complement:
5' AAAATCGGCGTTTGGCCCCTTTTGCCCC 3'
   ||||||||||||||||||||||||||||
3' GGGGCAAAAGGGGCCAAACGCCGATTTT 5'

The 3' -> 5' label on the bottom strand is really confusing: that strand has already been reversed, so it actually reads 5' -> 3'. The correct output should be:

[5] + DNA String + Complement + Reverse Complement:
5' AAAATCGGCGTTTGGCCCCTTTTGCCCC 3'
   ||||||||||||||||||||||||||||
3' TTTTAGCCGCAAACCGGGGAAAACGGGG 5' (complement)
5' GGGGCAAAAGGGGCCAAACGCCGATTTT 3' (reverse complement)

So thanks to the person who brought it up. An amazing example of collaborative work. If anyone finds something wrong/confusing, please feel free to leave a comment here. I will also bring this up in our next video and we will correct the output.
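The corrected layout above can be reproduced with a short sketch. The names complement and format_strands are hypothetical helpers introduced for illustration and are not taken from the video's code; only reverse_complement mirrors a function named in the lesson.

```python
# Minimal sketch, assuming an uppercase DNA string with only A, C, G, T.
# complement() and format_strands() are hypothetical helper names.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Swap each base for its partner, keeping the original order."""
    return "".join(PAIRS[nuc] for nuc in seq)


def reverse_complement(seq):
    """Complement the strand, then reverse it so it reads 5' -> 3'."""
    return complement(seq)[::-1]


def format_strands(seq):
    """Build the labelled display: original, bonds, complement, reverse complement."""
    bonds = "|" * len(seq)
    lines = [
        f"5' {seq} 3'",
        f"   {bonds}",
        f"3' {complement(seq)} 5' (complement)",
        f"5' {reverse_complement(seq)} 3' (reverse complement)",
    ]
    return "\n".join(lines)


print(format_strands("AAAATCGGCGTTTGGCCCCTTTTGCCCC"))
```

Only the complement line is drawn directly under the bonds, because it is the strand that pairs base by base with the original; the reverse complement is the same strand written in the conventional 5' -> 3' direction.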

rebelScience

Awesome video! Not many great bioinfo vids on YT. Thanks.

TheCuriousLifeOfCode

At 7:27 I think you don't need to reverse the bottom part of the DNA helix, because the output shows AAA connected to TAT, ACG connected to ATG, and so on...

matic

Great series.

If anyone wants the transcription function to swap out the other nucleotides too and make an RNA transcript of the randomly generated sequence, this will work in the DNAToolkit file:

(Add "import random" to whichever files require it.)

def transcription(seq):
    """DNA->RNA transcription product: each base is complemented, with Uracil in place of Thymine"""
    # Park the original C's as lowercase so they are not confused with C's created from G
    seq = seq.replace('C', 'c')
    char_to_replace = {'A': 'U',
                       'T': 'A',
                       'G': 'C',
                       'c': 'G'}
    # Iterate over all key-value pairs in dictionary
    for key, value in char_to_replace.items():
        # Replace key character with value character in string
        seq = seq.replace(key, value)
    return seq

And this will work in the bio_seq file later on (lesson 8):

def transcription(self):
    """DNA->RNA transcription product: each base is complemented, with Uracil in place of Thymine"""
    rna = self.seq.replace('C', 'c')  # work on a copy so self.seq is not overwritten
    char_to_replace = {'A': 'U',
                       'T': 'A',
                       'G': 'C',
                       'c': 'G'}  # replace to "G"
    # Iterate over all key-value pairs in dictionary
    for key, value in char_to_replace.items():
        # Replace key character with value character in string
        rna = rna.replace(key, value)
    return rna

It is KEY that the first line after the docstring converts Cytosine to a lower-case placeholder, which is then used instead of "C" in the "replace to G" line. If this is not done, then ALL "C"s, both the original ones and the ones transcribed from "G", are converted to "G". These functions work with the nucleotide-colouring function in "Utilities.py".
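As a possible alternative to the lower-case placeholder, Python's built-in str.maketrans and str.translate map every character in a single pass, so a "C" produced from "G" is never replaced a second time. This is only a sketch: the names DNA_TO_RNA_COMPLEMENT and transcribe_template are hypothetical and do not come from the video's DNAToolkit.

```python
# Minimal sketch, assuming the input is a DNA template strand (A, C, G, T).
# DNA_TO_RNA_COMPLEMENT and transcribe_template are hypothetical names.
DNA_TO_RNA_COMPLEMENT = str.maketrans("ATGC", "UACG")


def transcribe_template(seq):
    """Return the complementary RNA bases for a DNA template strand."""
    # translate() looks up each character once, so no placeholder is needed
    return seq.upper().translate(DNA_TO_RNA_COMPLEMENT)


print(transcribe_template("AATTGGCC"))
```

Like the functions above, this returns the bases in the same order as the input and does not reverse the strand.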

If there is anything wrong with this explanation or it breaks a communication convention let me know. I am quite new to the IT side of this.

billal

Docstrings are cool! Thanks rebelCoder.

amitrupani

Thank you for this series, it was really easy to follow along!

regx_

Hi Rebel, should we add [::-1]? I get the result without this part and I do not understand how this code works for you. Could you explain, please?

mohammedamertaha

Thanks for doing these videos, they're really useful!

lucianoinso

I just want to share some notes I have. There is probably a more efficient way for this to be done, but I figured out a way to do transcription and make a reverse complement for a given DNA sequence.

Transcription for a given sequence:

main:
# DNA Toolset/Code testing file
from DNAToolkit import *

# Put the sequence to be transcribed here
givenDNAStr = "AGTC"
DNAStr = validateSeq(givenDNAStr)

print(transcription(DNAStr))

DNAToolkit:
Nucleotides = ["A", "C", "G", "T"]

# Check the sequence to make sure it is a DNA string
def validateSeq(dna_Seq):
    tmpseq = dna_Seq.upper()
    for nuc in tmpseq:
        if nuc not in Nucleotides:
            return False
    return tmpseq

def transcription(seq):
    # DNA to RNA, a process known as transcription
    return seq.replace("T", "U")

For a reverse complement of a given DNA sequence:

main:

# DNA Toolset/Code testing file
from DNAToolkit import *

# insert a desired sequence for complement:
givenDNAStr = 'AGTC'

DNAStr = validateSeq(givenDNAStr)

print(f' DNA sequence: {DNAStr}\n')

print(f'Complementary DNA sequence: {reverse_complement(DNAStr)}')

DNAToolkit:
Nucleotides = ["A", "C", "G", "T"]
DNA_ReverseComplement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

# I am not sure what this line of code is doing, but the results won't run in the terminal without it
def validateSeq(dna_Seq):
    tmpseq = dna_Seq.upper()
    for nuc in tmpseq:
        if nuc not in Nucleotides:
            return False
    return tmpseq

def reverse_complement(seq):
    # Swap each base for its partner, then reverse the whole string
    return ''.join([DNA_ReverseComplement[nuc] for nuc in seq])[::-1]

LumTheAlien

At 7:40, I think you have a mistake. A-A is wrong.

manhtuan

great videos, thanks for all your work

stengah

Thanks, very well explained, regards from Colombia.

cristiancamilocanizalessil

Does anyone know why, when I try to color the sequences, the terminal shows them on two or more lines?

riccardo

Hi, please help me.
How do I show colored output in Spyder? I cannot find the terminal.

PhanCanhTrinhPlus

Amazing content. How can I get the horizontal neon-colored current-line theme?

hyperionspring

Good video, but I think you've made a mistake.
When you take the reverse complement, the direction changes: it also becomes 5' to 3'.

techlife

The transcription tool is wrong, bro. In transcription the RNA is the complement of the template strand, so just replacing T with U is not correct if the input is a template sequence. If it is a coding sequence, you are right.
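The distinction raised here can be sketched in code. This is an illustration under stated assumptions, not the video's implementation: the function name transcribe and its strand parameter are hypothetical, and the input is assumed to be written 5' -> 3'.

```python
# Minimal sketch: transcribe() and its "strand" parameter are hypothetical.
# Assumes the input DNA is written 5' -> 3' and contains only A, C, G, T.
DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")


def transcribe(seq, strand="coding"):
    """Return the mRNA (5' -> 3') for a coding or a template DNA strand."""
    seq = seq.upper()
    if strand == "coding":
        # The mRNA has the same sequence as the coding strand, with U for T
        return seq.replace("T", "U")
    if strand == "template":
        # The mRNA is the reverse complement of the template, with U for T
        return seq.translate(DNA_COMPLEMENT)[::-1].replace("T", "U")
    raise ValueError("strand must be 'coding' or 'template'")


print(transcribe("ATGC", strand="coding"))
print(transcribe("GCAT", strand="template"))
```

Both calls describe the same double-stranded DNA ("GCAT" is the reverse complement of "ATGC"), so they should yield the same mRNA.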

angsumandas

Should the Anaconda prompt show the colours? I use Spyder, and when I run the script the Anaconda screen only shows the code for each colour next to each letter, as the output did in the video. Could someone help me? Thank you for the amazing work with these videos.

juanmaruizrobles

Hi Rebel!
First of all, thanks for your amazing videos.
Unfortunately, I still get that garbage output even in my terminal. What should I do?
And could you tell me which modules I should learn for bioinformatics?

ghazal
