1.4 Gene Finding

1 Transcription

1.1 Transcription

Transcription is the process of copying a segment of DNA into RNA. Some of this RNA is then used to synthesize polypeptides.

1.2 RNA polymerase

RNA polymerase binds to promoter sequences in the DNA, which are located close to the start sites of genes.

It then generates a transcription bubble, which separates the two strands of the DNA helix. This is done by breaking the hydrogen bonds between complementary DNA nucleotides.

1.3 Pre-RNA

While the RNA is being produced in the transcription bubble, we call it pre-RNA. Certain base sequences make transcription stop.

1.4 Ribosomes

Ribosomes translate RNA into polypeptides.

1.5 Messenger RNA

The segments of DNA transcribed into RNA molecules that can encode proteins are said to produce messenger RNA (mRNA).

mRNA comprises only 1-3% of total RNA samples. Less than 2% of the human genome can be transcribed into mRNA.

1.6 Non-coding RNA

Other segments of DNA are copied into RNA molecules called non-coding RNAs (ncRNAs).

At least 80% of mammalian genomic DNA can be actively transcribed, with the majority of this 80% considered to be ncRNA.

There is no consensus in the literature on how much of non-coding transcription is functional.

1.7 Exercise: DNA to RNA

ATGAACAACCAACGGAA
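As a minimal sketch of this exercise in R, and assuming the sequence above is the coding (sense) strand, the RNA has the same bases with every T replaced by U. The variable names are illustrative only. If the sequence were the template strand instead, the RNA would be its complement rather than a direct copy.

```r
# Transcribe a DNA sequence into RNA.
# Assumption: `dna` is the coding strand, so only T -> U changes.
dna <- "ATGAACAACCAACGGAA"
rna <- chartr("T", "U", dna)
print(rna)  # "AUGAACAACCAACGGAA"
```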

2 Codons

2.1 Codons

Codons are triples of bases, each of which corresponds to an amino acid or to a stop signal. These are the units ribosomes use to translate between bases and amino acids.

2.2 Mapping codons

The translation between codons and amino acids can be represented in several ways, for example as a table or as a codon wheel.

2.3 Possible amino acids

Amino acids fall into four different types: basic, acidic, polar and non-polar.

2.4 Exercise: mapping codons

Translate the following RNA sequence into the corresponding amino acids:

ACUGAGACACUUGCG
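One way to check an answer to this exercise is to build a codon lookup table in base R. The sketch below assumes the standard genetic code and reads the sequence from its first base. The names `genetic_code` and `translate_rna` are illustrative, not part of any package.

```r
# Standard genetic code, with codons ordered U, C, A, G at each position
# (third base varying fastest). "*" marks a stop codon.
bases <- c("U", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)
amino <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
genetic_code <- setNames(amino, codons)

# Split an RNA string into consecutive codons and look each one up.
translate_rna <- function(rna) {
  n <- nchar(rna) %/% 3                      # number of complete codons
  starts <- seq(1, by = 3, length.out = n)   # first base of each codon
  triplets <- substring(rna, starts, starts + 2)
  paste(genetic_code[triplets], collapse = "")
}

print(translate_rna("ACUGAGACACUUGCG"))  # "TETLA" (Thr-Glu-Thr-Leu-Ala)
```

Any bases left over after the last complete codon are ignored in this sketch.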

2.5 Exercise: finding codons

Genes do not have to start at the beginning of a sequence, so codons can be read starting from any position. What are the possible codons in the following sequence?

ACTTGACTACTTTA
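A possible way to explore this exercise in R is to list the codons obtained when reading starts at the first, second or third base. This is only a sketch: the helper name `codons_in_frame` is hypothetical, and incomplete triplets at the end of the sequence are dropped.

```r
# List the complete codons read from `dna` when starting at base `frame`.
codons_in_frame <- function(dna, frame) {
  n <- max(0, (nchar(dna) - frame + 1) %/% 3)    # complete codons available
  starts <- seq(frame, by = 3, length.out = n)   # first base of each codon
  substring(dna, starts, starts + 2)
}

dna <- "ACTTGACTACTTTA"
for (frame in 1:3) {
  cat("start at base", frame, ":", codons_in_frame(dna, frame), "\n")
}
# start at base 1: ACT TGA CTA CTT
# start at base 2: CTT GAC TAC TTT
# start at base 3: TTG ACT ACT TTA
```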

2.6 Example in R: translating to amino acids

> s
183-letter DNAString object
seq: ATGGGGCTCATTCAAGAAGAATGGAGCGATCAAAGTGTTACG...CATCTGACCACCCGAGGGGGAGAGCCGCACATGATAGTTAG
> Biostrings::translate(s)
61-letter AAString object
seq: MGLIQEEWSDQSVTGFQERNLKHVEHNEQEEKICDHAPHAAAHSPGVPSDHPRGRAAHDS*

3 Genes

3.1 Genes

A molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. Genes can lead to the production of sequences of amino acids, that is, proteins.

3.2 Starting and stopping

Particular codons can mark the start and the end of genes: ATG (AUG in RNA) is the usual start codon, and TAA, TAG and TGA (UAA, UAG and UGA in RNA) are the stop codons.

3.3 Exercise 1: finding potential genes

Find the possible genes in this sequence:

ATGGGGCTCATTCAAGAAGAATGGAGTAACTAG

3.4 Exercise 2: finding potential genes

Find the possible genes in this sequence:

GCTCATTCAAGAAGAATCGAGTAACTGCAG

4 Computationally

4.1 To find potential genes

  1. Find the potential start and end codons in the sequence.
  2. Match start codons to end codons. These end codons must be after the start codons, and there must be a multiple of three bases between them.
  3. Extract the obtained sequences between start and end.
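Step 1 of the list above can be sketched in base R by sliding a three-base window along the sequence. The sketch assumes the standard start codon ATG and stop codons TAA, TAG and TGA. The function name `find_codons` is illustrative, and the sequence is the one from Exercise 1.

```r
# Return the 1-based positions where any of the `targets` codons begins.
find_codons <- function(dna, targets) {
  n <- nchar(dna)
  if (n < 3) return(integer(0))
  positions <- seq_len(n - 2)                        # every window start
  triplets <- substring(dna, positions, positions + 2)
  positions[triplets %in% targets]
}

dna <- "ATGGGGCTCATTCAAGAAGAATGGAGTAACTAG"
starts <- find_codons(dna, "ATG")                    # assumed start codon
ends <- find_codons(dna, c("TAA", "TAG", "TGA"))     # assumed stop codons
print(starts)  # 1 21
print(ends)    # 27 31
```

These positions are only candidates: which starts pair with which ends still depends on the reading frame.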

4.2 Reading frames 1, 2 and 3

We can label sites as being in reading frame 1, 2 or 3, depending on whether the position of their first nucleotide has remainder 0, 1 or 2, respectively, when divided by three. A start site and its matching end site must be in the same reading frame.
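As a small illustration, the reading frame of a site can be computed from its position. The sketch assumes that the remainders are taken on positions counted from zero, so that the first base of the sequence falls in frame 1. R counts positions from 1, hence the subtraction. The function name `reading_frame` is hypothetical.

```r
# Reading frame (1, 2 or 3) of a site given its 1-based position.
# Assumption: the first base of the sequence is in frame 1.
reading_frame <- function(position) (position - 1) %% 3 + 1

# Example positions: 1 and 31 share frame 1, 21 and 27 share frame 3.
print(reading_frame(c(1, 21, 27, 31)))  # 1 3 3 1
```

Two sites are in the same frame exactly when the distance between them is a multiple of three.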

4.3 Matching starts and ends

In each reading frame, we match starts to the first available end.
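The matching rule can be sketched in base R as follows. The sketch assumes that "first available end" means the nearest stop codon after the start in the same reading frame, so several starts may share one end. It also assumes ATG as start codon and TAA, TAG and TGA as stop codons, and it scans only the strand given. The function name `find_genes` is illustrative.

```r
# Find potential genes: each start codon is paired with the nearest
# downstream stop codon in the same reading frame.
find_genes <- function(dna, start_codon = "ATG",
                       stop_codons = c("TAA", "TAG", "TGA")) {
  n <- nchar(dna)
  if (n < 3) return(character(0))
  positions <- seq_len(n - 2)
  triplets <- substring(dna, positions, positions + 2)
  starts <- positions[triplets == start_codon]
  ends <- positions[triplets %in% stop_codons]
  genes <- character(0)
  for (s in starts) {
    # Same frame means the distance to the end is a multiple of three.
    candidates <- ends[ends > s & (ends - s) %% 3 == 0]
    if (length(candidates) > 0) {
      e <- min(candidates)
      genes <- c(genes, substring(dna, s, e + 2))  # include the stop codon
    }
  }
  genes
}

print(find_genes("ATGGGGCTCATTCAAGAAGAATGGAGTAACTAG"))
# "ATGGGGCTCATTCAAGAAGAATGGAGTAACTAG" "ATGGAGTAA"
```

A start with no stop codon downstream in its frame is skipped, so a sequence without any ATG returns an empty result.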

5 Next

5.1 Next classes

We will learn how to look for codons and genes in DNA sequences in R.