Transcription (genetics)




From Wikipedia, the free encyclopedia



Transcription, or RNA synthesis, is the process of creating an equivalent RNA copy of a sequence of DNA.[1] Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth from DNA to RNA in the presence of the correct enzymes. During transcription, a DNA sequence is read by RNA polymerase, which produces a complementary, antiparallel RNA strand. Unlike DNA replication, transcription results in an RNA complement that includes uracil (U) in all instances where thymine (T) would have occurred in a DNA complement.

Transcription is the first step leading to gene expression. The stretch of DNA transcribed into an RNA molecule is called a transcription unit and encodes at least one gene. If the gene transcribed encodes a protein, the result of transcription is messenger RNA (mRNA), which will then be used to create that protein via the process of translation. Alternatively, the transcribed gene may encode ribosomal RNA (rRNA) or transfer RNA (tRNA), which are other components of the protein-assembly process, or other ribozymes.

A DNA transcription unit encoding a protein contains not only the sequence that will eventually be directly translated into the protein (the coding sequence) but also regulatory sequences that direct and regulate the synthesis of that protein. The regulatory sequence before (upstream from) the coding sequence is called the five prime untranslated region (5'UTR), and the sequence following (downstream from) the coding sequence is called the three prime untranslated region (3'UTR).

Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls for copying DNA; therefore, transcription has a lower copying fidelity than DNA replication.[2]

As in DNA replication, the DNA template is read in the 3' → 5' direction during transcription, while the complementary RNA is synthesized in the 5' → 3' direction. Although DNA is arranged as two antiparallel strands in a double helix, only one of the two DNA strands, called the template strand, is used for transcription of a given gene, because the RNA product is single-stranded rather than double-stranded like DNA. The other DNA strand is called the coding strand, because its sequence is the same as the newly created RNA transcript (except for the substitution of uracil for thymine). The use of only the 3' → 5' strand eliminates the need for the Okazaki fragments seen in DNA replication.
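
The relationship between the template strand, the coding strand and the transcript can be sketched in a few lines of Python. This is only an illustration of base pairing, not a model of RNA polymerase; the function names and the example sequence are hypothetical.

```python
# Base added to the growing RNA opposite each DNA template base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base found on the coding strand opposite each DNA template base.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') for a template strand written 3'->5'.

    Reading the template 3'->5' while building RNA 5'->3' means the two
    strings line up position by position.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_strand(template_3_to_5: str) -> str:
    """Return the coding strand (written 5'->3') paired with the template."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5.upper())


template = "TACGGCATT"            # hypothetical template, written 3'->5'
rna = transcribe(template)        # "AUGCCGUAA", written 5'->3'
coding = coding_strand(template)  # "ATGCCGTAA", written 5'->3'

# The transcript matches the coding strand except that U replaces T.
same_as_coding = rna == coding.replace("T", "U")
```

Here `same_as_coding` is true, which is the sense in which the coding strand has "the same" sequence as the transcript.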

Transcription is divided into 5 stages: pre-initiation, initiation, promoter clearance, elongation and termination.


Major steps

Pre-initiation

In eukaryotes, RNA polymerase, and therefore the initiation of transcription, requires the presence of a core promoter sequence in the DNA. Promoters are regions of DNA that promote transcription and, in eukaryotes, are found at -30, -75 and -90 base pairs upstream from the start site of transcription. Core promoters are sequences within the promoter that are essential for transcription initiation. RNA polymerase is able to bind to core promoters in the presence of various specific transcription factors.

The most common type of core promoter in eukaryotes is a short DNA sequence known as a TATA box, found at -30 base pairs from the start site of transcription. The TATA box, as a core promoter, is the binding site for a transcription factor known as TATA binding protein (TBP), which is itself a subunit of another transcription factor, called Transcription Factor II D (TFIID). After TFIID binds to the TATA box via the TBP, five more transcription factors and RNA polymerase combine around the TATA box in a series of stages to form a preinitiation complex. One transcription factor, TFIIH, has helicase activity and so is involved in separating the opposing strands of double-stranded DNA to provide access to a single-stranded DNA template. However, only a low, or basal, rate of transcription is driven by the preinitiation complex alone. Other proteins known as activators and repressors, along with any associated coactivators or corepressors, are responsible for modulating transcription rate.
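
As a rough illustration of the positional statement above, the following Python sketch checks whether a TATA motif lies near position -30 relative to a known transcription start site. The function name, the search window of -35 to -25 and the bare "TATA" motif are assumptions chosen for the example; real TATA boxes vary in sequence and position, and many promoters lack one.

```python
def has_tata_box(coding_5_to_3: str, tss_index: int,
                 window=(-35, -25), motif: str = "TATA") -> bool:
    """Report whether `motif` starts inside `window` relative to the start site.

    coding_5_to_3 : coding-strand DNA, written 5'->3'
    tss_index     : 0-based index of the transcription start site (assumed known)
    window        : assumed range of allowed motif start positions
    """
    seq = coding_5_to_3.upper()
    start = max(0, tss_index + window[0])
    # The motif may begin at the last window position, so extend by its length.
    end = max(0, tss_index + window[1] + len(motif))
    return motif in seq[start:end]


# Hypothetical sequence: a TATA motif beginning 30 bp upstream of index 50.
example = "G" * 20 + "TATAAA" + "G" * 24 + "ACGT"
found = has_tata_box(example, tss_index=50)
```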

Transcription preinitiation in archaea (formerly grouped with bacteria as prokaryotes) is essentially homologous to that of eukaryotes, but is much less complex.[3] The archaeal preinitiation complex assembles at a TATA-box binding site; however, in archaea, this complex is composed of only RNA polymerase (archaea have a single RNA polymerase, which resembles eukaryotic RNA polymerase II), TBP, and TFB (the archaeal homologue of eukaryotic transcription factor II B (TFIIB)).[4][5]


Initiation

Simple diagram of transcription initiation. RNAP = RNA polymerase

In bacteria, transcription begins with the binding of RNA polymerase to the promoter in DNA. RNA polymerase is a core enzyme consisting of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω subunit. At the start of initiation, the core enzyme is associated with a sigma factor (for example σ70) that aids in finding the appropriate -35 and -10 elements of the promoter, which lie upstream of the transcription start site.
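
The idea of a sigma factor recognizing two promoter elements at a fixed spacing can be sketched as a pattern search. The consensus sequences used here (TTGACA for the -35 element, TATAAT for the -10 element) and the 16 to 18 bp spacer are commonly quoted values for σ70 promoters in Escherichia coli and are assumptions not taken from this article; real promoters usually deviate from the consensus, so an exact-match search like this one is only illustrative.

```python
import re

MINUS_35 = "TTGACA"  # assumed consensus for the -35 element
MINUS_10 = "TATAAT"  # assumed consensus for the -10 element


def find_promoters(coding_5_to_3: str, min_spacer: int = 16, max_spacer: int = 18):
    """Return (start of -35 element, start of -10 element) for each exact match.

    A lookahead is used so that overlapping candidates are all examined.
    """
    pattern = re.compile(
        rf"(?=({MINUS_35})[ACGT]{{{min_spacer},{max_spacer}}}({MINUS_10}))"
    )
    return [(m.start(1), m.start(2))
            for m in pattern.finditer(coding_5_to_3.upper())]


# Hypothetical sequence with a 17 bp spacer between the two elements.
example = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GGGGGGGA"
hits = find_promoters(example)
```

A scoring approach that tolerates mismatches, such as a position weight matrix, would be closer to how promoter strength is usually modelled, but is beyond this sketch.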

Transcription initiation is more complex in eukaryotes. Eukaryotic RNA polymerase does not directly recognize the core promoter sequences. Instead, a collection of proteins called transcription factors mediates the binding of RNA polymerase and the initiation of transcription. Only after certain transcription factors are attached to the promoter does the RNA polymerase bind to it. The completed assembly of transcription factors and RNA polymerase binds to the promoter, forming a transcription initiation complex. Transcription in the archaea domain is similar to transcription in eukaryotes.[6]

Promoter clearance

After the first phosphodiester bond is synthesized, the RNA polymerase must clear the promoter. During this time there is a tendency to release the RNA transcript and produce truncated transcripts. This is called abortive initiation and is common in both eukaryotes and prokaryotes.[7] Abortive initiation continues to occur until the σ factor rearranges, resulting in the transcription elongation complex (which gives a 35 bp moving footprint). The σ factor is released before 80 nucleotides of mRNA are synthesized.[8] Once the transcript reaches approximately 23 nucleotides, it no longer slips and elongation can occur. This, like most of the remainder of transcription, is an energy-dependent process, consuming adenosine triphosphate (ATP).

Promoter clearance coincides with phosphorylation of serine 5 on the carboxy-terminal domain of RNA polymerase II in eukaryotes; this phosphorylation is carried out by TFIIH.


Elongation

Simple diagram of transcription elongation

One strand of DNA, the template strand (or noncoding strand), is used as a template for RNA synthesis. As transcription proceeds, RNA polymerase traverses the template strand and uses base-pairing complementarity with the DNA template to create an RNA copy. Although RNA polymerase traverses the template strand from 3' → 5', the coding (non-template) strand and newly formed RNA can also be used as reference points, so transcription can be described as occurring 5' → 3'. This produces an RNA molecule from 5' → 3', an exact copy of the coding strand (except that thymines are replaced with uracils, and the nucleotides are composed of a ribose (5-carbon) sugar where DNA has deoxyribose (one fewer oxygen atom) in its sugar-phosphate backbone).

Unlike DNA replication, mRNA transcription can involve multiple RNA polymerases on a single DNA template and multiple rounds of transcription (amplification of particular mRNA), so many mRNA molecules can be rapidly produced from a single copy of a gene.

Elongation also involves a proofreading mechanism that can replace incorrectly incorporated bases. In eukaryotes, this may correspond with short pauses during transcription that allow appropriate RNA editing factors to bind. These pauses may be intrinsic to the RNA polymerase or due to chromatin structure.


Termination

Simple diagram of transcription termination

Bacteria use two different strategies for transcription termination. In Rho-independent transcription termination, RNA transcription stops when the newly synthesized RNA molecule forms a G-C rich hairpin loop followed by a run of U's, which makes it detach from the DNA template. In the "Rho-dependent" type of termination, a protein factor called "Rho" destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex.
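
The sequence signature of Rho-independent termination described above (a G-C rich hairpin followed by a run of U's) can be searched for with a simple scan. The sketch below requires a perfectly base-paired stem, and its parameters (stem length, loop length range, length of the U run, minimum G-C fraction) are arbitrary assumptions chosen for illustration; dedicated terminator-prediction tools instead score thermodynamic stability and tolerate mismatches.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the sequence that would pair with `seq` in a hairpin stem."""
    return "".join(RNA_PAIR[b] for b in reversed(seq))


def gc_fraction(seq: str) -> float:
    return sum(b in "GC" for b in seq) / len(seq)


def find_intrinsic_terminators(rna, stem=6, loop_min=3, loop_max=8,
                               u_run=4, min_gc=0.6):
    """Return (hairpin start, U-run start) for each perfect hairpin plus U run."""
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - (2 * stem + loop_min + u_run) + 1):
        left = rna[i:i + stem]
        if gc_fraction(left) < min_gc:
            continue  # stem is not G-C rich enough
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop                    # start of the right stem arm
            right = rna[j:j + stem]
            tail = rna[j + stem:j + stem + u_run]
            if len(tail) < u_run:
                break                              # ran off the end of the RNA
            if right == reverse_complement_rna(left) and tail == "U" * u_run:
                hits.append((i, j + stem))
                break
    return hits


# Hypothetical transcript: stem GCCGCC, loop GAAA, stem GGCGGC, then six U's.
example = "CC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUU" + "AA"
terminators = find_intrinsic_terminators(example)
```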

Transcription termination in eukaryotes is less well understood but involves cleavage of the new transcript followed by template-independent addition of adenine nucleotides (As) at its new 3' end, in a process called polyadenylation.

Measuring and detecting transcription

Transcription can be measured and detected in a variety of ways:

  • Nuclear Run-on assay: measures the relative abundance of newly formed transcripts
  • RNase protection assay and ChIP-Chip of RNAP: detect active transcription sites
  • RT-PCR: measures the absolute abundance of total or nuclear RNA levels, which may however differ from transcription rates
  • DNA microarrays: measures the relative abundance of the global total or nuclear RNA levels; however, these may differ from transcription rates
  • In situ hybridization: detects the presence of a transcript
  • MS2 tagging: by incorporating RNA stem loops, such as MS2, into a gene, these become incorporated into newly synthesized RNA. The stem loops can then be detected using a fusion of GFP and the MS2 coat protein, which has a high-affinity, sequence-specific interaction with the MS2 stem loops. The recruitment of GFP to the site of transcription is visualised as a single fluorescent spot. This approach has revealed that transcription occurs in discontinuous bursts, or pulses (see Transcriptional bursting). With the notable exception of in situ techniques, most other methods provide cell population averages, and are not capable of detecting this fundamental property of genes.[9]
  • Northern blot: the traditional method, and until the advent of RNA-Seq, the most quantitative
  • RNA-Seq: applies next-generation sequencing techniques to sequence whole transcriptomes, which allows the measurement of relative abundance of RNA, as well as the detection of additional variations such as fusion genes, post-transcriptional edits and novel splice sites
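
To illustrate what "relative abundance" means for a sequencing-based method such as RNA-Seq, the sketch below computes transcripts per million (TPM), one widely used normalisation. TPM is not named in this article and is used here as an assumed example; the gene names, read counts and lengths are made up.

```python
def tpm(read_counts: dict, lengths_bp: dict) -> dict:
    """Convert raw read counts to transcripts per million.

    Each count is first divided by transcript length (longer transcripts
    yield more reads per molecule), then the rates are scaled to sum to 1e6.
    """
    rates = {gene: read_counts[gene] / lengths_bp[gene] for gene in read_counts}
    total = sum(rates.values())
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Hypothetical data: geneB has three times the reads but is three times longer.
counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 3000}
abundance = tpm(counts, lengths)
```

In this made-up case both genes receive the same TPM value, because the extra reads for geneB are explained by its length. As the list above notes, such abundance values reflect RNA levels and may differ from transcription rates.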

Transcription factories

Active transcription units are clustered in the nucleus, in discrete sites called transcription factories. Such sites can be visualized by allowing engaged polymerases to extend their transcripts in tagged precursors (Br-UTP or Br-U) and immuno-labeling the tagged nascent RNA. Transcription factories can also be localized using fluorescence in situ hybridization or marked by antibodies directed against polymerases. There are ~10,000 factories in the nucleoplasm of a HeLa cell, among which are ~8,000 polymerase II factories and ~2,000 polymerase III factories. Each polymerase II factory contains ~8 polymerases. As most active transcription units are associated with only one polymerase, each factory usually contains ~8 different transcription units. These units might be associated through promoters and/or enhancers, with loops forming a ‘cloud’ around the factory.
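
The approximate figures above imply a total number of engaged polymerase II molecules per HeLa nucleus, which the following short calculation makes explicit. The variable names are illustrative, and the inputs are the rounded estimates quoted in the text, so the result is an order-of-magnitude figure only.

```python
# Rounded estimates quoted in the text for a HeLa cell nucleoplasm.
pol2_factories = 8_000
pol3_factories = 2_000
polymerases_per_pol2_factory = 8

total_factories = pol2_factories + pol3_factories
# With roughly one polymerase per active transcription unit, this is also
# the approximate number of active polymerase II transcription units.
engaged_pol2 = pol2_factories * polymerases_per_pol2_factory
```

This gives roughly 10,000 factories in total and on the order of 64,000 engaged polymerase II molecules.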


History

A molecule that allows the genetic material to be realized as a protein was first hypothesized by François Jacob and Jacques Monod. RNA synthesis by RNA polymerase was established in vitro by several laboratories by 1965; however, the RNA synthesized by these enzymes had properties that suggested the existence of an additional factor needed to terminate transcription correctly.

In 1972, Walter Fiers became the first person to prove the existence of the terminating enzyme.

Roger D. Kornberg won the 2006 Nobel Prize in Chemistry "for his studies of the molecular basis of eukaryotic transcription".[10]

Reverse transcription

Scheme of reverse transcription

Some viruses (such as HIV, the cause of AIDS) have the ability to transcribe RNA into DNA. HIV has an RNA genome that is copied into DNA. The resulting DNA can be merged with the DNA genome of the host cell. The main enzyme responsible for synthesis of DNA from an RNA template is called reverse transcriptase. In the case of HIV, reverse transcriptase is responsible for synthesizing a complementary DNA strand (cDNA) to the viral RNA genome. An associated enzyme, ribonuclease H, digests the RNA strand, and reverse transcriptase synthesises a complementary strand of DNA to form a double-helix DNA structure. This cDNA is integrated into the host cell's genome via another enzyme (integrase), causing the host cell to generate viral proteins, which reassemble into new viral particles. Subsequently, the host cell undergoes programmed cell death, apoptosis.
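
The two synthesis steps described above (a cDNA strand copied from the RNA, then a second DNA strand copied from the cDNA) can be sketched as follows. The function names and the example sequence are hypothetical, and the sketch covers base pairing only, not the priming, RNase H digestion or strand-transfer events of real retroviral replication.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna_5_to_3: str) -> str:
    """Return the cDNA complementary to the RNA, written 5'->3'.

    The strands are antiparallel, so the RNA is read in reverse.
    """
    return "".join(RNA_TO_DNA[b] for b in reversed(rna_5_to_3.upper()))


def second_strand(cdna_5_to_3: str) -> str:
    """Return the DNA strand complementary to the cDNA, written 5'->3'."""
    return "".join(DNA_TO_DNA[b] for b in reversed(cdna_5_to_3.upper()))


viral_rna = "AUGCCGUAA"               # hypothetical fragment, written 5'->3'
cdna = first_strand_cdna(viral_rna)   # "TTACGGCAT"
plus_strand = second_strand(cdna)     # "ATGCCGTAA"

# The second strand carries the RNA's sequence with T in place of U.
matches_rna = plus_strand == viral_rna.replace("U", "T")
```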

Some eukaryotic cells contain an enzyme with reverse transcription activity called telomerase. Telomerase is a reverse transcriptase that lengthens the ends of linear chromosomes. Telomerase carries an RNA template from which it synthesizes a repeating DNA sequence, or "junk" DNA. This repeated sequence of DNA is important because every time a linear chromosome is duplicated it is shortened in length. With "junk" DNA at the ends of chromosomes, the shortening eliminates some of the non-essential, repeated sequence rather than the protein-encoding DNA sequence farther away from the chromosome end. Telomerase is often activated in cancer cells, enabling them to duplicate their genomes indefinitely without losing important protein-coding DNA sequence. Activation of telomerase could be part of the process that allows cancer cells to become technically immortal. However, the true in vivo significance of telomerase has still not been empirically proven.


  1. ^ "Transcription definition". Retrieved 11 October 2009. 
  2. ^ Berg J, Tymoczko JL, Stryer L (2006). Biochemistry (6th ed.). San Francisco: W. H. Freeman. ISBN 0716787245. 
  3. ^ Littlefield, O., Korkhin, Y., and Sigler, P.B. (1999). "The structural basis for the oriented assembly of a TBP/TFB/promoter complex". PNAS 96: 13668–13673. doi:10.1073/pnas.96.24.13668. 
  4. ^ Hausner, W; Thomm, M (2001). "Events during Initiation of Archaeal Transcription: Open Complex Formation and DNA-Protein Interactions". Journal of Bacteriology 183 (10): 3025–3031. doi:10.1128/JB.183.10.3025-3031.2001. PMID 11325929. 
  5. ^ Qureshi, SA; Bell, SD; Jackson, SP (1997). "Factor requirements for transcription in the archaeon Sulfolobus shibatae". EMBO Journal 16 (10): 2927–2936. doi:10.1093/emboj/16.10.2927. PMID 9184236. 
  6. ^ Mohamed Ouhammouch, Robert E. Dewhurst, Winfried Hausner, Michael Thomm, and E. Peter Geiduschek (2003). "Activation of archaeal transcription by recruitment of the TATA-binding protein". Proceedings of the National Academy of Sciences of the United States of America 100 (9): 5097. doi:10.1073/pnas.0837150100. PMID 12692306. 
  7. ^ Goldman, R.; Ebright, H.; Nickels, E. (May 2009). "Direct detection of abortive RNA transcripts in vivo". Science (New York, N.Y.) 324 (5929): 927–928. doi:10.1126/science.1169237. ISSN 0036-8075. PMID 19443781. 
  8. ^ Dvir, A (Sep 2002). "Promoter escape by RNA polymerase II". Biochimica et biophysica acta 1577 (2): 208–223. ISSN 0006-3002. PMID 12213653. 
  9. ^ Raj, A.; van Oudenaarden, A. (2008). "Nature, nurture, or chance: stochastic gene expression and its consequences". Cell 135: 216–226. 
  10. ^ "Chemistry 2006". Nobel Foundation. Retrieved 2007-03-29. 

Further reading

  • Lehninger Principles of Biochemistry, 5th edition, David L. Nelson & Michael M. Cox
  • Principles of Nuclear Structure and Function, Peter R. Cook
  • Essential Genetics, Peter J. Russell

Study guide

Up to date as of January 14, 2010
(Redirected to Eukaryotic transcription article)

From Wikiversity

Figure 2. Eukaryotic RNA polymerase II in a complex with DNA and mRNA [1].

The Nobel Prize in Chemistry in 2006 was awarded to Roger D. Kornberg for his work on the molecular basis of eukaryotic transcription [2].

This learning project allows exploration of scientific research that is related to 2006 Nobel Prize in Chemistry. If you have questions, leave them on the discussion page.



Figure 1. Information flow in living cells from DNA to RNA to protein.

Eukaryotic organisms such as humans store their genetic information in the structure of DNA molecules. Most of the genetic instructions in genes are used to specify the structures of proteins (Figure 1). The many proteins of cells function as molecular machines that produce the living state. Intermediate between DNA and protein is RNA. The process by which cells convert the structure of a gene into the structure of a mRNA molecule is called transcription. In eukaryotes, a large molecular complex of specialized proteins is required to achieve the careful control of which subset of genes is transcribed in each type of cell. At the core of this complex is the enzyme RNA polymerase II.

Roger Kornberg's laboratory has used biochemical techniques to isolate the large complex of proteins that allows RNA polymerase II to produce messenger RNA. Working with yeast cells as an experimental system, Kornberg's research team has isolated and purified the functioning RNA pol II protein complex with attached template DNA, product mRNA and substrate nucleotides, and captured images of the complex using electron microscopy and X-ray crystallography [3] (see Figure 2).

Figure 3. A phylogenetic tree of living things, based on rRNA sequence data, showing the separation of bacteria, archaea, and eukaryotes [4].

General transcription factors

There are three major forms of life on Earth: bacteria, archaea and eukaryotes (Figure 3). RNA polymerase in bacteria is less complex than RNA polymerase in eukaryotes. Some of the increased complexity of RNA polymerase in eukaryotes reflects differences between DNA in eukaryotes and DNA in bacteria. Two important differences are that eukaryotes organize their DNA into nucleosomes and have more complex mechanisms for regulation of gene transcription.

Figure 4. The DNA of eukaryotes wraps around histone proteins forming nucleosomes. [5].

Nucleosomes are a complex of DNA and histone proteins (Figure 4). In order for transcription to occur, DNA must be released from being tightly coiled in nucleosomes. Bacteria do not have nucleosomes. Another complication of eukaryotic gene expression regulation is that gene sequences controlling transcription are often distant from the DNA site where transcription starts. The RNA polymerase of bacteria is relatively small, with a core of five protein subunits and one additional protein that recognizes the start points for transcription[6]. In contrast, RNA polymerase II of yeast (baker’s yeast, Saccharomyces cerevisiae) has 12 protein subunits[1] and requires five general transcription factor proteins (TFIIB, D, E, F and H). The general transcription factors are themselves complex; for example, TFIIH has at least six protein subunits in various eukaryotic organisms from yeast to mammals.

Searching for a full understanding of transcription

As discussed above, Kornberg has made important contributions to ongoing attempts to discover the full complexity of transcription in eukaryotes. In addition to the polymerase core and its associated general transcription factors, another large protein complex called Mediator is involved in the control of RNA polymerase II [7]. In some cells, Mediator is about 4 times larger than the RNA polymerase core complex (with as many as 20 different protein subunits) and is important for transmitting the effects of positive and negative regulators of gene transcription (often quite distant from the transcription start site) to the core polymerase. Efforts continue to reveal the details of how the Mediator complex functions.

Learning project; where to next?

Explore one of the following questions and then describe what you learn on this page:

  • How many types of RNA polymerase are there in humans and what does each type do?
  • What is known about transcription in archaea?
  • How do HIV proteins interact with RNA polymerase II and induce transcription from the HIV-1 promoter?


  1. "Structural basis of transcription: an RNA polymerase II elongation complex at 3.3 A resolution" by Averell L. Gnatt, Patrick Cramer, Jianhua Fu, David A. Bushnell and Roger D. Kornberg in Science (2001) Volume 292, pages 1876-1882.
  2. 2006 award at the Nobel Prize website.
  3. Advanced information on the Nobel Prize in Chemistry 2006: Molecular basis of eukaryotic transcription by Lars Thelander.
  4. "Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya" by C. R. Woese, O. Kandler, and M. L. Wheelis in Proceedings of the National Academy of Sciences U.S.A. (1990) Volume 87, pages 4576-4579. Full text online.
  5. McDonald D, "Milestone 9 (1973-1974) The nucleosome hypothesis: An alternative string theory", Nature Milestones: Gene Expression (2005).
  6. The sigma subunit of Escherichia coli RNA polymerase senses promoter spacing by A. J. Dombroski, B. D. Johnson, M. Lonetto, and C. A. Gross in Proceedings of the National Academy of Sciences U.S.A. (1996) Volume 93, pages 8858–8862.
  7. "A multiprotein mediator of transcriptional activation and its interaction with the C-terminal repeat domain of RNA polymerase II" by Y. J. Kim, S. Bjorklund, Y. Li, M. H. Sayre and R. D. Kornberg in Cell (1994) Volume 77, pages 599-608.

Simple English

Transcription is when RNA is made from DNA. The information is copied from one molecule to the other. The DNA sequence is copied by a special enzyme called RNA polymerase to make a matching RNA strand. This is called messenger RNA (mRNA), because it carries a genetic message from the DNA to the protein-making machinery of the cell. Transcription is the first step that leads to the expression of the genes.

The stretch of DNA that is transcribed into an RNA molecule is called a transcription unit. This contains:

  1. sequences which regulate the protein synthesis
  2. the code for amino acid sequence in the protein.[1]
Simple diagram of transcription initiation. RNAP = RNA polymerase

Unlike in DNA replication, only one of the two DNA strands is transcribed. This strand is called the template strand, because it provides the template for ordering the sequence of nucleotides in an RNA transcript. The other strand is called the coding strand. Its sequence is the same as the newly created RNA transcript (except that uracil is used in place of thymine).

The DNA template strand is read in the 3' → 5' direction by RNA polymerase, and the new RNA strand is synthesized in the 5' → 3' direction. RNA polymerase binds at the promoter, near the 3' end of the gene on the DNA template strand, and travels toward the 5' end.

Simple diagram of transcription elongation
Simple diagram of transcription termination


  1. Berg J., Tymoczko J.L., Stryer L. (2006). Biochemistry (6th ed.). San Francisco: Freeman. ISBN 0716787245. 

