# 16.E: Transcription regulation via effects on RNA polymerases (Exercises) - Biology

16.1 The ratio [RDs]/[Ds] is the concentration of a hypothetical repressor (R) bound to its specific site on DNA divided by the concentration of unbound DNA, i.e., it is the ratio of bound DNA to free DNA. When the measured [RDs]/[Ds] was plotted versus the concentration of free repressor [R], the slope of the plot showed that the ratio [RDs]/[Ds] increased linearly by 60 for every increase of 1 × 10⁻¹¹ M in [R]. What is the binding constant Ks for association of the repressor with its specific site?
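For a simple equilibrium R + Ds ⇌ RDs, Ks = [RDs]/([R][Ds]), so [RDs]/[Ds] = Ks·[R] and the slope of the plot is Ks itself. The arithmetic can be sketched in a few lines of Python, assuming that one-site model holds; the variable names are illustrative only.

```python
# Slope of [RDs]/[Ds] versus free [R] equals Ks for R + Ds <-> RDs.
delta_ratio = 60.0   # increase in [RDs]/[Ds] (dimensionless), from the problem
delta_R = 1e-11      # corresponding increase in free [R], in M

Ks = delta_ratio / delta_R   # association constant, in M^-1
print(f"Ks = {Ks:.1e} M^-1")
```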

16.2 The binding of the protein TBP to a labeled short duplex oligonucleotide containing a TATA box (the probe) was investigated quantitatively. The following table gives the fraction of total probe bound (column 2) and the ratio of bound to free probe (column 3) as a function of [TBP]. These data are provided courtesy of Rob Coleman and Frank Pugh.

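| [TBP] (nM) | Fraction of total probe bound | Bound/free probe |
|---|---|---|
| 0.10 | 0.040 | 0.042 |
| 0.20 | 0.16 | 0.19 |
| 0.30 | 0.33 | 0.5 |
| 0.40 | 0.44 | 0.78 |
| 0.50 | 0.52 | 1.1 |
| 0.70 | 0.62 | 1.6 |
| 1.0 | 0.71 | 2.45 |
| 2.0 | 0.83 | 4.88 |
| 3.0 | 0.87 | 6.69 |
| 5.0 | 0.93 | 14 |
| 10 | 0.97 | 32.3 |
| 20 | 0.99 | 99 |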

Plot the data for the two different measures of bound probe. Note that since the denominator for column 2 (total probe) is constant, the ratio of bound to total probe levels off, whereas the amount of free probe continues to decrease with increasing [TBP], giving a continuing increase in the ratio of bound to free probe.

What is the equilibrium constant for TBP binding to the TATA box?
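One way to approach this question is to fit the bound/free ratio to a straight line through the origin, since for one-site binding bound/free = K·[TBP]. The sketch below does this with an unweighted least-squares fit. It assumes that the tabulated [TBP] is a good approximation of free TBP, which the problem does not state; the function name is hypothetical.

```python
# Data from the table: [TBP] in nM and the ratio of bound to free probe.
tbp_nM = [0.10, 0.20, 0.30, 0.40, 0.50, 0.70, 1.0, 2.0, 3.0, 5.0, 10, 20]
bound_over_free = [0.042, 0.19, 0.5, 0.78, 1.1, 1.6, 2.45, 4.88, 6.69, 14, 32.3, 99]

def slope_through_origin(x, y):
    """Least-squares slope for the model y = K * x (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

K_per_nM = slope_through_origin(tbp_nM, bound_over_free)  # in nM^-1
K_per_M = K_per_nM * 1e9                                  # in M^-1
print(f"K ~ {K_per_nM:.2f} nM^-1 = {K_per_M:.1e} M^-1")
```

An unweighted fit like this is dominated by the highest-concentration points. The [TBP] at which half of the probe is bound in column 2 gives an independent estimate of 1/K to compare against.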

16.3 What is the fate of the lac repressor after it binds the inducer?

16.4 How does the lac repressor prevent transcription of the lac operon?

For the next two questions, imagine that you mixed increasing amounts of the DNA-binding protein AP1 with a constant amount of a labeled duplex oligonucleotide containing its binding site (TGACTCA). After measuring the fraction of DNA bound by AP1 (i.e., the fractional occupancy) as a function of [AP1], the data were analyzed by nonlinear least-squares regression at a wide range of possible values for ΔG. The error associated with the fit of each of those values to the experimental data is shown below; the higher the variance of fit, the larger the error.

16.5 What is the most accurate value of ΔG for binding of AP1 to this duplex oligonucleotide?

16.6 What is the most accurate measure of the equilibrium constant, Ks, for binding of AP1 to this duplex oligonucleotide?
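Problems 16.5 and 16.6 are linked by the standard relation ΔG = −RT ln Ks, so once the best-fit ΔG is read from the variance plot, Ks follows directly. The conversion can be sketched as below. The ΔG value of −12 kcal/mol and the temperature of 298.15 K are hypothetical placeholders, not values taken from the plot, and the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)

def ks_from_delta_g(delta_g_kcal, temp_K=298.15):
    """Association constant (M^-1) from a binding free energy in kcal/mol."""
    return math.exp(-delta_g_kcal / (R_KCAL * temp_K))

def delta_g_from_ks(ks, temp_K=298.15):
    """Binding free energy in kcal/mol from an association constant (M^-1)."""
    return -R_KCAL * temp_K * math.log(ks)

example_delta_g = -12.0                  # hypothetical value, kcal/mol
example_ks = ks_from_delta_g(example_delta_g)
print(f"Ks = {example_ks:.2e} M^-1")
```

Because Ks depends exponentially on ΔG, a small uncertainty in the best-fit ΔG translates into a large relative uncertainty in Ks.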

For the next two problems, consider a hypothetical eubacterial operon in which the operator overlaps the -10 region of the promoter. Measurement of the lag time before production of abortive transcripts (in an abortive initiation assay) as a function of the inverse of the RNA polymerase concentration (1/[RNAP]) gave the results shown below. The filled circles are the results of the assay in the absence of repressor, and the open circles are the results in the presence of repressor bound to the operator.

16.7 What is the value of the forward rate constant (kf) for closed to open complex formation under the two different conditions?

16.8 What is the value of the equilibrium constant (KB) for binding of the RNA polymerase to the promoter under the two conditions?
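In the usual two-step model of initiation (polymerase binds the promoter with equilibrium constant KB, then isomerizes to the open complex with rate constant kf), the lag time τ depends linearly on 1/[RNAP]: τ = 1/kf + 1/(kf·KB·[RNAP]). The y-intercept of the plot is therefore 1/kf, and the ratio of intercept to slope is KB. The sketch below fits a line to lag-time data and extracts both constants. The data points are hypothetical, generated for illustration, and are not the values from the plot in these problems.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical tau plot: 1/[RNAP] in M^-1 and lag time tau in seconds.
inv_rnap = [5e6, 1e7, 2e7, 5e7]
tau = [30.0, 40.0, 60.0, 120.0]

slope, intercept = linear_fit(inv_rnap, tau)
kf = 1.0 / intercept      # s^-1, from the y-intercept
KB = intercept / slope    # M^-1, from intercept / slope
print(f"kf = {kf:.3f} s^-1, KB = {KB:.1e} M^-1")
```

Applying the same fit separately to the filled-circle and open-circle data shows whether the repressor changes the intercept (kf), the slope (KB), or both.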

## Contents

The 2006 Nobel Prize in Chemistry was awarded to Roger D. Kornberg for creating detailed molecular images of RNA polymerase during various stages of the transcription process. [3]

In most prokaryotes, a single RNA polymerase species transcribes all types of RNA. RNA polymerase "core" from E. coli consists of five subunits: two alpha (α) subunits of 36 kDa, a beta (β) subunit of 150 kDa, a beta prime subunit (β′) of 155 kDa, and a small omega (ω) subunit. A sigma (σ) factor binds to the core, forming the holoenzyme. After transcription starts, the factor can unbind and let the core enzyme proceed with its work. [4] [5] The core RNA polymerase complex forms a "crab claw" or "clamp-jaw" structure with an internal channel running along the full length. [6] Eukaryotic and archaeal RNA polymerases have a similar core structure and work in a similar manner, although they have many extra subunits. [7]

All RNAPs contain metal cofactors, in particular zinc and magnesium cations which aid in the transcription process. [8] [9]

Control of the process of gene transcription affects patterns of gene expression and, thereby, allows a cell to adapt to a changing environment, perform specialized roles within an organism, and maintain basic metabolic processes necessary for survival. Therefore, it is hardly surprising that the activity of RNAP is long, complex, and highly regulated. In Escherichia coli bacteria, more than 100 transcription factors have been identified, which modify the activity of RNAP. [10]

RNAP can initiate transcription at specific DNA sequences known as promoters. It then produces an RNA chain, which is complementary to the template DNA strand. The process of adding nucleotides to the RNA strand is known as elongation; in eukaryotes, RNAP can build chains as long as 2.4 million nucleotides (the full length of the dystrophin gene). RNAP will preferentially release its RNA transcript at specific DNA sequences encoded at the end of genes, which are known as terminators.

Products of RNAP include:

• Messenger RNA (mRNA): template for the synthesis of proteins by ribosomes.
• Non-coding RNA or "RNA genes": a broad class of genes that encode RNA that is not translated into protein. The most prominent examples of RNA genes are transfer RNA (tRNA) and ribosomal RNA (rRNA), both of which are involved in the process of translation. However, since the late 1990s, many new RNA genes have been found, and thus RNA genes may play a much more significant role than previously thought.
• Transfer RNA (tRNA): transfers specific amino acids to growing polypeptide chains at the ribosomal site of protein synthesis during translation
• Ribosomal RNA (rRNA): a component of ribosomes
• MicroRNA: regulates gene activity
• Catalytic RNA (ribozyme): enzymatically active RNA molecules

RNAP accomplishes de novo synthesis. It is able to do this because specific interactions with the initiating nucleotide hold RNAP rigidly in place, facilitating chemical attack on the incoming nucleotide. Such specific interactions explain why RNAP prefers to start transcripts with ATP (followed by GTP, UTP, and then CTP). In contrast to DNA polymerase, RNAP includes helicase activity; therefore, no separate enzyme is needed to unwind DNA.

### Initiation

RNA polymerase binding in bacteria involves the sigma factor recognizing the core promoter region containing the −35 and −10 elements (located before the beginning of the sequence to be transcribed) and also, at some promoters, the α subunit C-terminal domain recognizing promoter upstream elements. [11] There are multiple interchangeable sigma factors, each of which recognizes a distinct set of promoters. For example, in E. coli, σ⁷⁰ is expressed under normal conditions and recognizes promoters for genes required under normal conditions ("housekeeping genes"), while σ³² recognizes promoters for genes required at high temperatures ("heat-shock genes"). In archaea and eukaryotes, the functions of the bacterial general transcription factor sigma are performed by multiple general transcription factors that work together. The RNA polymerase-promoter closed complex is usually referred to as the "transcription preinitiation complex." [12] [13]
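Recognition of the −35 and −10 elements can be illustrated with a simple sequence scan. The sketch below is a toy model only: it assumes the commonly cited σ⁷⁰ consensus hexamers TTGACA (−35) and TATAAT (−10) separated by 16 to 18 bp, none of which is given in the text above, and it counts mismatches rather than modeling real binding energetics. The function names and the example sequence are hypothetical.

```python
CONSENSUS_35 = "TTGACA"   # assumed sigma-70 consensus for the -35 element
CONSENSUS_10 = "TATAAT"   # assumed sigma-70 consensus for the -10 element

def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def find_promoters(seq, spacers=(16, 17, 18), max_mismatches=2):
    """Return (start of -35 hexamer, spacer length, total mismatches) tuples."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        m35 = mismatches(seq[i:i + 6], CONSENSUS_35)
        for spacer in spacers:
            j = i + 6 + spacer          # start of the candidate -10 hexamer
            if j + 6 > len(seq):
                continue
            m10 = mismatches(seq[j:j + 6], CONSENSUS_10)
            if m35 + m10 <= max_mismatches:
                hits.append((i, spacer, m35 + m10))
    return hits

# Hypothetical sequence: perfect hexamers separated by a 17-bp GC spacer.
example = "GGC" + "TTGACA" + "GCGCGCGCGCGCGCGCG" + "TATAAT" + "GCA"
print(find_promoters(example))
```

Allowing a few mismatches reflects the fact that natural promoters rarely match the consensus exactly.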

After binding to the DNA, the RNA polymerase switches from a closed complex to an open complex. This change involves the separation of the DNA strands to form an unwound section of DNA of approximately 13 bp, referred to as the "transcription bubble". Supercoiling plays an important part in polymerase activity because of the unwinding and rewinding of DNA. Because regions of DNA in front of RNAP are unwound, there are compensatory positive supercoils. Regions behind RNAP are rewound and negative supercoils are present. [13]

### Promoter escape

RNA polymerase then starts to synthesize the initial DNA-RNA heteroduplex, with ribonucleotides base-paired to the template DNA strand according to Watson-Crick base-pairing interactions. As noted above, RNA polymerase makes contacts with the promoter region. However, these stabilizing contacts inhibit the enzyme's ability to access DNA further downstream and thus the synthesis of the full-length product. In order to continue RNA synthesis, RNA polymerase must escape the promoter. It must maintain promoter contacts while unwinding more downstream DNA for synthesis, "scrunching" more downstream DNA into the initiation complex. [14] During the promoter escape transition, RNA polymerase is considered a "stressed intermediate." Thermodynamically, the stress accumulates from the DNA-unwinding and DNA-compaction activities. Once the DNA-RNA heteroduplex is long enough (about 10 bp), RNA polymerase releases its upstream contacts and effectively achieves the promoter escape transition into the elongation phase. The heteroduplex at the active center stabilizes the elongation complex.

However, promoter escape is not the only outcome. RNA polymerase can also relieve the stress by releasing its downstream contacts, arresting transcription. The paused transcribing complex has two options: (1) release the nascent transcript and begin anew at the promoter or (2) reestablish a new 3'OH on the nascent transcript at the active site via RNA polymerase's catalytic activity and recommence DNA scrunching to achieve promoter escape. Abortive initiation, the unproductive cycling of RNA polymerase before the promoter escape transition, results in short RNA fragments of around 9 bp in a process known as abortive transcription. The extent of abortive initiation depends on the presence of transcription factors and the strength of the promoter contacts. [15]

### Elongation

The 17-bp transcriptional complex has an 8-bp DNA-RNA hybrid, that is, 8 base-pairs involve the RNA transcript bound to the DNA template strand. As transcription progresses, ribonucleotides are added to the 3' end of the RNA transcript and the RNAP complex moves along the DNA. The characteristic elongation rates in prokaryotes and eukaryotes are about 10–100 nts/sec. [16]
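These rates give a sense of scale for long genes. A minimal sketch, assuming a constant elongation rate with no pausing, estimates how long it would take to transcribe the 2.4-million-nucleotide dystrophin gene mentioned earlier; the function name is illustrative.

```python
def transcription_time_hours(length_nt, rate_nt_per_s):
    """Time to transcribe length_nt nucleotides at a constant rate, in hours."""
    return length_nt / rate_nt_per_s / 3600

dystrophin_nt = 2_400_000   # full length of the dystrophin gene, from the text
for rate in (10, 100):      # bounds of the quoted elongation rate, in nt/s
    hours = transcription_time_hours(dystrophin_nt, rate)
    print(f"{rate} nt/s -> {hours:.1f} h")
```

Even at the upper end of the quoted range, a single pass over this gene takes several hours under these assumptions.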

Aspartyl (Asp) residues in the RNAP hold on to Mg²⁺ ions, which, in turn, coordinate the phosphates of the ribonucleotides. The first Mg²⁺ holds on to the α-phosphate of the NTP to be added. This allows the nucleophilic attack of the 3'OH from the RNA transcript, adding another NTP to the chain. The second Mg²⁺ holds on to the pyrophosphate of the NTP. [17] The overall reaction equation is:

(NMP)ₙ + NTP → (NMP)ₙ₊₁ + PPᵢ

#### Fidelity

Unlike the proofreading mechanisms of DNA polymerase, those of RNAP have only recently been investigated. Proofreading begins with separation of the mis-incorporated nucleotide from the DNA template. This pauses transcription. The polymerase then backtracks by one position and cleaves the dinucleotide that contains the mismatched nucleotide. In RNA polymerase, this occurs at the same active site used for polymerization and is therefore markedly different from DNA polymerase, where proofreading occurs at a distinct nuclease active site. [18]

The overall error rate is around 10⁻⁴ to 10⁻⁶. [19]
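To see what such an error rate means for a single transcript, the sketch below computes the expected number of misincorporations and the probability of an error-free transcript. It assumes that errors are independent and equally likely at every position. The 1000-nt transcript length is a hypothetical example, and the function names are illustrative.

```python
def expected_errors(length_nt, error_rate):
    """Expected number of misincorporated nucleotides in one transcript."""
    return length_nt * error_rate

def prob_error_free(length_nt, error_rate):
    """Probability that a transcript contains no errors at all."""
    return (1 - error_rate) ** length_nt

transcript_nt = 1000            # hypothetical transcript length
for rate in (1e-4, 1e-6):       # bounds of the quoted error rate
    print(rate, expected_errors(transcript_nt, rate),
          round(prob_error_free(transcript_nt, rate), 4))
```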

### Termination

In bacteria, termination of RNA transcription can be rho-dependent or rho-independent. The former relies on the rho factor, which destabilizes the DNA-RNA heteroduplex and causes RNA release. [20] The latter, also known as intrinsic termination, relies on a palindromic region of DNA. Transcribing the region causes the formation of a "hairpin" structure from the RNA transcript looping and binding upon itself. This hairpin structure is often rich in G-C base-pairs, making it more stable than the DNA-RNA hybrid itself. As a result, the 8 bp DNA-RNA hybrid in the transcription complex shifts to a 4 bp hybrid. These last 4 base pairs are weak A-U base pairs, and the entire RNA transcript will fall off the DNA.
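The signature of an intrinsic terminator, a self-complementary stem followed by a run of U residues, can be searched for in an RNA sequence. The sketch below is a deliberately simplified illustration: it looks only for perfect stems of a fixed length with a fixed loop size, whereas real terminators have variable and imperfect hairpins. The stem, loop, and U-tract lengths, the function names, and the example sequence are all assumptions made for the example.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(PAIR[base] for base in reversed(rna))

def find_terminator_hairpins(rna, stem=6, loop=4, u_run=4):
    """Return start indices of perfect stem-loops followed by a U tract."""
    hits = []
    span = 2 * stem + loop   # total length of the stem-loop
    for i in range(len(rna) - span - u_run + 1):
        left = rna[i:i + stem]
        right = rna[i + stem + loop:i + span]
        tail = rna[i + span:i + span + u_run]
        if right == reverse_complement(left) and tail == "U" * u_run:
            hits.append(i)
    return hits

# Hypothetical transcript: GC-rich stem, GAAA loop, then four U residues.
example_rna = "AA" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUU" + "AA"
print(find_terminator_hairpins(example_rna))
```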

Transcription termination in eukaryotes is less well understood than in bacteria, but involves cleavage of the new transcript followed by template-independent addition of adenines at its new 3' end, in a process called polyadenylation. [21]

Given that DNA and RNA polymerases both carry out template-dependent nucleotide polymerization, it might be expected that the two types of enzymes would be structurally related. However, X-ray crystallographic studies of both types of enzymes reveal that, other than containing a critical Mg²⁺ ion at the catalytic site, they are virtually unrelated to each other; indeed, template-dependent nucleotide polymerizing enzymes seem to have arisen independently twice during the early evolution of cells. One lineage led to the modern DNA polymerases and reverse transcriptases, as well as to a few single-subunit RNA polymerases (ssRNAP) from phages and organelles. [2] The other multi-subunit RNAP lineage formed all of the modern cellular RNA polymerases. [22] [1]

### Bacteria

In bacteria, the same enzyme catalyzes the synthesis of mRNA and non-coding RNA (ncRNA).

RNAP is a large molecule. The core enzyme has five subunits:

• β': The β' subunit is the largest subunit, and is encoded by the rpoC gene. [24] The β' subunit contains part of the active center responsible for RNA synthesis and contains some of the determinants for non-sequence-specific interactions with DNA and nascent RNA. It is split into two subunits in Cyanobacteria and chloroplasts. [25]
• β: The β subunit is the second-largest subunit, and is encoded by the rpoB gene. The β subunit contains the rest of the active center responsible for RNA synthesis and contains the rest of the determinants for non-sequence-specific interactions with DNA and nascent RNA.
• α: The α subunit is the third-largest subunit and is present in two copies per molecule of RNAP, αI and αII (one and two). Each α subunit contains two domains: αNTD (N-terminal domain) and αCTD (C-terminal domain). αNTD contains determinants for assembly of RNAP. αCTD contains determinants for interaction with promoter DNA, making sequence-non-specific interactions at most promoters and sequence-specific interactions at upstream-element-containing promoters, and contains determinants for interactions with regulatory factors.
• ω: The ω subunit is the smallest subunit. The ω subunit facilitates assembly of RNAP and stabilizes assembled RNAP. [26]

In order to bind promoters, RNAP core associates with the transcription initiation factor sigma (σ) to form RNA polymerase holoenzyme. Sigma reduces the affinity of RNAP for nonspecific DNA while increasing specificity for promoters, allowing transcription to initiate at correct sites. The complete holoenzyme therefore has six subunits: β'βαIαIIωσ.

### Eukaryotes

Eukaryotes have multiple types of nuclear RNAP, each responsible for synthesis of a distinct subset of RNA. All are structurally and mechanistically related to each other and to bacterial RNAP:

• RNA polymerase I synthesizes a pre-rRNA 45S (35S in yeast), which matures into 28S, 18S and 5.8S rRNAs, which will form the major RNA sections of the ribosome. [27]
• RNA polymerase II synthesizes precursors of mRNAs and most snRNA and microRNAs. [28] This is the most studied type, and, due to the high level of control required over transcription, a range of transcription factors are required for its binding to promoters.
• RNA polymerase III synthesizes tRNAs, rRNA 5S and other small RNAs found in the nucleus and cytosol. [29]
• RNA polymerase IV synthesizes siRNA in plants. [30]
• RNA polymerase V synthesizes RNAs involved in siRNA-directed heterochromatin formation in plants. [31]

Eukaryotic chloroplasts contain an RNAP highly similar to bacterial RNAP ("plastid-encoded polymerase, PEP"). They use sigma factors encoded in the nuclear genome. [32]

Chloroplasts also contain a second, structurally and mechanistically unrelated, single-subunit RNAP ("nucleus-encoded polymerase, NEP"). Eukaryotic mitochondria use POLRMT (human), a nucleus-encoded single-subunit RNAP. [2] Such phage-like polymerases are referred to as RpoT in plants. [32]

### Archaea

Archaea have a single type of RNAP, responsible for the synthesis of all RNA. Archaeal RNAP is structurally and mechanistically similar to bacterial RNAP and eukaryotic nuclear RNAP I-V, and is especially closely structurally and mechanistically related to eukaryotic nuclear RNAP II. [7] [33] The history of the discovery of the archaeal RNA polymerase is quite recent. The first analysis of the RNAP of an archaeon was performed in 1971, when the RNAP from the extreme halophile Halobacterium cutirubrum was isolated and purified. [34] Crystal structures of RNAPs from Sulfolobus solfataricus and Sulfolobus shibatae set the total number of identified archaeal subunits at thirteen. [7] [35]

Archaea have the subunit corresponding to eukaryotic Rpb1 split into two. There is no homolog to eukaryotic Rpb9 (POLR2I) in the S. shibatae complex, although TFS (TFIIS homolog) has been proposed as one based on similarity. There is an additional subunit dubbed Rpo13; together with Rpo5, it occupies a space filled by an insertion found in bacterial β' subunits (1,377–1,420 in Taq). [7] An earlier, lower-resolution study of the S. solfataricus structure did not find Rpo13 and only assigned the space to Rpo5/Rpb5. Rpo3 is notable in that it is an iron–sulfur protein. The RNAP I/III subunit AC40 found in some eukaryotes shares similar sequences, [35] but does not bind iron. [36] This domain, in either case, serves a structural function. [37]

Archaeal RNAP subunits previously used an "RpoX" nomenclature in which each subunit is assigned a letter in a way unrelated to any other system. [1] In 2009, a new nomenclature based on the eukaryotic Pol II subunit "Rpb" numbering was proposed. [7]

### Viruses

Orthopoxviruses and some other nucleocytoplasmic large DNA viruses synthesize RNA using a virally encoded multi-subunit RNAP. They are most similar to eukaryotic RNAPs, with some subunits minified or removed. [38] Exactly which RNAP they are most similar to is a topic of debate. [39] Most other viruses that synthesize RNA use unrelated mechanics.

Many viruses use a single-subunit DNA-dependent RNAP (ssRNAP) that is structurally and mechanistically related to the single-subunit RNAP of eukaryotic chloroplasts (RpoT) and mitochondria (POLRMT) and, more distantly, to DNA polymerases and reverse transcriptases. Perhaps the most widely studied such single-subunit RNAP is bacteriophage T7 RNA polymerase. ssRNAPs cannot proofread. [2]

B. subtilis prophage SPβ uses YonO, a homolog of the β+β' subunits of msRNAPs to form a monomeric (both barrels on the same chain) RNAP distinct from the usual "right hand" ssRNAP. It probably diverged very long ago from the canonical five-unit msRNAP, before the time of the last universal common ancestor. [40] [41]

Other viruses use an RNA-dependent RNAP (an RNAP that employs RNA as a template instead of DNA). This occurs in negative strand RNA viruses and dsRNA viruses, both of which exist for a portion of their life cycle as double-stranded RNA. However, some positive strand RNA viruses, such as poliovirus, also contain RNA-dependent RNAP. [42]

RNAP was discovered independently by Charles Loe, Audrey Stevens, and Jerard Hurwitz in 1960. [43] By this time, one half of the 1959 Nobel Prize in Medicine had been awarded to Severo Ochoa for the discovery of what was believed to be RNAP, [44] but instead turned out to be polynucleotide phosphorylase.

## The Evolutionary Scheme of RNA Pol From Eubacteria to Eukaryotes

All RNA pols, from eubacteria to higher eukaryotes, share basic mechanistic functioning: use of a DNA template, processive translocation on the template during RNA synthesis, utilization of ribonucleoside triphosphates as substrates, Watson–Crick base pairing of the newly added nucleotide with the complementary one in the template DNA, and formation of a new phosphodiester bond by a metal-dependent mechanism. To perform these basic functions, all RNA pols contain two largest subunits (Figure 1) with double-ψβ-barrel motifs that create an active site at the interface of the subunits with three key aspartic residues conserved across all domains of life. Additionally, multisubunit RNA pols contain a variable number of additional smaller subunits (Figure 1). The two largest catalytic subunits of RNA pols are thought to have evolved from the duplication and diversification of a gene that encoded a protein cofactor of a common ancestral ribozyme, which performed RNA polymerase activity in the primal RNA world (Iyer et al., 2003). At some point of evolution, the new protein heterodimer would have gained polymerase activity and acquired different subunits with specialized assembly and auxiliary functions. Thus, all multisubunit RNA pols share a common structural core and similar basic molecular mechanisms and must derive from the RNA pol of the last universal common ancestor (LUCA) of archaea, eubacteria, and eukaryotes, assumed to have existed 3.5–3.8 billion years ago (Burton, 2014). This ancestral multisubunit RNA pol was probably similar to the simple RNA pol found today in eubacteria, which is formed (see Figure 1) by two large β and β' catalytic subunits, two assembly subunits (2α), and one auxiliary subunit (ω), as all these five subunits are highly conserved in the structure/function of all organisms (Werner, 2007; Werner and Grohmann, 2011).

Evolutionary history and subunit organization of nuclear eukaryotic RNA polymerases. (A) The last universal common ancestor (LUCA) of all organisms is assumed to have had a multisubunit DNA-dependent RNA polymerase. Nowadays, all living beings have RNA pols with a core of five to seven subunits. After Eubacteria separation, the common ancestor of Archaea and Eukarya added additional peripheral subunits. Finally, after eukaryote emergence, the Archaea-derived nucleus started to develop specialized RNA polymerases. Specialized RNA pols I and III integrated some transcription factors as permanent subunits which, in RNA pol II, remain independent (TFIIS, TFIIF, TFIIE). RNA pol IV and V are not fully described. Only the branching after RNA pol I separation is indicated. See the main text for further descriptions. (B) The table shows a comparative scheme of the RNA pol subunits aligned according to sequence and/or functional homology. Colors correspond to the structural scheme of part (A). Note that the Rpb5 and 6 subunits are part of both the core and the five unit sets of common subunits to all three eukaryotic RNA pols. Archaeal Rpo13 has no equivalent in eukaryotes, and the TFS from Archaea is an independent homologous factor to eukaryotic TFIIS. See Werner and Grohmann (2011), Vannini and Cramer (2012), and Huang et al. (2015) for more details on RNA pol subunit structure and evolution.

RNA pol gained greater complexity in terms of acquiring new subunits following the split of the eubacterial and archaeal–eukaryotic branches from the universal tree of life (Werner, 2007; Spang et al., 2015). Archaeal RNA pol has three or four catalytic polypeptides and three assembly and auxiliary subunits, which are closely related to bacterial ones (Figure 1). However, archaeal RNA pol has gained five additional periphery subunits with no homologs in eubacteria but resembling eukaryotic subunits, which stabilize the interactions of polymerase with template DNA, newly synthesized RNA, and different transcription factors to ensure efficient functioning in the transcription cycle (Werner, 2007; Werner and Grohmann, 2011; Fouqueau et al., 2017). The more complex transcription machineries of archaea and eukaryotes are linked with the fact that their genomes, which differ from the eubacterial genome, are stabilized and compacted by histone or histone-like proteins that impose more restrictive access to DNA and the need for additional basal transcription factors (Reeve, 2003; Geiduschek and Ouhammouch, 2005; Kwapisz et al., 2008; Jun et al., 2011; Werner and Grohmann, 2011; Koster et al., 2015).

Archaeal and eukaryotic lineages diverged more than 2 billion years ago, with eukaryotes originating from an archaeal lineage with already diverse eukaryotic signature proteins (Spang et al., 2015). Other important differences include that eukaryotes have an extended system of intracellular membranes that compartmentalizes the intracellular space, and the cellular volume is three to four orders of magnitude larger than that of archaea and bacteria (Lane and Martin, 2010; Koonin, 2015). They also contain organelles (mitochondria and chloroplasts) that derive from two kinds of eubacteria and have their own RNA pol (De Duve, 2007). The most prominent difference for nuclear transcription that arises with eukaryotes is diversification into three different nuclear RNA pols with specialized functions: RNA pol I is responsible for the synthesis of a single transcript, namely, precursor ribosomal RNA, which is processed into 28S, 5.8S, and 18S rRNAs; RNA pol II synthesizes a wide diversity of transcripts, including protein-coding messenger RNA (mRNA) and many ncRNAs, such as microRNAs (mi), small nuclear (sn), and small nucleolar (sno) RNAs; RNA pol III synthesizes diverse transfer RNA (tRNA) and 5S rRNA, and also U6 small nuclear RNA and other non-coding small RNAs (Dieci et al., 2007). There are two additional nuclear RNA pols in plants (IV and V), involved in the transcription of ncRNAs that are required for transcriptional gene silencing via RNA-directed DNA methylation (Zhou and Law, 2015). In this review, we will focus on the structure and function of RNA pols I, II, and III.

The most well-studied eukaryotic RNA pols are those of the budding yeast Saccharomyces cerevisiae, and it is thought that they are good models for other eukaryotic RNA pols. For this reason, we use the names of yeast RNA pol genes and subunits throughout this review (Figure 1). Yeast RNA pols I, II, and III have a structurally conserved horseshoe-shaped core formed by 10 subunits (Figure 1) homologous to archaeal RNA pol subunits and a different number of additional periphery eukaryote-specific subunits (Darst, 2001; Werner, 2007; Cramer et al., 2008). The 10-subunit cores include the two largest catalytic subunits (the two upper rows in Figure 1B), five additional subunits (Rpb5, 6, 8, 10, and 12) common to the three nuclear RNA pols, the A12/Rpb9/C11 subunit involved in proofreading (see below) and the AC40–AC19 heterodimer, shared between RNA pols I and III and homologous to Rpb3–Rpb11 in RNA pol II (Fernández-Tornero et al., 2013). The additional periphery yeast RNA pol subunits are mostly essential for cell viability but are not strictly required for RNA polymerization. Instead, they increase the regulatory potential and allow the specialization of each RNA pol in the transcription of a non-redundant subset of genes (Werner, 2007; Cramer et al., 2008; Koster et al., 2015). RNA pol II has a dissociable dimer (Rpb4/7) that plays important roles during the multifaceted transcription elongation of this RNA pol. This dimer has a homology with the Rpo4/7 dimer of archaeal RNA pol and has a counterpart (with low homology) in the A14/A43 and C17/C25 dimers of RNA pols I and III, respectively (Figure 1). RNA pol I has a further dimer (A49/A34) that has an equivalent in RNA pol III (C37/C53) but is not a constitutive part of RNA pol II, where its function is conducted by the independent TFIIF factor (α/β dimer; Vannini and Cramer, 2012).
This dimer plays a specific role in the particular mode of initiation of all three RNA pols (Abascal-Palacios et al., 2018) and in RNA pol III termination (Hoffmann et al., 2015; Arimbasseri and Maraia, 2016) that very much differs from the other two RNA pols in this stage (Proshkina et al., 2006; Werner and Grohmann, 2011). RNA pol III has an additional and totally specific trimer (C31/C34/C82) that is homologous to RNA pol II TFIIE and is proposed to be involved in the mechanism of RNA pol III initiation (Hoffmann et al., 2015). This trimer has been proposed to be a TFIIF–TFIIE hybrid rather than simply a TFIIE-like subcomplex (Abascal-Palacios et al., 2018).

The coexistence of the conserved, but different, largest core subunits of the three RNA pols (A190/A135, Rpb1/Rpb2, and C160/C128 in RNA pols I, II, and III, respectively) in all eukaryotes is remarkable and suggests their early evolutionary divergence. At the same time, the substantial conservation of the central RNA pol core since LUCA indicates that it performs essential processes required for gene expression that allow very little innovation. Therefore, in order to generate complex eukaryotes, most evolutionary innovation is expected to occur in periphery subunits, especially in RNA pol II, which specifies the cellular proteome that confers unique characteristics to different cell types through mRNA synthesis. Additionally, the unique C-terminal domain (CTD) of the largest catalytic subunit (Rpb1) of RNA pol II is also one source for innovation in mRNA transcription regulation and a mark of the eukaryotic lineage (Burton, 2014). The CTD consists of a repeating structure that is rich in serine and other phosphorylatable amino acids, which increases in number of repetitions with greater evolutionary complexity. Another consequence of eukaryotic innovation is the complex structure of RNA pol III with 17 subunits, which are all conserved to a certain degree in eukaryotes from yeast to humans. This supports the notion of the early divergence of RNA pol III from RNA pols I and II (Proshkina et al., 2006; Figure 1). From all these considerations, it can be suggested that the last eukaryote common ancestor is likely to have already had distinct RNA pols I, II, and III, as well as the repetitive structure at the CTD of RNA pol II (Proshkina et al., 2006; Yang and Stiller, 2014).
It can be concluded that the existence and evolution of the three specialized RNA pols in eukaryotic cells would have allowed the division of labor and enabled intricate gene regulation in multicellular complex organisms that require cell cycle, tissue-specific, environmental, and developmental regulation of gene expression (Dieci et al., 2007; Cramer, 2019). RNA pols IV and V are thought to have evolved more recently from RNA pol II through subfunctionalization of silencing activities performed by RNA pol II in fungi and metazoans in the earliest land plants (Huang et al., 2015).

The genetic material is stored in the form of DNA in most organisms. In humans, the nucleus of each cell contains 3 × 10⁹ base pairs of DNA distributed over 23 pairs of chromosomes, and each cell has two copies of the genetic material. This is known collectively as the human genome. The human genome contains around 30 000 genes, each of which codes for one protein.

Large stretches of DNA in the human genome are transcribed but do not code for proteins. These regions are called introns and make up around 95% of the genome. The nucleotide sequence of the human genome is now known to a reasonable degree of accuracy but we do not yet understand why so much of it is non-coding. Some of this non-coding DNA controls gene expression but the purpose of much of it is not yet understood. This is a fascinating subject that is certain to advance rapidly over the next few years.

The Central Dogma of Molecular Biology states that DNA makes RNA makes proteins (Figure 1).

Figure 1 | The Central Dogma of Molecular Biology: DNA makes RNA makes proteins

The process by which DNA is copied to RNA is called transcription, and that by which RNA is used to produce proteins is called translation.
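The two steps can be illustrated with a toy example. The sketch below transcribes a short DNA template strand into mRNA and then translates the mRNA codon by codon. The sequence is hypothetical, and the codon table is deliberately partial, containing only the four codons used in the example.

```python
# Base pairing used when RNA is copied from a DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table: only the codons that occur in the example below.
CODONS = {"AUG": "Met", "GCU": "Ala", "AAA": "Lys", "UAA": "Stop"}

def transcribe(template_3to5):
    """mRNA (written 5'->3') complementary to a template strand given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)

def translate(mrna):
    """Read the mRNA in triplets from its first base until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide

template = "TACCGATTTATT"      # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
protein = translate(mrna)
print(mrna, protein)
```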

### DNA replication

Each time a cell divides, each of its double strands of DNA splits into two single strands. Each of these single strands acts as a template for a new strand of complementary DNA. As a result, each new cell has its own complete genome. This process is known as DNA replication. Replication is controlled by the Watson-Crick pairing of the bases in the template strand with incoming deoxynucleoside triphosphates, and is directed by DNA polymerase enzymes. It is a complex process, particularly in eukaryotes, involving an array of enzymes. A simplified version of bacterial DNA replication is described in Figure 2.
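The template-directed copying described above can be sketched in code. This minimal example ignores all of the enzymology and models only Watson-Crick pairing. Both strands of the duplex are written 5'→3', and the sequence and function names are hypothetical.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(template):
    """New strand (written 5'->3') that pairs with a template written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(template))

def replicate(duplex):
    """Split a duplex and pair each old strand with a newly made partner."""
    top, bottom = duplex
    daughter_1 = (top, complementary_strand(top))        # old top + new bottom
    daughter_2 = (complementary_strand(bottom), bottom)  # new top + old bottom
    return daughter_1, daughter_2

parent = ("ATGCCG", "CGGCAT")   # hypothetical duplex, each strand 5'->3'
daughters = replicate(parent)
print(daughters)
```

Each daughter duplex contains one parental strand and one new strand, which is why both end up identical to the parent.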

Figure 2 | DNA replication in bacteria Simplified representation of DNA replication in bacteria.

DNA biosynthesis proceeds in the 5′- to 3′-direction. Because the two template strands are antiparallel, DNA polymerases cannot synthesize both new strands continuously in the direction of fork movement. A portion of the double helix must first unwind, and this is mediated by helicase enzymes.

The leading strand is synthesized continuously but the opposite strand is copied in short bursts of about 1000 bases, as the lagging strand template becomes available. The resulting short strands are called Okazaki fragments (after their discoverers, Reiji and Tsuneko Okazaki). Bacteria have at least three distinct DNA polymerases: Pol I, Pol II and Pol III; it is Pol III that is largely involved in chain elongation. Strangely, DNA polymerases cannot initiate DNA synthesis de novo, but require a short primer with a free 3′-hydroxyl group. This is produced in the lagging strand by an RNA polymerase (called DNA primase) that is able to use the DNA template and synthesize a short piece of RNA around 20 bases in length. Pol III can then take over, but it eventually encounters one of the previously synthesized short RNA fragments in its path. At this point Pol I takes over, using its 5′- to 3′-exonuclease activity to digest the RNA and fill the gap with DNA until it reaches a continuous stretch of DNA. This leaves a break (a nick) between the 3′-end of the newly synthesized DNA and the 5′-end of the DNA previously synthesized by Pol III. The nick is sealed by DNA ligase, an enzyme that makes a covalent bond between a 5′-phosphate and a 3′-hydroxyl group (Figure 3). The initiation of DNA replication at the leading strand is more complex and is discussed in detail in more specialized texts.

Figure 3 | DNA polymerases in DNA replication Simplified representation of the action of DNA polymerases in DNA replication in bacteria.

#### Mistakes in DNA replication

DNA replication is not perfect. Errors occur when an incorrect base is incorporated into the growing DNA strand. This leads to mismatched base pairs, or mispairs. DNA polymerases have proofreading activity, and DNA repair enzymes have evolved to correct these mistakes. Occasionally, mispairs survive and are incorporated into the genome in the next round of replication. These mutations may have no consequence, they may result in the death of the organism, they may result in a genetic disease or cancer, or they may give the organism a competitive advantage over its neighbours, which leads to evolution by natural selection.

### Transcription

Transcription is the process by which DNA is copied (transcribed) to mRNA, which carries the information needed for protein synthesis. Transcription takes place in two broad steps. First, pre-messenger RNA is formed, with the involvement of RNA polymerase enzymes. The process relies on Watson-Crick base pairing, and the resultant single strand of RNA is the reverse-complement of the original DNA sequence. The pre-messenger RNA is then "edited" to produce the desired mRNA molecule in a process called RNA splicing.

#### Formation of pre-messenger RNA

The mechanism of transcription has parallels in that of DNA replication. As with DNA replication, partial unwinding of the double helix must occur before transcription can take place, and it is the RNA polymerase enzymes that catalyze this process.

Unlike DNA replication, in which both strands are copied, only one strand is transcribed. The strand that contains the gene is called the sense strand, while the complementary strand is the antisense strand. The mRNA produced in transcription is a copy of the sense strand, but it is the antisense strand that is transcribed.

Ribonucleoside triphosphates (NTPs) align along the antisense DNA strand, with Watson-Crick base pairing (A pairs with U). RNA polymerase joins the ribonucleotides together to form a pre-messenger RNA molecule that is complementary to a region of the antisense DNA strand. Transcription ends when the RNA polymerase enzyme reaches a terminator sequence in the DNA that signals it to stop (this is distinct from the triplet stop codons that end translation). The DNA molecule re-winds to re-form the double helix.

Figure 4 | Transcription Simplified representation of the formation of pre-messenger RNA (orange) from double-stranded DNA (blue) in transcription.
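The relationship between the sense strand, the antisense strand and the mRNA can be checked with a few lines of Python. This is only a minimal sketch: the nine-base sequence is hypothetical and chosen for illustration, and the function names are not from any library.

```python
# Pairing rules used when RNA is built on a DNA template (A in DNA pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(antisense_5to3):
    """Return the mRNA (written 5'->3') copied from an antisense strand written 5'->3'."""
    # RNA polymerase reads the template 3'->5', so walk the strand backwards.
    return "".join(DNA_TO_RNA[base] for base in reversed(antisense_5to3))


def sense_from_antisense(antisense_5to3):
    """Return the sense strand (5'->3'), i.e. the reverse complement of the antisense strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(antisense_5to3))


antisense = "TTAGGCCAT"          # hypothetical template, written 5'->3'
mrna = transcribe(antisense)
sense = sense_from_antisense(antisense)

print("mRNA :", mrna)
print("sense:", sense)
```

The mRNA comes out identical to the sense strand except that U replaces T, which is the sense in which the mRNA is "a copy of the sense strand" even though the antisense strand is the one that is read.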

#### RNA splicing

The pre-messenger RNA thus formed contains introns which are not required for protein synthesis. The pre-messenger RNA is chopped up to remove the introns and create messenger RNA (mRNA) in a process called RNA splicing (Figure 5).

Figure 5 | RNA splicing Introns are spliced from the pre-messenger RNA to give messenger RNA (mRNA).

#### Alternative splicing

In alternative splicing, individual exons are either spliced out or included, giving rise to several different possible mRNA products. Each mRNA product codes for a different protein isoform; these protein isoforms differ in their peptide sequence and therefore their biological activity. It is estimated that up to 60% of human gene products undergo alternative splicing. Several different mechanisms of alternative splicing are known, two of which are illustrated in Figure 6.

Figure 6 | Alternative splicing Several different mechanisms of alternative splicing exist − a cassette exon can be either included in or excluded from the final RNA (top), or two cassette exons may be mutually exclusive (bottom).

Alternative splicing contributes to protein diversity − a single gene transcript (RNA) can have thousands of different splicing patterns, and will therefore code for thousands of different proteins: a diverse proteome is generated from a relatively limited genome. Splicing is important in genetic regulation (alteration of the splicing pattern in response to cellular conditions changes protein expression). Perhaps not surprisingly, abnormal splicing patterns can lead to disease states including cancer.
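To see how quickly the number of possible products grows, the two mechanisms in Figure 6 can be enumerated directly. The sketch below assumes a hypothetical four-exon gene (E1 to E4) and treats each cassette exon as an independent include-or-skip choice, which is a simplification of real splicing regulation.

```python
from itertools import product


def cassette_isoforms(exons, cassette):
    """List every mRNA obtained when each exon in `cassette` is either kept or skipped."""
    optional = [e for e in exons if e in cassette]
    isoforms = []
    for choice in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choice))
        # Constitutive exons (not in `cassette`) are always kept.
        isoforms.append(tuple(e for e in exons if keep.get(e, True)))
    return isoforms


def mutually_exclusive_isoforms(exons, pair):
    """List the two mRNAs obtained when exactly one exon of `pair` is included."""
    first, second = pair
    return [
        tuple(e for e in exons if e != second),  # keeps `first`
        tuple(e for e in exons if e != first),   # keeps `second`
    ]


exons = ["E1", "E2", "E3", "E4"]  # hypothetical gene
cassette = cassette_isoforms(exons, cassette={"E2", "E3"})
exclusive = mutually_exclusive_isoforms(exons, pair=("E2", "E3"))

for isoform in cassette:
    print("-".join(isoform))
```

With n independent cassette exons the count is 2ⁿ, so around ten such exons are already enough to give more than a thousand splicing patterns.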

#### Reverse transcription

In reverse transcription, RNA is "reverse transcribed" into DNA. This process, catalyzed by reverse transcriptase enzymes, allows retroviruses, including the human immunodeficiency virus (HIV), to use RNA as their genetic material. Reverse transcriptase enzymes have also found applications in biotechnology, allowing scientists to convert RNA to DNA for techniques such as PCR.

### Translation

The mRNA formed in transcription is transported out of the nucleus, into the cytoplasm, to the ribosome (the cell's protein synthesis factory). Here, it directs protein synthesis. Messenger RNA is not directly involved in protein synthesis − transfer RNA (tRNA) is required for this. The process by which mRNA directs protein synthesis with the assistance of tRNA is called translation.

The ribosome is a very large complex of RNA and protein molecules. Each three-base stretch of mRNA (triplet) is known as a codon, and one codon contains the information for a specific amino acid. As the mRNA passes through the ribosome, each codon interacts with the anticodon of a specific transfer RNA (tRNA) molecule by Watson-Crick base pairing. This tRNA molecule carries an amino acid at its 3′-terminus, which is incorporated into the growing protein chain. The tRNA is then expelled from the ribosome. Figure 7 shows the steps involved in protein synthesis.

Figure 7 | Translation (a) and (b) tRNA molecules bind to the two binding sites of the ribosome, and by hydrogen bonding to the mRNA; (c) a peptide bond forms between the two amino acids to make a dipeptide, while the tRNA molecule is left uncharged; (d) the uncharged tRNA molecule leaves the ribosome, while the ribosome moves one codon to the right (the dipeptide is translocated from one binding site to the other); (e) another tRNA molecule binds; (f) a peptide bond forms between the two amino acids to make a tripeptide; (g) the uncharged tRNA molecule leaves the ribosome.

### Transfer RNA

Transfer RNA adopts a well defined tertiary structure which is normally represented in two dimensions as a cloverleaf shape, as in Figure 7. The structure of tRNA is shown in more detail in Figure 8.

Figure 8 | Two-dimensional structures of tRNA (transfer RNA) In some tRNAs the DHU arm has only three base pairs.

Each amino acid has its own special tRNA (or set of tRNAs). For example, the tRNA for phenylalanine (tRNAPhe) is different from that for histidine (tRNAHis). Each amino acid is attached to its tRNA through the 3′-OH group to form an ester which reacts with the α-amino group of the terminal amino-acid of the growing protein chain to form a new amide bond (peptide bond) during protein synthesis (Figure 9). The reaction of esters with amines is generally favourable but the rate of reaction is increased greatly in the ribosome.

Figure 9 | Protein synthesis Reaction of the growing polypeptide chain with the 3′-end of the charged tRNA. The amino acid is transferred from the tRNA molecule to the protein.

Each transfer RNA molecule has a well defined tertiary structure that is recognized by the enzyme aminoacyl tRNA synthetase, which adds the correct amino acid to the 3′-end of the uncharged tRNA. The presence of modified nucleosides is important in stabilizing the tRNA structure. Some of these modifications are shown in Figure 10.

Figure 10 | Modified bases in tRNA Structures of some of the modified bases found in tRNA.

### The Genetic code

The genetic code is almost universal. It is the basis of the transmission of hereditary information by nucleic acids in all organisms. There are four bases in RNA (A, G, C and U), so there are 64 possible triplet codes (4³ = 64). In theory only 22 codes are required: one for each of the 20 naturally occurring amino acids, with the addition of a start codon and a stop codon (to indicate the beginning and end of a protein sequence). Many amino acids have several codes (degeneracy), so that all 64 possible triplet codes are used. For example, Arg and Ser each have 6 codons whereas Trp and Met have only one. No two amino acids have the same code, but amino acids whose side-chains have similar physical or chemical properties tend to have similar codon sequences, e.g. the side-chains of Phe, Leu, Ile and Val are all hydrophobic, and Asp and Glu are both carboxylic acids (see Figure 11). This means that if the incorrect tRNA is selected during translation (owing to mispairing of a single base at the codon-anticodon interface) the misincorporated amino acid will probably have similar properties to the intended amino acid. Although the resultant protein will have one incorrect amino acid, it stands a high probability of being functional. Organisms show "codon bias" and use certain codons for a particular amino acid more than others. For example, the codon usage in humans is different from that in bacteria; it can sometimes be difficult to express a human protein in bacteria because the relevant tRNA might be present at too low a concentration.

Figure 11 | The Genetic code − triplet codon assignments for the 20 amino acids. As well as coding for methionine, AUG is used as a start codon, initiating protein biosynthesis
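The counting argument above (64 triplets, 20 amino acids, uneven degeneracy) can be reproduced by building the standard codon table in Python. This is a minimal sketch: the compact 64-letter string lists the one-letter amino acid codes of the standard code in U, C, A, G order, with `*` marking stop codons, and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, which matches the ordering of the string.
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons specify each amino acid (R = Arg, S = Ser, W = Trp, M = Met).
degeneracy = Counter(CODON_TABLE.values())

print(len(CODON_TABLE), "codons in total")
for aa in "RSWM*":
    print(aa, degeneracy[aa])
```

The counts agree with the text: six codons each for Arg and Ser, one each for Trp and Met, and three stop codons, leaving 61 codons shared among the 20 amino acids.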

#### An exercise in the use of the genetic code

One strand of genomic DNA (strand A, coding strand) contains the following sequence reading from 5′ to 3′:

This strand will form the following duplex:

The sequence of bases in the other strand of DNA (strand B) written 5′ to 3′ is therefore

The sequence of bases in the mRNA transcribed from strand A of DNA written 5′ to 3′ is

The amino acid sequence coded by the above mRNA is

However, if DNA strand B is the coding strand the mRNA sequence will be:

and the amino-acid sequence will be:
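The sequences for this exercise are not reproduced here, so the sketch below walks through the same steps on a made-up strand A. The 15-base sequence is purely hypothetical and is not the one from the original exercise. The `translate` helper is also an assumption of this sketch: it simply reads codons from the first base and stops at the first stop codon, without searching for a start codon.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the partner DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def mrna_from_coding(coding_dna):
    """The mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_dna.replace("T", "U")


def translate(mrna):
    """Translate codon by codon from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


strand_a = "ATGGCCAAGTTCTGA"               # hypothetical strand A, 5'->3'
strand_b = reverse_complement(strand_a)    # strand B, 5'->3'

mrna_a = mrna_from_coding(strand_a)        # strand A taken as the coding strand
mrna_b = mrna_from_coding(strand_b)        # strand B taken as the coding strand
protein_a = translate(mrna_a)
protein_b = translate(mrna_b)

print(strand_b, mrna_a, protein_a, mrna_b, protein_b)
```

For this made-up input, taking strand A as the coding strand gives Met-Ala-Lys-Phe followed by a stop codon, whereas taking strand B as the coding strand gives the unrelated peptide Ser-Glu-Leu-Gly-His, which illustrates why it matters which strand carries the gene.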

### The Wobble hypothesis

Close inspection of all of the available codons for a particular amino acid reveals that the variation is greatest in the third position (for example, the codons for alanine are GCU, GCC, GCA and GCG). Crick and Brenner proposed that a single tRNA molecule can recognize codons with different bases at the 3′-end owing to non-Watson-Crick base pair formation with the third base in the codon-anticodon interaction. These non-standard base pairs are different in shape from A·U and G·C, and the term wobble hypothesis indicates that a certain degree of flexibility or "wobbling" is allowed at this position in the ribosome. Not all combinations are possible; examples of "allowed" pairings are shown in Figure 12.

Figure 12 | Structures of wobble base pairs found in RNA
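The effect of wobble on codon recognition can be sketched as a lookup table. The pairings below follow the rules as Crick originally proposed them (with I standing for inosine, a modified base found in some anticodons); they are used here as an assumption, since the specific pairs drawn in Figure 12 are not listed in the text, and the function name is illustrative.

```python
# Strict Watson-Crick pairing, used for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Wobble position: 5' base of the anticodon -> bases it may pair with at codon position 3.
# These are Crick's originally proposed rules (assumed for this sketch).
WOBBLE = {"G": "CU", "C": "G", "A": "U", "U": "AG", "I": "UCA"}


def codons_read_by(anticodon):
    """Return the codons (5'->3') that a tRNA with this anticodon (5'->3') can read."""
    first, second, third = anticodon
    # Pairing is antiparallel: anticodon base 3 faces codon base 1, and so on.
    stem = WATSON_CRICK[third] + WATSON_CRICK[second]
    return [stem + base for base in WOBBLE[first]]


print(codons_read_by("IGC"))  # a tRNA with inosine at the wobble position
print(codons_read_by("CGC"))  # a tRNA restricted to Watson-Crick pairing
```

Under these rules an anticodon IGC covers three of the four alanine codons (GCU, GCC and GCA), while GCG needs a second tRNA, so fewer than 61 distinct tRNAs are sufficient to read all the sense codons.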

The ability of DNA bases to form wobble base pairs as well as Watson-Crick base pairs can result in base-pair mismatches occurring during DNA replication. If not repaired by DNA repair enzymes, these mismatches can lead to genetic diseases and cancer.

## 3 Answers

There is no fundamental difference between viral RNA and native cellular RNA other than the sequence of RNA bases in them. The sequence differences are not biochemically apparent in the RNA, only in the protein products produced from the RNAs.

Regulation of native RNA is done in a huge variety of ways (nothing in the cell is designed; everything is ad hoc, so any random happenstance can become established). Native regulation is essential, otherwise runaway production will happen, leading to cell death or to cancer.

Much, if not most, of the native regulation is done by regulating transcription of DNA into RNA. Once the RNA is in the cytosol, little further regulation is done, so viral RNA in the cytosol has the run of the cellular protein factory. But at least some viruses have acquired further advantages which help their RNA out-compete the native mRNAs for use of the protein machinery (see, e.g., Hijacking the translation apparatus by RNA viruses).

So it is not really an easy thing to "label a sequence of RNA as dangerous simply because it is unregulated" as the wide range of normal RNAs in the cytosol does not provide a basis for determining if any are "unregulated" and the cell cannot be protected in that manner.

Self-assembly is widespread in normal cellular functions as well as in viral production. The molecules involved have long-honed attractive points for their mate molecules, and in the jiggling molecular soup of the cell the molecules can soon bump into each other and join up. Thanks to the virus's uncontrolled production, new viral particles have plenty of component molecules available for this assembly.

(If you wonder why I'm posting two answers, it's because this q-post asked two questions in one.)

Regarding the issue of virus self-assembly, studying it has indeed been challenging. There's a recent (2019) PNAS paper on that:

Despite decades of study, how capsids self-assemble has remained a mystery, because there were no methods to measure the assembly kinetics of individual capsids. We surmount this obstacle using a sensitive microscopy technique based on laser interferometry. The measurements show that a small nucleus of proteins must form on the viral RNA before the capsid assembles.

Overview of the measurement. (A) A structural model of the MS2 capsid (PDB ID: 2ms2) shows its small size and T=3 structure. The 2 coat-protein dimer configurations are shown in gray and purple. (B) We inject a solution of unassembled dimers over a coverslip on which MS2 RNA strands are tethered by DNA linkages. As dimers bind to the RNA, the resulting particles scatter light. The particles appear as dark, diffraction-limited spots because of destructive interference between the scattered light and a reference beam. (C) We monitor many such spots in parallel. Shown is a typical image of the field of view, taken 126 s after adding 2-μM dimers and representing an average of 1,000 frames taken at 1,000 frames per second. (D) The intensity of each spot is proportional to the number of bound proteins within each particle and changes in the intensity as a function of time reveal the assembly kinetics of each particle. The darker the spot is, the larger its intensity. (D, Top) Time series of images for the boxed spot in C. (D, Bottom) Intensity trace for the same spot using a 1,000-frame average. We discuss the relationship between intensity and number of bound proteins, as well as how we calculate the spot intensity, in Materials and Methods and SI Appendix.

And a bit more on the observed dynamics:

We find that the “growth time,” the time required for a particle to reach the size of a complete capsid after it nucleates, varies from particle to particle, ranging from 30 to over 200 s (Fig. 2A and SI Appendix, Fig. S1). Of the particles that grow into a complete capsid, some grow at a constant rate, others slow as they approach completion, and still others contain intermediate pauses lasting up to 25 s (Fig. 2A and SI Appendix, Fig. S1). Despite these differences, essentially all of the traces are monotonic, with little or no observable disassembly steps.

So yeah, quite interestingly, the virus capsid just builds and builds itself; even if it takes a "lunch break", the build progress is not reversed in any significant manner.

The failure mode(s) of this self-assembly process are actually related to building/attracting "too much stuff", or two (or more) viral RNAs getting entangled by common protein shards they both (all) attract. Basically too much density of building materials can lead to problems.

The growth times also decrease with increasing protein concentration, but less rapidly than do the nucleation times (Fig. 4A) [below]. Thus, we conclude that the RNA, which promotes nucleation at low protein concentration, creates a competition between nucleation and growth that can lead to overgrown structures at higher concentration, as sketched in Fig. 4B. This pathway provides a possible explanation for the “monster” (5) and “multiplet” (19) structures observed with other RNA viruses.

I mostly agree with mgkrebbs (+1), but note that while viruses generally have some kind of RNA replicase subunit (RdRP) to help them out, viroids completely lack those, yet manage to reproduce:

While RNA viruses encode subunits of the enzymatic complex (RNA replicase) that catalyzes initiation and elongation of viral RNA strands, viroids must rely for this replication step on pre-existing host RNA polymerases. In principle, the best candidates would be RNA-dependent RNA polymerases, whose existence in plants has been known for a long time. However, viroids do not use these enzymes for their replication, but DNA-dependent RNA polymerases redirected to accept RNA templates.

The closest thing to a viroid in humans is Hepatitis D. Until last year, it was thought that it was mostly a human disease, but a 2019 survey found that HepD analogs are quite widespread in invertebrates, fish, snakes etc.

Viruses are "much more nasty" than viroids in that replication regard because

Positive-strand RNA viruses replicate by using virally encoded RNA-dependent RNA polymerases (RdRPs) that provide a direct RNA-to-RNA replication function not found in host cells.

Some recent research suggests that RdRPs have been independently lost in many animal clades. Ironically, the role of these (former) host RdRPs was [most likely] to provide an anti-viral mechanism via RNA silencing (see [section] below for details), but evolution has provided alternative host mechanisms for producing the same.

While it's possible to recognize the [core of the] viral RdRPs in software, it's not clear/known how to translate this into a general anti-viral strategy, as far as I know, due to their fairly substantial variation. (In general it's not enough to know the genetic sequence encoding the RdRP of some virus to even screen inhibitor drugs for it; additionally one must know the crystal structure of the RdRP complex for that virus.)

If you're curious about "intra-cell" (i.e. RNA-based) anti-viral defense, such mechanisms do exist, but they are substantially more complicated than what you suggest, being based instead on RNA silencing:

In eukaryotic RNA-based antiviral immunity, viral double-stranded RNA is recognized as a pathogen-associated molecular pattern and processed into small interfering RNAs (siRNAs) by the host ribonuclease Dicer. After amplification by host RNA-dependent RNA polymerases in some cases, these virus-derived siRNAs guide specific antiviral immunity through RNA interference and related RNA silencing effector mechanisms.

Key steps in RNA-based antiviral immunity induced in Drosophila melanogaster by infection of positive-strand RNA viruses such as flock house virus. Following entry and uncoating of flock house virus (FHV) virions, the genomic positive-strand RNA ((+)RNA) serves as both mRNA for the translation of viral RNA-dependent RNA polymerase (RdRP) and as a template for the synthesis of antigenomic negative-strand RNA ((–)RNA). Preferential production of (+)RNA by viral RdRP is achieved by multiple rounds of initiation of RNA synthesis from the 3ʹ end of the low abundant (–)RNA. The resulting double-stranded RNA (dsRNA) formed between the 5ʹ-terminal nascent progeny (+)RNA and the (–)RNA template is recognized by Dicer 2 (DCR2) and cleaved into small interfering RNAs (siRNAs), thereby triggering RNA-based antiviral immunity. The viral siRNAs are assembled with Argonaute 2 (AGO2) into the RNA-induced silencing complex (RISC), methylated at the 3ʹ end (depicted by a black circle) by HEN1 and used to guide specific clearance of FHV RNAs. As a counter-defence, FHV encodes a viral suppressor of RNA silencing, the B2 protein, which targets two steps in this immune pathway: inhibition of viral siRNA production by binding to viral RdRP and the viral dsRNA precursor, and sequestration of viral siRNAs by binding duplex siRNAs. Loqs-PD, loquacious-isoform PD.

And yes, the RdRP of viruses (the most significant part of the self-copier machine they come with) is an important drug target, e.g.

CoVs employ a multi-subunit replication/transcription machinery. A set of non-structural proteins (nsp) produced as cleavage products of the ORF1a and ORF1ab viral polyproteins (5) assemble to facilitate viral replication and transcription. A key component, the RNA-dependent RNA polymerase (RdRp, also known as nsp12), catalyzes the synthesis of viral RNA and thus plays a central role in the replication and transcription cycle of COVID-19 virus, possibly with the assistance of nsp7 and nsp8 as co-factors (6). Nsp12 is therefore considered a primary target for nucleotide analog antiviral inhibitors such as remdesivir, which shows potential for the treatment of COVID-19 viral infections (7, 8). [...]

The efficacy of chain-terminating nucleotide analogs requires viral RdRps to recognize and successfully incorporate the active form of the inhibitors into the growing RNA strand.

## New Discovery Shows Human Cells Can Write RNA Sequences Into DNA – Challenges Central Principle in Biology

Cells contain machinery that duplicates DNA into a new set that goes into a newly formed cell. That same class of machines, called polymerases, also builds RNA messages, which are like notes copied from the central DNA repository of recipes, so they can be read more efficiently into proteins. But polymerases were thought to work in only one direction: DNA into DNA or RNA. This prevents RNA messages from being rewritten back into the master recipe book of genomic DNA. Now, Thomas Jefferson University researchers provide the first evidence that RNA segments can be written back into DNA, which potentially challenges the central dogma in biology and could have wide implications affecting many fields of biology.

“This work opens the door to many other studies that will help us understand the significance of having a mechanism for converting RNA messages into DNA in our own cells,” says Richard Pomerantz, PhD, associate professor of biochemistry and molecular biology at Thomas Jefferson University. “The reality that a human polymerase can do this with high efficiency, raises many questions.” For example, this finding suggests that RNA messages can be used as templates for repairing or re-writing genomic DNA.

The work was published June 11th, 2021, in the journal Science Advances.

Together with first author Gurushankar Chandramouly and other collaborators, Dr. Pomerantz’s team started by investigating one very unusual polymerase, called polymerase theta. Of the 14 DNA polymerases in mammalian cells, only three do the bulk of the work of duplicating the entire genome to prepare for cell division. The remaining 11 are mostly involved in detecting and making repairs when there’s a break or error in the DNA strands. Polymerase theta repairs DNA, but is very error-prone and makes many errors or mutations. The researchers therefore noticed that some of polymerase theta’s “bad” qualities were ones it shared with another cellular machine, albeit one more common in viruses — the reverse transcriptase. Like Pol theta, HIV reverse transcriptase acts as a DNA polymerase, but can also bind RNA and read RNA back into a DNA strand.

In a series of elegant experiments, the researchers tested polymerase theta against the reverse transcriptase from HIV, which is one of the best studied of its kind. They showed that polymerase theta was capable of converting RNA messages into DNA, which it did as well as HIV reverse transcriptase, and that it actually did a better job than when duplicating DNA to DNA. Polymerase theta was more efficient and introduced fewer errors when using an RNA template to write new DNA messages, than when duplicating DNA into DNA, suggesting that this function could be its primary purpose in the cell.

The group collaborated with Dr. Xiaojiang S. Chen’s lab at USC and used x-ray crystallography to define the structure and found that this molecule was able to change shape in order to accommodate the more bulky RNA molecule — a feat unique among polymerases.

“Our research suggests that polymerase theta’s main function is to act as a reverse transcriptase,” says Dr. Pomerantz. “In healthy cells, the purpose of this molecule may be toward RNA-mediated DNA repair. In unhealthy cells, such as cancer cells, polymerase theta is highly expressed and promotes cancer cell growth and drug resistance. It will be exciting to further understand how polymerase theta’s activity on RNA contributes to DNA repair and cancer-cell proliferation.”

Reference: “Polθ reverse transcribes RNA and promotes RNA-templated DNA repair” by Gurushankar Chandramouly, Jiemin Zhao, Shane McDevitt, Timur Rusanov, Trung Hoang, Nikita Borisonnik, Taylor Treddinick, Felicia Wednesday Lopezcolorado, Tatiana Kent, Labiba A. Siddique, Joseph Mallon, Jacklyn Huhn, Zainab Shoda, Ekaterina Kashkina, Alessandra Brambati, Jeremy M. Stark, Xiaojiang S. Chen and Richard T. Pomerantz, 11 June 2021, Science Advances.

This research was supported by NIH grants 1R01GM130889-01 and 1R01GM137124-01, and R01CA197506 and R01CA240392. This research was also supported in part by a Tower Cancer Research Foundation grant. The authors report no conflicts of interest.


Drygin, D., Rice, W. G. & Grummt, I. The RNA polymerase I transcription machinery: an emerging target for the treatment of cancer. Annu. Rev. Pharmacol. Toxicol. 50, 131–156 (2010).

Goodfellow, S. J. & Zomerdijk, J. C. Basic mechanisms in RNA polymerase I transcription of the ribosomal RNA genes. Subcell. Biochem. 61, 211–236 (2013).

Lassar, A. B., Martin, P. L. & Roeder, R. G. Transcription of class III genes: formation of preinitiation complexes. Science 222, 740–748 (1983).

Bieker, J. J., Martin, P. L. & Roeder, R. G. Formation of a rate-limiting intermediate in 5S RNA gene transcription. Cell 40, 119–127 (1985).

Van Dyke, M. W., Roeder, R. G. & Sawadogo, M. Physical analysis of transcription preinitiation complex assembly on a class II gene promoter. Science 241, 1335–1338 (1988).

Buratowski, S., Hahn, S., Guarente, L. & Sharp, P. A. Five intermediate complexes in transcription initiation by RNA polymerase II. Cell 56, 549–561 (1989).

Flores, O., Lu, H. & Reinberg, D. Factors involved in specific transcription by mammalian RNA polymerase II. Identification and characterization of factor IIH. J. Biol. Chem. 267, 2786–2793 (1992).

Parker, C. S. & Topol, J. A Drosophila RNA polymerase II transcription factor contains a promoter-region-specific DNA-binding activity. Cell 36, 357–369 (1984).

Sawadogo, M. & Roeder, R. G. Interaction of a gene-specific transcription factor with the adenovirus major late promoter upstream of the TATA box region. Cell 43, 165–175 (1985).

Learned, R. M., Cordes, S. & Tjian, R. Purification and characterization of a transcription factor that confers promoter specificity to human RNA polymerase I. Mol. Cell. Biol. 5, 1358–1369 (1985).

Clos, J., Buttgereit, D. & Grummt, I. A purified transcription factor (TIF-IB) binds to essential sequences of the mouse rDNA promoter. Proc. Natl Acad. Sci. USA 83, 604–608 (1986).

Zawel, L., Kumar, K. P. & Reinberg, D. Recycling of the general transcription factors during RNA polymerase II transcription. Genes Dev. 9, 1479–1490 (1995).

Rudloff, U., Eberhard, D., Tora, L., Stunnenberg, H. & Grummt, I. TBP-associated factors interact with DNA and govern species specificity of RNA polymerase I transcription. EMBO J. 13, 2611–2616 (1994).

Beckmann, H., Chen, J. L., O’Brien, T. & Tjian, R. Coactivator and promoter-selective properties of RNA polymerase I TAFs. Science 270, 1506–1509 (1995).

Kadonaga, J. T. Perspectives on the RNA polymerase II core promoter. Wiley Interdiscip. Rev. Dev. Biol. 1, 40–51 (2012).

Louder, R. K. et al. Structure of promoter-bound TFIID and model of human pre-initiation complex assembly. Nature 531, 604–609 (2016).

Goodrich, J. A. & Tjian, R. Unexpected roles for core promoter recognition factors in cell-type-specific transcription and gene regulation. Nat. Rev. Genet. 11, 549–558 (2010).

Vermeulen, M. et al. Selective anchoring of TFIID to nucleosomes by trimethylation of histone H3 lysine 4. Cell 131, 58–69 (2007).

Lauberth, S. M. et al. H3K4me3 interactions with TAF3 regulate preinitiation complex assembly and selective gene activation. Cell 152, 1021–1036 (2013).

Luse, D. S. & Roeder, R. G. Accurate transcription initiation on a purified mouse beta-globin DNA fragment in a cell-free system. Cell 20, 691–699 (1980).

Grummt, I. Life on a planet of its own: regulation of RNA polymerase I transcription in the nucleolus. Genes Dev. 17, 1691–1702 (2003).

White, R. J. RNA polymerases I and III, growth control and cancer. Nat. Rev. Mol. Cell Biol. 6, 69–78 (2005).

Willis, I. M. & Moir, R. D. Signaling to and from the RNA polymerase III transcription and processing machinery. Annu. Rev. Biochem. 87, 75–100 (2018).

Hu, X. et al. A Mediator-responsive form of metazoan RNA polymerase II. Proc. Natl Acad. Sci. USA 103, 9506–9511 (2006).

Jishage, M. et al. Architecture of Pol II(G) and molecular mechanism of transcription regulation by Gdown1. Nat. Struct. Mol. Biol. 25, 859–867 (2018).

Sikorski, T. W. & Buratowski, S. The basal initiation machinery: beyond the general transcription factors. Curr. Opin. Cell Biol. 21, 344–351 (2009).

Nikolov, D. B. & Burley, S. K. RNA polymerase II transcription initiation: a structural view. Proc. Natl Acad. Sci. USA 94, 15–22 (1997).

Cramer, P. et al. Structure of eukaryotic RNA polymerases. Annu. Rev. Biophys. 37, 337–352 (2008).

Murakami, K. et al. Structure of an RNA polymerase II preinitiation complex. Proc. Natl Acad. Sci. USA 112, 13543–13548 (2015).

Robinson, P. J. et al. Structure of a complete mediator-RNA polymerase II pre-initiation complex. Cell 166, 1411–1422.e16 (2016).

Plaschka, C. et al. Transcription initiation complex structures elucidate DNA opening. Nature 533, 353–358 (2016).

He, Y., Fang, J., Taatjes, D. J. & Nogales, E. Structural visualization of key steps in human transcription initiation. Nature 495, 481–486 (2013).

He, Y. et al. Near-atomic resolution visualization of human transcription promoter opening. Nature 533, 359–365 (2016).

Engel, C. et al. Structural basis of RNA polymerase I transcription initiation. Cell 169, 120–131.e22 (2017).

Sadian, Y. et al. Structural insights into transcription initiation by yeast RNA polymerase I. EMBO J. 36, 2698–2709 (2017).

Abascal-Palacios, G., Ramsay, E. P., Beuron, F., Morris, E. & Vannini, A. Structural basis of RNA polymerase III transcription initiation. Nature 553, 301–306 (2018).

Zhang, Z. et al. Rapid dynamics of general transcription factor TFIIB binding during preinitiation complex assembly revealed by single-molecule analysis. Genes Dev. 30, 2106–2118 (2016).

Darnell, J. E., Jelinek, W. R. & Molloy, G. R. Biogenesis of mRNA: genetic regulation in mammalian cells. Science 181, 1215–1221 (1973).

Engelke, D. R., Ng, S. Y., Shastry, B. S. & Roeder, R. G. Specific interaction of a purified transcription factor with an internal control region of 5S RNA genes. Cell 19, 717–728 (1980).

Sakonju, S., Bogenhagen, D. F. & Brown, D. D. A control region in the center of the 5S RNA gene directs specific initiation of transcription: I. The 5′ border of the region. Cell 19, 13–25 (1980).

Ginsberg, A. M., King, B. O. & Roeder, R. G. Xenopus 5S gene transcription factor, TFIIIA: characterization of a cDNA clone and measurement of RNA levels throughout development. Cell 39, 479–489 (1984).

Miller, J., McLachlan, A. D. & Klug, A. Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J. 4, 1609–1614 (1985).

Lambert, S. A. et al. The human transcription factors. Cell 175, 598–599 (2018).

Payvar, F. et al. Purified glucocorticoid receptors bind selectively in vitro to a cloned DNA fragment whose transcription is regulated by glucocorticoids in vivo. Proc. Natl Acad. Sci. USA 78, 6628–6632 (1981).

Dynan, W. S. & Tjian, R. Isolation of transcription factors that discriminate between different promoters recognized by RNA polymerase II. Cell 32, 669–680 (1983).

Parker, C. S. & Topol, J. A Drosophila RNA polymerase II transcription factor binds to the regulatory site of an hsp 70 gene. Cell 37, 273–283 (1984).

Carthew, R. W., Chodosh, L. A. & Sharp, P. A. An RNA polymerase II transcription factor binds to an upstream element in the adenovirus major late promoter. Cell 43, 439–448 (1985).

Bram, R. J. & Kornberg, R. D. Specific protein binding to far upstream activating sequences in polymerase II promoters. Proc. Natl Acad. Sci. USA 82, 43–47 (1985).

Horikoshi, M., Hai, T., Lin, Y. S., Green, M. R. & Roeder, R. G. Transcription factor ATF interacts with the TATA factor to facilitate establishment of a preinitiation complex. Cell 54, 1033–1042 (1988).

Roberts, S. G., Ha, I., Maldonado, E., Reinberg, D. & Green, M. R. Interaction between an acidic activator and transcription factor TFIIB is required for transcriptional activation. Nature 363, 741–744 (1993).

Rochette-Egly, C., Adam, S., Rossignol, M., Egly, J. M. & Chambon, P. Stimulation of RAR alpha activation function AF-1 through binding to the general transcription factor TFIIH and phosphorylation by CDK7. Cell 90, 97–107 (1997).

Rio, D., Robbins, A., Myers, R. & Tjian, R. Regulation of simian virus 40 early transcription in vitro by a purified tumor antigen. Proc. Natl Acad. Sci. USA 77, 5706–5710 (1980).

Davis, R. L., Weintraub, H. & Lassar, A. B. Expression of a single transfected cDNA converts fibroblasts to myoblasts. Cell 51, 987–1000 (1987).

Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006).

Schaffner, W. Enhancers, enhancers - from their discovery to today’s universe of transcription enhancers. Biol. Chem. 396, 311–327 (2015).

Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell 155, 934–947 (2013).

Maniatis, T. et al. Structure and function of the interferon-beta enhanceosome. Cold Spring Harb. Symp. Quant. Biol. 63, 609–620 (1998).

Yu, M. & Ren, B. The three-dimensional organization of mammalian genomes. Annu. Rev. Cell Dev. Biol. 33, 265–289 (2017).

van Steensel, B. & Furlong, E. E. M. The role of transcription in shaping the spatial organization of the genome. Nat. Rev. Mol. Cell Biol. 20, 327–337 (2019).

Malik, S. & Roeder, R. G. The metazoan Mediator co-activator complex as an integrative hub for transcriptional regulation. Nat. Rev. Genet. 11, 761–772 (2010).

Allen, B. L. & Taatjes, D. J. The Mediator complex: a central integrator of transcription. Nat. Rev. Mol. Cell Biol. 16, 155–166 (2015).

Albright, S. R. & Tjian, R. TAFs revisited: more data reveal new twists and confirm old ideas. Gene 242, 1–13 (2000).

Flanagan, P. M., Kelleher, R. J. III, Sayre, M. H., Tschochner, H. & Kornberg, R. D. A mediator required for activation of RNA polymerase II transcription in vitro. Nature 350, 436–438 (1991).

Thompson, C. M., Koleske, A. J., Chao, D. M. & Young, R. A. A multisubunit complex associated with the RNA polymerase II CTD and TATA-binding protein in yeast. Cell 73, 1361–1375 (1993).

Kim, Y. J., Björklund, S., Li, Y., Sayre, M. H. & Kornberg, R. D. A multiprotein mediator of transcriptional activation and its interaction with the C-terminal repeat domain of RNA polymerase II. Cell 77, 599–608 (1994).

Meisterernst, M., Roy, A. L., Lieu, H. M. & Roeder, R. G. Activation of class II gene transcription by regulatory factors is potentiated by a novel activity. Cell 66, 981–993 (1991).

Malik, S., Gu, W., Wu, W., Qin, J. & Roeder, R. G. The USA-derived transcriptional coactivator PC2 is a submodule of TRAP/SMCC and acts synergistically with other PCs. Mol. Cell 5, 753–760 (2000).

Fondell, J. D., Ge, H. & Roeder, R. G. Ligand induction of a transcriptionally active thyroid hormone receptor coactivator complex. Proc. Natl Acad. Sci. USA 93, 8329–8333 (1996).

Chen, W. & Roeder, R. G. Mediator-dependent nuclear receptor function. Semin. Cell Dev. Biol. 22, 749–758 (2011).

Ge, K. et al. Transcription coactivator TRAP220 is required for PPAR gamma 2-stimulated adipogenesis. Nature 417, 563–567 (2002).

Malik, S. & Roeder, R. G. Mediator: a drawbridge across the enhancer-promoter divide. Mol. Cell 64, 433–434 (2016).

Tsai, K. L. et al. Subunit architecture and functional modular rearrangements of the transcriptional mediator complex. Cell 157, 1430–1444 (2014).

Cevher, M. A. et al. Reconstitution of active human core Mediator complex reveals a critical role of the MED14 subunit. Nat. Struct. Mol. Biol. 21, 1028–1034 (2014).

Tsai, K. L. et al. Mediator structure and rearrangements required for holoenzyme formation. Nature 544, 196–201 (2017).

Plaschka, C. et al. Architecture of the RNA polymerase II-Mediator core initiation complex. Nature 518, 376–380 (2015).

Nozawa, K., Schneider, T. R. & Cramer, P. Core Mediator structure at 3.4 Å extends model of transcription initiation complex. Nature 545, 248–251 (2017).

Buratowski, S., Hahn, S., Sharp, P. A. & Guarente, L. Function of a yeast TATA element-binding protein in a mammalian transcription system. Nature 334, 37–42 (1988).

Cavallini, B. et al. A yeast activity can substitute for the HeLa cell TATA box factor. Nature 334, 77–80 (1988).

Hoffman, A. et al. Highly conserved core domain and unique N terminus with presumptive regulatory motifs in a human TATA factor (TFIID). Nature 346, 387–390 (1990).

Pugh, B. F. & Tjian, R. Mechanism of transcriptional activation by Sp1: evidence for coactivators. Cell 61, 1187–1197 (1990).

Chen, W. Y. et al. A TAF4 coactivator function for E proteins that involves enhanced TFIID binding. Genes Dev. 27, 1596–1609 (2013).

Patel, A. B. et al. Structure of human TFIID and mechanism of TBP loading onto promoter DNA. Science 362, eaau8872 (2018).

Luo, Y., Fujii, H., Gerster, T. & Roeder, R. G. A novel B cell-derived coactivator potentiates the activation of immunoglobulin promoters by octamer-binding transcription factors. Cell 71, 231–241 (1992).

Kim, U. et al. The B-cell-specific transcription coactivator OCA-B/OBF-1/Bob-1 is essential for normal production of immunoglobulin isotypes. Nature 383, 542–547 (1996).

Lin, J., Handschin, C. & Spiegelman, B. M. Metabolic control through the PGC-1 family of transcription coactivators. Cell Metab. 1, 361–370 (2005).

Wallberg, A. E., Yamamura, S., Malik, S., Spiegelman, B. M. & Roeder, R. G. Coordination of p300-mediated chromatin remodeling and TRAP/mediator function through coactivator PGC-1alpha. Mol. Cell 12, 1137–1149 (2003).

Ge, H. & Roeder, R. G. Purification, cloning, and characterization of a human coactivator, PC4, that mediates transcriptional activation of class II genes. Cell 78, 513–523 (1994).

Ge, H., Si, Y. & Roeder, R. G. Isolation of cDNAs encoding novel transcription coactivators p52 and p75 reveals an alternate regulatory mechanism of transcriptional activation. EMBO J. 17, 6723–6729 (1998).

Wysocka, J. & Herr, W. The herpes simplex virus VP16-induced complex: the makings of a regulatory switch. Trends Biochem. Sci. 28, 294–304 (2003).

Li, B., Carey, M. & Workman, J. L. The role of chromatin during transcription. Cell 128, 707–719 (2007).

Ito, T., Bulger, M., Pazin, M. J., Kobayashi, R. & Kadonaga, J. T. ACF, an ISWI-containing and ATP-utilizing chromatin assembly and remodeling factor. Cell 90, 145–155 (1997).

Knezetic, J. A. & Luse, D. S. The presence of nucleosomes on a DNA template prevents initiation by RNA polymerase II in vitro. Cell 45, 95–104 (1986).

Lorch, Y., LaPointe, J. W. & Kornberg, R. D. Nucleosomes inhibit the initiation of transcription but allow chain elongation with the displacement of histones. Cell 49, 203–210 (1987).

Workman, J. L. & Roeder, R. G. Binding of transcription factor TFIID to the major late promoter during in vitro nucleosome assembly potentiates subsequent initiation by RNA polymerase II. Cell 51, 613–622 (1987).

Han, M. & Grunstein, M. Nucleosome loss activates yeast downstream promoters in vivo. Cell 55, 1137–1145 (1988).

Workman, J. L., Abmayr, S. M., Cromlish, W. A. & Roeder, R. G. Transcriptional regulation by the immediate early protein of pseudorabies virus during in vitro nucleosome assembly. Cell 55, 211–219 (1988).

Kraus, W. L. & Kadonaga, J. T. p300 and estrogen receptor cooperatively activate transcription via differential enhancement of initiation and reinitiation. Genes Dev. 12, 331–342 (1998).

Utley, R. T. et al. Transcriptional activators direct histone acetyltransferase complexes to nucleosomes. Nature 394, 498–502 (1998).

An, W., Palhan, V. B., Karymov, M. A., Leuba, S. H. & Roeder, R. G. Selective requirements for histone H3 and H4 N termini in p300-dependent transcriptional activation from chromatin. Mol. Cell 9, 811–821 (2002).

An, W., Kim, J. & Roeder, R. G. Ordered cooperative functions of PRMT1, p300, and CARM1 in transcriptional activation by p53. Cell 117, 735–748 (2004).

Tang, Z. et al. SET1 and p300 act synergistically, through coupled histone modifications, in transcriptional activation by p53. Cell 154, 297–310 (2013).

Gu, W. & Roeder, R. G. Activation of p53 sequence-specific DNA binding by acetylation of the p53 C-terminal domain. Cell 90, 595–606 (1997).

Sheikh, B. N. & Akhtar, A. The many lives of KATs - detectors, integrators and modulators of the cellular environment. Nat. Rev. Genet. 20, 7–23 (2019).

Wang, S. P. et al. A UTX-MLL4-p300 Transcriptional Regulatory Network Coordinately Shapes Active Enhancer Landscapes for Eliciting Transcription. Mol. Cell 67, 308–321.e6 (2017).

Tan, M. et al. Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification. Cell 146, 1016–1028 (2011).

Sabari, B. R. et al. Intracellular crotonyl-CoA stimulates transcription through p300-catalyzed histone crotonylation. Mol. Cell 58, 203–215 (2015).

Guermah, M., Palhan, V. B., Tackett, A. J., Chait, B. T. & Roeder, R. G. Synergistic functions of SII and p300 in productive activator-dependent transcription of chromatin templates. Cell 125, 275–286 (2006).

Kim, J., Guermah, M. & Roeder, R. G. The human PAF1 complex acts in chromatin transcription elongation both independently and cooperatively with SII/TFIIS. Cell 140, 491–503 (2010).

Li, G. et al. Highly compacted chromatin formed in vitro reflects the dynamics of transcription activation in vivo. Mol. Cell 38, 41–53 (2010).

Shimada, M. et al. Gene-Specific H1 Eviction through a Transcriptional activator→p300→NAP1→H1 Pathway. Mol. Cell 74, 268–283.e5 (2019).

Roeder, R. G. Lasker Basic Medical Research Award. The eukaryotic transcriptional machinery: complexities and mechanisms unforeseen. Nat. Med. 9, 1239–1244 (2003).

Kaikkonen, M. U. & Adelman, K. Emerging roles of non-coding RNA transcription. Trends Biochem. Sci. 43, 654–667 (2018).

Rowley, M. J. & Corces, V. G. Organizational principles of 3D genome architecture. Nat. Rev. Genet. 19, 789–800 (2018).

Coleman, R. A. et al. Imaging transcription: past, present, and future. Cold Spring Harb. Symp. Quant. Biol. 80, 1–8 (2015).

Wang, H., La Russa, M. & Qi, L. S. CRISPR/Cas9 in genome editing and beyond. Annu. Rev. Biochem. 85, 227–264 (2016).

Hnisz, D., Shrinivas, K., Young, R. A., Chakraborty, A. K. & Sharp, P. A. A phase separation model for transcriptional control. Cell 169, 13–23 (2017).

Hahn, S. Phase separation, protein disorder, and enhancer function. Cell 175, 1723–1725 (2018).

## 1 Introduction and historical perspective

Eukaryotic genomes are complex (up to 25,000 genetic loci in humans) and are organized within compact nucleoprotein (chromatin) structures. The mechanisms by which individual genes are activated are of intense interest and physiological importance, and studies over the past 35 years have revealed several levels of control [1].

First, eukaryotes contain three functionally distinct classes of nuclear RNA polymerases [2] that selectively transcribe large ribosomal RNA genes (RNA polymerase I), protein-coding and some small structural RNA genes (RNA polymerase II), and tRNA, 5S RNA and other small structural RNA genes (RNA polymerase III) [3, 4]. These specificities are reflected in the structurally distinct subunit compositions of the three RNA polymerases [5], which contain both common subunits and unique subunits related to those of the bacterial RNA polymerase [6], and they allow for independent global regulation of the major classes of RNA.

Second, eukaryotic cells contain RNA polymerase-specific general initiation factors that, despite the structural complexity of the enzymes (14, 12 and 17 subunits in RNA polymerases I, II and III, respectively), are necessary for accurate transcription initiation on corresponding core promoter elements by purified RNA polymerases [7-11]. These factors are now known to include TFIIIC and TFIIIB for RNA polymerase III [12]; TFIIA, TFIIB, TFIID, TFIIE, TFIIF and TFIIH for RNA polymerase II [13]; and several factors for RNA polymerase I [14]. Following identification of core promoter recognition factors (TFIIIC for class III and TFIID for class II promoters), mechanistic studies revealed pathways for the ordered assembly of initiation factors and RNA polymerases into corresponding preinitiation complexes (PICs) [15-18] (Figs. 1, left, and 2, right). The structural complexity of the basic preinitiation complexes (∼25 and 44 polypeptides, respectively, for RNA polymerases III and II) is remarkable, and a variety of biochemical and genetic analyses have provided much detail regarding the structure and function of the individual polypeptides during PIC formation and during subsequent transcription initiation and post-initiation events [12-14]. In the case of RNA polymerase II, additional insights into PIC formation and RNA polymerase function have been provided by structural studies of TBP-TATA and higher order complexes [19] and of RNA polymerase II itself [6, 20].

Given that RNA polymerases and general initiation factors are the ultimate targets of regulatory factors, these complex assembly pathways offer many points for regulatory interactions. In this regard, it is important to note that while purified RNA polymerases and corresponding general initiation factors (comprising the basal transcription machinery) have an intrinsic ability to accurately transcribe DNA templates through core promoter elements, thus allowing the fundamental transcription mechanisms to be elucidated, these activities are generally suppressed in the cell by the packaging of DNA within chromatin and by negative cofactors that directly interfere with the function of the basal transcription factors (Fig. 3). As discussed below, this imposes requirements for transcriptional activators and corresponding cofactors that act in a gene-specific manner both to reverse the repression (anti-repression) and to effect a net activation above the intrinsic activity of the basal transcription machinery (Fig. 3).

Third, eukaryotes contain diverse sequence-specific DNA-binding transcriptional regulatory factors that facilitate RNA polymerase function on corresponding target genes. The 5S gene-specific TFIIIA was the first of these to be identified as such, purified and cloned [21, 22], and it also represents the prototype zinc finger protein [23]. (Zinc finger proteins are the most common of the 2500 or so presumptive DNA-binding regulatory proteins in the human genome, and studies of their DNA recognition mechanism have allowed the design of novel gene-selective targeting proteins [24].) Mechanistically, TFIIIA was shown to facilitate transcription of the 5S gene by RNA polymerase III through interactions with TFIIIC, which otherwise does not bind to the 5S promoter; this in turn facilitates TFIIIB and RNA polymerase III recruitment [15] (Fig. 1, right). While distinct from the activation mechanism in prokaryotes, which involves direct activator-RNA polymerase interactions [25], this mechanism (involving indirect effects on RNA polymerase) has proved to be general in eukaryotes and allows additional regulatory inputs. Following the paradigm established by TFIIIA for eukaryotes, a large number of sequence-specific DNA-binding regulatory factors, often with distinct DNA-binding and activation (or repression) domains, have been identified and characterized both structurally [26] and functionally. The vast majority of these are involved in the regulation of the large group of protein-coding genes transcribed by RNA polymerase II.

Fourth, the DNA-binding factors that regulate the transcription of protein-coding genes act in conjunction with an expanding group of cofactors that act either through modifications of chromatin structure or, more directly, to regulate formation or function (transcription initiation or elongation) of the preinitiation complex (Fig. 2). The requirements for cofactors involved more directly in transcription [27] are somewhat surprising in view of the specificity intrinsic to the various DNA-binding regulatory factors, the structural complexity of their ultimate target (the basal transcription machinery), and documented interactions between regulatory factors and components of the basal transcription machinery [13, 28]. However, this additional layer of complexity again allows a variety of novel regulatory mechanisms. Select transcriptional coactivators are the main subject of this minireview and are discussed further below.

## Materials and Methods

### Sample Preparation.

Oligonucleotides used for elongation complex formation and PCR amplification of transcription templates are given in Table S2. Templates were prepared by PCR using one primer with a 5′ digoxigenin NHS ester (Dig) and the other with a 5′ restriction enzyme site as noted in the primer name. PCR templates were genomic preparations from Saccharomyces cerevisiae (AT-rich), bacteriophage lambda (Random, New England Biolabs), and Myxococcus xanthus (GC-rich, courtesy of D. Zusman, University of California, Berkeley, CA). After column purification, PCR products were digested using the restriction enzyme denoted in the corresponding primer name and then column purified again.

Preparation of stable elongation complexes (ECs) and their ligation to the template DNA was similar to that previously described (5, 29). ECs were created by annealing the TDS and RNA9 oligonucleotides, followed by addition of the biotinylated RNA polymerase and then the NDS oligonucleotide. ECs were then ligated to the PCR products described above, resulting in ligated ECs of 4137 bp (AT-rich), 5017 bp (random), and 4240 bp (GC-rich).

The ligated elongation complexes were incubated with 2.1 μm streptavidin-coated polystyrene beads (Spherotech). Polymerases were stalled by addition of ATP, CTP, and GTP to a final concentration of 10 μM and incubated at room temperature for 5 min. This reaction was then diluted 1:100 in TB40 (20 mM Tris-HCl, pH 7.9, 40 mM KCl, 10 mM MgCl2, 10 mM DTT) and introduced into the fluidics apparatus of the optical chamber. Once a single bead was trapped in one of the optical traps, 2.1 μm anti-digoxigenin (Roche Diagnostics) cross-linked IgG-coated polystyrene beads (Spherotech) were introduced into the optical chamber and trapped in the other trap. These two beads were then rubbed together by moving the position of one of the optical traps, until an increase in the force was detected upon separation of the beads. Single tethers were distinguished by confirmation that the distance between the beads corresponded to the tether length of the template. Once single tethers were confirmed, transcription was restarted by flowing TB40 supplemented with 1 mM NTPs and 1 μM pyrophosphate into the chamber.

### Expression/Purification of Biotinylated Rpo41.

A DNA sequence encoding the 13 amino-acid biotinylation tag (GLNDIFEAQKIEWHE; the lysine, K, is the site of biotinylation) was inserted into the StuI site of the Rpo41 expression plasmid pProExHtb-RPO41 (45) to yield plasmid pProExHtb-Avi-RPO41. This plasmid was transformed into the BirA (biotin ligase) expression strain AVB100 (Avidity, Inc.) along with the CodonPlus plasmid (Stratagene) and then expressed as previously described (45), except with additional induction of BirA as per the manufacturer’s instructions. Purification was similar to that previously described (45) except for an additional chromatographic step using the SoftLink Avidin column (Promega). All chromatographic steps used buffer TB300 (20 mM Tris-HCl, pH 7.9, 300 mM KCl, 10 mM MgCl2, 10 mM DTT, 10% glycerol) with 5 mM biotin for elution from the avidin column and 500 mM imidazole for elution from the nickel column. Purified protein was dialyzed against TB300, snap-frozen using liquid nitrogen, and stored at -80 °C.

### Data Acquisition/Analysis.

The instrumentation and data acquisition methods were as previously described (5, 21). Data were taken at 2,000 Hz. Force and position data from traces containing active transcription were averaged by decimation to 50 Hz and then smoothed using a second-order Savitzky–Golay filter with a time constant of 1 s. Because the optical tweezers method only returns changes in extension, we stalled the polymerases at a defined site on DNA to compute the initial extension of the tether. We calculated the DNA contour length at later times from the updated end-to-end extension of the tether and the applied force, using the worm-like chain model of polymer statistics (22). Then, we converted changes in distance between the beads into the number of base pairs transcribed by the enzyme.
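The conversion from measured extension and force to contour length can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis code: it assumes the Marko–Siggia interpolation form of the worm-like chain, a persistence length of 50 nm, a thermal energy of 4.11 pN·nm, and a rise of 0.34 nm per base pair, none of which are given in the text. The function names are hypothetical.

```python
# Assumed constants (not taken from the text).
KT_PN_NM = 4.11         # thermal energy near room temperature, pN*nm
PERSISTENCE_NM = 50.0   # persistence length of double-stranded DNA, nm
RISE_NM_PER_BP = 0.34   # B-DNA rise per base pair, nm


def wlc_force(extension_nm, contour_nm, p_nm=PERSISTENCE_NM):
    """Marko-Siggia interpolation: force (pN) needed to hold a chain of
    the given contour length at the given end-to-end extension."""
    x = extension_nm / contour_nm
    return (KT_PN_NM / p_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


def contour_length(extension_nm, force_pn, p_nm=PERSISTENCE_NM):
    """Invert the force-extension relation by bisection to obtain the
    contour length (nm) from a measured extension and force."""
    lo = extension_nm * (1.0 + 1e-9)  # contour length must exceed extension
    hi = extension_nm * 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # At fixed extension the force falls as the chain gets longer, so a
        # predicted force above the measurement means the trial is too short.
        if wlc_force(extension_nm, mid, p_nm) > force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def base_pairs_transcribed(contour_start_nm, contour_now_nm):
    """Change in tether contour length expressed in base pairs."""
    return abs(contour_now_nm - contour_start_nm) / RISE_NM_PER_BP
```

Taking the absolute value in `base_pairs_transcribed` sidesteps the question of whether the tether lengthens or shortens during transcription, which depends on the tethering geometry.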

### Statistical Methods.

Due to errors inherent in pause-picking algorithms (see SI Text), a statistical algorithm for pause-free velocity determination was used. This method relies on the notion that pausing is a diffusive process and, therefore, the velocities during a pause arise from one-dimensional Brownian motion of the polymerase moving along the template. Consequently, the velocities of polymerases observed during pauses should be Gaussian distributed and centered on zero velocity. A histogram of all of the velocities for a given dataset was composed, and a zero-centered Gaussian was fit to the bins containing negative velocities. This fit (see yellow solid line in Fig. S4) was then extrapolated over all data (dashed yellow line) and subtracted from the data. The leftover velocities correspond to the data cleaned of pauses (i.e., the pause-free velocity distribution). The average of this leftover data is the pause-free velocity of the dataset. The errors are the standard deviation of the means determined by sampling the traces with replacement (bootstrapping).
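A minimal sketch of the subtraction idea is given below. It is an illustration rather than the published algorithm: instead of a least-squares fit to the negative-velocity bins, it estimates the width of the zero-centered Gaussian from the second moment of the negative velocities and its area from twice their count, both of which follow from symmetry about zero. The bin width, the synthetic data, and all function names are assumptions made for the example.

```python
import math
import random


def pause_free_velocity(velocities, bin_width=1.0):
    """Estimate the pause-free velocity by subtracting a zero-centered
    Gaussian (the paused population) from the velocity histogram."""
    negative = [v for v in velocities if v < 0]
    # A zero-centered Gaussian is symmetric, so the negative half fixes both
    # its width (second moment) and its total count (twice the half).
    sigma = math.sqrt(sum(v * v for v in negative) / len(negative))
    n_paused = 2 * len(negative)

    # Histogram of all velocities, keyed by integer bin index.
    counts = {}
    for v in velocities:
        k = math.floor(v / bin_width)
        counts[k] = counts.get(k, 0) + 1

    norm = n_paused * bin_width / (sigma * math.sqrt(2.0 * math.pi))
    weighted_sum = 0.0
    total = 0.0
    for k, count in counts.items():
        center = (k + 0.5) * bin_width
        paused = norm * math.exp(-center ** 2 / (2.0 * sigma ** 2))
        leftover = max(count - paused, 0.0)  # clip negative residuals
        weighted_sum += leftover * center
        total += leftover
    return weighted_sum / total


def synthetic_velocities(seed=0):
    """Hypothetical test data: a paused population centered on zero plus
    an actively transcribing population centered on 20 bp/s."""
    rng = random.Random(seed)
    paused = [rng.gauss(0.0, 3.0) for _ in range(3000)]
    active = [rng.gauss(20.0, 4.0) for _ in range(7000)]
    return paused + active
```

Calling `pause_free_velocity(synthetic_velocities())` should return a value close to the 20 bp/s used to generate the active population. The bootstrap error estimate described in the text would repeat this calculation on traces resampled with replacement.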

A pause-picking method commonly used in the single-molecule transcriptional field was used for pause analysis (5, 29, 46), and it is described in detail in SI Text. The pause duration distribution was fit to the theoretical distribution as described by Eq. 1 by performing a two-sample Kolmogorov–Smirnov test between the two distributions, keeping only those that were deemed statistically indistinguishable. This was done within the bounds determined by fitting the theoretical mean pause durations and pause densities with those defined by the observed means plus and minus the observed standard errors (see SI Text).

### Cotranscriptional Folding Simulation.

The first 400 bp of the template sequences used in this study were input into Kinefold, a cotranscriptional RNA folding simulation (35). Each simulation was run in batch mode using 56 and 40 ms as the nucleotide addition time for Pol II and Rpo41, respectively (corresponding to a pause-free velocity of 18 and 25 bp/s, respectively). These simulations were repeated with at least three different seeds for each enzyme/template combination. The difference in mean simulated folding energy was less than 5% of its mean value for all seeds and all templates, and it varied less than 1% between enzymes despite the difference in nucleotide addition times. These energies were corrected for salt concentration (47) and were denoted ΔGsim in the main text. Two of the structures from the GC-rich template were highly sensitive to this salt correction, leading to a significant range in the corrected simulated energy barriers for that template.
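The nucleotide addition times quoted above are the reciprocals of the pause-free velocities. The short check below makes that arithmetic explicit; the function name is hypothetical.

```python
def addition_time_ms(velocity_bp_per_s):
    """Time to add one nucleotide, in milliseconds, at a given velocity."""
    return 1000.0 / velocity_bp_per_s


for enzyme, velocity in (("Pol II", 18.0), ("Rpo41", 25.0)):
    print(f"{enzyme}: {addition_time_ms(velocity):.1f} ms per nucleotide")
# Pol II: 55.6 ms per nucleotide
# Rpo41: 40.0 ms per nucleotide
```

The Pol II value rounds to the 56 ms used in the simulations.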