Why do most transcription factors enhance gene expression rather than repress it?

One can classify the effects of transcription factors (TFs) on gene expression into two types: they either enhance or repress gene expression.

I have always been told that most transcription factors act to enhance gene expression rather than repress it. Assuming this is true, my question is: why are activators more common than repressors? Is there, for example, a mechanistic explanation (enhancement costs less energy, or the enhancement pathway is easier to evolve)?

From Wikipedia:

While activators can interact directly or indirectly with the core machinery of transcription through enhancer binding, repressors predominantly recruit co-repressor complexes leading to transcriptional repression by chromatin condensation of enhancer regions.

The mechanisms are different, and this sounds like a likely cause of the overrepresentation of activating transcription factors in genomes. However, it is not obvious to me why activators and repressors have to use different mechanisms, nor why these different mechanisms would yield a proportion of repressors other than $\frac{1}{2}$.

I can answer this question only based on guesses because I am not really sure about your claim that activators are higher in number than repressors. So consider this as an extended comment.

While activators can interact directly or indirectly with the core machinery of transcription through enhancer binding, repressors predominantly recruit co-repressor complexes leading to transcriptional repression by chromatin condensation of enhancer regions

Not really true. A repressor can simply sit on the promoter and prevent RNAP from binding to it.


Transcriptional repressors are usually viewed as proteins that bind to promoters in a way that impedes subsequent binding of RNA polymerase. Although this repression mechanism is found at several promoters, there is a growing list of repressors that inhibit transcription initiation in other ways. For example, several repressors allow the simultaneous binding of RNA polymerase to the promoter, but interfere with subsequent events of the initiation process, eventually inhibiting transcription initiation. The recent increase in the number of repressors for which the repression mechanism has been characterized in detail has shown an amazing variety of strategies to repress transcription initiation. It is not surprising to find that the repression mechanism used is usually exquisitely adapted to the characteristics of the promoter and of the repressor involved.

Some justification for why activators might be more numerous:

A network perspective:

Let's assume that gene A somehow causes the repression of gene B. This repression can be either direct (A being a repressor of B) or indirect, via some other genes. In the case of indirect regulation, you need just one repressor in the network path A → B to result in repression of the latter.

So all repressive paths would minimally require only one repressor, all other steps in the path can be activation; whereas in an activating path you would need 'n' activators for 'n' steps.
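The counting argument above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not in the original answer: each step in the path is encoded as +1 (activation) or -1 (repression), and the net effect of the path is the product of the signs. The names `net_effect` and `repressor_counts` are hypothetical.

```python
from itertools import product
from math import prod

# Assumed sign encoding for a single regulatory step (illustrative only).
ACTIVATION, REPRESSION = +1, -1


def net_effect(steps):
    """Return +1 if the path A -> ... -> B is net activating, -1 if net repressing."""
    return prod(steps)


def repressor_counts(n_steps, effect):
    """Sorted repressor counts over all n-step paths that have the given net effect."""
    return sorted({
        steps.count(REPRESSION)
        for steps in product((ACTIVATION, REPRESSION), repeat=n_steps)
        if net_effect(steps) == effect
    })


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, repressor_counts(n, REPRESSION), repressor_counts(n, ACTIVATION))
```

Under this encoding, a net-repressive path of any length needs at least one repressor, and the simplest net-activating path uses activators at every step. Note that a path with an even number of repressors is also net activating, so "n activators for n steps" describes the all-activator case rather than the only possibility.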

Post-transcriptional Regulation:

Now, if you include post-transcriptional regulation, you can find a larger number of repressors: miRNAs (2588 reported in humans: miRBase 21) and many proteins as well (for an example, see here). Most events leading to translational activation are actually derepression. This is logical in a way, because an mRNA is a temporary product; it should not by default require an additional signal to start producing proteins.

12.3: Eukaryotic Gene Regulation

  • Contributed by Todd Nickle and Isabelle Barrette-Ng
  • Professors (Biology) at Mount Royal University & University of Calgary

As in prokaryotes, transcriptional regulation in eukaryotes involves both cis-elements and trans-factors, only there are more of them and they interact in a more complex way. A diagram of a typical eukaryotic gene, including several types of cis-elements, is shown in Figure 7.

Transcription Factors

Transcription Factors: What They Do

Transcription factors play many different roles, which vary according to the organism in question. For example, in vertebrates, transcription factors are directly responsible for development, with groups of different factors coming into play in specific tissues. Transcription factors are especially important during embryonic development, and specific factors are essential for the differentiation of pluripotent embryonic stem cells. Similarly, the activity of other factors must be maintained for stem cells to retain their ability to turn into any cell type and to self-renew. It is not surprising that many human diseases or abnormalities are caused by the malfunction of transcription factors. Similarly, somatic mutations or chromosomal rearrangements that affect certain transcription factors play a key role in the development of some human cancers. Understanding how the sequential deployment of transcription factors controls differentiation and development is a vibrant current area of research, and it is important to note the value of studies with mice, zebrafish, fruit flies, and nematodes in understanding how transcription factors drive development. The situation in unicellular organisms is different: there, the primary role of transcription factors is to manage adaptation to environmental change, for example, sensing nutrients or coping with life in stressful niches. Detailed information on the number and nature of transcription factors in different organisms can be found on many websites.

Nuclear transcription factors as direct regulators of mitochondrial gene expression

The nuclear transcription factors best characterized as direct regulators of mitochondrial gene expression in mammals are the T3 receptor p43, CREB, the tumor suppressor p53, signal transducer and activator of transcription 3 (Stat3) and the estrogen receptor. p43 and CREB are transcription factors that can bind mitochondrial DNA to regulate gene expression. p53, Stat3, and, potentially, the estrogen receptor are thought to act as co-regulators, affecting mitochondrial gene expression through protein-protein interactions.

The T3 receptor

The thyroid hormone T3 is a primary regulator of mammalian mitochondrial biogenesis [19] and can influence mitochondrial function both indirectly and directly. In its indirect role, it binds to members of the nuclear receptor superfamily of transcription factors known as T3 receptor α and β (T3Rα and T3Rβ) to regulate nuclear gene transcription [20] (Figure 3). Nuclear transcriptional targets of these receptors include genes that stimulate mitochondrial biogenesis, such as those encoding the transcription factor nuclear respiratory factor 1 (NRF-1) and the cofactor PGC-1α [21, 22] as well as the mitochondrial basal transcription factor Tfam [23, 24].

Regulation of mitochondrial function by a thyroid hormone. Indirect regulation: binding the thyroid hormone tri-iodothyronine (T3) to the T3 receptor (T3R) leads to upregulation of transcriptional regulators of mitochondrial biogenesis, such as NRF-1 and PGC-1α. NRF-1 and PGC-1α then can upregulate transcription of the nuclear-encoded mitochondrial basal transcription machinery (Tfam, Polrmt), which stimulates mitochondrial DNA (mtDNA) replication and mitochondrial biogenesis. Direct regulation: thyroid hormone binds directly to two mitochondrial proteins, the inner mitochondrial membrane adenine nucleotide transporter (AdNT) and a truncated version of T3R located in the mitochondrial matrix. T3 regulates expression from the mitochondrial genome via T3R, which may bind directly to the mitochondrial DNA.

T3 also regulates mitochondrial function directly via two pathways: the regulation of nucleotide transport across the inner membrane via a T3-binding adenine nucleotide transporter (AdNT) [25, 26] and control of mitochondrial transcription via the mitochondrially localized T3Rα1 isoform known as p43 (Figure 3; reviewed in [27, 28]; see also [10]). Most proteins are imported into mitochondria in an ATP-dependent manner via the protein translocator channel TOM, which recognizes an amino-terminal mitochondrial localization signal that is then cleaved during import. p43, however, is imported into rat liver mitochondria via a different pathway, previously shown for the yeast mitochondrial transcription factor MTF1 [15], which is independent of both TOM and mitochondrial ATP levels, and does not result in cleavage of the imported protein.

One potential mechanism by which transcription factors could be accurately sorted among different cellular compartments is revealed when considering the T3 receptors. It was noticed that a protein construct mimicking a T3Rβ isoform with a truncated amino terminus (which is the form present in most non-mammalian vertebrates) is specifically imported into isolated rat mitochondria, suggesting a role for the amino-terminal truncation in mitochondrial import [15]. p43 is itself a truncated form of the full-length nuclear transcription factor T3Rα and is translated from an alternative start site in the T3Rα mRNA [15, 29]. The expression of p43 is, however, regulated independently from the full-length T3Rα and it shows a distinct tissue-specific pattern of expression [29].

T3 receptors associate with nuclear DNA in a sequence-specific manner via T3-response elements (T3REs), DNA motifs first recognized in the promoters of T3-responsive genes. Multiple T3REs have been identified in the mouse mitochondrial genome that confer responsiveness to thyroid hormone in nuclear reporter assays, and p43 binds to these sequences in vitro in EMSAs. These techniques do not, however, address the question of whether p43 binds mitochondrial DNA in vivo under physiological conditions. This was partly addressed in a series of studies utilizing induced-hypothyroid rats, in which physiological T3 levels were found to regulate mitochondrial gene expression directly in vivo. Mitochondria isolated from the livers of these rats showed that changes in physiological thyroid hormone levels altered the relative levels of mitochondrial mRNA and rRNA, which correlated with altered protein occupation of the mitochondrial D-loop as determined by DNA footprinting [30]. The independence of this direct mitochondrial role for T3 from the well-characterized indirect role was shown when isolated mitochondria from hypothyroid rats were treated with T3 and the mitochondrial mRNA:rRNA ratio and the pattern of DNA footprinting returned to that of normal rats [30]. This indicated that T3 regulates mitochondria directly and suggested that this pathway may involve a mitochondrial T3 receptor with similar binding preferences to the nuclear form.

The role of p43 in T3-mediated regulation of mitochondrial transcription was confirmed using the same in organello system from induced-hypothyroid rats to show that the addition of p43 (translated in vitro to avoid possible contamination by cellular components) stimulated mitochondrial gene transcription in the presence of T3 [15], whereas T3 treatment in the absence of p43 did not stimulate mitochondrial gene expression [15]. In validation of the mitochondrial role of p43 in vivo, mice overexpressing p43 under the control of a muscle-specific promoter exhibited increased mitochondrial gene expression and mitochondrial biogenesis in muscle, and had increased oxidative metabolism, with body temperature 0.8°C higher than control mice [31].

The direct regulation of mitochondrial transcription by T3 is complex and highly tissue specific. In organello studies that demonstrate the responsiveness of liver mitochondrial transcription to T3 also demonstrate that mitochondria from the heart are not regulated in this manner. Rather, T3 regulation of mitochondria from the hearts of the induced-hypothyroid rats is indirect - via the nucleus - and primarily at the level of regulating mitochondrial DNA copy number [32]. This complexity is likely to be shared by other transcription factors with a direct mitochondrial activity, and it may explain why early work did not detect direct binding of a protein to the proposed mitochondrial T3REs [30] despite the requirement of a DNA-binding domain in p43 for the observed mitochondrial function [15]. Regardless of this outstanding debate over the location of p43 binding to mitochondrial DNA, evidence is overwhelming that p43 is localized to the mitochondria in rat liver, where it binds to the mitochondrial genome and regulates mitochondrial transcription. A great deal remains to be studied regarding p43 in mitochondria - for example, it is not clear whether this regulatory pathway is conserved within mammals, or in which other tissues it is utilized. Nevertheless, the studies on thyroid hormone and thyroid hormone receptor were the first direct illustration that mitochondrial gene expression is regulated independently from nuclear gene expression and introduced a key model system for the study of nuclear transcription factors in mitochondria.

Cyclic-AMP response element binding protein (CREB)

The transcription factor CREB regulates nuclear gene expression in response to a diverse range of stimuli [33, 34]. CREB is activated by phosphorylation, either by the cyclic-AMP responsive protein kinase A (PKA) or by other kinases, including mitogen-activated protein kinases (MAPKs) and Ca2+/calmodulin-dependent kinases (CaMKs) [35]. A self-contained CREB pathway exists in mitochondria, which involves PKA [36], cyclic AMP [37] and CREB [38]. On stimulation, this pathway induces binding of phosphorylated CREB to cyclic-AMP response elements (CREs) in the mitochondrial DNA D-loop and regulation of mitochondrial gene expression [39, 40] (Figure 1).

CREB was first localized to rat brain mitochondria by subcellular fractionation followed by immunoelectron microscopy [38]. Despite not having a classical mitochondrial localization signal, the transport of labeled CREB into isolated rat liver mitochondria depends on the mitochondrial translocator TOM, the import route for most proteins into mitochondria [39]. The mitochondrial pool of CREB can co-immunoprecipitate with the chaperone protein mtHSP70 [40], suggesting a mechanism of targeting to the mitochondria that is dependent on chaperone proteins rather than on a mitochondrial localization signal, as has been shown for p53 [41]. Once in mitochondria, CREB is regulated by phosphorylation in response to the same stimuli as in the nucleus, and in vitro can bind oligonucleotides bearing the consensus CRE sequence [38]. Binding of CREB to the mitochondrial D-loop (Figure 1) has been detected in vivo using ChIP [36] and DNase footprinting, and is dependent on mitochondrial PKA activity [40, 41]. Unlike p43, mitochondrial localization of CREB has been identified in multiple mammalian species and tissues [36, 38, 39].

An overexpression construct that selectively increases levels of CREB in mitochondria was used to distinguish CREB's nuclear and mitochondrial regulatory roles in primary cultured neurons from rat brain [40]. These increases in mitochondrial CREB perturbed mitochondrial gene expression without altering the expression levels of CREB's nuclear target, c-fos [40]. The mRNAs of mitochondrially encoded NADH dehydrogenase subunits 2, 4 and 5 (Figure 1) were specifically upregulated; conversely, these mRNAs were downregulated on treatment with a dominant-negative form of CREB in the mitochondria [40].

Tumor suppressor protein p53

The tumor suppressor protein p53 is a well-known example of a nuclear transcription factor with a role in mitochondria [42]. First identified by its transcriptional regulatory function [43], p53 also has non-transcriptional functions, and has been implicated in apoptosis [44], senescence [45], autophagy [46], DNA-damage repair and cell-cycle arrest [47].

In mitochondria, p53 directly regulates apoptosis via protein-protein interactions at the outer membrane, and this function has been reviewed thoroughly elsewhere [48, 49]. However, there is considerable evidence for a second mitochondrial role for p53, in mitochondrial DNA maintenance and in mitochondrial DNA-damage repair. Co-immunoprecipitation of p53 with the mitochondrion-specific transcription and mitochondrial DNA packaging factor Tfam suggests that p53 may regulate DNA-damage repair in mitochondria [50], as it does in the nucleus [51, 52]. In KB human epidermoid cancer cells and in HCT116 adenocarcinoma cells, p53 physically interacts with Tfam, with the effect of enhancing the binding of Tfam to cisplatin-damaged DNA at the expense of oxidized DNA, in a reversal of Tfam's normal binding pattern [50].

p53 also seems to play a role in mitochondrial base-excision repair. In a nucleus-free in vitro system derived from the mitochondria of mouse liver, p53 can stimulate the gap-filling function of the mitochondrial DNA polymerase mtPOLγ [53]. A physical interaction between p53 and mtPOLγ in vivo has been detected in HCT116 cells [54], where p53 enhances the replication function of mtPOLγ and interacts with the mitochondrial genome. The observed binding of p53 to the mitochondrial genome was stimulated by, but not dependent on, DNA damage, suggesting that the role of p53 at the mitochondrial DNA may not be confined to the DNA-damage response. Furthermore, in studies comparing mitochondria from p53-deficient cell lines with those from isogenic p53-positive lines, p53 appears to provide an endogenous proofreading function for mtPOLγ during mitochondrial DNA replication [55].

Despite clear localization of p53 to the mitochondrial matrix and a number of direct associations with mitochondrial DNA, evidence that p53 can bind sequence-specifically to regulate expression of mitochondrially encoded genes remains elusive. Sequences from the mouse mitochondrial genome that resemble the nuclear binding motif of p53 confer p53 responsiveness in nuclear reporter assays [56], but there is no evidence that p53 regulates transcription from the mitochondrial genome. Regardless of whether p53 directly regulates mitochondrial transcription, it plays important mitochondrial roles in apoptosis, DNA integrity and response to stress.

Signal transducer and activator of transcription 3 (Stat3)

Stat3 was first detected in mitochondria as a result of its functional association with GRIM-19, a subunit of the respiratory electron-transport chain NADH dehydrogenase (Complex I), which functions in the transfer of electrons from NADH to the respiratory chain [57, 58]. In the nucleus, Stat3 mediates the transcriptional response to growth factors such as interleukin-6 and epidermal growth factor [59]. Differences between Stat3 function in the mitochondria and nucleus are exemplified by the fact that the mitochondrial pool of Stat3 mediates oncogenic transformation by the small GTPase H-Ras, a process that is mechanistically distinct from how nuclear Stat3 supports oncogenic transformation by the viral oncogene v-Src [60]. Both Stat3-knockdown cell lines and Stat3-knockout mice show disrupted electron-transport chain function [61], which suggests that Stat3 directly regulates mitochondrial function via its effects on the electron-transport chain. Engineered Stat3 mutant proteins have shown that the nuclear role and the mitochondrial role can be functionally isolated [61].

The estrogen receptor

The estrogen receptor was first found to localize to mitochondria of rabbit uterus and ovary in 2001 [62]. This receptor regulates gene expression by binding to estrogen-response elements (EREs) in gene promoters following the binding of the steroid hormone estrogen to the receptor [63]. Nuclear targets include NRF-1, which, as noted earlier, encodes a transcription factor that stimulates mitochondrial biogenesis and is a transcriptional regulator of genes encoding the mitochondrial basal transcription machinery [64] (Figure 2). The indirect regulation of mitochondrial function by the actions of the estrogen receptor has been reviewed elsewhere [65, 66].

There is evidence that estrogen acts directly in mitochondria by two pathways: one utilizing the receptor and the other independent of it [27, 67, 68]. The presence of the estrogen receptor in mitochondria is well established; both isoforms, ERα and ERβ, localize to mitochondria in diverse cell lines and tissues, yet their functions remain contentious. EMSAs suggest that ERβ may bind directly to the D-loop of the mitochondrial genome in MCF-7 breast cancer cells (Figure 1). This binding was stimulated by treatment of the cells with estrogen and inhibited by treatment with ERβ-specific antibodies [69]. It has not, however, been shown that isolated mitochondria respond to estrogen treatment by altering gene expression in an ERβ-dependent fashion.

Other putative functions for the mitochondrial estrogen receptor have focused on protein-protein interactions identified using a bacterial two-hybrid screen. This screen revealed that ERα can interact stably and reproducibly with the mitochondrial protein 17β-hydroxysteroid dehydrogenase type 10, suggesting a role for mitochondrial ERα in regulating cellular steroid metabolism and response [70].

Since cancer is in part a metabolic disease [71], and altered mitochondrial DNA sequence and transcription levels have been observed in both primary tumors and in cancer cell lines [72, 73], a mitochondrial role for the estrogen receptor would be relevant to both estrogen receptor biology and the study of hormone-sensitive breast cancer. As with other nuclear factors, however, the indirect action of the estrogen receptor on mitochondrial gene expression is a confounding factor that complicates investigation. Furthermore, as noted above, estrogen seems to regulate mitochondria directly, even in the absence of its receptor [68]. The disagreement over the role of mitochondrial estrogen receptors could be, in part, due to cell-specific functions. Nevertheless, mitochondria are clearly an important target of estrogen hormone action.

Other transcription factors

Although only a small number of nuclear transcription factors have had a mitochondrial role validated, either in a nucleus-free in organello system or by detection of binding to the mitochondrial genome, there are a number of nuclear transcription factors that have been localized to mitochondria but where the mitochondrial role remains understudied. The glucocorticoid receptor [74], the heterodimeric transcription factor AP-1 [75], and the peroxisome proliferator-activated receptor γ (PPARγ) [76] have all been detected in mammalian mitochondria, and there is some evidence for the glucocorticoid receptor [77] and AP-1 [78] binding to the mitochondrial genome to potentially regulate gene expression.

Part 2: Gene Regulation: Why So Complex?

00:00:01.03 My name is Bob Tjian,
00:00:02.19 I'm a professor at the University of California at Berkeley,
00:00:06.19 where I've taught many years in Molecular Biology and Biochemistry,
00:00:11.02 and more recently, I've also taken on the job of being the President of the Howard Hughes Medical Institute.
00:00:16.26 And it's my pleasure today to continue with my second lecture in this series,
00:00:22.29 to describe to you some exciting ideas about how gene regulation works,
00:00:30.25 particularly in more complex organisms.
00:00:34.21 Now, in my last set of lectures, I left you with this view of the type of complexity that has to be evolving
00:00:48.26 to allow the type of gene expression patterns that we see in the many, many organisms that we know exist on this planet.
00:01:00.29 And so, there's some really intriguing questions that I'm going to address in this second lecture.
00:01:09.15 And one thing I left you with was an image of the interplay of many molecules that have to come together,
00:01:18.26 and to land on a particular site of the DNA molecule that's part of the chromosome of an organism
00:01:24.25 or within a cell of an organism, and how this process might work.
00:01:31.04 But, I think the question that's plagued us for decades,
00:01:35.20 now that we had a better idea of what this molecular machinery looks like that's involved in decoding DNA information into gene expression,
00:01:49.07 we wondered why is it so complex?
00:01:53.20 And to sort of begin to address this issue, let me just take you back to a simple concept.
00:02:01.10 And you remember that different organisms have different sizes of their genomes,
00:02:08.00 that is, the amount of DNA that is required to encode the particular organism.
00:02:14.23 And here are some examples of both bacteria, simple, single-celled prokaryotic organisms,
00:02:22.18 as well as single-cell eukaryotic organisms like the baker's yeast, and then there's the little, round soil worm C. elegans,
00:02:30.25 and then you can go up to mammals and vertebrates.
00:02:33.29 And you'll see first of all that the amount of DNA can vary a lot from a few million base pairs
00:02:40.17 all the way up to 3 billion base pairs or more.
00:02:45.00 To go along with this sort of expanding level of DNA and chromosome length, you also have different levels of genes.
00:02:54.13 Now, you'll notice that the range of genes is a lot less than the range of DNA length,
00:03:00.20 so this partly informs us about maybe why we need the complexity that we ultimately discovered is involved
00:03:10.26 in forming this molecular machinery that's responsible for reading the genetic information.
00:03:17.22 So this is just a little table to reemphasize that these more complex genomes, which also means more complex organisms,
00:03:28.15 which really means a lot of different cell types, many different behaviors, complex interactions with their environment and so forth,
00:03:38.24 how is all this information really decoded from our genomes?
00:03:43.03 And on one side here, you see the prokaryotic core gene regulatory machinery,
00:03:50.07 or the core transcription machinery, and in almost all bacteria,
00:03:54.20 it's only a few polypeptides. 5, 6, 7 polypeptides.
00:04:00.00 Then, on this side, you'll see that the so-called eukaryotic organisms,
00:04:05.00 and particularly when you talk about multicellular metazoan organisms, now you see huge diversity and number of proteins or,
00:04:15.17 as we call them, transcription factors, that are necessary to assemble into very large, multi-subunit ensembles
00:04:25.18 that are required to transcribe the 10,000 to 30,000 genes that define these more complex organisms.
00:04:33.21 So, right away you can see that there's this proliferation of the subunits and the machinery and the complexity.
00:04:43.00 So, in this lecture, I'm going to give you a little sense of maybe why this is the case,
00:04:49.11 and what's special about the more complex, multicellular organism,
00:04:55.11 and why this machinery may have to have been more elaborated through evolution, compared to simpler organisms.
00:05:05.07 Now, one of the first things that you realize when you look into the cell,
00:05:10.03 or particularly the nucleus of a higher organism, let's say our own cells, versus a bacterium,
00:05:17.10 is that the DNA, the very molecule that makes up the genetic information, is kind of packaged away in a very different way.
00:05:25.11 So, in all eukaryotes, the double-stranded DNA doesn't sit there in the form that we would call
00:05:33.29 the "naked" DNA, which is shown up at the top here. But rather,
00:05:38.17 this DNA is wrapped up with a set of proteins, very basic proteins, called "nucleosomes,"
00:05:46.00 and these are in turn further packaged all the way to highly condensed form
00:05:52.09 that ultimately forms the chromosomes that you'll be able to see under a microscope.
00:05:57.26 And the blue figures over here and green figures just give you a view of the
00:06:02.29 high-resolution structure of a nucleosome with DNA wrapped around it.
00:06:09.00 So, what is the consequence of having all of our DNA,
00:06:14.04 all our chromosomes, condensed and wrapped up in this way?
00:06:18.09 You can think of it as packaged away.
00:06:20.15 Well, one thing is that you can shove all this down into a small nucleus,
00:06:25.15 so if we strung out our DNA in every cell in our body, out from end to end
00:06:31.20 and stretched it out like a string, it's almost a meter long.
00:06:35.12 And yet, you have to cram all that into a tiny, little volume.
00:06:39.10 And part of the way that that happens is that you can compact the DNA by these structures.
00:06:46.24 Now, the consequence of that is, of course, you somehow have to negotiate
00:06:52.18 through this highly compacted form of DNA to get access to the DNA information and the genes.
00:07:00.12 So, to put it another way, you have to have a machinery,
00:07:04.24 a transcriptional apparatus whose job is to read DNA and, you remember from the first lecture,
00:07:10.27 convert that DNA information into RNA, an intermediate molecule
00:07:14.23 which ultimately then gets translated into a protein product.
00:07:18.19 Well, clearly one of the reasons we have this highly elaborated transcriptional machinery
00:07:25.02 is in part to deal with having to navigate through a chromatin template, as opposed to a naked DNA template.
00:07:33.24 And so there are various proteins and protein complexes that are called
00:07:38.29 "chromatin remodeling complexes," "chromatin modifying complexes,"
00:07:44.12 and these have to coordinate with the transcriptional machinery down here in the yellow and the orange,
00:07:50.01 in order to navigate and basically express a series of interactions
00:07:55.08 that are transactions between the protein machinery and the DNA.
00:08:00.29 So this is a very challenging problem.
00:08:04.10 So that's part of the problem, or part of the reason why we think there's such complexity.
00:08:09.11 So, how did we come to this picture?
00:08:12.05 How did we finally get to figuring out that there were over 85 proteins that all have to assemble on a chromatin template,
00:08:21.16 to give you gene expression and transcription, in the right place, in the right time?
00:08:26.25 And I want to just give you one sort of quick look into a technology that one can use to address the issue of,
00:08:37.03 how do we break down this complex machinery into understandable units?
00:08:42.29 And as I said in the first lecture, there are many tools that molecular biologists
00:08:47.18 and biochemists can use to try to tease out these complex molecular transactions.
00:08:53.18 One of them, of course, is to use genetics, which is to use genetic mutation
00:08:58.29 to either remove or alter one particular gene product and then ask what is the consequence.
00:09:05.06 The other way to do it is to actually take a cell with all of its complexity
00:09:10.02 and break it down literally into its component parts, and then try to put it back together
00:09:14.09 again in a functional form. And that's what I'm going to show you today.
00:09:18.00 And it's a technology I kind of call the "biochemical complementation assay."
00:09:23.03 And it's very simple: You ask, what are the minimal components,
00:09:27.24 for example, in the case of a human gene, what are the minimal protein components of the transcriptional apparatus
00:09:33.29 that you can extract from the nucleus of a cell that you need to put into a test tube
00:09:38.19 that will allow you to essentially reconstruct or, as we say,
00:09:42.19 reconstitute the activity that will allow you to read the gene in an accurate fashion?
00:09:48.19 And you can keep adding or taking away different proteins,
00:09:53.13 the yellow ones, the green ones, the orange ones, and so forth,
00:09:56.24 and ask, does it make any difference?
00:09:59.11 And by playing this adding and subtracting, or "biochemical complementation," assay,
00:10:04.29 you can very quickly discover what are the minimal components you need to activate a gene in a regulated fashion,
00:10:11.12 and what are other things that might be necessary to support this activity.
00:10:16.15 So, the first question that was asked was from the biochemical analysis of about
00:10:24.02 four dozen different proteins: What are really necessary and sufficient?
00:10:30.07 In other words, what's the minimal component set that you need to give you regulated transcription?
00:10:36.29 So we're now asking a more complicated question.
00:10:39.09 Not only what is necessary to just simply give you transcription, in other words the conversion of DNA into RNA,
00:10:45.29 but to do it in a regulated fashion.
00:10:47.27 Because after all, that's what's really interesting:
00:10:50.23 why one cell does it in one way and a different cell has a different program.
00:10:55.25 And this experiment here says that our sequence-specific classical transcription factor that
00:11:02.00 binds DNA at its regulatory promoter region, together with what we will call the "core" or
00:11:09.07 "basal" machinery of transcription, is necessary but not sufficient.
00:11:14.07 So, plus or minus the activator Sp1 doesn't make any difference,
00:11:19.16 even though we know that in a living cell, Sp1 is highly activating this gene that we're looking at.
00:11:26.16 So, that means there's something missing in this reconstitution experiment.
00:11:31.16 So, how do we go find what's missing?
00:11:35.03 And this biochemical complementation really relies on our ability to take the cells that contain the necessary components
00:11:43.25 and the sufficient components, and then start to extract it
00:11:47.28 and to find which molecules are missing that we're not adding to our reaction yet.
00:11:54.05 And to do that, we basically have to take the cells, in this case, human cells,
00:11:58.28 break the cells apart, extract the nucleus, remove all the proteins from the nucleus,
00:12:04.03 and begin to separate the thousands of different proteins
00:12:08.05 that are in the nucleus into different pools, if you like.
00:12:12.08 And we separate them based on their physical and chemical properties,
00:12:17.19 and some of you probably have had some experience in running column chromatographs.
00:12:23.16 This is basically a way of separating proteins based on their positive charge, negative charge, molecular size,
00:12:32.16 hydrophobicity (in other words, how greasy they are, how well they interact with water), and so forth.
00:12:38.20 So if you do that iteratively, as is shown here in a series of different anion exchange
00:12:45.23 and cation exchange, as well as gel filtration, chromatographs,
00:12:50.12 you can eventually separate the thousands of different components of a nuclear extract into its individual parts.
00:12:59.00 And then you can test each one to see if they're the missing piece.
00:13:03.15 And when you do that, lo and behold, you find that there are a couple of missing pieces
00:13:07.22 that are necessary for you to add back, in other words, reconstitute, the reaction
00:13:13.12 so that now you have regulated transcription.
00:13:15.27 So unlike the previous data that I showed you,
00:13:19.14 now you can see that the machinery is more complex and, most importantly,
00:13:24.23 you can see also that the machinery is now responsive to the activator.
00:13:29.19 So, the signal with the activator, plus Sp1, is much darker than in the signal without Sp1.
00:13:36.13 That means that there is activated transcription that is dependent on Sp1, a classical transcription factor.
00:13:43.11 So that allowed us to identify two very important, key components that we didn't know about before we did this experiment:
00:13:51.29 One is a multi-subunit complex called the "Transcription factor II D,"
00:13:57.25 and the other one is called the Mediator complex.
00:14:01.01 And these turn out to actually define an entirely new class of transcription factors, which are the so-called co-factors.
00:14:09.27 So I'm going to tell you a little bit more about one of these co-factors,
00:14:13.15 because they both really perform similar functions,
00:14:16.13 but we happen to know quite a bit more about one of them than the other.
00:14:20.17 So this so-called TFIID complex has roughly 15 subunits, in other words,
00:14:25.28 15 separate proteins that have to mesh together to form a complex.
00:14:31.22 And it's a very large macromolecule, so it's a million daltons:
00:14:36.23 that's a very, very large, floppy molecule, with many pieces to it.
00:14:41.12 One of its functions you already know about,
00:14:43.16 because it contains as one of its subunits the so-called "TATA-binding protein."
00:14:48.12 That's that saddle-shaped molecule that binds to double-stranded DNA,
00:14:55.02 at the AT-rich sequence called a TATA box,
00:14:58.06 which is associated with many genes in animal cells.
00:15:02.09 But what we've come to learn in the last decade or so is that this little complex
00:15:07.22 is doing much more than just simply binding to the TATA box;
00:15:11.18 it's doing a whole bunch of other things that we didn't have any idea about.
00:15:15.12 And now that we knew the existence of this activity and that it was critical not only for TATA binding,
00:15:22.08 but also for mediating or potentiating transcription activation, we then could break down
00:15:28.04 more of its functions of individual subunits, because you remember there's 15 different polypeptides here.
00:15:34.09 And this is just a little summary showing you that this complex of proteins
00:15:38.11 is doing a lot of different functions.
00:15:41.18 It's recognizing the nucleosomes, which have a basic protein called a "histone,"
00:15:49.26 and so it recognizes histones only when it's got
00:15:53.09 a certain chemical modification called an acetylation event.
00:15:59.12 This big orange complex also itself has enzymatic activity, including kinase activity,
00:16:06.06 which can put phosphate groups on other proteins and enzymes.
00:16:10.04 It has acetylase activity, and of course, it has to interact directly
00:16:15.05 with activators in order to potentiate their function in turning on transcriptional activation.
00:16:22.07 And I'm probably safe in speculating that
00:16:26.20 there are yet unknown functions of this large complex that we still have to discover,
00:16:32.23 because we've really only understood maybe half of the subunits, and even there,
00:16:37.25 only partially understood the functions of that half of the subunits that are part of this complex.
00:16:44.05 So, there's clearly much more work to be done, but I think what's clear from these experiments
00:16:48.25 is that these proteins are doing a lot more than just binding DNA.
00:16:53.08 They're what I would think of as integrators of information.
00:16:58.01 So, this integrator of information means that this structure and the function is very complex, and so,
00:17:04.27 one of the things that we've had to do (and
00:17:08.00 it's been a very challenging problem that remains challenging,
00:17:11.21 because we haven't solved all the technical problems)
00:17:13.18 is that because it's a large, megadalton, floppy molecule, solving the three-dimensional
00:17:19.25 structure of such large assemblies has proven to be rather technically challenging.
00:17:26.08 And we have to use many different techniques to try to address this:
00:17:31.28 X-ray crystallography, NMR,
00:17:34.07 but one of the techniques that's emerging, that's very, very powerful
00:17:38.09 for solving the structures of these large assemblies is something called "cryo-electron microscopy."
00:17:46.24 It's basically a way of freezing these large assemblies in place,
00:17:52.18 and then solving their structure by microscopy.
00:17:55.29 And this is just about a 25 angstrom, so relatively low-resolution structural determination,
00:18:03.17 of the human TFIID complex and, most importantly,
00:18:08.24 its relationship to two other transcription factors that are part of the assembly
00:18:14.03 that has to align itself up on the promoter to start transcription,
00:18:18.08 and that's the other two transcription factors TFIIA and B, which are shown in green and purple here.
00:18:24.13 So you can slowly start building up the entire complex in pretty accurate three-dimensional space
00:18:33.00 to figure out what its shape will inform us about its function,
00:18:37.15 and that's something that's an ongoing project in many laboratories in molecular biology.
00:18:43.01 So, this cartoon (and again I want to emphasize that
00:18:46.27 all the figures there and the colored blobs are more a part of our imagination at this point,
00:18:53.16 although, as I just showed you, we actually have real structures of some components of this pre-initiation complex),
00:19:02.25 this slide just emphasizes the point that there's a lot of information integration going on,
00:19:09.22 and that there are protein-protein and protein-nucleic acid interactions
00:19:15.06 that are critical for the regulatory functions of these large, macromolecular assemblies.
00:19:21.02 And this also reminds you that there are at least three separate classes of transcription factors
00:19:27.17 that are playing a key role in the regulation of genes:
00:19:30.28 the classical activator and repressor that are sequence-specific DNA-binding proteins,
00:19:35.26 like the Sp1 protein I talked to you about earlier, just shown here in pink;
00:19:41.11 there are the components of the core machinery, which are shown in yellow;
00:19:45.27 and then you have these things we call co-factors or co-activators,
00:19:49.12 that are integrating information between the activators and the core machinery.
00:19:56.06 So this kind of gives you a slightly better view of why there's this kind of complexity,
00:20:04.06 but it still doesn't really address all of the issues with respect to:
00:20:09.22 Why do you need 85 proteins to do this?
00:20:12.09 So, let me dig a little deeper into this.
00:20:15.00 So, first, let me just pose some of the questions that are really still largely unresolved in the field,
00:20:21.14 even though this is a pretty mature area of study;
00:20:24.10 we've been trying to address these issues for a couple of decades,
00:20:28.14 and it goes to show how difficult it is to really tease apart this complex molecular machinery.
00:20:34.23 And I should say that the complexity of this machinery is not unique
00:20:38.19 to the transcriptional apparatus. Many other biological processes are also dependent on
00:20:43.23 macromolecular machines that are very similar in complexity to this one.
00:20:49.04 So I think things that we learn about the transcriptional machinery could be applied in principle to many other machineries.
00:20:56.20 So, couple of interesting questions:
00:21:00.20 What are the transcriptional mechanisms that regulate complex cell types?
00:21:06.29 Because, after all, multicellular organisms evolved to have many, many different
00:21:13.12 cell types, so our bodies are made up of many different cell types,
00:21:18.23 which means that each cell's performing a different function.
00:21:22.02 Our hair follicle cells are producing hair, our red blood cells are
00:21:27.02 producing hemoglobin and doing something else, our skin cells are protecting us.
00:21:31.17 Each cell type is doing a different thing, so how does this happen,
00:21:36.01 how do we generate this diversity of cell types through the gene regulatory networks?
00:21:42.17 And then, knowing what we now know about the first level of complexity of the machinery
00:21:49.04 that's responsible for decoding this information, what more can we learn about the process of regulation now?
00:21:57.03 Particularly, what is the division of labor between the core machinery
00:22:03.24 (which binds to the promoter), the activators, and the co-activators?
00:22:08.22 So, what is their relationship, and what are their respective roles in defining cell type-specific gene expression?
00:22:16.19 That's really the last topic that I want to cover in this lecture.
00:22:21.06 So, let's review a few basic facts about individual cell types.
00:22:26.13 So, let's take two well-recognized cell types: fat cells and muscle cells.
00:22:33.12 Very different cells that perform very different functions,
00:22:36.27 but every cell in a particular organism has the same genetic information.
00:22:43.04 It has the same DNA, it has the same set of chromosomes.
00:22:46.08 That means that these two cells have to be using different parts of the information
00:22:52.09 from the genome to give it their distinct identities.
00:22:56.20 So, each cell must only express some subset of the genes,
00:23:03.21 and that particular subset would define the function of a fat cell versus a muscle cell.
00:23:10.11 And, so then the question becomes:
00:23:12.26 Okay, that makes sense, but how do you get there?
00:23:15.02 How do you get cell type-dependent differential gene expression patterns?
00:23:20.16 How do you turn on the right genes to make fat
00:23:22.27 versus keeping the muscle cell gene functions turned off, and vice versa?
00:23:29.06 So that is a fundamental question of trying to understand the process of cellular differentiation,
00:23:36.19 cell-specific function, and really, developmental biology.
00:23:41.09 Another set of interesting points to make is that, of the 20,000 to 30,000 genes
00:23:46.18 that a typical metazoan organism encodes, a pretty big chunk of it is devoted
00:23:54.03 to the very machinery that I'm talking about, in other words, the transcription factors.
00:23:59.04 So roughly somewhere between 5 and 10% of the entire coding capacity
00:24:04.02 of genes in a genome is devoted to encoding transcription factors.
00:24:10.11 So this is clearly a very important class of molecules.
00:24:13.02 So that means there are several thousand transcription factors.
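The estimate above can be checked with a quick calculation. This is only a sketch that combines the two ranges quoted in the lecture (20,000 to 30,000 genes, 5 to 10% of them encoding transcription factors); the function and variable names are illustrative, not from the lecture.

```python
# Rough bounds on the number of transcription-factor genes, using only
# the figures quoted in the lecture. Names here are hypothetical.
GENE_COUNT_RANGE = (20_000, 30_000)   # genes in a typical metazoan genome
TF_PERCENT_RANGE = (5, 10)            # percent of genes encoding TFs


def tf_gene_bounds(gene_range, percent_range):
    """Return (low, high) bounds by pairing the low ends and the high ends."""
    low = gene_range[0] * percent_range[0] // 100
    high = gene_range[1] * percent_range[1] // 100
    return low, high


low, high = tf_gene_bounds(GENE_COUNT_RANGE, TF_PERCENT_RANGE)
print(f"Estimated transcription-factor genes: {low:,} to {high:,}")
```

Under these figures the bounds come out at roughly one to three thousand genes, which is the order of magnitude the lecture refers to.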
00:24:16.28 But now if you start thinking about the many, many thousands of cell types and the behavior of different cells,
00:24:23.16 are a few thousand transcription factors, in and of themselves, enough to generate the diversity of function?
00:24:31.14 And this is where we have to start thinking about,
00:24:33.21 how do you create really large numbers of distinct transcriptional networks?
00:24:40.28 And they really are networks, as you'll see in a minute.
00:24:43.15 And one thing that became clear, as we defined what genes look like and what a promoter as a transcriptional unit looks like,
00:24:51.22 is that the only way to create the kind of huge levels of diversity of distinct transcriptional
00:24:58.20 components and patterns, is to do it by combinatorial regulation.
00:25:03.12 And what do I mean by that?
00:25:04.27 So, one way to think about it is that you might only have ten cards,
00:25:09.25 but if you shuffle those ten cards and pick four at a time,
00:25:13.06 you can have many, many combinations.
00:25:15.11 So here's a perfect example of three different cell types, could be in the same organism,
00:25:20.11 and each of those symbols represents binding sites,
00:25:25.22 and then the little boxes and triangles above them represent the binding proteins.
00:25:32.18 And you can see that those three cell types might express these sets of genes in similar ways,
00:25:38.25 but they use different combinations of proteins to do it.
00:25:42.08 And this is really the notion of combinatorial mechanisms for gene regulation,
00:25:46.27 and we now know that that is indeed the way, at least in part,
00:25:51.05 that gives us the ability to create many different specific transcription patterns.
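The card analogy used above (ten cards, four picked at a time) can be made concrete with a short calculation. This is a minimal sketch that treats a regulatory "hand" as an unordered set of distinct factors, which is a simplifying assumption; real promoters also differ in site order, spacing, and copy number. The function name is hypothetical.

```python
from math import comb


def count_factor_combinations(n_factors, factors_per_gene):
    """Number of unordered sets of `factors_per_gene` distinct factors
    that can be drawn from a pool of `n_factors` (binomial coefficient)."""
    return comb(n_factors, factors_per_gene)


# The lecture's example: ten cards, four picked at a time.
print(count_factor_combinations(10, 4))  # 210

# Illustrative only: how the count grows as the pool gets larger.
for pool_size in (10, 100, 1000):
    print(pool_size, count_factor_combinations(pool_size, 4))
```

Even with only four factors per gene, the number of possible combinations grows much faster than the size of the factor pool, which is the point of the combinatorial argument.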
00:26:00.04 I have to now also tell you about another, I would say, defining,
00:26:04.23 unusual property of transcription in animal cells,
00:26:09.06 and this is a hard one sometimes to get your head around.
00:26:12.19 And that is that these different little units of DNA that specify the activity of a gene
00:26:18.20 don't have to be sitting, linearly and spatially, directly next to the gene that it's activating or repressing.
00:26:26.21 They can sit tens of thousands of base pairs away from the site.
00:26:32.04 So these we call long-distance enhancers or silencers, so they can either upregulate or downregulate a gene,
00:26:39.01 in other words, make more of the gene or less of the gene.
00:26:41.27 And the thing that was so surprising was that the intervening DNA can be very, very long:
00:26:48.04 it can be thousands and maybe even millions of base pairs.
00:26:53.00 So how does this work?
00:26:53.28 How can something sit so far away actually influence transcription at a very remote site?
00:27:00.26 And this is one of the big conundrums that we still face in the field.
00:27:05.15 We have some models and we have some ideas that we can test,
00:27:08.09 and I'll end my lecture with a few speculations about that.
00:27:11.24 But clearly, we don't fully understand this so-called long-distance regulation,
00:27:16.27 which clearly is regulated by activators and repressors just like
00:27:21.09 the same players that we've been talking about, like the Sp1 molecule and other activators.
00:27:26.12 But yet, how they can reach across long distances of the chromosome to grab on to the core machinery to actually impart information
00:27:36.06 and to create the kind of specific regulatory events is still somewhat obscure.
00:27:43.28 So, another thing that I should say is that,
00:27:47.07 because the combinatorial mechanism of generating diversity was so dependent
00:27:54.18 on the distinct sets of sequence-specific DNA-binding proteins,
00:28:00.12 over the last two decades we've come to kind of a traditional model that the core machinery stays relatively invariant.
00:28:10.04 In fact, we kind of think of it as universal, because if you break open a nucleus of a very
00:28:15.11 simple organism like yeast, or you break open the nucleus of a human cell,
00:28:21.22 that machinery looks remarkably similar to each other.
00:28:24.26 And yet, their gene networks are very, very different, so we thought,
00:28:29.05 well, maybe it's all having to do with the sequence-specific DNA-binding proteins,
00:28:34.07 that will generate the diversity through combinatorial regulation.
00:28:38.25 And that's probably true; in fact, there's a lot of evidence to support that.
00:28:42.27 But it was only part of the story.
00:28:45.04 So, a kind of related question would be:
00:28:47.25 Are we really right in thinking that the core machinery is universal and invariant?
00:28:54.03 And that turns out to be an oversimplification.
00:28:57.06 So it turns out evolution didn't work that way.
00:29:02.26 And when we looked very carefully in the last few years, particularly at individual,
00:29:07.02 different, distinct cell types, let's say muscle versus fat, or neuron, or liver cell,
00:29:13.01 we certainly see differences in the activators, as we would expect, and indeed they are working in combinatorial fashion,
00:29:20.02 but they're not only working combinatorially with each other,
00:29:23.06 but they are combining in different combinations with the core machinery, which is itself variable.
00:29:29.23 And that was kind of a revelation that's really become more clear just in the last few years.
00:29:35.29 So, in addition to the sequence-specific binding proteins and their diversity,
00:29:41.10 there turns out to be a much greater degree of diversity in the core machinery,
00:29:46.09 the parts that we thought were invariant, than we ever imagined.
00:29:50.11 Now, once you realize that that's the case,
00:29:54.07 that opens up a whole other level of generating diversity that we didn't anticipate,
00:29:59.13 and that of course really allows multicellular organisms to diversify in unbelievable ways.
00:30:06.18 So, let's drill down finally a little bit at how did we find this out, and where are we going?
00:30:13.08 So now, unlike a few decades ago when we first began to study the process of transcription
00:30:20.02 and discovered all of this initial complexity, in those days we mainly worked on just a few different cell types.
00:30:28.13 But today, we have the ability technically to work with just about any cell type,
00:30:33.18 from the most complex, such as embryonic stem cells,
00:30:37.13 to perhaps the simple cell, like the skeletal muscle, and everything in between:
00:30:41.18 liver cells, neuronal cells, and so forth.
00:30:45.03 And this has really opened up our view of just how diverse, interesting,
00:30:51.17 and variable the transcriptional apparatus is, that is probably really necessary
00:30:56.18 from an evolutionary standpoint to drive the diversity of gene expression and cell types that we see.
00:31:04.10 The first hint that this core machinery that we thought was so invariant may not be so invariant,
00:31:10.14 came from studying the development of the skeletal muscle.
00:31:14.21 So when you go from a precursor cell called a myoblast, which looks like most every other mammalian cell,
00:31:21.13 with its standard, prototypic core machinery, and then when you look at it when that cell type differentiates, in other words,
00:31:29.16 specializes into a myotube (which will ultimately form skeletal muscle, which is the muscle around your large bones
00:31:37.03 that makes you be able to move),
00:31:39.16 it turns out that it not only shifts which transcriptional activators it uses,
00:31:45.05 but it also jettisons the prototypic core machinery and substitutes it with some modified versions of that core machinery,
00:31:53.21 which is shown down here in the purple and the bright blue.
00:31:57.29 So this was really a change in the paradigm of the way we're thinking about regulation,
00:32:03.08 and of course, this was just the first example.
00:32:07.08 One wanted to know if similar things were happening in other different cell types, and very quickly,
00:32:13.04 if you look at hepatocytes or liver cells, if you look at adipocytes or fat cells,
00:32:18.21 if you look at neuronal cells, and you compare what's going on in muscle,
00:32:23.07 in every case, one can find changes in the core machinery, either because a particular component
00:32:28.28 like one of the TBP-associated factors is highly upregulated (that means its concentration went way up,
00:32:35.15 when all the other ones went down), or some other permutation.
00:32:39.09 In other words, clearly, components of the so-called core machinery were variable from cell type to cell type,
00:32:46.20 and that really changed the way we thought about how regulation of multicellular organisms works.
00:32:54.28 At the same time that we were looking at these,
00:32:57.17 what we would call mature, terminally differentiated cell types,
00:33:01.20 we were also looking at perhaps one of the most interesting cell types that we could study,
00:33:06.18 particularly if we're interested in understanding the process of mammalian development,
00:33:11.23 and those are of course the embryonic stem cells.
00:33:14.14 These are those amazing cells that, when tickled with just the right chemicals or physiological signals,
00:33:21.08 can turn themselves into every cell type of an organism, maybe 10,000 different cell types.
00:33:31.06 So, this so-called pluripotency made these human and mouse embryonic stem cells very special for all kinds of reasons,
00:33:41.05 partly because they are amazing models to study this process of development and differentiation,
00:33:46.18 but partly because of biomedical possibilities for cell regeneration and therapeutics.
00:33:57.02 So we've studied this, and these are very, very new studies,
00:34:01.27 and I'll just very quickly touch on it. We really were curious,
00:34:05.23 how can these cells be so pluripotent?
00:34:09.08 That is, their capacity to turn into every other cell type seems so amazing, what is the mechanism,
00:34:15.07 what's the machinery that's going to allow these cells to be able to differentiate into every cell type in the body?
00:34:22.16 And so, we began to probe this.
00:34:24.27 In some cases, we did it by the genetic technology, which is we made
00:34:29.08 mutations in certain candidate regulatory factors and transcription factors,
00:34:34.10 and then asked, does that have a consequence on the development of different cell types?
00:34:41.10 In other cases, we used a standard biochemical complementation technology to figure out what's going on.
00:34:47.21 So, I'll finish with two quick stories.
00:34:51.06 So, using the genetic tools of knocking genes out and asking
00:34:55.02 what effect it has on differentiation and pluripotency, we discovered that
00:35:01.15 a component of the core machinery (or at least we used to think of it as being purely of the core machinery),
00:35:07.07 that is, one of the TBP-associated factors, particularly TAF3,
00:35:11.26 turns out to be extremely important for the regulation and
00:35:15.23 expression of genes that will ultimately define the so-called endoderm.
00:35:23.17 And that's true for both the so-called primitive endoderm and the definitive endoderm,
00:35:28.04 which ultimately will give rise to the placenta, the yolk sac, lungs, liver,
00:35:32.12 pancreas, intestines, and so forth.
00:35:34.17 At the same time, knocking out this TAF3 had the opposite effect on the other two major germ layers,
00:35:42.09 which are the mesoderm and the ectoderm.
00:35:44.13 So here was a really beautiful case of differential function of a transcription factor
00:35:50.28 that was not a standard sequence-specific binding protein.
00:35:55.09 This core machinery factor, which by the way, probably on its own doesn't even bind to DNA directly,
00:36:01.20 when you knock it out, you lose the ability to form endoderm,
00:36:05.22 but you elevate the probabilities of forming mesoderm and ectoderm.
00:36:09.18 In other words, the balance between these different cell types gets messed up,
00:36:13.20 and of course this will cause major difficulties for a developing embryo.
00:36:21.16 Even more interesting and intriguing, and this really goes to show the level of information that we still lack,
00:36:28.15 although TAF3 was originally defined both genetically
00:36:32.04 and biochemically as part of the TFIID core promoter recognition complex,
00:36:37.02 and it is absolutely true that that is the case,
00:36:40.04 it had another life that it led that we didn't know about.
00:36:43.17 So TAF3, it turns out, it doesn't have to strictly function as part of this large multi-subunit core promoter complex,
00:36:52.17 but it can also do other jobs, and in this case, it pairs up, or partners up,
00:36:57.06 with a different transcription factor called CTCF (doesn't really matter what the name is)
00:37:03.02 and now it does its job in a completely different way.
00:37:06.16 And in fact, the most recent experiments suggest that TAF3 and CTCF get together
00:37:12.03 to partly allow that amazing property of long-distance regulation.
00:37:17.25 So, regulators bound at thousands of base pairs away from the site of activity
00:37:24.21 can be brought together in three-dimensional space by what's known as "DNA looping,"
00:37:31.04 and it turns out that TAF3 is involved in this DNA looping, together with a whole bunch of other proteins,
00:37:38.00 whose relationship to TAF3 is still not entirely clear.
00:37:45.02 And we find it particularly intriguing and exciting that this type of long-distance function is being
00:37:50.06 carried by a TAF and in the context of embryonic stem cell differentiation potential to form endoderm.
00:37:57.19 So this is a very, very new type of way of thinking about the core transcription factors.
00:38:06.11 Likewise, when we looked at the embryonic stem cell transcriptional circuitry and asked,
00:38:13.20 what other transcriptional co-regulators, or regulators and co-factors,
00:38:17.18 are necessary to allow this so-called pluripotency program?
00:38:23.01 This amazing ability of these cells to be able to differentiate into every other cell type,
00:38:27.03 how does that happen? What is allowing that to happen
00:38:30.23 in this particular cell type, and not in other cell types?
00:38:33.23 And again, using the biochemical complementation technology,
00:38:38.00 we recently were able to identify a new co-factor complex, again a multi-subunit complex,
00:38:45.15 called the SCC, or "stem cell co-factor."
00:38:49.25 And remarkably, this SCC-B turns out to be a well-known protein that again had a different lifestyle in other cell types.
00:38:59.18 It's a protein complex that had previously been described as XPC,
00:39:03.29 which stands for "Xeroderma pigmentosum, complex C,"
00:39:08.22 which means that it's involved in DNA repair.
00:39:11.07 So up until now, we thought XPC was only functioning as a DNA repair complex,
00:39:16.14 and now we know that it's doing something quite different,
00:39:19.12 but only in the context of ES cells, which is to form a co-factor complex that will potentiate
00:39:25.27 the activity of two critical transcriptional activators, Oct4 and Sox2,
00:39:31.21 which define the pluripotent, self-renewing state of ES cells.
00:39:36.20 So these are just two examples of sort of what we're learning about,
00:39:41.21 the continuing saga of how transcriptional machinery evolved and works in animal cells.
00:39:50.26 And I'll finish with this last model slide,
00:39:54.00 which just simply reiterates what I just said:
00:39:57.00 We have to keep in mind that, in generating large sets of combinatorial, specific gene networks,
00:40:07.07 we have to use the diversity not only of sequence-specific DNA-binding proteins,
00:40:11.29 but we more and more see examples that components of the previously thought to be invariant core machinery
00:40:19.14 are an integral part of diversifying the combinatorial regulation of gene expression.
00:40:26.18 And this of course opens up many new possibilities,
00:40:30.08 and I suspect that there are many question marks yet about what exactly each of these components
00:40:36.27 is doing to drive complex regulation that gives rise to complexity like human beings,
00:40:44.12 the human brain, all the physiology that goes on.
00:40:48.03 And of course, as we understand these mechanisms in greater detail,
00:40:51.23 I think we have a much better chance of tackling the problems of human disease and diseases of other organisms.
00:40:59.10 Because ultimately, we have to understand the molecular basis of disease,
00:41:03.21 and I think a big part of that is understanding the mechanisms of gene regulation.


Post-Transcriptional Control of Gene Expression

RNA is transcribed, but must be processed into a mature form before translation can begin. This processing after an RNA molecule has been transcribed, but before it is translated into a protein, is called post-transcriptional modification. As with the epigenetic and transcriptional stages of processing, this post-transcriptional step can also be regulated to control gene expression in the cell. If the RNA is not processed, shuttled, or translated, then no protein will be synthesized.

RNA splicing, the first stage of post-transcriptional control

In eukaryotic cells, the RNA transcript often contains regions, called introns, that are removed prior to translation. The regions of RNA that code for protein are called exons (Figure 1). After an RNA molecule has been transcribed, but prior to its departure from the nucleus to be translated, the RNA is processed and the introns are removed by splicing.

Figure 1. Pre-mRNA can be alternatively spliced to create different proteins.

Alternative RNA Splicing

Figure 2. There are five basic modes of alternative splicing.

In the 1970s, genes were first observed that exhibited alternative RNA splicing. Alternative RNA splicing is a mechanism that allows different protein products to be produced from one gene when different combinations of introns, and sometimes exons, are removed from the transcript (Figure 2). This alternative splicing can be haphazard, but more often it is controlled and acts as a mechanism of gene regulation, with the frequency of different splicing alternatives controlled by the cell as a way to control the production of different protein products in different cells or at different stages of development. Alternative splicing is now understood to be a common mechanism of gene regulation in eukaryotes; according to one estimate, 70 percent of genes in humans are expressed as multiple proteins through alternative splicing.
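As a rough illustration of how one gene can yield several products, the following sketch enumerates the transcripts that result when some exons of a hypothetical pre-mRNA can each be either included or skipped. The exon names, the function name `exon_skipping_isoforms`, and the assumption that each optional exon is skipped independently are illustrative only; real splicing choices are regulated and often interdependent.

```python
from itertools import product

def exon_skipping_isoforms(exons, optional):
    """Return every exon combination when each exon in `optional` may be skipped.

    `exons` is the ordered exon list of a hypothetical pre-mRNA. Exons not
    listed in `optional` are always kept, and exon order is preserved.
    """
    optional = set(optional)
    # Each optional exon contributes two choices (keep/skip); others only "keep".
    choices = [(True, False) if e in optional else (True,) for e in exons]
    return [
        tuple(e for e, keep in zip(exons, keeps) if keep)
        for keeps in product(*choices)
    ]

# Hypothetical gene with four exons, two of which can be skipped.
isoforms = exon_skipping_isoforms(["E1", "E2", "E3", "E4"], optional=["E2", "E3"])
for iso in isoforms:
    print("-".join(iso))
```

Under these assumptions, n optional exons give 2^n possible transcripts, which is one way to see why a modest number of genes can encode a much larger number of proteins.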

How could alternative splicing evolve? Introns have a beginning and ending recognition sequence; it is easy to imagine the failure of the splicing mechanism to identify the end of an intron and instead find the end of the next intron, thus removing two introns and the intervening exon. In fact, there are mechanisms in place to prevent such intron skipping, but mutations are likely to lead to their failure. Such “mistakes” would more than likely produce a nonfunctional protein. Indeed, the cause of many genetic diseases is abnormal splicing rather than mutations in a coding sequence. However, alternative splicing would create a protein variant without the loss of the original protein, opening up possibilities for adaptation of the new variant to new functions. Gene duplication has played an important role in the evolution of new functions in a similar way by providing genes that may evolve without eliminating the original, functional protein.

Control of RNA Stability

Before the mRNA leaves the nucleus, it is given two protective “caps” that prevent the ends of the strand from degrading during its journey. The 5′ cap, which is placed on the 5′ end of the mRNA, is usually composed of a methylated guanosine triphosphate molecule (GTP). The poly-A tail, which is attached to the 3′ end, is usually composed of a series of adenine nucleotides. Once the RNA is transported to the cytoplasm, the length of time that the RNA resides there can be controlled. Each RNA molecule has a defined lifespan and decays at a specific rate. This rate of decay can influence how much protein is in the cell. If the decay rate is increased, the RNA will not exist in the cytoplasm as long, shortening the time for translation to occur. Conversely, if the rate of decay is decreased, the RNA molecule will reside in the cytoplasm longer and more protein can be translated. This rate of decay is referred to as the RNA stability. If the RNA is stable, it will be detected for longer periods of time in the cytoplasm.
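The link between decay rate and mRNA abundance can be made concrete with a minimal sketch. It assumes simple first-order (exponential) decay and a constant transcription rate, a common textbook simplification that is not stated in the text above; the function names and numbers are hypothetical.

```python
import math

def half_life(decay_rate):
    """Time for half of the mRNA to decay, assuming first-order decay."""
    return math.log(2) / decay_rate

def mrna_remaining(initial, decay_rate, t):
    """mRNA left after time t if no new transcripts are made."""
    return initial * math.exp(-decay_rate * t)

def steady_state_mrna(transcription_rate, decay_rate):
    """Level at which synthesis balances decay (d[mRNA]/dt = 0)."""
    return transcription_rate / decay_rate

# Hypothetical transcripts made at the same rate but with different stability.
stable = steady_state_mrna(transcription_rate=10.0, decay_rate=0.5)
unstable = steady_state_mrna(transcription_rate=10.0, decay_rate=2.0)
print(stable, unstable)
```

In this simplified picture, lowering the decay rate fourfold raises the steady-state mRNA level fourfold without any change in transcription, which is why proteins that alter RNA stability can act as regulators of gene expression.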

Binding of proteins to the RNA can influence its stability. Proteins, called RNA-binding proteins, or RBPs, can bind to the regions of the RNA just upstream or downstream of the protein-coding region. These regions in the RNA that are not translated into protein are called the untranslated regions, or UTRs. They are not introns (those have been removed in the nucleus). Rather, these are regions that regulate mRNA localization, stability, and protein translation. The region just before the protein-coding region is called the 5′ UTR, whereas the region after the coding region is called the 3′ UTR (Figure 3). The binding of RBPs to these regions can increase or decrease the stability of an RNA molecule, depending on the specific RBP that binds.

Figure 3. The protein-coding region of mRNA is flanked by 5′ and 3′ untranslated regions (UTRs). The presence of RNA-binding proteins at the 5′ or 3′ UTR influences the stability of the RNA molecule.

Nodding on and off: transcription factor cis-elements that regulate nitrate-dependent gene expression for root nodule symbiosis

Kevin L Cox, Jr., Nodding on and off: transcription factor cis-elements that regulate nitrate-dependent gene expression for root nodule symbiosis, The Plant Cell, 2021, koab108,

Nitrogen is a crucial plant macronutrient, and yet often is in short supply in soils in a form, such as nitrate or ammonium, that plants can assimilate. Plants have developed a variety of mechanisms for enhancing nitrogen acquisition. One of the most fascinating and effective solutions is found in legumes, which develop a symbiotic relationship with nitrogen-fixing bacteria, housed inside specialized organs called root nodules, that allows the plants to grow in nitrogen-deficient soils. However, nodule development in nitrogen-sufficient soils can be harmful for plant growth, as carbon is needlessly diverted to nodule growth. Therefore, legumes need to maintain a balance between obtaining nitrogen and limiting the loss of energy during root nodule symbiosis. However, the molecular mechanisms that determine how legumes regulate nodulation in varying nitrate concentrations remain unclear. In this issue, Nishida et al. (2021) show that the cis-elements bound by two transcription factors (TFs) regulate nitrate-dependent gene expression to turn on and off nodulation in Lotus japonicus roots.

NODULE INCEPTION (NIN)-LIKE PROTEIN (NLP) TFs serve as important regulators of nitrate-inducible gene expression in plants (Konishi and Yanagisawa, 2013). A previous study used a forward genetic screen to identify an NLP in L. japonicus, LjNLP4, as a negative regulator of nodulation in the presence of nitrate (Nishida et al., 2018). In this study, the authors identified another mutant that continued to form nodules in the presence of high nitrate. Genome sequencing mapped the mutation to LjNLP1, an NLP TF related to LjNLP4. Real-time RT-PCR and transcriptome (RNA-seq) analyses demonstrated the genetic requirement of LjNLP4 and LjNLP1 for nitrate-inducible gene expression. NITRITE REDUCTASE 1 (LjNIR1) and CLAVATA3/ESR-related (CLE)-ROOT SIGNAL 2 (CLE-RS2) had reduced expression after nitrate treatment in Ljnlp4 and Ljnlp1 loss-of-function mutant plants compared with wild-type. Notably, the expression of these two nitrate-inducible genes was more greatly attenuated in the Ljnlp4/Ljnlp1 double mutant than in either single mutant alone. This suggests that LjNLP4 and LjNLP1 have an overlapping function in mediating nitrate-induced gene expression in roots.

To understand how LjNLP4 transcriptionally regulates genes involved in nitrate control of nodulation, the authors combined their RNA-seq data with DNA affinity purification (DAP)-seq to identify LjNLP4 DNA-binding sites in the genome. The analysis identified two conserved DNA sequence motifs that contained semi-palindromic structures. Interestingly, analysis of previously published chromatin immunoprecipitation (ChIP)-seq data revealed that LjNIN, an essential TF that targets genes with positive roles in nodulation, can also bind to these LjNLP4 target cis-elements. Further combinational analyses of the authors’ RNA-seq and the published ChIP-seq data of LjNIN revealed a key finding: LjNLP4 did not bind to the cis-elements of some LjNIN-target genes whose expression was upregulated by rhizobia, but downregulated by nitrate, such as LjNF-YB. Using an electrophoretic mobility shift assay (EMSA), the authors examined the similarities and differences of the cis-elements bound by LjNLP4 and LjNIN. Both LjNLP4 and LjNIN bound to cis-elements in the LjNIR1 and CLE-RS2 promoters, whereas only LjNIN bound to the cis-elements in the LjNF-YB promoter. They found that one of the two motifs in the promoter of LjNF-YB has a less perfect palindromic structure, which prevents LjNLP4 from binding to the LjNF-YB promoter and transcriptionally activating it.
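The EMSA outcomes summarized above can be kept straight with a small lookup table. This is only a bookkeeping sketch of the qualitative results as described in this summary; the dictionary layout and function names are hypothetical, and it does not model binding affinities or the underlying data.

```python
# Binding outcomes as described in the summary above (True = binding reported).
EMSA_BINDING = {
    "LjNIR1": {"LjNLP4": True, "LjNIN": True},
    "CLE-RS2": {"LjNLP4": True, "LjNIN": True},
    "LjNF-YB": {"LjNLP4": False, "LjNIN": True},
}

def bound_by(promoter):
    """Return the TFs reported to bind cis-elements in the given promoter."""
    return sorted(tf for tf, binds in EMSA_BINDING[promoter].items() if binds)

def nin_specific_targets():
    """Promoters bound by LjNIN but not by LjNLP4."""
    return [p for p, b in EMSA_BINDING.items() if b["LjNIN"] and not b["LjNLP4"]]

print(bound_by("LjNIR1"))
print(nin_specific_targets())
```

Laid out this way, LjNF-YB stands out as the one promoter in the set that LjNIN can bind and LjNLP4 cannot, which is the distinction the less perfect palindromic motif is proposed to explain.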

Given that TFs form dimers when they bind to cis-elements with palindromic structures, the authors hypothesized that LjNLP4 and LjNIN might dimerize. Size exclusion chromatography coupled with multi-angle scattering analysis demonstrated that LjNLP4 and LjNIN act as homodimers to induce gene expression. The authors performed additional EMSAs to investigate how DNA-binding by LjNLP4 or LjNIN could be influenced by the presence of the other protein. These experiments revealed that LjNIN preferred to form heterodimers with LjNLP4 rather than homodimers with itself. Additionally, the LjNLP4 homodimer occupies the LjNIN-binding sites, reducing the ability of LjNIN to bind to its target genes. Given that LjNLP4 and LjNIN are negative and positive regulators of nitrate-induced effects on nodulation, respectively, the authors predicted that the LjNLP4–LjNIN heterodimer may play an important role in inducing and repressing symbiotic gene expression. Indeed, transactivation assays in L. japonicus protoplasts showed that LjNLP4 interfered with the induction of genes mediated by LjNIN and that the differences in cis-elements targeted by these two TFs affect nitrate-dependent gene expression (see Figure 1).

Figure 1. Model for regulation of root nodule symbiosis gene expression in L. japonicus. A proposed model of how LjNLP4 (purple ellipses) and LjNIN (yellow rectangles) regulate gene expression for nodulation in a nitrate-dependent manner. In the absence of nitrate (upper panel), LjNIN homodimers bind to cis-elements of positive regulators for nodulation, outcompeting the negative regulators and promoting nodulation. In the presence of nitrate (lower panel), LjNIN forms heterodimers with LjNLP4. LjNLP4 homodimers and LjNLP4–LjNIN heterodimers can only bind to the most conserved motif (orange arrow), outcompeting the positive regulators and ultimately inhibiting nodulation. Adapted from Nishida et al. (2021, Figure 9).

This study advances our understanding of how legumes regulate nodulation. Through the use of TFs, plants can “nod on and off” depending on the concentration of nitrate in the environment.


Pioneer factors are among the master regulators of cell fate. They function by initiating chromatin targeting events on nucleosomal DNA, typically in low signal chromatin regions where the presence of linker histones represses transcription. The local exposure of chromatin brought about by pioneer factor binding allows other, non-pioneer transcription factors to access nucleosomal DNA, which in turn drives lineage-specific gene expression and selection of cell fate. The ability of pioneer factors to target silent genes and allow other factors to bind provides a mechanistic explanation for the long-standing phenomenon of developmental competence, in which a tissue gains the potential to execute a cell fate decision. However, pioneer factors do not occupy all in silico target sites in the genome; they are actively excluded from heterochromatic domains spanned by H3K9me2/3 (type R chromatin) (Lupien et al., 2008; Soufi et al., 2012), among others. By presenting a barrier to factor binding, heterochromatic, repressive domains provide a means for cells to stably retain their fate (Becker et al., 2016). We suggest that one reason cell conversions are typically inefficient and fail to shut off their initial genetic program (Cahan et al., 2014) may be the inefficiency of reprogramming factors in engaging with heterochromatic domains that span genes whose expression is required for the desired cell type. It is possible to alter chromatin state broadly by applying small molecules to target chromatin-modifying enzymes, but such changes will occur globally throughout the genome. We speculate that understanding how cell type-specific heterochromatic domains are established and how pioneer factors can overcome such barriers during development will provide more targeted ways to manipulate cell fate in health and disease.
More broadly, further work in the field should be aimed at understanding how different pioneer factors target silent, low signal-state chromatin and how heterochromatic features at highly repressed chromatin might block pioneer factor binding. These detailed mechanistic insights will pave the way for the future ability to program and reprogram cell fates at will.

Key points

TGFβ released and activated by malignant and non-cancer cells within the tumour microenvironment (TME) promotes cancer progression through highly regulated and differential effects on multiple cell types.

Enhanced TGFβ signalling promotes cancer cell invasion, dissemination and stem cell properties, and suppresses the sensitivity to anticancer drugs.

TGFβ shapes the TME through effects on cancer-associated fibroblasts, endothelial cells and pericytes, tumour architecture and suppression of protective immune cell functions.

TGFβ signalling in the TME represses the antitumour functions of various immune cell populations, including T cells and natural killer cells; the resulting immune suppression severely limits the efficacy of immune-checkpoint inhibitors and other immunotherapeutic approaches.

Inhibition of TGFβ signalling is currently being evaluated in multiple clinical trials as a major avenue to enhance the efficacy of cancer immunotherapies, but systemic adverse effects and the therapeutic index need careful consideration.

Diverse approaches towards more selective inhibition of TGFβ activation or signalling are being actively pursued in order to enhance effectiveness and reduce the toxicity of a diverse array of cancer immunotherapies.

This work was supported in part by the National Institutes of Health (grants R35GM131819 and U54-GM114838 to SS), the CompGen Initiative at UIUC (CompGen fellowship to SG), and the Mayo Clinic Center for Biomedical Discovery (Discovery Science Award to SMO). The funding agencies did not play any role in the design of the study; the collection, analysis, and interpretation of the data; or the writing of the manuscript.


Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA

Department of Genetics, Stanford University, Stanford, USA

Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Gonda 19-476, 200 First St SW, Rochester, MN, 55905, USA

Remington E. Schmidt, Kelly J. Bouchonville & Steven M. Offer

Department of Computer Science, Carl R. Woese Institute of Genomic Biology, and Cancer Center of Illinois, University of Illinois at Urbana-Champaign, 2122, Siebel Center, 201 N. Goodwin Ave., Urbana, IL, 61801, USA
