
How often do retrotransposons jump in human cells?


If I followed the life of a typical cell in my body, how many retrotransposon jumps would I observe per day? I'm only interested in a rough order-of-magnitude estimate: should I think of it as a common event that happens basically all the time in most of my cells, or is it a rare event that happens only occasionally in my body?


For a rough order-of-magnitude estimate, try this reference. Their experimental design seems to maximize transpositions, so it should give a reasonable upper bound.

There is a lot of variability: after about 1 month in HeLa cells, some retrotransposons had jumped in ~1000 cells per million, while others had jumped in only a few cells per million and were difficult to detect.

The same reference gives an estimate of 10,000 retrotransposon elements in a mammalian genome, though, again, their transposition frequencies vary considerably.
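To turn the numbers above into a per-cell, per-day figure, here is a minimal back-of-the-envelope sketch. It assumes that "about 1 month" means 30 days, that "a few per million" means 3 per million, and that each cell counted had exactly one jump; none of those values come from the reference itself. It also ignores cell division during the month, so treat it as a rough illustration for one assayed element in HeLa cells, not a measurement.

```python
# Rough order-of-magnitude estimate from the assay numbers quoted above.
# ASSUMPTIONS (not from the reference): 30 days per month, "a few" = 3,
# one jump per positive cell, no correction for cell division.

DAYS = 30  # "about 1 month"


def jumps_per_cell_per_day(cells_with_jump_per_million: float, days: float = DAYS) -> float:
    """Convert 'cells with a jump per million cells' into jumps per cell per day."""
    fraction_of_cells = cells_with_jump_per_million / 1_000_000
    return fraction_of_cells / days


high = jumps_per_cell_per_day(1000)  # most active elements in the assay
low = jumps_per_cell_per_day(3)      # barely detectable elements ("a few" assumed to be 3)

print(f"high-activity element: {high:.1e} jumps per cell per day")
print(f"low-activity element:  {low:.1e} jumps per cell per day")
print(f"days between jumps in one cell (high-activity case): {1 / high:,.0f}")
```

Under these assumptions, even the most active element jumps on the order of once per tens of thousands of cell-days, which points toward "rare in any one cell" rather than "happening all the time".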


Super-effective 'Jumping Gene' Created

Johns Hopkins scientists have transformed a common "jumping gene" found in the human genome into one that moves hundreds of times more often than normal in mouse and human cells.

Writing in the May 20 issue of Nature, the scientists say their artificial jumping gene sets the stage for creating mice that lack -- at random -- at least one gene, without having to know in advance which gene is being "knocked-out." Such random knock-outs have been critical in studying genetics of other critters and will help shed light on jumping genes' effects -- past and present -- in human health and disease, say the researchers.

"Making this synthetic jumping gene was the home-run experiment we never thought was going to work," says Jef Boeke, Ph.D., professor of molecular biology and genetics and director of the High Throughput Biology Center in Hopkins' Institute for Basic Biomedical Sciences.

Jumping genes, aka retrotransposons, are bits of genetic material that copy themselves and move around in creatures' genomes. They have the potential to disrupt the genes they "land" in and are thought to contribute to the gradual -- and perhaps the occasional major -- genetic shifts that drive evolution. While organisms like yeast have just a few dozen jumping genes in their genomes, mammals' genomes contain hundreds of thousands of copies of their jumping genes' DNA, making it difficult to know where or when -- or even if -- a jump has happened.

In a second paper in the same issue of Nature, M.D./Ph.D. candidate Jeffrey Han and Boeke report that the human jumping gene is relatively lethargic because its instructions are hard for cells to read. By replacing some of the gene's instructions with alternatives cells prefer, the researchers made the first highly active, artificial jumping gene that is potentially efficient enough to use in mice.

By inserting the artificial jumping gene into cells of a mouse embryo, scientists should be able to develop mice in which random genes are either silenced completely or simply "quieted" by the jumping gene's intrusion. Studying these mice should reveal the function and identity of the disrupted gene -- in that order.

"The ability to study genetics 'backward' -- to disrupt a random gene, determine its role in the animal and then identify it -- has been crucial in understanding genes in fruit flies and yeast, and now we should be able to do it fairly efficiently in mice," says Boeke, whose lab is already starting to develop the necessary technology.

Based on his research, Boeke suggests that the genes' DNA is much more than just "junk" in our genome. Instead, he proposes that, long ago, jumping genes' multiple insertions likely played a major role in establishing the evolutionary shifts that now distinguish mice from men, and that even now their DNA affects how other genes are used.

Both mice and men share jumping genes known as L1 retrotransposons, some of which are "young" and still active, but most of which are "rusting hulks" that account for more than 30 percent of our respective DNA. But even the "young" ones aren't active enough to efficiently introduce genetic changes in mice that can be passed from generation to generation.

The reason, the researchers report, is that the human jumping gene's instructions consist of too much of one DNA building block, and not enough of the three others, an imbalance that bogs down the cell's machinery as it tries to transcribe the DNA into RNA. Instead of plodding through, the machinery just gives up.

To "improve" these genetic instructions, Han took advantage of the fact that multiple sets of three DNA building blocks call for the same protein building block, or amino acid. By swapping most of the gene's original DNA "triplets" with alternatives the cell prefers, Han balanced the gene's building blocks without changing its protein-making instructions. The researchers have filed a provisional patent on the artificial gene.
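The idea of swapping DNA triplets without changing the protein can be sketched in a few lines. The sketch below is an illustration only, not the researchers' procedure: it uses the standard genetic code, a made-up 18-base sequence, and a simple rule (pick the synonymous triplet with the fewest copies of one over-represented base) in place of the "alternatives the cell prefers" used in the actual work. The over-represented base is assumed to be A for the example; the article does not name it.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding sequence (length divisible by 3) into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str, overrepresented: str = "A") -> str:
    """Replace each triplet with the synonymous triplet that contains the fewest
    copies of the over-represented base (ties broken alphabetically).
    This selection rule is a hypothetical stand-in for real codon-preference data."""
    recoded = []
    for i in range(0, len(dna), 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        recoded.append(min(synonyms, key=lambda c: (c.count(overrepresented), c)))
    return "".join(recoded)


original = "ATGAAAGAAACAAAATTA"  # made-up, A-rich toy sequence
rebalanced = recode(original)

print(original, translate(original), original.count("A"))
print(rebalanced, translate(rebalanced), rebalanced.count("A"))
```

Both sequences translate to the same six amino acids, while the recoded one contains fewer copies of the chosen base, which is the property the article describes: balanced building blocks, unchanged protein-making instructions.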

Han found that mammalian cells readily used the artificial gene, translating its genetic information into RNA and then into the proteins that help the gene jump. In a standard test of jumping genes' activity, the artificial gene jumped upwards of 200 times more often than natural jumping genes.

The discoveries have led the researchers to propose a new way retrotransposons could contribute to evolutionary changes and to disease, besides disrupting a gene entirely. The researchers' bottom line: The prevalence of hard-to-read jumping gene DNA inside a gene may subtly alter the extent to which it is used by the cell.

"Roughly 70 percent of human genes contain some bit of the natural jumping gene's DNA, and big genes have multiple scraps or even complete retrotransposons," says Boeke. "These bits quite likely reduce the amount of RNA made from these genes, and in some cases even change the gene's message and alter the gene's protein-encoding regions."

To test their model's likelihood, postdoctoral fellow Suzanne Szak, Ph.D., surveyed thousands of the most- and least-used human genes. Genes containing more of the jumping gene's hard-to-read DNA were used substantially less than those with shorter and fewer jumping gene bits, she found.

"The presence of jumping gene DNA in genes represents little experiments in the genome -- if the changes benefit or don't harm the cell or the person, they continue to be passed on from cell to cell or generation to generation," adds Boeke. "If not, they would gradually fade from the genome."

The studies were funded by the National Cancer Institute and the Medical Scientist Training Program at Johns Hopkins. Authors on the artificial jumping gene paper are Han and Boeke. Authors reporting the jumping gene's hard-to-read DNA are Han, Szak and Boeke. Szak is now at Biogen Inc.

Story Source:

Materials provided by Johns Hopkins Medical Institutions. Note: Content may be edited for style and length.


Inhibition of ‘jumping genes’ promotes healthy ageing

Bennett Childs is in the Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA.


Jan van Deursen is in the Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA.


Old 1 and diseased 2 tissues often contain cells that have entered a state called senescence, in which they stop dividing and become resistant to death-inducing pathways. These cells secrete a collection of factors, collectively known as the senescence-associated secretory phenotype (SASP), that have inflammatory, protein-degrading and other biologically active properties, and can impair tissue function. There is therefore interest in targeting the SASP to combat age-related diseases. The composition of the SASP varies, and might change over the lifetime of the senescent cell 3 . However, the molecular drivers involved in this evolution are incompletely understood. Writing in Nature, De Cecco et al. 4 identify a key contributor to the ‘late’ SASP: the reactivation of dormant DNA sequences called retrotransposons.

Read the paper: L1 drives IFN in senescent cells and promotes age-associated inflammation

Retrotransposons are often called ‘jumping genes’, because the messenger RNA transcribed from them can undergo a process called reverse transcription to produce an identical DNA sequence that then reinserts into the genome at a different site. Although retrotransposons comprise about 42% of the human genome, most carry mutations that render them functionally inactive 5 . Transcription of those that remain functional must be prevented by protein- or RNA-based regulatory mechanisms to prevent the jumping of retrotransposons, which can cause either genetic mutations or genomic instability and might lead to cancer 6 . However, retrotransposons can be reactivated during ageing 7 .

De Cecco et al. found that one type of retrotransposon, LINE-1, was highly activated in senescent human cells within 16 weeks after they had stopped dividing — a stage the authors term late senescence. The group showed that high levels of the transcriptional repressor protein RB1 and low levels of the transcriptional activator protein FOXA1 normally keep LINE-1 in check. These proteins are abnormally expressed in late-senescent cells, enabling LINE-1 reactivation (Fig. 1).

Figure 1 | From early to late senescence. Senescent cells have stopped dividing and secrete inflammatory proteins, collectively known as the senescence-associated secretory phenotype (SASP). De Cecco et al. 4 report changes in the SASP over time. a, During early senescence, the expression of DNA sequences called retrotransposons (such as LINE-1) is repressed by low levels of the transcriptional-activator protein FOXA1 and high levels of the transcriptional repressor RB1. The low levels of messenger RNA produced enter the cytoplasm and undergo a process called reverse transcription to produce DNA. This DNA is degraded by the protein TREX1 — as a result, retrotransposons have no effect on the early SASP, which involves the expression and secretion of proteins that include IL-1β. b, In late senescence, RB1 and TREX1 levels decline and FOXA1 levels rise, leading to increased cytoplasmic LINE-1 DNA. The DNA is sensed by a pathway involving the proteins cGAS and STING, leading to transcription of IFN genes that encode the proteins interferon-α and interferon-β. These interferon proteins contribute to the late SASP, and support the expression of early SASP factors.

At this late stage, the SASP is known to include two related inflammatory proteins called interferon-α and interferon-β. These signalling proteins are part of an ancient antiviral mechanism called the cGAS–STING pathway, which is activated by the presence of DNA in the cell cytoplasm. When viral DNA is present in the cellular cytoplasm, the cGAS–STING system triggers the production of interferon proteins and related proteins that together drive an infected cell down a cell-death pathway called apoptosis, preventing the spread of infection. The cGAS–STING pathway has previously been linked to senescence 8 , 9 — cytoplasmic DNA accumulates in senescent cells because they produce abnormally low levels of the DNA-digesting enzyme TREX1 10 . However, the source of the DNA that accumulates in the cytoplasm of senescent cells has not been completely clear.

Because retrotransposons were originally derived from ancient viruses, they can activate cGAS–STING 11 . De Cecco et al. showed that the cytoplasmic DNA in senescent cells is produced, at least in part, by reactivated LINE-1 elements. The authors confirmed that abnormally low levels of TREX1 permit LINE-1-derived DNA to accumulate in the cytoplasm in late senescence. If they blocked LINE-1 transcription using inhibitory RNA molecules, or blocked reverse transcription using the drug lamivudine, the interferon response was not triggered in late senescence. Such LINE-1 inhibition had no effect on the ‘early’ SASP protein IL-1β, or on the cell-cycle arrest associated with senescence, but did cause loss of other SASP factors (including the proteins CCL2, IL-6 and MMP3) late in senescent-cell life. This suggests that the late interferon response is required to sustain the SASP in the long term, but that it is dispensable for the early SASP.
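The chain of events described above can be summarized as a small yes/no model. This is a deliberately crude sketch: protein levels are reduced to "high" or "low", the class and function names are invented for illustration, and the behaviour for combinations the study does not describe (for example, low RB1 with low FOXA1) is an assumption of the sketch rather than a reported result.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SenescentCell:
    rb1_high: bool            # transcriptional repressor of LINE-1
    foxa1_high: bool          # transcriptional activator of LINE-1
    trex1_high: bool          # enzyme that digests cytoplasmic DNA
    line1_rnai: bool = False  # inhibitory RNA against LINE-1 transcripts
    lamivudine: bool = False  # reverse-transcription inhibitor


def line1_reactivated(cell: SenescentCell) -> bool:
    # Assumption: reactivation needs BOTH loss of RB1 and gain of FOXA1.
    return (not cell.rb1_high) and cell.foxa1_high and not cell.line1_rnai


def cytoplasmic_line1_dna(cell: SenescentCell) -> bool:
    # DNA accumulates only if it is made (reverse transcription) and not degraded (TREX1).
    return line1_reactivated(cell) and not cell.lamivudine and not cell.trex1_high


def interferon_response(cell: SenescentCell) -> bool:
    # Cytoplasmic DNA is sensed by cGAS-STING, which switches on interferon genes.
    return cytoplasmic_line1_dna(cell)


early = SenescentCell(rb1_high=True, foxa1_high=False, trex1_high=True)
late = SenescentCell(rb1_high=False, foxa1_high=True, trex1_high=False)

for label, cell in [
    ("early senescence", early),
    ("late senescence", late),
    ("late + lamivudine", replace(late, lamivudine=True)),
    ("late + inhibitory RNA", replace(late, line1_rnai=True)),
]:
    print(f"{label}: interferon response = {interferon_response(cell)}")
```

In this toy model only the untreated late-senescent cell mounts an interferon response, matching the qualitative pattern reported for the two interventions.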

Next, De Cecco et al. showed that retrotransposon transcription promotes the late SASP in vivo in ageing mice. Moreover, by using lamivudine to block the reverse transcription of retrotransposons in mice from 20 to 26 months of age, the authors could prevent the animals from developing several age-related conditions, including degeneration of the blood-filtration system in the kidneys, atrophy of skeletal muscle fibres and hallmarks of chronic inflammation.

In a final set of experiments, the researchers demonstrated that ORF1, a protein encoded by LINE-1 elements, is expressed specifically in senescent cells in aged human skin, but that not all senescent cells express ORF1. Combined with in vivo experiments showing that the expression of LINE-1 peaks later than the expression of other senescence markers in mice, and in vitro data demonstrating that mouse cells can still enter senescence in the presence of lamivudine, these data suggest that LINE-1 reactivation is a consequence, rather than a cause, of senescence.

The implications of this study for human biology are speculative but encouraging. For instance, the importance of the interferon response for killing virus-infected cells raises the possibility that it has a central role in the body’s natural ability to clear senescent cells. However, a more thorough examination of aged or diseased human tissue will be required to determine whether the late-SASP mechanism generally applies to humans.

Senescent cells are rare, even in advanced age 2 , 3 . Nonetheless, eliminating these cells and their SASP prevents age-related declines in health 1 . As a result, strategies for killing or modifying senescent cells (referred to as senolytic and senomorphic approaches, respectively) have received much attention. The first senolytic compounds, which inhibit the anti-apoptosis protein Bcl, are generally accepted as effective 12 . De Cecco and colleagues’ work demonstrates that lamivudine can act as a senomorphic compound.

Because a senomorphic compound would have to be continuously present to suppress the SASP, it might require more-frequent administration than a senolytic compound, which removes senescent cells outright, so that no further treatment is needed until more accumulate. Encouragingly, lamivudine has been used in humans as a long-term antiretroviral therapy without major side effects — unlike other senomorphic compounds such as rapamycin, which blunts the SASP 13 but is a potent immunosuppressant. However, there are as yet no reports that lamivudine improves the healthy human lifespan or any indications that it represses retrotransposons in humans.

One potential risk of senomorphic compounds is the development of cancer, because, in preventing the proliferation of diseased or damaged cells, senescence can have a beneficial, tumour-suppressive role. De Cecco and co-workers’ cell-culture experiments suggest that lamivudine does not disrupt cell-cycle arrest, which is key to this beneficial effect of senescence. However, they monitored lamivudine-treated animals for just six months. Longer-term in vivo follow-up is required to prove that this therapy would not increase the risk of cancer. If this can be confirmed, the current study could open the door to the use of reverse-transcription inhibitors and, perhaps, inhibitors of the cGAS-STING pathway, as a way of combating diseases such as osteoarthritis and atherosclerosis, which have been linked to the accumulation of senescent cells.


B. Composite Transposons: Tn Elements

If a pair of IS elements should lie close to each other, separated by a short stretch of genomic or plasmid DNA, they can transpose together, carrying the DNA between them as part of a composite transposon, or Tn element. If some of the DNA between IS elements in a Tn element contains antibiotic resistance genes, its transposition can carry and spread these genes to other DNA in the cell. Tn elements (like IS elements) are present in low copy number. A generic Tn element is drawn below.

Antibiotic resistance genes have the medical community worried: their spread has led to antibiotic-resistant pathogens that cause diseases that are increasingly hard, and sometimes impossible, to treat. Earlier we saw genetic ‘transformation’ of streptococcal cells that pick up virulence genes in DNA from dead cells. We routinely transform cells with plasmids as part of recombinant DNA experiments. But bacteria can transfer plasmid DNA between themselves quite naturally. During bacterial conjugation, an F (fertility) plasmid normally transfers DNA between compatible bacterial mating types (review bacterial conjugation elsewhere in this text for more details). An F plasmid containing a Tn element harboring an antibiotic resistance gene can thus be passed from donor to recipient during conjugation. The Tn element can then transpose into the recipient bacterial genome. In this way, transposition is a major pathway for the transfer and spread of antibiotic resistance.
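The structure of a composite transposon, and how it carries a resistance gene into new DNA, can be sketched with plain lists. Everything here is a simplified illustration: DNA is modelled as an ordered list of named segments, the segment names are hypothetical, and transposition is modelled as copying the element into the recipient while leaving the donor unchanged, which glosses over the real molecular mechanism.

```python
from typing import List, Optional, Tuple


def find_composite_tn(dna: List[str]) -> Optional[Tuple[int, int]]:
    """Return (left, right) indices of the first pair of neighbouring IS elements
    that have at least one other segment between them, or None if there is no such pair."""
    is_positions = [i for i, segment in enumerate(dna) if segment.startswith("IS")]
    for left, right in zip(is_positions, is_positions[1:]):
        if right - left > 1:
            return left, right
    return None


def transpose_into(donor: List[str], recipient: List[str], insert_at: int) -> List[str]:
    """Copy the donor's composite Tn element (IS + cargo + IS) into the recipient."""
    span = find_composite_tn(donor)
    if span is None:
        return list(recipient)
    left, right = span
    tn_element = donor[left:right + 1]
    return recipient[:insert_at] + tn_element + recipient[insert_at:]


# Hypothetical F plasmid carrying a Tn element with an antibiotic resistance gene.
f_plasmid = ["oriT", "IS_left", "resistance_gene", "IS_right", "transfer_genes"]
recipient_genome = ["geneA", "geneB", "geneC"]

print(transpose_into(f_plasmid, recipient_genome, insert_at=1))
```

The recipient ends up with the resistance gene flanked by both IS elements, which is what lets the whole unit move again later.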


Retro design

In 2005, with a freshly minted doctorate in molecular genetics, Nels Elde landed a job as a research fellow in Seattle and was tasked with studying the evolution of the immune system of gibbons, a type of ape. Each morning as he biked to the lab downtown, he would pass the city's zoo and hear its gibbons calling to each other. Occasionally, he would visit the zoo and look at them, but he had no idea at the time that the squirrel monkeys that he also saw there would feature so largely in his future research. At work, Elde's primate investigations focused on the gibbon DNA that he was responsible for extracting and analyzing using sequencing machinery.

Then, six years ago, Elde received his first lab of his own to run, at the University of Utah. He did not expect his team's first discovery there to come so swiftly, or that it would involve transposable elements. Elde had arrived at the university with the intention of learning how cells recognize and defeat invading viruses, such as HIV. But he hadn't yet obtained the equipment that he needed to run experiments, despite already having two employees who were eager to do work, including his lab manager, Diane Downhour. Given the lack of lab tools, the two lab staff members spent their time on their computers, poking around databases for interesting patterns in DNA. After just two weeks of this, Downhour came into Elde's office and told him that they had found a couple of extra copies of a particular gene in New World monkeys, specifically in squirrel monkeys.

Elde initially brushed off Downhour's insight. “I said, 'Why don't you go back to the lab and not worry about it?'” he recalls. But a couple of days later, she returned to his office with the idea. “I was just in the sort of panicked mode of opening a lab, ordering freezers, trying to set up equipment and hiring people,” Elde explains. “Diane definitely had to come back and say, 'Come on, wake up here. Pay attention.'”

The gene that they detected multiple copies of in squirrel monkeys is called charged multivesicular body protein 3, or CHMP3. Each squirrel monkey seems to have three variants of the gene. By comparison, humans have only the one, original variant of CHMP3. The gene is thought to exist in multiple versions in the squirrel monkey genome thanks to transposable elements. At some point around 35 million years ago, in an ancestor of the squirrel monkey, LINE-1 retrotransposons are thought to have hopped out of the genome inside the cell nucleus and entered the cytoplasm of the cell. After associating with CHMP3 RNA in the cytoplasm, the transposable elements brought the code for CHMP3 back into the nucleus and reintegrated it into the genome. When the extra versions of CHMP3 were copied into the genome, they were not copied perfectly by the cellular machinery, and thus changes were introduced into the sequences. Upon a first look at the data, these imperfections seemed to render them nonfunctional 'pseudogenes'. But as Elde's team delved into the mystery of why squirrel monkeys had so many copies of CHMP3, an intriguing story emerged.

Hiding in plain sight: Squirrel monkeys carry extra copies of the CHMP3 gene. Credit: Ariadne Van Zandbergen Alamy

The discovery of pseudogenes is not wholly uncommon. There are more than 500,000 LINE-1 retrotransposons in the human genome (11), and these elements have scavenged and reinserted the codes for other proteins inside the cell as well. Unlike with the endogenous retroviral elements in the genome, which can be clearly traced back to ancient viruses, the origin of LINE-1 retrotransposons is murky. However, both types of transposable elements contain the code for an enzyme called reverse transcriptase, which theoretically enables them to reinsert genetic code into the genome in the cell nucleus. This enzyme is precisely what allowed LINE-1 activity to copy CHMP3 back into the genome of the squirrel-monkey ancestor.

Elde couldn't stop thinking about the mystery of why squirrel monkeys had multiple variants of CHMP3. He knew that in humans, the functional variant of the CHMP3 gene makes a protein that HIV uses to bud off of the cell membrane and travel to and infect other cells of the body. A decade ago, a team of scientists used an engineered vector to prompt human cells in a dish to produce a truncated, inoperative version of the CHMP3 protein and showed that the truncated protein prevented HIV from budding off the cells (12). There was hope that this insight would yield a new way of treating HIV infection and so prevent AIDS. Unfortunately, the protein also has a role in allowing other important molecular signals to facilitate the formation of packages that bud off of the cell membrane. As such, the broken CHMP3 protein that the scientists had coaxed the cells to produce soon caused the cells to die.

Given that viruses such as HIV use a budding pathway that relies on normal CHMP3 protein, Elde wondered whether the extra, altered CHMP3 copies that squirrel monkeys carry confer some protection against viruses at the cellular level. He coordinated with researchers around the globe, who sent squirrel-monkey blood from primate centers ranging from Bastrop, Texas, to French Guiana. When Elde's team analyzed the blood, they found that the squirrel monkeys actually produced one of the altered versions of CHMP3 they carry. This finding indicated that in this species, one of the CHMP3 copies was a functional pseudogene, making it more appropriately known as a 'retrogene'. In a further experiment, Elde's group used a genetic tool to coax human kidney cells in a dish to produce this retrogene version of CHMP3. They then allowed HIV to enter the cells, and found that the virus was dramatically less able to exit the cells, thereby stopping it in its tracks. By contrast, in cells that were not engineered to produce the retrogene, HIV was able to leave the cells, which means it could theoretically infect many more.

In a separate portion of the experiment, Elde's group demonstrated that whereas human cells tweaked to make the toxic, truncated version of CHMP3 (the kind originally engineered a decade ago) die, cells coaxed to make the squirrel-monkey retrogene version of CHMP3 can survive. And by conducting a further comparison with the truncated version, Elde found that the retrogene (what he calls retroCHMP3) in these small primates had somehow acquired mutations that resulted in a CHMP3 protein containing twenty amino acid changes. It's some combination of these twenty points of difference in the protein made by the retrogene that he thinks makes it nontoxic to the cell itself but still able to sabotage HIV's efforts to bud off of cells. Elde presented the findings, which he plans to publish, in February at the Keystone Symposia on Viral Immunity in New Mexico.

The idea that retroCHMP3 from squirrel monkeys can perhaps inhibit viruses such as HIV from spreading is interesting, says Michael Emerman, a virologist at the Fred Hutchinson Cancer Research Center. “Having an inhibitor of a process always helps you understand what's important for it,” Emerman explains. He adds that it's also noteworthy that retroCHMP3 wasn't toxic to the cells, because this finding could inspire a new antiviral medicine: “It could help you to design small molecules or drugs that could specifically inhibit that part of the pathway that's used by viruses rather than the part of the pathway used by host cells.”

Akiko Iwasaki, an immunologist at the Yale School of Medicine in New Haven, Connecticut, is also optimistic that the finding will yield progress. “What is so cool about this mechanism of HIV restriction is that HIV does not bind directly to retroCHMP3, making it more difficult for the virus to overcome the block imposed by retroCHMP3,” Iwasaki says. “Even though humans do not have a retroCHMP3 gene, by understanding how retroCHMP3 works in other primates, one can design strategies to mimic the activity of retroCHMP3 in human cells to block HIV replication.”

Elde hopes that, if the findings hold, cells from patients with HIV infection might one day be extracted and edited to contain copies of retroCHMP3, and then reintroduced into these patients. Scientists have already used a similar cell-editing approach in clinical trials to equip cells with a variant of another gene, called CCR5, that prevents HIV from entering cells. In these experiments, patients have received infusions of their own cells, modified to carry the rare CCR5 variant. But although preliminary results indicate that the approach is safe, there is not enough evidence yet about its efficacy. (Another point of concern is that people with the rare, modified version of the CCR5 gene might be as much as 13 times more susceptible to getting sick from West Nile virus than those with the normal version of this gene (13).) By editing both retroCHMP3 and the version of CCR5 that prevents HIV entry into cells, Elde suggests, this combination of gene edits could provide a more powerful way of modifying patient cells to treat HIV infection.

“You could imagine doing a sort of cocktail genetic therapy in order to block HIV in a way that the virus can't adapt around it,” Elde says. His team also plans to test whether retroCHMP3 has antiviral activity against other viruses, including Ebola.

Going retro: Human cells engineered to make retroCHMP3 (shown in green). Credit: Diane Downhour

The investigations into how pseudogenes and retrogenes might influence health are ongoing. And there is mounting evidence that the LINE-1 elements that create them are more active than previously thought. In 2015, for example, scientists at the Salk Institute in California reported a previously unidentified region of LINE-1 retrotransposons that are, in a way, supercharged. The region that the researchers identified encodes a protein that ultimately helps the retrotransposons to pick up bits of DNA in the cell cytoplasm to reinsert them into the genome (14). The same region also enhances the ability of LINE-1 elements to jump around the genome and thus create variation, adding weight to the idea that these elements might have an underappreciated role in human evolution and in creating diversity among different populations of people.

The active function of transposable elements is more important than many people realize, according to John Coffin, a retrovirus researcher who divides his time between his work at the US National Cancer Institute in Frederick, Maryland, and Tufts University in Boston. “They can, and have, contributed in important ways to our biology,” he says. “I think their role in shaping our evolutionary history is underappreciated by many evolutionary biologists.”


Plant Retrotransposons

Abstract

Retrotransposons are mobile genetic elements that transpose through reverse transcription of an RNA intermediate. Retrotransposons are ubiquitous in plants and play a major role in plant gene and genome evolution. In many cases, retrotransposons comprise over 50% of nuclear DNA content, a situation that can arise in just a few million years. Plant retrotransposons are structurally and functionally similar to the retrotransposons and retroviruses that are found in other eukaryotic organisms. However, there are important differences in the genomic organization of retrotransposons in plants compared to some other eukaryotes, including their often-high copy numbers, their extensively heterogeneous populations, and their chromosomal dispersion patterns. Recent studies are providing valuable insights into the mechanisms involved in regulating the expression and transposition of retrotransposons. This review describes the structure, genomic organization, expression, regulation, and evolution of retrotransposons, and discusses both their contributions to plant genome evolution and their use as genetic tools in plant biology.


MECHANISTIC STUDIES OF LINE-1 RETROTRANSPOSITION

An assay to study LINE-1 retrotransposition

Almost 20 years ago, a functional assay was developed to assess the retrotransposition potential of LINE-1s in cultured mammalian cells (60). The assay builds upon a rationale developed by Boeke and Fink to demonstrate that the yeast Ty1 retrotransposon mobilizes via an RNA intermediate (6). Subsequent enhancements of the assay by the Heidmann and Curcio laboratories then led to the development of retrotransposition indicator cassettes that could only become activated for expression upon a successful round of retrotransposition (190, 191).

Briefly, the 3’ UTR sequences of candidate full-length LINE-1s are tagged with a retrotransposition indicator cassette that consists of a backward copy of a neomycin phosphotransferase reporter gene equipped with its own promoter and polyadenylation signals (i.e., the mneoI cassette; Figure 2). Importantly, the reporter gene is disrupted by an intron that resides in the same transcriptional orientation as the LINE-1 (60, 192, 193). This arrangement ensures that the expression of the neomycin phosphotransferase gene only occurs upon a successful round of LINE-1 retrotransposition, which ultimately leads to the generation of clonal foci that grow in the presence of the neomycin analog G418. In sum, the assay allows a simple, yet powerful way to monitor LINE-1 retrotransposition efficiency by counting G418-resistant foci (60).
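Why the backward reporter with a forward intron only works after retrotransposition can be followed with toy sequences. The sketch below is an illustration under stated assumptions: the exon and intron sequences are made up (they are not the real neo gene or intron), RNA is written with DNA letters, and splicing is modelled as removing the intron only when it appears in its forward orientation.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Toy sequences (hypothetical stand-ins for the reporter exons and the intron).
NEO_EXON1 = "ATGGAACAA"
NEO_EXON2 = "GATGGATGA"
NEO = NEO_EXON1 + NEO_EXON2   # intact reporter, read in the reporter's own orientation
INTRON = "GTAAGTCCCTTTCAG"    # written in the LINE-1 (sense) orientation

# The cassette as read along the LINE-1 sense strand:
# the reporter is backward (reverse complement), the intron is forward.
CASSETTE_SENSE = revcomp(NEO_EXON2) + INTRON + revcomp(NEO_EXON1)


def splice(transcript: str) -> str:
    """Remove the intron, but only if it is present in its forward orientation."""
    return transcript.replace(INTRON, "")


def reporter_from_original_construct() -> str:
    """Reporter transcribed from its own promoter on the transfected construct.
    This transcript is antisense to LINE-1, so the intron is backward and stays in."""
    return splice(revcomp(CASSETTE_SENSE))


def reporter_after_retrotransposition() -> str:
    """Reporter transcribed from a new insertion made by retrotransposition."""
    line1_rna = CASSETTE_SENSE           # transcription from the LINE-1 promoter
    spliced_rna = splice(line1_rna)      # intron is forward here, so it is removed
    new_insertion_sense = spliced_rna    # reverse transcription and integration
    return splice(revcomp(new_insertion_sense))


print("original construct:      ", reporter_from_original_construct() == NEO)
print("after retrotransposition:", reporter_after_retrotransposition() == NEO)
```

Only the copy that has passed through a spliced LINE-1 RNA carries an uninterrupted reporter, which is why G418-resistant foci report completed retrotransposition events rather than mere presence of the plasmid.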

The LINE-1 expression vector consists of a retrotransposition-competent LINE-1 subcloned into pCEP4 (flanked by a CMV promoter and an SV40 polyadenylation signal). The pCEP4 vector is an episomal plasmid that has protein encoding (EBNA-1) and cis-acting sequences (OriP) necessary for replication in mammalian cells it also has a hygromycin resistance gene (HYG) that allows for the selection of mammalian cells containing the vector, as well as a bacterial origin of replication (Ori) and ampicillin selection marker (Amp) for plasmid amplification in bacteria. The mneoI reporter cassette, located in the LINE-1 3’ UTR, contains the neomycin phosphotransferase gene (neo, purple box, with its own promoter and polyadenylation signals, purple arrow and lollipop, respectively) in the opposite transcriptional orientation of LINE-1 transcription. The reporter gene is interrupted by an intron (light purple box) with splice donor (SD) and acceptor (SA) sites in the same transcriptional orientation of the LINE-1. This arrangement of the reporter cassette ensures that the reporter gene will only be expressed after a successful round of retrotransposition. De novo retrotransposition of the mneoI reporter cassette will result in G418-resistant colonies that can be quantified (Genetic assay panel with pJM101/L1.3 (wild type (WT)) and pJM105/L1.3 (RT mutant (RT−)) LINE-1 constructs). Alternate reporters can be used instead of mneoI to allow different drug-resistance, fluorescent, or luminescent read-outs (Alternate reporters panel, with blasticidin-S deaminase (BLAST), enhanced green fluorescent protein (EGFP) or luciferase (LUC)) retrotransposition indicator cassettes. The addition of the ColE1 bacterial origin of replication (Recovery of the insertion panel, green box) to a modified version of the mneoI reporter cassette allows the recovery from cultured cell genomic DNA of engineered LINE-1 retrotransposition events as autonomously replicating plasmids in E. coli. 
The insertions also can be characterized by inverse PCR using divergent oligonucleotide primers (Recovery of the insertion panel, black arrows: 1 and 2) that anneal to the reporter gene. The use of epitope tags (T7-tag in carboxyl-terminus of ORF1, yellow box, and TAP-tag in carboxyl-terminus of ORF2, blue box) allows the immunoprecipitation (not shown) and detection of LINE-1 proteins by western blot and immunofluorescence (IF) (Detection panel, with western blot data obtained with pAD2TE1, a vector expressing ORF1-T7p and ORF2-TAPp, compared to untransfected (UT) HeLa cells (82)). The addition of the RNA-stem loops that bind the bacteriophage MS2 coat protein (409) (orange box) in the 3’ UTR of LINE-1 can be used to detect the cellular localization of LINE-1 RNA by fluorescent in situ hybridization (FISH). Both IF and FISH strategies can be combined to detect the subcellular localization of ORF1p, ORF2p and LINE-1 RNA (Cellular localization panel, with pAD3TE1 vector containing ORF1-T7p, ORF2-TAPp, and LINE-1 RNA-MS2 (82)). The images shown in the cellular localization and the detection box originally were published in (82). Additional references are provided in the text.

Since the inception of the cultured cell retrotransposition assay, a battery of retrotransposition indicator cassettes has been developed to assess LINE-1 retrotransposition by either exploiting drug selection (i.e., a blasticidin resistance cassette) or screening for reporter gene activation (i.e., green fluorescent protein and luciferase cassettes) (194�) ( Figure 2 ). Moreover, engineered LINE-1 elements, that contain epitope tags on the carboxyl-termini of ORF1p and ORF2p and an MS2 binding site in the LINE-1 mRNA, have allowed the direct detection of the LINE-1 encoded proteins and mRNA in cultured cells using both biochemical approaches and fluorescence microscopy (82, 198�) ( Figure 2 ). Finally, retrotransposition indicator cassettes have been developed that allow the direct recovery of engineered LINE-1 retrotransposition events as autonomously replicating plasmids in E. coli (156, 202, 203) ( Figure 2 ).

In sum, the cultured cell retrotransposition assay, in conjunction with complementary molecular genetic and biochemical studies, has: 1) allowed the identification of active LINE elements from mammalian and vertebrate genomes (58�, 86, 90, 91, 204, 205); 2) shown that allelic heterogeneity affects LINE-1 retrotransposition (162, 206); 3) facilitated experimental illumination of the LINE-1 retrotransposition mechanism (reviewed in (11)); 4) demonstrated that LINE-1 retrotransposition generates genomic structural variation (156, 202, 203); 5) revealed that the LINE-1-encoded proteins (ORF1p and/or ORF2p) could act in trans to mediate SINE retrotransposition and processed pseudogene formation (92�); and 6) allowed the identification of host factors that may restrict and/or promote retrotransposition (see below). Clearly, the cultured cell assay has been and continues to be instrumental in allowing a deeper mechanistic understanding of LINE-1 biology.

Functional Studies of LINE-1 Retrotransposition

The LINE-1 5’ UTR

Although the promoter structures of human and mouse LINE-1s differ ( Figure 1 ), it is clear that the acquisition of an internal RNA polymerase II promoter has ensured that full-length retrotransposed LINE-1s retain the potential to undergo subsequent amplification in the genome. The human LINE-1 5’ UTR is approximately 910 bp in length and contains an internal RNA polymerase II promoter that directs transcription of LINE-1 mRNA at or near the first nucleotide of the element (75, 207). Experimental studies have revealed that a YY1-binding site at the 5’ end of the 5’ UTR is critical for accurate transcriptional initiation and that most LINE-1 mRNAs contain a 7-methyl guanosine cap structure, which facilitates their translation (208, 209). The 5’ UTR harbors cis-acting binding sites for the following transcription factors: Runx3, Sp1, and SRY-related (Sox) proteins (75, 207, 210, 211). Studies in cultured cells have revealed that mutations in these cis-acting sequences reduce LINE-1 transcription and retrotransposition. In addition, it is likely that other host factors bind the 5’ UTR and regulate LINE-1 expression (see below).

In addition to containing a sense strand promoter, the human LINE-1 5’ UTR contains a conserved RNA polymerase II antisense (AS) promoter (76). Transcription from the LINE-1 AS promoter can lead to the generation of chimeric transcripts comprising LINE-1 sequences conjoined to sequences derived from the 5’ genomic flank of a given LINE-1 locus (76, 212). These chimeric transcripts have been used as a proxy to identify transcriptionally active LINE-1 elements in human embryonic stem cells (213). While mammalian bidirectional promoters have been identified to be the source of some non-coding RNAs (214), the function of the human LINE-1 AS chimeric transcripts, if any, requires elucidation.

The mouse LINE-1 5’ UTR consists of a series of up to 7 2/3 rd monomeric repeats that are followed by an untranslated linker sequence immediately upstream of ORF1 ((90), (reviewed in (87))). Reporter gene assays have revealed that 5’ UTRs from the A, TF, and GF LINE-1 subfamilies remain transcriptionally active (90, 215, 216), whereas 5’ UTRs from the V and F LINE-1 subfamilies generally lack transcription activity (89). Cell culture based and biochemical assays have revealed that mRNAs derived from the A, TF, and GF subfamilies are enriched in ribonucleoprotein particles (RNPs) and that select LINE-1 elements from these remain retrotransposition-competent (86, 90, 91). Thus, it appears that the ability of mouse LINE-1s to capture new promoter sequences has, in part, led to their evolutionary success.

Mouse LINE-1s contain a transcriptionally active antisense RNA polymerase promoter (217, 218). Unlike human LINE-1s, the antisense promoter is located within ORF1. Transcription from the mouse AS promoter leads to the generation of chimeric transcripts containing LINE-1 sequences conjoined to genomic sequences that flank the 5’ end of the LINE-1 locus. Moreover, the overexpression of LINE-1 AS mRNA could lead to a reduction of retrotransposition in cultured cells (218). Thus, it is intriguing to speculate that LINE-1 AS mRNA may play a role in regulating mouse LINE-1 retrotransposition in vivo.

Recent studies indicate that a large fraction of mammalian long non-coding RNAs (lncRNAs) contain retrotransposon-derived sequences and that some are transcribed from LTR-retrotransposon-derived promoters ((219, 220), reviewed in (221)). Intriguingly, select lncRNAs are involved in maintaining the pluripotency of embryonic stem cells by yet unidentified mechanisms (222). It will be interesting to determine if the LINE-1 antisense promoter contributes to the transcriptional regulatory network regulating stem cell identity (reviewed in (223)).

ORF1p

ORF1p is an approximately 40 kDa protein (also known as p40 for human LINE-1s) that is translated from LINE-1 mRNA by a traditional cap-dependent mechanism (77, 209, 224). Early biochemical studies demonstrated that mouse and human ORF1p reside in cytoplasmic ribonucleoprotein particles and bind single-strand RNA in a sequence-independent manner (78, 225�). Biochemical and genetic analyses clearly demonstrate that ORF1p is required for LINE-1 RNP formation, and that LINE-1 RNP formation is a necessary step in the retrotransposition process (198).

Structural studies have revealed that the N-terminus of LINE-1 ORF1p contains a coiled-coil domain, which facilitates trimerization of ORF1p molecules (79, 229�). The central region of ORF1p contains a non-canonical RNA recognition motif (RRM) that, with assistance of its C-terminal domain (CTD), is required for ORF1p RNA binding (79, 231, 233). Notably, missense mutations in highly conserved amino acid residues in both the RRM and CTD either abolish or adversely affect LINE-1 retrotransposition in cultured cells (60, 231).

Human and mouse ORF1p contain nucleic acid chaperone activities that can facilitate the re-annealing of single-strand DNAs in vitro (80, 234�). Notably, several studies have shown that, despite the lack of sequence homology, proteins encoded by non-mammalian LINEs also contain nucleic acid chaperone activity (237, 238). For example, ORF1p from a zebrafish LINE (ZfL2-1) has nucleic acid chaperone activity (239). It is hypothesized that this nucleic acid chaperone activity facilitates the initial steps of LINE-1 integration in vivo. Somewhat unexpectedly, the deletion of ORF1 does not abolish ZfL2-1 retrotransposition activity in cultured human cells (240). These data, coupled with the fact that Alu retrotransposition only requires the protein encoded by LINE-1 ORF2 (92), raise questions regarding how ORF1p nucleic acid chaperone activity participates in LINE-1 retrotransposition. The development of cell free systems to monitor LINE-1 retrotransposition would allow a more rigorous examination of which ORF1p functions are required for retrotransposition.

ORF2p

ORF2p is an approximately 150 kDa protein (81, 82, 200, 241) that contains endonuclease (L1 EN) (83) and reverse transcriptase (L1 RT) (84) activities that are critical for retrotransposition (60, 83). The L1 EN domain resides near the N-terminus of the protein, and bears similarity to apurinic/apyrimidinic endonucleases (APE) (242�). In vitro and bioinformatics analyses (71, 83, 99, 196, 245) suggest that L1 EN makes a single-strand endonucleolytic nick at a loosely defined consensus sequence in genomic DNA (5’-TTTT/AA-3’, where the slash indicates the scissile phosphate), exposing a 5’ phosphate and 3’-hydroxyl group (83). Crystallographic studies suggest that L1 EN recognizes an extrahelical “flipped” adenine residue 3’ of the scissile bond to mediate cleavage using a mechanism similar to that employed by other APE proteins (244). In addition, it is likely that epigenetic modifications of target DNA (e.g., nucleosome accessibility) might affect ORF2p accessibility and L1 EN cleavage activity (246).
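
As a rough illustration of what the 5’-TTTT/AA-3’ notation implies, the sketch below lists exact matches to the consensus on both strands of a DNA string and reports where the nick would fall. It is a simplification under stated assumptions: the real site is only loosely defined, so exact matching undercounts candidate sites, and the function name and coordinate convention are hypothetical choices made for this example.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

# 5'-TTTT/AA-3': the nick falls after the fourth T.
CONSENSUS = "TTTTAA"
NICK_OFFSET = 4

def candidate_nick_sites(top_strand):
    """Yield (strand, position) for each exact consensus match.

    Positions are 0-based coordinates on the top strand and give the index of
    the first base to the right of the nicked phosphodiester bond.
    """
    top = top_strand.upper()
    # A lookahead is used so that overlapping matches are all reported.
    for m in re.finditer(f"(?={CONSENSUS})", top):
        yield ("top", m.start() + NICK_OFFSET)
    # A bottom-strand TTTT/AA reads as TT/AAAA on the top strand.
    for m in re.finditer(f"(?={revcomp(CONSENSUS)})", top):
        yield ("bottom", m.start() + len(CONSENSUS) - NICK_OFFSET)

sites = list(candidate_nick_sites("GGCTTTTAAGGCCTTAAAAGG"))
print(sites)
```

A more realistic scan would score degenerate matches instead of requiring the exact hexamer, but that scoring scheme is not specified in the text and is left out here.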

The L1 RT domain is located downstream of the EN domain in ORF2p, and shares sequence similarity to the RT domains encoded by telomerase, Penelope-like retrotransposons, group II introns, other non-LTR retrotransposons, LTR-retrotransposons, and retroviruses (247�). Biochemical and genetic assays originally were used to demonstrate that Ty1/LINE-1 ORF2p fusion proteins possess reverse transcriptase activity in vitro (84, 250). The subsequent purification of recombinant ORF2p produced in a baculovirus expression system revealed that full-length ORF2p could efficiently generate reverse transcripts from poly rA/oligo dT12 primer template complexes, that L1 RT activity exhibited a preference for Mg2+ over Mn2+, and that L1 RT exhibited both RNA- and DNA-dependent polymerase activities (251). Additional studies revealed that, like the RT encoded by the R2Bm retrotransposon (252), L1 RT is highly processive (when compared to Moloney murine leukemia virus (MMLV)-RT) and lacks detectable RNase H activity (253).

L1 RT activity has been detected in LINE-1 RNP preparations derived from cells transfected with engineered LINE-1 expression vectors (199). Importantly, this work confirmed that ORF2p preferentially reverse transcribes its own mRNA template (i.e., it exhibits cis-preference for its encoding RNA) and that point mutations in ORF1 and the L1 EN domain, which adversely affect LINE-1 retrotransposition, retain reverse transcriptase activity (199). Finally, these and subsequent studies confirmed previous inferences (156) that L1 RT can extend terminally mismatched primer-template complexes (199, 254). The latter property distinguishes L1 RT from MMLV and other retroviral reverse transcriptase enzymes.

While LINE-1 ORF1p and L1 RT activity were readily detectable in RNP preparations, the detection of the LINE-1 ORF2p had been notoriously difficult. Epitope-tagging strategies have allowed the detection of ORF2p in whole cell extract and RNP preparations derived from cells transfected with engineered LINE-1 expression vectors (82, 200, 241). Using a similar strategy, immunofluorescence microscopy studies revealed that engineered ORF2p co-localizes in cytoplasmic foci with both ORF1p and LINE-1 mRNA (82, 200, 241) ( Figure 2 ). Despite progress in detecting ORF2p in cultured cells, debate continues regarding the stoichiometry of ORF1p and ORF2p bound to LINE-1 mRNA (82, 200, 241). It appears that ORF1p is much more abundant in LINE-1 RNPs than ORF2p. Additionally, it is currently unknown how the transition from translation to retrotransposition occurs and what is the exact composition of a truly functional LINE-1 RNP. Clearly, the development of reconstituted in vitro target-site primed reverse transcription (TPRT) reactions would greatly advance the understanding of the detailed molecular mechanism of LINE-1 retrotransposition.

LINE-1 ORF2p contains an ill-defined cysteine-rich domain (C-domain) at its carboxyl-terminus, which has been suggested to function as a zinc-knuckle domain (85). Consistent with its biological importance, cysteine to serine mutations in the C-domain interfere with LINE-1 RNP formation and strongly inhibit LINE-1 retrotransposition in cultured cells (60, 82). Recent studies indicate that a recombinant protein containing the last 180 amino acids of ORF2p exhibits non-sequence specific RNA binding in vitro, and that cysteine to serine mutations in the C-domain do not adversely affect RNA binding (255). Thus, future studies are required to elucidate the exact function of the C-domain in LINE-1 retrotransposition.

Additional functional domains are likely to exist within LINE-1 ORF2p. Indeed, PCNA, which is the sliding clamp protein essential for DNA replication, recently was found to directly interact with ORF2p through a conserved sequence known as a PCNA interaction protein domain (PIP box), which is located between the L1 EN and L1 RT domains (200). Mutating the PIP box abolished LINE-1 retrotransposition (200); however, how PCNA functions in LINE-1 retrotransposition requires further elucidation.

How ORF2p is translated from bicistronic LINE-1 mRNA remains an active area of study and recent reports suggest that human and mouse LINE-1 ORF2p may be translated by distinct mechanisms (256, 257). In human LINE-1s, a 63-nucleotide spacer that contains two in-frame stop codons separates ORF1 and ORF2. Genetic studies in cultured cells suggest that ORF2p translation occurs by an unconventional termination/re-initiation mechanism where a translating ribosome must be able to scan from the stop codon of ORF1 to the start codon of ORF2 (256). Remarkably, studies show that human ORF2 can be translated in an AUG-independent manner (256). By comparison, evidence from luciferase reporter assays suggests that the presence of an internal ribosome entry sequence (IRES), which is located near the 3’ end of mouse ORF1, is used to facilitate translation of mouse ORF2 (257). In addition, cell culture assays have revealed that mouse ORF2 may be translated in an AUG-independent manner (256, 257). Notably, it is unlikely that the 3’ end of human ORF1 has IRES activity (256). Indeed, it has been demonstrated that the sequence of the ORFs encoded by human and mouse LINE-1s can be subjected to substantial sequence changes by codon optimization without affecting retrotransposition in cultured cells (258, 259), suggesting that strict cis-acting sequences are not required for ORF2 translation.

It is unlikely that LINE-1s have evolved a novel mechanism to mediate ORF2 translation. Instead, we hypothesize that LINE-1s have evolved to exploit translation mechanisms inherent to their hosts to mediate ORF2 translation (256). Interestingly, recent ribosomal profiling studies have uncovered an increasing number of unannotated reading frames that reside 5’ of annotated ORFs (260); some short ORFs also may be translated via an AUG-independent mechanism (261). Clearly, additional studies are warranted to elucidate the ORF2 translation mechanism and to determine if ORF2 translation differs among mammalian LINE-1s.

The LINE-1 3’ UTR

The human LINE-1 3’ UTR is approximately 206 bp in length and contains a conserved polypurine tract that is predicted to form a G-quadruplex structure (262). Intriguingly, the polypurine tract is not required for LINE-1 retrotransposition in cultured cells (60), but it can inhibit LINE-1 RT activity in in vitro biochemical assays (253). Despite its evolutionary conservation, how the polypurine tract functions in LINE-1 biology remains unknown.

LINE-1 3’ UTRs contain a functional RNA polymerase II polyadenylation signal near their 3’ ends. Experiments in cultured cells have revealed that the LINE-1 poly (A) signal is relatively weak, is often bypassed by RNA polymerase II, and that RNA polymerase II frequently utilizes canonical polyadenylation sites fortuitously present in 3’ flanking genomic DNA sequences (263). The use of these genomic polyadenylation sequences can lead to the generation of chimeric LINE-1 transcripts containing genomic DNA sequences at their 3’ end (see below). Finally, recent data suggest that the human and mouse 3’ UTRs have promoter activity that leads to the generation of alternative LINE-1 transcripts in various tissues (264). The field awaits a better definition of this promoter activity and the role of the resultant transcripts in LINE-1 biology.

An overview of the LINE-1 replication pathway

LINE-1 retrotransposition occurs via a “copy and paste” process termed target-site primed reverse transcription (TPRT; Figure 3), a mechanism originally described by the Eickbush laboratory for the related site-specific non-LTR retrotransposon, R2Bm, from the silkworm Bombyx mori genome (265). After transcription from a chromosomal locus, a full-length bicistronic LINE-1 mRNA is exported to the cytoplasm. Upon translation, ORF1p and ORF2p exhibit a strong cis-preference (97, 98) and bind to their respective encoding mRNA, forming a ribonucleoprotein particle (RNP) (78, 82, 198, 227, 228). The RNP minimally consists of LINE-1 mRNA, multiple ORF1p trimers, and as few as one molecule of ORF2p (82, 256), but also likely contains numerous cellular proteins and RNAs (200, 201, 266) (Figure 3).

An active copy of LINE-1 is present at one chromosomal locus (light blue box in dark grey chromosome) and consists of a 5’ UTR (light grey box) with an internal promoter (thin black arrow), two ORFs (ORF1, yellow box, and ORF2, blue box), a 3’ UTR (light grey box) followed by a poly (A) tract (An) and is flanked by TSDs (thick black arrows). Transcription of LINE-1 occurs in the nucleus and produces a bicistronic RNA (wavy line). Upon translation in the cytoplasm, ORF1p and ORF2p (yellow circle and blue oval, respectively) bind back to their encoding RNA (cis-preference) to form an RNP complex. ORF1p and/or ORF2p also can retrotranspose cellular RNAs (mRNA, SVA, and Alu, in red, green and orange wavy lines, respectively). The retrotransposition of Alu RNA only requires ORF2p (92). There is some debate as to whether ORF1p augments Alu retrotransposition (410), and if SVA retrotransposition requires both ORF1p and ORF2p (94, 95). The LINE-1 RNP enters the nucleus where de novo insertion occurs by a mechanism termed TPRT. The ORF2p endonuclease activity makes a single-strand endonucleolytic nick at the genomic DNA target (L1 EN cleavage), at a loosely defined consensus site (5’-TTTT/A-3’, with “/” indicating the scissile phosphate). The ORF2p RT activity then uses the exposed 3’-OH group to initiate first-strand LINE-1 cDNA synthesis using the bound RNA as a template. The final steps of TPRT (i.e., top-strand cleavage, second-strand LINE-1 cDNA synthesis, and repair of the DNA ends) lead to the insertion of a de novo LINE-1 copy at a new chromosomal locus (light yellow box in light grey chromosome). The new LINE-1 copy is often 5’ truncated, contains a variable-sized poly (A) tract (An), and generally is flanked by target-site duplications (thick grey arrows). Additional references are provided in the text.

Intriguingly, subsequent studies revealed that ORF1p, ORF2p, and LINE-1 RNA accumulate in dense cytoplasmic foci, which are closely associated with stress granule proteins (82, 197). In yeast, proteins encoded by Ty1 and Ty3 retrotransposons are associated with cytoplasmic foci called processing bodies (P-bodies) and experiments suggest that P-body localization is important for RNP assembly and may represent a host mechanism that regulates retrotransposition (267�).

How the LINE-1 RNP enters the nucleus is not fully understood. Experiments using modified second-generation adenoviral expression vectors containing an active human LINE-1 have demonstrated that LINE-1 retrotransposition can occur in G1/S arrested cells (270). Similarly, LINE-like sequences from Candida albicans (271, 272) and Neurospora crassa (273), which undergo closed mitosis, can retrotranspose independently of nuclear envelope breakdown. Thus, it does not appear that cell division is a requisite for LINE-1 retrotransposition. Some reports suggest that cell division augments the retrotransposition of engineered LINE-1s in cultured cells (274, 275). Notably, the cultured cell retrotransposition assay generally requires the detection of retrotransposition events as a function of reporter gene expression. Thus, as cells divide they may produce more of the reporter gene product, leading to an apparent increase in LINE-1 retrotransposition potential, thereby explaining the apparent discrepancies between the above studies.

Once in the nucleus, L1 EN makes a single-strand endonucleolytic nick in genomic DNA at a degenerate consensus sequence (5’-TTTT/AA-3’, where the “/” indicates the scissile phosphate), exposing a 3’ hydroxyl group that serves as a primer for the reverse transcription of the LINE-1 mRNA by the L1 RT activity encoded in ORF2p (83, 276) (Figure 3). Whether the LINE-1 mRNA simply acts as a template for retrotransposition or whether it plays additional roles during TPRT requires more study. It is notable that codon-optimized synthetic mouse and human LINE-1s, in which approximately 25% of the nucleotide sequence has been replaced to increase the G-C content of LINE-1 RNA while retaining the amino acid sequence of the LINE-1 encoded proteins, can readily retrotranspose in cultured human cells (258, 259).

Studies in cultured human cells have revealed that the LINE-1 RT has a mis-incorporation error rate of approximately 1 in 6500 bases (156). By using the binomial distribution, it has been estimated that approximately 40% of full-length LINE-1 retrotransposition events represent faithful copies of the progenitor LINE-1 element (approximately 37% contain one mutation, and approximately 16% contain two mutations) (156). Subsequent steps in the retrotransposition process, including second-strand target-site DNA cleavage and second-strand LINE-1 cDNA synthesis, require additional investigation. By analogy to the evolutionarily related R2 retrotransposon of Bombyx mori, ORF2p may play a role in each of the above processes (277). The net result of TPRT is the integration of a new LINE-1 copy at a new chromosomal location (Figure 3).
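
The binomial estimate can be reproduced with a few lines of arithmetic. The sketch below uses the error rate of 1 in 6500 bases quoted above; the element length of 6000 bases is an assumption made here for illustration (a full-length LINE-1 is roughly 6 kb), and the function name is hypothetical.

```python
from math import comb

ERROR_RATE = 1 / 6500      # misincorporations per base, from the text
ELEMENT_LENGTH = 6000      # assumed length (bases) of a full-length LINE-1

def prob_k_mutations(k, n=ELEMENT_LENGTH, p=ERROR_RATE):
    """Binomial probability of exactly k misincorporations in n copied bases."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for k in range(3):
    print(f"{k} mutations: {prob_k_mutations(k):.1%}")
```

With these inputs the three probabilities come out near 40%, 37%, and 17%, close to the published figures; the small difference in the last value presumably reflects the exact element length or rounding used in the original estimate.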

Genomic Rearrangements generated during LINE-1 retrotransposition

LINE-1-mediated retrotransposition events sometimes are accompanied by intra-LINE-1 rearrangements (e.g., 5’ truncations and 5’ truncations associated with inversion/deletion events) or genomic structural rearrangements ( Figure 4 ). The features of these events suggest that host processes, such as DNA repair and/or DNA replication, may ultimately impact the structure of newly retrotransposed LINE-1s. Below we discuss some of these rearrangements.

A. LINE-1 retrotransposition events can result in local alterations in genomic target-site DNA. De novo insertion of LINE-1 occurs at a genomic DNA target (thick grey line). LINE-1 RNA is depicted as a blue wavy line followed by a poly (A) tail (An); LINE-1 cDNA as a blue arrow; and a new LINE-1 copy as a thick blue line including a poly (A) tract (An). Insertions can occur by either conventional (full-length, left) or abortive (5’ truncated, right) retrotransposition and generally result in the formation of variable-length target-site duplications (TSD, black boxes). “Twin-priming” generates LINE-1 inversion/deletions or inversion/duplications (represented by opposing arrows in the LINE-1 new copy). The priming of LINE-1 cDNA synthesis from the cleaved top-strand genomic DNA is represented in light blue. The transduction of genomic DNA sequences can occur when either 5’ or 3’ flanking genomic sequences are incorporated into LINE-1 RNAs and are mobilized by retrotransposition. The 5’ and 3’ transductions are depicted as green or pink wavy lines (in RNA) and green or pink boxes (in the new LINE-1 copy), respectively. The 3’ transduction events contain two poly (A) sequences (An). The LINE-1 enzymatic machinery also can mobilize snRNAs such as U6 snRNA to new genomic locations. The proposed model involves an L1 RT template switch from LINE-1 RNA to the U6 snRNA (orange wavy line; U6 cDNA, orange arrow) during TPRT.
B. LINE-1 retrotransposition events associated with genomic structural variation. LINE-1 RNA, cDNA, and a de novo LINE-1 insertion are depicted as in panel A. Lower case letters (a, b, c, or d) in genomic DNA (grey thick line) are used to depict deletions or duplications (by alteration of the alphabetic order). The resolution of TPRT at a site of DNA damage (left panel, black arrowhead upstream of the integration site) is hypothesized to result in a large genomic deletion (the loss of segment “b”), whereas the resolution of TPRT at a single-strand endonucleolytic nick downstream from the LINE-1 integration site (left panel, black arrowhead) is hypothesized to lead to a large target-site duplication (the duplication of segment “c”). The resolution of TPRT by single-strand annealing (SSA) (middle panel) can lead to the generation of a chimeric LINE-1, where an endogenous LINE-1 (light purple rectangle) is fused to a new LINE-1 (dark blue rectangle); the formation of the chimera results in the loss of segment “b”. Similarly, the resolution of “twin-priming” intermediates by synthesis-dependent strand annealing (SDSA) (right panel) can lead to the generation of an L1 chimera with an intrachromosomal duplication (the duplication of both segment “a” and the endogenous L1 sequence). The entire insertion is flanked by a target-site duplication (black boxes). Notably, LINE-1 insertions generated in cultured cells by “twin-priming” occasionally are repaired by SDSA (156). Details on how chimeric LINE-1 integration events are formed can be found in (156, 202, 203). Additional references are provided in the text.

Intra-LINE-1 alterations

The examination of the HGR reveals that approximately 30�% of human-specific Ta-subset LINE-1 insertions are full-length, approximately 40�% are truncated at their 5’ ends, and approximately 25% contain internal rearrangements known as inversion/deletions (7, 63, 71, 278). The characterization of engineered LINE-1 retrotransposition events from cultured cells has led to the proposition that two pathways of LINE-1 retrotransposition exist: conventional and abortive retrotransposition (156). Conventional retrotransposition accounts for the generation of full-length LINE-1 insertions and can lead to the formation of new “master genes” that can serve as a source of retrotransposition events in subsequent generations (Figure 4A). In general, full-length LINE-1 insertions are characterized by typical LINE-1 structural hallmarks (i.e., they terminate with a poly (A) tract, are flanked by variable size target-site duplications, and integrate at a LINE-1 endonuclease consensus site) (reviewed in (11)).
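
The structural hallmarks of an insertion (consensus-site integration, a poly (A) tract, flanking target-site duplications, and optional 5’ truncation) can be sketched as a toy string operation. This is a deliberately simplified model and not a description of the published analyses: the fixed duplication length, the poly (A) length, the placement of the top-strand cut, and the function name are all assumptions chosen for illustration.

```python
EN_SITE_TOP = "TTAAAA"   # bottom-strand 5'-TTTT/AA-3' as read on the top strand

def insert_line1(target, line1, tsd_len=12, truncate_5p=0, poly_a_len=20):
    """Return (product, tsd) for a toy insertion at the first consensus site.

    target and line1 are top-strand / sense-strand DNA strings.
    truncate_5p drops that many bases from the 5' end of the element,
    mimicking a 5' truncated (abortive) insertion.
    """
    hit = target.find(EN_SITE_TOP)
    if hit == -1:
        raise ValueError("no endonuclease consensus site in target")
    bottom_nick = hit + 2                 # between TT and AAAA
    top_nick = bottom_nick + tsd_len      # assumed staggered second-strand cut
    if top_nick > len(target):
        raise ValueError("target too short for the requested duplication")
    # The sequence between the two staggered nicks ends up on both sides.
    tsd = target[bottom_nick:top_nick]
    new_copy = line1[truncate_5p:] + "A" * poly_a_len
    product = target[:top_nick] + new_copy + target[bottom_nick:]
    return product, tsd

target = "GGCCGGCCTTAAAAGGCATCGGATCCGG"
product, tsd = insert_line1(target, "CACGTGCACGTG", truncate_5p=4)
print(tsd)
print(product)
```

In the product, the element and its poly (A) tract sit between two copies of the duplicated target segment, which is the arrangement the hallmarks above describe.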

The generation of 5’ truncated LINE-1 elements is proposed to occur via abortive retrotransposition. Here, the L1 RT becomes dissociated from the (−) strand LINE-1 cDNA during TPRT. Annealing of the LINE-1 cDNA to top-strand genomic DNA then may specify the placement of top-strand (also referred to as second-strand) genomic DNA cleavage, generating a 3’-hydroxyl group needed for DNA-dependent (+) strand LINE-1 cDNA synthesis (156, 279). How the L1 RT may become dissociated from the LINE-1 cDNA requires clarification; however, it is intriguing to speculate that the process of LINE-1 integration represents a battleground between LINE-1 and the host, and that the Y-branch intermediate generated during (−) strand LINE-1 cDNA synthesis may elicit a DNA repair response(s) by the host (156). Recent studies suggest that the ataxia telangiectasia mutated (ATM) and excision repair cross-complementation group 1 (ERCC1) proteins modulate LINE-1 retrotransposition (280�); thus, it is reasonable to speculate that DNA repair pathways might influence the generation of 5’ truncated LINE-1s.

The formation of inversion/deletion structures represents an alternative form of conventional retrotransposition termed “twin-priming” (278). Here, the LINE-1 mRNA anneals to single strand DNA exposed at both the cleaved bottom and top-strand genomic DNA sequences ( Figure 4A ). Template switching of the L1 RT during RNA-dependent (−) strand LINE-1 cDNA synthesis, or perhaps a second molecule of ORF2p, then allows the use of the 3’-hydroxyl group generated at the top-strand LINE-1 mRNA/genomic DNA junction to serve as a primer for convergent RNA-dependent (−) strand LINE-1 cDNA synthesis. The completion of cDNA synthesis may involve a LINE-1 RT DNA-dependent DNA polymerase activity or a host-encoded DNA polymerase. Microhomology-mediated annealing of the resultant cDNAs then can lead to the formation of inversion/deletion structures ( Figure 4A ). Notably, virtually all of the predictions of the twin-priming model have been confirmed by examining engineered LINE-1 integration events from cultured cells (156). How microhomology-mediated annealing occurs needs further study, although one can hypothesize that it is carried out by an alternative, microcomplementarity-mediated non-homologous end joining pathway of DNA repair (283).

The incorporation of untemplated nucleotides, presumably added after the completion of (−) strand LINE-1 cDNA synthesis by the LINE-1 RT, can result in short stretches of non-templated sequence at the 5’ genomic DNA/LINE-1 junction (156, 203, 208, 284), which may facilitate annealing of the LINE-1 cDNA to single-strand DNA exposed at the top-strand genomic DNA target-site (279, 285, 286). If so, we reason that the resultant LINE-1 cDNA/genomic DNA hybrid then may specify the placement of top-strand genomic DNA cleavage, generating the 3’-hydroxyl group needed for DNA-dependent (+) strand cDNA synthesis by either the LINE-1 reverse transcriptase or a host-encoded DNA polymerase.

LINE-1-mediated transduction events

Active LINE-1s mobilize sequences that are derived from their 5’ and 3’ flanking genomic DNA by a process termed LINE-1-mediated transduction ( Figure 4A ). LINE-1s containing 5’ transduction events occur when a cellular promoter, which resides upstream of an active genomic full-length LINE-1 copy, is used to initiate LINE-1 transcription. Retrotransposition of the chimeric 5’ genomic/LINE-1 mRNA transcript then leads to the transduction of the 5’-derived genomic DNA sequence to a new chromosomal location. If a conventional RNA polymerase II promoter in genomic DNA is used to initiate LINE-1 transcription, the resultant 5’ transduced LINE-1 will lack the genomic promoter and generally will be transcribed using the internal promoter present in the LINE-1 5’ UTR in successive rounds of retrotransposition. Full-length LINE-1s containing 5’ transduced genomic DNA sequences originally were detected in the HGR (7). A 5’ transduction event is relatively rare and can only be identified by examining the sequences of full-length LINE-1s. Notably, the Nathans laboratory demonstrated that a full-length mouse LINE-1 insertion carrying a 28 bp 5’ transduction led to the mis-splicing of the Nr2e3 gene in a retinal degeneration 7-mouse model (287).

Due to the presence of inherently weak polyadenylation signals in their 3’ UTRs, LINE-1s also can mobilize sequences that are derived from their 3’ flanks, including exons, that range in size from tens of base pairs to at least 1.6 kb in length by 3’ transduction (263, 288) ( Figure 4A ). This class of insertion is generated when RNA polymerase II bypasses the weak polyadenylation signal present at the 3’ end of a full-length LINE-1 and instead uses a fortuitous polyadenylation signal in the 3’ flanking genomic DNA. Retrotransposition of the resultant LINE-1/genomic hybrid mRNA leads to the insertion of the 3’ flanking genomic DNA downstream of the new LINE-1 copy at a new chromosomal location. Due to the structure of most mammalian genes, which contain long introns with significant numbers of LINE-1 and SINE insertions, we speculate that LINE-1s have evolved to contain a weak polyadenylation signal in order to allow normal expression of intron-containing genes (263; reviewed in (291)).

The fact that LINE-1s can retrotranspose sequences derived from their 3’ genomic flanks to new genomic locations was first appreciated while characterizing a mutagenic LINE-1 insertion into the dystrophin gene (290). The 3’ transduction “genomic tag” in the mutagenic insertion then was used to isolate the likely progenitor LINE-1, named LRE2 (290). Experiments in cultured cells subsequently showed that LINE-1 3’ transduction events occur frequently and could, in principle, mobilize exons and promoters to new genomic locations, providing a possible mechanism for exon shuffling (60, 263). Since that time, it has become apparent that ~20% of LINE-1 retrotransposition events are accompanied by 3’ transduced sequences (288, 289). 3’ transductions also have been used as “genomic tags” to identify progenitor/offspring relationships among LINE-1 elements, to stratify full-length LINE-1 elements into subdivisions, and to identify clusters of LINE-1s that are actively mobilizing in the human population (58, 294).

The severe 5’ truncation of the LINE-1/genomic mRNA upon TPRT can lead to the generation of “orphan transductions” that lack LINE-1 sequence (263). A mutagenic “orphan transduction” recently was identified as a cause of Duchenne muscular dystrophy (295). Notably, and consistent with cultured cell studies, a recent study reported that ~24% of somatic LINE-1 retrotransposition events in tumors derived from 244 patients were accompanied by 3’ transduction events, and that many of these events represented “orphan transductions” (296). Finally, it is noteworthy that transduction is not peculiar to LINE-1s. Both 5’ and 3’ transduction events have been observed with SVA retrotransposons (127, 297). Indeed, SVA retrotransposons provided a vehicle to shuffle the acyl-malonyl condensing enzyme-1 (AMAC) gene to three different locations in primate genomes (298).

LINE-1 target-site alterations: local alterations at the integration site

The mechanism of top-strand target-site cleavage at the genomic LINE-1 integration site requires elucidation. It is clear that the placement of second-strand DNA cleavage can influence the structure of the resultant LINE-1 integration events. After characterizing 100 engineered LINE-1 insertions in cultured cells, Gilbert and colleagues proposed a model that accounts for a number of observed target-site alterations (156, 202). In this model, top-strand DNA cleavage upstream from the bottom-strand endonucleolytic nick can lead to small deletions of genomic DNA at the LINE-1 integration site. Likewise, top-strand DNA cleavage directly opposite to bottom-strand cleavage can lead to LINE-1 integration events that lack target-site alterations, whereas top-strand cleavage downstream of the initial endonucleolytic nick can lead to the generation of target-site duplications ( Figure 4A ). Interestingly, ~10% of LINE-1 insertions in cultured cells were accompanied by large target-site duplications that are infrequently seen in the human genome reference sequence (156, 202) ( Figure 4B ). Otherwise, the cultured cell retrotransposition assay largely recapitulates the spectrum of structural outcomes observed among endogenous germline retrotransposition events. However, the cellular milieu in which retrotransposition takes place, perhaps defined by the presence and activity of DNA repair machinery and other host factors, likely influences the range of possible LINE-1-mediated target-site alterations. Notably, the mechanisms described above also are likely to account for local target-site alterations accompanying SINE retrotransposition and processed pseudogene formation (reviewed in (11)).
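The three outcomes in the model of Gilbert and colleagues can be summarized as a small lookup. The sketch below is illustrative only and is not taken from the cited work. The function name `classify_target_site`, the sign convention for the offset, and the simplification that the size of the alteration equals the distance between the two cleavage sites are all assumptions made here for clarity.

```python
def classify_target_site(top_strand_offset_bp: int) -> tuple[str, int]:
    """Classify the expected target-site alteration for one insertion.

    top_strand_offset_bp is the position of top-strand cleavage relative to
    the bottom-strand endonucleolytic nick (assumed convention: negative
    means upstream, zero means directly opposite, positive means downstream).
    Returns (outcome, size_bp), where size_bp assumes the alteration spans
    exactly the distance between the two cleavage sites (a simplification).
    """
    if top_strand_offset_bp < 0:
        # Upstream cleavage: genomic DNA between the cuts is lost.
        return ("deletion", -top_strand_offset_bp)
    if top_strand_offset_bp == 0:
        # Cleavage directly opposite the nick: no target-site alteration.
        return ("no alteration", 0)
    # Downstream cleavage: the staggered region is copied on both flanks.
    return ("target-site duplication", top_strand_offset_bp)


if __name__ == "__main__":
    for offset in (-5, 0, 15):
        print(offset, classify_target_site(offset))
```

Real integration events are messier than this: the paragraph above notes that host DNA repair factors likely influence the outcome, so the mapping should be read as a summary of the model rather than a predictor.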

LINE-1-mediated retrotransposition target-site alterations: the generation of structural variants

In addition to minor target-site alterations, LINE-1 retrotransposition can lead to more substantial target-site genomic DNA modifications. The examination of LINE-1 integration events in cultured human cells has revealed that approximately 10% of retrotransposition events are accompanied by rearrangements of target-site DNA, creating genomic structural variation (156, 202, 203) ( Figure 4B ). The comparisons of pre- and post-integration sites in genomic DNA have revealed that the resolution of TPRT intermediates can lead to the generation of chimeric LINE-1 sequences (156, 202, 203). Single nucleotide polymorphism analyses revealed that the resultant integration events contain an endogenous genomic LINE-1 fused to the engineered LINE-1 and concomitant genomic alterations (156, 202, 203) ( Figure 4B ).

The formation of chimeric LINE-1 elements can occur by various mechanisms. For example, the resolution of TPRT intermediates by single-strand annealing (SSA) or synthesis-dependent strand annealing (SDSA) can lead to the formation of LINE-1 retrotransposition-mediated deletions or duplications, respectively (156, 202, 203) ( Figure 4B ). Importantly, large-scale genomic alterations observed in cultured cells are reflective of events occurring in humans in vivo. For example, a LINE-1 retrotransposition event was responsible for a 46 kb deletion in the PDHX gene of a human patient, resulting in pyruvate dehydrogenase deficiency (300).

Genomic alterations also can accompany the retrotransposition of both Alu and SVA elements. For example, an Alu retrotransposition event that occurred ~2.7 million years ago led to the deletion of an internal 92 bp exon within the CMP-Neu5Ac hydroxylase gene. As a result, humans are genetically deficient for N-glycolylneuraminic acid (301). Similarly, an SVA retrotransposition event led to an ~14 kb deletion that resulted in the loss of the HLA-A gene in a cohort of Japanese families afflicted with leukemia (302). Finally, recent studies showed that two independent post-zygotic SVA retrotransposition events into the NF1 gene were associated with large deletions of ~1 Mb and 867 kb, respectively (167).

On a larger scale, comparative genomics approaches between the human and chimpanzee reference sequences led to the identification of 50 LINE-1 retrotransposition events responsible for the deletion of ~18 kb from the human genome and ~15 kb from the chimpanzee genome (303). Similar approaches uncovered 33 Alu retrotransposition events that eliminated approximately 9 kb of human DNA (304). Thus, although relatively rare when compared to conventional retrotransposition events, LINE-1 retrotransposition-mediated deletion events continue to sculpt the landscape of the human genome.

Post-integration recombination events between genomic retrotransposons

The sheer mass of LINE-1 and Alu sequences in the genome also can provide substrates for post-integration recombination, generating structural variation in the human genome. For example, non-allelic homologous recombination (NAHR), non-homologous DNA end joining (NHEJ), and other types of recombination events between genomic LINE-1 or Alu elements can lead to genomic deletions, duplications, and perhaps translocations, and are implicated in human disease (reviewed in (11, 151, 305)). Clearly, these examples will continue to grow as individual whole genome DNA sequencing continues during the coming years.

Endonuclease-independent LINE-1 Retrotransposition

LINE-1 retrotransposition by TPRT usually is initiated by the cleavage of genomic DNA by L1 EN. The examination of LINE-1 retrotransposition in Chinese hamster ovary (CHO) cells deficient in components of the non-homologous end joining (NHEJ) pathway of DNA double-stranded break repair led to the discovery of an alternative integration pathway termed endonuclease-independent (ENi) LINE-1 retrotransposition (196). The ENi pathway of LINE-1 retrotransposition is reminiscent of a type of RNA-mediated DNA repair in which LINE-1 elements that lack L1 EN function presumably can use genomic lesions to initiate TPRT (196, 308). ENi retrotransposition events bear structural hallmarks distinct from canonical TPRT-mediated LINE-1 insertions in that they frequently are both 5’ and 3’ truncated, do not occur at typical LINE-1 endonuclease sites in genomic DNA, generally lack target-site duplications, and often are accompanied by the deletion of genomic DNA at the integration site (196). ENi LINE-1 retrotransposition events also occasionally are accompanied by the insertion of short cDNA fragments at both their 5’ and 3’ LINE-1/genomic DNA junctions, which appear to be derived from the reverse transcription of cellular mRNAs (196, 309).

Subsequent studies in DNA protein kinase catalytic subunit (DNA-PKcs)-deficient CHO cells revealed that ENi retrotransposition events could occur at dysfunctional telomeres, highlighting similarities between ENi retrotransposition and telomerase activity (309). Indeed, these results parallel the situation in Drosophila and Bdelloid rotifer genomes, where “domesticated” retrotransposons function to maintain telomere length in place of a conventional telomerase activity (311, 313).

As with other phenomena discovered using engineered LINE-1 retrotransposons in the cultured cell retrotransposition assay, putative ENi retrotransposition events also have been identified in the human and mouse genomes (174, 316). Indeed, a likely ENi retrotransposition event into the EYA1 gene was accompanied by a ~17 kb deletion, leading to a sporadic case of human oto-renal syndrome (317). It will be interesting to learn whether deficiencies in other DNA repair pathways lead to increased ENi LINE-1 retrotransposition.

Notably, some group II introns lack an EN domain and can use 3’-hydroxyl groups at nascent DNA strands present at DNA replication forks to initiate retrotransposition (318). Moreover, it has been proposed that the L1 EN domain was acquired after the L1 RT domain during LINE-1 evolution (248). Together, these data indicate that the ENi pathway of LINE-1 retrotransposition may represent an ancient mechanism of LINE-1 retrotransposition prior to the acquisition of an APE-like endonuclease domain.


A family of proteins known as histones provides support and structure to DNA, but for years, scientists have been puzzling over occasional outliers among these histones, which appear to exist for specific, but often mysterious reasons. Now, researchers have uncovered a new purpose for one such histone variant: preventing genetic mutations by keeping certain so-called "jumping genes" in place.

This research, which began at Rockefeller University and was published May 4 in Nature, reveals a basic mechanism by which epigenetics, or the control of inherited traits through means other than DNA, works. Due to histones' close relationship with DNA, scientists have known for some time that they are frequently involved in epigenetic control of genes. In this case, one particular histone variant appears to reduce the chance of potentially harmful changes in the stem cells that will eventually generate the various types of tissue that make up a living creature.

"They say that good things come in small packages. Nowhere is this more true than with histone variants. This study found the variant H3.3, which differs only slightly from the standard H3 histones, helps prevent certain genetic elements, which are remnants left behind by ancient viral infections, from moving about within the genome," says study author C. David Allis, Joy and Jack Fishman Professor and head of the Laboratory of Chromatin Biology and Epigenetics. "This discovery is an important addition to our still-evolving knowledge of how epigenetics works at the molecular level."

Histones are proteins that act as spools for the thread that is DNA, giving it support and structure. Chemical modifications to these histones can change the expression of genes, making them more available for expression or silencing them by compacting the DNA-protein complex. Oddball H3.3 varies from its regular counterpart H3 by only a few amino acids. Because it is present throughout the animal kingdom, however, scientists have suspected for some time that H3.3 has a specific biological role.

Study authors Simon Elsasser and Laura Banaszynski, both of whom worked on H3.3 in Allis's lab at Rockefeller but have since moved on to other institutions, started by looking at the locations on the mouse genome where H3.3 was deposited in stem cells. Elsasser began the project as a graduate student in Allis's lab and continued as a postdoc at the MRC Laboratory of Molecular Biology in the United Kingdom. He is now an assistant professor at the Karolinska Institute in Sweden. He had the idea to look for H3.3 at repetitive sequences; however, repeats are normally filtered out in a genome-wide study. So, Elsasser developed a new approach to capture this information.

A pattern emerged from the results: H3.3 appeared at a certain type of repetitive sequence: retrotransposons, which are leftovers from ancient viral infections. Unlike their ancestral viruses, retrotransposons are trapped in the host genome, but they can still copy themselves and jump to new locations within it. Sometimes, evolution finds a use for them. For instance, retrotransposon-derived genes code for proteins necessary for the placenta in mammals. But when retrotransposons jump, they can also cause harmful mutations.

For studies like this one, which explores chromatin's role regulating gene expression, scientists often use mouse embryonic stem cells. Stem cells' chromatin landscape is more plastic than that of differentiated cells, reflecting their capacity to enter any of many gene expression programs that lead to the hundreds of different cell types in an adult organism. Once the cells have begun to pick an identity, parts of the genome not needed for that identity get closed off forever. Prior to the current study, scientists knew mouse stem cells kept most of the genome accessible, while keeping the lid on retrotransposons by tagging them with chemical markers containing three methyl groups on histone H3.

Early experiments done by Banaszynski, while a postdoc in Allis's lab, suggested that H3.3 is necessary for the placement of these suppressive "trimethyl" marks. "By taking away proteins responsible for placing H3.3 into chromatin, or eliminating H3.3 completely, we confirmed that trimethylation depends on H3.3," says Banaszynski, who is currently an assistant professor at the University of Texas Southwestern Medical Center.

"Furthermore, retrotransposons became more active in cells without H3.3, and in these cells, we saw chromosomal abnormalities. It may be that by silencing retrotransposons, H3.3 prevents these abnormalities, however we cannot eliminate the possibility that loss of H3.3 results in this genomic instability for other reasons," Elsasser says.

Although the types of retrotransposons studied in these experiments are not active in humans, it's likely that human stem cells do use H3.3 to keep other varieties of jumping genes in place, Banaszynski says.

The research has implications beyond epigenetics. "This study also hints at a fascinating question in biology: How do cells balance the potential evolutionary benefit of mobile elements, such as retrotransposons, with the competing need to silence them so as to maintain the genome?" she says.


New roles emerge for non-coding RNAs in directing embryonic development

Traditionally, only a few types of RNA have been understood to play a significant part in cell biology. The short list includes messenger RNA, transfer RNA, and ribosomal RNA. More recently, other types have been added: microRNA, small interfering RNA, and antisense RNA.

But that hardly exhausts the list. A more recently discovered type of RNA is large, intergenic non-coding RNA (lincRNA), a particular subtype of long non-coding RNA. LincRNAs are so named because they are not derived from gene-coding DNA, but instead from stretches of DNA lying between genes. New research suggests that an important function of some lincRNAs is to regulate the development of embryonic stem cells in the earliest stages of embryo development.

Scientists at the Broad Institute of MIT and Harvard have discovered that a mysterious class of large RNAs plays a central role in embryonic development, contrary to the dogma that proteins alone are the master regulators of this process. The research, published online August 28 in the journal Nature, reveals that these RNAs orchestrate the fate of embryonic stem (ES) cells by keeping them in their fledgling state or directing them along the path to cell specialization.



Primates have evolved specialized weaponry to home in on bits of jumping DNA.
illustration by John C.W. Carroll / Gladstone Institutes

Cells have a rapidly evolving arsenal to counter wayward DNA.

For millions of years, bits of DNA called retrotransposons have been duplicating and inserting themselves randomly throughout the genomes of different organisms, including humans. Sometimes the jumping genes alight in places that help species evolve new traits. Other times they do nothing. But all too often they touch down in the middle of a gene, altering the way it’s regulated or knocking it out of commission. Fortunately, as HHMI Investigator David Haussler recently showed, cells have evolved a way to keep these wayward sequences in check.

“Over generations, our genome becomes bloated with copies of retrotransposons,” explains Haussler. “They do damage by disrupting normal genetic mechanisms. Something has to be the security patrol to try and shut these things down.”

In humans, one form of security patrol comes from more than 400 rapidly evolving genes that produce watchdog proteins called KRAB zinc-fingers. These watchdogs scan the genome, clamp onto retrotransposons, and then call on other proteins to silence them.

To see KRAB zinc-fingers in action, Haussler’s team at the University of California, Santa Cruz, used mouse cell lines containing a copy of human chromosome 11, which includes hundreds of retrotransposons commonly found in primates like humans, but not present in rodents. Since mice have rodent-specific KRAB zinc-fingers to control their retrotransposons, the rodent cells couldn’t stop the rogue human retrotransposons from expressing themselves. That is, until the researchers added two human KRAB zinc-fingers—ZNF91 and ZNF93—that halted the primate-specific retrotransposons in their tracks. The team published its findings September 28, 2014, online in Nature.

Haussler explains that mutations in retrotransposons allow them to escape detection by the KRAB zinc-fingers, which in turn drives the evolution of new KRAB zinc-finger genes. “By reconstructing molecular evolutionary histories, we can see that these KRAB zinc-finger genes have been major players in this battle with the retrotransposons and will probably continue to be so,” he says.

There is a silver lining in this seemingly endless arms race within our own DNA. Once KRAB zinc-fingers are no longer needed to suppress retrotransposons, many of these watchdog proteins adopt new roles as regulatory proteins, controlling the activity of genes near retrotransposon landing sites.