
How can you tell if a protein-coding gene is nuclear or mitochondrial?


New to genes, and have to read literature to find candidate genes for a particular study. I cannot for the life of me understand whether all genes are placed into either the "nuclear" or "mitochondrial" category… are there more categories? Some are easy: cytochrome subunit genes are always written as "mitochondrial" and rhodopsin genes as "nuclear"… but other genes, like ATPase genes, don't have those descriptions. Searching online further still does not clarify it, so I am now wondering whether there are just more categories?


I have limited my answer to refer to humans, but the advice generalises to other eukaryotic cells.

In human cells, almost all protein-coding genes are located in the nuclear genome. The mitochondria have their own genetic material; however, the mitochondrial genes encode only 14 proteins. The answer to this question also notes the existence of extrachromosomal circular DNA. There are also a number of proteins (1,192) whose existence is verified but whose genomic location is currently unconfirmed.

In your comments you mention ATP1 and note that it is called a membrane protein. The gene is located on chromosome 19 in the nucleus. The protein it encodes can be found in a variety of places, including the nucleus and the mitochondrion. By contrast, the gene for cytochrome c oxidase subunit 1 (CO1/COX1) is located in the mitochondrial genome, and the protein localises to the inner mitochondrial membrane.

The links in this answer all point to UniProt, which is what I use to get an idea of the basic functions of proteins. The database includes information on genes, but its primary purpose is to describe proteins and their functions. The information is broadly correct but not perfect. It covers many species and links to a number of tools, including the Gene Ontology, which attempts to characterise proteins using a controlled vocabulary.
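The rule of thumb in this answer can be written as a small lookup. This is a minimal sketch, assuming the input is a current HGNC symbol for a human protein-coding gene. HGNC gives mtDNA-encoded genes an `MT-` prefix, but mitochondrial tRNA and rRNA genes (e.g. MT-TL1, MT-RNR1) share that prefix, so the sketch checks against an explicit set of the 13 standard mtDNA protein-coding genes rather than the prefix alone. The helper name `genome_of` is hypothetical, and any symbol not in the set is simply reported as nuclear.

```python
# Human mtDNA protein-coding genes (HGNC approved symbols).
# mt-tRNA and mt-rRNA genes (e.g. MT-TL1, MT-RNR1) also carry the "MT-"
# prefix, so the prefix alone is not used as the test.
MT_PROTEIN_CODING = frozenset({
    "MT-ND1", "MT-ND2", "MT-ND3", "MT-ND4", "MT-ND4L", "MT-ND5", "MT-ND6",  # complex I
    "MT-CYB",                                                              # complex III
    "MT-CO1", "MT-CO2", "MT-CO3",                                          # complex IV
    "MT-ATP6", "MT-ATP8",                                                  # ATP synthase
})


def genome_of(symbol: str) -> str:
    """Report where a human protein-coding gene is encoded.

    Hypothetical helper: assumes `symbol` is a current HGNC symbol for a
    protein-coding gene. Legacy names such as "COX1" are not recognised
    and fall through to "nuclear".
    """
    if symbol.strip().upper() in MT_PROTEIN_CODING:
        return "mitochondrial"
    return "nuclear"


if __name__ == "__main__":
    for gene in ("MT-CO1", "MT-ATP6", "RHO"):
        print(gene, genome_of(gene))
```

For the ATPase example in the question, this shows why a blanket label is hard to give. The two ATP synthase subunits MT-ATP6 and MT-ATP8 are mitochondrially encoded, while most other ATPase genes, including the ATP1 family mentioned above, are nuclear.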


Engineering New Mitochondrial Genes to Restore Mitochondrial Function (MitoSENS)

Mitochondria perform and support several vital functions in a cell, and the alternate genome, mtDNA, plays a critical role in organelle maintenance. There is increasing evidence that mitochondrial function declines with age and that dysfunctional mitochondria contribute to several metabolic and neuromuscular diseases. Our goal is to address age-acquired and inborn mutations in the mtDNA using a gene therapy approach. We are exploring:

  1. allotopic expression (expressing mtDNA genes from the nucleus), and
  2. whole-organelle replacement

as strategies to revitalize mitochondrial function. Our multidisciplinary approach employs cell culture and mouse models to achieve our objectives.


eLife digest

Mitochondria are like the batteries of our cells: they perform the essential task of turning nutrients into chemical energy. A cell relies on its mitochondria for its survival, but they are not completely under the cell's control. Mitochondria have their own DNA, separate from the cell's DNA, which is stored in the nucleus. This mitochondrial DNA contains a handful of genes, which carry the code for some of the important proteins needed for energy production.

These proteins are made in the mitochondria themselves, and their levels are tweaked to meet the cell's current energy needs. To do this, mitochondria make copies of their genes and feed these copies into their own protein-production machinery. By controlling the number of gene copies they make, mitochondria can control the amount of protein they produce. But the process has several steps. The copies come in the form of a DNA-like molecule called RNA and, at first, they contain several genes connected one after the other. To access each gene, the mitochondria need to cut these copies up. They then process the fragments, fine-tuning the number of copies of each gene. This process – called gene expression – happens in the mitochondria, but they cannot do it on their own: they need proteins that are coded within the DNA in the cell nucleus.

Genes in the cell nucleus can affect gene expression in the mitochondria, changing the cell's energy supply. Scientists do not yet know all of the genes involved, or how this might differ between different tissues or among different individuals. To find out, Ali et al. examined more than 11,000 records of RNA sequences from 36 different human cells and tissues, including blood, fat and skin. This revealed a large amount of variation in the expression of mitochondrial genes. The way the mitochondria processed their genes changed in different cells and in different people. To find out which genes in the nucleus were responsible for the differences in the mitochondria, the next step was to compare RNA levels from the mitochondria to the DNA sequences in the nucleus. This is because changes in the DNA sequence between different people – called genetic variants – can also affect how genes work, and how genes are expressed. This comparison revealed 64 genetic variants from DNA in the cell nucleus that are associated with the expression of genes in the mitochondria. Some of these had a known link to genetic variants involved in diseases like the skin condition vitiligo or high blood pressure.
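Comparing RNA levels with nuclear genotypes is, at its simplest, a per-variant association test. The sketch below is a generic illustration, not Ali et al.'s pipeline, which the digest does not describe. It assumes genotypes coded as alternative-allele dosage (0, 1 or 2) and returns the least-squares slope of expression on dosage. The function name and toy values are hypothetical; real analyses also adjust for covariates and correct for testing many variants.

```python
from statistics import mean


def dosage_effect(dosages, expression):
    """Least-squares slope of expression on genotype dosage.

    dosages: number of alternative alleles per individual (0, 1 or 2).
    expression: matching expression values (e.g. normalised RNA levels).
    Returns the estimated change in expression per additional allele.
    """
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all individuals share one genotype; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    return sxy / sxx


if __name__ == "__main__":
    # Toy data: hypothetical values, not from any study.
    print(dosage_effect([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]))
```

On the toy data each extra allele raises expression by about 1.0; a variant with no association would give a slope near zero.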

So, although mitochondria contain their own DNA, they rely on genes from the cell nucleus to function. Changes to the genes in the nucleus can alter the way that the mitochondria process their own genetic code. Understanding how these two sets of genes interact could reveal how and why mitochondria go wrong. This could aid in future research into illnesses like heart disease and cancer.


Mitochondrial DNA in cancer: Small genome, big impact

Mitochondria. Credit: Cancer Research UK

Mitochondria are in vogue, with devotees ranging from exercise physiologists and sports scientists to molecular biologists and clinicians all coalescing around these unusual organelles.

Given their position as a metabolic and energetic hub, as well as their central role in controlling cell death, mitochondria are also an area of focus for many cancer scientists. However, one essential facet of mitochondrial biology in cancer has remained underexplored: mitochondrial DNA (mtDNA).

A physically and heritably distinct piece of DNA that is passed down the generations through the maternal line, mtDNA exists only inside mitochondria. Recent work from my lab at the Cancer Research UK Beatson Institute, in collaboration with scientists at the Memorial Sloan Kettering Cancer Center in the US led by Dr. Ed Reznik, has revealed the substantial impact mutations in mtDNA can have in cancer. Understanding this could offer up new indicators of disease prognosis and provide a new focus for future therapeutics.

Mitochondria—a trans-kingdom enigma

At the molecular level, the components of mammalian mitochondria are assembled from viruses, bacteria and eukaryotes. As such, the organelle we see in human cells today is a trans-kingdom mixture that doesn't fully resemble any of its ancestors.

Human mtDNA is a small genome, only 16,569 base pairs long. In keeping with its bacterial ancestry, mtDNA is also circular and multicopy—with hundreds to thousands of copies present in every cell. mtDNA is very genetically compact and encodes only 13 proteins, all of which are core subunits of the oxidative phosphorylation (OXPHOS) complexes.

These OXPHOS complexes, found only within mitochondria, are unique in human biology as they are the only cellular structures formed of proteins encoded by genes from the two separate genomes. The nuclear DNA provides around 90% of the required proteins for OXPHOS, and the mtDNA provides the remaining 10%.

Mitochondria and disease: a charged history

Through the process of OXPHOS, an electrochemical gradient builds up in mitochondria, causing the inside of mitochondria to become negatively charged. This relative difference in charge is then harnessed by parts of the OXPHOS machinery to produce usable energetic molecules; however, almost all other mitochondrial functions, unrelated to OXPHOS, also require this charged state to work. Without it, the precursors and products of reactions that occur in mitochondria begin to build up on the wrong sides of the mitochondrial membranes, unable to be transported as normal because of the altered electrochemical poise of the organelle. This results in various forms of metabolic dysfunction, which are the hallmark of the rare mitochondrial diseases that occur when individuals are born with mutations in their mtDNA. Generally, these mutations result in a shift in metabolism towards greater utilization of glucose: a short-term biochemical solution to the underlying issue that often results in severe disease.

Now, OXPHOS is not the only way to generate energy and building blocks for cells. A huge cancer research effort has gone into detailing the ways in which cancer cells can be rewired to survive and undergo rapid cell growth.

One of these metabolic changes is a much-discussed phenomenon known as the Warburg effect, where tumors generate large amounts of lactate by preferentially utilizing glucose as a fuel source, despite being in conditions where their mitochondria could pick up the slack. Otto Warburg, and many since, have suggested that the altered metabolism associated with this effect, and other forms of metabolic dysfunction, are a driver of cancer initiation and progression. However, consensus on this view has never been reached in the cancer research community and these metabolic changes are often seen as a consequence of cancer rather than a potential cause.

In light of recent developments, this view may need to evolve. While anecdotal evidence of mtDNA mutations arising in tumors has been around for nearly two decades, in the last five years or so several studies using large-scale sequencing data concluded that roughly 60% of tumors bear mtDNA mutations (1,2,3). Although these studies lacked statistical power and clinical insight, such clear links between a highly abundant and plausible source of mitochondrial dysfunction and cancer had never been made previously. There was a growing temptation among some to view these tumors as isolated groups of cells with both cancer and severe mitochondrial disease.

In a paper recently published in Nature Metabolism, my lab and colleagues from Memorial Sloan Kettering Cancer Center detail the patterns underlying mtDNA mutations, the impacts these mutations have on tumors and the clinical implications of this in colorectal cancer (CRC) patients. In agreement with previous studies, we found that around 60% of tumors contain one or more mtDNA mutations. We also found highly recurrent mutations occurring across all tumors at specific stretches of DNA where a single DNA base is repeated, known as homopolymers. This is significant because recurrence is an indicator of selective pressure: it implies the mutation confers an advantage to the cancer.
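Locating homopolymers is straightforward: scan the sequence for runs of one repeated base. This is a minimal sketch, assuming a plain A/C/G/T string and a hypothetical minimum run length of 5; the threshold and calling method used in the paper are not given here. Positions are 0-based.

```python
from itertools import groupby


def homopolymers(seq, min_len=5):
    """Find runs of a single repeated base.

    Returns a list of (start, base, length) tuples with 0-based start
    positions, keeping only runs of at least `min_len` bases.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


if __name__ == "__main__":
    print(homopolymers("ACCCCCCTAAAAAG"))  # [(1, 'C', 6), (8, 'A', 5)]
```

`itertools.groupby` groups consecutive identical characters, so each group is exactly one run, and the running `pos` counter tracks where each run starts.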

We also calculated a comparative mutation rate for all known cancer-associated genes, which included mtDNA genes. Surprisingly, this led us to the conclusion that mtDNA genes are among the most mutated genes across all cancers, with 25 of the top 30 most mutated genes being encoded in mtDNA. While comparing mtDNA and nuclear DNA does have its limitations, a relative understanding of these mutations can give us a sense of context and proportion when considering cancer genetics. Additionally, we showed that the mutational burden in mtDNA is unrelated to the mutational burden in the nucleus. This is important because nuclear DNA mutational burden in many tumors is associated with their response to, among other things, immune-targeted therapies. It's easy to see how mitochondrial mutational status could be harnessed to better allocate such treatments, if not allow development of mitochondria-targeted immunotherapies, which may have advantages over current immunotherapy targets. Intriguingly, we saw that the burden of mutations in mtDNA was spread unevenly across the OXPHOS complexes. This hints at how tumors might harness specific forms of mitochondrial dysfunction while struggling to survive with others, and could inform future therapeutic approaches.

Mutations confer survival benefit

The mutations we detected in mtDNA are not arising late in tumor development but are present in stage 1 tumors at a comparable rate to stage 3 tumors. They also cause a distinct change in the way the nuclear DNA of the tumor cell is expressed. Increases in nuclear encoded OXPHOS genes and decreases in genes associated with innate immunity have been linked with diverse mtDNA mutations across nearly all cancers studied. Importantly, we found a substantial survival benefit for CRC patients whose tumors bear mtDNA mutations, with a decreased risk of death of 57-93% for the majority of those in this category.

This potentially holds significant immediate implications for CRC patient care; however, it also raises many other questions: will this impact be seen in other cancers or just CRC? What are the precise differences between mtDNA-mutant and non-mutant cancers, beyond the mtDNA changes? Do some approaches to therapy for these patients work better or worse because of this? Beyond these specific, immediate questions, do mtDNA mutations actually cause or predispose cells to becoming cancerous, and how has this been missed for so long?

A lot of clinical and laboratory work needs to be done to address these. However, some issues are easier to address. For example, in what now seems to be a major misstep, mtDNA has routinely been excluded from analyses of sequenced tumors, mostly owing to technical issues that arise when mtDNA is included in the data. This is an unfortunate but likely reason why the relevance of mtDNA to cancer has been overlooked.

For much of the history of cancer research, and with good reason, scientists have focused heavily on nuclear DNA. These efforts have led to cancer being seen by many as a disease of the genome, an understanding that our recent discoveries suggest should be broadened. Cancer: no longer a disease of the genome, but a disease of the genomes.

James B. Stewart et al. Simultaneous DNA and RNA Mapping of Somatic Mitochondrial Mutations across Diverse Human Cancers, PLOS Genetics (2015). DOI: 10.1371/journal.pgen.1005333

Comprehensive molecular characterization of mitochondrial genomes in human cancers, Nature Genetics (2020). DOI: 10.1038/s41588-019-0557-x


Discussion

Structural evolution of acrodontan mitogenomes

In the present study, mitogenomic sequences were collected from major representative lineages of Acrodonta to provide an opportunity to compare mitogenomic structures among three iguanian families. Iguanid mitogenomes were very conservative with no gene rearrangements. They also appear to have evolved much more slowly than agamid and chamaeleonid counterparts as judged from relatively short branch lengths within Iguanidae (Fig. 3). On the other hand, acrodontan (especially agamid) mitogenomes have an entirely different tendency with occasional gene rearrangements and increased molecular evolutionary rates.

In Fig. 5, possible lineages for the mitogenomic reorganizations described here are mapped onto the phylogeny based on the parsimony criterion. Although agamid mitogenomes have several examples of gene rearrangements, most of them can be assigned to specific lineages without multiple parallel changes, supporting the rarity and less homoplasious nature of mitochondrial gene rearrangements [22]. The only possible homoplasious change is the translocation of the tRNA Pro gene from the 5' to the 3' side of the CR in both Agaminae and Chamaeleonidae. However, the tRNA Pro genes in these taxa are placed in somewhat different genomic contexts: neighboring sequences composed of two types of AT-rich sequences are always present in the chamaeleonids (Fig. 2). This suggests that the translocations of the tRNA Pro gene in agamines and chamaeleonids resulted from independent events. Moreover, if this translocation had occurred once in a common ancestral lineage of Agaminae and Chamaeleonidae, multiple reversals to the original location of the tRNA Pro gene would have to be assumed in several agamid lineages, which seems unlikely (Fig. 5). Gene rearrangements around the CR have been shown to occur independently in multiple lineages by the canonical duplication-and-deletion mechanism [22].

Occurrence of mitogenomic structural changes in acrodont lizards. The lineages on which individual changes occurred were inferred by the parsimony criterion, based on the phylogenetic framework (Fig. 3) and the distributions of gene arrangements in extant species. See Fig. 1 for the actual changes in gene arrangements. The anticodon change in the tRNA Pro gene is TGG to CGG (see text).
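The parsimony reasoning above can be illustrated with Fitch's small-parsimony algorithm, which counts the minimum number of state changes a single character needs on a fixed tree. This is a sketch of that standard algorithm, not the authors' code. The tree, taxon names and states ("5p"/"3p" for the tRNA Pro gene lying on the 5' or 3' side of the CR) are hypothetical.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one character on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple (left, right) of subtrees.
    states: dict mapping every leaf name to its observed state.
    Returns (candidate states at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one change is required on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical toy tree; "5p"/"3p" = tRNA Pro on the 5' or 3' side of the CR.
    tree = ("iguanid", (("agamine", "other_agamid"), ("chameleon_1", "chameleon_2")))
    states = {"iguanid": "5p", "agamine": "3p", "other_agamid": "5p",
              "chameleon_1": "3p", "chameleon_2": "3p"}
    root_states, changes = fitch(tree, states)
    print(sorted(root_states), changes)  # ['3p', '5p'] 2
```

On this toy tree, two independent 5'-to-3' moves and one shared move followed by a reversal both cost two steps. The count alone therefore cannot choose between them, which is why additional evidence such as the differing neighboring sequences matters.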

To the best of our knowledge, very few studies [38, 49] have characterized the structural organization and evolution of CRs in lizard mitogenomes. The present study shows that a region encompassing Boxes C, D and F retains notable similarity among acrodonts and even among diverse groups of vertebrates (Additional File 1). On the other hand, acrodontan CRs do not seem to conserve CSB II in Domain 3, and agamids do not even retain CSB III (Additional File 1). This is in sharp contrast with the CRs of several iguanids, gekkonids and lacertids, which conserve all three CSBs (Additional File 1 [38]), providing further evidence for the more conservative nature of iguanid mitogenomes compared with their acrodontan counterparts.

Phylogenetic relationships

As outlined earlier, acrodontan phylogeny was originally reconstructed with morphological data and has since been evaluated with molecular data using some mitochondrial and nuclear gene sequences. The present study addressed this issue using the longest molecular dataset to date (9,386 bp). As a trade-off for the large number of sites, we had to somewhat sacrifice the depth of taxon sampling. It is thus important to assess the monophylies of individual groups in order to interpret our results with respect to subfamilial or generic interrelationships.

Fortunately, previous molecular studies using many taxa but fewer sites provided strong evidence for a number of clades in Agamidae and Chamaeleonidae. For example, Honda et al. [12], Macey et al. [13] and Amer and Kumazawa [50] strongly suggested, with high bootstrap or other tree-support values, the monophylies of Uromastycinae, Amphibolurinae, Draconinae and Agaminae within Agamidae. The remaining subfamilies, Leiolepidinae and Hydrosaurinae, contain limited numbers of extant species, and the monophyly of some leiolepidine species was also strongly supported [13]. In addition, the occurrence of subfamily-specific gene rearrangements (Fig. 5 [24, 25]) is consistent with the monophylies of the corresponding groups. Within Chamaeleonidae, recent molecular studies using multiple gene sequences [46, 51] supported the monophylies of two of the traditional six genera (i.e., Brookesia and Furcifer). The remaining traditional genera Chamaeleo, Calumma, Bradypodion and Rhampholeon may each be an assemblage of a few monophyletic groups [14, 15, 46, 51], but no definitive conclusion has been reached on their phylogenetic relationships.

Our mitogenomic tree (Fig. 3) strongly supports the monophyly of Agamidae relative to Chamaeleonidae (1.00 Bayes-PP and 100% ML-BP values). This is not an artifact of, e.g., long-branch attraction, because all acrodontan lineages appear to possess accelerated molecular evolutionary rates relative to iguanids (Fig. 3). Morphological analyses (e.g., [7, 10]) and some molecular ones (e.g., [16, 41, 52]) did not support agamid monophyly, while other molecular studies (e.g., [13, 17, 53]) did. Although Agamidae has been regarded as a metataxon under the tentative assumption of its monophyly [2, 3, 10], this no longer seems necessary in light of our strong molecular evidence for agamid monophyly.

Mitogenomic data resolved agamid subfamilial interrelationships with generally strong tree-support values (Fig. 3). These relationships are consistent with the most parsimonious tree obtained by Macey et al. [13] using ~1,500 bp of mitochondrial gene sequences. However, the traditional morphological view tended to unite Uromastyx and Leiolepis into a basal clade (e.g., [3]). The Kishino-Hasegawa test (data not shown) suggested that this sister relationship of Uromastyx and Leiolepis is unlikely, though it could not be rejected (p = 0.275). The sister-group relationship of Agaminae and Draconinae is shared by the morphological [3] and molecular (Fig. 3) results.

Our mitogenomic tree (Fig. 3) was consistent with other molecular studies [14, 46] with respect to the most basal divergence of the genus Brookesia and the subsequent divergence of the Rhampholeon + Rieppeleon group. However, this phylogenetic relationship was not clear in Townsend and Larson [15], who used ~1,500 bp of mitochondrial gene sequences for phylogenetic inference. We conducted Bayesian analyses using the combined mitochondrial gene sequences used by Raxworthy et al. [14] and Townsend and Larson [15], and the results (data not shown) also supported the most basal divergence of Brookesia, as shown in Townsend et al. [51]. Although Brookesia (Madagascan leaf chameleons) and Rhampholeon (African leaf chameleons) were once grouped into a common subfamily Brookesiinae [5], another morphological study based on osteological characters [6] suggested the basal divergence of Brookesia alone. Taken together, but primarily based on our mitogenomic phylogeny (Fig. 3), we conclude that the Madagascan Brookesia represents the earliest offshoot of extant chameleons.

The monophyly of the traditional genus Chamaeleo was strongly suggested by morphological analyses based on distinct synapomorphies (e.g., four rotulae in a hemipenis), and it was subsequently supported by a molecular study [14]. However, another molecular study [15] found its two subgenera (Chamaeleo and Trioceros) in separate positions in the chamaeleonid phylogeny, albeit with little statistical evaluation of their non-monophyly. The most recent molecular study [46] showed stronger evidence for the separation of Chamaeleo and Trioceros, proposing their elevation to distinct genera. Our mitogenomic analyses (Fig. 3) placed Trioceros melleri apart from six representatives of Chamaeleo. The Kishino-Hasegawa test rejected (p = 0.017) the best tree obtained by constraining the Chamaeleo + Trioceros monophyly (Tree 5 in Table 2, see also Additional file 2). The more conservative Shimodaira-Hasegawa test (Table 2) did not reject this tree, but its probability was very low (p = 0.137), supporting the formal elevation of Chamaeleo and Trioceros to distinct genera [46].

Together with results from testing some specific hypotheses derived from previous morphological and molecular analyses (Table 2), mitogenomic data seem to provide a certain level of resolution of the chamaeleonid phylogeny. To the best of our knowledge, the clustering of Furcifer with a group of Calumma containing C. parsonii (Fig. 3) was not suggested by previous studies. Nor was the clustering of these two taxa with Trioceros (Fig. 3). Evaluation of these new relationships, which did not receive strong bootstrap support (Fig. 3), awaits further taxon sampling of mitogenomic and/or nuclear gene data.

Historical biogeography

Previous studies on the historical biogeography of Acrodonta did not necessarily postulate the monophylies of Agamidae and Iguanidae. Thus, terms such as the origins of Iguania, Acrodonta and Agamidae have been used confusingly. This study provided strong evidence for the monophylies of Agamidae and Iguanidae, with which previous biogeographic hypotheses can be reevaluated. Here, we discuss acrodontan biogeography based on molecular, paleontological and geological evidence without an a priori assumption of vicariance or dispersal (see Fig. 6).

The historical biogeography of acrodont lizards based on the molecular, paleontological and geological evidence. Paleogeographical maps at six different times [63] are shown, on which a hypothesis on the origin and migration pathways of agamids (red) and chamaeleonids (blue) is illustrated. The earliest fossil records for acrodonts and chameleons are, respectively, the Early-Middle Jurassic (165-200 MYA) Bharatagama from the Kota Formation of India [61] and the Miocene (~26 MYA) Chamaeleo caroliquarti from Bohemia [64]. Acrodont fossils of Priscagamidae are found in Aptian-Albian (100-120 MYA) and Campanian (~80 MYA) deposits of Central Asia and Mongolia [43, 54, 55]. Another acrodont fossil, the gliding lizard Xianglong, is found in the Early Cretaceous of China [56].

There have been two major hypotheses on the origin of Acrodonta, that is, on whether the most recent common ancestor of Agamidae and Chamaeleonidae was Laurasian or Gondwanan. The occurrence of acrodontan priscagamid (and even pleurodont iguanian) fossils from the mid-late Cretaceous of Asia (Fig. 6.3) led some researchers to hypothesize a Laurasian (more specifically Central Asian or Mongolian) origin of Iguania and Acrodonta (e.g., [54, 55]). A recent report of a gliding acrodont lizard (Xianglong) from the Early Cretaceous of China [56] may also support this idea. Extant agamid lizards are distributed primarily in Eurasia, but some occur in Australasia and Africa. Recent molecular phylogeny ([12, 25, 50, 57] but see [13, 58]) is consistent with a view that extant Agamidae originated in Asia and that some descendant lineages (e.g., Amphibolurinae and Uromastycinae) dispersed to Australasia and Africa during Cenozoic times, when these regions were geographically connected to or in close proximity to Eurasia. On the other hand, there is good agreement on a Gondwanan (more specifically Madagascan or African) origin of extant chamaeleonids [5, 6, 14, 59]. Taken together, a Laurasian origin of Acrodonta would require long-distance transmarine dispersal of chamaeleonid ancestors from Eurasia to Madagascar/Africa.

A Gondwanan origin of Iguania was proposed by Estes [60] based on some fossil evidence and the basal divergence of Iguania from Scleroglossa, which is not supported by recent molecular phylogenies. Macey et al. [13] used molecular phylogeny to advocate a Gondwanan origin of Acrodonta and proposed that major acrodontan lineages diverged vicariantly and/or migrated to the Northern Hemisphere by plate tectonics (i.e., the collision of the Indian subcontinent or other Gondwanan land blocks with Eurasia). Although a recent molecular study [17] further supported the out-of-India radiation of the subfamily Agaminae, other molecular studies [25, 50, 57] questioned Gondwanan vicariance or multiple northward migrations for at least some lineages (e.g., Amphibolurinae and Uromastycinae).

More recently, some fossil evidence has shed light on this issue. Bharatagama from the Early-Middle Jurassic Kota Formation of India (Fig. 6.2) represents the oldest record of early acrodont iguanians [61]. This fossil record may support a Gondwanan origin of acrodonts [47, 61]. If acrodonts did originate in Gondwanaland, this is consistent with the likely Gondwanan origin of Iguanidae sensu lato ([18] and refs. therein), but the ancestors of extant agamids, which were postulated to be in Asia (see above), may need to have migrated from the Southern to the Northern Hemisphere by transmarine dispersal.

The present study (Fig. 4) suggested that Agamidae and Chamaeleonidae are each monophyletic and that they diverged from each other in the mid-Cretaceous (96-122 MYA) although independent molecular dating using nuclear genes suggested somewhat younger dates around 85 MYA [42]. Okajima and Kumazawa [18] previously showed that the appreciable gap in the estimated divergence time of oplurine iguanids between mitochondrial [18] and nuclear [62] gene sequences could be due to multiple factors, such as differences in the tree topology and time constraints assumed for each study, the intrinsic data property of mitochondrial and nuclear sequences, and relatively poor squamate fossil records [47] that can be used to constrain ingroup squamate divergences precisely.

In spite of this somewhat low precision of time estimation, the molecular dating results (Fig. 4 [42]) consistently suggest that Agamidae and Chamaeleonidae separated after the Middle-Late Jurassic break-up of Pangea into Laurasia and Gondwanaland (Fig. 6.2), when the latter two supercontinents were being further fragmented by plate tectonics [63]. Geological data suggest that India and Madagascar drifted away from Gondwanaland in the Early Cretaceous (120-130 MYA) (Fig. 6.3) and separated from each other in the Late Cretaceous (~90 MYA) (Fig. 6.4) [63]. India then moved northward and accreted to Eurasia from the latest Cretaceous to the Eocene (Fig. 6.5) [63].

Assuming a Gondwanan origin of Acrodonta, the molecular dating results (96-122 MYA from Fig. 4 and ~85 MYA from [42] for the divergence of Agamidae and Chamaeleonidae) are consistent with a view that Agamidae vicariantly diverged from Chamaeleonidae on the India/Madagascar landmass. It may be further hypothesized that Agamidae migrated to Eurasia on the drifting Indian subcontinent (Fig. 6.5), while Chamaeleonidae was left on Madagascar and its descendants migrated to Africa across the Mozambique Channel and later to Eurasia (Fig. 6.6). Molecular dating (Fig. 4) suggested that extant chamaeleonid genera diverged during Cenozoic times. Some of them are distributed exclusively in Madagascar (Furcifer, Calumma and Brookesia), while the others are distributed in Africa (Rhampholeon/Rieppeleon and Bradypodion/Kinyongia/Nadzikambia) or in Africa + Eurasia (Chamaeleo/Trioceros). Because Africa and Madagascar had been clearly separated in the Cenozoic [63], generic radiations of chamaeleonids cannot be associated with Gondwanan vicariance. As previous authors postulated [14, 15], chamaeleonids are likely to have undergone transmarine dispersal across the Mozambique Channel multiple times. To the best of our knowledge, the oldest certain fossil record of Chamaeleonidae is Chamaeleo caroliquarti from western Bohemia [64]. Africa had long been isolated from other continents but became connected to Eurasia in the Miocene [63]. Thus, the occurrence of this fossil in the Miocene of Europe is consistent with the above-mentioned biogeographic explanation.

Molecular dating (Fig. 4) also suggested that extant agamid subfamilies diverged from each other in the Late Cretaceous. If Agamidae did migrate to Eurasia on India as hypothesized above, the dating result suggests that subfamilial radiations of agamids occurred on the drifting Indian subcontinent. This may seem somewhat unlikely, but it is not impossible under the assumption that a number of ancient agamid lineages radiated, dispersed locally in Asia, and became extinct. Vastanagama susani and Tinosaurus indicus from the Early Eocene of India are known as the earliest certain agamid fossils in South Asia [65].

Then, how can this hypothesis of a Gondwanan origin of extant acrodont groups be reconciled with the Early Cretaceous Xianglong and the mid-late Cretaceous priscagamid acrodonts from Laurasian sites (Central Asia and Mongolia)? Xianglong and priscagamids are stem acrodont lizards that are unlikely to be nested within extant agamids, but their exact phylogenetic positions are not yet known [56, 66]. Therefore, they probably diverged from the long branch of stem acrodonts 110-200 MYA (see Fig. 4). These extinct groups may be Laurasian relics of acrodont lizards that had diverged from Gondwanan acrodonts (i.e., direct common ancestors of extant agamids and chamaeleonids) before the Pangean break-up into Laurasia and Gondwanaland. Alternatively, they may simply have derived from Gondwanan ancestors by transmarine dispersal.


An assessment of the value of nuclear and mitochondrial genes in elucidating the origin and evolution of Isotoma klovstadi Carpenter (Insecta, Collembola)

In order to infer the origin and evolution of Antarctic Collembola, a correct phylogenetic analysis depicting relationships among Antarctic and non-Antarctic species is required. A preliminary assessment of the value of DNA sequences in reconstructing phylogenetic relationships among the Antarctic Isotoma klovstadi and other non-Antarctic species was carried out by sequencing one mitochondrial gene (Cytochrome c oxidase, subunit II) and two nuclear genes (a fragment of the 28S rDNA and the Elongation Factor-1α). Estimates of base composition heterogeneity revealed that, in the two protein-coding genes (COII and EF-1α), 3rd codon position sites are compositionally very heterogeneous; the analysis of these two genes was therefore performed only on 1st and 2nd codon position sites. Phylogenetic analyses using Maximum Likelihood, Maximum Parsimony and Minimum Evolution revealed that the COII and EF-1α genes are more suitable than the 28S rDNA D3 fragment for reconstructing phylogenetic relationships within the family Isotomidae, to which Isotoma and several other genera of Antarctic Collembola belong.
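The codon-position filtering described above can be illustrated with a short script. This is a minimal sketch that assumes in-frame, aligned coding sequences starting at a codon boundary. The function names `gc_by_codon_position` and `drop_third_positions` are hypothetical, and GC content per codon position is used only as a simple proxy for compositional heterogeneity; the study's own heterogeneity estimates may have relied on a different statistic.

```python
from typing import Dict


def gc_by_codon_position(seq: str) -> Dict[int, float]:
    """GC fraction at codon positions 1, 2 and 3 of an in-frame coding sequence.

    Gaps and ambiguous characters are skipped but still advance the reading
    frame, so the input is assumed to be an in-frame alignment row.
    """
    gc = {1: 0, 2: 0, 3: 0}
    informative = {1: 0, 2: 0, 3: 0}
    for i, base in enumerate(seq.upper()):
        if base not in "ACGT":
            continue
        pos = i % 3 + 1
        informative[pos] += 1
        if base in "GC":
            gc[pos] += 1
    return {p: gc[p] / informative[p] if informative[p] else float("nan") for p in (1, 2, 3)}


def drop_third_positions(seq: str) -> str:
    """Keep only 1st and 2nd codon positions (every 3rd site is removed)."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


if __name__ == "__main__":
    # Toy sequences (not data from the study): GC at 3rd positions differs sharply.
    taxa = {"taxon_A": "ATGGCCAAGCTC", "taxon_B": "ATGGCAAAACTT"}
    for name, seq in taxa.items():
        print(name, gc_by_codon_position(seq), drop_third_positions(seq))
```

In the toy example, both taxa have identical GC at 1st and 2nd positions but very different GC at 3rd positions. Under these assumptions, that pattern would motivate dropping 3rd positions before tree building.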


Nuclear and mitochondrial RNA editing systems have opposite effects on protein diversity

RNA editing can yield protein products that differ from those directly encoded by genomic DNA. This process is pervasive in the mitochondria of many eukaryotes, where it predominantly results in the restoration of ancestral protein sequences. Nuclear mRNAs in metazoans also undergo editing (adenosine-to-inosine or ‘A-to-I’ substitutions), and most of these edits appear to be nonadaptive ‘misfirings’ of adenosine deaminases. However, recent analysis of cephalopod transcriptomes found that many editing sites are shared by anciently divergent lineages within this group, suggesting they play some adaptive role. Recent discoveries have also revealed that some fungi have an independently evolved A-to-I editing mechanism, resulting in extensive recoding of their nuclear mRNAs. Here, phylogenetic comparisons were used to determine whether RNA editing generally restores ancestral protein sequences or creates derived variants. Unlike in mitochondrial systems, RNA editing in metazoan and fungal nuclear transcripts overwhelmingly leads to novel sequences not found in inferred ancestral proteins. Even for the subset of RNA editing sites shared by deeply divergent cephalopod lineages, the primary effect of nuclear editing is an increase—not a decrease—in protein divergence. These findings suggest fundamental differences in the forces responsible for the evolution of RNA editing in nuclear versus mitochondrial systems.

1. Introduction

The ‘central dogma of molecular biology’ holds that genetic information stored in DNA is decoded into functional proteins, with messenger RNAs (mRNAs) acting as faithful intermediates. An important caveat is that RNA editing can result in amino acid sequences that differ from those encoded in the genome [1]. Editing can involve individual base substitutions as well as short indels. Such changes are often important for proper cellular and organismal function, as disruption of editing can have deleterious and even lethal phenotypic consequences [2,3]. It is less clear, however, whether editing can be considered adaptive in the sense that it provides some fitness benefit over direct encoding of corrected sequences in genomic DNA. Many adaptive hypotheses have been advanced (e.g. related to mutational buffering, gene regulation, proteome diversification and genomic GC content optimization), but non-adaptive mechanisms for the proliferation of editing sites have also been described [4].

Some insight into the origins and maintenance of mRNA editing may be gleaned by investigating its effects on protein conservation and diversity. Editing is well studied and often pervasive in the mitochondria of diverse eukaryotes, including land plants, trypanosomes, diplonemids, dinoflagellates, heteroloboseans, myxomycetes and some metazoans [1,5–7]. Editing in these mitochondrial systems is generally restorative, meaning that it tends to produce ancestral-like protein sequences that more closely resemble homologues in other eukaryotes [1]. In fact, a simple and effective method to predict mRNA editing sites in land plant organelle genomes (where cytidine-to-uridine or ‘C-to-U’ editing is common) is to scan genes for sites where C-to-T changes would increase protein sequence conservation with related species [8]. The restorative effects in systems with insertional editing are even more dramatic because they generally ‘correct’ shifted reading frames that would otherwise produce completely unrelated proteins. In many cases, mitochondrial mRNA editing is so extensive that the unedited gene sequences are essentially unrecognizable [6].
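The conservation-based prediction idea can be sketched as follows. This is a simplified illustration, not the published method of [8]. It assumes that the coding sequence and a single homologous protein are already aligned codon-to-residue with no gaps, and it uses the standard genetic code. A C is flagged only if changing it to T turns a non-conserved codon into one that encodes the homologue's residue. The name `predict_c_to_u_sites` is hypothetical.

```python
from typing import List

# Standard genetic code (NCBI translation table 1), codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def predict_c_to_u_sites(cds: str, homolog_protein: str) -> List[int]:
    """Return 0-based CDS positions where a C->T change would restore the homologue's residue.

    Assumes codon i of `cds` aligns to residue i of `homolog_protein` (no gaps).
    """
    cds = cds.upper()
    sites = []
    for idx in range(min(len(cds) // 3, len(homolog_protein))):
        target = homolog_protein[idx]
        codon = cds[3 * idx:3 * idx + 3]
        if CODON_TABLE.get(codon) == target:
            continue  # already conserved: no edit needed at this codon
        for offset, base in enumerate(codon):
            if base != "C":
                continue
            edited = codon[:offset] + "T" + codon[offset + 1:]
            if CODON_TABLE.get(edited) == target:
                sites.append(3 * idx + offset)
    return sites


if __name__ == "__main__":
    # Toy example: CCA (Pro) -> CTA (Leu) and CGG (Arg) -> TGG (Trp).
    print(predict_c_to_u_sites("CCACGG", "LW"))  # prints [1, 3]
```

A real predictor would compare against several related species and handle alignment gaps. The one-homologue version only shows the core logic of "edit where it increases conservation".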

Metazoans have a nuclear RNA editing system, in which adenosine-to-inosine (A-to-I) substitutions are introduced by a specific class of adenosine deaminases known as ADARs [2]. During translation, inosine is read as a guanosine, so A-to-I editing can result in changes to amino acid sequences. The effects of nuclear A-to-I editing on protein conservation are less clear than those of the mitochondrial systems described above. A-to-I editing is often described as a mechanism that diversifies the proteome [9], but it has also been argued that it preferentially acts on sites that experienced a historical G-to-A change at the genomic level and thereby restores ancestral protein sequences [10,11]. Observed patterns of mRNA editing in humans have led to the conclusion that changes in protein-coding sequences are generally nonadaptive. Very few editing sites are shared with other mammals [10], and editing appears to be more common at sites that are less functionally important (e.g. synonymous sites) [12], suggesting that most edits are just tolerable by-products of promiscuous enzyme activity. This conclusion is thought to extend to other metazoan systems, but recent research has indicated that A-to-I mRNA editing is much more extensive and potentially adaptive in coleoid cephalopods [13]. Notably, some identified editing sites are even shared across representatives of divergent cephalopod groups that span hundreds of millions of years of evolution (i.e. octopus, squid and cuttlefish) [14].
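Because inosine is decoded as guanosine, the possible coding consequences of A-to-I editing at every adenosine in a transcript can be enumerated directly from the genetic code. This is a minimal sketch that assumes an in-frame coding sequence and the standard genetic code. `a_to_i_effects` is a hypothetical helper, and it makes no prediction about which adenosines ADARs actually edit.

```python
from typing import List, Tuple

# Standard genetic code (NCBI translation table 1), codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def a_to_i_effects(cds: str) -> List[Tuple[int, str, str]]:
    """For each A in an in-frame CDS, return (position, original aa, aa if the A is read as G)."""
    cds = cds.upper()
    effects = []
    for i, base in enumerate(cds):
        if base != "A":
            continue
        start = i - i % 3
        codon = cds[start:start + 3]
        if len(codon) < 3:
            continue  # incomplete trailing codon
        offset = i - start
        read_as = codon[:offset] + "G" + codon[offset + 1:]  # inosine decoded as G
        effects.append((i, CODON_TABLE.get(codon, "X"), CODON_TABLE.get(read_as, "X")))
    return effects


if __name__ == "__main__":
    for pos, before, after in a_to_i_effects("ATGAAGCTA"):
        kind = "synonymous" if before == after else "nonsynonymous"
        print(pos, before, "->", after, kind)
```

In this toy sequence, editing the A of ATG would recode Met to Val, while the third-position A of CTA is synonymous (Leu either way). This shows why some edits are silent and others change the protein.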

A-to-I editing of nuclear mRNAs has also been discovered in the fungal genus Fusarium [15] and other filamentous ascomycetes [16], with large predicted effects on protein sequences during sexual development. Interestingly, this RNA editing system appears to have evolved independently, because fungi lack ADARs, which are responsible for mRNA editing in metazoans.

Here, the consequences for protein diversity resulting from A-to-I editing of nuclear mRNA transcripts in metazoan and fungal lineages are compared to the well-documented restorative effects in mitochondrial systems. This analysis reveals that nuclear and mitochondrial editing systems have strikingly opposite effects on protein conservation. Rather than restoring ancestral protein sequences, the vast majority of A-to-I mRNA edits introduce evolutionarily derived amino acid changes.

2. Material and methods

Published datasets were obtained for A-to-I editing of nuclear transcripts in four focal species (Homo sapiens [12], Drosophila melanogaster [17], Octopus bimaculoides [14] and Fusarium graminearum [15]) and for C-to-U editing of mitochondrial transcripts in the angiosperm Arabidopsis thaliana [18]. For each species, edited protein sequences were mapped with NCBI BLAST v. 2.2.30+ to either protein (blastp) or genome/transcriptome (tblastn) databases from two successive outgroup species (table 1) to identify the amino acid states at orthologous positions. The chosen metazoan outgroups were previously shown to share a negligibly small fraction of editing sites with the focal species [10,14,17], and the proportion of shared editing sites also appears to be very low among filamentous fungi [16]. BLAST searches and extraction of sequence information were automated with custom BioPerl scripts. Analysis was restricted to nonsynonymous editing sites for which both outgroups share the same amino acid, so that the ancestral state could be confidently inferred. Editing was defined as ‘restorative’ or ‘diversifying’ if the ancestral amino acid matched the edited or unedited state, respectively. Sites were excluded if the edited and unedited states both differed from the ancestral amino acid (electronic supplementary material, table S1). Statistical analysis was performed in R v. 3.3.3 (see electronic supplementary material, Methods).
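The site-classification rule described above can be written as a small function. This is a minimal sketch that assumes each site has already been reduced to four amino acid states (unedited, edited, and the two outgroups). The function names and toy records are hypothetical, and the BLAST mapping step that produces these states is not shown.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def classify_edit(unedited: str, edited: str, outgroup1: str, outgroup2: str) -> Optional[str]:
    """Classify one editing site; return None if the site is not analysed."""
    if unedited == edited:
        return None  # synonymous edit: outside the analysis
    if outgroup1 != outgroup2:
        return None  # outgroups disagree, so the ancestral state is uncertain
    ancestral = outgroup1
    if ancestral == edited:
        return "restorative"
    if ancestral == unedited:
        return "diversifying"
    return "excluded"  # both states differ from the ancestral amino acid


def restorative_fraction(sites: Iterable[Tuple[str, str, str, str]]) -> float:
    """Fraction of analysed (restorative + diversifying) sites that are restorative."""
    tally = Counter(classify_edit(*site) for site in sites)
    informative = tally["restorative"] + tally["diversifying"]
    if informative == 0:
        raise ValueError("no restorative or diversifying sites")
    return tally["restorative"] / informative


if __name__ == "__main__":
    # Hypothetical sites: (unedited, edited, outgroup 1, outgroup 2)
    toy = [("K", "E", "E", "E"), ("K", "E", "K", "K"), ("N", "D", "N", "N"), ("K", "E", "R", "R")]
    print(restorative_fraction(toy))
```

Excluded and unresolved sites drop out of both numerator and denominator. This mirrors the decision to report restorative versus diversifying edits only among sites with a confidently inferred ancestral state.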

Table 1. Focal species and outgroups for analysis of RNA editing and reconstruction of ancestral states.

3. Results

Analysis of C-to-U editing sites in A. thaliana confirmed that mRNA editing in plant mitochondria generally increases protein similarity across taxa. At 98.0% of the analysed sites, editing restores the amino acid found in two green algal outgroups (figure 1). Conversely, just 2.0% of these edits replace the ancestral state with a derived amino acid that differs from the two outgroups. The effects of A-to-I nuclear mRNA editing in metazoans are dramatically different. Only a small fraction of the analysed A-to-I sites lead to restoration of the ancestral state: 0.6%, 4.1% and 5.9% in D. melanogaster, H. sapiens and O. bimaculoides, respectively (figure 1).

Figure 1. Differences in rates of restorative changes for nuclear A-to-I editing in four different species and mitochondrial C-to-U editing in the angiosperm Arabidopsis. Whereas mRNA editing generally restores ancestral-like protein sequences in most mitochondrial systems (including land plants as shown here), nuclear A-to-I editing is rarely restorative. Lettering is based on post hoc comparisons of each pairwise combination of species (electronic supplementary material, Methods). Species that do not share a letter in common are significantly different from each other. Error bars represent two standard errors of the proportion. The number of analysed sites is indicated in parentheses.
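The error bars in this figure are described as two standard errors of a proportion. For a proportion p estimated from n sites, the standard error is commonly computed as sqrt(p(1 - p)/n). This is a minimal sketch of that calculation with made-up counts, because the per-species site numbers are not reproduced in this text. The function name is hypothetical, and the normal-approximation formula is an assumption about how the bars were computed.

```python
import math
from typing import Tuple


def proportion_with_error_bars(k: int, n: int, n_se: float = 2.0) -> Tuple[float, float, float]:
    """Return (p, lower, upper), where the bars extend n_se standard errors from p."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    se = math.sqrt(p * (1 - p) / n)  # binomial standard error of a proportion
    return p, p - n_se * se, p + n_se * se


if __name__ == "__main__":
    # Hypothetical counts: 41 restorative edits out of 1000 analysed sites.
    p, lower, upper = proportion_with_error_bars(41, 1000)
    print(f"p = {p:.3f}, bars = [{lower:.3f}, {upper:.3f}]")
```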

RNA editing is especially abundant in coleoid cephalopods, and previous analysis of O. bimaculoides has shown that most of its editing sites are either unique to that species or shared only with its closely related congener O. vulgaris [14]. However, a small proportion of the editing sites were identified as being shared with other anciently divergent lineages of coleoid cephalopods [14]. These shared sites, which represent the most likely candidates to play functionally important roles, also exhibit a low rate of restorative changes. In fact, in O. bimaculoides, the ratio of restorative to diversifying changes decreases to even lower values for the editing sites that are more widely shared with other cephalopods (figure 2a; table 2).

Figure 2. Octopus bimaculoides editing sites were distinguished based on whether they are found only in O. bimaculoides (‘unique’), shared with the congener O. vulgaris but not with other cephalopods (‘octopus’), shared with the squid Doryteuthis pealeii but not with all sampled cephalopods (‘oct-squid’), or shared with all sampled cephalopods (‘cephalopods’) [14]. (a) The proportion of edits that restore the ancestral protein sequence decreases for classes of edit sites that are more widely shared across cephalopods. (b) More widely shared sites also have a higher editing frequency (i.e. a larger percentage of transcripts are edited at that site) [14]. Within each group, however, edits that restore the ancestral amino acid have a higher average editing frequency than those that result in a derived change. Error bars represent one standard error of the proportion (a) or of the mean (b). The number of analysed sites is indicated in parentheses.

Table 2. Logit model predicting the probability that an edit will be restorative based on the editing frequency at the site and the extent of phylogenetic conservation of the editing site. Level of editing is expressed as a per cent (0–100), and the phylogenetic conservation parameters are expressed relative to the most widely shared ‘cephalopod’ category (figure 2).

In addition to having low rates of restorative editing, the most widely conserved editing classes in O. bimaculoides were previously shown to exhibit the highest levels of editing (i.e. the fraction of transcripts that are edited at a given site) [14] (figure 2b). Overall, however, editing sites that restore the ancestral amino acid state are edited at a significantly higher level (11.9%) on average than those that produce a derived change (9.2%; table 2). This effect is driven by the higher level of editing at restorative sites within each of the phylogenetic-conservation classes (particularly for editing sites that are unique to O. bimaculoides or only shared within the Octopus genus; figure 2b), which more than offsets the negative association that exists across classes between the average level of editing and rates of restorative editing (figure 2).

Although A-to-I editing in filamentous fungi appears to be evolutionarily independent from the ADAR system in metazoans, the fungus F. graminearum exhibits similarly low rates of restorative editing. Only 4.6% of the analysed A-to-I sites in F. graminearum nuclear mRNAs lead to restoration of the ancestral state (figure 1).

4. Discussion

Previous research on A-to-I mRNA editing in metazoans led to conclusions that ‘editing can mediate RNA memory on evolutionary time scales to maintain ancestral genetic information’ [11] and ‘editing serves as a mechanism to compensate for a loss of phenotype caused by G-to-A evolution’ [10]. These studies have shown that A-to-I editing sites are more likely to have experienced a previous G-to-A change in DNA sequence than a C-to-A or T-to-A change. Such patterns are important but may largely be explained by the fact that sites that historically accommodated a G will tend to be more permissive of A-to-I editing [12]. From this perspective, focusing on the role of A-to-I editing in reversing G-to-A mutations may arguably miss the bigger picture that it is far more common for A-to-I editing to introduce novel, derived amino acids than to restore ancestral protein sequences (figure 1). The present analysis has shown that this pattern applies not only to diverse metazoan lineages but also to an independent origin of A-to-I nuclear editing in fungi.

This feature of nuclear mRNA editing distinguishes it from mitochondrial editing systems, which generally have restorative effects on protein sequences. Such effects are exemplified here by the C-to-U edits in land plant mitochondrial genomes (figure 1), but they have been documented in numerous mitochondrial systems [1,5–7]. The contrasting effects of nuclear A-to-I editing may, in part, reflect mechanistic differences. In many mitochondrial systems, there is (relatively) precise determination of editing sites based on trans-acting factors or strict cis sequence motifs [19], whereas the adenosine deaminases responsible for A-to-I editing in metazoans appear to have limited specificity. The profile of A-to-I nuclear mRNA editing is also dominated by sites with low editing frequencies. Analysis of the O. bimaculoides transcriptome identified tens of thousands of editing sites in protein-coding sequences [14], but the median level of editing was only 3.4%. Thus, even in cephalopods, where there is evidence that A-to-I editing of protein-coding sequences plays a larger and more adaptive role than in other metazoans [13,14], it is likely that most identified editing sites are nonetheless the result of nonadaptive, off-target activity.

Even so, the differences between mitochondrial RNA editing systems and nuclear A-to-I editing cannot be attributed entirely to differences in enzyme promiscuity. Rates of restorative changes are also extremely low for the subset of A-to-I sites that are edited at high levels and shared among distant relatives (figure 2). One leading hypothesis to explain the proliferation of RNA editing is that the existence of editing activity facilitates the neutral spread by genetic drift of otherwise deleterious mutations. Upon reaching fixation, such mutations would make the formerly nonadaptive editing activity a functional necessity [4]. Now known as ‘constructive neutral evolution’ [20], this nonadaptive model provides a cogent explanation for the extensive restorative editing in mitochondrial genomes but is a seemingly poor fit for the editing patterns in nuclear genes. Instead, it is likely that the evolution of nuclear A-to-I editing and its effects at key functional sites in protein-coding sequences can be attributed to more conventional adaptive explanations associated with regulation and expansion of proteome diversity.

Data accessibility

Site-by-site data are provided as electronic supplementary material, File S1.


Results

Strains and their genetic loci

Seven geographical isolates of C. reinhardtii were employed in this study; their strain numbers, mating types, origins of isolation, and strain abbreviations are presented in Table 1. To assess levels of genetic diversity, we sequenced the complete mitochondrial genome and portions of 7 single-copy nuclear genes from each of the 7 isolates. A genetic map of the C. reinhardtii mitochondrial genome is shown in Figure 1, and partial genetic maps of the 7 nuclear genes are shown in Figure 2. We sequenced the entire mtDNA in order to employ both intergenic regions and synonymous sites in our calculations of πsilent; because of a paucity of intraspecific sequence data, previous studies on genetic diversity in mitochondrial genomes have tended to use only synonymous sites for estimating πsilent. Moreover, whole mtDNA sequences from C. reinhardtii allow for the comparison of synonymous-site nucleotide diversity (πsyn) in the standard mitochondrial protein-coding genes with that of rtl, a mitochondrial open reading frame (ORF) in the C. reinhardtii mtDNA coding for a putative reverse-transcriptase-like protein [15]. It has been suggested that synonymous sites in rtl are under less selective constraint than those of the standard mtDNA protein-coding genes and that they may be more appropriate for estimating the neutral mutation rate in the mitochondrial compartment [7]. For the nuclear loci, we sequenced mostly introns rather than exons because intronic sites in the C. reinhardtii nuclear genome are believed to evolve more neutrally than synonymous sites and may give more reliable estimates of the neutral mutation rate [6]. Sequences for two of the nuclear loci from the 7 isolates have been previously reported [16–18], allowing us to confirm both our strain assignments and our sequencing methods.

Genetic map of the Chlamydomonas reinhardtii mitochondrial genome, including all currently identified optional introns. Protein-coding regions and regions encoding structural RNAs are red and orange, respectively. S1–S4 represent the small-subunit rRNA-coding modules; L1–L8 represent the large-subunit rRNA-coding modules. The terminal inverted repeats (IR) are black. Intronic regions and their open reading frames are boxed in blue inside their associated genes. The C. reinhardtii strains (Table 1) in which the different introns occur are labelled in parentheses. Solid arrows denote the transcriptional polarities. Note: due to the presence/absence of introns among the different strains, the size of the C. reinhardtii mitochondrial genome can vary from 15,782 nt to 18,990 nt.

Partial genetic maps of the 7 Chlamydomonas reinhardtii nuclear-encoded genes employed in the analysis. The bracketed segment beneath each map represents the region that was PCR amplified. To the left of each map are the name of the gene, the approximate size of the region that was PCR amplified, and the location of the gene within the C. reinhardtii nuclear genome; locations are based on the C. reinhardtii draft nuclear genome sequence version 3.0 [8]. Exons are red; they are labelled with an "E" and a number denoting their position within the gene. Introns are blue and are labelled with a roman numeral denoting their location within the gene. Note: each of these genes is present only once in the C. reinhardtii nuclear genome.

We were able to obtain the complete mtDNA sequence of an 8th strain of C. reinhardtii (CC-503) by collecting and assembling mtDNA sequences that were generated by the C. reinhardtii nuclear-genome sequencing project [8, 19]. Both C. reinhardtii CC-503 and C. reinhardtii CC-277 (one of the 7 isolates described in Table 1) are cell-wall-less mutants recovered from the same "Ebersold-Levine" wild-type background of C. reinhardtii, but they have been separated for at least 35 years [20]. The mtDNA sequence of C. reinhardtii CC-503 is identical to that of C. reinhardtii CC-277, and when we downloaded the sequences of the 7 nuclear loci for C. reinhardtii CC-503, they too were identical to those of C. reinhardtii CC-277. Therefore, for the purpose of this study we will be considering C. reinhardtii MA-1 as synonymous with C. reinhardtii CC-277 and CC-503.

Prior to this study, a complete mtDNA sequence for C. reinhardtii was already available (GenBank accession number NC_001638); this sequence, which resulted from the accumulated efforts of multiple parties, mostly came from C. reinhardtii CC-277, or in some cases from strains having the same genetic background as C. reinhardtii CC-277. The mtDNA sequence of C. reinhardtii CC-277 presented here differs at 46 positions relative to NC_001638. Because 44 of these 46 differences are also present in the mtDNA of the other six C. reinhardtii isolates described here, and because the C. reinhardtii CC-277 mtDNA sequence from this study was shown to be identical to the C. reinhardtii CC-503 mitochondrial genome, we feel that our version of the C. reinhardtii CC-277 mtDNA is currently the most accurate and that the discrepancies between our sequence and NC_001638 are the result of sequencing errors in the latter.

It is important to note that our annotation of the C. reinhardtii mitochondrial genome (Figure 1) does not contain the so-called rRNA-coding modules L2b and L3a. In previous studies, each of these modules was presumed to code for a non-core region of the large-subunit (LSU) rRNA [21]. However, because sequence homologs of L2b and L3a have not been identified in the mtDNA of close relatives of C. reinhardtii [7, 22, 23], or in any other genome, we have classified these regions as intergenic DNA and have treated them as such for all genetic analyses.

Nucleotide diversity

Summary statistics of the nucleotide diversity in C. reinhardtii are shown in Table 2. Two measures of nucleotide diversity were used to calculate variation within the C. reinhardtii mitochondrial and nuclear genomes: π, which is the average number of pairwise nucleotide differences per site between sequences in a sample [24], and θW, which is based on the number of polymorphic sites in a sample of sequences but is independent of their frequency [25]. With respect to both measures, the nuclear compartment shows significantly more silent-site nucleotide diversity than the mitochondrial compartment: net values of πsilent for the nuclear DNA and mtDNA were 31.96 × 10⁻³ and 8.54 × 10⁻³, respectively. Net values of θW at silent sites are slightly higher, at 33.02 × 10⁻³ for the nuclear compartment and 9.18 × 10⁻³ for the mitochondrial compartment. In all but one case, silent sites in the various nuclear loci show more diversity than the silent sites in the mitochondrial compartment. The only exception is the nuclear gene CBLP, which has less synonymous-site diversity (πsyn = 2.77 × 10⁻³) than the synonymous sites of the mitochondrial protein-coding regions. Within the mitochondrial compartment, diversity at intergenic and synonymous sites is similar (8.92 × 10⁻³ vs. 8.52 × 10⁻³), as is the diversity of protein-coding regions and regions coding for structural RNAs (2.06 × 10⁻³ vs. 2.42 × 10⁻³). The mitochondrial gene rtl, which encodes a putative reverse transcriptase, shows more diversity than the other mitochondrial protein-coding genes when all 3 codon positions are considered (3.07 × 10⁻³ vs. 2.06 × 10⁻³) and slightly less diversity when only synonymous sites are considered (7.88 × 10⁻³ vs. 8.52 × 10⁻³); however, these differences are unlikely to be statistically significant.
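Both diversity measures can be computed from an alignment in a few lines. This is a minimal sketch that assumes equal-length aligned sequences. As described for Table 2, it keeps only columns where every sequence has an unambiguous nucleotide. θW uses Watterson's harmonic-number correction a1 = Σ 1/i for i = 1 to n - 1. The function name is hypothetical. The sketch does not separate silent from non-silent sites, which the πsilent estimates would require.

```python
from itertools import combinations
from typing import Sequence, Tuple


def diversity_stats(alignment: Sequence[str]) -> Tuple[float, float]:
    """Return per-site (pi, theta_W) for equal-length aligned sequences."""
    rows = [s.upper() for s in alignment]
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two sequences")
    # Keep only columns where every sequence has an unambiguous nucleotide.
    columns = [col for col in zip(*rows) if all(b in "ACGT" for b in col)]
    if not columns:
        raise ValueError("no complete alignment columns")
    length = len(columns)
    pairs = list(combinations(range(n), 2))
    # pi: mean number of pairwise differences, divided by the number of sites.
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    pi = diffs / len(pairs) / length
    # theta_W: segregating sites S scaled by a1 = sum_{i=1}^{n-1} 1/i, per site.
    segregating = sum(len(set(col)) > 1 for col in columns)
    a1 = sum(1 / i for i in range(1, n))
    theta_w = segregating / a1 / length
    return pi, theta_w


if __name__ == "__main__":
    # Toy alignment (not data from the study); the column with 'N' is dropped.
    print(diversity_stats(["ACGTNACGT", "ACGTAACCT", "ACTTAACCT", "ACGTAACGT"]))
```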

Insertions and deletions

For both the nuclear and mitochondrial compartments, insertions and deletions (indels) represent a large proportion of the observed polymorphisms (Table 2). In our alignments of the nuclear loci from the 7 different strains of C. reinhardtii, 36% of mismatched nucleotides result from indels. The nuclear indels range from 1 to 31 nucleotides (nt) in length and have an average size of 4.5 nt. In the mitochondrial compartment, indels represent 20% of the mismatched nucleotides. The mitochondrial indels range from 1 to 6 nt in length and have an average size of 2.5 nt. It is important to note that our estimates of nucleotide diversity shown in Table 2 are derived from sites in the alignment where all seven strains of C. reinhardtii have a nucleotide; therefore, sites corresponding to indels were removed from the alignment. If our methods for calculating π are modified to include indels (by counting each gap in the alignment as a nucleotide change), the overall values of πsilent in the nuclear and mitochondrial compartments become 49.27 × 10⁻³ (± 4.89 × 10⁻³) and 10.93 × 10⁻³ (± 1.96 × 10⁻³), respectively.
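The effect of counting gaps can be illustrated by toggling whether the gap character is treated as a state. This is a minimal sketch that assumes each gapped position counts as one nucleotide change against a base, so a 4-nt indel contributes up to four differences per pair, and that a gap against a gap is not a difference. The text does not state how such cases were handled, so this convention and the function name are assumptions.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(alignment: Sequence[str], count_gaps: bool = False) -> float:
    """Per-site pi; with count_gaps=True, '-' is treated as a fifth character state.

    Columns that are gaps in every sequence should be stripped beforehand.
    """
    rows = [s.upper() for s in alignment]
    allowed = set("ACGT-") if count_gaps else set("ACGT")
    columns = [col for col in zip(*rows) if all(b in allowed for b in col)]
    if not columns:
        raise ValueError("no usable alignment columns")
    pairs = list(combinations(range(len(rows)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


if __name__ == "__main__":
    # Toy alignment with a 2-nt indel in the second sequence.
    aln = ["ACGTACGT", "AC--ACGT", "ACGTACCT"]
    print(nucleotide_diversity(aln), nucleotide_diversity(aln, count_gaps=True))
```

As in the text, including indel columns raises π, because gap-containing columns are otherwise discarded rather than counted as variable.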

Testing for neutrality

Two statistical tests were performed on the mitochondrial and nuclear datasets to look for signatures of selection: Tajima's D test, which compares the average number of nucleotide differences between pairs of sequences with the total number of segregating sites [24], and the McDonald-Kreitman test, which compares the ratio of nonsynonymous to synonymous differences observed within a species with that observed between species [26]. Tajima's D is slightly negative in all cases pertaining to the mitochondrial compartment and in most cases pertaining to the nuclear compartment, but it is slightly positive for a few of the nuclear loci (the exons of MAT3 and PDK, and the introns of CBLP, PETC, PDK, and ACTIN) (Table 2). In no case is Tajima's D statistically significant. The McDonald-Kreitman test was performed for all of the protein-coding regions surveyed in this study by comparing the ratio of nonsynonymous to synonymous polymorphisms within C. reinhardtii with the ratio of nonsynonymous to synonymous fixed differences between C. reinhardtii and Chlamydomonas incerta (one of the closest known non-interfertile relatives of C. reinhardtii [27]) (Table 3). In no case did the McDonald-Kreitman test detect a significant departure from neutral expectations for any of the mitochondrial or nuclear loci.
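Both neutrality tests reduce to short calculations once the summary counts are in hand. This is a minimal sketch that assumes the standard constants from Tajima's original formulation of D. For the McDonald-Kreitman test, it assumes a two-sided Fisher's exact test on the 2×2 table of fixed versus polymorphic, nonsynonymous versus synonymous counts; this is a common choice, but the text does not say which test statistic was used. Function names and example counts are hypothetical, not values from Tables 2 or 3.

```python
import math


def tajimas_d(mean_pairwise_diffs: float, segregating_sites: int, n: int) -> float:
    """Tajima's D from the mean number of pairwise differences, S, and sample size n."""
    S = segregating_sites
    if S == 0 or n < 4:
        return float("nan")  # undefined without polymorphism; variance term vanishes for n < 4
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (mean_pairwise_diffs - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # hypergeometric probability of x in the top-left cell, margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # sum every table no more probable than the observed one (tolerance for float ties)
    p = sum(table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def mcdonald_kreitman_p(dn: int, ds: int, pn: int, ps: int) -> float:
    """P-value for fixed (Dn, Ds) versus polymorphic (Pn, Ps) counts."""
    return fisher_exact_two_sided(dn, ds, pn, ps)


if __name__ == "__main__":
    # Hypothetical: 4 sequences, 1 segregating site carried by a single sequence.
    print(tajimas_d(0.5, 1, 4))            # negative: excess of rare variants
    print(mcdonald_kreitman_p(8, 2, 1, 5))  # hypothetical MK counts
```

Negative D values, like those reported for the mitochondrial loci, indicate more low-frequency variants than expected under neutrality. Whether D is significant depends on the sample size and the null distribution, which this sketch does not compute.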

Mitochondrial introns

Three of the C. reinhardtii strains (PA-1, MA-2, and FL) have introns in their mtDNA (Figure 1). C. reinhardtii MA-2 has a single intron, inserted into cob; C. reinhardtii FL has 2 introns, one in the L5-rRNA-coding module (the L5-intron) and one in the L7-rRNA-coding module (the L7-intron); and C. reinhardtii PA-2 has 3 introns, two in cox1 and the L5-intron (note: the DNA sequence of the L5-intron in C. reinhardtii PA-2 is identical to that of C. reinhardtii FL). Of these introns, only that of cob in C. reinhardtii MA-2 has been previously described [13]. Like the intron of cob in C. reinhardtii MA-2, each of the 4 introns presented here has an ORF whose deduced amino acid sequence shows similarity to a LAGLIDADG-type endonuclease. RT-PCR experiments confirm that all five introns, including their ORFs, are spliced out of mature transcripts. Secondary-structure modelling suggests that the two introns in cox1 are group I introns belonging to subgroup D. Our analyses of the L5- and L7-introns suggest that they lack the core sequence and potential secondary structure necessary to be classified as either group I or group II introns; thus, at present they are considered highly degenerate "unclassified" introns. A 35-nt duplicated portion of the L5-rRNA-coding module is found within the 5' end of the L5-intron; RT-PCR experiments validate that this segment is in fact a component of the intron. The insertion sites of the L5- and L7-introns within the C. reinhardtii mtDNA and the nature of the repeat found within the L5-intron are described in Figures 3A, 3B, and 3C, respectively. The insertion sites of the L5- and L7-introns in the context of the C. reinhardtii LSU rRNA sequence are shown in Figure 3D.

Schema of the introns in the L5- and L7-rRNA-coding modules. The vertical arrows in A show the intron insertion sites within the C. reinhardtii mtDNA. B and C depict the introns in the L5- and L7-rRNA-coding modules, respectively; rRNA-coding regions are orange; introns are light blue; intronic open reading frames are boxed in dark blue within their respective introns; L5-frag refers to a duplicated segment of the L5-rRNA-coding module (the first 35 nt of the module are duplicated); bracketed portions of the map represent regions that were shown to be spliced out of mature transcripts. D depicts the intron insertion sites in the context of the large-subunit (LSU) ribosomal RNA sequence of C. reinhardtii; arrows point to the region where the introns are inserted; numbers above the arrows denote the position of the residue that immediately precedes the insertion site: unbracketed numbers correspond to the residue in the 23S rRNA gene of Escherichia coli [44], and bracketed numbers correspond to the residue in the LSU-rRNA secondary-structure model of Boer and Gray [21]. Note: the C. reinhardtii strains in which these introns occur are shown in Figure 1.


How to analyze your genome Part I—Mitochondrial DNA

Native structure of the human MT-Cyb protein subunit. Credit: proteinmodelportal.org

Genome analysis today is basically blind. It typically proceeds by randomly inspecting a smattering of possible variants that are only loosely associated with some disease or physical trait. Unless you already have a major health problem, this kind of narrowly focused crapshoot is not likely to be a game changer for you.

In this void, we would like to offer a more methodical approach—a simple formula to logically parse genomes at their critical points, and establish reliable physiologic predictors that are relevant for anyone. But first, a little background is required.

Our genomes are hybrids that have been built by viruses and bacteria. We have two of them, a big genome and a little one. Bioinformatics in general has mostly ignored the simple, roughly 16,500-base-pair mitochondrial DNA (mtDNA), and instead focused almost exclusively on the complex, roughly 3-billion-base-pair nuclear DNA (nucDNA). As we shall see, this is completely backward.

Except for a small peptide called humanin, all the proteins that are coded within the tiny human mitogenome are used exclusively in respiratory complexes I, III, IV and V (complex II is entirely nuclear-encoded). Two dueling 3-D print heads (see below) localized on either side of the double mitochondrial membrane system cooperatively extrude and deploy each subunit into the proper assembly compartment. Once there, assembly factors specific for each complex sequentially stitch the subunits together. The mitochondrial matrix side of this membrane hosts the mitoribosomes, while the cell side hosts the cytosolic ribosomes.

Cytosolic ribosomes and mitoribosomes working in tandem on either side of the double mitochondrial membrane. Credit: Wikipedia.org

Mitochondria evolved as bacterial endosymbionts. They built the very first eukaryotic nucleus, and every nucleus since, using copy-paste-modify hardware they borrowed from viruses. Nucleotide tapes are read, written, and altered using mitochondrial-specific DNA and RNA polymerases that have long since been offloaded to the nucleus for storage, along with most of their other essential proteins. Understanding this mitochondrial construction project will be our key to unlocking the entire genome.

There are at least 1500 places in the nucDNA that we are concerned with. These are the locations that code for proteins, and some RNAs, that are used in mitochondria. The trick to recognizing these genes is that they usually start off with a mitochondrial localization sequence that targets them to the right organelle. Not all of these genes that have been culturally assimilated into the nucleus are migrants from the nucleoid. Many have simply duplicated themselves from existing nuclear genes and subsequently conjured up an alternative way to splice in an organelle localization motif.
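To make the idea of a localization sequence a little more concrete: mitochondrial targeting presequences at the N-terminus tend to be rich in positively charged arginine (R) and lysine (K) and poor in acidic aspartate (D) and glutamate (E). The Python sketch below scores only that charge bias. The 30-residue window, the threshold of 3, and both test sequences are arbitrary assumptions for illustration. Dedicated predictors such as TargetP or MitoFates use far richer models, and a bare charge count like this will make plenty of wrong calls.

```python
def presequence_charge_score(protein: str, window: int = 30) -> int:
    """Net charge of the N-terminal window: (#R + #K) - (#D + #E).

    A crude proxy for a mitochondrial targeting presequence, which tends
    to be basic and to lack acidic residues. Illustrative only.
    """
    head = protein[:window].upper()
    basic = head.count("R") + head.count("K")
    acidic = head.count("D") + head.count("E")
    return basic - acidic


def looks_mito_targeted(protein: str, window: int = 30, min_score: int = 3) -> bool:
    """Flag a protein whose N-terminal net charge reaches an assumed threshold."""
    return presequence_charge_score(protein, window) >= min_score


# Hypothetical sequences invented for this example, not real proteins.
# The acidic tail of basic_head lies outside the 30-residue window.
basic_head = "MLRAALSRVGRKLSSTRAFSTGAPLGLAQA" + "DEEDEE" * 5
acidic_head = "MDEESDEELSDEEGDEAPVTEDQLSEEDKA"

for name, seq in [("basic_head", basic_head), ("acidic_head", acidic_head)]:
    print(name, presequence_charge_score(seq), looks_mito_targeted(seq))
```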

While the full list of these genes (the "mitonuclear genome") has yet to be completely discovered, many that have been mined so far reside online at the MitoCarta website. Researchers continue to find many more citizens of the mitonuclear genome. These are not expressed by all tissues and do not always contain localization motifs, but they can still make their way into mitochondria. Collectively, these are the critical points of the hybrid genome.

In other words, the sweet spot in genomics lies at the places where the two genomes intersect. To analyze them, we must search for correlations between polymorphisms in the expanding mitonuclear genome and those in the tiny mitogenome. More specifically, we need to create a panel of effects that might be expected when variants in the >1500-strong mitonuclear genome are found alongside specific variants in the 13-strong mitogenome.

Until recently, searching disease databases for single variants that might match a patient's was the only way to analyze risk or diagnose many rare disorders. Mitochondrial disease, once considered extremely rare, is actually something that affects everyone in some form or another. While it is possible to create cell hybrids or 'cybrids' to explore effects of specific mtDNA mutations in a research setting, this is not typically done for individuals. Fortunately, there is now another way forward, namely, structural modeling and molecular dynamics simulation.

The beauty of this approach is that it can outperform crude database searches of other people's business (and frequently other organisms) that only return a diffuse hodgepodge of poor matches to an individual's particular set of variants. Now, any genetic profile can be directly reverse-engineered. In other words, we can individually tag all our variants, and then explore the implications of each on a complex-by-complex basis.
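Working "on a complex-by-complex basis" starts with knowing which complex each mtDNA-encoded subunit joins. The mapping below is standard: seven complex I subunits, one complex III subunit, three complex IV subunits, two complex V subunits, and none in complex II. The grouping function and its "other" label are an illustrative sketch, applied to four of the protein-coding variants listed later in this article plus one D-loop variant. Nuclear-encoded subunits would have to be added to cover the mitonuclear side.

```python
from collections import defaultdict

# mtDNA-encoded protein genes and the respiratory complex each subunit joins.
# Complex II (succinate dehydrogenase) has no mtDNA-encoded subunits.
GENE_TO_COMPLEX = {
    **{g: "I" for g in ("MT-ND1", "MT-ND2", "MT-ND3", "MT-ND4",
                         "MT-ND4L", "MT-ND5", "MT-ND6")},
    "MT-CYB": "III",
    **{g: "IV" for g in ("MT-CO1", "MT-CO2", "MT-CO3")},
    **{g: "V" for g in ("MT-ATP6", "MT-ATP8")},
}


def group_by_complex(variants):
    """Group (variant, gene) pairs by respiratory complex.

    Variants outside the 13 protein-coding genes (rRNA, tRNA, D-loop)
    are collected under "other".
    """
    groups = defaultdict(list)
    for variant, gene in variants:
        groups[GENE_TO_COMPLEX.get(gene, "other")].append(variant)
    return dict(groups)


my_variants = [
    ("11788C>T", "MT-ND4"), ("15326A>G", "MT-CYB"), ("16519T>C", "MT-DLOOP1"),
    ("4769A>G", "MT-ND2"), ("8860A>G", "MT-ATP6"),
]
groups = group_by_complex(my_variants)
for complex_id, hits in sorted(groups.items()):
    print(complex_id, hits)
```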

Each complex is embedded within a periphery of associated import, replication, translation, and metabolic cycle components of the mitonuclear arsenal that organizes into different macro-complexes under different conditions. In addition to mutations that alter core catalytic activities of individual subunits, it is now widely appreciated that changes in the peripheral amino acids where subunits interact often give the most readily visible effects. These border amino acids are the ones that most directly control assembly of subunits into higher-order structures—all the way up to the elusive respiratory supercomplex.

A good way to begin the analysis is with an example of real mtDNA. I recently obtained whole genome sequencing (WGS) from Dante Labs, with a special request for mtDNA sequencing, for only a few hundred dollars. After a few weeks I received the mtDNA results as two files in a common format known as FASTQ. Each file contains the reads obtained by sequencing the DNA fragments from one direction. To compare the results with the Cambridge Reference Mitochondrial Sequence (CRS) and extract my particular variants, I uploaded my files to a website called mtDNA-Server and selected 'single ended' as the file type.
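For readers who want to peek inside those files before uploading them: each FASTQ record spans four lines, an @-prefixed read ID, the base calls, a '+' separator, and a string of per-base quality characters. The plain-Python reader below is a minimal sketch that assumes well-formed four-line records and the usual Phred+33 quality encoding; the two demo reads are invented. Biopython's `Bio.SeqIO.parse(handle, "fastq")` is a more robust option for real files.

```python
import gzip
import os
import tempfile


def read_fastq(path):
    """Yield (read_id, sequence, quality) tuples from a FASTQ file.

    Assumes the four-line layout (@id / sequence / + / quality) and
    opens gzip-compressed files (ending in .gz) transparently.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                return  # end of file
            seq = handle.readline().rstrip()
            plus = handle.readline().rstrip()
            qual = handle.readline().rstrip()
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score, assuming Sanger / Illumina 1.8+ encoding (ASCII offset 33)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


# Tiny made-up FASTQ written to a temporary file for demonstration.
demo = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n"
with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as fh:
    fh.write(demo)
records = list(read_fastq(fh.name))
os.remove(fh.name)

for read_id, seq, qual in records:
    print(read_id, seq, mean_phred(qual))
```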

In addition to listing your variants, which reflect the most abundant ('homoplasmic') sequence in your submitted sample, the server also reports heteroplasmy. This data consists of additional low-abundance reads, mainly from the so-called nuclear mitochondrial DNA segments (NUMTs). These are mostly nonfunctional relic copies of mtDNA residing on many chromosomes. Going forward, it will be important to analyze mtDNA from tissues other than saliva to get a better handle on actual heteroplasmy within different mitochondria.
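Heteroplasmy is usually expressed as the fraction of reads covering a position that carry the non-reference allele. The sketch below computes that fraction and sorts sites into buckets. The 90% and 1% cutoffs and the read counts are hypothetical, and, as noted above, a low-level call may simply reflect NUMT reads mapped onto the mitogenome rather than true heteroplasmy.

```python
def heteroplasmy_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering one mtDNA position that carry the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def classify_site(alt_reads: int, total_reads: int,
                  homoplasmic_cutoff: float = 0.9,
                  detection_limit: float = 0.01) -> str:
    """Bucket a site by alternate-allele fraction; both cutoffs are assumptions."""
    fraction = heteroplasmy_fraction(alt_reads, total_reads)
    if fraction >= homoplasmic_cutoff:
        return "homoplasmic"
    if fraction >= detection_limit:
        return "heteroplasmic"
    return "below detection limit"


# Hypothetical read counts at three positions, each covered by 1,000 reads.
for alt in (950, 80, 5):
    print(alt, heteroplasmy_fraction(alt, 1000), classify_site(alt, 1000))
```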

For example, cardiac muscle cells or fibroblasts can preferentially accumulate mutated and deleted mtDNA. In other cases, different tissues deliberately maintain separate populations of heteroplasmic mitochondria. In my case, I received the following homoplasmic variants:

11788 C>T MT-ND4
1438 A>G MT-RNR1
15326 A>G MT-CYB
16519 T>C MT-DLOOP1
263 A>G MT-DLOOP2
4769 A>G MT-ND2
750 A>G MT-RNR1
8860 A>G MT-ATP6
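Gene labels like these come from comparing each position with the reference gene map. The sketch below hard-codes a deliberately partial set of rCRS coordinates as listed in MITOMAP's gene map (worth re-checking against MITOMAP before reuse) and looks up each of the eight positions. It places 1438 inside MT-RNR1, the 12S rRNA gene, alongside 750.

```python
# Partial rCRS gene map (1-based, inclusive coordinates), limited to the regions
# hit by the variants above. The D-loop spans the origin of the circular genome
# (16024-16569 and 1-576), so it is stored as two intervals.
REGIONS = [
    ("MT-DLOOP", 1, 576),
    ("MT-RNR1", 648, 1601),
    ("MT-RNR2", 1671, 3229),
    ("MT-ND2", 4470, 5511),
    ("MT-ATP6", 8527, 9207),
    ("MT-ND4", 10760, 12137),
    ("MT-CYB", 14747, 15887),
    ("MT-DLOOP", 16024, 16569),
]


def annotate(position: int) -> list:
    """Return every region in REGIONS that contains the given rCRS position."""
    hits = [name for name, start, end in REGIONS if start <= position <= end]
    return hits or ["not in this partial table"]


for pos in (11788, 1438, 15326, 16519, 263, 4769, 750, 8860):
    print(pos, annotate(pos))
```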

The good news here is that while not everyone is a winner in the potluck of genetic recombination, everyone with mitochondria can be assigned to a particular haplotype. This tells them roughly where they fit in the human family tree and, to a lesser degree, the kinds of environments to which their mitochondria are adapted. Using tools like MitoMap and Phylotree, and the expert assistance of researchers Marie Lott and Shiping Zhang at the Children's Hospital of Philadelphia, I found that I fall squarely within haplogroup "H." This was because I share all of the very common 263G, 750G, 1438G, 4769G, 8860G, 15326G, and 16519C markers with this group. The one rare allele I have, 11788T, is found in only 25 of the 45,000 sequences in the MitoMap database, and this bumps me up into the H56 subgroup. This haplotype is a predominantly Northern European subgroup believed to have originated shortly after the last Ice Age.
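The marker logic described here can be sketched as simple set containment. This is a toy: the marker sets are built only from the alleles quoted in the paragraph above, and the assumption that 11788T alone distinguishes H56 follows the text. Real classifiers such as HaploGrep walk the full Phylotree hierarchy and score partial matches instead of requiring every marker. The last line turns the MitoMap count into a frequency.

```python
# Toy, flat marker table built only from the alleles quoted in the text.
H_MARKERS = {"263G", "750G", "1438G", "4769G", "8860G", "15326G", "16519C"}
H56_MARKERS = H_MARKERS | {"11788T"}  # per the text, 11788T places the sample in H56


def marker_coverage(sample: set, markers: set) -> float:
    """Fraction of a haplogroup's defining markers present in the sample."""
    return len(sample & markers) / len(markers)


def assign(sample: set) -> str:
    """Return the most specific toy haplogroup whose markers are all present."""
    if H56_MARKERS <= sample:
        return "H56"
    if H_MARKERS <= sample:
        return "H"
    return "unassigned"


my_alleles = {"263G", "750G", "1438G", "4769G", "8860G", "15326G", "16519C", "11788T"}
print(assign(my_alleles), marker_coverage(my_alleles, H_MARKERS))
print(f"11788T frequency in MitoMap: {25 / 45000:.3%}")
```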

One thing that jumps out to me right away is an apparent abundance of A>G variants in my workup. A>G also seems to be overrepresented among known severe mitochondrial disease mutations, for example the 3243A>G mutation behind MELAS. Marie and Shiping noted that while A>G is definitely a statistically favored transition (G>A vs A>G shifts occur at a ratio of 2.26 to 1), their abundance can be partly explained by the prevalence of transitions over transversions and by a natural strand asymmetry within mtDNA: the light strand, from which the mitogenome is numbered, has only about 13 percent G. Furthermore, A and G variants are known to be preferentially localized to specific regions of the mtDNA.
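The transition/transversion point is easy to check on the variant list above. Transitions swap a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T); transversions cross between the two classes. In this sketch all eight variants come out as transitions, six of them A>G. It counts changes as written against the reference numbering and says nothing about which strand a mutation actually arose on.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition (within purines or within
    pyrimidines) or a transversion (purine to pyrimidine or vice versa)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not ({ref, alt} <= (PURINES | PYRIMIDINES)):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# The homoplasmic variants listed above, written as <position><ref>><alt>.
variants = ["11788C>T", "1438A>G", "15326A>G", "16519T>C",
            "263A>G", "4769A>G", "750A>G", "8860A>G"]

kinds = Counter(substitution_type(v[-3], v[-1]) for v in variants)
changes = Counter(v[-3:] for v in variants)
print(kinds)
print(changes)
```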

Going down the variant list, we find that some protein-coding variants, like 4769A>G in MT-ND2 and 11788C>T in MT-ND4, are "synonymous." These genes code for subunits of NADH dehydrogenase (complex I). Synonymous means that although the nucleotide changes, the new codon still encodes the same amino acid and will still be recognized by the same mt-tRNA or family of similar tRNAs. Therefore, these variants do not change the protein sequence itself, although some minor changes to the speed or fidelity of translation may still occur.
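Whether a change is synonymous depends on the genetic code, and mtDNA is read with a slightly different code from nuclear genes. In the vertebrate mitochondrial code (NCBI translation table 2), TGA encodes tryptophan, ATA encodes methionine, and AGA/AGG are listed as stops. The sketch below builds that table and compares two codons. The example codons are generic illustrations, not the actual reference codons at 4769 or 11788.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial code); amino acids listed
# for codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG. '*' marks a stop.
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
MITO_CODE = {"".join(codon): aa
             for codon, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True if both codons encode the same amino acid (or both are stops)."""
    return MITO_CODE[ref_codon.upper()] == MITO_CODE[alt_codon.upper()]


print(is_synonymous("CTA", "CTG"))  # Leu vs Leu
print(is_synonymous("ACA", "GCA"))  # Thr vs Ala
```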

The 8860A>G transition in MT-ATP6, on the other hand, is a nonsynonymous substitution. More specifically, this missense mutation changes the encoded amino acid from T to A. It is important to realize that in the codon world T is the amino acid threonine, not the nucleoside thymidine, while A is alanine, not adenosine. To determine where this mutation occurs in the F-ATPase subunit 6 of complex V, we can use MitoMap to convert the 8860 mtDNA-referenced coordinate to the protein-referenced coordinate, position 112.
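The coordinate conversion is simple arithmetic: subtract the gene's first base, divide by three for the codon number, and take the remainder for the position within the codon. The sketch assumes MT-ATP6 starts at rCRS position 8527, as in MITOMAP's gene map, and only handles genes read from the same strand as the reference numbering. It reproduces codon 112 and shows that 8860 is the first base of that codon, which is why an A>G change turns an ACN threonine codon into a GCN alanine codon.

```python
def mt_to_codon(position: int, gene_start: int) -> tuple:
    """Convert an rCRS position inside an H-strand-encoded protein gene into
    (codon number, position within the codon), both 1-based.

    Not valid for MT-ND6, which is read from the opposite strand.
    """
    offset = position - gene_start
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    return offset // 3 + 1, offset % 3 + 1


MT_ATP6_START = 8527  # first base of the MT-ATP6 start codon in the rCRS
print(mt_to_codon(8860, MT_ATP6_START))  # (112, 1)
```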

Using protein databases like UniProt and interactive modeling sites like Protein Model Portal, we can punch in the alanine variant at position 112 and see where it lies among the various loops and folds of the protein. The main article image at top shows the whole ATP6 protein and the image below highlights position 112.

MT-Cyb with variant at position 112 (highlighted green). Credit: proteinmodelportal.org

Searching NCBI's dbSNP database revealed that several MT-CYB variants are associated with hypertrophic cardiomyopathy, so I wanted to do some more in-depth analysis there. One 2013 study, which included my particular 15326A>G variant, provides an excellent recipe. Its authors used PolyPhen to analyze mutations close to the conserved, functionally important heme-binding redox centers of MT-CYB. Although PolyPhen offers feelings-based labels that score mutations as "possibly damaging" or "probably damaging," it also ties these subjective descriptions to concrete structural parameters like overpacking at buried sites in the protein.

Depending on whether crystal structures of human proteins are available at sufficiently high resolution, one can use prediction software like I-TASSER and Swiss-PdbViewer to introduce amino acid changes and evaluate all possible rotamers. Changes in hydrogen bonding and macromolecular interactions can then be evaluated, and energy minimization performed, using the GROMOS force field.

Full molecular dynamics simulations are still not for the faint of heart. Running NAMD with the CHARMM22 force field, for example, generally requires a lot of computing power. The MT-CYB subunit interacts with several of the 11 subunits within complex III. Tools like STRIDE can assign secondary structure at successive timepoints of a simulation to reveal where interacting alpha helices unravel into random coils and disrupt the structure of the complex.
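To give a feel for the bookkeeping involved: run STRIDE (or a similar assignment tool) on each saved frame of a trajectory, collect one secondary-structure letter per residue, then track how much of a helix of interest keeps a helical assignment over time. In this sketch the per-frame strings are invented, residues 3 to 10 stand in for a hypothetical interface helix, and the 50% threshold is arbitrary. H, G and I are the alpha, 3-10 and pi helix codes.

```python
HELIX_CODES = frozenset("HGI")  # alpha, 3-10 and pi helix in STRIDE/DSSP notation


def helix_fraction_per_frame(frames, start, end):
    """For each per-residue secondary-structure string, return the fraction of
    residues start..end (1-based, inclusive) assigned to a helix class."""
    fractions = []
    for ss in frames:
        segment = ss[start - 1:end]
        fractions.append(sum(c in HELIX_CODES for c in segment) / len(segment))
    return fractions


def unwinding_frames(fractions, threshold=0.5):
    """Frame indices where the segment drops from >= threshold helix to below it."""
    return [i for i in range(1, len(fractions))
            if fractions[i - 1] >= threshold > fractions[i]]


# Invented assignments for a 12-residue stretch over four frames.
frames = [
    "CCHHHHHHHHCC",
    "CCHHHHHHHTCC",
    "CCHHHTTCCCCC",
    "CCCCCCCCCCCC",
]
fractions = helix_fraction_per_frame(frames, start=3, end=10)
print(fractions)
print(unwinding_frames(fractions))
```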

We will wrap up our analysis of the mitogenome in the next article in this series, and also start to look at the mitonuclear genome as it becomes available to me. I am trying to follow in the footsteps of an early pioneer of open genomics, Brian Pardy, who was one of the first people to make his entire genome available for free to anyone interested in using it for analysis. My own information and files are available as they come in at this blog.


One-step test for mitochondrial diseases

More powerful gene-sequencing tools have increasingly been uncovering disease secrets in DNA within the cell nucleus. Now a research team is expanding those rapid next-generation sequencing tests to analyze a separate source of DNA -- within the genes inside mitochondria, cellular power plants that, when abnormal, contribute to complex, multisystem diseases.

The study team, headed by a specialist in mitochondrial medicine at The Children's Hospital of Philadelphia (CHOP), adapted next-generation sequencing to simultaneously analyze the whole exome (all the protein-coding DNA) of nuclear genes and the mitochondrial genome. "A first step in developing treatments for a disease is to understand its precise cause," said Marni J. Falk, M.D., the director and attending physician in the Mitochondrial-Genetic Disease Clinic at Children's Hospital. "We have developed a one-step, off-the-shelf tool that analyzes both nuclear and mitochondrial DNA to help evaluate the genetic cause of suspected mitochondrial disease."

Falk and colleagues describe their customized, comprehensive test, which they call the "1:1000 Mito-Plus Whole-Exome" kit, in the journal Discovery Medicine, published Dec. 26, 2012. Her co-corresponding author, biostatistician Xiaowu Gai, Ph.D., now of the Loyola University Stritch School of Medicine, collaborated on developing the test while at Children's Hospital.

While each mitochondrial disease is very rare in the population, hundreds of causes of mitochondrial diseases are known. Some originate in mutations in DNA specific to the mitochondria, tiny structures located outside the cell nucleus, while many other mitochondrial diseases are based in nuclear DNA genes that affect mitochondrial function. The role of mitochondria in human disease has been recognized only since the 1980s, based on pioneering research by Douglas C. Wallace, Ph.D., now at Children's Hospital, and a co-author of the current study.

Many mitochondrial diseases remain poorly understood. One complicating factor is heteroplasmy -- a mixture of mutated and normal mitochondrial genomes within the same cells or tissues. In contrast to conventional gene sequencing, which can detect only heteroplasmic mutations that reach levels of at least 30 to 50 percent, the customized kit has the sensitivity to detect mitochondrial genome mutations present at levels as low as 8 percent. To achieve their results, the study team adapted an existing whole-exome sequencing kit from Agilent Technologies, expanding it to encompass the mitochondrial genome.

The availability of the new kit, said Falk, if used for either clinical or research purposes, may shorten the "diagnostic odyssey" experienced by many patients and families seeking the cause of debilitating and puzzling symptoms. "Many families travel from one specialist to another for years, searching for the cause of their rare disease," she said. Specific treatments are not always available, but identifying the cause of a family's disease may be the first step toward discovering treatments.

A second recent study by Falk and colleagues reviews progress in diagnosing mitochondrial disease, through their experience at a single center over a rapidly changing three-year period before whole-exome sequencing was generally available. The retrospective review in Neurotherapeutics, published Dec. 27, 2012, covers 152 child and adult patients evaluated at CHOP's Mitochondrial-Genetics Diagnostic Clinic from 2008 to 2011.

"Before 2005, very few individuals could receive definitive molecular diagnoses for mitochondrial diseases, because of limitations in both knowledge and technology," said Falk. "Since that time, the clinical ability to sequence whole mitochondrial DNA genomes has significantly improved the diagnosis of many mitochondrial disorders."

During the study period covered in the review article, the clinic at CHOP confirmed definite mitochondrial disease in 16 percent of patients and excluded primary mitochondrial disease in 9 percent. While many diagnostic challenges clearly remain, Falk says the advent of massively parallel nuclear exome sequencing is revealing increasingly more of the genes in nuclear DNA that affect mitochondrial function, and the precise genetic disorder in a given patient, even if it is novel or uncommon. She added that molecular genetics is yielding a more nuanced understanding of the cellular pathways underlying symptoms in many mitochondrial disorders. "Those pathways offer potential new targets for treating these disorders," said Falk.

