
Where can I obtain the sequence of the E. coli ribosomal RNA precursor?


I have found separate sequences for 16S rRNA (from the small subunit) and 23S rRNA and 5S rRNA (from the large subunit). I need the full precursor rRNA sequence, which I cannot find.

However, the complete annotated sequence of the E. coli genome is available:

https://www.ncbi.nlm.nih.gov/nuccore/U00096.3?report=fasta

Does changing the T to U of this sequence give me the complete rRNA sequence?


Most organisms carry multiple copies of the genes/regions coding for ribosomal RNA (since a lot of it is needed). Additionally, these regions can be repetitive or otherwise difficult to sequence or place correctly in the genome, so it is generally harder to find good genomic sequences for rRNA than for other genes.

Based on your link, it seems that you are looking for the rRNA of E. coli; the GenBank entry corresponding to the FASTA file you linked contains the annotation of the full genome and has 7 sets of ribosomal RNA genes (named rr[slf][A-H]*).
If you take the sequence spanning any one of these regions, you should get the sequence of the full precursor. However, there are things you should be aware of:

  • There will likely be minor sequence differences between the 7 regions.
  • Apparently most of the E. coli rRNA operons also have tRNA genes mixed into them (these differ between the 7 operons).
  • Replacing the genomic T with U will make the sequence look like RNA; however, ribosomal RNA is also processed and chemically modified post-transcriptionally, so the sequence you get may not fully match the mature rRNA sequence.

Another website that gives a more readable overview of the rRNA operons in E. coli is this; it also contains source links and seems to match the GenBank entry from what I can see.

Example of how to find an rRNA region from the GenBank entry:

This is a (shortened) part of the annotation of the GenBank entry, which shows the 'A' ribosomal genes:

    gene    4035531..4037072    /gene="rrsA"
    rRNA    4035531..4037072    /gene="rrsA"
    gene    4037141..4037217    /gene="ileT"
    tRNA    4037141..4037217    /gene="ileT"
    gene    4037260..4037335    /gene="alaT"
    tRNA    4037260..4037335    /gene="alaT"
    gene    4037519..4040423    /gene="rrlA"
    rRNA    4037519..4040423    /gene="rrlA"
    gene    4040517..4040636    /gene="rrfA"
    rRNA    4040517..4040636    /gene="rrfA"

As you can see, the genes in the region/operon occur in the order rrsA, ileT, alaT, rrlA, rrfA (the fact that this is an operon cannot be inferred from the GenBank entry), and if you take the corresponding region of the FASTA sequence (nucleotides/characters 4035531 to 4040636) you'll get (most of) the primary transcript sequence.
Since promoter and terminator sequences are not annotated in GenBank (or in most other genome assemblies), you can't easily get the sequence before the first and after the last gene. One potential source for these might be the other resource I linked, which lists some sources for terminator sites.

* This expresses all names as a regex, i.e. for rRNA operon A there are rrlA, rrsA and rrfA; the same holds for B, C, …
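
The extraction described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the helper names (`read_fasta_sequence`, `extract_region`, `dna_to_rna`) are made up for illustration, the toy FASTA record stands in for the real genome file, and only features on the forward strand are handled (a feature annotated as complement(...) would also need reverse-complementing).

```python
import re

def read_fasta_sequence(text: str) -> str:
    """Join all sequence lines of a single-record FASTA text, skipping the header."""
    return "".join(
        line.strip() for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )

def extract_region(genome: str, start: int, end: int) -> str:
    """Return the region start..end using 1-based, inclusive GenBank coordinates."""
    if not (1 <= start <= end <= len(genome)):
        raise ValueError("coordinates fall outside the sequence")
    return genome[start - 1:end]

def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA string as RNA by replacing T with U (no processing or modification)."""
    return dna.upper().replace("T", "U")

# The gene-name pattern from the answer, written as an anchored regex.
RRNA_GENE = re.compile(r"^rr[slf][A-H]$")

# Toy stand-in for the genome FASTA (hypothetical data, not the real sequence).
toy_fasta = ">toy\nACGTACGT\nTTGACA\n"
toy_genome = read_fasta_sequence(toy_fasta)
toy_rna = dna_to_rna(extract_region(toy_genome, 3, 10))

# Length of the rrsA..rrfA span quoted in the annotation above.
rrna_a_length = 4040636 - 4035531 + 1
```

With the real FASTA file read into a string, `extract_region(genome, 4035531, 4040636)` would return the rrsA to rrfA span; the promoter and terminator regions would still be missing, as noted above.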


Cryo-EM structure of the E. coli translating ribosome in complex with SRP and its receptor

We report the 'early' conformation of the Escherichia coli signal recognition particle (SRP) and its receptor FtsY bound to the translating ribosome, as determined by cryo-EM. FtsY binds to the tetraloop of the SRP RNA, whereas the NG domains of the SRP protein and FtsY interact weakly in this conformation. Our results suggest that optimal positioning of the SRP RNA tetraloop and the Ffh NG domain leads to FtsY recruitment.





          DATA AVAILABILITY

          The sequencing data have been deposited in NCBI's Gene Expression Omnibus (105) and are accessible through GEO Series accession number GSE152974 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE152974). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (106) via the PRIDE (107) partner repository with the dataset identifier PXD019900 (https://www.ebi.ac.uk/pride/archive/projects/PXD019900). The software used and the resulting files have been deposited at Zenodo (https://doi.org/10.5281/zenodo.3876866 and https://doi.org/10.5281/zenodo.3955585).


          Materials and Methods

          Homology search was done with BLAST (Altschul et al., 1990), using the model organisms database. The resulting phylogenetic distribution of the BLAST hits was used to deduce whether any orthologous proteins could be found beyond Enterobacteriales, gamma Proteobacteria, Proteobacteria, and Bacteria, and whether any orthologs could be found in Eukarya and Archaea. Visual inspection of the search results was used to filter out paralogous, rather than orthologous, proteins. In doubtful cases, a reciprocal BLAST search, with a putative orthologous protein identified in the original search, was performed to check whether the protein used as bait for the initial search would be found as its closest homologue among the proteins of E. coli.
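
The reciprocal check described above can be illustrated with a short Python sketch. It is not the authors' pipeline: it does not run BLAST, the hit lists are hypothetical (subject, bit score) pairs, and the protein identifiers are placeholders. It only shows the decision rule, in which a candidate counts as a putative ortholog when the bait is recovered as its best hit in the reverse search.

```python
def best_hit(hits):
    """Return the subject ID with the highest score, or None if there are no hits."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]

def is_reciprocal_best_hit(bait_id, forward_hits, reverse_hits_by_subject):
    """Check that the best forward hit finds the bait again as its own best hit."""
    candidate = best_hit(forward_hits)
    if candidate is None:
        return False
    reverse_hits = reverse_hits_by_subject.get(candidate, [])
    return best_hit(reverse_hits) == bait_id

# Hypothetical hit tables: (subject ID, bit score).
forward = [("orgX_p1", 310.0), ("orgX_p2", 120.0)]
reverse = {
    "orgX_p1": [("ecoli_bait", 305.0), ("ecoli_other", 90.0)],
    "orgX_p2": [("ecoli_other", 200.0), ("ecoli_bait", 110.0)],
}

rbh_found = is_reciprocal_best_hit("ecoli_bait", forward, reverse)
```

A paralog would typically fail this test, because its reverse search returns a different E. coli protein as the closest homologue.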

          In all experiments, the strains ΔrsmG (JW3718), ΔrsmD (JW3430), ΔrsmB (JW3250), ΔrsmC (JW4333), ΔrsmH (JW0080), ΔrsmF (JW5301), ΔrsmE (JW2913), ΔrsmJ (JW5672), ΔrsmA(ksgA) (JW0050), ΔrlmA (JW1811), ΔrlmC (JW2756), ΔrlmF (JW5107), ΔrlmG (JW5513), ΔrlmH (JW0631), ΔrlmD (JW0843), ΔrlmI (JW5898), ΔrlmJ (JW3466), ΔrlmK/L (JW0931), ΔrlmB (JW4138), ΔrlmM (JW2777), ΔrlmN (JW2501), and ΔrlmE (JW3146) from the Keio collection (Baba et al., 2006) were used and compared with the parental wild-type strain BW25113 (Datsenko and Wanner, 2000).

          For the rRNA MT expression analysis, an overnight culture of wild-type E. coli was diluted in triplicate in fresh LB medium to A260 0.01 and grown at 37°C in a shaker. Aliquots of cells were removed at 1, 2, 3, 4, 5, 6, 7, and 56 h and used for total RNA purification with Trizol reagent (Invitrogen), followed by cDNA synthesis with either a Maxima First Strand cDNA Synthesis Kit for RT-qPCR (Thermo) with a random hexamer primer or a Superscript reverse transcriptase (Invitrogen). Quantitative PCR was performed with a Maxima Hot Start DNA polymerase (Thermo) in the presence of SYBR green. The following primers were used for amplification of the indicated mRNAs: rsmA(ksgA) (CCCTTTTGCGGGTTAATGGC and ACGCTTCGGGCAAAACTTTC), rsmB (CTATGCCACCTGTTCGGTGT and CTGTTTCGCAAAGTTCGGCA), rsmC (GCGCATAATCTGCCAGCATC and AAGAACAAACCGGAAGCCCA), rsmD (CGCAAAAAGGTACACCGCAT and CAGCCAGCCGTTATCTTCCA), rsmE (TGAGCAGTGTGGTCGTAACC and ACCGGTAACGGCAACGTATT), rsmF (CCGATTTTCTCGGTTGGGGA and ATACCACTCCTCCGCTTCCT), rsmG (GGACGCACGATAGAGAGTGG and TTCGGTCCGCGATCCTAATG), rsmH (CTCACGTCTGATCCTCTCGC and CAATAGTCTTCGCAACGGCG), rsmJ (AATTCCAGATGTTCCGGCGT and TGCCTTATCTGTTCTGGCGG), rlmA (AAATCAGCCCCTTCAGCTCC and CCGATACCAGTATGGACGCC), rlmB (GCCAGGACGTCAGTATCAGG and AGGATCAGCAGGAACGGTTG), rlmC (GGGCTTTGGTTTACACTGCG and AAACTGAGTGGAGTCCAGCG), rlmD (TCGACAATGTCACTGGAGCC and GATGTTCCCTGGGGCTATCG), rlmE (AGGTCGACAACCGTCATTCC and AAAGGGGTTACGTTCCCGTG), rlmF (CATCACAGCCGCTACGATCT and AAGTCTACGCTTTGCTCCCC), rlmG (AATGCCAGTGTTTTCGGCAC and ACGGGATTGATGAGTCGAGC), rlmH (TGCTCACCCTCTTTGTCGAG and TTTACCGAGTACCTGCGTCG), rlmI (ATCGCGATAAGTACGCAGCA and GAAGCGCTGGATATTGCACG), rlmJ (CAGTTAGGCAGCGAACATGC and CTGACCGCTACGGTTGAAGT), rlmKL (TTTGAAACGTCTGCTGCGTG and CAGGCCGTCGAGATCCATAC), rlmM (CTTCAACACGCAGTTCACCG and CATTTGCCGCCAGAAGATCG), rlmN (ATGTCGATGGCTTCACCCTG and TATCGATGCTGCCTGTGGTC), oppA (AATCGTTCTTGAACGCAGC and GATCAACGTGAACTTCGTCC), metK (AGGCTGAAGTGCGTAAAAAC and GGGCAGAATTGGCTTGATG), nanA (GTGGTGTACAACATTCCAGC and CGAAGATTTCGTCGTAACCG), gatY (TTTGCCATCGCTTTGATGTC and TGGCATACATCCCATGAGC), guaB (AAGACTTCCAGAAAGCGGAA and TTCACGGATACGTTGCAGTA), mdaB (CAGCGACTACGATGTCAAAG and TTTTCGACGGATCTTTGCG), ybeD (CAGGCGTTACCTGAGCTG and TTTGCCCAGTTCTTCATACAGT), astC (ATTGACGACTCTACCTGTGC and TCACGCCGTAGTGCATATAG), acs (GCAGTATTCCGCTGAAGAAA and GATCTTCGGCGTTCATCTCT), modA (GCCTGCGGATCTGTTTATTT and TTCAGCAGTGAAGTCCAGTT), psp (ATCGATGTTCGTGTTCCAGA and CCCATCTCGCTAAGGATCTC), ugpB (GCTGGATCCAACTGGAAAAC and TTCATCCTTACGACCGACG), argT (TACCGATAAACGTCAGCAGG and CCTTTACTACGCCAGGTCTC), sra (AATCGAACCGTCAGGCAC and TTTTCAGCGGGGCGTTT), cer (TGAGCAAGGGCGAGGAGC and TGGTGCAGATGAACTTCAGG), rfp (GCTGATCAAGGAGAACATGC and AGGATGTCGAAGGCGAAGG). Quantification of expression was done by the ΔΔCt method using 16S rRNA as a reference (gAgAATgTgCCTTCgggAAC and CCgCTggCAACAAAggATAA for MT gene expression analysis or CATTGACGTTACCCGCAGAAGAAG and CTACGAGACTCAAGCTTGCCAGTA for other gene expression analyses). To estimate the proportion of the 17S rRNA processing intermediate, an RT-qPCR approach was used. The following primer sequences were used for the 16S rRNA (GAAGAGTTTGATCATGGCTCAG and CCACTCGTCAGCAAAGAAG) and for the 17S rRNA processing intermediate (TCATTACGAAGTTTAATTCTTTGAGCG and GAAGAGTTTGATCATGGCTCAG). The proportion 17S/(16S + 17S) was calculated by normalization of the levels of the 5′-end-extended 17S transcript to the total amount of 16S and 17S transcripts.
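
For readers unfamiliar with the ΔΔCt calculation, the arithmetic can be sketched as follows. This is a generic illustration, not the authors' analysis script. It assumes perfect amplification efficiency (a doubling per cycle), and the Ct values are invented. The `fraction_17s` helper further assumes that the 16S primer pair amplifies both mature 16S and 17S templates, so that its signal stands for the 16S + 17S total; the text does not state this explicitly.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Ct of the target gene normalized to the reference gene (here 16S rRNA)."""
    return ct_target - ct_reference

def fold_change(ct_target_sample: float, ct_ref_sample: float,
                ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression, 2^(-ddCt), assuming one doubling per PCR cycle."""
    ddct = (delta_ct(ct_target_sample, ct_ref_sample)
            - delta_ct(ct_target_control, ct_ref_control))
    return 2.0 ** (-ddct)

def fraction_17s(ct_17s: float, ct_total: float) -> float:
    """Share of 17S among all 16S + 17S templates (assumes ct_total detects both)."""
    return 2.0 ** (ct_total - ct_17s)

# Invented Ct values, for illustration only.
example_fold = fold_change(22.0, 10.0, 24.0, 10.0)
example_fraction = fraction_17s(ct_17s=15.0, ct_total=12.0)
```

In this toy case the target crosses the threshold two cycles earlier in the sample than in the control at equal reference Ct, which corresponds to a four-fold higher level.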

          To assess the accumulation of assembly intermediates, cells of rRNA MT knockout strains and the BW25113 strain (WT) were grown in 500 ml of LB medium at 37°C or 20°C to A600 0.6, slowly cooled on ice, and harvested by centrifugation. Cell pellets were resuspended in a lysis buffer (20 mM HEPES-KOH pH 7.5, 4.5 mM Mg(OAc)2, 150 mM NH4Cl, 4 mM β-mercaptoethanol, 0.05 mM spermine, 2 mM spermidine) and lysed by ultrasonication. After removal of cell debris, lysates containing approximately 1,200 pmol of ribosomes were applied to either a 10% to 30% sucrose gradient in a buffer of 20 mM HEPES-KOH pH 7.5, 1 mM Mg(OAc)2, 200 mM NH4Cl, 4 mM β-mercaptoethanol, or a 10% to 40% sucrose gradient in a buffer of 20 mM HEPES-KOH pH 7.5, 10 mM Mg(OAc)2, 200 mM NH4Cl, 4 mM β-mercaptoethanol. Ultracentrifugation was performed in an SW41Ti rotor at 19,000 rpm for 19 h, followed by optical density monitoring at 260 nm.

          To create a pRFPCERtet construct, a pRFPCER plasmid (Osterman et al., 2013) was digested with HindIII and SacII and ligated with a pair of pre-annealed complementary oligonucleotides (TetR F 5′-AGCTTGGGAAATCATAAAAAATTATTTGCTTACTCTATCATTGATAGAGTTATAATAGCCGC-3′ and TetR R 5′-GGCTATTATAACTCTATCAATGATAGAGTAAGCAAATAATTTTTTATGATTTCCCA-3′), containing a T5 promoter with the TetR binding site. The obtained plasmid was digested with SacII and NdeI restriction enzymes and ligated with a pair of pre-annealed complementary oligonucleotides (5′-CACACAACAAAGGAGGTAC and 5′-TAGTACCTCCTTTGTTGTGTGGC), containing a highly efficient ribosomal binding site. The resulting plasmid was used for further study as pRFPCERtet.
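
As a quick sanity check on such oligonucleotide pairs, one can confirm in silico that the two strands anneal and see which bases are left unpaired. The sketch below uses the ribosomal binding site pair quoted above; the function names are illustrative, and the check only looks for an exact match of the top strand inside the reverse complement of the bottom strand, so it is a rough screen and not a substitute for a proper cloning tool.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def unpaired_flanks(top: str, bottom: str):
    """Locate `top` inside the reverse complement of `bottom`.

    Returns the (left, right) flanks of the bottom strand, written in
    top-strand orientation, that `top` does not cover, or None if the
    oligonucleotides do not anneal with an exact match.
    """
    rc_bottom = reverse_complement(bottom)
    index = rc_bottom.find(top.upper())
    if index < 0:
        return None
    return rc_bottom[:index], rc_bottom[index + len(top):]

# Ribosomal binding site oligonucleotides from the text above.
rbs_top = "CACACAACAAAGGAGGTAC"
rbs_bottom = "TAGTACCTCCTTTGTTGTGTGGC"

rbs_flanks = unpaired_flanks(rbs_top, rbs_bottom)
```

The two short flanks returned for this pair are the single-stranded ends of the annealed duplex, which is what one would expect for an insert designed to ligate into a vector cut with two different restriction enzymes.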

          To monitor growth rates upon exogenous gene overexpression, and to evaluate protein synthesis efficiency, cells of rRNA MT knockout strains and the BW25113 strain (WT) were transformed with the plasmid pRFPCERtet. Overnight cultures of the transformants, in triplicate for each strain, were diluted with LB with or without anhydrotetracycline 0.2 µg/ml to A600 0.01 in a 96 well plate. Cells were cultivated with continuous shaking at 37°C with automatic A600 monitoring every 30 min by a Janus workstation (Perkin Elmer). Growth rates of rRNA MT knockout strains, and the wild-type strain not transformed by any plasmid, were measured likewise.

          For evaluation of CER and RFP protein synthesis efficiency, cells transformed by the plasmid pRFPCERtet were grown in triplicate for 18 h in 200 µl LB medium at 37°C with or without anhydrotetracycline 0.2 µg/ml in a 96 deep well plate with continuous shaking. After incubation, the cells were centrifuged in a 96 well plate and washed twice with 0.9% NaCl. The fluorescence of the cells was measured by a Victor X5 plate reader (Perkin Elmer) at 430/486 nm for CER and 531/595 nm for RFP.

          To determine in vivo protein synthesis efficiency, the cells of the wild-type and rRNA MT knockout strains were transformed by a plasmid encoding the FastFT protein under the control of an araBAD promoter (Subach et al., 2009). Cells grown in LB medium with 10 mM arabinose at 37°C for 48 h were diluted 1:100 with fresh LB medium with 10 mM arabinose. Aliquots were taken at various time points; cells were isolated by centrifugation, washed two times with sterile PBS, and analyzed by a fluorescence-activated cell sorter BD FACSAria III at the wavelengths 405/460 nm and 555/610 nm.

          Comparative proteome analysis using 2D PAGE was performed as described (Hoch et al., 2015). Not less than three independently grown cultures were used for each knockout strain.

          Shotgun comparative proteome analysis was performed as described (Toprak et al., 2014; Osterman et al., 2015). Briefly, cells resuspended in 0.75% w/w RapiGest SF (Waters) were lysed by sonication. After debris removal, protein cysteine bonds were reduced with 10 mM dithiothreitol and alkylated with 30 mM iodoacetamide. Trypsin was added in a 1/50 w/w ratio trypsin/protein and incubated at 37°C overnight. To stop trypsinolysis, trifluoroacetic acid (TFA) was added to a final concentration of 0.5% v/v. Peptides were desalted and resuspended in 3% acetonitrile (ACN), 0.1% TFA, to a final concentration of 2 µg/µl. Mass spectrometry analysis was performed on a TripleTOF 5600+ mass spectrometer with a NanoSpray III ion source (ABSciex, Canada) coupled to a NanoLC Ultra 2D+ nano-HPLC system (Eksigent). For protein identification, .wiff data files were analyzed with ProteinPilot 4.5 revision 1656 (ABSciex) using the Paragon 4.5.0.0 revision 1654 (ABSciex) search algorithm and a standard set of identification settings to search against the SwissProt database, species Escherichia coli.


          3. Conclusions

          Several conclusions can be drawn from these studies with respect to the relative stabilities and structures of RNAs with single and multiple modified nucleotides. Small reproducible differences in the free energy values for the E. coli h31 variants reveal slight destabilizing effects of the modifications on helix 31. A recent X-ray crystal structure of the T. thermophilus 70S ribosome complexed with a model mRNA and two tRNAs revealed that the positions of the 16S rRNA P-site nucleotides in the vacant ribosome superimpose well with those in the tRNA-containing complex, with the exception of m²₂G966. 16 Residue m²₂G966 (m²G966 in E. coli ribosomes) is flipped out in the crystal structure of the T. thermophilus 70S ribosome containing a model mRNA and two tRNAs (Figure 4A) 15, 16 or remains stacked in the crystal structure of the 30S ribosomal subunit from T. thermophilus (Figure 4B). 22 The interaction with the anticodon loop of the P-site-bound tRNA appears to be stabilized by stacking interactions involving m²₂G966 with ribose 34. Hence, the flipped-out base has been suggested to facilitate correct positioning of the tRNA during translation. 16, 25 Therefore, the slight destabilizing effects of modifications in h31 may be important for facilitating the flipping movement of residue 966 but at the same time, stacking interactions with the tRNA are stabilized through the methyl group. Positioned in the middle of the three stacked bases, the m⁵C967 residue has a greater destabilizing effect than m²G966 (ECh31M5C vs. ECh31M2G). This result may be due to a greater disruption of stacking by the methylated base of m⁵C967.

          Structures of h31 showing the flipped out (A) and stacked (B) conformations of residue G966 (PDB accession IDs 2J00 15 and 1FJF 22).

          The CD data indicate that the unmodified, singly modified, and fully modified RNAs all contain A-form stem regions and display similar conformations. Minor differences between the CD spectra of the fully modified and unmodified h31 constructs indicate possible differences in the loop regions. These differences could arise from modification-dependent changes in the loop, such as altered base stacking at positions 966, 967, and 968. The UV melting data reveal, however, that the presence of modifications at specific locations does not influence the ability of the constructs to form stable hairpin loop structures.

          The exact functional role of the modifications at positions 966 and 967 of h31 is still unknown. Ribosomes carry out the essential biochemical process of translation, which requires an exquisite array of highly specific interactions between rRNA, mRNA, tRNAs, and ribosomal proteins. Modifications are believed to help fine-tune ribosome function. 3 Since proper ribosome function depends on the correct balance of speed and accuracy of tRNA binding, peptide-bond formation and tRNA release, methylations in h31 could play a role in maintaining proper interactions within the ribosome. 36, 37 Mutational analyses revealed that a loss of methylation at either position 966 or 967 leads to increased protein production by the mutant ribosomes. 21 Our data would therefore suggest that methylations destabilize h31 in order to maintain the proper interactions with tRNA, rRNA, or proteins. A lack of modification at residues 966 or 967 in h31 could reduce the ability of base 966 to flip and regulate tRNA affinity, positioning, or accuracy. Thus, it will be of great interest to explore in greater detail the relationship between G966 and C967 methylation and translational fidelity. Furthermore, the availability of a suitable method for synthesizing m²G and its corresponding phosphoramidite (not commercially available) will allow RNAs containing m²G modifications to be generated in sufficiently large quantities for use in additional biophysical and ligand-binding studies.


          Discussion

          The use of CRISPR/Cas9 for genome editing has had a profound impact in the field of synthetic biology and more importantly, continues to open new avenues for translational medicine such as gene-therapies 37 . In this study, we utilized the potential of CRISPR/Cas9 and the innate homologous recombination capacity of yeast to engineer the small-subunit ribosomal RNA in the bacterial genome of M. mycoides. Furthermore, we also tested the function of the newly-introduced mutations by transplanting the genome with the engineered 16S rRNA gene into recipient M. capricolum cells 27 . Thus, we were able to generate a genome-engineering platform that can be used for robust and extensive site-directed mutagenesis directly on the bacterial chromosome with up to 100% efficiency.

          The ribosomal RNAs are one of the most conserved genes across all three kingdoms of life, underscored by the fact that the small-subunit RNA is used to measure the rate of evolution of organisms and phylogenetically-classify them according to the divergence in the sequence of this gene. Apart from the conserved-nature, engineering the 16S rRNA presents a considerable challenge also because of the number of copies of the ribosomal RNA genes varies between 1 and 15 within the domain Bacteria 20 . Functional redundancy of various copies of the rRNA has been demonstrated in few eubacterial species such as E. coli 21 ,22 and M. smegmatis 23 ,24 as well as in the eukaryote, Saccharomyces cerevisiae 38 which has enabled the engineering of organisms with homogenous populations of ribosomes encoded by single transcription units. Thus the copy-number problem was circumvented by using model organisms such as M. smegmatis, which only has two copies of rRNA genes 23 ,24 and E. coli strains with a single rRNA operon encoded from a plasmid or the chromosome 21 ,22 . Likewise, the M. mycoides genome carries only two rRNA operons, and importantly, deletion of one operon does not affect viability of the cells (Hutchison et al. 29 ). Hence, this makes M. mycoides an ideal model for engineering the 16S rRNA and the ribosome. However, the lack of genetic tools has limited the use of M. mycoides as a model organism to study fundamental biological processes despite the many ground-breaking synthetic biology-feats accomplished with its genome 25 ,26 ,27 . One of these accomplishments, namely, cloning of the entire bacterial genome in yeast, allowed us to apply the genetic tools developed in yeast on the M. mycoides genome, thus making it a versatile platform for bacterial genome editing. Using the unique capacity of this platform, we were able to generate a version of the genome lacking the essential 16S rRNA gene, which is not possible by conventional techniques. 
Subsequently, the functionality of this non-functional genome was restored by using wild-type or an engineered version of the gene that could support ribosome function. Additionally, we were able to test the function of multiple versions of the engineered 16S rRNA gene in a single yeast-transformation and the subsequent M. mycoides genome transplantation event (Supplementary table 1), thus creating the possibility of high-throughput functional-testing of multiple loci through this platform.

          The goal of this study was not to systematically understand if every segment of the M. mycoides rrs gene is essential for its function. Rather, our study aimed at using the M. mycoides genome cloned in yeast as a platform to explore the possibilities of engineering this essential gene by identifying the sites where sequence modifications could be introduced and the extent to which such modifications could be tolerated. In the process, we found that the introduction of the MS2 coat-protein binding helix (MS2) was the most tolerated heterologous element since five out of six engineered rrs* carrying this modification at different locations did not affect cellular viability. Notably, in E. coli, the introduction of MS2 at h6 did not affect viability; however, it only resulted in 85% of the MS2-tagged small-subunits upon affinity purification 34 . Given our results with MS2, perhaps the introduction of MS2 at multiple sites simultaneously could have still allowed for fully-functional ribosomes and cellular-viability in E. coli and recovered a much higher fraction of the MS2-tagged subunits. Similarly, the introduction of the MS2 element at multiple locations in the large-subunit rRNA could have resulted in a purer population than the observed 90% when tagged at a single site 34 . Also of note, the problem of heterogeneity for non-lethal mutations is circumvented in the M. mycoides platform, since the wild-type gene is completely eliminated before the introduction of the mutagenized 16S rRNA gene.

          It is interesting to note that all six sites chosen for engineering (h6-h39) were able to tolerate at least a single kind of insertion element (Fig. 3A), indicating that for the rrs* carrying insertions that did not produce viable transplants, the site was not the reason per se. However, the context of the insertion elements could have played a role in the rRNA folding following transcription, post-transcriptional modifications, association of ribosomal proteins or rRNA stability, thus affecting the small-subunit assembly and function and consequently, cellular viability. For example, the insertion of “synthetic helices” has been shown to modestly decrease the fidelity of the ribosome (less than two-fold) in terms of stop-codon read-through and frameshifts 33 . However, detailed functional knowledge of how each of these six helices contributes to ribosome function is largely lacking. The structural integrity and hence, the functionality of the native helices are largely preserved during the addition of heterologous elements, while helix substitutions deviate considerably in sequence and consequently, in the structure of the helix (Fig. 2B,D). This could potentially explain why most whole-helix substitutions were lethal: only 2/12 single helix substitutions produced viable transplants, as opposed to 11/18 helix insertions. On the contrary, we observed a loss of viability when two distinct insertion elements were introduced at two different locations simultaneously (Fig. 3B), even though we observed viability when both h17 and h39 were substituted simultaneously with those from B. subtilis and E. coli, respectively (the triple-site engineered rrs* that were functional carried these two substitutions, Fig. 3C).

          While the insertion elements were already tried and tested in E. coli 33 ,34 ,35 ,36 , it was still surprising to note that several of these single-site additions were able to support viability, given the considerable 16S rRNA phylogenetic distance between E. coli and M. mycoides (Supplementary Fig. 3). Even more impressively, we were able to generate functional rrs* engineered at up to three locations ( Fig. 3c ) within the gene by careful design, which clearly demonstrates the flexibility provided by the architecture of the 16S rRNA. This flexibility could be exploited to add new functions to the translation machinery such as orthogonality and also to study central ribosome functions such as initiation and decoding by introducing mutations. Also of note, all the single-, dual- and triple-site variants were tested on the M. mycoides rrs that already carried seven mutations from the M. capricolum rrs gene. It is likely that perhaps even more of these engineered rrs* could have been functional had they been tested in the context of the wild-type M. mycoides rrs gene, underscoring the potential utility of this platform to generate and study small-subunit mutants.

          Our efforts were focused mainly on evaluating if the engineered rrs* could support the production of viable M. mycoides cells after genome transplantation. An easily observable phenotype of rRNA engineering is the growth-defect brought about by the mutations introduced. It is conceivable that some of the transplants carrying engineered rrs* are slow-growing upon culturing, which might even explain in some cases why mutations that produced transplants individually did not produce viable M. mycoides cells when combined (Fig. 3A–C). Such slow-growing mutants could be potentially used to study processes such as 16S rRNA folding and stability and the small-subunit maturation.

          In conclusion, the M. mycoides platform described here could enable the high-throughput testing of different mutagenized versions of any essential gene since only the functional versions can restore the capacity of the genome to produce viable cells upon genome transplantation. Thus, this platform can be applied to further understand the function of various regions of the 16S rRNA and could be extended to the large-subunit RNA, ribosomal and extra-ribosomal factors and other essential genes involved in fundamental biological processes to answer basic questions of life.


          DISCUSSION

          Ribosomal RNAs are the main components of the translational machinery and are synthesized at high levels to meet the cellular demands for protein synthesis, especially under conditions of rapid growth (24). However, a description of the pathways that lead to the generation of mature rRNAs in E. coli has remained incomplete even though the initial steps of rRNA processing were defined nearly 50 years ago. Based on the evidence suggesting that 5S rRNA undergoes 5′-end maturation using a 5′ to 3′ exonucleolytic mechanism, I examined a strain that lacks RNase AM, a 5′ to 3′ exonuclease, and found that 5′-end maturation of 5S rRNA is performed by this enzyme both in vivo and in vitro (Figure 1). I also found that RNase AM is responsible for the 5′-end maturation of 23S rRNA, but this enzyme removes only the final three nts of the seven precursor residues, indicating the presence of another enzyme that removes the preceding four nts (Figure 2). Finally, I found that RNase AM generates the mature 5′ end of 16S rRNA (Figure 4). This process was attributed to RNase G, but here it is shown that the RNase G cleavage product retains three unprocessed nts, which are then removed by RNase AM. Why RNase AM should remove precisely three unprocessed nts from each of the rRNAs remains unclear. Nonetheless, the identification of the enzyme that matures the 5′ end of these rRNAs, in conjunction with the previously identified enzymes responsible for 3′-end maturation, now provides a full description of the enzymes that generate mature ends for each of the three rRNAs in E. coli.

          Due to the high levels of ribosomes that are required for growth, a significant proportion of transcription, and therefore of the energy generated by a cell, is used for rRNA synthesis. A question of interest is how rRNA degradation, which would contribute to inefficiencies in energy utilization, is minimized given a cellular environment that contains multiple RNases. It has been shown that once complete ribosomes are formed, rRNA becomes extraordinarily stable in the assembled form (25), but a small proportion of rRNA does get degraded during the assembly process (26, 27). One factor that could limit more extensive rRNA degradation during assembly is a restriction of the activity of rRNA processing enzymes to late stages of ribosome assembly, where over-digestion can be curbed by the presence of ribosomal proteins or rRNA structures that form during the assembly process. Supporting this view, I found that RNase AM maturation of 5S and 23S rRNA occurs not only more efficiently, but also with greater accuracy on ribosomal particles as compared to purified rRNA (Figures 1D and 2C). In the case of 23S rRNA, RNase AM processing was observed to occur at a late stage of assembly when 50S particles join with 30S particles to form the complete ribosome (Figure 3G). Similarly, it has been shown that processing of the 5S and 23S rRNAs at their 3′ ends by RNase T is also more efficient on ribosomal particles, as compared to free RNA (8, 9). Identifying the mechanisms through which the activity of the processing RNases is restricted on free rRNA and/or channeled to ribosomal particles will be a topic of interest for the future.

          We had previously provided experimental evidence that the 5′ and 3′ ends of 23S rRNA undergo maturation at similar rates under different growth conditions, and based on those observations, we suggested that the processing of the two ends might be coupled (19). This hypothesis was tested by using ΔyciV and ΔrnbΔrph strains to inhibit processing at the 5′ or 3′ ends, respectively. These analyses revealed that in a ΔrnbΔrph strain, 5′-end precursors accumulate to the same extent as 3′-end precursors, suggesting that 3′-end maturation occurs first and is necessary for processing at the 5′ end (Figure 3). Similarly, work on 16S rRNA has suggested that processing at one end facilitates maturation at the other end (10, 28). Thus, the coupled processing of 3′ and 5′ ends may be a common feature of bacterial rRNAs, with potential relevance for increasing the efficiency of rRNA maturation at both ends.

          Paralogs of RNase AM are found in a wide variety of organisms, including many other Gram-negative and Gram-positive bacteria, archaea and lower eukaryotes. However, the model Gram-positive bacterium Bacillus subtilis lacks an RNase AM homolog. In this case, 5′-end maturation of each of the three rRNAs proceeds through a different mechanism, involving the enzymes Mini-III for 23S rRNA, RNase J1 for 16S rRNA and RNase M5 for 5S rRNA maturation (29–31). Interestingly, in some members of the delta-proteobacteria, RNase AM paralogs are found in fusion with RNase III, which suggests that multiple steps of rRNA processing might be performed by the same enzyme in these organisms. Future studies on this enzyme will be needed to clarify the rRNA maturation role of RNase AM in all domains of life. In summation, the findings described here provide answers to decades-old questions regarding the mechanism of 5′-end maturation of the E. coli rRNAs, complete the roster of the RNases required to yield mature rRNAs and describe the first biological function for a 5′ to 3′ RNA exonuclease in this organism.