4.4: Introduction to Nucleic Acids - Biology

4.4: Introduction to Nucleic Acids

Nucleic Acids: Useful Notes on Nucleic Acids

These are important organic substances found in nucleus and cytoplasm. They control the important biosynthetic activities of the cell and carry hereditary information from generation to generation.

Thus, nucleic acids are macromolecules of the utmost biological importance.

They are associated with the chromosomes and transmit various information to cytoplasm.

Image Courtesy :

All the hereditary (genetic) information of the cell (i.e., all the information necessary to reproduce and maintain a new organism) is stored in coded form in molecules of DNA. DNA is replicated and distributed to daughter cells during cell division, and in this way all the hereditary information accumulated over billions of years of evolution is passed from cell to cell and from one generation of an organism to another.

With the aid of RNA, this information is expressed as specific patterns of protein synthesis. These nucleic acids are of two types: (i) deoxyribonucleic acid (DNA) and (ii) ribonucleic acid (RNA). DNA is the major store of genetic information. This information is transmitted by transcription into RNA molecules, proteins are then synthesized in a process involving translation of the RNA.

DNA → transcription RNA→ translation Protein

In higher cells DNA is localized mainly in the nucleus as part of the chromosomes. A small amount of DNA is present in the cytoplasm and contained within mitochondria and chloroplasts. RNA is found both in the nucleus, where it is synthesized, and in the cytoplasm, where the synthesis of proteins takes place.

Nucleic acids consist of a sugar (pentose), nitrogenous bases (purines and pyrimidines), and phosphoric acid. A nucleic acid molecule is a linear polymer in which nucleotides are linked together by means of phosphodiester ‘bridges’ or bonds.

These bonds link the 3′ carbon in the pentose of one nucleotide to the 5′ carbon in the pentose of the adjacent nucleotide. Thus the backbone of a nucleic acid consists of alternating phosphates and pentoses. The nitrogenous bases are attached to the sugars of this backbone.

Nucleic acids are basophilic, i.e., stain readily with basic dyes. After a mild hydrolysis the nucleic acids are decomposed into nucleotides.

[I] Deoxyribonucleic acid (DNA):

It forms about 9% part of nucleus as found by spectrophotometric analysis. Chemically it consists of mainly three components: phosphoric acid, sugar, and bases.

1. Phosphoric acid:

It may occur also as phosphate and forms the backbone of DNA molecule along with sugar molecule. It links the nucleotides by joining the deoxyribose (pentoss sugar) of two adjacent nucleotides with an ester-phosphate bond. These bonds connect carbon 3′ in one nucleotide with carbon 5′ in next. This acid is a channel for the chemical energy used by the molecule.

2. Pentoses:

These are of two types: ribose in RNA and deoxyribose in DNA. DNA has one oxygen atom less than that of RNA. The pentose sugar in nucleic acids is always ribose-, in RNA it is D-ribose and in DNA, it is deoxyribose. It is always the OH on C-l carbon which is the point of attachment of the base.

This linked to the 1-nitrogen atom in case of pyrimidines and to 9-nitrogen atom in purines. Both deoxyribose and ribose (pentose sugars of nucleic acids) have a pentagonal ring with five carbons, among which two (i.e., 3′ and 5′) are attached to phosphoric acid and three (Г) to the base.

3. Bases:

These may be of two types:

These are characterised by the presence of two fused benzene rings. They may be adenine (A) and guanine (G). RNA contains uracil (U) instead of thymine. The combination of base plus a pentose, minus the phosphate, forms a nucleoside. For example, adenine is a purine base adenosine (adenine + ribose) is the corresponding nucleoside, i.e., deoxyadenosine and deoxyguanosine.

These are characterized by the occurrence of single benzene ring. They are thymine (T) and cytosine (C).

Nucleotides are phosphate esters of nucleosides, purine or pyrimidine bases linked to sugars. In the nucleotides, the 3-nitrogen of the pyrimidine bases or the 9-N of the purine bases is attached to the 1-carbon atom of the sugar, and the phosphoric acid residue is attached to the 5′ carbon atom of the sugar.

Thus nucleotides of purines are deoxyriboadenylic acid and deoxyriboguanylic acid, and of pyrimidines are deoxyribothymidylic acid and deoxyribocytidylic acid.


Purines and pyrimidines are weak bases. Adjacent nucleotides of the nucleic acids are connected by a linkage between the phosphoric acid residue of one nucleotide and the 3′ carbon atom of the sugar on the next nucleotide. Both the bases and sugars have an approximately planar structure. In polynucleotides the planes are oriented with respect to one another at an angle of 70° to 75°.

In addition to four common bases, a number of unusual bases are found is .DNA. DNA of animal origin contains trace amounts of 5-methy- lcytosine, while large amounts of this base are found in DNA of plant origin. Similarly, 6-methyl aminopurine is found in DNA from bacteria and viruses.

Cytosine in DNA of T-even bacteriophages of E.-coli is replaced by 5-hydroxymethylcytosine, to which glucose or other sugars may be linked at the hydroxyl group. In some viral DNA’s, the rare base 5-hydroxy— methyluracil substitutes for thymine.

[III] DNA base composition:

DNA in living organisms is found as a linear molecules of extremely high molecular weight. For example, in E-coli it is a single circular DNA molecule weighing about 2.7 × 10 9 daltons and its length is 1.4 mm. In higher organisms the amount of DNA may be several thousand times larger. For example, in a single human diploid cell its total length when fully extended is 1.7 meters.

All the genetic information of a living organism is stored in its linear sequence of the four bases. Therefore, a four letter alphabet (A, T, G, C) must code for the primary structure (i.e., number and sequence of 20 amino acids) of all proteins. The base composition vary from one species to another, but in all cases amount of adenine is equal to amount of thymine (A=T).

Similarly amount of cytosine is equal to guanine (C=G). Consequently, total quantity of purines equals the total quantity of pyrimidines (i.e., A+G=C+T). On the other hand, AT/GC ratio varies considerably between species.

[IV] DNA is double helix:

On the basis of X-ray diffraction data of Wilkins and Franklin, Watson and Crick (1953) proposed a model for DNA structure. It is composed of two right-handed helical polynucleotide chains that form a double helix around the same central axis. The two strands are antiparallel, meaning that their 3′, 5′ phosphodiester links run in opposite directions. The bases are stacked inside the helix in a plane perpendicular to the helical axis.

The two strands are held together by hydrogen bonds present between pairs of bases. Since there is a fixed distance between two pentose sugars in the opposite strands, only certain base pairs can fit into the structure.

As shown in figure 5 two hydrogen bonds are formed between A and T, three are formed between С and G, therefore a CG pair is more stable than AT pair. In addition to hydrogen bonds, hydrophobic interactions established between the stacked bases are important in maintaining the double helical structure.

The axial sequence of bases along one polynucleotide chain may vary considerably, but on the other chain the sequence must be complementary, as given below —

Because of this property, order of bases on one chain, the other chain is complimentary. During duplication the two chains dissociate and each one serves as a template for the synthesis of a new complementary chain.

[V] Separation of DNA strands:

DNA double helix is preserved by weak interactions (i.e., hydrogen bonds and hydrophobic interactions between stacked bases) two strands may be separated by heating or alkaline pH. This separation is called melting or denaturation of DNA. The melting point depends on AT/GC ratio. Breakage of GC pairs needs higher temperature to that of AT pairs.

If DNA is cooled slowly after denaturation, double helical conformation will be restored. This process is called renaturation or annealing and this is the base-pairing properties of nucleotides.

DNA renaturation can be used to estimate the size (number of nucleotides) of the genome of a given organism. A large genome (e.g., calf) take more time to reanneal than a small genome (e.g., E. coli). This is because the individual sequences take longer time to find the correct partners.

Single stranded DNA will also anneal to complimentary RNA, resulting in a hybrid molecule in which one strand is DNA and the other is RNA. Molecular hybridization is a very powerful method for characterizing RNAs since RNA molecule will hybridize only to DNA from which it was transcribed.

[VI] Ribonucleic acid (RNA):

RNA is present in considerable amounts in the nucleolus and is also found in small amounts on chromosomes. The major part of the cells RNA is in the cytoplasmic ribosomes. A small amount of RNA is also present in mitochondria and chloroplasts.

Transfer RNA and mRNA are present in solution in the cytoplasmic matrix unless affixed to the ribosomes. The RNA content of nucleus and cytoplasm varies with activity cycles of the cell. The cytoplasmic RNA increases in quantity during cell growth preceding mitosis and is partitioned equally between the daughter cells.

RNA accumulates in both nucleus (especially in nucleolus) and cytoplasm during high metabolic activity or growth, as in regenerating nerve cells, active neurons, gland cells, cells infected with virus and tumor cells. Actively metabolizing yeast cells contain a large amount of RNA, but starved yeast cells have little RNA. Infact, starved cells in general show RNA depletion.

RNA also varies with other physiological conditions such as lack of oxygen and presence of metabolic poisons. RNA is labile in dividing cells and also in active cells that are not dividing.

[VII] Structure of RNA:

RNA is a long-chain molecule built up of repeating nucleotide units linked by 3′ to 5′ phosphate diester bonds. Sugar component of RNA is ribose and three out of four bases, adenine, guanine and cytosine are the same as in DNA, and the fourth base is uracil in place of thymine of DNA, Uracil has one methyl group less.


RNA nucleotides are formed from pentose sugar ribose, phosphoric acid and either adenine guanine, cytosine or uracil (U). Nucleotides are regarded as phosphorylated derivatives of nucleosides. Nucleosides are combinations of a nitrogenous base and a pentose sugar without an attached phosphate group.

A nucleotide unit consists of a molecule of sugar, a base and a phosphoric acid. A single nucleic acid contains a large number of nucleotide units consisting of high molecular weight (about 8,000,000).

Nucleotides are the monomeric units of the nucleic acid macromolecule. The nucleotides result from the covalent bonding of a phosphate and a heterocyclic base to the pentose. Within the nucleotide, the combination of a base with the pentose forms a nucleoside.

For example, adenine is a purine base adenosine (adenine+ ribose) is the corresponding nucleoside, and adenosine monophosphate (AMP), adenosine diphosphate (ADP) and adenosine triphosphate (ATP) are nucleotides. Nucleotides, thus constitute the building blocks of nucleic acids and they are also used to store and transfer chemical energy.


Nucleotides are joined together to form a polynucleotide chain by a covalent linkage between the phosphoric acid residue of one nucleotide and 3′ carbon of the sugar on the next nucleotide. This linkage is often called a 3′, 5′ phosphodiester bond, because the phosphate is esterified to two OH groups, one attached to the 3′ carbon and one attached to the 5′ carbon.

The backbone of a polynucleotide chain thus consists of alternating sugar and phosphate units.

The sequence of nucleotides in DNA and RNA is the key to their genetic functions, just as the sequence of amino acids determines the biological activity of a particular protein. Even though both DNA and RNA are usually composed of only four different nucleotides, the number of possible sequences of nucleotides is enormous in a large polymer.

RNA usually exists as a single-stranded polynucleotide chain and have no regular helical configuration. The linear chain is thought to be folded in many ways, with certain nucleotides pairing off and forming short double-stranded regions.

[VIII] Kinds of ribonucleic acid:

The ribonucleic acids are of three types —

1. Messenger RNA:

This ribonucleic acid is of nuclear origin and conveys genetic information from DNA in the nucleus to the ribosomes in the cytoplasm, where amino acids become grouped to form proteins.

2. Transfer or adapter RNA:

It is another important type of ribonucleic acid which is present in the cytoplasm, helping there in protein synthesis. It has been recently found that t-RNA originates from nucleus near the nucleolar region.

3. Ribosomal RNA:

This ribonucleic acid is the major component of cytoplasmic particles called ribosomes. Ribosomal RNA comprises up to 80% of the cellular RNA of Escherichia coli. It is the site of amino acids union.

For detailed structure see chapter-Protein synthesis.

[IX] Significance of nucleic acids:

Deoxyribonucleic acids and ribonucleic acids are the key centres which control all the metabolic activities of cell and in turn the whole organism.

(1) If there occurs any deficiency in the DNA amount, nucleus loses its capacity to support adenosine triphosphate (ATP) synthesis.

(2) Nucleus also becomes inefficient to incorporate amino acids into proteins.

(3) Besides, DNA is the main genetic material constituting genes and chromosomes which carry hereditary information from generation to generation. DNA helps in the RNA synthesis in the cell. If the loops of amphibian oocytic chromosome (lamp brush) are exposed to actinomycin (which has the property to fuse with DNA and thereby causing decrease in DNA amount), RNA synthesis is in­hibited.

(4) Recently, McConnell and Cameron (1968) have produced the evidence that RNA amount increases the intelligence and learning capacity of men.

Molecular arrangement of components in nucleic acids:

In deoxyribonucleic acid (DNA), nucleotides are arranged in the form of helixes or chains spirally coiling around each other. According to Watson and Crick (1962), DNA consists of two helixes coiled about each other. The chain of each helix is made of sugar and phosphate group.

These two helixes are interconnected by the bases through hydrogen bonds. Generally one purine becomes attached with one pyrimidine to form base connection between the chain. Thus, adenine along with thymine, and cytosine along with guanine becomes connected with the sugar molecule of chain alternately.

The direction of one helix is opposite to the other. In DNA one helix serves as a template for the formation of complementary helix, i.e., adenine in one helix forms the thymine in new helix and similarly cytosine effects the formation of guanine in new helix.

In ribonucleic acid (RNA), the nucleotides are not arranged in double helical model but for the most part RNA exists as a single strand. Sometimes it may form also smaller helices in some parts due to folding and convolutions. These secondary helical structures in RNA are regular according to recent research. In the formation of helix, bases become hydrogen-bonded like DNA except uracil substitutes thymine.

Overview of Protein–Nucleic Acid Interactions

Proteins interact with DNA and RNA through similar physical forces, which include electrostatic interactions (salt bridges), dipolar interactions (hydrogen bonding, H-bonds), entropic effects (hydrophobic interactions) and dispersion forces (base stacking). These forces contribute in varying degrees to proteins binding in a sequence-specific (tight) or non–sequence-specific (loose) manner. For example, specific protein–DNA interactions are commonly mediated by an α-helix motif in the protein that inserts itself into the major groove of the DNA, recognizing and interacting with a specific base sequence through H-bonds and salt bridges. In addition, the affinity and specificity of a particular protein–nucleic acid interaction can be increased through protein oligomerization or multi-protein complex formation (e.g., GCN4, glucocorticoid receptor, transcription initiation complexes, mRNA splicing complexes, RISC, etc.). The secondary and tertiary structure formed by nucleic acid sequences (especially in RNA) provides an important additional mechanism by which proteins recognize and bind particular nucleic acid sequences.

Application note: Analysis of Androgen-dependent and -independent Regulation of Transcriptional Activity

This application note describes the use of chromatin immunoprecipitation (ChIP) assay to monitor the interaction between the androgen receptor (AR) and androgen response elements (AREs) in the DNA. The LNcaP prostate cancer cell line was exposed to testosterone and the Thermo Scientific Pierce Magnetic ChIP Kit was used with an anti-AR antibody followed by qPCR. The figure to the left shows changes in AR binding to known AREs (PSA, CDKN1A, FKBP5, TMPRSS2 and IGF-1), with a 300-fold change to the FKBP5 ARE 30 minutes after treatment. This kit allows for the efficient isolation of chromatin-bound DNA by immunoprecipitation in about 8 hours with as few as 10,000 cells.

Our 72-page Protein Interaction Technical Handbook provides protocols and technical and product information to help maximize results for protein interaction studies. The handbook provides background, helpful hints and troubleshooting advice for immunoprecipitation and co-immunoprecipitation assays, pull-down assays, far-western blotting and crosslinking. The handbook also features an expanded section on methods to study protein–nucleic acid interactions, including ChIP, EMSA, and RNA EMSA. The handbook is an essential resource for any laboratory studying protein interactions.

Contents include: Introduction to protein interactions, Co-immunoprecipitation assays, Pull-down assays, Far-western blotting, Protein interaction mapping, Yeast two-hybrid reporter assays, Electrophoretic mobility shift assays [EMSA], Chromatin immunoprecipitation assays (ChIP), Protein–nucleic acid conjugates, and more.

Nucleic acid binding domains

The DNA- or RNA-binding function of a protein is localized in discrete conserved domains within its tertiary structure. An individual protein can have multiple repeats of the same nucleic acid binding domain or can have several different domains found within its structure. The identity of the individual domains and their relative arrangement are functionally important within the protein. Several common DNA binding domains include zinc fingers, helix-turn-helix, helix-loop-helix, winged helix and leucine zipper. RNA-binding specificity and function are constituted by zinc finger, KH, S1, PAZ, PUF, PIWI and RRM (RNA recognition motif) domains. Multiple nucleic acid binding domains with a single protein can increase specificity and affinity of the protein for certain target nucleic acid sequences, mediate a change in the topology of the target nucleic acid, properly position other nucleic acid sequences for recognition or regulate the activity of enzymatic domains within the binding protein.

Complex interactions

Proteins can bind directly to the nucleic acid or indirectly through other bound proteins, effectively creating a hierarchy of interactions. The strength of these interactions influence which assays or approaches are best for studying complex assembly. Some of these interactions are transient and require stabilization through chemical crosslinking prior to isolation of the complexes. Understanding how proteins interact with nucleic acids, determining what proteins are present in these protein-nucleic acid complexes and identifying the nucleic acid sequence/structure required to assemble these complexes are vital to understanding the role these complexes play in regulating cellular processes.

Learn more

Select products

The common DNA-binding domains, helix-turn-helix and zinc finger domains, are incorporated within numerous DNA-binding proteins expressed in the cell. Specificity is derived from higher order interactions involving nucleoprotein complexes. These DNA-binding protein complexes find their target by “sliding” along the genomic DNA until their specific DNA-docking site is discovered. The binding of protein to DNA controls the structure of genomic DNA (chromatin), RNA transcription, and DNA repair mechanisms. The following example illustrates how Invitrogen Dynabeads magnetic beads may be used to recover proteins that bind to nucleic acids.

Isolating DNA- and RNA-binding proteins. DNA- and RNA-binding proteins can be isolated using Invitrogen Dynabeads M-280 Streptavidin. First, the biotin-labeled DNA/RNA target sequence is immobilized onto the streptavidin beads. Then they are incubated with the protein extract, and the bound proteins are isolated using magnetic separation. The protein can then be eluted for characterization. These beads have low nonspecific binding of proteins and are ideal for isolating RNA- and DNA-binding proteins.


A major function of protein–DNA interactions is to manage the extensive length of the genetic material contained in each cell. Chromosomes have evolved to package, store and move DNA throughout the cell, but they also play a role in transcriptional regulation. Chromosome remodeling allows selective portions of a chromosome to be unraveled so that the DNA is available for gene transcription, or to remain tightly packaged so that transcription of the encoded genes is completely silenced.

Transcriptional regulation

Once unraveled, genomic DNA can be transcribed however, not all of the DNA sequence codes for proteins. Only genes are transcribed to produce RNA, and the sequences between the genes (and within) serve to regulate transcription through protein binding. These sequences are important for transcriptional control, and they contain promoters, enhancers, insulators and spacers. Enhancer sequences, which can be many kilobases away from the gene start site, bind proteins and act as beacons to attract the transcriptional machinery. Transcription of a gene is initiated when transcription factor proteins bind to specific DNA promoter sequences located immediately adjacent to the gene transcription start site. This interaction is facilitated through the DNA-binding domain(s) of the transcription factor. Through protein interactions, the trans-activation domain of the transcription factor facilitates binding and localization of the RNA polymerase II holoenzyme to the gene promoter in order to initiate production of a messenger RNA (mRNA)

Learn more

Select products

Proteins interact with RNA in order to splice, protect, translate or degrade the message. The first interaction occurs just after transcriptional initiation, when the complement to the promoter sequence is cleaved out of the mRNA and the capping machinery incorporates a "GpppN" cap at the 5' end of the mRNA. This results in recruitment of elongation factors that regulate the reset of mRNA transcription. Elongation is followed by 3'-end processing and splicing, resulting in a mature RNA transcript that is exported to the cytoplasm for translation. All of these processes require significant protein–RNA interactions and are highly regulated and complex. Many of the regulatory elements for this process reside in noncoding 3' and 5' untranslated regions (UTRs) of the mRNA. However, regulatory microRNAs (miRNAs) also occur in coding regions of introns, as well as exons, noncoding genes and repetitive elements. In recent years, increased emphasis has been placed on the importance of these noncoding RNA sequences and their roles in cellular regulation and disease states. However, tools for the study of critical protein–RNA interactions have been limited. The data shown in this example were generated using an RNA–protein pull-down experiment using the Thermo Scientific Pierce Magnetic RNA-Protein Pull-Down Kit.

RNA–protein pull-down experiment. The binding of wild-type and mutant androgen receptor 3’-UTR RNA to proteins HuR and Poly(C)BP was assayed using the Pierce Magnetic RNA-Protein Pull-Down Kit. Samples were normalized by volume. L = lysate load, FT = flow-through, E = eluate. This kit provides a robust method to enrich protein–RNA interactions.

5' and 3' UTRs

The UTR regions of mRNA contain sequence elements that recruit RNA-binding proteins for post-transcriptional regulation and protein translation. In addition, these elements promote transcript stability or degradation, and they can direct subcellular localization of the RNA. These RNA regulatory elements vary in length but rely on both primary and secondary structure for protein binding. For example, iron regulation in the cell is a tightly regulated process where a protein–RNA interaction is key to maintaining iron homeostasis. Target genes, such as the iron storage protein ferritin or the transferrin receptor, contain a small (

28 nucleotides) consensus iron responsive element (IRE) in their respective 5' or 3' UTR. The iron responsive protein (IRP) responds to cellular iron status by binding the IRE element. Under iron-starved conditions, IRP remains bound to the IRE element to suppress translation of iron storage proteins. Under iron-rich conditions, IRE binding activity of IRP is lost and iron storage proteins are translated. Many of these RNA consensus elements have been identified and classified into different families based on sequence and function. The 3' UTR also contains recognition elements for miRNA, which are responsible for repressing protein translation of the coding mRNA.


MicroRNAs (miRNAs) are a large and ubiquitous class of noncoding RNAs that regulate post-transcriptional silencing of target mRNA. Over 700 miRNAs have been identified in the human genome. MicroRNAs have binding recognition sequences in 57.8% of human mRNAs, with 72% containing of those mRNAs having multiple miRNA recognition sites. The miRNA begins as a 70–100 nucleotide transcribed RNA (pre-miRNA) containing a 6–8 nucleotide seed region at the 5' end for mRNA binding. The miRNA is then cleaved by DROSHA, a nuclear endoribonuclease III. The pre-miRNA then associates with double-stranded RNA-binding proteins and is actively exported to the cytoplasm, dependent on Exportin 5 and Ran GTPase. The pre-miRNA is then further processed in a ribonucleoprotein (RNP) complex consisting of Argonaute proteins and Dicer (endoribonuclease III), which cleaves the pre-miRNA into the mature 19–22 nucleotide miRNA. The miRNA-Argonaute complex then binds to target genes and recruits additional unidentified proteins for regulation of target genes.

MRNA regulation

In most cases, the mRNA regulation results in repression of translation through mRNA degradation, deadenylation or storage in cytoplasmic mRNA processing bodies (P-bodies), but mRNA translation may also be up-regulated. Several models of mRNA repression and degradation have been proposed, but a single accepted model has not yet been adopted. MicroRNA research is rapidly expanding and key protein–RNA interactions are being investigated to further understand the role of miRNA in cell growth, differentiation and carcinogenesis.

4.4: Introduction to Nucleic Acids - Biology

The first isolation of what we now refer to as DNA was accomplished by Johann Friedrich Miescher circa 1870. He reported finding a weakly acidic substance of unknown function in the nuclei of human white blood cells, and named this material "nuclein". A few years later, Miescher separated nuclein into protein and nucleic acid components. In the 1920's nucleic acids were found to be major components of chromosomes, small gene-carrying bodies in the nuclei of complex cells. Elemental analysis of nucleic acids showed the presence of phosphorus, in addition to the usual C, H, N & O. Unlike proteins, nucleic acids contained no sulfur. Complete hydrolysis of chromosomal nucleic acids gave inorganic phosphate, 2-deoxyribose (a previously unknown sugar) and four different heterocyclic bases (shown in the following diagram). To reflect the unusual sugar component, chromosomal nucleic acids are called deoxyribonucleic acids, abbreviated DNA. Analogous nucleic acids in which the sugar component is ribose are termed ribonucleic acids, abbreviated RNA. The acidic character of the nucleic acids was attributed to the phosphoric acid moiety.

The two monocyclic bases shown here are classified as pyrimidines, and the two bicyclic bases are purines. Each has at least one N-H site at which an organic substituent may be attached. They are all polyfunctional bases, and may exist in tautomeric forms.
Base-catalyzed hydrolysis of DNA gave four nucleoside products, which proved to be N-glycosides of 2'-deoxyribose combined with the heterocyclic amines. Structures and names for these nucleosides will be displayed above by clicking on the heterocyclic base diagram . The base components are colored green, and the sugar is black. As noted in the 2'-deoxycytidine structure on the left, the numbering of the sugar carbons makes use of primed numbers to distinguish them from the heterocyclic base sites. The corresponding N-glycosides of the common sugar ribose are the building blocks of RNA, and are named adenosine, cytidine, guanosine and uridine (a thymidine analog missing the methyl group).
From this evidence, nucleic acids may be formulated as alternating copolymers of phosphoric acid ( P ) and nucleosides ( N ), as shown:

At first the four nucleosides, distinguished by prime marks in this crude formula, were assumed to be present in equal amounts, resulting in a uniform structure, such as that of starch. However, a compound of this kind, presumably common to all organisms, was considered too simple to hold the hereditary information known to reside in the chromosomes. This view was challenged in 1944, when Oswald Avery and colleagues demonstrated that bacterial DNA was likely the genetic agent that carried information from one organism to another in a process called "transformation". He concluded that "nucleic acids must be regarded as possessing biological specificity, the chemical basis of which is as yet undetermined." Despite this finding, many scientists continued to believe that chromosomal proteins, which differ across species, between individuals, and even within a given organism, were the locus of an organism's genetic information.
It should be noted that single celled organisms like bacteria do not have a well-defined nucleus. Instead, their single chromosome is associated with specific proteins in a region called a "nucleoid". Nevertheless, the DNA from bacteria has the same composition and general structure as that from multicellular organisms, including human beings.

Views about the role of DNA in inheritance changed in the late 1940's and early 1950's. By conducting a careful analysis of DNA from many sources, Erwin Chargaff found its composition to be species specific. In addition, he found that the amount of adenine (A) always equaled the amount of thymine (T), and the amount of guanine (G) always equaled the amount of cytosine (C), regardless of the DNA source. As set forth in the following table, the ratio of (A+T) to (C+G) varied from 2.70 to 0.35. The last two organisms are bacteria.

Nucleoside Base Distribution in DNA

In a second critical study, Alfred Hershey and Martha Chase showed that when a bacterium is infected and genetically transformed by a virus, at least 80% of the viral DNA enters the bacterial cell and at least 80% of the viral protein remains outside. Together with the Chargaff findings this work established DNA as the repository of the unique genetic characteristics of an organism.

2. The Chemical Nature of DNA

The polymeric structure of DNA may be described in terms of monomeric units of increasing complexity. In the top shaded box of the following illustration, the three relatively simple components mentioned earlier are shown. Below that on the left , formulas for phosphoric acid and a nucleoside are drawn. Condensation polymerization of these leads to the DNA formulation outlined above. Finally, a 5'- monophosphate ester, called a nucleotide may be drawn as a single monomer unit, shown in the shaded box to the right. Since a monophosphate ester of this kind is a strong acid (pK a of 1.0), it will be fully ionized at the usual physiological pH (ca.7.4). Names for these DNA components are given in the table to the right of the diagram. Isomeric 3'-monophospate nucleotides are also known, and both isomers are found in cells. They may be obtained by selective hydrolysis of DNA through the action of nuclease enzymes. Anhydride-like di- and tri-phosphate nucleotides have been identified as important energy carriers in biochemical reactions, the most common being ATP (adenosine 5'-triphosphate).

Names of DNA Base Derivatives

A complete structural representation of a segment of the DNA polymer formed from 5'-nucleotides may be viewed by clicking on the above diagram . Several important characteristics of this formula should be noted.

First, the remaining P-OH function is quite acidic and is completely ionized in biological systems.
Second, the polymer chain is structurally directed. One end (5') is different from the other (3').
Third, although this appears to be a relatively simple polymer, the possible permutations of the four nucleosides in the chain become very large as the chain lengthens.
Fourth, the DNA polymer is much larger than originally believed. Molecular weights for the DNA from multicellular organisms are commonly 10 9 or greater.

Information is stored or encoded in the DNA polymer by the pattern in which the four nucleotides are arranged. To access this information the pattern must be "read" in a linear fashion, just as a bar code is read at a supermarket checkout. Because living organisms are extremely complex, a correspondingly large amount of information related to this complexity must be stored in the DNA. Consequently, the DNA itself must be very large, as noted above. Even the single DNA molecule from an E. coli bacterium is found to have roughly a million nucleotide units in a polymer strand, and would reach a millimeter in length if stretched out. The nuclei of multicellular organisms incorporate chromosomes, which are composed of DNA combined with nuclear proteins called histones. The fruit fly has 8 chromosomes, humans have 46 and dogs 78 (note that the amount of DNA in a cell's nucleus does not correlate with the number of chromosomes). The DNA from the smallest human chromosome is over ten times larger than E. coli DNA, and it has been estimated that the total DNA in a human cell would extend to 2 meters in length if unraveled. Since the nucleus is only about 5μm in diameter, the chromosomal DNA must be packed tightly to fit in that small volume.
In addition to its role as a stable informational library, chromosomal DNA must be structured or organized in such a way that the chemical machinery of the cell will have easy access to that information, in order to make important molecules such as polypeptides. Furthermore, accurate copies of the DNA code must be created as cells divide, with the replicated DNA molecules passed on to subsequent cell generations, as well as to progeny of the organism. The nature of this DNA organization, or secondary structure, will be discussed in a later section.

3. RNA, a Different Nucleic Acid

The high molecular weight nucleic acid, DNA, is found chiefly in the nuclei of complex cells, known as eucaryotic cells, or in the nucleoid regions of procaryotic cells, such as bacteria. It is often associated with proteins that help to pack it in a usable fashion.
In contrast, a lower molecular weight, but much more abundant nucleic acid, RNA , is distributed throughout the cell, most commonly in small numerous organelles called ribosomes . Three kinds of RNA are identified, the largest subgroup (85 to 90%) being ribosomal RNA, rRNA, the major component of ribosomes, together with proteins. The size of rRNA molecules varies, but is generally less than a thousandth the size of DNA. The other forms of RNA are messenger RNA , mRNA, and transfer RNA , tRNA. Both have a more transient existence and are smaller than rRNA.
All these RNA's have similar constitutions, and differ from DNA in two important respects. As shown in the following diagram, the sugar component of RNA is ribose, and the pyrimidine base uracil replaces the thymine base of DNA. The RNA's play a vital role in the transfer of information (transcription) from the DNA library to the protein factories called ribosomes, and in the interpretation of that information (translation) for the synthesis of specific polypeptides. These functions will be described later.

A complete structural representation of a segment of the RNA polymer formed from 5'-nucleotides may be viewed by clicking on the above diagram

4. The Secondary Structure of DNA

In the early 1950's the primary structure of DNA was well established, but a firm understanding of its secondary structure was lacking. Indeed, the situation was similar to that occupied by the proteins a decade earlier, before the alpha helix and pleated sheet structures were proposed by Linus Pauling. Many researchers grappled with this problem, and it was generally conceded that the molar equivalences of base pairs (A & T and C & G) discovered by Chargaff would be an important factor. Rosalind Franklin, working at King's College, London, obtained X-ray diffraction evidence that suggested a long helical structure of uniform thickness. Francis Crick and James Watson, at Cambridge University, considered hydrogen bonded base pairing interactions, and arrived at a double stranded helical model that satisfied most of the known facts, and has been confirmed by subsequent findings.

Base Pairing
Careful examination of the purine and pyrimidine base components of the nucleotides reveals that three of them could exist as hydroxy pyrimidine or purine tautomers, having an aromatic heterocyclic ring. Despite the added stabilization of an aromatic ring, these compounds prefer to adopt amide-like structures. These options are shown in the following diagram, with the more stable tautomer drawn in blue.

A simple model for this tautomerism is provided by 2-hydroxypyridine. As shown on the left below, a compound having this structure might be expected to have phenol-like characteristics, such as an acidic hydroxyl group. However, the boiling point of the actual substance is 100º C greater than phenol and its acidity is 100 times less than expected (pKa = 11.7). These differences agree with the 2-pyridone tautomer, the stable form of the zwitterionic internal salt. Further evidence supporting this assignment will be displayed by clicking on the diagram .
Note that this tautomerism reverses the hydrogen bonding behavior of the nitrogen and oxygen functions (the N-H group of the pyridone becomes a hydrogen bond donor and the carbonyl oxygen an acceptor).

The additional evidence for the pyridone tautomer, that appears above by clicking on the diagram, consists of infrared and carbon nmr absorptions associated with and characteristic of the amide group. The data for 2-pyridone is given on the left. Similar data for the N-methyl derivative, which cannot tautomerize to a pyridine derivative, is presented on the right.

Once they had identified the favored base tautomers in the nucleosides, Watson and Crick were able to propose a complementary pairing, via hydrogen bonding, of guanosine (G) with cytidine (C) and adenosine (A) with thymidine (T). This pairing, which is shown in the following diagram, explained Chargaff's findings beautifully, and led them to suggest a double helix structure for DNA.
Before viewing this double helix structure itself, it is instructive to examine the base pairing interactions in greater detail. The G#C association involves three hydrogen bonds (colored pink), and is therefore stronger than the two-hydrogen bond association of A#T. These base pairings might appear to be arbitrary, but other possibilities suffer destabilizing steric or electronic interactions. By clicking on the diagram two such alternative couplings will be shown. The C#T pairing on the left suffers from carbonyl dipole repulsion, as well as steric crowding of the oxygens. The G#A pairing on the right is also destabilized by steric crowding (circled hydrogens).

A simple mnemonic device for remembering which bases are paired comes from the line construction of the capital letters used to identify the bases. A and T are made up of intersecting straight lines. In contrast, C and G are largely composed of curved lines. The RNA base uracil corresponds to thymine, since U follows T in the alphabet.

The Double Helix

After many trials and modifications, Watson and Crick conceived an ingenious double helix model for the secondary structure of DNA. Two strands of DNA were aligned anti-parallel to each other, i.e. with opposite 3' and 5' ends , as shown in part a of the following diagram. Complementary primary nucleotide structures for each strand allowed intra-strand hydrogen bonding between each pair of bases. These complementary strands are colored red and green in the diagram. Coiling these coupled strands then leads to a double helix structure, shown as cross-linked ribbons in part b of the diagram. The double helix is further stabilized by hydrophobic attractions and pi-stacking of the bases. A space-filling molecular model of a short segment is displayed in part c on the right.
The helix shown here has ten base pairs per turn, and rises 3.4 Å in each turn. This right-handed helix is the favored conformation in aqueous systems, and has been termed the B-helix. As the DNA strands wind around each other, they leave gaps between each set of phosphate backbones. Two alternating grooves result, a wide and deep major groove (ca. 22Å wide), and a shallow and narrow minor groove (ca. 12Å wide). Other molecules, including polypeptides, may insert into these grooves, and in so doing perturb the chemistry of DNA. Other helical structures of DNA have also been observed, and are designated by letters (e.g. A and Z).

The Double Helix Structure for DNA

Space-Filling Molecular Model

A model of a short DNA segment may be examined by

Excellent sites, incorporating Chime and Jmol models for visualizing DNA, has been created by:
Eric Martz, Univ. Mass. Amherst. Click Here.
Frieda Reichsman, Univ. Mass. Amherst. Click Here

1. DNA Replication

In their 1953 announcement of a double helix structure for DNA, Watson and Crick stated, "It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material." . The essence of this suggestion is that, if separated, each strand of the molecule might act as a template on which a new complementary strand might be assembled, leading finally to two identical DNA molecules. Indeed, replication does take place in this fashion when cells divide, but the events leading up to the actual synthesis of complementary DNA strands are sufficiently complex that they will not be described in any detail.

As depicted in the following drawing, the DNA of a cell is tightly packed into chromosomes. First, the DNA is wrapped around small proteins called histones (colored pink below). These bead-like structures are then further organized and folded into chromatin aggregates that make up the chromosomes. An overall packing efficiency of 7,000 or more is thus achieved. Clearly a sequence of unfolding events must take place before the information encoded in the DNA can be used or replicated.

Once the double stranded DNA is exposed, a group of enzymes act to accomplish its replication. These are described briefly here:

Topoisomerase: This enzyme initiates unwinding of the double helix by cutting one of the strands.
Helicase: This enzyme assists the unwinding. Note that many hydrogen bonds must be broken if the strands are to be separated..
SSB: A single-strand binding-protein stabilizes the separated strands, and prevents them from recombining, so that the polymerization chemistry can function on the individual strands.
DNA Polymerase: This family of enzymes link together nucleotide triphosphate monomers as they hydrogen bond to complementary bases. These enzymes also check for errors (roughly ten per billion), and make corrections.
Ligase: Small unattached DNA segments on a strand are united by this enzyme.

Polymerization of nucleotides takes place by the phosphorylation reaction described by the following equation.

Di- and triphosphate esters have anhydride-like structures and are consequently reactive phosphorylating reagents, just as carboxylic anhydrides are acylating reagents. Since the pyrophosphate anion is a better leaving group than phosphate, triphosphates are more powerful phosphorylating agents than are diphosphates. Formulas for the corresponding 5'-derivatives of adenosine will be displayed by Clicking Here, and similar derivatives exist for the other three common nucleosides. The DNA polymerization process that builds the complementary strands in replication, could in principle take place in two ways. Referring to the general equation above, R 1 could represent the next nucleotide unit to be attached to the growing DNA strand, with R 2 being this strand. Alternatively, these assignments could be reversed. In practice, the former proves to be the best arrangement. Since triphosphates are very reactive, the lifetime of such derivatives in an aqueous environment is relatively short. However, such derivatives of the individual nucleosides are repeatedly synthesized by the cell for a variety of purposes, providing a steady supply of these reagents. In contrast, the growing DNA segment must maintain its functionality over the entire replication process, and can not afford to be changed by a spontaneous hydrolysis event. As a result, these chemical properties are best accommodated by a polymerization process that proceeds at the 3'-end of the growing strand by 5'-phosphorylation involving a nucleotide triphosphate. This process is illustrated by the following animation, which may be activated by clicking on the diagram or reloading the page .

The polymerization mechanism described here is constant. It always extends the developing DNA segment toward the 3'-end (i.e. when a nucleotide triphosphate attaches to the free 3'-hydroxyl group of the strand, a new 3'-hydroxyl is generated). There is sometimes confusion on this point, because the original DNA strand that serves as a template is read from the 3'-end toward the 5'-end, and authors may not be completely clear as to which terminology is used.
Because of the directional demand of the polymerization, one of the DNA strands is easily replicated in a continuous fashion, whereas the other strand can only be replicated in short segmental pieces. This is illustrated in the following diagram. Separation of a portion of the double helix takes place at a site called the replication fork. As replication of the separate strands occurs, the replication fork moves away (to the left in the diagram), unwinding additional lengths of DNA. Since the fork in the diagram is moving toward the 5'-end of the red-colored strand, replication of this strand may take place in a continuous fashion (building the new green strand in a 5' to 3' direction). This continuously formed new strand is called the leading strand. In contrast, the replication fork moves toward the 3'-end of the original green strand, preventing continuous polymerization of a complementary new red strand. Short segments of complementary DNA, called Okazaki fragments, are produced, and these are linked together later by the enzyme ligase. This new DNA strand is called the lagging strand.

When you consider that a human cell has roughly 10 9 base pairs in its DNA, and may divide into identical daughter cells in 14 to 24 hours, the efficiency of DNA replication must be extraordinary. The procedure described above will replicate about 50 nucleotides per second, so there must be many thousand such replication sites in action during cell division. A given length of double stranded DNA may undergo strand unwinding at numerous sites in response to promoter actions. The unraveled "bubble" of single stranded DNA has two replication forks, so assembly of new complementary strands may proceed in two directions. The polymerizations associated with several such bubbles fuse together to achieve full replication of the entire DNA double helix. A cartoon illustrating these concerted replications will appear by clicking on the above diagram . Note that the events shown proceed from top to bottom in the diagram.

2. Repair of DNA Damage and Replication Errors

One of the benefits of the double stranded DNA structure is that it lends itself to repair, when structural damage or replication errors occur. Several kinds of chemical change may cause damage to DNA:

Spontaneous hydrolysis of a nucleoside removes the heterocyclic base component.
Spontaneous hydrolysis of cytosine changes it to a uracil.
Various toxic metabolites may oxidize or methylate heterocyclic base components.
Ultraviolet light may dimerize adjacent cytosine or thymine bases.

All these transformations disrupt base pairing at the site of the change, and this produces a structural deformation in the double helix.. Inspection-repair enzymes detect such deformations, and use the undamaged nucleotide at that site as a template for replacing the damaged unit. These repairs reduce errors in DNA structure from about one in ten million to one per trillion.

RNA and Protein Synthesis

The genetic information stored in DNA molecules is used as a blueprint for making proteins. Why proteins? Because these macromolecules have diverse primary, secondary and tertiary structures that equip them to carry out the numerous functions necessary to maintain a living organism. As noted in the protein chapter, these functions include:

Structural integrity (hair, horn, eye lenses etc.).
Molecular recognition and signaling (antibodies and hormones).
Catalysis of reactions (enzymes)..
Molecular transport (hemoglobin transports oxygen).
Movement (pumps and motors).

The critical importance of proteins in life processes is demonstrated by numerous genetic diseases, in which small modifications in primary structure produce debilitating and often disastrous consequences. Such genetic diseases include Tay-Sachs, phenylketonuria (PKU), sickel cell anemia, achondroplasia, and Parkinson disease. The unavoidable conclusion is that proteins are of central importance in living cells, and that proteins must therefore be continuously prepared with high structural fidelity by appropriate cellular chemistry.
Early geneticists identified genes as hereditary units that determined the appearance and / or function of an organism (i.e. its phenotype). We now define genes as sequences of DNA that occupy specific locations on a chromosome. The original proposal that each gene controlled the formation of a single enzyme has since been modified as: one gene = one polypeptide. The intriguing question of how the information encoded in DNA is converted to the actual construction of a specific polypeptide has been the subject of numerous studies, which have created the modern field of Molecular Biology.

1. The Central Dogma and Transcription

Francis Crick proposed that information flows from DNA to RNA in a process called transcription, and is then used to synthesize polypeptides by a process called translation. Transcription takes place in a manner similar to DNA replication. A characteristic sequence of nucleotides marks the beginning of a gene on the DNA strand, and this region binds to a promoter protein that initiates RNA synthesis. The double stranded structure unwinds at the promoter site., and one of the strands serves as a template for RNA formation, as depicted in the following diagram. The RNA molecule thus formed is single stranded, and serves to carry information from DNA to the protein synthesis machinery called ribosomes. These RNA molecules are therefore called messenger-RNA (mRNA).
To summarize: a gene is a stretch of DNA that contains a pattern for the amino acid sequence of a protein. In order to actually make this protein, the relevant DNA segment is first copied into messenger-RNA. The cell then synthesizes the protein, using the mRNA as a template.

An important distinction must be made here. One of the DNA strands in the double helix holds the genetic information used for protein synthesis. This is called the sense strand, or information strand (colored red above). The complementary strand that binds to the sense strand is called the anti-sense strand (colored green), and it serves as a template for generating a mRNA molecule that delivers a copy of the sense strand information to a ribosome. The promoter protein binds to a specific nucleotide sequence that identifies the sense strand, relative to the anti-sense strand. RNA synthesis is then initiated in the 3' direction, as nucleotide triphosphates bind to complementary bases on the template strand, and are joined by phosphate diester linkages. An animation of this process for DNA replication was presented earlier. A characteristic "stop sequence" of nucleotides terminates the RNA synthesis. The messenger molecule (colored orange above) is released into the cytoplasm to find a ribosome, and the DNA then rewinds to its double helix structure.
In eucaryotic cells the initially transcribed m-RNA molecule is usually modified and shortened by an "editing" process that removes irrelevant material. The DNA of such organisms is often thousands of times larger and more complex than that composing the single chromosome of a procaryotic bacterial cell. This difference is due in part to repetitive nucleotide sequences (ca. 25% in the human genome). Furthermore, over 95% of human DNA is found in intervening sequences that separate genes and parts of genes. The informational DNA segments that make up genes are called exons, and the noncoding segments are called introns. Before the mRNA molecule leaves the nucleus, the nonsense bases that make up the introns are cut out, and the informationally useful exons are joined together in a step known as RNA splicing. In this fashion shorter mRNA molecules carrying the blueprint for a specific protein are sent on their way to the ribosome factories.

The Central Dogma of molecular biology, which at first was formulated as a simple linear progression of information from DNA to RNA to Protein, is summarized in the following illustration. The replication process on the left consists of passing information from a parent DNA molecule to daughter molecules. The middle transcription process copies this information to a mRNA molecule. Finally, this information is used by the chemical machinery of the ribosome to make polypeptides.

As more has been learned about these relationships, the central dogma has been refined to the representation displayed on the right. The dark blue arrows show the general, well demonstrated, information transfers noted above. It is now known that an RNA-dependent DNA polymerase enzyme, known as a reverse transcriptase, is able to transcribe a single-stranded RNA sequence into double-stranded DNA (magenta arrow). Such enzymes are found in all cells and are an essential component of retroviruses (e.g. HIV), which require RNA replication of their genomes (green arrow). Direct translation of DNA information into protein synthesis (orange arrow) has not yet been observed in a living organism. Finally, proteins appear to be an informational dead end, and do not provide a structural blueprint for either RNA or DNA.
In the following section the last fundamental relationship, that of structural information translation from mRNA to protein, will be described

2. Translation

Translation is a more complex process than transcription. This would, of course, be expected. After all, the coded messages produced by the German Enigma machine could be copied easily, but required a considerable decoding effort before they could be read with understanding. In a similar sense, DNA replication is simply a complementary base pairing exercise, but the translation of the four letter (bases) alphabet code of RNA to the twenty letter (amino acids) alphabet of protein literature is far from trivial. Clearly, there could not be a direct one-to-one correlation of bases to amino acids, so the nucleotide letters must form short words or codons that define specific amino acids. Many questions pertaining to this genetic code were posed in the late 1950's:

• How many RNA nucleotide bases designate a specific amino acid?
If separate groups of nucleotides, called codons, serve this purpose, at least three are needed. There are 4 3 = 64 different nucleotide triplets, compared with 4 2 = 16 possible pairs.
• Are the codons linked separately or do they overlap?
Sequentially joined triplet codons will result in a nucleotide chain three times longer than the protein it describes. If overlapping codons are used then fewer total nucleotides would be required.
• If triplet segments of mRNA designate specific amino acids in the protein, how are the codons identified?
For the sequence

are the codons CUA & GGU or

• Are all the codon words the same size?
In Morse code the most widely used letters are shorter than less common letters. Perhaps nature employs a similar scheme.

Physicists and mathematicians, as well as chemists and microbiologists all contributed to unravelling the genetic code. Although earlier proposals assumed efficient relationships that correlated the nucleotide codons uniquely with the twenty fundamental amino acids, it is now apparent that there is considerable redundancy in the code as it now operates. Furthermore, the code consists exclusively of non-overlapping triplet codons.
Clever experiments provided some of the earliest breaks in deciphering the genetic code. Marshall Nirenberg found that RNA from many different organisms could initiate specific protein synthesis when combined with broken E.coli cells (the enzymes remain active). A synthetic polyuridine RNA induced synthesis of poly-phenylalanine, so the UUU codon designated phenylalanine. Likewise an alternating

RNA led to synthesis of a

The following table presents the present day interpretation of the genetic code. Note that this is the RNA alphabet, and an equivalent DNA codon table would have all the U nucleotides replaced by T. Methionine and tryptophan are uniquely represented by a single codon. At the other extreme, leucine is represented by eight codons. The average redundancy for the twenty amino acids is about three. Also, there are three stop codons that terminate polypeptide synthesis.

RNA Codons for Protein Synthesis

The translation process is fundamentally straightforward. The mRNA strand bearing the transcribed code for synthesis of a protein interacts with relatively small RNA molecules (about 70-nucleotides) to which individual amino acids have been attached by an ester bond at the 3'-end. These transfer RNA's (tRNA) have distinctive three-dimensional structures consisting of loops of single-stranded RNA connected by double stranded segments. This cloverleaf secondary structure is further wrapped into an "L-shaped" assembly, having the amino acid at the end of one arm, and a characteristic anti-codon region at the other end. The anti-codon consists of a nucleotide triplet that is the complement of the amino acid's codon(s). Models of two such tRNA molecules are shown to the right. When read from the top to the bottom, the anti-codons depicted here should complement a codon in the previous table.
Cloverleaf cartoons of three other tRNA molecules will be shown on the right by clicking on the diagram .

A cell's protein synthesis takes place in organelles called ribosomes. Ribosomes are complex structures made up of two distinct and separable subunits (one about twice the size of the other). Each subunit is composed of one or two RNA molecules (60-70%) associated with 20 to 40 small proteins (30-40%). The ribosome accepts a mRNA molecule, binding initially to a characteristic nucleotide sequence at the 5'-end (colored light blue in the following diagram). This unique binding assures that polypeptide synthesis starts at the right codon. A tRNA molecule with the appropriate anti-codon then attaches at the starting point and this is followed by a series of adjacent tRNA attachments, peptide bond formation and shifts of the ribosome along the mRNA chain to expose new codons to the ribosomal chemistry.
The following diagram is designed as a slide show illustrating these steps. The outcome is synthesis of a polypeptide chain corresponding to the mRNA blueprint. A "stop codon" at a designated position on the mRNA terminates the synthesis by introduction of a "Release Factor".

To visit an informative Tour of the Ribosome site, created by Wayne Decatur, Univ. Mass. Amherst Click Here.

3. Post-translational Modification

Once a peptide or protein has been synthesized and released from the ribosome it often undergoes further chemical transformation. This post-translational modification may involve the attachment of other moieties such as acyl groups, alkyl groups, phosphates, sulfates, lipids and carbohydrates. Functional changes such as dehydration, amidation, hydrolysis and oxidation (e.g. disulfide bond formation) are also common. In this manner the limited array of twenty amino acids designated by the codons may be expanded in a variety of ways to enable proper functioning of the resulting protein. Since these post-translational reactions are generally catalyzed by enzymes, it may be said: "Virtually every molecule in a cell is made by the ribosome or by enzymes made by the ribosome."
Modifications, like phosphorylation and citrullination, are part of common mechanisms for controlling the behavior of a protein. As shown on the left below, citrullination is the post-translational modification of the amino acid arginine into the amino acid citrulline. Arginine is positively charged at a neutral pH, whereas citrulline is uncharged, so this change increases the hydrophobicity of a protein. Phosphorylation of serine, threonine or tyrosine residues renders them more hydrophilic, but such changes are usually transient, serving to regulate the biological activity of the protein. Other important functional changes include iodination of tyrosine residues in the peptide thyroglobulin by action of the enzyme thyroperoxidase. The monoiodotyrosine and diiodotyrosine formed in this manner are then linked to form the thyroid hormones T3 and T4, shown on the right below.

Amino acids may be enzymatically removed from the amino end of the protein. Because the "start" codon on mRNA codes for the amino acid methionine, this amino acid is usually removed from the resulting protein during post-translational modification. Peptide chains may also be cut in the middle to form shorter strands. Thus, insulin is initially synthesized as a 105 residue preprotein. The 24-amino acid signal peptide is removed, yielding a proinsulin peptide. This folds and forms disulfide bonds between cysteines 7 and 67 and between 19 and 80. Such dimeric cysteines, joined by a disulfide bond, are named cystine. A protease then cleaves the peptide at arg31 and arg60, with loss of the 32-60 sequence (chain C). Removal of arg31 yields mature insulin, with the A and B chains held together by disulfide bonds and a third cystine moiety in chain A. The following cartoon illustrates this chain of events.

Nisin is a polypeptide (34 amino acids) made by the bacterium Lactococcus lactis. Nisin kills gram positive bacteria by binding to their membranes and targeting lipid II, an essential precursor of cell wall synthesis. Such antimicrobial peptides are a growing family of compounds which have received the name lantibiotics due to the presence of lanthionine, a nonproteinogenic amino acid with the chemical formula HO2C-CH(NH2)-CH2-S-CH2-CH(NH2)-CO2H. Lanthionine is composed of two alanine residues that are crosslinked on their β-carbon atoms by a thioether linkage (i.e. it is the monosulfide analog of the disulfide cystine). Lantibiotics are unique in that they are ribosomally synthesized as prepeptides, followed by post-translational processing of a number of amino acids (e.g. serine, threonine and cysteine) into dehydro residues and thioether crossbridges. Nisin is the only bacteriocin that is accepted as a food preservative. Several nisin subtypes that differ in amino acid composition and biological activity are known. A typical structure is drawn below, and a Jmol model will be presented by clicking on the diagram.

The bacterial cell wall is a cross-linked glycan polymer that surrounds bacterial cells, dictates their cell shape, and prevents them from breaking due to environmental changes in osmotic pressure. This wall consists mainly of peptidoglycan or murein, a three-dimensional polymer of sugars and amino acids located on the exterior of the cytoplasmic membrane. The monomer units are composed of two amino sugars, N-acetylglucosamine (NAG) and N-acetylmuramic acid (NAM), shown on the right. Transglycosidase enzymes join these units by glycoside bonds, and they are further interlinked to each other via peptide cross-links between the pentapeptide moieties that are attached to the NAM residues. Peptidoglycan subunits are assembled on the cytoplasmic side of the bacterial membrane from a polyisoprenoid anchor. Lipid II, a membrane-anchored cell-wall precursor that is essential for bacterial cell-wall biosynthesis, is one of the key components in the synthesis of peptidoglycan. Peptidoglycan synthesis via polymerization of Lipid II is illustrated in the following diagram. Cross-linking of the peptide side chains is then effected by transpeptidase enzymes. A model of Lipid II complexed with nisin may be examined as part of the previous Jmol display.

In order for bacteria to divide by binary fission and increase their size following division, links in the peptidoglycan must be broken, new peptidoglycan monomers must be inserted, and the peptide cross links must be resealed. Transglycosidase enzymes catalyze the formation of glycosidic bonds between the NAM and NAG of the peptidoglycan monomers and the NAG and NAM of the existing peptidoglycan. Finally, transpeptidase enzymes reform the peptide cross-links between the rows and layers of peptidoglycan making the wall strong. Many antibiotic drugs, including penicillin, target the chemistry of cell wall formation. The effectiveness of choosing Lipid II for an antibacterial strategy is highlighted by the fact that it is the target for at least four different classes of antibiotic, including the clinically important glycopeptide antibiotic vancomycin. The growing problem of bacterial resistance to many current drugs, including vancomycin, has led to increasing interest in the therapeutic potential of other classes of compound that target Lipid II. Lantibiotics such as nisin are part of this interest.

For a speculative discussion of why nature selected the components and functional groups found in the nucleic acids Click Here.

This page is the property of William Reusch. Comments, questions and errors should be sent to [email protected]
These pages are provided to the IOCD to assist in capacity building in chemical education. 05/05/2013

Beginning of supplementary topics

Analysis of Structural Similarities and Differences between DNA and RNA

1. Background

We know that living organisms have the ability to reproduce and to pass many of their characteristics on to their offspring. From this we may infer that all organisms have genetic substances and an associated chemistry that enable inheritance to occur. It is instructive to consider the essential requirements such genetic materials must fullfill.

Biologically useful information, especially instructions for protein synthesis, must be incorporated in the material.

The inherited information must be stable (unchanged) over the lifetime of the organism if accurate copies are to be conveyed to the offspring. Infrequent changes may take place (see mutability).

A method of faithfully replicating the information encoded in the material, and transmitting this copy to the offspring must exist.

Despite the inherent stability noted above, the material must be capable of incorporating stable structural change, and passing this change on to succeeding generations.

Since this genetic substance has been identified as the nucleic acids DNA and RNA, it is instructive to examine the manner in which these polymers satisfy the above requirements.

2. Information Storage

The complexity of life suggests that even simple organisms will require very large inheritance libraries. Although the four nucleotides that make up of DNA might appear to be too simple for this task, the enormous size of the polymer and the permutations of the monomers within the chain meet the challenge easily. After all, the words and graphics in this document are all presented to the computer as combinations of only two characters, zeros and ones (the binary number system). DNA has four letters in its alphabet (A, C, G & T), so the number of words that can be formed increase exponentially with the number of letters per word. Thus, there are 4 2 or 16 two letter words, and 4 3 or 64 three letter words.

Assuring the stability of information encoded by the DNA alphabet presents a serious challenge. If the letters of this alphabet are to be strung together in a specific way on the polymer chain, chemical reactions for attaching (and removing) them must be available. Simple carboxylic ester or amide links might appear suitable for this purpose (note step-growth polymerization), but these are used in lipids and polypeptides, so a separate enzymatic machinery would be needed to keep the information processing operations apart from other molecular transformations.
The overall stability of such covalent links presents a more serious problem. Under physiological conditions (aqueous, pH near 7.4 & 27 to 37º C) esters are slowly hydrolyzed. Amides are more stable, but even a hydrolytic cleavage of one bond per hour would be devastating to a polymer having tens of thousands to millions such links. Furthermore, short difunctional linking groups, such as carbonates, oxylates and malonates show enhanced reactivity, and their parent acids are unstable or toxic.

Ester Hydrolysis at 35º C and pH 7

Phosphate is an ubiquitous inorganic nutrient. Mono, di and triesters of the corresponding acid (phosphoric acid) are all known. Because of their acidity (pKa ≈ 2), the mono and diesters are negatively charged at physiological pH, rendering them less susceptible to nucleophilic attack. The influence of negative charge on the rate of nucleophilic hydrolysis of some representative esters is shown in the table on the right. Clearly, a polymer in which monomer units are joined by negatively charged diphosphate ester links should be substantially more stable than one composed of carboxylate ester bonds. The negative charge found on all biological phosphate derivatives serves other purposes as well.
The diphosphate ester links that join the nucleotides units of DNA are formed by phosphorylation reactions involving nucleotide triphosphate reagents. These reagents are the phosphoric acid analogs of carboxylic acid anhydrides, a functional group that would not survive the aqueous environment of a cell. The high density of negative charge on the triphosphate function not only solubilizes the organic moiety to which it is attached, but also reduces the rate at which it is hydrolyzed.
Living cells must conserve and employ their chemical reagents within a volume defined and enclosed by a membrane barrier. These lipid bilayer membranes have hydrophobic interiors, which resist the passage of ions. Indeed, special trans-membrane structures called ion channels exist so that controlled ion transport across a membrane may take place. Small neutral organic molecules, such as adenosine, cytidine and guanosine, may pass through lipid membranes, albeit at a reduced rate, but their mono, di and triphosphate derivatives are more tightly sequestered in the cell.

3. Why is 2'-Deoxyribose the Sugar Moiety in DNA?

Common perhydroxylated sugars, such as glucose and ribose, are formed in nature as products of the reductive condensation of carbon dioxide we call photosynthesis. The formation of deoxysugars requires additional biological reduction steps, so it is reasonable to speculate why DNA makes use of the less common 2'-deoxyribose, when ribose itself serves well for RNA. At least two problems associated with the extra hydroxyl group in ribose may be noted. First, the additional bulk and hydrogen bonding character of the 2'-OH interfere with a uniform double helix structure, preventing the efficient packing of such a molecule in the chromosome. Second, RNA undergoes spontaneous hydrolytic cleavage about one hundred times faster than DNA. This is believed due to intramolecular attack of the 2'-hydroxyl function on the neighboring phosphate diester, yielding a 2',3'-cyclic phosphate. If stability over the lifetime of an organism is an essential characteristic of a gene, then nature's selection of 2'-deoxyribose for DNA makes sense. The following diagram illustrates the intramolecular cleavage reaction in a strand of RNA.

Structural stability is not a serious challenge for RNA. The transcripted information carried by mRNA must be secure for only a few hours, as it is transported to a ribosome. Once in the ribosome it is surrounded by structural and enzymatic segments that immediately incorporate its codons for protein synthesis. The tRNA molecules that carry amino acids to the ribosome are similarly short lived, and are in fact continuously recycled by the cellular chemistry.

4. The Thymine vs. Uracil Issue

Structural formulas for the three pyrimidine bases, cytosine, thymine and uracil are shown on the right. The carbon atoms that are part of these compounds may be categorized as follows. All of these compounds are apparently put together from a three-carbon malonate-like precursor (blue colored bonds) and a single high oxidation state carbon species (colored red). Such biosynthetic intermediates are well established. Thymine is unique in having an additional carbon, the green methyl group. Biosynthesis of this compound must involve additional steps, thus adding constructional complexity to the DNA molecules in which it replaces uracil.
The reason for the substitution of thymine for uracil in DNA may be associated with the repair mechanisms by which the cell corrects damage to its DNA. One source of error in the code is the slow hydrolysis of heterocyclic enamines, such as cytosine and guanine, to their corresponding lactams. This changes the structure of the base, and disrupts base pairing in a manner that can be identified and then repaired. However, the hydrolysis product from cytosine is uracil, and this mismatched species must somehow be distinguished from the uracil-like base that belongs in the DNA. The extra methyl group serves this role nicely.

For a more complete discussion of some of the issues touched on here see an article titled "Why Nature Chose Phosphates",
authored by F. H .Westheimer, which appeared in the March 6th, 1987 issue of Science.


Nucleic acid is the overall name for both DNA (Deoxyribonucleic acid) and RNA (Ribonucleic acid). It is a biomolecule found in all living organism that stores genetic information that is transferred from parents to their offspring from one generation to another. Nucleic acid is made up of monomers called nucleotides.

Composition of nucleotides

Nucleotides are made up of three major components:

5-Carbon sugar

This includes the 5 – carbon (pentose) sugar such as deoxyribose (found in DNA) and ribose (found in RNA).

Phosphate functional group

The phosphate group connects the pentose sugar found in one nucleotide to the sugar in another nucleotide.

Nitrogenous bases

These are aromatic nitrogen containing molecular components of the nucleic acid that serves as proton acceptor. The nitrogenous bases in nucleic acid include:

Purine bases include adenine and guanine while the pyrimidine bases include cytosine, thymine (in DNA only) and uracil (in RNA only).

Differences between the DNA and RNA

Deoxyribose nucleic acid Ribose nucleic acid
Structurally contains deoxyribose sugar Structurally contains ribose sugar
It has double helix structure It does not have double helix structure
It is thermo-stable It is thermo-labile

All eukaryotes and most prokaryotes have DNA as their main genetic material. However, some retrovirus such as Human immunodeficiency virus and flavivirus such Zika virus have RNA as their genetic material while a viruses such as Hepatitis B virus, Herpes simplex virus and small pox virus have DNA as its genetic material.

If you like this post, please share and patronize us!

Hello there!,
I am Olatomide Samuel, a microbiologist and an entrepreneur, I currently manage ATG Ventures that deals inthe sale of Airtime, Data bundles, Cable TV Subscription, Electricity Bills and WAEC result checker.
Thanks for reading this post, do endeavor to patronize us too.

Watch the video: Nucleic Acids Form 4 (January 2022).