What is the reason for having an extra recognition site for a restriction enzyme?

Can the size of a supercoiled plasmid DNA be determined by using standard DNA size fragments electrophoresed in parallel?

2. An unknown DNA molecule was cleaved using several restriction enzymes individually and in various combinations. The DNA fragment sizes were determined by agarose gel electrophoresis and the restriction enzyme recognition sites were mapped. Subsequently, the DNA was sequenced and an extra recognition site was found for one of the enzymes. However, all the other mapping data were consistent, within experimental error, with the sequence data. What are the simplest explanations for this discrepancy? Assume the DNA sequence had no errors.

My answer:

The extra enzyme recognition site was cut off with the fragment due to a partial digest, which occurred because insufficient enzyme was added or the reaction was stopped after a very short time.

Am I wrong? My logic was that they only found the extra recognition site after they sequenced the DNA, so it was not available during the reaction: either it was cut off unevenly in one of the fragments, OR its complementary sequence was methylated, which hid it.

Detailed answers are appreciated!!

The scenario in your question can be explained if the "extra" recognition site mapped very close to another recognition site for the same enzyme.

For example, if there were 2 EcoRI sites separated by between 0-50 bp of plasmid DNA, then depending on the type of gel being used, the resulting tiny DNA fragment generated could be difficult to detect (either because small fragments run off the bottom of the gel, or because they don't resolve well unless the gel contains a higher percentage of agarose or acrylamide).

Similarly, depending on the sizes of the adjacent fragments in the various digests, and this assay's limited accuracy in estimating fragment sizes, you might not notice that an "extra" fragment was generated and went "missing".
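To make the argument concrete, here is a minimal sketch of the scenario. It assumes a linear 5000 bp molecule, hypothetical cut positions, and an arbitrary 100 bp detection limit for the gel; none of these numbers come from the question.

```python
def digest_linear(length, cut_positions):
    """Return fragment sizes (bp) after cutting a linear molecule at the given positions."""
    edges = [0] + sorted(set(cut_positions)) + [length]
    return [b - a for a, b in zip(edges, edges[1:])]

MIN_VISIBLE_BP = 100  # assumed detection limit of the gel (hypothetical)

# Map inferred from the gel: two sites for the enzyme.
mapped = digest_linear(5000, [1200, 3400])
# Sequence shows a third site only 30 bp from the first one.
actual = digest_linear(5000, [1200, 1230, 3400])
# Fragments large enough to be seen on the gel.
visible = [size for size in actual if size >= MIN_VISIBLE_BP]

print(mapped)   # [1200, 2200, 1600]
print(actual)   # [1200, 30, 2170, 1600]
print(visible)  # [1200, 2170, 1600]
```

The visible pattern differs from the mapped one only by 2170 bp versus 2200 bp, a difference that is easily within the sizing error of an agarose gel, so the extra site goes unnoticed.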

Chapter 20 - Recombinant DNA Technology

an enzyme produced chiefly by certain bacteria, having the property of cleaving DNA molecules at or near a specific sequence of bases/recognition sequences.

binds at recognition sequence or restriction site

then cuts both strands of DNA within that
sequence by cleaving the phosphodiester backbone of DNA

(commonly referred to as "digestion of DNA")

produced in bacteria as a defense mechanism against infection by bacteriophage; restrict/prevent viral infection by degrading the viral DNA

usefulness in cloning is derived in the innate ability to accurately and reproducibly cut genomic DNA into fragments

present randomly in the genome.

# of DNA restriction fragments produced by digesting DNA with a particular enzyme can be estimated from how often that enzyme cuts: on average once every 4^n bp (n = # of bases in the recognition sequence)
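A quick sketch of the 4^n estimate. It assumes a random sequence with equal frequencies of the four bases, and the 1,000,000 bp genome size is a hypothetical value chosen for illustration.

```python
def expected_cut_interval(site_length):
    """Average spacing (bp) between sites for an n-base recognition sequence."""
    return 4 ** site_length

def expected_fragments(genome_bp, site_length):
    """Rough number of fragments expected from a complete digest."""
    return genome_bp / expected_cut_interval(site_length)

print(expected_cut_interval(4))  # 256
print(expected_cut_interval(6))  # 4096
print(round(expected_fragments(1_000_000, 6)))  # 244
```

Real genomes are not random, so actual fragment counts can differ considerably from this estimate.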

stick together by hydrogen bonding between base pairs

capable of replicating in host cells - allow Independent replication of vector dna and any dna fragment it carries

distinguish host cells that have taken up vectors from those who have not --- selectable marker gene

Key Properties:
Can replicate cloned DNA fragments in host cell

Have several restriction enzyme sites to allow insertion of DNA fragment

Carry selectable gene marker to distinguish host cells that have taken them up from those that have not

exhibited by recognition sequences - exhibits form of symmetry
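The symmetry in question can be checked in a few lines: a palindromic site reads the same 5′ to 3′ on both strands, i.e. it equals its own reverse complement. The function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    return site.upper() == reverse_complement(site)

print(is_palindromic("GAATTC"))  # True  (EcoRI site)
print(is_palindromic("AAGCTT"))  # True  (HindIII site)
print(is_palindromic("GGATG"))   # False (an asymmetric sequence)
```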

produced by AluI and BalI when both strands are cut at the same nucleotide pair, creating fragments with double-stranded (blunt) ends

extrachromosomal, double-stranded DNA that replicates independently of the chromosomes within bacterial cells

plasmids are introduced into bacteria by this process

treating method 1: treatment with Ca ions and brief heat shock to pulse dna into cells

another technique: electroporation

dna restriction fragments (from dna to be cloned) are added to linearized vector in pres. of ligase

sticky cohesive ends anneal, joining the dna to be cloned and the plasmid

dna ligase is used to create phosphodiester bonds to seal nicks in dna backbone, producing recombinant dna

the recomb. dna is then introduced to bacterial host by transformation.

if a DNA fragment is inserted anywhere in the multiple cloning site, the lacZ gene is disrupted and will not produce functional copies of b galactosidase. Transformed bacteria in this experiment are plated on agar plates that contain an antibiotic (ampicillin)

used to cleave the disaccharide lactose into its component monosaccharides glucose and galactose.

as a result bacteria cells carrying nonrecombinant plasmids (those that have self ligated and do not contain inserted DNA) have a functional lacZ gene and produce b galactosidase, which cleaves x gal in the medium and these cells turn blue.
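The screening logic in these notes can be summarized as a small decision function. This is a sketch that assumes the plasmid carries the ampicillin resistance marker and that the plates contain both ampicillin and X-gal; the function name is made up for illustration.

```python
def colony_phenotype(has_plasmid, has_insert):
    """Predict what a cell does on ampicillin + X-gal plates."""
    if not has_plasmid:
        # No resistance marker: the cell is killed by ampicillin.
        return "no growth"
    # An insert in the multiple cloning site disrupts lacZ,
    # so X-gal is not cleaved and the colony stays white.
    return "white" if has_insert else "blue"

print(colony_phenotype(False, False))  # no growth
print(colony_phenotype(True, False))   # blue
print(colony_phenotype(True, True))    # white
```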

inside the bacteria the vectors replicate and form many copies of infective phage each of which carries a dna insert

very large, but very low copy number (typically 1-2 copies / cell)

used to clone large fragments of DNA

has telomeres, ORI, and centromeres ---> these components are joined to selectable marker genes and to a cluster of restriction enzyme recognition sequences for insertion of foreign dna

a DNA vector, such as a plasmid, that carries a DNA sequence for the expression of an inserted gene into mRNA and protein in a host cell

DNA copies made from mRNA molecules isolated from cultured cells or a tissue sample; thus = the genes being expressed in the cells at the time the library was made

- lack non-coding regions (introns)

- only include genes that are expressed in the tissue from which mRNA was isolated

-also called expression libraries

rapid method of dna cloning

Copies a specific DNA sequence via in vitro reactions; can amplify target DNA sequences present in very small quantities

requires two primers
1. One complementary to 5′ end
2. Another complementary to 3′ end

primers anneal to denatured DNA

Complementary strands are synthesized by heat-stable DNA polymerase

one cycle doubles the number of dna molecules
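Doubling per cycle means exponential growth. A minimal sketch follows; the `efficiency` parameter is an added assumption (1.0 means perfect doubling, and real reactions fall somewhat below that).

```python
def copies_after(cycles, start=1, efficiency=1.0):
    """Copies of target DNA after a number of PCR cycles.

    efficiency=1.0 is ideal doubling; lower values model imperfect cycles.
    """
    return start * (1 + efficiency) ** cycles

print(copies_after(10))  # 1024.0
print(copies_after(20))  # 1048576.0
print(copies_after(30))  # 1073741824.0
```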

is a widely used technique in molecular biology to exponentially amplify a single copy or a few copies of a specific segment of DNA to generate thousands to millions of copies of a particular DNA sequence.

Add some dNTPs, short primer sequences, and heat-stable DNA polymerase

Primers hybridize and allow
DNApol to generate new strands

double stranded dna to be cloned is denatured into single strands. can come from genomic dna, mummified remains, fossils, forensic samples, etc.

heating to 92-95deg C for about 1 min

temp of reaction is lowered to 45-65 deg C --> causes primer binding (hybridization/annealing); primers flank the target DNA ---> primers are the initiation point for DNA polymerase to synthesize complementary bases

factors: primer length, base composition of primers (GC rich primers are thermally more stable than AT rich primers)
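One common rule of thumb that captures both factors is the Wallace rule for short primers, Tm ≈ 2(A+T) + 4(G+C) in deg C. The rule is not taken from these notes, it is only a rough approximation, and the primer sequences below are hypothetical.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) of a short primer by the Wallace rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

at_rich = "ATATATATATATATATAT"  # hypothetical 18-mer, 0% GC
gc_rich = "GCGCGCGCGCGCGCGCGC"  # hypothetical 18-mer, 100% GC

print(wallace_tm(at_rich))  # 36
print(wallace_tm(gc_rich))  # 72
```

Two primers of the same length differ by a factor of two here, which is why GC-rich primers tolerate a higher annealing temperature.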

primers can be designed to distinguish between target sequences that differ by only a single nucleotide - makes it possible to synth allele specific probes for genetic testing

Location and nature
of mutation can be
determined quickly

Allele-specific probes for genetic testing can be synthesized, making PCR important for diagnosing genetic disorders

key diagnostic methodology for detecting bacteria and viruses (hepatitis C, HIV) in humans and pathogenic bacteria such as E. coli and Staphylococcus aureus in contaminated food

studying samples of single cells, fossils, or a crime scene.

DNA contains thousands of
these sites (many palindromic)

powerful methodology for studying gene expression ---> mRNA production by cells or tissues.

1. RNA isolated from cells

2. Reverse transcriptase is used to generate ds-cDNA

3. PCR with primers for gene of interest

Cut it: restriction enzymes

minor contaminations of sample dna

Insert the fragment of the genome
Into a living organism and watch It grow

Once you have enough, remove
the organism, keep the DNA

the phosphate backbone of DNA is highly negatively charged; therefore, DNA will migrate in an electric field

method of separating serum proteins by electrical charge

The movement of suspended particles through a fluid or gel under the action of an electromotive force applied to electrodes in contact with the suspension.

-Agarose, a polysaccharide built from galactose and 3,6-anhydrogalactose units, when melted and cooled, forms a porous gel

-The phosphate backbone of DNA is highly negatively charged, therefore DNA will migrate in an electric field
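Fragment sizes are usually read off a gel by comparison with a size ladder run in parallel. The sketch below assumes the common approximation that migration distance is roughly linear in log10(size) for linear DNA over a limited range. The ladder values are hypothetical numbers for illustration, not measurements.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical ladder: (size in bp, migration distance in mm)
ladder = [(10000, 10.0), (3000, 22.0), (1000, 33.0), (500, 40.0)]

slope, intercept = fit_line(
    [distance for _, distance in ladder],
    [math.log10(size) for size, _ in ladder],
)

def estimate_size(distance_mm):
    """Estimate fragment size (bp) from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)

print(round(estimate_size(28.0)))  # size estimate for an unknown band at 28 mm
```

This calibration only holds for linear fragments. Supercoiled plasmid DNA migrates differently from linear DNA of the same length, so a linear ladder does not give its size directly.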

Provides information on the length of the cloned insert and the location of restriction enzyme cleavage sites within the clone.

Methodology: Hybridization between complementary nucleic acid DNA molecules

Carried out with isolated chromosomes on slide or in situ in tissue sections or entire organisms

Helpful when embryos are used for various studies in developmental genetics

limiting factor: getting the probe to its destination site (which cells will take it up) and being able to see the fluorescence

Deoxynucleotide with a hydrogen at 3′ instead of an OH

knockout mice have revolutionized research

to manipulate a specific allele, locus, or base sequence and learn its effect on the function of the gene of interest

Comparing sequences is fundamental to understanding why alleles vary in function.

Dideoxynucleotides halt DNA polymerization at each base, generating sequences of various lengths that encompass the entire original sequence. Terminated fragments are electrophoresed and the original sequence can be deduced.
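The deduction step can be sketched as follows. The example assumes a hypothetical 8-base newly synthesized strand, ignores the primer length, and models four separate ddNTP reactions (one lane per base).

```python
def termination_fragments(new_strand):
    """For each ddNTP lane, list the fragment lengths that end in that base."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read the sequence by ordering all fragments from shortest to longest."""
    calls = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in calls)

lanes = termination_fragments("ATGCGTAC")  # hypothetical sequence
print(lanes["A"])       # [1, 7]
print(read_gel(lanes))  # ATGCGTAC
```

Because every position is represented by exactly one fragment length, sorting by length recovers the order of bases in the synthesized strand.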

a method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication

the DNA to be sequenced is mixed with a primer that is complementary to the target DNA or vector, along with DNA polymerase and four deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, dTTP)

called chain-termination nucleotides - because they lack the 3′ hydroxyl oxygen required to form a phosphodiester bond with the next nucleotide

HindIII restriction enzyme




Nuclease, any enzyme that cleaves nucleic acids. Nucleases, which belong to the class of enzymes called hydrolases, are usually specific in action, ribonucleases acting only upon ribonucleic acids (RNA) and deoxyribonucleases acting only upon deoxyribonucleic acids (DNA). Some enzymes having a general action (such as phosphoesterases, which hydrolyze phosphoric acid esters) can be called nucleases because nucleic acids are susceptible to their action. Nucleases are found in both animals and plants.

Restriction enzymes are nucleases that split only those DNA molecules in which they recognize particular subunits. Some split the target DNA molecule at random sites (Type I), but others split the molecule only at the recognition site (Type II) or at a fixed distance from the recognition site (Type III). Type II and III restriction enzymes are powerful tools in the elucidation of the sequence of bases in DNA molecules. They play a fundamental role in the field of recombinant DNA technology, or genetic engineering.

A restriction enzyme from Hemophilus influenzae: II. Base sequence of the recognition site

Hemophilus influenzae strain Rd contains an enzyme, endonuclease R, which specifically degrades foreign DNA. With phage T7 DNA as substrate the endonuclease introduces a limited number (about 40) of double-strand breaks (5′-phosphoryl, 3′-hydroxyl). The limit product has an average length of about 1000 nucleotide pairs and contains no single-strand breaks. We have explored the nucleotide sequences at the 5′-ends of the limit product by labeling the 5′-phosphoryl groups (using polynucleotide kinase) and characterizing the labeled fragments released by various nucleases. Two classes of 5′-terminal sequences were obtained: pApApCpNp … (60%) and pGpApCpNp … (40%), where N indicates that the base in the 4th position is not unique. The dinucleoside monophosphates at the 3′-ends were isolated after micrococcal nuclease digestion of the limit product and identified as TpT (60%) and TpC (40%). We conclude that endonuclease R of H. influenzae recognizes the following specific nucleotide sequence:

5′ … pGpTpPy ¦ pPupApCp … 3′
3′ … pCpApPup ¦ PypTpGp … 5′

The implications of the twofold rotational symmetry of this sequence are discussed.
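As an illustration, the degenerate site GTPy¦PuAC can be located in a sequence with a regular expression. This is only a sketch: the test sequences are hypothetical, and the cut is reported as the number of bases to the left of the break on the top strand.

```python
import re

# GT, then a pyrimidine (C/T), then a purine (A/G), then AC.
# The lookahead lets overlapping candidates be examined too.
SITE = re.compile(r"(?=(GT[CT][AG]AC))")

def find_cuts(seq):
    """Return top-strand cut positions (bases to the left of each break)."""
    return [match.start() + 3 for match in SITE.finditer(seq.upper())]

print(find_cuts("AAGTCGACTTGTTAACGG"))  # [5, 13]
print(find_cuts("GTAAAC"))              # [] (third base is not a pyrimidine)
```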

A-Level Biology: Glossary

A-Level Biology is a whole lot easier when you know the definitions of commonly used words.

A good command of biological terminology can mean the difference between a poor grade and an excellent one! (Since of course you get marks awarded for using the correct technical language in the right context when answering exam questions).

So, make sure learning and revising these essential biological words and their definitions is an integral part of your A-Level Biology Revision.

Acetyl: Chemical group derived from acetic acid. Acetyl groups are important in metabolism and are added covalently to some proteins as a post-translational modification.

Acetyl CoA: Acetyl group linked to coenzyme A (CoA). Acetyl CoA is a small water-soluble molecule that carries acetyl groups in cells.

Acetylcholine: A neurotransmitter found in the brain and in the peripheral nervous system.

Acetylcholine receptor: An ion channel that opens in response to the binding of acetylcholine, resulting in the conversion of a chemical signal into an electrical one.

Acid hydrolase: Hydrolytic enzymes (e.g. proteases, nucleases, glycosidases) with optimal enzyme activity at approximately pH 5.0 (acidic pH). Acid hydrolase enzymes are found in lysosomes.

Acquired Immunological Tolerance: An unresponsiveness of the immune system to a given foreign antigen.

Acrosomal vesicle: The region at the head end of a sperm cell. The acrosome contains hydrolytic enzymes which digest the protective coating of the egg.

Actin: Actin is an abundant protein which forms actin filaments in all eukaryotic cells.

Actin filament: A helical protein filament. A major part of the cytoskeleton of all eukaryotic cells and part of the contractile proteins of skeletal muscle.

Actin-binding protein: A protein that associates with either actin monomers or actin filaments in cells and modifies their properties. Myosin is an example of an actin-binding protein.

Action potential (Nerve impulse): The rapid, short-lived, self-propagating electrical excitation in the plasma membrane of neurones and muscle cells. Action potentials are responsible for long distance signalling in the nervous system.

Activation energy: The 'extra' energy required in order to undergo a particular chemical reaction.

Active site: The active site is the part of an enzyme to which a substrate binds in order to carry out a catalysed reaction.

Active Transport: The energy driven movement of a molecule across a cell membrane.

Adaptive Immune Response: Response of the immune system to a specific antigen that generates immunological memory.

Adenosine Triphosphate: see ATP

ADP (Adenosine Diphosphate): A nucleotide that regenerates ATP when phosphorylated by an energy-generating process, e.g. oxidative phosphorylation.

Adrenaline (epinephrine): A hormone released by cells in the adrenal glands (and some neurones) in response to stress. Adrenaline brings about the "fight or flight" response, e.g. increased heart rate and blood sugar levels.

Aerobic: A process that requires, or occurs in the presence of, gaseous oxygen (O2).

Algae: Unicellular and multicellular eukaryotes which are photosynthetic organisms, e.g. Nitella and Volvox.

Alkane: A compound of carbon and hydrogen that has only single covalent bonds, e.g. ethane.

Alkene: A hydrocarbon with one or more carbon-carbon double bonds, e.g. ethylene.

Allele: An alternative form of a gene, e.g. fur colour gene brown or black allele. In a diploid cell each gene will have two alleles, each occupying the same position (locus) on homologous chromosomes.

Alpha helix (α helix): a helical folding pattern in the secondary structure of proteins where a linear sequence of amino acids folds/coils into a right-handed helix stabilised by hydrogen bonding.

Amino acid: Organic molecule containing both an amino group and a carboxyl group. Amino acids are the monomers of proteins.

Anaerobic Threshold: when tissues cannot obtain enough oxygen, the cells begin to respire anaerobically. Usually applies to exercise, e.g. "the runner increased her speed so that she crossed the anaerobic threshold, and began to accumulate lactate".

Anticodon : Three bases complementary to a codon. Found on tRNA, allowing it to attach to mRNA. The anticodon for GUG would be CAC.

Aorta : the main artery from the left ventricle. Has highest pressure of any blood vessel.

Aortic Body: a patch of chemoreceptors (cells sensitive to changes in CO2/ H+ levels in blood). Located on aortic arch just above heart. Sends sensory impulses to medulla. See also carotid body

Apoplast Pathway: route of water and ions from soil water to xylem through the plant cell walls and gaps in between cells. This route is non-living, and water/ions never cross a membrane so the plant has no control over the rate of flow. Contrast with Symplast pathway.

Arable farming: Growing crops

Arteriole: blood vessels with a particularly muscular wall. Can vasodilate (widen) or vasoconstrict (narrow) to control the blood flow to a particular area.

Artery: blood vessels whose walls have a lot of elastic fibres, so they have good recoil properties to withstand the surges of pressure created when the heart beats. Contrast with arteriole.

Attenuated: Having reduced virulence. Live (weakened), non-pathogenic organisms are used to activate adaptive immunity. The non-pathogenic organisms administered to the vaccinated individual stimulate lymphocytes in order to produce antibodies and activated T cells, without producing serious disease.

ATP: Adenosine Tri-Phosphate, the cell's immediate source of energy. In respiration, the energy in glucose (or lipid) is released and used to make ATP from ADP and P. One glucose molecule can yield enough energy to make up to 36 molecules of ATP, which can then be used to provide the energy for muscular contraction, active transport, protein synthesis etc.

Atrial Systole: Contraction of the atria.

Atrium: Upper heart chambers. Receive blood into heart from vena cava and pulmonary vein. Atria are basically loading chambers for the ventricles, so they don't need thick muscular walls.

Autosome: Normal chromosome - not a sex chromosome. Humans have 22 pairs.

AV Valve : Valves between the atria and ventricles of the heart, preventing backflow into atria. Old names bicuspid and tricuspid valves.

AVN: Atrio-ventricular Node. Vital part of the conducting pathway in the heart. Receives impulses from SAN, creates a delay to allow the ventricles to fill, and then transfers the impulses down the bundle of His.


Staphylococcus aureus displays a clonal population structure in which horizontal gene transfer between different lineages is extremely rare. This is due, in part, to the presence of a Type I DNA restriction–modification (RM) system given the generic name of Sau1, which maintains different patterns of methylation on specific target sequences on the genomes of different lineages. We have determined the target sequences recognized by the Sau1 Type I RM systems present in a wide range of the most prevalent S. aureus lineages and assigned the sequences recognized to particular target recognition domains within the RM enzymes. We used a range of biochemical assays on purified enzymes and single molecule real-time sequencing on genomic DNA to determine these target sequences and their patterns of methylation. Knowledge of the main target sequences for Sau1 will facilitate the synthesis of new vectors for transformation of the most prevalent lineages of this ‘untransformable’ bacterium.


Influence of viral genome type on RS avoidance

R-M systems generally only target double-stranded DNA molecules. However, we also analyzed compositional bias values for RNA and single-stranded DNA bacteriophages. In Table 3 we present only the data for palindromic Type II RS in prokaryotic (Experimental dataset and Control dataset 1) and eukaryotic (Control dataset 2) viruses with different types of genomes. The analogous tables for the other Types of R-M systems and for asymmetric Type II RS are presented in Additional file 6. These tables contain some values that could indicate significant site under-representation. However, they either share the trends of Table 3 (see below) or appear to be independent of the presence of R-M systems.

The fractions of RS with a reduced observed frequency (CB < 1) differ significantly among all three datasets (one experimental and two controls) in the case of dsDNA and ssDNA viruses, but this is not in the case for dsRNA and ssRNA viruses (Table 3, Fig. 1). This clearly indicates the existence of a selective pressure on RS in the genomes of DNA bacteriophages but not in the genomes of RNA phages. The significant difference between the fractions with CB < 1 and CB > 1 in Control dataset 1 cannot be completely explained by the inclusion of the Experimental dataset. This could mean the presence of a large number of bacteriophages that meet the corresponding R-M systems, but we have no data regarding these interactions. The slight deviation of the eukaryotic viral control fractions from 50% could be caused by some functional role of 4–6 bp palindromic sites unrelated to the activity of R-M systems.

Fig. 1. Percentages of sites with reduced numbers of occurrences in the viral genomes of different types. Blue circles are for the Experimental dataset, red squares are for Control dataset 1 (prokaryotic viral control), and gray diamonds are for Control dataset 2 (eukaryotic viral control)

The observed avoidance in case of ssDNA bacteriophages could reflect R-M system influence during the double-stranded stage of their lifecycle. It should be noted that the fractions of RS with CB < 1 reliably differ between dsDNA and ssDNA phages. Only dsDNA phages were selected for the subsequent analysis because of insufficient data for ssDNA phages in the majority of the studied cases.

REase type effect on RS avoidance

We separately analyzed Type I, Type IIG, orthodox Type II, Type IIM, Type III, and Type IV REases. For this purpose, we split our datasets by REase type. The substantially increased fractions of RS with reduced counts (CB < 1) were observed only for two REase types: orthodox Type II and Type IV (see Fig. 2). Note that, in case of Type IV, this is true only for CB < 1 but not for under-representation (CB < 0.8, see the next paragraph).

Fig. 2. Percentages of RS with reduced numbers of occurrences calculated for the different types of R-M systems. Designations are the same as in Fig. 1

We calculated histograms of CB values for the subsets of the experimental and control datasets for different R-M system types (Fig. 3). The histograms demonstrate that only the Type II subset is significantly enriched for under-represented sites. Namely, there are 27.2% under-represented sites of orthodox Type II R-M systems and fewer than 5% of such sites for other types of R-M systems. The fraction of eliminated (CB < 0.1) RS is 7.1% for the Type II R-M systems and less than 0.5% for the R-M systems of the other types.
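The thresholds used in this analysis (CB < 1 reduced, CB < 0.8 under-represented, CB < 0.1 eliminated) can be illustrated with a small sketch. Compositional bias is taken here as observed count divided by expected count. How the expected count was actually computed is not stated in this excerpt, so the simple base-frequency model below is purely an assumption, and it handles only unambiguous sites (no N, W, S, Y, or R codes).

```python
from collections import Counter

def count_site(genome, site):
    """Observed number of (possibly overlapping) occurrences of the site."""
    k = len(site)
    return sum(genome[i:i + k] == site for i in range(len(genome) - k + 1))

def expected_count(genome, site):
    """Expected count under an ASSUMED independent base-frequency model."""
    freqs = Counter(genome)
    prob = 1.0
    for base in site:
        prob *= freqs[base] / len(genome)
    return prob * (len(genome) - len(site) + 1)

def compositional_bias(genome, site):
    """CB = observed / expected (the expected model is an assumption)."""
    return count_site(genome, site) / expected_count(genome, site)

def classify(cb):
    """Label a CB value using the thresholds quoted in the text."""
    if cb < 0.1:
        return "eliminated"
    if cb < 0.8:
        return "under-represented"
    if cb < 1:
        return "reduced"
    return "not reduced"

toy_genome = "ACGT" * 25  # hypothetical 100-base sequence
print(classify(compositional_bias(toy_genome, "GATC")))  # eliminated
print(classify(compositional_bias(toy_genome, "ACGT")))  # not reduced
```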

Fig. 3. Histograms of compositional bias values for different types of R-M systems. Dotted blue lines correspond to the subsets of the Experimental dataset, solid red lines are for Control dataset 1, and gray solid lines are for Control dataset 2

Our experimental dataset includes a significant fraction of predicted RS which were not experimentally verified. Erroneous annotations of RS may result in decreased fractions of under-represented RS. However, the fractions of eliminated and under-represented sites for proved RS are similar to the fractions for predicted RS. There are 34.5 and 3.3% under-represented sites among proved RS of Type II and the other types, respectively. The analogous fractions of eliminated sites among proved RS are 7.1 and 0.3%, respectively.

In the cases of Type I and Type IIG RS, there is no significant difference between the distributions of CB values for the experimental and control datasets (Figs. 2 and 3). Thus, there is likely no selective pressure against Type I and IIG sites in the phage genomes, or at least such pressure is not strong and stable enough to induce RS avoidance.

For Type III RS, the difference between the experimental and control distributions is almost indistinguishable in Figs. 2 and 3. However, the fractions of Type III sites with CB < 1 and CB < 0.8 significantly differ between the experimental and each of the control datasets (p-value < 0.008 in all cases, Fisher’s exact test). This is due to a minor fraction (approximately 5%) of Type III sites that are under-represented in the phage genomes. Nevertheless, RS avoidance does not seem to be a considerable anti-restriction strategy in the case of Type III R-M systems. Note that the fractions of over-represented Type III sites also significantly differ in the experimental and the control datasets.

The histogram for the Type IIM subset of the Experimental dataset (Fig. 3) demonstrates that the fraction of sites with CB > 1.1 is slightly, but significantly, larger than the corresponding fractions of both control datasets (p-value < 1e-10 in each case, Fisher’s exact test). The list of Type IIM RS includes nine palindromic and five asymmetric sites. The percentage of (genome, asymmetric site) pairs in the Type IIM fraction of the Experimental dataset is only 6.0%, and thus the asymmetric sites cannot significantly affect the observed histogram. All Type IIM palindromic sites are also recognition sites of some orthodox Type II R-M systems. The fractions of genomes with CB < 1 vary significantly for different sites: 100% for GATC, 88.5% for GCNNGC, 66.3% for GCWGC, and only 22.5% for GCNGC (five other known Type IIM RS are not present in our Experimental dataset). The site GCNGC exhibits pairs that form a bulge at the CB ≈ 1.1 area of the histogram (Fig. 3). Such an overabundance of GCNGC sites in the phage genomes seems to be unrelated to R-M systems because the fractions with CB > 1 for GCNGC are also large in both controls (60.7% in Control dataset 1 and 93.6% in Control dataset 2).

The majority (91.4%) of Type IV sites have CB < 1, but for most of them CB is close to 1 (see Fig. 3). There are only two different Type IV RS that meet our criteria, SCNGS and YCGR. Both sites are rather small, GC-rich and have strong intersections with the common Type II sites CCNGG and CCGG, respectively. The control dataset of eukaryotic viruses, unlike the control dataset of bacteriophages, contains almost identical fractions of Type IV sites that are more frequent and less frequent than expected. This suggests that the observed difference might be related to R-M system activity.

Avoidance of RS in genomes of phages with different lifestyles

We compared histograms of CB values for orthodox Type II RS in genomes of temperate and non-temperate DNA bacteriophages (Fig. 4a). Fractions of sites with reduced (CB < 1) and significantly reduced (CB < 0.8) numbers of occurrences are similar for temperate and non-temperate phages. However, temperate phages have a notably lower fraction of RS that are eliminated from their genomes (CB < 0.1). Such differences may be because of the prophage period, during which an extreme bias of oligonucleotide composition does not give any selective advantage.

Fig. 4. Fractions of Type II sites with different CB values. Bar height corresponds to the fraction of sites with the reduced number (CB < 1), the colored portion is for sites with CB < 0.8, and the hatched portion indicates the fraction with CB < 0.1. “Control” stands for the subset of Control dataset 1 with Type II sites and dsDNA genomes. (a) Comparison of temperate and non-temperate dsDNA viruses. (b) Comparison of the coliphages with or without the hydroxymethylase (HM) gene

Phages with anti-restriction mechanisms

We estimated how other anti-restriction mechanisms could influence RS avoidance. We used coliphages of the Myoviridae family as a convenient example. Some phages of the family encode an anti-restriction enzyme (DNA hydroxymethylase), which modifies the genomic DNA and thus prevents DNA cleavage by REases. Other related phages do not encode this enzyme [29]. Figure 4b shows that five phages with such an anti-restriction mechanism avoid (CB < 0.8) only 2.4% of actual E. coli RS, while 55.8% of such sites are avoided in genomes of four coliphages of the same family that do not encode the DNA hydroxymethylase.


Type II restriction endonucleases recognize specific DNA sequences and cleave DNA at specified locations within or adjacent to their recognition sites, in reactions that normally require only Mg2+ as a cofactor ( 1 – 3 ). In many cases, the recognition sequence is a symmetrical palindrome of 4–8 consecutive base pairs, though some recognize discontinuous palindromes, interrupted by a segment of specified length but unspecified sequence ( 4 , 5 ). Many of the enzymes that act at palindromic sites are dimers that interact symmetrically with their targets, positioning one active site on each strand of the DNA: e.g. EcoRI, EcoRV and BamHI ( 6 ). They normally cut both strands within the lifetime of the enzyme–DNA complex ( 7 – 9 ). The phosphodiester bonds cleaved by these enzymes are usually located within the sequence: in some cases, near the 5′ ends, to leave duplexes with 5′ single-strand extensions; in others, at the middle of the site, to leave flush-ended duplexes; in further cases, near the 3′ ends, to leave 3′ single-strand extensions. However, a subset of the Type II systems, called Type IIS ( 5 ), recognize asymmetric sequences and cleave both strands at specific locations on one side of the recognition site. Another subset, the Type IIB enzymes, also cut the DNA at specified positions distant from their recognition sites, but on both sides of the sequence.
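The relationship between cut position and end type can be sketched for a palindromic site cut symmetrically on both strands. Here `cut_after` is the number of bases to the left of the break on the top strand; the function name and the position-5 example are illustrative assumptions rather than data for a particular enzyme.

```python
def end_type(site, cut_after):
    """Classify the ends left by a symmetric cut within a palindromic site.

    The bottom strand is cut at the mirror-image position, so the two
    breaks are len(site) - 2 * cut_after bases apart.
    """
    stagger = len(site) - 2 * cut_after
    if stagger > 0:
        return ("5' extension", stagger)
    if stagger < 0:
        return ("3' extension", -stagger)
    return ("flush", 0)

print(end_type("GAATTC", 1))  # EcoRI, G^AATTC   -> ("5' extension", 4)
print(end_type("GGATCC", 1))  # BamHI, G^GATCC   -> ("5' extension", 4)
print(end_type("GATATC", 3))  # EcoRV, GAT^ATC   -> ('flush', 0)
print(end_type("GGCGCC", 5))  # illustrative cut near the 3' end -> ("3' extension", 4)
```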

The myriad applications of these enzymes in molecular biology ( 10 ) have driven extensive searches for new nucleases with novel specificities. At present, more than 3500 Type II enzymes have been identified from many different species of bacteria, and these encompass about 240 distinct recognition sequences ( 4 ). Hence, in many instances, two or more enzymes from different species have a common recognition site. In these instances, the first enzyme found to recognize a particular sequence is known as the prototype, while others that cleave the same sequence are called isoschizomers. Some isoschizomers cleave at different positions from the prototype and these are called neoschizomers ( 5 ). With the exceptions of certain isoschizomers, the Type II endonucleases show little similarity in amino acid sequence ( 11 ), apart from some residues that bind the Mg 2+ ions needed for catalysis ( 12 – 14 ). Nevertheless, restriction enzymes with dissimilar amino acid sequences often have similar core structures ( 3 , 6 ). Isoschizomers can, however, be virtually identical to each other, with >50% amino acid identity ( 15 , 16 ), but some isoschizomers lack any similarity in primary sequence ( 17 ). Neoschizomers are expected to deviate from the prototype, since they must position their active sites against the DNA differently ( 12 ), but this re-positioning need not involve radically different architectures. For example, EcoRV and BglI cleave DNA to leave flush-ended or 3′-extended fragments, respectively, yet the individual subunits in these two enzymes are very similar: they differ only at the subunit interface ( 18 ) and operate by similar mechanisms ( 19 ).

The restriction enzymes often taken to represent the Type II systems, such as EcoRV and BamHI, act at individual copies of their recognition sites (2, 3). But many restriction enzymes become active only after interacting with two copies of the target sequence (2, 20, 21). The Type II enzymes that need two sites include some that cleave at a palindromic sequence (22–30), others (the Type IIS enzymes) that act adjacent to asymmetric sites (31–34), and still others (the Type IIB enzymes) that cut on both sides of their targets (35). A number of the restriction enzymes that interact with two sites, the Type IIE systems, use one copy of the sequence to activate the enzyme to cleave a second copy: these enzymes are usually dimers but with two functionally distinct DNA-binding clefts, one with the catalytic functions for DNA cleavage and one for the activator (22, 25). After interacting with two sites, a Type IIE enzyme cuts just one of the sites (31, 36). For another group of enzymes that interact with two sites, the Type IIF systems, the interaction leads to the concerted cleavage of the DNA at both sites (20, 21). The Type IIF enzymes usually act as tetramers (37), with two identical DNA-binding clefts, each made of two subunits, but they seem to have virtually no activity until both clefts are filled with cognate DNA (38).

The Type I and the Type III restriction endonucleases need to interact with two copies of their recognition site for most, or all, of their DNA cleavage reactions (21, 39), since these often require the collision of two enzyme molecules bound to separate sites after they have pushed the intervening DNA into expanding loops (39). The same applies to most enzymes that restrict methylated DNA (39). Amongst the Type II enzymes, the majority of the Type IIS enzymes (31), and almost all of the Type IIB enzymes (J. J. T. Marshall, D. M. Gowers and S. E. Halford, unpublished data), display full activity only after binding two copies of their recognition site. But it has yet to be established what fraction of the Type II enzymes with palindromic sites need two sites. To address this question, we examined seven Type II endonucleases that all cleave DNA at one palindromic sequence, 5′-GGCGCC-3′. The reason for selecting these enzymes is that this particular sequence is cleaved at a greater variety of positions than any other hexanucleotide site for the Type II enzymes (4): the cleavage points in this sequence include four of the five internal phosphodiester bonds. The seven enzymes were tested against DNA substrates with one or two copies of this sequence, to determine whether they need to bind to two sites and, if so, how many phosphodiester bonds are cleaved per turnover.
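To make the "five internal phosphodiester bonds" and the "one or two copies" of the site concrete, here is a minimal Python sketch. It lists the five possible internal cut positions in GGCGCC in slash notation and counts copies of the site in a test sequence. The function names and the example sequence are invented for illustration and are not the substrates used in the study.

```python
# Minimal sketch: internal bonds of GGCGCC and site counting in a substrate.
# The example sequence below is made up purely for illustration.

SITE = "GGCGCC"


def internal_bonds(site: str = SITE) -> list[str]:
    """List every internal phosphodiester bond of the site in slash notation."""
    return [site[:i] + "/" + site[i:] for i in range(1, len(site))]


def find_sites(seq: str, site: str = SITE) -> list[int]:
    """Return 0-based start positions of every copy of the site.

    Because GGCGCC is palindromic, scanning one strand finds every copy;
    the bottom strand reads the same 5'->3' at the same locations.
    """
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


if __name__ == "__main__":
    print(internal_bonds())
    example = "ATGGCGCCTTAAGGCGCCA"  # hypothetical two-site substrate
    sites = find_sites(example)
    print(f"{len(sites)} site(s) at positions {sites}")
```

For the hypothetical sequence shown, the search reports two copies of the site, at positions 2 and 12. Which four of the five bonds are actually used, and by which enzymes, is not stated in this passage, so the sketch does not assign bonds to enzymes.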

Understanding the immutability of restriction enzymes: crystal structure of BglII and its DNA substrate at 1.5 Å resolution

Restriction endonucleases are remarkably resilient to alterations in their DNA binding specificity. To understand the basis of this immutability, we have determined the crystal structure of endonuclease BglII bound to its recognition sequence (AGATCT), at 1.5 Å resolution. We compare the structure of BglII to endonuclease BamHI, which recognizes a closely related DNA site (GGATCC). We show that both enzymes share a similar α/β core, but in BglII, the core is augmented by a β-sandwich domain that encircles the DNA to provide extra specificity. Remarkably, the DNA is contorted differently in the two structures, leading to different protein–DNA contacts for even the common base pairs. Furthermore, the BglII active site contains a glutamine in place of the glutamate at the general base position in BamHI, and only a single metal is found coordinated to the putative nucleophilic water and the phosphate oxygens. This surprising diversity in structures shows that different strategies can be successful in achieving site-specific recognition and catalysis in restriction endonucleases.