Is it possible to determine what tissue a DNA sample came from based solely on its sequence?

All cells have the same genome and differ only by the expression pattern. Is it possible to determine the tissue origin of a cell based on its DNA sequence using short tandem repeat (STR) analysis or copy number variation (CNV) analysis?

The answer is in the question but I suppose you wanted to ensure you were not missing some info.


The DNA sequence is the same in all cells of a multicellular organism. Only the expression pattern varies among tissues. It is therefore impossible to tell from the DNA sequence alone what tissue a cell comes from.

It would be feasible by looking at the transcriptome or the proteome, or possibly by looking at epigenetic modifications.

Potential exception

Of course, as you talk about CNV, if you consider that the number of chromosomes is part of the info present in a DNA sequence, then haploid cells (spermatozoa and ova) are an exception and you could tell them apart. Similarly, anucleated cells would be an exception, as they contain no nuclear DNA.

Further potential exceptions are explained by @tsttst in his comment.

There was a paper in Science, August of this year (apologies to the author for not remembering their name). They worked up a protocol for single cell bisulfite sequencing and through the use of known marker genes managed to identify multiple neuronal sub-types based on non-CpG methylation across gene bodies.

Non-CpG methylation across gene bodies is often seen to be inversely correlated with gene expression. Through this you can infer an expression profile of known marker genes, giving you a solid poke at a cell type with only a DNA sequence.

(Edit: Assuming this holds true in tissues outside of the cortex)

I would rather start with the assumption that no two cells in the human body have the same genome. Mutations occur frequently; most of them are repaired, but some remain unrepaired. The human body is a genetic mosaic.

Abyzov et al. (Genome Research, 2017):

We estimate that on average a fibroblast cell in children has 1035 mostly benign mosaic SNVs. (… ) These findings reveal a large degree of somatic mosaicism in healthy human tissues, link de novo and cancer mutations to somatic mosaicism, and couple somatic mosaicism with cell proliferation.



DNA sequencing is very big business. Approximately US$3 billion was spent in 2003 on sequencing reagents and enzymes, and on the analyzer equipment and software for automated sequence acquisition. The majority of this sequence output was determined using capillary electrophoresis (CE) technology, which has commensurately developed rapidly over the past 10 years. CE offers high resolution and high throughput, automatic operation, and data acquisition, with online detection of dyes bound to DNA extension products. Operational advances such as pulsed-field and graduated electric fields and automated thermal ramping programs as the run progresses result in higher base resolution and longer sequence reads. Advanced base-calling algorithms and DNA marker additives that utilize known fragment sizing landmarks can also help to improve fragment base-calling, increasing call accuracy and read lengths by 20–30%. Despite the high efficiency of CE sequencers, the complete delineation of the human genome and its implication for genome-wide analysis for personalized medicine is driving the development of devices and chemistries capable of massively increased sequence throughput, compared to the conventional CE sequencers. Miniaturization of CE onto chip-based devices provides all of the above facilities – a significant improvement in the speed and improved automation of analysis. New array-based sequencing devices also promise a quantum increase in efficiency. Each of these new devices provides an extremely high throughput, high-quality-data, and low-process costs. This article also examines the automation and improvement of sequencing processes, DNA amplification processes, and alternative approaches to sequencing.

The risk of contamination of any crime scene can be reduced by limiting incidental activity. It is important for all law enforcement personnel at the crime scene to make a conscious effort to refrain from smoking, eating, drinking, littering or any other actions which could compromise the crime scene. Because DNA evidence is more sensitive than other types of evidence, law enforcement personnel should be especially aware of their actions at the scene to prevent inadvertent contamination of evidence.

The chain of custody of evidence is a record of individuals who have had physical possession of the evidence. Documentation is critical to maintaining the integrity of the chain of custody. Maintaining the chain of custody is vital for any type of evidence. In addition, if laboratory analysis reveals that DNA evidence was contaminated, it may be necessary to identify persons who have handled that evidence.

In processing the evidence, the fewer people handling the evidence, the better. There is less chance of contamination and a shorter chain of custody for court admissibility hearings.

Materials and methods


All moth specimens belong to the species Euxoa messoria. They were collected over a 45-year period (Table 1) and were preserved pinned with no additional preservative. Specimens of three different frog species (Table 2) were collected as part of ongoing research unrelated to this study and preserved using standard methods (e.g., Lit. [25]). Frogs were killed using an aqueous solution of chloretone and, for adult frogs, a sample of liver tissue was preserved in 95% ethanol. Adult specimens were then fixed in 3.7% neutral-buffered formaldehyde overnight and then transferred to 70% ethanol for long-term storage. In large specimens (e.g., Astylosternus), a small volume (1 milliliter) was injected into the body cavity during fixation. For tadpoles, a small piece of tail (mostly muscle) was excised and stored in 95% ethanol following common practice; the remaining specimen was fixed and stored in 3.7% formaldehyde. Animal care procedures are approved by the Harvard University/Faculty of Arts and Sciences Standing Committee on the use of Animals in Research and Teaching. An Animal Welfare Assurance statement is on file with the university's Office for Laboratory Welfare (OLAW).

After returning from the field, tissue samples in 95% ethanol were stored at -80°C. For this study, another piece of the same tissue (i.e., liver or tail) was excised from the whole preserved specimens; these tissue samples were transferred to 95% ethanol. To qualitatively evaluate the effect of storage time and reduce the effect of species or developmental stage on our results, we analyzed tissues from adults collected over two different years, as well as tadpoles of the same species. Those samples that were stored or fixed using formaldehyde will be referred to as exposed to formaldehyde.

DNA extraction

Small aliquots of frog tissue (1–3 mg) were obtained from the preserved specimens in March 2007. The tissue was lysed and DNA was purified using the DNeasy kit (Qiagen) following the manufacturer's protocol. Extracted DNA was stored in TE buffer at 4°C.

A leg from each moth specimen was used for DNA extraction, using the NucleoSpin96 kit (Macherey-Nagel). Elution was performed with 40 μl water. The eluate was stored at -20°C.

Fragment analysis by capillary electrophoresis

An aliquot of 1–5 μl of extracted DNA was labeled with Fluorescein-12-ddATP (PerkinElmer, Boston, MA) using Terminal Transferase (NEB, Ipswich, MA) according to the accompanying protocol, resulting in a 10 μL reaction volume. The reaction was incubated at 37°C for 1 h, then applied to a Centri-Sep column (Princeton Separations) [26].

For the removal of terminal phosphates on the DNA fragments, aliquots of 3 μl DNA were treated with 5U Antarctic Phosphatase (NEB) in a total reaction volume of 10 μl. The reaction was incubated at 37°C for 1 h, followed by inactivation of the phosphatase at 65°C for 5 min. This was followed by labeling with TdT as described above.

An aliquot of 1–2 μl of the eluate was mixed with 9 μl Hi-Di (Applied Biosystems) and 0.5 μl GENESCAN LIZ1200 size standard (Applied Biosystems). Samples were analyzed on a 3130xl Genetic Analyzer (Applied Biosystems), using a 36 cm array, POP7 polymer, an injection time of 10 s and a total run time of 6200 s. An example of the raw data used for fragment size determination is shown in Figure 1c for moth sample 3.

Panel A shows the size distribution of DNA extracted from moth samples. See methods for details of size determination. Panel B shows the raw data of the FAM-labeled DNA fragments, averaged for each year. Data were scaled to the same height for comparison. Note the decrease in peak width with sample age. Panel C shows the raw data obtained for moth sample 3 from a Capillary Electrophoresis run. Labeling the DNA without any prior treatment results in the fragment distribution shown here in red. An aliquot of the same sample was treated with Antarctic Phosphatase before the TdT labeling reaction, shown in blue. The size distribution of the fragments does not change, while the intensity is increased by a factor of 2–15 for different samples. The LIZ1200 size standard is shown in orange, numbers indicate the fragment size in bases.

Raw data were imported into Origin7.5 (Microcal) for detailed analysis. For the determination of the most abundant fragment size of a sample, the data curve for the FAM fluorescence was subjected to smoothing, using the adjacent average method over 500 points. The smoothed curve was fitted to a peak function, equation 1, to determine the position of the maximum (in scan numbers).

w: width, xc: center, y0: offset, A: amplitude
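Equation 1 is not reproduced above. As a minimal sketch of the smoothing and peak-location steps, the following assumes a Gaussian peak written in an area-normalised form that uses the same four parameters (y0, xc, w, A); the peak function actually used may differ. The synthetic trace and all numeric values are hypothetical.

```python
import math

def adjacent_average(values, window):
    """Replace each point by the mean of the points within window // 2 scans on either side."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def gauss_peak(x, y0, xc, w, A):
    """Assumed peak shape: Gaussian with offset y0, center xc, width w and area A."""
    return y0 + A / (w * math.sqrt(math.pi / 2)) * math.exp(-2 * ((x - xc) / w) ** 2)

def fwhm_from_w(w):
    """Full width at half height of the Gaussian above (w is twice the standard deviation)."""
    return w * math.sqrt(2 * math.log(2))

# Hypothetical trace: a single peak centered at scan 3000 on a flat offset.
trace = [gauss_peak(x, 10.0, 3000, 800, 5e5) for x in range(6000)]
smooth = adjacent_average(trace, 500)

# Position of the maximum of the smoothed curve, in scan numbers.
peak_scan = max(range(len(smooth)), key=smooth.__getitem__)
```

In a real analysis the smoothed curve would be fitted by least squares rather than read off at its maximum; the sketch only shows how the parameters relate to peak position and peak width.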

To convert this into base pairs, the elution times of the size standard fragments (in scan numbers) were plotted against the known size of each fragment of the LIZ1200 standard and fitted to a sigmoidal growth curve, equation 2.

A1: initial value, A2: final value, x0: center, dx: time constant
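Equation 2 is likewise not reproduced above. The four parameters named match the Boltzmann form of a sigmoidal curve, so the sketch below assumes that form; this is an assumption, not a statement of the function actually fitted. It also assumes that x is the scan number and y the fragment size in bases, and the parameter values are hypothetical.

```python
import math

def boltzmann(x, a1, a2, x0, dx):
    """Sigmoid that runs from a1 (x far below x0) to a2 (x far above x0)."""
    return a2 + (a1 - a2) / (1.0 + math.exp((x - x0) / dx))

def boltzmann_inverse(y, a1, a2, x0, dx):
    """Solve boltzmann(x) = y for x; y must lie strictly between a1 and a2."""
    return x0 + dx * math.log((a1 - y) / (y - a2))

# Hypothetical calibration parameters, as a fit to the size standard might return.
params = dict(a1=-200.0, a2=1500.0, x0=3500.0, dx=1200.0)

# Convert a peak position (scan number) to a fragment size, and back again.
size_bases = boltzmann(3000.0, **params)
scan_again = boltzmann_inverse(size_bases, **params)
```

Having the analytic inverse means the calibration can be applied in either direction, whichever axis the size standard was fitted on.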

The fitting result for the size standard together with the peak of the FAM fluorescence were used to determine the most abundant size of DNA fragments in a given sample. For the distribution of fragment sizes, the peak width (full width at half height) was used, as determined from the fit of equation 1.

For quantitation of total DNA content, a baseline was fitted to the total FAM signal, and the signal was then integrated using this baseline. As a test for the linearity of detection in our CE, we used the ΦX174 DNA ladder (NEB) in a serial dilution. We found a linear correlation between data integral and sample concentration in the range of 2–20 ng/μl (R = 0.997, data not shown).
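The integration step can be sketched as follows. The text says only that a baseline was fitted; this sketch assumes the simplest case, a straight line joining the two ends of the integration window, and uses the trapezoid rule with unit spacing between scans. The function name and the example signal are hypothetical.

```python
def integrate_above_baseline(signal, left, right):
    """Integrate signal[left..right] above a straight baseline joining its two end points."""
    span = right - left
    slope = (signal[right] - signal[left]) / span
    area = 0.0
    for i in range(left, right):
        # Baseline height at scans i and i + 1 by linear interpolation.
        b0 = signal[left] + slope * (i - left)
        b1 = signal[left] + slope * (i + 1 - left)
        # Trapezoid rule on the baseline-corrected signal.
        area += 0.5 * ((signal[i] - b0) + (signal[i + 1] - b1))
    return area

# Hypothetical signal: a triangular peak sitting on a rising baseline.
signal = [0, 1, 4, 7, 6, 5, 6]
total = integrate_above_baseline(signal, 0, len(signal) - 1)
```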

DNA digestion

Extracted DNA was digested using a published method [27] with modifications. Aliquots of 1–10 μl of extracted DNA were incubated with 1 μl DNase I (2U/μl, NEB), 10 μl Snake Venom Phosphodiesterase (0.26 mU/μl, Sigma-Aldrich) and 2 μl Antarctic Phosphatase (5U/μl, NEB) at 37°C overnight. Using this procedure, unmodified DNA was completely digested to the mononucleoside level as judged by HPLC (data not shown).

HPLC separation

Digested DNA samples were analyzed on an Agilent 1100 HPLC system equipped with a Develosil RP-Aqueous C30 column (Nomura Chemical Co.). Solvent A was MilliQ water containing 1% (v/v) formic acid and solvent B was gradient grade methanol containing 0.25% (v/v) formic acid. The elution profile was 2–20% B over 30 min, increasing to 98% B over another 20 min, then 98% B for 10 min, and finally returning to 2% B over 20 min. The flow rate was set to 20 μl/min and the eluate was monitored at 254 nm. Typically, 4 μl of each sample was injected using the well-plate sampler.

Mass spectrometric analysis

For mass spectrometric analysis the HPLC system described above was connected directly to the sample inlet of an Agilent ESI-TOF mass spectrometer. Mass spectral data were recorded in positive ion mode over the entire duration of the HPLC run. Data were analyzed using Analyst QS (Agilent).

Pulsed field agarose gel electrophoresis

For the detection of large DNA fragments, aliquots of the frog DNA were loaded on a 1% agarose gel and separated over 15 h with a switch time from 1–12 s and a voltage of 6 V/cm. The marker was PFG marker N0350 (NEB).

A 500 bp piece of the Euxoa messoria barcode sequence was amplified using primers pJZ-moth1-se TTAGGTAATCCAGGATCTTTAATTG and pJZ-moth1-as ATGATAATAATAATAAAAATGCAGT. Amplification was performed with Taq DNA polymerase (NEB), with an initial denaturation at 95°C for 2 min, then 30 cycles of 95°C for 15 s, 55°C for 10 s, 72°C for 30 s, and a final extension at 72°C for 5 min.

Primer sequences for the PCR of frog mitochondrial 16S ribosomal RNA correspond to those of Darst and Cannatella [28]. The primers for the first exon of the nuclear gene for rhodopsin are ACGGAACAGAAGGTCCCAAC (5' primer) and AGCGAAGAAGCCTTCAAAGT (3' primer). PCR reactions were carried out with Phusion DNA polymerase (NEB), with initial denaturation at 98°C for 30 s, then 30 cycles of 98°C for 10 s, 60°C for 10 s, 72°C for 45 s, and a final extension at 72°C for 10 min.

Modeling of DNA nicking

An algorithm was written in C to simulate fragmentation of double-stranded DNA by repeated nicking events. The simulation required four input parameters: length of the simulated time period (t) in years, DNA size (L) in megabases, nick rate (n) in nicks per megabase per day, and the proximity of opposite-strand nicks (p), in bases, that results in a double-stranded break. The program initializes the C library random number generator so that repeated calls to the generator return uniformly distributed random integers between 1 and 2*L*10^6. A random number r represents a nick at the rth position of the forward strand if r < L*10^6; otherwise the program assigns the nick to position rc = r - L*10^6 on the reverse strand. The imaginary sequence is "nicked" n*L*365*t times at positions given by the random numbers returned from consecutive calls to the random number generator. Next, the program identifies where opposite-strand nicks occur within p bases of each other and records these as double-stranded breaks. Distances between consecutive breaks, measured on the forward strand of DNA, give the fragment lengths. These are tabulated and reported in a size-sorted list. The simulation is run with different combinations of input parameters.
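The procedure described above can be sketched in Python. This is an illustrative re-implementation under stated assumptions, not the original C program: the function name, the seed argument and the example parameter values are hypothetical, position L*10^6 is assigned to the forward strand so that no nick falls on position 0, and the two end pieces of the molecule are counted as fragments.

```python
import bisect
import random

def simulate_fragments(t_years, size_mb, nick_rate, proximity, seed=0):
    """Return fragment lengths (bases), largest first, after random single-strand nicking."""
    rng = random.Random(seed)
    length = int(round(size_mb * 10**6))
    n_nicks = int(round(nick_rate * size_mb * 365 * t_years))

    forward, reverse = [], []
    for _ in range(n_nicks):
        r = rng.randint(1, 2 * length)      # uniform over both strands
        if r <= length:
            forward.append(r)               # nick on the forward strand
        else:
            reverse.append(r - length)      # nick on the reverse strand
    forward.sort()
    reverse.sort()

    # A double-stranded break is recorded wherever a forward-strand nick has
    # at least one reverse-strand nick within `proximity` bases.
    breaks = set()
    for pos in forward:
        i = bisect.bisect_left(reverse, pos - proximity)
        if i < len(reverse) and reverse[i] <= pos + proximity:
            breaks.add(pos)

    # Fragment lengths are distances between consecutive breaks on the forward strand.
    edges = [0] + sorted(breaks) + [length]
    fragments = [b - a for a, b in zip(edges, edges[1:]) if b > a]
    return sorted(fragments, reverse=True)

# Hypothetical run: 10 years, 0.1 Mb, 20 nicks per Mb per day, breaks within 5 bases.
fragments = simulate_fragments(t_years=10, size_mb=0.1, nick_rate=20, proximity=5, seed=1)
```

Sorting the reverse-strand nicks once and using a binary search keeps the break detection fast even when the number of nicks is large.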

DNA Technology in Forensic Science (1992)

"DNA typing" is a catch-all term for a wide range of methods for studying genetic variations. Each method has its own advantages and limitations, and each is at a different state of technical development. Each DNA typing method involves three steps:

Laboratory analysis of samples to determine their genetic-marker types at multiple sites of potential variation.

Comparison of the genetic-marker types of the samples to determine whether the types match and thus whether the samples could have come from the same source.

If the types match, statistical analysis of the population frequency of the types to determine the probability that such a match might have been observed by chance in a comparison of samples from different persons.

Before any particular DNA typing method is used for forensic purposes, it is essential that precise and scientifically reliable procedures be established for performing all three steps. This chapter discusses the first two (laboratory analysis and pattern comparison), and Chapter 3 focuses on statistical analysis.

There is no scientific dispute about the validity of the general principles underlying DNA typing: scientists agree that DNA varies substantially among humans, that variation can be detected in the laboratory, and that DNA comparison can provide a basis for distinguishing samples from different persons. However, a given DNA typing method might or might not be scientifically appropriate for forensic use. Before a method can be accepted as valid for forensic use, it must be rigorously characterized in both research and forensic settings to determine the circumstances under which it will and will not yield reliable results. It is meaningless to speak of the reliability of DNA typing in general, i.e., without specifying a particular method. Some states have adopted vaguely worded statutes regarding admissibility of DNA typing results without specifying the methods intended to be covered. Such laws obviously were intended to cover only conventional RFLP analysis of single-locus probes on Southern blots, the only method in common use at the time of passage of the legislation. We trust that courts will recognize the limitations inherent in such statutes.

Forensic DNA analysis should be governed by the highest standards of scientific rigor in analysis and interpretation. Such high standards are appropriate for two reasons: the probative power of DNA typing can be so great that it can outweigh all other evidence in a trial; and the procedures for DNA typing are complex, and judges and juries cannot properly weigh and evaluate conclusions based on differing standards of rigor.

The committee cannot provide comprehensive technical descriptions for DNA typing in this report: too many methods exist or are planned, and too many issues must be addressed in detail for each method. Instead, our main goal is to provide a general framework for the evaluation of any DNA typing method.


Scientific Foundations

The forensic use of DNA typing is an outgrowth of its medical diagnostic use, that is, analysis of disease-causing genes based on comparison of a patient's DNA with that of family members to study inheritance patterns of genes or with reference standards to detect mutations. To understand the challenges involved in such technology transfer, it is instructive to compare forensic DNA typing with DNA diagnostics.

DNA diagnostics usually involves clean tissue samples from known sources. It can usually be repeated to resolve ambiguities. It involves comparison of discrete alternatives (e.g., which of two alleles did a child inherit from a parent?) and thus includes built-in consistency checks against artifacts. It requires no knowledge of the distribution of patterns in the general population.

Forensic DNA typing often involves samples that are degraded, contaminated, or from multiple unknown sources. It sometimes cannot be repeated, because there is too little sample. It often involves matching of samples from a wide range of alternatives present in the population and thus lacks built-in consistency checks. Except in cases where the DNA evidence excludes a suspect, assessing the significance of a result requires statistical analysis of population frequencies.

Despite the challenges of forensic DNA typing, we believe that it is possible to develop reliable forensic DNA typing systems, provided that adequate scientific care is taken to define and characterize the methods. We outline below the principal issues that must be addressed for each DNA typing procedure.

Written Laboratory Protocol

An essential element of any clinical or forensic DNA typing method is a detailed written laboratory protocol. Such a protocol should not only specify steps and reagents, but also provide precise instructions for interpreting results, which is crucial for evaluating the reliability of a method. Moreover, the complete protocol should be made freely available so that it can be subjected to scientific scrutiny.

Procedure For Identifying Patterns

There must be an objective and quantitative procedure for identifying the pattern of a sample. Although the popular press sometimes likens DNA patterns to bar codes, laboratory results from most methods of DNA testing are not discrete data, but rather continuous data. Typically, such results consist of an image (such as an autoradiogram, a photograph, spots on a strip, or the fluorometric tracings of a DNA sequence), and the image must be quantitatively analyzed to determine the genotype or genotypes represented in the sample. Quantitation is especially important in forensic applications, because of the ever-present possibility of mixed samples.

Patterns must be identified separately and independently in suspect and evidence samples. It is not permissible to decide which features of an evidence sample to count and which to discount on the basis of a comparison with a suspect sample, because this can bias one's interpretation.

Procedure For Declaring a Match

When individual patterns of DNA in evidence sample and suspect sample have been identified, it is time to make comparisons to determine whether they match. Whether this step is easy or difficult depends on the resolving power of the system to distinguish alleles. Some DNA typing methods involve small collections of alleles that can be perfectly distinguished from one another, e.g., a two-allele RFLP system based on a polymorphism at a single locus. Other methods involve large collections of similar alleles that are imperfectly distinguished from one another, e.g., the hypervariable VNTR systems in common forensic use, in which a single sample might yield somewhat different allele sizes on repeat measurements. 1 It is easy to determine whether two samples match in the former case (assuming that the patterns have been correctly identified), but the latter case requires a match criterion, i.e., an objective and quantitative rule for deciding whether two samples match. For example, a match criterion for VNTR systems might declare a match between two samples if the restriction-fragment sizes lie within 3% of one another.
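A match criterion of this kind can be written down as an explicit rule. The sketch below is only an illustration: it assumes that "within 3% of one another" is measured relative to the mean of the two fragment sizes, which the text does not specify, and the function names and band sizes are hypothetical. An actual criterion must be derived from each laboratory's measured variability.

```python
def bands_match(size_a, size_b, tolerance=0.03):
    """True when two fragment sizes differ by no more than `tolerance` times their mean."""
    mean = (size_a + size_b) / 2.0
    return abs(size_a - size_b) <= tolerance * mean

def patterns_match(pattern_a, pattern_b, tolerance=0.03):
    """Compare two band patterns (fragment sizes in base pairs) band by band after sorting."""
    if len(pattern_a) != len(pattern_b):
        return False
    pairs = zip(sorted(pattern_a), sorted(pattern_b))
    return all(bands_match(a, b, tolerance) for a, b in pairs)

# Hypothetical two-band patterns from an evidence sample and a suspect sample.
evidence = [2000, 5100]
suspect = [5000, 2030]
is_match = patterns_match(evidence, suspect)
```

Because the rule is a fixed function of the measured sizes, it can be applied to each pair of samples in the same way, which is the point of requiring an objective criterion.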

The match criterion must be based on the actual variability in measurement observed in appropriate test experiments conducted in each testing laboratory. The criterion must be objective, precise, and uniformly applied. If two samples lie outside the matching rule, they must be declared to be either "inconclusive" or a "nonmatch." Considerable controversy arose in early cases over the use of subjective matching rules (e.g., comparison by eye) and the failure to adhere to a stated matching rule.

Identification of Potential Artifacts

All laboratory procedures are subject to potential artifacts, which can lead to incorrect interpretation if not recognized. Accordingly, each DNA typing method must be rigorously characterized with respect to the types of possible artifacts, the conditions under which they are likely to occur, the scientific controls for detecting their occurrence, and the steps to be taken when they occur, which can range from reinterpreting results to correcting for the presence of artifacts, repeating some portion of the experiment, or deciding that samples cannot be reliably used.

Regardless of the particular DNA typing method, artifacts can alter a pattern in three ways: Pattern A can be transformed into Pattern B, Pattern A can be transformed into Pattern A + B, and Pattern A + B can be transformed into Pattern B. It is important to identify the circumstances under which each transformation can occur, because only then can controls and corrections be devised. For example, RFLP analysis is subject to such artifacts as band shifting, in which DNA samples migrate at different speeds and yield shifted patterns (A → B), and incomplete digestion, in which the failure of a restriction enzyme to cleave at all restriction sites results in additional bands (A → A + B).

Some potential problems can be identified on the basis of the chemistry of DNA and the mechanism of detection in the genetic-typing system. Anticipation of potential sources of DNA typing error allows systematic empirical investigation to determine whether a problem exists in practice. If so, the range of conditions in which an assay is subject to artifact must be characterized. In either case, the results of testing for artifacts should be documented. Empirical testing is necessary, whether one is considering a new method, a new locus, a new set of reagents (probe or enzyme) for a pre-existing locus, or a new device. Under some circumstances, even small changes in procedure can change the pattern of artifacts.

Once potential artifacts have been identified, it is necessary to design scientific controls to serve as internal checks in each experiment to test whether the artifacts have occurred. Once the appropriate controls are identified, analysts must use them consistently when interpreting test results. If the appropriate control has not been performed, no result should be reported. When a control indicates irregularities in an experiment, the results in question must be considered inconclusive; if possible, the experiment should be repeated. A well-designed DNA typing test should be a matter of standardized, objective analysis.

Sensitivity to Quantity, Mixture, and Contamination

Evidence samples might contain very little DNA, might contain a mixture of DNA from multiple sources, and might be contaminated with chemicals that can interfere with analysis. It is essential to understand the limits of each DNA typing method under such circumstances.

Experiential Foundation

Before a new DNA typing method can be used, it requires not only a solid scientific foundation, but also a solid base of experience in forensic application. Traditionally, forensic scientists have applied five steps to the implementation of genetic marker systems: 2 , 3

1. Gain familiarity with a system by using fresh samples.

2. Test marker survival in dried stains (e.g., bloodstains).

3. Test the system on simulated evidence samples that have been exposed to a variety of environmental conditions.

4. Establish basic competence in using the system through blind trials.

5. Test the system on nonprobative evidence samples whose origin is known, as a check on reliability.

When a technique is initially developed, all five steps should be carefully followed. As laboratories adopt the technique, it will not always be necessary for them to repeat all the steps, but they must demonstrate familiarity and competence by following steps 1, 4, and 5. 4

Most important, there is no substitute for rigorous external proficiency testing via blind trials. Such proficiency testing constitutes scientific confirmation that a laboratory's implementation of a method is valid not only in theory, but also in practice. No laboratory should let its results with a new DNA typing method be used in court, unless it has undergone such proficiency testing via blind trials. (See Chapter 4 for discussion of proficiency testing.)

Publication and Scientific Scrutiny

If a new DNA typing method (or a substantial variation on an existing one) is to be used in court, publication and scientific scrutiny are very important. Extensive empirical characterization must be undertaken. Results must be published in appropriate scientific journals. Publication is the mechanism that initiates the process of scientific confirmation and eventual acceptance or rejection of a method.

Some of the controversy concerning the forensic use of DNA typing can be traced to the failure to publish a detailed explanation and justification of methods. Without the benefit of open scientific scrutiny, some testing laboratories initially used methods (for such fundamental steps as identifying patterns, declaring matches, making comparison with a databank, and correcting for band shifting) that they later agreed were not experimentally supported. In some cases, those errors resulted in exclusion of DNA evidence or dismissal of charges.


Choice of Probes

A DNA probe used in forensic applications should have the following properties:

It should recognize a single human locus (or site), preferably one whose chromosomal location has been determined.

It should detect a constant number of bands per allele in most humans.

It should be characterized in the published literature, including its typical range of alleles, and its tendency to recognize DNA from other species.

It should be readily available for scientific study by any interested person.

The committee recommends against forensic use of multilocus probes, which detect many fragments per person. Because such probes might detect fragments with quite different intensities, it is difficult to know whether one has detected all fragments in a sample (particularly with small and degraded forensic samples) and difficult to recognize artifacts and mixtures. Such problems increase the difficulty of pattern interpretation. Multilocus probes increase the risk of incorrect interpretation, and numerous single-locus probes, which do not pose such problems, are available. The use of enough single-locus probes gains the advantages of a single multilocus probe without the problems of interpretation.

Southern Blot Preparation

The basic protocol for preparing Southern blots is fairly standard, but testing laboratories vary in such matters as choice of restriction enzyme, gel length and composition, and electrophoresis conditions. Such differences do not fundamentally affect the reliability of the general method, but some enzymes might require characterization (e.g., each restriction enzyme must be characterized for sensitivity to inhibitors, for tendency to cut at anomalous recognition sites under some conditions, often called "star activity," and for tendency to produce partial digestions), and differences in gels and electrophoresis conditions will affect resolution of fragments and retention of small fragments.

Questions have arisen concerning the use of ethidium bromide, a fluorescent dye that binds to DNA and so allows it to be visualized. Some laboratories incorporate ethidium bromide into analytical gels before electrophoresis; others stain gels with ethidium bromide after electrophoresis. The committee strongly recommends the latter, for two reasons:

Ethidium bromide binds to DNA in a concentration-dependent manner and has been shown to alter the mobility of fragments at high DNA concentrations, thus decreasing the reliability of fragment-size measurements.

Staining after electrophoresis requires smaller amounts of ethidium bromide, and that is preferable, because the dye is a known carcinogen and thus poses problems of exposure and disposal.

Because there are several advantages and no drawbacks to staining after electrophoresis, we conclude that there is no present justification for use of ethidium bromide in analytical gels.

Identification of DNA Patterns

Identification of the DNA pattern of each sample should be carried out very carefully. When analyzed with a single-locus probe, each lane will ideally show at most the fragments derived from two alleles and nothing else. However, complications can arise. To interpret such complications properly, an examiner requires considerable knowledge and skill and might need to examine control experiments.

Examination of a Control Pattern

Every Southern blot procedure should be applied to a known DNA sample (in addition to the evidence samples in question), to verify that the hybridization was performed correctly. If this control sample does not yield a clean result that shows the correct pattern for a particular hybridization, the result of the test hybridization should be discounted.

Single-Band Patterns

Sometimes, only a single band will be detected when two distinct alleles are present. That might occur because the second allele is so small that it has migrated off the end of the gel, because the second allele is similar in size to the first allele and thus is not resolved, or because the second allele is much larger, and larger fragments are preferentially lost in partially degraded samples.

When only a single band is found, the interpretation should always include the possibility that a second band has been missed, i.e., that the pattern is actually of a heterozygote, not a homozygote. (For statistical interpretation, the frequency of a single-band pattern should be taken to be the sum of the frequencies of all patterns containing this band. This is approximately twice the allele frequency of the band.) In some cases, it could be important to interpret the absence of a second larger fragment, e.g., when two samples match in a smaller band, but the questioned sample lacks a second larger band. That could arise either because the samples are from different persons or because the samples come from the same person but the questioned sample is partially degraded. Ideally, to distinguish these alternatives, one should determine whether a second larger band could have been detected in the questioned sample by hybridizing the membrane with a single-copy probe that detects an even larger monomorphic fragment, i.e., one that is constant in all humans. In contrast, it would not be sufficient simply to estimate the degree of degradation from the ethidium bromide staining pattern of the sample.
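The statement that a single-band pattern has a frequency of approximately twice the allele frequency can be checked with a short calculation. The sketch assumes Hardy-Weinberg proportions, which the passage does not state explicitly, and the allele frequency used is hypothetical.

```python
def single_band_frequency(p):
    """Sum of the frequencies of every genotype that contains an allele of frequency p."""
    homozygote = p * p                  # the band paired with itself
    heterozygotes = 2 * p * (1 - p)     # the band paired with any other allele
    return homozygote + heterozygotes   # algebraically 2p - p**2

# Hypothetical allele frequency of 5%.
p = 0.05
exact = single_band_frequency(p)
approximation = 2 * p
```

For a rare allele the p**2 term is small, so the exact sum (2p - p**2) lies just below 2p and the approximation errs slightly on the high side.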

Anomalous Bands

A sample might show more than two bands for various reasons, e.g., the hybridization conditions were improper and caused the probe to hybridize to incorrect fragments; the probe was contaminated with another sequence, which caused it to recognize other fragments; the membrane was incompletely stripped after a previous use, so a pattern seen on the previous hybridization is still being detected; the restriction digestion did not proceed to completion, so the region recognized by the probe is present in incompletely cut fragments of multiple sizes; or the sample actually contains a mixture of multiple DNAs. The last example is extremely important to recognize, because it can bear importantly on a case. Whenever extra bands are observed, their origin should be determined.

The following clues provide a partial decision tree:

If the hybridization conditions were improper or the probe contaminated, the pattern in the control DNA should be seen to be incorrect; the hybridization should be repeated.

If the membrane was improperly stripped, the extra bands will be in the same location as in the previous hybridization and might be present in the control sample; the hybridization should be repeated.

If the restriction digestion was incomplete, one should see additional bands, even with the use of monomorphic probes that typically give only a single constant band. To ascribe extra bands to incomplete digestion, one should therefore perform such a hybridization. If incomplete digestion has occurred, the sample ideally should be re-extracted and redigested, and a new Southern blot should be prepared. If that is not possible, because there is too little sample, it will usually be difficult to get a reliable result.

If the samples are mixtures from more than one person, one should see additional bands for all or most polymorphic probes, but not for a single-copy monomorphic probe. Mixed samples can be very difficult to interpret, because the components can be present in different quantities and states of degradation. It is important to examine the results of multiple RFLPs, as a consistency check. Typically, it will be impossible to distinguish the individual genotypes of each contributor. If a suspect's pattern is found within the mixed pattern, the appropriate frequency to assign such a "match" is the sum of the frequencies of all genotypes that are contained within (i.e., that are a subset of) the mixed pattern.
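The rule for mixed samples (sum the frequencies of all genotypes contained within the mixed pattern) can be sketched as a small enumeration. The code assumes Hardy-Weinberg proportions and uses p² for homozygotes for simplicity. Both are assumptions of this sketch, and the band frequencies are hypothetical.

```python
from itertools import combinations_with_replacement


def mixture_match_frequency(band_freqs):
    """Sum the frequencies of all genotypes formed only from bands in the mixture.

    band_freqs holds one population frequency per band seen in the mixed
    pattern. Genotype frequencies follow Hardy-Weinberg proportions
    (an assumption of this sketch).
    """
    total = 0.0
    indices = range(len(band_freqs))
    for i, j in combinations_with_replacement(indices, 2):
        p, q = band_freqs[i], band_freqs[j]
        # i == j is a homozygote; i != j is a heterozygote.
        total += p * p if i == j else 2.0 * p * q
    return total


if __name__ == "__main__":
    # Four bands observed in a mixed lane, with hypothetical frequencies.
    print(mixture_match_frequency([0.10, 0.20, 0.05, 0.15]))
```

Under these assumptions the sum equals the square of the combined band frequency, which shows how quickly a mixed pattern loses discriminating power as bands are added.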

Another possible cause of extra bands is leakage between adjacent sample lanes or misloading of two samples in a single lane. Such an occurrence can be exceedingly difficult to detect and could result in an incorrect conclusion. It is therefore important to leave a blank lane between a suspect sample and an evidence sample, so that leakage can be detected and will not lead to false-positive results.

Reporting of Anomalies

Examiners should document their interpretations of samples thoroughly in writing. They should note all observed bands and any questionable densities that they do not consider to be bands. Anomalous bands should be explained on the basis of appropriate control experiments of the sorts described above.

Measurement of Fragments

Molecular-weight measurements of fragments should initially be made by comparing band positions with known molecular-weight standards run in separate lanes on the same gel (so-called external molecular-weight standards). Measurements should be performed with a computer-assisted or computer-automated system, in which the operator identifies the positions of the bands with a digitizing pen or similar device that directly records them, visually inspects them, or both. Computer-based procedures ensure appropriate documentation of the measurement and promote objectivity.
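As an illustration of sizing a band against external standards, the sketch below interpolates the logarithm of fragment size linearly between the two flanking ladder bands. This interpolation scheme is an assumption chosen for brevity, not a method given in the text; the function name, the units, and the ladder values are hypothetical.

```python
import math
from bisect import bisect_left


def estimate_size_bp(distance_mm, ladder):
    """Estimate fragment size from migration distance.

    ladder is a list of (distance_mm, size_bp) pairs for the standards.
    log10(size) is interpolated linearly between the two flanking standards.
    """
    points = sorted(ladder)  # ascending migration distance
    distances = [d for d, _ in points]
    if not distances[0] <= distance_mm <= distances[-1]:
        raise ValueError("band lies outside the range of the standards")
    k = bisect_left(distances, distance_mm)
    if distances[k] == distance_mm:
        return float(points[k][1])
    (d0, s0), (d1, s1) = points[k - 1], points[k]
    fraction = (distance_mm - d0) / (d1 - d0)
    log_size = math.log10(s0) + fraction * (math.log10(s1) - math.log10(s0))
    return 10 ** log_size


if __name__ == "__main__":
    ladder = [(10.0, 10000), (20.0, 1000), (30.0, 100)]  # hypothetical standards
    print(round(estimate_size_bp(15.0, ladder)))
```

Refusing to extrapolate beyond the ladder is a deliberate choice here, since sizes outside the range of the standards cannot be supported by the measurement.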

External molecular-weight standards alone, however, are not sufficient, because anomalies in electrophoresis can lead to errors in RFLP typing caused by band shifting. 5 Such anomalies can be due to differences in salt or DNA concentrations among samples (which could be corrected by repeated extraction) or to covalent or noncovalent modifications of the DNA (which might be irreversible). Band shifting could cause two DNA samples from one person to show different patterns or DNA samples from two different persons to show the same pattern. Band shifting also makes it impossible to measure fragment sizes relative to external molecular-weight standards, because the standards have migrated at a different speed.

Band shifting is easy to detect by hybridizing the Southern blot with monomorphic probes, that is, probes that detect constant-length fragments that are always in the same position in all people. If several monomorphic fragments are in the same position in both lanes, it is safe to assume that no band shifting has occurred. If the monomorphic fragments are in different positions, band shifting is present. The committee considers it desirable for all samples to be tested for band shifting by hybridization with monomorphic probes that cover a wide range of fragment sizes in the gel. That approach will eliminate the rare production of a match by shifting of bands in an evidence sample to the same positions as in a suspect sample. Testing laboratories now investigate the possibility of band shifting only when they find two samples with patterns that appear to be similar but shifted relative to one another. (Multiple monomorphic probes might not be available for some systems and might need to be developed.)

Testing for band shifting is easy, but correcting it is harder. The best approach is to clean the samples (by re-extraction, dialysis, or other measures) and repeat the experiment in the hope of avoiding band shifting. When that is impossible because too little sample is available or it fails (perhaps because of covalent modification of the DNA), it is possible in principle to determine the molecular weights of polymorphic fragments in a sample by comparing them with monomorphic human bands in the same lane (so-called internal molecular-weight standards). These monomorphic fragments are expected to have undergone the same band shift, so they should provide an accurate internal ruler for measurement. (Note that the polymorphic fragments and the internal molecular-weight standards are visualized on separate hybridizations, but can be superimposed on one another, if the external molecular-weight standards are used to align the gels.)

In practice, however, the use of internal standards presents serious difficulties. Accurate size determination requires a number of internal standards. If band shifting caused all fragments to change their mobility by the same percentage, one would need only a single monomorphic fragment to determine the extent of shift. But band shifting appears to be more complex than that. Different regions of the gel shift by different amounts.

Little has been published on the nature of band shifting, on the number of monomorphic internal control bands needed for reliable correction, and on the accuracy and reproducibility of measurements made with such correction. For the present, several laboratories have decided against attempting quantitative corrections; samples that lie outside the match criterion because of apparent band shifting are declared to be "inconclusive." The committee urges further study of the problems associated with band shifting. Until testing laboratories have published adequate studies on the accuracy and reliability of such corrections, we recommend that they adopt the policy of declaring samples that show apparent band shifting to be "inconclusive." The committee recommends that all measurement data be made readily available, including the computer-based images and records. Any analytical software for image processing or molecular-weight determination should also be readily available. All fragment sizes for both known and questioned samples should be clearly listed on the formal report of the testing laboratory.

Match Criteria

Current RFLP-based tests use VNTR probes that have dozens of closely spaced alleles. On the one hand, the high degree of polymorphism increases the power of the test to detect differences among persons. On the other hand, the large number of alleles increases the complexity of matching samples, because gels have little ability to resolve nearby alleles (which can differ by as little as 9 basepairs, so that, for practical purposes, the distribution of alleles can appear to be continuous).

Because of the limited resolution, two samples from a single person will often lead to slightly different measurements, e.g., 3.00 and 2.45 kilobases (kb) in one case, 3.03 and 2.40 kb in another. To decide whether two samples match, each laboratory must have a match criterion. 6 The match criterion should provide an objective and quantitative rule for deciding whether two patterns match, e.g., all fragments must lie within 2% of one another. When samples fall outside the match criterion, they should be declared to be "inconclusive" or "nonmatching."
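A quantitative match rule of the kind described can be written in a few lines. The sketch below takes "within 2% of one another" to mean that the difference between two band sizes is at most 2% of their mean; that reading, the pairing of bands by size rank, and the function names are all assumptions, since the text does not fix them.

```python
def bands_match(size_a, size_b, tolerance=0.02):
    """True if two fragment sizes differ by at most `tolerance` of their mean."""
    mean = (size_a + size_b) / 2.0
    return abs(size_a - size_b) <= tolerance * mean


def patterns_match(pattern_a, pattern_b, tolerance=0.02):
    """Compare two band patterns after pairing bands by size rank."""
    if len(pattern_a) != len(pattern_b):
        return False
    pairs = zip(sorted(pattern_a, reverse=True), sorted(pattern_b, reverse=True))
    return all(bands_match(a, b, tolerance) for a, b in pairs)


if __name__ == "__main__":
    print(patterns_match([3.00, 2.45], [3.03, 2.44]))
    print(patterns_match([3.00, 2.45], [3.03, 2.40]))
```

Under this particular definition, the 2.45 kb and 2.40 kb pair from the example above differs by about 2.06% of the mean, just outside a 2% window. The outcome of a borderline comparison therefore depends on exactly how the window is defined, which is one reason the criterion has to be stated objectively in advance.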

The match criterion must be based on reproducibility studies that show the actual degree of variability observed when multiple samples from the same person are separately prepared and analyzed under typical forensic

Materials and methods

Definition of DNAm age using a penalized regression model

Using the training data sets, I used a penalized regression model (implemented in the R package glmnet [45]) to regress a calibrated version of chronological age on 21,369 CpG probes that a) were present both on the Illumina 450K and 27K platform and b) had fewer than 10 missing values. The alpha parameter of glmnet was set to 0.5 (elastic net regression) and the lambda value was chosen using cross-validation on the training data (lambda = 0.0226). DNAm age was defined as predicted age. Mathematical details are provided in Additional file 2.
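To make the roles of alpha and lambda concrete, the sketch below evaluates the standard elastic net objective for a Gaussian response as parameterized in glmnet. It is a plain-Python illustration of the quantity being minimized, not the fitting procedure, and not the author's code; the mathematical details specific to the paper are in Additional file 2 and are not reproduced here. The function names and toy inputs are hypothetical.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """lam * (alpha * L1 norm + (1 - alpha) / 2 * squared L2 norm)."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


def penalized_loss(y, X, intercept, coefs, lam=0.0226, alpha=0.5):
    """Residual sum of squares / (2 * n) plus the elastic net penalty.

    y holds (calibrated) ages, X holds one row of methylation values per
    sample. The defaults mirror the lambda and alpha reported in the text.
    """
    n = len(y)
    rss = 0.0
    for yi, row in zip(y, X):
        prediction = intercept + sum(b * x for b, x in zip(coefs, row))
        rss += (yi - prediction) ** 2
    return rss / (2.0 * n) + elastic_net_penalty(coefs, lam, alpha)


if __name__ == "__main__":
    # Toy data: two samples, two CpG-like predictors.
    print(penalized_loss([1.0, 2.0], [[0.2, 0.8], [0.4, 0.6]], 0.5, [1.0, -0.5]))
```

With alpha = 0.5 the penalty weights the lasso term (which drives coefficients to exactly zero and so selects CpGs) and the ridge term (which shrinks correlated coefficients together) equally.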

Short description of the healthy tissue data sets

All data are publicly available (Additional file 1). Many data sets involve normal adjacent tissue from TCGA. Details on the individual data sets can be found in Additional file 2. To give credit to the many researchers who generated the data, I briefly mention relevant citations. Data sets 1 and 2 (whole blood samples from a Dutch population) were generated by Roel Ophoff and colleagues [14]. Data set 3 (whole blood) consists of whole blood samples from a recent large scale study of healthy individuals [24]. The authors used these and other data to estimate human aging rates and developed a highly accurate predictor of age based on blood data. Data set 4 consists of leukocyte samples from healthy male children from Children’s Hospital Boston [46]. Data set 5 consists of peripheral blood leukocyte samples [47]. Data set 6 consists of cord blood samples from newborns [30]. Data set 7 consists of cerebellum samples, which were provided by C Liu and C Chen (Gene Expression Omnibus (GEO) identifier GSE38873). Data sets 8, 9, 10, and 13 consist of cerebellum, frontal cortex, pons, and temporal cortex samples, respectively, obtained from the same subjects [48]. Data set 11 consists of prefrontal cortex samples from healthy controls [22]. Data set 12 consists of neuron and glial cell samples from [49]. Data set 14 consists of normal breast tissue samples [50]. Data set 15 consists of buccal cells from 109 15-year-old adolescents from a longitudinal study of child development [51]. Data set 16 consists of buccal cells from eight different subjects [15]. Data set 17 consists of buccal cells from monozygotic (MZ) and dizygotic (DZ) twin pairs from the Peri/postnatal Epigenetic Twins Study (PETS) cohort [52]. Data set 18 consists of cartilage (chondrocyte) samples from [53]. Data set 19 consists of normal adjacent colon tissue from TCGA. Data set 20 consists of colon mucosa samples from [54]. Data set 21 consists of dermal fibroblast samples from [21].
Data set 22 consists of epidermis samples from [55]. Data set 23 consists of gastric tissue samples from [56]. Data set 24 consists of head/neck normal adjacent tissue samples from TCGA (HNSC data). Data set 25 consists of heart tissue samples from [57]. Data set 26 consists of normal adjacent renal papillary tissue from TCGA (KIRP data). Data set 27 consists of normal adjacent tissue from TCGA (KIRC data). Data set 28 consists of normal adjacent liver samples from [58]. Data set 29 consists of normal adjacent lung tissue from TCGA (LUSC data). Data set 30 consists of normal adjacent lung tissue samples from TCGA (LUAD data). Data set 31 is from TCGA (LUSC). Data set 32 consists of mesenchymal stromal cells isolated from bone marrow [59]. Data set 33 consists of placenta samples from mothers of monozygotic and dizygotic twins [60]. Data set 34 consists of prostate samples from [61]. Data set 35 consists of normal adjacent prostate tissue from TCGA (PRAD data). Data set 36 consists of male saliva samples from [62]. Data set 37 consists of male saliva samples from [23]. Data set 38 consists of stomach from TCGA (STAD data). Data set 39 consists of thyroid from TCGA (THCA data). Data set 40 consists of whole blood from type 1 diabetics [10, 63]. Data set 41 consists of whole blood from [15]. Data sets 42 and 43 consist of whole blood samples from women with ovarian cancer and healthy controls, respectively; these are the samples from the United Kingdom Ovarian Cancer Population Study [10, 63]. Data set 44 consists of whole blood from [64]. Data set 45 consists of leukocytes from healthy children of the Simons Simplex Collection [46]. Data set 46 consists of peripheral blood mononuclear cells from [65]. Data set 47 consists of peripheral blood mononuclear cells from [66]. Data set 48 consists of cord blood samples from newborns provided by N Turan and C Sapienza (GEO GSE36812). Data set 49 consists of cord blood mononuclear cells from [67].
Data set 50 consists of cord blood mononuclear cells from [60]. Data set 51 consists of CD4 T cells from infants [68]. Data set 52 consists of CD4+ T cells and CD14+ monocytes from [15]. Data set 53 consists of immortalized B cells and other cells from progeria, Werner syndrome patients, and controls [69]. Data sets 54 and 55 are brain samples from [70]. Data sets 56 and 57 consist of breast tissue from TCGA (27K and 450K platforms, respectively). Data set 58 consists of buccal cells from [71]. Data set 59 consists of colon from TCGA (COAD data). Data set 60 consists of fat (adipose) tissue from [72]. Data set 61 consists of human heart tissue from [27]. Data set 62 consists of kidney (normal adjacent) tissue from TCGA (KIRC). Data set 63 consists of liver (normal adjacent tissue) from TCGA (LIHC data). Data set 64 consists of lung from TCGA. Data set 65 consists of muscle tissue from [72]. Data set 66 consists of muscle tissue from [73]. Data set 67 consists of placenta samples from [74]. Data set 68 consists of female saliva samples [62]. Data set 69 consists of uterine cervix samples from [50, 75]. Data set 70 consists of uterine endometrium (normal adjacent) tissue from TCGA (UCEC data). Data set 71 consists of various human tissues from the ENCODE/HAIB Project (GEO GSE40700). Data set 72 consists of chimpanzee and human tissues from [27]. Data set 73 consists of great ape blood samples from [28]. Data set 74 consists of sperm samples from [76]. Data set 75 consists of sperm samples from [77]. Data set 76 consists of vascular endothelial cells from human umbilical cords from [60]. Data sets 77 and 78 (special cell types) involve human embryonic stem cells, iPS cells, and somatic cell samples measured on the Illumina 27K array and Illumina 450K array, respectively [78]. Data set 79 consists of reprogrammed mesenchymal stromal cells from human bone marrow (iP-MSC), initial mesenchymal stromal cells, and embryonic stem cells [79]. 
Data set 80 consists of human ES cells and normal primary tissue from [80]. Data set 81 consists of human ES cells from [81]. Data set 82 consists of blood cell type data from [82].

Description of the cancer data sets

An overview of the cancer tissue and cancer cell line data sets is provided in Additional file 12. More details can be found in Additional file 2.

All data are publicly available as can be seen from the column that reports GSE identifiers from the GEO database and other online resources. Most cancer data sets came from TCGA. Data set 3, GBM from [44]; data set 4, breast cancer from [83]; data set 5, breast cancer from [84]; data set 6, breast cancer from [50]; data set 10, colorectal cancer from [39]; data set 23, prostate cancer from [61]; data set 30, urothelial carcinoma from [85].

DNA methylation profiling and normalization steps

All of the public Illumina DNA data were generated by following the standard protocol of Illumina methylation assays, which quantifies DNA methylation levels by the β value. A detailed description of the pre-processing and data normalization steps is provided in Additional file 2.

Meta analysis for measuring pure age effects (irrespective of tissue type)

I used the metaAnalysis R function in the WGCNA R package [86] to measure pure age effects (Additional file 9) as detailed in Additional file 2.

Analysis of variance for measuring tissue variation

To measure tissue effects in the training data (Additional file 8), I used analysis of variance (ANOVA) to calculate an F statistic as follows. First, a multivariate regression model was used to regress each CpG (dependent variable) on age and tissue type. The analysis adjusted for age since the different data sets have very different mean ages (Additional file 1). Next, ANOVA based on the multivariate regression model was used to calculate an F statistic, F.tissueTraining, for measuring the tissue effect in the training data. This F statistic measures the tissue effect after adjusting for age in the training data sets. I did not translate the F statistic into a corresponding P-value since the latter turned out to be extremely significant for most CpGs. Additional file 8D shows that F.tissueTraining is highly correlated with an independent measure of tissue variance (defined using adult somatic tissues from data set 77).

Characterizing the CpGs using sequence properties

I studied occupancy counts for Polycomb-group target (PCGT) genes since they have an increased chance of becoming methylated with age compared to non-targets [10]. Toward this end, I used the occupancy counts of Suz12, Eed, and H3K27me3 published in [87]. To obtain the protein binding site occupancy throughout the entire nonrepeat portion of the human genome, Lee et al. [87] isolated DNA sequences bound to a particular protein of interest (for example, Polycomb-group protein SUZ12) by immunoprecipitating that protein (chromatin immunoprecipitation) and subsequently hybridizing the resulting fragments to a DNA microarray. More details on the chromatin state data from [29] can be found in Additional file 2.

For more information about genetic testing procedures:

The National Society of Genetic Counselors offers an overview of the genetic testing process.

A brief overview of how genetic testing is done is also available from The National Cancer Institute.

The Genetic Science Learning Center at the University of Utah provides an interactive animation of DNA extraction techniques.

Key Concepts and Summary

Functional groups are structural units within organic compounds that are defined by specific bonding arrangements between specific atoms. Organic chemists learn to correlate functional groups with the chemistry that they do. For example, any molecule that contains a carboxylic acid will be able to undergo an acid-base reaction when mixed with a base. This gives the organic chemist a sense of the chemistry a compound can or cannot do, based on the functional groups that are present in the structure.


Make yourself a stack of small sized Qcards. On one side have the name of the functional group (e.g. alcohol) and on the other side have its structure (see Table 1). Make a complete set of all the functional groups you should know (see Table 1). You can even include some compounds like those below in exercise 1 – on one side of the card have the compound, on the other the names of the functional groups. Then use these Qcards to quiz yourself. This will help you recognize the functional groups in larger compounds.


1. Answer the following questions for each of these compounds.

a) Name the circled functional groups: A = ?, B =?, C = ?

b) What is the chemical formula of the compound?

c) How many lone pairs are there in the compound?

Phenylpropanolamine is a psychoactive drug which is used as a stimulant and decongestant in prescription and over-the-counter cough and cold medicines.

Triiodothyronine is a thyroid hormone which affects many physiological processes in the body, such as growth, metabolism and heart rate.

Aldosterone is a hormone which is involved in the function of the kidneys.

Ephedrine is a drug commonly used as a stimulant, decongestant and also as a concentration aid.

Clomifene is mainly used for ovarian stimulation in female infertility that is due to anovulation.

2. Among the five compounds listed in question 1, which would undergo an acid-base reaction when mixed with sodium hydroxide (which is a base)?


a) A = Primary Amine, B = Secondary Alcohol, C = Arene

c) 3 lone pairs – one on the nitrogen and two on the oxygen


a) A = Carboxylic Acid, B = Ether, C = Primary Amine

c) 18 lone pairs – one on each nitrogen, two on each oxygen, and three on each iodine


a) A = Secondary Alcohol, B = Aldehyde, C = Ketone

c) 10 lone pairs – two on each oxygen

a) A = Secondary Amine, B = Secondary Alcohol, C = Arene

c) 3 lone pairs – one on the nitrogen and two on the oxygen

a) A = Arene, B = Ether, C = Tertiary Amine

c) 6 lone pairs – one on the nitrogen, two on the oxygen, and three on the chlorine

2. Any molecule that contains a carboxylic acid will be able to undergo an acid-base reaction when mixed with a base. Among the five compounds in question 1, the only one that has a carboxylic acid is triiodothyronine, so it is the only one that would react with sodium hydroxide in this way.
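The lone-pair answers above all follow one counting rule: one pair per nitrogen, two per oxygen, and three per halogen. The sketch below applies that rule to heteroatom counts. It assumes neutral atoms with their usual valences (no formal charges), and the function and table names are illustrative.

```python
# Lone pairs carried by each neutral heteroatom with its usual valence.
# Carbon and hydrogen carry none and are simply ignored.
LONE_PAIRS_PER_ATOM = {"N": 1, "O": 2, "F": 3, "Cl": 3, "Br": 3, "I": 3}


def count_lone_pairs(atom_counts):
    """Total lone pairs for a mapping of element symbol to atom count."""
    return sum(
        LONE_PAIRS_PER_ATOM.get(element, 0) * count
        for element, count in atom_counts.items()
    )


if __name__ == "__main__":
    compounds = {
        "phenylpropanolamine": {"N": 1, "O": 1},
        "triiodothyronine": {"N": 1, "O": 4, "I": 3},
        "aldosterone": {"O": 5},
        "ephedrine": {"N": 1, "O": 1},
        "clomifene": {"N": 1, "O": 1, "Cl": 1},
    }
    for name, atoms in compounds.items():
        print(name, count_lone_pairs(atoms))
```

The rule breaks down for charged species (for example, an ammonium nitrogen has no lone pair), so check for formal charges before applying it.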

Applications of DNA fingerprinting:

Using the DNA fingerprinting method, the biological identity of a person can be revealed. For validating one’s identity, there is no better option than DNA fingerprinting.

Badly damaged dead bodies can be identified.

It is used to detect maternal cell contamination.

One of the major drawbacks of prenatal diagnosis is maternal cell contamination. The amniotic fluid or CVS sample sometimes contains maternal DNA or maternal tissue.

Contamination increases the chance of false-positive results, especially in the case of carrier identification.

Using VNTR and STR markers with PCR and gel electrophoresis, maternal cell contamination can be identified during prenatal genetic testing.

The image represents the VNTR profile for a family to know the maternal cell contamination.

One of the most important applications of the present technique is in crime scene investigation and criminal verification.

The sample collected from the crime scene could be saliva, blood, a hair follicle, or semen. DNA is extracted and compared with the suspect's DNA, using the two markers explained above. By matching DNA band patterns, a suspect's link to the crime can be established.

Different countries have different criteria to use STRs. 13, 11 and 12 STRs are commonly used in criminal verification in the USA, UK, and India, respectively. As the number of STR markers increases the accuracy of the result also increases.
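One way to see why more STR markers give a more discriminating result is the product rule: if the loci are inherited independently, the chance that an unrelated person matches at every locus is the product of the per-locus genotype frequencies. The sketch below assumes independence between loci and uses made-up frequencies. Neither comes from the text.

```python
from math import prod


def random_match_probability(genotype_freqs):
    """Probability that an unrelated person matches at every locus.

    genotype_freqs holds the population frequency of the observed genotype
    at each locus. Loci are treated as independent (an assumption).
    """
    return prod(genotype_freqs)


if __name__ == "__main__":
    per_locus = 0.1  # hypothetical genotype frequency at every locus
    for n_loci in (11, 12, 13):
        rmp = random_match_probability([per_locus] * n_loci)
        print(f"{n_loci} loci: {rmp:.1e}")
```

Each added locus multiplies the match probability by another factor below one, so the profile becomes rarer, and a coincidental match less likely, as markers are added.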

Identification of blood relatives:

No two individuals are genetically identical. To establish whether two individuals are blood relatives, the present method is adopted.

Besides these, the present method is often employed for checking graft rejection in case of organ transplantation. Known as HLA typing, different markers of HLA region genes are amplified and matched between donor and recipient.

The exact match score indicates graft acceptance. This means an organ can be transplanted.

Moreover, scientists use the present method to screen inherited & non-inherited disease, to find genetic abnormalities, to detect any genetic disorders or mutations, and to create phylogeny between various organisms.

Identification of linked traits

It is often possible to correlate, or link, an allele of a molecular marker with a particular disease or other trait of interest. One way to make this correlation is to obtain genomic DNA samples from hundreds of individuals with a particular disease, as well as samples from a control population of healthy individuals. The genotype of each individual is scored at hundreds or thousands of molecular marker loci (e.g. SNPs), to find alleles that are usually present in persons with the disease, but not in healthy subjects. The molecular marker is presumed to be tightly linked to the gene that causes the disease, although this protein-coding gene may itself be as yet unknown. The presence of a particular molecular polymorphism may therefore be used to diagnose a disease, or to advise an individual of susceptibility to a disease.
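The comparison of cases and controls at a single marker can be sketched as a chi-square test on a 2x2 table of allele counts. The test choice, the function name, and the counts are assumptions made for illustration; the text does not prescribe a particular statistic.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table.

    Layout assumed here:
                    marker allele   other alleles
        cases             a               b
        controls          c               d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


if __name__ == "__main__":
    # Hypothetical allele counts at one SNP.
    statistic = chi_square_2x2(30, 10, 10, 30)
    print(f"chi-square = {statistic:.1f}")
```

A statistic above 3.84 corresponds to p < 0.05 for one degree of freedom at a single marker. When hundreds or thousands of markers are scored, a much stricter threshold is needed to account for multiple testing.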

Molecular markers may also be used in a similar way in agriculture to track desired traits. For example, markers can be identified by screening both the traits and molecular marker genotypes of hundreds of individuals. Markers that are linked to desirable traits can then be used during breeding to select varieties with economically useful combinations of traits, even when the genes underlying the traits are not known.
