When to decide that two similar sequences are different genes or not?

Working on RNA-Seq data led me to ask this question. In RNA-Seq projects, after de novo assembly we have many transcripts with varying degrees of similarity (from 0 to almost 100 percent) whose loci we often don't know.

So, regardless of locus, on what grounds do we differentiate two or more sequences?

How can we differentiate alleles of a gene from paralogous genes?

I hope my question is clear enough.

Comparative genomics

Comparative genomics is a field of biological research in which the genomic features of different organisms are compared. [2] [3] The genomic features may include the DNA sequence, genes, gene order, regulatory sequences, and other genomic structural landmarks. [3] In this branch of genomics, whole or large parts of genomes resulting from genome projects are compared to study basic biological similarities and differences as well as evolutionary relationships between organisms. [2] [4] [5] The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. [6] Therefore, comparative genomic approaches start with making some form of alignment of genome sequences and looking for orthologous sequences (sequences that share a common ancestry) in the aligned genomes and checking to what extent those sequences are conserved. Based on these, genome and molecular evolution are inferred and this may in turn be put in the context of, for example, phenotypic evolution or population genetics. [7]

Having begun almost as soon as the whole genomes of two organisms became available (that is, the genomes of the bacteria Haemophilus influenzae and Mycoplasma genitalium) in 1995, comparative genomics is now a standard component of the analysis of every new genome sequence. [2] [8] With the explosion in the number of genome projects due to advances in DNA sequencing technologies, particularly the next-generation sequencing methods of the late 2000s, this field has become more sophisticated, making it possible to deal with many genomes in a single study. [9] Comparative genomics has revealed high levels of similarity between closely related organisms, such as humans and chimpanzees, and, more surprisingly, similarity between seemingly distantly related organisms, such as humans and the yeast Saccharomyces cerevisiae. [4] It has also shown the extreme diversity of gene composition in different evolutionary lineages. [8]

The Institute for Creation Research

With the advent of modern biotechnology, researchers have been able to determine the actual sequence of the roughly three billion bases of DNA (A,T,C,G) that make up the human genome. They have sequenced the genomes of many other types of creatures as well. Scientists have tried to use this new DNA data to find similarities in the DNA sequences of creatures that are supposedly related through evolutionary descent, but do genetic similarities provide evidence for evolution?

DNA Supports Distinct Kinds

In the June 2009 Acts & Facts, an article was published by the author that showed how this approach has been used in an attempt to demonstrate an evolutionary relationship between humans and chimpanzees. 1 The article showed that scientists incorporate a large amount of bias in their analyses in order to manipulate the data to support evolution, when in fact the DNA data support the obvious and distinctive categorization of life that is commonly observed in the fossil record and in existing life forms.

In reality, there is a clear demarcation between each created kind (humans, chimps, mice, chickens, dogs, etc.), and there is no blending together or observed transition from one kind of animal to another. All created kinds exhibit a certain amount of genetic variability within their grouping while still maintaining specific genetic boundaries. In other words, one kind does not change into another, either in the fossil record or in observations of living organisms.

Similar DNA Sequences

While the genome of each created kind is unique, many animal kinds share some specific types of genes that are generally similar in DNA sequence. When comparing DNA sequences between animal taxa, evolutionary scientists often hand-select the genes that are commonly shared and more similar (conserved), while giving less attention to categories of DNA sequence that are dissimilar. One result of this approach is that comparing the more conserved sequences allows the scientists to include more animal taxa in their analysis, giving a broader data set so they can propose a larger evolutionary tree.

Although these types of genes can be easily aligned and compared, the overall approach is biased towards evolution. It also avoids the majority of genes and sequences that would give a better understanding of DNA similarity concepts.

Tumor Suppressor Genes

As an example, there is a group of genes that not only have been used in evolutionary studies, but also have a significant impact on human health: the tumor suppressor genes. Aberrations within tumor suppressor genes can lead to cancer, thus it is important that their sequences remain unaltered. These genes tend to be very similar across many types of animals, making them ideal for comparative purposes. The close similarities of these genes between many animal taxa have led to their use by scientists in an attempt to prove evolution or common descent. 2 What is really going on with these types of similar genes and how can they be interpreted within a special creation model as opposed to a naturalistic framework?

In very general terms, tumor suppressor genes are key genomic features (blocks of genetic code) that help regulate the growth and division of animal cells. When these genes are functioning properly, they code for proteins that can prevent or inhibit the out-of-control cell proliferation that forms the basis for the growth of tumors. When tumor suppressor genes are inactivated due to a DNA mutation, cell growth and division are no longer kept in check, resulting in cancer.

There are three main types of tumor suppressor genes. One type signals cells to slow down and stop dividing. Another type of tumor suppressor gene produces a protein that is responsible for checking and fixing damage in DNA that can happen when cells divide and proliferate. A third is responsible for telling cells when to die in a process called apoptosis. Cell growth, proliferation, and controlled cell death are essential to the development and maintenance of all animal systems.

For example, human hands develop from an initial fan-shaped structure, where apoptosis (programmed cell death) removes cells between fingers, and cell growth and division build up the fingers. How these genes are regulated will vary with the organism. However, because the basic aspects of the cell cycle are generally similar in many animals, one would actually expect a high level of DNA sequence conservation (similarity) between the coding parts of the genes as well as the proteins they produce.

The Ultimate Genetic Programmer

Generally, the more common a cellular process is between organisms, the more similar its various components will be. Does this indicate random chance evolutionary processes, or could it be an example of the Creator's wise and efficient use and re-use of genetic code in different creatures to accomplish a common and basic cellular function?

Consider the computer world. Ask seasoned computer programmers how often they completely re-write long, complicated blocks of code when they already have what they need somewhere on file. When a long piece of previously-written code is needed and available, programmers will tailor it to fit in its new context, but they will usually not completely re-write it.

Of course, God is the ultimate programmer, and the genetic code He developed will produce the best possible protein needed for the system in which it works. If another organism has a similar physiology, one can expect many of the same genes to be present in its genome. There are a finite number of ways to accomplish the same task in cells. Thus, the genes that are used to accomplish that task will usually be quite similar, with minor key variations. These slight differences exist because the Creator has optimized the genes for that particular kind of creature and its biochemistry.

What the data really show is that high levels of efficiency and utility in genetic information seem to be a recurring theme in the study of genomes. In fact, with the limited number of genes in the human genome (about 25,000), over one million different protein variants are derived. 3 Although not the topic of this article, a single animal gene can code for a wide variety of different proteins through a variety of complicated regulatory mechanisms. When scientists discovered this phenomenon, it totally negated the one-gene/one-protein mentality that originally existed when DNA sequence first began to be studied. That is pretty efficient code usage, which has never been equaled by even the most complex computer programs devised by man.

Genetic Regulatory Elements

While evolutionists have focused on genes that code for proteins, work is just beginning on an equally essential and complicated class of DNA sequence called regulatory elements. These are DNA sequences that do not code for protein but are involved in the regulation of genes. While efficient code usage and re-usage is common among many genomes, what is important is not just the protein the gene generates, but how much, how often, how fast, and when and where in the body it is produced. This is where the gene regulatory process begins to get really complicated. These regulatory differences play a key role in defining what makes a certain kind of organism unique.

After the human genome sequence was obtained to a completion level satisfactory to the scientific community, a separate but heavily-funded and related effort was initiated called the ENCODE (ENCyclopedia of DNA Elements) project. 4 This involves ongoing research to determine the identity and characteristics of the regulatory elements in the human genome. At present, ENCODE has barely scratched the surface, but the results have revolutionized the concept of genetics by showing whole new levels of complexity and efficiency of code and gene activation.

The genetic picture that is beginning to emerge is one of incredible networked and regulatory complexity combined with an extremely high level of efficiency in code usage--certainly nothing that could have evolved on its own through chance random evolutionary processes. As is easily seen, trying to use common genes related to common processes as proof of evolution quickly falls apart in light of the bigger genomic picture. In fact, it really speaks of smart coding by the ultimate bio-systems programmer--God Himself.

  1. Tomkins, J. 2009. Human-Chimp Similarities: Common Ancestry or Flawed Research? Acts & Facts. 38 (6): 12-13.
  2. Jensen, L. J. et al. 2006. Co-evolution of transcriptional and post-translational cell-cycle regulation. Nature. 443 (7111): 594-597.
  3. Posted on July 2007, hosted by the Swiss Institute of Bioinformatics.
  4. Posted on the National Human Genome Research Institute website at

* Dr. Tomkins is Research Associate at the Institute for Creation Research.

Cite this article: Tomkins, J. 2009. Common DNA Sequences: Evidence of Evolution or Efficient Design? Acts & Facts. 38 (8): 12-13.

Changes at the DNA Level

Point mutations are classified in molecular terms in Table 7-1, which shows the main types of DNA changes and their functional effects at the protein level.

Table 7-1

Gene Mutations at the Molecular Level.

At the DNA level, there are two main types of point mutational changes: base substitutions and base additions or deletions. Base substitutions are those mutations in which one base pair is replaced by another. Base substitutions can in turn be divided into two subtypes: transitions and transversions. To describe these subtypes, we consider how a mutation alters the sequence on one DNA strand (the complementary change will take place on the other strand). A transition is the replacement of a base by the other base of the same chemical category (purine replaced by purine: either A to G or G to A; pyrimidine replaced by pyrimidine: either C to T or T to C). A transversion is the opposite: the replacement of a base of one chemical category by a base of the other (pyrimidine replaced by purine: C to A, C to G, T to A, T to G; purine replaced by pyrimidine: A to C, A to T, G to C, G to T). In describing the same changes at the double-stranded level of DNA, we must state both members of a base pair: an example of a transition would be G·C → A·T; that of a transversion would be G·C → T·A.
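The purine/pyrimidine rule above can be sketched in a few lines of Python. This is only an illustration under the definitions in this section; the function name classify_substitution is a hypothetical helper, not something from the text.

```python
# Minimal sketch: classify a single-strand base substitution as a
# transition or a transversion (helper name is hypothetical).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a ref -> alt base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical: no substitution")
    # Same chemical category (purine/purine or pyrimidine/pyrimidine)
    # means a transition; a change of category means a transversion.
    same_category = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_category else "transversion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("C", "T"), ("G", "T"), ("C", "A")]:
        print(ref, "->", alt, classify_substitution(ref, alt))
```

Only one strand is described here; the complementary change on the other strand follows automatically, as the text notes.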

Addition or deletion mutations are actually of nucleotide pairs; nevertheless, the convention is to call them base-pair additions or deletions. The simplest of these mutations are single-base-pair additions or single-base-pair deletions. There are examples in which mutations arise through the simultaneous addition or deletion of multiple base pairs. As we shall see later in this chapter, mechanisms that selectively produce certain kinds of multiple-base-pair additions or deletions are the cause of certain human genetic diseases.

What are the functional consequences of these different types of point mutations? First, consider what happens when a mutation arises in a polypeptide-coding part of a gene. For single-base substitutions, there are several possible outcomes, which are direct consequences of two aspects of the genetic code: degeneracy of the code and the existence of translation termination codons.

Silent substitutions: the mutation changes one codon for an amino acid into another codon for that same amino acid.

Missense mutations: the codon for one amino acid is replaced by a codon for another amino acid.

Nonsense mutations: the codon for one amino acid is replaced by a translation termination (stop) codon.

Silent substitutions never alter the amino acid sequence of the polypeptide chain. The severity of the effect of missense and nonsense mutations on the polypeptide will differ on a case-by-case basis. For example, if a missense mutation causes the substitution of a chemically similar amino acid, referred to as a conservative substitution, then it is likely that the alteration will have a less-severe effect on the protein’s structure and function. Alternatively, chemically different amino acid substitutions, called nonconservative substitutions, are more likely to produce severe changes in protein structure and function. Nonsense mutations will lead to the premature termination of translation. Thus, they have a considerable effect on protein function. Typically, unless they occur very close to the 3′ end of the open reading frame, so that a partly functional truncated polypeptide is still produced, nonsense mutations will produce completely inactive protein products.
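The three outcomes of a single-base substitution in a coding region can be illustrated with a short Python sketch. It assumes the standard genetic code written for the DNA coding strand; the function name classify_codon_change is hypothetical and the example codons are chosen only for illustration.

```python
# Minimal sketch, assuming the standard genetic code (DNA coding strand).
# Codons are enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon substitution as silent, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid (or stop) before and after
    if alt_aa == "*":
        return "nonsense"    # amino acid codon replaced by a stop codon
    return "missense"        # one amino acid replaced by another


if __name__ == "__main__":
    print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu
    print(classify_codon_change("GAG", "GTG"))  # Glu -> Val
    print(classify_codon_change("TAC", "TAA"))  # Tyr -> stop
```

The sketch does not try to judge how severe a missense change is; that depends on the chemistry of the two amino acids and on where in the protein the change falls.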

Like nonsense mutations, single-base additions or deletions have consequences on polypeptide sequence that extend far beyond the site of the mutation itself. Because the sequence of mRNA is “read” by the translational apparatus in groups of three bases (codons), the addition or deletion of a single base pair of DNA will change the reading frame starting from the location of the addition or deletion and extending through to the carboxyl terminus of the protein. Hence, these lesions are called frameshift mutations. These mutations cause the entire amino acid sequence translationally downstream of the mutant site to bear no relation to the original amino acid sequence. Thus, frameshift mutations typically exhibit complete loss of normal protein structure and function.
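A small Python sketch can make the reading-frame effect concrete. It assumes the standard genetic code and uses a made-up 18-base coding sequence; the sequence and the translate helper are hypothetical, chosen only to show how one deleted base changes every downstream codon.

```python
# Minimal sketch of a frameshift, assuming the standard genetic code.
# The sequence below is invented for illustration, not a real gene.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from position 0; leftover bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


original = "ATGGCCGAAGGTTTCGAT"          # ATG GCC GAA GGT TTC GAT
mutant = original[:3] + original[4:]     # delete the first base of codon 2

if __name__ == "__main__":
    print(translate(original))  # MAEGFD
    print(translate(mutant))    # MPKVS
```

In this toy case every residue after the initial methionine differs between the two translations, which is the pattern the text describes for frameshift mutations.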

Now let’s turn to those mutations that occur in regulatory and other noncoding sequences. Those parts of a gene that are not protein coding contain a variety of crucial functional sites. At the DNA level, there are sites to which specific transcription-regulating proteins must bind. At the RNA level, there are also important functional sequences such as the ribosome-binding sites of bacterial mRNAs and the self-ligating sites for intron excision in eukaryote mRNAs.

The ramifications of mutations in parts of a gene other than the polypeptide-coding segments are much harder to predict. In general, the functional consequences of any point mutation (substitution or addition or deletion) in such a region depend on its location and on whether it disrupts a functional site. Mutations that disrupt these sites have the potential to change the expression pattern of a gene in terms of the amount of product expressed at a certain time or in response to certain environmental cues or in certain tissues. We shall see numerous additional examples of such target sites as we explore mechanisms of gene regulation later on (Chapter 14 onward). It is important to realize that such regulatory mutations will affect the amount of the protein product of a gene, but they will not alter the structure of the protein. Alternatively, some mutations might completely inactivate function (such as polymerase binding or intron excision) and be lethal.

It appears that genes also contain noncoding sequences that cannot be “point mutated” to produce detectable phenotypes. These sequences are interspersed with the mutable sites. These sequences are either functionally irrelevant or protected from mutational damage in some way.

New mutations are categorized as induced or spontaneous. Induced mutations are defined as those that arise after purposeful treatment with mutagens, environmental agents that are known to increase the rate of mutations. Spontaneous mutations are those that arise in the absence of known mutagen treatment. They account for the “background rate” of mutation and are presumably the ultimate source of natural genetic variation that is seen in populations.

The frequency at which spontaneous mutations occur is low, generally in the range of one cell in 10⁵ to 10⁸. Therefore, if a large number of mutants is required for genetic analysis, mutations must be induced. The induction of mutations is accomplished by treating cells with mutagens. The mutagens most commonly used are high-energy radiation or specific chemicals; examples of these mutagens and their efficacy are given in Table 7-2. The greater the dose of mutagen, the greater the number of mutations induced, as shown in Figure 7-1. Note that Figure 7-1 shows a linear dose response, which is often observed in the induction of point mutations. The molecular mechanisms whereby mutagens act will be covered in subsequent sections.

Table 7-2

Forward Mutation Frequencies Obtained with Various Mutagens in Neurospora.

Figure 7-1

Linear relation between X-ray dose to which Drosophila melanogaster were exposed and the percentage of mutations (mainly sex-linked recessive lethals).

Recognize that the distinction between induced and spontaneous is purely operational. If we are aware that an organism was mutagenized, then we infer that any mutations that arise after this mutagenesis were induced. However, this is not true in an absolute sense. The mechanisms that give rise to spontaneous mutations also are in action in this mutagenized organism. In reality, there will always be a subset of mutations recovered after mutagenesis that are independent of the action of the mutagen. The proportion of mutations that fall into this subset depends on how potent a mutagen is. The higher the rate of induced mutations, the lower the proportion of recovered mutations that are actually “spontaneous” in origin.
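The last point is simple arithmetic, sketched below in Python. The assumption made here (not stated in the text) is that spontaneous and induced mutations arise independently and their rates add; the function name and the example rates are hypothetical.

```python
# Minimal sketch: share of recovered mutations that are spontaneous,
# assuming spontaneous and induced rates simply add (an assumption).
def spontaneous_fraction(spontaneous_rate: float, induced_rate: float) -> float:
    """Fraction of all recovered mutations that are spontaneous in origin."""
    total = spontaneous_rate + induced_rate
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    return spontaneous_rate / total


if __name__ == "__main__":
    # Hypothetical per-cell rates: the same spontaneous background
    # under a weak mutagen and under a potent one.
    print(spontaneous_fraction(1e-6, 1e-5))
    print(spontaneous_fraction(1e-6, 1e-4))
```

With the background held fixed, raising the induced rate tenfold shrinks the spontaneous share from about 9 percent to about 1 percent in this made-up example.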

Induced and spontaneous mutations arise by generally different mechanisms, so they will be covered separately. After considering these mechanisms, we shall explore the subject of biological mutation repair. Without these repair mechanisms, the rate of mutation would be so high that cells would accumulate too many mutations to remain viable and capable of reproduction. Thus, the mutational events that do occur are those rare events that have somehow been overlooked or bypassed by the repair processes.


NOVA scienceNOW: Bird Brains

Activity Summary
Students will compare the sequence of amino acids in a gene shared between humans and six other organisms and infer evolutionary relationships among the species.

Learning Objectives
Students will be able to:

explain that different organisms often have the same genes.

understand how scientists use genetic differences to infer evolutionary relationships.

relate how shared genes may be a result of shared evolutionary history.

provide evidence suggesting that living things share common ancestors.

Suggested Time
One class period

In the NOVA scienceNOW segment Bird Brains, students learn that organisms as diverse as mushrooms, fish, flies, and humans share a gene called FOXP2. This gene produces a type of protein called a transcription factor, which turns other genes "on" or "off." Transcription factors regulate many other genes, and because of this, they may affect multiple processes in different organisms. In animals, the FOXP2 gene is especially active during embryonic development in the brain, gut, heart, and lungs, but scientists are still unraveling which genes it regulates in each of these tissues.

As explained in the NOVA scienceNOW segment, FOXP2 also plays a role in the processes involved in human speech and birdsong: people with an altered form of the gene have difficulty with many aspects of speech, and birds whose FOXP2 activity is disrupted have trouble learning songs. Despite these and other observations, scientists still don't know which other genes FOXP2 regulates or what its function is in the numerous other species that share this gene with birds and humans. That FOXP2 is so widespread raises additional questions, not only about its role in other organisms, but also how the gene differs from one organism to the next.

All life on Earth arose from a single common ancestor, and our genes reflect this shared ancestry. As species differentiated over evolutionary time, the DNA sequences in their genes acquired slight changes. According to evolutionary theory, these changes accumulate over time: species that diverged from each other long ago have more differences in their DNA than species that diverged recently. Scientists use this degree of difference as a molecular clock to help them predict how long ago species split apart from one another. In general, scientists say the longer ago two species split, the more distantly related they are.

You may need to remind your students about the nature of DNA, genes, proteins, and amino acids and how they differ from one another. DNA is a molecule made up of four types of units called bases. The four bases—adenine (A), cytosine (C), guanine (G) and thymine (T)—collectively make up the DNA "alphabet." Genes are distinct locations along the length of a DNA molecule. The sequence of bases in a gene determines the order of amino acids in a protein, and the order of amino acids acts as the blueprint for protein assembly.

Because the DNA sequence determines a protein's amino acid sequence, a gene shared by two closely related organisms should have similar, or even identical, amino acid sequences. That's because closely related species most likely diverged from one another fairly recently in the evolutionary span. Thus, they haven't had as much time to accumulate random mutations in their genetic codes.

For years, scientists have used DNA and amino acid sequences to decipher relationships between closely related species, such as different types of reptiles, birds, and even bacteria. The approach, called "molecular phylogeny," compares sequence data and ranks organisms' degree of relatedness based on the differences in their DNA. As researchers sequence the genomes of an increasing number of organisms every year, they uncover more data to use in evolutionary studies. In the emerging field of phylogenomics, researchers simultaneously compare numerous genes—and will one day compare complete genomes—to build new evolutionary trees.

In this activity, your students will analyze a suite of amino acid sequences from a gene that makes the protein Cytochrome C. All eukaryotic organisms share this protein, which plays a central role in the energy-producing process of cellular respiration. Cytochrome C is an iron-containing molecule that carries electrons during the electron transport chain in cellular respiration. The protein is found in many lineages, including those of animals, plants, and numerous unicellular species. Its ubiquity makes it a convenient tool for studying evolution. By counting the number of amino acid differences between humans and six other species, your students will be able to make predictions about how closely related humans are to each species.
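The counting step can also be expressed as a short Python sketch, which may be useful for checking hand counts. The two sequences below are invented ten-letter fragments, not real Cytochrome C data, and the function name is hypothetical; the sketch also assumes the sequences are already aligned and of equal length.

```python
# Minimal sketch: list the positions at which two aligned amino acid
# sequences differ. Sequences here are made-up toy fragments.
def amino_acid_differences(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in a, residue in b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


if __name__ == "__main__":
    species_1 = "MKTAYIAKQR"  # hypothetical fragment
    species_2 = "MKTAHIAKQK"  # hypothetical fragment
    diffs = amino_acid_differences(species_1, species_2)
    print(len(diffs), diffs)
```

The number of differences is simply the length of the returned list; keeping the positions makes it easy to compare against a count done by eye.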

  • Bookmark the Web sites Bird Brains and Biology: Molecular Differences.
  • Prepare enough copies of the Predicting Evolutionary Relationships student handout so that each student will have one.
  • As a class, watch the NOVA scienceNOW segment Bird Brains.
  • If necessary, review the terms "DNA," "amino acid," "gene," and "protein" with the class.
  1. Lead a short brainstorm session about how scientists classify organisms. What criteria might scientists use to determine how closely related two species are? They might look for similarity in physical features, behavior, mode of reproduction, or genes.
  2. Introduce the concept of using molecular evidence, such as DNA or amino acid sequence data, to unravel evolutionary relationships between species (see background). You might point out that for some species, physical traits alone don't offer enough clues. For example, is a horse more closely related to a dog or to a buffalo? All three have fur and walk on four legs, but these clues don't tell you much about evolution. Optional: If possible, show the short animation Biology: Molecular Differences. Ask students what additional information DNA evidence provides scientists studying evolution.
  3. Divide the class into pairs and distribute the Predicting Evolutionary Relationships handout.
  4. Work through an example as a class.
    • Explain that each letter in the table Amino Acids in the Protein Cytochrome C represents an amino acid in the protein Cytochrome C. The key shows them which amino acid corresponds to each letter.
    • Call students' attention to the amino acid sequences for humans and tuna. Be sure students understand that because the sequence is too long to fit on one line of text, it wraps to a second line. Explain that they will look for the number of amino acids that differ between humans and tuna. Also explain that plain-text letters represent amino acids that may vary between species, while letters in bold are amino acids that are identical in all species.
    • First, count the number of differences in the sequence together. The first difference is at position 17: humans have an "I," while tuna have a "T." Be sure all students can identify the 21 differences between humans and tuna.
  5. Have students complete the handouts.
  6. To wrap up, discuss the following points as a class:
    • The table lists three species of fungi: Candida, Neurospora, and baker's yeast. How similar are their Cytochrome C sequences? Their sequences are quite different, with 41 differences between Neurospora and baker's yeast, 43 between Neurospora and Candida, and 27 between baker's yeast and Candida. What can you say about the evolutionary relationships among the fungi compared to the relationship between the two insects on the table, the screwworm fly and the silkworm moth? The fly and the moth are more closely related in evolutionary time; there are only 14 differences between the fly and moth Cytochrome C sequences.
    • Pigs, cows, and sheep have identical Cytochrome C sequences. How can they have the same sequence but be different species? The difference between species is determined by many factors; different species can still have identical sequences, especially if they diverged from a common ancestor recently in evolutionary time.
    • Is it appropriate for scientists to infer evolutionary relationships based on information from only one protein? Why or why not? These animals each have thousands of genes. The fact that one gene is identical for the three animals says nothing about the other genes. It's better to look at multiple proteins or other sources of DNA evidence. Proteins evolve at different rates, and additional pieces of evidence will make a prediction about an evolutionary relationship stronger.

Divide the class into four teams. Assign each team one of the following genes: FOXP2, hemoglobin alpha, eyeless, and sonic hedgehog. Have students visit the Kyoto Encyclopedia of Genes and Genomes and look up their gene's amino acid sequence in humans. Have students research how many of the six species from their handouts share this gene with humans. For all cases in which species share the gene, have students write down the first ten amino acids listed in the database. Then have students prepare a short report about the gene, how much similarity they discovered between humans and other species, and what scientists know about the gene's function.

Activity answers:
Human-tuna: 21
Human-gray whale: 9
Human-snapping turtle: 15
Human-rhesus monkey: 1
Human-chicken/turkey: 13
Human-Neurospora (a type of bread mold): 51

  1. Based on the amino acid sequence data you collected, which organism are humans most closely related to? Which organisms are humans most distantly related to? Explain your reasoning.
    Humans are most closely related to the monkey; there is only one amino acid difference between the two. Humans are most distantly related to Neurospora; there are 51 amino acid differences between the two.
  2. What additional data or information might help you confirm the statement you made above?
    Information from other genes would strengthen the statement. We also could use fossil evidence or physical evidence such as similarity in physical structures and features.
  3. Does your answer to Question 1 above match the prediction you made in Step 2 of the Procedure? Explain your answer.
    Answers will vary. Look for evidence that students compare their answers and explain why they are the same, or why they are different.
  4. Explain how amino acid sequence data can help scientists infer patterns of evolutionary relationships between species.
    An amino acid is one of the building blocks of a protein. A gene's DNA sequence determines the order of amino acids that make up a protein, so changes in the DNA sequence often result in changes in the amino acid sequence as well. By looking for amino acid sequence differences between species, scientists can infer how closely or distantly related two species are in evolutionary time.
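The ranking behind these answers can be reproduced with a brief Python sketch. The difference counts are the activity answers listed above; the variable names are hypothetical, and treating a smaller count as a closer relationship is the activity's own simplifying assumption for a single protein.

```python
# Minimal sketch: rank species by Cytochrome C amino acid differences
# from humans, using the activity's answer key.
differences_from_human = {
    "tuna": 21,
    "gray whale": 9,
    "snapping turtle": 15,
    "rhesus monkey": 1,
    "chicken/turkey": 13,
    "Neurospora": 51,
}

# Fewest differences first: the activity treats this as "most closely related".
ranked = sorted(differences_from_human.items(), key=lambda item: item[1])

if __name__ == "__main__":
    for species, count in ranked:
        print(f"{species}: {count}")
```

As the discussion points note, a ranking from one protein is only a starting prediction; other genes may evolve at different rates.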

Use the following rubric to assess each team's work.

  • Students clearly understand how molecular evidence relates to inferring patterns of evolution
  • Students ask follow-up questions showing creativity and critical thinking
  • Students miscount amino acid difference between species and do not make a connection between molecular evidence and patterns of evolution
  • Students make little effort to complete handouts or participate in discussion.

The "Bird Brains" activity aligns with the following National Science Education Standards (see

Classroom Activity Author

Jennifer Cutraro and WGBH Educational Outreach Staff

Jennifer Cutraro has 12 years of experience in science writing and education. She has written text and ancillaries for Houghton Mifflin, K12, and Delta Education and has taught science and environmental education at science centers across the country. She also contributes news and feature stories about science and health to media outlets including The Los Angeles Times, The Boston Globe, Science News for Kids and Scholastic Science World.

What causes a Substitution Mutation?

A substitution mutation can arise from several sources directly related to the reading and storage of DNA. For instance, every hour each cell in your body loses around 1,000 bases from the DNA backbone. These bases fall off through a process called depurination. In the process of replacing them, the proteins that manage the DNA make a mistake approximately 75% of the time, because there are 4 nucleotides to choose from and only one of them is correct. Other proteins must come along afterward and check the DNA for errors. If they miss the substitution mutation, it may stay and be replicated.

Another factor which can drive a substitution mutation is deamination, the process by which amino groups degrade off of nucleotides. One of the only ways the protein machinery can differentiate between nucleotides is the amino groups attached to them. As these fall off, the protein machinery can misrecognize the nucleotide, and supply the wrong nucleotide pair. When the DNA replicates, the new nucleotide will become established in a new cell line.

Sickle-Cell Anemia

The blood disease sickle-cell anemia is caused by a simple substitution mutation. In the mutation, a single nucleotide is replaced in the portion of DNA that codes for a subunit of hemoglobin. Hemoglobin is a multi-protein complex responsible for carrying oxygen and supporting the shape of blood cells. The substitution mutation causes a glutamic acid in the protein to be replaced by a valine.
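A short Python sketch can show how one nucleotide change swaps one amino acid for another. The text above does not name the codons involved; the example assumes the commonly reported change from GAG to GTG in the coding strand, and it uses a deliberately partial codon table. The helper name `substitute` is hypothetical.

```python
# Partial codon table: only the two codons needed for this illustration.
# In the standard genetic code, GAG encodes glutamic acid and GTG encodes valine.
CODON_TO_AMINO_ACID = {"GAG": "Glu", "GTG": "Val"}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return a copy of the codon with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]


normal_codon = "GAG"                              # assumed normal codon
mutant_codon = substitute(normal_codon, 1, "T")   # single substitution: A -> T

print(normal_codon, "->", CODON_TO_AMINO_ACID[normal_codon])
print(mutant_codon, "->", CODON_TO_AMINO_ACID[mutant_codon])
```

Only the middle base differs between the two codons, yet the encoded amino acid changes from glutamic acid to valine.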

While this might not seem like much of a change in a protein which contains over 140 amino acids, it makes all the difference. Valine, unlike glutamic acid, is hydrophobic. As such, it repels polar interactions where glutamic acid would attract them. This severely impacts the protein’s ability to function. Blood cells immediately reflect this change, becoming shriveled and sickle-shaped. With a lower ability to carry oxygen, these cells are also more prone to clot within the small capillaries of organs. This can lead to an increased risk of heart attack, stroke, and other cardiovascular diseases.

Interestingly, the substitution mutation has survived in the population for a surprising reason. The parasite which causes malaria depends on human blood cells for part of its life cycle. People with the sickle-cell substitution mutation are less susceptible to getting malaria. Apparently the different shape and function of the blood cells impede the parasite's reproductive processes.

Color Blindness

In your eye, certain cells are responsible for picking up the colors red, green, and blue. These cells rely on different proteins, which react to the various colors. A substitution mutation in the DNA that codes for one of these proteins can lead to the condition of color blindness. People with this condition have a hard time distinguishing between the colors, while their vision is still clear otherwise. Oftentimes, only one color is knocked out. The various proteins are coded for on different places on the DNA, which makes a substitution unlikely to occur in all three genes.

7.11E: Complementation

  • Contributed by Boundless
  • General Microbiology at Boundless

Complementation refers to a relationship between two different strains of an organism which both have homozygous recessive mutations that produce the same phenotype (for example, a change in wing structure in flies) but which do not reside on the same (homologous) gene.

These strains are true breeding for their mutation. If, when these strains are crossed with each other, some offspring show recovery of the wild-type phenotype, they are said to show "genetic complementation". When this occurs, each strain's haploid supplies a wild-type allele to "complement" the mutated allele of the other strain's haploid, causing the offspring to have heterozygous mutations in all related genes. Since the mutations are recessive, the offspring will display the wild-type phenotype.

A complementation test (sometimes called a "cis-trans" test) refers to this experiment, developed by American geneticist Edward B. Lewis. It answers the question: "Does a wild-type copy of gene X rescue the function of the mutant allele that is believed to define gene X?". If there is an allele with an observable phenotype whose function can be provided by a wild-type genotype (i.e., the allele is recessive), one can ask whether the function that was lost because of the recessive allele can be provided by another mutant genotype. If not, the two alleles must be defective in the same gene. The beauty of this test is that the trait can serve as a read-out of gene function even without knowledge of what the gene is doing at a molecular level.

Figure: Complementation Test: Example of a complementation test. Two strains of flies are white eyed because of two different autosomal recessive mutations which interrupt different steps in a single pigment-producing metabolic pathway. Flies from Strain 1 have complementary mutations to flies from Strain 2 because when they are crossed the offspring are able to complete the full metabolic pathway and thus have red eyes.

Complementation arises because loss of function in genes responsible for different steps in the same metabolic pathway can give rise to the same phenotype. When strains are bred together, offspring inherit wild-type versions of each gene from either parent. Because the mutations are recessive, there is a recovery of function in that pathway, so offspring recover the wild-type phenotype. Thus, the test is used to decide if two independently derived recessive mutant phenotypes are caused by mutations in the same gene or in two different genes. If both parent strains have mutations in the same gene, no normal versions of the gene are inherited by the offspring; they express the same mutant phenotype, and complementation has failed to occur.

In other words, if the combination of two haploid genomes containing different recessive mutations yields a mutant phenotype, then there are three possibilities:

  • The mutations occur in the same gene.
  • One mutation affects the expression of the other.
  • One mutation may result in an inhibitory product.

If the combination of two haploid genomes containing different recessive mutations yields the wild-type phenotype, then the mutations must be in different genes.
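The basic logic of the complementation test can be written as a small Python sketch. It is a simplified model, not a full genetic simulation: it assumes every mutation is a fully recessive loss of function, and it ignores the other two possibilities (one mutation affecting the expression of the other, or an inhibitory product). The function name and the gene names `geneA` and `geneB` are hypothetical.

```python
def complementation_test(strain1_mutant_genes, strain2_mutant_genes):
    """Predict the offspring phenotype from crossing two true-breeding mutant strains.

    Each argument is a collection of gene names in which that strain is
    homozygous for a recessive loss-of-function mutation (simplifying assumption).
    """
    shared = set(strain1_mutant_genes) & set(strain2_mutant_genes)
    if shared:
        # Neither parent can supply a wild-type allele of the shared gene,
        # so the offspring remain mutant: complementation fails.
        return "mutant"
    # Each parent supplies a wild-type allele for the gene the other lacks.
    return "wild-type"


# Mutations in different genes of the same pathway complement each other.
print(complementation_test({"geneA"}, {"geneB"}))
# Mutations in the same gene do not.
print(complementation_test({"geneA"}, {"geneA"}))
```

In this model a wild-type result means the mutations are in different genes, while a mutant result is read as "same gene" only because the other explanations were excluded by assumption.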

What is Recombination

Recombination refers to the exchange of DNA strands, producing new nucleotide rearrangements. It occurs between regions with similar nucleotide sequences by breaking and rejoining DNA segments. Recombination is a natural process regulated by various enzymes and proteins. Genetic recombination is important in maintaining genetic integrity and generating genetic diversity. The three types of recombination are homologous recombination, site-specific recombination, and transposition. Both site-specific recombination and transposition can be considered as non-chromosomal recombination where no exchange of DNA sequences occurs.

Homologous Recombination

Homologous recombination is responsible for the meiotic crossing-over as well as the integration of transferred DNA into yeast and bacterial genomes. It is described by the Holliday model. It occurs between identical or nearly identical sequences of two different DNA molecules that can share homology in a limited region. The homologous recombination during meiosis is shown in figure 4.

Figure 4: Chromosomal Crossing-Over
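As a rough illustration of the outcome of crossing-over, the toy Python model below exchanges the segments of two equal-length sequences at a single breakpoint. It captures only the reciprocal exchange, not the molecular steps of the Holliday model. The function name, the letter-coded "chromatids", and the breakpoint are assumptions made for the example.

```python
def single_crossover(chromatid_a: str, chromatid_b: str, breakpoint: int):
    """Return the two recombinant products of one reciprocal exchange.

    Everything before `breakpoint` stays in place; everything from
    `breakpoint` onward is swapped between the two sequences.
    """
    if len(chromatid_a) != len(chromatid_b):
        raise ValueError("homologous sequences must have the same length here")
    if not 0 <= breakpoint <= len(chromatid_a):
        raise ValueError("breakpoint must lie within the sequence")
    recombinant_1 = chromatid_a[:breakpoint] + chromatid_b[breakpoint:]
    recombinant_2 = chromatid_b[:breakpoint] + chromatid_a[breakpoint:]
    return recombinant_1, recombinant_2


# Upper- and lower-case letters stand for alleles from the two parental copies.
print(single_crossover("ABCDEF", "abcdef", 3))
```

Each product carries a new combination of parental alleles, which is how the exchange generates genetic diversity.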

Site-Specific Recombination

Site-specific recombination occurs between DNA molecules with very short homologous sequences. It is involved in the integration of the DNA of the bacteriophage λ (λ DNA) into the E. coli genome during its infection cycle.


Transposition

Transposition is a process that uses recombination to transfer DNA segments between genomes. During transposition, the transposons, or mobile DNA elements, are flanked by a pair of short direct repeats, which facilitate integration into the second genome through recombination.

Recombinases are the class of enzymes that catalyze genetic recombination. The recombinase RecA is found in E. coli. In bacteria, recombination occurs during DNA repair and after the transfer of genetic material between cells. In archaea, the recombinase is RadA, an ortholog of RecA. In yeast, RAD51 acts as a recombinase and DMC1 as a meiosis-specific recombinase.

What is a Promoter?

A promoter is a DNA sequence located near the transcription initiation site of a gene. It serves as the binding site for RNA polymerase, the enzyme that catalyses transcription of the gene. The promoter is always located near the transcriptional unit of the gene. It contains specific DNA sequences that ensure RNA polymerase binds at the correct site, so that the transcriptional unit is transcribed correctly. The main elements of the promoter region are the core promoter element and the regulatory elements. Transcription factors recruit RNA polymerase. These factors attach to activator and repressor sequences in the promoter region and thereby regulate transcription.

Figure 02: Promoter

Eukaryotic promoters have a conserved sequence known as TATA box that is located 25 to 35 base pairs upstream of the transcription start site. Promoter sequences can contain 100 to 1000 base pairs.
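The positional rule above can be turned into a simple search. The Python sketch below scans a sequence for a TATA-like motif and keeps hits that start 25 to 35 base pairs upstream of a given transcription start site. The pattern `TATA[AT]A[AT]` is one commonly quoted consensus and is an assumption here, as are the function name, the way distance is measured (from the first base of the motif to the start site), and the made-up example sequence.

```python
import re

# Assumed consensus for a TATA-like motif; real promoters vary.
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")


def find_tata_boxes(sequence: str, tss_index: int, window=(25, 35)):
    """Return upstream distances (in bp) of TATA-like motifs within the window.

    `tss_index` is the 0-based position of the transcription start site.
    Distance is measured from the first base of the motif to the start site.
    """
    hits = []
    for match in TATA_PATTERN.finditer(sequence.upper()):
        distance = tss_index - match.start()
        if window[0] <= distance <= window[1]:
            hits.append(distance)
    return hits


# Made-up sequence: 20 filler bases, a TATA-like motif, 23 filler bases, then the gene.
example_sequence = "G" * 20 + "TATAAAA" + "C" * 23 + "ATGGCC"
example_tss = 50  # index of the first base after the filler

print(find_tata_boxes(example_sequence, example_tss))
```

A motif match on its own is weak evidence of a promoter, since short patterns like this occur by chance throughout a genome.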


We've seen that Pax6 from vertebrates and eyeless from flies are remarkably similar in sequence and function, but what about our other visionaries — the squid and the flatworm? Despite the major differences in their eyes, they all have genes similar to Pax6. Here are corresponding sections of the Pax6-like eye-building genes for our visionaries. Similarities to the mouse gene are highlighted in green:

But why are these genes so similar when the animals from which they come, and the eyes that they develop, are so different? As discussed earlier, there are two basic evolutionary explanations for similarities: homology and analogy. Are these genes homologous (i.e., were they passed down from the common ancestor of all these different organisms) or analogous (i.e., did they all evolve independently through convergent evolution)?

Based on the observations that all of these gene versions are remarkably similar in sequence, have related functions, and are incredibly widespread (animals all across the tree of life have them), scientists have concluded that they must be homologous and must have been inherited from the common ancestor of all these animals. It is just too unlikely that all these different animal lineages happened to independently evolve remarkably similar genes that do remarkably similar jobs. The most parsimonious explanation is that the gene evolved just once long ago and was then passed down to all these different modern animal lineages.
