Is the fixation rate always equal to the mutation rate for neutral alleles?


A classical result of population genetics is that the rate of fixation of neutral alleles equals the mutation rate $\mu$. The reason is that each generation $PN_e\mu$ mutations enter the population, where $P$ is the ploidy number (e.g. 2 for diploids) and $N_e$ is the effective population size. The probability that a given neutral mutation eventually reaches fixation is simply its frequency $p$. When the mutation occurs, $p=\frac{1}{PN_e}$, and therefore the rate of fixation is:

$$\lambda = PN_e\mu \cdot \frac{1}{PN_e} = \mu$$

This result is widely used in phylogenetics, since by assuming only a constant mutation rate one can estimate the divergence time between two extant lineages.
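The argument can be checked numerically. Below is a minimal sketch with assumed toy parameters (a tiny diploid Wright–Fisher population), showing that a new neutral mutation fixes with probability close to its initial frequency 1/(2N_e):

```python
import random

# Hedged sketch (toy parameters, not from the text): under a neutral
# Wright-Fisher model, a new mutation starting at 1 copy among 2*N_e
# gene copies should fix with probability ~ 1/(2*N_e).

random.seed(1)
N_e = 10                      # small diploid population
copies = 2 * N_e              # 20 gene copies
reps = 20000
fixed = 0

for _ in range(reps):
    count = 1                 # new mutant enters at one copy
    while 0 < count < copies:
        p = count / copies    # binomial sampling of the next generation
        count = sum(random.random() < p for _ in range(copies))
    fixed += (count == copies)

print(fixed / reps)           # close to 1/(2*N_e) = 0.05
```

Multiplying this per-mutation fixation probability by the $PN_e\mu$ new mutations per generation recovers the substitution rate $\mu$.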

The above is also explained in the Wikipedia article on fixation rate.


How robust is the result $lambda = mu$?

I understand that the result $lambda=mu$ is independent of the effective population size but is it also independent of…

  • changes in population size?
  • background selection?
  • selective sweep?
  • population structure?
  • selfing rate?
  • etc…

The answer to your headline question is no: the fixation rate is not always equal to the mutation rate for neutral alleles. For instance:

Fixation rates for neutral alleles are affected by changes in population size, given a constant mutation rate. In general, fixation rates are lower in growing populations (Waxman 2012).

This makes intuitive sense if you consider the single-generation fixation probability of a neutral allele that is one copy short of fixation: the common allele (A) exists at 2N(t)−1 copies, while the rare allele (a) exists at 1 copy. To simplify, let's assume complete random mating. Under negative population growth (say N(t+1) = 0.9·N(t)), there will be 0.9·(2N) gene copies to fill at t+1, each with a 1/(2N(t)) chance of being the rare allele. Under positive population growth (say N(t+1) = 1.1·N(t)), the rare allele could potentially occupy 1.1·(2N) places at t+1, still with a 1/(2N(t)) chance of filling each of them. It is clear that A has a reduced single-generation likelihood of fixation under positive population growth than under negative population growth, simply because the rare allele has an increased chance of doggedly persisting. The finding applies to any allele near fixation.
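A quick numeric sketch of this argument (with hypothetical numbers) computes the single-generation probability that the rare allele is lost, assuming Wright–Fisher sampling with a changing population size:

```python
# Hedged numeric sketch (illustrative numbers): the chance that a rare
# allele present as 1 copy among 2*N parental copies leaves no offspring
# copies when the next generation draws 2*N_next copies independently.

def loss_prob(n_parent, n_next):
    copies_parent = 2 * n_parent
    copies_next = 2 * n_next
    # each of the copies_next draws misses the rare allele
    # with probability 1 - 1/copies_parent
    return (1 - 1 / copies_parent) ** copies_next

N = 1000
p_decline = loss_prob(N, int(0.9 * N))   # shrinking population
p_growth = loss_prob(N, int(1.1 * N))    # growing population
print(round(p_decline, 3), round(p_growth, 3))
```

The rare allele is lost more easily when the population shrinks, so the common allele fixes more readily under decline than under growth, in line with the argument above.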

If I have understood Kim and Stephan (2000) correctly, then overall fixation rates are also reduced by background selection. The main mechanism for this is that background selection purges linked neutral alleles, decreasing overall heterozygosity and thereby reducing the chance that a new, rare allele will become established. The same surely applies to selective sweeps, given linked neutral alleles.


Evolution requires genetic variation.

Yet eventually directional selection and genetic drift will act to decrease variation. This is a natural consequence of these processes and biologists were at one time concerned that most evolution, or change in the direction of adaptation, would cease.

Researchers became very interested in how much variation existed in natural populations for selection to work on.

In the mid-1960s, biologists began using electrophoresis to measure variability.

The technique of choice separates proteins on the basis of their mobility through a gel under the influence of an electric current.

This generated new estimates of genetic diversity, h, the probability that two alleles chosen at random from all the alleles at that locus in the population are different:

$$h = \sum_{i \neq j} x_i x_j = 1 - \sum_i x_i^2,$$ where $x_i$ is the frequency of allele $i$.

Under random mating this equals the population heterozygosity, i.e. the expected proportion of heterozygous individuals in a classical Hardy–Weinberg population. For a population consisting of 25 AA, 50 Aa and 25 aa individuals, h is 0.5.

Electrophoresis allows a new approach where genetic diversity can also be expressed as the percent of polymorphic loci found in the population.

For example, if 20 loci are studied by electrophoresis and 16 show no variation while 4 show more than one band on the gel, then the percent polymorphism for that individual would be 4/20 × 100 = 20%. These values can be determined for several individuals and averaged to characterize a population or even a group.
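Both diversity measures can be sketched in a few lines, using the example numbers from the text:

```python
# Gene diversity h = 1 - sum(x_i^2) computed from diploid genotype
# counts, plus percent polymorphism; numbers match the text's examples.

def gene_diversity(genotype_counts):
    """h from genotype counts, e.g. {'AA': 25, 'Aa': 50, 'aa': 25}."""
    allele = {}
    for geno, n in genotype_counts.items():
        for a in geno:                     # each genotype carries 2 alleles
            allele[a] = allele.get(a, 0) + n
    total = sum(allele.values())
    return 1 - sum((n / total) ** 2 for n in allele.values())

h = gene_diversity({'AA': 25, 'Aa': 50, 'aa': 25})
print(h)                                   # 0.5, as in the text

polymorphic, total_loci = 4, 20
print(100 * polymorphic / total_loci)      # 20.0 percent polymorphism
```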

In animals, a broad range in average heterozygosity was found and was more than expected.

Birds 15%, insects 50%, mammals 20%, fish 30%.

Scientists were astonished at the variability shown at the protein level, even in proteins playing important roles in growth and maintenance. But is this variation enough? These findings, and considerations of how variability could be maintained even during strong selection, led to several ideas, the most important being the role of genetic drift in the process.

1. Neutrality.

Kimura was the first to propose that most evolutionary change at the molecular level occurs as a consequence of random genetic drift, because most mutations at this level are essentially neutral. Assuming neutrality would allow populations to maintain substantial levels of variation. Also, neutral alleles are not exposed to selection, so any changes in their frequency are due to genetic drift.

Near neutrality is now proposed for many alleles. Genotypes are composed of a large number of alleles that may be only slightly deleterious or advantageous. These would essentially not be seen by selection on the entire phenotype and would be found even in phenotypes of high fitness. These alleles would simply be maintained in the population by drift until, in new environments, they could become much more deleterious or advantageous.


Paper with more information for interested students, class will not be responsible for content on exams.

The first evidence proposed for neutrality or near neutrality was molecular clocks.

Molecular clocks: A concept that correlates the number of substitutions to time, assuming that (a) the mutations are selectively neutral (or nearly neutral) and (b) the substitution rate is uniform. Consequently, the number of substitutions that separate two gene copies would be a function of the elapsed time since their most recent common ancestor

The first attempt to look at molecular evolution appeared to reveal a fairly constant and characteristic rate of change per amino acid in a protein or class of proteins as expected by this theory and the term molecular clock was born. In fact today, molecular differences between species are often used to infer phylogenies because constant rates per unit time are assumed.

Kimura argues that it is easier to explain a constant rate of change by neutrality than by selection. Mutations occur randomly, but if most are neutral, the mutation rate determines the number that drift to fixation, and over long periods of time that rate will appear constant. Under selection, a constant rate would require an implausibly steady rate of environmental change.

Evidence: the rate seems constant over time.

Problems: Different rates are found depending on the groups and proteins compared. Some of this is expected, as we compare groups that may represent different time periods, giving the molecular clock longer to tick. Also, classification schemes are somewhat arbitrary: there are many more arthropods than there are chordates, and yet in the classic scheme of classification they are both phyla.

But it still looks like different organisms, even if we account for this, do show different rates, and certainly some proteins do (see graph below). But given these considerations, how variable would rates of amino acid substitution among proteins have to be to reject neutrality?

Problems with testing for neutrality:

The theory of neutral alleles is difficult to test because most proponents of neutrality are not discounting selection entirely; in fact, they look upon selection and other forces as constraints on neutrality. So they would acknowledge that some proteins, because of the vital roles they play, may be more strongly selected than others. They simply hold that most amino acid substitutions are neutral.

Those who tend to discount neutrality as a significant force even at the molecular level are not dismissing it or genetic drift, just saying that eventually selection triumphs. But neutrality could still explain much of the variation we see at the molecular level.

(There are those who proposed the extreme view that all amino acid replacements are the result of neutral mutation and drift, called pan-neutralists, but their interpretation is not the most common one.)

Examine these two examples :


Assume you have a protein that needs a negatively charged amino acid at a given position for the resulting polypeptide to fold into the proper three-dimensional shape and be functional. Proponents of the theory will allow selection to weed out any mutation that does not result in a negatively charged amino acid, but will assume that any negatively charged amino acid will do, with the particular one fixed by chance (the result of neutral mutation and genetic drift).

Also, probability would predict some irregularity in the clock in any case, so how much irregularity should be allowed?

If you find a negatively charged amino acid that seems more or less prevalent than expected on the basis of strict neutrality, is that because of selection? Maybe it is an amino acid that is more difficult to handle metabolically, or more difficult to obtain in the environment. Or, since we never expect perfection in data, can we dismiss the deviation as experimental error? Again, even the most forceful proponents of selection allow for some drift.


Some argue that the clock should be influenced by generation time. The shorter the generation time, the greater the number of mutations, including neutral ones, so species with short generation times should evolve faster. Yet most protein clocks are generation-independent.

A triumph for selection? Maybe not?

Species with short generation times tend to have large populations and species with long generation times tend to have small populations. So the increased rate of fixation in long generation, but small population organisms, offsets the increased number of mutations in short generation, but large population organisms, and gives rise to a similar "clock".

If change at the molecular level is caused to some extent by genetic drift and neutrality, then this change may restrict the variation natural selection has to work on when the organisms in question encounter novel environments. In this way, genetic drift coupled with neutrality or near-neutrality can affect the resulting adaptations at the macro level.

Neutral Theory of Evolution Debunked

Michael Behe’s new book Darwin Devolves is already number one in new releases on Amazon.[1] To adequately summarize it would take a small book, so I will look at one small section where he reviews the attempts to salvage Darwinism, which he shows always fail. I have also added a few references to support Behe’s conclusions.

The main problem in evolution, as stated by the late Harvard Biology Professor William E. Castle, is that the origin “of a new organism is one of the least understood of all natural phenomena. Even to the trained biologist it is an unexplained mystery.”[2] This statement is still true over a century later. One attempt to explain the origin of new organisms is the neutral theory of evolution. Neutral theory, along with genetic drift, natural selection and random mutation, is viewed by its supporters as a basic mechanism of macroevolution.[3]

The Neutral Theory of Evolution

A neutral mutation is one that does not adversely affect either an organism’s phenotype or its fitness.[4] The neutral theory of evolution postulates the accumulation of neutral mutations, such as the accidental duplication of a section of DNA that causes no harm. This occurs until a new combination produces a DNA set that, in the future, confers some specific survival advantage on the organism.[5]

The theory accepts the view that about one percent of human DNA codes for proteins, and that significant portions of the rest are evidence of, or could be due to, neutral mutations.[6] Later,

other lucky mutations could occur in the extra DNA to confer some helpful feature, perhaps a regulatory site. Repeat this scenario many times over, and small populations of bacteria could evolve larger and larger genomes with more and more sophisticated features.[7]

The theory proposes that, when environmental conditions change, some of these neutral mutations may have produced a new gene, or a set of bases, that turns out to be beneficial in the new environment.[8] Neutral evolution theory has earned the qualified support of many leading evolutionary scientists, including Arizona State University Professor Michael Lynch, Eugene Koonin of the National Center for Biotechnology, and the late Harvard Professor Steven Jay Gould.[9]

University of Chicago evolutionist Jerry Coyne wrote that the two main neo-Darwinian evolutionary mechanisms are natural selection and the genetic variety produced by genetic drift.[10] The theory is anti-neo-Darwinian, as explained by one of the early leaders of the idea, Motoo Kimura, who wrote that in

sharp contrast to the Darwinian theory of evolution by natural selection, the neutral theory claims that the overwhelming majority of evolutionary changes at the molecular level are caused by random fixation (due to random sampling drift in finite populations) of selectively neutral (i.e., selectively equivalent) mutants under continued inputs of mutations.[11]

DNA transcription (Illustra Media)

Neo-Darwinism postulates that evolution works by fine-tuning genes that give a slight survival advantage to the population so that it gradually gives the organism a progressively greater survival advantage. Evolution is not the active agent in this scenario. It is a result, not an action. The neutral theory of evolution holds that most evolutionary changes are random, and most of the variation within, and between, species is ultimately not caused by natural selection, but by random genetic drift of neutral alleles originally produced by mutations. Kimura adds that neutral theory

also asserts that most of the genetic variability within species at the molecular level (such as protein and DNA polymorphism) are selectively neutral or very nearly neutral and that they are maintained in the species by the balance between mutational input and random extinction.[12]

He concludes that “since the origin of life on Earth, neutral evolutionary changes have predominated over Darwinian evolutionary changes” (1991, p. 367).

Genetic Drift

The basis of neutral theory is genetic drift, which postulates that genomic DNA base pairs change primarily by random genetic mutations and other genetic events. Genetic drift (or allelic drift) is a change in the frequency of a gene variant (allele) in a population that does not confer an immediate selection advantage to the organism. If this event occurs in gametes, the end result is the creation of new genetic variety that may, in the future, be evolutionarily advantageous. Drift can also occur in selective alleles, and can also have a selective disadvantage.

A major reason neutral theory was proposed is because “all the central assumptions of the Modern Synthesis (often called Neo-Darwinism) have been disproved.”[13] The late geneticist Dr. Motoo Kimura proposed neutral theory in 1968 because many molecular research findings were “quite incompatible with the expectations of Neo-Darwinism.”[14] A major problem was that evolution from one gene family type to another, different family type has never been directly documented. Furthermore, the evolution of intermediate gene forms between an old, less-functional or non-functional gene and a new, more-functional gene by random mutations, as proposed by Darwinism, is seriously problematic.

Thus, neutral theory “is in sharp contrast to the traditional neo-Darwinian (i.e., the synthetic) theory of evolution, which claims that the spreading of mutants within the species in the course of evolution can occur only with the help of positive natural selection.”[15] Neutral theory also, in contrast to Darwinism, postulates that, if selection occurs, the genes selected in the next generation are more likely to be those genes from a “lucky” few individuals, and not necessarily from those life-forms that are healthier or in some way “better.”

Neutral theory supporters accept the conclusion that most mutations are slightly deleterious, but claim that, because these mutant genes are rapidly purged by natural selection, they do not make significant contributions to the variation within and between species at the molecular level. They claim that only neutral mutations, or those that are close-to-neutral, can achieve this.

Sanford’s book examines the impact of mutations that are invisible to selection.

In contrast to the neutral theory, much evidence now exists for the view that most mutations are not strictly neutral, but near-neutral, meaning not harmful as a single entity, but collectively accumulate, eventually causing disease or death. Aging is the most well-documented example of the accumulation of near-neutral mutations. As all animals age and die, so too does a species by the same mechanism, the accumulation of near-neutral mutations.

The Junk and Duplicated DNA Problem

A major difficulty with neutral theory is the assumption that most or at least much DNA is selectively neutral based on the belief that most DNA is non-functional. As Kimura concluded, neutral theory “asserts that most intraspecific variability … is selectively neutral.”[16] Junk DNA was assumed to be a major source of raw, genetic material that can gradually be modified by genetic drift or mutations to change into a gene that eventually becomes functional. However, the ENCODE project has documented that over 80 percent of all so-called junk DNA is actually functional,[17] thus creating a major problem for neutral theory.

Biology Professor Nathan Lents admits, even if only one mutation renders a gene broken, repair “is like a lightning strike… The odds of lightning striking the same place twice are so infinitesimally tiny as to be nonexistent…. it’s exceeding unlikely that a mutation will fix a broken gene because, following the initial damage, the gene will soon rack up additional mutations.”[18] He adds, if over half of our genes are broken, how can we survive as a species?

His answer is that “the majority of these pseudogenes are the result of accidental gene duplications [which] … explains why the disrupting mutations and subsequent death of the gene didn’t have any deleterious effects on the individual.” This is the common explanation for why most mutations do not appear to adversely affect the genome, a view that was falsified by the ENCODE findings. If one specific base change is very unlikely, the probability of massive changes that result in a new gene that proves beneficial in the future is far less likely.

Another hypothesized source of new genes is gene duplication, enabling one gene to continue to carry out the function that it was originally evolved to fulfill, and the other gene to evolve into a new gene that can serve another, new function in the genome.[19] The problems with this view have been well documented.[20] One problem with the duplication theory is that both genes are generally equally susceptible to new mutations, likely damaging both the original and the new gene.

Another problem is, if one gene is duplicated and mutations occur in the original, or the copied gene, it will not be selected until, and unless, the new protein the gene produces is functional and confers some selective advantage to the organism. Until then, if it produces a protein, the protein will often be cut up and the parts recycled. The evolution from junk DNA theory faces the same problem. To solve these problems was one reason why the neutral theory was originally proposed.

The major problem with the neutral theory is the fact that a non-functional gene is not just useless, but worse. If it does not serve some beneficial function in the organism it could adversely affect the organism. The high cost of duplicating and maintaining the gene is one reason why nonfunctional genes are costly for the cell.

Amino acids (Illustra Media)

Another problem is that mutations are not always random. Most occur in hot spots and tend to degenerate toward certain bases, such as thymine, and also toward codes for certain amino acids, namely those produced by six different codons, such as serine (coded for by TCT, TCC, TCA, TCG, AGT and AGC). Both arginine and leucine are also coded for by six codons. The result is that random combinations will code for these amino acids 9.4 percent of the time, while those produced by a single codon, such as tryptophan (coded for by TGG) and methionine (coded for by ATG), will be produced by chance only 1.6 percent of the time.
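Under the simplifying assumption of uniform base usage, these codon fractions are straightforward to reproduce:

```python
# Fraction of random codons coding for a given amino acid, assuming
# all 64 codons are equally likely: n_codons / 64. Six-codon amino
# acids (serine, arginine, leucine) appear ~9.4% of the time; single-
# codon ones (methionine, tryptophan) ~1.6% of the time.

serine = {'TCT', 'TCC', 'TCA', 'TCG', 'AGT', 'AGC'}   # standard code

print(round(100 * len(serine) / 64, 1))   # 9.4
print(round(100 * 1 / 64, 1))             # 1.6
```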

Gene Regulation

To be functional, a gene requires the proper transcription factors and other regulation and control systems. A gene that has evolved by neutral theory, even if it could produce a useful product, is useless until it has the proper regulatory and control mechanisms, including the spliceosome system required to remove introns. Control of both up- and down-regulation of all genes is also critical for cell and organism survival.

This is illustrated by the transposition of a gene to somewhere else in the genome, such as next to a gene that, as a result, is improperly regulated. An example is a housekeeping gene that is transposed next to a gene which causes up-regulation of cell division.[21] Also, transposition of a gene next to a regulatory sequence that is constitutively expressed can cause that gene to be over-expressed, resulting in cancer or other problems.[22]

The DNA repair systems also work against genetic drift. It is now well-documented that “DNA is an alarmingly fragile molecule…. vulnerable to UV light and mutagenic chemicals, as well as spontaneous decay. Life has survived through the ages because enzymes inside every cell ensure that DNA remains in proper working order.”[23] Critical to this survival are the dozen or so DNA repair mechanisms that resist genetic drift, thus working against neutral evolution theory.

Gene transcription is tightly regulated by enzymes and repair mechanisms (Illustra Media)

The mechanism that repairs DNA to ensure that the molecule is very stable repairs most genetic-drift changes in spite of the fact that without this repair system “under normal conditions, DNA quickly suffers enough damage to make life impossible.”[24]

The DNA repair system is highly effective except in cells that have accumulated a large amount of DNA damage, such as cancer cells. Cancer is often due to mutations of key parts of the repair system, such as p53, the so-called “guardian of the genome.” Cells that can no longer effectively repair DNA damage enter one of three possible states: 1) an irreversible state of dormancy known as senescence, 2) cell suicide known as apoptosis or programmed cell death, or 3) unregulated cell division, which can lead to a cancerous tumor. None of these conditions permits the genetic drift that allows for neutral evolution.

The Molecular Clock Problem

The main factors that motivated the neutral theory proposal include two observations that created problems for Neo-Darwinism. One was the so-called evolutionary genetic clock that was based on base substitutions of amino acids that resulted from DNA changes.[25] Functioning of this clock requires a fairly consistent rate of change in most organisms.

In large populations, if mutation rates are roughly the same for most genes, then simple, random models will predict a molecular clock.[26] Because both of these considerations are erroneous, the molecular clock is not consistent. The major problem is that the genetic clock

makes no sense in Darwin’s world, where molecules subject to strong selection should evolve faster than others, and where organisms exposed to different changes and challenges from the environment should vary their evolutionary rates accordingly.[27]

Gould acknowledged that the “molecular clock is neither as consistent nor as regular as Kimura once hoped.”[28]

The molecular clock hypothesis depends on a constant rate of change.

The refutation of the molecular clock was only one of several major blows to neutral theory.[29] Kimura referred to the discovery that high levels of variation are maintained by many genes in the population. The problem for neutral theory was too much variation in genetic changes

poses a problem for conventional Darwinism because a cost can be associated with the replacement of an ancestral gene by a new and more advantageous state of the same gene—namely, the differential death, by natural selection, of the new disfavored parental forms. This cost poses no problem if only a few old genes are being pushed out of a population at any time.[30]

Furthermore, “if hundreds of genes are being eliminated” by natural selection because they are deleterious, then any one organism likely possesses many of the deleterious mutant genes, impairing its survival chances. Consequently,

the data on copious variability seemed to indicate a caldron of evolutionary activity at far too many genetic sites—too many, that is, if selection governs the changes in each varying gene. Kimura, however, recognized a simple and elegant way out of this paradox. If most of the varying forms of a gene are neutral with respect to selection, then they are drifting in frequency by the luck of the draw, invisible to natural selection because they make no difference to the organism.[31]

Today, the term neutral theory is often defined narrowly in terms of the result of sampling disparities, although this narrow definition is problematic.

Neutral Theory Conflicts with Darwinism

Kimura’s conception of neutral theory obviously posed serious problems for Darwinism. To avoid the problem of directly challenging Darwinism, which could produce enormous opposition to his theory, Kimura does not openly deny it, but rather views the Darwinian “processes as quantitatively insignificant to the total picture—a superficial and minor ripple upon the ocean of neutral molecular change, imposed every now and again when selection casts a stone upon the waters of evolution.”[32] Conversely, orthodox Darwinians, “tended to argue that neutral change occupied a tiny and insignificant corner of evolution—an odd process occasionally operating in small populations at the brink of extinction anyway.”[33]


A major evolutionary problem that neutral theory attempts to address is, ever since

Darwin proposed his theory of natural selection to explain evolution, most evolutionary theories have always been a matter of debate and controversy. The neutral theory was not an exception.[34]

Genetics research has progressed well beyond that of Kimura’s day, refuting neutral theory. Furthermore, because “the neutral theory is quantitative, it is able to make testable predictions.”[35] As the late Cornell Professor William Provine and others have documented, the testable predictions of neutral theory, especially random drift, have largely failed.[36] The evidence against neutral theory is now overwhelming, and as a result the theory has been relegated to the dustbin of history. As Alvarez-Valin noted, the predictions of neutral theory flatly do not agree with many of the scientific facts.[37]

[1] Behe, Michael. 2019. Darwin Devolves: New Science About DNA that Challenges Evolution. New York, NY: HarperOne.

[2] Castle, William E. 1916. Genetics and Eugenics. A Textbook for Students of Biology. Cambridge, MA: Harvard University Press, p. 4.

[3] Tomkins, Jeffrey and Jerry Bergman. 2017. Neutral model, genetic drift and the third way—A synopsis of the self-inflicted demise of the evolutionary paradigm. Journal of Creation. 31(3):94–102

[4]Duret, Laurent. 2008. Neutral theory: The null hypothesis of molecular evolution. Nature Education. 1(1):218.

[5] Alvarez-Valin, F. 2002. Neutral theory. Encyclopedia of Evolution. New York, NY: Oxford University Press, pp. 815–821 Behe, 2019, p. 99.

[8] Kimura, M. 1979. “The Neutral Theory of Molecular Evolution.” Scientific American. November, 241:98-129.

[10] Coyne, Jerry. 2015. Faith vs Fact: Why Science and Religion Are Incompatible. New York, NY: Viking, pp. 139–140.

[11]Kimura, M. 1991. Recent development of the neutral theory viewed from the Wrightian tradition of theoretical population genetics. Proceedings of the National Academy of Science. 88:5969–5973, p. 367. Kimura, M. 1991. The neutral theory of molecular evolution: a review of recent evidence. Japanese Journal of Genetics. 6(4):367-386.

[13]Noble, D. 2013. Physiology is rocking the foundations of evolutionary biology. Experimental Physiology. 98(8):1235–1243, p. 1235.

[14]Kimura, M. 1983. The Neutral Theory of Molecular Evolution. New York, NY: Cambridge, p. 25.

[17] Luskin, C. (2012, September 5) “Junk No More: ENCODE Project Nature Paper Finds ’Biochemical Functions for 80% of the Genome’.” Evolution News. Retrieved June 7, 2018 from

[18] Lents, Nathan. 2018. Human Errors. Boston, MA: Houghton Mifflin. p. 72.

[20] Bergman, Jerry. 2006. Does gene duplication provide the engine for evolution? Journal of Creation. 20(1):99–104

Not the first time evolutionists have blundered.

[21]Prelich, G. 2012. Gene overexpression: Uses, mechanisms, and interpretation. Genetics. 190:841–854. Retrieved June 7, 2018 from

[23]Stokstad, E. 2015. DNA’s repair tricks win chemistry’s top prize.[not a subtitle only the website’s tease for the article] Science. 350(6258):266, p. 266.

[25] Kimura, M. 1987. Molecular evolutionary clock and the neutral theory. Journal of Molecular Evolution. 26:24–33. Kimura, M. 1968. Evolutionary rate at the molecular level. Nature. 217:624–626.

[26]Gould, S.J. 1989. Through a lens darkly. Natural History. September, pp. 16–24, p. 17.

[29] Tomkins, Jeffrey and Jerry Bergman. 2015. Evolutionary molecular genetic clocks—A perpetual exercise in futility and failure. Journal of Creation. 29(2):26–35.

[36] Provine, William B. 2014. The “Random Genetic Drift” Fallacy. Published by Author.

[37] Alvarez-Valin, 2002, p. 821. Wolf, J.B., E.D. Brodie, III, and M.J. Wade, eds. 2000. Epistasis and the Evolutionary Process. New York, NY: Oxford University Press.

Dr. Jerry Bergman has taught biology, genetics, chemistry, biochemistry, anthropology, geology, and microbiology at several colleges and universities including for over 40 years at Bowling Green State University, Medical College of Ohio where he was a research associate in experimental pathology, and The University of Toledo. He is a graduate of the Medical College of Ohio, Wayne State University in Detroit, the University of Toledo, and Bowling Green State University. He has over 1,300 publications in 12 languages and 40 books and monographs. His books and textbooks that include chapters that he authored, are in over 1,500 college libraries in 27 countries. So far over 80,000 copies of the 40 books and monographs that he has authored or co-authored are in print. For more articles by Dr Bergman, see his Author Profile.

Genetic Diversity

Theories of Molecular Evolution: Selection versus Neutrality

Theoretically, balancing selection could account for protein polymorphism (Gillespie, 1991). In contrast, the neutral theory of molecular evolution (Kimura, 1983) suggests that most of the molecular–genetic diversity within and between species is neutral (i.e., non-selective) or “non-Darwinian.” The neutralist–selectionist debate has been one of the major controversies in evolutionary biology since the late 1960s. How much of the genetic diversity at single- and multilocus structures is adaptive, processed by natural selection and contributing to differences in fitness? The problem of distinguishing between deterministic and stochastic forces in evolution has pervaded evolutionary biology at all levels, genotypic and phenotypic, and is now focused on DNA polymorphisms. I recognize the contribution of the neutral and nearly neutral theories of molecular evolution, primarily in representing a null hypothesis to selection. Nevertheless, by ignoring ecological heterogeneity and stress in evolution, the neutral and nearly neutral theories have stripped genetic diversity from nature. I believe that an in-depth understanding of genetic diversity in nature is intimately linked to the interface between ecology and genetics, hence to ecological genetics and now to ecological genomics. I submit that only this essential interface can meaningfully highlight the dynamic evolution of genetic diversity in nature.


According to the neutral theory of molecular evolution, the rate at which molecular changes accumulate between species should be equal to the rate of neutral mutations and hence relatively constant across species. However, this is a per-generation rate. Since larger organisms have longer generation times, the neutral theory predicts that their rate of molecular evolution should be slower. However, molecular evolutionists found that rates of protein evolution were fairly independent of generation time.
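The per-generation argument can be checked with a toy simulation: in a Wright–Fisher population, a single-copy neutral mutant fixes with probability 1/(2N), so with 2Nμ new mutations per generation the substitution rate works out to μ regardless of N. A minimal sketch (assuming NumPy; the population size, mutation rate and trial count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def neutral_fixation_fraction(N, trials):
    """Fraction of single-copy neutral mutants that fix under
    binomial Wright-Fisher resampling of 2N allele copies."""
    fixed = 0
    for _ in range(trials):
        count = 1                          # new mutation: one copy out of 2N
        while 0 < count < 2 * N:
            count = rng.binomial(2 * N, count / (2 * N))
        fixed += (count == 2 * N)
    return fixed / trials

N, mu = 50, 1e-3
p_fix = neutral_fixation_fraction(N, 20000)
print(p_fix)                 # close to 1/(2N) = 0.01
print(2 * N * mu * p_fix)    # substitutions per generation: close to mu
```

Because the factor of 2N in the mutational input cancels the 1/(2N) fixation probability, the estimated substitution rate stays near μ for any choice of N.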

Noting that population size is generally inversely proportional to generation time, Tomoko Ohta proposed that if most amino acid substitutions are slightly deleterious, this would increase the rate of effectively neutral mutation in small populations, which could offset the effect of long generation times. However, because noncoding DNA substitutions tend to be more neutral, independent of population size, their rate of evolution is correctly predicted to depend on population size / generation time, unlike the rate of non-synonymous changes. [2]

In this case, the faster rate of neutral evolution in proteins expected in small populations (due to a more lenient threshold for purging deleterious mutations) is offset by longer generation times (and vice versa), but in large populations with short generation times, noncoding DNA evolves faster while protein evolution is retarded by selection (which is more significant than drift for large populations). [2] In 1973, Ohta published a short letter in Nature [1] suggesting that a wide variety of molecular evidence supported the theory that most mutation events at the molecular level are slightly deleterious rather than strictly neutral.

Between then and the early 1990s, many studies of molecular evolution used a "shift model" in which the negative effect on the fitness of a population due to deleterious mutations shifts back to an original value when a mutation reaches fixation. In the early 1990s, Ohta developed a "fixed model" that included both beneficial and deleterious mutations, so that no artificial "shift" of overall population fitness was necessary. [2] According to Ohta, however, the nearly neutral theory largely fell out of favor in the late 1980s, because the mathematically simpler neutral theory was preferred for the widespread molecular systematics research that flourished after the advent of rapid DNA sequencing. As more detailed systematics studies started to compare the evolution of genome regions subject to strong selection versus weaker selection in the 1990s, the nearly neutral theory and the interaction between selection and drift have once again become an important focus of research. [3]

The importance of the Neutral Theory in 1968 and 50 years on: A response to Kern and Hahn 2018

A recent article reassessing the Neutral Theory of Molecular Evolution claims that it is no longer as important as is widely believed. The authors argue that “the neutral theory was supported by unreliable theoretical and empirical evidence from the beginning, and that in light of modern, genome-scale data, we can firmly reject its universality.” Claiming that “the neutral theory has been overwhelmingly rejected,” they propose instead that natural selection is the major force shaping both between-species divergence and within-species variation. Although this is probably a minority view, it is important to evaluate such claims carefully in the context of current knowledge, as inaccuracies can sometimes morph into an accepted narrative for those not familiar with the underlying science. We here critically examine and ultimately reject Kern and Hahn's arguments and assessment, and instead propose that it is now abundantly clear that the foundational ideas presented five decades ago by Kimura and Ohta are indeed correct.

The Neutral Theory of Molecular Evolution asserts that most de novo mutations are either sufficiently deleterious in their effects on fitness that they have little chance of becoming fixed in the population, or are under such weak selection that they may become fixed as a result of genetic drift (Kimura 1968, 1983; King and Jukes 1969). Furthermore, the rate of substitution of neutral mutations between species is equal to the mutation rate (Kimura 1968). A critical first extension of this framework involved the inclusion of nearly neutral mutations, along with the recognition that the proportion of the genome represented by selectively constrained sites (where mutations have low probabilities of fixation by drift) depends on the effective population size of the species or genomic region (Ohta 1973). While drifting to fixation or loss, neutral and nearly neutral mutations contribute to DNA sequence variation within populations. The Neutral Theory further hypothesizes that advantageous mutations are sufficiently rare, compared to the constant input of neutral and deleterious variants, that they should be rarely present in samples of segregating variation, especially because of their rapid spread to fixation.

These ideas greatly changed the thinking of evolutionary biologists. Genetic drift was taken much more seriously than previously, stimulating a large body of fruitful empirical research into molecular evolution and variation, as well as fundamental advances in the stochastic theory of evolution, summarized in Kimura's influential book (Kimura 1983 ). It is now difficult to appreciate how radical a departure this view of evolution represented: in the 1950s and 1960s, almost all evolutionary changes were attributed to directional natural selection, and most polymorphisms with alleles at intermediate frequencies were thought to be maintained by balancing selection (e.g., Ford 1975 ). Despite his pioneering contributions to stochastic population genetic theory, Fisher famously rejected any significant evolutionary role for genetic drift (Fisher 1930 ), though it is notable that Wright had simultaneously developed a deep appreciation for the importance of these stochastic effects that was later justified when molecular variants began to be studied (Wright 1931 ).

It is against this historical backdrop that Kern and Hahn (2018) discuss a purported controversy in population genetics concerning the predictive power and applicability of the Neutral Theory, beginning with the suggestion that “the ubiquity of adaptive variation both within and between species means that a more comprehensive theory of molecular evolution must be sought.” Although those who initially developed the Neutral Theory did not claim that all sequence changes are neutral—indeed, Kimura himself developed some of the most fundamental theoretical formulations of selection and its interactions with genetic drift—Kern and Hahn (2018) argue that modern data have demolished the original evidence supporting the Neutral Theory. This is not a new claim. For example, Gillespie criticized some of the original arguments in favor of neutrality (e.g., Gillespie 1991), and nearly identical views were expressed in Hahn (2008). The novelty of the arguments of Kern and Hahn (2018) mainly lies in their emphasis on the effects of selection at linked sites on patterns of variation within genomes. Accordingly, we focus primarily on this aspect of their paper. As will become clear, a major problem with Kern and Hahn's views arises from their narrow definition of the Neutral Theory, which they summarize as follows: “differences between species are due to neutral substitutions (not adaptive evolution), and (….) polymorphisms within species are not only neutral but also have dynamics dominated by mutation-drift equilibrium.”

To support this narrow view, Kern and Hahn argue for pervasive effects of selection, relying heavily on a small number of population-genomic studies suggesting that as many as 50% of amino-acid replacement substitutions in Drosophila are adaptive (see, for example, the review by Sella et al. 2009), which they claim contradicts Kimura's (1968, 1983) and King and Jukes’ (1969) assertion that most such substitutions are caused by genetic drift. Apart from the inherent uncertainty in these estimates (discussed by Fay 2011), it is misleading to use them to make the general claim that the Neutral Theory is insufficient to explain genome-wide patterns of variation and evolution; these inferred frequencies of adaptive substitutions mostly concern only the small fraction of the genome that codes for proteins (e.g., <2% of the human genome; see Lander et al. 2001). Kern and Hahn further overstate the pervasiveness of adaptive substitutions by highlighting studies in humans and plants that focus on the limited subset of genes that evolve rapidly. The circularity involved in ignoring the vast majority of neutral or nearly neutral substitutions across the genome, and then rejecting a significant role for neutrality, hardly justifies the need for the “selection theory of molecular evolution” advocated by Hahn (2008).

Second, with regard to the effects of selection on linked neutral or nearly neutral sites, Kern and Hahn (2018) emphasize the well-established positive correlation between recombination rates and levels of variation that has been observed in several species (Cutter and Payseur 2013). They begin with the very strong assertion that “these results imply that almost no loci are free from the effects of selection, in any organism.” This broad claim is unjustified, given that there are relatively few species for which such data are available. Although this correlation (first documented in Drosophila melanogaster by Begun and Aquadro 1992) indeed suggests that selection reduces neutral variation at linked sites through the process of hitchhiking, the mutagenic effects of recombination itself may also contribute to this pattern (Pratto et al. 2014; Arbeithuber et al. 2015). Hitchhiking can involve both selective sweeps caused by the spread of favorable mutations (Maynard Smith and Haigh 1974), and the removal of neutral variants closely linked to deleterious mutations—background selection (Charlesworth et al. 1993; Charlesworth 2012). In an explicit comparison between models of widespread purifying selection on weakly deleterious alleles versus recurrent positive selection on beneficial alleles, Lohmueller et al. (2011) found a much better fit of the former to the observed pattern in humans (see also Pouyet et al. 2018), as did Comeron (2014) for Drosophila.

Importantly, observations from eukaryotic genomes, including humans and mice, show that levels of polymorphism are low in the neighborhood of coding or conserved noncoding sequences and increase approximately monotonically away from them (Cutter and Payseur 2013; Johri et al. 2017; Lynch et al. 2017). While selective sweeps may contribute to this pattern, and are indeed required to explain other observations (Campos et al. 2017), these findings imply that any selective sweeps involved must have rather local effects. Despite these results, Kern and Hahn (2018) emphasize studies that invoke pervasive positive selection to explain genome-wide patterns of variation (e.g., Garud et al. 2015; Schrider and Kern 2017). However, these claimed effects must be evaluated with caution owing to their failure to exclude or take proper account of the effects of the (unknown) non-equilibrium demographic histories of the populations in question.

Regardless of the precise interplay of the two forms of hitchhiking, background selection and selective sweeps, in shaping patterns of variation, it is important to note that neither affects the probability of fixation of neutral mutations (Birky and Walsh 1988), which determines the rate of neutral sequence evolution. Both models are based on strong evidence that the vast majority of segregating variation is neutral or nearly neutral, and neither model contradicts the evidence that the vast majority of fixed differences between populations and species are also neutral or nearly neutral. Furthermore, both background selection and selective sweeps may be viewed as reducing the effective population size (Ne) of affected genomic regions, at least as a first approximation (see Charlesworth 2009). As shown by Kimura and Ohta (Kimura and Ohta 1971; Ohta 1973; Kimura 1983), a reduction in Ne causes the fixation probabilities of mutations with selective effects to be closer to those of neutral mutations, such that the rate of fixation of beneficial mutations is reduced, and the rate of fixation of deleterious mutations is increased—thereby increasing the fraction of mutations that behave as effectively neutral. Thus, these hitchhiking effects only further emphasize the fundamental evolutionary role of genetic drift. Although the earliest formulations of the Neutral Theory focused on the dynamics of individual loci, and the effects of selection in reducing the Ne values at linked loci were not studied, we could not have understood these patterns without the contributions of Kimura and Ohta. It is simply a misunderstanding of the role of theoretical models in illuminating the interpretation of data to claim, as do Kern and Hahn (2018), that hitchhiking effects imply that levels of polymorphism are not at mutation-drift equilibrium, and “therefore, current data appear to be fundamentally incompatible with the neutral theory.”
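
The quantitative claim here, that shrinking Ne pushes the fixation probabilities of selected mutations toward the neutral value, can be illustrated with the standard diffusion approximation for fixation probability (Kimura 1962). A sketch (assuming NumPy; the selection coefficient and population sizes below are arbitrary illustrative values):

```python
import numpy as np

def p_fix(N, s, p=None):
    """Diffusion approximation for the fixation probability of an allele
    with additive selective effect s in a diploid population of effective
    size N, starting from frequency p (default: a single new copy)."""
    if p is None:
        p = 1 / (2 * N)
    if s == 0:
        return p
    return (1 - np.exp(-4 * N * s * p)) / (1 - np.exp(-4 * N * s))

s = -1e-4                                 # slightly deleterious mutation
for N in (100, 1_000, 10_000, 100_000):
    ratio = p_fix(N, s) / (1 / (2 * N))   # relative to the neutral value 1/(2N)
    print(N, round(float(ratio), 4))
```

While |4Ns| is well below 1 the mutation behaves almost neutrally (ratio near 1), whereas at N = 100,000 the same mutation essentially never fixes; this is the sense in which a reduced Ne increases the fixation rate of deleterious mutations.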

A large fraction of the genome of organisms studied to date is subject to mutations that are effectively neutral with respect to their fitness effects, and hence evolve under genetic drift.

The great majority of newly arising mutations that do affect fitness (i.e., non-neutral mutations) are deleterious, and the predominant mode of natural selection is purifying in nature, removing these deleterious mutations from populations.

Natural populations are rarely at demographic equilibrium, and commonly have undergone recent historical changes. The combined effects of population size changes, structure, and migration all shape patterns of within-species variation. These demographic histories cannot be assumed to affect patterns of variation uniformly across the genome, and indeed may produce different effects in different genomic regions, mimicking expectations under selection (e.g., Wall et al. 2002; Thornton and Jensen 2007).

A combination of genetic drift (as modulated by the demographic history of the population) with both direct and linked purifying selection shapes patterns of genomic variation. Thus, a model taking joint account of all of these effects is essential for genomic analysis (Comeron 2017 ), and progress is being made towards this goal (e.g., Zeng and Charlesworth 2010 ).

Beneficial mutations occasionally arise and some may reach fixation or high frequencies, and localized hitchhiking effects related to such events have been convincingly described in a variety of organisms. In some cases, these genotypic changes have been meaningfully connected with both phenotype and fitness. However, the effects of these comparatively rare, localized positive selection events are best characterized and quantified as additional to the genome-wide processes described above (Stephan 2010 ). In the absence of an appropriate null model accounting for these processes that are common to the genome as a whole, inappropriate adaptive story-telling will be likely to proliferate.

All five points are fully consistent with the ground-breaking work of Kimura and Ohta. Furthermore, developments made in the light of empirical observations subsequent to Kimura's initial publication are straightforward extensions of the Neutral Theory. They demonstrate its continued importance, rather than demolishing it. Over the past five decades, such insights have enhanced our understanding of the interplay of population size with drift-selection dynamics (Ohta 1973 ), and described the hitchhiking effects of selection induced by the comparatively rare class of beneficial mutations (Maynard Smith and Haigh 1974 ), as well as those caused by the much more common class of deleterious mutations (Charlesworth et al. 1993 ). This framework has also served as an organizing principle for understanding patterns of variation in genome architecture (Lynch 2007 ), and for understanding the evolution of cellular features, including the mutation rate itself (Lynch et al. 2016 ).

Thus, our use of the term “ground-breaking” to describe the Neutral Theory is not meant to imply a scientific advance that was fully formed at the outset. Like other major scientific advances, the Neutral Theory has been adjusted and modified over time in light of later observations and thought, yet retains its value. For example, Darwin's findings and reasoning supporting the operation of natural selection were not abandoned owing to his lack of a satisfactory theory of heredity—indeed, the incorporation of that subsequent knowledge only strengthened the underlying concepts (Fisher 1930). Similarly, the Neutral Theory should not be dismissed because of the lack of emphasis on the effects of selection at linked sites in its initial formulation, as subsequent studies have only served to emphasize the fundamental role of near neutrality and genetic drift in shaping the variation observed within and between species. Indeed, Ohta and Kimura were among the first to study such effects, in their analysis of the apparent overdominance at neutral sites induced by linkage to sites subject to heterozygote advantage or selection against deleterious mutations (Ohta and Kimura 1970; Ohta 1971).

In sum, the transition to molecular biology has increased the importance of population genetics for our understanding of evolution. Moreover, instead of unraveling the prior theoretical framework, the influx of molecular data has lent support to many pre-genomic theoretical developments. Although the edifice may not yet be complete, the Neutral Theory changed how people thought about evolution at the molecular level, and this framework appropriately continues to serve as the basis of modern evolutionary genomics. Thus, great credit is owed to the scientists who worked this theory out in detail and anticipated much of what it could tell us once genes (and genomes) could be sequenced.

The fixation probability of beneficial mutations

The fixation probability, the probability that the frequency of a particular allele in a population will ultimately reach unity, is one of the cornerstones of population genetics. In this review, we give a brief historical overview of mathematical approaches used to estimate the fixation probability of beneficial alleles. We then focus on more recent work that has relaxed some of the key assumptions in these early papers, providing estimates that have wider applicability to both natural and laboratory settings. In the final section, we address the possibility of future work that might bridge the gap between theoretical results to date and results that might realistically be applied to the experimental evolution of microbial populations. Our aim is to highlight the concrete, testable predictions that have arisen from the theoretical literature, with the intention of further motivating the invaluable interplay between theory and experiment.

1. Introduction

Mathematical population genetics is a field with an extremely rich historical literature. The first questions about gene frequency distributions were posed in analytical form by Fisher; independent studies were conducted by Wright and Haldane. Fisher, Haldane and Wright together shaped the foundations of the field and are referred to as the ‘great trinity’ (Crow 1994) of population genetics. The works of these authors (Fisher 1922, 1930; Haldane 1927; Wright 1931) are now considered to be the classic papers in the field.

One of the central ideas addressed by these authors is the fixation probability: the probability that the frequency of a particular allele in a population will ultimately reach 100 per cent. Mathematically, there are several approaches to computing fixation probabilities, and interest in this problem has been sustained for almost a century: the first papers were written in the early 1920s, and there have been important advances in every decade since. Empirically, the fixation probability is necessary in order to estimate the rate at which a population might adapt to a changing environment, the rate of loss of genetic diversity or the rate of emergence of drug resistance.

The last several years have seen two key advances in this field. First, a number of important, and fascinating, theoretical advances have been made, each bringing us one step closer to theoretical predictions that might pertain in a ‘real’ laboratory population. Second, in parallel with this effort, experimental techniques in microbial evolution have advanced to the point where the fate of a novel mutant strain within a controlled population can be followed over many generations. Thus, these experiments are on the verge of being able to test our theoretical predictions of the fixation probability—predictions that have in many cases stood untested for 80 or 90 years. This is extremely exciting.

Although neutral and deleterious mutations may also reach fixation in finite populations, in the following review we will restrict our attention to beneficial mutations. The selective advantage, s, of a beneficial mutation is typically defined for haploids as follows: if each wild-type individual has on average W offspring per generation, each mutant individual has on average W(1+s) offspring. Throughout this review we will assume that this definition of s holds, unless stated otherwise. For simplicity, for diploid individuals we will use s to denote the advantage of the heterozygote, although the notation hs is also typically used.

In a deterministic model, an initially rare beneficial mutation will increase in frequency in each generation, and fixation is certain. In reality, however, the frequency of any particular lineage fluctuates over time. These fluctuations, ‘genetic drift’, are very likely to cause the extinction of a beneficial lineage when its frequency is low, and require a stochastic treatment. Once the frequency of the mutant is sufficiently large, further increases are well approximated by a deterministic model. Estimating the fixation probability for a beneficial mutation is thus usually equivalent to estimating the probability that the mutation survives genetic drift when initially rare.

The underlying distribution of s, i.e. the distribution of selective effects for all possible beneficial mutations, is a topic of current interest, both theoretically and experimentally. Although beyond the scope of this review, we refer the interested reader to several recent papers (Rozen et al. 2002; Orr 2003; Rokyta et al. 2005; Kassen & Bataillon 2006). A closely related, or even overlapping, issue is adaptation: the rate of fitness increase or overall rate at which beneficial mutations arise and become fixed. While fixation probabilities are essential building blocks in the models of adaptation, such models also require further assumptions, such as an underlying distribution of selective effects or a model for combining the effects of multiple mutations. Estimating the rate of adaptation has a rich literature in its own right, and again we refer the interested reader to a few key references (Orr 1994, 2000; Wilke 2004; Desai & Fisher 2007; Goncalves et al. 2007). We touch on this issue again in §5.3.

2. Historical overview

Broadly speaking, there are three approaches to computing fixation probabilities. When the state space of a population (exactly how many individuals have exactly which genotype) can be enumerated, a Markov chain approach can determine the fixation probability exactly. This approach is nicely outlined for the non-specialist reader by Gale (1990), and is typically feasible only when the population size is quite small (but see Parsons & Quince 2007a,b, discussed in §3.3). When the population size is large, methods based on discrete branching processes are often used. These methods build on the ‘Haldane–Fisher’ model (Fisher 1922, 1930; Haldane 1927, 1932), which is itself based on a Galton–Watson branching process. We note that any branching process approach provides an approximation to the true fixation probability, as it assumes that the wild-type population is sufficiently large that the fate of each mutant allele is independent of all others. This approach has been widely, and successfully, applied to a number of interesting recent questions regarding the fixation probability (Athreya 1992; Haccou & Iwasa 1996; Lange & Fan 1997; Otto & Whitlock 1997; Wahl & Gerrish 2001; Johnson & Gerrish 2002; De Oliveira & Campos 2004; Wahl & DeHaan 2004; Champagnat & Lambert 2007). Finally, when the population is large and the change in gene frequency is small in each generation (i.e. selection is weak), methods that incorporate a diffusion approximation may be used. These approaches follow from the pioneering ‘Wright–Fisher–Kimura’ model (Fisher 1922, 1930; Wright 1931, 1945; Kimura 1957, 1962), and are also in wide use today (Yamazaki 1977; Wahl & Gerrish 2001; Gavrilets & Gibson 2002; Whitlock 2003). Significant effort has also been made towards unifying or reconciling the discrete and continuous approaches (Kimura & Ohta 1970; Otto & Whitlock 1997; Wahl & Gerrish 2001; Lambert 2006). We will discuss many of these recent papers in turn in the sections to follow.

The most widely known result regarding the fixation probability is Haldane's celebrated approximation, obtained for weak selection using a discrete-time branching process. Haldane (1927) demonstrated that the probability of ultimate fixation, π, of an advantageous allele is given by π≈2s, when the allele is initially present as a single copy in a large population.

Haldane's elegant result necessarily relies on a number of simplifying assumptions. The population size is large and constant, generations are discrete and the number of offspring that each individual contributes to the next generation is Poisson distributed. This last simplification masks an assumption on which the fixation probability critically depends: individuals in such a branching process cannot die before having offspring. In effect, individuals die in such models only by having zero offspring. But since the probability of having zero offspring is completely determined by the mean of the Poisson distribution, there is no room in Haldane's approach to independently specify a survival probability. This will become important as we review some recent work that relaxes this assumption.
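
The Poisson case Haldane analysed can be made concrete. The extinction probability q of a lineage founded by one mutant satisfies q = exp((1+s)(q−1)), the fixed point of the Poisson(1+s) probability generating function, and the fixation probability is π = 1 − q. A short sketch (plain Python; the iteration count is simply chosen to be safely past convergence):

```python
import math

def fixation_prob_poisson(s, iters=10_000):
    """pi = 1 - q, where q is the smallest root of q = exp((1+s)*(q-1)),
    found by iterating the Poisson(1+s) probability generating function
    upward from q = 0."""
    q = 0.0
    for _ in range(iters):
        q = math.exp((1 + s) * (q - 1))
    return 1 - q

for s in (0.01, 0.05, 0.1):
    print(s, round(fixation_prob_poisson(s), 4), 2 * s)  # exact vs Haldane's 2s
```

For s = 0.01 the branching-process value is about 0.0197, slightly below 2s, and the 2s approximation degrades as s grows; notice that the whole calculation is pinned down by the Poisson mean, which is exactly the lack of freedom in the offspring distribution described above.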

This work by Haldane, as well as Wright (1931) and Fisher (1930), was later generalized in a number of different directions, most notably by Kimura (Kimura 1957, 1962, 1964, 1970; Kimura & Ohta 1970). Kimura's approach was to use a diffusion approximation to model small changes, over many generations, in the frequency of a particular allele. To understand Kimura's foundational result, we must briefly introduce Ne, the variance effective population size. If we imagine a diploid population in which, for example, mating is not random or the sex ratio is not 1 : 1, these effects may change the variance in the number of offspring alleles per parental allele. Ne is then the size of an ‘ideal’ population—a large population of constant size, in which mating is random and we have equal numbers of males and females—that would give the same variance as the real population in question. Kimura's most widely known result is that the probability of ultimate fixation, π, of an allele with an initial frequency p and an additive selective effect s is

π = (1 − e^(−4Ne s p)) / (1 − e^(−4Ne s)).    (2.1)

For large diploid populations, equation (2.1) implies that the fixation probability for a new mutation that arises as a single copy decreases with larger effective population sizes. However, the decay of this function is extremely rapid; for example, for s=0.01, a population size of 100 is already sufficient that the denominator is approximately 1. For all but extremely small populations or nearly neutral mutations, we then find that π≈2sNe/N for a mutation occurring as a single copy. Thus, π depends on the ratio of effective population size to census size. It is also clear that when Ne=N, we obtain Haldane's approximation π≈2s for weak selection (Haldane 1927). By contrast, the fixation probability for an allele that is present at a given frequency increases with population size. (Note, however, that a single copy of an allele corresponds to a smaller frequency in a larger population, and thus π≈2s still holds.)
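
The numerical claims in this paragraph are easy to verify against the diffusion formula π = (1 − e^(−4Ne·s·p))/(1 − e^(−4Ne·s)), a standard form of Kimura's result; the numbers below simply reproduce the s = 0.01, N = 100 example:

```python
import math

def kimura_pi(Ne, s, p):
    """Fixation probability under the diffusion approximation, for an
    allele at initial frequency p with additive selective effect s."""
    return (1 - math.exp(-4 * Ne * s * p)) / (1 - math.exp(-4 * Ne * s))

s, N = 0.01, 100
print(1 - math.exp(-4 * N * s))           # denominator: already ~0.98 at N = 100
print(kimura_pi(N, s, p=1 / (2 * N)))     # ~2s = 0.02 when Ne = N
```

Both stated properties come out as claimed: the denominator is within a few per cent of 1 at N = 100, and the single-copy fixation probability with Ne = N lands on Haldane's π ≈ 2s.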

A final note on the approximation π≈2sNe/N is that s reflects the selective advantage of the beneficial allele, while Ne is most often inversely proportional to the variance in offspring number. This foreshadows the important work of Gillespie (1974, 1975), who predicted that the ratio of the mean to the variance in offspring number is critical in determining both the long-term effects of selection on a beneficial allele and the fixation probability. This idea, particularly as applied to long-term selective effects, has been expanded in a number of elegant recent papers (Proulx 2000; Lande 2007; Orr 2007; Shpak & Proulx 2007).

Much progress has been made since the work of Kimura and the great trinity. As we will review in the following sections, the fixation probability has now been estimated in populations of fluctuating size, for populations whose size cycles among a set of constant values and, more recently, fluctuates according to a density-dependent birth–death process. Populations experiencing exponential or logistic growth or decline have been treated, as have populations that are subject to sustained growth periods followed by a population bottleneck—a sudden reduction in population size. A large body of work treats populations subdivided into demes, most recently including heterogeneous selection among demes and asymmetrical migration. Recent work has also addressed multiple segregating alleles, specifically treating quasi-species interactions and clonal interference, as described in the sections to follow.

3. Populations of changing size

3.1 Growing, declining or cyclic population sizes

Fisher (1930) suggested that the probability of fixation of beneficial alleles would increase in growing populations and decrease in declining populations. Analysis by Kojima & Kelleher (1962) confirmed Fisher's proposition. Fisher's claim was further justified through the theoretical studies of logistically changing populations by Kimura & Ohta (1974).

Ewens (1967) used a discrete multitype branching process to study the survival probability of new mutants in a population that assumes a cyclic sequence of population sizes, as well as a population that initially increases in size and thereafter remains constant. For the former case, Ewens derived an approximation (equation (3.1)) for the probability of fixation of a beneficial mutation.

Ewens' relaxation of the assumption of constant population size was an important step towards generalizing fixation probability models; however, he still maintained the other classic assumptions and only explored two cases of changing population sizes. The approximation in equation (3.1) led Kimura (1970) to a conjecture that equation (2.1) may be used for populations that assume a cyclic sequence of values, with Ne replaced by the harmonic mean of the effective population sizes over the cycle. Otto & Whitlock (1997) later built on the work of Ewens and Kimura by addressing the question of the fixation probability of beneficial mutations in populations modelled by exponential and logistic growth or decline. These authors proved that the conjecture made by Kimura holds true for populations in which the product ks is small, where k is the total number of discrete population sizes.

All the papers mentioned above assume a Poisson distribution of offspring. Although such a distribution may be a good model of reproductive success in many species, some species clearly cannot be modelled well by such a distribution (e.g. bacteria that reproduce by binary fission). Pollak (2000) studied the fixation probability of beneficial mutations in a population that changes cyclically in size, assuming a very general distribution of successful gametes, described by a mean and variance, which are functions of the population size. Assuming that a beneficial mutation first appears in a single heterozygous individual, and that such an individual has 1+s times as many offspring as the wild-type, Pollak proved that the result found for the Poisson-distributed offspring by Ewens (1967) and Otto & Whitlock (1997) still holds: that the fixation probability is approximately proportional to the harmonic mean of the effective population sizes in the cycle and inversely proportional to the population size when the mutation manifests.
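
The harmonic-mean dependence described above can be sketched numerically. The approximation below (fixation probability ≈ 2s × harmonic mean of the cycle ÷ size at the time the mutation arises) is an illustrative reading of these results, treating census sizes as effective sizes:

```python
from statistics import harmonic_mean

def pi_cyclic(sizes, i, s):
    """Illustrative: pi ~ 2*s*H/N_i, with H the harmonic mean of the
    cyclic population sizes and N_i the size when the mutation arises."""
    return 2 * s * harmonic_mean(sizes) / sizes[i]

sizes = [100, 1_000, 10_000]           # the smallest phase dominates H
print(round(harmonic_mean(sizes), 1))  # ~270.3, far below the arithmetic mean
for i, n in enumerate(sizes):
    print(n, pi_cyclic(sizes, i, s=0.02))
```

Because the harmonic mean is dominated by the bottleneck phase, a mutation arising at the high-census point of the cycle has the lowest chance of ultimate fixation, which anticipates the bottleneck results of §3.2.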

3.2 Population bottlenecks

In an attempt to provide estimates of the fixation probability for microbial populations maintained in experimental evolution protocols, Wahl and Gerrish studied the effect of population bottlenecks on fixation. A population bottleneck is a sudden, severe reduction in population size. In experimental evolution, bottlenecks are an inherent feature of the protocol (Lenski et al. 1991; Lenski & Travisano 1994; Bull et al. 1997): the population typically grows for a fixed period of time, and is then sampled randomly such that it is reduced to its initial size. The repetition of this procedure is called ‘serial passaging’.

An important point to note is that at the population bottleneck, each individual—mutant or wild-type—survives with the same probability. Thus the ‘offspring’ distribution of each individual at the bottleneck is the same, for either mutant or wild-type. By contrast, during growth the selective advantage of the mutant is realized. Thus the case of growth between population bottlenecks is not simply a special case of cyclic population sizes.

Wahl & Gerrish (2001) derived the probability that a beneficial mutation is lost due to population bottlenecks. For this derivation they used both a branching process approach (Haldane 1927; Fisher 1930) and a diffusion approximation (Wright 1945; Kimura 1957, 1962). When selection is weak, Wahl and Gerrish demonstrated that the two approaches yield the same approximation for the extinction probability X of a beneficial mutation that occurs at time t between bottlenecks: 1−X≈2srτ e^(−rt). Here s is the selective advantage of the mutant over the wild-type strain, r is the Malthusian growth rate of the wild-type population and τ is the time at which a bottleneck is applied. It was thus found that the fixation probability, π, drops rapidly as t increases, implying that mutations that occur late in the growth phase are unlikely to survive population bottlenecks. Since this model treats only extinction due to bottlenecks, this effect is not due to the large wild-type population size late in the growth phase, but rather to the fact that the beneficial mutant does not have sufficient time to found a lineage large enough to survive the bottleneck. Wahl and Gerrish also defined an effective population size in terms of N0, the population size at the beginning of each growth phase, r and τ; this approximation is independent of the time of occurrence of the mutation as well as its selective advantage.
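The weak-selection approximation 1−X≈2srτ e^(−rt) can be evaluated directly to show how steeply survival declines with the time of occurrence t; the parameter values below are illustrative.

```python
import math

def survival_prob(s, r, tau, t):
    """Weak-selection approximation (after Wahl & Gerrish 2001) for the
    probability that a beneficial mutation arising at time t within a
    growth phase of length tau survives repeated bottlenecks."""
    return 2 * s * r * tau * math.exp(-r * t)

# Mutations arising early in the growth phase are far more likely to
# survive than those arising just before the bottleneck.
early = survival_prob(s=0.02, r=math.log(2), tau=10, t=1)
late = survival_prob(s=0.02, r=math.log(2), tau=10, t=9)
```

With a doubling time of one unit (r = ln 2), a mutation arising eight time units later is over a hundred-fold less likely to survive, even though the population is largest late in the growth phase.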

In 2002, this model was extended to include resource-limited growth (Wahl et al. 2002). Resource limitation was included in order to better model serial passaging protocols for bacterial populations, in which the growth phase is typically limited by a finite resource in the growth medium. For both resource-limited and time-limited growth, mutations occurring in the early stages of a growth phase were more likely to survive. Wahl et al. predicted that although most mutations occur at the end of growth phases, mutations that are ultimately successful occur fairly uniformly throughout the growth phase.

The two papers described above included extinction during bottlenecks, but did not include the effects of genetic drift during the growth phase, i.e. the possibility of extinction of an advantageous mutant lineage between bottlenecks. Heffernan & Wahl (2002) incorporated the latter effect, assuming a Poisson distribution of offspring during the growth phase, and using a method based on the work of Ewens (1967). This model predicted a greater than 25 per cent reduction in the fixation probability for realistic experimental protocols, compared with that predicted by Wahl & Gerrish (2001).

The method presented by Heffernan and Wahl is valid for both large and small values of the selective advantage, s. This was an important extension of previous results, especially given recent reports of large selective advantages in the experimental literature (Bull et al. 2000). When selection is weak and the mutation occurs at the beginning of a growth phase, Heffernan and Wahl derived the approximation π≈s(k−1), where k is the number of generations between bottlenecks. This approximation is analogous to the classic result π≈2s (Haldane 1927) but is increased by a factor of (k−1)/2.

The work discussed in this section considers only the loss of beneficial mutations due to bottlenecks and genetic drift. In reality, rare beneficial mutations in asexual populations may also be lost during the growth phase due to competition between multiple new beneficial alleles (see §5.3) or quasi-species interactions (see §5.2). Most importantly, the papers described above either assume deterministic growth between bottlenecks or discrete generation times with offspring numbers that are Poisson distributed. These are not ideal simplifications for many microbial populations. Thus, the tailored life-history models described in §6 should provide a more accurate approach to these questions, although they have not, as yet, been as fully developed as the papers described here.

3.3 Dynamically changing population sizes

Three intriguing papers addressing population sizes that change dynamically, according to underlying birth and death events, appeared in 2006 and 2007.

Lambert (2006) developed an extension of the Moran (1958) model, assuming that birth events have a constant per capita rate, while death events have a per capita rate that increases with population density. Lambert addressed three model constructions: the first considered independent continuous-state branching processes; the second considered branching processes conditioned to produce a constant population size; and the third included logistic density dependence through a density-dependent death rate.

For the first and second models at a large population limit, Lambert pointed out that the factor 2 in Haldane's result of π≈2s for very small s stems from the assumption that the offspring distribution is Poisson. For near-critical branching processes more generally, π≈2s/σ², where σ² is the variance of the offspring distribution (Haccou et al. 2005). Thus, increased reproductive variance always reduces the fixation probability in such models.
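The role of offspring variance can be checked directly from branching-process generating functions. The sketch below (our construction, not Lambert's model) compares Poisson offspring (variance near 1 at criticality) with geometric offspring at the same mean 1+s (variance near 2); the survival probabilities land near 2s and s respectively, as π≈2s/σ² predicts.

```python
import math

def survival_probability(pgf, max_iter=100000, tol=1e-12):
    """Iterate q <- pgf(q) from q = 0 to find the extinction probability
    of a lineage founded by one individual; survival probability is 1 - q."""
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q

s = 0.01
m = 1 + s  # mean offspring number of the mutant

# Poisson offspring: variance ~ 1 near criticality, so pi ~ 2s.
pi_poisson = survival_probability(lambda q: math.exp(m * (q - 1)))
# Geometric offspring at the same mean: variance ~ 2, so pi ~ s.
pi_geometric = survival_probability(lambda q: 1.0 / (1.0 + m * (1.0 - q)))
```

Doubling the offspring variance at fixed mean roughly halves the survival probability, illustrating why life-history detail, and not just mean fitness, matters.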

For the third model, density dependence results in an upper asymptotic limit on the ‘invasibility coefficient’, that is, the rate at which the selective advantage of the mutant increases the fixation probability. Consequently, Lambert found that Haldane's classic approximation (π≈2s) and Kimura's diffusion approximation (equation (2.1)) tend to underestimate the fixation probability of beneficial mutations in growing populations and overestimate it in declining populations. This result is consistent with those of Parsons & Quince (2007a,b), described below, as well as the classic predictions of Fisher (1930), Kojima & Kelleher (1962) and Kimura & Ohta (1974).

Ultimately, Lambert derived a concise expression for the fixation probability, which holds for all three models. The limitation of this approach is that it holds only when the selective advantage of the beneficial mutation is small, such that higher order terms in s are negligible.

Parsons & Quince (2007a) introduced stochastic population sizes in a similar way. In contrast to the work of Lambert, Parsons and Quince considered density-dependent birth rates and density-independent death rates. Another key difference is that Parsons and Quince did not assume that selection is weak. In particular, they argued based on their results that the parameter space over which the assumptions in Lambert (2006) are valid may in fact be quite limited.

In the first case considered (the ‘non-neutral case’), the carrying capacities of the mutant and wild-type are not equal. For advantageous mutants, Parsons and Quince found that stochastic fluctuations in the wild-type population do not affect the fixation probability. On the other hand, for deleterious mutants, the fixation probability is proportional to the fluctuation size of the wild-type population, but relatively insensitive to initial density.

In a second paper, Parsons & Quince (2007b) investigated the ‘quasi-neutral’ case: the carrying capacities of mutant and wild-type are identical, but the birth and death rates are different. Since the carrying capacities are determined by a ratio of the birth and death rates, this implies a life-history trade-off between these parameters. Parsons and Quince used a diffusion approximation to determine the fixation probability when the carrying capacity is large. The authors predicted an increase in fixation probability for the type with a higher birth rate in growing populations and a reduction in a shrinking population. When the population is at carrying capacity initially, the type with a higher birth rate has larger fluctuations in population size and thus a reduced fixation probability.

A shared feature of the approaches described in this section is that beneficial mutations can affect more than one life-history parameter or ‘demographic trait’. Both models predict that the fixation probability depends on this mechanism of the selective advantage. This work is thus closely related to the more detailed life-history models described in §6 to follow.

4. Subdivided populations

Pollak (1966) was the first to address the question of the fixation probability (π) in a subdivided population. Pollak considered a situation in which K subpopulations occupy their respective habitats, with the possibility of migration between subpopulations. A branching process approach was used to deduce that for symmetric migration, π in a subdivided population is the same as that in a non-subdivided population. Later, for the case of symmetric migration, Maruyama (1970, 1974, 1977) used the Moran model with a diffusion approach to show that a similar result holds.
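The Pollak–Maruyama result for symmetric migration can be probed with a quick Monte Carlo sketch (ours, not their analysis): a haploid Wright–Fisher population split into two demes with symmetric migration, where the estimated fixation probability of a single beneficial copy should fall near the panmictic value of roughly 2s (about 0.18 for the illustrative parameters below).

```python
import random

def fix_prob_two_demes(N=50, s=0.1, m=0.05, reps=1000, seed=1):
    """Monte Carlo estimate of the fixation probability of one beneficial
    mutant copy in two demes of size N with symmetric migration rate m.
    Haploid Wright-Fisher sampling with viability selection in each deme."""
    random.seed(seed)
    fixed = 0
    for _ in range(reps):
        counts = [1, 0]                      # one mutant copy in deme 0
        while 0 < sum(counts) < 2 * N:
            f = [c / N for c in counts]
            # symmetric migration mixes the two deme frequencies
            eff = [(1 - m) * f[0] + m * f[1],
                   (1 - m) * f[1] + m * f[0]]
            counts = []
            for freq in eff:
                p = freq * (1 + s) / (freq * (1 + s) + (1 - freq))
                counts.append(sum(1 for _ in range(N)
                                  if random.random() < p))
        fixed += (sum(counts) == 2 * N)
    return fixed / reps

estimate = fix_prob_two_demes()
```

The estimate is statistically indistinguishable from the unstructured expectation; making migration asymmetric or the demes unequal in size is the natural way to explore the later results in this section.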

Populations structured into discrete demes were also studied by Lande (1979) and Slatkin (1981) among others. Lande (1979) demonstrated the elegant result that if a population is subdivided into demes, the net rate of evolution is the same as the rate of evolution in a single deme, where the rate of evolution is given by the probability of fixation of a single mutant multiplied by the number of mutations per generation in one deme. This result relies on the assumption that a mutation fixed in one deme can spread through the whole population only by random extinction and colonization. Slatkin (1981) then showed that for a given pressure of selection in each local population, the fixation probability of a mutant allele is bounded below by the appropriate fixation probability in an unstructured population of the same total size and above by the fixation probability obtained by assuming independent fixation in each deme. Slatkin found that the fixation probability is higher in the low-migration limit than in the high-migration limit when a heterozygote mutant has a fitness that is less than the arithmetic mean fitness of the two homozygote states (underdominance). The reverse was found to be true when the heterozygote was more fit than the average homozygote fitness (overdominance). This stands to reason: high migration increases the fixation probability in the overdominant case and decreases the fixation probability in the underdominant case.

Barton & Rouhani (1991) further investigated the fixation probability in a subdivided population, exploring the limiting case when migration is much larger than selection, so that the difference in gene frequency between adjacent demes is very small. In their two-deme model, π was greatly reduced by migration. This observation, however, did not extend to a large array of demes. Clarifying Slatkin's prediction that underdominance reduces the fixation probability, Barton and Rouhani showed that the chance of fixation is considerable despite free gene flow and moderate selection against heterozygotes, as long as the neighbourhood is small and the homozygote has a substantial advantage.

In contrast to Lande's result, Barton and Rouhani concluded that even though the fixation probability for any one mutation may be very low, the overall rate of fixation of any particular novel allele may be very high. This is because mutations can arise in any of a very large number of individuals; any mutation that is fixed in a large enough area has a high probability of spreading through the entire population.

Like previous models, Barton and Rouhani assumed that migration is symmetric. Relaxing this assumption, Tachida & Iizuka (1991) considered asymmetric migration under the condition of strong selection and found that spatial subdivision increases π. This observation was consistent with the numerical results of Pollak (1972). However, the model by Tachida and Iizuka considered only a two-patch population. Lundy & Possingham (1998) extended the two-patch models of previous authors to investigate π in three- and four-patch systems. When migration is asymmetric, Lundy and Possingham found that the influence of a patch on the overall fixation probability depends largely on two factors: the population size of the patch and the net gene flow out of the patch.

More recently, Gavrilets & Gibson (2002) have studied the fixation probabilities in a population that experiences heterogeneous selection in distinct spatial patches, and in which the total population size is constant. In this model, each allele is advantageous in one patch and deleterious in the other. The results in this contribution are in agreement with the arguments of Ohta (1972) and Eldredge (1995, 2003) that, depending on exactly how migration rates change with population size, selection can be more important in small populations than in large populations.

In a model of distinct patches, which focuses on extinctions and recolonizations, Cherry (2003) found that these two effects always reduce the fixation probability of a beneficial allele. Cherry's conclusion is consistent with Barton's (1993) observation for a favoured allele in an infinite population, but applies more generally. Cherry derived both an effective population size and an effective selection coefficient for beneficial alleles in this model, such that established results for unstructured populations can be applied to structured populations. In this exposition, Cherry (2003) assumed that an extinct patch can be recolonized by only one founding allele. A subsequent paper (Cherry 2004) explores the case of more than one founding allele after extinction, confirming that extinction and recolonization reduce the fixation probability for beneficial alleles.

Whitlock (2003) relaxed some of the assumptions in previous structured population models to study the fixation of alleles that confer either beneficial or deleterious effects, with arbitrary dominance. Whitlock constructed a model that allows for an arbitrary distribution of reproductive success among demes, although selection is still homogeneous. He found that in a ‘differentially productive environment’, the effective population size is reduced relative to the census size and thus the probability of fixation of deleterious alleles is enhanced, while that of beneficial alleles is decreased. In a further paper, Whitlock & Gomulkiewicz (2005) examined the question of fixation probability in a metapopulation when selection is heterogeneous among demes. In contrast to the metapopulations with homogeneous selection, Whitlock and Gomulkiewicz concluded that the heterogeneity in selection never reduced (and sometimes substantially enhanced) the fixation probability of a new allele. They found that the probability of fixation is bounded below and above by approximations based on high- and low-migration limits, respectively.

An alternative realization of a spatially structured model was studied by Gordo & Campos (2006) who determined the rate of fixation of beneficial mutations in a population inhabiting a two-dimensional lattice. Under the assumption that deleterious mutations are absent and that all beneficial mutations have equal quantitative effect, Gordo and Campos found that the imposition of spatial structure did not change the fixation probability of a single, segregating beneficial mutation, relative to an unstructured haploid population (in agreement with the findings of Maruyama 1970). However, interestingly, spatial structure reduced the substitution rate of beneficial mutations if either deleterious mutations or clonal interference (more than one beneficial mutation segregating simultaneously) were added to the model. In an elegant example of experimental and theoretical interactions, the conclusions of Gordo and Campos were experimentally substantiated by Perfeito et al. (2008) who studied bacterial adaptation in either unstructured (liquid) or structured (solid) environments.

From the overview above, it is clear that an extremely rich literature surrounding the fixation probability in subdivided populations has been developed. In particular, Whitlock's recent work has relaxed a large number of the limiting assumptions in earlier papers, encompassing beneficial or deleterious mutations, arbitrary dominance, heterogeneous selection and asymmetric mutation. As argued by Whitlock & Gomulkiewicz (2005), some intriguing questions remain. For example, it seems likely that multiple alleles could be simultaneously segregating in different demes; this case has not yet been treated in a subdivided population, although it is related to §5 below.

5. Multiple segregating alleles

In §4 above we have discussed the fixation probability in populations that are spatially subdivided (i.e. spatially heterogeneous populations). In analogy, here we consider populations that are divided into a variety of genetic rather than geographical backgrounds. This genetic heterogeneity can occur when multiple alleles are segregating simultaneously at the same locus or when contributions from other linked loci are considered. In general, the literature surrounding these questions suggests numerous possibilities for new work.

5.1 Effects of linked and deleterious alleles

The effects of linked loci on the fixation probability of a beneficial mutation have been extensively studied, beginning with the ideas of Fisher (1922) and Hill & Robertson (1966). Peck (1994), in particular, focused on the fixation probability of a beneficial mutation in the presence of linked deleterious mutations, finding that deleterious mutations greatly reduce the fixation probability in asexual, but not sexual, populations. A more detailed model is presented by Charlesworth (1994) who derived expected substitution rates and fixation probabilities for beneficial alleles when the deleterious alleles are at completely linked loci. A key result of this work is that deleterious linked loci reduce the effective population size, by a factor given by the frequency of mutation-free gametes.

Barton (1994, 1995) derived a more comprehensive method for computing the fixation probability of a favourable allele in different genetic backgrounds. For a single large heterogeneous population, Barton found that loosely linked loci reduce fixation probability through a reduction in the effective population size, by a factor that depends on the additive genetic variance. At tightly linked loci, however, Barton demonstrated that deleterious mutations, substitutions and fluctuating polymorphisms each reduce the fixation probability in a way that cannot be simply captured by an effective population size.

The study of linked loci was extended by Johnson & Barton (2002) who estimated the fixation probability of a beneficial mutation in an asexual population of fixed size, in which recurrent deleterious mutations occur at a constant rate at linked loci. Johnson and Barton assumed that each deleterious mutation reduces the fitness of the carrier by a factor of (1−sd) (i.e. any deleterious mutation has the same quantitative effect on fitness). Furthermore, it is assumed that the beneficial mutation increases the fitness of an individual carrier by a factor of (1+sb), regardless of the number of deleterious mutations present in the carrier. Thus, the relative fitness of an individual with a beneficial mutation and i deleterious mutations is wi=(1+sb)(1−sd)^i. Johnson and Barton estimated the fixation probability by summing fiPi over i, where fi is the probability that a beneficial mutation arises in an individual with i deleterious mutations and Pi, given by the solution of simultaneous equations, is the probability that a beneficial mutation arising in such an individual is not ultimately lost. Johnson and Barton were thus able to quantify the reduction in the fixation probability of a beneficial mutation due to interference from segregating deleterious mutations at linked loci. Interestingly, this result is then used to determine the expected rate of increase in population fitness and the mutation rate that maximizes this fitness increase.
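A minimal numeric sketch of the sum over backgrounds can be assembled from standard ingredients: we take fi from mutation-selection balance (Poisson with mean U/sd, where U is the deleterious mutation rate) and, in place of Johnson and Barton's simultaneous equations for Pi, substitute the crude branching-process heuristic 2·max(0, s_eff), where s_eff is the long-run growth advantage of a lineage founded on background i. All of these simplifications are ours.

```python
import math

def fixation_with_linked_deleterious(sb, sd, U, imax=100):
    """Heuristic sketch of the sum pi = sum_i f_i * P_i over deleterious
    backgrounds. f_i: mutation-selection balance (Poisson, mean U/sd).
    P_i: crude approximation 2 * max(0, s_eff), with s_eff the long-run
    advantage (1+sb)*(1-sd)^i - 1 of a lineage founded on a background
    carrying i deleterious mutations. Johnson & Barton (2002) instead
    solve simultaneous equations for P_i."""
    lam = U / sd
    pi = 0.0
    for i in range(imax):
        f_i = math.exp(-lam) * lam ** i / math.factorial(i)
        s_eff = (1 + sb) * (1 - sd) ** i - 1
        pi += f_i * 2 * max(0.0, s_eff)
    return pi

pi_linked = fixation_with_linked_deleterious(sb=0.02, sd=0.1, U=0.05)
pi_free = 2 * 0.02   # Haldane's approximation with no interference
```

When sd exceeds sb, only mutations arising on mutation-free backgrounds contribute, and the heuristic recovers Charlesworth's factor e^(−U/sd), the frequency of mutation-free gametes.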

5.2 Quasi-species fixation

Quasi-species theory describes the evolution of a very large asexually reproducing population that has a high mutation rate (Eigen & Schuster 1979; Eigen et al. 1988, 1989; Domingo et al. 2001). This theory is often cited in describing the evolution of RNA viruses (Domingo et al. 2001; Wilke 2003; Manrubia et al. 2005; Jain & Krug 2007). Several authors have questioned the relevance of quasi-species theory to viral evolution (Moya et al. 2000; Jenkins et al. 2001; Holmes & Moya 2002), arguing that the mutation rates necessary to sustain a quasi-species are unrealistically high. In contrast, however, Wilke (2005) reviewed related literature and argued that quasi-species theory is the appropriate model for the population genetics of many haploid, asexually reproducing organisms.

In typical models of population genetics, it is assumed that mutations are rare events, such that an invading mutant strain will not mutate again before fixation or extinction occurs. In contrast, in quasi-species models, the offspring of a mutated individual are very likely to mutate before fixation. Consequently, the fitness of an invading quasi-species is not solely determined by the fitness of the initial/parent mutant, but depends on the average fitness of the ‘cloud’ of offspring mutants related to that parent, continually introduced by mutation and removed through selection (the ‘mutation–selection balance’). In quasi-species theory, therefore, the fixation of a mutant is defined to be its establishment as a common ancestor of the whole population; since the population is never genetically identical, the standard definition of fixation does not apply.

Wilke (2003) first investigated the fixation probability of an advantageous mutant in a viral quasi-species. This contribution uses multitype branching processes to derive an expression for the fixation probability in an arbitrary fitness landscape. Wilke initially assumed that mutations that are capable of forming a new invading quasi-species are rare. Thus, while mutations within the quasi-species are abundant, only one quasi-species will be segregating from the wild-type quasi-species at any given time. Under this assumption, the fixation probability was determined for fixation events that increase the average fitness of the population (situations where the average fitness is reduced or left unchanged were not addressed). If πi denotes the probability of fixation of sequence i, that is, the probability that the cascade of offspring spawned by sequence i does not go extinct, and Mij gives the expected number of offspring of type j from sequences of type i in one generation, then for Poisson-distributed offspring the vector of fixation probabilities satisfies 1−πi=exp(−Σj Mij πj).
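The fixed-point system 1−πi=exp(−Σj Mij πj), the standard survival equation for a multitype branching process with Poisson offspring, can be solved by simple iteration. The two-type reproduction matrix below is illustrative (our values, not Wilke's): a fitter type 0 that mutates to a less-fit type 1 at a small rate.

```python
import math

def quasispecies_fixation(M, max_iter=10000, tol=1e-12):
    """Solve 1 - pi_i = exp(-sum_j M_ij * pi_j) by fixed-point iteration,
    for a small reproduction matrix M (Poisson offspring assumed)."""
    n = len(M)
    pi = [0.5] * n
    for _ in range(max_iter):
        new = [1 - math.exp(-sum(M[i][j] * pi[j] for j in range(n)))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Type 0 mostly breeds true (total mean offspring 1.10) and mutates to
# a less-fit type 1 (total mean offspring 0.95) at a small rate.
M = [[1.06, 0.04],
     [0.02, 0.93]]
pi = quasispecies_fixation(M)
```

Note that even the subcritical type 1 has a nonzero fixation probability, because its descendants can mutate back into the supercritical type 0 class, a distinctly quasi-species effect.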

As discussed more fully in §6, estimates of the fixation probability are extremely sensitive to assumptions regarding the life history of the organism. Wilke's elegant result is a generalization of Haldane's approach, retaining the assumptions of discrete, non-overlapping generations and Poisson-distributed offspring. As these assumptions are not particularly well suited for the life history of viruses, it remains unclear which conclusions of this study would hold in viral populations.

5.3 Clonal interference

In a genetically homogeneous asexual population, two or more beneficial mutations may occur independently in different individuals of the population. Clonal interference refers to the competition that ensues between the lineages of these independent mutations, potentially altering the fate of each lineage. The idea that competing beneficial mutations may hinder a beneficial mutation's progress to fixation was formulated by Muller (1932, 1964) in his discussions on the evolutionary advantage of sex. Since that time, numerous studies have been conducted on the subject of clonal interference; in the last decade a rich literature, both experimental and theoretical, has developed, sparked by renewed interest in the adaptation of asexual populations in laboratory settings.

A review of this growing literature would be substantial, and is outside the scope of this contribution, relating more closely to adaptation and adaptation rates than to fixation and extinction probabilities, narrowly defined. However, we give a brief overview of the standard means of estimating fixation probabilities under clonal interference, and refer the reader to other recent contributions (Campos & de Oliveira 2004; Campos et al. 2004, 2008; Rosas et al. 2005; De Visser & Rozen 2006).

Gerrish & Lenski (1998) published the first discussion of fixation probabilities under clonal interference. Gerrish and Lenski considered the possibility that while an initial beneficial mutation is not yet fixed, a set of other mutations may emerge in the population. If at least some of these mutations survive extinction when rare (for example, due to genetic drift), a competition ensues between the focal mutation and the subsequent mutations. Assuming that the probability density for the selective advantage of beneficial mutations is given by αe^(−αs), Gerrish and Lenski stated that the probability that the focal mutation fixes is π(s)e^(−λ(s)). The function π(s) gives the probability that a given beneficial mutation is not lost through drift when rare, while the function λ(s) gives the mean number of mutations that occur before the focal mutation fixes, have a higher s than the focal mutation, and survive drift. We note that λ(s) is also a function of the population size, the mutation rate and α. Under the assumption that mutations appear spontaneously at a constant rate, e^(−λ(s)) then gives the probability that zero superior mutations occur, and survive drift, before the focal mutation fixes. This basic structure for the fixation probability during clonal interference has been augmented in subsequent contributions (Campos & de Oliveira 2004; Campos et al. 2004). The most interesting prediction of this work is that at high mutation rates, clonal interference imposes a ‘speed limit’ on the rate of adaptation.
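The structure π(s)e^(−λ(s)) is easy to explore numerically once a form for λ(s) is chosen. The sketch below uses a hypothetical λ(s) of our own construction (the exact expression of Gerrish and Lenski is more involved): N·μ·T competing mutations arise during the focal sweep, a fraction e^(−αs) exceed s under the exponential density, and each survives drift with a crude probability 2(s + 1/α). All parameter values are illustrative.

```python
import math

def lam(s, N, mu, T, alpha):
    """Hypothetical lambda(s): expected number of superior, drift-surviving
    mutations arising during the focal sweep of duration T. The factor
    2*(s + 1/alpha) uses the mean effect conditional on exceeding s.
    This is an illustrative stand-in for the Gerrish & Lenski expression."""
    return N * mu * T * math.exp(-alpha * s) * 2 * (s + 1 / alpha)

def fix_under_interference(s, N=1e5, mu=1e-7, T=500, alpha=40):
    pi = 2 * s                      # survival of drift when rare
    return pi * math.exp(-lam(s, N, mu, T, alpha))

weak = fix_under_interference(0.02)
strong = fix_under_interference(0.05)
```

Under any λ(s) decreasing in s, interference discounts small-effect mutations proportionally more than large-effect ones, which is the qualitative origin of the ‘speed limit’.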

There is a small conceptual flaw in this derivation (P. Gerrish 2000, personal communication), which is that the possibility that other beneficial mutations were segregating before the initial appearance of the focal individual was neglected. If many mutations are segregating simultaneously, the focal beneficial mutation is likely to have arisen on the background of a previously segregating beneficial mutation. Thus mutations may sweep in groups, the ‘multiple mutation’ regime. Conceptually, the multiple mutation regime lies on a continuum between clonal interference as described by Gerrish & Lenski (1998) and quasi-species dynamics.

The dynamics of adaptation in the multiple mutation regime have been recently described in some detail (Desai & Fisher 2007 Desai et al. 2007). In contrast to the work of Gerrish & Lenski (1998), these authors predicted that clonal interference may not always reduce adaptation rates. Like Gerrish and Lenski, this approach depends on the underlying probability that a beneficial mutation escapes extinction through drift when rare, and assumes that this probability is proportional to s.

6. Life-history models

In almost every contribution discussed so far, beneficial mutations are assumed to increase the average number of offspring: so-called ‘fecundity mutants’. For many organisms, however, a mutant may have the same average number of offspring as the wild-type, but may produce these offspring in a shorter generation time: ‘generation time mutants’. An example here is bacterial fission in the presence of antibiotics: many antibiotics reduce cell growth and thus mutations conferring resistance have a reduced generation time.

This issue was first addressed by Wahl & DeHaan (2004) who approximated the fixation probability for beneficial generation time mutants (πG), in a population of constant size or a population that grows between periodic bottlenecks. The approach is closely related to that of Pollak (2000). In a model with a Poisson offspring distribution with mean 2 and weak selection, it was found that πG≈s/ln(2) for a constant population size, while πG≈τs/(2 ln(2)) when τ, the number of generations between population bottlenecks, is moderately large. For a mutation that increases fecundity, the analogous approximation is π≈2s in a constant population size (Haldane 1927), while an estimate of π≈τs was obtained for a population with a moderately large τ (Heffernan & Wahl 2002). Thus, assuming that all mutations confer a fecundity advantage leads to an overestimate of the order 2 ln(2)∼1.4 for generation time mutations.
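The factor of 2 ln(2) can be verified with a few lines of arithmetic comparing the four approximations just quoted; the values of s and τ are arbitrary.

```python
import math

s, tau = 0.01, 20

# Fecundity mutants (classic results)
pi_fec_const = 2 * s                      # Haldane 1927
pi_fec_bottle = tau * s                   # Heffernan & Wahl 2002

# Generation-time mutants (Wahl & DeHaan 2004)
pi_gen_const = s / math.log(2)
pi_gen_bottle = tau * s / (2 * math.log(2))

# In both settings the fecundity formula overestimates the
# generation-time result by the same factor, 2*ln(2) ~ 1.39.
factor_const = pi_fec_const / pi_gen_const
factor_bottle = pi_fec_bottle / pi_gen_bottle
```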

These results emphasize the sensitivity of fixation probabilities to the underlying life history of the organism being modelled, and to the specific effect of the beneficial mutation on this life history. Based on these results, Hubbarde and co-authors studied the fixation probability of beneficial mutations in a ‘burst–death model’ (Hubbarde et al. 2007 Hubbarde & Wahl 2008). This model is based on the well-known continuous-time branching process called the birth–death process, in which each individual faces a constant probability of death, and a constant probability of undergoing a birth event, in any short interval of time. Thus, the generation time or lifetime of each individual is exponentially distributed.

In contrast to a birth–death model, however, a burst event can add more than one offspring to the population simultaneously (a burst of two might model bacterial fission; a burst of 100 might model a lytic virus). The burst–death model explored by Hubbarde et al. treats populations in which the expected size is constant (i.e. the death rate balances the burst rate), and populations that grow between periodic bottlenecks. Hubbarde et al. computed the fixation probability for mutations that confer an advantage by increasing either the burst size or the burst rate. This work was extended by Alexander & Wahl (2008) who compared the fixation probability of mutations with equivalent effects on the long-term growth rate, i.e. equally ‘fit’ mutations. The latter paper demonstrates that mutations that decrease the death rate (increasing survival) are most likely to fix, followed by mutations that increase the burst rate. Mutations that increase the burst size are least likely to fix in the burst–death model.
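The flavour of this ranking can be illustrated with the plain birth–death process rather than the full burst–death model: for a lineage founded by one individual with birth rate b and death rate d (b > d), the survival probability is 1 − d/b = r/b, where r = b − d is the Malthusian growth rate. Two mutants with identical r, one achieving it by lowering d and one by raising b, therefore differ in survival probability. This is our simplification, not the Hubbarde et al. or Alexander & Wahl calculation.

```python
def survival_birth_death(b, d):
    """Survival probability of a lineage founded by one individual in a
    supercritical continuous-time birth-death process (b > d)."""
    return 1.0 - d / b

# Two equally 'fit' mutants, each with long-term growth rate r = b - d = 0.1:
lower_death = survival_birth_death(b=1.0, d=0.9)   # decreased death rate
higher_birth = survival_birth_death(b=1.1, d=1.0)  # increased birth rate
```

At equal Malthusian fitness, the survival mutant fixes more readily because its lower turnover (smaller b) gives each rare copy less exposure to stochastic loss, mirroring the ranking reported for the burst–death model.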

The important departure in the burst–death model from previous work is that a beneficial mutation may affect a number of life-history traits independently. Thus, the mean number of offspring can change independently of p0, the probability of having zero offspring. While the mean largely determines the long-term growth rate, or Malthusian fitness, of the mutant, the fixation probability is sensitive to short-term processes, particularly p0.

By contrast, when generation times are fixed and offspring numbers are Poisson distributed, the only way for a mutation to be beneficial is for it to increase the mean number of offspring, by a factor typically denoted (1+s). The probability of leaving zero offspring is completely constrained by this mean, and this ultimately implies that fixation probabilities, while perhaps not equal to 2s, are at least proportional to s under these classic assumptions.
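The proportionality to s under these classic assumptions can be checked numerically with a small branching-process sketch. This is our illustration, not a model from the review: offspring numbers are drawn as Poisson(1 + s), a lineage starting from one copy is followed until it dies out or is clearly established, and the function and parameter names are our own.

```python
import numpy as np

rng = np.random.default_rng(1)

def survival_prob(s, trials=10_000, max_gen=2_000, cap=500):
    """Estimate the survival probability of a mutant lineage whose
    offspring numbers are Poisson(1 + s), starting from one copy.
    In a large population, lineage survival approximates fixation."""
    surviving = 0
    for _ in range(trials):
        n = 1
        for _ in range(max_gen):
            if n == 0 or n >= cap:   # extinct, or safely established
                break
            # each of the n copies leaves Poisson(1 + s) offspring
            n = rng.poisson(1.0 + s, size=n).sum()
        surviving += n > 0
    return surviving / trials

for s in (0.02, 0.1):
    print(f"s={s}: simulated {survival_prob(s):.3f} vs Haldane 2s = {2*s:.3f}")
```

For small s the simulated survival probability tracks 2s closely; for larger s it falls somewhat below 2s, consistent with the text's remark that the probability is proportional to, though not exactly equal to, 2s.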

This simple proportionality no longer holds when more complicated, and thus more realistic, life histories are considered. The overall conclusion here is that for many real populations, estimates of the fixation probability should take into account both the life-history details of the organism and the mechanism by which the mutation confers a reproductive advantage.

7. From theory to experiment

The experimental study of evolution has been recently accelerated through the study of rapidly evolving organisms, such as bacteria, viruses and protozoa (Lenski et al. 1991; Lenski & Travisano 1994; Papadopoulos et al. 1999). These organisms adapt to laboratory conditions on experimentally feasible time scales, making them ideal candidates for the real-time study of evolution. These experiments have generated tremendous interest in evolutionary biology, allowing for experimental tests of some of the most basic features of adaptation.

To date, however, the fixation probability of a specific beneficial mutation has never been experimentally measured. With the advent of serial passaging techniques that allow for experimental designs with very high numbers of replicates (e.g. 96-well plates), we argue that an experimental estimate of the fixation probability is finally within reach. After 80 or 90 years of theory, the possibility of experimental validation is fascinating.

On the other hand, the models developed to date are probably not sufficiently tailored to the life histories of the organisms that could be used in such experiments. Neither bacteria nor viruses are well modelled by discrete, non-overlapping generations, nor by a Poisson distribution of offspring. Recent contributions by Parsons & Quince (2007a,b) and Lambert (2006), as well as work from our own group (Hubbarde et al. 2007; Alexander & Wahl 2008), have highlighted the extreme sensitivity of fixation probabilities to such assumptions.

For experiments involving bacteria, we suggest that theoretical predictions of the fixation probability must be based specifically on bacterial fission. A beneficial mutation might reduce the generation time, for example, or increase the probability that one or both of the daughter cells survive to reproductive maturity. For experiments involving viruses, theoretical predictions must likewise be tailored to include the processes of viral attachment, the eclipse time and then the release of new viral particles through budding or lysis. Other microbial systems will present their own life histories and their own modelling challenges. In addition, population bottlenecks, washout from a chemostat or limited resources must be imposed in experimental systems to prevent unbounded microbial growth.

A final note is that very often, in estimating the fixation probability, it is assumed that selection is weak. This means, for example, that the selective advantage s is sufficiently small that terms of order s^2 are negligible. This assumption has been widely, and very usefully, employed in population genetics over decades, and is still considered to be relevant to most natural populations. Recent evidence from the experimental evolution of microbial populations, however, has indicated that some beneficial mutations exert extremely high selection pressures, with s of the order of 10 or more (Bull et al. 2000). Thus, a further challenge for theoreticians is to design organism- and protocol-specific models that retain accuracy and tractability, even for very strong selective effects.

The authors are grateful to four anonymous referees, whose comments strengthened this review, and to the Natural Sciences and Engineering Research Council of Canada for funding.

Probability of fixation

Under conditions of genetic drift alone, every finite set of genes or alleles has a "coalescent point" at which all descendants converge to a single ancestor (i.e. they 'coalesce'). This fact can be used to derive the rate of gene fixation of a neutral allele (that is, one not under any form of selection) for a population of varying size (provided that it is finite and nonzero). Because the effect of natural selection is stipulated to be negligible, the probability at any given time that an allele will ultimately become fixed at its locus is simply its frequency in the population at that time. For example, if a population includes allele A with frequency equal to 20%, and allele a with frequency equal to 80%, there is an 80% chance that after an infinite number of generations a will be fixed at the locus (assuming genetic drift is the only operating evolutionary force).
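The claim that a neutral allele's ultimate fixation probability equals its current frequency can be checked with a small Wright-Fisher drift simulation. This is a sketch under the stated assumptions (drift only, binomial resampling each generation); the function name and population size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def p_fix(p0, N=100, trials=20_000):
    """Fraction of pure-drift trajectories that reach fixation,
    starting from allele frequency p0 in a diploid Wright-Fisher
    population of N individuals (2N gene copies)."""
    fixed = 0
    for _ in range(trials):
        i = round(p0 * 2 * N)            # copies of the allele
        while 0 < i < 2 * N:
            # resample all 2N copies binomially each generation
            i = rng.binomial(2 * N, i / (2 * N))
        fixed += i == 2 * N
    return fixed / trials

print(p_fix(0.8))   # ≈ 0.8, matching the 80%-frequency example above
```

The simulated fraction converges on p0 itself, independent of N, which is exactly the property the derivation below relies on.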

For a diploid population of size N and neutral mutation rate μ, the initial frequency of a novel mutation is simply 1/(2N), and the number of new mutations per generation is 2Nμ. Since the fixation rate is the rate of novel neutral mutations multiplied by their probability of fixation, the overall fixation rate is 2Nμ × 1/(2N) = μ. Thus, the rate of fixation for a mutation not subject to selection is simply the rate of introduction of such mutations.
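The cancellation is worth making explicit: the mutational supply scales up with N at exactly the rate that the per-mutation fixation probability scales down, so their product is μ for any population size. A trivial numerical check (the mutation rate and population sizes are illustrative):

```python
mu = 1e-8   # neutral mutation rate per gene per generation (illustrative)

for N in (1_000, 1_000_000):
    new_mutations_per_gen = 2 * N * mu   # mutational supply, diploid population
    p_fix_each = 1 / (2 * N)             # fixation probability of one new neutral mutant
    rate = new_mutations_per_gen * p_fix_each
    print(f"N={N}: fixation rate = {rate:.1e}")   # 1.0e-08 both times
```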


Name: Yale University (Yale)
Type: University

Lecture Description

Mutations are the origin of genetic diversity. Mutations introduce new traits, while selection eliminates most of the reproductively unsuccessful traits. Sexual recombination of alleles can also account for much of the genetic diversity in sexual species. In some instances, population size can affect diversity and rates of evolution and fixation, but in other cases population size does not matter.

Stearns, Stephen C. and Rolf Hoekstra. Evolution: An Introduction, chapter 5

January 26, 2009

Professor Stephen Stearns: Okay, today we're going to talk about the origin and maintenance of genetic variations, and this is continuing our discussion of central themes in the mechanisms of microevolution. The reason we're interested in this is that there cannot be a response to natural selection, and there cannot be any history recorded by drift, unless there's genetic variation in the population. So we need to understand where it comes from, and whether or not it sticks around.

If it happened to be the case that every time a new mutation popped up it was immediately eliminated, either for reasons that were random or selective, evolution couldn't occur. If a lot of variation came into the population, and then persisted for a tremendously long time without any sorting, we would see patterns on the face of the earth that are totally different from what we see today. So these issues are actually central issues in the basic part of evolutionary genetics that makes a difference to evolution.

So the context basically is this. Since evolution is based on genetic change, we need to know where genetic differences come from and the rate of evolution depends on the amount of genetic variation that's available in the population, so we need to know what maintains the variation. If you were to go back fifty, sixty years, which is what we now think of as the classical view--remember the classical view is a moving window in time--at that point it was thought there wasn't very much genetic variation out there and that evolution was actually limited by the rate at which genetic variation was created.

Since 1965, with the discovery of protein isozymes, and especially now, since the discovery of ways to sequence DNA very cheaply, we know that's not true. There is a tremendous amount of genetic variation in Nature, and I'm going to show you some of it this morning. So since about 1975, 1980, due to a series of studies, some of them on the Galapagos finches, some of them on the guppies in Trinidad, some of them on mosquitofish in Hawaii, some of them on the world's fish populations responding to being fished, we know that evolution can be very fast when there's strong selection acting on large populations that have lots of genetic variation.

So really the rate of evolution--and, for example, the issue of climate change and global warming--will all the species on earth be able to adapt fast enough to get--to persist in the face of anthropogenic change on the planet?--that issue is directly addressed by the things we're talking about this morning.

If there isn't enough genetic change to adapt, say, the grassland populations of the world, or things that are living on mountains, to the kinds of climatic changes that they are going to be encountering, and currently are encountering, they'll go extinct. They have to either move to a place which is like the one they're in, or they have to adapt to the changed conditions that they're encountering.

So the outline of the lecture today basically is this. Mutations are the ultimate origin of all genetic variation. Recombination has a huge impact on variation. So what that means basically is that sexual populations have the potential to be much more variable than asexual populations--there is lots of genetic variation in natural populations. And then we will run through four mechanisms that can maintain variations in single genes, and briefly mention the maintenance of variation in quantitative traits.

So mutations are where these genetic differences come from, and they can be changes in the DNA sequence or changes in the chromosomes, and in the chromosomes they can be changes in how many chromosomes there are, in the form of the chromosomes, or in aspects of chromosome structure. So there can be gene duplications and so forth. Most of the mutations that occur naturally are mutations that are occurring during DNA replication.

For those of you who are thinking of being doctors, this is important because the probability that a cancer will emerge in a tissue is directly proportional to the number of times cells divide in that tissue, which is why cancers of epithelial cells are much more common than cancers of cells that do not divide. You never get a cancer in your heart muscle, and you frequently get cancers on your skin, and in your lungs, and in the lining of your gut, and that's because every mitotic event is a potential mutation event.

The kinds of DNA sequence mutations are point mutations; there can be duplications, and in the chromosomes as well there can be inversions and transplacements that go on. Genes can be moved around from one chromosome to another. They can actually be turned around so that they are in the opposite reading direction, along the chromosome. All those things are going on.

There's good reason to think that an intermediate mutation rate is optimal. If the mutation rate is too low, then the descendants of that gene cannot adapt to changed conditions. If it's too high, then all the accumulation of information on what has worked in the past will be destroyed by mutation, which is what happens to pseudogenes that are not expressed. So some intermediate rate is probably optimal.

Now a gene that controls the mutation rate will evolve much more easily in an asexual organism than in a sexual species, because sexual recombination uncouples the gene from the benefits of the process. Let me illustrate that.

Suppose that I am engaged in a process that Greg wants to control, and we've got a certain period of time we can do it in, and so he decides that he's going to do it, with me, on a bus going to New York. We go down to the bus station and, because of recombination, he gets into one bus and I get into another. He loses his opportunity to control me, simply because I am now riding in a different bus.

That's the effect of recombination on genes. Recombination, instead of keeping me on the same chromosome that Greg and I were on, will actually end up putting me into a different body. Okay? So in a sexual organism the gene that's controlling the mutation rate becomes dissociated from the genes whose mutations it might try to control, and therefore even though during my ride to New York I invent some kind of great process that would benefit Greg, he is now dissociated from it and he doesn't get to benefit from my adaptations.

So it is much more plausible that we will see genes that are controlling mutation rates evolving in organisms like bacteria and viruses than it is that we will see genes that control mutation rates evolving in us. There is some reason to think that there is weak selection on them, but it's not as strong as it is in bacteria. And in fact, interestingly, in bacteria you can do experimental evolution and show that the mutation rate will evolve up or down, depending on the circumstances that you put the bacteria under.

These are some representative mutation rates, and it's good to have some general framework to think about--how frequent is a mutation? So the per nucleotide mutation rate in RNA is about 10^-5; in DNA it's 10^-9. So if you start evolving in an RNA world, and you want to lower the mutation rate because your information is getting eroded, and you can somehow manage to engineer DNA as your molecule rather than RNA, you can see that you would be able to pick up four orders of magnitude by doing so. That's just because DNA is more stable.

DNA is a remarkably stable molecule. It's possible to recover DNA from fossil bones. Svante Paabo is in the middle of a project to sequence Neanderthal's genome. He's already got significant chunks of Neanderthal sequence. So DNA is just a remarkably stable molecule. The per gene rate of mutation in DNA is about one in a million; this is like per meiosis. The per trait mutation rate is about 10^-3 to 10^-5. The rate per prokaryotic genome is about 10^-3, and per eukaryotic genome it's between 0.1 and 10.

I once saw a really great talk by a guy named Drake, Frank Drake, from NIH--this was like at a big international meeting--Drake walks up to the blackboard and he writes 10^-3 on the blackboard; he's going to give a talk about mutation rates in prokaryotes. He talks for 45 minutes about this number, no PowerPoints, nothing else; he's just speaking very animatedly about how it was that just about all viruses and bacteria appear to have converged on roughly this per generation mutation rate, per genome, which is pretty strong evidence that it's an optimal rate; thousands of species have converged on this rate.

And I asked him how it was that he gave this great talk without any slides, and he said that he had lost them in the airplane, and that had happened about ten times before, and it was such a great talk without the slides that he just switched completely. So a couple of years ago, actually early last year in this course, I tried giving talks without the PowerPoints. Ninety percent of the class didn't like it and ten percent of the class did. So that's why you're still getting PowerPoints. Okay?

Now what is your mutation rate? Well each of you has about four new mutations in you, things your parents didn't have, and about 1.6 of those are deleterious. So this is something that's always going on. And there are about 100 of us in the room; that means there are somewhere around 160 new, deleterious mutations, unique in this generation, sitting here in the classroom.

Where did they happen? Well they happened fifty times more in males than in females. And there are good biological reasons why. There are many more cell divisions between the formation of a zygote and the production of a sperm than there are between the formation of a zygote and the production of an egg. In human development, and in mammal development, egg production pretty much stops in the third month of embryonic development, at which point all the women in this room had about seven million eggs in their ovaries.

Since then oocytic atresia, which means the killing of oocytes, has reduced the number of eggs in your ovaries down by nearly seven million. When you began menstruating you had about 1500 eggs in your ovaries. You've gone from seven million down to 1500. When you were born you had gone from seven million down to one million; you'd lost six million of them before you were even born. It appears to be a quality control mechanism, ensuring that the oocytes that survive are genetically in really good shape.

So there are very, very different kinds of biology affecting the production of eggs and sperm; females have a mutation screen that males do not. Well the result of that is that there are more mutations in the sperm of older males; they've lived a longer time. Anybody that wants to get into mate choice and what kinds of reproductive strategies should result from this simple fact is welcome to write a paper on it; there's literature out there. Okay? Not very PC, but it's very biological.

Okay, recombination. What does recombination do to this mutational variation that builds up in populations? Suppose we had ten genes, and each of those genes had two alleles, and each of those was on a different chromosome. That would mean that just looking at those ten genes, on those ten chromosomes, we could get 3^10 different zygotes. Can anybody tell me why?

Professor Stephen Stearns: How many genotypes are there for the first gene? How many different combinations of Aa are there? Three: AA, Aa, aa. So there's three things that the first gene can do. There are three things that the second gene can do. There are three things that the third gene can do. And there are ten genes. So we multiply them to get the number of different combinations, and if they are independently sorting on different chromosomes, that will result in 59,049 different zygotes.
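The multiplication the lecture walks through is just exponentiation, which a one-line check makes concrete:

```python
genotypes_per_locus = 3      # AA, Aa, aa at a two-allele locus
n_loci = 10                  # independently assorting genes
n_zygotes = genotypes_per_locus ** n_loci
print(n_zygotes)             # 59049 possible multi-locus genotypes
```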

Now if we had a real eukaryotic genome that had free recombination--which we don't have--and unlimited crossing over--which we don't have--then the number of possible zygotes is about 3^15,000 or 3^50,000, somewhere in that order of magnitude. Well the number of fundamental particles in the universe is only 10^131. We're talking about numbers which are just inconceivably large. That means that in the entire course of evolution the number of genetic possibilities that are present, just sitting in you, have never been realized. There is a huge portion of genetic space that remains unexplored, simply because there hasn't been enough time on the planet for that many organisms to have lived.

Now, how--you can see that this would be free recombination with independent assortment of chromosomes. That makes it easier than if it's crossing over, because crossing over happens more frequently the farther genes are apart on a chromosome, and it doesn't happen very often when they're close together. So there's been an evolution of the chromosome number of a lot of species.

And I've previously told you about ascaris. Ascaris is a nematode that lives in the gut of vertebrates. There is an ascaris that lives in dogs, there's an ascaris that lives in us, and it just has one chromosome. So that's kind of one limit, things with one chromosome. There are species that have hundreds of chromosomes. Sugarcane has I think about 110 chromosomes, something like that.

So the chromosome number of the species itself evolves, and it can evolve fairly dynamically. There are actually some populations within a single species that have a different chromosome number than other populations within that species, and when individuals from those two populations meet and mate with each other, the offspring often run into developmental difficulties because of this difference in chromosome number. There is such a, uh, contrast in house mice in Denmark. There's a spot where there's sort of a hybrid zone in Denmark, and the house mice on one side of the hybrid zone have difficulty--uh, they're in the same species, but they just have different chromosome numbers--and they have difficulty dealing with the house mice on the other side of that hybrid zone.

The difference in chromosome numbers appears to have arisen in the house mice during the last glaciation, and they recolonized northern Europe from different places. Some of them came up from Spain. Some of them came up from Greece. They got together in Denmark and they ran into problems.

Okay, now crossing over also generates a lot of genetic diversity. And the amount of crossing over can be adjusted. Inversions will block crossing over. You take a chunk of chromosome and flip it around, so that in the middle of the chromosome the gene sequences are reversed, and in that section of the chromosome the inversion causes mechanical difficulties. It actually changes the shape of the chromosomes when they line up next to each other, and it inhibits crossing over during meiosis.

This is one way of taking a bunch of genes that happen to have really helpful interactions with each other, and locking them up in a combination, so that they don't recombine. That has happened, and it's thought to be important in the evolution of quite a few insects, for example.

Now we can play the mental game of asking ourselves what would happen in a sexual population if we just shut off mutation? We can't actually do it, of course. But how long would it take before we would even notice that evolution had been shut off, if we were just observing the rate at which that population was evolving?

And the answer to that is kind of interesting. We could wave a magic wand over a moderately large sexual population, completely shut off mutation, and the impact of recombination on the standing genetic diversity in that population would create so many new diverse combinations of genes that it would take about 1000 generations before we would even notice that mutation has been shut off.

So think back to the beginning of the lecture. I said mutation is the origin of all genetic diversity, and that's true. But once mutation and evolution have been going on for a while, so much genetic diversity builds up in populations that you can actually shut off mutation and evolution will keep going for quite a while. After 1000 generations it'll run out of steam and stop, but it takes quite a while.

Okay, so where genetic variation came from, and how much there was, was a huge issue and caused a lot of research and controversy for about fifty years. Before 1965, there was the concept of a wild type out there--one really good genome, and then a few mutations.

After 1965, with electrophoresis, the impact of Clement Markert's work, and Dick Lewontin, and his colleague Hubby, we've recognized that there's a lot of molecular variation. This concept that each species has a certain genomic type is no longer tenable. There's just a tremendous number of different kinds of genomes out there. Since 1995, we've had a lot of DNA sequence variation and now we've got genomics.

So I want to illustrate the impact of genomics with something that's just become possible in about the last four years. The HapMap Project was done after the human genome was sequenced, and the motivation of it was to try to associate diseases with common genetic variants. By the way, the upshot of that effort is genes don't normally account for very much, usually about two or three percent of the variation; but that's another story.

So basically once we had the human genome, it was clear that we could then look for places in genomes that had single nucleotides that were different between one person and another; these are called single nucleotide polymorphisms. And to do this the HapMap Project looked at regions of the human genome that were about 10,500 kilobases long, for 269 individuals. So that's 10,500,000 bases, for each of 269 individuals. And they did it on people from Nigeria, Utah, Beijing and Tokyo. And they discovered that our genome is arranged in blocks.

Within each block--each, let's say, rarely recombining section of DNA--there are about 30 to 70 single nucleotide polymorphisms, and that means that you could design a gene chip just to pick up enough of these to tag a person as having that particular block of DNA. Okay? So now there are these gene chips, and we've discovered that there are some SNPs that are associated with disease. We can see that there are portions of the genome that show signatures of recent selection. This is an interesting literature.

This is what a little section of our chromosome 19 looks like. Okay? So this is the position along the chromosome, starting at 40,000,000, and going up to 50,000,000 base pairs. The little black dots are all the genes that are in this section of the chromosome, and using the single nucleotide polymorphisms, you can identify people as having a segment of DNA that is not recombining very frequently. And you will notice that they are actually lined up right over places where the recombination rate is pretty high. So you can see breaks in this upper diagram here, showing places where the recombination rate is pretty high.

So remember, this was done over the entire genome, all of our 23 chromosomes. I am only showing you one tiny part of one chromosome here, and there are actually 650,000 of those blocks that have been identified now in our genome.

So three years later a group then goes out and takes 928 people, from 51 populations, and looks at how much haplotype diversity there is. Remember, a haplotype is a block that's got some specific nucleotide polymorphisms on it. The Y axis here has 650,000 entries on it. Of course they all blend together, it's hard to see them. The X axis has 928 people arranged across it. This is a sample of human genetic diversity on the planet. You can see there's quite a bit. You can see different colors. Okay?

Now if you take this and you then use the tools of phylogenetic analysis to ask what kind of historical structure is there in this data set, this is what you get. You get a group in Africa. You can see the emergence of mankind from Africa--this is thought to have happened about 100,000 years ago--and then you get a very, very nice genetic trace of our expansion across the globe.

We paused for a while in the Middle East, before we broke out. We were in the Middle East up until about 50,000 years ago, and then there was a group that went into Europe, and other groups then split off from that and set off into Asia. And about probably 40,000 years ago people went to Papua New Guinea and Australia, and probably somewhere around between say 15 and 20,000 years ago, a group of people headed off over the Bering Strait for North America, to become Native Americans, and then another group diversified in East Asia. So there is a huge amount of information in the history of genetic variation.

So what I'd now like to do is give you four general reasons why this much genetic variation could be maintained in any population. If you look in the textbook you will see that there is also a tremendous amount of genetic variation in wild populations of practically any species, just as there is in humans. In humans it happens to be better analyzed than in almost any other species. But something like that can be done for any species on earth now, and it's getting cheaper and cheaper and cheaper to do so.

So selection and drift can both explain the maintenance of genetic variation. And for a long time there was a fight within evolutionary genetics about whether what we saw was being explained by selection or drift. It appears not to be a productive question. It's extremely difficult to answer, in any specific case, whether the pattern you see is because of a history of natural selection or because of a history of drift. Both of them are capable of generating quite a few patterns, and those patterns overlap.

So if you take a very specific case and study it in detail, you can give a leading role to selection or to drift. For example, you can find a signature of selection in a portion of a human chromosome, indicating that there's a gene there that perhaps was affected by a specific disease; that's been done. But the general answer, for all species across the planet, about whether selection or drift is more important, probably is unrealistic. It's probably not a fruitful research effort to try to answer this question.

So here are the situations that can maintain genetic variation in principle; there are four of them. There can be a balance between mutation and drift; a balance between mutation and selection; there can be heterosis or over-dominance; and there can be negative frequency dependence. So I'm going to step through these now and give you some feeling for how the thinking works on each of them. In so doing, we're going to be dealing with equilibria, and really there are other ways of approaching the analysis, but the equilibrium approach is the one that allows you to do it with simple algebra, rather than with complicated computer models. We do it for mathematical convenience.

We do it also because the periods during which things are in balance may be pretty long, compared to those in which they're dynamically changing--that does appear to be a message of evolution--but with respect to this particular question of the maintenance of genetic variation, we don't really know too much about those periods. Selection can go back and forth; populations can appear to be in stasis when things are going on inside of them. This question is really unresolved.

We do know that in terms of our immune genes that we share certain polymorphisms with chimpanzees. Those appear to have been things that evolved in terms of disease resistance before humans and chimps speciated, about five to six million years ago. So certainly that genetic variation is five to six million years old. We don't have too many cases where we know that, but there may be many more out there, just undiscovered.

A little terminology. The fixation probability of a mutation is the probability that it will spread and be fixed in the population. That's equal to its frequency, at any point in time. The fixation time is how long it takes to become fixed in generations. And I put these ideas up on the board earlier, and I'd like to go back to that, because I'd like to have reference to it in a minute.

So if this is frequency here, it can go from 0 to 1, on the Y axis, and if this is time, over here, this can be many thousands of generations. And the fate of most neutral alleles, when they come into the population, will be to increase in frequency for a little while and then drift out. They have low probability of being fixed because when they first originate they're very rare, and the probability of eventual fixation is just directly equal to their frequency. So in a big population most mutations disappear. But every once in a while one will drift through, and when it reaches frequency 1.0, it's fixed. Okay?

So the fixation probability is the probability that out of all of the mutations that might arise, most of which drift out, this one will be fixed and that's a small number. And the fixation time, how long it takes to be fixed, is on average how long it takes for this process to occur. So that's the fixation time, and that's an average of many such events. So this picture that you're looking at on the blackboard is really just supposed to be an evocative picture, not some kind of precise, concrete state. Because it's representing many, many different genes, they are occurring at all the different possible places in the genome.

Now for a neutral allele, like the one that I've been sketching there, the fixation rate is just equal to the mutation rate. That doesn't depend on population size. The probability of fixation, as I said, is equal to the current frequency. For a new mutation, one of these guys down here, right at the beginning, that's 1/(2N) to be fixed, and 1 - 1/(2N) to be lost. That means that most of them are lost. N is the population size. N is a big number.

Because there are 2N copies of the gene in the population, and if mu is the mutation rate per gene per generation, that means in each generation there are 2Nmu new mutations, and for each of them the probability of fixation is 1/2N. So the rate of fixation of new mutations is 2Nmu times 1/2N, which is equal to the mutation rate. That's about 10^-5 to 10^-6 per gene, and that means the molecular clock is ticking once every 100,000 to once every 1,000,000 generations per neutral gene.
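The arithmetic behind that claim fits in a few lines; a minimal sketch (the function name and example values are mine, not the lecture's):

```python
# Neutral fixation rate: each generation 2*N*mu new mutations enter a
# diploid population, and each fixes with probability 1/(2*N), so the
# rate of fixation is mu, independent of population size N.
def neutral_fixation_rate(N, mu):
    new_mutations = 2 * N * mu       # mutations arising per generation
    p_fix = 1.0 / (2 * N)            # fixation probability of a new copy
    return new_mutations * p_fix     # fixations per generation

for N in (100, 10_000, 1_000_000):
    print(N, neutral_fixation_rate(N, mu=1e-5))   # ~1e-5 for every N
```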

The fixation rate doesn't depend on the population size, and that's because the probability that a mutation will occur in a population depends upon how many organisms are there. You can think of all of their genomes out there as being a net spread out to catch mutations--the bigger the net, the more mutations are caught in any given generation--and that will just exactly compensate for the fact that it takes them longer to get fixed. The bigger the population, the longer this process takes. But the bigger the population, the more of these are actually moving through to fixation. Those two things exactly compensate. Okay?

In a small population most of them are lost. The few that do reach fixation, reach it rapidly, and in large populations more new mutations are fixed, but each one does it more slowly. Those things compensate, and the fixation rate doesn't depend on population size, if you're looking at the whole genome. The number of differences fixed over the whole genome doesn't depend on the size of the population.
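That compensation can be checked with a toy Wright-Fisher simulation (small, illustrative numbers of my choosing, not from the lecture): a single new neutral copy among 2N should fix in roughly 1/(2N) of runs, whatever the fixation time.

```python
import random

# Wright-Fisher: follow one new neutral copy until it is lost or fixed.
def fixes(N, rng):
    copies, total = 1, 2 * N
    while 0 < copies < total:
        freq = copies / total
        # binomial sampling of the next generation's 2N gene copies
        copies = sum(rng.random() < freq for _ in range(total))
    return copies == total

rng = random.Random(1)
N, trials = 20, 20_000
observed = sum(fixes(N, rng) for _ in range(trials)) / trials
print(observed)   # close to 1/(2*20) = 0.025
```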

Now there is a technical concept in evolutionary genetics called effective population size, and that is the size of a random mating population, that is not changing in time, whose genetic dynamics would match those of the real one under consideration. And so we know that there are lots of violations of these assumptions. Okay? Populations don't have random mating. They are changing in time, and so on. How do we take a real population and then transform it into something that's really easy to calculate?

Well, there are methods of doing so. The factors that will have to come into consideration are variation in family size, inbreeding, variation in population size, and variation in the number of each sex that is breeding. And so just to illustrate one of these, to give you some idea of its impact, look at cattle in North America.

There are about 100,000,000 female cattle in North America. They are fertilized by four males, on average, through artificial insemination. So there are four bulls that are inseminating 100,000,000 cows. Genetically speaking, how big is the population? It's just about 16. Okay? So by restricting one sex to a very small number, we have restricted one pathway that the genes can go through to get to the next generation. And by making the male side of it so small, we have biased the probability that a gene will get fixed according to some process like this.

That male side is a really small population. So it completely outweighs the fact that there are 100,000,000 females there. Because if you think about it, every time one of those genes goes through a female and goes into a baby and grows up the next generation, it's going to go back through the male side of the population--right?--as you go through the generations. And these formulas that have been developed give us the opportunity to take that complex situation and make a quick, useful, back of the envelope calculation of how we can expect genetic drift to be going on in cattle in North America. Basically they are a small population.
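The "about 16" follows from the standard formula for effective size with unequal numbers of breeding males and females, Ne = 4*Nm*Nf / (Nm + Nf); a quick sketch (function name mine):

```python
# Effective population size with n_males breeding males and
# n_females breeding females.
def effective_size(n_males, n_females):
    return 4 * n_males * n_females / (n_males + n_females)

# Four bulls and one hundred million cows:
print(effective_size(4, 100_000_000))   # just under 16
```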

So that's the basis of a mutation-drift balance. The amount of genetic variation in a population, in a mutation-drift balance, is just a snapshot of the genes that are moving through it. If I were to go back to this diagram, and I were to put more genes into this process, and I were to ask you to go out and take a sample out of a population at any given time, you would take the sample at some time and you would tell me that's how many genes we have, that's how many are moving through. Okay?

Now the second possibility for a mechanism that will maintain genetic variation is a balance between mutation and selection. Mutation brings things into the population. Selection takes them out. So if we had a haploid population, with N individuals, and we have a mutation rate mu, we're getting Nmu new mutations each generation. The key idea is that if there is a mutation-selection balance, then the number going in equals the number going out; that's what would keep this mechanism balancing the amount of genetic variation in the population.

And so if the mutant individuals have a lower fitness than the non-mutants, with selection coefficient s, and if q is the frequency of the mutants, then selection is taking out Nsq mutants per generation. And at equilibrium the number coming in equals the number going out, Nmu = Nsq, and that gives us an equilibrium frequency q = mu/s: the mutation rate divided by the selection coefficient. It's a very simple result.

And if you do the same kind of thinking for a diploid population, you get that the equilibrium frequency will be the square root of the mutation rate divided by the selection coefficient for recessives, and the same as it is for haploids for dominants. Okay? So there are some examples of this.
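Those two equilibrium results can be written down directly (illustrative values of mu and s; the helper names are mine):

```python
import math

# Mutation-selection balance: mu = mutation rate, s = selection coefficient.
def q_haploid(mu, s):
    return mu / s                # balance: N*mu in = N*s*q out

def q_diploid_recessive(mu, s):
    return math.sqrt(mu / s)    # selection only sees the q**2 homozygotes

mu, s = 1e-6, 0.1
print(q_haploid(mu, s))             # ~1e-05
print(q_diploid_recessive(mu, s))   # ~0.00316
```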

There are rare human genetic diseases, such as phenylketonuria--that's the inability to metabolize phenylalanine. It has a frequency of about 1 in 200,000 in Caucasians and Chinese. It is probably in selection-mutation balance. It's at low frequency, but it's present in the population. People with it suffer a selective disadvantage. It keeps mutating and coming back in, and it keeps getting selected out. The result is balance, okay, and it's pretty rare.
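A back-of-the-envelope check on that example, assuming phenylketonuria behaves as a fully recessive trait at the recessive balance q^2 = mu/s, with s close to 1 (a strong historical disadvantage); all of these assumptions are mine, not stated in the lecture:

```python
import math

affected = 1 / 200_000   # frequency of affected births, from the lecture
s = 1.0                  # assumed selection coefficient (near-lethal untreated)
mu = affected * s        # implied mutation rate, from q**2 = mu/s
q = math.sqrt(mu / s)    # implied allele frequency
print(mu)                # 5e-06 per gene per generation
print(q)                 # ~0.0022
```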

The third mechanism that will maintain variation in natural populations is a balance of selective forces; that is, where the heterozygote is better than either homozygote. And there is a classic, famous case, and it's always discussed in this context, and it's interesting that it's the one that's always discussed, because it's been hard to find more. [Laughs] Okay? That's sickle cell anemia.

Now this is the normal homozygote, which is susceptible to malaria. The heterozygote is resistant to malaria, and the sickle cell homozygote is anemic and sick. And it sets up this kind of relative fitness. And, in fact, H here is actually going to be a negative number. Okay? So the fitness of the heterozygote is going to be higher than the fitness of either homozygote. And you can then set--the equilibrium frequency is going to be the one where p prime is equal to p; in other words, the frequency in the next generation is just the same as the frequency in this generation.

At what frequency does that happen? Well it happens when these little equations are satisfied. And the interesting thing, when you look at them, is that the selection coefficient has dropped out of them. The equilibrium frequency doesn't depend on the selection pressure, it depends on how frequently the gene is expressed in a heterozygote. So it depends really on the heterozygote advantage.
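In the usual overdominance notation--relative fitnesses 1-s for the normal homozygote, 1 for the heterozygote, 1-t for the sickle homozygote--the stable equilibrium frequency of the sickle allele is s/(s+t), which depends only on the ratio of the two homozygote costs. A sketch with made-up costs (this s, t parameterization may differ from the board's H notation):

```python
# Equilibrium under heterozygote advantage: fitnesses 1-s, 1, 1-t.
def overdominant_equilibrium(s, t):
    return s / (s + t)   # equilibrium frequency of the allele costing t

# e.g. s = 0.1 (malaria cost to AA), t = 0.8 (anemia cost to SS):
print(overdominant_equilibrium(0.1, 0.8))   # ~0.111
```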

Now the real situation is more complicated than this. There are several such sickle alleles. They're changing frequency. The equilibrium assumption doesn't really apply out there in Nature, but it does give us a rough rule of thumb for how much to expect, and when people who carry the sickle allele move out of areas with malaria, it takes quite a while for that allele to disappear from the population.

The fourth mechanism is frequency-dependent selection, a balance of selective forces in which fitness depends on frequency. So that, for example, for A2, when A2 is at frequency 0, it has high fitness here, and as it increases in frequency its fitness drops, according to this equation. Now the frequencies of A1 are just reversed along this axis: A1 is 1.0 here, and it's 0 here. A1 has low fitness when it's at high frequency, and high fitness at low frequency. A2 has high fitness at low frequency and low fitness at high frequency. So both of them do better when they are rare. And I think that you can see intuitively from this diagram that at equilibrium they will stop changing when their fitnesses are exactly the same.
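That intuition can be checked by iterating a toy haploid model in which each allele's fitness falls linearly with its own frequency (my own illustrative functions, not the equation on the slide):

```python
# Negative frequency dependence: fitness declines with own frequency.
def w1(p):              # fitness of A1 at its own frequency p
    return 1.0 - 0.5 * p

def w2(q):              # fitness of A2 at its own frequency q = 1 - p
    return 1.0 - 0.5 * q

def step(p):            # one generation of haploid selection
    q = 1.0 - p
    wbar = p * w1(p) + q * w2(q)   # mean fitness
    return p * w1(p) / wbar

p = 0.9
for _ in range(200):
    p = step(p)
print(p)   # converges to 0.5, where the two fitnesses are equal
```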

Now there are some interesting examples of this sort of thing. One is Ronald Fisher's classical argument on why 50:50 sex ratios are so common; why in many populations we see half females and half males. The deviations from that are interesting. This kind of thing happens with evolutionarily stable strategies, and those are the solution to many problems within evolutionary game theory. They are also called Nash equilibria, under certain circumstances, and they are important in economics and political science as well.

And the tremendous amount of genetic variation in the immune system is thought to exist for reasons of frequency-dependent selection: basically, pathogen-resistance genes gain an advantage when they are rare, because when they're common, the pathogens evolve onto them. They are more or less sitting ducks; they're a stable evolutionary target.

But as they become more common, more and more pathogens evolve onto them, and those organisms get sicker and sicker, and the ones that are rare have an advantage. And then as they start to increase in frequency, the same process continues again, and after a while you've got hundreds of genes, each of which is advantageous at low frequency, and none of which is advantageous at high frequency.

So this is a very important kind of mechanism maintaining genetic variation in natural populations, including our own. If we look at quantitative traits, such as birth weight--here's a classical example. This is for babies born in the United States in the 1950s and 1960s, and this is the percent mortality for babies of different weights. You can see that there's stabilizing selection that's operating to stabilize birth weight right at about 7 pounds, and there's variation around it. And you might wonder, why is there any variation around that? Why don't all babies have the optimal birth weight? It's such an important thing. And there are really two answers to that.

One is that there are evolutionary conflicts of interest between mother and infant, and father and mother, over how much should be invested in the infant, and these lead to some variation. And there's mutation-selection balance. So this is a trait which is probably determined by hundreds of genes, and at each of those genes mutations are coming into the population, and at each of those genes there is a mutation-selection balance, and when you add that up, over hundreds of genes, you get quite a range of variation. Of course, some of this variation is also due to developmental effects of the environment: variations in the mother's diet and other aspects of her physiological condition during pregnancy.

So to summarize. The origin and maintenance of genetic variation are key issues: mutations are the origin. Recombination has a huge impact. There's a tremendous amount of genetic variation in natural populations. Remember that data from the HapMap Project on humans: all of the differences that you have, in single nucleotide polymorphisms, from the person sitting next to you, and how you share them with people who have had a similar history since we came out of Africa.

We can explain the maintenance of this variation by various kinds of mechanisms: principally by a balance between mutation and drift, between mutation and selection, and by some kind of balancing selection, either heterosis or frequency-dependent selection. And we think that variation in many quantitative traits--human birth weight, human body size, athletic performance, lots of other things--is probably maintained by mutation-selection balance, as well as by other factors. So next time I'm going to talk about the role of development in evolution.

Course Index

  1. The Nature of Evolution: Selection, Inheritance, and History
  2. Basic Transmission Genetics
  3. Adaptive Evolution: Natural Selection
  4. Neutral Evolution: Genetic Drift
  5. How Selection Changes the Genetic Composition of Population
  6. The Origin and Maintenance of Genetic Variation
  7. The Importance of Development in Evolution
  8. The Expression of Variation: Reaction Norms
  9. The Evolution of Sex
  10. Genomic Conflict
  11. Life History Evolution
  12. Sex Allocation
  13. Sexual Selection
  14. Species and Speciation
  15. Phylogeny and Systematics
  16. Comparative Methods: Trees, Maps, and Traits
  17. Key Events in Evolution
  18. Major Events in the Geological Theatre
  19. The Fossil Record and Life's History
  20. Coevolution
  21. Evolutionary Medicine
  22. The Impact of Evolutionary Thought on the Social Sciences
  23. The Logic of Science
  24. Climate and the Distribution of Life on Earth
  25. Interactions with the Physical Environment
  26. Population Growth: Density Effects
  27. Interspecific Competition
  28. Ecological Communities
  29. Island Biogeography and Invasive Species
  30. Energy and Matter in Ecosystems
  31. Why So Many Species? The Factors Affecting Biodiversity
  32. Economic Decisions for the Foraging Individual
  33. Evolutionary Game Theory: Fighting and Contests
  34. Mating Systems and Parental Care
  35. Alternative Breeding Strategies
  36. Selfishness and Altruism

Course Description

In this course, Stephen C. Stearns gives 36 video lectures on Evolution, Ecology and Behavior. This course presents the principles of evolution, ecology, and behavior for students beginning their study of biology and of the environment. It discusses major ideas and results in a manner accessible to all Yale College undergraduates. Recent advances have energized these fields with results that have implications well beyond their boundaries: ideas, mechanisms, and processes that should form part of the toolkit of all biologists and educated citizens.

Course Structure:

This Yale College course, taught on campus three times per week for 50 minutes, was recorded for Open Yale Courses in Spring 2009.
