When running a simulation in PopG, with parameters:

- Population size: 100
- AA fitness: 1.0
- Aa fitness: 1.0
- aa fitness: 1.0
- Mutation A to a: 1.0E-6
- Mutation a to A: 1.0E-6
- Migration rate between populations: 0.001
- Initial frequency of A: 0.5
- Generations to run: 1000
- Populations evolving simultaneously: 100
- Random number seed: autogenerate

I get a result of 20 populations being fixed and 10 populations being lost. Obviously, as this is a simulation, these results will vary each run; but what I am having trouble understanding is are lost populations just fixation of the recessive a allele?

Yes, 'lost' seems to be referring to fixation of the a allele in this program, where the statements 'fixed' and 'lost' are specifically referring to the A allele. Conceptually, 'fixed' and 'lost' are describing the same thing in this particular case ( since you only have two alleles in the population) - the loss of one allele in a population.

Note that once the plot of the gene frequency curves reaches the right-hand side of the graph, the program prints there the number of populations that

fixed for the A allele(ended up with a frequency of 1.0) and the number thatlost this allele.

(my emphasis, from http://evolution.gs.washington.edu/popgen/popg.html)

**Lost vs fixed population**

I am not sure I understand your question. It seems to be a matter of definition. The term lost and fixed populations sounds quite wrong. I have never used PopG. Is the PopG manual using these terms without defining them?

**Probability of loss and probability of fixation**

Given your parameters, you will clearly expect that, if fixation occurs, then it is as likely that the allele a fixes than the allele A fixes. In other words, the probability of loss and the probability of fixation for a neutral locus are equal if you start at frequency 0.5.

**Did you say recessive allele**

Note that there is no recessive allele at least in terms of fitness (but you probably don't have an explicit phenotype that is afterward link to these fitness values anyway) in your simulation as all genotypes have the same fitness.

**Extra info: $F_{ST}$**

If I understand your parameters correctly, you are simulation an island model with constant migration between any two populations with D=100 populations. In such circumstances, the $F_{ST}$ should be $$F_{ST}=frac{1}{1+4N(m+mu)left(frac{D}{D-1} ight)^2}$$ which is approximatively equal to $frac{1}{1+4Nm}$, where $m$ and $mu$ are the migration and mutation rate respectively.

Because of migration, you wouldn't expect that all population would be fixed for one or another allele at any time point.

## Allelic Richness following Population Founding Events – A Stochastic Modeling Framework Incorporating Gene Flow and Genetic Drift

Affiliations Department of Solar Energy and Environmental Physics, The Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben-Gurion, Israel, Mitrani Department of Desert Ecology, The Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben-Gurion, Israel

Affiliations Department of Biology, Washington University, St. Louis, Missouri, United States of America, Institute of Evolution, and Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel

Affiliation Department of Solar Energy and Environmental Physics, The Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben-Gurion, Israel

Affiliation Mitrani Department of Desert Ecology, The Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben-Gurion, Israel

## Comparison Chart: Genetic Drift Vs. Natural Selection

Genetic drift is a mechanism of evolution in which allele frequencies of a population change over generations due to chance.

It can result in the loss of some alleles (including beneficial ones) and the fixation, or rise to 100% frequency of other alleles.

Alleles are just the different versions of the same gene located on the same locus of a chromosome. And, allele frequency refers to how common an allele is in a population.

Genetic drift shows that evolution (more commonly microevolution) can take place by chance which may be due to natural selection or some other evolutionary mechanism.

Genetic drift is a mechanism of evolution that produces random changes in allele frequencies in a population over time rather than selection-driven.

Genetic drift is more finely seen in small populations within a short period of time over generations, although it can also happen in large populations but takes a long time as the population is large.

Genetic drift alone can’t produce any type of adaptation in an organism because it’s a random occurrence. But when natural selection, mutation, and gene flow also happens along with genetic drift than adaptation definitely occurs over time.

The occurrence of genetic drift can be harmful or beneficial to the organism. It depends on the change in allele frequencies.

If a beneficial allele becomes dominant in a population than it is good, and if a harmful (deadly) allele becomes dominant in a population than it may cause death overtime.

**For example:** In a mice population, the allele frequency of the dominant B allele (black fur) is 40% and the allele frequency of the recessive b allele (brown fur) is 60%. This shows that more brown fur mice are present in the population than the black fur ones. This is proof that the population of mice is evolving due to genetic drift.

## CONCLUSION

The genetic stability of a closed hatchery population depends on the population's N_{e}. As was the case with average inbreeding, genetic drift is inversely related to N_{e}. Consequently, small hatchery populations can cause random changes in gene frequency. The ultimate effect of a small N_{e} is the loss of alleles via genetic drift. Rare alleles will be lost more easily, but common alleles can also be lost. The loss of genetic variance can produce irreversible damage to a population's gene pool. This loss can prevent future improvements via selection, and it can reduce viability in populations that are stocked in lakes and rivers.

## Calculating Relatedness

How can we calculate relatedness in inbred, mixed, or haplodiploid families? The procedure is essentially the same as with regular diploid families. We can trace genes from generation to generation and calculate the probability that they are shared or we can use a graphical technique similar to the one above. However, we can no longer assume that all steps reduce relatedness by a factor of 2 (multiplying by ½). Instead, we must label our family tree with the known relatedness at each step. As you make your path through the tree, write down the relatedness at each step. At the end, multiply all of the r values to obtain the coefficient of relatedness. The four trees below illustrate this for sample unrelated, inbred, mixed, and haplodiploid cases.

**Unrelated.** This tree simply adds the relatedness of ½ between parents and offspring and between siblings when there is no inbreeding.

**Inbred.** A was related to its mate by ½ (siblings) and B was related to its mate by ⅛ (first cousins). No other parents are related.

**Mixed.** In this tree, B and C share only one parent, A, reducing their relatedness to ¼. Similarly, D and E share only one parent, B.

**Haplodiploid.** This tree shows a family of wasps with multiple queens. Only G is male all the others are female. (Non-reproductive females are not shown.)

For example, what is the relatedness between D and G in the Inbred tree? Following the path D-B-C-G, we cross relatednesses of 9 &frasl_{16}, ¾, and ½, giving a relatedness of 27 &frasl_{128} (about 0.211). For comparison, D and G are related by ⅛ (0.125) in the Unrelated tree. In the Mixed tree, how are D and K related? The path D-E-K has ¼ and ½, for a relatedness of ⅛. For more practice, try the problems below,which refer to the family trees above unless otherwise indicated. (For other families, you need to draw customized trees using the parent-offspring and sibling-sibling relatednesses given earlier in this tutorial.)

## The Limits of Theoretical Population Genetics

THE purpose here is to discuss the limits of theoretical population genetics. This 100-year-old field now sits close to the heart of modern biology. Theoretical population genetics is the framework for studies of human history (R eich *et al.* 2002) and the foundation for association studies, which aim to map the genes that cause human disease (J orde 1995). Arguably of more importance, theoretical population genetics underlies our knowledge of within-species variation across the globe and for all kinds of life. In light of its many incarnations and befitting its ties to evolutionary biology, the limits of theoretical population genetics are recognized to be changing over time, with a number of new paths to follow. Stepping into this future, it will be important to develop new approximations that reflect new data and not to let well-accepted models diminish the possibilities.

It is valuable to define this field narrowly. Theoretical population genetics is the mathematical study of the dynamics of genetic variation within species. Its main purpose is to understand the ways in which the forces of mutation, natural selection, random genetic drift, and population structure interact to produce and maintain the complex patterns of genetic variation that are readily observed among individuals within a species. A tremendous amount is known about the workings of organisms in their environments and about interactions among species. Ideally, with constant reference to these facts—the bulk of which are undoubtedly yet to be discovered—theoretical population genetics begins by distilling everything into a workable mathematical model of genetic transmission within a species.

Taking this narrow view precludes the application of theoretical population genetics to studies of long-term evolutionary phenomena. This, instead, is the purview of evolutionary theory. For theoretical population genetics, processes over longer time scales are of interest only insofar as they directly affect observable patterns of variation within species. The focus on current genetic variation came to the fore during the 1970s and 1980s with the development of coalescent theory (K ingman 1982, 2000), or the mathematics of gene genealogies. E wens (1990) reviews this transition from the forward-time approach of classical population genetics to the new, backward-time approach. It can be seen both in classical work (F isher 1922 W right 1931) and in coalescent theory (K ingman 1982 H udson 1983 T ajima 1983), both of which are considered below, that the time frame over which the models of theoretical population genetics apply within a given species is a small multiple of *N*_{total} generations, where *N*_{total} is the total population size, or the count of all the individuals of the species. Looking at gene genealogies in humans, for example, it seems that this means roughly from 10 4 to 10 6 years (H arris and H ey 1999).

This allows us to suppose that the parameters affecting the species that we wish to model have remained relatively constant over time, compared to the situation in evolutionary theory. For purposes of discussion, consider the following simple model which, with embellishments, might serve to describe any species from *Homo sapiens* to *Bacillus subtilis*. The species is divided into *D* subunits, each of size *N*, so that the total population size is *N*_{total} = *ND*. Corresponding to the phenomena listed above, the other parameters of the model are the per-locus, per-generation probability of mutation *u*, the selective advantage or disadvantage, *s*, of some type relative to some other type in the population, and a parameter, *m*, which determines the extent of population structure.

The subunits in the model are used below to represent *D* diploid individuals, so that *N* = 2 is the number of copies of each chromosome within each individual. Note that this departs from the usual notation, in which *N* is the number of diploid individuals. The reason for this departure is to emphasize the similarities between the diploid model and other models of population structure. Thus, the same model is used to represent a population subdivided into *D* local populations, or *demes* (G ilmour and G regor 1939), each containing *N* individual organisms.

Many details have been ignored in this model for the sake of simplicity. For example, mutation is a complex process, which includes various kinds of recombination, and natural selection is similarly not likely to be so simple that a single parameter captures all of its intricacies. In addition, the general term “population structure” encompasses dioecy, ploidy level, age structure, reproductive patterns such as partial selfing, as well as the various forms of geographical structure and dispersal. Finally, as noted above, all parameters are assumed to not change over time. However, with some flexibility in the interpretations of parameters, this model can be used to illustrate the limits of theoretical population genetics.

The ranges of the parameters are restricted by nature. Specifically, *D* and *N* are whole numbers, both of which it is natural to assume are ≥1. The other parameters can vary continuously, but also have natural ranges: 0 ≤ *u* ≤ 1, *s* ≥ −1, and 0 ≤ *m* ≤ 1. The last two require some context. Let *m* be the fraction of each subunit (of which there are *D*) that is replaced by offspring randomly sampled from the entire population each generation. This is the island model of population subdivision and migration introduced by W right (1931), but it can be used to represent other forms of structure as well. Subdivision is at its least when *m* = 1 and is at its most when *m* = 0. Selection is imagined between two types, one with fitness 1 and the other with fitness 1 + *s*, and *s* ≥ −1 precludes negative fitness values. With selection among more than two types, the fitness of one of them is taken to be equal to one and this establishes the relative selection coefficients (values of *s*) of the others.

The current and historical boundaries of theoretical population genetics can be understood with reference to the object of study, which is genetic variation within species, but also in terms of methodology. The ridiculously oversimplified model just described already has five parameters. Even with the restrictions above, there is an enormous five-dimensional space that defines all possible kinds of species under the model: <(*D*, *N*, *u*, *s*, *m*) *D* ≥ 1, *N* ≥ 1, 0 ≤ *u* ≤ 1, *s* ≥ −1, 0 ≤ *m* ≤ 1>. Theoretical population geneticists obtain predictive equations by simplifying such complicated models, again ideally with close attention to the biological relevance of any assumptions made. Formally, this is done by taking mathematical limits. The hope is that by doing so, *i.e.*, by further restricting the ranges of parameters, tractable analytical results or simple approximations to the model can be obtained, which will be both useful and illuminating.

The first limiting result was established independently by H ardy (1908) and W einberg (1908) for the case of two alleles, *A* and *a*, with frequencies *p* and *q* = 1 − *p*, respectively, in a population of diploid, monoecious organisms see C row (1988) for a perspective on this important result. In this case, the subunits in the model represent the organisms (*N* = 2), the population is supposed to be infinite (*D* = ∞), without mutation (*u* = 0) or selection (*s* = 0), and offspring are formed by either random mating or random union of gametes (*m* = 1). Then, the Hardy-Weinberg law states that the frequencies of the genotypes *AA*, *Aa*, and *aa* will be equal to *p 2* , 2*pq*, and *q* 2 after a single generation, regardless of the initial genotype frequencies, and that they will remain in these frequencies forever. P rovine (1971) discusses the important historical role of the Hardy-Weinberg law in evolutionary biology, which was to show that the mechanism of inheritance would not itself cause the variation upon which selection acts to be depleted in a population.

The simplicity of the Hardy-Weinberg law is a consequence of its very stringent assumptions. It exists only in the special case in which the values of all parameters are fixed and given by (*D* = ∞, *N* = 2, *u* = 0, *s* = 0, *m* = 1). F isher (*e.g.*, 1930) and H aldane (*e.g.*, 1932), and a great number of workers who followed their lead were content with the assumption of infinite population size. They sought to establish the dynamics of allele frequencies in an expanded Hardy-Weinberg population that included mutation and selection. As a result, much of classical population genetics takes place in the restricted parameter space where <(*D*, *N*, *u*, *s*, *m*) *D* = ∞, *N* = 2, 0 ≤ *u* ≤ 1, *s* ≥ −1, *m* = 1>. However, the overwhelming majority of results have been derived under the additional assumption that *u* and *s* are small.

Although every population is finite, so that *D* = ∞ can never be true, these classical predictions are valuable because they establish tendencies at work in populations of any size (*e.g.*, the frequency of a favored allele will increase over time). Further, these classical predictions should be close to true if the population is “large enough.” Of course, it is only by considering a finite population that these two statements can be investigated and verified. In addition, some vital phenomena simply cannot be studied using an infinite population model. Questions concerning the fixation or loss of alleles from the population or, more generally, questions about the behavior of alleles in low copy number are outside the boundaries of classical, infinite-population-size theory.

No population is so large that finite size can be ignored as a factor contributing to patterns of genetic variation within a species. For example, in an infinite population with mutation but no selection, every possible allelic type will be present at the frequency determined by the pattern and rate of mutation. However, even a stretch of 100 nucleotides has 4 100 ≈ 10 60 possible alleles, and no population comes even remotely close to being this large. Considered further, the consequences of reproduction in finite populations are rather amazing. First of all, without mutation (and assuming at least some mixing: *m* > 0), all variation will eventually be lost from any population. More subtly, reproduction with any reasonable fidelity, which is assured by universally small rates of mutation (D rake *et al.* 1998), causes identical or related alleles to accumulate in the population even as they are all ultimately ephemeral (W atterson 1976).

Random genetic drift is the term used to describe the stochastic effects of reproduction in a finite population. Historically, the need to incorporate random genetic drift into population genetic models was motivated by observations of J. T. Gulick and others concerning geographic variation within species without apparent selective causes—see P rovine (1986) for a thorough compilation of the history—and by the trenchant argument of H agedoorn and H agedoorn (1921), which demonstrated the need to understand the random effects of reproduction in finite populations. The result was the Wright-Fisher model of random genetic drift.

To be concrete, consider the population model as it was used above to illustrate the Hardy-Weinberg law in an infinite population of diploid organisms, but eliminate the assumption of infinite population size. This leaves <(*D*, *N*, *u*, *s*, *m*) *D* ≥ 1, *N* = 2, 0 ≤ *u* ≤ 1, *s* ≥ 1, *m* = 1> for the parameter space. The total population size is *N*_{total} = *ND* = 2*D* and is finite. The Wright-Fisher model of random genetic drift states that the *D* diploid individuals that form generation *t* + 1 are obtained by randomly sampling pairs of gametes, with replacement, from the adults of generation *t*. Generations are nonoverlapping, so all adults die and are replaced by offspring. If there are currently *i* copies of allele *A* among gametes, then the frequency of allele *A* now is *p* = *i/*(2*D*), and the probability *P _{ij}* that there are

*j*= 0, 1, … , 2

*D*copies of allele

*A*at the beginning of the next generation is given by the familiar binomial distribution with parameters 2

*D*and

*p*=

*i/*(2

*D*).

Fisher used the above model of genetic drift implicitly, in many cases assuming a Poisson distribution of offspring number with the mean equal to one per individual, which is the large-*D* approximation to the above binomial distribution with *i* = 1. Wright used the model explicitly, as a null model for the dynamics of a randomly mating population of finite size. Fisher and Wright showed, among other things, that the rate of loss of heterozygosity in a population is equal to 1*/N*_{total} = 1*/*(2*D*). This illustrates the statement above that the time scale over which theoretical population genetics considers things is a small multiple of *N*_{total} generations.

The Wright-Fisher model of random genetic drift is a discrete time, discrete allele-frequency model. Time is measured in numbers of generations and *P _{ij}* describes changes in the numbers of alleles. This model is surprisingly difficult to analyze, and few exact results are available. Early on, F isher (1922) and W right (1931) considered a continuous time, continuous allele-frequency approximation to the model, which allowed many results of biological interest to be derived. Their results relied on a diffusion approximation to the discrete model (K omolgorov 1931). M al é cot (1944)(1946) used the same ideas and rigorous methods to greatly extend the application of diffusion results in population genetics. F eller (1951) provided the general mathematical framework for these models, and K imura (1955a)(b) obtained the full solution of the time-dependent distribution of allele frequencies in a population.

The transition from the discrete model to the continuous one happens in the limit as the population size tends to infinity, but it relies on very different assumptions about the other parameters than are made in classical deterministic work. This model, which is often called *the* diffusion limit of population genetics, exists in the limit as *D* tends to infinity and assumes that lim_{D}_{→∞} 4*Du* = θ and lim_{D}_{→∞} 4*Ds* = σ are finite. Time is rescaled so that it is measured in units of *N*_{total} = *ND* = 2*D* generations. The continuous model holds in the limit because single generations and single copies of alleles represent, respectively, infinitesimal amounts of time on the new time scale and infinitesimal differences in allele frequency. This is the appropriate diffusion approximation when 1*/D*, *u*, and *s* are all small and do not differ too greatly in magnitude. Finally, the limit is taken with the allele frequency *i*/(2*D*) assumed to be fixed (*i.e.*, constant) in the limit as *D* tends to infinity, which means that for most purposes—but see Bü rger and E wens (1995)—this model is not appropriate when the number of copies of an allele in the population is not large.

Note that the apparent dependence of the parameters *u* and *s* on *D* in the assumptions lim_{D}_{→∞} 4*Du* = θ and lim_{D}_{→∞} 4*Ds* = σ is not a statement about biology. The model does not suppose, for example, that if the population doubled in size, the mutation rate and the selection coefficient would drop by one-half. The standard diffusion limit is simply a mathematical approximation to the behavior of a large population in which the probability of mutation and the selection coefficient(s) are small. Like the classical (*D* = ∞) results, it applies in a particular region of the parameter space, one in which *D* → ∞ but where the effects of random genetic drift are not negligible. Another possible point of confusion is the extra factor of two in the parameters θ and σ relative to the way in which time is rescaled. This practice was inherited from Wright and Fisher, and it simply reflects biologists' great concern for heterozygosity, or polymorphism between a pair of chromosomes.

Nearly all of modern population genetics is based upon this standard diffusion model, although much of the time it is used implicitly. It is the source of the common practice of simplifying expressions obtained from a discrete model by keeping only terms involving *u*, *s*, and 1*/D* and throwing out “small” terms like *u* 2 , *s* 2 , 1*/D* 2 , *u/D*, etc. In this case it is clear why 1*/D*, *u*, and *s* should not differ too greatly in magnitude. For example, if *D* = 10 4 and *u* = 10 −8 , it does not make sense to ignore terms involving 1*/D* 2 but keep terms involving *u*. Technically, the standard diffusion holds for any σ and θ, as long as these remain finite as *D* tends to infinity. Thus, the standard diffusion model can be used to model weak selection and mutation by making σ and θ small and to model strong selection and mutation by making σ and θ large. The risk in doing so is that the error of using these results to approximate the results for a finite population may be large unless *D* is very large (E thier and N orman 1977).

There are more appropriate approximations than the standard diffusion, even other diffusion approximations, if one needs to model populations that fall into other regions of the parameter space (F eller 1951 K arlin and M c G regor 1964). One which is well known and has been fairly well exploited to address questions of fixation probabilities since F isher (1922) and H aldane (1927) is the branching-process approximation for the number of copies of an allele. The counts of an allele can be approximated by a branching process (without reference to the rest of the population) in the limit as *D* tends to infinity for fixed values of *u* and *s* but where the number of copies of the allele is not large. This complements the classical deterministic model, which makes the same assumptions about *D*, *u*, and *s*, but applies only when the counts of alleles are very large. For a recent example, see W ahl and D e H aan (2004).

Another approximation, the Gaussian diffusion, sits between the standard diffusion model and the classical deterministic results. In a somewhat neglected article—but see N agylaki (1990) and G illespie (2001)—N orman (1975) proved that with *s* → 0 and *u* → 0, but *Ds* → ∞ and *Du* → ∞, the trajectories of allele frequencies would tend strongly to the deterministic predictions but with small deviations. Further, these stochastic deviations in allele frequencies tend to zero as *D* becomes much larger than 1*/s* and 1*/u*. Thus, if *D* is very much greater than 1*/s* and 1*/u*, and the latter are large, the deterministic equations are very nearly correct (as long as the number of copies of each allele is large). Such concerns underlie the use of a stochastic treatment of allele frequencies close to zero or one and a deterministic treatment in the interior, for example, by K aplan *et al.* (1989) and G illespie (1991).

Returning to the ubiquity of the standard diffusion approximation, the addition of a single assumption, that the sample size *n* is constant, so that *n/D* → 0, as *D* tends to infinity (roughly: *n* ≪ *D*), yields coalescent theory (K ingman 1982 H udson 1983 T ajima 1983 K rone and N euhauser 1997 N euhauser and K rone 1997). Coalescent theory describes the genetic ancestry of a sample and provides the tools for the analysis of intraspecies molecular data. N ordborg (2001) gives a recent thorough review of this field. K ingman (2000) gives a historical perspective, which includes credit to Gustave Malécot for having the original idea of tracing lineages back to common ancestors see also N agylaki (1989). Applications of coalescent theory to the problems of modern biology abound, from the geographic origin of *Plasmodium falciparum* (J oy *et al.* 2003) and the dynamics of HIV within infected individuals (D rummond *et al.* 2002) to the extent of gene flow between recently separated cichlid species (H ey *et al.* 2004).

The standard diffusion approximation has permeated the field so thoroughly that it shapes the way in which workers think about the genetics of populations. There are positive aspects of this. For example, the parameters θ = 4*Du* and σ = 4*Ds* capture the important and fascinating fact that even very weak mutation and selection can have a strong effect if the population size is large. This illustrates the potentially important role of the population size in setting the time scale of population genetic change. However, in terms of the general model with parameter space <(*D*, *N*, *u*, *s*, *m*) *D* ≥ 1, *N* ≥ 1, 0 ≤ *u* ≤ 1, *s* ≥ −1, 0 ≤ *m* ≤ 1>, the standard diffusion model can be viewed only as a model of weak mutation and weak selection. For selection, the term “strong” would best be reserved for cases in which *s* is either close to −1 or much greater than zero (*e.g.*, *s* = 10), while the typical usage is to say, roughly, that |σ| *>* 10 constitutes strong selection.

It is problematic when conclusions drawn from a special case of a general model become normative statements carried over to other situations. Under the assumptions of the standard diffusion model, in which *D* → ∞ while θ and σ remain fixed, everything depends only on the products *Du* and *Ds*. This limiting result is responsible for the notion that it is impossible to estimate *D* and *u*, for example, separately and that only θ can be estimated. However, this is simply a consequence of the assumptions of the model, which might be expected to break down in cases outside the region of parameter space in which the standard diffusion is appropriate. For example, it breaks down for very large samples in a coalescent model (*n/D* → *x* as *D* → ∞), allowing both *D* and *u* to be estimated (W akeley and T akahashi 2003). While it may be true that there is low power to estimate *D* and *u* separately, questions about this cannot even be posed within the framework of the standard coalescent.

A parallel set of issues arises in the study of structured populations. The simple model adopted here includes W right 's (1931) island model of population subdivision and migration, which he proposed to help explain nonadaptive differences among different subunits of a species—recall the observations of Gulick—and which became part of his shifting balance theory of evolution (P rovine 1986). Wright introduced the diffusion approximation to obtain the equilibrium distribution of allele frequencies on a single island under the assumption of a constant allele frequency among migrants. Thus, in addition to θ = lim_{N}_{→∞} 4*Nu* and σ = lim_{N}_{→∞} 4*Ns*, which Wright defined for the single island of *N* diploid organisms rather than for the total population, the island model has a scaled migration parameter, *M* = lim_{N}_{→∞} 4*Nm*. The parameter *M* captures the notion that small amounts of migration over the time scale of *N* generations can have a very large effect see also N agylaki (1980). As with θ and σ, the relevance of the parameter *M* in the limiting model should not be taken to mean it will be impossible to separately estimate *N* and *m* in other cases—see V italis and C ouvet (2001)—or that the dynamics of every subdivided population depend only on the product *Nm*.

Wright offered two possible justifications for the assumption of constant allele frequency among migrants: (1) that migrants come from an infinitely large, unstructured population, like the one that gave the Hardy-Weinberg law above, or (2) that migrants come from an infinitely large collection of islands, of which the focal island is a single example. This second possibility is easily represented using the present model. It is obtained by assuming that *D* = ∞, so that allele frequencies in the total population remain constant, as they do under the Hardy-Weinberg law described above. The assumptions of diploidy (*N* = 2) and random mating (*m* =1) need to be relaxed so the demes can be of any size (*N* ≥ 1) and receive migrants at any biologically reasonable rate (0 ≤ *m* ≤ 1).

Described in this way, it is helpful to think of Wright's infinite-island model as a classical population genetic model for idealized *N*-ploid organisms (the demes), with complications such as double reduction ignored. Reproduction is a little more complicated than in the classical diploid model—newborn individuals receive a fraction, *m*, of their gametes from the total parental population's pool of gametes and a fraction, 1 − *m*, from a single parent's gamate pool—but these models share many features. It is clear, for example, that the allele frequencies in the total population will remain constant only if there is no selection and no mutation otherwise they should change according to something like the classical deterministic theory. In addition, the infinite-island model suffers the same restrictions as the classical model: questions about stochastic trajectories of allele frequencies in the total population (*e.g.*, the fixation or loss of alleles) cannot be addressed.

By assuming a fixed, finite number of demes, M aruyama (1970), L atter (1973), and others studied the finite-island model and obtained results for fixation probabilities and other properties of the population. Without making any assumptions about the parameters, the finite island model is represented by the general version of the present model, with parameter space <(*D*, *N*, *u*, *s*, *m*) *D* ≥ 1, *N* ≥ 1, 0 ≤ *u* ≤ 1, *s* ≥ −1, 0 ≤ *m* ≤ 1>. There are difficulties in analyzing the finite island model, as there are in the case of the Wright-Fisher model of an unstructured finite population. In fact, the difficulties are greater because subdivision, *i.e.*, when *m* < 1, increases the complexity of the system substantially. Because there are more parameters, there are more choices as to how the parameters might be related or restricted in approximations to the model.

The best known of these limits is the one that underlies the structured coalescent process (N otohara 1990 W ilkinson -H erbots 1998). This is the finite-island model with *N* → ∞ and with parameters scaled as W right (1931) did originally. This model frames most work on populations structured by migration. It is a model of a relatively small number of very large populations connected by limited migration, with weak mutation and, in nearly all cases, no selection. Another limit, which is to the island model what the standard diffusion is to the unstructured model, is the many-demes limit with weak mutation and selection, and any *m >* 0 and *N* ≥ 1 (W akeley 2003). Allele frequencies in the total population change according to the standard diffusion, but on a time scale that depends on *N* and *m*. At the same time, relatively strong migration and drift within demes keeps the collection of demes close to the kind of equilibrium described by W right (1931), which is the analog in this model of Hardy-Weinberg genotype frequencies in the diploid model. It hardly needs stating at this point that neither the finite-*D*, *N* → ∞ diffusion nor this finite-*N*, *D* → ∞ diffusion should be applied or accepted without attention to its restrictions.

Why all this attention to the arcane subject of diffusion theory, which may seem to have peaked with Kimura's work in the 1950s? Possibly the most exciting new direction in theoretical population genetics is the study of a coupled (backward and forward) process that promises to unite diffusion theory and coalescent theory, while fully incorporating natural selection into the latter. This relates population genetic models to bodies of more abstract mathematics, such as the theory of interacting particle systems (L iggett 1985). The approach was introduced into population genetics by D onnelly (1984), developed further by K rone and N euhauser (1997), and can also be seen in D arden *et al.* (1989). Recent articles include D onnelly and K urtz (1999) and B arton *et al.* (2004). The challenge is to develop from this work a set of tools for making inferences from genetic data that can be applied in the way that the standard coalescent is being applied now.

Due to recent developments in biotechnology, the theory and methodology of population genetics are lagging behind the collection of data. The abundance of data now available, and soon to be available, holds the promise that it will finally be possible to infer the current and historical characteristics of populations with a high degree of precision. There is already a huge store of results in the historical literature of theoretical population genetics, which can be mined for present-day aims. However, at least since the introduction of coalescent theory 20 years ago, theoretical population genetics has developed closely in response to newly available data, and now is the time to push the boundaries of the field.

A number of new limits are just over the horizon. For example, high-throughput genotyping techniques have increased sample sizes in two directions: the number of individuals and the number of base pairs per individual. Simplifications of complex models may arise in the limit as the number of individuals sampled tends to infinity or as the length of sequence per sample tends to infinity. If history is any guide, then looking back in a few years it will be apparent how new directions such as these will have shaped the way in which we think about patterns of genetic variation and the processes that conspire to maintain them.

## What is Allele Frequency

The allele frequency is the frequency of the two forms of a particular allele in a population. They are dominant and recessive alleles . Each allele frequency can be calculated by dividing the number of individuals with the allele form by the total number of individuals in the population . Here, the p represents the dominant allele frequency of the population while the q allele represents the recessive allele frequency. Also, the sum of the allele frequencies in a population is equal to 1.

**Figure 2: Inheritance of Dominant and Recessive Alleles**

## Outlook and concluding remarks

Loss-of-function alleles were once often held up as a paragon of deleterious genetic variation. Today a more nuanced appreciation for their potential role in adaptation has emerged. This new paradigm inspires investigations into deeper questions about the causes and consequences of adaptation by genetic loss of function. For example: Do species or populations differ in their capacity to adapt via loss of function, and if so, why? Does the high effective mutation rate of loss-of-function alleles lead to bias in the probabilities of different evolutionary outcomes? What is the contribution of adaptive loss of function to the phenomena of antagonistic pleiotropy and reproductive isolation? How does adaptation by loss of function affect long term evolutionary trajectories of populations and future evolvability? Ongoing technical breakthroughs promise to scale up the study of loss-of-function alleles experiencing positive selection for population genomic research to address these questions. More broadly, these lines of research provide paths toward advancing tools and concepts that facilitate a continued synthesis between functional molecular genomics and classic population genetic theory.

## Mechanisms of Evolution

**Natural selection, genetic drift, and gene flow can cause allele frequencies in a population to change over time.**

**Phenotype**: Observable characteristics that are determined by the genotype.

Individuals differ from one another in part because they have different alleles for genes.

Different alleles arise by **mutation** : Change in DNA.

Mutations can result from copying errors during cell division, mechanical damage, exposure to chemicals (mutagens) or high-energy radiation.

**Formation of new alleles is critical to evolution.**

If mutation did not produce new alleles, all members of a population would have **identical** genotypes and evolution could not occur.

**Recombination** also produces different genotypes within a population.

Offspring have combinations of alleles that differ from those of their parents. However **recombination** and **mutation** are not responsible for the short term evolutionary changes that we observe

### Three types of natural selection:

**Directional selection**: Individuals at**one phenotypic extreme**(e.g., large size) are favored.

Example: Drought favored large beak size in medium ground finches.

**Stabilizing selection**: Individuals with**an intermediate phenotype**are favored.

Example: Parasitic wasps select for small gall size of *Eurosta* flies while birds select for large gall size.

**Disruptive selection**: Individuals**at both phenotypic extremes**are favored.

Example: African seedcrackers (birds) have two food sources—hard seeds that large beaks are needed to crack, and smaller, softer seeds that smaller beaks are more suited to.

Natural selection can result in populations in which all individuals have the favored allele: **100%** and the allele is **fixed.**

**Genetic drift** occurs when chance events determine which alleles are passed to the next generation. It is significant only for **small populations.**

**Genetic drift has four effects on small populations:**

**1**. It acts by chance alone, thus causing allele frequencies to fluctuate at random. Some may disappear, other may reach 100% frequency (**fixation**).

**2**. Because some alleles are lost, genetic variation of the population is reduced.

**3**. Frequency of harmful alleles can increase, if the alleles have only mildly deleterious effects.

**4**. Differences between separate populations of the same species can increase.

2. and 3. can have dire consequences.

****Loss of genetic variation reduces the ability of the population to respond to changing environmental conditions.*

*****Increase of harmful alleles can reduce survival and reproduction. These effects are important for species that are near extinction.*

**Gene flow**: Alleles move between populations via movement of individuals or gametes.

## Running the Gamut

Multiple genes and, therefore, multiple alleles, affect continuous, or quantitative, traits. Because they are caused by more than one gene, they are also referred to as polygenic traits. Gene expression is complex, and continuous traits can also be influenced by environmental factors. Continuous traits are common in humans, who show a wide range of possibilities in characteristics such as height, skin color, learning ability and blood pressure. These traits are frequently seen in agriculture, as well. Combination of genetics and environment is readily apparent in crop yield, percentage of fat in animals, animal weight gain, and resistance to certain diseases.