Population size and genetic drift - What is the evidence?

Wright-Fisher model

In the Wright-Fisher model of genetic drift, the number of copies of an allele passed from one generation to the next is drawn from a binomial distribution with parameters $2N$ and $p$, where $N$ is the population size and $p$ the current frequency of the allele of interest.

Strength of genetic drift

The strength of genetic drift can be measured in terms of two different statistics:

  • variance in allele frequency from one generation to the next
  • loss of heterozygosity per generation

Variance in allele frequency from one generation to the next

From the above, it is relatively straightforward to show that the variance in allele frequency in the next generation is

$$\text{var}\left(p'\right) = \frac{p(1-p)}{2N}$$

where $N$ is the population size and $p'$ is the allele frequency in the next generation.
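The variance prediction above can be checked numerically. The following is a minimal sketch, assuming a single biallelic locus and pure binomial sampling; the function names are illustrative and are not taken from any package.

```python
import random
import statistics


def next_generation_frequency(p, n_diploid, rng):
    """Draw the next-generation allele frequency under Wright-Fisher sampling.

    Each of the 2N gene copies is an independent Bernoulli(p) draw, so the
    number of copies of the allele is Binomial(2N, p).
    """
    copies = 2 * n_diploid
    count = sum(1 for _ in range(copies) if rng.random() < p)
    return count / copies


def simulated_variance(p, n_diploid, replicates, seed=1):
    """Estimate var(p') from many independent one-generation replicates."""
    rng = random.Random(seed)
    draws = [next_generation_frequency(p, n_diploid, rng) for _ in range(replicates)]
    return statistics.pvariance(draws)


def expected_variance(p, n_diploid):
    """Theoretical one-generation variance p(1-p)/(2N)."""
    return p * (1 - p) / (2 * n_diploid)
```

With enough replicates, simulated_variance(0.5, 50, 4000) should land close to expected_variance(0.5, 50), which is 0.0025.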

Loss of heterozygosity

Heterozygosity decays by a factor of $1-\frac{1}{2N}$ every generation:

$$H_t = H_{t-1}\left(1-\frac{1}{2N}\right)$$

where $H_t$ is the expected heterozygosity at time $t$.
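The recursion for heterozygosity can be iterated directly. This is a small sketch under the same idealised assumptions (constant $N$, no mutation, no selection); the function names are illustrative.

```python
import math


def expected_heterozygosity(h0, n_diploid, generations):
    """Iterate H_t = H_{t-1} * (1 - 1/(2N)) and return [H_0, ..., H_t]."""
    decay = 1 - 1 / (2 * n_diploid)
    trajectory = [h0]
    for _ in range(generations):
        trajectory.append(trajectory[-1] * decay)
    return trajectory


def generations_to_halve(n_diploid):
    """Number of generations t that solves (1 - 1/(2N))**t = 1/2."""
    return math.log(0.5) / math.log(1 - 1 / (2 * n_diploid))
```

For large $N$ the halving time is close to $2N \ln 2$, so a population of 50 diploid individuals is expected to lose half of its heterozygosity in roughly 69 generations.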


See Gillespie: Population Genetics - A concise guide chapter 2 for more info on these predictions.


Do we have empirical evidence that the decay rate in heterozygosity and/or the variance in allele frequency due to genetic drift follows these predictions?

One size fits all? Direct evidence for the heterogeneity of genetic drift throughout the genome

Effective population size (Ne) is a central parameter in population and conservation genetics. It measures the magnitude of genetic drift, rates of accumulation of inbreeding in a population, and it conditions the efficacy of selection. It is often assumed that a single Ne can account for the evolution of genomes. However, recent work provides indirect evidence for heterogeneity in Ne throughout the genome. We study this by examining genome-wide diversity in the Danish Holstein cattle breed. Using the differences in allele frequencies over a single generation, we directly estimated Ne among autosomes and smaller windows within autosomes. We found statistically significant variation in Ne at both scales. However, no correlation was found between the detected regional variability in Ne, and proxies for the intensity of linked selection (local recombination rate, gene density), or the presence of either past strong selection or current artificial selection on traits of economic value. Our findings call for further caution regarding the wide applicability of the Ne concept for understanding quantitatively processes such as genetic drift and accumulation of consanguinity in both natural and managed populations.

1. Introduction

The effective population size (Ne) measures the magnitude of genetic drift in a population. It determines expected levels of polymorphism, the efficacy of selection and the potential for accumulation of consanguinity in a population. Given its widespread use in population and conservation genetics, it is important to know whether a single Ne can account for the observed patterns of evolution in a population. Theory predicts that selection acting at linked sites can modulate the amount of genetic drift and hence the Ne experienced by a given site. The intensity of linked selection is expected to vary throughout the genome and thereby generate heterogeneity in Ne [1,2].

Empirical evidence for the heterogeneity of Ne has been indirect, relying almost exclusively on joint patterns of polymorphism and divergence [3]. It has been difficult to prove directly that Ne is heterogeneous throughout the genome and that heterogeneity observed in polymorphism is not merely reflecting variation in mutation rates [3,4].

Here, we use a method for directly estimating Ne based on temporal variation in allele frequencies. We test for heterogeneity of estimated Ne by genotyping a total of more than 1000 individuals representing three successive generations in a population from the Danish Holstein cattle breed. We find statistical evidence for substantial variation in Ne throughout the genome. We find that proxies commonly used for the expected intensity of linked selection cannot account for the variation observed.

2. Material and methods

(a) Sampling of individuals and genotyping

We studied the Danish Holstein population [5]. We selected three cohorts of individuals born in 1995, 2000 and 2005 (268, 295 and 579 individuals, respectively), genotyped using the 54 K SNP chip. Marker positions refer to the UMD3.1 assembly of the Bos taurus genome [6]. We only included SNPs with less than 10% missing data and genotyped in all cohorts, leaving a total of 46 268 SNPs. See the electronic supplementary material for further details.

(b) Estimation of Ne from temporal variation in allele frequency

For each autosome, SNPs were grouped in non-overlapping windows containing 100 SNPs (n = 447 windows). We used two Ne estimators based on the temporal variance in allele frequency, which have complementary statistical properties of variance and bias [7,8]. These estimators do not rely on pedigree information and provide direct estimates of Ne over a time interval in each window. We calculated the standard error (s.e.) of the Ne estimates within each window using 10 000 bootstrap samples (see the electronic supplementary material).
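To make the idea of a temporal estimator concrete, here is a minimal sketch of one widely used moment estimator: the standardized variance in allele frequency change, corrected for the two sample sizes. This is an assumption chosen for illustration and is not claimed to be either of the estimators [7,8] used in the study; all function and parameter names are hypothetical.

```python
def standardized_variance(freqs_t0, freqs_t1):
    """Mean standardized variance in allele frequency change across loci."""
    terms = []
    for x, y in zip(freqs_t0, freqs_t1):
        denom = (x + y) / 2 - x * y
        if denom > 0:  # skip loci fixed for the same allele in both samples
            terms.append((x - y) ** 2 / denom)
    if not terms:
        raise ValueError("no polymorphic loci to average over")
    return sum(terms) / len(terms)


def temporal_ne(freqs_t0, freqs_t1, generations, sample_t0, sample_t1):
    """Estimate Ne from allele frequencies sampled `generations` apart.

    sample_t0 and sample_t1 are the numbers of diploid individuals sampled.
    The terms 1/(2S) remove the variance expected from sampling alone.
    """
    f_c = standardized_variance(freqs_t0, freqs_t1)
    drift = f_c - 1 / (2 * sample_t0) - 1 / (2 * sample_t1)
    if drift <= 0:
        return float("inf")  # change is no larger than sampling noise
    return generations / (2 * drift)
```

Applied per window, the inputs would be the allele frequencies of that window's SNPs in two cohorts; a bootstrap over SNPs, as described in the text, would then give a standard error.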

(c) Genomic covariates

For each window, we obtained data on the local recombination rate (centiMorgan per megabase), the density of genes (fraction of window in coding regions), the presence of quantitative trait loci (QTL) for three economically important traits selected in the population for the time period considered here (milk production, fat and protein content) and footprints of past selection in the Holstein breed (see the electronic supplementary material).

(d) Statistical analysis

We used linear models with Ne estimated in each window as the dependent variable, and genomic covariates, the chromosome of origin and physical length of each window as explanatory variables. Analyses were carried out in R [9] and are available as the electronic supplementary material. To account for the heterogeneity of standard errors around Ne estimates, models were fitted by weighted least squares using the function lm() and each window in the analysis was weighted by 1/s.e.
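The authors fitted their models in R with lm(). The weighting idea can be sketched in Python for a single continuous covariate. This is an illustrative simplification under stated assumptions, not the study's model, which also included chromosome and other covariates.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least squares for y = a + b*x; returns (intercept, slope).

    Minimises sum_i w_i * (y_i - a - b*x_i)**2, so observations with a
    larger weight pull the fitted line more strongly.
    """
    w_sum = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    s_xy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope
```

Following the text, each window would enter with weight 1/s.e., for example weights = [1 / se for se in standard_errors], where standard_errors is a hypothetical list of bootstrap standard errors.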

3. Results

We estimated Ne over one generation in two time intervals (1995–2000 and 2000–2005) and, unless stated otherwise, results reported here use the time interval 1995–2000 where roughly equal numbers of individuals were available. All results use estimator [8], as the alternative method [7] yielded similar Ne estimates (electronic supplementary material, figure S1).

We found that the estimated Ne of each autosome varied by a factor of almost two (mean: 48, median: 45, min: 36 ± 2.6 on chromosome 25, max: 72 ± 6.4 on chromosome 23; electronic supplementary material, figure S1). We then estimated Ne within 447 autosomal windows of 100 SNPs, spanning on average 5 Mb (range: 3–10 Mb). This revealed considerable heterogeneity between windows in the estimated Ne (median: 50.83, s.d.: 37.4; figure 1a). Although some of the variation observed is due to sampling error, genuine variation remains among windows (p < 0.0001).

Figure 1. Heterogeneity of estimated Ne in genomic windows. (a) Distribution of Ne over one generation (1995–2000) in 447 windows. Histogram: empirical distribution of Ne estimates. Dashed line: distribution expected under homogeneous Ne and incorporating standard errors on estimated Ne (electronic supplementary material, figure S5). Solid line: expected distribution for the estimated Ne under a model accounting for standard errors as above and further assuming lognormally distributed parametric variation in Ne among windows (see the electronic supplementary material). (b) Example of within-chromosome heterogeneity in estimated Ne. Each dot represents the Ne estimated per window. Error bars indicate 1 s.e. (estimated by bootstrapping).

Genetic diversity is reduced in regions of low recombination rates and/or regions with high gene density [1,10], because they are expected to experience more background selection and are more likely to be affected by neighbouring selective sweeps. Therefore, we tested whether a number of genomic variables used as proxies for linked selection could explain the observed variation in estimated Ne. Although chromosome of origin significantly affected the estimated Ne of a window, neither local recombination rates nor gene density explained the variation in estimated Ne (table 1; electronic supplementary material, figures S2 and S3).

Table 1. Effect of genomic covariates on log (Ne) in autosomal windows.

a Estimates of regression coefficients and associated standard errors are only provided for regressing/continuous explanatory variables.

b Significance was tested using an F-test in a linear model accounting for heterogeneity of variance around Ne estimates.

Episodes of strong natural or artificial selection are expected to affect the evolutionary trajectory of linked regions. Although a single Ne cannot rigorously account for the effect of directional selection on the diversity of neighbouring regions, we expect regions currently affected by a sweep to exhibit reduced estimated Ne. Using information on past selective sweeps in Holstein [11], we found no differences among Ne estimated in windows with no selective sweep (n = 378 windows) or 1, 2 or more than 2 selective sweeps (respectively n = 49, 13 and 7 windows; Wilcoxon rank sum test, p = 0.56; electronic supplementary material, figure S4). We found an effect of the presence of past selective sweeps, but this effect is very small (table 1; electronic supplementary material, table S2). We also used three traits of major economic value in the breed and currently under artificial selection in this population. No difference in estimated Ne among windows containing either no QTL (n = 375 windows) or QTLs for 1, 2 or 3 traits (respectively, n = 64, 7 and 1 windows) was found (Wilcoxon test, bins with QTL versus no QTL, p = 0.49; figure 2). Ne estimated in the vicinity of the 7 QTLs with the largest phenotypic effect was not markedly reduced relative to the remaining windows (p = 0.33; electronic supplementary material, table S1).

Figure 2. Boxplot of log (Ne) among windows with either no QTL or harbouring QTLs coding for 1, 2 or 3 traits under artificial selection.

To guard against heterogeneity spuriously created by physical window size, we estimated Ne in 433 windows spanning 5 Mb (and thus variable number of SNPs). Irrespective of the window type used (fixed number of SNPs versus fixed length), we uncover very similar patterns of variation in estimated Ne and effect of chromosomes (electronic supplementary material, figure S6 and table S2). We still reveal significant heterogeneity among chromosomes (p < 0.0001) and no effect of other covariates.

We also estimated Ne over two generations (1995–2005) and found low correlations with estimates for the same window obtained for one-generation intervals (1995–2000 and 2000–2005). There was a weak and non-significant tendency for chromosomes with the highest Ne estimates in 1995–2000 to have the lowest Ne estimates in 2000–2005 (electronic supplementary material, figure S7).

4. Discussion

We provide direct evidence that the intensity of genetic drift varies throughout the genome. Our finding is robust to the choice of estimator for inferring Ne, potential effect of linkage decreasing effective sample size (electronic supplementary material, figure S1), time interval considered and type of window used (electronic supplementary material, figure S6). Overlapping generations and some non-random mating can bias Ne estimates based on temporal variance [12], but this bias will apply with equal force throughout the genome and will not create heterogeneity per se. The scale of analysis is ultimately limited by the sampling variance around Ne estimates, and 100 SNPs per window was the minimum needed to get reliable Ne estimates.

Genome-wide average Ne estimated for the interval 1995–2000 (48) and for individual chromosomes (35–72; electronic supplementary material, figure S1) are within the range of values reported for this breed. Ne was estimated to be 49 for the Danish Holstein population for the period 1993–2003 using rates of inbreeding inferred from the pedigree [5]. Similarly, Ne for the US Holstein has been estimated to be 39 by the same method [13]. The magnitude of the heterogeneity we detected for Ne throughout the genome (figure 1a) was comparable to, albeit in the lower range of, what was recently estimated indirectly in 10 species [3].

No effect of either local recombination rate, gene density or the presence of QTLs for traits under artificial selection was detected on estimated Ne (table 1; electronic supplementary material, table S2). Local recombination rates and gene density are commonly used proxies for the long-term effect of linked selection on nucleotide diversity throughout genomes [2,10,14]. We detect no correlation between these variables and the estimated Ne (table 1; electronic supplementary material, table S2). One possibility is that recombination rates and gene density do affect the intensity of selection at linked sites, but at a scale of about 100 kb [14]. The windows we used were typically 50 times larger. Linked selection is also likely to act in an episodic fashion and most regions may not currently be experiencing linked selection. Another possibility is that the effect of linked selection acts cumulatively over timescales larger than the few generations examined here.

Interestingly, we detected no effect of the presence of QTL on estimated Ne over the 1995–2000 or 2000–2005 time intervals, but when estimating Ne over the 1995–2005 interval, we detect a modest effect of the presence of QTLs on Ne estimated in 5-Mb windows (electronic supplementary material, table S2). We expect that cumulative effects of selection on QTLs will be easier to detect in studies using longer time intervals. Summing up, although we present strong evidence for heterogeneity in Ne, the processes underlying this heterogeneity and that could account for the lack of correlation in Ne over successive generations (electronic supplementary material, figure S7) remain unknown.

5. Conclusion

Several studies report pervasive effects of selection throughout the genome of Drosophila [3,10,14]. A review on Ne and its applicability concludes: ‘ (…) no nucleotide in the compact genome of D. melanogaster is evolving entirely free of the effects of selection on its effective population size it will be of great interest to see whether this applies to species with much larger genomes’ [1, p. 203]. Here, we provide direct evidence for extensive variation in Ne in a much less compact genome.

Ne plays a prominent role in conservation genetics to assess the status of populations, predict the rate of accumulation of consanguinity and forecast adverse consequences of inbreeding depression. If the variation in Ne we uncover is typical, caution should be used when interpreting mean values of Ne, as genomic regions can drift and accumulate consanguinity at a much higher rate than would be predicted if Ne were homogeneous (see [15] for a review on the topic).

Pervasive variation in Ne throughout the genome also raises concern for the uncritical use of genome-wide scans for footprints of selection. A popular strategy consists of deriving a null distribution for a test statistic, such as level of subdivision, length of homozygosity tracts, etc., expected under selective neutrality. Genomic regions exhibiting discrepant values for these statistics are then flagged as ‘candidates for selection’. However, null distributions for selective neutrality used so far rely on the implicit assumption that all regions undergo a common Ne. The true null distribution might actually contain substantially more variance than expected, and ignoring such variation will invariably yield inflated rates of false positives.

Data accessibility

Supporting data, metadata and R script are available as the electronic supplementary material.

Authors' contributions

B.J.M. and T.B. designed and wrote the study with input from all co-authors. B.G., R.B. and G.S. obtained SNP and analysed QTL data. B.J.M., P.T. and T.B. analysed SNP data. All authors agree to be held accountable for the content therein and approve the final version of the manuscript.

Competing interests

The authors have no competing interests.


B.J.M. benefitted from a grant from Erasmus-Mundus PhD School ‘EGS-ABG’ and INRA Animal Genetics. The data for QTL are funded by the project ‘Genomic selection—from function to efficient utilization in cattle breeding’ (grant no. 3405-10-0137).

Sex ratio rather than population size affects genetic diversity in Antennaria dioica

C. Rosche, Institute of Biology/Geobotany and Botanical Garden, Am Kirchtor 1, D-06108 Halle, Germany.

Institute of Biology/Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, Halle, Germany

Department of Chemical Ecology, Bielefeld University, Bielefeld, Germany

Institute of Biology/Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, Halle, Germany

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany

Department of Community Ecology, UFZ Helmholtz Centre for Environmental Research, Halle, Germany

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany

Department of Botany and Zoology, Stellenbosch University, Centre for Invasion Biology, Matieland, South Africa

Department of Botany and Zoology, Masaryk University, Brno, Czech Republic

Department of Biological Sciences, University of Alberta, Edmonton, Canada

Senckenberg Biodiversity and Climate Research Centre, Frankfurt (Main), Germany

Institute of Biology/Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, Halle, Germany

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany

Institute of Biology/Geobotany and Botanical Garden, Martin Luther University Halle-Wittenberg, Halle, Germany

UfU - Independent Institute for Environmental Issues, Berlin, Germany


Sequence Data.

White pulps were microdissected from spleens and � proviral clones of the V1/V2 region of env � bp long were sequenced from two to five pulps per spleen, as described elsewhere (ref. 37; M.-J.D. and S.W.-H., unpublished work). Sequences were aligned with clustalw (40) and checked by eye. Although proviral DNA from peripheral blood mononuclear cells (PBMCs) can differ from viral RNA in plasma, presumably because of the “archiving” of genetic variation in PBMCs over time (3, 5), this is unlikely to be an issue here, as M.-J.D. and S.W.-H. (unpublished work) have shown that viral RNA and proviral DNA obtained from the same pulp are virtually indistinguishable.

Population Genetic Analysis.

Genetic differentiation between subpopulations was quantified by using estimates of FST, the fraction of the total genetic variation found between subpopulations (41), obtained from an analysis of molecular variance (AMOVA) in arlequin ver. 1.1 (42) assuming a Jukes–Cantor (43) model to correct for multiple hits. The significance of departures from the null hypothesis of a random distribution of genetic variation was determined by using 10,000 randomizations of sequences between populations.

Estimates of the number of pairwise differences and the number of mutations for each sample were obtained by using dnasp ver. 3.0 (44). Tajima's D statistic was calculated by using a β distribution with mean 0 and variance 1 to estimate the P value against the values of D expected under a coalescent model of a population of constant size (45). Values of D ≤ 1.84 give a P value of approximately less than 0.05 when the sample size is � (46).
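For readers unfamiliar with the statistic, the following sketch computes Tajima's D from its standard definition, given the number of sequences, the number of segregating sites and the mean number of pairwise differences. It is an illustration only and makes no claim about how dnasp implements the calculation or the P value.

```python
from math import sqrt


def tajimas_d(n_sequences, segregating_sites, mean_pairwise_diff):
    """Tajima's D from n sequences, S segregating sites and pi (mean pairwise differences)."""
    n, s = n_sequences, segregating_sites
    if n < 4 or s < 1:
        # For n <= 3 the variance term is zero, so D is undefined.
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: difference between two estimators of the scaled mutation rate.
    return (mean_pairwise_diff - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

Negative values indicate an excess of rare variants relative to the constant-size neutral expectation, and positive values an excess of intermediate-frequency variants.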

FST and Tajima's D were calculated, correcting for heterogeneity in the substitution rate across sites (47). Rate variation was modeled as a γ distribution with shape parameter 0.3, similar to the values of the shape parameter estimated from the V3 region of env (48).

We tested for recombination within each individual by searching for regions of sequence that are sufficiently long and/or similar to suggest recombination either between sequences in the sample or their ancestors (“inner” fragments) or between a sequence in the sample and another sequence that was unknown (“outer sequence” fragments), either because the other sequence was not present in the sample or because further mutation had obscured the relationship between the two sequences. Global P values, corrected for multiple comparisons, were obtained by simulation using the program geneconv ver. 1.81 (49). To compare the extent of recombination between sequences within each pulp, the minimum number of recombination events was estimated by analyzing pairs of variable sites by using the method of Hudson and Kaplan (50). Sites at which three or four nucleotides were segregating were not included.

Mathematical models of genetic drift

Mathematical models of genetic drift can be solved using either branching processes or a diffusion equation describing changes in allele frequency.

Wright-Fisher model

Consider a gene with two alleles, A or B. In diploid populations consisting of N individuals there are 2N copies of each gene. An individual can have two copies of the same allele or two different alleles. We can call the frequency of one allele p and the frequency of the other q. The Wright-Fisher model assumes that generations do not overlap. For example, annual plants have exactly one generation per year. Each copy of the gene found in the new generation is drawn independently at random from all copies of the gene in the old generation. The formula to calculate the probability of obtaining k copies of an allele that had frequency p in the last generation is then

$$\frac{(2N)!}{k!(2N-k)!} p^k q^{2N-k}$$

where the symbol "!" signifies the factorial function. This expression can also be formulated using the binomial coefficient,

$$\binom{2N}{k} p^k q^{2N-k}$$
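The binomial sampling probability described above can be evaluated directly. This is a minimal sketch using Python's math.comb; the function names are illustrative.

```python
from math import comb


def wf_transition_probability(k, n_diploid, p):
    """Probability of k copies next generation given current frequency p.

    Binomial probability with 2N trials and success probability p.
    """
    copies = 2 * n_diploid
    return comb(copies, k) * p ** k * (1 - p) ** (copies - k)


def wf_transition_row(i, n_diploid):
    """One row of the Wright-Fisher transition matrix: start from i copies."""
    copies = 2 * n_diploid
    p = i / copies
    return [wf_transition_probability(k, n_diploid, p) for k in range(copies + 1)]
```

Each row sums to 1, and the rows for 0 and 2N copies put all their probability on staying put, which reflects loss and fixation being absorbing states.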

Moran model

The Moran model assumes overlapping generations. At each time step, one individual is chosen to reproduce and one individual is chosen to die. So in each timestep, the number of copies of a given allele can go up by one, go down by one, or can stay the same. This means that the transition matrix is tridiagonal, which means that mathematical solutions are easier for the Moran model than for the Wright-Fisher model. On the other hand, computer simulations are usually easier to perform using the Wright-Fisher model, because fewer time steps need to be calculated. In the Moran model, it takes N timesteps to get through one generation, where N is the effective population size. In the Wright-Fisher model, it takes just one.

In practice, the Moran model and Wright-Fisher model give qualitatively similar results, but genetic drift runs twice as fast in the Moran model.
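A single Moran time step is easy to sketch. The version below assumes a population of N haploid gene copies and chooses the reproducing and the dying individual independently and uniformly at random, so the same individual may be picked for both. This is one common variant, adopted here as an assumption; names are illustrative.

```python
import random


def moran_step(count, n_individuals, rng):
    """One Moran time step: one individual reproduces and one dies.

    `count` is the number of copies of the focal allele; the result differs
    from `count` by at most one.
    """
    freq = count / n_individuals
    offspring_is_focal = rng.random() < freq  # parent drawn uniformly at random
    dying_is_focal = rng.random() < freq      # individual that dies, drawn uniformly
    return count + int(offspring_is_focal) - int(dying_is_focal)


def moran_time_to_absorption(count, n_individuals, seed=None):
    """Run time steps until the allele is lost or fixed; return (final count, steps)."""
    rng = random.Random(seed)
    steps = 0
    while 0 < count < n_individuals:
        count = moran_step(count, n_individuals, rng)
        steps += 1
    return count, steps
```

Because the count moves by at most one per step, the transition matrix is tridiagonal, as noted in the text; dividing the number of steps by N expresses the result in generations.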

Other models of drift

If the variance in the number of offspring is much greater than that given by the binomial distribution assumed by the Wright-Fisher model, then given the same overall speed of genetic drift (the variance effective population size), genetic drift is a less powerful force compared to selection.

Random effects other than sampling error

Random changes in allele frequencies can also be caused by effects other than sampling error, for example random changes in selection pressure.

One important alternative source of stochasticity, perhaps more important than genetic drift, is genetic draft. Genetic draft is the effect on a locus by selection on linked loci. The mathematical properties of genetic draft are different from those of genetic drift. The direction of the random change in allele frequency is autocorrelated across generations.

Installing PopG

Here are instructions for saving, unpacking, and installing PopG from different browsers and on different operating systems. We cover the Chrome, Firefox, Safari, and Internet Explorer browsers on the Windows, Mac OS X, and Linux operating systems.

  1. Click on the link.
  2. A downwards-pointing animated arrow in the lower-left portion of the browser window will move, pointing to a tab there called
  3. will now be found in your Downloads folder.
  4. Click (or if this does not work, double-click) on the file. The Zip archive will be extracted into the Downloads folder. A folder PopG will be created in the Downloads folder.
  5. Move that folder to where you want it to be.

  1. Click on the link.
  2. A dialog box opens and offers to let you Save File. Choose that.
  3. The Zip archive will be in your Downloads folder.
  4. Double-click on it. The archive will be extracted and a folder PopG created in the Downloads folder.
  5. Move that folder to where you want it to be.

  1. Click on the link.
  2. A dialog box opens and offers to let you Save File. Choose that.
  3. The Zip file will be downloaded into the Downloads folder and will be automatically extracted. A folder PopG will be created in the Downloads folder.
  4. Move it where you want it to be.

  1. Click on the link.
  2. The Zip file will be downloaded into the Downloads folder and will be automatically extracted. A folder PopG will be created in the Downloads folder.
  3. Move it where you want it to be.

A MAC PROBLEM: On Mac OS X systems, when you attempt to extract the Zip archive, or when you attempt to run the Java executable, the system may complain that this is from an unknown developer. That is simply because I did not sign the file with my Apple Developer ID. You should be able to make the operation work by control-clicking on the icon and selecting the option to open the file, using the defaults suggested. Once it successfully gets past this, it will not bother you with this again.

The Java archive

The Java archive file PopG.jar will exist in the folder PopG once you have downloaded and installed PopG. If you have Java installed on your system, you should be able to run the Java program by finding the folder PopG and clicking or double-clicking on the icon of the file PopG.jar

The documentation page

The PopG folder also includes the present documentation web page which you are now reading. This can be read here or you can use the Save As menu item in the File menu of your browser to save a copy of it on your machine. The latest version of this page can be read on the Web using this link.

Older versions of PopG

There are also older executable versions compiled for Windows, Mac OS X, and Linux systems, plus some even older operating systems. These can be fetched from folder old at our PopG site. Most users should not use these older executables, but if you do, you should start by reading the README file in that folder. One of the versions there is version 3.4, which has compiled executables for the three major operating systems available as well as C source code. These may be useful if you do not have Java and cannot install it on your system.

Where are the Android and iOS versions?

We would like to make versions available for tablets and even phones. Unfortunately, a version of Java that can use the graphics functions does not seem to exist on the Android operating system and the iOS operating system. We would have to rewrite the program separately for each of those. If you know of a way to run our Java executables on either of those operating systems, and get it to work, please let us know how you did that.

Making sure you have Java on your computer

If you have Java installed you can run the PopG program. Generally, Java will be already installed on Mac OS X systems and on Linux systems. If you aren't sure if you have Java installed, you can type java -version in a command window and, if Java exists, it will tell you what the version is. If you get back a blank line, you need to either download Java or append its location to your search path. On Windows systems and on Mac OS X or Linux systems that do not have Java, you can install a recent version of Java at no cost by using this link. Recent Linux and Mac OS X systems usually have a recent-enough version of Java already installed. Mac OS X systems 10.4 (Leopard) and earlier may not have a recent-enough Java to be able to run PopG. Windows systems do not come with Java already installed, but it can be installed on them from the above web site.

Running the program

To run the PopG Java program you should simply click (or double-click) on the icon of the PopG.jar file (you can also run it from a command window by navigating to where PopG.jar is stored and typing java -jar PopG.jar). The start up screen looks like this:

There are two menus, File and Run, that control PopG. They are in the upper left of the main PopG window.

The Run menu

The Run menu contains five items: Continue w/, Continue, New Run, Restart, and Display Whole Plot.

The first time it is picked, all items but New Run are grayed out. Once you have done your first run, all the selections will be active.

  • New Run: Initially only New Run is available. It brings up a dialog that contains all the parameters that control a PopG run. Note that usually you do not enter a Random Number seed unless you want to do two identical runs. When you are finished editing you can click the OK box to start the run. You can also click Cancel to not start the run and Defaults to reset all the data entry boxes to their default values.
  • Continue w/: This choice continues the run, for the same number of generations as previously entered in the New Run menu.
  • Continue: This continues the run, but presents a dialog which allows you to change the number of generations run in the next continuation of the run.
  • Restart: This restarts the run with the same parameter values as before. If you want to change some of the parameter values, use New Run instead.
  • Display Whole Plot: This plots all generations on the same plot. During a run the plots will normally show only the last group of generations. This shows all generations that have been simulated so far. This is particularly handy when you have finished a simulation and want to print the results of the whole run.

The random number generator

The program uses a random number generator which automatically initializes from your system clock. Thus it should give you a different sequence of random numbers and thus a different result every time you run the program. In the menu for a new run, there is a setting for Random number seed which is set by default to (Autogenerate), which will initialize from the system clock. You probably won't have any reason to change this, unless you are debugging PopG and want to do the same run, with the same random outcomes, twice. If you do wish to do the same exact run twice, enter a value in place of the (Autogenerate) string and PopG will use that to initialize the random number generator. Assuming you have not modified the calcPopG routine within the Java code, every time you start with that random number you will get exactly the same results.
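The effect of fixing a seed can be illustrated outside PopG. The sketch below is plain Python and does not reproduce PopG's own generator or its calcPopG routine; it only shows that a drift simulation seeded with the same value repeats exactly, while an unseeded one generally does not.

```python
import random


def drift_trajectory(p0, n_diploid, generations, seed=None):
    """Allele frequency trajectory under pure drift (binomial sampling).

    seed=None lets Python choose a seed from system sources, so repeated
    calls generally differ; an explicit seed makes the run repeatable.
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory
```

Calling drift_trajectory(0.5, 20, 10, seed=42) twice returns identical lists, which is the behaviour the Random number seed setting is meant to provide.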

The File menu

This contains four menu items. They are Save, Print, About and Quit.

The first time it is displayed, it looks like:

with Save and Print grayed out. Once you have done your first run, they will be active.

  • Save: This opens a standard save file dialog and allows you to save the graph as a JPG or PNG file. The default name is PopG with the appropriate extension to match the file format.
  • Print: This opens a standard print dialog and allows you to select a printer and print the graph.
  • About: Displays the program's copyright notice.
  • Quit: This is self-explanatory: the program quits.

Compiling it yourself

Most people will not need to compile the program themselves as the Java Jar package supplied should run on most versions of Java. So you should probably skip this section. But if you wish to modify the functionality of PopG or if you have some unusual Java environment that will not run the supplied Jar file you will need a Java compiler. We repeat: If you just need to run the program, you should run the Jar file that comes in our distribution. You do not need to compile anything (though you may need to install Java).

If you do need to compile the program, you will find a src directory in the downloaded and unzipped folder PopG which you got from our site. Import the file from src into your favorite Java editor (we used Eclipse). You can either execute it directly from there or export a Java Jar from the editor and execute it. PopG does not reference any external libraries; everything it needs is in the JavaSE-1.6 system library. If you are modifying our program, once you have finished doing that you should have no problems creating the Java Jar. If you cannot, tell us, since that would be a bug.

Simulating with PopG

This program simulates the evolution of random-mating populations with two alleles, arbitrary fitnesses of the three genotypes, an arbitrary mutation rate, an arbitrary rate of migration between the replicate populations, and finite population size.

The program simulates simultaneously evolving populations, with you specifying the population size, the fitnesses of the three genotypes, the mutation rates in both directions (from A to a and from a to A), and the initial gene frequency. It also asks for a migration rate among all the populations, which will make their gene frequencies more similar to each other. Much of the time (but not always!) you will want to set this migration rate to zero. In most respects the program is self-explanatory.

Initially there are ten populations. You can set the number of simultaneously-evolving populations to any number from 0 to 1000. The population size for each population can be any number from 1 to 10000. Note that a larger population, a larger number of generations run, and a larger number of populations can lead to longer runs.
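The kind of simulation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PopG's calcPopG routine: the function names, the order of steps within a generation (migration, mutation, selection, then drift), and the use of the mean frequency across populations as the migrant pool are all choices made here for clarity.

```python
import random

def next_generation(freqs, n, w_AA, w_Aa, w_aa, mu, nu, m, rng):
    """Advance the frequency of allele A in every population by one generation.

    mu is the A -> a mutation rate, nu the a -> A rate, m the migration rate.
    The step order is an assumption of this sketch.
    """
    if not freqs:
        return []
    mean_p = sum(freqs) / len(freqs)
    result = []
    for p in freqs:
        # Migration: a fraction m of each population comes from the common pool.
        p = (1 - m) * p + m * mean_p
        # Mutation in both directions.
        p = p * (1 - mu) + (1 - p) * nu
        # Selection on the three genotypes under random mating.
        q = 1 - p
        w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
        p = (p * p * w_AA + p * q * w_Aa) / w_bar
        # Drift: binomial sampling of 2N gene copies (Wright-Fisher).
        copies = sum(1 for _ in range(2 * n) if rng.random() < p)
        result.append(copies / (2 * n))
    return result

def simulate(num_pops, n, p0, generations, w_AA=1.0, w_Aa=1.0, w_aa=1.0,
             mu=0.0, nu=0.0, m=0.0, seed=None):
    """Return a list of per-generation frequency lists, starting with p0."""
    rng = random.Random(seed)
    freqs = [p0] * num_pops
    history = [freqs]
    for _ in range(generations):
        freqs = next_generation(freqs, n, w_AA, w_Aa, w_aa, mu, nu, m, rng)
        history.append(freqs)
    return history

# Ten populations of size 100, initial frequency 0.2, A favoured by selection.
history = simulate(num_pops=10, n=100, p0=0.2, generations=100,
                   w_AA=1.08, w_Aa=1.04, w_aa=1.0, seed=1)
fixed = sum(1 for p in history[-1] if p == 1.0)
lost = sum(1 for p in history[-1] if p == 0.0)
```

The counts `fixed` and `lost` correspond to the numbers of populations that fixed or lost the A allele by the end of the run.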

When you make a menu selection that causes the program to run, a graph of the gene frequencies of the A allele in each of the populations will be drawn in the window. Here is what the graph looks like when we run with an initial gene frequency of 0.2 and fitnesses of AA, Aa, and aa set to 1.08, 1.04, and 1, with all other parameters in their default values. (Note that if you try this run, there will be different random numbers, so your result will be a bit different).

Note that the window can be resized, and the graph should adjust to this. There is also a blue curve that shows what the gene frequencies would be in an infinite population (one with no genetic drift). If the number of populations being simulated is set to zero, this curve is all you will see. The graph can be printed using the Print option of the File menu, or saved to a JPG or PNG file using the Save option in that menu.
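The infinite-population curve can be approximated with the standard deterministic selection recursion. The sketch below is an assumption-laden illustration rather than PopG's own calculation: it handles selection only (no mutation or migration), and `deterministic_curve` is a hypothetical name.

```python
def deterministic_curve(p0, generations, w_AA, w_Aa, w_aa):
    """Frequency of allele A over time with selection but no drift.

    Each generation applies p' = p (p w_AA + q w_Aa) / w_bar,
    where w_bar is the mean fitness under random mating.
    """
    curve = [p0]
    p = p0
    for _ in range(generations):
        q = 1 - p
        w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
        p = (p * p * w_AA + p * q * w_Aa) / w_bar
        curve.append(p)
    return curve

# Same fitnesses and initial frequency as the example run described above.
curve = deterministic_curve(0.2, 100, w_AA=1.08, w_Aa=1.04, w_aa=1.0)
```

With these fitnesses the curve rises smoothly from 0.2 towards fixation of A, which is the trend the simulated populations scatter around.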

Note that once the plot of the gene frequency curves reaches the right-hand side of the graph, the program prints there the number of populations that fixed for the A allele (ended up with a frequency of 1.0) and the number that lost this allele.


  • Try cases with no mutation, no migration, and all fitnesses 1.0 so that there is no selection. Does genetic drift in a population of size 1000 accomplish roughly the same changes in 1000 generations as genetic drift in a population of size 100 does in 100 generations? By running a largish number of populations, can you check whether the probability that an allele is fixed by genetic drift is equal to its initial frequency in the populations?
  • Try a case with no mutation or migration, with the A allele favored by natural selection (with fitness of the AA genotype set highest and fitness of the aa genotype set lowest). Start with a small frequency of A. Is it always fixed? If one starts with a single copy of the allele, how does the probability that A is fixed compare with the selection coefficient favoring it in the heterozygote (the fraction by which the fitness of the Aa genotype exceeds that of the aa genotype)? Is this fixation probability larger than the one you would get with the same initial frequency but with no selection?
  • Try overdominance (Aa having the highest fitness). Does the gene frequency converge towards an equilibrium? Why does it vary from this equilibrium frequency? How large do the selection coefficients have to be to cause the gene frequency to stay away from fixation or loss for large amounts of time?
  • Try underdominance (Aa having the lowest fitness). Is there a starting gene frequency that will result in some populations heading for fixation, and others heading for loss? If you add a small amount of migration, what will happen in the long run? What will happen if instead you add a small amount of mutation in both directions?
  • With migration but no selection or mutation, how much migration is needed to make the gene frequency curves be quite similar to each other? How much is needed to make them all end up at the same gene frequency in the long run? How is that migration rate affected by the population size?
  • With mutation but no migration or selection, how much mutation is needed to cause the gene frequencies to converge to a mutational equilibrium gene frequency? How does this value relate to the population size?
  • If an allele is selected against, can you set up mutation rates that will maintain it at low frequency in the population?
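The first exercise can also be explored numerically. The sketch below is one possible approach, not part of PopG: it assumes pure Wright-Fisher drift (no selection, mutation or migration), uses the heterozygosity decay factor $1-\frac{1}{2N}$ per generation, and the function names are hypothetical.

```python
import random

def heterozygosity_retained(n, t):
    """Expected fraction of heterozygosity left after t generations of drift."""
    return (1 - 1 / (2 * n)) ** t

def fixation_fraction(n, p0, replicates, seed=None):
    """Fraction of replicate populations in which allele A reaches frequency 1."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        copies = round(p0 * 2 * n)
        # Resample 2N gene copies each generation until A is fixed or lost.
        while 0 < copies < 2 * n:
            p = copies / (2 * n)
            copies = sum(1 for _ in range(2 * n) if rng.random() < p)
        if copies == 2 * n:
            fixed += 1
    return fixed / replicates

# Drift over N generations in a population of size N leaves a similar
# fraction of heterozygosity whether N is 100 or 1000.
small = heterozygosity_retained(100, 100)
large = heterozygosity_retained(1000, 1000)

# With no selection, the fixed fraction should be close to the initial frequency.
estimate = fixation_fraction(n=10, p0=0.3, replicates=2000, seed=7)
```

Running more replicates narrows the gap between `estimate` and the initial frequency, at the cost of a longer run.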


Version 4.0 of PopG, the first Java version, was written by Ben Zawadzki. His enormously effective programming made good use of mentorship and advice from our lab's Java wizard, Jim McGill.

The original version of PopG was written in the 1970s in FORTRAN by Joe Felsenstein. The interactive version then was written in C with much work by Hisashi Horino, Sean Lamont, Bill Alford, Mark Wells, Mike Palczewski, Doug Buxton, Elizabeth Walkup, Ben Zawadzki and Jim McGill. Hisashi and Sean wrote the C version, and the screen graphics for IBM PC and the first part of the Postscript printing system. Bill greatly improved and expanded the Postscript printing and the X windows graphics. Mark Wells did the original Macintosh version. Mike Palczewski greatly improved the Windows, Macintosh and X Windows graphical user interface, and Doug Buxton modified the program to the 3.0 version and prepared the executables for different operating systems. Elizabeth Walkup improved the X windows interaction and prepared version 3.3. Small documentation changes after version 4.0 were made by me.

Population size is not genetic quality

Inbreeding depression is a phenomenon largely taken for granted among evolutionary geneticists (Conner & Hartl, 2004). For conservation biologists, the most important source of inbreeding comes when small population size leads to nonrandom mating among genotypes, either because available mates are related or because drift has reduced heterozygosity. Either process leads to the over-expression of homozygotes throughout the genome. Because many of the resultant homozygotes will express deleterious phenotypes, inbreeding is expected to widely lead to reductions in phenotypic values and fitness for individuals spawned from such unions. Inbreeding itself does not lead to reductions in allelic diversity. Rather, it enables selection to more effectively weed out deleterious recessives, and linkage disequilibrium drags along alleles at linked loci, so that over time we often see declining genetic diversity in inbreeding populations. On the whole, populations with measurable inbreeding are predicted to have lower average fitness than comparable populations with random mating (i.e. ‘inbreeding depression’), because of the expression of deleterious homozygotes rather than secondarily reduced allelic diversity (Conner & Hartl, 2004). Inbreeding level is variable: it may be severe, as when populations are so small as to necessitate mating among close relatives, or may be subtle in any situation that results in a nonrandom association of mates. Thus, even populations of moderate, but not effectively infinite, size may experience some of the deleterious effects of inbreeding.

It seems obvious, then, that inbreeding is not a good thing for most populations (though there are certainly many taxa, especially plants, that make do with high levels of inbreeding). If inbreeding reduces fitness, should we not expect inbreeding to be one of the many factors that hasten the demise of declining populations? As Reed, Nicholas & Stratton (2007a,b) point out, the causal relationship between inbreeding and extinction risk is not clear. In extremely small populations, the very ones expected to be highly inbred, realities of demography and stochasticity are expected to blink out a population before inbreeding can make its mark (Lande, 1988). In somewhat larger populations, density-dependent mortality might lead to situations in which population dynamics are largely unaffected by the level of inbreeding. In such cases, inbreeding might correlate with, but not contribute to, population growth (or decline) rates. It is exactly these kinds of populations (declining, but not perilously small) for which management decisions have a reasonable chance of protecting both the numbers of a species and its genetic potential.

Reed et al. (2007a) set out to test whether genetic diversity and inbreeding have measurable impacts on population dynamics of two species of wolf spiders across a range of population sizes. Using a 3-year dataset that they recently published in Conservation Genetics (Reed et al., 2007b), they analyze a new independent variable, population growth, to discern whether inbreeding or genetic quality impacts population dynamics. Their analyses clearly show that habitat quality (as measured by prey capture rate) and population size impact population growth rate. The effect of population size is most obvious (and significant) in years with low prey capture rate, leading the authors to the conclusion that inbreeding (or ‘genetic quality,’ or ‘genetic diversity’) impacts population dynamics under stressful situations. As a generality, I agree with the expectation that genetic factors will have negative effects on population growth, especially under stressful conditions. However, I found myself questioning whether this impressive dataset and analysis demonstrate such links. Moreover, I think it is critical to distinguish different genetic phenomena, their origins and their expected consequences.

Inbreeding might affect population growth rate through at least two distinct paths. For populations with little history of inbreeding and thus a wealth of deleterious recessives, inbreeding depression is expected to lower mean fitness (for populations with a history of inbreeding, as with many plant breeding systems, past selection would have reduced the number of deleterious recessives so that inbreeding depression is less marked). Inbreeding depression might also result from a reduction of heterozygosity at overdominant loci (Conner & Hartl, 2004). This reduction in mean fitness should lead to declines in the population growth rate unless density-dependent mortality results in compensatory increases in survival rates. This process is different from the effect of reduced genetic diversity, which might be a secondary consequence of inbreeding when combined with selection or drift. The loss of allelic diversity, however, is not necessarily a primary source of inbreeding depression (in fact, the elimination of deleterious recessive alleles through selection should alleviate inbreeding depression). Instead, reduced allelic diversity is expected to limit adaptive evolutionary responses to changing environments. In a stable environment, a well-adapted population should suffer no loss in mean fitness due to loss of allelic diversity, particularly if the lost alleles are deleterious recessives. Of course, stable environments may be a theoretical construct and certainly do not fairly represent the situations that are typically represented by declining populations. In small populations, genetic drift can lead to a loss of genetic diversity through the random fixation of alleles. When the alleles approaching fixation are deleterious, small population size can result in an increase in deleterious homozygotes that act like inbreeding depression (Conner & Hartl, 2004). Only in this sense does genetic diversity lead to inbreeding depression and lower mean fitness. In most scenarios, reduced genetic diversity does not impact mean population fitness, and thereby population dynamics, in the same way that inbreeding depression is expected to. The labels ‘inbreeding depression’ and ‘genetic diversity’ are therefore not interchangeable. ‘Genetic quality’ is a bit more nebulous, but implicitly compares individuals to a local fitness optimum. Inbreeding depression could certainly be said to impact genetic quality, but I do not think the same could be said of genetic diversity, which is inherently a statement about populations.

Reed et al. (2007a,b) use ‘population size’ as a shorthand for all three of these terms; hence, it bears considering whether or not this is appropriate. Effective population size (Ne), the concept of Wright's meant to bridge simple counts of individuals (N) to the population genetic consequences of idealized populations, is fundamental to estimating the impact of inbreeding, drift and selection. Ne is always smaller than a raw count of individuals, and considers factors such as sex ratio and variance in mating success. Reed et al.'s estimates of spider censuses are impressive, but do not constitute estimates of ‘long term effective population size’ as suggested. Nonetheless, correlations between population size and inbreeding are certainly expected, at least at smaller population sizes (the relationship is likely to be non-linear because above some population size inbreeding should be negligible). This relationship is borne out in their first paper (2007b), wherein population size is positively correlated with a measure of heterozygosity at microsatellite loci. In the same paper, a positive correlation between an average measure of ‘heritability’ and population size is also reported. Heritability is intended to be a standardized measure of additive genetic variation for a specific trait. It is notoriously specific to environmental conditions (including any source of environmental variance), breeding system and allele frequencies. Averaging heritabilities over a number of traits will make most quantitative geneticists bristle, because the factors influencing each heritability are so varied. How this metric bears on allelic diversity, genetic quality or inbreeding is unclear; at best, it is a measure of evolutionary potential and so is most likely to inform us about impacts on the adaptive prospects of a population.
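The dependence of Ne on sex ratio mentioned above can be made concrete with the standard textbook approximation for unequal numbers of breeding males and females, Ne = 4 Nm Nf / (Nm + Nf). This formula and the function name below are supplied here for illustration; they are not taken from Reed et al., and the sketch ignores variance in mating success.

```python
def ne_from_sex_ratio(n_males, n_females):
    """Effective population size given counts of breeding males and females.

    Uses the textbook approximation Ne = 4 * Nm * Nf / (Nm + Nf),
    which assumes random mating and ignores other sources of variance
    in reproductive success.
    """
    return 4 * n_males * n_females / (n_males + n_females)

# Same census count of 100 breeders, different sex ratios.
balanced = ne_from_sex_ratio(50, 50)
skewed = ne_from_sex_ratio(10, 90)
```

Under this approximation the balanced population has an Ne equal to its count of 100 breeders, while the skewed one has an Ne of 36, which is why a census count alone says little about the expected rate of drift or inbreeding.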

Taking these measures at face value, it seems that population size is a reasonable correlate of inbreeding or diversity in the wolf spider system. However, as one of the authors points out elsewhere (Reed, 2007), ‘Any declining population sampled should have reduced genetic variation regardless of whether inbreeding depression is a contributing factor in the decline.’ That is, population size, genetic diversity and heterozygosity are all expected to be correlated without a single exclusive path. As with any set of intercorrelated variables, it is difficult to attribute causality to one variable by showing association with another. The results of Reed et al. (2007a) show clear associations between population growth and population size (not Ne) in some years, but do not show significant improvement of models when heterozygosity or ‘average heritability’ are included. It is therefore difficult to conclude from these data whether genetic factors are causally related to population declines. Other factors influence population size and population size has ramifications beyond the genetics. In some instances, it is even possible that small population size is inversely related to genetic diversity and inbreeding, such as when invasions from multiple sources mix genotypes (Kolbe et al., 2004).

There is a conceptually simple way to experimentally test whether inbreeding depression is leading to population declines in the wolf spider system. When drift and inbreeding occur together, different populations should fix for alternative deleterious alleles. Crossing two populations should then lead to heterosis and a gain in mean fitness. This is a classic result from population genetics and is the basis of ‘genetic rescue’ plans (e.g. Madsen et al., 1999). If such crosses then lead to increased population growth rates in nature (as in the Madsen study), it would be strong evidence that inbreeding depression contributed to the population declines.

The most important aspect of many real-world conservation problems is not whether inbreeding contributes to population declines, but what the relative impact of demographic, ecological and genetic factors is. Management decisions rarely have the luxury of optimizing all considerations, and often the most important factors must be addressed first. The clearest message from the data of Reed et al. (2007a,b) is that the major impact on population growth of wolf spiders is habitat quality (as measured by prey capture rate). Population size or perhaps heterozygosity sometimes interact or have lesser explanatory effects. This observation suggests that attention to ecological factors might have more dramatic results than attempts to manipulate the genetic makeup of populations (of at least moderate size). The question of whether or not inbreeding adversely affects growth rate in declining populations is an interesting one, and a difficult one to tease apart from other correlated factors. On the practical side, however, it may be less critical to demonstrate because the high degree of correlation between population size and inbreeding means that practices that stem the numerical decline of populations also ameliorate inbreeding.

Gene Flow

Another important evolutionary force is gene flow, or the flow of genes in and out of a population resulting from the migration of individuals or gametes (Figure 3). While some populations are fairly stable, others experience more flux. Many plants, for example, send their seeds far and wide, by wind or in the guts of animals; these seeds may introduce genes common in the source population to a new population in which they are rare.

Figure 3. Gene flow can occur when an individual travels from one geographic location to another and joins a different population of the species. In the example shown here, the brown allele is introduced into the green population.

The Fundamentals of Evolution: Darwin and Modern Synthesis

The foundations for the critically important synthesis of Darwinism and genetics were set in the late 1920s and early 1930s by the trio of outstanding theoretical geneticists: Ronald Fisher, Sewall Wright, and J. B. S. Haldane. They applied rigorous mathematics and statistics to develop an idealized description of the evolution of biological populations. The great statistician Fisher apparently was the first to see that, far from damning Darwinism, genetics provided a natural, solid foundation for Darwinian evolution. Fisher summarized his conclusions in the seminal 1930 book The Genetical Theory of Natural Selection (Fisher, 1930), a tome second perhaps only to Darwin's Origin in its importance for evolutionary biology. This was the beginning of a spectacular revival of Darwinism that later became known as Modern Synthesis (a term mostly used in the United States) or neo-Darwinism (in the British and European traditions).

It is neither necessary nor practically feasible to present here the basics of population genetics. However, several generalizations that are germane to the rest of the discussion of today's evolutionary biology can be presented succinctly. Such a summary, even if superficial, is essential here. Basically, the founders of population genetics realized the plain fact that evolution does not affect isolated organisms or abstract species, but rather affects concrete groups of interbreeding individuals, termed populations. The size and structure of the evolving population largely determines the trajectory and outcome of evolution. In particular, Fisher formulated and proved the fundamental theorem of natural selection (commonly known as Fisher's theorem), which states that the intensity of selection (and, hence, the rate of evolution due to selection) is proportional to the magnitude of the standing genetic variation in an evolving population, which, in turn, is proportional to the effective population size.

Box 1-1 gives the basic definitions and equations that determine the effects of mutation and selection on the elimination or fixation of mutant alleles, depending on the effective population size. The qualitative bottom line is that, given the same mutation rate, in a population with a large effective size, selection is intense. In this case, even mutations with a small positive selection coefficient ("slightly" beneficial mutations) quickly come to fixation. On the other hand, mutations with even a small negative selection coefficient (slightly deleterious mutations) are rapidly eliminated. This effect found its rigorous realization in Fisher's theorem.

Box 1-1: The fundamental relationships defining the roles of selection and drift in the evolution of populations

Nearly neutral evolution dominated by drift

Evolution dominated by selection

Mixed regime, with both drift and selection important

Ne: effective population size (typically, substantially less than the number of individuals in a population because not all individuals produce viable offspring)

s: selection coefficient or fitness effect of mutation:

FA, Fa: fitness values of two alleles of a gene
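The three regimes named in Box 1-1 can be illustrated numerically. The sketch below does not reconstruct the box's own equations. It assumes the widely used rule of thumb that compares |s| with 1/Ne, and it uses Kimura's diffusion approximation for the fixation probability of a single new mutation in a diploid population whose census size equals Ne. The cutoff `factor` and both function names are hypothetical choices made for this example.

```python
import math

def fixation_probability(s, ne):
    """Kimura's approximation for a new mutation at frequency 1/(2*Ne).

    s is the selective advantage of the mutant allele; s = 0 gives the
    neutral value 1/(2*Ne). Assumes census size equals Ne.
    """
    if s == 0:
        return 1 / (2 * ne)
    exponent = -4 * ne * s
    if exponent > 700:
        # Strongly deleterious: the denominator would overflow and the
        # probability is effectively zero.
        return 0.0
    return math.expm1(-2 * s) / math.expm1(exponent)

def regime(s, ne, factor=10):
    """Classify by comparing |s| * Ne with an arbitrary cutoff factor."""
    strength = abs(s) * ne
    if strength < 1 / factor:
        return "drift"
    if strength > factor:
        return "selection"
    return "mixed"

# The same slightly beneficial mutation in a small and a large population.
p_small = fixation_probability(0.001, 100)
p_large = fixation_probability(0.001, 1_000_000)
```

In the small population the mutation fixes with a probability close to the neutral value 1/(2Ne), whereas in the large population its fixation probability is close to 2s, which is far above the neutral value for that population size.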

A corollary of Fisher's theorem is that, assuming that natural selection drives all evolution, the mean fitness of a population cannot decrease during evolution (if the population is to survive, that is). This is probably best envisaged using the imagery of a fitness landscape, which was first introduced by another founding father of population genetics, Sewall Wright. When asked by his mentor to present the results of his mathematical analysis of selection in a form accessible to biologists, Wright came up with this extremely lucky image. The appeal and simplicity of the landscape representation of fitness evolution survive to this day and have stimulated numerous subsequent studies that have yielded much more sophisticated and less intuitive theories and versions of fitness landscapes, including multidimensional ones (Gavrilets, 2004). According to Fisher's theorem, a population that evolves by selection only (technically, a population of an infinite size; infinite populations certainly do not actually exist, but this is a convenient abstraction routinely used in population genetics) can never move downhill on the fitness landscape (see Figure 1-1). It is easy to realize that a fitness landscape, like a real one, can have many different shapes. Under certain special circumstances, the landscape might be extremely smooth, with a single peak corresponding to the global fitness maximum (sometimes this is poetically called the Mount Fujiyama landscape; see Figure 1-1A). More realistically, however, the landscape is rugged, with multiple peaks of different heights separated by valleys (see Figure 1-1B). As formally captured in Fisher's theorem (and much in line with Darwin), a population evolving by selection can move only uphill and so can reach only the local peak, even if its height is much less than the height of the global peak (see Figure 1-1B). According to Darwin and Modern Synthesis, movement across valleys is forbidden because it would involve a downhill component. However, the development of population genetics and its implications for the evolutionary process changed this placid picture because of genetic drift, a key concept in evolutionary biology that Wright also introduced.

Figure 1-1 Fitness landscapes: the Mount Fujiyama landscape with a single (global) fitness peak and a rugged fitness landscape.

As emphasized earlier, Darwin recognized a crucial role of chance in evolution, but that role was limited to one part of the evolutionary process only: the emergence of changes (mutations, in the modern parlance). The rest of evolution was envisaged as a deterministic domain of necessity, with selection fixing advantageous mutations and the rest of mutations being eliminated without any long-term consequence. However, when population dynamics entered the picture, the situation changed dramatically. The founders of quantitative population genetics encapsulated in simple formulas the dependence of the intensity of selection on population size and mutation rate (see Box 1-1 and Figure 1-2). In a large population with a high mutation rate, selection is effective, and even a slightly advantageous mutation is fixed with near certainty (in an infinite population, a mutation with an infinitesimally small positive selection coefficient is fixed deterministically). Wright realized that a small population, especially one with a low mutation rate, is quite different. Here random genetic drift plays a crucial role in evolution through which neutral or even deleterious (but, of course, nonlethal) mutations are often fixed by sheer chance. Clearly, through drift, an evolving population can violate the principle of upward-only movement in the fitness landscape and might slip down (see Figure 1-2). Most of the time, this results in a downward movement and subsequent extinction, but if the valley separating the local peak from another, perhaps taller one is narrow, then crossing the valley and starting a climb to a new, perhaps taller summit becomes possible (see Figure 1-2). The introduction of the notion of drift into the evolutionary narrative is central to my story. Here chance enters the picture at a new level: Although Darwin and his immediate successors saw the role of chance in the emergence of heritable change (mutations), drift introduces chance into the next phase, namely the fixation of these changes, and takes away some of the responsibility from selection. I explore just how important the role of drift is in different situations during evolution throughout this book.

Figure 1-2 Trajectories on a rugged fitness landscape. The dotted line is an evolutionary trajectory at a high effective population size. The solid line is an evolutionary trajectory at a low effective population size.

John Maynard Smith and, later, John Gillespie developed the theory and computer models to demonstrate the existence of a distinct mode of neutral evolution that is only weakly dependent on the effective population size and that is relevant even in infinite populations with strong selection. This form of neutral fixation of mutations became known as genetic draft and refers to situations in which one or more neutral or even moderately deleterious mutations spread in a population and are eventually fixed because of the linkage with a beneficial mutation: The neutral or deleterious alleles spread by hitchhiking with the linked advantageous allele (Barton, 2000). Some population-genetic data and models seem to suggest that genetic draft is even more important for the evolution in sexual populations than drift. Clearly, genetic draft is caused by combined effects of natural selection and neutral variation at different genomic sites and, unlike drift, can occur even in effectively infinite populations (Gillespie, 2000). Genetic draft may allow even large populations to fix slightly deleterious mutations and, hence, provides them with the potential to cross valleys on the fitness landscape.

Conservation genetics of population bottlenecks: the role of chance, selection, and history

Conservation genetics studies of population bottlenecks are commonly framed under the detrimental paradigm of inbreeding depression. This conceptual paradigm presupposes a direct and unambiguous relationship between population size, genetic diversity, fitness, and extinction. Here, I review a series of studies that emphasize the role of chance, selection, and history in determining the genetic consequences of population bottlenecks. The variable responses of bottlenecks to fitness, phenotypic variation, and heritable variation emphasize the necessity to explore the relationship between molecular genetic diversity, fitness, adaptive genetic diversity, and extinction beyond the detrimental paradigm of inbreeding depression. Implications for conservation and management are presented as guidelines and testable predictions regarding the potential effects of bottlenecks on population viability and extinction.
