# Understanding ancestry testing mathematically

Forgive me if this question has been asked here before, because it is something which should be very easy to find, but I can't seem to find an answer no matter where I search.

The question is simply where to learn the mathematics that goes into things like popular ancestry tests, and also more academic things like determining ancestry components of historical groups (e.g. usage of the terms Ancient North Eurasian, Eastern European Hunter-Gatherers and all that).

It is clear that if someone gets an ancestry test result saying, say, 32% Scandinavian, that doesn't mean 32% of their base pairs have a convenient "Scandinavian" label attached to them; rather, there is some statistical inference going on behind these percentages, and I would like to understand that.

Suppose I have the raw data of my own fully sequenced genome, and also a database of the genomes of many individuals from various populations (of course grouping them into populations is already something that involves some assumptions that I would like to learn more about). Where would I learn how to analyze that myself to produce something like the results of an ancestry test? Is there a textbook someone could recommend that gets into the actual algorithms used?

I'll give here a simple, non-technical answer because I'm assuming you don't need to actually perform an analysis of ancestry.

So, detecting ancestry is a non-trivial task. Given your genome sequence, you would need to compare some "informative" regions of the genome with the homologous sequences from some population (say, from a database of other genomes). These informative regions are usually parts of the genome that vary across individuals (variation is used because differences are informative: some populations vary at particular sites in ways that distinguish them from other populations). At the core, this is a question of how to compare "character strings" (DNA is composed of 4 characters, namely A, T, C, and G). But these strings exist in a complicated structure: within each individual, a human genome is partitioned into 23 pairs of chromosomes. However, the question of ancestry is not really about individual sequences, but about population-level changes in DNA composition. So, in fact, you need to consider population-level factors: the sizes of the populations under consideration, the rates of recombination (DNA exchange between paired chromosomes), mutation rates, and even population structure (people move geographically!).

Given these (and many other) factors, people build models of "coalescence": given a sequence of interest, how likely is it that it shares some ancestor with another query sequence? So, the models try to relate these two sequences (say, the one you are interested in and a query such as a "consensus" Scandinavian sequence), and then model a third sequence (the ancestor!). This process is repeated to test many hypotheses, so you end up with many probabilities. On top of this, you can estimate the ancestry for any given part of your genome, and this is what most companies do (say, 23andMe).
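To make the coalescence idea concrete, here is a toy sketch. It assumes a Wright-Fisher population of constant size with random mating (my assumption, not something stated above): going back one generation, two lineages pick the same parental gene copy with probability 1/(2N), where N is the number of diploid individuals. The function names are made up for illustration, and real methods additionally handle recombination, mutation and population structure.

```python
import random

def prob_coalesced_within(generations, n_diploid):
    """P(two lineages share an ancestor within `generations`) under Wright-Fisher."""
    p = 1.0 / (2 * n_diploid)  # chance of picking the same parental copy per generation
    return 1.0 - (1.0 - p) ** generations

def simulate_coalescence_time(n_diploid, rng):
    """Count generations back until two lineages pick the same parental copy."""
    generations = 1
    # The second lineage hits the first lineage's parent (labelled copy 0)
    # with probability 1 / (2N) in each generation.
    while rng.randrange(2 * n_diploid) != 0:
        generations += 1
    return generations

rng = random.Random(0)
times = [simulate_coalescence_time(50, rng) for _ in range(2000)]
mean_time = sum(times) / len(times)  # theory predicts about 2N = 100 generations
```

The smaller the population, the more recently two sequences are expected to share an ancestor, which is why population size appears among the factors listed above.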

In summary, you are correct that a "% Scandinavian" does not mean sequence similarity per se. It implies an estimation of common ancestry. This estimation comes from models of shared ancestry. If you're unsatisfied with this simple answer, and want a more technical answer, I recommend reading this paper. An intermediate-level explanation is found here.

what is ancestry?

I think some of the confusion in ancestry proportions stems from the fact that 'ancestry' can really mean several different things in different contexts. I would really encourage people to read this short piece, as it does a good job of clarifying what the different terms mean https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008624

What something like 23andme is trying to do is to infer how much of your genome most closely matches the genome of a particular pre-specified reference population. This is relatively easy to interpret if one parent is from Nigeria and the other is from Norway, since the result will be roughly 50% Norwegian and 50% Nigerian, but it becomes more complex to interpret when someone is 70% Norwegian, 20% Danish and 10% Swedish, for example.

summary of the 23andme process

Since you specifically asked about 23andme, I will try and give you a relatively simple explanation of how they end with the numbers that you might have got on the report.

What 23andme are trying to achieve is essentially a 'classification' problem - they have a genome from an individual of unknown ancestry (call this the target individual), and they would like to describe it in terms of some pre-defined set of different ancestries (like 'Norwegian' or 'Nigerian'). To achieve this, they first need to assemble some kind of reference dataset of individuals whom they know in advance to be from a certain country. For this, they will choose individuals who they know have e.g. all of their grandparents and great-grandparents from Norway - so the genomes of these individuals will be reflective of general Norwegian ancestry.

They use something called support vector machine (SVM) learning, which is just a fancy computational method, to 'learn' what segments of Norwegian ancestry look like. This is analogous to the AI algorithms which are able to tell the difference between an image of a cat and a dog. If you 'train' the AI with enough labelled examples, it can accurately classify new images. In the same way, if you train the SVM algorithm with enough examples of what Norwegian or Nigerian DNA segments look like, it can estimate the probability that a new segment comes from a particular reference population.

They then take the genome of the target individual and split it up into chunks along the genome (something like 100 chunks per chromosome). They then apply the SVM algorithm to calculate the probability that a particular chunk comes from a particular reference population. So for example, there might be a certain chunk of the genome which has a 70% chance of coming from the Norwegian reference population and a 30% chance of coming from a Danish reference population. If the individual is admixed, then the next window may have a 90% chance of coming from the Nigerian reference population and a 10% chance of coming from the Cameroonian reference population.
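As a rough illustration of how a per-window probability could be produced, here is a minimal sketch. Note that it swaps the SVM for a much simpler stand-in: a likelihood comparison based on per-population allele frequencies, assuming independent SNPs and a uniform prior over populations. The function name, the populations and the frequencies are all made up for the example.

```python
import math

def window_posteriors(alleles, ref_freqs):
    """Posterior probability of each reference population for one window.

    alleles:   0/1 alleles observed at the window's SNPs
    ref_freqs: {population: [frequency of allele 1 at each SNP]},
               with every frequency strictly between 0 and 1
    """
    log_lik = {}
    for pop, freqs in ref_freqs.items():
        log_lik[pop] = sum(
            math.log(f if a == 1 else 1.0 - f) for a, f in zip(alleles, freqs)
        )
    top = max(log_lik.values())  # subtract the maximum for numerical stability
    weights = {pop: math.exp(v - top) for pop, v in log_lik.items()}
    total = sum(weights.values())
    return {pop: w / total for pop, w in weights.items()}

# Made-up allele frequencies for a three-SNP window.
ref_freqs = {"Norwegian": [0.9, 0.8, 0.2], "Danish": [0.6, 0.7, 0.4]}
posteriors = window_posteriors([1, 1, 0], ref_freqs)
# Likelihoods are 0.9 * 0.8 * 0.8 = 0.576 and 0.6 * 0.7 * 0.6 = 0.252,
# so the Norwegian posterior is 0.576 / 0.828, roughly 0.70.
```

Closely related populations have similar allele frequencies, so their windows rarely get a clear-cut answer; this is one reason a split like Norwegian versus Danish is harder than Norwegian versus Nigerian.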

They will then go across the genome and look for windows of 'high-confidence', where the probability of the window coming from a particular reference population may be higher than say 90%. If you add up the high confidence windows across the genome for each reference population, you will end up with your overall ancestry proportion for that population.
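The adding-up step can be sketched as follows. This is only an illustration under my own assumptions: all windows have equal length, the 90% cut-off is the example value from above, and low-confidence windows are simply left out of the total (the description above does not say how a real pipeline treats them). The function name and the numbers are hypothetical.

```python
def ancestry_proportions(window_probs, threshold=0.9):
    """Share of confidently assigned windows won by each reference population."""
    counts = {}
    for probs in window_probs:
        pop, p = max(probs.items(), key=lambda kv: kv[1])
        if p > threshold:  # keep only high-confidence windows
            counts[pop] = counts.get(pop, 0) + 1
    assigned = sum(counts.values())
    if assigned == 0:
        return {}
    return {pop: n / assigned for pop, n in counts.items()}

# Made-up per-window probabilities for four windows.
windows = [
    {"Norwegian": 0.95, "Danish": 0.05},
    {"Norwegian": 0.70, "Danish": 0.30},  # below the cut-off, so ignored
    {"Nigerian": 0.92, "Cameroonian": 0.08},
    {"Norwegian": 0.97, "Danish": 0.03},
]
proportions = ancestry_proportions(windows)
# Two of the three confident windows are Norwegian and one is Nigerian.
```

Raising or lowering the threshold changes the reported percentages, which is part of why the same raw data can produce different-looking reports.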

other ways of inferring ancestry

There are many, many other ways of inferring ancestry proportions/components. For example, Principal Component Analysis, ADMIXTURE analysis, and clustering methods, all of which have different strengths and weaknesses.
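To give a flavour of the model-based approach, here is a small sketch in the spirit of ADMIXTURE-style analysis. It is heavily simplified and should not be read as what the ADMIXTURE program does internally: the reference allele frequencies are treated as known rather than estimated, SNPs are assumed independent, and a plain EM iteration is used to estimate one individual's ancestry fractions. All names and numbers are made up.

```python
def estimate_admixture(genotypes, freqs, iterations=200):
    """Estimate ancestry fractions q for one individual by EM.

    genotypes: number of copies (0, 1 or 2) of the alternate allele at each SNP
    freqs:     {population: [alternate-allele frequency at each SNP]},
               with every frequency strictly between 0 and 1
    """
    pops = list(freqs)
    q = {k: 1.0 / len(pops) for k in pops}  # start from equal fractions
    for _ in range(iterations):
        new_q = {k: 0.0 for k in pops}
        for j, g in enumerate(genotypes):
            alt = {k: q[k] * freqs[k][j] for k in pops}
            ref = {k: q[k] * (1.0 - freqs[k][j]) for k in pops}
            alt_total, ref_total = sum(alt.values()), sum(ref.values())
            for k in pops:
                # Expected number of this SNP's two allele copies drawn from k.
                new_q[k] += g * alt[k] / alt_total + (2 - g) * ref[k] / ref_total
        q = {k: v / (2 * len(genotypes)) for k, v in new_q.items()}
    return q

# Two hypothetical, strongly differentiated reference populations.
freqs = {"A": [0.9, 0.9, 0.9, 0.9, 0.1, 0.1], "B": [0.1, 0.1, 0.1, 0.1, 0.9, 0.9]}
unmixed = estimate_admixture([2, 2, 2, 2, 0, 0], freqs)  # fraction for A approaches 1
mixed = estimate_admixture([1, 1, 1, 1, 1, 1], freqs)    # equal fractions for A and B
```

With only six SNPs the estimates are very coarse; real analyses use hundreds of thousands of markers.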


## What Is Genetic Testing?

Before getting into the pros and cons of genetic testing, it may be helpful to explain exactly what genetic testing is and how it is applied.

All human beings have unique sequences of DNA, a chemical database that provides instructions for how the body functions. Genetic testing involves analysis of this DNA, which can reveal any mutations (changes to that chemical database) that may suggest a higher risk of illness, deformity or disease.

The gathering of genetic material for testing is typically accomplished through:

• Blood testing
• A cheek swab
• Amniocentesis (for pregnant women; involves the use of a thin needle inserted into the uterus to collect fluid)

While most genetic testing is conducted in a hospital or other medical practice, services such as 23andMe allow participants to collect their own cheek swabs and mail them in for testing.

### Pregnancy

There are a number of applications for genetic testing, including prenatal testing. Pregnant women may have their baby’s DNA analyzed for abnormalities, allowing for early detection of conditions such as Down Syndrome. Prenatal testing is done either by looking for blood markers or through amniocentesis. While the latter is more precise, it also entails some risk to the baby. These tests are optional, though physicians may recommend them in cases where risk factors, such as advanced maternal age, are evident.

It is worth noting that all 50 states require basic genetic screening for newborn babies, which allows providers to evaluate for conditions such as sickle cell disease or hypothyroidism. Through early detection, treatment can commence as quickly as possible. Newborn screening is the most common form of genetic testing in the United States. These tests involve a simple “heel prick” to obtain a blood sample, and don’t offer any risk to the baby.

For those interested in learning more about genetic testing and pregnancy, some resources include:

• Obstetricians and Gynecologists, “Prenatal Genetic Screening Tests.” This guide provides a helpful medical perspective on prenatal tests.
• Parents will find some useful guidelines here about prenatal genetic testing.
• WebMD reports on different types of prenatal testing available to pregnant women.

### Treating Health Issues

Genetic testing is also helpful in the treatment of different diseases. For example, the field known as pharmacogenetics helps doctors evaluate a patient’s DNA to assess which medication or clinical intervention will have the greatest effect and the lowest risk. This is an important method where medical treatment is highly personalized to the patient.

Sometimes, genetic testing reveals a medical complication “accidentally.” A Science News article highlights a woman who used a genetic testing service to learn more about her family history. The testing revealed a variant in her DNA that put her at high risk for breast cancer. This prompted her to see her doctor, who spotted a small indication of cancer and immediately started treatment.

Some additional resources regarding genetic testing and disease treatment include:

• This news report shows one example of genetic testing and its use in disease screening.
• This article outlines some considerations for using genetic testing to screen for cancer.

### Discovering Bloodlines and Heritage

Genetic testing can also be used to trace family history or learn more about ethnic background. This is called genetic ancestry testing. According to Genetics Home Reference, this “is a way for people interested in family history (genealogy) to go beyond what they can learn from relatives or from historical documentation.”

More specifically, this testing examines DNA variations to provide indications of where a person’s ancestors came from and how different family lines intermingled in past generations. Such testing provides an estimate of a person’s ethnic composition; e.g., it can tell you that you’re roughly 50% Irish, 20% German, etc.

Ancestry tests are taken either in a medical practice or at home; the latter is an incredibly popular option. As of 2019, more than 26 million people had taken home ancestry tests, according to a report from MIT Technology Review.

• Genealogy” This article offers helpful context on using genetic testing for family research.
• This article reviews and compares some of the most widely available genetic testing services, including 23andMe.

### Food and Agriculture

Genetic testing also plays an important role in food and agriculture. Through DNA evaluation of different strains of plants, agriculturalists can determine which seeds will produce the healthiest yields. Genetic testing helps produce crops that are maximally resistant to disease, pests and the effects of the climate. The result is more efficient planting for farmers, along with better quality food for consumers.

Similarly, genetic testing is used in selective animal breeding, which enables farmers to produce the healthiest and most resilient livestock.

Additional information about genetic testing and its use in agriculture can be found here:

• Learn in-depth information about the role of genetic testing in farming and food manufacturing.
• An in-depth look at how biotech impacts the food production industry.

## What I actually learned about my family after trying 5 DNA ancestry tests

FAMILY FINDER Science News reporter Tina Hesman Saey tried out several consumer genetic testing companies to learn more about her ancestry.

Commercials abound for DNA testing services that will help you learn where your ancestors came from or connect you with relatives. I’ve been interested in my family history for a long time. I knew basically where our roots were: the British Isles, Germany and Hungary. But the ads tempted me to dive deeper.

Previous experience taught me that different genetic testing companies can yield different results (SN: 5/26/18, p. 28). And I knew that a company can match people only to relatives in its customer base, so if I wanted to find as many relatives as possible, I would need to use multiple companies. I sent my DNA to Living DNA, Family Tree DNA, 23andMe and AncestryDNA. I also bought the National Geographic Geno 2.0 app through the company Helix. Helix read, or sequenced, my DNA, then sent the data to National Geographic to analyze.

#### Genetic testing goes mainstream

This story is part of a series on consumer genetic testing. See the whole series.

These companies analyze hundreds of thousands of natural DNA spelling variations called single nucleotide polymorphisms, or SNPs. To estimate ethnic makeup, a company compares your overall SNP pattern with those of people from around the world. SNP matches also help companies see who in their database you’re related to.
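As a very rough illustration of what comparing SNP patterns can mean, the sketch below scores two people by the fraction of alleles they share across SNPs (identity by state). This is a crude stand-in of my own choosing, not any company's method; relative matching in practice generally looks for long shared segments rather than an overall similarity score. Genotypes are coded as the number of copies (0, 1 or 2) of one allele at each SNP, and the data are made up.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Fraction of alleles shared between two people across SNPs."""
    # At each SNP two people share 2, 1 or 0 alleles: 2 - |difference in counts|.
    shared = sum(2 - abs(a - b) for a, b in zip(genotypes_a, genotypes_b))
    return shared / (2 * len(genotypes_a))

person = [0, 1, 2, 1]
other = [1, 1, 0, 2]
similarity = ibs_similarity(person, other)  # (1 + 2 + 0 + 1) / 8 = 0.5
```

Even unrelated people share many alleles by chance, so a score like this only becomes meaningful when compared against a background of many other pairs.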

Some of the companies also analyze a person’s Y chromosome or mitochondrial DNA. Y chromosome DNA traces a man’s paternal line. In contrast, mitochondrial DNA traces maternal heritage, since people inherit mitochondria, which generate energy for cells, only from their mothers. Neither type of DNA changes that much over time, so those tests usually can’t tell you much about recent ancestors.

Once I sent in DNA samples, my Web-based results arrived in just a few weeks. But my user experience, and results, were quite different for each company.

#### National Geographic Geno 2.0

MOTHERLINE National Geographic Geno 2.0 shows how a customer’s maternal line migrated and changed across time, as determined by analyses of mitochondrial DNA. A heat map indicates where members of your maternal line are most prevalent. National Geographic Geno 2.0

At \$199.95, National Geographic’s test is the most expensive, yet the least useful. The results are generic, and the ethnicity categories are overly broad. My results say that 45 percent of my heritage came from people living in southwestern Europe 500 to 10,000 years ago. That doesn’t tell me much and doesn’t reflect what I know of my family history.

There’s no relative matching, though Geno 2.0 shows which historical “geniuses” may have shared your mitochondrial or Y chromosome DNA. I don’t know how National Geographic knows about the mitochondria of Petrarch, Copernicus or Abraham Lincoln. So I’m skeptical that I am actually related to those famous figures, even from the distance of 65,000 years, the last time we supposedly had an ancestor in common. The service also calculated the percentage of Neandertal ancestry that I carry. I take geeky pride that 1.5 percent of my DNA comes from Neandertals, topping the 1.3 percent average for Geno 2.0 customers.

Overall, Geno 2.0 has a nice presentation, but I learned more about my family history elsewhere. Since I bought the Geno 2.0 kit as an app through Helix, I don’t know if the kit purchased directly from National Geographic, which is processed by Family Tree DNA, would yield different results.

#### Living DNA

ENGLISH ANCESTRY Living DNA offers fine-scale ethnicity estimates for people of British or Irish descent (Saey’s results shown). The company is less certain about subregional estimates than it is about global estimates. Living DNA

Another expensive test (\$159) came from Living DNA. When I saw the company’s ad claiming to pinpoint exactly where in the British Isles a person’s genetic roots stem from, I decided to give it a go. The company highlights ethnicity on a world map, then lets you zoom in from the continent level. I found that 22.5 percent of my heritage came from Lincolnshire in east-central England. I haven’t yet traced any ancestors to Lincolnshire, but I did find through much genealogical sleuthing that one of my sixth-great-grandfathers came from Aberdeen, Scotland. Living DNA says that 3.1 percent of my DNA is from Aberdeenshire. Written narratives on the website provide a history of each reported region.

Using mitochondrial DNA and, if applicable, Y chromosome DNA, the company can trace your maternal and paternal lines back to human origins in Africa and show where and when your particular line probably branched off the original. My “motherline” probably arose in the Near East 19,000 to 26,000 years ago, Living DNA claims, and my ancestors were some of the first people to enter Europe. In February, the company announced that it would soon launch a relative-matching service for its customers.

I’m not sure the service would be worth the price tag for people whose ancestry doesn’t contain a strong British or Irish tilt, though Living DNA says it is working to improve ethnicity estimates in Germany and elsewhere.

#### Family Tree DNA

ANCIENT DNA MATCHES Family Tree DNA is the only company Saey tried that compares customers’ DNA with that of ancient modern humans. But based on the results’ presentation, it wasn’t clear whether Saey actually shares DNA with the ancient people indicated on the map (shown). Family Tree DNA

The most no-frills of the bunch is Family Tree DNA. For \$79, “autosomal” testing looks for genetic variants on all of the chromosomes except the X and Y sex chromosomes. Y chromosome and mitochondrial DNA analysis costs extra.

Family Tree DNA allows a user to build a family tree, incorporating personal DNA tests and matches from the site’s relative-matching section. I found more than 2,400 potential relatives. A chromosome viewer lets me see exactly which bit of DNA I have in common with any particular relative, or with up to five relatives at a time. That feature also allows users to trace how they inherited DNA from a shared ancestor. But I found this tool difficult to use.

The website offers little explanation of results. For instance, I was excited to see that my DNA was compared with that of ancient Europeans, including Ötzi the Iceman, who lived 5,300 years ago (SN: 9/17/16, p. 9). Family Tree DNA is the only company I tried that incorporates ancient DNA into its results, and that feature was what convinced me to try this company. I did get a breakdown of how different groups (Stone Age hunter-gatherers, early farmers and “Metal Age Invaders” from the Eurasian steppes) contributed to my DNA. But when I saw Ötzi’s dot on my ancestry map, it wasn’t clear if that meant we share DNA or if the map was merely showing where he lived.

#### 23andMe

MELTING POT 23andMe color-codes parts of chromosomes according to the ethnic group that contributed the DNA. For instance, Saey inherited from her dad a bit of DNA on chromosome 15 that carries western Asian and northern African heritage (purple). 23andMe

23andMe (\$99) offers one of the more complete packages of information. Most companies show a map of ethnic heritage. 23andMe does, too, but also presents an interactive diagram of all of a person’s chromosomes, indicating which portions carry a particular ethnic ancestry. Because my parents also did 23andMe, I learned that my dad handed me a tiny bit of chromosome 15 that carries western Asian and northern African heritage. My mom gave me the 0.3 percent of my DNA that comes from the Balkans, in a single chunk on chromosome 7, which makes sense since her grandparents came from Hungary. Playing with the chromosomes is fun. But I question the accuracy of these results (see my related article for more on why ancestry tests may miss the mark).

23andMe presents Neandertal heritage in terms of the number of genetic variants you carry. A family-and-friends scoreboard shows where you stack up. (I top my leaderboard with 296 Neandertal variants, more than what 80 percent of 23andMe customers have.) The report also explains what some of those Neandertal variants do, including ones linked to back hair, straight hair, height and whether you’re likely to sneeze after eating dark chocolate. The company doesn’t test for all possible Neandertal variants, including ones that have been linked to health (SN Online: 10/10/17; SN: 3/5/16, p. 18).

Like Geno 2.0, 23andMe uses mitochondrial and Y chromosome DNA to trace the migration patterns of a person’s ancestors, from Africa to the present day.

Relative matching is both interesting and frustrating. I could see the people I match, how we might be related and compare our chromosomes. But 23andMe doesn’t provide a way to build family trees to further explore these relationships.

#### AncestryDNA

OLD AND NEW ROOTS AncestryDNA shows ethnic groups in the Old World that contribute to your history. For some people, the map (orange blob) may indicate your ancestors settled in the United States early in American history. AncestryDNA

AncestryDNA (\$99) doesn’t give the variety of information other companies do. But it has useful genealogical tools, provided you link your results to a family tree that you can build with help from historical records via a paid subscription to Ancestry.com.

One interesting feature of my heritage report was that it went beyond spots on the map in Europe to also show a region of the United States called “Northeastern States Settlers.” A match to that category tells me that my ancestors who came from Europe probably initially settled in New England or around the Great Lakes. They did. One branch of my family tree set roots in Massachusetts in the 1640s. Using birth, death and immigrant records from Ancestry.com, I could build a timeline to show when and from where individual ancestors immigrated to the United States.

AncestryDNA also matches you with relatives, but you can only see how you’re related to those people if they have also chosen to make family trees.

A feature unique to AncestryDNA is called DNA circles. It shows connections between individuals and family groups who share DNA with you. These circles also contain descendants of your ancestors who you don’t directly share DNA with. Therefore, this feature allows you to extend relative matches beyond what traditional DNA matching can do.

For instance, I am in a family group with my uncle and a cousin. We all share DNA with 24 other descendants of Samuel Pickerill, a drummer during the Revolutionary War. Pickerill has 42 other descendants with whom my family group doesn’t share DNA. Those 42 Pickerill descendants happened to inherit different bits of DNA from Pickerill than my uncle, his cousin and I did. That sometimes happens because of the random nature of the rules of biology and genetics (for more on those rules, check out this video).

#### Genealogy junkie

Although I’ve always been interested in family history, DNA testing has gotten me hooked on genealogy research.

23andMe and AncestryDNA were the most fun to use. 23andMe can tell me whether a relative is on my mother’s or father’s side of the family. But then I have to go back to AncestryDNA and comb through my family tree to learn how we’re really connected. DNA can kick-start a genealogy hunt, but combing through marriage certificates, military rolls, census records, immigration documents, old photographs and other records — which Ancestry.com can provide — is what really tells me who my ancestors were.

### All in the family

A variety of consumer genetic testing companies offer ancestry testing. Here’s how five such services compare.

| | Geno 2.0 | Living DNA | Family Tree DNA | 23andMe | AncestryDNA |
| --- | --- | --- | --- | --- | --- |
| Cost | \$199.95 | \$159 | \$79* | \$99 | \$99 |

Services include:
Ethnicity estimates
Relative matching Coming soon
Neandertal results
Y chromosome analysis
Mitochondrial DNA analysis
Family tree building

*Y chromosome and mitochondrial DNA analysis costs extra

A version of this article appears in the June 23, 2018 issue of Science News.

## DNA testing can bring families together, but gives mixed answers on ethnicity

FINDING FAMILY DNA testing helped Michael Douglas find his biological family in southern Maryland and his Irish roots.

Michael Douglas, a new resident of southern Maryland, credits genetic testing for helping him find his heritage — and a family he knew very little about.

Douglas, 43, is adopted. He knew his birth mother’s name and had seen a birth certificate stating his birth name: Thomas Michael McCarthy. Over the years, Douglas had tried off and on to find his birth family, mostly by looking for his mother’s name, Deborah Ann McCarthy, in phone books and calling the numbers. “I think I must have broken up a lot of marriages,” he laughs.

His search gained urgency in the last five years as he battled a life-threatening illness. “We planned my funeral three times,” he says. Douglas has a genetic disease called Ehlers-Danlos syndrome, caused by a variant in a gene that helps build the body’s connective tissue. His stretchy skin and hyperflexible joints are characteristic of the disease.

“As a kid, I was always dislocating something,” he says. His blood vessels don’t constrict properly to maintain his blood pressure, so Douglas sometimes faints when he stands up. For five years, he has had a constant migraine. Headaches are typical of about a third of people with Ehlers-Danlos. On top of that, he has B cell lymphoma. “I feel like I have the flu every day,” he says. It was time, he decided, to track down his birth family and learn more about his medical history.

In June 2017, Douglas flew to Ireland on what he calls his “death trip.” He wanted to see the land of his McCarthy ancestors. He chose Fethard, because the walled medieval town has a pub called McCarthy’s. (Douglas learned later that he and the pub owner are related.) His health improved during the visit, which he attributes to Ireland’s cool weather. When he returned to Phoenix, where he and his adopted family lived, he had new resolve to find his birth family.

“That’s it,” he decided. “I need my DNA run to find out who I am.” He sent his DNA to three testing companies: Family Tree DNA, AncestryDNA and MyHeritage. With his results plus sleuthing of genealogical records by some helpful strangers, Douglas found his biological family last November and dove headfirst into a new life.

In February, he moved from Phoenix to Maryland to help care for his biological mother as she recovers from a stroke. The new family dynamic hasn’t been easy, but Douglas has bonded with one of his two biological brothers. “And I have a relationship with my ancestors that I did not know before.” He is pleased to find that he resembles his great-grandfather Thomas Rodda, a bicycle maker. Douglas himself is a Star Wars costume maker.

FAMILY HISTORY With this picture, Douglas learned he resembles his great-grandfather Thomas Rodda (center, holding the bicycle frame). Matthew Rakola

Adoptees like Douglas and birth parents looking for children they gave up often use commercial DNA tests in hopes of reconnecting, says Drew Smith, a genealogical librarian at the University of South Florida in Tampa. Many states make it difficult for adoptees to get birth certificates or other documents that could help them track down birth families. DNA tests are “an end run around the documentation problem,” Smith says.

But the pool of people looking for their genetic roots is much larger. AncestryDNA, the ancestry testing service with the biggest customer base, has persuaded about 10 million people to take its DNA test. 23andMe, Living DNA, Family Tree DNA, MyHeritage, National Geographic’s Geno 2.0 and others also offer customers a chance to use genetics to connect with living relatives and with families’ pasts. A few companies even give hints about ties that go back to Neandertals (SN: 11/11/17, p. 10). But such testing services may not be able to tell you as much about who you are and where your family came from as they claim.


#### False precision

I got my DNA tested for this multipart reporting project. My assignment was to investigate the science behind DNA testing (SN: 6/9/18, p. 20), but it was also a welcome excuse to learn more about my family’s history.

I already knew a lot about three branches of my family tree. Based on birth and death records, plus census and other documents, most of my family stems from England and Germany. But I dreamed of connecting to relatives on the Hungarian branch, which I knew less about. So I sent saliva or cheek swabs to a handful of testing companies.

My ethnicity estimates were all over the European map. Generally, estimates are most accurate on the broad continental scale. All of the companies agree that my heritage is overwhelmingly European. But that’s where the consensus ends. Even the companies that limit their estimates to broad swaths of the continent told different stories. National Geographic’s Geno 2.0 says that I am 45 percent Southwestern European. Veritas Genetics puts my Southwestern European heritage at only 4 percent and tells me I’m mostly (91.1 percent) north-central European.

The companies that try to dig down to the country level see their confidence in the results go down, but that doesn’t stop them from making very specific estimates. In most reports, the main results given are at the lower end of the confidence scale. 23andMe, for instance, says it has 50 percent statistical confidence in the ethnicity results.

Along with the wide variations between companies, the estimates often didn’t match what I know about my family tree. 23andMe says I’m 16.6 percent Scandinavian. When I sent raw data from 23andMe to MyHeritage to do its own analysis, that company reported no Scandinavian ancestry in my background; it said I’m 16.9 percent Italian. As far as I know, I have no ancestors from Italy or Scandinavia.

Only 23andMe called out my German heritage, though the company lumped it in with French for a total of 18.8 percent. Hungarian is not specifically identified in any company’s estimates. I can only guess that 23andMe’s 3.9 percent Eastern European and 0.3 percent Balkan findings cover that part of my ancestry. Both 23andMe and AncestryDNA say that I have Ashkenazi Jewish heritage. News to me.

Multiple companies agree that a sizable chunk of my heritage is from the British Isles. But even in that, estimates run from 23andMe’s 26.6 percent British and Irish, to Living DNA’s calculation that 60.3 percent of my DNA comes from Great Britain and Ireland, to MyHeritage’s even higher 78.7 percent.

When I shared these inconsistencies with Deborah Bolnick, an anthropological geneticist at the University of Texas at Austin, I could practically hear her shaking her head over the phone.

“They present these very specific, precise numbers down to the decimal point. But it’s a false precision,” Bolnick says. “The tests that are available may not be as nuanced, sensitive and fine scaled as they are presented.”

### Mixed messages

Five companies gave reporter Tina Hesman Saey a wide range of results about her ethnic makeup. Sometimes the companies’ findings overlap, but the categories are called different things, such as French & German and Western European. Both AncestryDNA and 23andMe found evidence of Jewish heritage, but none of the other companies did. Geno 2.0 and Family Tree DNA reported on only 99 percent of Saey’s DNA.

#### Checking references

Ethnicity estimates come from comparing patterns of genetic variants — often called single nucleotide polymorphisms, or SNPs — in your DNA with the SNP patterns of pools of people from particular geographic locations. As a way to confirm that a pool solidly represents a place, companies generally require that the people in these pools, known as the reference populations, have four grandparents who were also born in that location. Many of the companies draw reference population DNA samples from people in large public databases compiled by the 1000 Genomes Project, a catalog of genetic variation of thousands of people around the world, and from other studies. Some companies supplement their databases by testing more people in particular parts of the world. So the mixes in reference populations differ across companies.
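One way to make this comparison concrete is a supervised mixture model. The sketch below is not any company's actual algorithm. It assumes each reference population is summarized by its allele frequency at every SNP, that SNPs are independent, and that each of a customer's two allele copies at a SNP comes from population k with probability q_k. The function names `estimate_ancestry` and `simulate_genotype`, and all frequencies, are made up for illustration. An expectation-maximization loop then searches for the proportions q that best explain the genotype.

```python
import random


def estimate_ancestry(genotype, ref_freqs, n_iter=100):
    """Estimate ancestry proportions q by expectation-maximization.

    genotype  : list of 0/1/2 counts of the alternate allele, one per SNP
    ref_freqs : ref_freqs[k][j] = alternate-allele frequency of SNP j in
                reference population k (must lie strictly between 0 and 1)
    """
    n_pops = len(ref_freqs)
    n_snps = len(genotype)
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        totals = [0.0] * n_pops
        for j, g in enumerate(genotype):
            # Probability that one allele copy at SNP j is the alternate allele.
            p_alt = sum(q[k] * ref_freqs[k][j] for k in range(n_pops))
            p_ref = 1.0 - p_alt
            for k in range(n_pops):
                # Expected number of this SNP's two copies that came from k.
                totals[k] += g * q[k] * ref_freqs[k][j] / p_alt
                totals[k] += (2 - g) * q[k] * (1.0 - ref_freqs[k][j]) / p_ref
        q = [t / (2 * n_snps) for t in totals]
    return q


def simulate_genotype(true_q, ref_freqs, rng):
    """Draw a genotype for an individual with known ancestry proportions."""
    pops = range(len(ref_freqs))
    genotype = []
    for j in range(len(ref_freqs[0])):
        copies = 0
        for _ in range(2):  # two allele copies per SNP
            k = rng.choices(pops, weights=true_q)[0]
            copies += rng.random() < ref_freqs[k][j]
        genotype.append(copies)
    return genotype


if __name__ == "__main__":
    rng = random.Random(0)
    # Two made-up reference populations with 2000 made-up SNP frequencies.
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(2000)] for _ in range(2)]
    person = simulate_genotype([0.7, 0.3], freqs, rng)
    print(estimate_ancestry(person, freqs))
```

With a simulated 70/30 individual, the estimate should land close to those proportions, with some sampling noise. Because the answer depends entirely on `ref_freqs`, a different reference panel gives a different estimate, which is one reason companies can report different percentages from the same raw data.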

Who the companies say you are depends in large part on those reference populations, Bolnick says. For instance, you may carry a pattern of SNPs found in people in both southern France and in Italy. If, by chance, the French people a company sampled had that SNP pattern but none of the Italians in the company’s database did, “they may infer that you have French ancestors and not Italian because of who they do and do not have in their database,” Bolnick explains.

Drilling down to tell customers which country or which part of a country their ancestors called home requires sampling many people in those countries, together with more sophisticated math to detect slight differences in the patterns. By looking at more than SNP patterns, Living DNA provides ethnicity estimates down to subregions of the United Kingdom and Ireland. The company analyzes how different stretches of DNA are connected to each other, says David Nicholson, the company’s cofounder and managing director.

It’s a bit like regional differences in the way people in southwest England assemble scones, cream and jam for cream teas. “In Devon you have a scone, cream and then you have jam,” Nicholson says. “In Cornwall you have a scone, jam, cream, so you have them in a different order. Most DNA tests just tell you that you have a scone, jam and cream so you’re from the U.K.” But because his company looks at the order of the DNA ingredients, Nicholson claims his results can tell customers what part of the British Isles was their ancestral home.

REFERENCE CHECK Testing companies estimate ethnicity by comparing customers’ DNA with the DNA of people in reference populations around the world. But companies have different reference populations and divide the world differently, as seen in this comparison of reference population maps from AncestryDNA, MyHeritage and 23andMe. From left: AncestryDNA, MyHeritage, 23andMe

#### Dividing lines

In reality, what the companies can say with certainty is that you share common DNA patterns with people living in those places today. But your ancestors may not always have lived where their descendants do now, Bolnick says. People move around, which muddies the waters.

For many Americans, some branches of their families may be recent immigrants, while other branches may have deep roots in American soil. Two branches of my family came to Massachusetts and Maryland from England in the 1600s. One branch moved from Germany to Nebraska in the late 1800s, and my Hungarian great-grandparents arrived in 1905.

Most Americans who get tested want to know about family from before the big move to the United States, says human geneticist Joe Pickrell, chief executive of DNA testing company Gencove. But the answer isn’t simple. DNA is a record of thousands of ancestors stretching back deep in time, each from a slightly different place. How companies sort out time and place may produce different ancestry estimates, Pickrell says.

Take a stretch of DNA containing a particular SNP pattern. “Today it may be found in you in the United States and in relatives in England and Germany, but it could be that 500 years ago your shared ancestor lived in Italy,” Bolnick explains. Going further back in time, that stretch of DNA may look like it came from Romania, Mongolia and Siberia. “As people move and the genes that they have move with them, it’s going to change what those geographic ancestries look like,” she says.

Given the timing of my family’s migrations, I would have expected a much bigger percentage of my ethnicity to come from the newer immigrants. I thought my British ancestry would have been diluted after hundreds of years in America, but I guess not.

Further complicating matters, most people think of their ancestry as coming from particular countries, but genetics cuts across and transcends national borders, Bolnick says. In reality, those categories are not genetic, they’re sociopolitical and historic.

Smith, in South Florida, agrees: “From a DNA perspective, it’s hard to tell a French person from a German person.”

#### Missing groups

And some groups, including aboriginal populations in Australia and big parts of Africa and Asia, are mostly absent from companies’ databases. The same goes for Native Americans, whose samples in public databases are small, and in some cases, were collected by questionable means, says Krystal Tsosie, a geneticist at Vanderbilt University in Nashville.

She’s talking about “vampire projects,” in which geneticists swooped in to draw blood from native people, then disappeared. Some scientists have misused DNA samples taken from members of several indigenous nations, conducting studies the DNA donors didn’t consent to and doing studies that contradicted the groups’ cultural and religious beliefs.

In 2002, the Navajo (Diné) Nation — Tsosie’s tribe — declared a moratorium on genetic research. Recently, tribal members have discussed lifting the moratorium, but for now it remains in place, Tsosie says. “We’ve been, for so long, used as research subjects and not really equitable partners in research,” she says. “We’re still waiting for the conversation to change to allow us to have our interests protected.”

As a result of this mistrust of genetic research, there are not enough people from the 566 federally recognized tribes in the genetic databases to enable customers to learn about their tribal heritage from DNA tests. And even if a DNA test could establish that a person carries DNA inherited from a Native American ancestor, that doesn’t make that person a member of the tribe, Tsosie says. Tribal memberships are based on family and community ties, not DNA.

As a volunteer for the Native American Indian Association of Tennessee, Tsosie gets a lot of questions. People get Native American results and want to know if they can share in gaming profits. “It’s not enough to just call yourself a Native American,” she says. “I tell them, you have to go through the genealogy” and document your ancestry. “Typically, the response is, ‘Oh, that sounds like too much work.’ ”

That response baffles her. “If knowing this Native American past — this part of you — is so important, then undergoing the legwork and documentation should be important,” she says. Equally puzzling is why people base their identities on randomly inherited SNP patterns, she says. “Our character, who we are, who we come from is a complex story of a variety of nonbiological factors. To reduce that to a test kit is actually going to ignore the beauty and complexity that is us.”

### Mix and unmatch

When genetic testing customers discover that they don’t share DNA with people they thought were their cousins, assumptions can get dark quickly. Are there secrets in the family tree? Not necessarily.

DNA recombination — a reshuffling of bits of the parents’ chromosomes in the cells that give rise to eggs and sperm — creates new genetic combinations, half of which each parent passes to a child. Siblings will share about 50 percent of their DNA. The recombination means children don’t inherit the exact same mix from their parents (unless the kids are identical twins).

That mixing may lead to distant cousins inheriting completely different genetic legacies from their ancestors. The more distant the connection, the more likely relatives are to have no DNA in common. About 10 percent of third cousins (who share the same great-great-grandparents) and 45 percent of fourth cousins (descendants of the same great-great-great-grandparents) have no DNA in common, says Drew Smith, a genealogical librarian at the University of South Florida in Tampa.

“Don’t get upset if you’ve got a documented third cousin and you don’t share any DNA. It happens,” he says. “On the other hand, if you’ve got a second cousin and you don’t share DNA, there’s a problem.”
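The averages behind these relationships follow from halving at each generation. The sketch below computes the expected fraction of autosomal DNA shared by full n-th cousins. It gives averages only and does not reproduce the 10 and 45 percent no-sharing figures Smith cites, which depend on how recombination breaks chromosomes into segments. The function name is hypothetical.

```python
def expected_shared_fraction(cousin_degree):
    """Average fraction of autosomal DNA shared by full n-th cousins.

    Relatedness sums (1/2) ** L over each chain of L parent-child links
    connecting the two people. Full n-th cousins are connected through two
    shared ancestors, each by a chain of 2 * (n + 1) links.
    """
    links = 2 * (cousin_degree + 1)
    return 2 * 0.5 ** links


for degree in range(1, 5):
    print(f"cousin degree {degree}: {expected_shared_fraction(degree):.3%}")
```

Even when the average is well above zero, a particular pair of distant cousins can end up sharing nothing, because DNA is inherited in a limited number of large segments rather than as an even blend.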

RELATIONSHIPS This tree shows how a set of chromosomes from one couple is recombined and passed down to their descendants. Here, Bob would share some DNA (dark blue strip) with a male third cousin but not with a female third cousin. C. Chang

#### Making connections

Some ads for testing companies reinforce the link between DNA and identity. An AncestryDNA ad features Kyle Merker, a real person, who says that he grew up thinking he was of German descent. He even danced in German folk groups and wore lederhosen. Merker’s DNA suggests he’s not German at all, but predominantly Scottish and Irish. He’s swapped his lederhosen for a kilt.

The commercial makes it sound like Merker changed his entire culture because of a DNA test. Dig deeper, though, and you’ll find that he researched his family through newspaper articles and government records. These traditional genealogical resources really told Merker the story of his family, Smith says.

“DNA by itself is rarely of any value,” Smith says. “If you’re really interested in researching your family, there’s much more work to be done.” He likens it to ads from Home Depot or Lowe’s: “They make it look like, ‘Oh my gosh, redoing a room is easy.’ ”

Similarly, to really confirm heritage, people have to follow paper trails composed of birth and death certificates, military forms, immigration records, census rolls, church baptism and marriage records, and more. “DNA is just one more type of record,” Smith says. “You’ve got to pull it all together to build your case.”

Michael Douglas found his Irish roots, but it took more than DNA to untangle his heritage. Douglas learned from a McCarthy lineage group on Family Tree DNA that his Y chromosome suggests he’s a descendant of Donal Gott McCarthy, a 13th century Irish king. “Oh, my god, I’m royalty!” he says. The group helped him trace the McCarthy lineage from the 1200s to 1830s Cork County, Ireland.

FOLLOW THE TRAIL Through consumer DNA testing and other research, Michael Douglas found out that his Y chromosome connects him to medieval Irish royalty. Matthew Rakola

AncestryDNA’s and MyHeritage’s DNA and genealogical records allowed Douglas and four people he calls his “ancestry angels” to connect him with his biological family. The angels were four strangers who friended Douglas on Facebook and helped him with his family research, using genetic connections Douglas had rejected because they didn’t have the McCarthy last name. The helpers disappeared once he tracked down his mother.

Not all endings are happy. Smith has seen DNA testing split families. “You may discover things that are surprising or disturbing,” he says. You could find out that your father isn’t your father. Or matching to other relatives could uncover family secrets, such as an aunt who never told her family that she gave up a child for adoption or an uncle who knowingly or unknowingly fathered a child.

“It’s fun to learn more about our ancestors and what our ethnicity is,” Smith says. But, he warns, keep in mind that what you learn “may upend your personal life or the personal lives of members of your family.” Don’t do it if you’re not prepared for the repercussions.

A version of this article appears in the June 23, 2018 issue of Science News.

## 23andMe Genetic Health Risk Reports: What you should know

Genetic Health Risk reports tell you about genetic variants associated with increased risk for certain health conditions. They do not diagnose cancer or any other health conditions or determine medical action.

Having a risk variant does not mean you will definitely develop a health condition. Similarly, you could still develop the condition even if you don't have a variant detected. It is possible to have other genetic risk variants not included in these reports.

Factors like lifestyle and environment can also affect whether a person develops most health conditions. Our reports cannot tell you about your overall risk for these conditions, and they cannot determine if you will or will not develop a condition.

These reports do not replace visits to a healthcare professional. Consult with a healthcare professional for help interpreting and using genetic results. Results should not be used to make medical decisions.

## Discussion

In Theobald’s response to K&W’s simulations, he showed that by extending his test to include the true model (the MAX-Poisson under a star tree with infinite branch lengths, called the “profile” model), that model would be preferred over a single tree with a standard substitution model. This shows that the evaluated phylogenetic substitution models are consistent, but it does not provide evidence about the appropriateness of the original UCA test. Moreover, the actual model selection should be thought of as a blind test: we must not rely on some privileged knowledge about the true origin of the data set to reject hypotheses beforehand. Since we never know the true generating model of real data sets – which is especially true in phylogenetics – we must accept that all models we work with are misspecified [27].

On the other hand, if the inference for or against UCA depends on details of the phylogenetic model, then the test will only be useful when we know the true phylogenetic model. We do not expect a useful model to be very sensitive to model violations, especially when these violations can be assumed to affect both hypotheses. We expect the test to favour the correct hypothesis for any model close enough to what might be the true generating one.

For example, if our conclusion for UCA or IO changes depending on whether or not we allow rate heterogeneity, whether or not we include a given replacement matrix, or some other mild model misspecification, then it becomes hard to defend our conclusion, and we should not trust this model selection. Our expectation is that a good enough model will affect both hypotheses alike.

We are not against extending the UCA test framework to include more models, which might help distinguish an IO data set from a UCA one. After all, the test output will give the odds ratio given a set of assumptions – for instance rate heterogeneity, common branch lengths along the alignment, a common topology for all sites, etc. And we can always improve on the assumptions. Furthermore, if we can devise an evolutionary model whereby independent sequences can mislead BLAST searches and alignment procedures, we would certainly like to see it implemented in such a model selection framework. But we should credit it as a contribution to a better model selection test, particularly if such a model could have systematically misled the original one. Systematically misleading simulations are a valid criticism of a particular model selection scheme, and they deserve credit.

We should not dismiss a model based solely on our subjective impressions about commonplace data sets, either: novel methodologies are created precisely to discover patterns that were hidden or unexplained so far. Therefore biological realism or representativeness may not be good judges of a model’s relevance. In exploratory analysis we employ several shortcuts, like skipping similar models or disregarding those based on assumptions known to be very unlikely. But when the aim is to objectively assign probabilities to the hypotheses, we should consider and embrace models capable of refuting them.

A more serious problem may arise when model misspecification happens under only one of the hypotheses (due to software limitations, for instance). One such case is amino acid replacement model heterogeneity between the independently evolved data sets, which can affect the test: while under UCA all branches are forced to follow the same replacement matrix, gamma parameter and equilibrium frequencies, under IO the independently evolved groups are allowed to have their own. We recognize that this is an implementation problem and not a theoretical one – programs usually make this homogeneity assumption to avoid overparameterization. Nonetheless, we should be careful whenever the test favours IO, since this might simply reflect a better parameterization – one set of parameters for each subtree. Whenever the test favours IO, we should always try to isolate the effect of the IO assumption from the confounding effect of amino acid replacement heterogeneity in one of two ways.

One is by extending the software to replace the fixed parameter by a variable one. That is, to allow the implemented model to have a variable replacement matrix along the tree, or a heterogeneous equilibrium frequency vector across branches, etc., so that the UCA tree can access the same parameter space as the IO trees. The other is to assume homogeneity under the IO hypothesis by using the same parameters over all independently evolved groups, such that any model misspecification can be “marginalized”. If some apparent support for the IO hypothesis disappears once we force homogeneity, then we can suspect that the model misspecification was misleading the test.
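The second remedy can be illustrated with a parameter-counting sketch. It assumes the comparison uses the Akaike information criterion, AIC = 2k − 2 ln L, and that under IO the log-likelihoods and free parameters of the independent trees simply add up. The function names, log-likelihoods and parameter counts below are all invented for illustration and come from no real data set.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


def io_aic(subtrees, shared_params=0):
    """AIC under independent origins: the separate trees' log-likelihoods add.

    subtrees      : list of (log_likelihood, own_params) pairs, one per group
    shared_params : parameters constrained to be equal across all groups,
                    counted once (the homogeneous-IO remedy)
    """
    total_lnl = sum(lnl for lnl, _ in subtrees)
    total_k = shared_params + sum(k for _, k in subtrees)
    return aic(total_lnl, total_k)


# Hypothetical numbers: one tree (UCA) versus two independent trees (IO).
uca = aic(log_likelihood=-10500.0, n_params=40)
# Heterogeneous IO: each group has 19 branch lengths plus its own gamma shape.
io_free = io_aic([(-5200.0, 20), (-5280.0, 20)])
# Homogeneous IO: a single gamma shape shared by both groups, at a worse fit.
io_shared = io_aic([(-5230.0, 19), (-5300.0, 19)], shared_params=1)

print(uca, io_free, io_shared)
```

With these made-up values the heterogeneous IO model has the lowest AIC, but the homogeneous IO model does not beat UCA. That is the pattern that would suggest the apparent support for IO came from the extra parameters and not from independent origins.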

We maintain that the UCA test as originally proposed [1] is heavily biased towards UCA, but a good counterargument would be to show a replicable simulation procedure that generates bias-free alignments where the test correctly detects IO. The problem is that there are no known mechanisms (at least none that we are aware of) by which we can simulate independently evolved sequences that satisfy the quality requirements imposed in [7] – and any attempt might be met with special pleading, as we have seen. It is worth noting that another method has recently been proposed that can more directly test for ancestral convergence [12]. This method does not seem to suffer from the drawbacks of the UCA test, since it takes into account the alignment step.

Another powerful argument for the common ancestry of life is to show how distinct genes or different units of information support similar phylogenetic histories – and we can only thank Douglas Theobald for the herculean task of compiling the evidence for it in an accessible manner (http://www.talkorigins.org/faqs/comdesc/). But unfortunately the opportunity of showing this consilience of trees for the universally conserved proteins was missed: the UCA model selection framework suggested that several trees were much more likely than a single tree for all proteins [1], which prima facie goes against a universal phylogeny, in the absence of a quantification of the amount of disagreement. We are thus left only with a visual corroboration of the non-random clustering of taxa ([1] Figure 2a), which does indeed provide evidence for the common ancestry of the analysed sequences.

## Genetic Testing Pros and Cons

Genetic testing is used to identify genetic disorders, and involves a detailed study of the DNA molecule. It is an effective tool to diagnose everything, from risks of cancer to genetic abnormalities in the newborn. This BiologyWise article focuses on the many pros and cons of this advanced technique.

Genetic testing, otherwise known as gene testing or DNA testing, is among the latest techniques used to detect genetic disorders, if any, in the DNA molecules of our body. Genetic testing also includes biochemical tests for various gene products, such as enzymes and other proteins. Let us have a look at the various reasons why genetic testing is carried out.

### Why Genetic Testing is Carried Out

• The most common use of genetic testing is to screen newborn babies for possible genetic disorders.
• It is also used to screen carriers of a disorder, i.e., to detect and identify one of the two required sets of genes that establish the particular genetic disorder. The set of two genes mentioned here is the pair made up of the parents’ genes, one from the father and the other from the mother.
• This test is also used to establish the paternity of a baby. The test detects the unique DNA and helps in determining who is the father of the child.
• It is conducted during pregnancy to check the embryo and detect any kind of defects in the genes. It also gives the couple a chance to decide whether to continue with the pregnancy or go for an abortion in case there are some serious complications in the fetus.
• It is very useful in the detection of faults in the genes in case of in vitro fertilization (IVF), which is a process in which an embryo is formed by fusion of the male sperm and female egg, outside the female body. It is an artificial way of conception.
• Genetic tests are extremely useful to people who have a history of genetic disorder(s) in the family, i.e., if genetic disorders are present in the genes. Diseases like cancer can be somewhat predicted and precautions can be taken in such cases.
• Genetic tests also play a major role in identifying criminals and a key role in forensics.

### Pros of Genetic Testing

• One of the advantages of genetic testing is that it can be carried out at any given point of time in one’s life.
• In cases of prenatal genetic testing, a baby’s defects in the genes can be rectified before birth.
• If patients are made aware of their chances of being affected with some disease, they can go in for proper preventive measures to avoid or delay the setting in of the particular disease.
• Regular check ups and visits to the doctors can help determine the stages or onset of the suspected disease with the help of genetic testing which is always a better option.

### Cons of Genetic Testing

• The fact that genetic testing is not available for all the kinds of genes in the body is one of its biggest drawbacks.
• The knowledge of having some genetic disorder can trigger various reactions in the family members, which is bound to affect the person as well as his or her family.
• Genetic testing is, many times, the cause of abortion of the girl fetus.

The bottom line, that genetic testing is all about probabilities, is something which everyone has to bear in mind. Hope this article helped throw light on the concept of genetic testing. Opponents and proponents of genetic testing have their own set of beliefs, making it one of the highly debated issues of the present times.

## Understanding ancestry testing mathematically - Biology

Your DNA contains a record of your ancestors, but you aren't a carbon copy of any one of them. The particular mix of DNA you inherit is unique to you. You receive 50% of your DNA from each of your parents, who received 50% of theirs from each of their parents, and so on. In the chart below you can see how the amount of DNA you receive from a particular ancestor decreases over generations. If you go back far enough, there is a chance that you inherited no DNA from a particular ancestor. The chart below helps illustrate how different segments of DNA might have been passed down from your grandparents to make your unique DNA. Assume each letter represents a segment of DNA. Things to notice are:

• Which letters get passed down to each generation is random (the fact that the letters spell names in this example is simply to help with the illustration).
• Not all of the letters get passed down.
• Just because a child doesn't have a letter doesn't mean that an earlier ancestor didn't have that letter.
• Siblings can have different combinations of letters.
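The halving described above can be tabulated directly. This sketch assumes every ancestor in a generation is a distinct person and reports averages only; the actual amount inherited from any one distant ancestor varies and can be zero. The function name is hypothetical.

```python
def ancestor_table(max_generations):
    """Return (generation, number of ancestors, average DNA fraction each)."""
    rows = []
    for g in range(1, max_generations + 1):
        n_ancestors = 2 ** g      # 2 parents, 4 grandparents, ...
        avg_fraction = 0.5 ** g   # 50%, 25%, 12.5%, ...
        rows.append((g, n_ancestors, avg_fraction))
    return rows


for g, n, frac in ancestor_table(10):
    print(f"generation {g:2d}: {n:5d} ancestors, {frac:.4%} each on average")
```

Ten generations back there are 1,024 ancestor slots, each contributing on average less than a tenth of a percent, which is why some of those ancestors leave no detectable DNA at all.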

My son is going to have twins. I'm a fraternal twin and I have another brother and sister that are fraternal twins. My husband has on his side of the family twins (fraternal) and triplets - boys. Is our son having twins because of our background? Does it have to do with our genetics?

That is a very interesting question! And one that many people wonder about. In fact, we answered a very similar question many years ago.

Twin genetics depend on what kind of twins we are talking about. Having identical twins is not genetic. On the other hand, fraternal twins can run in families.

Genetics can definitely play a role in having fraternal twins. For example, a woman that has a sibling that is a fraternal twin is 2.5 times more likely to have twins than average!

However, for a given pregnancy, only the mother’s genetics matter. Fraternal twins happen when two eggs are simultaneously fertilized instead of just one. A father’s genes can’t make a woman release two eggs.

It sounds like fraternal twins do indeed run in your family! But, since your son is the father, his genes are on the wrong side of the family tree. So, your family history likely didn’t play a role in his wife’s twin pregnancy.

The answer would be different if you were asking about a daughter. Also, although your son’s family history of twins can’t increase his wife’s chance of having twins, he can pass those genes down to your granddaughter. With your strong family history of fraternal twins, this just might increase the chances of your granddaughter having twins!

But, your daughter-in-law is not necessarily having twins because of her genetics. Other things like environment, nutrition, age, and weight have also been linked to having twins as well. And there is always simple chance…every woman has a chance at having fraternal twins. It is just that some women have a higher or lower chance.

Huh? Help Me Understand the Genetics!

Wait a minute. One type of twins has a genetic basis and the other does not? And, only the mom’s genetics matter? How is that possible?

Don’t worry. It makes a lot of sense once we break down the biology.

The important difference between identical and fraternal twins is the number of fertilized eggs involved. Identical twins come from a single fertilized egg. Fraternal twins come from two different ones.

Identical twins happen when a single embryo splits in two soon after fertilization. This is why identical twins have identical DNA. They came from the same fertilized egg.

Since embryo splitting is a random event that happens by chance, it doesn’t run in families. Genes are not involved. The same is not true for fraternal twins.

Fraternal twins happen when two independent eggs are each fertilized by different sperm. This is why the DNA of fraternal twins is different. In fact, the DNA of fraternal twins is no more similar than the DNA of any other sibling pair.

Usually, a woman only releases a single egg at a time. Fraternal twins can only happen if a mother releases two eggs in one cycle. This is called hyperovulation.

Unlike embryo splitting, ovulation is a normal biological process that is controlled by our genes. And, different women can have different versions of these ovulation genes.

Some women have versions (called alleles) of these genes that make them more likely to hyperovulate. This means there is a higher chance that two eggs could get fertilized at once, leading to fraternal twins.

The gene versions that increase the chance of hyperovulation can be passed down from parent to child. This is why fraternal twins run in families.

However, only women ovulate. So, the mother’s genes control this and the father’s don’t.

This is why having a background of twins in the family matters only if it is on the mother’s side. And why your son’s family genetics did not play a role in his twins.

We went over a lot of this stuff in our previous answer, but your question got me thinking. Our last answer on twins was done so long ago. Has recent research discovered anything new on this fascinating topic? It has indeed… at least if you are a sheep!

Counting Sheep can Teach us about Twins

Scientists often turn to animals when they want to study a biological process. Some of the newest information we have about twin genetics comes from studying sheep.

Sheep were chosen because, like people, they typically give birth to a single lamb. However, they can sometimes have twins and triplets.

Different breeds of sheep naturally have higher or lower twin rates. These different breeds have different versions (called alleles) of some of their genes. Specific alleles can make certain breeds more likely to have twins.

We can compare the genes between these different breeds to try to find the genes controlling twinning. And, this is just what scientists did.

A thorough search for genes controlling twinning in sheep identified several interesting ones. The breeds with higher twin rates had different alleles of these genes!

Three key sheep genes identified were named BMP15, GDF9, and BMPR1B. The specific gene names are not really important. Just know that all of these genes are involved in controlling ovulation. Which makes sense!

Remember, hyperovulation increases the chance of having fraternal twins. The sheep breeds with higher than average twin rates had versions of the genes that increase ovulation.

Sheep are a great tool to help us study twin genetics. The tricky part is connecting these findings to people.

It is harder to study humans. Scientists have tried to find links between the genes identified in sheep and human twin genetics. So far they’ve found that some match up and some don’t. This, in and of itself, is interesting!

Another gene, the one that codes for follicle-stimulating hormone (FSH for short), has also been linked to twins in humans. Like the three sheep genes identified, FSH is involved in promoting ovulation, and mothers of fraternal twins often have high levels of it.

It seems that twin genetics is more complicated in humans than in sheep. More genes are likely involved. But, each new bit of information about the genes involved adds another puzzle piece to the complete genetic picture.

Maybe someday we will know all the genes that cause fraternal twins in people. But for now, you can just tell your son that his genetics likely didn’t cause his twins. Scientists are still trying to figure out which, if any, genes on his wife’s side could possibly be the culprits!

## Investigating Evolutionary Puzzles through Proof-of-Concept Modeling

Proof-of-concept models have proven to be an essential tool for investigating some of the classic and most enduring puzzles in the study of evolutionary biology, such as “why is there sex?” and “how do new species originate?” These areas of research remain highly active in part because the relevant time scales are long and the processes are intricate. They represent excellent examples of topics in which mathematical approaches allow investigators to explore the effects of biologically complex factors that are difficult or impossible to manipulate experimentally.

### Why Is There Sex?

A century after Darwin [25] published his comprehensive treatment of sexual reproduction, John Maynard Smith [26] used a simple mathematical formalization to identify a biological paradox: why is sexual reproduction ubiquitous, given that asexual organisms can reproduce at a higher rate than sexual ones by not producing males (the “2-fold cost of sex”)? Increased genetic variation resulting from sexual reproduction is widely thought to counteract this cost, but simple proof-of-concept models quickly revealed both a flaw in this verbal logic and an unexpected outcome: sex need not increase variation, and even when it does, the increased variation need not increase fitness [27]. Subsequent theoretical work has illuminated many factors that facilitate the evolution and maintenance of sex. Otto and Nuismer [28], for example, used a population genetic model to examine how antagonistic interactions between species affect the evolution of sex. Such interactions were long thought to facilitate the evolution of sex [29],[30]. Otto and Nuismer found, however, that these interactions only select for sex under particular circumstances that are probably relatively rare. Although these predictions might be difficult to test empirically, their implications are important for our conceptual understanding of the evolution of sex.
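The arithmetic behind the “2-fold cost of sex” can be illustrated with a minimal sketch. It is not Maynard Smith's own formulation, only a toy bookkeeping model under stated assumptions: every female, sexual or asexual, leaves the same number of offspring; sexual females produce a fixed fraction of sons; and asexual females produce only daughters. The function name `asexual_fraction` and all parameter names and default values are hypothetical, chosen for illustration.

```python
def asexual_fraction(generations, offspring_per_female=2.0,
                     male_fraction=0.5, initial_asexual_fraction=0.01):
    """Track the share of asexual females among all females over time.

    Assumptions (illustrative only): equal fecundity for both types,
    no other fitness differences, non-overlapping generations.
    """
    sexual = 1.0 - initial_asexual_fraction   # relative number of sexual females
    asexual = initial_asexual_fraction        # relative number of asexual females
    history = [asexual / (sexual + asexual)]
    for _ in range(generations):
        # Sexual females spend part of their reproduction on sons,
        # so only (1 - male_fraction) of their offspring are daughters.
        sexual = sexual * offspring_per_female * (1.0 - male_fraction)
        # Asexual females produce daughters only.
        asexual = asexual * offspring_per_female
        history.append(asexual / (sexual + asexual))
    return history


if __name__ == "__main__":
    for generation, share in enumerate(asexual_fraction(10)):
        print(f"generation {generation:2d}: asexual share = {share:.3f}")
```

With an even sex ratio, the odds of a female being asexual double every generation in this sketch, which is the sense in which the cost is “2-fold”. The model deliberately leaves out everything that might offset the cost, which is exactly what the theoretical work discussed above sets out to examine.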

### How Do New Species Originate?

Speciation is another research area that has benefitted from extensive proof-of-concept modeling. Even under the conditions most unfavorable to speciation (e.g., continuous contact between individuals from diverging types), one can weave plausible-sounding verbal speciation scenarios [22]. Verbal models, however, can easily underestimate the strength of biological factors that maintain species cohesion (e.g., gene flow and genetic constraints). Mathematical models have allowed scientists to explicitly outline the parameter space in which speciation can and cannot occur, highlighting many critical determinants of the speciation process that were previously unrecognized [31]. Felsenstein [32], for example, revolutionized our understanding of the difficulties of speciation with gene flow by using a proof-of-concept model to identify hitherto unconsidered genetic constraints. Speciation models in general have made it clear that the devil is in the details: there are many important biological conditions that combine to determine whether speciation is more or less likely to occur. Because speciation is exceedingly difficult to replicate experimentally, theoretical developments such as these have been particularly valuable.

### Pitfalls and Promise

Although mathematical models are potentially enlightening, they share with experimental tests the danger of possible overinterpretation. Mathematical models can clearly outline the parameter space in which an evolutionary phenomenon such as speciation or the evolution of sex can occur under certain assumptions, but is this space “big” or “little”? As with any scientific study, the impression that a model leaves can be misleading, either through faults in the presentation or improper citation in subsequent literature.

Overgeneralization from what a model actually investigates, and claims to investigate, is strikingly common in this age when time for reading is short [33], and this problem is exacerbated when the presentation is not accessible to readers with a more limited background in theoretical analysis [34]. Indeed, these problems, universal to many fields of science, introduce the greatest potential for error in the conclusions that the research community draws from evolutionary theory.

We follow this word of caution with a final positive thought: in addition to the roles of mathematical models in testing verbal logic, the ability of theory to circumvent practical obstructions of experimental tractability in order to tackle virtually any problem is a benefit that should not be underestimated. Science is a quest for knowledge, and if a problem is, at least currently, empirically intractable, it is very unsatisfactory to collectively throw up our hands and accept ignorance. Surely it is far better, in such cases, to use mathematical models to explore how evolution might have proceeded, illuminating the conditions under which certain evolutionary paths are possible.