1.2: The roots of comparative methods

The comparative approaches in this book stem from and bring together three main fields: population and quantitative genetics, paleontology, and phylogenetics. I will provide a very brief discussion of how these three fields motivate the models and hypotheses in this book (see Pennell and Harmon 2013 for a more comprehensive review).

The fields of population and quantitative genetics include models of how gene frequencies and trait values change through time. These models lie at the core of evolutionary biology, and relate closely to a number of approaches in comparative methods. Population genetics tends to focus on allele frequencies, while quantitative genetics focuses on traits and their heritability; however, genomics has begun to blur this distinction a bit. Both population and quantitative genetics approaches have their roots in the modern synthesis, especially the work of Fisher (1930) and Wright (1984), but both have been greatly elaborated since then (Falconer et al. 1996; see Lynch and Walsh 1998; Rice 2004). Although population and quantitative genetic approaches most commonly focus on change over one or a few generations, they have been applied to macroevolution with great benefit. For example, Lande (1976) provided quantitative genetic predictions for trait evolution over many generations using Brownian motion and Ornstein-Uhlenbeck models (see Chapter 3). Lynch (1990) later showed that these models predict long-term rates of evolution that are actually too fast; that is, variation among species is too small compared to what we know about the potential of selection and drift (or, even, drift alone!) to change traits. This is, by the way, a great example of the importance of macroevolutionary research from a deep-time perspective. Given the regular observation of strong selection in natural populations, who would have guessed that long-term patterns of divergence are actually less than we would expect, even considering only genetic drift (see also Uyeda et al. 2011)?

Paleontology has, for obvious reasons, focused on macroevolutionary models as an explanation for the distribution of species and traits in the fossil record. Almost all of the key questions that I tackle in this book are also of primary interest to paleontologists - and comparative methods has an especially close relationship to paleobiology, the quantitative mathematical side of paleontology (Valentine 1996; Benton and Harper 2013). For example, a surprising number of the macroevolutionary models and concepts in use today stem from quantitative approaches to paleobiology by Raup and colleagues in the 1970s and 1980s (e.g. Raup et al. 1973; Raup 1985). Many of the models that I will use in this book – for example, birth-death models for the formation and extinction of species – were first applied to macroevolution by paleobiologists.

Finally, comparative methods has deep roots in phylogenetics. In fact, many modern phylogenetic approaches to macroevolution can be traced to Felsenstein’s (1985) paper introducing independent contrasts. This paper was unique in three main ways. First, Felsenstein’s paper was written in a remarkably clear way, and convinced scientists from a range of disciplines of the necessity and value of placing their comparative work in a phylogenetic context. Second, the method of phylogenetic independent contrasts was computationally fast and straightforward to interpret. And finally, Felsenstein’s work suggested a way to connect the previous two topics, quantitative genetics and paleobiology, using math. I discuss independent contrasts, which continue to find new applications, in great detail later in the book. Felsenstein (1985) spawned a whole industry of quantitative approaches that apply models from population and quantitative genetics, paleobiology, and ecology to data that includes a phylogenetic tree.

More than twenty-five years ago, “The Comparative Method in Evolutionary Biology,” by Harvey and Pagel (1991) synthesized the new field of comparative methods into a single coherent framework. Even reading this book nearly 25 years later one can still feel the excitement and potential unlocked by a suite of new methods that use phylogenetic trees to understand macroevolution. But in the time since Harvey and Pagel (1991), the field of comparative methods has exploded – especially in the past decade. Much of this progress was, I think, directly inspired by Harvey and Pagel’s book, which went beyond review and advocated a model-based approach for comparative biology. My wildest hope is that my book can serve a similar purpose.

My goals in writing this book, then, are three-fold. First, to provide a general introduction to the mathematical models and statistical approaches that form the core of comparative methods; second, to give just enough detail on statistical machinery to help biologists understand how to tailor comparative methods to their particular questions of interest, and to help biologists get started in developing their own new methods; and finally, to suggest some ideas for how comparative methods might progress over the next few years.

Comparative method

In linguistics, the comparative method is a technique for studying the development of languages by performing a feature-by-feature comparison of two or more languages with common descent from a shared ancestor and then extrapolating backwards to infer the properties of that ancestor. The comparative method may be contrasted with the method of internal reconstruction, in which the internal development of a single language is inferred by the analysis of features within that language. [1] Ordinarily, both methods are used together to reconstruct prehistoric phases of languages, to fill in gaps in the historical record of a language, to discover the development of phonological, morphological and other linguistic systems, and to confirm or refute hypothesised relationships between languages.

The comparative method was developed over the 19th century. Key contributions were made by the Danish scholars Rasmus Rask and Karl Verner and the German scholar Jacob Grimm. The first linguist to offer reconstructed forms from a proto-language was August Schleicher, in his Compendium der vergleichenden Grammatik der indogermanischen Sprachen, originally published in 1861. [2] Here is Schleicher's explanation of why he offered reconstructed forms: [3]

In the present work an attempt is made to set forth the inferred Indo-European original language side by side with its really existent derived languages. Besides the advantages offered by such a plan, in setting immediately before the eyes of the student the final results of the investigation in a more concrete form, and thereby rendering easier his insight into the nature of particular Indo-European languages, there is, I think, another of no less importance gained by it, namely that it shows the baselessness of the assumption that the non-Indian Indo-European languages were derived from Old-Indian (Sanskrit).


Water scarcity is one of the most pressing issues facing agriculture today. In many countries, agriculture accounts for about 70% of total fresh water use. To meet the needs of a growing population, more food must be produced with less water [1]. Rice (Oryza sativa L.) is the primary source of food for more than half of the world's population. Rice is cultivated in highly diverse situations that range from flooded wetland to rainfed dryland [2]. Irrigated rice, which accounts for 55% of the world rice area, provides 75% of global rice production and consumes about 90% of the freshwater resources used for agriculture in Asia [3]. Water deficit is therefore a key constraint that affects rice production in different countries. Severe drought can seriously reduce rice production, leading to catastrophic crop failure [4]. There is a need to improve drought tolerance in rice to achieve sustainable rice production in water-limited areas [5]. An understanding of the underlying physiological and molecular mechanisms is necessary to improve the adaptation of rice varieties to drought-prone environments [5, 6]. Progress has been made in detecting large-effect quantitative trait loci (QTL) conferring drought tolerance in lowland and irrigated rice [5]. Still, relatively limited information is available about the genetics and molecular control of drought tolerance.

Previous studies on the genetics of drought tolerance in rice were primarily based on the analysis of mapping populations derived from parents with contrasting levels of drought tolerance [7–9]. However, the heterogeneous genetic backgrounds of tolerant and susceptible germplasm often obscure the relationship between genetic variation and drought tolerance phenotypes. A more desirable approach is to use genetic stocks with a common genetic background but contrasting levels of tolerance to drought stress. Through selection in IRRI's drought breeding program, a set of advanced backcross lines was developed by backcrossing Aday Selection (AdaySel), a traditional variety, to the popular variety IR64 [10]. IR64 is the most widely grown rice variety in tropical areas; it carries many valuable agronomic traits but is highly sensitive to drought stress [11]. Two pairs of near-isogenic lines (NILs) in the background of IR64 with contrasting drought tolerance were selected from [12]: a) IR77298-14-1-2-B family: IR77298-14-1-2-B-10 (highly drought-tolerant) vs IR77298-14-1-2-B-13 (susceptible), and b) IR77298-5-6-B family: IR77298-5-6-B-18 (moderately drought-tolerant) vs IR77298-5-6-B-11 (highly susceptible). These advanced backcross lines are considered pre-near isogenic lines because they are sister lines derived from a single family segregating for drought tolerance.

One important aspect of understanding drought tolerance is the response of root growth and development to water-deficit conditions [13]. Roots are important for maintaining crop yields and are vital when plants are grown in soils containing insufficient supplies of water or nutrients [14]; they are also one of the primary sites for stress signal perception, which initiates a cascade of gene expression responses to drought [15, 16]. Previous studies showed that plant growth largely depends on the severity of the stress: mild water deficit leads to growth inhibition of leaves and stems, whereas roots may continue to elongate [17]. Furthermore, root architecture is a key trait for dissecting the genotypic differences in rice responses to water deficit [13]. A variety of studies have been carried out on the gene expression patterns of roots in common bean [18], sunflower [19], Arabidopsis [20, 21], maize [22] and other plants under drought stress. Gene expression profiles of upland and lowland rice under drought stress have been reported [23, 24], but these studies focused on comparing gene expression profiles of genotypes at the seedling stage in a single stress condition. Currently, little is known about expression patterns in roots under different levels of water deficit in drought-tolerant and susceptible genotypes at the reproductive stage. In this study, we used the Agilent 4 × 44 K oligoarray system to conduct transcript profiling in roots of two pairs of rice NILs exhibiting large differences in their yield and in physiological and phenological traits under drought stress at the reproductive stage. Our results suggest a greater number of differentially expressed genes (DEGs) in roots of the highly tolerant NIL, IR77298-14-1-2-B-10, compared to the other NILs in response to severe drought stress. Genes related to cell growth were mostly down-regulated, while those related to ABA biosynthesis, proline metabolism, ROS-scavenging enzymes and carbohydrate metabolism were highly activated in tolerant NILs. Despite their common genetic background (97%) as backcross progeny from Aday Sel × IR64, the two pairs of NILs show distinctive differences in their gene expression profiles in response to drought stress.

Comparative phylogenetic analyses uncover the ancient roots of Indo-European folktales

Ancient population expansions and dispersals often leave enduring signatures in the cultural traditions of their descendants, as well as in their genes and languages. The international folktale record has long been regarded as a rich context in which to explore these legacies. To date, investigations in this area have been complicated by a lack of historical data and the impact of more recent waves of diffusion. In this study, we introduce new methods for tackling these problems by applying comparative phylogenetic methods and autologistic modelling to analyse the relationships between folktales, population histories and geographical distances in Indo-European-speaking societies. We find strong correlations between the distributions of a number of folktales and phylogenetic, but not spatial, associations among populations that are consistent with vertical processes of cultural inheritance. Moreover, we show that these oral traditions probably originated long before the emergence of the literary record, and find evidence that one tale (‘The Smith and the Devil’) can be traced back to the Bronze Age. On a broader level, the kinds of stories told in ancestral societies can provide important insights into their culture, furnishing new perspectives on linguistic, genetic and archaeological reconstructions of human prehistory.

1. Introduction

Recent investigations into the evolution of cultural diversity suggest that relationships among many languages [1–4], social behaviours [5–7] and material culture traditions [8–10] often reflect deep patterns of common ancestry that can be traced back hundreds or even thousands of years. In this study, we explore these relationships in a universally important and richly documented cultural domain: storytelling [11,12]. Theories concerning possible relationships between storytelling traditions and the descent histories of populations have a long pedigree, and were central to the concerns of pioneering folklorists in the nineteenth century. For example, Wilhelm Grimm argued that the traditional German tales that he and his brother Jacob had compiled were remnants of an ancient Indo-European cultural tradition that stretched from Scandinavia to South Asia: ‘The outermost lines [of common heritage in stories] … are coterminous with those of the great race which is commonly called Indo-Germanic, and the relationship draws itself in constantly narrowing circles round the settlements of the Germans … It is my belief that the German stories do not belong to the northern and southern parts of our fatherland alone but that they are the absolutely common property of the nearly related Dutch, English and Scandinavians’ [13], p. 576.

To date, however, efforts to investigate the descent histories of narrative traditions have been complicated by two main problems. Firstly, tales are not only transmitted ‘vertically’ from ancestral populations to their descendants but also spread ‘horizontally’ between contemporaneous societies as a result of trade, conquest and the dissemination of literary texts, profoundly disrupting the neat concentric patterns of common heritage envisaged by Grimm [14,15]. Secondly, given that folktales have been mainly transmitted through oral means, there is scant evidence to investigate their origins and historical distributions using conventional literary-historical methods. While Grimm believed that many folktales were likely to be thousands of years old, only a tiny minority can be traced back to before the emergence of the literary fairy tale in the sixteenth and seventeenth centuries. This has led to intense debates about the presumed antiquity of traditional tales [16], with some researchers claiming that many canonical fairy tales may actually be relatively recent literary inventions [17,18].

Here, we tackle these problems using quantitative phylogenetic methods that were initially developed in biology and have recently been employed to investigate the relationships between population histories and a number of cultural phenomena, such as languages [1,2,4], marriage practices [7], political institutions [19], material culture [8–10,20] and music [21]. Phylogenetic methods have also been applied to folklore to analyse cross-cultural distributions of international tale types/variants, and examine their relationships to spatial, genetic and linguistic patterns [22–25]. This research suggests that similarities among folktale corpora are correlated with both population histories and geographical proximity. However, no study has yet attempted to disentangle the specific legacies of common descent and regional diffusion, or to investigate how far back lineages of vertical transmission can be traced. In this paper, we address these issues directly.

2. Material and methods

2.1 Data

Data for our study were sourced from the Aarne–Thompson–Uther (ATU) Index, a catalogue of over 2000 distinct, cross-culturally stable ‘international tale types’ distributed among more than 200 societies [26]. We focused on ‘Tales of Magic’ (ATU 300–ATU 749), a category of stories featuring beings and/or objects with supernatural powers. We concentrated on magic tales as they represent the largest and most widely shared group of tales, and because they include the canonical fairy tales, which have been the main focus of debates about the origins of folktales [16]. We recorded the presence/absence of each of these tales (n=275) in 50 Indo-European-speaking populations represented in the ATU Index (electronic supplementary material, table S1). We selected these populations as both their oral traditions [15] and their phylogenetic relationships [2,3] have been more intensively studied than those of any other group of cultures.

2.2 Trees

Following previous phylogenetic comparative studies of cultural traits [7,19,27,28], we employed language trees as a model for population histories. This approach is based on the well-established correspondences between population dispersals and the diversification of linguistic lineages [27]. Language trees represent an especially suitable model for the study of folktale inheritance since the latter consists of verbally transmitted traditions.

Trees for our study were sourced from Bouckaert et al.’s [2,29] Bayesian phylogenetic analyses of Indo-European languages. First, we matched each population included in our dataset with one of Bouckaert et al.’s linguistic groups. Next, we pruned the trees to remove taxa for which there was no corresponding folktale corpus except Hittite, an ancient Anatolian population that spoke a language considered to be an outgroup of the Indo-European language family [2,3,7,30]. Hittite was retained to root the trees for the purposes of the analyses described below (electronic supplementary material, figure S1).

2.3 Testing for phylogenetic signal

To test for signatures of vertical transmission, we measured how well the distribution of each tale could be accounted for by the populations’ linguistic relationships using Fritz and Purvis’ D statistic [31]. D is a measure of phylogenetic signal that expresses the number of character changes in a binary trait on a tree scaled by two null distributions: one in which character states are randomly reshuffled among the tips of the tree, and one where the character evolves under a selectively neutral, Brownian model of evolution. A D of 0 indicates that the distribution of character states among the taxa is what would be expected for a neutral trait under a purely vertical mode of inheritance, while values approaching 1 approximate a phylogenetically random distribution. D scores lower than 0 imply greater levels of phylogenetic conservatism than would be anticipated under a Brownian model, while scores higher than 1 suggest overdispersion. The phylogenetic signal indicated by a D score can be statistically assessed by testing whether the number of character state changes required for a trait is significantly lower than would be expected by chance, based on the distribution of values returned by the random model.
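The scaling described above can be sketched in a few lines. This is a minimal illustration rather than the caper implementation: it assumes the observed sum of changes and the two sets of simulated sums (random reshuffling and Brownian evolution) have already been computed on the tree, and the function names `scaled_d` and `p_value_vs_random` are hypothetical.

```python
from statistics import mean


def scaled_d(observed_sum, random_sums, brownian_sums):
    """Scale an observed sum of character changes so that 0 matches the
    Brownian expectation and 1 matches the random-reshuffling expectation."""
    mean_random = mean(random_sums)
    mean_brownian = mean(brownian_sums)
    if mean_random == mean_brownian:
        raise ValueError("null distributions have identical means")
    return (observed_sum - mean_brownian) / (mean_random - mean_brownian)


def p_value_vs_random(observed_sum, random_sums):
    """One-sided test (an assumed convention): the fraction of random
    permutations needing as few changes as, or fewer than, the observed trait."""
    hits = sum(1 for r in random_sums if r <= observed_sum)
    return hits / len(random_sums)


# Toy numbers, invented purely for illustration.
random_sums = [20.0, 22.0, 18.0, 20.0]    # sums under random reshuffling
brownian_sums = [10.0, 9.0, 11.0, 10.0]   # sums under Brownian evolution
print(scaled_d(12.0, random_sums, brownian_sums))
print(p_value_vs_random(12.0, random_sums))
```

With this scaling, an observed sum below the Brownian mean gives a negative D (stronger clumping than Brownian), and a sum above the random mean gives a D greater than 1 (overdispersion), matching the interpretation in the text.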

D values for each tale in our sample were estimated using the phylo.d function in the caper package in R [32]. We simulated the evolution of each trait 1000 times under both null models on a majority-rules consensus tree, which was calculated from the tree sample and rooted using Hittite as an outgroup (electronic supplementary material, figure S1). Tales were coded as present or absent in each population based on the information contained in the ATU Index (electronic supplementary material, table S1). As no folktale data are available for Hittite and the program does not allow for missing data, all tales were initially coded as absent in the outgroup. The results of the analyses were then checked by re-analysing the data with states for Hittite coded as present.

2.4 Autologistic analyses

Our second set of analyses tested whether phylogenetic signatures identified in the D analyses remained robust when accounting for the populations’ spatial relationships. Since many closely related Indo-European populations are also geographical nearest neighbours (figure 1), it is possible that the apparent non-random clumping of tales on the phylogeny may be the result of regional diffusion between societies. To address this issue, we employed an approach developed by Towner et al. [33] for fitting binary cultural traits to an autologistic model built on phylogenetic and spatial neighbour graphs (electronic supplementary material, figure S2). The model predicts the probability of a trait being present or absent in any given society from the state of the trait in its surrounding spatial and phylogenetic neighbours. The influences of these local dependencies are measured by parameters for phylogenetic (λ) and spatial (θ) proximity, with a level parameter (β) employed to control for frequency-of-occurrence. The likelihood of each parameter is estimated through MCMC simulations using the Gibbs sampler [34] to generate trait states (see [33] for a detailed explanation).
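To make the role of the three parameters concrete, here is a minimal sketch of one Gibbs sweep under an autologistic model. It assumes a common textbook parameterisation in which states are coded +1 (present) and −1 (absent) and the local field is β + λ·(sum of phylogenetic neighbours' states) + θ·(sum of spatial neighbours' states); the exact form used in [33] may differ, and all names and the toy graphs are hypothetical.

```python
import math
import random


def conditional_presence_prob(i, states, phylo_nbrs, spatial_nbrs,
                              beta, lam, theta):
    """P(x_i = +1 | all other states) under a +/-1-coded autologistic model."""
    s_phy = sum(states[j] for j in phylo_nbrs[i])    # phylogenetic neighbours
    s_spa = sum(states[j] for j in spatial_nbrs[i])  # spatial neighbours
    field = beta + lam * s_phy + theta * s_spa
    return 1.0 / (1.0 + math.exp(-2.0 * field))


def gibbs_sweep(states, phylo_nbrs, spatial_nbrs, beta, lam, theta, rng):
    """Resample every society's state once, conditioning on current neighbours."""
    new_states = list(states)
    for i in range(len(new_states)):
        p = conditional_presence_prob(i, new_states, phylo_nbrs, spatial_nbrs,
                                      beta, lam, theta)
        new_states[i] = 1 if rng.random() < p else -1
    return new_states


# Toy example: four societies, two linguistic subfamilies, two spatial pairs.
phylo_nbrs = {0: [1], 1: [0], 2: [3], 3: [2]}
spatial_nbrs = {0: [2], 1: [3], 2: [0], 3: [1]}
states = [1, 1, -1, -1]
rng = random.Random(0)
print(gibbs_sweep(states, phylo_nbrs, spatial_nbrs, 0.0, 1.0, 0.0, rng))
```

In this parameterisation a positive λ pulls a society towards the state of its linguistic relatives, a negative θ pushes it away from the state of its spatial neighbours, and β shifts the overall frequency of the trait.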

Figure 1. Approximate locations of Indo-European-speaking populations in Eurasia. Points are colour-coded by linguistic subfamily: red, Germanic; pink, Balto-Slavic; orange, Romance; green, Celtic; blue, Indo-Iranian; turquoise, Hellenic; grey, Albanian; brown, Armenian. Numbers correspond to point references for populations listed in the electronic supplementary material, table S2.

For the purposes of our analyses, we constructed a phylogenetic neighbour graph based on membership of the same linguistic subfamily (i.e. Romance, Germanic, Balto-Slavic, Celtic and Indo-Iranian) and a spatial neighbour graph based on distances between point references for each society (electronic supplementary material, table S2). The graphs included all the populations for which folktale data were available except for Romani, who, as a highly dispersed ethnic group, could not be identified with a specific geographical location. We sought to match the average number of spatial neighbours as closely as possible to the average number of phylogenetic neighbours, because a large disparity in the connectedness of the networks might confound any comparison of their effects on tale distributions [33]. We determined that linking societies located within a 1000 km radius of one another produced a spatial neighbour graph with a similar average number of neighbours (15.3) as the linguistic neighbour graph (13.6). There was substantial overlap between the two neighbour graphs, with 110 pairs of societies being both spatial and linguistic neighbours. Nevertheless, there was a sufficient number of unique spatial neighbour pairs (n=265) and unique linguistic neighbour pairs (n=224) to separate the effects of the two graphs on the tale distributions. Following initial tuning of parameter priors, 25 000 Gibbs realizations were sampled from 51 000 Markov chain Monte Carlo (MCMC) generations at an interval of two, with the first 1000 generations discarded as burn-in. The analyses were performed in R using the code written by Towner et al. [33].
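The radius-based construction of the spatial neighbour graph can be sketched as follows. This is an illustration under stated assumptions, not the code of Towner et al. [33]: it assumes great-circle (haversine) distances on a spherical Earth of radius 6371 km, and the coordinates and function names are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_neighbour_pairs(points, radius_km=1000.0):
    """Return the set of unordered pairs of societies within radius_km."""
    return {
        (a, b)
        for a, b in combinations(sorted(points), 2)
        if haversine_km(*points[a], *points[b]) <= radius_km
    }


def mean_degree(n_pairs, n_nodes):
    """Average number of neighbours per node in an undirected graph."""
    return 2 * n_pairs / n_nodes


# Hypothetical point references (latitude, longitude), for illustration only.
points = {"A": (0.0, 0.0), "B": (0.0, 5.0), "C": (0.0, 20.0)}
print(spatial_neighbour_pairs(points))
print(round(mean_degree(110 + 265, 49), 1), round(mean_degree(110 + 224, 49), 1))
```

The last line is a consistency check on the figures reported above. Assuming 49 societies in each graph (the 50 sampled populations minus Romani), 110 + 265 spatial pairs and 110 + 224 linguistic pairs give average neighbour counts of about 15.3 and 13.6, in line with the reported values.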

2.5 Reconstructing ancestral states

To establish how far back shared folktales could be traced in Indo-European oral traditions, we mapped the evolutionary histories of the most phylogenetically conserved tales identified from the D and autologistic analyses using two models of discrete trait evolution implemented in Mesquite v. 3.02 [35]: (i) a Markov k-state one-parameter model (Mk1), which estimates a single instantaneous rate of change for both gains and losses, given the distribution of the focal trait, a tree and a set of branch lengths; and (ii) an asymmetrical Markov k-state two-parameter model (Mk2), which estimates separate rates for gains and losses on the tree. The most suitable model for each tale was selected on the basis of an asymmetrical likelihood ratio test. To incorporate uncertainty in Indo-European phylogenetic relationships and branch lengths, the tales were traced on every tree contained in our sample of 1000 Bayesian language phylogenies. Ancestral states were inferred for the nodes contained in a majority-rules consensus tree, which was rooted using Hittite as an outgroup. As no data on Hittite magic tales were available, trait states were coded as missing so that they did not bias the outcome of the analyses. The likelihood of any given tale having existed in a hypothetical ancestral population was calculated by estimating the average likelihood of the tale’s presence in the corresponding node across the tree sample, multiplied by the posterior probability of the node itself (i.e. its frequency in the tree sample; figure 2).
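For a binary trait, the two models differ only in whether the gain and loss rates are constrained to be equal, which the following sketch makes explicit. It uses the standard closed-form transition probabilities for a two-state continuous-time Markov chain; the model-choice step is shown as a plain chi-square likelihood ratio test with one degree of freedom, which is an assumption and may not match the test implemented in Mesquite. All function names are hypothetical.

```python
import math

CHI2_CRIT_DF1_ALPHA05 = 3.841  # 95th percentile of chi-square with 1 d.f.


def mk2_transition_probs(q01, q10, t):
    """Transition matrix P(t) for a two-state chain with gain rate q01
    (absent -> present) and loss rate q10 (present -> absent)."""
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = (q01 / total) * (1.0 - decay)
    p10 = (q10 / total) * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def mk1_transition_probs(q, t):
    """Mk1 is the special case of Mk2 with equal gain and loss rates."""
    return mk2_transition_probs(q, q, t)


def prefers_mk2(lnl_mk1, lnl_mk2, critical=CHI2_CRIT_DF1_ALPHA05):
    """Assumed stand-in test: twice the log-likelihood gain of Mk2 over Mk1
    compared against a chi-square critical value."""
    return 2.0 * (lnl_mk2 - lnl_mk1) > critical


def weighted_presence(avg_presence_prob, node_posterior):
    """Average probability of presence at a node across the tree sample,
    multiplied by the posterior probability of the node itself."""
    return avg_presence_prob * node_posterior


print(mk1_transition_probs(1.0, 1.0))
print(weighted_presence(0.8, 0.9))
```

The last function mirrors the weighting described above: a tale reconstructed as present with high probability still receives a low overall score if the node itself appears in few of the sampled trees.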

Figure 2. Reconstructing tale descent histories. Example of an ancestral state reconstruction, showing ATU 330 ‘The Smith and the Devil’ traced on a consensus tree derived from 1000 Bayesian language trees. The proportion of black shading in each internal node represents the average probability of the tale being present in the corresponding hypothetical ancestor across the tree sample. The proportion of red shading in each node represents the proportion of trees in which the corresponding hypothetical ancestor was absent. Branches are colour-coded by linguistic subfamily. The oldest ancestral node that was reconstructed, Proto-Indo-European, is labelled ‘PIE’.

An additional set of Bayesian analyses was carried out on tales inferred as being potentially present in the populations’ hypothetical last common ancestor, ‘Proto-Indo-European’. We targeted this node for further investigation for two reasons: firstly, to test the support for the deepest reconstructions suggested by the analyses described above, and secondly, to control for the higher degree of phylogenetic uncertainty toward the root of the Indo-European language tree, which can be more effectively addressed within a Bayesian framework. Instead of calculating transition rates that maximize the likelihood of a trait distribution for each individual tree and then averaging the likelihood of it being present or absent at a particular node across the tree sample, the Bayesian approach estimates a posterior probability of ancestral states that integrates uncertainty about both transition rates and phylogenetic relationships simultaneously [28,36]. The posterior probability is obtained by recording ancestral states at regular intervals during an MCMC simulation, in which the trees and transition rates used to map the trait are sampled in proportion to their probabilities. We carried out the analyses using the Multistate model implemented in the software package BayesTraits v. 2.0 [36], using the same sample of 1000 Indo-European language trees and data on tale distributions from the ATU Index [26] as our previous analyses. Two sets of analyses were performed. The first estimated the posterior probability of each tale being present in Proto-Indo-European using the ‘most recent common ancestor’ command. The second analysis tested the relative support for each tale being present or absent by ‘fossilizing’ (i.e. fixing) the node in each state, and comparing the likelihood of the two models using Bayes Factors [37]. All the analyses employed uniform priors, the range of which was determined empirically following a maximum-likelihood analysis. The MCMC chains ran for 1 000 000 iterations, every 1000th of which was sampled into the posterior distribution following a burn-in period.
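The fossil test comparison can be sketched as below. The sketch assumes that the Bayes Factor is reported on the log scale as twice the difference in log marginal likelihoods between the two fossilized models, and that it is read against the commonly cited Kass & Raftery categories (below 2 weak, 2 to 6 positive, 6 to 10 strong, above 10 very strong); the function names and the example log marginal likelihoods are hypothetical.

```python
def log_bayes_factor(log_ml_present, log_ml_absent):
    """Twice the difference in log marginal likelihoods between the model with
    the node fixed to 'present' and the model with it fixed to 'absent'."""
    return 2.0 * (log_ml_present - log_ml_absent)


def interpret_bayes_factor(log_bf):
    """Verbal category for support of 'present' over 'absent'
    (assumed thresholds on the 2 ln(BF) scale)."""
    if log_bf < 0:
        return "favours absence"
    if log_bf < 2:
        return "weak"
    if log_bf < 6:
        return "positive"
    if log_bf < 10:
        return "strong"
    return "very strong"


# Invented log marginal likelihoods, for illustration only.
bf = log_bayes_factor(-40.0, -42.0)
print(bf, interpret_bayes_factor(bf))
```

On this scale, a value between 2 and 6 falls in the "positive" band, so a Bayes Factor of that size counts as positive but not strong support.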

3. Results

D values for the 275 tales in our sample ranged from −2.06 to 3.9, with 100 tales exhibiting a higher degree of phylogenetic clumping than would be expected by chance (α=0.05) (electronic supplementary material, table S3). These results were stable whether trait states in the outgroup taxon, Hittite, were coded as present or absent.

When fitted to the autologistic model, the distributions of 81 of the 100 tales that returned a significant phylogenetic signal in the D analysis were positively associated with the populations’ linguistic affiliations (electronic supplementary material, table S4). Only 36 tales were positively associated with spatial proximity, while in 56 cases tales were found to be less likely to be shared among societies who are spatial neighbours. Overall, the autologistic analyses suggested that vertical transmission was more important than horizontal transmission in 76 tales (figure 3 and table 1).

Figure 3. Estimates for phylogenetic and spatial association in the autologistic analyses. Scatter plot of phylogenetic (λ) and spatial (θ) parameters estimated for 100 tales that returned a strong phylogenetic signal in the D analyses when fitted to the autologistic model.

Table 1. Effects of phylogenetic and spatial association on tale distributions estimated by the autologistic model. Numbers in the cells represent the number of tales affected positively, negatively or neutrally by spatial (Spa) and phylogenetic associations (Phy) among populations.

Ancestral states were inferred for the 76 most phylogenetically conserved tales identified in the D and autologistic analyses. All the tales except two could be traced back to at least one of the hypothetical common ancestors represented in figure 2 with a probability greater than 50%, 71 of which could be inferred with a high degree of confidence (greater than or equal to 70% likelihood) [28] (electronic supplementary material, table S5). Fifty tales were reconstructed as having been present in the last common ancestor of one or more major Indo-European sub-families with a likelihood of more than 50%, with 31 at 70% or higher (figure 4). Nineteen tales could be traced back to even earlier ancestral populations with a likelihood of more than 50%, including four that were inferred in the last common ancestor of all the populations included in the sample (Proto-Indo-European). However, only a small proportion of tales could be securely reconstructed in these groups, with four tales in Proto-Italic-Celtic and Proto-Italic-Celtic-Germanic, two in Proto-Western-European and no tales in Proto-Indo-European surpassing 70% likelihood.

Figure 4. Estimated contents of ancestral tale corpora. Reconstruction of ancestral Indo-European tale corpora based on analyses of the 76 most phylogenetically conserved tales. Tales contained in each box were reconstructed with a more than 50% likelihood of being present in the corresponding ancestral tale corpus, whereas tales in bold could be securely reconstructed (greater than or equal to 70%). Full results for the ancestral state reconstructions are provided in the electronic supplementary material, table S5. Asterisks denote that reconstructions in Proto-Indo-European are based on the results of Bayesian analyses (table 2).

The Bayesian ancestral state reconstructions failed to support the presence of three out of the four tales that were tentatively inferred in Proto-Indo-European (table 2). However, the analyses reconstructed one tale, ATU 330 ‘The Smith and the Devil’, in this corpus, with a posterior probability of 87%. A fossil test returned positive support for the presence of ATU 330 (Bayes Factor 3.59).

Table 2. Results of the Bayesian analyses of Proto-Indo-European tales. Posterior probabilities for the presence/absence of tales reconstructed in Proto-Indo-European were obtained from a most recent common ancestor analysis, performed in BayesTraits (v. 2) [36]. The relative support for each possibility was further assessed by a fossil test. Bayes Factor support for the presence of each tale was evaluated using the interpretive framework suggested by Kass & Raftery [37].
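The interpretive step described in the caption can be sketched in a few lines of R. This is only an illustration, not the authors' code: it assumes the reported Bayes Factor is on the 2 ln(BF) scale and uses the commonly cited Kass & Raftery cut-offs (0 to 2, 2 to 6, 6 to 10, above 10). The function name `interpret_log_bf` is hypothetical.

```r
# Hedged sketch: classify a Bayes Factor on the 2 * ln(BF) scale using
# the Kass & Raftery categories. Assumption: positive values favour the
# presence of the tale, negative values favour its absence.
interpret_log_bf <- function(log_bf) {
  strength <- cut(abs(log_bf),
                  breaks = c(0, 2, 6, 10, Inf),
                  labels = c("barely worth mentioning", "positive",
                             "strong", "very strong"),
                  right = FALSE)  # intervals are [0,2), [2,6), [6,10), [10,Inf)
  direction <- ifelse(log_bf >= 0, "presence", "absence")
  paste0(strength, " support for ", direction)
}

# Value reported in the text for ATU 330
interpret_log_bf(3.59)
```

Under these assumed cut-offs, the value of 3.59 reported for ATU 330 falls in the "positive" band, which matches the wording used in the text.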

4. Discussion

Our analyses of the distributions of Tales of Magic among Indo-European-speaking populations bear out the observations of previous researchers concerning the complex spatial and historical patterning of the international folktale record [14,15,24,25]. Nevertheless, they show that it is still possible to uncover deep signatures of common descent in the folktale traditions of related populations. The results of the D analyses suggested that a substantial number of tales (100 of 275) exhibit significant correlations with linguistic relationships that are consistent with vertical processes of cultural inheritance. The majority of these correlations (76 out of 100) remained robust even after accounting for spatial relationships among linguistically related Indo-European groups in the autologistic analyses. In fact, in most of these cases, spatial proximity appears to have had a negative effect on the tales’ distributions, suggesting that societies were more likely to reject than adopt these stories from their neighbours.

The latter finding contrasts with previous research that reports much stronger evidence for the spatial diffusion of folktales between neighbouring populations. A study by Ross et al. [24] found that similarities among European variants of the tale ‘The Kind and Unkind Girls’ (ATU 480) are strongly correlated with geographical proximity independently of linguistic relationships, but not vice versa. Another more recent study by Ross & Atkinson [25] suggests that the distributions of shared tale types among Arctic hunter-gatherer societies are predicted by both geographical and linguistic associations, with the former being more influential. However, it is important to emphasize that we only compared spatial versus phylogenetic effects for tales that had already been screened for a phylogenetic signal (in order to determine whether that signal was genuine). It is highly plausible that horizontal transmission played a much greater role in the tales whose distributions were not predicted by linguistic relationships in the D analyses, which included ATU 480 ‘The Kind and Unkind Girls’, consistent with Ross et al.’s [24] findings. This raises a more general question about why populations seem to readily adopt some tales from their neighbours, while apparently rejecting others. Theoretical studies of cultural evolution suggest that patterns of cultural diversity are often shaped by parochial transmission biases (e.g. conformism, neophobia) that inhibit the exchange of information between groups and preserve local distinctions [6,38–40]. However, relatively little work has examined the extent to which these biases target particular kinds of traits, or the circumstances under which they might be relaxed [9,41,42]. While the answers to these questions lie beyond the scope of this study, our findings regarding the differentiated phylogenetic and spatial distributions of folktales provide a rich context for further investigation into these problems.

The durability of the phylogenetic signatures returned by the D analysis and autologistic tests, highlighted by the ancestral state reconstructions, revealed the existence of shared ancestral traditions in each of the major clades of the Indo-European family (figure 4). The results of these analyses have major implications for current debates concerning the origins of Tales of Magic [16,17]. Whereas most folklorists since Grimm believe that written versions of fairy tales were originally derived from oral tradition, some literary scholars [17,18] have claimed that there is very little evidence to support the precedence of oral traditions over literary ones and argued that it is unlikely that these stories could have been transmitted intact for so many generations without the support of written texts. Our findings contradict the latter view, and suggest that a substantial number of magic tales existed in Indo-European oral traditions long before they were first written down (electronic supplementary material, table S5). For example, two of the best known fairy tales, ATU 425C ‘Beauty and the Beast’ and ATU 500 ‘The Name of the Supernatural Helper’ (‘Rumpelstiltskin’), were first written down in the seventeenth and eighteenth centuries [43]. While some researchers claim that both storylines have antecedents in Greek and Roman mythology [44,45], our reconstructions suggest that they originated significantly earlier. Both tales can be securely traced back to the emergence of the major western Indo-European subfamilies as distinct lineages between 2500 and 6000 years ago [2,3], and may have even been present in the last common ancestor of Western Indo-European languages (figure 4).

In general, the number of tales that could be inferred in ancestral tale corpora decreases as they approach the root of the tree, with a concomitant decline in the reliability of these reconstructions. Although fourteen tales were inferred as present in Proto-Western-Indo-European (more than 50% likelihood), only two had a likelihood of more than 70%. Four tales were inferred as having a greater than 50% likelihood of being present in Proto-Indo-European, none of which had a likelihood of more than 70%. While the phylogenetic signal of a tale is bound to be eroded over time by transmission errors, competition with other tales, population turnover and diffusion between groups, the reconstruction of very ancient Indo-European tale traditions is further problematized by the uncertainty associated with deeper nodes in the tree. Thus, whereas the hypothetical ancestors for Proto-Romance, Proto-Germanic, Proto-Celtic and Proto-Indo-Iranian have a posterior probability of 100% in our tree sample, the corresponding value for Proto-Western-Indo-European is 90%, falling to 77% for Proto-Indo-European. However, despite these limitations, we were able to trace the inheritance of several tales deep into Indo-European prehistory, securely reconstructing them in the tale corpora of Proto-Italo-Celtic (ATU 328, ATU 330, ATU 402 and ATU 554), Proto-Italo-Celtic-Germanic (ATU 328, ATU 330, ATU 402 and ATU 554) and Proto-Western-European (ATU 330 and ATU 554). Even more remarkably, the Bayesian analyses were able to infer the presence of one tale, ATU 330 ‘The Smith and the Devil’, in the last common ancestor of the Indo-European family, Proto-Indo-European (table 2).

In sum, the results of the ancestral state reconstructions demonstrate that phylogenetic comparative methods can yield penetrating insights into the contents of ancient tale corpora which are difficult to access using conventional literary-historical approaches. Of course, this does not diminish the value of excavating the literary record for evidence about the origins and development of oral tales. Indeed, research carried out in this vein can supply extremely useful means of cross-checking the results of comparative phylogenetic analyses. For example, research into tale types and motifs in Graeco-Roman, Germanic and Celtic mythology supports the antiquity of many of the magic tales that we reconstructed in ancestral Indo-European populations (electronic supplementary material, table S5). These data provide useful materials for further efforts to validate our findings. Ancient variants could be used to calibrate phylogenetic analyses of specific tale types [22,23] in the same way that ancient languages are used to date the origins of linguistic families [2,3]. Hypotheses concerning the descent history of a given international type (e.g. ATU 330) could then be tested against the structure of phylogenetic relationships and estimated root age inferred from different historical and cultural versions of the tale.

In some cases, it may also be possible to evaluate inferences about ancestral tale corpora in relation to other sources of information about past societies, such as historical, archaeological, linguistic and genetic data. Our findings regarding the origins of ATU 330 ‘The Smith and the Devil’ are a case in point. The basic plot of this tale—which is stable throughout the Indo-European speaking world, from India to Scandinavia—concerns a blacksmith who strikes a deal with a malevolent supernatural being (e.g. the Devil, Death, a jinn, etc.). The smith exchanges his soul for the power to weld any materials together, which he then uses to stick the villain to an immovable object (e.g. a tree) to renege on his side of the bargain [26]. The likely presence of this tale in the last common ancestor of Indo-European-speaking cultures resonates strongly with wider debates in Indo-European prehistory, since it implies the existence of metallurgy in Proto-Indo-European society. This inference is consistent with the so-called ‘Kurgan hypothesis’, which links the origins of the Indo-European language family to archaeological and genetic evidence of massive territorial expansions made by nomadic pastoralist tribes from the Pontic steppe 5000–6000 years ago [3,46–48]. The association of these peoples with a Bronze Age technological complex, as reconstructed from material culture data [49] and palaeo-linguistic inferences of PIE vocabulary (which include a putative word for metal, aios) [50], suggests a plausible context for the cultural evolution of a tale about a cunning smith who attains a superhuman level of mastery over his craft. By contrast, the presence of this story in PIE society appears to be incompatible with the alternative ‘Anatolian hypothesis’ of Indo-European origins. 
The latter proposes a much earlier and more gradual process of demic diffusion associated with the spread of agriculture from Neolithic Anatolia 8000–9000 years ago [51], prior to the invention of metallurgy. However, it should be noted that according to some variants of the model [2,52], the lineage leading to all surviving Indo-European languages may have diverged from the now extinct Anatolian languages as recently as 7000–5500 B.C.E., a range which overlaps with the earliest archaeological evidence for smelting at numerous sites in Eurasia [53]. Consequently, a Bronze Age origin for ATU 330 seems plausible under both major models of Indo-European prehistory.

On a more general level, this example highlights how the kinds of stories told in ancient populations often reflect broader features of their cultures. While the content of ATU 330 is most obviously relevant to the technological capabilities of Proto-Indo-European society, anthropologists have long speculated that folktales may preserve other kinds of information about the ancestral contexts in which they originated, such as social organization, subsistence practices and religion [14,54]. Comparative phylogenetic methods provide a powerful set of tools with which to investigate these hypotheses more scientifically. We anticipate that future studies in this area will not only shed new light on the origins of fairy tales, myths, legends and other types of traditional narrative, but also offer novel and complementary perspectives on archaeological, genetic and linguistic reconstructions of the past.

Data accessibility

The datasets supporting this article have been uploaded as part of the electronic supplementary material.

Communication Across Kinds

In a study of multiple plant species, researchers demonstrated that plants could communicate stress cues to neighbors that were in direct contact underground. Further, unstressed plants receiving stress cues were able to pass those cues to other nearby plants whose roots they touched. Using three types of grasses and the common garden pea plant, the researchers planted several of each in pots. They then subjected one of the plants in the pot to drought, while the others were watered. The two nearby plants both responded to the stress they sensed in their neighbor by closing their stomata to minimize water loss. However, within 24 hours, the neighboring plants had acclimated to the stress cues from their neighbor and to their own lack of stress, and reopened their stomata [9]. While this study did not involve mycorrhiza, the researchers did speculate about the role mycorrhizal networks play in plant communication. It would be interesting to repeat this experiment but replace the root contact with a mycorrhizal network.


The inequities in healthcare access and poor health experienced by Indigenous peoples in Canada and worldwide are well known and documented; however, relevant evidence has rarely translated into improved health. The root causes of inequitable access to healthcare are complex and can be better addressed when understood within their social, historical, and political contexts. Understandings of access to healthcare from a biomedical perspective, and the expectation that biomedical solutions alone can address barriers to access, are insufficient and will not be effective in addressing these barriers (Gracey & King, 2009; Peiris et al., 2008). Postcolonial theoretical perspectives draw attention to the context surrounding inequities in access to healthcare, providing a more effective and compelling framework for understanding ‘how health, healing, and human suffering are woven into the fabric of the socio-historical-political context’ (Browne et al., 2005, p. 19).

As nurses, we can no longer be complicit in perpetuating colonial structures and relationships that undermine Indigenous peoples' health and access to healthcare. As we have argued, a postcolonial analysis uniquely frames our understanding of healthcare as a social space and social relationship. By situating access within a social domain, the role nurses can and ought to play in addressing access inequities becomes clear. Incorporating critical self-reflection, integrating cultural safety approaches into nursing practice, and furthering the development of contextual knowledge gained through postcolonial-informed and other critical inquiry are key to addressing inequities in access to healthcare among Indigenous peoples both within and beyond our Canadian borders (Anderson et al., 2009). We, as humans and as nurses, have this capacity woven into our being: that which ‘gives rise to a realization that there is something wrong with the way things are, and that it is possible to change for the better’ (Chinn & Kramer, 2008, p. 79).

Fate mapping the embryo

By the late 1800s, the cell had been conclusively demonstrated to be the basis for anatomy and physiology. Embryologists, too, began to base their field on the cell. One of the most important programs of descriptive embryology became the tracing of cell lineages: following individual cells to see what they become. In many organisms, this fine a resolution is not possible, but one can label groups of cells to see what that area of the embryo will become. By bringing such studies together, one can construct a fate map. These diagrams “map” the larval or adult structure onto the region of the embryo from which it arose. Fate maps are the bases for experimental embryology, since they provide researchers with information on which portions of the embryo normally become which larval or adult structures. Fate maps of some embryos at the early gastrula stage are shown in Figure 1.6. Fate maps have been generated in several ways.

Figure 1.6

Fate maps of different vertebrate classes at the early gastrula stage. All views are dorsal surface views (looking “down” on the embryo at what will be its back). Despite the different appearances of these adult animals, their fate maps ...

Observing living embryos

In certain invertebrates, the embryos are transparent, have relatively few cells, and the daughter cells remain close to one another. In such cases, it is actually possible to look through the microscope and trace the descendants of a particular cell into the organs they generate. This type of study was performed about a century ago by Edwin G. Conklin. In one of these studies, he took eggs of the tunicate Styela partita, a sea squirt that resides in the waters off the coast of Massachusetts, and he patiently followed the fate of each cell in the embryo until it differentiated into a particular structure (Figure 1.7; Conklin 1905). He was helped in this endeavor by the peculiarity of the Styela egg, wherein the different cells contain different pigments. For example, the muscle-forming cells always had a yellow color. Conklin's fate map was confirmed by cell removal experiments. Removal of the B4.1 cell (which should produce all the tail musculature), for example, resulted in the absence of tail muscles (Reverberi and Minganti 1946).

Figure 1.7

Fate map of the tunicate embryo. (A) The 1-cell embryo (left), shown shortly before the first cell division, with the fate of the cytoplasmic regions indicated. The 8-cell embryo on the right shows these regions after three cell divisions. (B) A linear ...


1.2 Conklin's art and science. The plates from Conklin's remarkable 1905 paper are online. Looking at them, one can see the precision of his observations and how he constructed his fate map of the tunicate embryo.


The compound microscope. The compound microscope has been the critical tool of developmental anatomists. Mastery of microscopic techniques allows one to enter an entire world of form and pattern.

Vital dye marking

Most embryos are not so accommodating as to have cells of different colors. Nor do all embryos have as few cells as tunicates. In the early years of the twentieth century, Vogt (1929) traced the fates of different areas of amphibian eggs by applying vital dyes to the region of interest. Vital dyes will stain cells but not kill them. He mixed the dye with agar and spread the agar on a microscope slide to dry. The ends of the dyed agar would be very thin. He cut chips from these ends and placed them onto a frog embryo. After the dye stained the cells, the agar chip was removed, and cell movements within the embryo could be followed (Figure 1.8).

Figure 1.8

Vital dye staining of amphibian embryos. (A) Vogt's method for marking specific cells of the embryonic surface with vital dyes. (B–D) Dorsal surface views of stain on successively later embryos. (E) Newt embryo dissected in a medial sagittal section ...

Radioactive labeling and fluorescent dyes

A variant of the dye marking technique is to make one area of the embryo radioactive. To do this, a donor embryo is usually grown in a solution containing radioactive thymidine. This nucleoside will be incorporated into the DNA of the dividing embryo. A second embryo (the host embryo) is grown under normal conditions. The region of interest is cut out from the host embryo and replaced by a radioactive graft from the donor embryo. After some time, the host embryo is sectioned for microscopy. The cells that are radioactive will be the descendants of the cells of the graft, and can be distinguished by autoradiography. Fixed microscope slides containing the sectioned tissues are dipped into photographic emulsion. The high-energy electrons from the radioactive thymidine will reduce the silver ions in the emulsion (just as light would). The result is a cluster of dark silver grains directly above the radioactive region. In this manner, the fates of different regions of the chick embryo have been determined (Rosenquist 1966).

One of the problems with vital dyes and radioactive labels is that they become diluted at each cell division. One way around this problem was the creation of fluorescent dyes that were extremely powerful and could be injected into individual cells. Fluorescein-conjugated dextran, for example, could be injected into a single cell of an early embryo. The descendants of that cell could then be seen by examining the embryo under ultraviolet light (Figure 1.9). More recently, DiI, a powerfully fluorescent molecule that becomes incorporated into lipid membranes, has also been used to follow the fates of cells and their progeny.
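The dilution problem can be made concrete with a little arithmetic. The sketch below is an idealization, not a measurement: it assumes the label is split equally between the two daughter cells at every division and that no new label is taken up. The function name `label_per_cell` is hypothetical.

```r
# Hedged sketch: fraction of the original label left in each cell after
# n divisions, assuming an equal split at every division.
label_per_cell <- function(n_divisions, initial = 1) {
  initial / 2^n_divisions
}

# Fraction remaining after 0 to 10 divisions
label_per_cell(0:10)
```

Under this assumption each cell retains about a thousandth (1/1024) of the starting label after ten divisions, which illustrates why brighter dyes and permanent genetic markers were sought.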

Figure 1.9

Fate mapping using a fluorescent dye. (A) Specific cells of a zebrafish embryo were injected with a fluorescent dye that will not diffuse from the cells. The dye was then activated by laser in a small region (about five cells) of the late cleavage stage ...

Genetic marking

The problems with radioactive and vital dye marking include their dilution over many cell divisions and the laborious preparation of the slides. One permanent way of marking cells is to create mosaic embryos having different genetic constitutions. One of the best examples of this technique is the construction of chimeric embryos, consisting, for example, of a graft of quail cells inside a chick embryo. Chick and quail develop in a very similar manner (especially during early embryonic development), and a graft of quail cells will become integrated into a chick embryo and participate in the construction of the various organs. The substitution of quail cells for chick cells can be performed on an embryo while it is still inside the egg, and the chick that hatches will have quail cells in particular sites, depending upon where the graft was placed. The quail cells differ from the chick's in two important ways. First, the quail heterochromatin in the nucleus is concentrated around the nucleoli, making the quail nucleus easily distinguishable from chick nuclei. Second, there are cell-specific antigens that are quail-specific and can be used to find individual quail cells, even if they are in a large population of chick cells. In this way, fine-structure maps of the chick brain and skeletal system can be made (Figure 1.10; Le Douarin 1969; Le Douarin and Teillet 1973).

Figure 1.10

Genetic markers as cell lineage tracers. (A) Grafting experiment wherein the cells from a particular region of a 1-day quail embryo have been placed into a similar region of a 1-day chick embryo. (B) After several days, the quail cells can be seen by ...


Histotechniques. Most cells must be stained in order to see them; different dyes stain different types of molecules. Instructions on staining cells to observe particular structures (such as the nucleus) are given here.

Water moves into the root from the soil and then steadily moves into the root xylem, creating a column of water, which is progressively pushed upwards.

Evaporation of water molecules from the cells of a leaf creates suction, which pulls water up from the xylem cells of the roots; this process is continuous.

The loss of water in the form of vapor from the leaves (i.e. aerial parts) of the plant is known as transpiration.

Transpiration, likewise, helps in the absorption and upward movement of water and minerals dissolved in it from roots to the leaves.

Transpiration also helps regulate temperature in plants.

The transport of soluble products of photosynthesis is known as translocation, which occurs in the part of the vascular tissue known as phloem.

Along with photosynthesis products, the phloem also transports amino acids and other substances, which are ultimately delivered to roots, fruits, seeds, and to growing organs.

A comparison of automatic cell identification methods for single-cell RNA sequencing data

Background: Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming and irreproducible. The exponential growth in the number of cells and samples has prompted the adaptation and development of supervised classification methods for automatic cell identification.

Results: Here, we benchmarked 22 classification methods that automatically assign cell identities including single-cell-specific and general-purpose classifiers. The performance of the methods is evaluated using 27 publicly available single-cell RNA sequencing datasets of different sizes, technologies, species, and levels of complexity. We use 2 experimental setups to evaluate the performance of each method for within dataset predictions (intra-dataset) and across datasets (inter-dataset) based on accuracy, percentage of unclassified cells, and computation time. We further evaluate the methods' sensitivity to the input features, number of cells per population, and their performance across different annotation levels and datasets. We find that most classifiers perform well on a variety of datasets with decreased accuracy for complex datasets with overlapping classes or deep annotations. The general-purpose support vector machine classifier has overall the best performance across the different experiments.
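The evaluation metrics named above can be illustrated with a toy example. The sketch below is not the benchmark's code: the labels are made up, the function name `f1_per_class` is hypothetical, and it assumes that a cell the classifier leaves "Unlabeled" counts as a missed cell for its true population.

```r
# Hedged sketch: per-population F1-score, median F1 and percentage of
# unlabeled cells for one classifier on toy data.
f1_per_class <- function(truth, pred) {
  classes <- sort(unique(truth))
  sapply(classes, function(k) {
    tp <- sum(pred == k & truth == k)  # correctly assigned to k
    fp <- sum(pred == k & truth != k)  # wrongly assigned to k
    fn <- sum(pred != k & truth == k)  # cells of k that were missed
    if (tp == 0) return(0)
    precision <- tp / (tp + fp)
    recall <- tp / (tp + fn)
    2 * precision * recall / (precision + recall)
  })
}

# Toy annotations (hypothetical): true identities and predictions
truth <- c("B", "B", "T", "T", "T", "NK")
pred  <- c("B", "T", "T", "T", "Unlabeled", "NK")

scores <- f1_per_class(truth, pred)
median_f1 <- median(scores)
pct_unlabeled <- 100 * mean(pred == "Unlabeled")
```

Summarizing with the median across populations keeps one very large or very small population from dominating the score, which matters when cell populations differ widely in size.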

Conclusions: We present a comprehensive evaluation of automatic cell identification methods for single-cell RNA sequencing data. All the code used for the evaluation is available on GitHub. Additionally, we provide a Snakemake workflow to facilitate the benchmarking and to support the extension of new methods and new datasets.

Keywords: Benchmark; Cell identity; Classification; scRNA-seq.

Conflict of interest statement

The authors declare that they have no competing interests.


Performance comparison of supervised classifiers for cell identification using different scRNA-seq datasets. Heatmap…

Complexity of the datasets compared to the performance of the classifiers. a Boxplots…

Classification performance across the PbmcBench datasets. a Heatmap showing the median F1-scores of…

Classification performance across brain datasets. Heatmaps show the median F1-scores of the supervised…

Classification performance across pancreatic datasets. Heatmaps showing the median F1-score for each classifier…

Performance of the classifiers during the rejection experiments. a Percentage of unlabeled cells…

Computation time evaluation across different numbers of features, cells, and annotation levels. Line…

Summary of the performance of all classifiers during different experiments. For each experiment,…

Phylogenetic Tools for Comparative Biology

I just pushed an update to the plot.Qmatrix S3 method in phytools (which is also used to plot "fitMk" and the geiger::fitDiscrete "gfit" object class) to visualize the different transition rates between discrete character states in a fitted Mk model using a “cool” to “hot” color gradient.

Here's a quick demo of how it works.

For starters, I'll load phytools which I've updated from its GitHub page using devtools.

Next I can go ahead and load a tree & dataset.

Simply for convenience, I'll use the tree of Anolis lizards and ecomorphological habitat specialization, as follows.

Now we can go ahead and fit our model.

Because it'll create the most interesting visualization, I'll jump straight to the "ARD" “all-rates-different” model, even though this plotting method should work for any model of evolution.

Previously, plot.Qmatrix would've created the following graph (which it still does perfectly fine by default).

par(bg="black",fg="white")
plot(fitARD,color=TRUE,lwd=3)
title(main="Transition rates between ecomorph states under an ARD model",
    font.main=3,line=-1,col.main="white")

Just as we can when all rates are plotted using the same color, we can also turn off the plotting of transition rates that are estimated to be zero. Let's revert to a white background and do exactly that.

Q<-matrix(c(
      0,  2,0.5,0.5,0.1,0.1,
      1,  0,  2,0.5,0.5,0.1,
    0.5,  1,  0,  2,0.5,0.5,
    0.5,0.5,  1,  0,  2,0.5,
    0.1,0.5,0.5,  1,  0,  2,
    0.1,0.1,0.5,0.5,  1,  0),
    6,6,byrow=TRUE,
    dimnames=list(letters[1:6],letters[1:6]))
diag(Q)<- -rowSums(Q)
Q
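The step that sets the diagonal to minus the row sums is what turns a table of rates into a proper transition rate matrix for a continuous-time Markov model. Here's a small base R sketch of that idea using a made-up 3-state matrix; the helper name `is_valid_Q` and the rates are my own illustration, not anything from phytools.

```r
# Hypothetical 3-state example; the rates are illustrative, not estimates.
Q <- matrix(c(  0,   2, 0.5,
                1,   0,   2,
              0.5,   1,   0),
            3, 3, byrow = TRUE,
            dimnames = list(letters[1:3], letters[1:3]))
# Each diagonal element is minus the total rate of leaving that state,
# so that every row of Q sums to zero.
diag(Q) <- -rowSums(Q)

# Check the two defining properties of a rate matrix: non-negative
# off-diagonal rates and rows that sum to zero (within a tolerance).
is_valid_Q <- function(Q, tol = 1e-12) {
  off_diag <- Q[row(Q) != col(Q)]
  all(off_diag >= 0) && all(abs(rowSums(Q)) < tol)
}
is_valid_Q(Q)

# Expected waiting time in each state before any change: 1 / exit rate.
waiting_time <- -1 / diag(Q)
waiting_time
```

States with large total exit rates have short expected waiting times, which is a handy sanity check when reading a plotted Q matrix.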
