To wrap up my series on tracing the connections between ancient Pueblo sites like Chaco Canyon and the modern Pueblos, I’d like to discuss a type of evidence I haven’t discussed much but that people often ask about: DNA evidence. This is the most direct way to tie one population to another, at least in theory, but it’s actually quite difficult to draw any specific conclusions from the work that has been done so far, and that’s not necessarily going to improve as more research is done. Which is not to say that research along these lines has been worthless; it hasn’t revealed anything inconsistent with data from other sources so far, but that in itself is interesting and provides support for the other approaches that have been tried. Because this is such a huge and important topic, I’ve decided to break my discussion of it into two posts, one on the archaeological study of DNA in general, and another on the application of these techniques to the Southwest in particular.
There are many different types of DNA analyses that can in theory be done, but when it comes to archaeological questions, especially those involving connections between ancient sites and modern people, it is generally necessary to analyze remains excavated by archaeologists. This involves studying what is known as “ancient DNA” (or “aDNA” for short), in addition to the DNA of modern populations. As Connie Mulligan of the University of Florida noted in an article published in American Antiquity in 2006, aDNA studies have a lot of potential but also a lot of challenges. Some of the major issues involved in aDNA research are preservation of the DNA, without which any study has no chance of success, and interpretation of the results of a successful analysis of ancient material.
Because DNA, like any other organic material, decays over time, aDNA studies are more difficult and expensive than DNA studies of modern populations, and in some cases there is simply not enough DNA left in archaeological material to do any analysis at all. Preservation is a function, in part, of local environmental conditions, which in the arid Southwest tend to be favorable for preserving organic material, so this is less of a concern in this area than in many others.
Another major consideration in doing aDNA analysis is contamination. The technique that makes aDNA analysis possible is called the polymerase chain reaction (PCR), which involves taking a small amount of DNA and running it through repeated cycles of a chemical reaction that copies it, creating billions of copies which can then be analyzed. This can be enormously useful, but the reaction is very sensitive, and if any extraneous organic material is introduced, the reaction is likely to copy that material’s DNA instead of the ancient DNA, which can totally destroy the validity of the analysis. The main concern with aDNA analyses of human remains is modern human DNA from the researchers themselves, and this has been an issue with many studies. These days the major laboratories that do aDNA analysis have elaborate procedures to ensure that modern human DNA doesn’t contaminate their samples, and these are typically spelled out in the papers resulting from this research.
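To get a feel for the numbers, here is a minimal Python sketch of the amplification arithmetic. It assumes an idealized reaction in which every template molecule is copied once per cycle, which real reactions only approximate. The function names and the molecule counts are hypothetical, chosen only for illustration.

```python
def copies_after(cycles, start=1, efficiency=1.0):
    """Number of copies after a given number of PCR cycles.

    efficiency=1.0 means perfect doubling every cycle (an idealization);
    lower values model a reaction that copies only part of the templates.
    """
    return start * (1 + efficiency) ** cycles


def contaminant_share(ancient, contaminant, cycles, efficiency=1.0):
    """Fraction of the final product that descends from contaminant templates.

    Assumes (hypothetically) that both kinds of template amplify equally well.
    """
    a = copies_after(cycles, ancient, efficiency)
    c = copies_after(cycles, contaminant, efficiency)
    return c / (a + c)


# One molecule, 30 ideal cycles: 2**30 copies.
print(f"{copies_after(30):,.0f}")  # 1,073,741,824

# Made-up counts: 10 surviving ancient templates, 1000 modern ones.
print(f"{contaminant_share(10, 1000, 30):.1%}")  # 99.0%
```

Under these assumptions, 30 cycles turn a single molecule into about a billion, and amplification by itself does nothing to change the starting ratio: if modern molecules outnumber ancient ones going in, they dominate the product coming out.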
Furthermore, as Mulligan discusses, it’s important that researchers have a clear sense of what questions they are asking and how successful aDNA analyses are likely to be in answering them. For example, DNA analysis is unlikely to be able to unambiguously identify a given set of ancient remains as belonging (or ancestral) to a specific tribal group, since genetic affiliation doesn’t correlate with cultural identity at anything close to that level of specificity. In other words, aDNA analysis can potentially identify remains as being of Native American rather than European origin, but it can’t unambiguously identify remains with any particular modern tribe. On the other hand, it is potentially possible to use aDNA studies to identify migrations and population replacement in the past, if the groups in question are sufficiently distinct genetically. Mulligan actually uses an example from the prehistoric Southwest, which I’ll discuss further in the next post, to illustrate how it can be tricky to interpret differences in genetic characteristics between populations, especially at the level of detail at which these analyses are often conducted.
These concerns aside, DNA analysis can certainly be a powerful tool for understanding the past, especially when aDNA studies can be integrated with studies of modern DNA. A great example of this is a study that was recently published in Science about the prehistory of the North American Arctic. In this paper, which is available free on the Science website, the researchers report on a combination of aDNA and modern DNA analyses that demonstrate clearly that the people of the mysterious Dorset culture that inhabited Arctic Canada and Greenland from about 800 BC to AD 1300 are not ancestral to the modern Inuit inhabiting the same area, who are instead descended from the people of the Thule culture who immigrated into Canada from northern Alaska around AD 1200. This is solid, careful research that shows what DNA studies can reveal about the human past.
Much of the aDNA research in the Americas has focused on mitochondrial DNA (mtDNA), which is contained in the mitochondria of each cell, as opposed to nuclear DNA, which is contained in the cell nucleus. There are two main reasons for this.
One is that mitochondrial DNA is passed on (generally) unchanged through the maternal line. Nuclear DNA, in contrast, undergoes recombination during meiosis, the process that produces egg and sperm cells, in which the DNA a parent inherited from his or her own mother and father is shuffled together. As a result, any part of the genome that has gone through this process cannot be easily traced from generation to generation. Mitochondrial DNA is passed on directly from mother to child, and the only changes are whatever mutations develop over time, which can be used to define specific haplogroups, or genetic groupings sharing certain distinctive mutations that are interpreted as indicating shared descent. Within each haplogroup, further mutations can be used to define various sub-haplogroups, which indicate closer relationship among the haplotypes (individual genetic profiles) that make them up. The Y chromosome, which is passed on directly from father to son, is largely unaffected by recombination during meiosis and can be used to trace descent in a similar fashion.
The other reason, which explains why mtDNA is more widely used for aDNA studies than Y-chromosome DNA, is an additional difference between mtDNA and nuclear DNA. Because each cell contains many mitochondria, each with its own copies of mtDNA, each cell contains many more copies of its mtDNA than of its nuclear DNA, so mtDNA is much more likely to survive in ancient samples than nuclear DNA. This means there is a much greater probability that studies of mtDNA using PCR will find DNA to replicate, and the result is that the existing database of mtDNA available for statistical analysis is much larger than that for nuclear DNA, including Y-chromosome DNA. Most aDNA studies in North America, at least, have therefore used mtDNA as a primary focus for research.
Early research on both ancient and modern DNA identified four main mitochondrial haplogroups among Native American populations. These were labeled A, B, C, and D. (Haplogroups are conventionally identified by capital letters, with more specific sub-haplogroups indicated by sequences of numbers and lowercase letters following the haplogroup letter.) These haplogroups all arose from earlier East Asian haplogroups, which agrees with the traditional interpretation that Native Americans descend from Asian groups that migrated across the Bering Strait. Some modern populations in these early studies showed low levels of an additional haplogroup, X, which had previously only been documented in Europeans. There was some question at first about whether this indicated post-Contact admixture with Europeans or an additional “founding” haplogroup, but it was later found in aDNA, showing clearly that it was indeed ancient in the Americas. The implications of this finding are hard to understand, but the general consensus at this point seems to be that the American examples descend from a very ancient and otherwise unknown Central Asian offshoot of the European X haplogroup. Wherever it came from, however, it is now quite clear that X is one of the founding haplogroups in the Americas.
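The naming convention lends itself to a small sketch. The Python below expands a label into the chain of groups it belongs to and finds the most specific group two labels share. It assumes a simplified form of the convention (one capital letter followed by alternating numbers and single lowercase letters); the function names are hypothetical, and the labels in the example are there only to show the format, not to report anyone's results.

```python
import re


def lineage(label):
    """Expand a label such as 'B2a1' into ['B', 'B2', 'B2a', 'B2a1']."""
    match = re.fullmatch(r"([A-Z])((?:\d+|[a-z])*)", label)
    if match is None:
        raise ValueError(f"unrecognized haplogroup label: {label!r}")
    names = [match.group(1)]
    # Each number or lowercase letter adds one more level of specificity.
    for part in re.findall(r"\d+|[a-z]", match.group(2)):
        names.append(names[-1] + part)
    return names


def deepest_shared(label_a, label_b):
    """Most specific group containing both labels, or None if they share none."""
    shared = None
    for a, b in zip(lineage(label_a), lineage(label_b)):
        if a != b:
            break
        shared = a
    return shared


print(lineage("B2a1"))
print(deepest_shared("B2a1", "B2b"))
print(deepest_shared("A2", "B2"))
```

Two samples that share only the top-level letter are related at the haplogroup level; the deeper the shared label, the closer the implied relationship in the maternal line.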
Much aDNA research in North America, then, has focused on identifying the haplogroups of ancient remains and comparing them to those of other populations, both ancient and modern. Much of this research has involved treating assemblages of ancient remains (either from single sites or across a whole archaeological “culture”) as samples that can be compared statistically to samples from modern tribes. I find this dubious, since the ancient samples are typically small and there’s no way to tell how representative they are of the actual underlying population (however it’s defined). The statistical procedures often used to analyze haplogroup frequencies implicitly assume that these are random samples representative of the population, but there’s no real way to know if this is true and in most cases no particular reason to think it is. In theory it’s possible that the modern samples, at least, are representative of their populations, but I suspect it’s often not the case in practice here either. For both modern and ancient samples, it’s likely that other factors, such as level of preservation and willingness to provide samples, have strongly affected the composition of the samples. These factors may or may not have skewed the representativeness of the samples; the point is that there’s no real way to tell.
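A quick calculation shows how little a small sample pins down even in the best case. The sketch below computes a Wilson score interval for a haplogroup's frequency, which is one standard way of putting a range around a proportion; it is used here only as an illustration and is not taken from any of the studies discussed. The counts are made up, and the calculation grants the very assumption questioned above, namely that the sample is random.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    successes: number of individuals carrying the haplogroup
    n:         total number of individuals sampled
    z:         normal quantile (1.96 corresponds to about 95%)
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Hypothetical: a haplogroup found in 3 of 10 burials versus 30 of 100.
for carriers, total in [(3, 10), (30, 100)]:
    low, high = wilson_interval(carriers, total)
    print(f"{carriers}/{total}: {low:.2f} to {high:.2f}")
```

With ten individuals, an observed frequency of 30% is compatible with anything from roughly a tenth to well over half of the underlying population, and that is before any non-random sampling is taken into account.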
Given this sampling issue, I think the most conservative and defensible approach is to treat haplogroup distributions as nominal-level data: the most we can really say about a given haplogroup in a given sample is whether it is present or absent. That’s not very helpful, though, and it may be reasonable to take a further leap and treat the distributions as ordinal-level data: this allows us to make use of the fact that some haplogroups are much more common in a given sample than others to make some broad conclusions about haplogroup distributions on a larger scale. What isn’t justified, however, is treating the frequencies of haplogroups in a sample as interval/ratio-level data: using the actual numbers as if they are meaningfully representative of the underlying population, and plugging them into elaborate statistical formulas to compare them to other samples/populations. Not all aDNA studies do this sort of thing, but it’s common enough that I think it’s important to emphasize that it’s a problematic approach at best, and that any conclusions regarding probable relationships between populations based on this method shouldn’t be taken very seriously.
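The distinction between these levels of measurement can be made concrete with a short Python sketch. The counts below are invented for illustration, and the function names are hypothetical; the point is only to show what information each treatment keeps and what it throws away.

```python
def nominal(counts):
    """Presence/absence only: the set of haplogroups seen at least once."""
    return {group for group, n in counts.items() if n > 0}


def ordinal(counts):
    """Rank order only: haplogroups from most to least common.

    Ties are broken alphabetically, which is an arbitrary choice.
    """
    present = [group for group in counts if counts[group] > 0]
    return sorted(present, key=lambda group: (-counts[group], group))


# Made-up haplogroup counts for two samples.
ancient_site = {"A": 1, "B": 9, "C": 2, "D": 0}
modern_group = {"A": 4, "B": 20, "C": 3, "D": 1}

# Nominal comparison: which haplogroups appear in both?
print(sorted(nominal(ancient_site) & nominal(modern_group)))

# Ordinal comparison: is the same haplogroup the most common in both?
print(ordinal(ancient_site)[0] == ordinal(modern_group)[0])
```

Neither function ever returns the counts themselves, so nothing downstream can treat 9 out of 12 as a meaningful estimate of 75%.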
A better way to go beyond the crude data of haplogroup assignment is to sequence additional portions of the mitochondrial genome that are known to contain mutations that define sub-haplogroups within the assigned overall haplogroup. Enough research has been done at this point that quite a few sub-haplogroups are known, and when they show up in multiple samples, either ancient or modern, that provides a much firmer basis for hypothesizing meaningfully close relationships than statistical comparisons of haplogroup distributions among whole samples. Furthermore, since the mutations that define sub-haplogroups can be grouped hierarchically, it’s possible to construct trees showing how individuals in a given sample, or even across samples, that belong to the same haplogroup relate to each other. (Note that this isn’t quite the same as showing how the people were actually related, since we don’t know when the mutations that define these groups actually occurred or how the people whose remains were sampled were related to the people in whom the mutations originally occurred.) There’s a probabilistic aspect to this type of evidence, since there are multiple ways a particular set of mutations could have ended up together in the same haplotype, and determining the most likely sequence of events can require modeling and simulation. The more samples are analyzed, the larger the database of known mutations and sub-haplogroups becomes, and the more reliable the conclusions that can be drawn about relationships are.
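As a rough illustration of the tree-building idea, the Python sketch below nests samples by the mutations they share. It makes a strong simplifying assumption that real analyses cannot take for granted: each mutation arose exactly once and was never lost, so the sets of carriers are either nested or disjoint. The mutation names (`m1`, `m2`, ...) and sample names are hypothetical placeholders, and this is not the modeling or simulation procedure any particular study uses.

```python
from collections import Counter


def build_tree(haplotypes):
    """Nest samples by shared mutations.

    haplotypes: dict mapping a sample name to the set of mutations it carries.
    Returns nested dicts keyed by mutation; the samples whose haplotype ends
    at a node are listed under the special key '_samples'.
    """
    carriers = Counter(m for muts in haplotypes.values() for m in muts)
    tree = {}
    for sample, muts in sorted(haplotypes.items()):
        node = tree
        # Mutations carried by more samples sit closer to the root.
        for mutation in sorted(muts, key=lambda m: (-carriers[m], m)):
            node = node.setdefault(mutation, {})
        node.setdefault("_samples", []).append(sample)
    return tree


def show(node, depth=0):
    """Print the tree with indentation proportional to depth."""
    for key, child in node.items():
        if key == "_samples":
            print("  " * depth + "samples: " + ", ".join(child))
        else:
            print("  " * depth + key)
            show(child, depth + 1)


samples = {
    "s1": {"m1", "m2"},
    "s2": {"m1", "m2", "m3"},
    "s3": {"m1", "m4"},
    "s4": {"m1"},
}
show(build_tree(samples))
```

Here `s1` and `s2` end up on the same branch because they share `m2`, while `s3` branches off separately. If the same mutation could arise independently in two lineages, this simple nesting would group them wrongly, which is where the probabilistic methods come in.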
So that’s the basic outline of how ancient DNA analysis works and the methodological concerns that need to be kept in mind when evaluating it. In the next post, we’ll look at some of the specific studies that have applied these methods to the Southwest, and what their results can and can’t tell us about Southwestern prehistory.
Mulligan, C. (2006). Anthropological Applications of Ancient DNA: Problems and Prospects. American Antiquity, 71 (2). DOI: 10.2307/40035909
Raghavan, M., DeGiorgio, M., Albrechtsen, A., Moltke, I., Skoglund, P., Korneliussen, T., Gronnow, B., Appelt, M., Gullov, H., Friesen, T., Fitzhugh, W., Malmstrom, H., Rasmussen, S., Olsen, J., Melchior, L., Fuller, B., Fahrni, S., Stafford, T., Grimes, V., Renouf, M., Cybulski, J., Lynnerup, N., Lahr, M., Britton, K., Knecht, R., Arneborg, J., Metspalu, M., Cornejo, O., Malaspinas, A., Wang, Y., Rasmussen, M., Raghavan, V., Hansen, T., Khusnutdinova, E., Pierre, T., Dneprovsky, K., Andreasen, C., Lange, H., Hayes, M., Coltrain, J., Spitsyn, V., Gotherstrom, A., Orlando, L., Kivisild, T., Villems, R., Crawford, M., Nielsen, F., Dissing, J., Heinemeier, J., Meldgaard, M., Bustamante, C., O’Rourke, D., Jakobsson, M., Gilbert, M., Nielsen, R., & Willerslev, E. (2014). The genetic prehistory of the New World Arctic. Science, 345 (6200), 1255832. DOI: 10.1126/science.1255832
So, was Farley Mowat right about the Irish colonizing Northern Canada before the Vikings came?
I still don’t get it: was there some difference between “elite and commoners” (Lekson)? Thanks, Teo, for the perfect essay; it helps!
Eagerly looking forward to your next genetics post.
I think your summary of the nature of the evidence is generally on point, although your skepticism about “elaborate statistical formulas to compare them to other samples/populations” (presumably meaning techniques like Principal Component Analysis charts) probably overstates the extent to which this technique is problematic, particularly when you are plotting many small aDNA samples on the chart in addition to modern populations. Also, experienced PCA users can be quite sophisticated about staying alert to sample irregularities (e.g. heavily inbred and isolated populations tend to look like outliers even when they are closely related to other populations).
FWIW, generally ancient DNA as a practical matter comes from organic material inside teeth or other small bones and protected from the environment by this exterior hard material – although one still needs the right preservation conditions as well.
One point you don’t really address is that the sample size issues are very different in autosomal DNA (which is generally recoverable whenever you can get Y-DNA). The number of data points in a single autosomal DNA sample that differ between modern humans is in the hundreds of thousands. The number of mutations that distinguish an mtDNA sample is perhaps a thousand times smaller. The number of data points that go into Y-DNA haplogroup typing isn’t that different from mtDNA, even though there are far more loci in Y-DNA than mtDNA.
Due to recombination over many generations within a population, a single autosomal DNA sample tells you as much about the population from which it was drawn as scores or hundreds of mtDNA samples. Each living person is a random statistical sample of his or her ancestors’ autosomal DNA, and you don’t have to go back nearly as many generations as you would expect for the set of people who are the ancestors of anyone now living and the set of people who are ancestors of everyone now living to become identical. If you are lucky enough to have even two or three autosomal samples, rather than just one, you can get a very accurate description of the population as a whole, because population variability can be tested at hundreds of thousands of independent sites.
Also, autosomal DNA allows for identification of known genotypes that are functional and correspond to known phenotypes, rather than merely being ancestry-informative random noise, as is the case with mtDNA and Y-DNA haplogroups. For example, you can discern something about genetic indicators of gender, stature, pigmentation, immune system traits, lactase persistence, high altitude adaptations, and more from a single autosomal DNA sample.
Another thing that autosomal data provides is a somewhat less controversial clock for dating the age of a sample, or events in that sample’s ancestral population history, than the mutation rate estimates that are used to estimate when particular mtDNA and Y-DNA haplogroups arose. For example, linkage disequilibrium analysis, which looks at how clumpy groups of genes in a DNA sample are and compares that to expectations based upon random recombination at each generation, can quite accurately date when admixture between historic populations occurred. It is in principle possible to use this method to determine when the Na-Dene wave of Native Americans admixed with the Founder population. Autosomal DNA from just a handful of samples can also be used to make quite effective estimates of things like effective ancestral population sizes; this can also be done with precisely sequenced mtDNA data, but less precisely.
Because autosomal data is so rich (with hundreds of thousands of data points each), it is also effectively impossible to assess it without elaborate statistical methods to summarize the evidence. Typically, one does automated cluster analysis by computer that attributes the sample to one or more hypothetical source components and looks at the percentage of each component in the sample as compared to reference samples. In modern populations with undegraded autosomal samples, this kind of analysis is sufficient to identify, if not actually a person’s tribe, at least their tribe’s general regional, ethnic, and linguistic group affiliation. You might not be able to distinguish one Pueblo town from the one next door, but you could distinguish between, for example, a Pueblo Indian and a Ute or a Mayan.
This kind of distinction is, in general, harder to make with haplogroups of uniparental mtDNA and Y-DNA markers in New World populations than in the Old World, especially once you are south of the Arctic and sub-Arctic areas (where there have been comparatively recent waves of migration), because the New World founding population was so small and had so few distinct haplogroups. Mutations happen rapidly enough that distinctions can evolve over 15,000 years +/-, but in populations that have even one bride exchange with each other per generation, those distinctions between populations almost evaporate as they blend into each other. Thus, while Amazonians might be distinct from Seminoles in Florida, distinguishing based on uniparental haplogroups within populations in the American SW that were not fully endogamous is more difficult.
One more point on autosomal DNA. Even one sample is enough to irrefutably establish any meaningful (e.g. 2% or more) non-Native American ancestry in an individual, or in any ancestor of that individual for many generations back, and precision sequencing of most Native American mtDNA or Y-DNA would also suffice because most (but not all) Native American uniparental haplogroups are private to the New World.
This makes it quite easy to rule out lots of crackpot theories, and to validate others that seem far-fetched, with a single sample.
These are exciting times. Lots of new genetic and archaeological evidence of pre-Columbian contacts between the New World and the Old World after the initial founding event of the Americas has emerged.