To wrap up my series on tracing the connections between ancient Pueblo sites like Chaco Canyon and the modern Pueblos, I’d like to turn to a type of evidence I haven’t discussed much but that people often ask about: DNA evidence. This is the most direct way to tie one population to another, at least in theory, but it’s actually quite difficult to draw any specific conclusions from the work that has been done so far, and that’s not necessarily going to improve as more research is done. This is not to say that research along these lines has been worthless. It hasn’t revealed anything inconsistent with data from other sources so far, and that in itself is interesting, since it provides support for the other approaches that have been tried. Because this is such a huge and important topic, I’ve decided to break my discussion of it into two posts, one on the archaeological study of DNA in general, and another on the application of these techniques to the Southwest in particular.
There are many different types of DNA analyses that can in theory be done, but when it comes to archaeological questions, especially those involving connections between ancient sites and modern people, it is generally necessary to analyze remains excavated by archaeologists. This involves studying what is known as “ancient DNA” (or “aDNA” for short), in addition to the DNA of modern populations. As Connie Mulligan of the University of Florida noted in an article published in American Antiquity in 2006, aDNA studies have a lot of potential but also a lot of challenges. Some of the major issues involved in aDNA research are preservation of the DNA, without which any study has no chance of success, and interpretation of the results of a successful analysis of ancient material.
Because DNA, like any other organic material, decays over time, aDNA studies are more difficult and expensive than DNA studies of modern populations, and in some cases there is simply not enough DNA left in archaeological material to do any analysis at all. Preservation is a function, in part, of local environmental conditions, which in the arid Southwest tend to be favorable for preserving organic material, so this is less of a concern in this area than in many others.
Another major consideration in doing aDNA analysis is contamination. The technique that makes aDNA analysis possible is the polymerase chain reaction (PCR), which takes a small amount of DNA and runs it through repeated cycles of an enzymatic reaction, each of which copies the DNA present, until there are billions of copies that can then be analyzed. This can be enormously useful, but the reaction is very sensitive, and if any extraneous DNA is introduced the reaction is likely to copy it instead of the ancient DNA, which can totally destroy the validity of the analysis. The main concern with aDNA analyses of human remains is modern human DNA from the researchers themselves, and this has been an issue with many studies. These days the major laboratories that do aDNA analysis have elaborate procedures to ensure that modern human DNA doesn’t contaminate their samples, and these are typically spelled out in the papers resulting from this research.
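To give a rough sense of why PCR is both so powerful and so vulnerable to contamination, here is a minimal sketch in Python of the idealized arithmetic. It assumes every cycle perfectly doubles whatever DNA is present, which real reactions don't quite manage, and the function name and the molecule counts are made up purely for illustration.

```python
def copies_after(cycles, starting_copies=1, efficiency=1.0):
    """Idealized PCR yield.

    efficiency=1.0 means perfect doubling each cycle; real reactions
    fall short of this, so treat the result as an upper bound.
    """
    return starting_copies * (1 + efficiency) ** cycles


# One template molecule, 30 ideal cycles: 2**30, a little over a billion.
print(f"{copies_after(30):,.0f}")

# Hypothetical extract: 10 surviving ancient molecules, 1,000 modern ones.
ancient = copies_after(30, starting_copies=10)
modern = copies_after(30, starting_copies=1000)
print(f"modern share of the product: {modern / (ancient + modern):.1%}")
```

The point of the second calculation is that amplification preserves the starting ratio: if modern molecules outnumber ancient ones going in, they dominate the product coming out. This simple model treats both kinds of DNA as copying equally well; if degraded ancient DNA copies less efficiently, the imbalance would only get worse.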
Furthermore, as Mulligan discusses, it’s important that researchers have a clear sense of what questions they are asking and how successful aDNA analyses are likely to be in answering them. For example, DNA analysis is unlikely to be able to unambiguously identify a given set of ancient remains as belonging (or ancestral) to a specific tribal group, since genetic affiliation doesn’t correlate with cultural identity at anything close to that level of specificity. In other words, aDNA analysis can potentially identify remains as being of Native American rather than European origin, but it can’t unambiguously identify remains with any particular modern tribe. On the other hand, it is potentially possible to use aDNA studies to identify migrations and population replacement in the past, if the groups in question are sufficiently distinct genetically. Mulligan actually uses an example from the prehistoric Southwest, which I’ll discuss further in the next post, to illustrate how it can be tricky to interpret differences in genetic characteristics between populations, especially at the level of detail at which these analyses are often conducted.
These concerns aside, DNA analysis can certainly be a powerful tool for understanding the past, especially when aDNA studies can be integrated with studies of modern DNA. A great example of this is a study that was recently published in Science about the prehistory of the North American Arctic. In this paper, which is available free on the Science website, the researchers report on a combination of aDNA and modern DNA analyses that demonstrate clearly that the people of the mysterious Dorset culture that inhabited Arctic Canada and Greenland from about 800 BC to AD 1300 are not ancestral to the modern Inuit inhabiting the same area, who are instead descended from the people of the Thule culture who immigrated into Canada from northern Alaska around AD 1200. This is solid, careful research that shows what DNA studies can reveal about the human past.
Much of the aDNA research in the Americas has focused on mitochondrial DNA (mtDNA), which is contained in the mitochondria of each cell, as opposed to nuclear DNA, which is contained in the cell nucleus. There are two main reasons for this.
One is that mitochondrial DNA is passed on (generally) unchanged through the maternal line. Nuclear DNA, in contrast, is reshuffled by recombination during meiosis, the cell division that produces eggs and sperm, so each child inherits a new mixture of its parents’ DNA, and any part of the genome that has gone through this process cannot be easily traced from generation to generation. Mitochondrial DNA is passed on directly from mother to child, and the only changes are whatever mutations develop over time, which can be used to define specific haplogroups, or genetic groupings sharing certain distinctive mutations that are interpreted as indicating shared descent. Within each haplogroup, further mutations can be used to define various sub-haplogroups, which indicate closer relationship among the haplotypes (individual genetic profiles) that make them up. The Y chromosome, which is passed on directly from father to son, is largely unaffected by recombination during meiosis and can be used to trace descent in a similar fashion.
The other reason, which explains why mtDNA is more widely used for aDNA studies than Y-chromosome DNA, is a matter of preservation. Because each cell contains many mitochondria, each carrying its own copies of the mitochondrial genome, a cell holds many more copies of its mtDNA than of its nuclear DNA, so mtDNA is much more likely to survive in ancient samples. This means there is a much greater probability that PCR will find mtDNA to replicate, and the result is that the existing database of mtDNA available for statistical analysis is much larger than that for nuclear DNA, including Y-chromosome DNA. Most aDNA studies in North America, at least, have therefore used mtDNA as a primary focus for research.
Early research on both ancient and modern DNA identified four main mitochondrial haplogroups among Native American populations. These were labeled A, B, C, and D. (Haplogroups are conventionally identified by capital letters, with more specific sub-haplogroups indicated by sequences of numbers and lowercase letters following the haplogroup letter.) These haplogroups all arose from earlier East Asian haplogroups, which agrees with the traditional interpretation that Native Americans descend from Asian groups that migrated across the Bering Strait. Some modern populations in these early studies showed low levels of an additional haplogroup, X, which had previously only been documented in Europeans. There was some question at first about whether this indicated post-Contact admixture with Europeans or an additional “founding” haplogroup, but it was later found in aDNA, showing clearly that it was indeed ancient in the Americas. The implications of this finding are hard to understand, but the general consensus at this point seems to be that the American examples descend from a very ancient and otherwise unknown Central Asian offshoot of the European X haplogroup. Wherever it came from, however, it is now quite clear that X is one of the founding haplogroups in the Americas.
Much aDNA research in North America, then, has focused on identifying the haplogroups of ancient remains and comparing them to those of other populations, both ancient and modern. Much of this research has involved treating assemblages of ancient remains (either from single sites or across a whole archaeological “culture”) as samples that can be compared statistically to samples from modern tribes. I find this dubious, since the ancient samples are typically small and there’s no way to tell how representative they are of the actual underlying population (however it’s defined). The statistical procedures often used to analyze haplogroup frequencies implicitly assume that these are random samples representative of the population, but there’s no real way to know if this is true and in most cases no particular reason to think it is. In theory it’s possible that the modern samples, at least, are representative of their populations, but I suspect it’s often not the case in practice here either. For both modern and ancient samples, it’s likely that other factors, such as level of preservation and willingness to provide samples, have strongly affected the composition of the samples. These factors may or may not have skewed the representativeness of the samples; the point is that there’s no real way to tell.
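To put some numbers on the small-sample problem, here is a rough sketch in Python that computes a Wilson score interval, a standard textbook confidence interval for a proportion. The sample sizes are invented for illustration and don't come from any actual study. Note that the calculation itself assumes a random sample, which is exactly the assumption I'm questioning, so if anything it understates the real uncertainty.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n.

    Assumes the n individuals are a random sample of the population,
    which is rarely demonstrable for archaeological material.
    """
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical ancient sample: 3 of 12 individuals carry a given haplogroup.
low, high = wilson_interval(3, 12)
print(f"12 individuals:   {low:.0%} to {high:.0%}")

# The same 25% observed frequency in a hypothetical sample of 1,200.
low, high = wilson_interval(300, 1200)
print(f"1200 individuals: {low:.0%} to {high:.0%}")
```

With only a dozen individuals, the plausible range for the underlying frequency spans several tens of percentage points even under the generous assumption of random sampling, while the same observed frequency in a sample a hundred times larger is pinned down to within a few points.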
Given this sampling issue, I think the most conservative and defensible approach is to treat haplogroup distributions as nominal-level data: the most we can really say about a given haplogroup in a given sample is whether it is present or absent. That’s not very helpful, though, and it may be reasonable to take a further leap and treat the distributions as ordinal-level data: this allows us to make use of the fact that some haplogroups are much more common in a given sample than others to make some broad conclusions about haplogroup distributions on a larger scale. What isn’t justified, however, is treating the frequencies of haplogroups in a sample as interval/ratio-level data: using the actual numbers as if they are meaningfully representative of the underlying population, and plugging them into elaborate statistical formulas to compare them to other samples/populations. Not all aDNA studies do this sort of thing, but it’s common enough that I think it’s important to emphasize that it’s a problematic approach at best, and that any conclusions regarding probable relationships between populations based on this method shouldn’t be taken very seriously.
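As a concrete illustration of the distinction, here is a minimal Python sketch of what the nominal and ordinal readings of a sample look like. The haplogroup counts are invented for the example, and the function names are my own rather than anything from the literature.

```python
def nominal_view(counts):
    """Nominal reading: which haplogroups are present at all."""
    return {h for h, c in counts.items() if c > 0}


def ordinal_view(counts):
    """Ordinal reading: haplogroups ranked from most to least common.

    Returns a list of tiers; haplogroups with equal counts share a tier.
    Absent haplogroups are left out.
    """
    present = {h: c for h, c in counts.items() if c > 0}
    tiers = []
    for c in sorted(set(present.values()), reverse=True):
        tiers.append(sorted(h for h, n in present.items() if n == c))
    return tiers


# Hypothetical counts, not taken from any real study.
ancient = {"A": 1, "B": 9, "C": 2, "D": 0, "X": 0}
modern = {"A": 4, "B": 30, "C": 6, "D": 1, "X": 0}

print(nominal_view(ancient) <= nominal_view(modern))  # is every ancient haplogroup also in the modern sample?
print(ordinal_view(ancient))
print(ordinal_view(modern))
```

Both readings deliberately throw away the exact frequencies. What is left is the kind of broad statement I think the data can support: both samples are dominated by B, with C and A present at lower levels. Whether B makes up 75% or 73% of either sample is not something I would build an argument on.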
A better way to go beyond the crude data of haplogroup assignment is to sequence additional portions of the mitochondrial genome that are known to contain mutations that define sub-haplogroups within the assigned overall haplogroup. Enough research has been done at this point that quite a few sub-haplogroups are known, and when they show up in multiple samples, either ancient or modern, that provides a much firmer basis for hypothesizing meaningfully close relationships than statistical comparisons of haplogroup distributions among whole samples. Furthermore, since the mutations that define sub-haplogroups can be grouped hierarchically, it’s possible to construct trees showing how individuals who belong to the same haplogroup, within a given sample or even across samples, relate to each other. (Note that this isn’t quite the same as showing how the people were actually related, since we don’t know when the mutations that define these groups actually occurred or how the people whose remains were sampled were related to the people in whom the mutations originally occurred.) There’s a probabilistic aspect to this type of evidence, since there are multiple ways a particular set of mutations could have ended up together in the same haplotype, and determining the most likely sequence of events can require modeling and simulation. The more samples are analyzed, the larger the database of known mutations and sub-haplogroups becomes, and the more reliable the conclusions about relationships become.
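The hierarchical grouping can be sketched in a few lines of Python. This is only an illustration of the idea, built on the simplified naming convention described earlier (a capital letter followed by alternating numbers and lowercase letters). Real nomenclature has exceptions that this doesn't handle, and the function names are my own.

```python
import re


def lineage(label):
    """Expand a sub-haplogroup label into its chain of parent groups.

    Assumes the simplified convention of a capital letter followed by
    alternating numbers and lowercase letters.
    """
    parts = re.findall(r"[A-Z]+|\d+|[a-z]+", label)
    return ["".join(parts[: i + 1]) for i in range(len(parts))]


def deepest_shared_group(label_a, label_b):
    """Most specific group two labels have in common, or None."""
    shared = None
    for a, b in zip(lineage(label_a), lineage(label_b)):
        if a != b:
            break
        shared = a
    return shared


print(lineage("B2a1"))
print(deepest_shared_group("B2a1", "B2b"))
print(deepest_shared_group("B2a", "A2"))
```

Two individuals whose labels share a deep group carry the same defining mutations down to that level, which is the sort of evidence that can support a hypothesis of close relationship. As noted above, though, a shared label says nothing by itself about when those mutations arose or how the individuals were actually related.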
So that’s the basic outline of how ancient DNA analysis works and the methodological concerns that need to be kept in mind when evaluating it. In the next post, we’ll look at some of the specific studies that have applied these methods to the Southwest, and what their results can and can’t tell us about Southwestern prehistory.
Mulligan, C. (2006). Anthropological Applications of Ancient DNA: Problems and Prospects. American Antiquity, 71(2). DOI: 10.2307/40035909
Raghavan, M., DeGiorgio, M., Albrechtsen, A., Moltke, I., Skoglund, P., Korneliussen, T., Gronnow, B., Appelt, M., Gullov, H., Friesen, T., Fitzhugh, W., Malmstrom, H., Rasmussen, S., Olsen, J., Melchior, L., Fuller, B., Fahrni, S., Stafford, T., Grimes, V., Renouf, M., Cybulski, J., Lynnerup, N., Lahr, M., Britton, K., Knecht, R., Arneborg, J., Metspalu, M., Cornejo, O., Malaspinas, A., Wang, Y., Rasmussen, M., Raghavan, V., Hansen, T., Khusnutdinova, E., Pierre, T., Dneprovsky, K., Andreasen, C., Lange, H., Hayes, M., Coltrain, J., Spitsyn, V., Gotherstrom, A., Orlando, L., Kivisild, T., Villems, R., Crawford, M., Nielsen, F., Dissing, J., Heinemeier, J., Meldgaard, M., Bustamante, C., O’Rourke, D., Jakobsson, M., Gilbert, M., Nielsen, R., & Willerslev, E. (2014). The genetic prehistory of the New World Arctic. Science, 345(6200), 1255832. DOI: 10.1126/science.1255832