In 1946 the psychologist Stanley Smith Stevens, founder and director of Harvard’s Psycho-Acoustic Laboratory, published a short article in Science laying out a classification scheme for scales of measurement. This system, and the four scales it proposed, would go on to become extremely influential in the quantitative sciences, and it is still widely used. I learned about it in the statistics class I took in graduate school. It doesn’t seem to get much attention among archaeologists, however, which is unfortunate because the distinctions between different types of data and the scale by which they should be measured are quite important to a proper study of the archaeological record. This is particularly the case with different types of archaeological dating, which fall on different scales of measurement and are thus not directly comparable to each other. This post is an attempt to explain Stevens’s system and show where different types of archaeological dating fall within it and why it matters.
Stevens proposed four scales (today often called “levels”) of measurement: nominal, ordinal, interval, and ratio. They are distinguished by the type of empirical operation required to create them, as well as by their formal mathematical properties. As a result of these distinctions, different statistical techniques can be applied to scales at different levels.
- Nominal scales are the most basic, and result from essentially arbitrary labels being applied to either individual data points or groups of data points. Variables at this level of measurement are really just grouped into categories, and are therefore often called categorical variables. The labels can be numbers, which is how Stevens justifies including nominal scales within his classification system, but they can also be letters, words, or any other set of unique identifiers. Because they are really qualitative rather than quantitative classifications, nominal scales are not susceptible to many statistical analyses. The statistics that can be used are number of cases, mode (in the case that each class includes more than one data point), and certain other statistics such as chi-squared tests.
- Ordinal scales result from the rank-ordering of data points. The numerical labels in this case do have a real mathematical meaning, unlike with nominal variables, but it is a very rudimentary one, limited to the determination of relative order. The distances between values are not defined. The statistics available for ordinal variables are more extensive than for nominal ones, but still somewhat limited. The median can be calculated as a measure of central tendency, and interquartile range can serve as a measure of dispersion. Other statistics such as mean and standard deviation are often calculated for ordinal data, but as Stevens notes this is not really appropriate since they assume that the intervals between values are equal, which may not be the case.
- Interval scales have values with equal intervals between them, and most quantitative statistics, including mean and standard deviation, are appropriate with them. The zero point on an interval scale is arbitrary, however, and negative values can be meaningful.
- Ratio scales are similar to interval scales but have non-arbitrary zero points and cannot take negative values. They are used in contexts involving the counting of actual objects. In addition to mean and standard deviation, statistics such as coefficient of variation that depend on meaningful ratios can be used with them.
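To make the list above concrete, here is a minimal Python sketch of which summary statistics suit each level. All of the data values and variable names are hypothetical, made up purely for illustration; only the pairing of statistics with levels comes from Stevens's scheme as described above.

```python
from statistics import mean, median, mode, stdev

# Hypothetical example data, one variable per level of measurement.
artifact_classes = ["A", "B", "A", "C"]      # nominal: arbitrary labels
hardness_ranks = [3, 5, 5, 7, 9]             # ordinal: rank order only
calendar_years = [1020, 1050, 1085, 1110]    # interval: equal steps, arbitrary zero
object_counts = [12, 30, 18, 40]             # ratio: true zero, no negatives

# Nominal: counting cases and finding the mode are the meaningful operations.
nominal_mode = mode(artifact_classes)

# Ordinal: the median is meaningful; the mean assumes equal intervals.
ordinal_median = median(hardness_ranks)

# Interval: mean and standard deviation are meaningful.
interval_mean = mean(calendar_years)
interval_sd = stdev(calendar_years)

# Ratio: statistics built on ratios, such as the coefficient of variation,
# become meaningful because zero is not arbitrary.
ratio_cv = stdev(object_counts) / mean(object_counts)

print(nominal_mode, ordinal_median, interval_mean)
```

Nothing stops you from calling `mean()` on the ordinal ranks, of course. The point of the scheme is that the software will happily return a number whether or not that number means anything.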
There are numerous examples of variables that are measured at these different levels. Before getting into examples specific to archaeological dating, here are some general ones:
- Nominal: Anything that is grouped into categories with no relative order. Stevens gives the example of the numbers assigned to football players as a situation in which each category (i.e., number) applies only to a single data point, but there are also many systems in which multiple data points fall into a single category. Gender, race, color, shape, and texture are examples.
- Ordinal: Stevens notes that most of the scales used by psychologists are ordinal. Subjective scales ranking perceptions numerically but without explicitly defining the differences between values fall into this category. The Mohs scale of mineral hardness, which is based on whether one mineral can scratch another, is another example of an ordinal scale.
- Interval: Probably the best-known example of a variable measured at the interval level is temperature. Both the Fahrenheit and Celsius temperature scales have consistent intervals and arbitrary zero points, both of which differ between the two. Also an interval scale, and important for the discussion of dating below, is calendar time.
- Ratio: Any scale based on counting physical objects is at the ratio level, as is any scale that has a real zero point and no negative values, such as measurement of length or weight using any consistent system of units. Returning to the example of temperature, the Kelvin scale is at the ratio level, since zero Kelvin is equivalent to absolute zero.
With that in mind, let’s move on to archaeological dating. There are a wide variety of dating systems used in archaeology, and I’m just going to focus on a few that are particularly common, especially in the Southwest. They are organized by level of measurement; note that this is based on the way the dates are conventionally presented, which in some cases is different from the way they are initially derived.
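The temperature example lends itself to a quick check of the difference between the interval and ratio levels. The sketch below uses the standard Celsius-to-Fahrenheit and Celsius-to-Kelvin conversions; the function names and the three sample temperatures are my own choices for illustration.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius: float) -> float:
    """Convert degrees Celsius to kelvins."""
    return celsius + 273.15


low, mid, high = 10.0, 20.0, 30.0  # arbitrary sample temperatures in Celsius

# Ratios of values depend on where the arbitrary zero sits, so "twice as
# hot" in Celsius is not twice as hot in Fahrenheit.
ratio_c = mid / low                      # 2.0
ratio_f = c_to_f(mid) / c_to_f(low)      # 68 / 50, about 1.36

# Ratios of differences survive the change of scale, which is what makes
# both of these interval scales.
diff_ratio_c = (high - mid) / (mid - low)
diff_ratio_f = (c_to_f(high) - c_to_f(mid)) / (c_to_f(mid) - c_to_f(low))

# On the Kelvin scale zero is absolute zero, so this ratio is the
# physically meaningful one.
ratio_k = c_to_k(mid) / c_to_k(low)      # about 1.035

print(ratio_c, ratio_f, ratio_k, diff_ratio_c, diff_ratio_f)
```

The same logic applies to calendar time: the interval between two calendar years is meaningful, but saying that AD 2000 is "twice as late" as AD 1000 is not.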
Nominal: Since nominal variables have no order, they are generally not well suited for dating. However, the traditional use of “phases” consisting of certain combinations of artifact assemblages, architectural styles, and so forth might fit into this category, especially in areas where the phases are not easily placed in a relative order through stratigraphy or absolute dating. This has generally not been the case in the Southwest, where stratified sites are common, but until the development of other dating techniques it was widely used in areas like the Plains and the Great Basin where stratified sites are uncommon and many sites are surface artifact scatters that can only be dated by the presence of diagnostic artifacts.
Ordinal: Stratigraphy, the determination of the relative ages of components within a site based on their relative order in the ground, is an ordinal-scale dating system. It’s impossible to tell either the absolute date of a given layer or the length of time during which it was deposited in the absence of other evidence, such as material within the layer that can be dated by other means. The relative thickness of different layers may give some clue as to differences in how long it took them to be deposited, but there are so many other potential factors that could determine the thickness of layers that this is a highly unreliable method.
Another, less obvious, ordinal-scale dating system is uncalibrated radiocarbon. This is an important point, and I don’t think it is widely understood. Radiocarbon dating is based on the ratio of one isotope of carbon to another within organic material, and radiocarbon dates are expressed in years “before present” (BP), with “present” now conventionally defined as AD 1950. If all else were equal, these BP dates would be interval-scale (not ratio, because the zero point at 1950 is arbitrary). All else is not equal, however, and it turns out that one of the basic assumptions behind the radiocarbon technique, that the background level of radiocarbon (carbon-14) in the atmosphere is constant, is actually false. The amount of background radiocarbon has varied over time, so radiocarbon dates must be calibrated by calculating radiocarbon ages for materials of known age (for relatively recent periods this mostly means tree rings) and developing a calibration curve to convert straight radiocarbon ages to calendar dates. There are now several programs freely available to do these calibrations, the best known being CALIB and OxCal, and there is really no excuse these days for only reporting uncalibrated dates. Unfortunately, many archaeologists continue to do this, in some cases justifying it by the fact that so many dates already in the literature are uncalibrated. This is true, but it’s still no excuse. Calibrating those old dates is totally possible. Furthermore, and this is where the ordinal nature of the dates becomes important, while uncalibrated radiocarbon dates can be compared with each other, they cannot be directly compared with dates derived from any other dating method. This is the crucial difference between ordinal and interval scales when it comes to dating. In many places virtually the only dating method used is radiocarbon, so this doesn’t seem like such a big deal, but in places where other methods are used its importance becomes clear.
This is especially the case in the Southwest, with its long history of (interval-scale) tree-ring dating. Without calibration, a radiocarbon date cannot be compared to a tree-ring date. They aren’t measuring on the same scale.
Interval: Calendar dates (in any calendar), and dates derived from any method that are expressed as calendar dates, are interval-scale data. In an archaeological context this includes dates from historical records, tree-ring dates, archaeomagnetic dates, and calibrated radiocarbon dates. These can all be compared to each other, although they vary in precision and some “dates” end up as ranges rather than point estimates when put on this scale. This is true for archaeomagnetic and calibrated radiocarbon dates.
Ratio: Any dates expressed as “before present” with “present” meaning the actual present rather than an arbitrary date like 1950 are ratio-scale data. The main example I know of is luminescence dating (although not all luminescence dates are reported this way), though there are probably others. While this seems like a good way to express dates at first glance, it’s actually somewhat problematic because the “present” keeps on moving as time goes on. If dates are going to be expressed this way it is absolutely crucial to say what the “present” date is. Once you do that, though, it’s trivially easy to convert the date to a calendar date, and this is probably the best way to go. This would need to be done anyway to compare dates like this to those from the more numerous interval-scale methods. What is particularly annoying is the tendency for some calibrated radiocarbon dates to be cited as “BP” values without explicitly indicating if the “present” refers to 1950, the actual present, or something else. In general I think people should report dates as calendar years whenever possible. “BP” is just too confusing and tricky a concept.
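To show how simple the conversion is once the “present” is stated, here is a minimal Python sketch. The function name `bp_to_calendar` and the sample values are hypothetical. It assumes the age is already in calendar years before the stated reference year (as with a luminescence date or a calibrated radiocarbon date given as BP), so it is not a substitute for calibrating a raw radiocarbon age.

```python
def bp_to_calendar(years_bp: int, present_year: int) -> str:
    """Convert an age in calendar years before `present_year` to an AD/BC label.

    Assumes `years_bp` is already in calendar years. An uncalibrated
    radiocarbon age must be calibrated first; this function does not do that.
    """
    # Astronomical year numbering: 1 is AD 1, 0 is 1 BC, -1 is 2 BC, and so on.
    astronomical_year = present_year - years_bp
    if astronomical_year >= 1:
        return f"AD {astronomical_year}"
    # The AD/BC system has no year zero, hence the offset of one.
    return f"{1 - astronomical_year} BC"


# The same "1000 BP" lands on different calendar years depending on
# which "present" was meant.
print(bp_to_calendar(1000, 2010))  # AD 1010
print(bp_to_calendar(1000, 1950))  # AD 950
print(bp_to_calendar(3000, 1950))  # 1051 BC
```

The 60-year gap between the first two results is exactly the ambiguity that an unstated “present” introduces.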
The key thing I would like people to take away from this discussion is that data can only be compared directly if they are at the same level of measurement and using the same scale. When it comes to dates specifically, we already have this nifty interval-level scale called the calendar, and consistently using it to express dates, no matter how they were derived, is the best way to ensure that data from different researchers working in different places can be compared.
Stevens, S. S. (1946). On the Theory of Scales of Measurement. Science, 103(2684), 677-680. DOI: 10.1126/science.103.2684.677