Dentitions, Distance, and Difficulty: A Comparison of Two Statistical Techniques for Dental Morphological Data


Heather Joy Hecht Edgar*
Laboratory of Human Osteology, Maxwell Museum of Anthropology, University of New Mexico, Albuquerque, New Mexico 87131

ABSTRACT
One of the main uses of dental morphological data is to study patterns of affinities among populations. Many different approaches to this purpose are available, each one having its own strengths and weaknesses. For this study, observations were made of the morphology of 614 African American and 327 European American dentitions (n = 941). Each of these samples was divided into three groups based on the time in which they lived. Affinities among the resulting six groups were estimated based on the

The Mean Measure of Divergence (MMD) has become the standard statistical technique for assessing biological affinities when using frequencies of dental morphological characteristics (Scott and Turner, 1997). There are several advantages in using this statistic: it is appropriate for nominal data, it is relatively easy to compute, and it is comparable among researchers. There is, however, a drawback to using the MMD: it is appropriate only when the traits being studied are independent. The assumption of independence is weak for several dental characteristics, so intertrait correlations must be tested, and traits that are correlated must be removed from an MMD analysis.
An alternative to MMD is the Mahalanobis' D² statistic, which allows correlated features to be used in affinity measures (Mahalanobis, 1936). However, as originally formulated, this statistic is useful only for metric, not nominal, data. Konigsberg (1990) used a pseudo-Mahalanobis' D² to determine biological affinity using non-metric data. This statistic has the potential to allow distance measures to be based on a greater variety and number of dental characteristics than the MMD. Of course, like MMD, the D² statistic has its drawbacks. The primary problems with the application of this statistic are its limited applicability when analyzing a number of traits with little or no correlation, the need for multiple observations per individual, and its relatively more difficult computation. Because every trait must be compared to every other for each sample being studied, comparing more than a few traits at a time can become quite arduous, even with a computer. Additionally, the inclusion of a new sample for analysis requires the recalculation of all measures of affinity among groups, not simply the measures of affinity of the new sample with the original groups, as with the MMD.
This study presents the results of a comparison of MMD and pseudo-D² methods for determining biological affinity among several samples. The goals are to investigate whether the two types of analysis result in similar findings, and if not, to consider why.

MATERIAL
The data for this study come from the dentitions of 941 African Americans and European Americans, analyzed as part of a larger study of the microevolution of African American dental morphology. Samples come from collections temporarily or permanently housed at the National Museum of Natural History, National Museum of Health and Medicine, Cleveland Museum of Natural History, University of Tennessee Health Sciences Center, Ohio State University, and Arizona State University. The samples were divided into six groups, based on ancestry and time period. The sample sizes and time periods are listed in Table 1.
For this study, a maximum of 136 observations of 32 morphological characteristics was possible per dentition. Observation procedures were based on the Arizona State University dental anthropology system (Turner et al., 1991). No significant directional asymmetry of expression or sexual dimorphism was found, so rights and lefts were combined (with the greatest trait expression being represented), as were observations from males and females. Observations were then dichotomized with guidance from Haeussler et al. (1989), Irish (1993), Irish and Turner (1990), Scott and Turner (1997), and Turner (1987).
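The scoring steps described above (combining antimeres by taking the greater expression, then dichotomizing) can be sketched as follows. This is only an illustration: the function names `combine_sides` and `dichotomize`, the use of `None` for an unobservable tooth, and the cutoff value in the example are assumptions, not the breakpoints actually used in this study.

```python
from typing import Optional


def combine_sides(left: Optional[int], right: Optional[int]) -> Optional[int]:
    """Return the greater expression grade of the two sides.

    A side that could not be scored is passed as None; if neither side
    could be scored, the individual is treated as unobserved (None).
    """
    grades = [g for g in (left, right) if g is not None]
    return max(grades) if grades else None


def dichotomize(grade: Optional[int], cutoff: int) -> Optional[bool]:
    """Score a trait as present when its grade reaches the cutoff."""
    return None if grade is None else grade >= cutoff


# Hypothetical individual: grade 2 on the left, grade 4 on the right,
# with an illustrative cutoff of 3 (not a value taken from the study).
individual_grade = combine_sides(2, 4)
trait_present = dichotomize(individual_grade, cutoff=3)
```

Keeping unobserved individuals as `None` rather than scoring them absent means they drop out of the per-trait sample size instead of deflating the trait frequency.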
All statistics were performed using the SAS statistical package (SAS Institute Inc., 1990). Associations between traits were determined using the likelihood ratio statistic. The list of traits that was used for each analysis can be found in Table 2. Traits used in the MMD analysis are independent of each other. To invert the matrix of correlations, the D² analysis requires that most variables have some tetrachoric correlation with all other variables. Several variables were eliminated from the D² analyses because they were found to have little or no correlation with other variables, and thus the tetrachoric correlation matrix was singular. Different variable combinations were used in each analysis because of the requirements of each statistic: traits should be uncorrelated for the MMD and correlated for the D².
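For readers who want to see the association test concretely, the sketch below computes a likelihood ratio (G) statistic for a 2 x 2 table of two dichotomized traits. It is not the SAS procedure used in the study; the function name `likelihood_ratio_2x2` and the cell layout are assumptions made for illustration, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def likelihood_ratio_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """G statistic and p-value (1 df) for the 2 x 2 table [[a, b], [c, d]].

    Rows are presence/absence of the first trait, columns are
    presence/absence of the second trait.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))

    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * o * math.log(o / expected)

    # Upper tail of a chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value


# Hypothetical counts: both present, first only, second only, neither.
g_stat, p = likelihood_ratio_2x2(20, 5, 5, 20)
```

A pair of traits with a small p-value would be treated as associated, and one of them would be dropped from the MMD trait list.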

Mean Measure of Divergence
The MMD statistic was developed by C. A. B. Smith and was first used to look at changes due to inbreeding in mice (Grewal, 1962). It was subsequently applied to the study of biological affinities, or distance, in humans. The MMD estimates biological distance between samples based on the degree of phenetic similarity (Irish, 1997). The statistic requires an assumption of independence of traits. Like D², it is useful if trait expression varies in a population, that is, when frequencies are 5-95% (de Souza and Houghton, 1977). Some major benefits of its use are its ability to work with incomplete data and its applicability to samples as small as 10-20 observations. The MMD is calculated from Θ₁ and Θ₂, the arcsine (sin⁻¹) transformations of the observed frequencies in the two samples being compared; n₁ and n₂, the sample sizes; and c, the number of characters employed (Freeman and Tukey, 1950).
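As an illustration of how these quantities combine, the sketch below assumes the Freeman-Tukey angular transformation and the variance correction 1/(n + 1/2) that usually accompanies it, averaged over the c traits. That particular form, the function names `freeman_tukey_theta` and `mmd`, and the (count, sample size) input layout are assumptions for this example rather than details taken from the text.

```python
import math


def freeman_tukey_theta(k: int, n: int) -> float:
    """Freeman-Tukey angular transformation of k occurrences in n observations."""
    return 0.5 * math.asin(1 - 2 * k / (n + 1)) + 0.5 * math.asin(1 - 2 * (k + 1) / (n + 1))


def mmd(sample1: list[tuple[int, int]], sample2: list[tuple[int, int]]) -> float:
    """Mean Measure of Divergence between two samples.

    Each sample is a list with one (k, n) pair per trait: k individuals
    showing the trait out of n individuals observed for it. Sample sizes
    may differ from trait to trait, which is how incomplete data are handled.
    """
    total = 0.0
    for (k1, n1), (k2, n2) in zip(sample1, sample2):
        diff = freeman_tukey_theta(k1, n1) - freeman_tukey_theta(k2, n2)
        # Squared difference minus its expected value under sampling error alone.
        total += diff ** 2 - (1 / (n1 + 0.5) + 1 / (n2 + 0.5))
    return total / len(sample1)


# Hypothetical frequencies for two traits in two groups.
group_a = [(45, 50), (5, 50)]
group_b = [(5, 50), (45, 50)]
distance = mmd(group_a, group_b)
```

Because the sampling-error term is subtracted, two samples with identical frequencies give a slightly negative value, which is conventionally read as no detectable divergence.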

Pseudo-Mahalanobis' D²
The pseudo-Mahalanobis' D² is defined as the sum of squares of differences between corresponding mean values of two sets of measurements, weighted by the inverse of the variance/covariance matrix (Burnaby, 1966):
$$D^2_{ij} = (\chi_{ik} - \chi_{jk})' \, \Sigma^{-1} \, (\chi_{ik} - \chi_{jk})$$
where χ_ik is the mean of expression for sample i for k traits, and χ_jk is the same for sample j. The middle term is the inverse of Σ, the pooled covariance matrix between the k traits (Manly, 1994). In this study, the means of trait expressions are the threshold values corresponding to the trait frequencies in the samples (Falconer, 1981), and the middle term is based on a pooled matrix of tetrachoric correlations between the traits (Brown, 1977). These transformations account for correlations between the characteristics being used (Konigsberg, 1990; Mizoguchi, 1977) and the threshold nature of dental morphological traits (Scott and Turner, 1997).
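A minimal sketch of this calculation is given below, assuming that trait frequencies are converted to thresholds on a standard normal liability scale and that a pooled tetrachoric correlation matrix has already been estimated elsewhere. The function names (`thresholds`, `solve`, `pseudo_d2`) are hypothetical, and estimating the tetrachoric correlations themselves is outside the scope of the example.

```python
from statistics import NormalDist


def thresholds(freqs: list[float]) -> list[float]:
    """Threshold on a standard normal liability scale for each trait frequency.

    Frequencies must lie strictly between 0 and 1.
    """
    normal = NormalDist()
    return [normal.inv_cdf(1.0 - p) for p in freqs]


def solve(matrix: list[list[float]], rhs: list[float]) -> list[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def pseudo_d2(freqs_i: list[float], freqs_j: list[float],
              pooled_corr: list[list[float]]) -> float:
    """Quadratic form (z_i - z_j)' R^-1 (z_i - z_j) on threshold values."""
    diff = [a - b for a, b in zip(thresholds(freqs_i), thresholds(freqs_j))]
    weighted = solve(pooled_corr, diff)  # R^-1 (z_i - z_j) without forming R^-1
    return sum(d * w for d, w in zip(diff, weighted))


# Hypothetical two-trait example with a pooled correlation of 0.5.
d2 = pseudo_d2([0.5, 0.5], [0.8413447, 0.8413447], [[1.0, 0.5], [0.5, 1.0]])
```

Solving the linear system instead of inverting the matrix explicitly is a design choice for numerical stability; it also surfaces a singular correlation matrix as an error rather than a silently wrong distance.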

Procrustes' transformation
The purpose of this statistic is to rotate and scale two sets of coordinates so as to achieve the best fit between them (Gower, 1971, 1975). For this study, the coordinates come from principal coordinates analysis of four distance matrices, and represent the first two axes of each matrix. The better the fit between two sets of coordinates, the smaller the summed deviations should be. Gower (1971) refers to the statistic as R² (for residual), but it can also be found as S² (for sum of squares) (Goodall, 1991) and M² (for minimum) (Jackson, 1995). R² is defined as:
$$R^2 = \sum_{i} \lVert P_i - P_i^* \rVert^2$$
where P_i and P_i* represent the corresponding points in two different sets of coordinates. The R² statistic is the sum of squared differences after rotation and scaling. The smaller the R², the smaller the difference is between the two sets of coordinates. For this study, a small R² will indicate good agreement between the MMD and D² statistics.
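For two-dimensional coordinates such as the first two principal coordinate axes, the fit can be sketched compactly by treating each point as a complex number, because multiplying by a single complex number rotates and scales at once. The function name `procrustes_r2` is hypothetical, and the sketch assumes both configurations are centered first, that reflection is allowed, and that neither configuration is rescaled to unit size beforehand; the original analysis may have differed on these points.

```python
def procrustes_r2(target: list[tuple[float, float]],
                  other: list[tuple[float, float]]) -> float:
    """Residual sum of squares after fitting `other` onto `target`.

    `other` is centered, then rotated (optionally reflected) and scaled
    to minimize the summed squared distances between corresponding points.
    """
    def centered(points: list[tuple[float, float]]) -> list[complex]:
        zs = [complex(x, y) for x, y in points]
        mean = sum(zs) / len(zs)
        return [z - mean for z in zs]

    z = centered(target)
    w_plain = centered(other)
    w_mirror = [p.conjugate() for p in w_plain]  # reflected copy

    best = None
    for w in (w_plain, w_mirror):
        size = sum(abs(p) ** 2 for p in w)
        # Least-squares complex multiplier: rotation angle and scale together.
        b = sum(p.conjugate() * q for p, q in zip(w, z)) / size
        r2 = sum(abs(q - b * p) ** 2 for p, q in zip(w, z))
        best = r2 if best is None else min(best, r2)
    return best


# Hypothetical check: a square versus the same square rotated by 90 degrees,
# doubled in size and shifted, which should fit almost perfectly.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = [(5.0, 5.0), (5.0, 7.0), (3.0, 7.0), (3.0, 5.0)]
residual = procrustes_r2(square, moved)
```

Note that this residual is not symmetric in its two arguments unless both configurations are first scaled to a common size, so the choice of baseline configuration matters.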

RESULTS
Before discussing the direct comparison of statistical methods, an examination of the pictures presented by each analysis is in order. Due to the difficulty in performing pseudo-Mahalanobis' D² with a large number of traits, maxillary and mandibular traits were considered separately.

Measures of affinity
Results for MMD analyses based on maxillary and mandibular traits can be seen in Tables 3 and 4, respectively. The maxillary traits show a separation between African Americans (AA) and European Americans (EA) at all time periods. Early and middle EA are closer to each other than either is to late EA. Early AA is different from all groups, with middle and late AA being most like late EA. Analysis of the mandibular traits emphasizes the split between EA and AA and minimizes other details.
Results for the D² analyses are summarized in Tables 5 (maxillary traits) and 6 (mandibular traits). The results for the maxillary traits seem to emphasize the time difference between groups rather than differences in ancestry. Late and middle AA and EA cluster most closely, with early AA and EA being very distant from each other and from all other groups. The results based on the mandibular trait D² are the most difficult to characterize. There is a large difference between early and middle AA, and a relatively small difference between middle and late AA. While the indication that change in the African American gene pool slowed after the Civil War reflects known historical patterns of admixture (Davis, 1991), it does not explain the apparent similarity of early EA and middle AA, the smallest distance in the matrix. This information is presented graphically in Figure 1, which shows the principal coordinates of the relationships among the six groups resulting from the MMD analyses.

Procrustes analysis
Figures 3 and 4 show the relationships between the six samples after rotation and scaling of the principal coordinates for MMD and D², respectively. The coordinates for the maxillary MMD results act as a baseline for both figures. Each of the other groups has been redrawn to its best fit, meaning the one that yields the smallest residual. The residuals between all the groups are summarized in Table 7. There is no test of significance for R², but it can be seen that all the values are relatively small except for that between the D² for maxillary and mandibular characteristics. It is possible to simplify this table by performing a principal coordinates analysis of this R² matrix and displaying the relationships in the simplest geometric space. A graph of these coordinates shows the relationships among the four methods of determining affinity. Figure 5 shows that the two MMD matrices are in nearly perfect agreement. The two D² matrices are quite different from each other, but neither is more different from the MMD matrices than the other.

Fig. 3. MMD principal coordinates after Procrustes transformation.

It remains to be explained why the D² matrices are so different from each other. One possible explanation is a lack of differences between the samples being studied in these particular traits. In fact, among the traits used for the mandibular D² analysis, the average difference in expression between groups is half that for the traits used in the maxillary D² and MMD analyses, and one quarter of that for the mandibular MMD.

CONCLUSIONS
Overall, there is very good agreement between the biological distance matrices generated using MMD and pseudo-Mahalanobis' D² statistics. Both statistics have their place in the analysis of biological distance, especially when utilizing characteristics of dental morphology. As with all statistics, the MMD and D² are limited by the data they analyze. If there is little difference between samples for the characteristics in question, the results will show small distances; if the differences are large for those particular characteristics, the distances will be large as well. A careful evaluation of the data should be made before attempting any measure of affinity.
When there are many traits available for analysis and they have little inter-trait correlation, MMD is appropriate. When the data consist of relatively few, correlated traits, a pseudo-Mahalanobis' D² is more appropriate, as it makes no assumption about a lack of correlation between traits. In a large study, the use of both statistics may allow analysis of more of the collected data. If all things are equal and either statistic is applicable, MMD is simpler to use and more widely comparable.