The Impact of Dental Wear on the Analysis of Morphological Affinities based on Dental Non-metric Traits

Dental wear is described as a limitation to dental morphological studies, as it obscures important crown trait features, resulting in significant differences in trait frequencies, an essential component for estimating biodistances. However, the actual impact of dental wear on biological distances still requires further characterization. We explore the impact of dental wear on morphological affinities for Brazilian pre-colonial series in the context of worldwide reference series. Twenty crown traits were scored using the Arizona State University Dental Anthropology System, and dental wear was quantified on an ordinal scale between 1 (no wear) and 8 (crown eroded). Eight crown trait frequencies are significantly associated with dental wear (p<0.05), demonstrating its impact on their analysis. To explore this impact on biodistances, data were divided by wear categories (all teeth, low-wear, moderate/severe wear) and morphological affinities among series were compared through Euclidean distances, Mean Measure of Divergence, and Principal Component Analysis. Results show that the impact of wear is only meaningful when a sample contains many wear-biased traits with only moderate/severe wear. We conclude that, despite the impact of wear on individual trait frequencies, its impact on morphological affinities can be mitigated by including other variables or when comparisons focus only on large-scale biological differences.

Dental morphology has played an important role in discussing the settlement of the Americas since the first half of the 20th century (Dahlberg, 1945; Hrdlička, 1920, 1921). After the development of standardized methods for data collection, such as the Arizona State University Dental Anthropology System (ASUDAS), studies of dental non-metric traits increased significantly over the years (Powell, 1995; Scott & Irish, 2017; Scott & Turner II, 1997; Sutter, 2005; Turner II et al., 1991). However, as researchers studied different archaeological series, different conclusions were drawn about how the Americas were first settled by modern humans. Some studies argue that all Native Americans are more strongly related to each other than to any other group outside the Americas, and share a rather homogenous dental morphological pattern, related to the Northeast Asian populations that first crossed the Bering Strait (Greenberg et al., 1986; Scott, Schmitz, et al., 2018; Scott & Turner II, 1997; Turner II, 1990; Turner II & Scott, 2013). Others suggest phenotypic variation within the Americas is larger, with some Native American groups biologically related to Southeast Asians, meaning that at least two distinct biological populations crossed the Bering Strait during pre-colonial times (Haydenblit, 1996; Huffman, 2014; Lahr & Haydenblit, 1995; Ortiz, 2013; Powell, 1995, 1997; Powell & Neves, 1998; Powell & Rose, 1999; Sutter, 2005, 2009). This discrepancy in narratives has often been attributed to issues regarding the replicability of ASUDAS, as observer error is often an anticipated concern (Marado, 2017; Nichol & Turner, 1986; Wu & Turner, 1993). Also, the choice of which morphological traits are combined to assess biological affinities may have an important influence on the results (Rathmann & Reyes-Centeno, 2020).
Furthermore, dental wear has also been suggested as a noteworthy concern on its own, causing bias in scoring non-metric traits (Burnett et al., 2013; Burnett, 2016; Stojanowski & Johnson, 2015). Dental wear is a physiological phenomenon in which tooth enamel and dentine are gradually worn over time by attrition, abrasion and/or erosion mechanisms (Kaidonis, 2008). Many dental non-metric traits are features located in the tooth crown, so dental wear may gradually erase morphological details and impact scoring decisions (Scott et al., 2016). The effects vary for each particular trait and can result in the under-estimation of trait frequencies (i.e., attributing lower grades or absence to traits that should be scored as higher grades or present), or over-estimation of frequencies (i.e., higher trait expressions are scored regardless of wear, but lower/absent expressions are scored as missing data under the same circumstances) (Burnett et al., 2013; Burnett, 2016). If the error in the estimations of trait frequency is significantly biased between teeth with low and moderate/severe wear, it violates the assumption that samples have data missing completely at random (MCAR) (Burnett et al., 2013; Stojanowski & Johnson, 2015). Data MCAR is a central tenet in the reconstruction of population parameters based on samples, because it means missing values follow the same distribution as the observed values (Bhaskaran & Smeeth, 2014), and therefore information about the population has not been skewed by the data that was not observable.
Many non-metric dental traits have been shown to be susceptible to wear-related bias: shoveling UI1, cusp number LM2 (Burnett et al., 2013; Stojanowski & Johnson, 2015), distal accessory ridge UC, mesial canine ridge UC, accessory ridges UP, lingual cusp number LP2, hypocone UM2 (Burnett et al., 2013; Burnett, 2016), double shoveling UI1, enamel extensions UM1, deflecting wrinkle LM1 (Stojanowski & Johnson, 2015). However, the clear impact of wear-biased traits on multivariate analysis has not been formally evaluated. It is possible that a certain amount of error is acceptable as long as it does not change the interpretations of the results. In other words, the measured attributes are still valid as long as they are meaningfully reflecting real biological relationships (Houle et al., 2011).
Going back to the example about the peopling of the Americas, the debate around dental wear is particularly relevant. Although there is consilience that Native Americans share a recent common ancestor with Asians, there is no clear agreement about which Asian dental complex they are more closely related to: 1) a specialized pattern which emerged approximately between 20 and 11 thousand years ago (kya) in Northeast Asia, with high frequencies of shoveling UI1, double shoveling UI1, one-rooted UP1, enamel extensions UM1, pegged/reduced/missing UM3, deflecting wrinkle LM1, three-rooted LM1; commonly referred to as the Sinodont pattern (Turner II, 1989, 1990); or 2) a generalized and more simplified pattern which appears between 25 and 40 kya in Southeast Asia (Turner II, 2006), with lower frequencies of the same above-mentioned traits, and a higher frequency of four-cusped LM2; commonly described as the Sundadont pattern (Scott, Schmitz, et al., 2018; Turner II, 1990).
Some authors suggest that Native Americans have a different derived dental morphological pattern from both Sinodonts and Sundadonts (Scott, Schmitz, et al., 2018; Stojanowski & Johnson, 2015). While keeping ties to Sinodont groups such as Northeast Asians, Native Americans have even higher frequencies of some traits (e.g., shoveling UI1, double shoveling UI1), which can be viewed as "super-Sinodont" (Scott, Schmitz, et al., 2018). In other words, it seems that there are considerable differences in the dental morphological patterns between Native American and Asian populations, which are even larger in some traits than the differences observed between Asian Sinodonts and Sundadonts (Scott, Schmitz, et al., 2018).
To contribute to this discussion, and at the same time to illustrate the impact of dental wear in dental non-metric analyses, we present a case study of a Brazilian coastal series dated to between 10.0 and 1.0 kya. Our study subsets this dataset into different series based on dental wear degrees and compares their morphological affinities within a global reference framework, using a combination of only wear-biased traits, only unbiased traits, and all traits pooled together. These analyses aim to improve our understanding of the impact dental wear has in multivariate statistical analyses, and to explore if at any point dental non-metric traits stop being meaningful markers of biological relationships.
A total of 20 crown traits from ASUDAS were scored (Scott & Irish, 2017; Turner II et al., 1991), and dental occlusal wear was noted according to Smith (1984). To improve sample sizes, we used the total tooth count method to calculate trait frequency: when available, both antimeres were scored for each trait, and sample frequencies were calculated by dividing the total number of positive expressions by the total number of teeth analyzed (Scott, 1980). While this approach may add redundant information to the data, as individuals are often scored twice (Scott, 1980; Scott & Irish, 2017), previous studies have shown that individual and total counting methods produce very similar results, and thus can be used for comparative purposes (Marado, 2014; Scott, 1980). As the main goal of this study is to explore the impact of wear bias on the estimations of morphological affinities, we opted for the method that would maximize the number of teeth and dental wear information included.
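The total tooth count calculation described above can be sketched as follows. This is an illustrative Python sketch, not the R functions used in this study; the function name and the example scores are hypothetical, and each trait is assumed to be already dichotomized into present/absent.

```python
from typing import Optional

# Hypothetical input: one (left, right) pair per individual for a single trait.
# True = present, False = absent, None = tooth missing or trait not observable.
Antimeres = tuple[Optional[bool], Optional[bool]]


def total_tooth_count_frequency(scores: list[Antimeres]) -> tuple[float, int]:
    """Return (trait frequency, number of teeth) counting every observable tooth."""
    observed = [s for pair in scores for s in pair if s is not None]
    if not observed:
        raise ValueError("no observable teeth for this trait")
    # Positive expressions divided by the total number of teeth analyzed.
    return sum(observed) / len(observed), len(observed)


individuals = [
    (True, True),    # both antimeres scored, trait present on both
    (True, None),    # only the left antimere is observable
    (False, False),  # both antimeres scored, trait absent on both
    (None, True),    # only the right antimere is observable
]
frequency, n_teeth = total_tooth_count_frequency(individuals)
print(f"frequency = {frequency:.3f} over {n_teeth} teeth")
```

Note that the four individuals contribute six teeth, which is the redundancy the total count method accepts in exchange for larger sample sizes.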
Intra-observer error of dichotomized traits was calculated with a subsample of 128 individuals, analyzed by the first author twice with an interval of approximately one month between analyses. Only teeth that were scored for dental wear were considered in this analysis, and Cohen's Kappa coefficient of agreement was used to assess the level of agreement between analyses. Kappa values were classified as follows: 0.00-0.20 (slight agreement); 0.21-0.40 (fair agreement); 0.41-0.60 (moderate agreement); 0.61-0.80 (substantial agreement); 0.81-0.99 (almost perfect agreement) (Landis & Koch, 1977).
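For readers unfamiliar with the statistic, the agreement calculation can be sketched as below. This is a minimal Python illustration of Cohen's Kappa and of the bands quoted above, not the irr-based R code used in this study; the function names and the two example scoring sessions are hypothetical.

```python
from collections import Counter


def cohen_kappa(first: list[int], second: list[int]) -> float:
    """Cohen's Kappa for two scoring sessions of the same teeth."""
    if len(first) != len(second) or not first:
        raise ValueError("sessions must be non-empty and of equal length")
    n = len(first)
    # Observed proportion of teeth scored identically in both sessions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal totals of each session.
    counts_1, counts_2 = Counter(first), Counter(second)
    expected = sum(counts_1[k] * counts_2[k] for k in counts_1) / (n * n)
    if expected == 1.0:
        return 1.0  # both sessions used one identical category throughout
    return (observed - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Map a Kappa value onto the bands quoted in the text."""
    if kappa < 0:
        return "below the quoted bands"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical dichotomized scores (1 = present, 0 = absent) for eight teeth.
session_1 = [1, 1, 1, 0, 0, 0, 1, 0]
session_2 = [1, 1, 1, 0, 0, 0, 0, 0]
kappa = cohen_kappa(session_1, session_2)
print(kappa, agreement_label(kappa))
```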
To test the impact of wear on morphological affinities among series, we only included teeth scored for both dental wear and morphological traits, and followed a similar approach to Burnett et al. (2013): three categories of dental wear were established based on the scale of Smith (1984): low wear (Grades 1-3); moderate wear (Grades 4-5); and severe wear (Grades 6-8). As there were very low sample sizes of traits scored on teeth with severe wear, we combined teeth with moderate or severe wear. Afterwards, we compared trait presence and absence between low and moderate/severe wear groups using Fisher's Exact tests.
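The grouping and test described above can be illustrated with the following sketch. It is a Python illustration rather than the R code used in this study; the function names and the example counts are hypothetical. The two-sided p-value is computed in the usual way, by summing the hypergeometric probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def wear_category(smith_grade: int) -> str:
    """Collapse Smith (1984) grades into the two groups compared in the text."""
    if not 1 <= smith_grade <= 8:
        raise ValueError("Smith grades run from 1 to 8")
    return "low" if smith_grade <= 3 else "moderate/severe"


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's Exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row_1, row_2, col_1 = a + b, c + d, a + c
    total = row_1 + row_2

    def prob(x: int) -> float:
        # Hypergeometric probability of x positives in the first row.
        return comb(row_1, x) * comb(row_2, col_1 - x) / comb(total, col_1)

    lowest, highest = max(0, col_1 - row_2), min(row_1, col_1)
    p_observed = prob(a)
    # Sum every table at most as probable as the observed one
    # (a small relative tolerance guards against rounding ties).
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts for one trait: (present, absent) in each wear group.
low_wear = (30, 20)
mod_sev_wear = (15, 35)
p = fisher_exact_two_sided(*low_wear, *mod_sev_wear)
print(f"p = {p:.4f}")
```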
Finally, we evaluated the morphological affinities among series through multivariate exploratory analyses, comparing our samples with other skeletal series from Southeast Asia, Asia, Circumpolar, North America, Mesoamerica, and South America (Scott & Irish, 2017). All data tables used for comparative purposes are available in Scott and Irish (2017). Furthermore, we split our sample into three series based on dental wear categories: 1) Brazilian coast, which includes all teeth regardless of dental wear; 2) Brazilian coast (low wear), which excludes teeth with moderate/severe dental wear; 3) Brazilian coast (Mod/Sev wear), which uses only teeth with moderate/severe occlusal dental wear. We recognize it is unlikely for a researcher to select only traits scored on teeth with moderate/severe wear in any study on dental morphology. However, some archaeological series are very limited, and sometimes composed only of individuals with substantial amounts of dental wear. Thus, we use this series as a way to infer the maximum amount of error that can result from the use of only moderately to severely worn teeth.
To assess possible trait correlations between groups, and check if correlations varied significantly between combinations of wear-biased and/or unbiased traits, Spearman correlations were calculated over trait frequencies of three different data sets: A) Only wear-biased traits; B) Only unbiased traits; and C) all traits combined. To mitigate the impact of multicollinearity, for each highly correlated pair of variables (r≥0.7), we removed one of those traits from the multivariate analyses.
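The collinearity screen described above can be sketched as follows. This is an illustrative Python version, not the R code used in this study; the function names, the trait labels, and the frequencies are hypothetical. Two details are assumptions of the sketch: the threshold is applied to the absolute value of the correlation, and the second trait of each highly correlated pair is the one dropped.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (var_x * var_y) ** 0.5


def drop_collinear(freqs: dict[str, list[float]], threshold: float = 0.7) -> list[str]:
    """Keep one trait from every pair whose |r| reaches the threshold."""
    kept = list(freqs)
    for first, second in combinations(list(freqs), 2):
        if first in kept and second in kept:
            if abs(spearman(freqs[first], freqs[second])) >= threshold:
                kept.remove(second)
    return kept


# Hypothetical trait frequencies across five series.
frequencies = {
    "trait_a": [0.10, 0.20, 0.30, 0.40, 0.50],
    "trait_b": [0.15, 0.25, 0.20, 0.50, 0.60],
    "trait_c": [0.50, 0.10, 0.40, 0.20, 0.30],
}
kept_traits = drop_collinear(frequencies)
print(kept_traits)
```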
Next, Euclidean distances and Mean Measure of Divergence without sample size correction were calculated for each of the three datasets, and represented through Kruskal Multidimensional Scaling. Mantel matrix correlation tests were applied to compare distance matrices generated by both methods for each dataset to test the level of similarities between them. The morphological affinities were also explored through Principal Component Analysis, and the first two principal components were extracted from the average trait frequencies for the series and represented in a scatterplot.
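The two distance measures can be illustrated with the sketch below. It is a Python illustration, not the R functions written for this study, and the function names and example frequencies are hypothetical. The angular transformation used here, theta = arcsin(1 - 2p), is one common choice for the Mean Measure of Divergence and is an assumption of this sketch, since the text does not state which transformation was applied. Omitting the sample size correction means that no term depending on the number of teeth is subtracted from the squared differences.

```python
from math import asin, pi, sqrt


def euclidean_distance(p: list[float], q: list[float]) -> float:
    """Euclidean distance between two vectors of trait frequencies."""
    if len(p) != len(q):
        raise ValueError("both series need the same traits")
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mmd_uncorrected(p: list[float], q: list[float]) -> float:
    """Mean Measure of Divergence without the sample size correction term.

    Assumes the angular transformation theta = arcsin(1 - 2p) for each
    trait frequency p (in radians); other transformations exist.
    """
    if len(p) != len(q) or not p:
        raise ValueError("both series need the same, non-empty set of traits")
    squared = [(asin(1 - 2 * a) - asin(1 - 2 * b)) ** 2 for a, b in zip(p, q)]
    return sum(squared) / len(squared)


# Hypothetical frequencies of two traits in two series.
series_1 = [0.0, 0.5]
series_2 = [1.0, 0.5]
print(euclidean_distance(series_1, series_2))
print(mmd_uncorrected(series_1, series_2))
```

Computing either function for every pair of series yields the distance matrix that is then passed to the multidimensional scaling step, which is not reproduced here.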
Together, these different multivariate analyses allow us to evaluate the impact of wear biases in estimating morphological affinities (and biological relationships) among samples, by illustrating to what degree the inclusion of biased frequencies affects the overall pattern of affinities among series when inserted in a broader comparative framework. Furthermore, as we expect our samples to share a Native American dental complex, as suggested by Turner and Scott (2013), any deviation from this cluster may lead us to assume that dental wear can shift the results significantly enough to bias our ancestry estimations at a worldwide scale, as suggested by some authors (Scott, Schmitz, et al., 2018; Turner II, 2006; Turner II & Scott, 2013).
All statistical analyses were done in R (R Core Team, 2020), with functions written by two of us (MH and DF), and complemented by the packages ggplot2 (Wickham, 2016), ggfortify (Tang et al., 2016), MASS (Venables & Ripley, 2002), vegan (Oksanen et al., 2013), and irr (Gamer et al., 2012).

Figure 1 shows the intra-observer error for all the analyzed traits in this study. While most traits show substantial agreement or higher, three traits only reached moderate agreement (metaconule UM1, anterior fovea LM1, and groove pattern LM2), and so should be considered with caution. These traits are also traits that show significant bias from dental wear (Table 1), suggesting that dental wear may play a role in the consistent scoring of these traits. However, it is worth noting that some traits with almost perfect agreement are also wear biased, one by underestimation (double shoveling UI1) and the other by overestimation of trait frequencies (shoveling UI1). Therefore, the role of wear on the replicability of trait analysis depends on the type of trait and should be assessed accordingly.

Figure 2 shows the distribution of dental wear grades for each scored morphological trait. As can be seen, the distribution of trait scores is largely similar between teeth with low and moderate/severe dental wear. Table 1 shows the sample sizes and trait frequencies for the series in this study, as well as the results for the Fisher's Exact tests comparing trait frequency by wear degrees. There were 4,191 dental trait scores in total, 2,069 on teeth with low wear and 2,122 on teeth with moderate/severe wear. Half of the dental traits have larger sample sizes on teeth with low wear, and the other half have larger sample sizes on teeth with moderate/severe wear. However, in some traits there is a clearly larger sample size for teeth with low wear (e.g., deflecting wrinkle LM1 and accessory ridges UP2). Eight of the 20 traits (40%) show significant dental wear bias (p<0.05).
The effect of dental wear varies between traits: shoveling UI1 and hypocone UM1 are biased towards increased trait frequencies (25%, 2/8) whereas the remaining trait biases (75%, 6/8) resulted in the underestimation of their frequencies. Therefore, in the Brazilian context, as in other studies, dental wear is more prone to bias traits by underestimating their frequencies (Burnett et al., 2013).

Results
In the analyses comparing the Brazilian series with the reference series, the following traits were excluded because they are not available from Scott and Irish (2017): anterior fovea LM1, accessory ridges UP2, and accessory cusps UP1. This resulted in a dataset of 17 traits, seven of which show significant wear bias. We calculated the absolute mean difference of each trait between all pairs of reference series and compared it with the frequency differences observed between low and moderate/severe wear groups (Table 2). Although seven traits show significant wear bias (see Table 1), only three of them (Carabelli cusp, hypocone, and protostylid) show wear bias that exceeds the average difference in the reference series. Therefore, most traits in the Brazilian series show wear biases that are smaller than the majority of differences among the reference series. These three traits should be considered the most problematic, and may lead to a more significant bias in the patterns of morphological affinities observed in our data. Before running the multivariate analyses, Spearman correlation tests among the 17 traits were done to check for collinearity of variables. The correlation tests revealed a strong correlation (r≥0.7) between double shoveling UI1 and groove pattern LM2 (SI2). Therefore, groove pattern LM2 was removed from the analyses using all 17 traits (dataset C). When testing correlations among wear-biased (dataset A) and unbiased (dataset B) traits, no strong correlations were found (SI2), and no traits were removed from the analyses with these datasets.
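The comparison between wear bias and the differences among reference series, described at the start of this section, can be sketched as follows. This is an illustrative Python sketch under the assumption that the reference value is the mean of the absolute frequency differences over all pairs of reference series; the function names and the example frequencies are hypothetical and are not taken from the tables of this study.

```python
from itertools import combinations
from statistics import mean


def mean_absolute_pairwise_difference(frequencies: list[float]) -> float:
    """Average |difference| in one trait's frequency over all pairs of series."""
    if len(frequencies) < 2:
        raise ValueError("at least two reference series are needed")
    return mean(abs(a - b) for a, b in combinations(frequencies, 2))


def wear_bias_exceeds_reference(low: float, mod_sev: float,
                                reference: list[float]) -> bool:
    """True when the low vs. moderate/severe gap is larger than the
    average gap between reference series for the same trait."""
    return abs(low - mod_sev) > mean_absolute_pairwise_difference(reference)


# Hypothetical frequencies of one trait in three reference series.
reference_series = [0.10, 0.30, 0.50]
print(mean_absolute_pairwise_difference(reference_series))
print(wear_bias_exceeds_reference(0.40, 0.10, reference_series))
print(wear_bias_exceeds_reference(0.40, 0.30, reference_series))
```

A trait flagged by the second function is one where wear alone could move a series as far as a typical difference between reference populations, which is the situation the text identifies as most problematic.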
The multidimensional scaling based on Euclidean distances (Figure 3) and Mean Measures of Divergence (Figure 4) produces very similar results, as the two distance measurements show extremely high correlations (Mantel correlation tests: r=0.950, p≤0.001 for biased traits; r=0.960, p≤0.001 for unbiased traits; r=0.961, p≤0.001 for combined traits). Each of the distance matrices produced in this study can be accessed in Supplementary Information 3 (SI3).
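For readers unfamiliar with the Mantel test, the sketch below shows the general permutation procedure: the correlation between the off-diagonal entries of two distance matrices is compared with the correlations obtained after randomly relabelling the series in one matrix. It is a minimal Python illustration, not the vegan-based R code used in this study; the function names, the number of permutations, the seed, and the example matrices are hypothetical.

```python
import random
from statistics import mean

Matrix = list[list[float]]


def lower_triangle(matrix: Matrix, order: list[int]) -> list[float]:
    """Below-diagonal entries after relabelling rows and columns by `order`."""
    return [matrix[order[i]][order[j]]
            for i in range(1, len(order)) for j in range(i)]


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread = sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    if spread == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / spread ** 0.5


def mantel(d1: Matrix, d2: Matrix, permutations: int = 999,
           seed: int = 0) -> tuple[float, float]:
    """Return (Mantel r, one-sided permutation p-value)."""
    if len(d1) != len(d2):
        raise ValueError("matrices must describe the same series")
    identity = list(range(len(d1)))
    fixed = lower_triangle(d1, identity)
    observed = pearson(fixed, lower_triangle(d2, identity))
    rng = random.Random(seed)
    at_least_as_high = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(fixed, lower_triangle(d2, order)) >= observed:
            at_least_as_high += 1
    return observed, at_least_as_high / (permutations + 1)


# Hypothetical example: four series placed on a line, and a second matrix
# that is simply twice the first, so the two are perfectly correlated.
positions = [0.0, 1.0, 3.0, 7.0]
dist_a = [[abs(a - b) for b in positions] for a in positions]
dist_b = [[2 * value for value in row] for row in dist_a]
r, p_value = mantel(dist_a, dist_b, permutations=199)
print(r, p_value)
```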
The analyses using datasets with biased (Figure 3A and Figure 4A) and with combined traits (Figure 3C and Figure 4C) show a cluster composed of Asian and Southeast Asian groups, a second cluster formed by North American and Circumpolar series, and a third cluster mostly formed by Mesoamerican and South American series. Greater Northwest coast is a constant outlier for North America, since it is within the expected variation for Mesoamerica/South America. Japan is also an outlier of the Asian cluster, standing between it and the Mesoamerican/South American series. Finally, in both Euclidean distances and Mean Measure of Divergence, the Brazilian coast series are within the Mesoamerica/South America cluster, with the wear bias pushing the series slightly away from this cluster.
However, the results using only unbiased traits (Figure 3B and Figure 4B) show important differences from the other analyses. In this case, there are only two clear clusters, one made of Asian and Southeast Asian groups, and another composed of Circumpolar, North American, Mesoamerican and South American series. This reduced number of traits reduces the ability of the analysis to discriminate among most of the geographical regions represented in the reference dataset, which suggests that the inclusion of biased traits may be important to infer population structure within the Americas. In other words, this exercise illustrates the fact that removing wear-biased traits may sometimes be more harmful to the study of morphological affinities than their inclusion. Regarding our particular samples, the Brazilian Coast series, although closer to the Native American cluster, is still considerably distant from it, which to some extent may highlight some degree of inter-observer error between the first author of this study and Christy Turner II, who analyzed the worldwide comparative samples (Scott & Irish, 2017). Nevertheless, the Brazilian series appear close to each other, irrespective of the degree of wear considered, which shows that wear bias by itself is not enough to cause the association of series with another geographic region, as suggested before (Turner II, 2006; Turner II & Scott, 2013).
The Principal Component Analyses (Figure 5) show very similar results to Euclidean Distance and Mean Measure of Divergence and help to identify traits responsible for the population structure within the Americas discussed previously. Shoveling UI1 and double shoveling UI1 are particularly relevant traits to distinguish between Circumpolar/North American and Mesoamerican/South American series, with frequencies being higher in Central and South Native American groups (Figure 5A and Figure 5C). As these traits are missing from the unbiased dataset (Figure 5B), the distinction between Circumpolar, North Americans, Mesoamericans, and South Americans is not evident. Overall, these results reinforce that despite significant differences in frequencies due to dental wear, these differences are not strong enough to change the relative pattern of morphological affinities of the Brazilian series when inserted in a large comparative framework.
Nevertheless, among Brazilian Coast series with different amounts of wear, there is a pattern where the subset using only teeth with moderate/severe wear is more separated from other groups (the only method where this pattern is not observed is the Principal Component Analysis). This suggests that although using only teeth with moderate/severe wear may not change the overall interpretations of the morphological affinities of the series, it is still adding error to the interpretations, especially if the analysis is concerned with patterns of associations within smaller geographic scales.

Discussion
The results show that several traits in the Brazilian series present significantly different frequencies between low and moderate/severe wear groups (shoveling UI1, double shoveling UI1, metaconule UM1, Carabelli cusp UM1, hypocone UM2, anterior fovea LM1, protostylid LM1, entoconulid LM1). These differences can result in variation of up to 31.2% (anterior fovea LM1) in the frequency of the traits. However, when some of these traits are included in multivariate analyses along with other traits, this discrepancy is mitigated, as our series appear closely associated with each other in most analyses, despite the significant trait frequency differences among them. Discrepancies in the multivariate analyses are only relatively important when a series is composed exclusively of teeth with moderate/severe dental wear and when all traits show significant dental-wear bias. But even in these cases, our results do not indicate drastically different patterns of morphological affiliation of the Brazilian series. In reality, we see more important deviations from this pattern when biased traits are removed (see Figures 3B and 4B), suggesting that the removal of biased traits may not always be the ideal solution for studies of dental morphological affinities.
The main reason for these discrepant results, where individual traits show significant differences but do not impact the overall pattern of morphological affinities in multivariate space, is that individual trait frequencies make a small contribution to the overall position of the series in the multivariate space. Even though the frequency of some traits can vary by as much as 31.2%, this variation is only a small portion of the final distances between groups, or makes a small contribution to the principal component scores of a group. Given that the wear bias for most variables is smaller than the average difference seen among the reference series (see Table 2), this small contribution of each trait to the final multivariate results does not significantly impact the pattern of morphological affinities among them. In other words, the wear bias in this case represents a small fraction of the total variance seen among series in the data.
These results support the view that, even though trait frequency differences should not be overlooked, wear-biased traits should still be considered in studies that are trying to contextualize the morphological affinities of series within larger comparative frameworks (i.e., in situations where it is expected that the wear bias is consistently smaller than the average differences among comparative series; see Table 2). Therefore, our results suggest that it is possible to better contextualize the validity of wear-biased traits in studies of morphological affinities, especially when these traits represent important components of the biological profile of populations. Shoveling UI1 and double shoveling UI1 are two examples of traits that have been noted to be wear biased in different independent studies (Burnett et al., 2013; Stojanowski & Johnson, 2015), including ours. However, they are also very important when characterizing dental variation patterns between Asian and Native American groups (Scott, Schmitz, et al., 2018; Turner II, 1990). When combined with other ASUDAS traits, although biased by dental wear (shoveling UI1, p=0.043; double shoveling UI1, p=0.042), they did not have a significant impact on the pattern of morphological affinities of the Brazilian series in relation to the Mesoamerica/South America cluster. Therefore, in response to the claims that dental wear may be responsible for the dental variation researchers have found within Native American groups (Turner II, 2006; Turner II & Scott, 2013), we argue that this seems rather unlikely, for it would require several traits to have wear-biased frequencies causing differences of the same order of magnitude as those observed between continents, which is not the case in our analyses.
Our analyses do not show any strong morphological affinities among Native Americans and Southeast Asian groups (Scott & Turner II, 1997; Turner II & Scott, 2013). In this study, as in previous studies, Brazilian coast series are within the dental phenotypic variation of Native Americans (Turner II & Scott, 2013). This occurs in all multivariate analyses, independent of wear-biased traits, or sub-sampled series based on dental wear grades. This is another argument to take into account when excluding teeth or variables based on dental wear alone. In a large scale of analysis, if wear bias is not very significant, and series are not composed exclusively of teeth with moderate/severe dental wear, removing worn teeth may cause the removal of important diagnostic traits, potentially resulting in more meaningful changes in morphological affinity patterns than if wear-biased traits are kept in the analyses. This is illustrated well by our analyses using only unbiased traits. Furthermore, this also offers some confidence in the interpretation of multivariate morphological affinities of series for which there is no precise information about their dental wear. Although it is often a standard data-collection procedure, not many studies report dental wear grades in dental morphological studies. Our study shows that, although reporting wear grades would be optimal for interpreting possible discrepancies between series, their absence does not imply that such comparisons should not be made when the scale of the variance in the comparison framework is larger than the variance that results from wear bias. Caution must be taken, however, when contextualizing populations within smaller regional contexts, or within populations that share strong morphological affinities, as in these cases the wear bias can be higher than the differences that define the biological affinities among series.
Therefore, the scale of analysis is essential in deciding whether to include wear-biased traits, and we recommend that future studies consider the relationship between the variance in the data that can be the result of wear bias and the variance that is the result of differences between series. As long as the latter is larger than the former, wear-biased traits can be informative of morphological affinities and could be considered in the analyses.
Finally, we agree with previous claims that dental wear is more susceptible to downgrading morphological traits (Turner II, 2006;Turner II & Scott, 2013). Out of the eight identified wear-biased traits, only 25% were biased towards increasing their frequency (2/8), and the remaining 75% (6/8) resulted in the underestimation of the frequencies.
As occlusal wear increases, the features of each crown trait become less pronounced, leading the observer to score lower grades, when they should have been scored as not observable. This may occur partially due to the unconscious necessity of an observer to reach substantial sample sizes.

Conclusions
Our study corroborates previous studies showing dental wear bias is a valid concern when analyzing dental non-metric traits, and its assessment should become standard procedure in future studies whenever possible (Burnett et al., 2013). However, while wear-biased traits have an impact on trait frequencies, when combined with other variables, and in a large scale of analysis, their impact may not be meaningful in interpreting the patterns of morphological affinities among series. This impact is directly dependent on the scale of analysis, and regional studies must be more cautious in the inclusion of wear-biased traits, as in contexts with relatively small differences among groups, wear bias can become meaningful. In other words, the scale of analysis is a key factor when deciding whether to use wear-biased traits.
We hope this study offers a more optimistic perspective about the impact of dental wear in dental morphological studies and gives a better sense of how meaningfully wear-related bias affects the interpretations of morphological affinities among past populations. Our study suggests that eliminating worn teeth by default may not always be the best solution, since it may exclude important discriminatory variables, or invalidate future studies due to a significant reduction in sample sizes.