Carabelli Trait in Australian Twins: Reliability and Validity of Different Scoring Systems

We assessed the intra- and inter-observer reliability of two methods of scoring or categorizing Carabelli trait in both primary and permanent dentitions (Hanihara, 1961; Dahlberg, 1963). By using dental casts obtained from twins, we also compared the expression of Carabelli trait within and between monozygotic (MZ) co-twins to clarify the ontogenetic processes leading to different forms of trait expression. While intra-observer concordance rates were generally good (70-90%), inter-observer concordance rates were poor (35-60%). This indicates that considerable caution is needed when comparing data for Carabelli trait derived from different samples by different researchers. By comparing categories or scores for Carabelli trait in both dentitions of MZ co-twins, we found inter-relationships between groove and cuspal forms of the feature. Although the Arizona State University system developed by Turner is commonly used nowadays to score the Carabelli trait, we would encourage researchers interested in clarifying genetic influences and ontogenetic processes in both dentitions to refer to the often over-looked plaque of Hanihara and also Dahlberg's plaque P12B. This should improve the reliability and validity of data obtained by helping to clarify the inter-relationships between the different phenotypic expressions of Carabelli trait. Dental Anthropology 2010;23(1):7-15.
*Dr. Hasegawa performed this research while on sabbatical leave as a Visiting Lecturer in the School of Dentistry, The University of Adelaide.
Correspondence to: Yuh Hasegawa, School of Life Dentistry at Niigata, The Nippon Dental University, 1-8, Hamaura-chou, Chuuou-ku, Niigata-city, Niigata, Japan 951-8580. E-mail: haseyu@ngt.ndu.ac.jp

Carabelli trait is expressed on the mesio-palatal surface of human maxillary molar crowns, particularly primary second and permanent first molars, and the feature shows a quasi-continuous pattern of expression (Harris, 1977). Investigations of Carabelli trait, one of many so-called nonmetric dental crown traits, are usually based on classifying or scoring the feature with reference to standard plaques, leading to calculations of its frequency of occurrence and degree of expression. Although most investigations of Carabelli trait have used standard plaques, there are distinct differences in reported frequencies of the trait in similar population groups, probably resulting more from observational inconsistencies than true variation (Scott, 1980). Misclassification of the trait is further compounded by the relatively large number of different classification methods available to the researcher (Kieser and Merwe, 1984). Recently, the effect of inter-observer errors was reported when using dental morphological features to calculate genetic distances in ancient Mayans, with different 'cut points' for determining presence and absence of traits, such as Carabelli trait, influencing the outcomes of the analyses (Cucina and Wrobel, 2008). Although Carabelli trait has been studied extensively within and among human populations, there is still uncertainty about the validity of the different methods of classification, including which is the most suitable to use in primary and permanent dentitions.
Twin studies provide a valuable approach for clarifying the relative contributions of genetic and environmental effects to phenotypic variability (Eaves, 1982; Townsend et al., 2009). Indeed, a study of Carabelli trait in the permanent dentition of South Australian twins indicated a very strong genetic contribution to observed variation, with an estimate of heritability around 90% (Townsend and Martin, 1992). Pinkerton et al. (1999) extended this earlier investigation by analysing the expression of Carabelli trait in both the primary and permanent dentitions of a large sample of Australian twins, highlighting the importance of genetic influences on Carabelli trait variation and disclosing patterns of variation in trait expression between dentitions.
The aim of our present study was to explore the reliability and validity of two methods for classifying Carabelli trait (Hanihara, 1961; Dahlberg, 1963), by scoring the feature in both primary and permanent dentitions of a sample of Australian twins. By examining trait expression within and between the dentitions of monozygotic (MZ) twin pairs we also aimed to gain some insight into the underlying causes of observed variation, and to clarify which phenotypic forms of Carabelli trait might be more closely related in terms of their ontogeny. Given the strong genetic influence on variation in Carabelli trait, it was hypothesized that any differences in phenotypic expression of the trait within and between MZ co-twins would tend to be small, reflecting environmental and/or epigenetic influences operating during odontogenesis. By comparing classifications or scores for Carabelli trait using the different methods, we aimed to shed light on the validity of the systems, including their ability to reflect ontogenetic processes.

MATERIALS AND METHODS
A total of 200 sets of dental casts, representing 50 pairs of monozygotic (MZ) and 50 pairs of dizygotic (DZ) twins, were examined and scored for Carabelli trait, using the systems of Hanihara (1961) and Dahlberg (1963). The dental casts were selected from a collection of over 600 pairs housed in the School of Dentistry at the University of Adelaide. The twins were all of European ancestry and their ages ranged from 8.3 years to 11.5 years, with a mean age of 9.5 years. Zygosities were confirmed either by comparison of genetic markers in the blood or by DNA analysis of buccal cells (Townsend and Martin, 1992). The probability of monozygosity, given concordance for all the systems that were analysed, was greater than 99.0%. The ongoing study of teeth and faces of Australian twins was approved by the Committee on the Ethics of Human Experimentation, The University of Adelaide (Approval No. H/07/84A), and all participants have provided informed consent.
The methods of Hanihara (1961) and Dahlberg (1963) were used to classify Carabelli trait on primary maxillary second molars and permanent maxillary first molars, respectively. Plaster replicas of the standard plaques provided by Dahlberg (1956) and Hanihara (1961) were used to facilitate standardization in scoring. Dahlberg originally produced two plaques, P12A and P12B, with the 'P' denoting 'preliminary' (Figs. 1a,b). The former has been commonly used for categorizing the size of Carabelli trait in the permanent dentition, whereas the latter, less well-known plaque was designed to highlight groove-cusp morphology, following on from descriptions by Meredith and Hixon (1953). Dahlberg created plaque P12B with the intention of evaluating pits and other surface irregularities found at the sites commonly occupied by Carabelli cusp. He suggested that, for future reference, pits and grooves should be counted as features relative to Carabelli trait, and that plaque P12B might be used to provide a limited guide to the trait's development (Dahlberg, 1956).
In this study, plaque P12A was used for assessing intra- and inter-observer reliability in scoring Carabelli trait, and plaque P12B was used to provide additional insights into variability in trait expression in selected pairs of MZ twins. Although Dahlberg stressed his plaque P12B should not be used to define classes of Carabelli trait, he emphasized that pits and grooves should be noted in addition to cuspal forms. Hanihara's Plaque D7 was also used to score Carabelli trait, with the 'D' referring to 'deciduous' (Fig. 2). It presents eight categories of Carabelli trait and has been used to interpret the relationship between pit and cuspal forms in the primary dentition.
Assessments were made by one observer for all subjects on two separate occasions, enabling an estimation of the intra-observer reliability of both methods to be made. Two broad categories, referred to as 'concavities' and 'convexities', were used to compare intra-observer concordance rates using the methods of both Hanihara and Dahlberg. The 'concavities' category included scores 0 to 3 in Hanihara's system and categories 'a-d' in Dahlberg's P12A system. The 'convexities' category included scores 4 to 7 in Hanihara's system and categories 'e-g' in Dahlberg's P12A system. In his analysis of the American Indian dentition, Dahlberg (1963) grouped the 'b' and 'c' categories together to represent various types of grooves and pits, and then combined the categories 'd' to 'g' to represent all sizes of cusps. We have chosen to include category 'd' as a 'concavity' for the purposes of our reliability tests, reflecting the presence of two grooves or furrows.
To assess inter-observer reliability, ten pairs of twins were selected at random and classified for Carabelli trait by three observers using the methods of Dahlberg and Hanihara. These three observers had different amounts of experience in classifying Carabelli trait. Observer A was a person with considerable experience, observer B had one year of experience, and it was the first time that observer C had scored Carabelli trait. After the observations had been made, inter-observer concordance rates between the three observers were calculated. Chi-square tests were also performed to compare the scoring of Carabelli trait between methods, with statistical significance set at an alpha level of 0.05.
After assessing reliability, Carabelli trait was re-examined in all pairs of MZ twins where co-twins showed discordant expression of the feature by referring to both of Dahlberg's plaques, P12A and P12B, as well as Hanihara's D7 plaque. Given the recognized strong genetic contribution to variation of the trait, it was considered that close examination of those MZ twin pairs who showed different degrees of expression of the feature on primary and permanent teeth, or between sides, would provide additional insights into the validity of the scoring systems and also into the underlying biological processes leading to the observed phenotypes.

RESULTS
Table 1 shows the intra-observer concordance rates for scoring Carabelli trait on two separate occasions for primary second molars. Values ranged from around 70% to 90%, reflecting good intra-observer reliability. A significant difference in concordance rates between the scoring methods was noted for 'concavities' in the DZ sample. In the 'convexities' category there was a significant difference in concordance rates between the methods for MZ twins and for the total sample (Table 1). Table 2 shows the concordance rates between first and second assessments for permanent first molars. Values ranged from 75% to 85%. No significant differences in either the 'concavities' or 'convexities' categories were found between the methods.
Table 3 indicates the inter-observer concordance rates among the three observers for scoring Carabelli trait on primary second molars and Table 4 provides similar data for permanent first molars. The concordance rates were generally low, highlighting that inter-observer reliability for scoring was relatively poor. Using the method of Hanihara, the concordance rate between observer A and C was highest, followed by the rate between observer B and C, and the rate between observer A and B was lowest for both primary second molars and permanent first molars. Using Hanihara's method, the concordance rate between observer A and C was 65% for primary second molars and 40% for permanent first molars. The concordance rates between observer B and C, and between observer A and B, were around 35% for both primary second molars and permanent first molars.
Using the method of Dahlberg with plaque P12A, the concordance rate between observer B and C was 62.5% and the rates between observer A and B and between observer A and C were each 47.5% for primary second molars. For permanent first molars, the concordance rate between observer A and B was 47.5% and the rate between observer A and C was 45%, but the rate between observer B and C was only 35%. There were differences between observers regarding the interpretation of what constituted a groove or an eminence in both Hanihara's and Dahlberg's systems. The observers also had difficulty in classifying both the pit and Y-shaped categories using Hanihara's system, and there were differences in interpretation between the groove, Y-shaped, and cuspal grades in Dahlberg's system.
Where there were differences in classification or scoring of Carabelli trait within or between MZ co-twins, the differences tended to be small, as we had hypothesized. By examining closely the cases where there were differences between sides or between primary and permanent dentitions within an MZ twin, or differences between MZ co-twins, we were able to gain some insight into the ability of the different classification systems to reflect the phenotypic variation observed, and also to clarify how each category or score related to others. Figures 3 and 4 represent two pairs of MZ twins who were selected because they showed discordant expressions of Carabelli trait that assisted in considering the validity of the Dahlberg and Hanihara systems. Table 5 shows the categories and scores for the trait, based on Dahlberg's plaques P12A and P12B, and also using Hanihara's plaque, for both the primary second molars and the permanent first molars in these two pairs of twins. The results provided in Table 5 were obtained by three observers each scoring the feature independently, then reaching a consensus on which category or score best matched the phenotypic expressions observed.
It can be seen that there were differences in expression both within and between the twin pairs. For example, the primary and permanent molars for T331A were all scored as category 'b' according to Dahlberg' Table  5. Similarly, there were differences in the categories and scores recorded for twins T338A and B. In these cases, the expression of Carabelli trait was greater on the primary molars than the permanent teeth, and there were also differences in expression between sides and between co-twins. The reader is encouraged to view the figures carefully and then to score the different teeth in both sets of twins. It becomes evident that the different phenotypic forms of Carabelli trait do seem to be linked to each other but there are many forms of the feature that are difficult to classify with any certainty.

DISCUSSION
The method of Dahlberg (1963) has been used commonly by many researchers to classify Carabelli trait on permanent first molars, although there have been numerous scoring methods developed over the years, including Shapiro's (1949) nine-grade classification, Goose and Lee's (1971) five-grade classification and Alvesalo et al.'s (1975) five-grade classification. Currently, the most widely used method for classifying Carabelli trait in the permanent dentition is The Arizona State University Dental Anthropology System devised by Christy G. Turner and his colleagues (Turner et al., 1991). This method is based on Dahlberg's plaque P12A but the categorical classification system of Dahlberg has been replaced by a numerical system from 0 to 7. The categories and the scores match reasonably well, although scores 3 and 4 in Turner's system refer to small and large Y-shaped depressions, whereas categories 'd' and 'e' on Dahlberg's plaque P12A represent a double groove and a Y-shaped groove, respectively.
Dahlberg's P12A plaque includes absence and seven degrees of expression of Carabelli trait, ranging from a single groove (or so-called 'furrow'), a pit, a double groove, a Y-shaped groove, to various sizes of cusps. In this scheme, categories 'f' to 'h' represent increasing sizes of cusp. However, his P12B plaque does not address any size sequence; rather, it considers pit-groove relationships. Although this plaque does not appear to have been used very widely in the past, it did assist the observers in this study to focus on the inter-relationship among pits, furrows and grooves, and cusps of various sizes. In cases where Carabelli trait was difficult to categorize, reference to P12B provided additional guidance in deciding which category to choose. Although Dahlberg's method was developed for the permanent dentition, it has been used to score Carabelli trait in both the primary and mixed dentitions (Pinkerton et al., 1999) with additional reference to the plaque of Hanihara (1961).
As this study progressed it became clearer that there were some discrepancies in the expression of Carabelli trait between the primary and permanent dentitions. The primary molars tended to display a higher frequency of Y-shaped groove forms, whereas cuspal forms were more common in the permanent dentition. This finding has been reported previously by other researchers (Saunders and Mayhall, 1982; Pinkerton et al., 1999; Adler, 2006). Kieser (1984) examined the expression of Carabelli trait on primary and permanent molars and reported a high degree of equivalence of expression of Carabelli trait in both dentitions. He hypothesized that this result was consistent with low epigenetic but high genetic influence on Carabelli trait expression. We have noted previously that, if the trait appears on the permanent first molar of an individual, it is almost always present on the primary second molar. However, if the trait appears on the primary molar, it may not be expressed on the permanent molar. Consistent with Kieser's view, we have interpreted this finding as reflecting similar underlying genetic influence for Carabelli trait in both dentitions, with environmental and/or epigenetic influences being more likely to modify trait expression on the permanent molar that forms later and develops over a longer period of time (Townsend and Brown, 1981).
The plaque D7 of Hanihara was designed specifically to score Carabelli trait in the primary dentition and, therefore, some limitations were noted when attempting to use it to score different convexity categories in the permanent teeth. Interestingly, Hanihara's description of his system does not refer to Y-shaped grooves specifically, rather the term 'depression' is used. Nevertheless, the examples of depressions provided on Hanihara's plaques do have a characteristic Y-shaped appearance. Dahlberg's P12A system provides a comprehensive categorization of the cuspal categories of the trait but it does not address the peculiarities of the various pit/groove relationships to any extent. For example, it is often difficult to decide whether a short groove that ends in a deeper depression should be classed as a groove or a pit. It is also often difficult to determine whether double grooves lie either side of a slight elevation that would warrant a cuspal classification. Similarly, Y-shaped grooves may or may not be associated with a convexity of the lingual surface of the tooth.
Despite these difficulties, it appears that an acceptable level of intra-observer reliability can be reached for scoring Carabelli trait using the methods of either Dahlberg or Hanihara. We achieved concordance values in the range of 70-90%. Observers tend to develop their own internal calibration for classifying difficult examples of the trait that is based on their interpretation of the system of classification being used. It would appear that it is probably best to use the Dahlberg system when classifying Carabelli trait in the permanent dentition and the Hanihara system in the primary dentition, while acknowledging that each method has its limitations. However, the level of inter-observer reliability was very low whichever method was used in either dentition.
Our concordance values were in the range of only 35-60%. This finding reinforces the view that considerable caution is needed when making comparisons of data for Carabelli trait derived from different samples by different researchers.
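As an illustration only, concordance rates of the kind reported here could be computed and compared along the following lines. This is a minimal sketch under stated assumptions: the function names, the example scores, and the use of an uncorrected Pearson chi-square on a 2 x 2 table are hypothetical choices made for the example, since the exact computations used in the study are not described in detail. The split of Hanihara scores into 'concavities' (0 to 3) and 'convexities' (4 to 7) follows the Materials and Methods.

```python
from itertools import combinations
from math import erfc, sqrt


def to_broad_category(hanihara_score):
    """Collapse a Hanihara score (0-7) into 'concavity' (0-3) or 'convexity' (4-7)."""
    if not 0 <= hanihara_score <= 7:
        raise ValueError("Hanihara scores are expected to lie between 0 and 7")
    return "concavity" if hanihara_score <= 3 else "convexity"


def concordance_rate(scores_1, scores_2):
    """Percentage of teeth that received the same score in two sets of observations."""
    if len(scores_1) != len(scores_2) or len(scores_1) == 0:
        raise ValueError("both score lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(scores_1, scores_2))
    return 100.0 * matches / len(scores_1)


def pairwise_concordance(scores_by_observer):
    """Concordance rate for every pair of observers, keyed by (observer, observer)."""
    observers = sorted(scores_by_observer)
    return {
        (o1, o2): concordance_rate(scores_by_observer[o1], scores_by_observer[o2])
        for o1, o2 in combinations(observers, 2)
    }


def chi_square_2x2(agree_1, total_1, agree_2, total_2):
    """Pearson chi-square (1 df, no continuity correction) comparing two
    concordance proportions, e.g. one per scoring method (an assumed design)."""
    table = [[agree_1, total_1 - agree_1], [agree_2, total_2 - agree_2]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("expected count of zero; test is undefined")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical Hanihara scores for eight teeth from three observers (not study data).
example_scores = {
    "A": [0, 1, 2, 4, 5, 3, 0, 7],
    "B": [0, 2, 2, 4, 6, 3, 1, 7],
    "C": [0, 1, 3, 4, 5, 3, 0, 6],
}
print(pairwise_concordance(example_scores))

# Agreement on the broad categories only, for observers A and B.
broad_a = [to_broad_category(s) for s in example_scores["A"]]
broad_b = [to_broad_category(s) for s in example_scores["B"]]
print(concordance_rate(broad_a, broad_b))

# Hypothetical counts: 30 of 40 concordant with one method, 20 of 40 with the other.
print(chi_square_2x2(30, 40, 20, 40))
```

Collapsing scores into the two broad categories before comparison will generally raise agreement, because disagreements between adjacent grades within the same broad category no longer count as discordant.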
For studies of the mixed dentition, where a uniform system of classifying Carabelli trait on both primary and permanent molars is desirable, it is suggested that a modified system could be used that draws on the methods of both Hanihara and Dahlberg. It is interesting that the Arizona State University (ASU) system for classifying Carabelli trait in the permanent dentition is slightly different from the system proposed originally by Dahlberg, with the 'double groove' category of Dahlberg replaced by a 'Y-shaped groove' category (Turner et al., 1991; Dahlberg, 1963). Even though it was developed for the permanent dentition, the ASU system, with its use of scores rather than categories and its modification of the original Dahlberg system, provides an additional very useful perspective for attempting to classify the range of expression of Carabelli trait in both dentitions.
Although distinguishing and classifying minor differences in phenotypic expression of Carabelli trait may not be as important in population-based anthropological studies as deciding whether the trait is present or not, we contend that fine discrimination in phenotypic expression is desirable in genetic studies and also in clarifying ontogenetic processes. We would propose for these types of studies that all available reference sources should be considered, including Dahlberg's plaque P12B, to assist in describing and then recording the rather complex inter-relationships between grooves and cusps.
The variations in expression of Carabelli trait demonstrated in the two pairs of MZ twins reported in this paper highlight the wide range of expressions of the trait that are possible and confirm that no single scoring system is likely to be able to capture all possible phenotypic forms. The two examples we have provided also support the view that, despite a strong over-riding genetic influence on observed variation, relatively minor modifications in environmental and/or epigenetic influences within or between co-twins can apparently lead to different phenotypic expressions in Carabelli trait.
The types of expressions of Carabelli trait observed within the MZ co-twins, particularly in terms of the expression of different groove forms, confirm that there is an inter-relatedness between groove forms and cuspal forms of the trait. Our findings in twins suggest that increasing expression of Carabelli trait follows a continuum from simple grooves, to pits, to double grooves, to Y-shaped grooves, and then to cusps of various sizes, in a similar order to that represented in Dahlberg's plaque P12A. Even though Carabelli trait has probably been studied by dental anthropologists more than any other dental feature, there is still much to learn about the nature of the ontogenetic mechanisms that lead to its various expressions on primary and permanent molar teeth. We would strongly encourage researchers who are planning to study Carabelli trait to refer to the plaque of Hanihara and plaque P12B of Dahlberg prior to commencing any study, as these earlier, often over-looked works provide valuable insights into the rationale and limitations of the classification systems used most commonly nowadays, for example, the ASU system, which is based on Dahlberg's plaque P12A.
One area that deserves further exploration is comparison of the expression of Carabelli trait on the external surface of dental crowns with its expression at the dentino-enamel junction, a structure that reflects the folding of the internal enamel epithelium of the developing tooth. Researchers such as Kraus (1952), Korenhof (1963), Sasaki and Kanazawa (1999), Avishai et al. (2004) and Skinner et al. (2009) have all explored the morphology of the dentino-enamel junction using different approaches. We plan to extend these studies by applying micro-CT scanning to exfoliated molar teeth of MZ twins where there are differences in phenotypic expression of Carabelli trait within and between co-twins.
In conclusion, we would like to reiterate the comments of Mayhall (1999) who emphasized the need for "more and better genetic studies" of dental morphological traits and the need to improve our understanding of "why the traits we observe are as they appear."