Comparison of the reproducibility of two cervical vertebrae maturation methods

Aim: Facial orthopaedic treatments based on the stimulation or restriction of craniofacial bone growth are more effective when carried out during the pubertal growth spurt. The aim of this cross-sectional study was to evaluate the reproducibility of two cervical vertebrae maturation (CVM) methods applied with manual tracing and direct visual inspection. Methods: A sample of 60 lateral cephalometric radiographs (10 for each of the 6 CVM stages) was randomly selected from 171 records. Five orthodontists classified these radiographs by skeletal maturation stage according to the methods published in 2002 and 2005, applying both methods by direct visual inspection and by manual tracing. Results: The average reliability of both methods and both forms of evaluation was substantial. Direct visual inspection showed the highest interexaminer reliability and agreement values for both methods, as well as the highest intraexaminer values. Conclusion: The reproducibility of the CVM method was substantial, supporting its clinical use to determine skeletal maturity and the ideal moment for treatment.


Introduction
Facial orthopaedic treatments based on the stimulation and/or restriction of craniofacial bone growth are more effective when carried out during the pubertal growth spurt, as during this period, the facial bone structures perform at maximum capacity in response to stimuli offered by applied mechanics 1,2 . Therefore, the determination of skeletal maturation is widely used, as the chronological age has been considered a parameter of little reliability to assess the craniofacial development stage of the subject 1,3-5 .
The degree of skeletal maturation of craniofacial bones can be determined by the hand-wrist x-ray or by the evaluation of cervical vertebrae maturation 6-8 . The hand-wrist x-ray evaluation is considered the gold standard, since it allows a great number of ossification centres, whose development is closely related to that of the whole skeleton, to be evaluated within a small area 6 . However, the evaluation of the cervical vertebrae maturation stage has the advantage of reducing costs and the patients' exposure to X-rays 9,10 , as the cephalometric radiograph is part of the initial orthodontic documentation 5,8,10,11 .
In 2002, Baccetti et al. 12 published a modified version of the method of determining skeletal maturity from the analysis of the cervical vertebrae. With five CVM stages, this method made it possible to determine skeletal maturation from the C2, C3 and C4 vertebrae using only one cephalometric radiograph. Later, in 2005 13 , the same authors presented an improved version of the method with six CVM stages, which allowed the clinician to identify the ideal moment for the treatment of dentoskeletal disharmonies.
According to Gabriel et al. 7 (2009), the clinical usage of CVM analysis must be conditioned to its accuracy and reproducibility. Although some studies have reported reproducibility levels over 90% 6,14-16 , they show some methodological flaws that interfere directly with these findings 7 .
Cunha et al. 17 (2018) found no significant difference between the reproducibility of skeletal maturity evaluation through hand-wrist x-rays and through cervical vertebrae x-rays, and both methods were considered useful for clinical planning. A systematic review by Santiago et al. 18 (2012) showed that the level of scientific evidence on the reliability of CVM for predicting the pubertal growth spurt is low because of the small number of studies on the subject, even though some studies report a good correlation between both methods and considerable levels of reproducibility.
The aim of this study was to evaluate the reproducibility of two cervical vertebrae maturation (CVM) methods applied with manual tracing and direct visual inspection. The study hypothesis is that the reproducibility of the two methods is sufficient to determine the skeletal maturity of young patients.

Materials and Methods
Initially, in this cross-sectional study, pre-treatment records of 171 subjects aged 7 to 18 years and treated at the Federal University of Juiz de Fora were selected, regardless of sex. Eligible subjects did not present: 1) a history of facial, hand or wrist trauma; 2) congenital or acquired malformations affecting the cervical vertebrae, hand or wrist; or 3) any syndrome or hormonal alteration associated with developmental alterations. Moreover, subjects had to have good-quality hand-wrist and lateral cephalometric x-rays taken on the same date. This study was approved by the Ethics Committee of the Federal University of Juiz de Fora (comment number: 2.634.344).
In order to guarantee the homogeneity of the sample, the hand-wrist x-rays were evaluated by a single researcher (P.H.R.D.) and the subjects were classified into one of the 11 stages of Fishman's method 19 . Subjects were also classified into CVM stages based on the correlation between this method and the stages established by Fishman 19 , as suggested by Hassel and Farman 14 (1995) (Table 1). The sample was composed of 10 randomly selected subjects from each of the 6 CVM stages (Table 1), for a total of 60 subjects.
The lateral cephalometric x-rays were evaluated and subjects were classified by skeletal maturation stage using the qualitative methods proposed by Baccetti et al. in 2002 12 (method 1) and in 2005 13 (method 2). The methods were applied through direct visual inspection and through manual tracing of the C2, C3 and C4 vertebrae using a 0.5 mm mechanical pencil (Faber-Castell®, Stein, Germany) on an acetate sheet (Orthometric, Franklin, USA).
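The stratified random selection described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the subject IDs, the seed and the per-stage counts are hypothetical (the counts merely sum to 171 and are not the Table 1 values), and the source does not say which tool was used for the draw.

```python
import random


def stratified_sample(stage_by_subject, per_stage, seed=None):
    """Draw `per_stage` subjects at random from every CVM stage."""
    rng = random.Random(seed)
    by_stage = {}
    for subject, stage in stage_by_subject.items():
        by_stage.setdefault(stage, []).append(subject)
    sample = {}
    for stage in sorted(by_stage):
        candidates = sorted(by_stage[stage])
        if len(candidates) < per_stage:
            raise ValueError(f"stage {stage} has only {len(candidates)} subjects")
        sample[stage] = rng.sample(candidates, per_stage)
    return sample


# Hypothetical pool of 171 subject IDs over the 6 CVM stages.
# These counts are invented for illustration; they are NOT the Table 1 values.
hypothetical_counts = {1: 30, 2: 25, 3: 28, 4: 31, 5: 27, 6: 30}
pool = {}
next_id = 1
for stage, count in hypothetical_counts.items():
    for _ in range(count):
        pool[next_id] = stage
        next_id += 1

selected = stratified_sample(pool, per_stage=10, seed=2015)
```

Drawing the same number of radiographs from each stage keeps stages that are easier or harder to identify from being over-represented in the sample.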
The four evaluations (methods 1 and 2, with direct visual inspection and manual tracing) were carried out between June and August 2015, by 5 orthodontists without previous experience with the methods. Manual tracing was carried out by the orthodontists at each evaluation stage. Immediately before the evaluations, the examiners were trained on applying the methods through an expository lesson given by a dental surgeon, specialized in Radiology and with methodological experience. All examiners were trained together.
In a dark room, the lateral cephalometric x-rays were placed on a constant source of white light (negatoscope) and covered with a black sheet of paper, with a grammage of 300 g/m², that had a rectangular cut-out in the centre, allowing only the cervical vertebrae to be seen. The positioning of the lateral cephalometric radiographs was performed by a single researcher (P.H.R.D.). During the evaluations, templates of methods 1 and 2 were provided to the examiners, and access to participant information (age, gender or dentition images) was not permitted.
Table 1. Distribution of the 171 preselected subjects according to skeletal maturation stage by the methods of Fishman 19 and Hassel and Farman 14 .
The evaluation of the 60 lateral cephalometric radiographs by the 5 examiners was done at two different moments (T1 and T2), six weeks apart; at T2 the order of the radiographs was randomly changed and the training was repeated for all examiners. Therefore, 1200 evaluations were carried out at each moment (60 radiographs × 5 examiners × 2 methods × 2 evaluation forms), 600 for each method.

Statistical analysis
The reliability of the methods, applied by direct visual inspection and by manual tracing, was assessed with the weighted kappa test, yielding intra- and interexaminer coefficients. Reliability was considered moderate when kappa values were between 0.41 and 0.60, substantial when between 0.61 and 0.80, and excellent when 0.81 or above 20 . Intraexaminer disagreements were evaluated according to the number of CVM stages separating the T1 and T2 evaluations of each examiner. The analyses were done using SPSS Statistics 23 (IBM, Chicago, USA), with a significance level of 0.05. The sample power was determined using the pwr package for R.
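As a minimal sketch of the reliability measure described above, the following computes a weighted kappa for two sets of ordinal CVM classifications and maps it to the interpretation bands given in the text. The source does not state whether linear or quadratic weights were used, so linear weights are an assumption here, and the function names and example ratings are hypothetical. The analyses in the study were run in SPSS, not with this code.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories, power=1):
    """Weighted kappa for two rating lists coded 1..n_categories.

    power=1 gives linear disagreement weights (assumed here),
    power=2 gives quadratic weights.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    k = n_categories
    # Observed joint proportions of (stage in A, stage in B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a - 1][b - 1] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - weighted_observed / weighted_expected


def interpret(kappa):
    """Bands from the text; values below 0.41 are not named there."""
    if kappa >= 0.81:
        return "excellent"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    return "below moderate"


# Hypothetical T1 and T2 classifications of six radiographs by one examiner.
t1 = [1, 2, 3, 4, 5, 6]
t2 = [2, 2, 3, 4, 5, 6]
kappa = weighted_kappa(t1, t2, n_categories=6)
```

Because the weights grow with the distance between stages, a disagreement of one stage lowers the coefficient less than a disagreement of three or four stages.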

Results
The power of the sample (n=60) for this study was 73.5% (1-β = 0.735), with a type II error (β) of 0.265. A minimum effect size of 0.30 and β/α = 1 were assumed. Table 2 shows the intraexaminer reliability between T1 and T2 for both methods. Reliability values varied little between methods 1 and 2 within each evaluation form (manual tracing and direct visual inspection). Across all examiner, method and evaluation-form combinations, approximately 5% of the coefficients indicated excellent reliability, 75% substantial and 20% moderate; all averages were substantial. Most intraexaminer disagreements between T1 and T2 involved a difference of one CVM stage, in both evaluation forms of methods 1 and 2 (Table 3). Almost all disagreements (above 94%) involved a difference of up to two CVM stages. Although manual tracing with method 1 had the lowest average intraexaminer reliability (Table 2), it showed the highest proportion of disagreements (87.4%) confined to a single CVM stage (Table 3).
The comparison between the CVM classifications of the five examiners resulted in substantial reliability, varying between 0.62 and 0.70, as seen in Table 4. The highest percentage of interexaminer agreement (67.50%) was observed for method 1 with direct visual inspection and the lowest (51.83%) for method 2 with manual tracing. Direct visual inspection showed the highest values of interexaminer reliability and agreement for both methods (Table 4), as it did in the intraexaminer evaluation. Manual tracing showed an increase in reliability and agreement from T1 to T2, while visual inspection showed a reduction for methods 1 and 2 (Table 4).
Most disagreements between examiners (above 75%) involved a difference of one CVM stage (Table 4). For all classification forms, the percentage of one-stage disagreements increased from T1 to T2. However, except for method 2 with visual inspection, the combined share of one- and two-stage disagreements decreased from T1 to T2 in all situations, which indicates an increase in the most discrepant disagreements (three or four CVM stages).
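The breakdown of intraexaminer disagreements by number of CVM stages can be tallied as in the sketch below. The function name and the example classifications are hypothetical and are not taken from Table 3.

```python
from collections import Counter


def disagreement_profile(t1_stages, t2_stages):
    """Percentage of disagreements by absolute CVM stage difference.

    Only radiographs classified differently at T1 and T2 are counted.
    """
    if len(t1_stages) != len(t2_stages):
        raise ValueError("T1 and T2 must cover the same radiographs")
    diffs = [abs(a - b) for a, b in zip(t1_stages, t2_stages) if a != b]
    if not diffs:
        return {}
    counts = Counter(diffs)
    return {d: 100.0 * c / len(diffs) for d, c in sorted(counts.items())}


# Hypothetical classifications of eight radiographs by one examiner.
t1 = [1, 2, 3, 4, 5, 6, 3, 4]
t2 = [1, 3, 3, 5, 5, 4, 2, 4]
profile = disagreement_profile(t1, t2)
# Three of the four disagreements differ by one stage, one by two stages.
```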
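The source does not define how the percentage of agreement among five examiners was calculated. One common choice, assumed here purely for illustration, is the mean of the exact-agreement percentages over all pairs of examiners. The function name and the ratings are hypothetical.

```python
from itertools import combinations


def mean_pairwise_agreement(ratings_by_examiner):
    """Mean percentage of identical classifications over all examiner pairs."""
    pairs = list(combinations(sorted(ratings_by_examiner), 2))
    if not pairs:
        raise ValueError("at least two examiners are required")
    total = 0.0
    for first, second in pairs:
        a = ratings_by_examiner[first]
        b = ratings_by_examiner[second]
        if len(a) != len(b) or not a:
            raise ValueError("examiners must rate the same radiographs")
        total += sum(x == y for x, y in zip(a, b)) / len(a)
    return 100.0 * total / len(pairs)


# Hypothetical CVM stages assigned by three examiners to four radiographs.
ratings = {
    "A": [1, 2, 3, 4],
    "B": [1, 2, 3, 5],
    "C": [1, 2, 4, 5],
}
agreement = mean_pairwise_agreement(ratings)
```

Unlike the weighted kappa, this percentage treats a one-stage and a four-stage disagreement alike and does not correct for chance agreement.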

Discussion
The application of cervical vertebrae maturation (CVM) analysis as a method to determine the skeletal development stage must be conditioned on its accuracy and reproducibility, so that it allows identification of the period in which the craniofacial bones respond most effectively to facial orthopaedic treatments 1,2 . However, the low reliability of the CVM method in identifying bone development stages 5 and methodological flaws in some studies 21 raise doubts concerning its clinical applicability.
To reduce the possibility of the sample containing a disproportionate number of subjects at specific CVM stages whose identification might be either easier or more difficult, in the present research we first determined skeletal maturity through the Fishman method 19 and its correlation with the CVM method 14 , and only then selected the participants in a random and homogeneous way for each of the stages.
The segregation of the sample in CVM stages based on the hand-wrist radiograph determination of skeletal maturation was considered possible because high levels of correlation between the methods were reported 15,16,21,22 .
In the present research, all examiners used templates for consultation during the evaluations 7 , and they were shown all lateral cephalometric radiographs in the same session, immediately after training, to prevent any confusion between methods or between stages.
The clinical and scientific validity of the CVM method is directly related to its reproducibility among different examiners. For the method to be applied clinically, professionals need to reach a consensus in its determination. However, according to Cericato et al. 23 (2015), the majority of studies evaluating these methodologies do not include interexaminer tests, which compromises the level of scientific evidence.
The inter- (0.62-0.70) and intraexaminer (0.64-0.67) reliability values obtained in the present research are lower than other findings in the literature, which report reliability coefficients between 0.85 and 0.98 16,21,22 . Nevertheless, most of these studies did not use rigorous statistical evaluations specific to ordinal data 7 .
Considering method 2 through direct visual inspection, Gabriel et al. 7  Moreover, the use of Kendall's W test to determine the reliability may have introduced some inaccuracies into the results, since this test is indicated for comparisons between up to 2 examiners 8 , so the weighted Cohen's kappa test is more appropriate.
Although Gabriel et al. 7 (2009) adopted a shorter interval between evaluations (2 weeks) and provided one template of the method to the examiners at the moment of the evaluation, which could increase the reproducibility of the method, their indices of inter- and intraexaminer agreement and reliability were lower than those of the present research. This may be associated with a lack of standardization of the intensity, sharpness and contrast of the radiographic images and of the ambient lighting during evaluation, as their images were available in digital format and not printed, as in this research, which may have influenced the examiners' perception and interpretation.
As in other studies 24,25 , in the present research reliability was determined through the weighted kappa coefficient for the intra- and interexaminer evaluations. This coefficient takes into account not only the percentage of agreement between the evaluations but also the degree of inconsistency among disagreements 7 , and so characterizes the reproducibility of the method more fully. This explains the difference between the agreement percentages and the reliability values (kappa coefficients) obtained for each method 12,13 and evaluation form (direct visual inspection and manual tracing) used in the present research.
Another factor that may have contributed to the divergence between the results of the present research and those of Gabriel et al. 7 (2009) was the homogeneous distribution of the lateral cephalometric radiographs across the different CVM stages. When cephalometric radiographs of different CVM stages do not have equal chances of being selected for the sample, or are not evenly represented, a selection bias can be introduced because stages that are easier or more difficult to identify may be over-represented.
Other studies described higher values of agreement among examiners using method 2. However, they used questionable means to evaluate the reproducibility of the classification. Wiwatworakul et al. 25 (2015) reported an average interexaminer agreement of 96.6%, although they used only two examiners, which makes identical classifications more likely. Perinetti et al. 24 (2014) found interexaminer reliability between 0.81 and 0.82, although, before the CVM classification, their examiners had been trained in the method until they identified 75% of cases correctly, which calibrated them in advance.
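To make the distinction between percentage agreement and the weighted coefficient concrete, the general form of the weighted kappa can be written as below. The symbols are introduced here for illustration: p_{ij} is the proportion of radiographs classified as stage i in one evaluation and stage j in the other, p_{i.} and p_{.j} are the marginal proportions, and k is the number of stages. The linear disagreement weights shown are an assumption, since the weighting scheme is not stated in the text.

```latex
\kappa_w = 1 - \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, p_{ij}}
                    {\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\, p_{i\cdot}\, p_{\cdot j}},
\qquad
w_{ij} = \frac{|i-j|}{k-1}
```

The percentage of agreement uses only the diagonal terms, 100 \sum_i p_{ii}, so two sets of evaluations with the same percentage of agreement can yield different \kappa_w values when their disagreements span different numbers of stages.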
The results show that direct visual inspection, despite presenting higher agreement and reliability values, had a greater prevalence of disagreements of two or more CVM stages than manual tracing. A possible explanation is that, once the tracing is finished, the shape of the vertebra is judged from a defined contour, and the examiner no longer needs to mentally redraw the vertebral limits from the radiographic image.
According to the present findings, method 1 performed better in terms of interexaminer agreement and reliability in the direct visual inspection evaluation. This may be due to the additional stage of method 2, which is identified by the presence of a concavity in the inferior edge of C2. According to the examiners of the present research, this characteristic generated considerable doubt during classification. In contrast, Sohrabi et al. 8 (2016) and Nestman et al. 26 (2011) reported higher reproducibility when determining the concavity of the inferior edge of the C2, C3 and C4 vertebrae than when determining their general form. However, regardless of the characteristic of the cervical vertebrae being evaluated, the lower number of stages in method 1 compared with method 2 reduces the possibility of disagreement among evaluations.
We acknowledge a limitation in the clinical application of the present results regarding the training received by the orthodontists before CVM classification. The training was needed so that the results would not be affected by differences in the examiners' knowledge, although we understand that access to specific CVM training may be difficult for orthodontists.
In conclusion, the methods of determining CVM published by Baccetti, Franchi and McNamara in 2002 and 2005 presented substantial reproducibility both for direct visual inspection and for manual tracing of the cervical vertebrae. The analysis through direct visual inspection presented higher values of reliability and agreement when compared with the manual tracing.