An IRB-approved retrospective review of all patients with a diagnosis of HFM and a preoperative 3DCT was performed. A gold-standard score based on consensus among surgeons at our institution stratified the population into Mild (0-1), Moderate (2a), and Severe (2b-3). The clinical KPC score was used as a second comparison standard. 3DCT scans were evaluated by surgeons and rated according to the KPC. Percent agreement was compared between these standards and the scores of our raters. ANOVA was used to test for statistical significance.
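As an illustration of the percent-agreement comparison described above, the following is a minimal sketch, not the study's actual analysis code. The function name `percent_agreement`, the dictionary layout, and the patient and rater labels are all hypothetical, and the toy values are invented for demonstration only; the sketch assumes that agreement is counted at the level of the severity group (Mild, Moderate, Severe) and pooled across raters.

```python
from collections import defaultdict

# Stratification stated in the text: Mild (0-1), Moderate (2a), Severe (2b-3).
SEVERITY = {"0": "Mild", "1": "Mild", "2a": "Moderate", "2b": "Severe", "3": "Severe"}


def percent_agreement(ratings, reference):
    """Return (overall %, {severity group: %}) agreement with a reference standard.

    ratings:   {rater: {patient: type}}  (hypothetical layout)
    reference: {patient: type}           (gold-standard or clinical score)
    Agreement is counted per rater-patient pair, grouped by the reference severity.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rater_scores in ratings.values():
        for patient, rated_type in rater_scores.items():
            if patient not in reference:
                continue  # skip patients without a reference score
            ref_group = SEVERITY[reference[patient]]
            totals[ref_group] += 1
            if SEVERITY[rated_type] == ref_group:
                hits[ref_group] += 1
    if not totals:
        raise ValueError("no rated patients have a reference score")
    per_group = {g: 100.0 * hits[g] / totals[g] for g in totals}
    overall = 100.0 * sum(hits.values()) / sum(totals.values())
    return overall, per_group


# Toy data, invented for illustration only (not study data).
reference = {"p1": "1", "p2": "2a", "p3": "3"}
ratings = {
    "rater_a": {"p1": "0", "p2": "2b", "p3": "3"},
    "rater_b": {"p1": "1", "p2": "2a", "p3": "2b"},
}
overall, per_group = percent_agreement(ratings, reference)
print(round(overall, 1), per_group)
```

Grouping by the reference severity makes it possible to report a range of agreement across categories alongside the overall figure; whether the study pooled ratings this way is an assumption of the sketch.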
Sixteen craniofacial surgeons from 11 institutions, with a combined 248 years of experience (average 15.5 years), were surveyed. Forty-one patients met the inclusion criteria, including 38 with documented clinical scores. When the raters’ 3DCT-based classifications were compared with the clinical KPC scores, the average agreement was 38.1% (range: 27.3% for Type 2 to 57.9% for Type 3). Rater identification of Type 3 mandibles was improved (p<.001); however, as a group, all raters were equally unable to accurately identify mandibular severity as compared with clinical assessment (p=.90). When the raters’ 3DCT-based classifications were compared with our gold standard, the average total agreement was 67.8% (range: 40.0% for Moderate to 84.6% for Severe), with improved identification of Mild and Severe mandibles (p<.001). As a group, all raters were equally unable to accurately identify mandibular severity as compared with the gold standard (p=.97).
The introduction of 3DCT into the diagnostic paradigm highlights the inaccuracy and variability of traditional classification systems. Our results question the accuracy and reproducibility of the current clinical paradigm, suggesting the need to reexamine the classification of HFM.