Endocrine Feedback Loop – Episode 49
Artificial Intelligence in the Diagnosis of Cushing Syndrome
Date: May 16, 2024
Host: Dr. Chase Hendrickson (Vanderbilt University Medical Center)
Guests:
- Dr. Katie Gutenberg (University of Texas at Houston; pituitary expert)
- Dr. Odelia Cooper (Cedars Sinai, Los Angeles; pituitary disease researcher)
Episode Overview
This episode examines a forthcoming article in the Journal of Clinical Endocrinology and Metabolism exploring whether a machine learning (ML) algorithm can aid or even replace the gold standard, bilateral inferior petrosal sinus sampling (BIPSS), in differentiating subtypes of ACTH-dependent Cushing syndrome. With BIPSS being invasive, costly, and not widely available, the possibility of an AI-driven, less invasive diagnostic alternative sparks significant interest but also skepticism about accuracy, generalizability, and clinical applicability.
Key Discussion Points
1. Background on ACTH-Dependent Cushing Syndrome and Current Diagnostics
- Etiologies: Primarily pituitary ACTH-producing tumors (Cushing’s disease) and, less commonly, ectopic ACTH-producing tumors.
- Traditional Diagnostic Pathway: Dynamic endocrine testing plus imaging, with BIPSS remaining the gold standard, especially when MRI is negative for adenoma or shows microincidentalomas.
- Limitations of BIPSS:
- Invasive and requires expert centers
- Complications are rare but serious
- Sensitivity and specificity not 100%, plus technical and anatomical pitfalls
"IPSS is an invasive test... Pituitary MRIs in Cushing’s disease are often negative... Therefore, we have to rely on invasive approaches."
— Dr. Odelia Cooper (03:52)
Complications and Limitations of BIPSS
- Groin hematomas & vasovagal reactions (<5%)
- Rare but serious: pulmonary embolism, cranial nerve palsies, subarachnoid hemorrhage
- False negatives: anatomical venous variants, improper catheterization, non-active disease
- False positives: rare ectopic tumors with vasopressin receptors
- Newer strategies: Prolactin normalization to improve accuracy
2. Introduction to Machine Learning in this Context
- ML as a Diagnostic Tool: Potential for AI to analyze complex test results and patterns beyond rigid diagnostic algorithms; goal to develop an ML model for differentiating ACTH-dependent Cushing’s subtypes.
- Study Design:
- Multicenter, retrospective (2016-2022; three Istanbul centers)
- Included patients with BIPSS and a minimum 1-year follow-up
- Excluded those lacking pathologic confirmation or with missing data
"These authors were trying to understand if a machine learning algorithm would perform reasonably well in the differential diagnosis of Cushing's syndrome."
— Dr. Chase Hendrickson (08:07)
Considerations about Study Methodology
- Cross-sectional diagnostic accuracy design
- This design assumes the “gold standard” is 100% accurate, which is rarely true in practice
3. Study Method Details & Diagnostic Pathway
- Initial Tests: Urine free cortisol (UFC), late-night salivary cortisol, 1 mg dexamethasone suppression test. Screening counted as positive if at least 2 of the 3 tests were abnormal, followed by:
- 2-Day, 2mg Dexamethasone Suppression Test: Used as confirmatory test (not common in the US; raises compliance and confounding concerns).
- ACTH Measurement: <10 pg/mL = ACTH-independent; >20 pg/mL = ACTH-dependent; intermediate values trigger further stimulation tests
- Imaging:
- Adenoma >6mm → surgery
- <6mm or non-visible → BIPSS
- BIPSS Technique: Used CRH stimulation, multiple sampling times
- Diagnostic Criteria for Subtypes:
- Cushing's disease: central-to-peripheral ACTH gradient >3 on BIPSS
- Ectopic: no BIPSS gradient + no adenoma + no suppression on high-dose dexamethasone + no CRH response
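The triage rules listed above can be read as a small decision function. The following is a minimal sketch for illustration only, not the authors' code and not clinical guidance. The `Workup` class, its field names, and the returned labels are all hypothetical. The 2-day confirmatory test is left out because the summary does not say how a negative result was handled, and boundary values (ACTH of exactly 10 or 20, an adenoma of exactly 6 mm) are resolved by assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workup:
    """Hypothetical container for the inputs named in the pathway above."""
    screening_abnormal: int                 # abnormal results among UFC, salivary cortisol, 1 mg DST
    acth: float                             # plasma ACTH, same units as the thresholds quoted above
    adenoma_mm: Optional[float]             # MRI adenoma diameter; None if no lesion is visible
    bipss_gradient: Optional[float] = None  # central-to-peripheral ACTH ratio, if BIPSS was done


def next_step(w: Workup) -> str:
    """Return the next step implied by the summarized pathway (illustrative labels)."""
    if w.screening_abnormal < 2:
        return "hypercortisolism not established"
    if w.acth < 10:
        return "ACTH-independent"
    if w.acth <= 20:
        # Assumption: values of exactly 10 or 20 are treated as intermediate.
        return "intermediate ACTH: stimulation testing"
    if w.adenoma_mm is not None and w.adenoma_mm > 6:
        return "pituitary surgery"
    if w.bipss_gradient is None:
        # Assumption: a lesion of exactly 6 mm is routed to BIPSS.
        return "BIPSS"
    if w.bipss_gradient > 3:
        return "Cushing's disease"
    # The remaining ectopic criteria (high-dose dex, CRH response) are not modeled here.
    return "evaluate for ectopic source"
```

For example, `next_step(Workup(3, 50.0, 4.0))` returns `"BIPSS"`, since a 4 mm lesion falls below the surgical size threshold.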
"I do question really why the authors did proceed with this two day test if they already had two out of three tests that were positive... The two day low dose dex suppression test doesn't add any additional information."
— Dr. Odelia Cooper (13:18)
MRI Technology Discussion
- 1.5T MRI sensitivities poor for microadenomas
- Advances: Dynamic contrast, 3T/7T magnets, SPGR, better T1/T2 sequences
- Up to 80% sensitivity for small lesions now possible
4. Machine Learning Algorithm: Feature Selection and Application
- Variables Considered: Age, sex, biochemical (ACTH, cortisol, UFC, salivary cortisol, dex suppression results, potassium), radiologic (adenoma presence, size, MRI intensity), “dummy” labs (glucose, cholesterol)
- Ultimately Used:
- ACTH
- Results of all formal suppression and stimulation tests
- Potassium
- Adenoma diameter
- Excluded as Unimportant: Age, sex, simple cortisol level, adenoma presence, T2 intensity, dummy variables
"I was also surprised that some of these key parameters, age and sex and cortisol, were removed..."
— Dr. Odelia Cooper (26:30)
Concerns About Feature Selection
- Prior machine learning studies found age, sex, and cortisol levels significant
- This cohort’s ectopic group was 88.5% female, unlike most clinical populations
5. Results
Patient Cohort
- 131 patients underwent BIPSS; 106 were included in the analysis (75% Cushing’s disease, 25% ectopic).
Diagnostic Performance
- BIPSS
- Sensitivity: 91%
- Specificity: 72%
- Accuracy: 85%
- Improved at a higher gradient cutoff (>6.9), but the overall figures are inconsistent with prior reports, which often cite sensitivity and specificity nearer 97–100%
"I was a little surprised at their accuracy of being only 85%. That’s fairly low."
— Dr. Odelia Cooper (30:00)
- Peak ACTH During BIPSS
- Cutoff of 215: Sensitivity 88%, Specificity 90% (useful for judging successful cannulation)
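As a rough consistency check on the BIPSS figures above, overall accuracy can be written as a prevalence-weighted mix of sensitivity and specificity. The sketch below assumes Cushing’s disease is the "positive" class and uses only the rounded percentages quoted in this summary; the function names are illustrative, and the exact patient counts are not given here.

```python
def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Overall accuracy implied by sensitivity, specificity, and prevalence of the positive class."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity, specificity, and accuracy from a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Rounded figures from the summary: sensitivity 91%, specificity 72%, 75% Cushing's disease.
implied = accuracy_from_rates(0.91, 0.72, 0.75)
print(f"implied accuracy: {implied:.4f}")
```

With these rounded inputs the implied accuracy is roughly 0.86, in the same range as the reported 85%. The small difference cannot be resolved from rounded summary figures alone.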
Machine Learning Algorithm
- Best Model: Logistic regression
- Accuracy: 86%
- AUC: 0.85
- Top Predictive Features (via SHAP analysis):
- 2-day, 2mg dexamethasone suppression test (higher values in ectopic)
- Suppression following high-dose dexamethasone suppression test
- Late night salivary cortisol
- Adenoma diameter (favoring Cushing’s disease)
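For readers less familiar with the metrics above, here is a minimal pure-Python sketch of three pieces: a logistic model's predicted probability, AUC computed as the chance that a randomly chosen positive case outscores a randomly chosen negative one, and the closed-form SHAP attribution for a linear model under the assumption of independent features (weight times the feature's deviation from its background mean, on the log-odds scale). The weights and inputs in the usage note are made up for illustration and are not taken from the study.

```python
import math


def predict_proba(weights, bias, x):
    """Logistic regression: sigmoid of the linear score (log-odds)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def linear_shap(weights, x, background_mean):
    """Per-feature attribution for a linear model, assuming independent features."""
    return [w * (xi - m) for w, xi, m in zip(weights, x, background_mean)]
```

With hypothetical weights `[2.0, -1.0]`, input `[3.0, 1.0]`, and background means `[1.0, 2.0]`, `linear_shap` returns `[4.0, 1.0]`. The attributions sum to the difference in log-odds between this input and the background mean, which is the additivity property that makes SHAP rankings like the one above interpretable.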
"Both the 2-day, 2mg dexamethasone test and high-dose dex test were the top important features... but there are just a number of limitations to relying on this test."
— Dr. Odelia Cooper (33:47)
6. Critical Appraisal & Limitations
Concerns Raised by Host & Guests
- Heavy reliance on dexamethasone suppression testing not standard in the US
- Dexamethasone levels not measured—risk for false positives/negatives
- Unclear how factors like high CBG, medications, or OCP use (common in females) were handled
- BIPSS performance lower than typical, and lack of prolactin correction
- Ectopic cohort unusually female-predominant
- Definition of “gold standard” brings uncertainty; lack of universal pan-imaging for ectopic diagnosis
- Machine learning model may not be generalizable—depends on input variables many centers do not routinely use
- Model not validated outside a single geographic, highly specific cohort
"One of the issues with machine learning models is that it's very dependent on the nature and characteristics of the data..."
— Dr. Odelia Cooper (42:28)
Quality of Study
- Commendable: Comprehensive testing, attempt to broaden access to alternatives to BIPSS
- Problematic: Highly specific local practice, non-routine tests, uncertain generalizability
- Echoes need for large-scale, prospective, and diverse cohort validation
7. Practical Implications and Next Steps
- Current Applicability:
- Not ready for broad clinical adoption, especially in centers lacking routine use of tested features
- US endocrinologists are unlikely to have all the necessary test results, especially the high-dose dexamethasone suppression test
- Potential Use:
- Could be a supplementary tool or research aid where BIPSS is unavailable
- Essential Future Work:
- Larger, multi-center validation studies
- Test algorithm in varied populations, especially outside Turkey
- Ensure input variables align with common real-world clinical practice
- Authors’ Conclusion: Machine learning can support diagnosis where BIPSS isn't available, but the evidence is insufficient to replace the current standard
Notable Quotes & Memorable Moments
- On the challenge of fitting AI into diagnostic medicine:
"We all have questions about how artificial intelligence will affect the care we provide... we will need to be particularly critical about how AI might interface with a diagnostic approach that has been developed over many decades."
— Dr. Chase Hendrickson (00:56)
- On BIPSS limitations:
"[False negatives could result if] patients are not actively hypercortisolemic at the time of IPSS, such as in cyclical Cushing..."
— Dr. Odelia Cooper (05:50)
- On generalizability:
"One of the issues with machine learning models is that it's very dependent on the nature and characteristics of the data and the performance of the learning algorithms that have to be suitable for the target population."
— Dr. Odelia Cooper (42:28)
- On current clinical utility:
"I went online and tried to do it on some of my patients and realized, I never do the high dose dex [suppression] test... I don't think we're ready yet for prime time."
— Dr. Odelia Cooper (42:28)
Key Timestamps
- 00:56 – Introduction & overview of episode's focus
- 03:01 – Background on ACTH-dependent Cushing and BIPSS limitations
- 07:39 – Intro to machine learning objectives for this paper
- 08:07 – Study design, cohort definition, discussion of gold standard
- 13:18 – Critique of two-day dexamethasone suppression test utility
- 17:52 – MRI advances in detecting pituitary microadenomas
- 20:28 – Gold standard definition: strengths and weaknesses
- 26:30 – Machine learning feature selection, variables debated
- 28:44 – Results: accuracy and limitations of BIPSS in the study
- 32:08 – Machine learning models & SHAP feature importance
- 39:08 – Host and guest critique of study quality and limitations
- 42:28 – Discussion of real-world applicability and reservations
- 44:56 – Outro; episode wrap-up
Final Takeaways
- AI/machine learning offers intriguing potential for supporting diagnosis when gold-standard testing is inaccessible, but this episode highlights persistent limitations in methodology, data quality, and generalizability.
- Current practice should not change on the basis of this study; more robust, multi-center, and diverse evaluations are needed.
- Clinical context and expert interpretation remain essential, especially when rare diseases, nuanced diagnostics, and variable test performance intersect.
