Endocrine Feedback Loop – Episode 49

Artificial Intelligence in the Diagnosis of Cushing Syndrome
Date: May 16, 2024
Host: Dr. Chase Hendrickson (Vanderbilt University Medical Center)
Guests:

Dr. Katie Gutenberg (University of Texas at Houston; pituitary expert)
Dr. Odelia Cooper (Cedars Sinai, Los Angeles; pituitary disease researcher)

Episode Overview

This episode examines a forthcoming article in the Journal of Clinical Endocrinology and Metabolism exploring whether a machine learning (ML) algorithm can aid or even replace the gold standard, bilateral inferior petrosal sinus sampling (BIPSS), in differentiating subtypes of ACTH-dependent Cushing syndrome. With BIPSS being invasive, costly, and not widely available, the possibility of an AI-driven, less invasive diagnostic alternative sparks significant interest but also skepticism about accuracy, generalizability, and clinical applicability.

Key Discussion Points

1. Background on ACTH-Dependent Cushing Syndrome and Current Diagnostics

Etiologies: Primarily pituitary ACTH-producing tumors (Cushing’s disease) and, less commonly, ectopic ACTH-producing tumors.
Traditional Diagnostic Pathway: Dynamic endocrine testing plus imaging, with BIPSS remaining the gold standard, especially when MRI is negative for adenoma or shows microincidentalomas.
Limitations of BIPSS:
- Invasive and requires expert centers
- Complications are rare but serious
- Sensitivity and specificity not 100%, plus technical and anatomical pitfalls

"IPSS is an invasive test... Pituitary MRIs in Cushing’s disease are often negative... Therefore, we have to rely on invasive approaches."
— Dr. Odelia Cooper (03:52)

Complications and Limitations of BIPSS

Groin hematomas & vasovagal reactions (<5%)
Rare but serious: pulmonary embolism, cranial nerve palsies, subarachnoid hemorrhage
False negatives: anatomical venous variants, improper catheterization, non-active disease
False positives: rare ectopic tumors with vasopressin receptors
Newer strategies: Prolactin normalization to improve accuracy

2. Introduction to Machine Learning in this Context

ML as a Diagnostic Tool: Potential for AI to analyze complex test results and patterns beyond rigid diagnostic algorithms; goal to develop an ML model for differentiating ACTH-dependent Cushing’s subtypes.
Study Design:
- Multicenter, retrospective (2016-2022; three Istanbul centers)
- Included patients with BIPSS and a minimum 1-year follow-up
- Excluded those lacking pathologic confirmation or with missing data

"These authors were trying to understand if a machine learning algorithm would perform reasonably well in the differential diagnosis of Cushing's syndrome."
— Dr. Chase Hendrickson (08:07)

Considerations about Study Methodology

Cross-sectional diagnostic accuracy design
Diagnostic accuracy studies assume the “gold standard” is 100% accurate, which is rarely true in reality

3. Study Method Details & Diagnostic Pathway

Initial Tests: Urine free cortisol (UFC), late-night salivary cortisol, and 1 mg dexamethasone suppression test. Screening was considered positive if at least 2 of the 3 tests were positive, followed by:
2-Day, 2mg Dexamethasone Suppression Test: Used as confirmatory test (not common in the US; raises compliance and confounding concerns).
ACTH Measurement: <10 pg/mL = ACTH-independent; >20 pg/mL = ACTH-dependent; intermediate values trigger further stimulation tests
Imaging:
- Adenoma >6mm → surgery
- <6mm or non-visible → BIPSS
BIPSS Technique: Used CRH stimulation, multiple sampling times
Diagnostic Criteria for Subtypes:
- Cushing's disease: central-to-peripheral ACTH gradient >3
- Ectopic: no BIPSS gradient + no adenoma + no suppression on high-dose dexamethasone + no CRH response
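The pathway above can be read as a short decision rule. The sketch below is only an illustration of the steps as summarized here, not the authors' protocol: the `Workup` class, its field names, and the returned labels are hypothetical, the 2-day confirmatory test is omitted, and the handling of values exactly at a cutoff (ACTH of 10 or 20, adenoma of 6 mm) is an assumption because the summary does not specify it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workup:
    """Hypothetical container for one patient's results (names are illustrative)."""
    screening_positive: int            # how many of UFC, salivary cortisol, 1 mg DST were positive (0-3)
    acth: float                        # plasma ACTH, in the units used by the summary's cutoffs
    adenoma_mm: Optional[float]        # adenoma diameter on MRI, None if no lesion is visible
    central_peripheral_ratio: Optional[float] = None  # BIPSS ACTH gradient, if BIPSS was done


def next_step(w: Workup) -> str:
    """Return the next step suggested by the pathway as summarized above."""
    if w.screening_positive < 2:
        return "screening negative"
    if w.acth < 10:
        return "ACTH-independent"
    if w.acth <= 20:
        # Assumption: values exactly at 10 or 20 are treated as intermediate.
        return "intermediate ACTH: further stimulation testing"
    # ACTH-dependent from here on.
    if w.adenoma_mm is not None and w.adenoma_mm > 6:
        return "pituitary surgery"
    if w.central_peripheral_ratio is None:
        # Lesion under the size cutoff or not visible: BIPSS is the next test.
        return "BIPSS"
    if w.central_peripheral_ratio > 3:
        return "Cushing's disease"
    return "no gradient: evaluate for ectopic source"


# Hypothetical patient: positive screening, ACTH-dependent, 4 mm lesion, no BIPSS yet.
print(next_step(Workup(screening_positive=3, acth=45.0, adenoma_mm=4.0)))
```

Writing the rule out this way makes the gaps visible: the ectopic label in the study also required no suppression on high-dose dexamethasone and no CRH response, which this sketch does not model.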

"I do question really why the authors did proceed with this two day test if they already had two out of three tests that were positive... The two day low dose dex suppression test doesn't add any additional information."
— Dr. Odelia Cooper (13:18)

MRI Technology Discussion

1.5T MRI sensitivities poor for microadenomas
Advances: Dynamic contrast, 3T/7T magnets, SPGR, better T1/T2 sequences
Up to 80% sensitivity for small lesions now possible

4. Machine Learning Algorithm: Feature Selection and Application

Variables Considered: Age, sex, biochemical (ACTH, cortisol, UFC, salivary cortisol, dex suppression results, potassium), radiologic (adenoma presence, size, MRI intensity), “dummy” labs (glucose, cholesterol)
Ultimately Used:
- ACTH
- Results of all formal suppression and stimulation tests
- Potassium
- Adenoma diameter
Excluded as Unimportant: Age, sex, simple cortisol level, adenoma presence, T2 intensity, dummy variables

"I was also surprised that some of these key parameters, age and sex and cortisol, were removed..."
— Dr. Odelia Cooper (26:30)

Concerns About Feature Selection

Prior machine learning studies found age, sex, and cortisol levels significant
This cohort’s ectopic group was 88.5% female, unlike most clinical populations

5. Results

Patient Cohort

131 patients underwent BIPSS; 106 were included in the analysis (75% Cushing’s disease, 25% ectopic).

Diagnostic Performance

BIPSS
- Sensitivity: 91%
- Specificity: 72%
- Accuracy: 85%
- Improved at higher gradient cutoff (>6.9), but not consistent with prior reports (often nearer 97–100% sensitivity/specificity)
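To see how the three BIPSS figures relate, it may help to recall that accuracy is a case-mix-weighted average of sensitivity and specificity. The sketch below assumes Cushing’s disease is the "positive" class and uses the rounded percentages from this summary; the function names are illustrative and the 50/50 cohort is purely hypothetical.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy as the prevalence-weighted average of sensitivity and specificity.

    prevalence is the fraction of the cohort in the "positive" class
    (assumed here to be Cushing's disease).
    """
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Rounded figures from the summary: 91% sensitivity, 72% specificity, 75% Cushing's disease.
study_mix = accuracy_from_rates(0.91, 0.72, 0.75)

# Hypothetical cohort with equal numbers of pituitary and ectopic cases.
even_mix = accuracy_from_rates(0.91, 0.72, 0.50)

print(f"accuracy at 75% prevalence: {study_mix:.3f}")
print(f"accuracy at 50% prevalence: {even_mix:.3f}")
```

With the rounded inputs the first figure comes out at roughly 0.86, close to the reported 85%; the small gap is plausibly rounding in the summary's numbers. The second figure is lower, which shows that an overall accuracy value depends on the cohort's mix of pituitary and ectopic cases and is hard to compare across studies.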

"I was a little surprised at their accuracy of being only 85%. That’s fairly low."
— Dr. Odelia Cooper (30:00)

Peak ACTH During BIPSS
- At a cutoff of 215: sensitivity 88%, specificity 90% (useful for judging successful cannulation)

Machine Learning Algorithm

Best Model: Logistic regression
- Accuracy: 86%
- AUC: 0.85
Top Predictive Features (via SHAP analysis):
1. 2-day, 2mg dexamethasone suppression test (higher values in ectopic)
2. Suppression following high-dose dexamethasone suppression test
3. Late night salivary cortisol
4. Adenoma diameter (favoring Cushing’s disease)
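The AUC figure can be interpreted as a ranking probability: the chance that a randomly chosen case from one class receives a higher model score than a randomly chosen case from the other. The sketch below computes it by direct pairwise comparison. The scores are made up for illustration and do not come from the study, and which diagnosis counts as the "positive" class is an assumption.

```python
from typing import Sequence


def pairwise_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking, matching the usual
    Mann-Whitney interpretation of the area under the ROC curve.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores (predicted probability of the positive class).
positive_cases = [0.9, 0.8, 0.7, 0.4]
negative_cases = [0.6, 0.3, 0.2]

# 11 of the 12 pairs are ranked correctly (0.4 falls below 0.6).
print(round(pairwise_auc(positive_cases, negative_cases), 3))
```

Read this way, an AUC of 0.85 means the model ranks a positive case above a negative one in about 85% of such pairs. It says nothing about calibration or about performance at any single decision threshold.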

"Both the 2-day, 2mg dexamethasone test and high-dose dex test were the top important features... but there are just a number of limitations to relying on this test."
— Dr. Odelia Cooper (33:47)

6. Critical Appraisal & Limitations

Concerns Raised by Hosts & Experts

Heavy reliance on dexamethasone suppression tests (2-day low-dose and high-dose) that are not standard in the US
Dexamethasone levels not measured—risk for false positives/negatives
Unclear how factors like high CBG, medications, or OCP use (common in females) were handled
BIPSS performance lower than typical, and lack of prolactin correction
Ectopic cohort unusually female-predominant
Definition of “gold standard” brings uncertainty; lack of universal pan-imaging for ectopic diagnosis
Machine learning model may not be generalizable—depends on input variables many centers do not routinely use
Model not validated outside a single geographic, highly specific cohort

"One of the issues with machine learning models is that it's very dependent on the nature and characteristics of the data..."
— Dr. Odelia Cooper (42:28)

Quality of Study

Commendable: Comprehensive testing, attempt to broaden access to alternatives to BIPSS
Problematic: Highly specific local practice, non-routine tests, uncertain generalizability
Echoes need for large-scale, prospective, and diverse cohort validation

7. Practical Implications and Next Steps

Current Applicability:
- Not ready for broad clinical adoption, especially in centers lacking routine use of tested features
- Unlikely that US endocrinologists would have all the necessary test results, especially high-dose dexamethasone suppression
Potential Use:
- Could be a supplementary tool or research aid where BIPSS is unavailable
Essential Future Work:
- Larger, multi-center validation studies
- Test algorithm in varied populations, especially outside Turkey
- Ensure input variables align with common real-world clinical practice
Authors’ Conclusion: Machine learning can support diagnosis where BIPSS isn't available, but the evidence is insufficient to replace the current standard

Notable Quotes & Memorable Moments

On the challenge of fitting AI into diagnostic medicine:

"We all have questions about how artificial intelligence will affect the care we provide... we will need to be particularly critical about how AI might interface with a diagnostic approach that has been developed over many decades."
— Dr. Chase Hendrickson (00:56)

On BIPSS limitations:

"[False negatives could result if] patients are not actively hypercortisolemic at the time of IPSS, such as in cyclical Cushing..."
— Dr. Odelia Cooper (05:50)

On generalizability:

"One of the issues with machine learning models is that it's very dependent on the nature and characteristics of the data and the performance of the learning algorithms that have to be suitable for the target population."
— Dr. Odelia Cooper (42:28)

On current clinical utility:

"I went online and tried to do it on some of my patients and realized, I never do the high dose dex [suppression] test... I don't think we're ready yet for prime time."
— Dr. Odelia Cooper (42:28)

Key Timestamps

00:56 – Introduction & overview of episode's focus
03:01 – Background on ACTH-dependent Cushing and BIPSS limitations
07:39 – Intro to machine learning objectives for this paper
08:07 – Study design, cohort definition, discussion of gold standard
13:18 – Critique of two-day dexamethasone suppression test utility
17:52 – MRI advances in detecting pituitary microadenomas
20:28 – Gold standard definition: strengths and weaknesses
26:30 – Machine learning feature selection, variables debated
28:44 – Results: accuracy and limitations of IPSS in the study
32:08 – Machine learning models & SHAP feature importance
39:08 – Host and guest critique of study quality and limitations
42:28 – Discussion of real-world applicability and reservations
44:56 – Outro; episode wrap-up

Final Takeaways

AI/machine learning offers intriguing potential for supporting diagnosis when gold-standard testing is inaccessible, but this episode highlights persistent limitations in methodology, data quality, and generalizability.
Current practice should not change on the basis of this study; more robust, multi-center, and diverse evaluations are needed.
Clinical context and expert interpretation remain essential, especially when rare diseases, nuanced diagnostics, and variable test performance intersect.
