Summary

Practical AI in Healthcare

S1, E39 — Sarah Rossetti, RN, PhD: Nursing Informatics & the CONCERN Early Warning System

Date: May 31, 2026

Hosts: Steven Labkoff, MD & Leon Rozenblit, JD, PhD
Guest: Sarah Rossetti, RN, PhD (Columbia University)

Episode Overview

This episode spotlights nursing informatics and the CONCERN Early Warning System, an AI-powered tool that uses patterns in nursing documentation to predict patient deterioration about two days earlier than other early warning systems. Dr. Sarah Rossetti details the rationale, design, deployment, and impact of CONCERN, underlining the value of nursing observation, the untapped predictive potential of EHR documentation traces, and steps toward broader implementation.

Key Discussion Points & Insights

Sarah Rossetti’s Journey to Nursing Informatics

Background: After nursing school at the University of Pennsylvania, started as a critical care nurse in the CCU at Mass General, working with hybrid records (paper flow sheets alongside electronic order entry) (03:08).
Observation: “What I observed is that... for the patients that these more senior experienced nurses clearly were not worried about... there was a lot of white space on the paper. It looked very different for patients that... this nurse had some concerns... That was a densely packed flow sheet.” (10:55, D)
Transition to Informatics: Clinical questions from hands-on care inspired PhD research in nursing informatics and subsequent postdoctoral work in biomedical informatics at Columbia University (03:08–05:49).

The Overlooked Power of Nursing Documentation

Nursing Surveillance: Nurses continuously assess, intervene, and document nuanced changes, but much expert judgment remains uncaptured by traditional EHR fields (06:57).
Example: “This point of 98% is not telling you anything. This patient is not doing well. They are decompensating even though their value looks normal.” (08:16, D)
Invisible Insights: The “gestalt” of experienced nurses is rarely harnessed by standard analytics (09:46–10:55).

The CONCERN Early Warning System

Concept: CONCERN (Communicating Narrative Concerns entered by Registered nurses) builds predictive models using patterns in the frequency, timing, and density of nursing documentation.

“So at its core, we’ve built a machine learning-based model that uses the metadata patterns within nursing documentation to predict patients at risk of deterioration.” (11:52, D)
Approach: Rather than using patient physiology data alone, CONCERN models the nurse as a “super sensitive sensor” (20:35, C), viewing the EHR documentation trace as a direct signal of clinical concern.

Day in the Life of a Nurse & Documentation Trace

Workflows: Shift handoff, head-to-toe assessment, scheduled vital sign checks (about every four hours on acute care floors, hourly in the ICU), and medication administration, including PRN medications, all reflected in structured and unstructured EHR records (15:02, D).
Signal in Documentation: Frequent, off-schedule documentation or comments (“Held medication due to low BP”) often signal rising nurse concern before objective vital sign deviation (20:09, D).
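To make the "documentation trace" idea concrete, here is a minimal Python sketch that summarizes how often and when a nurse documented while ignoring the recorded values. Everything in it is a hypothetical illustration, not the CONCERN feature set: the `FlowsheetEntry` record, the `documentation_features` function, the four-hour vital sign schedule anchored at midnight, and the 30-minute tolerance for calling an entry "off-schedule" are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FlowsheetEntry:
    # Hypothetical record: one nurse-documented flowsheet entry (e.g., a vital sign).
    patient_id: str
    recorded_at: datetime
    row_name: str            # e.g. "resp_rate", "spo2"
    has_comment: bool = False


def documentation_features(entries, window_end, window_hours=12,
                           schedule_hours=4, tolerance_minutes=30):
    """Summarize how often and when documentation happened; values are ignored.

    Assumptions (illustrative only): vitals are scheduled every `schedule_hours`
    starting at midnight, and an entry more than `tolerance_minutes` away from the
    nearest scheduled time counts as off-schedule.
    """
    window_start = window_end - timedelta(hours=window_hours)
    in_window = [e for e in entries if window_start <= e.recorded_at < window_end]

    period = schedule_hours * 60
    off_schedule = 0
    for e in in_window:
        minutes = e.recorded_at.hour * 60 + e.recorded_at.minute
        offset = minutes % period
        # Distance in minutes to the nearest scheduled check (before or after).
        if min(offset, period - offset) > tolerance_minutes:
            off_schedule += 1

    return {
        "entries_per_hour": len(in_window) / window_hours,
        "off_schedule_count": off_schedule,
        "comment_count": sum(e.has_comment for e in in_window),
    }


# Usage: two routine checks plus two extra night-time entries, one with a comment.
entries = [
    FlowsheetEntry("p1", datetime(2026, 5, 6, 4, 0), "resp_rate"),
    FlowsheetEntry("p1", datetime(2026, 5, 6, 8, 0), "resp_rate"),
    FlowsheetEntry("p1", datetime(2026, 5, 6, 2, 15), "spo2"),
    FlowsheetEntry("p1", datetime(2026, 5, 6, 2, 45), "spo2", has_comment=True),
    FlowsheetEntry("p1", datetime(2026, 5, 6, 12, 0), "resp_rate"),  # outside window
]
feats = documentation_features(entries, window_end=datetime(2026, 5, 6, 12, 0))
print(feats)
```

Here the 02:15 and 02:45 entries are counted as off-schedule while the 04:00 and 08:00 checks are not, mirroring the transcript's point that a 3 a.m. set of vitals on an acute care floor is itself informative.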

Model Development & Methodological Insights

Model Features:
- Emphasizes documentation frequency/timing (not just values)
- Incorporates temporal/contextual features (hospital day, weekday/weekend, shift, seasonality)
- Ensemble of ~1200 models; the system picks the model matching the patient's current context and flags documentation that is uncommon for that context
“The signal is in the frequency of the documentation... The value is not what is driving the prediction.” (28:03, D)
User-Centered Design:
- Provides risk levels (red/yellow/green) and exposes model features for transparency (26:09–26:53, D)
- No workflow interruption—risk scores are written non-intrusively into EHR flowsheets (41:00–42:06, D)
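The context-matched ensemble can be sketched in Python as a lookup keyed by the context features listed above, followed by a red/yellow/green mapping. This is an assumption-laden illustration, not CONCERN's architecture: each `ContextModel` is a simple stand-in (a z-score against a context's typical documentation rate) where the real system uses trained machine learning models, and the key layout, the cap at hospital day 7, the 07:00–19:00 day shift, and the thresholds in `risk_level` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ContextModel:
    # Stand-in for one ensemble member: the documentation rate that is "common"
    # in this context. A real system would train an ML model per context.
    mean_entries_per_hour: float
    std_entries_per_hour: float

    def score(self, entries_per_hour: float) -> float:
        # How unusual the observed rate is for this context (a z-score).
        return (entries_per_hour - self.mean_entries_per_hour) / self.std_entries_per_hour


def season_of(month: int) -> str:
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


def context_key(admitted: datetime, now: datetime) -> tuple:
    # Hypothetical context: hospital day (capped at 7), weekend flag, shift, season.
    hospital_day = min((now.date() - admitted.date()).days + 1, 7)
    is_weekend = now.weekday() >= 5
    shift = "day" if 7 <= now.hour < 19 else "night"
    return (hospital_day, is_weekend, shift, season_of(now.month))


def risk_level(z: float, yellow: float = 1.0, red: float = 2.0) -> str:
    # Illustrative thresholds for the red/yellow/green display.
    if z >= red:
        return "red"
    if z >= yellow:
        return "yellow"
    return "green"


# Usage: one context-specific model plus a fallback for unseen contexts.
ensemble = {(3, False, "night", "spring"): ContextModel(0.25, 0.125)}
default_model = ContextModel(0.5, 0.25)

key = context_key(datetime(2026, 5, 4, 10, 0), datetime(2026, 5, 6, 3, 0))
model = ensemble.get(key, default_model)
level = risk_level(model.score(0.75))
print(key, level)
```

The design point this tries to capture is that "uncommon" is judged relative to the matched context: the same documentation rate might be routine on a weekday day shift early in a stay and unusual at night on hospital day 3. This toy grid has 7 × 2 × 2 × 4 = 112 contexts; how CONCERN partitions its ~1200 models is not described here.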

Multi-site Implementation & Trial Outcomes

Scale: Deployed across 74 units in four major hospitals in New York (Columbia) and the Boston area, covering ~60,000 patients (31:39–31:48, D).
Outcomes:
- 35.6% reduction in instantaneous mortality risk
- 7.5% reduction in sepsis risk
- Over half a day shorter length of stay
- Increase in early ICU transfers, leading to improved patient outcomes
  
  “The early—the patients transferred early—they had much better outcomes. Highly statistically significant... Those late transfers did not do as well.” (33:14, D)
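Assuming "instantaneous mortality risk" refers to the hazard from a time-to-event analysis (for example, a Cox proportional hazards model), which this summary does not specify, the reported percentage maps onto a hazard ratio as follows. This is only arithmetic on the number above, not an additional trial result:

```latex
\mathrm{HR} = \frac{h_{\text{CONCERN}}(t)}{h_{\text{control}}(t)}, \qquad
\text{relative reduction} = 1 - \mathrm{HR}
\quad\Rightarrow\quad
\mathrm{HR}_{\text{mortality}} = 1 - 0.356 = 0.644
```

A hazard ratio compares instantaneous event rates between arms; it does not mean 35.6% fewer deaths in absolute terms.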

Addressing Critiques: Alarm Fatigue, Generalizability, Equity

Alarm Fatigue: No pop-ups or interruptive alerts; integration with existing EHR views, no additional documentation burden (39:59–41:01, D).
Generalizability: Replicated performance in multiple sites & datasets; ongoing pediatric and broader site expansions (43:58, D).
Equity Considerations: Proactive bias checks on model outputs—by race and primary language—leading to mitigations in model deployment (42:06–43:15, D).

Future of Nursing Data & Ambient AI

AI’s Role: Ambient and computer-vision systems could offer even more granular, closer-to-behavior signals in the future (46:25–48:13, D).
Documentation Burden: COVID-19 natural experiment shows nurses maintain essential documentation for care, even when requirements are relaxed (46:25–47:21, D).

Memorable Quotes & Moments

On capturing nurse intuition:

“It really is expert assessment by the nurse. You know, we say it’s that gestalt, but it really is based in evidence and experience.” (10:55, D)
On the innovative modeling shift:

“You model the nurse as the sensor system and... the documentation trace that they’re leaving as the proxy for what the nurse is thinking. That’s absolutely beautiful.” (20:35, C)
On why frequency of notes matters:

“The frequency of those notes appearing is one kind of signal. The actual value... is another... But what you’re not measuring is that spidey sense... And the proxy for that is the frequency of the documentation.” (27:41, B)
On user-friendly design:

“Our system does not have a pop-up or fire an alert or interrupt in any way... we actually write the score back to a flow sheet row. So it’s available in the system for anywhere we want to produce it.” (39:59–41:00, D)
On measurable impact:

“Early ICU transfer is going to go much better than a late ICU transfer. But the literature doesn’t have great evidence about this because we can’t measure early vs late ICU transfer. So we were able to unpack that...” (33:14, D)

Key Timestamps for Important Segments

| Timestamp | Topic/Quote |
|-----------|-------------|
| 03:08 | Sarah’s nursing and informatics career journey begins |
| 06:57 | Nursing surveillance and the “invisible” value of nursing documentation |
| 10:55 | Insight that “gestalt” can be inferred from documentation density |
| 11:52 | Birth of the CONCERN study concept |
| 15:02 | A 12-hour nursing shift & how nurses document care |
| 20:35 | Modeling the nurse as a clinical “sensor” |
| 22:00 | Extracting the predictive signal from noisy EHR documentation |
| 28:03 | Differentiating between frequency and value in documentation |
| 31:39 | Scope and logistics of CONCERN’s multi-site trial |
| 32:30 | Summary of major clinical outcomes with CONCERN deployment |
| 39:56 | Addressing alarm fatigue, generalizability, and equity |
| 42:06 | Implementation details: non-interruptive design and bias monitoring |
| 46:25 | Impact of documentation burden and prospects for ambient AI |
| 49:17 | Final elevator pitch for CONCERN’s approach and implications |
| 51:24 | Concrete first step: reduce nursing burden to maximize surveillance value |
| 51:49 | Resources: CONCERN Toolkit and study links |

Actionable Insights for Healthcare Leaders

Nursing Data = Clinical Signal: Nursing EHR patterns are powerful, often-overlooked predictors of patient deterioration and should be treated as such.
Reduce Burden, Enhance Outcomes: Streamlining and unburdening nursing documentation enables more effective clinical surveillance.
Adoption Pathways: Interested institutions can access implementation resources via the CONCERN Toolkit and engage with Sarah’s team for research licensing and collaboration.
Trust and Integration: Successful AI tools must be non-intrusive, transparent, and built around real clinician workflows.

Learn More

CONCERN Toolkit: concerntoolkit.org
Columbia DBMI CONCERN Study: Columbia Department of Biomedical Informatics website (CONCERN Study page)
Relevant Publication: Nature Medicine, full trial details and methods
Contact: Links available via toolkit and Columbia websites

Sarah Rossetti’s Final Takeaway:

“These nursing behaviors have an outsized influence on patient outcomes. Clinically, we’ve always known that... but we can now actually measure it. This is a valuable use case—there’s so much more we can learn by modeling clinician judgment and process, not just raw patient data.” (49:17, D)


Transcript

[00:02]
A
Welcome to Practical AI in Healthcare, the podcast that cuts through the noise to spotlight real-world solutions delivering real-world value. From patient care to clinical research, from life sciences to patient engagement, we focus on what truly matters in healthcare today. No hype, no theory, just practical insights where AI is making a true impact. Dr. Steven Labkoff and Dr. Leon Rozenblit are your hosts as we explore what's real and moving the needle in this exciting new domain. Welcome aboard and let's get to it. As many of our listeners know, Leon and I work very closely with the DCI Network (Division of Clinical Informatics) at Beth Israel Deaconess Medical Center in Boston. This June, the network is hosting Patient-Powered Digital Health 2026. The conference will bring together patients, innovators, industry leaders, healthcare providers and policymakers to shape the next generation of real-world, patient-centered solutions. The meeting will run from June 22nd to the 24th in Boston at Harvard Medical School. We've arranged for our listeners to get a discount on registration to the meeting. If you register now, before June 15th, and use promo code PracticalAIJune (no spaces), you'll receive 30% off your registration fee. You can learn more at dcinetwork.org patients2026. In addition, we're always looking for sponsors. If you or your company are interested in becoming a sponsor, please reach out to admincinetwork.org. See you in Boston.
[01:29]
B
Hello and welcome to this week's edition of Practical AI in Healthcare. My name is Dr. Steven Labkoff and I'm here every week with my colleague Dr. Leon Rozenblit. How's it going, Leon?
[01:38]
C
Oh, I'm great, Steve. Really excited to have our guest today, especially on National Nurses Day.
[01:45]
B
Yeah, I think we lucked into this recording date and we have actually today is the first time we've actually had a nurse on the podcast and to that I want to actually apologize to the audience and apologize to Sarah, who we're about to introduce. We've been remiss in having the nursing informatics perspective on the podcast, but we're going to remedy that today. Today our guest is Sarah Rossetti. She is a PhD nursing professional out of Columbia University and she has done some absolutely amazing things that we're going to talk about today that revolve around nursing care and patient care and patient risk and involving AI and modeling, which has been some of the most groundbreaking work that's ever hit the scene in the last, actually in my entire career, to be honest. And we'll unpack that as we go through it. But before we go on. Sarah, welcome to the podcast.
[02:42]
D
Oh, thank you so much. Thrilled to be here.
[02:45]
B
So Sarah, you know, we wanted to speak to you and have you. First of all, the first question we usually ask our guests is how did you get your superhero cape? How did you get from, you know, through your career and to where you are today? And how did it all result in you landing in, in this chair today?
[03:08]
D
Great, great question. So I think I would start by talking about where I started clinically. I was very fortunate to. So I went to University of Pennsylvania for my nursing school as an undergrad and straight from there I was able to go to the new graduate critical care program that at the time Mass General offered and so had six months of some additional training there and starting right off in the CCU at Mass General. The reason I, you know, talk about that, it was a very interesting time there. We were on essentially hybrid charts. We were using paper-based flow sheets, we were using CPOE, you know, electronic orders. And so really started as I was a novice nurse, observe the systems around me, how they were working, how they worked well, what could still be done. There was obviously all the work done there at the Lab of Computer Science for many years. From there I worked as a travel nurse. And so I worked at a community hospital in California and saw different systems, saw how those impacted work. And after that worked in New York and started my PhD program in nursing informatics. And so those clinical experiences and some very specific patient cases got me asking questions about how these systems influence how we document what we do and observing how nurses practice and record their care. And so during my PhD training in nursing informatics, I really was thinking through many questions that still today I'm researching and answering. And so I completed then a postdoc in biomedical informatics at Columbia University at the Department of Biomedical Informatics. I worked back up in Boston for about seven years at what was then Partners Healthcare System and with an appointment at Brigham and Women's Hospital and Harvard Medical School as really doing a blend of operational informatics work and research as well. And at that time, and I know we'll talk about the CONCERN study, that's when I applied, with Kenrick Cato, my MPI, for grant funding for that.
And then I've been back at Columbia, jointly appointed in the Department of Biomedical Informatics and in the School of Nursing and I've been back for about eight years now.
[05:49]
B
Wow. We have a couple of intersection points in common, although I don't know that they temporally overlapped. I also trained at the Brigham during my fellowship time back in the day, but that was probably a much longer time ago than you were there. And did you interact with Octo Barnett when you were at.
[06:05]
D
No.
[06:05]
B
No. Was he already gone by the time you went through?
[06:08]
D
I think so. I think so, yeah. Yeah.
[06:10]
B
He's one of the grandfathers of the field, and I got to interact with him when I was working up there. But again, it was many, many years ago. So tell me a little bit about, you know, nursing hasn't, you know, we dropped the ball in getting nurses involved here. I know that AMIA, the American Medical Informatics Association, has not always been as sensitive to bringing nurses into the picture. But. But nursing is at the forefront of patient care. Nursing is the very, you know, the spear tip of patient care. And you nurses see and record things that are incredibly important, incredibly valuable, and yet it doesn't always bubble to the top. Maybe you could unpack a little bit about that with regards to what you saw at the CCU at Mass General, and what led you to the CONCERN project.
[06:58]
D
Yeah, sure. So there's this concept of nursing surveillance. So when we're talking about nursing in the hospital setting, nurses do many things. They're coordinating care, they're administering medications, they're assessing patients. But within the nursing process, there is this very clear activity of nursing surveillance. And the nurse is assessing the patient, observing for changes in the patient's condition and responding to those changes. Whether that's an additional intervention, an additional assessment, oftentimes more frequently assessing a patient that they suspect might. Might have something brewing. We use that term clinically, right? Oh, there might be an infection brewing, the patient looks a little bit different, and so they're paying more attention. And this process is very dynamic. It is not captured in our documentation. Well, it's not captured in our electronic health records. If you think of the EHR, there are many structured data points; especially over the past 20 years, nursing documentation has become highly structured. There's actually been a lot of movement away from narrative nursing notes. And so we have these discrete fields that don't link, yet the nursing process and nursing surveillance is linking these fields. So I'll use a really specific example. When a patient in the hospital has a normal oxygen saturation in their flow sheets, you know, maybe they're 98%, you know, almost all day, they're bouncing right around those, those high numbers. But a nurse puts a comment into the flow sheet saying, well, now they're at six liters, you know, inferring in the morning, maybe they were receiving supplemental, a low level of supplemental oxygen at 2 liters. Now they're receiving 6 liters of supplemental oxygen. You know, the max through your nasal cannula. That is really informative. But there's no great way that our EHRs really surface that information. The nurse puts that comment in to communicate. This point of 98% is not telling you anything.
This patient is not doing well. They are decompensating even though their value looks normal. And so amidst all this the nurse is speaking with the care team, the nurse is escalating discussions to see what else should we be doing. Is somebody else going to come check on this patient? What is the plan? And that's a process that's for the most part invisible in the data that we analyze for patient outcomes.
[09:46]
B
And you know, one of the things, you know, as I've mentioned, I was, I trained as a cardiologist so I was in the same zones as you are, not at the same time, but whenever I would make rounds, the very first question is, you know, going to the nurse, what's going on? Because the nurses would have the most accurate perspective on exactly what was going on. Because as a clinician, as a physician rather, I don't spend every day at the patient's bedside. I don't spend hours and hours at the bedside. But the nurses do. Yeah, and they actually, you know, in the CCU setting generally they had two patients, maybe one, maybe in a, in a step down unit, it's three or four but it's not a big ratio. It's not like 10 to 1 on a med surg floor. So the nurses get the most accurate and then they have this gestalt, they have this gut feeling about what's going on. And you know, I've been oftentimes wondering, especially since I've read some of your papers and read through your work, you know, how do we capture that gestalt? How do we do that? And I guess is that where you began to get the beginnings of your thoughts around building CONCERN? And maybe you can even unpack what the CONCERN acronym stands for and, and go through all that.
[10:55]
D
Yeah, without a doubt. And it really is expert assessment by the nurse. You know, we say it's the, it's that insider, that gestalt, but it really is based in evidence and experience. So when I was a new nurse and I was trying to kind of understand, you know, how these systems worked, how the more experienced nurses were interpreting patients' conditions and states, what I observed is that, and this was when flow sheet documentation was on paper, for the patients that these more senior experienced nurses clearly were not worried about, and this was in the ICU, there was a lot of white space on the paper, you know, not densely recorded, but there were. It looked very different for patients that clearly in receiving handoff, this nurse had some concerns and was worried in perhaps in that handoff, guiding me on things to keep an eye out for. That was a densely packed flow sheet. And so that was the first observation I had which led to asking this research question of can we understand a nurse's level of concern from the density of their documentation, which is what the CONCERN study is. At its core, we've built a machine learning based model that uses the patterns, the metadata patterns within nursing documentation to predict patients at risk of deterioration. So outcomes such as needing a transfer to the ICU, because the model is both for acute care patients as well as ICU patients, risk for mortality in the hospital, risk for a rapid response, risk for sepsis, for example. And as our modeling has advanced, it's not only the frequency in which nurses document vital signs or comment associated with those vital signs or increased notes, it's also when they do it at uncommon times. So in the acute care setting, if a nurse is going into a patient's room at 3am and recording vital signs, you might ask, well, why?
Because it's typically important to let patients rest who are, you know, in a condition where they should be, which really should be most acute care patients. And so these signals are actually quite strong and we've been able to model them out and predict deterioration just about two days earlier than other early warning systems. And colleagues from the University of Utah actually took one of our less advanced models and replicated the ability for good predictive performance in the eICU Collaborative data set, which is a data set of over 200 hospitals throughout the country. So we've shown the generalizability and reproducibility of this modeling approach. This is a nursing practice that, that is very consistent, you know, at least across our country. And we're, we're starting to explore areas internationally where we might, you know, hopefully learn if it holds there as well.
[14:15]
C
So Sarah, really fantastic findings and, you know, and I want to dig into the method a little bit before we get into the math and the model architecture. I want to slow down and teach us and our audience a little bit of vocabulary. I think the audience of this podcast leans heavily physician exec and informaticist, and most of them have a rough mental model of what a doc does on rounds. But fewer of us, including myself, have a clear picture of what a nurse on a medical surgical floor or in the ICU is doing throughout a 12 hour shift. In particular, what kind of trace are they leaving under normal circumstances in the EHR? You've mentioned some of it, flow sheets, but can you walk us through that and sort of explain the touch points of what normal record keeping looks like?
[15:03]
D
Yeah, sure. Yes. Yeah. So, so let's talk about, you know, a day shift starts at 7am this is typically a 12 hour shift, 7am to 7pm the nurse will come in and in the ICU typically has about 10, two patients, excuse me, on an acute care floor in the range of 4 to 6 ish. And so we'll receive handoff from, from the night shift nurse and, and then once that process is done, probably within the hour, go into their patient's room. Handoff sometimes is at the bedside, you know, with the night shift nurse. But then once that handoff process is over, then go into the patient's room and do a head to toe assessment of the patient. And this assessment, you know, certainly is specific to the conditions and the reasons the patient is hospitalized. If you know you're on a neuro floor, there's a much more detailed neuro assessment done for a patient, for example, and the same likewise for a cardiac floor in a cardiac patient. But that head to toe assessment's done if the patient's sitting up in bed, talking with the patient, getting a sense of okay, how do they respond to me, how are they, you know, what's their cognition like, how are they mentating, putting those pieces together, does it align with the handoff report that I got or if I previously cared for this patient, does it align with how they looked yesterday? Typically, you know, after that head to toe assessment, a lot of medications are often due in the morning. Now on an acute care floor, you're checking your patient's vital signs maybe every four hours. It depends on the standard of care of that unit in the ICU every hour. So the nurse is going into the patient's room in kind of these regular times. And every time they go in the room they're assessing the patient. But I'll use another example. They'll record those assessments in the morning, the head to toe assessment when they need to, vital signs when they're due, within an approximate time range. But they're always assessing the patient. 
When I walk into a room or, or seeing the two of you on the screen right now, I can pretty much interpret that your Respiratory rate is within normal rate. You know, within reason. Right, within reason. So I use this example because this is a manual process. Right. You know, you're supposed to actually, you know, count the respirations within, you know, however many, you know, seconds and compute it to, to within a minute. Right. Um, the nurse is doing that when they walk in the room. They don't always record it in the EHR though. Right. Because, you know, okay, I looked at the patient, I could tell that they were within normal. But when we start to see those extra recordings in the ehr, that's telling you, okay, there's something here that the nurse is giving extra attention to, that assessment that doesn't always warrant the actual documentation of it. And so when we start to see those escalations, that's a strong signal that the nurse is worried about something. Worried early on. And actually sometimes it is hard to pinpoint exactly what that worry might be. But it's the reassessment that the nurse does because they're trying to figure out why is this patient not looking right. To me, there's something about how they're looking or how they're talking about something about what they said, are they not mentating? Right. And I'm going to keep reassessing them to, to see if my concern should be escalated to the care team. And it's this early process which nurses do day in and day out in hospitals that we're able to tap into. And that's why our prediction can be that much earlier than any vital sign change, which is a vital sign change. You know, moving from your normal state to your abnormal is a late indicator and the same with lab values. The other point I'll make just briefly here is it's not always that your vital sign is abnormal. It might be oftentimes in our model. Actually another strong signal is when a nurse holds a medication. 
And so if you have, going to a cardiac example, a lot of patients might be on a blood pressure lowering medication because typically their blood pressure is high. But the nurse may note, oh, their blood pressure is lower than their normal baseline. It's within normal. Right. But for that patient, it's not normal. I'm not comfortable giving this blood pressure lowering medication because why are they so much lower than their baseline right now? What's going on? So when you see a nurse make a decision to hold a medication, such as one that can influence a clinical condition such as that, or bottom them out, as we might say, that's an indicator too. Also giving PRN medications. This, you can, you can, you can see how that might be the case. You know, you give Tylenol for, you know, suspected infection or something. You can see that. I think that's an area we still really want to dig into. What are these medications that have the strong signal and why, you know, what are the decisions that are being made?
[20:36]
C
I think that's super helpful. And the examples are, I think are really evocative. I mean, you're describing the nurse as a super sensitive sensor that's deeply aware of everything that's happening with the patient and is an early detection system of things that won't show up in other ways for a period of time. So I think that. Thank you for helping us sort of set the ground and really help us understand what's happening on the floor. So let's go to the move that you made, which is genuinely interesting and unusual. Right, so a lot of people studying patients that are getting acute care would, for early warning signals, would try to model the patient or their, and their physiology and say, what's the patient's system? Right. Let's look at the telemetry, let's look at the biology. You instead model the nurse as the sensor system and in fact, specifically looking at the documentation trace that they're leaving as the proxy for what the nurse is thinking. Right. So that's absolutely beautiful, beautiful, beautiful example of making use of existing observational data. Right. So you got to, you know, take digital exhaust and you know, mine it into something that's really interesting. So help us understand how you got to the modeling and the architecture. So I get, I think you explained the intuition, right? The nurse, the nurse is seeing stuff that others don't. But how did you think about the signal, extracting it from all that noise and creating models that are able to turn it into something usable?
[22:00]
D
Yeah, yeah, great question. So we started with vital signs and looking at the frequency and the patterns in which those are documented. And so we have some early publications where we showed there's an association between actually vital signs overall. But when you even look across the different ones, the frequency of them and mortality and actually in cardiac arrest. And so we knew there's a signal there and so we modeled each of those out. We also thought early on we would be looking at some narrative information too. And there is a signal in mentions, for example, of MD aware. And it makes sense. Right. So the nurse is documenting that they escalated a concern. The MD, the doctor is aware, they communicated that concern. So we looked across the board of nursing documentation to see where the signals might be. We knew early on these signals that looked at frequency were pretty strong. And like I said, those are the strong predictors in our model. I want to acknowledge Kenrick Cato, my MPI on this. He had incredible insight to looking at some broader signals too. Really around seasonality is in our model these different types of temporal signals. So we have an ensemble based modeling approach. There are about 1200 models within our CONCERN system. What it does is it looks at for instance the characteristics of that patient at that time, what hospital day is it for the patient, what day of the week is it, is it night shift, what season is it and what it says is based on those characteristics, then picks the right model that then knows what the common documentation pattern is versus the uncommon. And if it's uncommon, then that tends to pop up as the signal.
[24:29]
C
Yeah. So Sarah, there's, you know, what you described I think reminds us of the line from the rapid response literature from 20 years ago. And I think you mentioned it, right. Where "the nurse is worried about the patient" is considered a perfectly valid trigger to escalate, but nobody could measure it. And I think you figured out how to measure it by looking at existing documentation trace. My slightly nerdy question is why the ensemble architecture, right. Why the choice of 1200 separate ML models that look at sort of specific times a day like and understand there's variability. Right. As opposed to some other architecture. Help us understand that architectural choice.
[25:11]
D
To put this in context, we were building the model between 2017 and you know, 2020 ish, and our thinking really was we want to be able to understand what are the features driving this model and so that we can expose those as well to the end users so that, you know, to, to the end users so that they can have, you know, faith and trust in this model. An important point to make is whenever we've, we did a lot of training with our end users, so, so we tested this in, you know, a multisite randomized controlled trial, training with the end users before we deployed it. Training with nurses was simple. They got it. This is their documentation pattern and process. With physicians, they were certainly on board with it. But we had to explain how this works and why it works because it wasn't intuitive to how they may interpret documentation and certainly not consistent with other types of predictive models and you know, and how it would be arriving at that prediction. So we do provide the, you know, kind of ranked features that were driving the prediction for that specific patient on this kind of drill down screen that is available. We understand clinicians are busy. So with a lot of user centered design, we made it so they can quickly get, you know, get, get the risk level. But if they want, they can look at the drill down.
[26:53]
B
So Sarah, I wanted to dig into this a little further, because what I'm trying to understand is whether this is a matter of the frequency of documentation or the values in the documentation. What I'm hearing you say is a little bit of both. By virtue of the fact that nurses are putting something on the virtual flow sheet, which in my day would have been a giant piece of paper, the frequency of those notes appearing is one kind of signal. The actual values they're generating are another kind of signal. Maybe you can unpack that. Are they married together, or are you looking at one as a lead to the other? How is that part of the model?
[27:39]
D
And what do you mean by value?
[27:42]
B
Like you were saying, the six liters by nasal cannula, that's a value. The respiratory rate is a value, an objective value. It's something you can measure. But what you're not measuring is that spidey sense going off in the nurse's head, and the proxy for that is the frequency of the documentation.
[28:03]
D
The signal is in the frequency of the documentation, and with that, common versus uncommon times. The temporality of that frequency is key. The values are not what is driving the prediction. Now, we've put in values because, as you get closer to the patient event, we want to be aligned with other early warning systems. You don't want it to be saying something different than NEWS. Of course not. So you can add in the actual vital sign values, but they don't give you an early prediction. Our early prediction is driven by the frequency and the temporality of this information.
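To make the frequency-versus-value distinction concrete, here is a minimal Python sketch of the kind of feature described: counting nursing documentation entries in a trailing window, and how many of them fall outside routine charting times. The routine hours, the 12-hour window, and the function name are illustrative assumptions, not the CONCERN feature set, which the episode does not spell out.

```python
from datetime import datetime, timedelta

# Hypothetical routine charting hours (e.g., scheduled vital-sign rounds).
# Illustrative assumption only, not CONCERN's definition of "common" times.
ROUTINE_HOURS = {0, 4, 8, 12, 16, 20}


def documentation_features(entry_times, now, window_hours=12):
    """Summarize documentation frequency and timing in a trailing window.

    entry_times: datetimes of flowsheet entries or comments for one patient.
    Returns the total count in the window and the count charted at
    uncommon (non-routine) hours. Recorded values are deliberately ignored.
    """
    start = now - timedelta(hours=window_hours)
    recent = [t for t in entry_times if start <= t <= now]
    uncommon = [t for t in recent if t.hour not in ROUTINE_HOURS]
    return {"count": len(recent), "uncommon_count": len(uncommon)}


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 15, 0)
    entries = [
        datetime(2024, 1, 1, 8, 0),    # routine round
        datetime(2024, 1, 1, 12, 0),   # routine round
        datetime(2024, 1, 1, 13, 10),  # extra check
        datetime(2024, 1, 1, 13, 40),  # extra check
        datetime(2024, 1, 1, 14, 25),  # extra check
    ]
    print(documentation_features(entries, now))  # {'count': 5, 'uncommon_count': 3}
```

Features like these would distinguish a patient charted only at routine rounds from one with three off-schedule entries in two hours, even if every recorded vital sign looks normal.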
[28:48]
B
And that's the stroke of brilliance. As Leon mentioned earlier, most people, myself included, would look at the patients and try to find commonalities. All the sepsis models that look for risk of sepsis tend to look at the patient, at the patient's lab values, at their... But what they don't tend to do is what you've described, which is: that patient's looking pretty punky, and there are five notes in an hour. And that in and of itself is the signal.
[29:15]
D
Yes.
[29:16]
B
And that's a stroke of brilliance. This study is huge. You did this over an enormous swath of institutions. Can you unpack that a little? Aside from the AI stuff, which we try to really dig into on the podcast and which we're doing a bit, I'm also enamored by the fact that you did this across an enormous number of institutions. How did you pull that off?
[29:39]
D
Yeah, I think there's just one phrase: team science. We really had an incredible team. And we did the trial during the early 2020s, some interesting years.
[29:57]
B
What was going on then? I don't remember.
[30:00]
D
I know, I don't remember.
[30:02]
C
It was so boring. We forgot.
[30:04]
B
I wish I could forget, honestly.
[30:05]
D
Exactly.
[30:06]
B
Yeah.
[30:06]
D
Well, here's the other interesting tidbit about that. At Columbia, we went live with Epic in February of 2020, so it was an exciting time. You know those go-live dates well in advance, so knowing that date, we delayed the trial at Columbia, and it ended up being a good decision, all in all. Our sites were the two hospitals at Columbia University Medical Center, and then two up in Boston, Brigham and Women's Hospital and Newton-Wellesley Hospital. So each had an academic hospital with a partnered community hospital. We went live up in Boston first, a year before, in late 2020. I want to acknowledge Patty Dykes. She was our site PI up there, and she made that happen. If you ever want to know something about implementation science, talk to her. She had laid the groundwork. Everything was ready. COVID came, and it was clear we were ready to go and that this might help. She's a genius in that regard and was able to make that happen, because you can imagine it could easily have been not feasible. And then a year later, we went live at the Columbia sites.
[31:39]
B
So it was 74 units, not 74 independent sites. 74 units.
[31:43]
D
74 units.
[31:44]
B
But it's still a lot.
[31:45]
D
Across the four hospitals, just over 60,000 patients.
[31:49]
B
Yeah, that's still a ton of places to take it. So the acronym, I think, stands for Communicating Narrative Concerns Entered by Registered Nurses. That's what it stands for.
[31:59]
D
That's the acronym. Yeah.
[32:00]
B
And walk us through it; I guess we've covered it to some degree. The things you found were basically what we've been talking about: the frequency of the signals, the frequency of the concerns, and that's what really raised the issue. And that resulted in some pretty amazing findings, like a 35.6% reduction in instantaneous mortality risk and an 11-plus percent shorter length of stay. Those are spectacular. I mean, astounding. Can you unpack that a bit?
[32:30]
D
Yeah, absolutely. We were surprised by the increase in ICU transfers. So there was this great decrease in mortality risk, a decrease in risk of sepsis of 7.5% overall, just over half a day shorter length of stay, and increased transfers to the ICU. So right off the bat, as we completed the trial, our first sub-analysis was: what was going on with these ICU transfers? What we realized is that we can measure deterioration, because CONCERN runs hourly, so we have hourly scores of the patient's status. Prior to that, you couldn't really measure deterioration. We measure outcomes: we measure that they had mortality, we measure that they had sepsis. But what happened in the whole process leading up to it, the day before, the two days before? Now we had these hourly scores, and for our intervention units we were able to look at what those hourly scores were when the score first changed. Our score is red, yellow, green. So when it first changed from green to yellow, how long until the patient was transferred to the ICU, for patients who needed to be transferred, and what happened to them? We were able to show that the patients transferred early had much better outcomes, highly statistically significant. The late transfers did not do as well; those were the patients who did die. If you talk to a clinician, and you would know, of course an early ICU transfer is going to go much better than a late ICU transfer. But the literature doesn't have great evidence about this, because we couldn't measure early versus late ICU transfer. So we were able to show that the increased ICU transfers on our CONCERN intervention units really were a mechanism leading to better outcomes.
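The early-versus-late analysis rests on a simple computation over the hourly scores: find the first hour the score moved from green to yellow, then measure the lag until ICU transfer. A minimal Python sketch follows; the list-of-levels data layout and the 12-hour cutoff separating "early" from "late" are hypothetical choices for illustration, not the definitions used in the trial.

```python
from typing import Optional, Sequence

# Hypothetical cutoff for labeling a transfer "early"; not taken from the study.
EARLY_CUTOFF_HOURS = 12


def hours_to_transfer(hourly_levels: Sequence[str], transfer_hour: int) -> Optional[int]:
    """Return hours from the first green->yellow change to ICU transfer.

    hourly_levels[i] is the risk level ("green", "yellow", "red") at hour i.
    Returns None if no green->yellow change occurs at or before the transfer hour.
    """
    for hour in range(1, min(len(hourly_levels), transfer_hour + 1)):
        if hourly_levels[hour - 1] == "green" and hourly_levels[hour] == "yellow":
            return transfer_hour - hour
    return None


def classify_transfer(hourly_levels: Sequence[str], transfer_hour: int) -> str:
    """Label a transfer early or late relative to the first warning change."""
    lag = hours_to_transfer(hourly_levels, transfer_hour)
    if lag is None:
        return "no green-to-yellow change"
    return "early" if lag <= EARLY_CUTOFF_HOURS else "late"


if __name__ == "__main__":
    levels = ["green", "green", "green", "yellow", "yellow", "red"]
    print(classify_transfer(levels, transfer_hour=5))  # early (lag of 2 hours)
```

Where to draw the line between early and late is a study-design decision; the point of the sketch is only that hourly scores make the lag measurable at all, which outcome-only data cannot.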
[34:36]
B
And I've got to guess that initially there must have been some people out there who said, oh my God, you're sending patients to the ICU, that's more ICU time. But it's exactly the opposite. By virtue of the fact that you've intervened, long term the patients are going to do better. It makes perfect clinical sense. But if you're a bean counter looking at this, it might make you a little panicked: oh my God, we're raising the number of patients going to the unit. And ironically, it's exactly the opposite, because by doing that you're utilizing the ICU in a far better way and getting the real value, which is early intervention, early treatment, and that leads to better outcomes. That's the whole idea of the ICU.
[35:21]
D
Absolutely. And we've actually recently partnered with colleagues from Columbia Business School on a cost-benefit analysis, and it is definitely a net benefit for hospitals.
[35:36]
B
Yeah, I'll bet. That's an interesting finding.
[35:40]
C
So, Sarah, I want to probe a methodological issue that you yourself raised on a prep call. I think it's worth discussing, and it's important for the audience to understand. You said that when you're doing deterioration work, you don't have a measure of deterioration. That's a deep epistemic problem. It's a kind of proxy-measure, index-measure problem that we struggle with in clinical research all the time. You're predicting an event that doesn't have a clean ground truth and may never have one. So how do you know your alarm bell fired correctly, versus a sicker patient just getting sicker for unrelated reasons? Walk us through how the team handled the absence of a gold standard.
[36:23]
D
Yeah, sure. A few things there. You have to be really clear about the outcomes you're trying to predict, and we had what we call a composite set of outcomes. There are the outcomes you use for creating your predictive model and testing its performance, and then, for us, since we had a clinical trial, you need clarity on what the trial outcomes are as well. We were focused on ICU transfer, rapid response, cardiac arrest, sepsis, and mortality. I'll answer in two ways. You build your model and you look at those outcomes; we did chart reviews to make sure our outcomes were well defined and validated. But as you actually move toward intervening, you have to be clear about how you evaluate. We did a multi-site randomized controlled trial, so we have that comparison. But if you just turn a score on and start to intervene, you don't know whether an outcome was avoided because of your score or not. So a lot of thought needs to go into what context you're evaluating these outcomes in, and how. Being able to look at these hourly deterioration scores, we think, offers a lot. We're actually working on having scores that have run silently in the background, because then you can infer other information from them. For any sites that aren't live, what was the score for that patient? And you know there wasn't a live intervention going on that could have moved the needle one way or the other.
[38:42]
C
Super interesting. I think what's valuable for our audience to recognize here is that with clever methodological thinking, stepping back and thinking hard about measurement, you can often develop ML training methods even when it's not obvious that there's a ground truth or a gold standard.
[39:04]
D
Right.
[39:04]
C
It just requires really hard thinking and some sophistication. So let me ask you about the kinds of concerns people raise about AI and ML methods of the kind you develop. Critics of clinical AI typically complain about, or worry about, three things, sometimes justly. One is alarm fatigue: oh man, another fricking alarm. Come on. Another is generalizability beyond the training site, which you partially address by having multiple sites and follow-up studies. And then equity, in the sense of whether the model works the same across populations and care settings. You've addressed at least one of them. Can you talk more specifically about how your team thinks about those three concerns and what you've done so far?
[39:56]
D
Yeah, we've thought about those for years. Absolutely. So important. A little side note: I also research documentation burden for a large part of my work. So alarm fatigue, huge problem. Our system does not pop up, fire an alert, or interrupt in any way. We worked with end users on the user-centered design that fits: where would you want to see this in your workflow? And recognize, too, that it may not always be the same answer. So what we built is that we write the score back to a flowsheet row, so it's available in the system anywhere we want to surface it. That's the architecture: if we find a different patient population or workflow is important, we can surface it in the EHR that way. For example, we put it on the patient list in the EHR, because by and large that's where it is displayed to the nurse, the physician, the PA, and the NP on the inpatient unit. That's where they go every time they log in, so they can see it. There's a small asterisk next to the icon, say a regular green circle. If there's an asterisk, it changed in the past hour. That's all the information they need to act. Like I said before, they can double-click and drill down. Great, that's there, and it can enhance trust, but for the most part busy clinicians don't have time for that, and they don't need to if they can just see whether it changed in the past hour. But we could build this in a different place in the EHR for a rapid response team, for example, and we had that vision in mind. So working with end users is key. One other point: you talked about alarm fatigue, but around documentation burden, we do not ask the nurse to input any additional data. This is not a score where their morning assessment needs to be done in order to get some discrete field. No way. We were not going to go about it that way.
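The display rule described here is deliberately minimal: the current risk level, plus an asterisk when the level changed within the past hour. A Python sketch of that rule, assuming the two most recent hourly scores have already been read back from the flowsheet row; the function name and text labels are hypothetical, and the real EHR display is an icon rather than a string.

```python
def patient_list_badge(previous_level: str, current_level: str) -> str:
    """Render a risk-level badge for a patient list (hypothetical helper).

    previous_level: the score written to the flowsheet row one hour earlier.
    current_level: the most recent hourly score.
    Appends an asterisk when the level changed since the prior hour.
    Nothing here raises a pop-up or interruptive alert.
    """
    changed = previous_level != current_level
    return current_level + ("*" if changed else "")


if __name__ == "__main__":
    print(patient_list_badge("green", "green"))   # green
    print(patient_list_badge("green", "yellow"))  # yellow*
```

Keeping the change cue passive, rather than firing an alert, is the design choice that addresses alarm fatigue; the drill-down with ranked features stays available but optional.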
[42:01]
C
Yeah, it's part of the genius of your approach. It's like, look, you're writing stuff down. That's the signal.
[42:07]
D
Yeah, exactly. In terms of equity, hugely important question. We did look at bias within our data outputs. We know that there's bias within EHRs, and we know there are biased patterns. So we looked at our outputs to see if there were differences in the distribution of outputs by race, and we did see differences, so we accounted for that in the model that we deployed. We also looked at differences in comment documentation; I talked about nurses' comments a little bit. We compared the frequency of comment documentation for patients who are English speakers versus non-English speakers in terms of primary language, and we found differences there too. These differences are in the patterns, and we leverage patterns in the data, so we need to look for them. This is just scratching the surface, and I want to be clear about that. This is a really important area within AI: figuring out all the ways we need to look at bias and mitigate bias in our outputs.
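A first-pass check of the kind described, comparing how often each risk level is assigned within each patient group, can be sketched in a few lines of Python. The record layout and group labels are hypothetical, and this is only a descriptive tally: a fuller bias audit would add calibration, outcome-conditioned error rates, and statistical testing, in line with the point that this only scratches the surface.

```python
from collections import Counter, defaultdict


def level_distribution_by_group(records):
    """Return, for each group, the share of scores at each risk level.

    records: iterable of (group, level) pairs, e.g. ("group_a", "yellow").
    Group labels are placeholders; a real audit would use the attribute
    under study (such as race or primary language).
    """
    counts = defaultdict(Counter)
    for group, level in records:
        counts[group][level] += 1
    return {
        group: {level: n / sum(c.values()) for level, n in c.items()}
        for group, c in counts.items()
    }


if __name__ == "__main__":
    sample = [
        ("group_a", "green"), ("group_a", "green"),
        ("group_a", "yellow"), ("group_a", "red"),
        ("group_b", "green"), ("group_b", "yellow"),
    ]
    for group, dist in level_distribution_by_group(sample).items():
        print(group, dist)
```

The same tally could be pointed at comment counts grouped by primary language to look for the documentation-frequency differences mentioned above.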
[43:16]
C
So let me tackle a more positive side. I'm imagining being the CIO of a health system, with the CMO, and saying, man, I want this. The CONCERN algorithm is IP, right? Somebody owns the copyright; as far as I know, it's not in a public release. The de-identified outcomes are all going up on PhysioNet. But imagine I'm a health system listening to this episode and going, oh my God, we've got to implement this tomorrow. What's the path? Do they replicate from the methods? I mean, your paper is in Nature Medicine, congratulations. Should they wait for a commercial product? Should they license it from you? What does the next year look like for diffusion of this very, very important finding?
[43:58]
D
Yeah, so we are implementing it at other sites with funding from the American Nurses Foundation: Vanderbilt University Medical Center and Washington University in St. Louis. And I'll just mention that we're building out a pediatric model with Children's Hospital Colorado. So there is work to expand right now. For other sites that want to adopt it, we built, with funding from the American Nurses Foundation, a CONCERN Implementation Toolkit that's online, and it has all the information you need, from how to pitch this to your hospital executives, to some of the technical documentation, to how to train your end users. What it doesn't have is the actual software code. For that, we would enter into an academic research license, and we would be really happy to engage in conversations to see if this is a fit and whether we should proceed. And I know this is how we both feel, so I feel comfortable speaking for Kendra Cato, the co-lead with me on this: we want to see this in every hospital in the country, because we know this is how nurses practice, and it's just valuable for patient outcomes.
[45:09]
B
Yeah. The fact that you've been able to demonstrate this in a reproducible way in multiple institutions, across multiple different populations, and it holds up really well, that's about as good as it gets. I can't imagine anything beyond that that would make it even more credible. I do want to ask you a bit of a provocative question. EHR documentation: you said you've studied this. It's become the bane of existence for a lot of clinicians. Certainly my friends who are still practicing find EHR documentation to be terrible. To what degree is that going to impact what you're doing? You've already said it's the frequency of the documentation, not the actual notes, not the actual information. So I'm just trying to get a sense: does the documentation burden have an impact on your algorithm, or does it not matter?
[46:06]
C
Steve, if I can jump in, I think the real crux for our podcast is the enormous transformation with ambient AI.
[46:14]
D
Without a doubt.
[46:16]
C
Right. Is that going to make the nurses' job of creating the signal easier, or is it going to completely mess it up?
[46:22]
D
Right.
[46:23]
B
Yeah, exactly. Thanks, Leon.
[46:26]
D
Yeah, I love this question, and of course the researcher in me couldn't be more excited. I'll answer specific to documentation burden first, because I think it informs what we might see in the future. In some other work I've done, we saw that during the COVID pandemic, requirements for documentation were relaxed. Nurses and others just had to document medications. So we performed a natural experiment on what nurses documented at that time, when they could choose, and they captured what we then confirmed, with mixed methods, to be essential documentation. They maintained high levels of documentation that was informing clinical care, coordination, and communication. Those are the signals we're tapping into in the CONCERN model too. Nurses maintained them even when we said you don't have to document these other things that are for secondary purposes, or all the reporting requirements we place on nurses. I say that because as we transition to these new modes of data input, I'm really excited to be able to model nurses' behavior in an even bigger way, because at the end of the day, that's what we're doing. We have these proxy measures, so maybe we're one step away from the actual behavior; it's the behavior of inputting into the EHR. Now we could be even closer: with ambient AI hearing what they're doing and how frequently they're in the room, we're closer to actually detecting the interaction at the bedside, and then computer vision technology could actually see what's being encountered. So I'm really excited to model nurses' behavior in an even more granular way than we can imagine right now.
[48:14]
C
Yeah, there are such cool opportunities emerging. The method reminds me of a very old social science finding, and I'm completely blanking on the reference, but it was a juvenile delinquency prediction model that found the best predictor of juvenile delinquency was the size of the file on the kid, just measured in inches. And you're like, yeah, that's a big file, man, that kid's in trouble, right? Brilliant, right? Absolutely. A brilliant use of created, derived artifacts and signals in the system. So I think this is hiding in plain sight: quantitative measures created out of the exhaust. Very clever stuff. But let's try to bring this to a landing. Imagine that a health system CIO is listening to this and then has 60 seconds in the elevator with their CMO and CNIO tomorrow. What's the one thing about CONCERN, or about modeling clinicians instead of patients, that you want them to walk away knowing?
[49:17]
D
Yeah. That these nursing behaviors have an outsized influence on patient outcomes. I think clinically, we've always known that. You pointed out the rapid response literature: for years, if a nurse is concerned, then act. But now we can actually measure it. This is a use case, and a really valuable use case for it, absolutely. And I think we can learn even more and apply this method and approach, and what we can now measure, to nursing care and to healthcare processes in general. I think we can find even more use cases to understand what our expert clinicians are doing and to measure expert decision-making in a way we haven't been able to before.
[50:12]
C
Yeah, I love that. So the system as a whole is generating signals, not just what you're measuring about the patient. Right? Absolutely. It's such a profound insight. So to push that just a little further, what's the first concrete step a health system could take if they wanted to start working in this space? Not necessarily implementing CONCERN; maybe they're not ready for that. But what kinds of capacities or capabilities would help them recognize that nursing data is signal and not noise?
[50:40]
D
Oh, great question. I would say: let's remove some of the burden we impose on nurses, because we take up too much of their time, and time is limited. It's nursing surveillance that is influencing these patient outcomes, and if we fill their time with tasks, nurses don't have time to engage in it. And it's really important to be able to engage in that: to think through what your patient needs, to go in and reassess the patient. This time is limited, and too often, when AI saves time, we add on a task, and we keep doing that.
[51:24]
C
Yeah. And I think the profound insight is that a nurse who is filling out a form is not eyeballing the patient and saying, boy, that's funny. And that turns out to be a critical, critical function. So just to wrap it up, where can people learn more about your work: the trial, and the 25x5 Task Force you're leading for AMIA, which we didn't get to talk about? And how do we reach out to you and the team? Where do we look?
[51:50]
D
Yeah, great. Our toolkit is at concerntoolkit.org, and we also have our study website on Columbia's Department of Biomedical Informatics website, under CONCERN Study. And I'm happy to provide the exact links to you.
[52:10]
C
Yeah, we'll post it in the show notes for people, who I'm sure will want to look it up. Well, with that, Sarah, I wish we had more time to dig into the many other exciting things you're doing, but you've been such a wonderful guest and reminded us why we absolutely have to have nurses as part of this conversation. So thank you so much. Steve, thank you, as usual, for co-hosting, and I want to thank our audience and invite everyone to join us next week for another exciting episode of Practical AI in Healthcare.
[52:36]
D
Thank you so much for having me. This was wonderful.
[52:46]
A
Thank you for joining us this week on Practical AI in Healthcare.
[52:50]
B
If you're ready to go beyond buzzwords and hype and explore how AI is
[52:53]
A
truly transforming healthcare, stay tuned for more conversations that get us to what works. Until next time, stay practical.