Transcript
A (0:00)
The World Health Organization has thrown its hat into the ring of AI development, and there are some really interesting implications and lessons that this highlights. So why is the World Health Organization doing this? Imagine a clinician in a remote, low-bandwidth facility attempting to manage a complex obstetric emergency. They have the training, but the specific, up-to-date WHO protocol they need is buried inside a 300-page PDF. Accessing that document requires time and data that the clinician doesn't have. This is the friction point where clinical excellence meets the reality of information overload. So the Special Programme of Research, Development and Research Training in Human Reproduction, known as HRP, at the World Health Organization has launched a beta tool called ChatHRP. It's a targeted AI assistant designed to bridge that gap between vast repositories of health data and the professionals who need it at the point of care. It focuses specifically on sexual and reproductive health and rights, a domain where misinformation isn't just a nuisance but a systemic challenge with documented human rights implications, as referenced in the World Health Organization's press release for this product. Unlike general-purpose chatbots that generate responses based on massive, uncurated corpora of Internet data, ChatHRP uses a framework we've covered previously called retrieval-augmented generation, or RAG. This method changes the relationship between the AI and information. Instead of the model remembering facts from its training, it acts more like a high-speed librarian. When a user asks a question, the system searches a specific, verified database, in this case the extensive knowledge base of the World Health Organization and specifically HRP. It then retrieves the relevant passages and uses the language model to synthesize a coherent answer from them. So this approach offers a strategic advantage in clinical settings.
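To make the "high-speed librarian" idea concrete, here's a minimal sketch of the retrieve-then-synthesize loop. Everything in it is illustrative: the passages are invented stand-ins for guideline text, bag-of-words counts stand in for a learned embedding model, and the final step just prepends the retrieved text where a real system would call a language model.

```python
import math
from collections import Counter

# Illustrative stand-in for a curated guideline corpus; the passage
# text here is invented for the example, not real clinical guidance.
PASSAGES = [
    "Gestational diabetes should be screened for between 24 and 28 weeks.",
    "Hormonal contraceptives can reduce serum lamotrigine levels.",
    "Postpartum haemorrhage is managed with uterotonics such as oxytocin.",
]

def embed(text):
    # Bag-of-words term counts stand in for a learned embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    # Rank every passage by similarity to the query; keep the top k.
    q = embed(query)
    ranked = sorted(PASSAGES, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def answer(query):
    # A real RAG system would hand the retrieved passages to a language
    # model to synthesize a reply; here we just ground it in the text.
    return "According to the retrieved guidance: " + " ".join(retrieve(query))
```

The key property, which is what makes the approach attractive clinically, is that the generated answer can only draw on whatever `retrieve` returns from the curated corpus.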
It creates a closed-loop system where the AI is constrained to provide information only from highly trusted sources. This significantly reduces the risk of hallucinations, which are a primary barrier to the clinical adoption of generative AI in most settings. For policymakers, researchers and health workers, the promise is clear: immediate access to gold-standard evidence without the need to navigate multiple platforms or manually search through lengthy documentation buried within the sub-pages of complex websites. However, moving from a promising beta to a robust clinical tool requires navigating several technical hurdles. When we test the limits of these systems, we see that the current generation of RAG architectures requires refinement. For example, I put it to the test myself with quite a straightforward query. I saw that one of their suggested examples was to ask about diabetes management in pregnancy, so I decided to test it with a related but different query: the use of anti-seizure medication, specifically lamotrigine, during pregnancy. Specifically, I asked: "I'm pregnant and I'm taking lamotrigine. What should I do?" The AI responded with advice regarding hormonal contraceptives and how they might reduce lamotrigine levels. This information is technically accurate in a vacuum, but it failed to address the specific context of the question I asked: pregnancy management for someone taking lamotrigine. This reveals something known as a proximity error in the embedding space. In simpler terms, when the AI searched its database it found no exact match for the specific clinical scenario I was asking about, so it pulled the next closest thing. Because the database contains extensive information on lamotrigine's interaction with contraceptives but perhaps lacks specific, granular guidelines on drug-level monitoring in pregnancy, the system prioritized giving a near-match answer over a no-match response.
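One common mitigation for this failure mode is to put a floor under the retrieval similarity score: if even the best match sits far from the query, the system should say so rather than serve the nearest neighbour. A rough sketch of that guard, reusing the same toy bag-of-words setup; the corpus, query, and threshold value are all my own assumptions for illustration, not how ChatHRP actually works.

```python
import math
from collections import Counter

# Toy corpus mirroring the gap described above: material on the
# contraceptive interaction, nothing on lamotrigine use in pregnancy.
PASSAGES = [
    "Hormonal contraceptives can reduce serum lamotrigine concentrations.",
    "Gestational diabetes should be screened for between 24 and 28 weeks.",
]

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_guarded(query, threshold=0.3):
    # Without the threshold, the nearest passage is always returned,
    # even when it barely brushes the query (a proximity error).
    q = embed(query)
    best = max(PASSAGES, key=lambda p: cosine(q, embed(p)))
    if cosine(q, embed(best)) < threshold:
        return None  # admit "no match" rather than serve a near-miss
    return best
```

For a query like "lamotrigine use in pregnancy", the only term the corpus shares is "lamotrigine", so the best score falls below the floor and the guarded retriever returns nothing, which downstream logic can turn into an honest "I don't have guidance on that" instead of the contraceptive answer.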
For a busy clinician, receiving a nearby fact that doesn't actually answer the specific question is a potential friction point that could lead to a loss of trust. Furthermore, there's the challenge of conversational memory. In the same test, I followed up to clarify the initial error by saying: "No, I wasn't taking contraception. I've been trying. What next?" The system responded by pivoting to a generic discussion of infertility, treating the new statement as an entirely isolated query. It failed to maintain the context of the previous message, where I'd literally said that I'm pregnant. In clinical practice, history is everything. A patient's care is a continuous narrative, not a series of disconnected data points. For an AI tool to be truly effective in a medical context, it must be able to utilize a broader context window that persists across the conversation. It needs to understand that the user is still talking about the same patient and the same medication. Despite these hurdles, the existence of ChatHRP is a very positive development. I think it represents a vital move towards sovereign, or public-interest, AI. Currently, much of the innovation in large language models is driven by commercial entities with proprietary interests. Having the World Health Organization lead the development of a tool that prioritizes evidence over engagement is a major win for global health. It ensures that the source of truth remains in the hands of public health experts, rather than being subject to the shifting algorithms of private corporations. The tool's focus on low-bandwidth functionality and multilingual support is also an important triumph. It acknowledges that the greatest need for evidence-based guidance often exists in settings with the fewest resources. By optimizing for these environments, the World Health Organization is democratizing access to high-quality data in a way that traditional websites and PDF repositories can't.
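A simple, if blunt, way to give a RAG front-end this kind of persistence is to fold recent turns into the retrieval query, so a follow-up inherits the context of what came before. A hypothetical sketch: the class name and window size are my own illustration, not a description of ChatHRP's internals.

```python
class ConversationMemory:
    """Carry recent turns forward so follow-up questions are retrieved
    against the whole exchange, not just the latest message."""

    def __init__(self, max_turns=6):
        self.turns = []
        self.max_turns = max_turns

    def contextual_query(self, user_message):
        # Record the new turn, then build the text the retriever sees.
        self.turns.append(user_message)
        return " ".join(self.turns[-self.max_turns:])

memory = ConversationMemory()
memory.contextual_query("I'm pregnant and I'm taking lamotrigine. What should I do?")
follow_up = memory.contextual_query("No, I wasn't taking contraception. What next?")
# follow_up still contains "pregnant" and "lamotrigine", so the
# retriever can keep ranking pregnancy-specific guidance highly.
```

Production systems typically do something more sophisticated, such as using a language model to rewrite the follow-up into a standalone query, but even this crude concatenation would stop the pivot to generic infertility advice described above.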
However, to move from a successful beta to a global standard that's trusted and used by clinicians, we need to look at the question of scale. The current limitation of this tool appears to be the size and diversity of its underlying data. While HRP and WHO guidelines are the gold standard, they're only a portion of the evidence base that a clinician uses daily. The true potential of this technology lies in a unified guideline engine: a system that integrates national and international guidance across all medical specialties into a single RAG-enabled interface. This is where strategic investment becomes important. Developing a system that can handle the nuances of clinical reasoning and massive datasets requires significant resources. OpenEvidence, a commercial enterprise, currently does this very well, and it's difficult to replicate that with the funding structures of the public sector. Maybe organizations like the Bill & Melinda Gates Foundation could be uniquely positioned to fund this type of technology-driven public-interest work. There's a clear opportunity for a major philanthropic partnership to provide the capital needed to refine the RAG architecture, expand the database and implement the conversational memory required for professional clinical use. But there might be perceived conflicts of interest when founders of commercial tech giants enter this space, if the goal is to create a neutral, evidence-based layer that any health worker can trust. The technology does already exist; what's now needed is the integration of more comprehensive datasets and a more sophisticated handling of clinical context. What ChatHRP shows us is that public sector organizations are capable of building sophisticated AI tools that directly address the needs of the global health community. It demonstrates that we can create systems that prioritize accuracy and evidence over the generalized patterns of the open Internet. But it's really hard to get a fully performant tool.
And small issues could quite quickly erode trust in everyday use. So ChatHRP is a successful proof of concept for a new era of medical information. Its current database has gaps that can lead to irrelevant responses in rarer scenarios, but it moves us away from the noise of misinformation and towards a future where the world's best medical knowledge could be available to everyone, everywhere, in real time. It's a significant first step that I think should be encouraged, and the further work possible from this foundation could fundamentally change how we practice medicine on a global scale.
