Friday, February 20, 2026

Medical AI Models Face Context Gaps That Limit Clinical Use, Researchers Say


Medical artificial intelligence systems are increasingly being studied, tested, and piloted across health care settings, with applications that include clinical documentation support, literature review, and diagnostic assistance. At the same time, researchers continue to examine why many medical AI models that perform well in research environments encounter challenges when applied in real-world clinical care.

New findings from researchers at Harvard Medical School suggest that one contributing factor is how medical AI systems account for clinical context.

In a paper published February 3 in Nature Medicine, a research team led by Marinka Zitnik, associate professor of biomedical informatics at Harvard Medical School, examines how “contextual errors” can arise when medical AI models are deployed in hospitals, clinics, and other care settings. The researchers describe contextual errors as outputs that may appear clinically appropriate in general terms but do not align with the specific circumstances in which care is delivered.

According to the research team, these limitations are not confined to a single type of model or clinical task but reflect a broader challenge in developing medical AI systems.

Performance in Testing Versus Clinical Settings

Medical AI models are commonly evaluated using benchmark datasets and standardized test cases designed to assess accuracy and consistency. Researchers note that while these evaluations are useful, they may not capture the variability of real-world health care environments.

In clinical practice, patient presentations, workflows, resource availability, and treatment options can differ widely across settings. The researchers report that AI systems trained without exposure to this variability may generate recommendations that are technically correct within a narrow framework but misaligned with clinical realities.

The study identifies contextual errors as a recurring issue that emerges when AI systems encounter factors not represented in their training data. These factors may include differences in medical specialty, geographic location, or social conditions affecting patient care.

Zitnik, speaking with Harvard Medicine News, said that important information required for clinical decision-making is often absent from datasets used to train medical AI models, contributing to this gap between testing and practice.

Specialty-Specific Context

One area highlighted by the researchers involves medical specialization. Patients frequently present with symptoms that involve multiple organ systems, particularly in emergency and acute care environments. In such cases, care may involve clinicians from several specialties, each applying a distinct clinical perspective.

The research team describes scenarios in which AI models trained primarily on data from a single specialty mirror that specialty's narrower perspective. As a result, such models may prioritize one organ system or fail to recognize that a combination of symptoms could indicate a multisystem condition.

Researchers note that this limitation may affect how AI systems interpret complex presentations, particularly when symptoms cross traditional specialty boundaries. They suggest that training models across multiple specialties and enabling them to adjust context dynamically could help address this issue.

Geographic Variation in Health Care

Geographic context is another factor identified as influencing the performance of medical AI models. Disease prevalence, clinical guidelines, regulatory approvals, and treatment availability vary across countries and regions.

The researchers report that an AI model producing identical recommendations across different geographic settings may not be accounting for these differences. For example, a treatment approach that is common or approved in one country may be unavailable or inappropriate in another.

Zitnik noted that geographic context can influence both clinical risk assessment and treatment planning, particularly for conditions that vary in prevalence or severity across regions. The research team describes ongoing efforts to explore ways of integrating geographic information into AI systems to support more location-specific outputs.

Social and Economic Context

The study also examines how socioeconomic and cultural factors can influence patient care but remain largely absent from AI training data. While electronic health records capture clinical information such as diagnoses and laboratory values, they often do not include details related to transportation access, work obligations, caregiving responsibilities, or financial constraints.

Researchers describe situations in which patients do not follow through on referrals or treatment plans due to barriers that are not visible in clinical records. AI systems that rely solely on available health record data may generate recommendations that do not reflect these realities.

According to the research team, AI systems that fail to consider social and economic context may unintentionally reinforce existing disparities in access to care. They suggest that incorporating such information could enable models to offer alternatives that are more feasible for patients.

Research-Based Recommendations

Based on their findings, the researchers outline several approaches aimed at improving how medical AI systems handle context. These recommendations are intended for developers and researchers working on medical AI rather than for clinicians using these tools.

One approach involves expanding training datasets to include contextual information relevant to clinical decision-making. Another focuses on developing evaluation benchmarks that reflect real-world clinical scenarios rather than idealized testing conditions.

The researchers also emphasize the importance of model design. They suggest that AI systems should be structured in ways that allow them to recognize and adapt to different contexts, rather than treating clinical inputs as interchangeable across settings.
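
The paper does not prescribe a specific implementation, but the general idea of context-aware design can be illustrated with a short sketch. In the hypothetical Python example below, a ClinicalContext record (the field names are assumptions for illustration, not taken from the study) travels with each query so the system receives the care setting explicitly rather than defaulting to a generic one, and missing context is flagged instead of silently ignored.

```python
from dataclasses import dataclass, field

@dataclass
class ClinicalContext:
    """Structured context that travels with every query (illustrative fields only)."""
    specialty: str = "unspecified"        # e.g., "emergency medicine", "cardiology"
    region: str = "unspecified"           # e.g., "US", "sub-Saharan Africa"
    care_setting: str = "unspecified"     # e.g., "ICU", "rural outpatient clinic"
    social_notes: list[str] = field(default_factory=list)  # e.g., "limited transportation"

def build_prompt(question: str, ctx: ClinicalContext) -> str:
    """Prepend explicit context so the model can condition on it instead of assuming defaults."""
    missing = [name for name, value in [("specialty", ctx.specialty),
                                        ("region", ctx.region),
                                        ("care_setting", ctx.care_setting)]
               if value == "unspecified"]
    header = (
        f"Specialty: {ctx.specialty}\n"
        f"Region: {ctx.region}\n"
        f"Care setting: {ctx.care_setting}\n"
        f"Social factors: {', '.join(ctx.social_notes) or 'none recorded'}\n"
    )
    if missing:
        header += f"NOTE: context not provided for {', '.join(missing)}; avoid setting-specific claims.\n"
    return header + "\nQuestion: " + question

ctx = ClinicalContext(specialty="emergency medicine", region="US",
                      care_setting="rural community hospital",
                      social_notes=["limited transportation to follow-up visits"])
print(build_prompt("Suggest next diagnostic steps for chest pain with dyspnea.", ctx))
```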

Zitnik said these steps could help identify limitations earlier in the development process, before AI systems are deployed in patient-care environments.

Trust and Transparency

In addition to technical performance, the study highlights trust as a key consideration in the adoption of medical AI. Researchers note that clinicians and patients may be reluctant to rely on systems that do not provide insight into how recommendations are generated.

The research team emphasizes the importance of transparency and interpretability, including the ability of AI systems to indicate uncertainty when appropriate. According to Zitnik, models that acknowledge limitations may be more likely to support safe use in clinical settings.
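
As a rough illustration of what "indicating uncertainty" could look like in software, the sketch below assumes a hypothetical model interface that exposes a confidence score; the threshold and wording are assumptions for illustration, not part of the published study. The point is simply that low-confidence answers are labeled as such rather than presented as settled.

```python
# Illustrative only: a thin wrapper that surfaces uncertainty instead of hiding it.
from typing import Callable, NamedTuple

class ScoredAnswer(NamedTuple):
    text: str
    confidence: float  # assumed to be provided by the underlying model, 0.0-1.0

def answer_with_uncertainty(ask_model: Callable[[str], ScoredAnswer],
                            question: str,
                            threshold: float = 0.7) -> str:
    """Return the model's answer, but label low-confidence responses and suggest escalation."""
    result = ask_model(question)
    if result.confidence < threshold:
        return (f"LOW CONFIDENCE ({result.confidence:.2f}): {result.text}\n"
                "This output is uncertain; verify with clinical judgment or a specialist.")
    return f"{result.text} (confidence {result.confidence:.2f})"

# Example with a stubbed model:
stub = lambda q: ScoredAnswer("Consider pulmonary embolism in the differential.", 0.55)
print(answer_with_uncertainty(stub, "Patient with pleuritic chest pain and tachycardia?"))
```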

Human–AI Interaction

The study also addresses how medical AI systems interact with users. Many existing systems rely on one-way interactions, such as providing responses without seeking clarification or additional information.

Researchers argue that more effective human–AI collaboration would involve bidirectional communication, allowing AI systems to request information and tailor outputs based on user expertise. This approach, they suggest, could support more accurate and relevant recommendations.
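
The researchers do not publish an interaction protocol, but a two-way exchange of this kind can be sketched in a few lines. In the hypothetical example below, an assistant checks whether key context fields have been supplied and asks follow-up questions before drafting a response; the field names and wording are assumptions for illustration only.

```python
# A minimal sketch of a bidirectional exchange, assuming a hypothetical assistant
# that can request missing context before answering.
REQUIRED_CONTEXT = ["specialty", "care_setting", "region"]

def clarifying_questions(provided: dict) -> list[str]:
    """Return follow-up questions for any required context the user has not supplied."""
    return [f"Can you tell me the {name.replace('_', ' ')}?"
            for name in REQUIRED_CONTEXT if name not in provided]

def respond(question: str, provided: dict) -> str:
    """Ask for missing context first; only answer once the context is complete."""
    questions = clarifying_questions(provided)
    if questions:
        return "Before I answer, I need more information:\n- " + "\n- ".join(questions)
    return (f"[{provided['specialty']} / {provided['care_setting']} / {provided['region']}] "
            f"Draft response to: {question}")

print(respond("Best anticoagulation strategy?", {"specialty": "cardiology"}))
print(respond("Best anticoagulation strategy?",
              {"specialty": "cardiology", "care_setting": "outpatient", "region": "US"}))
```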

Current Applications and Future Directions

Medical AI systems are already being used in certain areas of health care, particularly to support administrative tasks and research workflows. Examples cited by researchers include tools that assist with clinical documentation and systems that help identify relevant scientific literature.

Looking ahead, the research team describes potential applications for context-aware AI systems that adapt as patient care progresses. Such systems could support clinicians by shifting focus from symptom analysis to treatment evidence to medication considerations, depending on the stage of care.

Zitnik said that models capable of adjusting context could be particularly useful for patients with complex conditions or multiple medications that fall outside standard treatment pathways.

Responsible Development and Oversight

Researchers emphasize that medical AI is already part of health care delivery and that continued oversight is necessary to ensure safe and effective use. They highlight the importance of testing systems in real-world settings, monitoring performance after deployment, and developing guidelines for implementation.

Zitnik expressed optimism that collaborative efforts across the medical AI community could help identify and address challenges early in the development process.

Outlook

According to the researchers, addressing contextual limitations will be an important step in determining how widely medical AI systems can be adopted in clinical practice. While challenges remain, they describe opportunities for AI to support clinicians and improve care when developed with attention to real-world conditions.

The study concludes that continued research into context-aware medical AI may help bridge the gap between experimental success and practical clinical use.


Source: Harvard Medicine News; Zitnik M. et al., Nature Medicine, February 2026

Alice Benjamin
Alice Benjamin, MSN, ACNS-BC, FNP-C, is a board-certified nurse practitioner and clinical nurse specialist, mom, and health and wellness advocate affectionately known as America's favorite nurse. She is also the Chief Executive Officer and Publisher of the Nurse Approved Network.
