This recap of a recent Next Orbit podcast episode digs into the questions that matter most in healthcare AI.
Most conversations about AI in healthcare are asking the wrong question. The question is not whether to use AI. It is which type of AI, in which part of a clinical workflow, governed by what standards, and connected to what data infrastructure.
Get that right and AI reduces provider burden, improves documentation quality, and helps data flow where it needs to go. Get it wrong and you have a system that hallucinates, produces inconsistent outputs, and creates liability no SOAP note can protect you from.
On an episode of the Next Orbit podcast, Mike Lang, a 30-year healthcare technology veteran and Senior Vice President at BINGLI, joined Leap Orbit's Ben Wade and Mike Hunter for a conversation that moved past the hype and into the mechanics. What emerged was a practical framework for thinking about how AI, clinical data standards, and interoperability have to work together to produce outcomes worth having.
Before the current wave of large language models (LLMs), AI in healthcare wore different names: algorithms, predictive analytics, machine learning, artificial neural networks. Many of those tools have been embedded in clinical workflows for decades. What changed with generative AI is the surface area of potential misapplication, and with it, the urgency of getting the governance right.
Mike Hunter, whose work as Leap Orbit's Federal Practice Director spans federal partners including the CDC, VA, HRSA, and SAMHSA, has written about AI's intersection with data and interoperability standards. His framing on the podcast captures the underlying tension:
"The future is one that's based on standards which will evolve to meet our needs. It's based on using AI in a way that enables the people who have the wisdom to do the work to do the work effectively and efficiently."
That framing puts standards at the center rather than the technology itself. It is a distinction that matters enormously in healthcare, where the difference between a useful tool and a dangerous one often comes down to whether the right governance structures are in place.
Here is the core problem with applying LLMs to clinical workflows: LLMs are not deterministic. Run the same inputs twice and you may get two different outputs. For most industries, that is a manageable tradeoff. In medicine, it is a patient safety issue.
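To make the contrast concrete, here is a minimal sketch in Python. The function name and thresholds are hypothetical, not drawn from any system discussed on the episode; the point is only that deterministic logic can be asserted against, while sampled LLM output cannot:

```python
# Hypothetical rule-based triage: same inputs, same output, every time.
def triage_rule(systolic_bp: int, heart_rate: int) -> str:
    """Deterministic clinical logic: reproducible, testable, auditable."""
    if systolic_bp < 90 or heart_rate > 120:
        return "urgent"
    return "routine"

# This assertion holds on every run, which is what lets a clinical or
# compliance team validate the logic once and trust it in production.
assert triage_rule(85, 100) == triage_rule(85, 100)

# By contrast, an LLM sampling tokens (temperature > 0) can return
# different text for an identical prompt on consecutive calls, so no
# equivalent assertion about its clinical output is possible.
```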
Lang's case for deterministic AI in regulated clinical settings comes down to reproducibility: given the same inputs, a deterministic model produces the same output every time, and that consistency is what makes the logic reviewable and the outputs defensible.
This does not mean LLMs have no role. They are well suited for the parts of a clinical workflow that do not require clinical certainty: converting "my tummy hurts" into "abdominal pain," surfacing relevant assessment forms based on what a patient describes, or generating a narrative report after a deterministic model has produced a clinical outcome. The key is knowing the boundaries.
As Lang put it: "We use LLMs but we use it in non-clinical outcome-oriented workflows."
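Here is a minimal sketch of what that boundary can look like in practice, using the "my tummy hurts" example from the conversation. The function names and the one-entry terminology map are illustrative assumptions; BINGLI's actual pipeline was not described at this level of detail on the episode:

```python
# Illustrative terminology map; a real system would use a full
# SNOMED CT terminology service, not a hardcoded dictionary.
SNOMED_LOOKUP = {
    "abdominal pain": "21522001",  # SNOMED CT: Abdominal pain (finding)
}

def normalize_phrase(patient_text: str) -> str:
    """LLM-appropriate step: turn lay language into clinical language.
    In practice this would call a language model; variability here is
    tolerable because a deterministic step validates the result."""
    return {"my tummy hurts": "abdominal pain"}.get(
        patient_text.lower(), patient_text
    )

def to_snomed(clinical_term: str) -> str:
    """Deterministic step: the coded clinical output comes from a fixed
    terminology map, never from free-form model generation."""
    code = SNOMED_LOOKUP.get(clinical_term)
    if code is None:
        raise ValueError(f"Unmapped term, route to a human: {clinical_term!r}")
    return code

print(to_snomed(normalize_phrase("My tummy hurts")))  # 21522001
```

The design choice is the point: the LLM sits at the language boundary, and anything it produces must pass through a deterministic, auditable mapping before it becomes clinical data.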
The same principle Mike Hunter raised about ontology applies here. Clinical terminologies like SNOMED, ICD-10, and LOINC exist precisely because consistency in how data is labeled determines whether that data can be trusted, shared, and acted on downstream. AI that undermines that consistency at the point of capture creates problems that ripple through every downstream system.
A recurring theme in the conversation was what happens when AI removes clinical judgment from the loop rather than supporting it.
Lang described a post from a physician whose patient's family member had looked up abnormal lab results, arrived at a frightening self-diagnosis via a consumer symptom checker, and called to report the conclusion. The physician had to unwind a belief that technology had helped create. "We end up with hypochondriacs. We end up with patients that believe that they're the physician and it causes further strain on our system."
This is not an argument against patient-facing digital tools. It is an argument for designing them correctly, which means building systems where clinical experts remain in the loop as the final authority, not an optional step.
Hunter connected this to a broader point about the role of human expertise in any data-driven system:
"Human beings that understand what that ontology is in an organization are in essence the human standard that you need to build these other tools around, in addition to the standards themselves."
The published algorithms and peer-reviewed clinical protocols that responsible AI vendors put in place are a direct extension of this principle. They make the logic visible, which is what clinicians and compliance officers need to trust a system.
One of the most practically relevant parts of the conversation was the technical discussion of how AI-generated clinical data needs to connect with existing health data infrastructure.
The workflow Lang described produces structured clinical output in SNOMED, generates SOAP notes automatically, and can transmit that data to EHRs or other data environments via FHIR, HL7, custom API, or JSON. Another example is Leap Orbit's RxConnections, a zero-click EHR integration for PDMP data. Incoming data from third-party systems, such as existing medications or prior diagnoses, can inform how the AI tailors its questions to the individual patient rather than asking everyone the same static intake questions.
This is exactly the interoperability problem that FHIR was designed to address. When clinical data is generated in a structured, standardized format from the point of capture and transmitted via open standards, it flows into downstream systems cleanly. When it is generated as unstructured or proprietary output, it creates integration debt that compounds over time.
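As one illustration of what "structured from the point of capture" means, here is a sketch that posts a SNOMED-coded FHIR R4 Condition resource to a FHIR server. The endpoint URL, bearer token, and patient reference are placeholders, not real systems:

```python
import requests

# A FHIR R4 Condition resource carrying a SNOMED CT coding, the kind
# of structured output a standards-aligned AI workflow can emit.
condition = {
    "resourceType": "Condition",
    "subject": {"reference": "Patient/example"},  # placeholder patient
    "code": {
        "coding": [{
            "system": "http://snomed.info/sct",
            "code": "21522001",
            "display": "Abdominal pain (finding)",
        }]
    },
}

resp = requests.post(
    "https://fhir.example.org/r4/Condition",  # placeholder FHIR base URL
    json=condition,
    headers={
        "Authorization": "Bearer <token>",  # placeholder credential
        "Content-Type": "application/fhir+json",
    },
)
resp.raise_for_status()
```

Because the payload uses an open resource shape and a standard terminology system, any FHIR-capable downstream system can consume it without custom translation.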
The broader implication for any health system evaluating AI tools is this: the interoperability story should be part of your evaluation criteria from the start, not a consideration after you have already committed to a platform. An AI solution that produces clinically useful outputs but delivers them in a format that does not connect to your data environment has solved half the problem and left the harder half untouched.
By 2036, HRSA's National Center for Health Workforce Analysis projects a combined shortage of more than one million healthcare workers across nursing, behavioral health, primary care, and other disciplines. That gap cannot be closed through hiring alone. The clinical workforce problem is structural, and any meaningful response to it involves reducing the administrative burden that currently consumes a significant share of clinician time and attention.
Clinician burnout is part of the same equation. A nationwide study of more than 43,000 healthcare workers found that nearly half met the criteria for burnout, with nurses reaching 56%. Burnout doubles the risk of adverse patient events. If reducing documentation burden and administrative friction demonstrably reduces burnout, the downstream effect on patient outcomes is not a soft benefit. It is a measurable one.
This is the context in which AI-assisted clinical workflows and interoperability standards are not just technology initiatives. They are workforce sustainability strategies. A nurse who spends less time on documentation and more time listening to a patient is a better clinician and a less burned-out one. The data infrastructure that makes that possible, the standards, the integrations, the structured outputs that flow cleanly into downstream systems, is what makes the AI useful rather than merely impressive.
"We have rising demand and we have stagnant supply," Lang said. "And we're going to continue to deal with this gap unless we leverage digital health, AI, etc. to fill in the gaps."
For health system leaders and state agency technology officers evaluating AI solutions in clinical contexts, the current market requires adjusting some traditional evaluation criteria. The most impactful clinical AI is largely coming from earlier-stage companies, which means the vendor selection playbook that worked when you were buying from established enterprise vendors needs updating.
The questions worth asking are less about customer count and more about the following:
- Where does the AI modeling come from, and is it based on evidence-based clinical principles?
- Are the algorithms published and available for clinical review?
- Does the founding team have deep experience in the specific clinical workflows the technology is meant to support?
- And critically: how does the system's output connect to existing data environments and standards?
Lang offered a useful heuristic: "I would want to make sure that it was a clinically-driven organization that was leveraging AI and technology to solve real-world problems based on deep experience in being in those workflows." Technology companies entering healthcare from adjacent industries may bring capable tools; they face a steeper barrier in understanding the actual pain points.
Hunter added the standards dimension: the most durable AI implementations will be the ones built around open standards and designed for interoperability from the start, not retrofitted to connect to existing systems after the fact. "LLMs seem to provide a very nice interface between human beings and systems and if they're used that way, they can be very effective. But if you try to put them in the wrong place where they're going to hallucinate and do something strange... in healthcare it's not, you know, oh, we have to pay $300 million or whatever it was. It's someone's life might be lost."
The right frame for evaluating AI in clinical and administrative healthcare workflows is not whether it uses AI. It is whether the right type of AI is applied in the right places, grounded in published clinical standards, connected to the data infrastructure your organization already runs, and designed with humans as the final authority on clinical decisions.
That combination of AI governance, clinical standards alignment, and interoperability is where the real value is. And it is where the real risk is if any one piece is missing.
Leap Orbit works with state agencies, health plans, and healthcare enterprises on the data sharing infrastructure that makes technologies like this work at scale. From FHIR-compliant integration architecture to provider data management and prescription monitoring systems, our team brings the technical depth and federal experience to help you build something that lasts.
What is the difference between deterministic AI and a large language model in clinical settings?
A deterministic model produces the same output every time the same inputs are provided, which is essential in regulated clinical workflows. Large language models generate responses that can vary between uses, making them better suited for tasks like natural language conversion or narrative summarization rather than clinical decision support.
Why do clinical AI systems need to support FHIR and HL7?
Clinical AI that generates structured data needs to transmit that data to EHRs and other downstream systems. FHIR and HL7 are the open standards that make that transmission reliable and consistent across different platforms. Solutions that do not support these standards create integration debt and limit the reuse of the data they generate.
What does "human-in-the-loop" mean in AI-assisted clinical care?
It means a qualified clinician remains the decision-maker in the encounter even when AI is handling documentation, intake, or triage support. AI reduces burden and surfaces information; clinical judgment stays with the human provider.
Why does publishing AI algorithms matter in healthcare?
Clinicians are trained to rely on peer-reviewed, evidence-based medicine. Published algorithms allow clinical staff to evaluate the logic behind AI recommendations, build institutional trust, and meet the transparency expectations that regulated clinical environments require.
How does administrative burden connect to patient outcomes?
Clinician burnout is strongly associated with adverse patient events. Documentation and administrative tasks are among the top contributors to burnout in clinical settings. AI that reduces that burden in a standards-compliant way has a measurable downstream effect on care quality, not just efficiency.