
Lumen Health · Healthcare

A clinician-facing AI assistant that survived clinical review

Built a clinical AI assistant that passed regulatory review and earned clinician trust.

Client
Lumen Health
Industry
Healthcare
Duration
6 months
Team size
4 engineers + 1 ML specialist
Outcome
32% time saved

The challenge

Lumen Health operates a network of mid-sized clinics. Their physicians spend an average of 14 minutes per patient encounter, 8 of which go to documentation. Lumen's product team had a thesis: an AI assistant that listened to the encounter (with consent) and produced a structured note for the physician to review and edit could meaningfully reduce that documentation overhead.

The challenge wasn't building a transcription-and-summarization pipeline — those are increasingly straightforward. The challenge was producing notes that clinicians actually trusted, that survived regulatory review, and that didn't introduce new risks (most importantly: hallucinated medical content).

Our approach

We started with the regulatory and clinical review process, not the model. The Lumen compliance team had three blocking concerns: any clinical content the AI produced had to be traceable to the source audio, hallucinations had to be detectable, and the system had to fail gracefully when the audio was unclear or contained protected information.

We designed the system around those concerns. The notes the AI produces aren't a single generated paragraph — they're a structured document where every clinical claim is anchored to a timestamp range in the audio. A clinician reviewing a note can click on a finding and play back the seconds of audio that produced it. Hallucinations become discoverable: if a finding has no anchor, it's flagged.
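The anchoring idea reduces to a simple data shape: every clinical claim either carries an audio span or is surfaced for review. The TypeScript below is an illustrative sketch of that shape; the type and field names are assumptions for illustration, not Lumen's actual schema.

```typescript
// Hypothetical sketch of claim-level audio anchoring.
// Names and fields are illustrative, not the production schema.

interface AudioAnchor {
  startSec: number; // start of the audio span supporting the claim
  endSec: number;   // end of the audio span
}

interface ClinicalClaim {
  text: string;         // the finding as it appears in the note
  anchor?: AudioAnchor; // absent => the claim cannot be traced to audio
}

// Any claim without an anchor is flagged for clinician review,
// which is what makes hallucinated content discoverable.
function flagUnanchored(claims: ClinicalClaim[]): ClinicalClaim[] {
  return claims.filter((c) => c.anchor === undefined);
}
```

In this framing, "hallucination detection" is not a separate model; it falls out of the document structure, because a fabricated finding has no audio span to point at.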

The model architecture is, deliberately, the boring part. We use an off-the-shelf transcription provider, Anthropic's Claude for the structured note generation, and a thin layer of evals we wrote in-house to score notes against a reference set produced by Lumen's senior physicians. The scoring isn't fancy — it's an LLM-as-judge approach with a custom rubric — but it's measurable, and we re-run it on every model or prompt change.
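Once the judge model has scored each rubric criterion, the harness only needs to collapse those scores into one comparable number per note. A minimal sketch of that aggregation step, assuming per-criterion scores in [0, 1] and hypothetical criterion names; the judge call itself is stubbed out of scope here.

```typescript
// Illustrative sketch of rubric aggregation for an LLM-as-judge harness.
// Criterion names and weights are hypothetical examples.

interface RubricScore {
  criterion: string; // e.g. "factual accuracy", "completeness"
  score: number;     // judge-assigned score in [0, 1]
  weight: number;    // relative importance of the criterion
}

// Collapse per-criterion scores into one weighted note-level score,
// so every model or prompt change can be compared on the same scale.
function aggregateScore(scores: RubricScore[]): number {
  const totalWeight = scores.reduce((sum, s) => sum + s.weight, 0);
  if (totalWeight === 0) return 0; // degenerate rubric: no weighted criteria
  const weightedSum = scores.reduce((sum, s) => sum + s.score * s.weight, 0);
  return weightedSum / totalWeight;
}
```

The value of keeping the aggregation this simple is that a score regression on the 240-encounter reference set points directly at a criterion, not at an opaque composite.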

What we shipped

  • A web and tablet application clinicians use during patient encounters, with consent capture, audio recording, and a clean interface for reviewing and editing the generated note.
  • A note generation pipeline that produces structured, anchored notes with claim-level traceability.
  • An evaluation harness with a reference set of 240 anonymized encounters, run automatically on every change to the model or prompt configuration.
  • A monitoring layer that tracks note quality scores, edit rates, and the rate of "hallucination" findings flagged by the system or by clinicians.
  • Documentation that the Lumen compliance team used in their submission to the relevant regulatory body — and that survived review.

What we deliberately didn't ship

We didn't fine-tune a custom model. We could have, and the team was tempted to. But the trade-off — owning a model rotation, maintaining serving infrastructure, and validating each new version through clinical review — wasn't worth the marginal quality gain we measured in our experiments.

We didn't build a fully automated workflow. The clinician always reviews and edits the note before it's saved to the patient record. Removing that step would have changed the regulatory category of the product and the level of liability Lumen would have to absorb. The product is physician-augmenting, not physician-replacing, and that distinction is structural to how it was built.

The outcome

Twelve weeks after launch to the first cohort of 80 physicians, average documentation time per encounter dropped from 8 minutes to 5.4 minutes — a 32% reduction. Edit distance between the AI's draft note and the clinician's final note averages 17%, which is comparable to the inter-clinician edit rate measured in studies of physician documentation variability.
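An edit-rate figure like the one above can be computed with a standard normalized Levenshtein distance. The sketch below is character-level for simplicity; the metric actually used in production may well be token-based.

```typescript
// Standard single-row Levenshtein edit distance between two strings.
function levenshtein(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // dp[i-1][j-1] from the previous row
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j]; // dp[i-1][j] before overwriting
      dp[j] = Math.min(
        dp[j] + 1,     // deletion
        dp[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Edit rate: edits relative to the longer of the two texts,
// so 0 means "note accepted verbatim" and 1 means "fully rewritten".
function editRate(draft: string, final: string): number {
  if (draft.length === 0 && final.length === 0) return 0;
  return levenshtein(draft, final) / Math.max(draft.length, final.length);
}
```

A 17% average edit rate under a definition like this means the typical clinician changes roughly one character in six of the draft, which is why comparing it against inter-clinician variability is the right baseline.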

Clinician adoption after the first month is 89%. The feedback we heard most often, paraphrased: "It's not magic, it's just less paperwork." That sentence is on a poster in the Lumen office.

Services

AI/ML Integration · Web & Mobile Applications

Technologies

Anthropic · TypeScript · React · PostgreSQL · pgvector · AWS

"The level of architectural rigor was the difference. They challenged our assumptions on day one and we shipped a platform that's held up through 18x traffic growth."

David Chen · CTO, Lumen Health

Let's build

Have a project in mind?

Tell us what you're working on. One of our principals will reply within one business day.