THE WORK
Research
Five themes organize what the lab publishes. Each theme is anchored to peer-reviewed papers and preprints; the showcase lists published work only.
FOUNDATION MODELS FOR MEDICINE
We build foundation models for medicine that span imaging, video, biology, and clinical reasoning. One model that can read a chest X-ray and answer a clinical question. One model for DNA, RNA, proteins, and cellular systems. The next generation of medical AI is not built one task at a time; it is built as systems that generalize across the work of medicine.
CLINICAL ENVIRONMENT SIMULATION & EVALUATION
Static benchmarks don't predict how clinical AI behaves in the real world. We build simulated environments — emergency departments, patient trajectories, longitudinal decisions — where agents and models are stress-tested before they touch patients. Evaluation is itself a research problem; we treat it that way.
- A clinical environment simulator for dynamic AI evaluation (Nature Medicine, 2026)
- Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis? (HEALING 2026)
- The Doctor Will Agree With You Now: Sycophancy of Large Language Models in Multi-Turn Medical Conversations (HEALING 2026)
- An evaluation framework for clinical use of large language models in patient interaction tasks (Nature Medicine, 2025)
PROCEDURAL UNDERSTANDING
Medicine isn't only images and text. We work on procedural understanding: ultrasound as a procedure, endoscopy as a sequence, surgery as collaboration. That means models that recognize anatomy, instruments, image quality, task state, and procedural phase: the foundation for AI that can assist clinicians during procedures, not just after them.
CLINICAL APPLICATIONS AT SCALE
Foundation models meet clinical deployment in published, validated applications: opportunistic cardiovascular risk from routine head CTs, continual learning across dozens of hospitals for endotracheal tube placement, retrospective validation against clinical outcomes. The frontier of clinical AI is measured at the bedside.
OPEN INFRASTRUCTURE
We ship infrastructure that the medical AI community uses: ReXrank as the public leaderboard for radiology report generation, RadGraph as the entity-relation extraction standard, CheXzero for zero-shot classification. Open benchmarks, open code, open evaluation. The field moves faster when the tools are shared.