Introducing

Polaris 3.0 Safety Constellation Architecture

Our most advanced and safest model yet

New Features Built Upon Insights from Over 1.85M Real-World Patient Calls

The Real World Evidence (RWE) allowed us to drive new features and get closer to product perfection

Hippocratic AI Releases Polaris 3.0: A 4.2 Trillion Parameter Suite of 22 LLMs, Enhancing Patient Safety and Experience By Leveraging Real World Evidence

In celebration of our two-year milestone, we’re excited to announce the launch of Polaris 3.0, our most advanced and safest healthcare LLM constellation to date.

Polaris 3.0 represents a groundbreaking leap forward, featuring 4.2 trillion parameters across 22 specialized LLMs. This powerful constellation achieves a clinical accuracy rate of 99.38%, a notable improvement over Polaris 2.0’s 98.75% and Polaris 1.0’s 96.79%.

Polaris 3.0 was developed with extensive real-world feedback gathered directly from patients and their healthcare providers. This collaboration ensured that the models are not only highly accurate but also finely tuned to nuanced, real-world interactions. As a result, Polaris 3.0 has driven improvements in patient engagement metrics, most notably increasing patient satisfaction scores from 8.72/10 with Polaris 2.0 to 8.95/10 with Polaris 3.0.

“Since the founding of Hippocratic AI, we partnered with health systems and clinicians to ensure our AI agents were safe and effective enough to use in patient-facing clinical operations. After hiring 6,234 US licensed clinicians to test our product with over 307,038 test calls, and incorporating the learnings of real world evidence from over 1.85 million patient calls made in Polaris 1.0 and 2.0, we have achieved a new milestone in safety unmatched by any other Generative AI healthcare agent.”

Munjal Shah, Co-founder and Chief Executive Officer.

Real World Evaluation of Large Language Models in Healthcare
(RWE-LLM): A New Realm of AI Safety & Validation

Alongside the launch of Polaris 3.0, we published our innovative Real-World Evaluation of Large Language Models (RWE-LLM), a pioneering safety framework for generative AI in healthcare at scale. By sharing this methodology, we aim to empower the broader healthcare and AI communities, enabling others to build upon this work and collectively advance the safety and effectiveness of AI in healthcare.

Read the full paper here

What’s New in Polaris 3.0?
Polaris 3.0 introduces numerous enhancements and innovative features developed from real-world observations across more than 1.85 million patient calls handled by Polaris 1.0 and 2.0. These improvements directly address practical challenges faced during actual patient interactions, further solidifying our commitment to patient safety.
Deep Thinking Models:
Enhanced models that triple-check labs, medications, and escalations. These new “offline” thinking capabilities contribute significantly to removing the long-tail errors that occurred in prior Polaris versions.
Improved Clinical Documentation:
Models that ensure health forms, including Health Risk Assessments (HRAs) and follow-up items, are documented accurately even when patients’ inputs are unclear. For example, Polaris 2.0 had an HRA documentation accuracy of 90.5%; Polaris 3.0 reaches 98.5%.
Advanced Emotional Quotient:
Features such as reading between the lines, multi-call memory, suggesting how to finish a sentence when patients cannot quite articulate what they are feeling, likeability, patient-specific emotional adaptation, and appropriate assertiveness all helped lift patients’ comfort in confiding in the AI agent from 88.93% with Polaris 1.0 to 94.60% with Polaris 3.0. Average call duration increased from 5.5 minutes to 9.5 minutes with the introduction of these features, indicating stronger patient engagement.
New Dialer Features:
Successfully connecting with patients is required to complete patient objectives. Polaris 3.0 adds the ability to leave voicemails, pause to allow patients to complete blood pressure readings, resume calls that were dropped, send text messages, call back at a given time, pass all context to any human the call is escalated to (ANI), and make warm and cold call transfers.
Robust Audio Handling:

Polaris 3.0 significantly enhances speech recognition accuracy in challenging real-world phone calls by addressing background noise, unclear speech, and critical medical information:

  • Background Noise Engine: Improved speech isolation (error rate: 9.3% with Polaris 2.0 → 2.3% with Polaris 3.0).
  • Speech Detector Engine: Focuses on primary speakers despite loud environments (error rate: 15.0% with Polaris 2.0 → 2.4% with Polaris 3.0).
  • Single-Word Engine: Enhances single-word recognition accuracy (error rate: 2.4% with Polaris 2.0 → 0.2% with Polaris 3.0).
  • Entity Transcription Engine: Precisely captures medications and numerical data (error rate: 4.2% with Polaris 2.0 → 0.5% with Polaris 3.0).
  • Clarification Engine: Gracefully clarifies patient speech to reduce misunderstandings (error rate: 16.3% with Polaris 2.0 → 2.0% with Polaris 3.0).
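
To put the audio figures above on a common footing, the short Python sketch below computes the relative reduction in error rate for each engine. It is only an illustration of the arithmetic: the numbers are copied from the list above, while the names ERROR_RATES, relative_reduction, and summarize are hypothetical and not part of any Hippocratic AI tooling.

```python
# Error rates in percent, as quoted in the list above: (Polaris 2.0, Polaris 3.0).
# The dictionary and function names are illustrative assumptions.
ERROR_RATES = {
    "Background Noise Engine": (9.3, 2.3),
    "Speech Detector Engine": (15.0, 2.4),
    "Single-Word Engine": (2.4, 0.2),
    "Entity Transcription Engine": (4.2, 0.5),
    "Clarification Engine": (16.3, 2.0),
}


def relative_reduction(before: float, after: float) -> float:
    """Return the drop in error rate as a percentage of the starting rate."""
    return (before - after) / before * 100.0


def summarize(rates: dict) -> dict:
    """Map each engine name to its relative error reduction, rounded to 0.1."""
    return {
        name: round(relative_reduction(before, after), 1)
        for name, (before, after) in rates.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(ERROR_RATES).items():
        print(f"{name}: {pct}% fewer errors than Polaris 2.0")
```

On the quoted figures, the relative reductions fall between roughly 75% and 92%, which is a different quantity from the absolute percentage-point drops shown in the list.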
Multi-lingual Safety Equivalency for Spanish:
The Spanish version now gives the right answer with 99.83% accuracy. Across nine non-English languages – Arabic, French, Hindi, Japanese, Korean, Mandarin, Portuguese, Russian, and Spanish – the overall accuracy is 99.09%. The company has also added novel features such as multi-lingual auto switch, which allows the AI agent to start speaking in Spanish if the patient does, even if Spanish is not listed as the patient’s primary language.
Orchestration Features:
Beyond the patient call itself, the company has added many features needed by health systems, payors, and life sciences companies to ensure Hippocratic AI agent calls integrate with clinical workflows. These include: navigating the IVRs of other providers, labs, or pharmacies; accurately quoting policy documents such as explanations of benefits (86.4% of the time with Polaris 2.0; 99.4% with Polaris 3.0); scheduling complex appointment scenarios (error rate of 8% with Polaris 2.0; 0.5% with Polaris 3.0); and, for pharmaceutical clients, handling adverse event reporting and ensuring no discussion of off-label use.
Deeper integrations with EMRs:
Polaris 3.0 has now been successfully integrated with healthcare systems of record such as Epic, Cerner, and Salesforce, with the ability to integrate with other major and specialty systems including Athenahealth, eClinicalWorks, Nextgen, Modernizing Medicine, Allscripts, Meditech, and more.

Patient Engagement Results

These new features have significantly improved patient interactions, raising engagement metrics to all-time highs in Polaris 3.0. With enhanced accuracy and adaptability, our AI agents deliver more seamless and meaningful conversations.

Continuing Our Commitment to Excellence

Polaris 3.0 isn’t just another update—it’s the next critical step in our ongoing pursuit of delivering AI solutions that genuinely enhance healthcare experiences for patients and clinicians alike. As we look ahead, our dedication to creating specialized AI agents designed explicitly for the nuances and complexities of healthcare remains unwavering.

“Vertical AI agents require features that are unique to that specific environment and handle the long tail of issues. Our goal for Polaris is a level of product perfection to ensure that our products meet or exceed the rigorous requirements of real-world clinical and patient environments, and are not just a novel AI tool,” said Subho Mukherjee, Chief Scientist of Hippocratic AI. “While the Polaris 3.0 release gets us much closer to that goal, it is one we will continue to relentlessly pursue.”

Subho Mukherjee, Co-founder and Chief Scientist of Hippocratic AI

Read More Information About Our Safety Frameworks