01 Introduction
Prior authorization is one of the most hated processes in American healthcare. A physician orders a procedure. The payer requires approval before it can proceed. A clinical reviewer — often a nurse, sometimes a physician — manually evaluates whether the procedure is medically necessary for this patient.
The numbers are staggering. 53 million prior auth requests per year in the US. Each one costs roughly $11 in administrative overhead. Average turnaround: 48 hours. Some take weeks. Patients wait. Physicians burn out on paperwork. Payers hire armies of reviewers.
Here's the thing: not all of these requests are hard. A routine knee MRI for a patient with documented knee pain and an orthopedic referral is straightforward. A complex cardiac surgery for a patient with 12 comorbidities is genuinely difficult. They require fundamentally different levels of scrutiny — but they go through the same manual queue.
The bottleneck isn't medical complexity. It's that the system can't tell the easy cases from the hard ones.
This is a confidence problem. And it's the same confidence problem we've solved in e-commerce and ad personalization: detect the pattern, measure how confident you are, route based on that confidence.
Our prior authorization triage system does exactly this:
- Detect the care pathway from diagnosis codes, procedure codes, and clinical context
- Measure confidence in that detection using a 4-signal scoring system
- Route accordingly: auto-approve the clear cases, send ambiguous cases to nurse review, escalate complex cases to physician review
- Explain every decision with the specific clinical evidence that drove it
Validated on 10,000 real hospital encounters from MIMIC-IV (Beth Israel Deaconess Medical Center), the system auto-approves 38.3% of encounters at high confidence — with zero violations in the confidence-accuracy relationship across the entire threshold range.
02 Why Prior Auth Is Stuck
The Scale of the Problem
The American Medical Association's 2023 survey found that 88% of physicians describe the burden of prior authorization as "high" or "extremely high." One in three physicians report that PA has led to a serious adverse event for a patient in their care.
The operational numbers:
| Metric | Value |
|---|---|
| Annual PA requests (US) | ~53 million |
| Average cost per manual review | $11.00 |
| Physician review cost | $25.00 |
| Average turnaround time | 48 hours |
| Requests requiring physician-level review | ~15-20% |
| Requests that are ultimately approved | ~85% |
That last number is the key insight: 85% of prior auth requests are eventually approved. The manual review process exists to catch the 15% that shouldn't be — but it subjects the other 85% to the same delay and cost.
Why Existing Automation Fails
Most PA automation attempts fall into two categories:
Rules-based systems use hard-coded clinical criteria (e.g., "approve knee MRI if diagnosis includes knee pain AND patient is over 40 AND referral is from orthopedics"). These are brittle: they either approve too little (overly strict rules leave most cases for manual review) or too much (loose rules approve cases that shouldn't be). They can't handle the long tail of clinical variation — the patient with 8 diagnosis codes who doesn't fit any single rule cleanly.
Black-box ML models train a classifier to predict "approve" or "deny" directly. These face an adoption problem: clinicians and regulators won't trust a model that says "deny" without explaining why. And they face a safety problem: a model trained on historical approve/deny data inherits every bias in that data — including denials that were later overturned on appeal.
Our Approach: Don't Predict the Decision. Triage the Review.
We reframe the problem. Instead of trying to predict whether a request should be approved or denied, we ask: how much human review does this request need?
- A routine request with clear clinical evidence and a well-matched pathway needs minimal review — auto-approve it.
- An ambiguous request with mixed signals needs a nurse reviewer to check the clinical fit.
- A complex request with high-acuity codes, multiple comorbidities, or conflicting evidence needs a physician to evaluate it.
This reframing has three advantages:
- It's safer. We're not making the clinical decision — we're routing it to the right level of review. The worst case is that a request that could have been auto-approved goes to nurse review instead. That's a cost problem, not a safety problem.
- It's explainable. Every routing decision comes with the specific clinical features that drove it — which pathway was detected, what evidence supported it, and why the confidence level triggered this particular route.
- It aligns with regulation. CMS-0057-F (effective January 2027) requires payers to provide transparent, timely PA decisions with appeal paths. Our system generates CMS-compliant determination explanations for every request.
03 How the Pipeline Works
The system processes a prior authorization request through six stages:
PA Request (diagnosis codes, procedure codes, demographics, provider info)
|
[ 1. Signal Collection ] Gather all clinical signals from the request
|
[ 2. Pathway Detection ] Map codes to a care pathway (e.g., Surgical, Cardiac, Chronic)
|
[ 3. Confidence Scoring ] How confident are we in this pathway assignment?
|
[ 4. Safety Checks ] Is this a protected case that always needs human review?
|
[ 5. Routing Decision ] Auto-approve / Nurse review / Physician review
|
[ 6. Explanation ] Generate audit-ready determination letter

Stage 1: Signal Collection
The system extracts structured signals from the PA request. These are the same kinds of fields that appear on a standard PA form:
Clinical signals:
- Diagnosis codes (ICD-10): What's wrong with the patient?
- Procedure codes (CPT): What's being requested?
- Comorbidities: What other conditions does the patient have?
- Medications: Is there polypharmacy or drug interaction risk?
Context signals:
- Service type: Inpatient, outpatient, imaging, pharmacy?
- Urgency: Standard, urgent, emergent?
- Provider specialty: Does the requesting provider's specialty match the procedure?
- Prior history: Previous PA requests, prior procedures, prior denials?
Documentation signals:
- Are clinical notes attached?
- How many supporting documents?
- Is there a provider letter of medical necessity?
The system doesn't need all of these for every request. The more signals available, the higher the confidence. A request with 3 diagnosis codes, a procedure code, and clinical notes gives a much stronger signal than a request with just a procedure code.
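As a rough sketch, the signal families above can be captured in a single request structure. The field names and the `signal_count` heuristic here are illustrative assumptions, not the system's actual schema:

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical container for the clinical, context, and documentation
# signals on a PA request. Field names are illustrative only.
@dataclass
class PARequest:
    diagnosis_codes: list[str] = field(default_factory=list)   # ICD-10
    procedure_codes: list[str] = field(default_factory=list)   # CPT
    comorbidities: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    service_type: str = "outpatient"    # inpatient / outpatient / imaging / pharmacy
    urgency: str = "standard"           # standard / urgent / emergent
    provider_specialty: str | None = None
    has_clinical_notes: bool = False
    num_documents: int = 0

    def signal_count(self) -> int:
        """Rough count of distinct signals present: more signals, more evidence."""
        n = len(self.diagnosis_codes) + len(self.procedure_codes)
        n += len(self.comorbidities) + len(self.medications)
        n += int(self.has_clinical_notes) + self.num_documents
        if self.provider_specialty:
            n += 1
        return n
```

A request with three diagnosis codes, a procedure code, and attached notes scores far higher on this count than one with a lone procedure code, which is exactly the asymmetry the confidence scorer exploits.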
Stage 2: Pathway Detection
This is the core clinical intelligence step. The system maps the submitted codes to one of 12 latent care pathways — clusters of clinically related cases discovered automatically from historical encounter data.
How pathway discovery works:
The system trains on a large dataset of clinical encounters (10,000 from MIMIC-IV in our validation). Each encounter has a set of diagnosis codes. The system uses Non-negative Matrix Factorization (NMF) — essentially a pattern-finding algorithm — to discover groups of codes that frequently co-occur. These groups become the pathways.
For example, the algorithm might discover:
- Pathway 3 (Cardiac): Codes for heart failure, atrial fibrillation, hypertension, and coronary artery disease cluster together
- Pathway 7 (Surgical): Codes for joint disorders, fractures, and procedural complications cluster together
- Pathway 11 (Respiratory): Codes for pneumonia, COPD, and respiratory failure cluster together
These pathways aren't hand-coded. They emerge from the data — which means they reflect the actual clinical patterns in the payer's population, not a committee's assumptions about how medicine should be organized.
For a new PA request, the system looks up the submitted codes in the learned pathway patterns and computes a probability distribution: "This request is 72% likely to be Cardiac, 15% Surgical, 13% Other."
On MIMIC-IV data, pathway detection achieves 5.0x accuracy over random baseline — meaning it correctly identifies the pathway 5 times more often than chance. This is the foundation that everything else builds on.
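The discovery step can be sketched with a minimal NMF (plain Lee-Seung multiplicative updates, numpy only) on a toy encounter-by-code matrix with two planted pathways. The real system trains on thousands of encounters with 12 components; this is only a shape-of-the-algorithm illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy encounter-by-code matrix: rows = encounters, columns = diagnosis codes.
# Two planted "pathways": codes 0-4 co-occur, codes 5-9 co-occur.
X = np.zeros((200, 10))
for i in range(200):
    block = slice(0, 5) if i < 100 else slice(5, 10)
    X[i, block] = rng.random(5) > 0.4

def nmf(X, k, iters=200, eps=1e-9):
    """Minimal NMF via multiplicative updates: X ~ W @ H, all non-negative."""
    n, m = X.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(X, k=2)

# Each row of H weights every code for one pathway. For a new request,
# score its codes against each pathway and normalize into a distribution.
codes = np.zeros(10)
codes[[1, 2, 3]] = 1.0                 # request submitting codes 1, 2, 3
scores = H @ codes
probs = scores / scores.sum()          # heavily favors the codes-0-4 pathway
```

Because codes 1-3 live entirely in the first planted block, the normalized scores concentrate on that pathway — the same "72% Cardiac, 15% Surgical" shape described above.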
Stage 3: Confidence Scoring
Pathway detection alone isn't enough. The system also needs to know how confident it is in that detection. A request with 6 matching diagnosis codes and a clear procedure is very different from a request with 1 ambiguous code.
The confidence score combines four signals:
1. Margin — How much does the top pathway dominate? If the top pathway has 85% probability and the second has 8%, the margin is large — the system is decisive. If two pathways are neck-and-neck (45% vs 38%), the margin is small — the system is uncertain.
2. Peakedness — How concentrated is the overall distribution? A sharp peak (one pathway dominates) means high confidence. A flat distribution (probability spread across many pathways) means the system can't decide.
3. Evidence volume — How many clinical signals actually fired? A request with 8 diagnosis codes, procedure codes, and clinical notes provides abundant evidence. A request with a single code provides very little.
4. Feature agreement — Do the clinical signals agree with each other? If the diagnosis codes, the procedure type, the provider specialty, and the urgency level all point to the same pathway, agreement is high. If they point in different directions, something unusual is going on.
Tier assignment: A request scores HIGH if at least 3 of the 4 signals pass their thresholds, MEDIUM if 2 pass, and LOW if fewer than 2 pass.
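The four signals and the 3-of-4 tier rule can be sketched directly. Every cutoff below (0.30 margin, entropy bound, 4-signal minimum, 0.70 agreement) is an illustrative placeholder, not the system's validated thresholds:

```python
import math

# Sketch of the 4-signal confidence scorer. Thresholds are assumptions.
def confidence_tier(probs, n_signals, agreement):
    """probs: pathway probabilities sorted descending; agreement in [0, 1]."""
    top = probs[0]
    second = probs[1] if len(probs) > 1 else 0.0
    margin_ok = (top - second) >= 0.30                    # 1. margin
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    peaked_ok = entropy <= 0.5 * math.log(len(probs))     # 2. peakedness
    evidence_ok = n_signals >= 4                          # 3. evidence volume
    agree_ok = agreement >= 0.70                          # 4. feature agreement
    passed = sum([margin_ok, peaked_ok, evidence_ok, agree_ok])
    if passed >= 3:
        return "HIGH"
    return "MEDIUM" if passed == 2 else "LOW"
```

A decisive request (0.85 top probability, abundant signals, agreeing features) passes all four checks; a flat distribution with a single code passes none.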
On MIMIC-IV: 71.6% of encounters score HIGH confidence. This is dramatically higher than the 10.8% on synthetic data (SynPUF) — real clinical data is much richer in diagnostic signals than synthetic approximations.
Stage 4: Safety Checks
Before routing, the system checks for protected cases that should always go to human review regardless of confidence:
- High-acuity codes: Critical care, high-severity emergency, complex procedures — these are flagged for physician review even if the pathway is clear
- Emergent requests: Time-sensitive cases bypass the auto-approve path
- Low clinical necessity: If the evidence for medical necessity is weak (score < 0.60), a human should evaluate
- Complex cases: Requests with more than 8 total codes are clinically complex enough to warrant review
These are hard safety rails. No amount of confidence in the pathway detection overrides them.
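A minimal sketch of the rails, checked before any routing. The high-acuity ICD-10 prefixes here are examples only (shock, sepsis, respiratory failure); a real deployment would use a clinically curated protected-case list:

```python
# Hard safety rails: any returned flag forces human review regardless of
# confidence. Code prefixes and thresholds are illustrative assumptions.
HIGH_ACUITY_PREFIXES = ("R57", "R65", "J96")   # e.g. shock, sepsis, resp. failure

def protected_case(dx_codes, urgency, necessity_score):
    if any(code.startswith(HIGH_ACUITY_PREFIXES) for code in dx_codes):
        return "high_acuity"
    if urgency == "emergent":
        return "emergent"
    if necessity_score < 0.60:
        return "low_necessity"
    if len(dx_codes) > 8:
        return "complex"
    return None   # no flag: eligible for auto-approve
```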
Stage 5: Routing Decision
With confidence tier and safety checks in hand, the system routes each request:
| Tier | Safety Check | Route | What Happens |
|---|---|---|---|
| HIGH | Passes | Auto-approve | Approved instantly, no human review |
| HIGH | Fails (protected case) | Nurse review | Human reviews despite high confidence |
| MEDIUM | Any | Nurse review | Nurse evaluates with system's pathway analysis as context |
| LOW | Any | Physician review | Full clinical evaluation by a physician reviewer |
Auto-approve criteria (all must be met):
- HIGH confidence tier
- Top pathway probability ≥ 75%
- Clinical necessity score ≥ 0.60
- Not a protected case
Gold card override: Providers with ≥90% historical approval rate across ≥20 prior requests get automatic approval — the system recognizes that their requests are almost always appropriate. This aligns with "gold carding" legislation now active in several US states.
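The routing table, auto-approve criteria, and gold card override above collapse into a single function. The thresholds mirror the text; the function shape and the `provider_history` tuple are illustrative assumptions, not the production interface:

```python
# Sketch of the routing decision. Thresholds (0.75, 0.60, 0.90, 20) come
# from the text; everything else is an assumed interface.
def route(tier, top_prob, necessity, protected, provider_history=None):
    # Gold card override: >= 90% approval rate across >= 20 prior requests.
    if provider_history is not None:
        approvals, total = provider_history
        if total >= 20 and approvals / total >= 0.90:
            return "auto-approve"
    if tier == "HIGH" and top_prob >= 0.75 and necessity >= 0.60 and protected is None:
        return "auto-approve"
    if tier == "LOW":
        return "physician-review"
    return "nurse-review"   # MEDIUM, or HIGH with a protected-case flag
```

Note the ordering: the gold card check runs first, and a protected-case flag demotes even a HIGH-confidence request to nurse review, matching the table.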
On MIMIC-IV: 38.3% auto-approve, 52.1% nurse review, 9.7% physician review. That means nearly 4 in 10 encounters could be processed without any human reviewer. In a system handling 100,000 PA requests per month, that's 38,300 requests that take seconds instead of hours.
Stage 6: Explanation Generation
Every routing decision produces a structured explanation:
Care pathway: Surgical (confidence: HIGH, probability: 78.5%)
Key clinical indicators:
- urgency_urgent (weight: 0.50)
- specialty_orthopedics (weight: 0.80)
- px_group_surgery (weight: 0.90)
Clinical necessity score: 0.85
Decision: Auto-approved
Reason: Pathway alignment 78.5% exceeds threshold 75%
with HIGH confidence. No protected case flags.

This isn't a black box. Every decision shows:
- Which pathway was detected and with what probability
- The top 3 clinical features that drove the detection, with their evidence weights
- The clinical necessity score
- The specific reason for the routing decision, including which thresholds were met or missed
This level of transparency is critical for regulatory compliance (CMS-0057-F requires it), for clinician trust (reviewers can see why the system made a recommendation), and for appeals (patients and providers can understand and challenge the logic).
04 Results: Real Patient Data
We validated on two clinical datasets: CMS SynPUF (synthetic Medicare claims) and MIMIC-IV (real de-identified hospital records from Beth Israel Deaconess Medical Center). The real data is dramatically richer and tells a much more convincing story.
MIMIC-IV: 10,000 Real Hospital Encounters
MIMIC-IV is a publicly available dataset of de-identified clinical records from a major teaching hospital. It contains real diagnosis codes, procedure codes, demographics, and outcomes. Our validation used 10,000 hospital encounters spanning 3,461 unique ICD-10 diagnosis codes.
Confidence Tier Distribution
| Tier | Encounters | Percentage | Mean Pathway Probability |
|---|---|---|---|
| HIGH | 7,160 | 71.6% | 0.766 |
| MEDIUM | 1,872 | 18.7% | 0.437 |
| LOW | 968 | 9.7% | 0.329 |
71.6% of encounters have HIGH confidence. Real clinical data — with multiple diagnosis codes, procedure codes, and rich clinical context — gives the system far more evidence to work with than synthetic approximations.
Routing Distribution
| Route | Encounters | Percentage |
|---|---|---|
| Auto-approve | 3,830 | 38.3% |
| Nurse review | 5,202 | 52.1% |
| Physician review | 968 | 9.7% |
38.3% auto-approve rate. At scale, this means massive workload reduction. For a payer processing 100,000 requests per month:
- 38,300 requests handled automatically (seconds, not hours)
- 52,100 requests go to nurse review with the system's pathway analysis as context (faster review)
- 9,700 requests escalated to physician review (the genuinely complex cases)
Pathway Detection Accuracy
The system correctly identifies the care pathway 5.0x more often than random (41.78% vs 8.33% random baseline with 12 pathways). This doesn't sound high in absolute terms, but pathway detection isn't a classification task — it's a soft assignment. The system produces a probability distribution, and the confidence in that distribution is what drives routing.
Evidence Improvement
This is one of the strongest findings. The system starts with a "prior" estimate based purely on the submitted codes, then refines it using the clinical evidence model (2,604 features, 13,016 learned feature-to-pathway connections):
| Stage | Mean Top Pathway Probability |
|---|---|
| Prior (codes only) | 0.424 |
| Posterior (codes + evidence) | 0.662 |
| Improvement | +56% |
The evidence model — which incorporates provider specialty, urgency, service type, comorbidity burden, documentation quality, and diagnosis group features — substantially sharpens the pathway assignment. This is the step that transforms raw codes into a clinically informed pathway confidence.
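One way to picture the prior-to-posterior step is a log-linear update: start from the code-based prior over pathways, add learned feature-to-pathway log-weights for each fired feature, and renormalize. The pathway names and weights below are invented for illustration; the real model has 2,604 features and 13,016 learned connections:

```python
import math

# Toy log-linear evidence update. Weights are made-up illustrations.
def evidence_posterior(prior, fired_features, weights):
    """prior: pathway -> prob; weights: (feature, pathway) -> log-weight."""
    logits = {p: math.log(prob) for p, prob in prior.items()}
    for f in fired_features:
        for p in logits:
            logits[p] += weights.get((f, p), 0.0)
    z = sum(math.exp(v) for v in logits.values())
    return {p: math.exp(v) / z for p, v in logits.items()}

prior = {"cardiac": 0.45, "surgical": 0.35, "other": 0.20}
weights = {("specialty_cardiology", "cardiac"): 1.2,
           ("service_inpatient", "cardiac"): 0.4}
post = evidence_posterior(prior, ["specialty_cardiology", "service_inpatient"], weights)
# Two agreeing cardiac signals sharpen the top pathway probability well
# past the prior's 0.45 — the same sharpening the table above reports.
```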
Confidence-Accuracy Relationship
The critical question: does higher confidence actually mean better pathway assignments? We tested this using the Confidence Gate Theorem's C1 and C2 conditions.
C1 (rank correlation): Spearman rho = 0.349 (p = 5.2 × 10⁻²⁸⁴). Highly significant — confidence correctly rank-orders pathway accuracy.
C2 (no reversals): Zero reversals across 5 confidence zones:
| Zone | Confidence Range | Encounters | Mean Accuracy |
|---|---|---|---|
| 0 | 0.12 – 0.30 | 5,561 | 0.231 |
| 1 | 0.30 – 0.47 | 2,913 | 0.359 |
| 2 | 0.47 – 0.65 | 869 | 0.648 |
| 3 | 0.65 – 0.82 | 424 | 0.861 |
| 4 | 0.82 – 1.00 | 233 | 0.939 |
Accuracy rises cleanly from 23% to 94% as confidence increases. No reversals. No danger zones. The confidence signal is trustworthy across the entire range.
At the highest confidence zone (0.82–1.00), the system is correct 94% of the time — approaching the reliability needed for fully automated clinical routing.
Calibration
Expected Calibration Error (ECE) = 0.032 — meaning the system's confidence scores closely match actual accuracy. When the system says "I'm 80% confident," it's right about 80% of the time. This matters for clinical adoption: clinicians need to trust that the confidence numbers mean what they say.
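ECE itself is simple to compute: bin predictions by confidence, compare each bin's mean confidence to its empirical accuracy, and average the gaps weighted by bin size. Here is a sketch on synthetic data that is perfectly calibrated by construction, so the measured ECE comes out near zero:

```python
import numpy as np

# Sketch of Expected Calibration Error with equal-width confidence bins.
def ece(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = len(confidences)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            err += mask.sum() / total * gap
    return err

# Synthetic, perfectly calibrated predictions: correct with probability
# exactly equal to the stated confidence.
rng = np.random.default_rng(1)
conf = rng.uniform(0.1, 0.9, 5000)
hit = rng.random(5000) < conf
```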
SynPUF: Synthetic Medicare Claims (Comparison)
CMS SynPUF is a synthetic dataset designed to mimic Medicare claims data. It's useful for development but dramatically sparser than real data.
| Metric | SynPUF | MIMIC-IV | Why the Difference |
|---|---|---|---|
| Unique codes | 284 | 3,461 | Real data is 12x more clinically diverse |
| Evidence features | 46 | 2,604 | Real data supports 57x more clinical signals |
| HIGH tier | 10.8% | 71.6% | Sparse synthetic data → low confidence |
| Auto-approve | 3.4% | 38.3% | Too few HIGH-confidence cases to auto-approve |
| NMF lift | 4.1x | 5.0x | More signal → better pathway detection |
| Evidence improvement | +4.3% | +56% | Real clinical context is dramatically more informative |
The lesson: Synthetic data systematically understates the system's capability. Real clinical data has far richer diagnostic codes, more comorbidities, more documentation — all of which feed the evidence model. SynPUF shows the system works in principle; MIMIC-IV shows it works in practice.
What the System Gets Right — and What It Gets Wrong
Validated claims (5 of 6 pass on MIMIC-IV):
- Pathways exist in clinical data. NMF discovers meaningful care pathway clusters. 5.0x accuracy lift over random. ✓
- Evidence improves confidence. The evidence model improves pathway assignment by 56% over codes alone. ✓
- Confidence tiers predict complexity. HIGH-tier cases have 2.3x higher pathway probability than LOW-tier. Strictly ordered (HIGH > MEDIUM > LOW). ✓
- Gated routing beats ungated. ✗ Fails on MIMIC-IV. See below.
- Monotonic abstention. Accuracy increases monotonically as the confidence threshold increases. Zero violations across 20 bins. ✓
- Explainability. Every decision has a structured explanation with top evidence features and routing rationale. ✓
The Claim 4 failure — and why it's actually fine
On MIMIC-IV, gated routing produces the same average quality as ungated routing (0.9055 vs 0.9055). The confidence gate adds no quality improvement.
Why? Because 71.6% of encounters are already HIGH confidence. When most cases are already confident, the gate has very little to filter. This is a ceiling effect — the data is so rich that nearly everything passes the confidence bar.
This is actually good news for deployment. It means that on real clinical data:
- The system is confident about most cases (71.6% HIGH)
- The cases it's not confident about (9.7% LOW) are genuinely complex — they belong in physician review
- The gate's value isn't in improving average quality; it's in identifying the 9.7% that need the most expensive review and protecting them from automation
The discharge concordance finding — and what it means
We tested whether confidence tiers predict clinical outcomes (specifically, discharge disposition — whether patients go home vs. to a skilled nursing facility). They don't. HIGH-confidence encounters have lower home-discharge rates (35.1%) than LOW-confidence encounters (42.1%).
This makes sense. PA routing complexity and clinical severity are different things. A straightforward hip replacement (HIGH confidence, clear pathway) still requires significant post-operative care (low home-discharge probability). A simple outpatient medication review (LOW confidence, ambiguous pathway) often results in the patient going home.
The confidence score measures how well the system can identify the care pathway — not how sick the patient is. This distinction matters: the system is designed to triage administrative review, not clinical outcomes.
05 Regulatory Alignment: CMS-0057-F
The January 2027 Mandate
CMS-0057-F (Interoperability and Patient Access Final Rule) requires health plans to implement electronic prior authorization using FHIR APIs by January 1, 2027. This is not optional — it applies to Medicare Advantage, Medicaid, CHIP, and qualified health plan issuers.
Key requirements:
- Electronic PA processing via FHIR APIs (no more fax-based workflows)
- Decision transparency — payers must provide specific reasons for PA decisions
- Timeliness — standard requests within 7 days, urgent within 72 hours
- Reporting — payers must report approval rates, processing times, and appeal outcomes
Several states have also passed "gold carding" legislation that exempts high-performing providers from PA requirements entirely.
How the Pipeline Aligns
| CMS-0057-F Requirement | Pipeline Implementation |
|---|---|
| Electronic processing via FHIR API | FastAPI endpoint (/v1/triage) with standard request/response models |
| Decision transparency | Per-decision explanation with top 3 evidence features + routing rationale |
| Specific denial reasons | Pathway detection + confidence scoring provides structured clinical basis |
| Timely decisions | <100ms per request (vs 48-hour manual average) |
| Appeal pathway | All LOW-confidence and protected cases go to physician review by default |
| Gold carding | Built-in: providers with ≥90% approval rate + ≥20 PA history get auto-approval |
| Audit trail | Full pathway distribution + confidence scores + feature weights logged per request |
The Gold Card Feature
Gold carding is increasingly mandated by state law. Texas (HB 3459), Louisiana, and West Virginia have all passed legislation requiring payers to exempt providers with high approval rates from PA requirements.
The pipeline implements this directly:
Gold card eligibility:
- Provider historical approval rate ≥ 90%
- At least 20 prior PA requests (sufficient track record)
→ Result: Auto-approve regardless of confidence tier

This means high-volume, high-quality providers see instant approvals — which is exactly the regulatory intent. The system tracks provider performance automatically and applies gold card status at the request level.
Determination Letter Format
Every decision generates a structured explanation suitable for CMS-compliant determination letters:
Care pathway: Cardiac (confidence: HIGH, probability: 82.3%)
Key clinical indicators:
- dx_group_circulatory (weight: 0.92)
- comorbidity_4280 [heart failure] (weight: 0.78)
- service_inpatient (weight: 0.65)
Clinical necessity score: 0.88
Decision: Auto-approved
Reason: Pathway alignment 82.3% exceeds threshold 75%
with HIGH confidence tier.
Clinical necessity score 0.88 exceeds minimum 0.60.
No protected case flags triggered.

For denied or escalated cases, the same format explains why the case was routed to review — which specific signals were missing, ambiguous, or flagged. This gives the reviewing clinician a structured starting point rather than a blank chart.
What's Not Implemented Yet
The pipeline provides the routing and explanation infrastructure but does not yet include:
- InterQual/MCG guideline integration — the clinical necessity scorer uses a heuristic baseline, not commercial clinical criteria databases. A production system would need to integrate with these industry-standard guidelines.
- Real-time claims adjudication — the system routes requests but doesn't connect to payer adjudication systems.
- Member benefit verification — the system evaluates clinical appropriateness, not whether the member's plan covers the requested service.
- Real PA outcome validation — all validation uses pathway accuracy as a proxy, not actual approve/deny concordance with human reviewers.
These are integration tasks, not algorithmic gaps. The core confidence-gated routing logic is validated; the surrounding infrastructure needs payer-specific integration.
06 Why This Architecture Works Across Domains
The Same Pattern, Three Times
The prior auth pipeline uses the same three-component architecture as our e-commerce and ad personalization systems:
| Component | E-Commerce | Ads (Cookieless) | Prior Auth |
|---|---|---|---|
| Pattern detection | Shopping intent from clicks and cart events | Session intent from behavioral signals | Care pathway from diagnosis and procedure codes |
| Confidence gate | Is the intent signal clear enough to personalize? | Should we trust this session's signal? | Is the pathway assignment confident enough to auto-route? |
| Governed action | Rerank products by intent | Steer ads by intent with budget control | Route to auto-approve / nurse / physician review |
The domain changes. The signals change. The architecture doesn't.
Why Confidence Gating Generalizes
The Confidence Gate Theorem (validated in the CGT paper) explains why this works:
Structural uncertainty is actionable. When the system lacks data (a new user with 2 clicks, a PA request with 1 ambiguous code), the confidence score correctly identifies the uncertainty, and gating reliably improves outcomes.
The clinical domain is primarily structural. On MIMIC-IV, 79% of the explained variance in confidence scores comes from structural features (number of codes, code frequency, clinical coverage) rather than contextual features (demographics, admission type). This means the confidence signal is measuring what it should — data adequacy, not confounding context.
Zero monotonicity violations on clinical data. Across 20 confidence threshold bins on MIMIC-IV, accuracy increases monotonically. Every threshold you try gives equal or better results. This is the cleanest validation across all three domains.
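The zero-violations check itself is mechanical: sort predictions by confidence, split into equal-size bins, and count any bin-to-bin drop in accuracy. A sketch, with synthetic demo data in which accuracy rises with confidence by construction:

```python
# Sketch of the monotonicity check used for the 20-bin validation.
def monotonicity_violations(confidences, correct, n_bins=20):
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    bin_size = len(order) // n_bins
    accs = []
    for b in range(n_bins):
        idx = order[b * bin_size:(b + 1) * bin_size]
        accs.append(sum(correct[i] for i in idx) / len(idx))
    # Count adjacent bin pairs where accuracy decreases.
    return sum(1 for lo, hi in zip(accs, accs[1:]) if hi < lo)

# Demo data: accuracy is a rising function of confidence, so the check
# reports zero violations.
conf = [i / 1000 for i in range(1000)]
correct = [c > 0.5 for c in conf]
violations = monotonicity_violations(conf, correct)
```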
The Evidence Model Difference
What separates this from a simple code-lookup is the evidence model. On MIMIC-IV:
- Without evidence model (codes only): mean top pathway probability = 0.424
- With evidence model (codes + 2,604 clinical features): mean top pathway probability = 0.662
- Improvement: +56%
The evidence model learns which clinical signals are predictive of which pathways. Provider specialty, urgency level, comorbidity burden, documentation quality — these all carry information about which pathway a case belongs to. The model learns these associations from data rather than encoding them as rules.
On SynPUF (synthetic data with only 46 features), this improvement was marginal (+4.3%). On MIMIC-IV (real data with 2,604 features), it's transformative. The evidence model needs rich data to work — and real clinical data is rich.
Getting Started
The pipeline is structured for integration:
- API: FastAPI endpoint at /v1/triage accepts structured PA requests, returns routing decisions with explanations
- Models: Pretrained pathway embeddings + evidence graph weights, loadable from standard artifact files
- Configuration: Confidence thresholds, auto-approve criteria, and protected case rules are all configurable per payer
- Tests: 151/151 tests pass, including end-to-end pipeline tests, confidence gate logic, and all 6 validation claims
What's needed for a payer pilot:
- Access to historical claims data for pathway training (minimum ~10K encounters)
- Integration with the payer's PA intake system (FHIR-compatible)
- Clinical review of pathway labels and auto-approve thresholds
- Parallel run: route requests through both the system and existing manual process, compare concordance
Resources
- Paper: Confidence Gate Theorem (arXiv 2603.09947)
- governed-rank: github.com/rdoku/governed-rank
- Contact: ronald@haskelabs.com
07 Conclusion
Prior authorization is a $583 million annual administrative cost in the US — and most of it is spent reviewing cases that are ultimately approved. The problem isn't that review is unnecessary; it's that every case gets the same level of review regardless of complexity.
Our confidence-gated triage system addresses this by matching review intensity to case complexity:
On 10,000 real hospital encounters (MIMIC-IV):
- 38.3% auto-approved — clear pathway, high confidence, no safety flags → processed in milliseconds instead of hours
- 52.1% nurse review — some ambiguity, needs a human check but with the system's analysis as context → faster, more focused review
- 9.7% physician review — genuinely complex cases with low confidence or safety flags → full clinical evaluation, as it should be
The confidence signal is trustworthy:
- 71.6% of encounters achieve HIGH confidence (clear, decisive pathway signal)
- Zero monotonicity violations — every confidence threshold produces equal or better accuracy
- Calibration error of 0.032 — when the system says 80% confident, it's right about 80% of the time
- The evidence model improves pathway detection by 56% over raw codes alone
The system is transparent:
- Every decision produces a structured explanation citing specific clinical features and their evidence weights
- Routing rationale includes the thresholds met or missed
- Aligned with CMS-0057-F requirements for decision transparency and timely processing
- Built-in gold carding for high-performing providers
Honest limitations:
- All validation is on historical encounter data, not actual PA decisions. We measure pathway accuracy, not approve/deny concordance with human reviewers.
- Claim 4 (gated beats ungated) fails on MIMIC-IV due to ceiling effect — the data is so rich that most cases are already HIGH confidence. This is good for deployment but means the gate's value is in catching the 9.7% that need escalation, not in improving average quality.
- Clinical necessity scoring uses a heuristic baseline, not InterQual/MCG guidelines. Production deployment requires guideline integration.
- No payer pilot yet. The next step is a parallel run with a real payer, comparing system routing against manual review decisions.
The architecture is the same one validated across e-commerce (4.9x conversion lift on RetailRocket), advertising (1.9x on Criteo), and now healthcare (5.0x pathway accuracy on MIMIC-IV). The confidence gate works because the dominant uncertainty in all three domains is structural — the system doesn't have enough data to be sure — and structural uncertainty is exactly what confidence scoring is designed to measure.
The prior auth bottleneck won't be solved by better rules or blacker boxes. It will be solved by a system that knows what it knows, knows what it doesn't, and routes accordingly.