01 Introduction
Prior authorization is one of the most hated processes in American healthcare. A physician orders a procedure. The payer requires approval before it can proceed. A clinical reviewer — often a nurse, sometimes a physician — manually evaluates whether the procedure is medically necessary for this patient.
The numbers are staggering. 53 million prior auth requests per year in the US. Each one costs roughly $11 in administrative overhead. Average turnaround: 48 hours. Some take weeks. Patients wait. Physicians burn out on paperwork. Payers hire armies of reviewers.
Here's the thing: not all of these requests are hard. A routine knee MRI for a patient with documented knee pain and an orthopedic referral is straightforward. A complex cardiac surgery for a patient with 12 comorbidities is genuinely difficult. They require fundamentally different levels of scrutiny — but they go through the same manual queue.
The bottleneck isn't medical complexity. It's that the system can't tell the easy cases from the hard ones.
This is a confidence problem. And it's the same confidence problem we've solved in e-commerce and ad personalization: detect the pattern, measure how confident you are, route based on that confidence.
Our prior authorization triage system does exactly this:
- Detect the care pathway from diagnosis codes, procedure codes, and clinical context
- Measure confidence in that detection using a 4-signal scoring system
- Route accordingly: auto-approve the clear cases, send ambiguous cases to nurse review, escalate complex cases to physician review
- Explain every decision with the specific clinical evidence that drove it
Validated on 10,000 real hospital encounters from MIMIC-IV (Beth Israel Deaconess Medical Center), the system auto-approves 38.3% of encounters at high confidence — with zero violations in the confidence-accuracy relationship across the entire threshold range.
02 Why Prior Auth Is Stuck
The Scale of the Problem
The American Medical Association's 2023 survey found that 88% of physicians describe the burden of prior authorization as "high" or "extremely high." One in three physicians report that PA has led to a serious adverse event for a patient in their care.
The operational numbers:
| Metric | Value |
|---|---|
| Annual PA requests (US) | ~53 million |
| Average cost per manual review | $11.00 |
| Physician review cost | $25.00 |
| Average turnaround time | 48 hours |
| Requests requiring physician-level review | ~15-20% |
| Requests that are ultimately approved | ~85% |
That last number is the key insight: 85% of prior auth requests are eventually approved. The manual review process exists to catch the 15% that shouldn't be — but it subjects the other 85% to the same delay and cost.
Why Existing Automation Fails
Most PA automation attempts fall into two categories:
Rules-based systems use hard-coded clinical criteria (e.g., "approve knee MRI if diagnosis includes knee pain AND patient is over 40 AND referral is from orthopedics"). These are brittle: they either approve too little (overly strict rules leave most cases for manual review) or too much (loose rules approve cases that shouldn't be). They can't handle the long tail of clinical variation — the patient with 8 diagnosis codes who doesn't fit any single rule cleanly.
Black-box ML models train a classifier to predict "approve" or "deny" directly. These face an adoption problem: clinicians and regulators won't trust a model that says "deny" without explaining why. And they face a safety problem: a model trained on historical approve/deny data inherits every bias in that data — including denials that were later overturned on appeal.
Our Approach: Don't Predict the Decision. Triage the Review.
We reframe the problem. Instead of trying to predict whether a request should be approved or denied, we ask: how much human review does this request need?
- A routine request with clear clinical evidence and a well-matched pathway needs minimal review — auto-approve it.
- An ambiguous request with mixed signals needs a nurse reviewer to check the clinical fit.
- A complex request with high-acuity codes, multiple comorbidities, or conflicting evidence needs a physician to evaluate it.
This reframing has three advantages:
- It's safer. We're not making the clinical decision — we're routing it to the right level of review. The worst case is that a request that could have been auto-approved goes to nurse review instead. That's a cost problem, not a safety problem.
- It's explainable. Every routing decision comes with the specific clinical features that drove it — which pathway was detected, what evidence supported it, and why the confidence level triggered this particular route.
- It aligns with regulation. CMS-0057-F (effective January 2027) requires payers to provide transparent, timely PA decisions with appeal paths. Our system generates CMS-compliant determination explanations for every request.
03 How the Pipeline Works
The system processes a prior authorization request through six stages:
PA Request (diagnosis codes, procedure codes, demographics, provider info)
|
[ 1. Signal Collection ] Gather all clinical signals from the request
|
[ 2. Pathway Detection ] Map codes to a care pathway (e.g., Surgical, Cardiac, Chronic)
|
[ 3. Confidence Scoring ] How confident are we in this pathway assignment?
|
[ 4. Safety Checks ] Is this a protected case that always needs human review?
|
[ 5. Routing Decision ] Auto-approve / Nurse review / Physician review
|
[ 6. Explanation ] Generate audit-ready determination letter

Stage 1: Signal Collection
The system extracts structured signals from the PA request. These are the same kinds of fields that appear on a standard PA form:
Clinical signals:
- Diagnosis codes (ICD-10): What's wrong with the patient?
- Procedure codes (CPT): What's being requested?
- Comorbidities: What other conditions does the patient have?
- Medications: Is there polypharmacy or drug interaction risk?
Context signals:
- Service type: Inpatient, outpatient, imaging, pharmacy?
- Urgency: Standard, urgent, emergent?
- Provider specialty: Does the requesting provider's specialty match the procedure?
- Prior history: Previous PA requests, prior procedures, prior denials?
Documentation signals:
- Are clinical notes attached?
- How many supporting documents?
- Is there a provider letter of medical necessity?
The system doesn't need all of these for every request. The more signals available, the higher the confidence. A request with 3 diagnosis codes, a procedure code, and clinical notes gives a much stronger signal than a request with just a procedure code.
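As a rough sketch, the signal families above can be captured in a single request structure. The field names and the `signal_count` heuristic here are illustrative assumptions, not the system's actual schema:

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical container for the clinical, context, and documentation
# signals on a PA request. Field names are illustrative only.
@dataclass
class PARequest:
    diagnosis_codes: list[str] = field(default_factory=list)   # ICD-10
    procedure_codes: list[str] = field(default_factory=list)   # CPT
    comorbidities: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    service_type: str = "outpatient"    # inpatient / outpatient / imaging / pharmacy
    urgency: str = "standard"           # standard / urgent / emergent
    provider_specialty: str | None = None
    has_clinical_notes: bool = False
    num_documents: int = 0

    def signal_count(self) -> int:
        """Rough count of distinct signals present: more signals, more evidence."""
        n = len(self.diagnosis_codes) + len(self.procedure_codes)
        n += len(self.comorbidities) + len(self.medications)
        n += int(self.has_clinical_notes) + self.num_documents
        if self.provider_specialty:
            n += 1
        return n
```

A request with three diagnosis codes, a procedure code, and attached notes scores far higher on this count than one with a lone procedure code, which is exactly the asymmetry the confidence scorer exploits.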
Stage 2: Pathway Detection
This is the core clinical intelligence step. The system maps the submitted codes to one of 12 latent care pathways — clusters of clinically related cases discovered automatically from historical encounter data.
How pathway discovery works:
The system trains on a large dataset of clinical encounters (10,000 from MIMIC-IV in our validation). Each encounter has a set of diagnosis codes. The system uses Non-negative Matrix Factorization (NMF) — essentially a pattern-finding algorithm — to discover groups of codes that frequently co-occur. These groups become the pathways.
For example, the algorithm might discover:
- Pathway 3 (Cardiac): Codes for heart failure, atrial fibrillation, hypertension, and coronary artery disease cluster together
- Pathway 7 (Surgical): Codes for joint disorders, fractures, and procedural complications cluster together
- Pathway 11 (Respiratory): Codes for pneumonia, COPD, and respiratory failure cluster together
These pathways aren't hand-coded. They emerge from the data — which means they reflect the actual clinical patterns in the payer's population, not a committee's assumptions about how medicine should be organized.
For a new PA request, the system looks up the submitted codes in the learned pathway patterns and computes a probability distribution: "This request is 72% likely to be Cardiac, 15% Surgical, 13% Other."
On MIMIC-IV data, pathway detection achieves 5.0x accuracy over random baseline — meaning it correctly identifies the pathway 5 times more often than chance. This is the foundation that everything else builds on.
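The discovery step can be sketched with a minimal NMF (plain Lee-Seung multiplicative updates, numpy only) on a toy encounter-by-code matrix with two planted pathways. The real system trains on thousands of encounters with 12 components; this is only a shape-of-the-algorithm illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy encounter-by-code matrix: rows = encounters, columns = diagnosis codes.
# Two planted "pathways": codes 0-4 co-occur, codes 5-9 co-occur.
X = np.zeros((200, 10))
for i in range(200):
    block = slice(0, 5) if i < 100 else slice(5, 10)
    X[i, block] = rng.random(5) > 0.4

def nmf(X, k, iters=200, eps=1e-9):
    """Minimal NMF via multiplicative updates: X ~ W @ H, all non-negative."""
    n, m = X.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

W, H = nmf(X, k=2)

# Each row of H weights every code for one pathway. For a new request,
# score its codes against each pathway and normalize into a distribution.
codes = np.zeros(10)
codes[[1, 2, 3]] = 1.0                 # request submitting codes 1, 2, 3
scores = H @ codes
probs = scores / scores.sum()          # heavily favors the codes-0-4 pathway
```

Because codes 1-3 live entirely in the first planted block, the normalized scores concentrate on that pathway — the same "72% Cardiac, 15% Surgical" shape described above.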
Stage 3: Confidence Scoring
Pathway detection alone isn't enough. The system also needs to know how confident it is in that detection. A request with 6 matching diagnosis codes and a clear procedure is very different from a request with 1 ambiguous code.
The confidence score combines four signals:
1. Margin — How much does the top pathway dominate? If the top pathway has 85% probability and the second has 8%, the margin is large — the system is decisive. If two pathways are neck-and-neck (45% vs 38%), the margin is small — the system is uncertain.
2. Peakedness — How concentrated is the overall distribution? A sharp peak (one pathway dominates) means high confidence. A flat distribution (probability spread across many pathways) means the system can't decide.
3. Evidence volume — How many clinical signals actually fired? A request with 8 diagnosis codes, procedure codes, and clinical notes provides abundant evidence. A request with a single code provides very little.
4. Feature agreement — Do the clinical signals agree with each other? If the diagnosis codes, the procedure type, the provider specialty, and the urgency level all point to the same pathway, agreement is high. If they point in different directions, something unusual is going on.
Tier assignment: A request scores HIGH if at least 3 of the 4 signals pass their thresholds, MEDIUM if 2 pass, and LOW if fewer than 2 pass.
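The four signals and the 3-of-4 tier rule can be sketched directly. Every cutoff below (0.30 margin, entropy bound, 4-signal minimum, 0.70 agreement) is an illustrative placeholder, not the system's validated thresholds:

```python
import math

# Sketch of the 4-signal confidence scorer. Thresholds are assumptions.
def confidence_tier(probs, n_signals, agreement):
    """probs: pathway probabilities sorted descending; agreement in [0, 1]."""
    top = probs[0]
    second = probs[1] if len(probs) > 1 else 0.0
    margin_ok = (top - second) >= 0.30                    # 1. margin
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    peaked_ok = entropy <= 0.5 * math.log(len(probs))     # 2. peakedness
    evidence_ok = n_signals >= 4                          # 3. evidence volume
    agree_ok = agreement >= 0.70                          # 4. feature agreement
    passed = sum([margin_ok, peaked_ok, evidence_ok, agree_ok])
    if passed >= 3:
        return "HIGH"
    return "MEDIUM" if passed == 2 else "LOW"
```

A decisive request (0.85 top probability, abundant signals, agreeing features) passes all four checks; a flat distribution with a single code passes none.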
On MIMIC-IV: 71.6% of encounters score HIGH confidence. This is dramatically higher than the 10.8% on synthetic data (SynPUF) — real clinical data is much richer in diagnostic signals than synthetic approximations.
Stage 4: Safety Checks
Before routing, the system checks for protected cases that should always go to human review regardless of confidence:
- High-acuity codes: Critical care, high-severity emergency, complex procedures — these are flagged for physician review even if the pathway is clear
- Emergent requests: Time-sensitive cases bypass the auto-approve path
- Low clinical necessity: If the evidence for medical necessity is weak (score < 0.60), a human should evaluate
- Complex cases: Requests with more than 8 total codes are clinically complex enough to warrant review
These are hard safety rails. No amount of confidence in the pathway detection overrides them.
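A minimal sketch of the rails, checked before any routing. The high-acuity ICD-10 prefixes here are examples only (shock, sepsis, respiratory failure); a real deployment would use a clinically curated protected-case list:

```python
# Hard safety rails: any returned flag forces human review regardless of
# confidence. Code prefixes and thresholds are illustrative assumptions.
HIGH_ACUITY_PREFIXES = ("R57", "R65", "J96")   # e.g. shock, sepsis, resp. failure

def protected_case(dx_codes, urgency, necessity_score):
    if any(code.startswith(HIGH_ACUITY_PREFIXES) for code in dx_codes):
        return "high_acuity"
    if urgency == "emergent":
        return "emergent"
    if necessity_score < 0.60:
        return "low_necessity"
    if len(dx_codes) > 8:
        return "complex"
    return None   # no flag: eligible for auto-approve
```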
Stage 5: Routing Decision
With confidence tier and safety checks in hand, the system routes each request:
| Tier | Safety Check | Route | What Happens |
|---|---|---|---|
| HIGH | Passes | Auto-approve | Approved instantly, no human review |
| HIGH | Fails (protected case) | Nurse review | Human reviews despite high confidence |
| MEDIUM | Any | Nurse review | Nurse evaluates with system's pathway analysis as context |
| LOW | Any | Physician review | Full clinical evaluation by a physician reviewer |
Auto-approve criteria (all must be met):
- HIGH confidence tier
- Top pathway probability ≥ 75%
- Clinical necessity score ≥ 0.60
- Not a protected case
Gold card override: Providers with ≥90% historical approval rate across ≥20 prior requests get automatic approval — the system recognizes that their requests are almost always appropriate. This aligns with "gold carding" legislation now active in several US states.
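The routing table, auto-approve criteria, and gold card override above collapse into a single function. The thresholds mirror the text; the function shape and the `provider_history` tuple are illustrative assumptions, not the production interface:

```python
# Sketch of the routing decision. Thresholds (0.75, 0.60, 0.90, 20) come
# from the text; everything else is an assumed interface.
def route(tier, top_prob, necessity, protected, provider_history=None):
    # Gold card override: >= 90% approval rate across >= 20 prior requests.
    if provider_history is not None:
        approvals, total = provider_history
        if total >= 20 and approvals / total >= 0.90:
            return "auto-approve"
    if tier == "HIGH" and top_prob >= 0.75 and necessity >= 0.60 and protected is None:
        return "auto-approve"
    if tier == "LOW":
        return "physician-review"
    return "nurse-review"   # MEDIUM, or HIGH with a protected-case flag
```

Note the ordering: the gold card check runs first, and a protected-case flag demotes even a HIGH-confidence request to nurse review, matching the table.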
On MIMIC-IV: 38.3% auto-approve, 52.1% nurse review, 9.7% physician review. That means nearly 4 in 10 encounters could be processed without any human reviewer. In a system handling 100,000 PA requests per month, that's 38,300 requests that take seconds instead of hours.
Stage 6: Explanation Generation
Every routing decision produces a structured explanation:
Care pathway: Surgical (confidence: HIGH, probability: 78.5%)
Key clinical indicators:
- urgency_urgent (weight: 0.50)
- specialty_orthopedics (weight: 0.80)
- px_group_surgery (weight: 0.90)
Clinical necessity score: 0.85
Decision: Auto-approved
Reason: Pathway alignment 78.5% exceeds threshold 75%
with HIGH confidence. No protected case flags.

This isn't a black box. Every decision shows:
- Which pathway was detected and with what probability
- The top 3 clinical features that drove the detection, with their evidence weights
- The clinical necessity score
- The specific reason for the routing decision, including which thresholds were met or missed
This level of transparency is critical for regulatory compliance (CMS-0057-F requires it), for clinician trust (reviewers can see why the system made a recommendation), and for appeals (patients and providers can understand and challenge the logic).
04 Results: Real Patient Data
We validated on two clinical datasets: CMS SynPUF (synthetic Medicare claims) and MIMIC-IV (real de-identified hospital records from Beth Israel Deaconess Medical Center). The real data is dramatically richer and tells a much more convincing story.
MIMIC-IV: 10,000 Real Hospital Encounters
MIMIC-IV is a publicly available dataset of de-identified clinical records from a major teaching hospital. It contains real diagnosis codes, procedure codes, demographics, and outcomes. Our validation used 10,000 hospital encounters spanning 3,461 unique ICD-10 diagnosis codes.
Confidence Tier Distribution
| Tier | Encounters | Percentage | Mean Pathway Probability |
|---|---|---|---|
| HIGH | 7,160 | 71.6% | 0.766 |
| MEDIUM | 1,872 | 18.7% | 0.437 |
| LOW | 968 | 9.7% | 0.329 |
71.6% of encounters have HIGH confidence. Real clinical data — with multiple diagnosis codes, procedure codes, and rich clinical context — gives the system far more evidence to work with than synthetic approximations.
Routing Distribution
| Route | Encounters | Percentage |
|---|---|---|
| Auto-approve | 3,830 | 38.3% |
| Nurse review | 5,202 | 52.1% |
| Physician review | 968 | 9.7% |
38.3% auto-approve rate. At scale, this means massive workload reduction. For a payer processing 100,000 requests per month:
- 38,300 requests handled automatically (seconds, not hours)
- 52,100 requests go to nurse review with the system's pathway analysis as context (faster review)
- 9,700 requests escalated to physician review (the genuinely complex cases)
Pathway Detection Accuracy
The system correctly identifies the care pathway 5.0x more often than random (41.78% vs 8.33% random baseline with 12 pathways). This doesn't sound high in absolute terms, but pathway detection isn't a classification task — it's a soft assignment. The system produces a probability distribution, and the confidence in that distribution is what drives routing.
Evidence Improvement
This is one of the strongest findings. The system starts with a "prior" estimate based purely on the submitted codes, then refines it using the clinical evidence model (2,604 features, 13,016 learned feature-to-pathway connections):
| Stage | Mean Top Pathway Probability |
|---|---|
| Prior (codes only) | 0.424 |
| Posterior (codes + evidence) | 0.662 |
| Improvement | +56% |
The evidence model — which incorporates provider specialty, urgency, service type, comorbidity burden, documentation quality, and diagnosis group features — substantially sharpens the pathway assignment. This is the step that transforms raw codes into a clinically informed pathway confidence.
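One way to picture the prior-to-posterior step is a log-linear update: start from the code-based prior over pathways, add learned feature-to-pathway log-weights for each fired feature, and renormalize. The pathway names and weights below are invented for illustration; the real model has 2,604 features and 13,016 learned connections:

```python
import math

# Toy log-linear evidence update. Weights are made-up illustrations.
def evidence_posterior(prior, fired_features, weights):
    """prior: pathway -> prob; weights: (feature, pathway) -> log-weight."""
    logits = {p: math.log(prob) for p, prob in prior.items()}
    for f in fired_features:
        for p in logits:
            logits[p] += weights.get((f, p), 0.0)
    z = sum(math.exp(v) for v in logits.values())
    return {p: math.exp(v) / z for p, v in logits.items()}

prior = {"cardiac": 0.45, "surgical": 0.35, "other": 0.20}
weights = {("specialty_cardiology", "cardiac"): 1.2,
           ("service_inpatient", "cardiac"): 0.4}
post = evidence_posterior(prior, ["specialty_cardiology", "service_inpatient"], weights)
# Two agreeing cardiac signals sharpen the top pathway probability well
# past the prior's 0.45 — the same sharpening the table above reports.
```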
Confidence-Accuracy Relationship
The critical question: does higher confidence actually mean better pathway assignments? We tested this using the Confidence Gate Theorem's C1 and C2 conditions.
C1 (rank correlation): Spearman rho = 0.349 (p = 5.2 × 10⁻²⁸⁴). Highly significant — confidence correctly rank-orders pathway accuracy.
C2 (no reversals): Zero reversals across 5 confidence zones:
| Zone | Confidence Range | Encounters | Mean Accuracy |
|---|---|---|---|
| 0 | 0.12 – 0.30 | 5,561 | 0.231 |
| 1 | 0.30 – 0.47 | 2,913 | 0.359 |
| 2 | 0.47 – 0.65 | 869 | 0.648 |
| 3 | 0.65 – 0.82 | 424 | 0.861 |
| 4 | 0.82 – 1.00 | 233 | 0.939 |
Accuracy rises cleanly from 23% to 94% as confidence increases. No reversals. No danger zones. The confidence signal is trustworthy across the entire range.
At the highest confidence zone (0.82–1.00), the system is correct 94% of the time — approaching the reliability needed for fully automated clinical routing.
Calibration
Expected Calibration Error (ECE) = 0.032 — meaning the system's confidence scores closely match actual accuracy. When the system says "I'm 80% confident," it's right about 80% of the time. This matters for clinical adoption: clinicians need to trust that the confidence numbers mean what they say.
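ECE itself is simple to compute: bin predictions by confidence, compare each bin's mean confidence to its empirical accuracy, and average the gaps weighted by bin size. Here is a sketch on synthetic data that is perfectly calibrated by construction, so the measured ECE comes out near zero:

```python
import numpy as np

# Sketch of Expected Calibration Error with equal-width confidence bins.
def ece(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = len(confidences)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            err += mask.sum() / total * gap
    return err

# Synthetic, perfectly calibrated predictions: correct with probability
# exactly equal to the stated confidence.
rng = np.random.default_rng(1)
conf = rng.uniform(0.1, 0.9, 5000)
hit = rng.random(5000) < conf
```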
SynPUF: Synthetic Medicare Claims (Comparison)
CMS SynPUF is a synthetic dataset designed to mimic Medicare claims data. It's useful for development but dramatically sparser than real data.
| Metric | SynPUF | MIMIC-IV | Why the Difference |
|---|---|---|---|
| Unique codes | 284 | 3,461 | Real data is 12x more clinically diverse |
| Evidence features | 46 | 2,604 | Real data supports 57x more clinical signals |
| HIGH tier | 10.8% | 71.6% | Sparse synthetic data → low confidence |
| Auto-approve | 3.4% | 38.3% | Too few HIGH-confidence cases to auto-approve |
| NMF lift | 4.1x | 5.0x | More signal → better pathway detection |
| Evidence improvement | +4.3% | +56% | Real clinical context is dramatically more informative |
The lesson: Synthetic data systematically understates the system's capability. Real clinical data has far richer diagnostic codes, more comorbidities, more documentation — all of which feed the evidence model. SynPUF shows the system works in principle; MIMIC-IV shows it works in practice.
What the System Gets Right — and What It Gets Wrong
Validated claims (5 of 6 pass on MIMIC-IV):
- Pathways exist in clinical data. NMF discovers meaningful care pathway clusters. 5.0x accuracy lift over random. ✓
- Evidence improves confidence. The evidence model improves pathway assignment by 56% over codes alone. ✓
- Confidence tiers predict complexity. HIGH-tier cases have 2.3x higher pathway probability than LOW-tier. Strictly ordered (HIGH > MEDIUM > LOW). ✓
- Gated routing beats ungated. ✗ Fails on MIMIC-IV. See below.
- Monotonic abstention. Accuracy increases monotonically as the confidence threshold increases. Zero violations across 20 bins. ✓
- Explainability. Every decision has a structured explanation with top evidence features and routing rationale. ✓
The Claim 4 failure — and why it's actually fine
On MIMIC-IV, gated routing produces the same average quality as ungated routing (0.9055 vs 0.9055). The confidence gate adds no quality improvement.
Why? Because 71.6% of encounters are already HIGH confidence. When most cases are already confident, the gate has very little to filter. This is a ceiling effect — the data is so rich that nearly everything passes the confidence bar.
This is actually good news for deployment. It means that on real clinical data:
- The system is confident about most cases (71.6% HIGH)
- The cases it's not confident about (9.7% LOW) are genuinely complex — they belong in physician review
- The gate's value isn't in improving average quality; it's in identifying the 9.7% that need the most expensive review and protecting them from automation
The discharge concordance finding — and what it means
We tested whether confidence tiers predict clinical outcomes (specifically, discharge disposition — whether patients go home vs. to a skilled nursing facility). They don't. HIGH-confidence encounters have lower home-discharge rates (35.1%) than LOW-confidence encounters (42.1%).
This makes sense. PA routing complexity and clinical severity are different things. A straightforward hip replacement (HIGH confidence, clear pathway) still requires significant post-operative care (low home-discharge probability). A simple outpatient medication review (LOW confidence, ambiguous pathway) often results in the patient going home.
The confidence score measures how well the system can identify the care pathway — not how sick the patient is. This distinction matters: the system is designed to triage administrative review, not clinical outcomes.
05 Regulatory Alignment: CMS-0057-F
The January 2027 Mandate
CMS-0057-F (Interoperability and Patient Access Final Rule) requires health plans to implement electronic prior authorization using FHIR APIs by January 1, 2027. This is not optional — it applies to Medicare Advantage, Medicaid, CHIP, and qualified health plan issuers.
Key requirements:
- Electronic PA processing via FHIR APIs (no more fax-based workflows)
- Decision transparency — payers must provide specific reasons for PA decisions
- Timeliness — standard requests within 7 days, urgent within 72 hours
- Reporting — payers must report approval rates, processing times, and appeal outcomes
Several states have also passed "gold carding" legislation that exempts high-performing providers from PA requirements entirely.
How the Pipeline Aligns
| CMS-0057-F Requirement | Pipeline Implementation |
|---|---|
| Electronic processing via FHIR API | FastAPI endpoint (/v1/triage) with standard request/response models |
| Decision transparency | Per-decision explanation with top 3 evidence features + routing rationale |
| Specific denial reasons | Pathway detection + confidence scoring provides structured clinical basis |
| Timely decisions | <100ms per request (vs 48-hour manual average) |
| Appeal pathway | All LOW-confidence and protected cases go to physician review by default |
| Gold carding | Built-in: providers with ≥90% approval rate + ≥20 PA history get auto-approval |
| Audit trail | Full pathway distribution + confidence scores + feature weights logged per request |
The Gold Card Feature
Gold carding is increasingly mandated by state law. Texas (HB 3459), Louisiana, and West Virginia have all passed legislation requiring payers to exempt providers with high approval rates from PA requirements.
The pipeline implements this directly:
Gold card eligibility:
- Provider historical approval rate ≥ 90%
- At least 20 prior PA requests (sufficient track record)
→ Result: Auto-approve regardless of confidence tier

This means high-volume, high-quality providers see instant approvals — which is exactly the regulatory intent. The system tracks provider performance automatically and applies gold card status at the request level.
Determination Letter Format
Every decision generates a structured explanation suitable for CMS-compliant determination letters:
Care pathway: Cardiac (confidence: HIGH, probability: 82.3%)
Key clinical indicators:
- dx_group_circulatory (weight: 0.92)
- comorbidity_4280 [heart failure] (weight: 0.78)
- service_inpatient (weight: 0.65)
Clinical necessity score: 0.88
Decision: Auto-approved
Reason: Pathway alignment 82.3% exceeds threshold 75%
with HIGH confidence tier.
Clinical necessity score 0.88 exceeds minimum 0.60.
No protected case flags triggered.

For denied or escalated cases, the same format explains why the case was routed to review — which specific signals were missing, ambiguous, or flagged. This gives the reviewing clinician a structured starting point rather than a blank chart.
What's Not Implemented Yet
The pipeline provides the routing and explanation infrastructure but does not yet include:
- InterQual/MCG guideline integration — the clinical necessity scorer uses a heuristic baseline, not commercial clinical criteria databases. A production system would need to integrate with these industry-standard guidelines.
- Real-time claims adjudication — the system routes requests but doesn't connect to payer adjudication systems.
- Member benefit verification — the system evaluates clinical appropriateness, not whether the member's plan covers the requested service.
- Real PA outcome validation — all validation uses pathway accuracy as a proxy, not actual approve/deny concordance with human reviewers.
These are integration tasks, not algorithmic gaps. The core confidence-gated routing logic is validated; the surrounding infrastructure needs payer-specific integration.
06 Why This Architecture Works Across Domains
The Same Pattern, Three Times
The prior auth pipeline uses the same three-component architecture as our e-commerce and ad personalization systems:
| Component | E-Commerce | Ads (Cookieless) | Prior Auth |
|---|---|---|---|
| Pattern detection | Shopping intent from clicks and cart events | Session intent from behavioral signals | Care pathway from diagnosis and procedure codes |
| Confidence gate | Is the intent signal clear enough to personalize? | Should we trust this session's signal? | Is the pathway assignment confident enough to auto-route? |
| Governed action | Rerank products by intent | Steer ads by intent with budget control | Route to auto-approve / nurse / physician review |
The domain changes. The signals change. The architecture doesn't.
Why Confidence Gating Generalizes
The Confidence Gate Theorem (validated in the CGT paper) explains why this works:
Structural uncertainty is actionable. When the system lacks data (a new user with 2 clicks, a PA request with 1 ambiguous code), the confidence score correctly identifies the uncertainty, and gating reliably improves outcomes.
The clinical domain is primarily structural. On MIMIC-IV, 79% of the explained variance in confidence scores comes from structural features (number of codes, code frequency, clinical coverage) rather than contextual features (demographics, admission type). This means the confidence signal is measuring what it should — data adequacy, not confounding context.
Zero monotonicity violations on clinical data. Across 20 confidence threshold bins on MIMIC-IV, accuracy increases monotonically. Every threshold you try gives equal or better results. This is the cleanest validation across all three domains.
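The zero-violations check itself is mechanical: sort predictions by confidence, split into equal-size bins, and count any bin-to-bin drop in accuracy. A sketch, with synthetic demo data in which accuracy rises with confidence by construction:

```python
# Sketch of the monotonicity check used for the 20-bin validation.
def monotonicity_violations(confidences, correct, n_bins=20):
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    bin_size = len(order) // n_bins
    accs = []
    for b in range(n_bins):
        idx = order[b * bin_size:(b + 1) * bin_size]
        accs.append(sum(correct[i] for i in idx) / len(idx))
    # Count adjacent bin pairs where accuracy decreases.
    return sum(1 for lo, hi in zip(accs, accs[1:]) if hi < lo)

# Demo data: accuracy is a rising function of confidence, so the check
# reports zero violations.
conf = [i / 1000 for i in range(1000)]
correct = [c > 0.5 for c in conf]
violations = monotonicity_violations(conf, correct)
```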
The Evidence Model Difference
What separates this from a simple code-lookup is the evidence model. On MIMIC-IV:
- Without evidence model (codes only): mean top pathway probability = 0.424
- With evidence model (codes + 2,604 clinical features): mean top pathway probability = 0.662
- Improvement: +56%
The evidence model learns which clinical signals are predictive of which pathways. Provider specialty, urgency level, comorbidity burden, documentation quality — these all carry information about which pathway a case belongs to. The model learns these associations from data rather than encoding them as rules.
On SynPUF (synthetic data with only 46 features), this improvement was marginal (+4.3%). On MIMIC-IV (real data with 2,604 features), it's transformative. The evidence model needs rich data to work — and real clinical data is rich.
Getting Started
The pipeline is structured for integration:
- API: FastAPI endpoint at /v1/triage accepts structured PA requests, returns routing decisions with explanations
- Models: Pretrained pathway embeddings + evidence graph weights, loadable from standard artifact files
- Configuration: Confidence thresholds, auto-approve criteria, and protected case rules are all configurable per payer
- Tests: 151/151 tests pass, including end-to-end pipeline tests, confidence gate logic, and all 6 validation claims
What's needed for a payer pilot:
- Access to historical claims data for pathway training (minimum ~10K encounters)
- Integration with the payer's PA intake system (FHIR-compatible)
- Clinical review of pathway labels and auto-approve thresholds
- Parallel run: route requests through both the system and existing manual process, compare concordance
Resources
- Paper: Confidence Gate Theorem (arXiv 2603.09947)
- governed-rank: github.com/rdoku/governed-rank
- Contact: ronald@haskelabs.com
07 Conclusion
Prior authorization is a $583 million annual administrative cost in the US — and most of it is spent reviewing cases that are ultimately approved. The problem isn't that review is unnecessary; it's that every case gets the same level of review regardless of complexity.
Our confidence-gated triage system addresses this by matching review intensity to case complexity:
On 10,000 real hospital encounters (MIMIC-IV):
- 38.3% auto-approved — clear pathway, high confidence, no safety flags → processed in milliseconds instead of hours
- 52.1% nurse review — some ambiguity, needs a human check but with the system's analysis as context → faster, more focused review
- 9.7% physician review — genuinely complex cases with low confidence or safety flags → full clinical evaluation, as it should be
The confidence signal is trustworthy:
- 71.6% of encounters achieve HIGH confidence (clear, decisive pathway signal)
- Zero monotonicity violations — every confidence threshold produces equal or better accuracy
- Calibration error of 0.032 — when the system says 80% confident, it's right about 80% of the time
- The evidence model improves pathway detection by 56% over raw codes alone
The system is transparent:
- Every decision produces a structured explanation citing specific clinical features and their evidence weights
- Routing rationale includes the thresholds met or missed
- Aligned with CMS-0057-F requirements for decision transparency and timely processing
- Built-in gold carding for high-performing providers
Honest limitations:
- All validation is on historical encounter data, not actual PA decisions. We measure pathway accuracy, not approve/deny concordance with human reviewers.
- Claim 4 (gated beats ungated) fails on MIMIC-IV due to ceiling effect — the data is so rich that most cases are already HIGH confidence. This is good for deployment but means the gate's value is in catching the 9.7% that need escalation, not in improving average quality.
- Clinical necessity scoring uses a heuristic baseline, not InterQual/MCG guidelines. Production deployment requires guideline integration.
- No payer pilot yet. The next step is a parallel run with a real payer, comparing system routing against manual review decisions.
The architecture is the same one validated across e-commerce (4.9x conversion lift on RetailRocket), advertising (1.9x on Criteo), and now healthcare (5.0x pathway accuracy on MIMIC-IV). The confidence gate works because the dominant uncertainty in all three domains is structural — the system doesn't have enough data to be sure — and structural uncertainty is exactly what confidence scoring is designed to measure.
The prior auth bottleneck won't be solved by better rules or blacker boxes. It will be solved by a system that knows what it knows, knows what it doesn't, and routes accordingly.