Published · IEEE CogMI 2025 · 5 Guardrails · 7 Datasets · Zero Harm

GBP-Audit: Safety-First Bias Correction for AI Models

Audit-then-act framework that determines when bias is safe to remove — preventing harmful corrections through geometric coherence and five guardrails.

  • 7 datasets evaluated: real-world datasets across lending, employment, and public policy
  • 3 of 7 safe corrections: datasets where bias was safely removable
  • Up to 73% disparity reduction: Bank Marketing age-based disparity reduced from 8.7% to 2.3%
  • 0 harmful interventions: zero accuracy degradation on corrected datasets

01 Introduction

Most AI fairness tools treat bias correction as a simple optimization problem: detect disparity, remove it, ship the fix. This assumption is dangerous.

Not all bias is the same. Some bias is proxy-driven — the model learned a shortcut through a protected attribute (like encoding gender in a hidden dimension) that doesn't contribute to the actual prediction task. Removing this bias is safe and improves fairness without harming accuracy.

But other bias is task-aligned — the disparity exists because the protected attribute genuinely correlates with the prediction target through legitimate features. In employment data, for example, occupational segregation means that gender correlates with job type, which correlates with income. Removing this signal destroys the model's ability to make accurate predictions.

Blind debiasing cannot tell the difference. It removes both kinds indiscriminately, and when it removes task-aligned bias, it degrades accuracy — sometimes catastrophically. The model becomes both less fair (through degraded predictions for everyone) and less useful.

GBP-Audit solves this. It audits before it acts, using geometric coherence to classify bias as proxy or task-aligned, then applies corrections only when five strict guardrails confirm safety. The system outputs one of three decisions: ADJUST (safe to correct), ABSTAIN (correction would cause harm), or REVIEW (edge case requiring human judgment).

Developed in collaboration with Dr. Danda B. Rawat at the Howard University Data Science & Cybersecurity Center. Published as an invited talk at the 7th IEEE International Conference on Cognitive Machine Intelligence (CogMI 2025). Validated on 7 real-world datasets across lending, employment, and public policy.

02 Why Blind Debiasing Fails

Two Kinds of Bias

In a model's internal representation space, bias manifests as a direction — a vector along which protected groups differ. The critical question is: does this bias direction align with the task direction?

Proxy Bias (Safe to Remove)

The model learned a shortcut through the protected attribute that doesn't contribute to the task. In representation space:

Task direction:  [0.65, 0.55, 0.70, 0.05, 0.10]  → uses dims 0, 1, 2
Bias direction:  [0.02, 0.02, 0.02, 0.02, 0.75]  → lives in dim 4
                  ↑ Different dimensions — safe to remove

The bias and task signals live in different dimensions, so removing the bias doesn't touch the task signal. Like erasing graffiti from the margin of a page: the essay itself is untouched.

Task-Aligned Bias (Dangerous to Remove)

The disparity exists because the protected attribute legitimately correlates with the target through real-world structure. In representation space:

Task direction:  [0.65, 0.55, 0.70, 0.05, 0.10]  → uses dims 0, 1, 2
Bias direction:  [0.60, 0.52, 0.68, 0.03, 0.15]  → ALSO in dims 0, 1, 2
                  ↑ Same dimensions — dangerous to remove

The bias and task signals are entangled in the same dimensions. Removing bias also removes task signal. Like erasing graffiti written over your essay — you destroy both.
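The difference is easy to check numerically. A minimal sketch (using NumPy) that computes κ for the two example direction pairs above:

```python
import numpy as np

def coherence(t, b):
    """Geometric coherence: absolute cosine similarity of task and bias directions."""
    return abs(np.dot(t, b)) / (np.linalg.norm(t) * np.linalg.norm(b))

t = np.array([0.65, 0.55, 0.70, 0.05, 0.10])          # task direction
b_proxy = np.array([0.02, 0.02, 0.02, 0.02, 0.75])    # proxy bias (lives in dim 4)
b_aligned = np.array([0.60, 0.52, 0.68, 0.03, 0.15])  # task-aligned bias (dims 0-2)

print(coherence(t, b_proxy))    # low (≈ 0.14): safe to remove
print(coherence(t, b_aligned))  # high (≈ 1.0): dangerous to remove
```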

Real-World Consequences

When blind debiasing hits task-aligned bias:

  • Employment data: Removing gender signal also removes occupation signal (because occupational segregation entangles them). AUROC drops, calibration degrades, the model becomes worse for everyone.
  • Credit data: Removing age signal also removes financial maturity signal. Default predictions become unreliable.
  • Healthcare data: Removing race signal also removes disease prevalence signal. Clinical risk scores become inaccurate.

The standard fairness toolkit has no mechanism to detect this before it happens. GBP-Audit does.

03 Geometric Coherence & the Audit-Then-Act Pipeline

GBP-Audit introduces a single diagnostic quantity — geometric coherence (κ) — that determines whether bias correction is safe before any intervention is applied.

Calibration Data + Frozen Model
         |
    [1. Compute Directions]  →  Task direction (t) and bias direction (b)
         |
    [2. Measure Coherence]   →  κ = |cos(t, b)| — are they aligned?
         |
    [3. Decision Gate]       →  κ > 0.7: ABSTAIN
         |                       κ < 0.3: Full correction range
         |                       0.3–0.7: Limited correction
    [4. Orthogonalize]       →  b_perp = b - proj(b onto t)
         |
    [5. Grid Search τ]       →  Find minimal correction passing 5 guardrails
         |
    ADJUST / ABSTAIN / REVIEW + Governance Packet

Step 1: Compute Task and Bias Directions

From the frozen model's penultimate layer representations on a calibration split:

  • Task direction (t): Average representation of positive-class samples minus average of negative-class samples. This is the direction the model uses to separate outcomes.
  • Bias direction (b): Average representation of group A minus group B (within positive-class samples). This is the direction along which protected groups differ.
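These two mean-difference estimators can be sketched as follows. The array names (`H` for representations, `y` for labels, `g` for the group indicator) are illustrative, not from the original:

```python
import numpy as np

def compute_directions(H, y, g):
    """Task and bias directions from penultimate-layer representations.

    H: (n, d) representations on the calibration split
    y: (n,) binary labels; g: (n,) binary protected-group indicator
    """
    # Task direction: mean positive-class representation minus mean negative
    t = H[y == 1].mean(axis=0) - H[y == 0].mean(axis=0)
    # Bias direction: group mean difference within the positive class
    pos, gp = H[y == 1], g[y == 1]
    b = pos[gp == 1].mean(axis=0) - pos[gp == 0].mean(axis=0)
    return t, b
```

Computing the bias direction within the positive class keeps label imbalance between groups from leaking into the bias estimate.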

Step 2: Geometric Coherence (κ)

κ = |⟨t, b⟩| / (||t|| × ||b||)

This is the absolute cosine similarity between the task and bias directions. It measures how much they point in the same direction:

  • κ ≈ 0 — Bias is perpendicular to task. Safe to remove.
  • κ ≈ 1 — Bias is aligned with task. Removing it destroys predictions.

Step 3: Decision Gate

Based on κ, the system gates the intervention:

| κ Range | Decision | Rationale |
|---|---|---|
| κ > 0.7 | ABSTAIN | Bias is task-aligned; correction would harm accuracy |
| 0.3 ≤ κ ≤ 0.7 | Limited correction (τ ≤ 0.3) | Partial entanglement; conservative intervention only |
| κ < 0.3 | Full correction range (τ ∈ [0, 1]) | Proxy bias; safe to remove aggressively |
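The gate itself is a simple threshold rule; a minimal sketch using the thresholds above:

```python
def decision_gate(kappa):
    """Map coherence κ to an intervention regime and a maximum τ."""
    if kappa > 0.7:
        return "ABSTAIN", 0.0   # task-aligned: no correction attempted
    if kappa >= 0.3:
        return "LIMITED", 0.3   # partial entanglement: conservative τ only
    return "FULL", 1.0          # proxy bias: full τ ∈ [0, 1] search range

print(decision_gate(0.08))  # ('FULL', 1.0), e.g. Bank Marketing
print(decision_gate(0.82))  # ('ABSTAIN', 0.0), e.g. Folktables Income
```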

Step 4: Orthogonalization

Even when bias is mostly perpendicular to the task, it may have a small parallel component. Orthogonalization removes this:

b_perp = b - (b · t / ||t||²) × t

After orthogonalization, the correction vector b_perp is guaranteed to be perpendicular to the task direction — removing it cannot affect the task signal.
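This is a single Gram-Schmidt step. A sketch (with an illustrative bias vector, not one from the paper) that also verifies the perpendicularity guarantee numerically:

```python
import numpy as np

def orthogonalize(b, t):
    """Remove the component of b parallel to t: b_perp = b - proj_t(b)."""
    return b - (np.dot(b, t) / np.dot(t, t)) * t

t = np.array([0.65, 0.55, 0.70, 0.05, 0.10])
b = np.array([0.30, 0.25, 0.35, 0.02, 0.60])   # illustrative bias direction
b_perp = orthogonalize(b, t)

print(np.dot(b_perp, t))  # ≈ 0: removing b_perp cannot touch the task signal
```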

Step 5: Five Guardrails

Before any correction ships, it must pass all five guardrails:

| # | Guardrail | What It Checks | Threshold |
|---|---|---|---|
| 1 | TOST AUROC | Discrimination ability preserved | Non-inferiority margin δ = 0.01 |
| 2 | ECE Cap | Calibration not degraded | ΔECE < 0.02 |
| 3 | FPR Drift | False positive rate stable per group | ΔFPR < 0.03 |
| 4 | Multi-Seed Stability | Result consistent across random seeds | σ(AUROC) < 0.005 |
| 5 | Fairness Improvement | Equalized odds gap actually reduced | ΔEO > 0 |

If any guardrail fails at every τ value tried, the system outputs ABSTAIN — even if coherence suggested correction was possible.
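The all-or-nothing logic can be sketched as a simple conjunction. This is a simplification: the real TOST guardrail is a two-one-sided statistical test rather than a point comparison, and the metric dictionary keys here are illustrative:

```python
def passes_all_guardrails(base, corr):
    """All five guardrails must pass; any single failure blocks the correction.

    base/corr: illustrative metric dicts for the baseline and corrected model.
    """
    return all([
        corr["auroc"] >= base["auroc"] - 0.01,   # 1. AUROC non-inferiority (δ = 0.01)
        corr["ece"] - base["ece"] < 0.02,        # 2. ECE cap (ΔECE < 0.02)
        corr["max_fpr_drift"] < 0.03,            # 3. per-group FPR drift (ΔFPR < 0.03)
        corr["auroc_std"] < 0.005,               # 4. multi-seed stability (σ < 0.005)
        corr["eo_gap"] < base["eo_gap"],         # 5. equalized-odds gap reduced (ΔEO > 0)
    ])
```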

The τ Grid Search

For each candidate severity level τ from 0 to max_τ (in steps of 0.02):

h_corrected = h - τ × (P_perp × h)

The system selects the smallest τ where all five guardrails pass. Minimal intervention principle: never remove more bias than necessary.
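The search loop, sketched with a hypothetical `passes_guardrails` callback that re-evaluates the five checks at each candidate τ:

```python
import numpy as np

def grid_search_tau(H, P_perp, passes_guardrails, max_tau=1.0, step=0.02):
    """Return the smallest τ at which every guardrail passes, else abstain."""
    for tau in np.arange(0.0, max_tau + 1e-9, step):
        H_corrected = H - tau * (H @ P_perp)   # apply the correction row-wise
        if passes_guardrails(H_corrected, tau):
            return "ADJUST", float(tau)        # minimal intervention principle
    return "ABSTAIN", None                     # no τ passed all five guardrails
```

Because the loop ascends from τ = 0, the first passing value is automatically the minimal one.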

04 The 3-4 Split: Results Across 7 Datasets

Overview

Across 7 real-world datasets, GBP-Audit correctly identified which datasets had removable (proxy) bias and which had entangled (task-aligned) bias. In every case, the system made the right call: correct when safe, abstain when not.

Datasets Corrected (3 of 7) — ADJUST

These datasets had low coherence (κ < 0.3), indicating proxy bias that was safe to remove:

| Dataset | Attribute | κ | τ | Disparity Before | Disparity After | AUROC Change |
|---|---|---|---|---|---|---|
| Adult Income | Sex | 0.12 | 0.42 | 5.2% | 1.8% | +0.001 |
| Bank Marketing | Age | 0.08 | 0.65 | 8.7% | 2.3% | −0.002 |
| German Credit | Sex | 0.19 | 0.38 | 4.1% | 1.5% | +0.000 |

Key observations:

  • Disparity reduction up to 73% (Bank Marketing: 8.7% → 2.3%)
  • Zero meaningful AUROC degradation — all changes within ±0.002
  • All five guardrails passed at the selected τ values
  • Corrections were conservative — τ values well below maximum

Datasets Abstained (4 of 7) — ABSTAIN

These datasets had high coherence (κ > 0.5), indicating task-aligned bias that was dangerous to remove. (Taiwan Credit, at κ = 0.65, fell within the limited-correction band, but no τ passed all five guardrails, so the guardrail rule forced abstention.)

| Dataset | Attribute | κ | Decision | Why |
|---|---|---|---|---|
| Employment | Sex | 0.78 | ABSTAIN | Occupational segregation entangles gender with job type |
| Taiwan Credit | Sex | 0.65 | ABSTAIN | Payment behavior correlates with credit risk through gendered economic patterns |
| Folktables Income | Race | 0.82 | ABSTAIN | Structural economic inequality means race correlates with income through legitimate predictors |
| Folktables Public Coverage | Race | 0.71 | ABSTAIN | Insurance coverage patterns reflect real disparities in access |

Key observations:

  • Coherence correctly identified entanglement in every case
  • Attempting correction on these datasets would have degraded AUROC by 2–8%
  • Abstention prevented harm — no silent accuracy degradation shipped

The Safety Record

| Metric | Value |
|---|---|
| Datasets evaluated | 7 |
| Correct ADJUST decisions | 3 |
| Correct ABSTAIN decisions | 4 |
| Harmful interventions shipped | 0 |
| False safety claims | 0 |
| Accuracy degradation on corrected datasets | < 0.002 AUROC |

The system's safety record is perfect: it never shipped a correction that harmed accuracy, and it never failed to flag a dataset where correction would have been dangerous.

05 Production Design: Attribute-Free, Sub-Millisecond, Auditable

Attribute-Free Inference

A critical design constraint: GBP-Audit does not require the protected attribute at inference time.

The calibration phase uses protected attributes (from a held-out calibration set) to compute the bias direction and optimal τ. But the resulting correction — the projection matrix P_perp and severity τ — operates purely on the model's internal representations.

# Calibration (offline, one-time)
bias_direction = compute_bias_direction(calibration_data, protected_attr)
b_perp = orthogonalize(bias_direction, task_direction)
P_perp = np.outer(b_perp, b_perp) / np.dot(b_perp, b_perp)   # projection onto b_perp
tau = grid_search(P_perp, calibration_data, guardrails)

# Inference (online, per-request)
h = model.penultimate_layer(input)          # no protected attribute needed
h_corrected = h - tau * (P_perp @ h)        # matrix-vector multiply + subtract
prediction = model.final_layer(h_corrected)

This means:

  • No demographic data collected at serving time — privacy-preserving by design
  • No disparate treatment risk — the correction doesn't branch on protected attributes
  • Regulatory compliance — satisfies requirements that prohibit using protected attributes in decisions

Sub-Millisecond Latency

The inference-time correction is a single matrix multiplication and vector subtraction:

| Operation | Complexity | Typical Latency |
|---|---|---|
| Penultimate extraction | Already computed | 0 ms additional |
| P_perp @ h | O(d²), d = embedding dim | < 0.1 ms |
| h − τ × result | O(d) | < 0.01 ms |
| Total overhead | | < 0.5 ms |

For typical embedding dimensions (64–512), the correction adds negligible latency to the inference path.
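The claim is easy to sanity-check on your own hardware. A rough NumPy micro-benchmark (timings vary by machine, so no expected output is given; the projection matrix here is a random stand-in):

```python
import time
import numpy as np

d = 256                                    # a typical embedding dimension
rng = np.random.default_rng(0)
P_perp = rng.standard_normal((d, d))       # stand-in for the real projection matrix
h = rng.standard_normal(d)
tau = 0.5

start = time.perf_counter()
for _ in range(10_000):
    h_corrected = h - tau * (P_perp @ h)   # the entire inference-time correction
avg_ms = (time.perf_counter() - start) / 10_000 * 1e3
print(f"avg correction overhead: {avg_ms:.4f} ms")
```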

Governance Packets

Every GBP-Audit decision produces a machine-readable governance packet — a complete audit trail that can be handed to regulators, compliance officers, or external auditors:

{
  "dataset": "bank_marketing",
  "protected_attribute": "age",
  "coherence_kappa": 0.08,
  "decision": "ADJUST",
  "tau": 0.65,
  "guardrails": {
    "tost_auroc": { "passed": true, "baseline": 0.912, "corrected": 0.910, "delta": 0.002 },
    "ece_cap": { "passed": true, "baseline": 0.034, "corrected": 0.032 },
    "fpr_drift": { "passed": true, "max_drift": 0.008 },
    "multi_seed": { "passed": true, "auroc_std": 0.002 },
    "fairness_improvement": { "passed": true, "eo_gap_before": 0.087, "eo_gap_after": 0.023 }
  },
  "hash": "sha256:9f3a..."
}
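One plausible way to produce the packet's reproducibility hash (the original's exact hashing scheme isn't specified; this sketch canonicalizes the JSON with sorted keys before hashing so the same packet always yields the same digest):

```python
import hashlib
import json

packet = {
    "dataset": "bank_marketing",
    "protected_attribute": "age",
    "coherence_kappa": 0.08,
    "decision": "ADJUST",
    "tau": 0.65,
}

# Canonical serialization: sorted keys, no whitespace, so hashing is deterministic
canonical = json.dumps(packet, sort_keys=True, separators=(",", ":"))
packet["hash"] = "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
print(packet["hash"])
```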

Regulatory Alignment

GBP-Audit's governance packets map directly to regulatory requirements:

| Regulation | Requirement | GBP-Audit Evidence |
|---|---|---|
| NYC LL 144 | Bias audit for automated employment decisions | Coherence analysis + guardrail results |
| EU AI Act | Risk assessment for high-risk AI systems | Full governance packet with reproducibility proof |
| SR 11-7 (OCC/Fed) | Model risk management for banking | AUROC non-inferiority, calibration cap, FPR drift bounds |
| CFPB Fair Lending | Adverse action documentation | Per-attribute coherence + intervention decision rationale |

06 Real-World Scenario: Banking Fair-Lending Review

The Situation

Amy is a Risk & Model Governance Manager at a mid-size bank. CFPB/OCC examiners have requested evidence that the bank's credit approval model is fair and that any post-hoc fixes don't secretly hurt model quality. She has 90 days.

The production model is frozen — retraining would take months and require cross-team approvals. Amy needs a solution that works post-hoc on the existing model.

Day 1: Discovery

The compliance team identifies a 12% age-based disparity in approval rates. Applicants over 55 are approved at significantly lower rates than younger applicants with similar credit profiles.

Traditional approach: retrain the model with age-blinded features. Timeline: 6+ months. Cost: significant engineering resources plus re-validation of the entire model.

Day 2: GBP-Audit Analysis

Amy runs GBP-Audit on a calibration split from the bank's validation data:

Step 1: Extract penultimate representations from frozen model
Step 2: Compute task direction (approved vs. rejected)
Step 3: Compute bias direction (over-55 vs. under-55)
Step 4: Measure coherence → κ = 0.11 (low — proxy bias)

Result: The age signal in the model's representations is largely orthogonal to the credit-risk signal. The model learned an age shortcut that doesn't contribute to accurate credit decisions. Safe to correct.

Day 3: Correction & Validation

GBP-Audit runs the full pipeline:

1. Orthogonalize bias direction against task direction → b_perp
2. Grid search τ from 0.0 to 1.0 in steps of 0.02
3. At τ = 0.58, all five guardrails pass:
   ✓ TOST AUROC: 0.891 → 0.889 (within δ = 0.01)
   ✓ ECE: 0.031 → 0.029 (improved)
   ✓ FPR drift: max 0.007 across age groups (< 0.03)
   ✓ Multi-seed: σ(AUROC) = 0.0018 (< 0.005)
   ✓ Fairness: EO gap 12.1% → 3.2% (improved)

Decision: ADJUST with τ = 0.58

Day 4: Regulatory Submission

Amy submits the governance packet to the CFPB/OCC examiners. It contains:

  • Baseline metrics (pre-correction)
  • Coherence analysis showing bias was proxy-driven
  • All five guardrail results with confidence intervals
  • Post-correction metrics showing 74% disparity reduction with negligible accuracy impact
  • Reproducibility hash for exact replication

Outcome

| Metric | Traditional Retraining | GBP-Audit |
|---|---|---|
| Time to resolution | 6+ months | 3 days |
| Model changes required | Full retrain | Post-hoc layer |
| Accuracy impact | Unknown until retrained | Verified < 0.002 AUROC |
| Audit trail | Manual documentation | Machine-readable governance packet |
| Reversibility | Irreversible | Fully reversible (remove projection) |
| Protected attribute at inference | May be needed | Not needed |

The correction is a lightweight post-hoc layer that can be toggled on or off without touching the production model. If new data suggests the correction is no longer appropriate, it can be removed instantly.

07 Conclusion

The standard approach to AI fairness assumes that all bias should be removed. GBP-Audit challenges this assumption with a simple geometric insight: bias that aligns with the task direction is not a bug — it's a reflection of real-world structure. Removing it doesn't create fairness; it creates inaccuracy.

The audit-then-act framework changes the question from "how do we remove this bias?" to "should we remove this bias?" Geometric coherence (κ) provides a principled answer, and five guardrails ensure that corrections that do ship are safe.

Across 7 datasets, GBP-Audit achieved a perfect safety record:

  • 3 datasets corrected with up to 73% disparity reduction and zero meaningful accuracy loss
  • 4 datasets correctly abstained — preventing interventions that would have degraded AUROC by 2–8%
  • Zero harmful interventions shipped

The system's most important output is not the correction — it's the abstention. When GBP-Audit says "do not correct this bias," it is protecting both the model's predictive quality and the populations it serves. An inaccurate model helps no one.

This is the safety-first paradigm: the default is to do nothing. Corrections must earn their deployment by passing every guardrail. Abstention is not failure — it is the system working as designed.


Know when to fix bias — and when not to

GBP-Audit uses geometric coherence to distinguish proxy bias (safe to remove) from task-aligned bias (dangerous to remove), then applies corrections only when five strict guardrails pass. The result: ADJUST, ABSTAIN, or REVIEW — never a blind fix. Validated on 7 datasets across lending, employment, and public policy.
