01 Introduction
Most AI fairness tools treat bias correction as a simple optimization problem: detect disparity, remove it, ship the fix. That framing is dangerous.
Not all bias is the same. Some bias is proxy-driven — the model learned a shortcut through a protected attribute (like encoding gender in a hidden dimension) that doesn't contribute to the actual prediction task. Removing this bias is safe and improves fairness without harming accuracy.
But other bias is task-aligned — the disparity exists because the protected attribute genuinely correlates with the prediction target through legitimate features. In employment data, for example, occupational segregation means that gender correlates with job type, which correlates with income. Removing this signal destroys the model's ability to make accurate predictions.
Blind debiasing cannot tell the difference. It removes both kinds indiscriminately, and when it removes task-aligned bias, it degrades accuracy — sometimes catastrophically. The model becomes both less fair (through degraded predictions for everyone) and less useful.
GBP-Audit solves this. It audits before it acts, using geometric coherence to classify bias as proxy or task-aligned, then applies corrections only when five strict guardrails confirm safety. The system outputs one of three decisions: ADJUST (safe to correct), ABSTAIN (correction would cause harm), or REVIEW (edge case requiring human judgment).
Developed in collaboration with Dr. Danda B. Rawat at the Howard University Data Science & Cybersecurity Center. Published as an invited talk at the 7th IEEE International Conference on Cognitive Machine Intelligence (CogMI 2025). Validated on 7 real-world datasets across lending, employment, and public policy.
02 Why Blind Debiasing Fails
Two Kinds of Bias
In a model's internal representation space, bias manifests as a direction — a vector along which protected groups differ. The critical question is: does this bias direction align with the task direction?
Proxy Bias (Safe to Remove)
The model learned a shortcut through the protected attribute that doesn't contribute to the task. In representation space:
Task direction: [0.65, 0.55, 0.70, 0.05, 0.10] → uses dims 0, 1, 2
Bias direction: [0.02, 0.02, 0.02, 0.02, 0.75] → lives in dim 4
↑ Different dimensions — safe to remove
The bias and task signals live in different dimensions. Removing bias doesn't touch the task signal. Like removing graffiti from a blank page — the essay on the other page is untouched.
Task-Aligned Bias (Dangerous to Remove)
The disparity exists because the protected attribute legitimately correlates with the target through real-world structure. In representation space:
Task direction: [0.65, 0.55, 0.70, 0.05, 0.10] → uses dims 0, 1, 2
Bias direction: [0.60, 0.52, 0.68, 0.03, 0.15] → ALSO in dims 0, 1, 2
↑ Same dimensions — dangerous to remove
The bias and task signals are entangled in the same dimensions. Removing bias also removes task signal. Like erasing graffiti written over your essay — you destroy both.
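To make the contrast concrete, here is a small NumPy sketch that computes the absolute cosine similarity between the task direction and each of the two example bias vectors above (the helper name `coherence` is illustrative):

```python
import numpy as np

def coherence(t, b):
    """Absolute cosine similarity between a task and a bias direction."""
    t, b = np.asarray(t, dtype=float), np.asarray(b, dtype=float)
    return abs(t @ b) / (np.linalg.norm(t) * np.linalg.norm(b))

t = np.array([0.65, 0.55, 0.70, 0.05, 0.10])          # task: dims 0, 1, 2
b_proxy = np.array([0.02, 0.02, 0.02, 0.02, 0.75])    # proxy bias: dim 4
b_aligned = np.array([0.60, 0.52, 0.68, 0.03, 0.15])  # task-aligned bias

print(coherence(t, b_proxy))    # low: nearly perpendicular, safe to remove
print(coherence(t, b_aligned))  # near 1: entangled, dangerous to remove
```

The proxy vector scores well under 0.3 while the task-aligned vector scores close to 1.0, which is exactly the separation the audit relies on.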
Real-World Consequences
When blind debiasing hits task-aligned bias:
- Employment data: Removing gender signal also removes occupation signal (because occupational segregation entangles them). AUROC drops, calibration degrades, the model becomes worse for everyone.
- Credit data: Removing age signal also removes financial maturity signal. Default predictions become unreliable.
- Healthcare data: Removing race signal also removes disease prevalence signal. Clinical risk scores become inaccurate.
The standard fairness toolkit has no mechanism to detect this before it happens. GBP-Audit does.
03 Geometric Coherence & the Audit-Then-Act Pipeline
GBP-Audit introduces a single diagnostic quantity — geometric coherence (κ) — that determines whether bias correction is safe before any intervention is applied.
Calibration Data + Frozen Model
|
[1. Compute Directions] → Task direction (t) and bias direction (b)
|
[2. Measure Coherence] → κ = |cos(t, b)| — are they aligned?
|
[3. Decision Gate] → κ > 0.7: ABSTAIN
| κ < 0.3: Full correction range
| 0.3–0.7: Limited correction
|
[4. Orthogonalize] → b_perp = b - proj(b onto t)
|
[5. Grid Search τ] → Find minimal correction passing 5 guardrails
|
ADJUST / ABSTAIN / REVIEW + Governance Packet
•Step 1: Compute Task and Bias Directions
From the frozen model's penultimate layer representations on a calibration split:
- Task direction (t): Average representation of positive-class samples minus average of negative-class samples. This is the direction the model uses to separate outcomes.
- Bias direction (b): Average representation of group A minus group B (within positive-class samples). This is the direction along which protected groups differ.
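Both directions are just differences of group means over the penultimate-layer representations. A minimal sketch, where `H`, `y`, and `g` are illustrative names for the representation matrix, binary outcome labels, and binary group ids:

```python
import numpy as np

def compute_directions(H, y, g):
    """Task and bias directions from penultimate-layer representations.

    H: (n, d) representations, y: binary outcome labels, g: binary group ids.
    """
    # Task direction: mean positive-class representation minus mean negative-class.
    t = H[y == 1].mean(axis=0) - H[y == 0].mean(axis=0)
    # Bias direction: group mean difference, measured within positive-class samples.
    H_pos, g_pos = H[y == 1], g[y == 1]
    b = H_pos[g_pos == 0].mean(axis=0) - H_pos[g_pos == 1].mean(axis=0)
    return t, b
```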
•Step 2: Geometric Coherence (κ)
κ = |⟨t, b⟩| / (||t|| × ||b||)
This is the absolute cosine similarity between the task and bias directions. It measures how much they point in the same direction:
- κ ≈ 0 — Bias is perpendicular to task. Safe to remove.
- κ ≈ 1 — Bias is aligned with task. Removing it destroys predictions.
•Step 3: Decision Gate
Based on κ, the system gates the intervention:
| κ Range | Decision | Rationale |
|---|---|---|
| κ > 0.7 | ABSTAIN | Bias is task-aligned; correction would harm accuracy |
| 0.3 ≤ κ ≤ 0.7 | Limited correction (τ ≤ 0.3) | Partial entanglement; conservative intervention only |
| κ < 0.3 | Full correction range (τ ∈ [0, 1]) | Proxy bias; safe to remove aggressively |
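The gate itself is a simple threshold lookup. A sketch using the thresholds from the table above; the `(label, tau_max)` return format is an illustrative choice, not the system's actual API:

```python
def decision_gate(kappa):
    """Map geometric coherence to an intervention regime."""
    if kappa > 0.7:
        return "ABSTAIN", 0.0    # task-aligned: no correction attempted
    if kappa >= 0.3:
        return "LIMITED", 0.3    # partial entanglement: tau capped at 0.3
    return "FULL", 1.0           # proxy bias: full tau range available
```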
•Step 4: Orthogonalization
Even when bias is mostly perpendicular to the task, it may have a small parallel component. Orthogonalization removes this:
b_perp = b - (b · t / ||t||²) × t
After orthogonalization, the correction vector b_perp is guaranteed to be perpendicular to the task direction — removing it cannot affect the task signal.
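The formula is a single Gram–Schmidt step; a quick NumPy check confirms the result really is perpendicular to the task direction (example vectors reused from section 02):

```python
import numpy as np

def orthogonalize(b, t):
    """Remove the component of b parallel to t: b - (b.t / ||t||^2) * t."""
    return b - (b @ t) / (t @ t) * t

t = np.array([0.65, 0.55, 0.70, 0.05, 0.10])
b = np.array([0.60, 0.52, 0.68, 0.03, 0.15])
b_perp = orthogonalize(b, t)
print(b_perp @ t)  # numerically ~0: removing b_perp cannot touch the task signal
```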
•Step 5: Five Guardrails
Before any correction ships, it must pass all five guardrails:
| # | Guardrail | What It Checks | Threshold |
|---|---|---|---|
| 1 | TOST AUROC | Discrimination ability preserved | Non-inferiority margin δ = 0.01 |
| 2 | ECE Cap | Calibration not degraded | ΔECE < 0.02 |
| 3 | FPR Drift | False positive rate stable per group | ΔFPR < 0.03 |
| 4 | Multi-Seed Stability | Result consistent across random seeds | σ(AUROC) < 0.005 |
| 5 | Fairness Improvement | Equalized odds gap actually reduced | ΔEO > 0 |
If any guardrail fails at every τ value tried, the system outputs ABSTAIN — even if coherence suggested correction was possible.
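The five checks reduce to predicates over before/after metrics. A sketch with the thresholds from the table above; the metric key names are invented for illustration, and the TOST procedure is simplified to a plain non-inferiority margin comparison:

```python
def guardrails_pass(m, delta=0.01):
    """True iff all five guardrails pass on a dict of evaluation metrics."""
    return all([
        m["auroc_corrected"] >= m["auroc_baseline"] - delta,  # 1. AUROC non-inferiority
        m["ece_corrected"] - m["ece_baseline"] < 0.02,        # 2. calibration cap
        m["max_fpr_drift"] < 0.03,                            # 3. per-group FPR stability
        m["auroc_std_across_seeds"] < 0.005,                  # 4. multi-seed stability
        m["eo_gap_before"] - m["eo_gap_after"] > 0,           # 5. fairness improved
    ])
```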
•The τ Grid Search
For each candidate severity level τ from 0 to max_τ (in steps of 0.02):
h_corrected = h - τ × (P_perp × h)
The system selects the smallest τ where all five guardrails pass. Minimal intervention principle: never remove more bias than necessary.
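Putting the steps together, the search can be sketched as below; `passes` stands in for the five-guardrail evaluation and is supplied by the caller:

```python
import numpy as np

def grid_search_tau(H, b_perp, passes, tau_max=1.0, step=0.02):
    """Smallest tau whose corrected representations pass all guardrails.

    H: (n, d) representations; passes(H_c) bundles the five guardrail
    checks. Returns (None, H) when no tau works, i.e. ABSTAIN.
    """
    b_hat = b_perp / np.linalg.norm(b_perp)
    P_perp = np.outer(b_hat, b_hat)          # projector onto the bias direction
    for tau in np.arange(0.0, tau_max + 1e-9, step):
        H_c = H - tau * (H @ P_perp)         # h - tau * (P_perp @ h), batched
        if passes(H_c):
            return tau, H_c                  # minimal intervention wins
    return None, H                           # ABSTAIN
```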
04 The 3-4 Split: Results Across 7 Datasets
Overview
Across 7 real-world datasets, GBP-Audit correctly identified which datasets had removable (proxy) bias and which had entangled (task-aligned) bias. In every case, the system made the right call: correct when safe, abstain when not.
Datasets Corrected (3 of 7) — ADJUST
These datasets had low coherence (κ < 0.3), indicating proxy bias that was safe to remove:
| Dataset | Attribute | κ | τ | Disparity Before | Disparity After | AUROC Change |
|---|---|---|---|---|---|---|
| Adult Income | Sex | 0.12 | 0.42 | 5.2% | 1.8% | +0.001 |
| Bank Marketing | Age | 0.08 | 0.65 | 8.7% | 2.3% | −0.002 |
| German Credit | Sex | 0.19 | 0.38 | 4.1% | 1.5% | +0.000 |
Key observations:
- Disparity reduction up to 73% (Bank Marketing: 8.7% → 2.3%)
- Zero meaningful AUROC degradation — all changes within ±0.002
- All five guardrails passed at the selected τ values
- Corrections were conservative — τ values well below maximum
Datasets Abstained (4 of 7) — ABSTAIN
These datasets had high coherence (κ ≥ 0.65), indicating task-aligned bias that was dangerous to remove. Employment and the two Folktables tasks exceeded the κ > 0.7 abstention gate outright; Taiwan Credit (κ = 0.65) fell in the limited-correction band, where no τ passed all five guardrails, so it too ended in ABSTAIN:
| Dataset | Attribute | κ | Decision | Why |
|---|---|---|---|---|
| Employment | Sex | 0.78 | ABSTAIN | Occupational segregation entangles gender with job type |
| Taiwan Credit | Sex | 0.65 | ABSTAIN | Payment behavior correlates with credit risk through gendered economic patterns |
| Folktables Income | Race | 0.82 | ABSTAIN | Structural economic inequality means race correlates with income through legitimate predictors |
| Folktables Public Coverage | Race | 0.71 | ABSTAIN | Insurance coverage patterns reflect real disparities in access |
Key observations:
- Coherence correctly identified entanglement in every case
- Attempting correction on these datasets would have degraded AUROC by 2–8%
- Abstention prevented harm — no silent accuracy degradation shipped
The Safety Record
| Metric | Value |
|---|---|
| Datasets evaluated | 7 |
| Correct ADJUST decisions | 3 |
| Correct ABSTAIN decisions | 4 |
| Harmful interventions shipped | 0 |
| False safety claims | 0 |
| Accuracy degradation on corrected datasets | < 0.002 AUROC |
The system's safety record is perfect: it never shipped a correction that harmed accuracy, and it never failed to flag a dataset where correction would have been dangerous.
05 Production Design: Attribute-Free, Sub-Millisecond, Auditable
Attribute-Free Inference
A critical design constraint: GBP-Audit does not require the protected attribute at inference time.
The calibration phase uses protected attributes (from a held-out calibration set) to compute the bias direction and optimal τ. But the resulting correction — the projection matrix P_perp and severity τ — operates purely on the model's internal representations.
# Calibration (offline, one-time)
bias_direction = compute_bias_direction(calibration_data, protected_attr)
b_perp = orthogonalize(bias_direction, task_direction)
P_perp = outer(b_perp, b_perp) / norm(b_perp) ** 2
tau = grid_search(P_perp, calibration_data, guardrails)

# Inference (online, per-request)
h = model.penultimate_layer(input)    # No protected attribute needed
h_corrected = h - tau * (P_perp @ h)  # Matrix multiply + subtract
prediction = model.final_layer(h_corrected)

This means:
- No demographic data collected at serving time — privacy-preserving by design
- No disparate treatment risk — the correction doesn't branch on protected attributes
- Regulatory compliance — satisfies requirements that prohibit using protected attributes in decisions
Sub-Millisecond Latency
The inference-time correction is a single matrix multiplication and vector subtraction:
| Operation | Complexity | Typical Latency |
|---|---|---|
| Penultimate extraction | Already computed | 0 ms additional |
| P_perp @ h | O(d²) where d = embedding dim | < 0.1 ms |
| h - τ × result | O(d) | < 0.01 ms |
| Total overhead | O(d²) overall | < 0.5 ms |
For typical embedding dimensions (64–512), the correction adds negligible latency to the inference path.
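A rough microbenchmark of the correction step; this is a sketch with assumed dimensions, and absolute numbers depend entirely on hardware, not a reproduction of the measurements above:

```python
import time
import numpy as np

d = 256                                   # a typical embedding dimension
rng = np.random.default_rng(0)
P_perp = rng.standard_normal((d, d))      # stand-in for the real projector
h = rng.standard_normal(d)
tau = 0.5

n_iters = 1000
start = time.perf_counter()
for _ in range(n_iters):
    h_corrected = h - tau * (P_perp @ h)  # the entire inference-time overhead
avg_ms = (time.perf_counter() - start) / n_iters * 1e3
print(f"avg correction overhead: {avg_ms:.4f} ms at d={d}")
```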
Governance Packets
Every GBP-Audit decision produces a machine-readable governance packet — a complete audit trail that can be handed to regulators, compliance officers, or external auditors:
{
"dataset": "bank_marketing",
"protected_attribute": "age",
"coherence_kappa": 0.08,
"decision": "ADJUST",
"tau": 0.65,
"guardrails": {
"tost_auroc": { "passed": true, "baseline": 0.912, "corrected": 0.910, "delta": 0.01 },
"ece_cap": { "passed": true, "baseline": 0.034, "corrected": 0.032 },
"fpr_drift": { "passed": true, "max_drift": 0.008 },
"multi_seed": { "passed": true, "auroc_std": 0.002 },
"fairness_improvement": { "passed": true, "eo_gap_before": 0.087, "eo_gap_after": 0.023 }
},
"hash": "sha256:9f3a..."
}
Regulatory Alignment
GBP-Audit's governance packets map directly to regulatory requirements:
| Regulation | Requirement | GBP-Audit Evidence |
|---|---|---|
| NYC LL 144 | Bias audit for automated employment decisions | Coherence analysis + guardrail results |
| EU AI Act | Risk assessment for high-risk AI systems | Full governance packet with reproducibility proof |
| SR 11-7 (OCC/Fed) | Model risk management for banking | AUROC non-inferiority, calibration cap, FPR drift bounds |
| CFPB Fair Lending | Adverse action documentation | Per-attribute coherence + intervention decision rationale |
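The source does not specify how the packet's `hash` field is derived; one plausible scheme, shown here purely as an assumption, is SHA-256 over a canonical JSON serialization (packet fields abbreviated):

```python
import hashlib
import json

packet = {
    "dataset": "bank_marketing",
    "protected_attribute": "age",
    "coherence_kappa": 0.08,
    "decision": "ADJUST",
    "tau": 0.65,
}

# Canonical form: sorted keys, fixed separators, UTF-8 bytes.
canonical = json.dumps(packet, sort_keys=True, separators=(",", ":")).encode("utf-8")
digest = hashlib.sha256(canonical).hexdigest()
print(f"sha256:{digest}")
```

Because the serialization is canonical, re-hashing the same packet always yields the same digest, which is what makes the field usable as a reproducibility check.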
06 Real-World Scenario: Banking Fair-Lending Review
The Situation
Amy is a Risk & Model Governance Manager at a mid-size bank. CFPB/OCC examiners have requested evidence that the bank's credit approval model is fair and that any post-hoc fixes don't secretly hurt model quality. She has 90 days.
The production model is frozen — retraining would take months and require cross-team approvals. Amy needs a solution that works post-hoc on the existing model.
Day 1: Discovery
The compliance team identifies a 12% age-based disparity in approval rates. Applicants over 55 are approved at significantly lower rates than younger applicants with similar credit profiles.
Traditional approach: retrain the model with age-blinded features. Timeline: 6+ months. Cost: significant engineering resources plus re-validation of the entire model.
Day 2: GBP-Audit Analysis
Amy runs GBP-Audit on a calibration split from the bank's validation data:
Step 1: Extract penultimate representations from frozen model
Step 2: Compute task direction (approved vs. rejected)
Step 3: Compute bias direction (over-55 vs. under-55)
Step 4: Measure coherence → κ = 0.11 (low — proxy bias)
Result: The age signal in the model's representations is largely orthogonal to the credit-risk signal. The model learned an age shortcut that doesn't contribute to accurate credit decisions. Safe to correct.
Day 3: Correction & Validation
GBP-Audit runs the full pipeline:
1. Orthogonalize bias direction against task direction → b_perp
2. Grid search τ from 0.0 to 1.0 in steps of 0.02
3. At τ = 0.58, all five guardrails pass:
✓ TOST AUROC: 0.891 → 0.889 (within δ = 0.01)
✓ ECE: 0.031 → 0.029 (improved)
✓ FPR drift: max 0.007 across age groups (< 0.03)
✓ Multi-seed: σ(AUROC) = 0.0018 (< 0.005)
✓ Fairness: EO gap 12.1% → 3.2% (improved)
Decision: ADJUST with τ = 0.58
Day 4: Regulatory Submission
Amy submits the governance packet to the CFPB/OCC examiners. It contains:
- Baseline metrics (pre-correction)
- Coherence analysis showing bias was proxy-driven
- All five guardrail results with confidence intervals
- Post-correction metrics showing 74% disparity reduction with negligible accuracy impact
- Reproducibility hash for exact replication
Outcome
| Metric | Traditional Retraining | GBP-Audit |
|---|---|---|
| Time to resolution | 6+ months | 3 days |
| Model changes required | Full retrain | Post-hoc layer |
| Accuracy impact | Unknown until retrained | Verified < 0.002 AUROC |
| Audit trail | Manual documentation | Machine-readable governance packet |
| Reversibility | Irreversible | Fully reversible (remove projection) |
| Protected attribute at inference | May be needed | Not needed |
The correction is a lightweight post-hoc layer that can be toggled on or off without touching the production model. If new data suggests the correction is no longer appropriate, it can be removed instantly.
07 Conclusion
The standard approach to AI fairness assumes that all bias should be removed. GBP-Audit challenges this assumption with a simple geometric insight: bias that aligns with the task direction is not a bug — it's a reflection of real-world structure. Removing it doesn't create fairness; it creates inaccuracy.
The audit-then-act framework changes the question from "how do we remove this bias?" to "should we remove this bias?" Geometric coherence (κ) provides a principled answer, and five guardrails ensure that corrections that do ship are safe.
Across 7 datasets, GBP-Audit achieved a perfect safety record:
- 3 datasets corrected with up to 73% disparity reduction and zero meaningful accuracy loss
- 4 datasets correctly abstained — preventing interventions that would have degraded AUROC by 2–8%
- Zero harmful interventions shipped
The system's most important output is not the correction — it's the abstention. When GBP-Audit says "do not correct this bias," it is protecting both the model's predictive quality and the populations it serves. An inaccurate model helps no one.
This is the safety-first paradigm: the default is to do nothing. Corrections must earn their deployment by passing every guardrail. Abstention is not failure — it is the system working as designed.
Developed in collaboration with Dr. Danda B. Rawat at the Howard University Data Science & Cybersecurity Center.
Know when to fix bias — and when not to
GBP-Audit uses geometric coherence to distinguish proxy bias (safe to remove) from task-aligned bias (dangerous to remove), then applies corrections only when five strict guardrails pass. The result: ADJUST, ABSTAIN, or REVIEW — never a blind fix. Validated on 7 datasets across lending, employment, and public policy.