Published · IEEE CogMI 2025 · 5 Guardrails · 7 Datasets · Zero Harm

GBP-Audit: Safety-First Bias Correction for AI Models

Audit-then-act framework that determines when bias is safe to remove — preventing harmful corrections through geometric coherence and five guardrails.

  • 7 datasets evaluated: real-world datasets across lending, employment, and public policy
  • 3 of 7 safe corrections: datasets where bias was safely removable
  • Up to 73% disparity reduction: Bank Marketing age-based disparity reduced from 8.7% to 2.3%
  • 0 harmful interventions: zero accuracy degradation on corrected datasets

01 Introduction

Most AI fairness tools treat bias correction as a simple optimization problem: detect disparity, remove it, ship the fix. This assumption is dangerous.

Not all bias is the same. Some bias is proxy-driven — the model learned a shortcut through a protected attribute (like encoding gender in a hidden dimension) that doesn't contribute to the actual prediction task. Removing this bias is safe and improves fairness without harming accuracy.

But other bias is task-aligned — the disparity exists because the protected attribute genuinely correlates with the prediction target through legitimate features. In employment data, for example, occupational segregation means that gender correlates with job type, which correlates with income. Removing this signal destroys the model's ability to make accurate predictions.

Blind debiasing cannot tell the difference. It removes both kinds indiscriminately, and when it removes task-aligned bias, it degrades accuracy — sometimes catastrophically. The model becomes both less fair (through degraded predictions for everyone) and less useful.

GBP-Audit solves this. It audits before it acts, using geometric coherence to classify bias as proxy or task-aligned, then applies corrections only when five strict guardrails confirm safety. The system outputs one of three decisions: ADJUST (safe to correct), ABSTAIN (correction would cause harm), or REVIEW (edge case requiring human judgment).

Developed in collaboration with Dr. Danda B. Rawat at the Howard University Data Science & Cybersecurity Center. Published as an invited talk at the 7th IEEE International Conference on Cognitive Machine Intelligence (CogMI 2025). Validated on 7 real-world datasets across lending, employment, and public policy.

02 Why Blind Debiasing Fails

Two Kinds of Bias

In a model's internal representation space, bias manifests as a direction — a vector along which protected groups differ. The critical question is: does this bias direction align with the task direction?

Proxy Bias (Safe to Remove)

The model learned a shortcut through the protected attribute that doesn't contribute to the task. In representation space:

Task direction:  [0.65, 0.55, 0.70, 0.05, 0.10]  → uses dims 0, 1, 2
Bias direction:  [0.02, 0.02, 0.02, 0.02, 0.75]  → lives in dim 4
                  ↑ Different dimensions — safe to remove

The bias and task signals live in different dimensions, so removing the bias doesn't touch the task signal. Like erasing graffiti from the margin of a page: the essay itself is untouched.

Task-Aligned Bias (Dangerous to Remove)

The disparity exists because the protected attribute legitimately correlates with the target through real-world structure. In representation space:

Task direction:  [0.65, 0.55, 0.70, 0.05, 0.10]  → uses dims 0, 1, 2
Bias direction:  [0.60, 0.52, 0.68, 0.03, 0.15]  → ALSO in dims 0, 1, 2
                  ↑ Same dimensions — dangerous to remove

The bias and task signals are entangled in the same dimensions. Removing bias also removes task signal. Like erasing graffiti written over your essay — you destroy both.
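The difference is easy to check numerically. A minimal sketch (using NumPy) that computes κ for the two example direction pairs above:

```python
import numpy as np

def coherence(t, b):
    """Geometric coherence: absolute cosine similarity of task and bias directions."""
    return abs(np.dot(t, b)) / (np.linalg.norm(t) * np.linalg.norm(b))

t = np.array([0.65, 0.55, 0.70, 0.05, 0.10])          # task direction
b_proxy = np.array([0.02, 0.02, 0.02, 0.02, 0.75])    # proxy bias (lives in dim 4)
b_aligned = np.array([0.60, 0.52, 0.68, 0.03, 0.15])  # task-aligned bias (dims 0-2)

print(coherence(t, b_proxy))    # low (≈ 0.14): safe to remove
print(coherence(t, b_aligned))  # high (≈ 1.0): dangerous to remove
```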

Real-World Consequences

When blind debiasing hits task-aligned bias:

  • Employment data: Removing gender signal also removes occupation signal (because occupational segregation entangles them). AUROC drops, calibration degrades, the model becomes worse for everyone.
  • Credit data: Removing age signal also removes financial maturity signal. Default predictions become unreliable.
  • Healthcare data: Removing race signal also removes disease prevalence signal. Clinical risk scores become inaccurate.

The standard fairness toolkit has no mechanism to detect this before it happens. GBP-Audit does.

03 Geometric Coherence & the Audit-Then-Act Pipeline

GBP-Audit introduces a single diagnostic quantity — geometric coherence (κ) — that determines whether bias correction is safe before any intervention is applied.

Calibration Data + Frozen Model
         |
    [1. Compute Directions]  →  Task direction (t) and bias direction (b)
         |
    [2. Measure Coherence]   →  κ = |cos(t, b)| — are they aligned?
         |
    [3. Decision Gate]       →  κ > 0.7: ABSTAIN
         |                       κ < 0.3: Full correction range
         |                       0.3–0.7: Limited correction
    [4. Orthogonalize]       →  b_perp = b - proj(b onto t)
         |
    [5. Grid Search τ]       →  Find minimal correction passing 5 guardrails
         |
    ADJUST / ABSTAIN / REVIEW + Governance Packet

Step 1: Compute Task and Bias Directions

From the frozen model's penultimate layer representations on a calibration split:

  • Task direction (t): Average representation of positive-class samples minus average of negative-class samples. This is the direction the model uses to separate outcomes.
  • Bias direction (b): Average representation of group A minus group B (within positive-class samples). This is the direction along which protected groups differ.
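These two mean-difference estimators can be sketched as follows. The array names (`H` for representations, `y` for labels, `g` for the group indicator) are illustrative, not from the original:

```python
import numpy as np

def compute_directions(H, y, g):
    """Task and bias directions from penultimate-layer representations.

    H: (n, d) representations on the calibration split
    y: (n,) binary labels; g: (n,) binary protected-group indicator
    """
    # Task direction: mean positive-class representation minus mean negative
    t = H[y == 1].mean(axis=0) - H[y == 0].mean(axis=0)
    # Bias direction: group mean difference within the positive class
    pos, gp = H[y == 1], g[y == 1]
    b = pos[gp == 1].mean(axis=0) - pos[gp == 0].mean(axis=0)
    return t, b
```

Computing the bias direction within the positive class keeps label imbalance between groups from leaking into the bias estimate.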

Step 2: Geometric Coherence (κ)

κ = |⟨t, b⟩| / (||t|| × ||b||)

This is the absolute cosine similarity between the task and bias directions. It measures how much they point in the same direction:

  • κ ≈ 0 — Bias is perpendicular to task. Safe to remove.
  • κ ≈ 1 — Bias is aligned with task. Removing it destroys predictions.

Step 3: Decision Gate

Based on κ, the system gates the intervention:

| κ Range | Decision | Rationale |
|---|---|---|
| κ > 0.7 | ABSTAIN | Bias is task-aligned; correction would harm accuracy |
| 0.3 ≤ κ ≤ 0.7 | Limited correction (τ ≤ 0.3) | Partial entanglement; conservative intervention only |
| κ < 0.3 | Full correction range (τ ∈ [0, 1]) | Proxy bias; safe to remove aggressively |
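The gate itself is a simple threshold rule; a minimal sketch using the thresholds above:

```python
def decision_gate(kappa):
    """Map coherence κ to an intervention regime and a maximum τ."""
    if kappa > 0.7:
        return "ABSTAIN", 0.0   # task-aligned: no correction attempted
    if kappa >= 0.3:
        return "LIMITED", 0.3   # partial entanglement: conservative τ only
    return "FULL", 1.0          # proxy bias: full τ ∈ [0, 1] search range

print(decision_gate(0.08))  # ('FULL', 1.0), e.g. Bank Marketing
print(decision_gate(0.82))  # ('ABSTAIN', 0.0), e.g. Folktables Income
```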

Step 4: Orthogonalization

Even when bias is mostly perpendicular to the task, it may have a small parallel component. Orthogonalization removes this:

b_perp = b - (b · t / ||t||²) × t

After orthogonalization, the correction vector b_perp is guaranteed to be perpendicular to the task direction — removing it cannot affect the task signal.
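This is a single Gram-Schmidt step. A sketch (with an illustrative bias vector, not one from the paper) that also verifies the perpendicularity guarantee numerically:

```python
import numpy as np

def orthogonalize(b, t):
    """Remove the component of b parallel to t: b_perp = b - proj_t(b)."""
    return b - (np.dot(b, t) / np.dot(t, t)) * t

t = np.array([0.65, 0.55, 0.70, 0.05, 0.10])
b = np.array([0.30, 0.25, 0.35, 0.02, 0.60])   # illustrative bias direction
b_perp = orthogonalize(b, t)

print(np.dot(b_perp, t))  # ≈ 0: removing b_perp cannot touch the task signal
```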

Step 5: Five Guardrails

Before any correction ships, it must pass all five guardrails:

| # | Guardrail | What It Checks | Threshold |
|---|---|---|---|
| 1 | TOST AUROC | Discrimination ability preserved | Non-inferiority margin δ = 0.01 |
| 2 | ECE Cap | Calibration not degraded | ΔECE < 0.02 |
| 3 | FPR Drift | False positive rate stable per group | ΔFPR < 0.03 |
| 4 | Multi-Seed Stability | Result consistent across random seeds | σ(AUROC) < 0.005 |
| 5 | Fairness Improvement | Equalized odds gap actually reduced | ΔEO > 0 |

If any guardrail fails at every τ value tried, the system outputs ABSTAIN — even if coherence suggested correction was possible.
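The all-or-nothing logic can be sketched as a simple conjunction. This is a simplification: the real TOST guardrail is a two-one-sided statistical test rather than a point comparison, and the metric dictionary keys here are illustrative:

```python
def passes_all_guardrails(base, corr):
    """All five guardrails must pass; any single failure blocks the correction.

    base/corr: illustrative metric dicts for the baseline and corrected model.
    """
    return all([
        corr["auroc"] >= base["auroc"] - 0.01,   # 1. AUROC non-inferiority (δ = 0.01)
        corr["ece"] - base["ece"] < 0.02,        # 2. ECE cap (ΔECE < 0.02)
        corr["max_fpr_drift"] < 0.03,            # 3. per-group FPR drift (ΔFPR < 0.03)
        corr["auroc_std"] < 0.005,               # 4. multi-seed stability (σ < 0.005)
        corr["eo_gap"] < base["eo_gap"],         # 5. equalized-odds gap reduced (ΔEO > 0)
    ])
```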

The τ Grid Search

For each candidate severity level τ from 0 to max_τ (in steps of 0.02):

h_corrected = h - τ × (P_perp × h)

The system selects the smallest τ where all five guardrails pass. Minimal intervention principle: never remove more bias than necessary.
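The search loop, sketched with a hypothetical `passes_guardrails` callback that re-evaluates the five checks at each candidate τ:

```python
import numpy as np

def grid_search_tau(H, P_perp, passes_guardrails, max_tau=1.0, step=0.02):
    """Return the smallest τ at which every guardrail passes, else abstain."""
    for tau in np.arange(0.0, max_tau + 1e-9, step):
        H_corrected = H - tau * (H @ P_perp)   # apply the correction row-wise
        if passes_guardrails(H_corrected, tau):
            return "ADJUST", float(tau)        # minimal intervention principle
    return "ABSTAIN", None                     # no τ passed all five guardrails
```

Because the loop ascends from τ = 0, the first passing value is automatically the minimal one.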

04 The 3-4 Split: Results Across 7 Datasets

Overview

Across 7 real-world datasets, GBP-Audit correctly identified which datasets had removable (proxy) bias and which had entangled (task-aligned) bias. In every case, the system made the right call: correct when safe, abstain when not.

Datasets Corrected (3 of 7) — ADJUST

These datasets had low coherence (κ < 0.3), indicating proxy bias that was safe to remove:

| Dataset | Attribute | κ | τ | Disparity Before | Disparity After | AUROC Change |
|---|---|---|---|---|---|---|
| Adult Income | Sex | 0.12 | 0.42 | 5.2% | 1.8% | +0.001 |
| Bank Marketing | Age | 0.08 | 0.65 | 8.7% | 2.3% | −0.002 |
| German Credit | Sex | 0.19 | 0.38 | 4.1% | 1.5% | +0.000 |

Key observations:

  • Disparity reduction up to 73% (Bank Marketing: 8.7% → 2.3%)
  • Zero meaningful AUROC degradation — all changes within ±0.002
  • All five guardrails passed at the selected τ values
  • Corrections were conservative — τ values well below maximum

Datasets Abstained (4 of 7) — ABSTAIN

These datasets had high coherence (κ > 0.5), indicating task-aligned bias that was dangerous to remove. (Taiwan Credit, at κ = 0.65, fell within the limited-correction band, but no τ passed all five guardrails, so the guardrail rule forced abstention.)

| Dataset | Attribute | κ | Decision | Why |
|---|---|---|---|---|
| Employment | Sex | 0.78 | ABSTAIN | Occupational segregation entangles gender with job type |
| Taiwan Credit | Sex | 0.65 | ABSTAIN | Payment behavior correlates with credit risk through gendered economic patterns |
| Folktables Income | Race | 0.82 | ABSTAIN | Structural economic inequality means race correlates with income through legitimate predictors |
| Folktables Public Coverage | Race | 0.71 | ABSTAIN | Insurance coverage patterns reflect real disparities in access |

Key observations:

  • Coherence correctly identified entanglement in every case
  • Attempting correction on these datasets would have degraded AUROC by 2–8%
  • Abstention prevented harm — no silent accuracy degradation shipped

The Safety Record

| Metric | Value |
|---|---|
| Datasets evaluated | 7 |
| Correct ADJUST decisions | 3 |
| Correct ABSTAIN decisions | 4 |
| Harmful interventions shipped | 0 |
| False safety claims | 0 |
| Accuracy degradation on corrected datasets | < 0.002 AUROC |

The system's safety record is perfect: it never shipped a correction that harmed accuracy, and it never failed to flag a dataset where correction would have been dangerous.

05 Production Design: Attribute-Free, Sub-Millisecond, Auditable

Attribute-Free Inference

A critical design constraint: GBP-Audit does not require the protected attribute at inference time.

The calibration phase uses protected attributes (from a held-out calibration set) to compute the bias direction and optimal τ. But the resulting correction — the projection matrix P_perp and severity τ — operates purely on the model's internal representations.

# Calibration (offline, one-time)
bias_direction = compute_bias_direction(calibration_data, protected_attr)
b_perp = orthogonalize(bias_direction, task_direction)
P_perp = np.outer(b_perp, b_perp) / np.dot(b_perp, b_perp)   # projection onto b_perp
tau = grid_search(P_perp, calibration_data, guardrails)

# Inference (online, per-request)
h = model.penultimate_layer(input)          # no protected attribute needed
h_corrected = h - tau * (P_perp @ h)        # matrix-vector multiply + subtract
prediction = model.final_layer(h_corrected)

This means:

  • No demographic data collected at serving time — privacy-preserving by design
  • No disparate treatment risk — the correction doesn't branch on protected attributes
  • Regulatory compliance — satisfies requirements that prohibit using protected attributes in decisions

Sub-Millisecond Latency

The inference-time correction is a single matrix multiplication and vector subtraction:

| Operation | Complexity | Typical Latency |
|---|---|---|
| Penultimate extraction | Already computed | 0 ms additional |
| P_perp @ h | O(d²), d = embedding dim | < 0.1 ms |
| h − τ × result | O(d) | < 0.01 ms |
| Total overhead | | < 0.5 ms |

For typical embedding dimensions (64–512), the correction adds negligible latency to the inference path.
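The claim is easy to sanity-check on your own hardware. A rough NumPy micro-benchmark (timings vary by machine, so no expected output is given; the projection matrix here is a random stand-in):

```python
import time
import numpy as np

d = 256                                    # a typical embedding dimension
rng = np.random.default_rng(0)
P_perp = rng.standard_normal((d, d))       # stand-in for the real projection matrix
h = rng.standard_normal(d)
tau = 0.5

start = time.perf_counter()
for _ in range(10_000):
    h_corrected = h - tau * (P_perp @ h)   # the entire inference-time correction
avg_ms = (time.perf_counter() - start) / 10_000 * 1e3
print(f"avg correction overhead: {avg_ms:.4f} ms")
```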

Governance Packets

Every GBP-Audit decision produces a machine-readable governance packet — a complete audit trail that can be handed to regulators, compliance officers, or external auditors:

{
  "dataset": "bank_marketing",
  "protected_attribute": "age",
  "coherence_kappa": 0.08,
  "decision": "ADJUST",
  "tau": 0.65,
  "guardrails": {
    "tost_auroc": { "passed": true, "baseline": 0.912, "corrected": 0.910, "delta": 0.002 },
    "ece_cap": { "passed": true, "baseline": 0.034, "corrected": 0.032 },
    "fpr_drift": { "passed": true, "max_drift": 0.008 },
    "multi_seed": { "passed": true, "auroc_std": 0.002 },
    "fairness_improvement": { "passed": true, "eo_gap_before": 0.087, "eo_gap_after": 0.023 }
  },
  "hash": "sha256:9f3a..."
}
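One plausible way to produce the packet's reproducibility hash (the original's exact hashing scheme isn't specified; this sketch canonicalizes the JSON with sorted keys before hashing so the same packet always yields the same digest):

```python
import hashlib
import json

packet = {
    "dataset": "bank_marketing",
    "protected_attribute": "age",
    "coherence_kappa": 0.08,
    "decision": "ADJUST",
    "tau": 0.65,
}

# Canonical serialization: sorted keys, no whitespace, so hashing is deterministic
canonical = json.dumps(packet, sort_keys=True, separators=(",", ":"))
packet["hash"] = "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
print(packet["hash"])
```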

Regulatory Alignment

GBP-Audit's governance packets map directly to regulatory requirements:

| Regulation | Requirement | GBP-Audit Evidence |
|---|---|---|
| NYC LL 144 | Bias audit for automated employment decisions | Coherence analysis + guardrail results |
| EU AI Act | Risk assessment for high-risk AI systems | Full governance packet with reproducibility proof |
| SR 11-7 (OCC/Fed) | Model risk management for banking | AUROC non-inferiority, calibration cap, FPR drift bounds |
| CFPB Fair Lending | Adverse action documentation | Per-attribute coherence + intervention decision rationale |

06 Real-World Scenario: Banking Fair-Lending Review

The Situation

Amy is a Risk & Model Governance Manager at a mid-size bank. CFPB/OCC examiners have requested evidence that the bank's credit approval model is fair and that any post-hoc fixes don't secretly hurt model quality. She has 90 days.

The production model is frozen — retraining would take months and require cross-team approvals. Amy needs a solution that works post-hoc on the existing model.

Day 1: Discovery

The compliance team identifies a 12% age-based disparity in approval rates. Applicants over 55 are approved at significantly lower rates than younger applicants with similar credit profiles.

Traditional approach: retrain the model with age-blinded features. Timeline: 6+ months. Cost: significant engineering resources plus re-validation of the entire model.

Day 2: GBP-Audit Analysis

Amy runs GBP-Audit on a calibration split from the bank's validation data:

Step 1: Extract penultimate representations from frozen model
Step 2: Compute task direction (approved vs. rejected)
Step 3: Compute bias direction (over-55 vs. under-55)
Step 4: Measure coherence → κ = 0.11 (low — proxy bias)

Result: The age signal in the model's representations is largely orthogonal to the credit-risk signal. The model learned an age shortcut that doesn't contribute to accurate credit decisions. Safe to correct.

Day 3: Correction & Validation

GBP-Audit runs the full pipeline:

1. Orthogonalize bias direction against task direction → b_perp
2. Grid search τ from 0.0 to 1.0 in steps of 0.02
3. At τ = 0.58, all five guardrails pass:
   ✓ TOST AUROC: 0.891 → 0.889 (within δ = 0.01)
   ✓ ECE: 0.031 → 0.029 (improved)
   ✓ FPR drift: max 0.007 across age groups (< 0.03)
   ✓ Multi-seed: σ(AUROC) = 0.0018 (< 0.005)
   ✓ Fairness: EO gap 12.1% → 3.2% (improved)

Decision: ADJUST with τ = 0.58

Day 4: Regulatory Submission

Amy submits the governance packet to the CFPB/OCC examiners. It contains:

  • Baseline metrics (pre-correction)
  • Coherence analysis showing bias was proxy-driven
  • All five guardrail results with confidence intervals
  • Post-correction metrics showing 74% disparity reduction with negligible accuracy impact
  • Reproducibility hash for exact replication

Outcome

| Metric | Traditional Retraining | GBP-Audit |
|---|---|---|
| Time to resolution | 6+ months | 3 days |
| Model changes required | Full retrain | Post-hoc layer |
| Accuracy impact | Unknown until retrained | Verified < 0.002 AUROC |
| Audit trail | Manual documentation | Machine-readable governance packet |
| Reversibility | Irreversible | Fully reversible (remove projection) |
| Protected attribute at inference | May be needed | Not needed |

The correction is a lightweight post-hoc layer that can be toggled on or off without touching the production model. If new data suggests the correction is no longer appropriate, it can be removed instantly.

07 Conclusion

The standard approach to AI fairness assumes that all bias should be removed. GBP-Audit challenges this assumption with a simple geometric insight: bias that aligns with the task direction is not a bug — it's a reflection of real-world structure. Removing it doesn't create fairness; it creates inaccuracy.

The audit-then-act framework changes the question from "how do we remove this bias?" to "should we remove this bias?" Geometric coherence (κ) provides a principled answer, and five guardrails ensure that corrections that do ship are safe.

Across 7 datasets, GBP-Audit achieved a perfect safety record:

  • 3 datasets corrected with up to 73% disparity reduction and zero meaningful accuracy loss
  • 4 datasets correctly abstained — preventing interventions that would have degraded AUROC by 2–8%
  • Zero harmful interventions shipped

The system's most important output is not the correction — it's the abstention. When GBP-Audit says "do not correct this bias," it is protecting both the model's predictive quality and the populations it serves. An inaccurate model helps no one.

This is the safety-first paradigm: the default is to do nothing. Corrections must earn their deployment by passing every guardrail. Abstention is not failure — it is the system working as designed.


Know when to fix bias — and when not to

GBP-Audit uses geometric coherence to distinguish proxy bias (safe to remove) from task-aligned bias (dangerous to remove), then applies corrections only when five strict guardrails pass. The result: ADJUST, ABSTAIN, or REVIEW — never a blind fix. Validated on 7 datasets across lending, employment, and public policy.
