
governed-rank Tutorial

A progressive guide from a 3-line quick start to advanced gap calibration and isotonic projection.

```bash
pip install governed-rank
```

01: Quick Start with govern()

govern() is the simplest entry point. Three arguments, one function call, and you have a governed ranking with a full audit trail.

Minimal Example

```python
from mosaic import govern

result = govern(
    base_scores={"doc1": 0.95, "doc2": 0.88, "doc3": 0.72, "doc4": 0.60, "doc5": 0.45},
    steering_scores={"doc1": -0.4, "doc2": 0.1, "doc3": 0.9, "doc4": 0.3, "doc5": 0.7},
    budget=0.30,
)
```

Arguments

| Parameter | Type | Description |
| --- | --- | --- |
| `base_scores` | dict, list, or np.ndarray | The base ranker's scores. Higher = more relevant. |
| `steering_scores` | dict, list, or np.ndarray | The policy signal. Higher = more desirable per your objective. |
| `budget` | float (0.0-1.0) | Fraction of adjacent-pair edges to protect. Default 0.30. |

Reading the Result

govern() returns a GovernResult:

```python
print(result.ranked_items)          # reranked item order
print(result.scores)                # {item: final_score} mapping
print(result.n_protected_edges)     # how many ordering edges were locked
print(result.n_active_constraints)  # how many constraints were binding
print(result.projection_coeff)      # alignment between steering and base
```

Per-Item Receipts

Every item gets a GovernReceipt — a full audit trail:

```python
for receipt in result.receipts:
    print(f"Item: {receipt.item}")
    print(f"  Base rank: {receipt.base_rank} -> Final rank: {receipt.final_rank}")
    print(f"  Base score: {receipt.base_score:.3f}")
    print(f"  Steering score: {receipt.steering_score:.3f}")
    print(f"  Orthogonalized steering: {receipt.orthogonalized_steering:.3f}")
    print(f"  Final score: {receipt.final_score:.3f}")
```

The receipt shows exactly why each item moved (or didn't). orthogonalized_steering is the steering signal after the component aligned with the base score has been removed.

Domain Examples

Content Moderation — demote toxic content without hurting engagement:

```python
result = govern(
    base_scores=engagement_scores,
    steering_scores=toxicity_penalties,
    budget=0.30,
)
```

Fairness — promote underrepresented groups without sacrificing quality:

```python
result = govern(
    base_scores=quality_scores,
    steering_scores=fairness_boosts,
    budget=0.30,
)
```

RAG Safety — steer retrieval toward grounded documents:

```python
result = govern(
    base_scores=retrieval_similarity,
    steering_scores=groundedness_scores,
    budget=0.50,
)
```

02: Understanding Orthogonalization

Why base + lambda * policy Fails

The naive approach to multi-objective ranking is:

```
final_score = base_score + lambda * policy_score
```

This fails because the policy signal is often correlated with the base score. When you add a correlated signal, you don't steer — you amplify. Items that were already ranked high get ranked higher. Items that need to move don't move.

When the correlation is negative, you actively break the ranking with unpredictable changes.
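A toy sketch in plain Python makes the failure concrete (illustrative values, not library code): when the policy signal is positively correlated with the base score, no choice of lambda changes the order at all.

```python
# Illustrative sketch: a policy signal positively correlated with the
# base score cannot reorder anything under a linear blend.
base = {"a": 0.9, "b": 0.7, "c": 0.5}
policy = {"a": 0.8, "b": 0.6, "c": 0.4}  # moves in lockstep with base

for lam in (0.1, 0.5, 2.0):
    blended = {k: base[k] + lam * policy[k] for k in base}
    order = sorted(base, key=blended.get, reverse=True)
    print(lam, order)  # always ['a', 'b', 'c'] -- the blend only amplifies
```

The gaps between items grow as lambda grows, but the ordering never changes: the blend amplifies the base ranking instead of steering it.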

The Fix: Orthogonalization

orthogonalize_against_base() removes the component of the steering signal that is aligned with the base scores:

```
u_perp = u - ((u · s) / (s · s)) * s
```

After this projection, the steering signal has zero correlation with the base scores. It can only move items in directions the base ranker has no opinion about.
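The projection can be sketched in a few lines of plain Python. This is not the library's implementation: it mean-centers both vectors first, which is the standard convention that makes the Pearson correlation (not just the dot product) land at zero.

```python
# Sketch of u_perp = u - ((u . s) / (s . s)) * s on mean-centered vectors.
# Illustrative only; the library handles dict inputs and edge cases.
def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def orthogonalize(u, s):
    uc, sc = center(u), center(s)
    coeff = sum(a * b for a, b in zip(uc, sc)) / sum(b * b for b in sc)
    return [a - coeff * b for a, b in zip(uc, sc)], coeff

base = [0.9, 0.7, 0.5, 0.3, 0.1]
steering = [-0.4, 0.1, 0.9, 0.3, 0.7]

u_perp, coeff = orthogonalize(steering, base)
# coeff is negative here: the steering signal points against the base
# ordering. u_perp is exactly orthogonal to the centered base vector.
```

After the projection, the dot product of `u_perp` with the centered base vector is zero by construction, so the residual steering signal carries no information the base ranker already expressed.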

Standalone Usage

```python
from mosaic.orthogonalization import orthogonalize_against_base

base = {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.3, "e": 0.1}
steering = {"a": 0.8, "b": 0.6, "c": 0.4, "d": 0.2, "e": 0.0}

orth = orthogonalize_against_base(base_scores=base, steering_utilities=steering)
```

OrthogonalizationResult Fields

```python
print(orth.projection_coeff)    # alignment of steering with base
print(orth.corr_before)         # Pearson correlation before
print(orth.corr_after)          # Pearson correlation after (~0.0)
print(orth.u_magnitude_before)  # RMS of steering signal before
print(orth.u_magnitude_after)   # RMS after (shrinks if signals were aligned)
print(orth.u_perp)              # {item: orthogonalized_score} dict
```

Interpreting projection_coeff: A high value means the steering signal was strongly aligned with base scores — most of what it was saying was already captured by the base ranker. Orthogonalization removed that redundant component, leaving only the genuinely new information.

Computing Target Scores

After orthogonalization, combine base scores with the orthogonalized steering:

```python
from mosaic.orthogonalization import compute_target_scores

target = compute_target_scores(base_scores=base, u_perp=orth.u_perp)
# target[item] = base_scores[item] + u_perp[item]
```

These target scores represent where items should rank if the steering signal were fully applied. The isotonic projection step then finds the closest feasible ranking that respects the protected edges.

03: The Budget Knob

What Budget Controls

The budget parameter (0.0 to 1.0) controls what fraction of adjacent-pair ordering edges are protected — locked so steering cannot reverse them.

  • budget=0.0 — No edges protected. Full reordering. Maximum policy effect.
  • budget=0.30 — The 30% of edges with the largest base-score gaps are locked. Good default.
  • budget=1.0 — All edges protected. No reordering. Output matches base ranking exactly.

The edges with the largest gaps are where the base ranker is most confident. Protecting these first gives maximum accuracy preservation for any budget level.
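Under one plausible reading, the selection logic reduces to a few lines. This is a sketch, not the library's code: `protect_by_budget` is a hypothetical helper, and the rounding rule for the edge count is an assumption.

```python
# Sketch: protect the budget fraction of adjacent edges with the largest
# base-score gaps. Function name and rounding rule are illustrative
# assumptions, not the library's actual implementation.
def protect_by_budget(base_order, base_scores, budget):
    gaps = [
        (base_scores[base_order[i]] - base_scores[base_order[i + 1]], i)
        for i in range(len(base_order) - 1)
    ]
    n_protect = round(budget * len(gaps))
    # Largest gaps first: the base ranker is most confident there.
    return sorted(i for _, i in sorted(gaps, reverse=True)[:n_protect])

order = ["a", "b", "c", "d", "e"]
scores = {"a": 0.95, "b": 0.80, "c": 0.70, "d": 0.65, "e": 0.40}
print(protect_by_budget(order, scores, 0.30))  # -> [3]: the 0.25 gap d->e
```

With 4 adjacent edges and a 0.30 budget, only the single widest gap (d to e) is locked; a budget of 1.0 would lock all four.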

Sweeping the Budget

To see how budget affects your ranking, sweep from 0.0 to 1.0:

```python
from mosaic import govern

base = {"a": 0.95, "b": 0.85, "c": 0.70, "d": 0.55, "e": 0.40}
steer = {"a": -0.3, "b": 0.1, "c": 0.8, "d": 0.5, "e": 0.9}

for budget in [0.0, 0.10, 0.20, 0.30, 0.50, 0.70, 1.0]:
    r = govern(base, steer, budget=budget)
    print(f"budget={budget:.2f}  ranking={r.ranked_items}  "
          f"protected={r.n_protected_edges}  active={r.n_active_constraints}")
```

At low budget, the ranking shifts significantly toward the steering objective. As budget increases, more of the base ordering is locked in place.

Standalone Edge Protection

You can compute protected edges without running the full pipeline:

```python
from mosaic.gap_calibration import get_protected_edges_by_budget

base_order = ["a", "b", "c", "d", "e"]
base_scores = {"a": 0.95, "b": 0.85, "c": 0.70, "d": 0.55, "e": 0.40}

protected = get_protected_edges_by_budget(
    base_order=base_order,
    base_scores=base_scores,
    budget_pct=0.30,
    max_rank=50,
)
print(f"Protected edge indices: {protected}")
```

Budget Guidance by Domain

| Domain | Suggested Budget | Rationale |
| --- | --- | --- |
| E-commerce recommendations | 0.20-0.40 | Moderate protection; engagement is forgiving |
| Content moderation | 0.20-0.40 | Must steer; some disruption is acceptable |
| Fairness / compliance | 0.20-0.40 | Needs real steering; a high budget defeats the purpose |
| Healthcare / clinical | 0.50-0.70 | High stakes; protect clinical accuracy aggressively |
| RAG / retrieval safety | 0.40-0.60 | Balance relevance with groundedness |
| Fraud review queues | 0.30-0.50 | Must catch fraud; moderate protection |

04: Gap Calibration

Why Raw Score Gaps Aren't Probabilities

When the base ranker scores item A at 0.82 and item B at 0.80, the gap is 0.02. But how confident is the ranker that A truly belongs above B? That depends on the ranker's calibration — a 0.02 gap from a well-calibrated model means something different than 0.02 from a poorly-calibrated one.

Gap calibration learns the mapping from score gaps to correctness probabilities: P(A > B | gap).
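The core idea can be sketched with a simple bucketed estimator. This is illustrative only: `bucket_calibration` is a hypothetical function, and unlike the library's `learn_gap_calibration` it enforces no monotonicity or minimum bucket sizes.

```python
# Sketch: estimate P(correct order | gap) by bucketing historical pairs
# on gap size and taking the empirical accuracy per bucket.
def bucket_calibration(pairs, n_buckets=4):
    max_gap = max(g for g, _ in pairs)
    hits = [0] * n_buckets
    counts = [0] * n_buckets
    for gap, correct in pairs:
        b = min(int(gap / max_gap * n_buckets), n_buckets - 1)
        counts[b] += 1
        hits[b] += correct
    return [h / c if c else None for h, c in zip(hits, counts)]

# Synthetic (gap, was_correct_order) pairs: larger gaps are right more often.
pairs = [(0.01, True), (0.01, False), (0.02, False), (0.05, True),
         (0.06, True), (0.06, False), (0.11, True), (0.12, True)]
curve = bucket_calibration(pairs, n_buckets=3)
print(curve)  # accuracy rises with gap size
```

Even on this tiny synthetic sample the curve is increasing: small gaps are barely better than a coin flip, while the largest gaps are always ordered correctly.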

Building Training Pairs

You need historical baskets (sets of items a user interacted with) and a base scoring function:

```python
from mosaic.gap_calibration import extract_pairs_pos_neg

def my_scorer(positives, negatives):
    """Score items given context. Returns {item_id: score}."""
    all_items = positives + negatives
    scores = ...  # your model's scoring logic goes here
    return {item: score for item, score in zip(all_items, scores)}

pairs = extract_pairs_pos_neg(
    baskets=historical_baskets,
    base_scorer=my_scorer,
    n_items=total_item_count,
    n_baskets=5000,
    neg_per_pos=10,
    max_rank=50,
    rng_seed=42,
)
# pairs: List[Tuple[float, bool]] -- (score_gap, was_correct_order)
```

Learning the Calibration Curve

```python
from mosaic.gap_calibration import learn_gap_calibration, CalibrationConfig

calibration = learn_gap_calibration(
    pairs=pairs,
    config=CalibrationConfig(
        n_buckets=30,
        min_samples_per_bucket=50,
        monotonic=True,
    ),
)

calibration.print_summary()
```

CalibrationResult

```python
from mosaic.gap_calibration import CalibrationResult

# Look up confidence for a specific gap
conf = calibration.gap_to_conf(0.05)
print(f"Gap 0.05 -> {conf:.1%} confidence")

# Inverse: what gap gives 90% confidence?
gap = calibration.conf_to_gap(0.90)
print(f"90% confidence requires gap >= {gap:.4f}")

# Save/load for production
calibration.save("calibration_model.json")
loaded = CalibrationResult.load("calibration_model.json")
```

Using Calibration with Edge Protection

Threshold mode — protect edges where confidence exceeds rho:

```python
from mosaic.gap_calibration import get_protected_edges

protected = get_protected_edges(
    base_order=sorted_items,
    base_scores=scores,
    calibration=calibration,
    rho=0.90,
)
```

Budget mode — protect a fixed fraction of edges, prioritized by gap size:

```python
from mosaic.gap_calibration import get_protected_edges_by_budget

protected = get_protected_edges_by_budget(
    base_order=sorted_items,
    base_scores=scores,
    calibration=calibration,
    budget_pct=0.30,
    max_rank=50,
)
```

Budget mode is the default in govern(). Threshold mode is available when you have a trained calibration model and want confidence-based protection.
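The threshold rule itself is simple enough to sketch in plain Python. Everything here is a stand-in: `gap_to_conf` is a toy monotone curve, not a trained `CalibrationResult`, and `protect_by_threshold` is a hypothetical helper.

```python
# Sketch of threshold-mode protection: keep the adjacent edges whose
# calibrated confidence clears rho. gap_to_conf is a toy stand-in.
def gap_to_conf(gap):
    return min(1.0, 0.5 + 4.0 * gap)  # toy monotone calibration curve

def protect_by_threshold(base_order, base_scores, rho):
    protected = []
    for i in range(len(base_order) - 1):
        gap = base_scores[base_order[i]] - base_scores[base_order[i + 1]]
        if gap_to_conf(gap) >= rho:
            protected.append(i)
    return protected

order = ["a", "b", "c", "d"]
scores = {"a": 0.9, "b": 0.7, "c": 0.65, "d": 0.4}
print(protect_by_threshold(order, scores, rho=0.90))  # -> [0, 2]
```

Unlike budget mode, the number of protected edges is not fixed: it grows or shrinks with the ranker's confidence on each list.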

05: Isotonic Projection

What It Does

After orthogonalization gives you target scores and edge protection gives you constraints, the isotonic projection finds the closest feasible ranking — the one that maximizes policy effect while respecting every protected edge.

Standalone Usage

```python
from mosaic.isotonic_projection import isotonic_project_on_runs

base_order = ["a", "b", "c", "d", "e"]
target_scores = {"a": 0.7, "b": 0.9, "c": 0.6, "d": 0.8, "e": 0.5}
protected_edges = [1, 3]  # protect edges b->c and d->e

proj = isotonic_project_on_runs(
    base_order=base_order,
    target_scores=target_scores,
    protected_edges=protected_edges,
)
```

ProjectionResult Fields

```python
print(proj.z)                     # {item: final_score} after projection
print(proj.n_constraints)         # number of protected edges
print(proj.n_active_constraints)  # how many constraints were binding
print(proj.pooled_blocks)         # items pooled together to satisfy constraints
print(proj.n_pre_violations)      # protected edges the target scores violated
```

Pooled blocks: When the target scores want to reverse a protected edge, the PAV algorithm pools those items together and gives them the weighted average score. More pooling = the steering signal was fighting the base ranking harder.

Computing the Final Ranking

```python
from mosaic.isotonic_projection import compute_final_ranking

final_order = compute_final_ranking(z=proj.z, base_order=base_order)
# Sorts by descending z with stable tie-breaking by base order position
```

Stable tie-breaking means that when two items have the same projected score (because they were pooled), they keep their original base ordering. This ensures deterministic, reproducible rankings.
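The tie-break can be reproduced with Python's stable built-in sort alone. This is a sketch of the behavior described above, not the library's implementation; `final_ranking` is a hypothetical helper.

```python
# Sketch: sort by descending score. Python's sort is stable, so starting
# from base_order means tied (pooled) items keep their base position.
def final_ranking(z, base_order):
    return sorted(base_order, key=lambda item: -z[item])

z = {"a": 0.8, "b": 0.8, "c": 0.5}          # a and b pooled to 0.8
print(final_ranking(z, ["a", "b", "c"]))     # a stays ahead of b
print(final_ranking({"a": 0.5, "b": 0.9, "c": 0.7}, ["a", "b", "c"]))
```

Because no randomness or dict iteration order is involved, the same inputs always yield the same ranking.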

The PAV Algorithm

The Pool Adjacent Violators (PAV) algorithm is a classic method from isotonic regression. It runs in O(N) time overall, via a single left-to-right pass with amortized-constant merging:

  1. Walk left to right through items in base order
  2. If the current item's target score violates a protected constraint with its predecessor, pool them together and replace both scores with their weighted average
  3. Continue: the pooled block may now violate a constraint with its predecessor, so merge again if needed
  4. Result: a monotone sequence on every protected run, as close as possible (in squared error) to the original target scores
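The steps above can be sketched compactly for a single fully-protected run, where every adjacent edge requires z[i] >= z[i+1]. This is an illustrative, unweighted version, not the library's run-aware implementation.

```python
# Sketch of Pool Adjacent Violators for one protected run (non-increasing
# constraint). Each block tracks [sum, count] so its mean is cheap to check.
def pav_non_increasing(targets):
    blocks = []
    for t in targets:
        blocks.append([t, 1])
        # Merge while the previous block's mean is below the current one's:
        # that pair violates z[i] >= z[i+1], so pool them.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)  # every pooled item gets the block mean
    return out

# (0.7, 0.9) pools to ~0.8 and (0.6, 0.8) pools to ~0.7, giving a
# non-increasing sequence as close as possible to the targets.
print(pav_non_increasing([0.7, 0.9, 0.6, 0.8, 0.5]))
```

Each item is pushed and popped at most once, which is where the O(N) bound comes from.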

Ready to govern your rankings?

99 KB. Minimal dependencies. Apache 2.0.