When Display Labels Break Sorting
On the submissions page in the admin panel, operators see a unified “Status” column. Instead of exposing raw pipeline stage codes, the UI shows human-readable labels—“Checking Quality,” “Quality Check Failed,” “Generating Output,” “Expert Review Active,” and so on—so reviewers can scan the table quickly. Sorting seemed like an obvious enhancement: click the header, group rows by that same notion of progress. What shipped first looked plausible: each cell showed the right label. The sort arrow toggled. And yet the rows did not stabilize into meaningful bands. Identical-looking statuses interleaved. Newest tasks did not float to the top within a band. Occasionally, refreshing the page reshuffled who appeared in the viewport at all. This post walks through three separate bugs that compounded into that behavior—and how seventy-eight new tests and a tight set of code changes unwound it.
The symptom: a column that “worked” but did not sort
The task system persists canonical stage and status fields (plus optional failure metadata). Presentation maps those primitives into unified labels shared between list views and detail screens. Separately, sort mode for the status column derives a numeric rank per row so the table can order consistently in memory without pushing presentation strings into SQL. On paper that split is sound: keep storage normalized, keep display friendly, keep sort keys comparable. In practice the rank function and the label function diverged. The label path consulted review outcomes and background job state; the rank path did not. Users experienced the worst possible failure mode: every affordance suggested the feature worked until someone actually tried to use sort to triage work.
Root cause one: rank ignored derived state
The original computeUnifiedRank accepted (stage, status, failureReason). That is sufficient only if each (stage, status) pair maps to exactly one user-visible status. It does not. The same database tuple can mean different things depending on whether an automated review passed, failed, or is still running, and on whether worker runs exist for output generation or expert-quality checks. For example, a pre-quality-assurance stage marked ACTIVE might display as “Checking Quality” while a review is in flight, “Quality Check Failed” after a failed check, or “Quality Check Passed” once the gate clears. All three rows shared one rank, so when sorted they collided and interleaved arbitrarily relative to one another—exactly the shuffle operators reported.
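To see why a shared rank produces that shuffle, consider a minimal sketch. The row shape, the ids, and the single-rank function are hypothetical stand-ins rather than the production code. Because every row compares equal, a stable sort (which Array.prototype.sort has been since ES2019) hands the rows back in whatever order they arrived, so the visible order is decided by the database and not by the status.

```typescript
// Hypothetical row shape: only what the sketch needs.
interface LabeledRow {
  id: string;
  label: "Checking Quality" | "Quality Check Failed" | "Quality Check Passed";
}

// Stand-in for the old behavior: every PRE_QA + ACTIVE row gets one rank.
function collapsedRank(_row: LabeledRow): number {
  return 0;
}

function sortByCollapsedRank(rows: LabeledRow[]): LabeledRow[] {
  // Copy first so the caller's array is not mutated.
  return [...rows].sort((a, b) => collapsedRank(a) - collapsedRank(b));
}

const arrivalOrder: LabeledRow[] = [
  { id: "t1", label: "Quality Check Failed" },
  { id: "t2", label: "Checking Quality" },
  { id: "t3", label: "Quality Check Passed" },
  { id: "t4", label: "Checking Quality" },
];

// Every comparison returns 0, so the output mirrors the input order
// and the two "Checking Quality" rows stay separated.
const collapsedIds = sortByCollapsedRank(arrivalOrder).map((r) => r.id);
console.log(collapsedIds.join(","));
```

If the database returns the same rows in a different order on the next request, the "sorted" table changes with it.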
The fix was to thread in the same derived inputs the label layer already used. computeUnifiedRank now takes reviewPassed (a tri-state boolean | null derived from review records, where null means no verdict yet) and workerRuns (minimal info about active or completed background runs). With those in place we assigned fourteen distinct ranks, 0 through 13, so every pipeline status the UI can show maps to a unique ordering key. Pre-QA splits into three separate ranks instead of one. Output generation and expert-quality review each get dedicated ranks driven by whether relevant worker runs are present, so “waiting on automation” and “idle at the same stage” no longer compare equal.
// Before: same rank for every variant of a stage/status pair
function computeUnifiedRankBroken(
  stage: PipelineStage,
  status: TaskStatus,
  failureReason: string | null
): number {
  // ...
  // PRE_QA + ACTIVE always returned one rank, regardless of review outcome
}
// After: rank matches the same derived state as the display label.
// The two new parameters are optional so older callers can still omit them.
function computeUnifiedRank(
  stage: PipelineStage,
  status: TaskStatus,
  failureReason: string | null,
  reviewPassed?: boolean | null,
  workerRuns?: { hasOutputJob: boolean; hasExpertCheckJob: boolean }
): number {
  // ...
  // PRE_QA + ACTIVE branches on reviewPassed → three distinct ranks
  // Output generation vs idle uses workerRuns?.hasOutputJob
}
Backward compatibility mattered: internal scripts and older API layers still call the rank helper without the new parameters. Optional arguments with conservative defaults preserve reasonable ordering for those callers while the submissions page passes the full derived context.
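As a rough illustration of that compatibility story, here is a minimal sketch using default parameter values. The trimmed types, the OUTPUT_GENERATION stage name, the concrete rank numbers, and the choice of defaults (no review verdict, no worker runs) are all assumptions made for the example; the production table of fourteen ranks is not reproduced here.

```typescript
// Hypothetical, trimmed-down types; the real enums have more members.
type SketchStage = "PRE_QA" | "OUTPUT_GENERATION";
type SketchStatus = "ACTIVE" | "FAILED";

interface WorkerRunFlags {
  hasOutputJob: boolean;
  hasExpertCheckJob: boolean;
}

// Assumed defaults: no review verdict and no worker runs.
const NO_WORKER_RUNS: WorkerRunFlags = {
  hasOutputJob: false,
  hasExpertCheckJob: false,
};

function rankSketch(
  stage: SketchStage,
  status: SketchStatus,
  failureReason: string | null,
  reviewPassed: boolean | null = null,
  workerRuns: WorkerRunFlags = NO_WORKER_RUNS
): number {
  // Rank values are placeholders, not the production 0..13 table.
  if (status === "FAILED") return failureReason === null ? 5 : 6;
  if (stage === "PRE_QA") {
    if (reviewPassed === null) return 0; // "Checking Quality"
    return reviewPassed ? 2 : 1; // passed : failed
  }
  // OUTPUT_GENERATION + ACTIVE: idle vs. a worker run in flight.
  return workerRuns.hasOutputJob ? 4 : 3;
}

// A legacy three-argument call still compiles and gets the default bucket.
const legacyRank = rankSketch("PRE_QA", "ACTIVE", null);
// The submissions page passes the full derived context.
const failedRank = rankSketch("PRE_QA", "ACTIVE", null, false);
const passedRank = rankSketch("PRE_QA", "ACTIVE", null, true);
```

Under these assumptions a legacy caller lands in the "no verdict yet" bucket, which is one plausible reading of a conservative default; a different default would only change which bucket old callers fall into, not the signature.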
Root cause two: the tiebreaker was a CUID
After ranks separate coarse buckets, rows in the same bucket should sub-sort by recency, typically newest task first, matching how operators scan “what just moved.” The comparator’s secondary key was accidentally the task’s string primary key. Those identifiers are opaque CUIDs: lexicographic order has no relationship to creation time. Within a status group, the table looked random even when rank was finally unique. The fix keeps rank as the primary key, uses the task’s createdAt (or domain-equivalent timestamp) as the secondary key, and falls back to task id only as a stable tiebreaker when two rows share an identical timestamp. In the comparator below, the date key follows the requested direction: descending puts the newest task first within a rank, and ascending puts the oldest first.
function compareByUnifiedStatus(
  a: TaskSortRow,
  b: TaskSortRow,
  direction: "asc" | "desc"
): number {
  const ra = computeUnifiedRank(
    a.stage,
    a.status,
    a.failureReason,
    a.reviewPassed,
    a.workerRuns
  );
  const rb = computeUnifiedRank(
    b.stage,
    b.status,
    b.failureReason,
    b.reviewPassed,
    b.workerRuns
  );
  const rankCmp = direction === "asc" ? ra - rb : rb - ra;
  if (rankCmp !== 0) return rankCmp;
  const da = a.createdAt?.getTime() ?? 0;
  const db = b.createdAt?.getTime() ?? 0;
  const dateCmp = direction === "asc" ? da - db : db - da;
  if (dateCmp !== 0) return dateCmp;
  return a.id.localeCompare(b.id);
}
Null-dated rows need an explicit policy. We treat a missing timestamp as 0 (the Unix epoch), so those rows sort as the oldest possible: last within their rank when descending, first when ascending. Tests lock that behavior in.
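The null policy is easy to check in isolation. The sketch below reuses the same `?? 0` fallback on a hypothetical three-row list; the DatedRow type and the ids are invented for the example, and the rank comparison is left out so only the date behavior is visible.

```typescript
// Hypothetical row shape carrying only the secondary sort inputs.
interface DatedRow {
  id: string;
  createdAt: Date | null;
}

function compareByDate(
  a: DatedRow,
  b: DatedRow,
  direction: "asc" | "desc"
): number {
  // Missing timestamps collapse to 0, i.e. "older than everything".
  const da = a.createdAt?.getTime() ?? 0;
  const db = b.createdAt?.getTime() ?? 0;
  const cmp = direction === "asc" ? da - db : db - da;
  return cmp !== 0 ? cmp : a.id.localeCompare(b.id);
}

const datedRows: DatedRow[] = [
  { id: "a", createdAt: new Date("2026-05-01T12:00:00Z") },
  { id: "b", createdAt: null },
  { id: "c", createdAt: new Date("2026-05-02T12:00:00Z") },
];

const descIds = [...datedRows]
  .sort((x, y) => compareByDate(x, y, "desc"))
  .map((r) => r.id); // ["c", "a", "b"]: the null-dated row is last

const ascIds = [...datedRows]
  .sort((x, y) => compareByDate(x, y, "asc"))
  .map((r) => r.id); // ["b", "a", "c"]: the null-dated row is first
```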
Root cause three: non-deterministic windowing
Unified status sort is applied in memory with a hard cap—five thousand rows fetched from the database before ranking. That keeps admin queries bounded. The subtle bug was the initial window query: without an orderBy, the database is free to return any five thousand rows when the table is larger than the cap. Different plans, cache pressure, or concurrent writes could change which slice you see, so two refreshes of the same filter produced different apparent orderings at the tail. Adding a deterministic orderBy to the window query—aligned with the product’s default list ordering—makes the cap honest: you always process the same prefix of rows for a given filter, then sort that fixed set by unified status.
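The effect of ordering before capping can be simulated in memory. This sketch does not call Prisma; it mimics a query ordered by createdAt descending with id as a secondary key, then applies the cap. The WindowRow type, the sample rows, and the choice of id as the secondary key are assumptions for illustration.

```typescript
interface WindowRow {
  id: string;
  createdAt: Date;
}

// Mimics `ORDER BY createdAt DESC, id DESC LIMIT cap` in memory.
function deterministicWindow(rows: WindowRow[], cap: number): WindowRow[] {
  return [...rows]
    .sort((a, b) => {
      const byDate = b.createdAt.getTime() - a.createdAt.getTime();
      if (byDate !== 0) return byDate;
      // Plain code-unit comparison keeps the tiebreak locale-independent.
      return a.id < b.id ? 1 : a.id > b.id ? -1 : 0;
    })
    .slice(0, cap);
}

// Two "storage orders" for the same four rows, as an unordered query
// might return them on different requests.
const storageOrderA: WindowRow[] = [
  { id: "a", createdAt: new Date("2026-05-01T00:00:00Z") },
  { id: "b", createdAt: new Date("2026-05-03T00:00:00Z") },
  { id: "c", createdAt: new Date("2026-05-02T00:00:00Z") },
  { id: "d", createdAt: new Date("2026-05-03T00:00:00Z") },
];
const storageOrderB: WindowRow[] = [...storageOrderA].reverse();

const windowA = deterministicWindow(storageOrderA, 2).map((r) => r.id);
const windowB = deterministicWindow(storageOrderB, 2).map((r) => r.id);
// Both windows contain the same rows in the same order: ["d", "b"].
```

Without the sort step, `slice(0, 2)` would return a and b for the first input but d and c for the second, which is the refresh-to-refresh reshuffle described above.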
Feeding the rank: a targeted Prisma select
Correct ranks require data that was not on the original DTO. We expanded the Prisma select for the admin list load to pull only what rank computation needs: a narrow shape on related reviews (enough to derive reviewPassed) and a lightweight list of workerRuns flags or counts—not full relation graphs, not heavy JSON blobs. That keeps the query cheap while making the sort key honest. The rule is simple: if the label can branch on a field, the rank function must see it or intentionally document a different ordering contract.
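const rows = await prisma.task.findMany({
  where: listWhere,
  take: 5000,
  // id breaks createdAt ties so the capped window is fully deterministic
  orderBy: [{ createdAt: "desc" }, { id: "desc" }],
  select: {
    id: true,
    stage: true,
    status: true,
    failureReason: true,
    createdAt: true,
    // Only the most recent review is needed to derive reviewPassed
    reviews: {
      select: { passed: true, completedAt: true },
      orderBy: { completedAt: "desc" },
      take: 1,
    },
    // Only runs in states that can influence the displayed status
    workerRuns: {
      select: { kind: true, state: true },
      where: { state: { in: ["PENDING", "RUNNING", "SUCCEEDED"] } },
    },
  },
});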
Application code maps reviews and workerRuns into the compact reviewPassed and workerRuns struct the rank helper expects, mirroring the label pipeline so the two cannot drift silently.
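One way that mapping could look is sketched here. The slice types mirror the fields selected in the query, but the nullability of passed and the kind values "OUTPUT_GENERATION" and "EXPERT_CHECK" are assumptions; the real enum members are not shown in this post.

```typescript
// Shapes matching the narrow Prisma select (nullability is assumed).
interface ReviewSlice {
  passed: boolean | null;
  completedAt: Date | null;
}
interface WorkerRunSlice {
  kind: string;
  state: string;
}
interface RankInputs {
  reviewPassed: boolean | null;
  workerRuns: { hasOutputJob: boolean; hasExpertCheckJob: boolean };
}

function deriveRankInputs(
  reviews: ReviewSlice[],
  runs: WorkerRunSlice[]
): RankInputs {
  // The query already limits reviews to the most recent one (take: 1).
  const latest = reviews.length > 0 ? reviews[0] : undefined;
  return {
    // No review row, or a review without a verdict, both map to null.
    reviewPassed: latest?.passed ?? null,
    workerRuns: {
      // Hypothetical kind names; substitute the real enum values.
      hasOutputJob: runs.some((r) => r.kind === "OUTPUT_GENERATION"),
      hasExpertCheckJob: runs.some((r) => r.kind === "EXPERT_CHECK"),
    },
  };
}

const noData = deriveRankInputs([], []);
const failedWithExpertRun = deriveRankInputs(
  [{ passed: false, completedAt: new Date("2026-05-01T12:00:00Z") }],
  [{ kind: "EXPERT_CHECK", state: "RUNNING" }]
);
```

Keeping this in a single function that both the label path and the rank path call is one way to make drift between them harder.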
Tests: seventy-eight new guards
Regressions in sort logic are easy to reintroduce because they look like small refactors. The test suite now covers: rank splits for all fourteen unified statuses; backward compatibility for callers that omit new parameters; an invariant that no two distinct visible statuses share a rank; ascending and descending direction; date sub-sort within the same rank; and null handling for missing dates and missing reviews. One representative case encodes the pre-QA split explicitly: two tasks with identical stage and status but different review outcomes must produce different ranks and order predictably under the date tiebreaker. Property-style checks iterate every enumerated pair of unified labels to check that the rank is injective for display purposes: if two ranks match, they must correspond to the same user-visible bucket. Separate cases flip sort direction end to end so we never regress to “works only descending.” Snapshot tests stayed deliberately out of scope: numeric ranks and comparator outputs are asserted directly, so refactors can rename strings without rewriting brittle golden files.
it("splits PRE_QA ACTIVE by review outcome", () => {
  const base = {
    stage: "PRE_QA" as const,
    status: "ACTIVE" as const,
    failureReason: null,
    createdAt: new Date("2026-05-01T12:00:00Z"),
    workerRuns: { hasOutputJob: false, hasExpertCheckJob: false },
  };
  const inProgress = {
    ...base,
    id: "task_in_progress",
    reviewPassed: null,
  };
  const failed = {
    ...base,
    id: "task_failed",
    reviewPassed: false,
  };
  expect(computeUnifiedRankForRow(inProgress)).not.toBe(
    computeUnifiedRankForRow(failed)
  );
});
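The injectivity invariant can be expressed as a small helper that groups labels by rank and reports any rank claimed by more than one label. This is a sketch of the idea rather than the suite's actual code: the helper name and the rank numbers in the sample maps are hypothetical.

```typescript
// Returns every group of labels that share a rank; empty means injective.
function findRankCollisions(rankByLabel: Record<string, number>): string[][] {
  const labelsByRank = new Map<number, string[]>();
  for (const [label, rank] of Object.entries(rankByLabel)) {
    const labels = labelsByRank.get(rank) ?? [];
    labels.push(label);
    labelsByRank.set(rank, labels);
  }
  return Array.from(labelsByRank.values()).filter((l) => l.length > 1);
}

// Hypothetical rank assignments for a handful of labels.
const sampleRanks: Record<string, number> = {
  "Checking Quality": 0,
  "Quality Check Failed": 1,
  "Quality Check Passed": 2,
  "Generating Output": 3,
};

// A deliberately broken map: two labels claim rank 3.
const collidingRanks: Record<string, number> = {
  ...sampleRanks,
  "Expert Review Active": 3,
};

const cleanResult = findRankCollisions(sampleRanks); // []
const brokenResult = findRankCollisions(collidingRanks);
// [["Generating Output", "Expert Review Active"]]
```

In a real suite the map would be built by running the rank function over every enumerated unified status, so a newly added label with a reused rank fails the check immediately.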
After deploy, we asked operators to spot-check a few saved filters; no schema migration was required because the fix is read-path only. Going forward, any pull request that changes how a unified label reads should show the matching diff to computeUnifiedRank or an explicit note that sort semantics are intentionally unchanged—that cheap review habit keeps presentation and ordering from drifting apart again.
The lesson
Display labels and sort order are separate concerns, but they must consume the same facts whenever a label is computed from derived state. If the UI shows “Quality Check Failed” because a review row says so, the sort key must consult that same review signal. When it does not, you get the most insidious class of bug: the feature appears implemented—headers click, arrows flip, cells read well—while the ordering is meaningless. Investing in deterministic data windows, intentional tiebreakers, and exhaustive tests turns that ambiguity into ordinary, reviewable correctness work.