Designing a Task Lifecycle V1 from Scratch
On the workflow platform I contribute to, tasks used to live in a flat mental model: a single status column, metadata scattered across ad-hoc fields, and no durable notion of pipeline stages beyond whatever the UI happened to imply. Evaluations kicked off scripts and dashboards reacted, but the database could not answer simple forensic questions—which model revision produced which artifact, at what point in the lifecycle, with what lineage through each gate. Operators and engineers shared the burden of Slack archaeology and spreadsheet glue.
This post is about designing and shipping a replacement from the ground up: a ten-table relational schema keyed around immutable versions, dual append-only event streams, explicit step-level state machines, and a typed REST surface—landed not as one heroic diff but as six stacked pull requests so each layer stayed reviewable and test-backed.
The problem in practice
The legacy task system modeled “where is this?” as a lone enum-ish string plus timestamps. Version history lived in brittle copies or nowhere. When an evaluation pipeline ran, there was no first-class structure for recording trials, models, or per-stage outcomes in a queryable shape. Debugging meant inferring causality from logs and praying the deploy tag matched reality.
The goal for V1 was deliberately narrow but strict: persist a task’s lifecycle as data you can traverse, correlate prompts and payloads to specific steps and evaluations, separate linear task progression from parallel eval work, and expose every mutation path through validators and handlers that behave the same whether called from cron, admin tools, or a future dashboard.
Ten tables, one coherent story
At the root sits TaskV1, the outward-facing aggregate. Rather than caching a duplicated status enum on the row, the task holds a foreign key pointer currentStepId to whichever StepV1 represents the active stage. “Status” becomes a projection: read the attached step’s state machine value, optionally join for display—but never drift from truth by forgetting to update two places.
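To make the projection concrete, here is a minimal read-path sketch, assuming the Prisma models excerpted later in this post; getTaskStatus is illustrative, not a shipped helper:

import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Status is derived at read time from the pinned step; there is no
// status column on TaskV1 that could drift out of sync.
async function getTaskStatus(taskId: string): Promise<string | null> {
  const task = await prisma.taskV1.findUniqueOrThrow({
    where: { id: taskId },
    include: { currentStep: true },
  });
  return task.currentStep?.status ?? null;
}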
TaskVersionV1 is the authoritative snapshot envelope. Writes that mean “the task advanced” mutate the active version inside a transaction—always paired with events—so dashboards can diff “what changed between version n and n + 1?” without relying on brittle before/after JSON blobs scattered in application memory.
Within a version, StepV1 rows implement a thirteen-state lifecycle (ordering enforced in TypeScript constants, no fragile position columns that invite reordering accidents). Events land in StepEventV1: append-only, ordered, representing the linear task pipeline—the constraint that exactly one operational step dominates at a time maps naturally to sequential history.
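The ordering constants look roughly like this (a sketch; only a few of the thirteen states are shown, and the identifiers beyond those appearing in the transition map later are assumptions):

// Canonical ordering lives in code, not in a position column.
// State names here are a hypothetical subset of the thirteen.
const STEP_ORDER = [
  "queued",
  "running",
  "awaiting_review",
  "approved",
  // ...nine more states elided
] as const;

type StepStatus = (typeof STEP_ORDER)[number];

// Reordering the pipeline is a reviewed code diff, not a data migration.
function comesBefore(a: StepStatus, b: StepStatus): boolean {
  return STEP_ORDER.indexOf(a) < STEP_ORDER.indexOf(b);
}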
Prompt payloads and lineage need fast listing without exploding joins. PromptV1 carries the text-ish content alongside denormalized model / agent identifiers so dashboards can filter and sort without parsing JSON blobs. Artifact storage stays out of Postgres except for pointers: DataV1 unifies golden, source-side, calibrated, generated, and delivery-shaped blobs behind a typed discriminator and filePath into object storage.
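In TypeScript terms, the discriminator is just a closed union over the five shapes (a sketch; the categories come straight from the description above, but the exact literal values are assumptions):

// The five blob categories behind one table; literals assumed.
type DataKind = "golden" | "source" | "calibrated" | "generated" | "delivery";

interface DataPointer {
  kind: DataKind;   // typed discriminator stored on DataV1
  filePath: string; // pointer into object storage; bytes never enter Postgres
}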
Evaluation crosses a separate axis. EvalV1 anchors a run with trial count k and timestamps. Each concrete attempt is TrialV1, with a composite uniqueness rule on (evalId, k, model) so parallel model entries cannot collide silently. Trials emit into EvalStepEventV1—structurally akin to StepEvent—but owned by the eval subgraph so churn in one pipeline never poisons timelines meant for the other.
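At write time the composite rule surfaces as Prisma's P2002 unique-violation error. A hedged sketch, assuming an @@unique([evalId, k, model]) constraint and defaults for the remaining TrialV1 columns:

import { Prisma, PrismaClient } from "@prisma/client";

async function recordTrial(
  prisma: PrismaClient,
  input: { evalId: string; k: number; model: string }
) {
  try {
    return await prisma.trialV1.create({ data: input });
  } catch (err) {
    if (err instanceof Prisma.PrismaClientKnownRequestError && err.code === "P2002") {
      // A concurrent writer already claimed (evalId, k, model); fail loudly
      // instead of silently double-counting the trial.
      throw new Error(`trial already exists for ${input.model} at k=${input.k}`);
    }
    throw err;
  }
}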
Finally, long-running inference style jobs persist as InsightV1: model, agent, provider as first-class varchar columns again (not tucked into JSON), surviving even when a downstream trial row is retracted—so cancellations and rework do not erase the record that work was attempted.
Relationships in Prisma (excerpt)
The centerpiece relationship is triangular: the task points at the live step and at exactly one active version envelope; steps belong to versions but carry a duplicated taskId for hot paths.
model TaskV1 {
  id              String          @id @default(cuid())
  currentStepId   String?         @unique
  activeVersionId String          @unique
  activeVersion   TaskVersionV1   @relation("ActiveEnvelope", fields: [activeVersionId], references: [id])
  currentStep     StepV1?         @relation("TaskCurrentStep", fields: [currentStepId], references: [id])
  ownedVersions   TaskVersionV1[] @relation("TaskVersions")
  steps           StepV1[]        @relation("StepOwnership")
}

model TaskVersionV1 {
  id            String   @id @default(cuid())
  taskId        String
  currentStepId String?  // envelope-level pointer, advanced in the same transaction (see below)
  task          TaskV1   @relation("TaskVersions", fields: [taskId], references: [id])
  activeForTask TaskV1?  @relation("ActiveEnvelope")
  steps         StepV1[]
}

model StepV1 {
  id                     String        @id @default(cuid())
  taskId                 String        // denormalized alongside versionId for hot read paths
  versionId              String
  task                   TaskV1        @relation("StepOwnership", fields: [taskId], references: [id])
  version                TaskVersionV1 @relation(fields: [versionId], references: [id])
  pinnedAsCurrentForTask TaskV1?       @relation("TaskCurrentStep")
  status                 String        // thirteen-state lifecycle, varchar deliberately
}
Prisma receives string fields instead of enums at the persistence layer—we traded compile-time tightening for velocity while the ontology still moved weekly.
Key design decisions
Separate event pipelines. Task steps admit a single authoritative timeline; evaluations fan out concurrent trials with their own branching. Merging both into one *_events table would have forced awkward polymorphism and invited cross-namespace bugs. Parallel eval noise stays out of operational step history.
Transactional write order. On a validated transition we always append history first—only then mutate TaskVersionV1.currentStepId (conceptually aligning the envelope) and finally repoint TaskV1.currentStepId. If any leg fails, the database rolls back; there is never a task pointing at a step whose transition never logged.
Denormalization for read paths. taskId repeats on Step, StepEvent, Prompt, and Data so common API responses avoid mandatory hops through versions for OLTP workloads that only care “what belongs to this task.”
Queryable columns instead of blobs. Model, agent, provider live as discrete columns—the extra width pays for indexing, equality filters, and human-readable CSV exports without brittle JSON path queries.
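For example, a dashboard filter over insights stays a plain indexed equality check. A sketch; field names beyond model are assumptions:

import { PrismaClient } from "@prisma/client";

async function recentInsightsForModel(prisma: PrismaClient, model: string) {
  return prisma.insightV1.findMany({
    where: { model },               // discrete column: indexable, no JSON path query
    orderBy: { createdAt: "desc" }, // createdAt assumed
    take: 50,
  });
}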
Transition validation before touch
Every PATCH-style route resolves the targeted step’s current status through a centralized map keyed by semantic step kind. Invalid attempts return HTTP 409 with the list of permissible next tokens so clients can recover without scraping error strings.
const STEP_TRANSITIONS: Record<StepKind, Partial<Record<StepStatus, StepStatus[]>>> = {
  ingestion: {
    queued: ["running", "failed"],
    running: ["awaiting_review", "failed"],
    awaiting_review: ["approved", "rejected"],
    // ...
  },
  // ...
};

export function isValidTransition(
  kind: StepKind,
  from: StepStatus,
  to: StepStatus
): boolean {
  return STEP_TRANSITIONS[kind]?.[from]?.includes(to) ?? false;
}

export function allowedNextStates(kind: StepKind, from: StepStatus): StepStatus[] {
  return STEP_TRANSITIONS[kind]?.[from] ?? [];
}
Handlers call isValidTransition before opening side effects; the 409 body serializes { error: "invalid_transition", allowed: allowedNextStates(...) } for predictable tooling.
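The glue between validator and response is a small guard that throws the domain error consumed by createHandler further down. A sketch; the ConflictError constructor shape is an assumption consistent with the handler excerpt below:

// Hypothetical shape of the domain conflict error.
class ConflictError extends Error {
  constructor(message: string, public meta?: { allowed: string[] }) {
    super(message);
  }
}

function assertTransition(kind: StepKind, from: StepStatus, to: StepStatus): void {
  if (!isValidTransition(kind, from, to)) {
    // Surfaces as HTTP 409 with the permissible next tokens.
    throw new ConflictError("invalid_transition", {
      allowed: allowedNextStates(kind, from),
    });
  }
}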
Transactional choreography
The following sketch mirrors the production shape: inside $transaction, the event insert and both pointer updates succeed or fail together.
import { PrismaClient } from "@prisma/client";

async function advanceStep(
  prisma: PrismaClient,
  input: { taskId: string; versionId: string; nextStepId: string; payload: Record<string, unknown> }
) {
  return prisma.$transaction(async (tx) => {
    // 1. History first: the event exists before any pointer moves.
    await tx.stepEventV1.create({
      data: {
        taskId: input.taskId,
        type: "step_transition_requested",
        payload: input.payload,
      },
    });
    // 2. Align the version envelope.
    await tx.taskVersionV1.update({
      where: { id: input.versionId },
      data: { currentStepId: input.nextStepId },
    });
    // 3. Repoint the task last; any failure above rolls everything back.
    return tx.taskV1.update({
      where: { id: input.taskId },
      data: { currentStepId: input.nextStepId },
    });
  });
}
In the shipped code the active version resolves from the task envelope before the transaction; the salient guarantee is ordering and single-transaction scope.
Handler template pattern
Repeated CRUD-ish routes funnel through createHandler: Zod parses, domain checks run, then Prisma executes. Middleware injects tenancy and actor context uniformly.
import { z } from "zod";
import { PrismaClient } from "@prisma/client";

// Domain types defined elsewhere; shape assumed for this excerpt.
type Actor = { id: string; tenantId: string };

type HandlerCtx = { prisma: PrismaClient; actor: Actor };

export function createHandler<T extends z.ZodType>(opts: {
  schema: T;
  run: (args: z.infer<T>, ctx: HandlerCtx) => Promise<unknown>;
}) {
  return async (rawBody: unknown, ctx: HandlerCtx) => {
    const parsed = opts.schema.safeParse(rawBody);
    if (!parsed.success) return { ok: false, status: 400, body: parsed.error.flatten() };
    try {
      const result = await opts.run(parsed.data, ctx);
      return { ok: true, status: 200, body: result };
    } catch (err) {
      if (err instanceof ConflictError)
        return { ok: false, status: 409, body: { error: err.message, allowed: err.meta?.allowed } };
      throw err;
    }
  };
}
Golden tests instantiate handlers with ephemeral databases; the schema layer stays the choke point for regressions when new endpoints appear.
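A representative test looks something like this (a sketch assuming vitest; makeTestCtx, which would provision the ephemeral database, is hypothetical):

import { describe, expect, it } from "vitest";
import { z } from "zod";
// createHandler comes from the excerpt above.

describe("createHandler", () => {
  it("rejects malformed bodies before any side effect runs", async () => {
    const handler = createHandler({
      schema: z.object({ title: z.string().min(1) }),
      run: async (args) => args, // echo; real handlers hit Prisma here
    });
    const res = await handler({ title: 42 }, await makeTestCtx());
    expect(res).toMatchObject({ ok: false, status: 400 });
  });
});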
Six stacked PRs, forty-plus routes
Rather than one eighteen-thousand-line branch, the implementation landed in six reviewable slices:
- PR 1 — Pipeline infrastructure: registries for thirteen step states and ten gate states, typed transition matrices, shared Zod building blocks, the createHandler scaffolding, plus 119 harness tests guarding invariants.
- PR 2 — Tasks + Versions: seven REST routes spanning create/read/update patterns for aggregates, nineteen tests.
- PR 3 — Steps + Events: six routes with transition gates returning 409 when maps disagree, fourteen tests validating edge cases.
- PR 4 — Prompts + Data: four routes for payload CRUD wired to typed paths, nine tests.
- PR 5 — Evals + Trials + Insights: ten routes covering orchestration subgraph concerns, fifteen tests.
- PR 6 — Auto-seed API: six routes bridging demo environments, covering taxonomy tables, guarded object-storage stubs, and runtime simulation clocks so integration tests approximate production cadence.
Combined: more than forty public routes and north of 176 automated tests, not counting migrations and fixtures. Each PR depended only on its predecessors, so reviews could proceed in parallel even though the merges landed strictly in order.
Why stacking beat the mega-merge
A single rollout would have mixed foundational mistakes with superficial handler typos—reviewers skim, critical feedback arrives late. Small PRs anchored discussion: infra first (wrong map shapes break everything downstream), CRUD afterward (mostly boring once patterns exist). Blocking on PR 3 did not prevent colleagues from commenting on schema docs introduced in PR 1. Automated suites stayed green per slice, shrinking bisection pain when regressions surfaced.
Auto-seed integration
Demo tenants needed believable timelines without manual click-through. Auto-seed registers stage identifiers mapping to canned object-storage prefixes, upload ceilings, and legacy alias strings left over from early pilots. Runtime simulation jitter adds realistic pacing for dashboards; optimistic review semantics include an explicit rework path so “accepted then revised” narratives remain consistent with transactional rules.
The registry plugs into the same transition validators as production—demo data is not a special code path bolted sideways.
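Concretely, the seeder advances demo steps through the same guard. A sketch; seedAdvance is illustrative:

import { PrismaClient } from "@prisma/client";

async function seedAdvance(
  prisma: PrismaClient,
  step: { id: string; kind: StepKind; status: StepStatus },
  to: StepStatus
) {
  // Demo data obeys the same transition map as production traffic.
  if (!isValidTransition(step.kind, step.status, to)) {
    throw new Error(`seed attempted an illegal hop: ${step.status} -> ${to}`);
  }
  await prisma.stepV1.update({ where: { id: step.id }, data: { status: to } });
}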
Closing
Good lifecycle design is boring on paper and dramatic in hindsight: immutable versions explain the past, FK-derived status avoids double-bookkeeping, split event logs respect genuinely different concurrency modes, and a thin REST layer enforces discipline at the boundary. Breaking delivery into stacked PRs cost a little upfront planning and saved weeks of rework when reviews caught mismatched assumptions early.
If you are contemplating a similar reset, first sketch, separately, the timelines you need for linear workflows versus parallel experimentation; that is the shape of your event tables. Only then chase CRUD ergonomics. Once the narratives fit the schema, handlers become assembly.