The platform I work on routes training tasks through a trainer review gate before they enter the corpus. For a long time, that gate was entirely manual: a human reviewer looked at the work and clicked approve or reject, with no automated quality checks. Reviewers caught obvious structural problems, but the process was slow, inconsistent, and dependent on human attention for every submission.
I built a QC orchestration service that sits between submission and human review. When configured, it dispatches work to external AI quality agents for automated validation — structure checks, completeness checks, format compliance — and only advances to human review after the agents pass. When QC is not configured, the pipeline falls back to the original direct approve/reject path with zero behavior change. The implementation spans 27 files and roughly 3,766 lines of TypeScript, centered on an 829-line orchestration service, signed webhook callbacks, and new state machine transitions. This post walks through the flow, the service design, and why I chose config-driven dispatch over hard-coded agent wiring.
The QC orchestration flow
The orchestration follows a six-stage pipeline that mirrors how other async workers in the platform operate — state transitions first, Cloud Tasks for execution, webhooks for external callbacks:
- Submit: The gate-review API receives a trainer submission and calls loadQcStepConfig() to check whether QC is configured for this step.
- Enqueue: If configured, the state machine transitions to qc_pending and enqueues a Cloud Tasks worker.
- Dispatch: The worker claims the run as qc_in_progress and sends task data plus artifacts to each configured AI quality agent via its trigger URL.
- Webhook: Each agent calls back with results. The webhook route validates the signed URL and enqueues a result worker.
- Result processing: The result worker evaluates agent outcomes and transitions the task to approved or qc_error.
- Fallback: If QC is not configured for the step, the route proceeds with direct approve/reject, identical to the pre-QC behavior.
Automated QC should be an opt-in layer, not a gate rewrite. When QC is disabled, the pipeline must behave exactly as it did before.
The async design matters because AI quality agents can take seconds to minutes. Dispatch and callback handling run in Cloud Tasks workers, not the user-facing request path. Reviewers never wait on agent inference during submission.
QcOrchestrationService
The core service lives in QcOrchestrationService — 829 lines that handle the full lifecycle from run creation through dispatch, polling, result processing, and error recovery. Two entry points anchor the design:
- startTrainerGateQc(): called from the gate-review route when a trainer submits work for review.
- dispatchRun(): called by the Cloud Tasks worker to send packaged task data and artifacts to external agents.
Configuration is SQL-backed via app_configs.autoSeedQc. Each step can define its own agent list. Every agent has an id, name, triggerUrl, and a required boolean that determines whether a failure blocks approval:
interface QcAgentConfig {
  id: string;
  name: string;
  triggerUrl: string;
  required: boolean; // when true, a failure from this agent blocks approval
}

interface QcStepConfig {
  enabled: boolean;
  agents: QcAgentConfig[];
}

async function startTrainerGateQc(
  ctx: RequestContext,
  params: { taskId: string; version: number; stepId: string },
): Promise<QcRunResult> {
  // Missing or disabled config: the caller falls back to direct approve/reject.
  const config = await loadQcStepConfig(params.stepId);
  if (!config?.enabled) {
    return { skipped: true };
  }

  // Record the run, then move the manifest to qc_pending before enqueueing the worker.
  const qcRun = await createQcRun(ctx, {
    taskId: params.taskId,
    version: params.version,
    stepId: params.stepId,
    agents: config.agents,
  });

  await transitionManifest(ctx, params, 'qc_pending', {
    qc: { runId: qcRun.id, status: 'pending', agents: config.agents },
  });

  await enqueueQcWorker({ qcRunId: qcRun.id, ...params });
  return { skipped: false, qcRunId: qcRun.id };
}
dispatchRun() iterates the configured agents, packages task metadata and artifacts for each, and POSTs to the agent's trigger URL with a signed callback URL embedded in the payload. The service tracks per-agent status — pending, in_progress, completed, failed — and only finalizes the run once all required agents have reported back.
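The finalization rule can be sketched as a pure function over per-agent status. This is a minimal illustration under stated assumptions, not the service's actual code: `AgentState`, `RunOutcome`, and `evaluateRun` are hypothetical names, and `completed` is assumed to mean the agent passed.

```typescript
// Per-agent statuses as listed in the post.
type AgentStatus = 'pending' | 'in_progress' | 'completed' | 'failed';

// Hypothetical shape: one entry per configured agent.
interface AgentState {
  id: string;
  required: boolean;
  status: AgentStatus;
}

// 'waiting' means the run is not finalized yet.
type RunOutcome = 'waiting' | 'approved' | 'qc_error';

function evaluateRun(agents: AgentState[]): RunOutcome {
  // Optional agents never block: only required agents are considered.
  const required = agents.filter((a) => a.required);

  // Finalize only once every required agent has reported back.
  const stillRunning = required.some(
    (a) => a.status === 'pending' || a.status === 'in_progress',
  );
  if (stillRunning) return 'waiting';

  // Assumption: 'completed' means the agent passed; 'failed' blocks approval.
  return required.some((a) => a.status === 'failed') ? 'qc_error' : 'approved';
}
```

With this rule, a run whose only failure comes from an optional agent still resolves to approved, and a run with no required agents finalizes immediately.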
Signed webhook URLs
External agents need a way to POST results back to the platform. A plain callback URL would be trivially spoofable — anyone who guessed the path could inject fake QC results. I implemented HMAC-signed URLs with expiry in signed-url.ts (137 lines).
The webhook route follows a structured path that encodes every identifier needed to locate the run:
import { createHmac, timingSafeEqual } from 'node:crypto';

// Route: /api/webhook/auto-seed/qc/[taskId]/[version]/[targetType]/
//        [stepId]/[qcRunId]/[agentId]/[sig]/[exp]

function buildSignedCallbackUrl(
  baseUrl: string,
  params: CallbackParams,
  secret: string,
  expiryDays: number,
): string {
  // Expiry is a Unix timestamp in seconds.
  const exp = Math.floor(Date.now() / 1000) + expiryDays * 86400;
  const payload = [
    params.taskId,
    params.version,
    params.targetType,
    params.stepId,
    params.qcRunId,
    params.agentId,
    exp,
  ].join(':');
  const sig = createHmac('sha256', secret).update(payload).digest('hex');
  return `${baseUrl}/api/webhook/auto-seed/qc/${params.taskId}/${params.version}/` +
    `${params.targetType}/${params.stepId}/${params.qcRunId}/${params.agentId}/` +
    `${sig}/${exp}`;
}

function verifyCallbackSignature(
  params: CallbackParams,
  sig: string,
  exp: number,
  secret: string,
): boolean {
  // Reject non-numeric or expired timestamps before doing any HMAC work.
  if (!Number.isFinite(exp) || Date.now() / 1000 > exp) return false;
  const payload = [
    params.taskId, params.version, params.targetType,
    params.stepId, params.qcRunId, params.agentId, exp,
  ].join(':');
  const expected = createHmac('sha256', secret).update(payload).digest('hex');
  // timingSafeEqual throws if the buffers differ in length, so compare lengths first.
  const given = Buffer.from(sig);
  const wanted = Buffer.from(expected);
  return given.length === wanted.length && timingSafeEqual(given, wanted);
}
Signature verification runs before any result processing. Expired or tampered callbacks are rejected with a 403. The default expiry is seven days (callbackExpiryDays), configurable per environment — long enough for slow agents, short enough to limit the replay window.
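To make the reject-before-processing behavior concrete, here is a self-contained sketch of how a handler might map the eight path segments to a status code. The names `CallbackFields`, `sign`, `isValidCallback`, and `statusForCallback` are hypothetical, the injectable `nowSeconds` clock exists only to make the example testable, and the 202 success status is an assumption; only the 403 on rejection comes from the description above.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Route order: taskId, version, targetType, stepId, qcRunId, agentId.
type CallbackFields = [string, string, string, string, string, string];

function sign(fields: CallbackFields, exp: number, secret: string): string {
  return createHmac('sha256', secret).update([...fields, exp].join(':')).digest('hex');
}

function isValidCallback(
  fields: CallbackFields,
  sig: string,
  exp: number,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  // Non-numeric or past expiry: reject without comparing signatures.
  if (!Number.isFinite(exp) || nowSeconds > exp) return false;
  const expected = Buffer.from(sign(fields, exp, secret));
  const given = Buffer.from(sig);
  // timingSafeEqual requires equal-length inputs, so check length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// `segments` are the path parts after /api/webhook/auto-seed/qc/.
function statusForCallback(
  segments: string[],
  secret: string,
  nowSeconds?: number,
): 202 | 403 {
  if (segments.length !== 8) return 403;
  const fields = segments.slice(0, 6) as CallbackFields;
  const sig = segments[6] ?? '';
  const exp = Number(segments[7]);
  // Assumption: 202 signals "accepted, result worker enqueued".
  return isValidCallback(fields, sig, exp, secret, nowSeconds) ? 202 : 403;
}
```

Because the expiry is part of the signed payload, a caller cannot extend a callback's lifetime by editing the final path segment: the signature no longer matches.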
Task data packaging
Agents need more than a task ID. They need the artifacts, metadata, and context a human reviewer would see. The task-data.ts module (92 lines) reads from GCS manifests and assembles a structured payload for external consumption:
interface AgentTaskPayload {
  taskId: string;
  version: number;
  stepId: string;
  targetType: string;
  metadata: Record<string, unknown>;
  artifacts: ArtifactRef[];
  callbackUrl: string; // signed, per-agent URL the agent POSTs results to
}

async function packageTaskDataForAgent(
  ctx: RequestContext,
  params: DispatchParams,
  agent: QcAgentConfig,
): Promise<AgentTaskPayload> {
  // The GCS manifest is the source of truth for step metadata and artifacts.
  const manifest = await readManifestFromGcs(ctx, params.taskId, params.version);
  const artifacts = await resolveArtifactRefs(manifest, params.stepId);

  return {
    taskId: params.taskId,
    version: params.version,
    stepId: params.stepId,
    targetType: manifest.targetType,
    metadata: extractStepMetadata(manifest, params.stepId),
    artifacts,
    callbackUrl: buildSignedCallbackUrl(
      ctx.baseUrl, { ...params, agentId: agent.id, qcRunId: params.qcRunId },
      ctx.webhookSecret, ctx.callbackExpiryDays,
    ),
  };
}
Keeping packaging in a dedicated module means the orchestration service stays focused on lifecycle management. When a new artifact type appears in the pipeline, I update one file rather than threading GCS reads through dispatch logic.
State machine integration
QC introduced three new states into the existing gate review state machine: qc_pending, qc_in_progress, and qc_error. The manifest schema gained a qc field on GateReviewSchema to store run metadata — run ID, per-agent status, timestamps, and error details.
Transitions stay atomic because they go through the platform's existing compare-and-swap (CAS) protected manifest updates. A worker cannot claim a run as qc_in_progress unless the manifest is currently in qc_pending. A webhook cannot finalize results unless the run is still active. This prevents double-dispatch and stale-callback races, the same class of bug I had previously solved with generation-fenced leases elsewhere in the pipeline.
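The claim guard can be illustrated with an in-memory stand-in for the manifest store. This sketch does not show the platform's real storage layer: `InMemoryManifestStore`, `compareAndSwap`, and `claimRun` are hypothetical names, and the swap is trivially atomic here only because the example is single-threaded.

```typescript
type GateState =
  | 'pending_review' | 'qc_pending' | 'qc_in_progress'
  | 'approved' | 'rejected' | 'qc_error';

interface Manifest {
  state: GateState;
  generation: number; // bumped on every successful write
}

// Hypothetical stand-in for the real manifest store.
class InMemoryManifestStore {
  private readonly manifests = new Map<string, Manifest>();

  create(key: string, state: GateState): void {
    this.manifests.set(key, { state, generation: 1 });
  }

  read(key: string): Manifest | undefined {
    return this.manifests.get(key);
  }

  // Writes `next` only if the manifest is currently in `expected`.
  compareAndSwap(key: string, expected: GateState, next: GateState): boolean {
    const current = this.manifests.get(key);
    if (current === undefined || current.state !== expected) return false;
    this.manifests.set(key, { state: next, generation: current.generation + 1 });
    return true;
  }
}

// A worker may claim a run only while it is still qc_pending.
function claimRun(store: InMemoryManifestStore, key: string): boolean {
  return store.compareAndSwap(key, 'qc_pending', 'qc_in_progress');
}
```

If two workers race for the same run, the first call returns true and the second returns false, so the losing worker can exit without dispatching anything.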
The state flow looks like this:
| Event | From | To |
|---|---|---|
| QC configured on submit | pending_review | qc_pending |
| Worker claims run | qc_pending | qc_in_progress |
| All required agents pass | qc_in_progress | approved |
| Required agent fails or timeout | qc_in_progress | qc_error |
| QC not configured | pending_review | approved or rejected |
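The same table can be expressed as data, so that an illegal transition becomes a lookup miss rather than a forgotten branch. The event names, `TRANSITIONS`, and `nextState` below are hypothetical, and the "QC not configured" row is split into two illustrative events for the reviewer's approve and reject decisions.

```typescript
type GateState =
  | 'pending_review' | 'qc_pending' | 'qc_in_progress'
  | 'approved' | 'rejected' | 'qc_error';

// Hypothetical event names, one per row of the state table.
type GateEvent =
  | 'qc_configured_on_submit'
  | 'worker_claims_run'
  | 'required_agents_pass'
  | 'required_agent_fails_or_times_out'
  | 'manual_approve'
  | 'manual_reject';

const TRANSITIONS: Record<GateEvent, { from: GateState; to: GateState }> = {
  qc_configured_on_submit: { from: 'pending_review', to: 'qc_pending' },
  worker_claims_run: { from: 'qc_pending', to: 'qc_in_progress' },
  required_agents_pass: { from: 'qc_in_progress', to: 'approved' },
  required_agent_fails_or_times_out: { from: 'qc_in_progress', to: 'qc_error' },
  // QC not configured: the reviewer's decision applies directly.
  manual_approve: { from: 'pending_review', to: 'approved' },
  manual_reject: { from: 'pending_review', to: 'rejected' },
};

// Returns the target state, or null when the event is not legal from `current`.
function nextState(current: GateState, event: GateEvent): GateState | null {
  const rule = TRANSITIONS[event];
  return rule.from === current ? rule.to : null;
}
```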
Config-driven architecture
Hard-coding agent URLs would have made every new QC check a deploy. Instead, the entire feature is driven by a JSON config stored in app_configs:
{
  "enabled": true,
  "callbackExpiryDays": 7,
  "steps": {
    "trainer_gate": {
      "enabled": true,
      "agents": [
        {
          "id": "structure",
          "name": "Structure QC",
          "triggerUrl": "https://...",
          "required": true
        }
      ]
    }
  }
}
Adding a new QC agent to the trainer gate is a config change, not a code change. Disabling QC for a step is flipping enabled to false. Staging can run with QC on while production stays on the manual path until we validate agent accuracy. The required flag lets optional agents run for telemetry without blocking approval — useful when rolling out a new check in shadow mode.
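A loader for this shape might resolve a step as follows. This is a sketch rather than the real `loadQcStepConfig()`: `QcRootConfig`, `isAgent`, and `resolveStepConfig` are hypothetical names, and throwing on a malformed agent list (instead of silently treating QC as off) is an assumption about the desired failure mode.

```typescript
interface QcAgentConfig {
  id: string;
  name: string;
  triggerUrl: string;
  required: boolean;
}

interface QcStepConfig {
  enabled: boolean;
  agents: QcAgentConfig[];
}

// Hypothetical name for the top-level config object shown above.
interface QcRootConfig {
  enabled: boolean;
  callbackExpiryDays: number;
  steps: Record<string, QcStepConfig>;
}

function isAgent(value: unknown): value is QcAgentConfig {
  if (typeof value !== 'object' || value === null) return false;
  const a = value as Record<string, unknown>;
  return typeof a.id === 'string' && typeof a.name === 'string' &&
    typeof a.triggerUrl === 'string' && typeof a.required === 'boolean';
}

// Returns the step config when QC applies, or null when the legacy path should run.
function resolveStepConfig(rawJson: string, stepId: string): QcStepConfig | null {
  const root = JSON.parse(rawJson) as Partial<QcRootConfig> | null;
  if (typeof root !== 'object' || root === null) {
    throw new Error('QC config must be a JSON object');
  }
  // Both the global switch and the per-step switch must be on.
  if (root.enabled !== true) return null;
  const step = root.steps?.[stepId];
  if (!step || step.enabled !== true) return null;
  // Assumption: an enabled step with a malformed agent list is an error.
  if (!Array.isArray(step.agents) || !(step.agents as unknown[]).every(isAgent)) {
    throw new Error(`QC config for step "${stepId}" has a malformed agent list`);
  }
  return step;
}
```

Returning null for a disabled or missing step keeps the fallback decision in one place: the caller only has to distinguish "run QC with these agents" from "use the direct approve/reject path".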
Testing
I wrote tests at three layers to cover the integration surface:
- Gate-review route (51 tests): QC-enabled submission, QC-skipped fallback, state transitions, and error paths when config is malformed.
- Schema tests (77 tests): validation of the new qc field on GateReviewSchema, state enum extensions, and manifest round-trips.
- QC service tests (9 tests): run creation, dispatch payload construction, result aggregation, required-vs-optional agent failure semantics, and signed URL generation.
Step-checks tests cover the auto-QC integration path end to end — verifying that a configured step triggers orchestration and an unconfigured step preserves legacy behavior. Signed URL tests include expiry rejection and tamper detection, because broken webhook auth would undermine the entire QC gate.
What I learned
Automated quality checks belong in the pipeline, not in reviewers' heads. But the integration has to be opt-in and config-driven — otherwise you are coupling deploy cycles to QC policy changes and forcing every environment into the same validation regime.
The pattern that worked: treat AI quality agents as external async workers with signed callbacks, slot them into the existing state machine with new transitional states, and preserve the zero-QC fallback so rollout is incremental. Reviewers still make the final call where nuanced judgment is needed. The agents catch structural and completeness issues before a human ever opens the task, which is exactly where automation earns its place in a review pipeline.