When 57K Lines Get Rolled Back

· 8 min read · Humza Tareen
Integration Rollback Architecture TypeScript Git

Shipping a large vertical slice—schemas, persistence, APIs, background work, UI, dashboards, and cluster config—looks straightforward on paper. Split the work into reviewable pull requests, get approvals, merge, and celebrate. Reality had other plans. Eleven stacked pull requests collapsed into one squashed merge of roughly fifty-seven thousand lines across three hundred thirty-four files; the next day, the whole thing was reverted; and the follow-up was not a replay of the same diff but a deliberate reconsolidation with stricter boundaries. This is the engineering story of that arc: what worked in review, what broke in integration, why the team chose rollback, and how pressure to land the feature again produced a smaller, clearer codebase than the first attempt.

The setup: one feature, many surfaces

I had built an auto-seeder capability for the platform—a full task-seeding flow with a first-version data foundation, runtime validation, a database layer with explicit state machines, HTTP routes, an agent-style engine with pipeline steps and quality gates, supporting services, a complete operator UI with list and detail views, dashboard and admin affordances, and Kubernetes wiring for the worker paths. The footprint was large by design: the feature touched almost every layer of the stack. In aggregate the branch measured on the order of fifty-seven thousand lines spread across more than three hundred files. That scale is not bragging rights; it is a warning label. The question was never “can we write it,” but “can we land it without turning main into an unmaintainable blast radius.”

The split strategy: eleven layers, eleven reviews

To keep cognitive load human-sized, I carved the work into eleven pull requests, each aimed at a coherent layer. The intent was familiar: reviewers could reason about schema changes without wading through React components, and UI folks could comment on UX without blocking on migration SQL.

  1. PR 1: ORM schema, migrations, and core shared types.
  2. PR 2: Runtime schema definitions for request and domain payloads.
  3. PR 3: Database access layer, step and task state machines, and internal documentation.
  4. PR 4: HTTP routes plus shared library updates they depended on.
  5. PR 5: Auto-seeder agent module and core runtime wiring.
  6. PR 6: Pipeline steps and the quality gate that sits in front of downstream work.
  7. PR 7: Services—for example run bookkeeping, taxonomy, gate review orchestration, and task detail aggregation.
  8. PR 8: Additional library helpers and complementary API surfaces.
  9. PR 9: UI: task detail, list pages, and read-only viewers.
  10. PR 10: Dashboard and admin shells, shared components, and CI configuration.
  11. PR 11: Cluster manifests and ops-facing documentation.

Each pull request went through independent review. Most carried two to four approvals. On paper that is healthy process. Under the surface, however, the stack was not eleven isolated modules; it was one system delivered in slices. Tests in later pull requests imported types, fixtures, and behaviors introduced earlier. Fixtures assumed admin-capable callers. Method signatures drifted slightly as earlier layers tightened their contracts. The reviews were thorough for each slice, but no single review saw the whole dependency graph at once.

The merge problem: clean PRs, messy integration

When we tried to land the stack piecemeal, the picture changed. Individual pull requests could not merge into main in isolation without leaving the tree in a failing state. Tests in PR 9 assumed code that only existed once PRs 4 through 7 were present; merging PR 3 alone was “correct” for that file set but wrong for the repository as a whole. Failures cascaded: auth-related tests needed consistent mock claims (for example admin: true where the route enforced elevated access), and signature changes in service constructors rippled through factories that each PR had updated differently.

The pragmatic fix was to open an integration branch, merge or replay all eleven branches into it, resolve conflicts once, and run the full suite until green. That work included mechanical conflict resolution, but more importantly it forced a single coherent view of contracts: one shape for context objects, one ruleset for authorization in tests, one agreed order of initialization for the pipeline. When the branch was stable, we merged it to main as a single squashed commit—one hash, one giant diff, one point in history that contained the entire feature.
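
One way to hold "one shape for context objects, one ruleset for authorization in tests" in place is a single fixture factory that every suite imports. The sketch below is illustrative only: `TestClaims`, `RequestContext`, `makeClaims`, `makeContext`, and `requireAdmin` are hypothetical names, not identifiers from the original codebase.

```typescript
// Hypothetical shared test fixtures; names and shapes are assumptions for illustration.
interface TestClaims {
  sub: string;
  admin: boolean;
}

interface RequestContext {
  claims: TestClaims;
  requestId: string;
}

// Single source of truth for default claims. Tests override only what they need.
function makeClaims(overrides: Partial<TestClaims> = {}): TestClaims {
  return { sub: "test-user", admin: false, ...overrides };
}

// Single shape for the context object handed to routes and services under test.
function makeContext(overrides: Partial<RequestContext> = {}): RequestContext {
  return { claims: makeClaims(), requestId: "req-test", ...overrides };
}

// Example guard of the kind a route enforcing elevated access might apply.
function requireAdmin(ctx: RequestContext): void {
  if (!ctx.claims.admin) {
    throw new Error("forbidden: admin claim required");
  }
}

// A suite for an admin-only route builds its caller explicitly.
const adminCtx = makeContext({ claims: makeClaims({ admin: true }) });
requireAdmin(adminCtx); // does not throw
```

With defaults defined once, a test that needs elevated access has to say so, and a drifted claim shape fails to compile in every suite at the same time rather than in one pull request at a time.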

Squashing cleaned history but concentrated risk: one commit, one blame target, little room for surgical rollback. Reviewers of the final merge had to trust integration work done off the beaten path.

The rollback: size as an operational concern

The feature worked. Smoke tests and targeted manual passes looked good. And yet, the next day, the team reverted the merge. The concern was not a specific production defect; it was exposure. A change that large is hard to reason about under incident pressure. If something subtle misbehaved in production—latency, partial writes, a race in the worker—debugging would mean spelunking a single rev that touched everything from SQL to CSS. The team preferred a clean working tree and a rethink over living with that concentration of risk. A single revert commit removed the feature wholesale. It was the right call for how the organization weighed operability against shipping velocity in that moment.

Rollback is often framed as failure. Here it acted as a forcing function: we still needed the capability, but we no longer had the luxury of carrying two half-merged directory trees and opportunistic imports. We had to come back with a story that fit in human heads and in git history.

The reconsolidation: structure before replay

Rather than cherry-picking the old branch and hoping for a warmer reception, I treated the empty tree as a design exercise. The original implementation had split concepts across two top-level areas with overlapping responsibilities—think parallel folders where both claimed “the” place for seeding logic, types, and glue code. That duplication had been expedient while prototyping; at integration time it became import cycles, inconsistent naming, and two competing answers to “where does validation live.”

Conceptually, the before state looked like this—two sibling trees with blurry boundaries:

// Before (simplified layout): overlapping concerns split across siblings
src/
  feature-a-seed/**     // validators, handlers, “utility” DB touches
  feature-b-seeder/**   // overlapping validators, duplicated DTO shapes
  apps/api/routes/**    // routes calling into both inconsistently

The reconsolidation merged those concerns into one versioned library root, shown below as src/lib/v1/, and enforced a directional rule: providers talk to the outside world, services encapsulate orchestration and policy, and routes stay thin.

// After: explicit layering — dependencies point inward/downward only
src/lib/v1/
  providers/    // IO boundaries (object storage, queue, outbound HTTP)
  services/     // application use-cases; compose providers + domain
  domain/       // pure types, invariants, state transitions
  pipeline/     // worker engine, steps, gate execution
apps/api/routes/**  // parse/validate request → call service → map response

Routes stopped reaching into storage clients. Services stopped importing UI helpers. Domain code knew nothing about Express or Kubernetes—only facts about tasks, steps, and allowed transitions.
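
The "routes stay thin" rule can be sketched without any framework. Everything named here (`SeedRequest`, `SeedService`, `HttpResponse`, `parseSeedRequest`, `handleCreateSeed`) is a hypothetical stand-in; a real route would sit behind whatever HTTP library the API uses.

```typescript
// Hypothetical types; not taken from the original codebase.
interface SeedRequest {
  taskName: string;
}

interface SeedService {
  createSeed(input: SeedRequest): Promise<{ id: string }>;
}

interface HttpResponse {
  status: number;
  body: unknown;
}

// Step 1: parse and validate the untrusted body into a typed request.
function parseSeedRequest(body: unknown): SeedRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const taskName = (body as Record<string, unknown>).taskName;
  if (typeof taskName !== "string" || taskName.trim() === "") return null;
  return { taskName };
}

// The route owns no storage or policy logic: validate, delegate, map.
async function handleCreateSeed(body: unknown, service: SeedService): Promise<HttpResponse> {
  const request = parseSeedRequest(body);
  if (request === null) {
    return { status: 400, body: { error: "taskName is required" } };
  }
  const created = await service.createSeed(request); // step 2: call the service
  return { status: 201, body: { id: created.id } }; // step 3: map the response
}
```

Because the handler receives its service as a parameter, a test can pass a two-line fake and never touch storage or the network.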

Object storage had been the worst offender for “scatter.” Multiple modules issued their own uploads, deletes, and signed URL logic. During reconsolidation I introduced a single provider type—conceptually GcsRepository—as the lone module allowed to speak the vendor SDK dialect. Everyone else depended on narrow methods with stable TypeScript signatures.

// The only module that imports the vendor SDK lives beside other providers
export class GcsRepository implements ObjectStore {
  constructor(private readonly bucket: string) {}

  async putObject(key: string, body: Buffer, contentType: string): Promise<void> {
    // vendor-specific client calls isolated here
  }

  async getSignedReadUrl(key: string, ttlMs: number): Promise<string> {
    // policy for TTL and headers centralized; body elided in this excerpt
    throw new Error("getSignedReadUrl: elided in this excerpt");
  }

  async deleteObject(key: string): Promise<void> {
    // vendor-specific delete call isolated here
  }
}

// Services depend on this narrow interface; tests substitute a stub and never load the SDK
export interface ObjectStore {
  putObject(key: string, body: Buffer, contentType: string): Promise<void>;
  getSignedReadUrl(key: string, ttlMs: number): Promise<string>;
  deleteObject(key: string): Promise<void>;
}

Retry, content-type defaults, and path naming lived in one place—so credential rotation or per-environment buckets touched a few provider lines instead of dozens of service call sites.
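
Since services depend only on the narrow interface, a test can swap in an in-memory double. This is a minimal sketch under stated assumptions: `InMemoryObjectStore`, `ArtifactService`, and the `tasks/<id>/report.json` key layout are hypothetical, and the interface is restated with `Uint8Array` in place of `Buffer` (a `Uint8Array` subclass in Node) so the example needs no Node typings.

```typescript
// Restated locally so the sketch is self-contained; Uint8Array stands in for Buffer.
interface ObjectStore {
  putObject(key: string, body: Uint8Array, contentType: string): Promise<void>;
  getSignedReadUrl(key: string, ttlMs: number): Promise<string>;
  deleteObject(key: string): Promise<void>;
}

// Hypothetical test double: keeps objects in a Map and never touches a vendor SDK.
class InMemoryObjectStore implements ObjectStore {
  readonly objects = new Map<string, { body: Uint8Array; contentType: string }>();

  async putObject(key: string, body: Uint8Array, contentType: string): Promise<void> {
    this.objects.set(key, { body, contentType });
  }

  async getSignedReadUrl(key: string, ttlMs: number): Promise<string> {
    if (!this.objects.has(key)) throw new Error(`no such object: ${key}`);
    return `memory://${key}?ttlMs=${ttlMs}`; // fake URL, good enough for assertions
  }

  async deleteObject(key: string): Promise<void> {
    this.objects.delete(key);
  }
}

// Hypothetical service: it knows the use-case, not the storage vendor.
class ArtifactService {
  constructor(private readonly store: ObjectStore) {}

  async saveReport(taskId: string, body: Uint8Array): Promise<string> {
    const key = `tasks/${taskId}/report.json`; // assumed key layout
    await this.store.putObject(key, body, "application/json");
    return this.store.getSignedReadUrl(key, 60_000);
  }
}
```

A service test then constructs `new ArtifactService(new InMemoryObjectStore())` and asserts on the stored entries, while the production wiring passes the real provider.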

Pure logic moved under domain/ as small modules with no side effects, which makes them ideal for exhaustive unit tests without spinning up databases or mocking the network.

// domain/task-machine.ts — no framework imports
export type TaskPhase = "draft" | "queued" | "running" | "passed" | "failed";

export function canEnqueue(current: TaskPhase): boolean {
  return current === "draft";
}

export function nextPhaseAfterGate(qualityOk: boolean): TaskPhase {
  return qualityOk ? "passed" : "failed";
}
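
Because these functions are pure, every input can be checked in a few lines. The sketch below restates the module so it runs on its own; `PHASE_INDEX`, `ALL_PHASES`, and the `check` helper are assumptions standing in for whatever test runner the project uses.

```typescript
type TaskPhase = "draft" | "queued" | "running" | "passed" | "failed";

function canEnqueue(current: TaskPhase): boolean {
  return current === "draft";
}

function nextPhaseAfterGate(qualityOk: boolean): TaskPhase {
  return qualityOk ? "passed" : "failed";
}

// Hypothetical helper: the compiler rejects this object if any phase is missing,
// so the list derived from it stays exhaustive as the union grows.
const PHASE_INDEX: Record<TaskPhase, true> = {
  draft: true,
  queued: true,
  running: true,
  passed: true,
  failed: true,
};
const ALL_PHASES = Object.keys(PHASE_INDEX) as TaskPhase[];

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message);
}

// Only "draft" may be enqueued; every other phase must be rejected.
for (const phase of ALL_PHASES) {
  check(canEnqueue(phase) === (phase === "draft"), `canEnqueue(${phase})`);
}

// The gate has exactly two outcomes.
check(nextPhaseAfterGate(true) === "passed", "gate pass");
check(nextPhaseAfterGate(false) === "failed", "gate fail");
```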

The worker runtime—steps, sequencing, timeouts—lived under pipeline/, depending on domain invariants and calling services when a side effect was required. That separation made it obvious where to add a new gate, versus where to adjust persistence, versus where to tweak an API response DTO.
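
The shape of such a runtime can be sketched as an ordered list of steps, a per-step timeout, and a gate that decides the final phase. All names here (`StepContext`, `PipelineStep`, `withTimeout`, `runPipeline`) and the rule that a thrown or timed-out step fails the task are assumptions for illustration, not the original engine.

```typescript
// Restated from the domain module so the sketch is self-contained.
type TaskPhase = "draft" | "queued" | "running" | "passed" | "failed";

function nextPhaseAfterGate(qualityOk: boolean): TaskPhase {
  return qualityOk ? "passed" : "failed";
}

interface StepContext {
  taskId: string;
  score: number; // accumulated quality signal that the gate inspects
}

interface PipelineStep {
  name: string;
  timeoutMs: number;
  run(ctx: StepContext): Promise<StepContext>;
}

// Rejects if the work does not settle within its time budget.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Runs steps in order, then lets the quality gate decide the final phase.
async function runPipeline(
  steps: PipelineStep[],
  initial: StepContext,
  gate: (ctx: StepContext) => boolean,
): Promise<TaskPhase> {
  let ctx = initial;
  for (const step of steps) {
    try {
      ctx = await withTimeout(step.run(ctx), step.timeoutMs, step.name);
    } catch {
      return "failed"; // assumed policy: a thrown or timed-out step fails the task
    }
  }
  return nextPhaseAfterGate(gate(ctx));
}
```

Adding a gate means passing a different predicate, and adding a step means appending to the array; neither touches persistence or response mapping.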

When the dust settled, the consolidated tree was nine thousand six hundred eighty-five lines smaller than the first landing attempt: same capabilities, fewer contradictions. Deletion is the unsung half of architecture work.

The database split: ship without blocking on migrations

On the second pass, schema and migrations lived in their own PR train, separate from the service-layer and object-storage integration branch. Reviewers could vet application logic against existing tables without coupling every discussion to the question of how a forward-only migration would be undone. The app could ship paths that reused current columns and defer additive schema until a dedicated migration landed. Ordering work that way does not eliminate migrations; it keeps schema debate from freezing feature integration and vice versa.

Lessons

Stacked PRs excel at review and fail at integration when dependencies are circular across layers. If later tests compile only because earlier “unfinished” APIs exist, you are building one branch that is masquerading as many. In that regime, frequent rebases of a shared feature branch beat pretending each PR is merge-eligible alone.

Squashed mega-merges trade history for cleanliness. If the org cannot operate a fifty-thousand-line atomic commit, prefer deployable vertical slices or feature flags instead of one blast.
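
As a rough illustration of the feature-flag alternative, each slice can merge dark and switch on independently. The flag names, the `FLAG_` environment convention, and the route paths below are all hypothetical; a real system would likely read flags from a config service.

```typescript
// Hypothetical flag names; one per independently deployable slice.
type FlagName = "autoSeeder.api" | "autoSeeder.ui";

// Builds a flag reader over an environment-like map, e.g. FLAG_AUTOSEEDER_API=true.
function makeFlags(env: Record<string, string | undefined>) {
  return {
    isEnabled(name: FlagName): boolean {
      const key = "FLAG_" + name.replace(".", "_").toUpperCase();
      return env[key] === "true";
    },
  };
}

// Merged code stays unreachable until its flag flips, so main remains deployable.
function listRoutes(flags: ReturnType<typeof makeFlags>): string[] {
  const routes = ["/health"];
  if (flags.isEnabled("autoSeeder.api")) routes.push("/v1/seeds");
  return routes;
}
```

Turning a flag off then becomes the first response to a problem, with a revert reserved for cases where the dark code itself causes harm.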

Rollback can be productive. Losing the first merge was discouraging in the moment, but it created permission to collapse duplicate directories, enforce provider boundaries, and delete dead scaffolding. The second shape of the codebase was not a sentimental copy-paste; it was a refined design under time pressure.

Big features do not fail only because engineers cannot build them; they fail when integration strategy and operational comfort do not match the size of the diff. Align those early—or be ready to absorb the cost of a revert and treat it as a rewrite opportunity, not just a setback.