Blog
Technical deep-dives and lessons from building production systems on GCP
Dead Letter Queues for Pipeline Reliability
When Two Cache Layers Serve Stale Data
A UX Polish Sprint: 8 PRs in 48 Hours
GCS Race Conditions: Generation-Fenced Lease Deletion
Multi-Layer Caching for Dashboard Latency
Pipeline Hardening: Timeouts, Traceability, and Graceful Shutdown
Building a GCS-Backed Auto-Seeder Platform
When 57K Lines Get Rolled Back
Invisible Bugs: File Type Bypass and Missing Worker Runs
Designing a Task Lifecycle V1 from Scratch
Building an SFT Recording Pipeline
When Display Labels Break Sorting
Taming a State Machine: Bulk Admin Ops
Decoupling Claim Timeouts from Feature Flags
Bootstrapping Engineering Standards on a Legacy Codebase
Production-Ready Multi-Turn Evaluation
Reverse-Engineering a Codebase Into Architecture Docs
Audit-Safe Admin Tools with Event Sourcing
Multi-Turn Agent Evaluation: Persistent State
Docker Image Hardening for AI Benchmarks
Adding Responses API to an Agent Framework
Dev Starter Kit: AI Coding Tools as Senior Engineers
Building an AI Evaluation Platform on GCP
Event-Driven Architecture on GCP: Pub/Sub
Debugging AlloyDB SSL Connection Drops
Idempotent Cloud Tasks Handlers in Python
Zero-Downtime Embedding Migration
RL Training Arena for Code Agents
Security Audit: 25 Critical Issues
Building the Core AI Evaluation Engine
Automated LLM Scoring Service
Auth Gateway with Admin Dashboard
Notification Service: Event Delivery
RAG Retrieval Service: pgvector and Embeddings
RL Arena Executor and Preprocessor
Workflow, Taxonomy, and Platform Services
Incident Response and CLI Agent Debugging
Systematic Service Hardening
Production Bug Hunting: 33 Issues
CI/CD Quality Gates That Actually Catch Bugs
When Your Logging Framework Crashes Production
Hardening a Headless API for Production
Building an Events Table for Pipeline Observability
Redesigning Bulk Intake: Cascading Dropdowns and Taxonomy Validation
From Fixed Levels to L(n): Building an Extensible Taxonomy System
When RAG Says Duplicate but the LLM Disagrees