Building CI/CD Quality Gates That Actually Catch Bugs
A green checkmark on your CI pipeline means nothing if you're not actually running all your tests. I've seen pipelines that pass every time — and still let broken code reach production. The problem isn't that tests fail; it's that some tests never run at all.
The Invisible Tests
The test infrastructure was using static test group lists. When new test files were added to the codebase, they weren't included in CI runs. The pipeline would pass, the coverage report would look fine, and nobody would notice that entire test modules were sitting in the repository, never executed.
I built dynamic auto-discovery that scans the test directory at runtime and includes every test file automatically. No more silent test gaps. The implementation:
import pathlib
def discover_test_groups(test_dir: str = "tests") -> list[str]:
test_path = pathlib.Path(test_dir)
return sorted([
str(f.relative_to(test_path))
for f in test_path.rglob("test_*.py")
])
Simple, but it changed everything. Every test_*.py file gets picked up. Add a new test file tomorrow — it runs in CI without touching any configuration.
Test Coverage Enforcement
Coverage reports are useful for visibility. Coverage gates are what actually prevent regressions. I added coverage enforcement to PR checks: every PR now shows exactly which files were touched and their coverage. PRs that decrease coverage below the threshold get blocked. Not just a report — an actual gate.
Fixing What The Tests Found
Once the invisible tests ran, they exposed real bugs. A NoneType crash in question guidelines processing — an optional field accessed without a null check. A timezone-naive vs timezone-aware datetime comparison crash in the task detail endpoint — Python's datetime.utcnow() compared against database TIMESTAMPTZ. And four failing tests that had been silently skipped.
The timezone fix is worth calling out. Naive datetimes are a common footgun:
# Before: naive datetime (no timezone info)
from datetime import datetime
created = datetime.utcnow()
# After: timezone-aware
from datetime import datetime, timezone
created = datetime.now(timezone.utc)
When your database stores TIMESTAMPTZ, your Python code needs to speak the same language.
The CI Pipeline
The GitHub Actions workflow runs on every PR: lint, type check, security scan, test suite with coverage, and a coverage diff report posted as a PR comment. Fail-fast on any quality gate. If any step fails, the pipeline stops. No merging broken code.
What I'd Do Differently
Start with dynamic test discovery from day one. Static test lists are a maintenance burden that silently rots. Coverage enforcement should be a week-one decision, not a month-six retrofit. The earlier you treat quality gates as infrastructure, the less technical debt you accumulate.
Quality gates are infrastructure, not overhead. Build them like you'd build any other critical system.