
Building for Scale: Architecture Decisions That Compound

In the past 5 years, Harbor Software has built systems that serve hundreds of requests per minute and systems that serve hundreds of thousands. The difference in architecture between these two scales is not what most people expect. It is not about using Kubernetes instead of a single server. It is not about switching from PostgreSQL to a distributed database. It is about a handful of early architectural decisions that either compound in your favor as load increases, or compound against you—decisions that cost almost nothing to make early but are enormously expensive to retrofit later. This post covers the six decisions we have seen matter most, with specific examples from systems we have built and operated.

HARBOR SOFTWARE · Engineering Insights

Decision 1: Separate Read and Write Paths Early

The single most impactful architectural decision for scaling a data-intensive application is separating read and write paths. Not because you need it at 100 requests per minute—you do not—but because the cost of retrofitting it later is enormous (finding and reclassifying every database query in a mature codebase), and the cost of implementing it early is minimal (a thin abstraction layer).

In practice, this means:

  • Writes go to a primary database (PostgreSQL, MySQL, whatever your operational database is)
  • Reads that require strong consistency (“show me my account balance after I just transferred money”) read from the primary
  • Reads that tolerate eventual consistency (“show me the dashboard with aggregated metrics”, “show me the product catalog”, “show me the activity feed”) read from a replica, a materialized view, or a dedicated analytics store

import { Pool } from "pg"; // node-postgres connection pools

// Early implementation: just a thin abstraction layer
class DataAccess {
  constructor(
    private readonly primary: Pool,    // Write + consistent reads
    private readonly replica: Pool,    // Eventual-consistency reads
  ) {}

  async write(query: string, params: any[]) {
    return this.primary.query(query, params);
  }

  async readConsistent(query: string, params: any[]) {
    return this.primary.query(query, params);
  }

  async readEventual(query: string, params: any[]) {
    return this.replica.query(query, params);
  }
}

At low scale, primary and replica can point to the same database. The abstraction costs nothing—zero performance overhead, zero operational complexity. At high scale, you point replica to a read replica (or multiple read replicas behind a load balancer), and all your dashboard queries, search queries, and reporting queries move off the primary database without changing a single line of application code. We have done this migration on three systems, and each time it took less than a day because the abstraction was already in place. Teams that did not have this abstraction spent 2-4 weeks finding and reclassifying every database query in their codebase, because you have to trace every query to understand whether it needs strong consistency (reads its own writes within the same user session) or can tolerate eventual consistency (data can be up to a few seconds stale).

The compounding effect: every new feature you build uses the same abstraction, so you never accumulate queries that are misclassified. By the time you need the separation, 100% of your queries are already correctly routed. Compare this with the retrofit scenario, where you need to audit every query in a codebase that has been growing for 2 years without the abstraction.

Decision 2: Use Idempotency Keys for All Mutations

An idempotency key ensures that retrying a request produces the same result as the original request. This sounds like a nice-to-have for reliability, but it is actually a prerequisite for scaling because scaling introduces retries at every layer of the stack: load balancers retry on 502s, message queues retry on processing failures, clients retry on timeouts, background job systems retry on worker crashes, and distributed transaction coordinators retry on partial failures.

import json

from fastapi import Header
from fastapi.responses import JSONResponse

# Every mutation endpoint accepts an idempotency key
@app.post("/api/orders")
async def create_order(
    order: OrderRequest,
    idempotency_key: str = Header(alias="Idempotency-Key"),
):
    # Check if we already processed this request
    existing = await db.fetch_one(
        "SELECT result FROM idempotency_store WHERE key = $1",
        idempotency_key,
    )
    if existing:
        return JSONResponse(content=json.loads(existing["result"]), status_code=200)

    # Process the request
    result = await process_order(order)

    # Store the result for future retries; the unique index on key makes this
    # safe even if two concurrent retries both passed the existence check
    await db.execute(
        "INSERT INTO idempotency_store (key, result, created_at) "
        "VALUES ($1, $2, NOW()) ON CONFLICT (key) DO NOTHING",
        idempotency_key, json.dumps(result),
    )
    return result

Without idempotency keys, retries cause duplicate actions: duplicate charges, duplicate emails, duplicate database records, duplicate webhook deliveries. Debugging duplicate records in production—figuring out which one is canonical, cleaning up the duplicates, understanding how they happened, and explaining to affected users—is one of the most time-consuming operational tasks we encounter. One client discovered 340 duplicate charges over a 3-month period caused by a mobile app retrying payment requests on timeout. Each duplicate required manual investigation and refund processing. Idempotency keys prevent the entire category of problem.

We store idempotency keys for 72 hours (long enough to cover any reasonable retry window) and clean them up with a daily cron job. The storage overhead is trivial—a few megabytes per day for most systems. We use a unique index on the key column, which means concurrent retries of the same request are serialized at the database level, preventing race conditions where two concurrent retries both pass the existence check.
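The unique-index serialization described above can also be implemented as an insert-first pattern: claim the key before doing the work, so exactly one of any set of concurrent retries performs the mutation. Here is a minimal sketch of that pattern using sqlite3 in place of PostgreSQL (the table name matches the example above; `process_once` and the charge function are illustrative):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_store ("
    "  key TEXT PRIMARY KEY,"  # the unique index that serializes retries
    "  result TEXT"
    ")"
)

def process_once(key, do_work):
    # Claim the key first; INSERT OR IGNORE succeeds for exactly one caller
    # (PostgreSQL equivalent: INSERT ... ON CONFLICT (key) DO NOTHING).
    cur = conn.execute(
        "INSERT OR IGNORE INTO idempotency_store (key, result) VALUES (?, NULL)",
        (key,),
    )
    if cur.rowcount == 1:
        # We won the claim: do the work and store the result for retries.
        result = do_work()
        conn.execute(
            "UPDATE idempotency_store SET result = ? WHERE key = ?",
            (json.dumps(result), key),
        )
        return result
    # A previous request already claimed this key: return its stored result.
    row = conn.execute(
        "SELECT result FROM idempotency_store WHERE key = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row[0] is not None else None

calls = []
def charge():
    calls.append(1)
    return {"order_id": 42}

first = process_once("req-1", charge)
second = process_once("req-1", charge)  # retry: returns the stored result
```

The retry performs no second charge: `calls` has length 1 and both invocations return the same result.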

Decision 3: Make Background Jobs the Default, Not the Exception

The synchronous request-response model is appropriate for reads and for mutations that must complete before the user can proceed (payment authorization, authentication). For everything else—sending emails, generating reports, updating search indices, processing file uploads, syncing with third-party systems, generating thumbnails, computing analytics—background jobs are the correct default.

The compounding effect of this decision is response time stability. A system where every request completes in 50-200ms regardless of what work it triggers is a system that scales predictably. A system where some requests take 5 seconds because they trigger synchronous email sends or PDF generation is a system where response times degrade unpredictably under load, because those slow operations consume worker threads (or event loop time) that other requests need. Under sustained load, slow synchronous operations create a cascading degradation: thread pool exhausts, request queue grows, timeouts increase, clients retry, load increases, more threads are consumed by slow operations. Background jobs break this cascade by decoupling the user-facing response from the backend processing.

# Pattern: accept, enqueue, respond
@app.post("/api/reports")
async def request_report(params: ReportParams):
    report_id = generate_id()
    await job_queue.enqueue(
        "generate_report",
        payload={"report_id": report_id, "params": params.dict()},
        priority="normal",
        retry_policy=RetryPolicy(max_attempts=3, backoff="exponential"),
    )
    return {"report_id": report_id, "status": "processing"}

# Client polls or receives webhook when complete
@app.get("/api/reports/{report_id}")
async def get_report(report_id: str):
    report = await db.get_report(report_id)
    if report.status == "completed":
        return {"status": "completed", "download_url": report.url}
    elif report.status == "failed":
        return {"status": "failed", "error": report.error_message}
    return {"status": report.status, "progress": report.progress_pct}

We use BullMQ (Redis-backed) for Node.js systems and Celery (Redis-backed) for Python systems. Both provide retry policies with exponential backoff, dead-letter queues for failed jobs, priority levels, job progress tracking, and concurrency control. The infrastructure cost is a Redis instance—typically $15-50/month on managed hosting. The alternative—scaling up API servers to handle slow synchronous operations with more workers—costs 5-10x more in compute and produces a worse user experience because users wait for slow operations to complete instead of getting an immediate acknowledgment.
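The retry policy in the enqueue call above, exponential backoff with a capped attempt count, reduces to a small function. This is a sketch with illustrative defaults, not BullMQ's or Celery's exact schedule:

```python
import random

def backoff_delays(max_attempts=3, base_seconds=2.0, cap_seconds=60.0, jitter=False):
    """Delay before each retry attempt: base * 2^attempt, capped, optionally jittered."""
    delays = []
    for attempt in range(max_attempts):
        delay = min(base_seconds * (2 ** attempt), cap_seconds)
        if jitter:
            # Full jitter spreads retries out so failed jobs do not retry in lockstep.
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays

# Three attempts with base 2s: retry after 2s, 4s, then 8s.
print(backoff_delays())  # → [2.0, 4.0, 8.0]
```

The cap matters in practice: without it, a job that fails for an hour ends up with multi-hour gaps between attempts.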

Decision 4: Structured Logging from Day One

Unstructured logs (plain text) are adequate when your system runs on one server and you can SSH in and grep the log files. They fail completely when your system runs on multiple servers, when logs are shipped to a centralized system (Datadog, CloudWatch, Grafana Loki), or when you need to correlate events across services. By the time you need structured logging, converting existing unstructured logs is a painful, months-long migration that touches every file in the codebase—and during the migration, you have a mix of structured and unstructured logs that makes querying even harder.

// Bad: unstructured log
console.log(`User ${userId} created order ${orderId} for $${amount}`);

// Good: structured log
logger.info("order_created", {
  user_id: userId,
  order_id: orderId,
  amount_cents: amountCents,
  currency: "USD",
  payment_method: paymentMethod,
  items_count: cart.items.length,
  latency_ms: Date.now() - startTime,
});

Structured logs are queryable in ways that unstructured logs are not. You can ask: “Show me all orders over $500 in the last hour” or “What is the p95 latency for order creation, broken down by payment method?” or “How many orders failed with payment_method=stripe vs payment_method=paypal?” With unstructured logs, answering these questions requires writing regex patterns and hoping the log format has not changed across versions of the application. With structured logs, it is a simple filter or aggregation query in your log management tool.
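Once each log line is a record, those questions reduce to filters and aggregations. A sketch against an in-memory list of order_created events (field names follow the logging example above; the p95 uses the nearest-rank method):

```python
import math

def orders_over(events, min_cents):
    """'Show me all orders over $X': a filter on a structured field."""
    return [e for e in events if e["amount_cents"] > min_cents]

def p95_latency_by(events, field):
    """p95 latency broken down by an arbitrary field, e.g. payment_method."""
    groups = {}
    for e in events:
        groups.setdefault(e[field], []).append(e["latency_ms"])
    out = {}
    for key, latencies in groups.items():
        latencies.sort()
        # Nearest-rank percentile: the value at the ceil(0.95 * n)-th position.
        idx = math.ceil(0.95 * len(latencies)) - 1
        out[key] = latencies[idx]
    return out

events = [
    {"amount_cents": 60000, "payment_method": "stripe", "latency_ms": 120},
    {"amount_cents": 1500,  "payment_method": "stripe", "latency_ms": 95},
    {"amount_cents": 52000, "payment_method": "paypal", "latency_ms": 210},
]
print(len(orders_over(events, 50000)))  # → 2
print(p95_latency_by(events, "payment_method"))
```

Your log management tool runs the same logic at scale; the point is that the query is a filter on named fields, not a regex against free text.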

We use Pino for Node.js (it is the fastest structured logger in the Node ecosystem and outputs JSON natively, with zero overhead for fields that are not included) and structlog for Python. Logs go to stdout and are collected by the platform (Vercel Functions capture stdout automatically, AWS CloudWatch captures container stdout, Datadog’s agent captures stdout from any process). No application-level log shipping code, no file rotation, no log management daemons. The total setup time for structured logging is under an hour—import the library, replace console.log with logger.info, add context fields. The operational benefit over the lifetime of the system is measured in weeks of saved debugging time.

Decision 5: Feature Flags for Everything That Touches Users

Feature flags decouple deployment from release. You deploy code that is behind a flag (new code is in production but not active for any users), verify it works in production with internal users or a small percentage of external users, then gradually roll it out. If something goes wrong at any point in the rollout, you turn off the flag—no deployment, no rollback, no downtime, no risk of the rollback introducing a different bug.

// Feature flag check with graceful degradation
const newCheckoutFlow = await flags.isEnabled(
  "new-checkout-flow",
  { userId: user.id, plan: user.plan, region: user.region }
);

if (newCheckoutFlow) {
  return renderNewCheckout(cart);
} else {
  return renderLegacyCheckout(cart);
}

The compounding effect is deployment confidence. Teams with feature flags deploy more frequently because the blast radius of any individual deployment is controllable—a deployment that introduces a bug in the new checkout flow affects only the 5% of users who have the flag enabled, not the entire user base. Teams without feature flags deploy less frequently because every deployment is an all-or-nothing bet where a bug affects 100% of users simultaneously. Frequent deployment leads to smaller changesets, which leads to easier debugging (fewer changes to bisect), which leads to faster incident resolution, which leads to more deployment confidence. It is a virtuous cycle, and feature flags are the mechanism that starts it.

We use LaunchDarkly for client-facing products (their targeting rules, percentage rollouts, and kill switches are worth the subscription cost) and a simple database-backed flag system for internal tools. The internal flag system is approximately 200 lines of code (a flags table, a lookup function with caching, and an admin UI) and has been running without issues for 3 years. The important thing is not the tool—it is the practice of wrapping user-facing changes in flags so that deployment and release are separate decisions.
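The percentage rollout mentioned above is usually implemented by hashing the user ID into a stable bucket, so a given user stays in or out of the rollout across requests. A sketch of that bucketing (the function name and flag name are illustrative, not LaunchDarkly's API):

```python
import hashlib

def in_rollout(flag_name, user_id, rollout_pct):
    """Deterministically place user_id into a 0-99 bucket for this flag.

    Hashing flag_name together with user_id gives each flag an independent
    bucketing, so the same users are not always the early cohort.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct

# The decision is stable: the same user sees the same answer on every request.
assert in_rollout("new-checkout-flow", "user-123", 50) == \
       in_rollout("new-checkout-flow", "user-123", 50)

# Raising the percentage only adds users; nobody enabled at 20% drops out at 40%.
enabled_20 = {u for u in range(1000) if in_rollout("new-checkout-flow", str(u), 20)}
enabled_40 = {u for u in range(1000) if in_rollout("new-checkout-flow", str(u), 40)}
print(enabled_20 <= enabled_40)  # → True
```

The monotonic-rollout property falls out of the bucket comparison: a bucket below 20 is necessarily below 40, so ramping up never flips a user back to the legacy path.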

Decision 6: API Versioning from the First External Consumer

The moment an external system depends on your API—a mobile app, a partner integration, a third-party webhook consumer, a Chrome extension—you have an implicit contract. Breaking that contract (changing field names, removing endpoints, altering response shapes, changing pagination behavior) causes production failures for your consumers. API versioning makes the contract explicit and gives you a path to evolve the API without breaking existing consumers.

// URL-based versioning (our preference for simplicity)
GET /api/v1/products      // Returns { products: [...] }
GET /api/v2/products      // Returns { data: [...], pagination: {...} }

// v1 continues to work indefinitely for existing consumers
// New consumers are directed to v2
// v1 is deprecated (not removed) after all consumers migrate

We prefer URL-based versioning over header-based versioning because it is visible (you can see which version a consumer is using in access logs without parsing headers), cacheable (CDNs cache different URL paths separately; header-based versioning requires Vary headers that many CDNs handle poorly), and debuggable (when someone reports an API issue, the version is right there in the URL). With header-based versioning, you have to ask “which version were you calling?” and hope they know—or dig through request logs to find the header value.

The compounding effect: every API change you make is additive rather than breaking. You add v3 alongside v2, rather than modifying v2 and hoping nothing breaks. This means you can evolve your API rapidly without coordinating release schedules with consumers, which becomes critical when you have more than a handful of integrations. One system we maintain has 4 active API versions (v1 through v4) with 23 external consumers. Because each version is a separate, stable contract, we can ship improvements to v4 daily without risking v1-v3 consumers. The cost of maintaining multiple versions is real (each version has its own route handlers and response serializers), but it is a fraction of the cost of coordinating breaking changes across 23 consumers.
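In practice, keeping each version a separate, stable contract mostly comes down to per-version response serializers over the same internal data. A sketch matching the v1/v2 shapes shown above (function names and pagination fields are illustrative):

```python
def serialize_v1(products):
    # v1 contract: bare list under "products", no pagination envelope.
    return {"products": products}

def serialize_v2(products, page=1, per_page=20, total=None):
    # v2 contract: data + pagination envelope; v1 callers are unaffected.
    return {
        "data": products,
        "pagination": {
            "page": page,
            "per_page": per_page,
            "total": total if total is not None else len(products),
        },
    }

products = [{"id": 1, "name": "Widget"}]
print(serialize_v1(products))  # → {'products': [{'id': 1, 'name': 'Widget'}]}
print(serialize_v2(products)["pagination"]["total"])  # → 1
```

Because both serializers read from the same internal representation, a schema change happens once in the data layer and each version's contract stays frozen at its boundary.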

The Meta-Lesson: Decisions Compound

None of these six decisions are individually expensive. Read/write separation is a thin abstraction layer—20 lines of code. Idempotency keys are a database table and a middleware—50 lines of code. Background jobs are a queue and a worker—a library import and a configuration change. Structured logging is replacing console.log with logger.info—a find-and-replace plus adding context fields. Feature flags are a boolean check—an if-else statement. API versioning is a URL prefix—a routing change.

The cost of implementing all six on a new project is approximately 2-3 days of engineering time. The cost of retrofitting them on a mature system with 100,000+ lines of code, 50+ database tables, and 200+ API endpoints ranges from 2 weeks to 6 months, depending on the size and complexity of the codebase and how deeply the anti-patterns have been baked in. The cost of not having them manifests as scaling incidents, duplicate data, slow debugging cycles, risky deployments, and broken integrations—each of which consumes days or weeks of engineering time that could have been spent building product features.

Architecture decisions compound. The good ones make every subsequent decision easier. The bad ones (or the deferred ones) make every subsequent decision harder. Invest in the compounding decisions early—they cost almost nothing today and pay dividends for the entire lifetime of the system.

A Practical Checklist for New Projects

We use the following checklist for every new project at Harbor Software. It takes 2-3 days to implement all items, and we treat it as non-negotiable infrastructure that goes in before the first feature:

## Project Architecture Checklist

### Data Access Layer
- [ ] Read/write separation abstraction in place
- [ ] Primary and replica connections configured (can point to same DB initially)
- [ ] All queries classified as consistent or eventual
- [ ] Connection pooling configured with appropriate limits

### Reliability
- [ ] Idempotency key middleware for all mutation endpoints
- [ ] Idempotency store table created with unique index on key
- [ ] Cleanup cron job for expired idempotency records (72hr TTL)
- [ ] All external-facing mutation endpoints accept Idempotency-Key header

### Background Processing
- [ ] Job queue infrastructure provisioned (Redis + BullMQ/Celery)
- [ ] At least one worker process running
- [ ] Dead-letter queue configured for failed jobs
- [ ] Job progress tracking for long-running tasks
- [ ] All non-critical async work routed through queue (emails, reports, syncs)

### Observability
- [ ] Structured logging library imported and configured
- [ ] Request ID middleware generating and propagating trace IDs
- [ ] All log statements use structured format with context fields
- [ ] Health check endpoint (/health) returning service status
- [ ] Readiness endpoint (/ready) confirming dependencies are available

### Release Management
- [ ] Feature flag system in place (even a simple DB-backed one)
- [ ] All user-facing features wrapped in flags
- [ ] Flag admin UI or CLI for toggling flags
- [ ] Percentage rollout capability for gradual releases

### API Contracts
- [ ] URL-based versioning (e.g., /api/v1/) for all external endpoints
- [ ] Response schemas documented (OpenAPI or equivalent)
- [ ] Breaking changes require new version, not modification of existing

Every item on this checklist is something we have had to retrofit on a mature system at least once, and in every case the retrofit was 10-50x more expensive than the initial implementation would have been. The checklist exists specifically because we tired of learning this lesson repeatedly. If you adopt nothing else from this post, adopt the checklist. Print it out, tape it to your monitor, and do not ship your first feature until every box is checked. Your future self—the one debugging a production incident at 2 AM or trying to scale the system to 10x current load—will thank you.
