PostgreSQL vs MongoDB for SaaS Applications: A Decision Framework

Author David Park

Published on: March 25, 2022

The PostgreSQL-versus-MongoDB debate generates more heat than light. Advocates on both sides argue from first principles (“relational data belongs in a relational database” / “documents are the natural unit of storage”) while ignoring that the right choice depends almost entirely on your specific application’s data patterns, query patterns, and operational constraints. At Harbor Software, we use both: PostgreSQL for our SaaS platform’s core data (users, subscriptions, billing, audit logs) and MongoDB for our ML pipeline’s intermediate data (training runs, experiment configurations, unstructured model metadata). This post explains the decision framework we used and the concrete trade-offs we have observed after six months of production use with both databases.

Article Overview

PostgreSQL vs MongoDB for SaaS Applications: A Decision F…

7 sections · Reading flow

01
Start With Your Data, Not Your Preferences

→

02
PostgreSQL: What It Does Better

→

03
MongoDB: What It Does Better

→

04
Our Decision Framework in Practice

→

05
Operational Reality of Running Two Databases

→

06
Common Mistakes We Have Seen

→

07
Quick Reference: Decision Checklist

HARBOR SOFTWARE · Engineering Insights

Start With Your Data, Not Your Preferences

Before choosing a database, answer these five questions honestly about your data. Write down the answers. Do not rationalize toward the technology you already know or prefer.

What is the shape of your data? Fixed schema with well-defined relationships between entities? Or variable structure where documents differ significantly from each other? A user profile is always the same shape. An ML experiment configuration varies wildly by model type, training strategy, and hyperparameter set.
How do you query it? Complex joins across multiple entities with aggregation, filtering, and sorting? Or mostly single-document reads and writes with occasional collection-level scans? If your most common query touches three tables, you need joins. If it touches one document, you do not.
What are your consistency requirements? Do you need multi-table ACID transactions where partial failure is catastrophic (billing, inventory)? Or is eventual consistency acceptable (analytics, activity feeds)?
How does your data grow? More rows of the same shape (new users, new orders)? Or more fields and nested structures (new experiment types with different parameters)?
Who operates it? Do you have a DBA or someone with deep database expertise? Or is “the developer who is least afraid of databases” your operational team?

If your answers are mostly left-column, PostgreSQL is the stronger choice. If they are mostly right-column, MongoDB deserves serious consideration. Most real applications have mixed answers, which is why many teams—including ours—end up using both.

PostgreSQL: What It Does Better

Relational Integrity

SaaS applications are fundamentally relational. A user belongs to an organization. An organization has a subscription. A subscription references a billing plan. An invoice references a subscription and a payment method. These relationships are not optional decorations—they are business rules. If an invoice references a subscription that does not exist, your billing system is broken. If a user belongs to an organization that was deleted, your access control is broken.

PostgreSQL enforces these relationships at the database level with foreign key constraints, check constraints, and unique constraints:

CREATE TABLE organizations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE subscriptions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE RESTRICT,
    plan_id TEXT NOT NULL,
    status TEXT NOT NULL CHECK (status IN ('active', 'past_due', 'canceled', 'trialing')),
    current_period_end TIMESTAMPTZ NOT NULL,
    UNIQUE(organization_id)  -- One active subscription per org
);

CREATE TABLE invoices (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    subscription_id UUID NOT NULL REFERENCES subscriptions(id),
    amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0),
    status TEXT NOT NULL CHECK (status IN ('draft', 'open', 'paid', 'void')),
    issued_at TIMESTAMPTZ DEFAULT NOW()
);

That REFERENCES clause means PostgreSQL will reject any attempt to create an invoice for a non-existent subscription, or to delete an organization that still has an active subscription (ON DELETE RESTRICT). The CHECK constraints ensure that status fields only contain valid values and amounts cannot be negative. The UNIQUE constraint ensures each organization has at most one subscription. You get all of this protection for free, enforced at the database level regardless of which application, script, admin tool, or data migration job accesses the data.

MongoDB has no equivalent. You can add application-level validation using schema validation rules or application code, but every code path that writes to the database must implement and maintain that validation independently. In our experience across multiple projects, this leads to data inconsistencies within 3–6 months as the team grows, new developers join, internal tools are built, and data migrations are run. The inconsistencies are small individually (a user with a reference to a deleted organization, a subscription with an invalid status string) but they accumulate and cause subtle bugs that are painful to diagnose.

Complex Queries and Reporting

SaaS applications generate complex reporting queries constantly. “Show me monthly recurring revenue by plan, excluding canceled subscriptions, for organizations that signed up in Q1.” “Which customers are approaching their usage limits this month?” “What is the average time from trial signup to paid conversion, segmented by referral source?”

In PostgreSQL, these queries are straightforward SQL:

SELECT 
    s.plan_id,
    COUNT(DISTINCT o.id) AS org_count,
    SUM(p.price_cents) / 100.0 AS mrr_dollars
FROM subscriptions s
JOIN organizations o ON s.organization_id = o.id
JOIN plans p ON s.plan_id = p.id
WHERE s.status = 'active'
  AND o.created_at >= '2022-01-01'
  AND o.created_at < '2022-04-01'
GROUP BY s.plan_id
ORDER BY mrr_dollars DESC;

This query touches three tables, filters on conditions across two of them, aggregates with GROUP BY, and sorts the result. PostgreSQL's query planner optimizes this automatically using indexes, hash joins, and sequential scan avoidance. The execution plan is deterministic and inspectable with EXPLAIN ANALYZE. On our dataset of 2,000 organizations with standard B-tree indexes, this query runs in 3ms.

The equivalent in MongoDB requires either multiple queries with application-level joining (slow, complex, and error-prone), a complex aggregation pipeline that is harder to read and harder to optimize, or denormalizing the data by embedding organization and plan information inside each subscription document (which creates consistency problems when organization names or plan prices change). None of these alternatives is as clean, performant, or maintainable as the SQL query.

ACID Transactions

When a customer upgrades their plan, you need to: update the subscription record, prorate the current billing period, create a new invoice line item, update the organization's usage limits, and log the change in the audit trail. All five operations must succeed or fail together. If the invoice creation succeeds but the subscription update fails, your billing state is inconsistent—the customer has been charged for a plan they are not on.

BEGIN;
  UPDATE subscriptions SET plan_id = 'enterprise', status = 'active' WHERE id = $1;
  INSERT INTO invoice_items (invoice_id, description, amount_cents) 
    VALUES ($2, 'Plan upgrade proration', $3);
  UPDATE organizations SET usage_limit = 100000 WHERE id = $4;
  INSERT INTO audit_log (entity_type, entity_id, action, details) 
    VALUES ('subscription', $1, 'upgrade', $5);
COMMIT;

PostgreSQL transactions guarantee atomicity (all or nothing), consistency (constraints are enforced), isolation (concurrent transactions do not interfere), and durability (committed data survives crashes). MongoDB added multi-document transactions in version 4.0, but they carry measurable performance penalties (10–30% throughput reduction in our benchmarks), require replica set deployment, have stricter size limits (16 MB per transaction), and add operational complexity that PostgreSQL transactions do not.

MongoDB: What It Does Better

Schema Flexibility for Heterogeneous Data

Not all data has a fixed schema. Our ML pipeline stores experiment configurations that vary significantly between model types. A text classification experiment has hyperparameters like learning rate, epochs, and class weights. An image embedding experiment has augmentation strategies, embedding dimensions, and triplet margins. A reinforcement learning experiment has reward functions, discount factors, and episode lengths. These share almost no fields beyond a timestamp and a status.

// Text classification experiment
{
    "_id": "exp_001",
    "type": "text_classification",
    "model": "distilbert-base-uncased",
    "hyperparameters": {
        "learning_rate": 2e-5,
        "epochs": 3,
        "batch_size": 16,
        "warmup_steps": 500,
        "weight_decay": 0.01
    },
    "dataset": {
        "name": "customer_feedback_v3",
        "train_size": 15000,
        "val_size": 2000,
        "class_distribution": {"positive": 0.45, "negative": 0.35, "neutral": 0.20}
    },
    "metrics": {
        "accuracy": 0.923,
        "f1_macro": 0.891,
        "confusion_matrix": [[450, 32], [18, 500]]
    }
}

// Image embedding experiment (completely different structure)
{
    "_id": "exp_002",
    "type": "image_embedding",
    "model": "resnet50",
    "hyperparameters": {
        "learning_rate": 1e-4,
        "epochs": 20,
        "augmentations": ["random_crop", "horizontal_flip", "color_jitter"],
        "embedding_dim": 256,
        "triplet_margin": 0.3
    },
    "dataset": {
        "name": "product_images_v2",
        "train_size": 50000,
        "num_classes": 150
    },
    "metrics": {
        "recall_at_1": 0.78,
        "recall_at_5": 0.92,
        "nmi": 0.65
    }
}

These two documents have completely different hyperparameters, dataset, and metrics structures. In PostgreSQL, you would either need separate tables per experiment type (rigid, hard to add new experiment types, and difficult to query across types) or a JSONB column (which works but loses the type safety, indexing efficiency, and constraint enforcement that motivate using PostgreSQL). The JSONB approach is a reasonable compromise, but if most of your data is in the JSONB column rather than in typed columns, you are essentially using PostgreSQL as a document store—and MongoDB does that better.

MongoDB handles this naturally. Each document can have a different structure, and you can query across all experiments regardless of type: db.experiments.find({"metrics.accuracy": {$gt: 0.9}}) works even though not all documents have an accuracy field—documents without it are simply excluded from results. Adding a new experiment type requires zero schema changes. Just insert a document with the new structure.

Write Performance and Operational Simplicity

MongoDB's write performance advantage is most pronounced for workloads that are primarily inserts of independent documents with no cross-document consistency requirements. Our ML pipeline writes thousands of log entries per training run—each entry is a JSON blob containing the current epoch, loss values, learning rate, GPU utilization, and timestamp. These entries are independent, require no transaction, and have no relationship to other entries. MongoDB handles this with less overhead than PostgreSQL because there is no write-ahead log synchronization between tables, no constraint checking, and no MVCC version tracking.

MongoDB's horizontal scaling is also more mature than PostgreSQL's. When your data outgrows a single server, MongoDB distributes data across shards automatically based on a shard key. You configure the shard key, and MongoDB handles the data distribution, rebalancing, and query routing. PostgreSQL sharding exists through extensions like Citus and through native partitioning, but it requires more manual configuration, careful partition key selection, and operational expertise. For teams without a dedicated DBA, MongoDB's operational simplicity for scaling is a genuine advantage that should not be dismissed.

Our Decision Framework in Practice

After evaluating both databases against our specific requirements, we established a rule that has served us well for six months:

PostgreSQL for: User-facing data, billing, anything that requires referential integrity across entities, anything that requires complex queries or reporting involving multiple data types, anything that participates in a multi-step transaction where partial failure is not acceptable.
MongoDB for: ML experiment tracking, training logs, unstructured pipeline outputs, configuration documents with variable schemas that change frequently, high-throughput write workloads where individual documents are independent and eventual consistency is acceptable.

This is not a universal rule. It is specific to our application's data patterns. A content management system with highly variable content types and simple access patterns might be better served entirely by MongoDB. A financial application where every dollar must be accounted for should almost certainly use PostgreSQL exclusively. The framework is not "PostgreSQL good, MongoDB bad" or vice versa—it is "characterize your data, then match the database to the data pattern."

Operational Reality of Running Two Databases

Running two databases adds complexity that should not be underestimated. You have two backup strategies, two monitoring dashboards, two upgrade processes, two sets of connection pooling to configure, and two failure modes to understand. We mitigate this by using managed services: Amazon RDS for PostgreSQL and MongoDB Atlas for MongoDB. The managed services handle backups, replication, patching, monitoring alerts, and failover. The incremental cost of a second managed database is roughly $50–$80/month at our scale (development tier instances)—a trivial price for using the right tool for each job.

If you are a small team and the thought of operating two databases gives you pause, that is a legitimate concern and you should take it seriously. Use PostgreSQL for everything and store variable-schema data in JSONB columns. PostgreSQL's JSONB support is genuinely excellent—you can create GIN indexes on JSONB fields for fast querying, use SQL alongside JSON path expressions for hybrid queries, and enforce constraints on JSONB fields using check constraints with JSON schema validation. You lose some of MongoDB's ergonomics for schema-flexible data, but you gain operational simplicity that is worth more than ergonomics at small scale. Only add MongoDB when you have a clear workload that PostgreSQL handles poorly and the operational overhead of a second database is justified by the improvement.

Common Mistakes We Have Seen

Mistake 1: Choosing MongoDB because "NoSQL scales better." This was true in 2012. In 2022, PostgreSQL with proper indexing, connection pooling (PgBouncer), and read replicas handles tens of thousands of requests per second. Unless you are operating at a scale where a single PostgreSQL server cannot hold your data (multiple terabytes), scaling is not a reason to choose MongoDB.

Mistake 2: Choosing PostgreSQL and putting everything in JSONB. If 80% of your data is in JSONB columns, you are using PostgreSQL as a document store with extra steps. You get worse query performance than MongoDB (JSONB querying is slower than native MongoDB document querying) and worse ergonomics (SQL + JSON path expressions is more verbose than MongoDB's query language). Either commit to relational modeling or use a document store.

Mistake 3: Denormalizing in MongoDB to avoid joins, then struggling with consistency. If you embed organization data inside every user document to avoid lookups, you now have N copies of the organization name to update when it changes. This is a well-known trade-off, but teams consistently underestimate how often "rarely changes" data actually changes in production.

Conclusion

The PostgreSQL vs. MongoDB decision is not philosophical. It is empirical. Characterize your data patterns, identify your query patterns, estimate your operational capacity, and pick the database (or databases) that best serve those specific needs. For most SaaS applications, PostgreSQL should be the default because SaaS data is fundamentally relational—users belong to organizations, organizations have subscriptions, subscriptions generate invoices. MongoDB earns its place when you have genuinely schema-flexible data with high write throughput and minimal cross-document consistency requirements. The mistake is not picking the "wrong" database—it is picking any database without first understanding what your data actually looks like and how you need to query it.

Quick Reference: Decision Checklist

For teams making this decision right now, here is a distilled checklist. Answer each question honestly, then count which column has more checks:

Do you need foreign key constraints? PostgreSQL. MongoDB cannot enforce referential integrity at the database level.
Do your queries regularly join 3+ tables? PostgreSQL. Multi-collection joins in MongoDB are painful and slow.
Do you need multi-document ACID transactions? PostgreSQL. MongoDB supports them but with performance penalties and operational caveats.
Does your schema vary significantly between documents of the same type? MongoDB. PostgreSQL's JSONB works but is a compromise.
Is your primary workload high-volume inserts of independent records? MongoDB. It is optimized for this pattern.
Do you need to scale writes beyond a single server? MongoDB. Its sharding is mature and well-documented.
Will your team size stay under 10 engineers for the next year? Pick one database, not two. Operational simplicity matters more than theoretical optimization at small scale.
Do you have a DBA or database-experienced engineer on the team? If yes, either database is fine—the expertise will compensate for suboptimal choices. If no, lean toward the database your team has more experience with, regardless of theoretical fit.

If you checked mostly left-column items: start with PostgreSQL. If you checked mostly right-column items: start with MongoDB. If it is a split: start with PostgreSQL (it is the safer default for most applications) and add MongoDB later when you have a specific workload that justifies it. The cost of migrating a few collections from PostgreSQL JSONB to MongoDB is low. The cost of migrating core relational data from MongoDB to PostgreSQL is high. Optimize for reversibility when you are uncertain.