Milvus vs Pinecone vs pgvector: Choosing a Vector Database

The Vector Database Decision

If you’re building anything with embeddings — semantic search, RAG pipelines, recommendation engines, image similarity — you need somewhere to store and query vectors. The market has exploded with options, and the decision isn’t obvious. We’ve deployed all three of the databases in this comparison (Milvus, Pinecone, pgvector) across different projects at Harbor Software, and the right choice depends on factors that benchmarks alone won’t tell you.

HARBOR SOFTWARE · Engineering Insights

This isn’t a feature-matrix comparison you can find in any vendor’s marketing material. This is a practical evaluation based on building production systems: what works, what breaks, and what matters when you’re past the prototype stage.

What Vector Databases Actually Do

Before comparing, let’s be precise about the problem. A vector database stores high-dimensional vectors (typically 256-4096 dimensions, produced by embedding models) and supports approximate nearest neighbor (ANN) search — finding the vectors most similar to a query vector.

The “approximate” part is key. Exact nearest neighbor search in high dimensions is computationally prohibitive at scale. Every vector database uses some form of ANN index to trade a small amount of accuracy for dramatically better performance. The indexes they use (HNSW, IVF, DiskANN variants) and how they manage them is where the real differences emerge.
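
To make the trade-off concrete, here is a minimal sketch of the exact search that ANN indexes approximate. The function names and the toy data are illustrative assumptions, not part of any of the three databases. Every query scores every stored vector, so cost grows linearly with collection size.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Brute-force search: score every stored vector, keep the k best.

    `vectors` maps an id to its vector. The cost is O(n * d) per query,
    which is the work an ANN index is built to avoid.
    """
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional data; real embeddings have hundreds or thousands of dimensions.
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
top_two = exact_top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
```

An ANN index returns most of the same ids while visiting only a small fraction of the stored vectors.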

pgvector: The Pragmatic Default

pgvector is a PostgreSQL extension that adds vector storage and ANN search to your existing Postgres database. No new infrastructure. No new operational burden. No new vendor.

When to Use It

pgvector is the right choice when you have fewer than 5-10 million vectors, your query latency requirements are in the 10-100ms range (not sub-millisecond), and you’re already running PostgreSQL. It’s also the right choice when you’re in the early stages of a product and don’t yet know if vector search will be a core feature or an experiment that gets cut.

We’ve used pgvector for RAG pipelines serving up to 2 million document chunks with query latencies under 50ms. For that scale, adding a dedicated vector database would have been over-engineering.

Setup and Usage

-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with a vector column
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536),  -- OpenAI ada-002 dimensionality
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create an HNSW index (recommended for most use cases)
CREATE INDEX idx_documents_embedding ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);

-- Query: find 10 most similar documents
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE metadata->>'category' = 'engineering'  -- filtered search
ORDER BY embedding <=> $1
LIMIT 10;

The HNSW index parameters matter. m controls the number of bi-directional links per node (higher = better recall, more memory). ef_construction controls the quality of the index build (higher = better recall, slower builds). pgvector’s defaults (m = 16, ef_construction = 64) are conservative; for production workloads we typically keep m = 16 and raise ef_construction to 200.

At query time, you can tune ef_search for the recall/speed trade-off:

-- Higher ef_search = better recall, slower queries
SET hnsw.ef_search = 100;  -- default is 40

SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
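
Picking an ef_search value is easier with a recall number. A rough sketch, assuming you can get the exact top-k ids for a sample of queries (one option in Postgres is to run the same query with index scans disabled for the session) and the ids the HNSW index returns at a given ef_search. The helper name and the sample ids are hypothetical.

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true top-k neighbours that the ANN query also returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


# Hypothetical ids for one query: exact scan vs. HNSW at some ef_search setting.
exact = ["d1", "d2", "d3", "d4", "d5"]
approx = ["d1", "d2", "d4", "d5", "d9"]
recall = recall_at_k(exact, approx)  # 4 of the 5 true neighbours were found
```

Averaging this over a few hundred representative queries at several ef_search values shows where extra latency stops improving recall.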

Strengths

  • Zero new infrastructure: If you’re on PostgreSQL, you add an extension and you’re done. No new services to deploy, monitor, or pay for.
  • Transactional consistency: Vector data participates in normal PostgreSQL transactions. Insert a document and its embedding atomically. No eventual consistency headaches.
  • Combined queries: You can filter by metadata, join with other tables, and do vector search in a single SQL query. This is enormously powerful for filtered search where you need vectors that match specific criteria (e.g., “documents from the last 30 days similar to this query”).
  • Familiar tooling: pg_dump, pg_restore, EXPLAIN ANALYZE, all your existing monitoring — it all works.

Weaknesses

  • Scale ceiling: Performance degrades noticeably above 5-10 million vectors on a single instance. You can shard, but at that point you’re building distributed systems on top of Postgres, which defeats the simplicity advantage.
  • Memory consumption: HNSW indexes need to fit in memory to stay fast. The raw data for 1 million 1536-dimension float32 vectors is roughly 6GB (1,000,000 × 1536 × 4 bytes), before the graph links the index stores on top. Plan your instance sizing accordingly.
  • Index build time: Building an HNSW index on millions of vectors can take hours. During the build, queries fall back to sequential scan. Plan for this during initial data loads.
  • No built-in sharding: If you outgrow one instance, you’re on your own for data distribution.
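
The memory figure above is simple arithmetic, sketched here so you can plug in your own numbers. The helper names are illustrative. The result covers the raw float32 vector data only, so treat it as a lower bound: graph links and any Postgres overhead come on top.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors, dimensions):
    """Bytes needed to hold the vectors themselves as float32 values."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


# 1 million 1536-dimension vectors, the example from the list above.
pgvector_example_gb = to_gb(raw_vector_bytes(1_000_000, 1536))  # about 6.1 GB
```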

Pinecone: The Managed Path

Pinecone is a fully managed vector database service. You get an API endpoint, you upsert vectors, you query. No infrastructure to manage, no indexes to tune, no scaling to configure.

When to Use It

Pinecone is the right choice when your team doesn’t have (or want) vector database operational expertise, you need to scale beyond what a single PostgreSQL instance can handle, and you’re willing to pay for managed infrastructure. It’s also a strong choice when your vector search workload is decoupled from your transactional database — when you don’t need to join vector results with relational data in a single query.

Architecture and Usage

import { Pinecone } from '@pinecone-database/pinecone';

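// Fail fast if the key is missing; process.env values are string | undefined
const apiKey = process.env.PINECONE_API_KEY;
if (!apiKey) throw new Error('PINECONE_API_KEY is not set');
const pc = new Pinecone({ apiKey });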
const index = pc.index('documents');

// Upsert vectors
await index.namespace('engineering').upsert([
  {
    id: 'doc-001',
    values: embedding,  // float32 array, length 1536
    metadata: {
      title: 'System Design Overview',
      category: 'engineering',
      publishedAt: '2025-01-15',
      wordCount: 2400
    }
  }
]);

// Query with metadata filtering
const results = await index.namespace('engineering').query({
  vector: queryEmbedding,
  topK: 10,
  filter: {
    category: { $eq: 'engineering' },
    wordCount: { $gte: 1000 }
  },
  includeMetadata: true
});

// results.matches: [{ id, score, metadata }, ...]

Pinecone’s namespace concept is useful for logical separation within an index. We use namespaces to separate data by tenant in multi-tenant RAG applications, or by content type (documents, images, products). Every namespace shares the index’s fixed dimension, so content types embedded by models with different output dimensions need separate indexes rather than separate namespaces.

Strengths

  • True zero-ops: No servers, no scaling knobs, no index management. Pinecone handles replication, sharding, and index optimization automatically.
  • Consistent performance at scale: We’ve tested with 50 million vectors and query latencies stayed under 50ms at p99. Pinecone’s architecture is designed for this.
  • Serverless tier: The serverless offering charges per query and per GB stored, with no minimum. For low-traffic applications or development environments, costs can be very low.
  • Metadata filtering: Filter support is solid — equality, range, set membership. Filters are applied before the ANN search, so they don’t degrade recall.

Weaknesses

  • Vendor lock-in: Your data is in Pinecone’s proprietary system. There’s no standard format to export and reimport into another vector database. Plan your abstraction layer accordingly.
  • No transactional guarantees: Upserts are eventually consistent. If you upsert a vector and immediately query for it, you might not find it. The propagation delay is typically under a second, but it exists.
  • Metadata limitations: Metadata values have size limits. You can’t store full document text in metadata — just short fields for filtering. You’ll need a separate data store for the actual content.
  • Cost at scale: The pod-based tier can get expensive for large datasets with high query volumes. We’ve seen monthly bills exceed $2,000 for a moderately sized production workload that would cost $200/month on self-hosted alternatives.
  • No complex queries: You can filter metadata and do vector search. You cannot join, aggregate, or run SQL-like queries. Every complex operation requires round-trips between Pinecone and your application database.
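
If your code needs to read a vector right after writing it, the eventual-consistency window has to be handled in the application. One possible approach is a bounded poll with exponential backoff. This sketch is deliberately client-agnostic: `is_visible` is a hypothetical callable you supply (for instance, a closure that looks up the freshly upserted id), and the timing defaults are assumptions to tune.

```python
import time


def wait_until_visible(is_visible, timeout_s=5.0, initial_delay_s=0.05,
                       max_delay_s=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll `is_visible()` with exponential backoff until it returns True.

    Returns True as soon as the check passes, or False if the next wait
    would run past `timeout_s`. `sleep` and `clock` are injectable so the
    helper can be unit-tested without real delays.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if is_visible():
            return True
        if clock() + delay > deadline:
            return False
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

In most pipelines it is simpler to design around the delay (for example, by not querying immediately after ingestion) and keep a helper like this for tests and backfills.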

Milvus: The Scale-First Engine

Milvus is an open-source, distributed vector database built from the ground up for large-scale vector search. It separates storage and compute, supports multiple index types, and can handle billions of vectors across a cluster.

When to Use It

Milvus is the right choice when you have (or expect to have) hundreds of millions to billions of vectors, you need fine-grained control over index types and search parameters, and you have the operational capacity to run a distributed system. It’s also the right choice when you can’t use a managed service due to compliance, data residency, or cost constraints.

Architecture and Deployment

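Milvus has three deployment modes. Milvus Lite runs embedded in your application process, which is useful for development and small datasets. Milvus Standalone runs all components in a single server process, backed by etcd and object storage. Milvus Distributed runs as a cluster of microservices (proxy, data node, index node, query node) coordinated by etcd and backed by object storage (S3/MinIO) for persistence.

A simplified standalone deployment under Docker Compose looks like this; a distributed cluster splits the Milvus service into separately scaled node types: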

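# docker-compose.yml (simplified standalone setup)
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    # etcd requires --advertise-client-urls whenever --listen-client-urls is set
    command: etcd --listen-client-urls=http://0.0.0.0:2379 --advertise-client-urls=http://etcd:2379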

  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"

  milvus:
    image: milvusdb/milvus:v2.4-latest
    command: milvus run standalone
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    ports:
      - "19530:19530"  # gRPC
      - "9091:9091"    # Metrics
    depends_on:
      - etcd
      - minio

And the application code:

from pymilvus import (
    connections, Collection, FieldSchema,
    CollectionSchema, DataType, utility
)

# Connect
connections.connect(host='localhost', port='19530')

# Define schema
fields = [
    FieldSchema(name='id', dtype=DataType.VARCHAR, max_length=64, is_primary=True),
    FieldSchema(name='title', dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name='category', dtype=DataType.VARCHAR, max_length=128),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=1536),
]
schema = CollectionSchema(fields, description='Document embeddings')

# Create collection
collection = Collection('documents', schema)

# Create HNSW index
index_params = {
    'index_type': 'HNSW',
    'metric_type': 'COSINE',
    'params': {
        'M': 16,
        'efConstruction': 256
    }
}
collection.create_index('embedding', index_params)

# Load into memory for querying
collection.load()

# Search
results = collection.search(
    data=[query_embedding],
    anns_field='embedding',
    param={'metric_type': 'COSINE', 'params': {'ef': 128}},
    limit=10,
    expr='category == "engineering"',
    output_fields=['title', 'category']
)

Strengths

  • Scale: Milvus handles billions of vectors across distributed nodes. If you’re building a search engine for a large corpus, this is where Milvus shines. The architecture separates storage (S3/MinIO) from compute (query nodes), so you can scale them independently.
  • Index variety: HNSW, IVF_FLAT, IVF_SQ8, IVF_PQ, DiskANN, GPU indexes. Different workloads benefit from different index types. Milvus lets you choose and switch without re-architecting.
  • Open source: No vendor lock-in. You can deploy on any cloud, on-premises, or air-gapped. Zilliz Cloud offers a managed version if you want the Milvus engine without the ops.
  • Hybrid search: Milvus supports combining vector similarity with scalar filtering, full-text search, and even multi-vector queries (searching across multiple vector fields simultaneously).
  • GPU acceleration: For workloads requiring extremely low latency or extremely high throughput, Milvus supports GPU-accelerated indexes (GPU_IVF_FLAT, GPU_CAGRA). We’ve used this for real-time image similarity where sub-5ms latency was a requirement.

Weaknesses

  • Operational complexity: Running Milvus in production means managing etcd, object storage, and multiple Milvus node types. This is a distributed system with all the associated complexity — network partitions, node failures, rolling upgrades. You need a team that’s comfortable with this.
  • Memory requirements: Milvus loads collection data into memory for querying. A collection with 100 million 1536-dim vectors needs roughly 600GB of memory across query nodes. You can use DiskANN to trade memory for disk, but at a latency cost.
  • Learning curve: The API has concepts (partitions, segments, compaction) that don’t exist in simpler solutions. The documentation has improved significantly, but it’s still a steeper on-ramp than pgvector or Pinecone.
  • Relaxed consistency by default: Like Pinecone, inserts are not guaranteed to be immediately visible, because the default consistency level is bounded staleness. You can ask for stronger guarantees with the consistency_level parameter, but “Strong” consistency has a performance cost.

Head-to-Head Comparison

Here’s how the three compare across the dimensions that actually matter in production:

| Dimension              | pgvector         | Pinecone         | Milvus           |
|------------------------|------------------|------------------|------------------|
| Max practical scale    | ~10M vectors     | 100M+ vectors    | 1B+ vectors      |
| Query latency (p50)    | 10-50ms          | 10-30ms          | 5-30ms           |
| Ops burden             | Near zero        | Zero             | Significant      |
| Cost (10M vectors)     | ~$50/mo (RDS)    | ~$300/mo         | ~$150/mo (self)  |
| Filtered search        | Excellent (SQL)  | Good             | Good             |
| Transactional          | Yes              | No               | No               |
| Joins with app data    | Native (SQL)     | Requires app     | Requires app     |
| GPU acceleration       | No               | N/A (managed)    | Yes              |
| Vendor lock-in         | None             | High             | None (open src)  |
| Time to production     | Hours            | Minutes          | Days             |

Our Decision Framework

After deploying all three in production, here’s the decision tree we follow:

  1. Are you already on PostgreSQL and expect fewer than 5 million vectors? Use pgvector. Don’t overthink it. You can migrate later if you outgrow it.
  2. Do you need to join vector search results with relational data in the same query? Use pgvector. The ability to do SELECT * FROM documents d JOIN authors a ON d.author_id = a.id ORDER BY d.embedding <=> $1 LIMIT 10 in a single query is uniquely powerful.
  3. Do you need to scale beyond 10 million vectors and don’t want to manage infrastructure? Use Pinecone. Accept the vendor lock-in and the cost. Your engineering time is worth more than the price difference.
  4. Do you need to scale to hundreds of millions or billions of vectors, or need GPU-accelerated search, or can’t use a managed service? Use Milvus. Budget for the operational complexity.
  5. Are you building a prototype or MVP? Use pgvector or Pinecone’s free tier. Do not set up Milvus for a prototype. You’ll spend more time on infrastructure than on your actual product.
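
The same decision tree can be written as a small function, which is handy for keeping the thresholds explicit in design docs. This is a sketch under stated assumptions: the `Workload` fields and the `recommend` name are hypothetical, the prototype rule is checked first (our reading of the "do not set up Milvus for a prototype" advice), and the final fallback to pgvector is an interpretation rather than one of the numbered rules.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    expected_vectors: int
    on_postgres: bool = False
    needs_sql_joins: bool = False
    wants_managed: bool = False      # team does not want to run infrastructure
    managed_allowed: bool = True     # False for compliance or residency limits
    needs_gpu: bool = False
    is_prototype: bool = False


def recommend(w):
    """Apply the rules from the list above; the first match wins."""
    if w.is_prototype:
        return "pgvector or Pinecone free tier"
    if w.on_postgres and w.expected_vectors < 5_000_000:
        return "pgvector"
    if w.needs_sql_joins:
        return "pgvector"
    if w.expected_vectors > 10_000_000 and w.wants_managed and w.managed_allowed:
        return "Pinecone"
    if w.expected_vectors >= 100_000_000 or w.needs_gpu or not w.managed_allowed:
        return "Milvus"
    return "pgvector"  # assumed default: start simple


choice = recommend(Workload(expected_vectors=2_000_000, on_postgres=True))
```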

Building an Abstraction Layer

Whichever you choose, wrap it in an abstraction. Vector database technology is moving fast, and you may want to switch. Here’s the interface we use:

// lib/vector-store.ts
export interface VectorSearchResult {
  id: string;
  score: number;
  metadata: Record<string, unknown>;
}

export interface VectorStore {
  upsert(vectors: {
    id: string;
    values: number[];
    metadata?: Record<string, unknown>;
  }[]): Promise<void>;

  query(params: {
    vector: number[];
    topK: number;
    filter?: Record<string, unknown>;
  }): Promise<VectorSearchResult[]>;

  delete(ids: string[]): Promise<void>;
}

// Implementations
export class PgVectorStore implements VectorStore { /* ... */ }
export class PineconeVectorStore implements VectorStore { /* ... */ }
export class MilvusVectorStore implements VectorStore { /* ... */ }

This abstraction has saved us twice — once when a client outgrew pgvector and migrated to Pinecone, and once when a client moved from Pinecone to Milvus to reduce costs. In both cases, the migration was a backend change that the application layer didn’t need to know about.
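
A fourth implementation that pays for itself is an in-memory one for unit tests. The sketch below is a Python rendering of the same three-method shape, not the TypeScript classes above. It is synchronous, uses brute-force cosine similarity, and assumes filters are exact-match key/value pairs only. All names are illustrative.

```python
import math


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force stand-in that mirrors upsert/query/delete for tests."""

    def __init__(self):
        self._rows = {}  # id -> (values, metadata)

    def upsert(self, vectors):
        for v in vectors:
            self._rows[v["id"]] = (v["values"], v.get("metadata", {}))

    def query(self, vector, top_k, filter=None):
        results = []
        for vec_id, (values, metadata) in self._rows.items():
            # Assumption: every filter entry must match the metadata exactly.
            if filter and any(metadata.get(k) != want for k, want in filter.items()):
                continue
            results.append({"id": vec_id, "score": _cosine(vector, values), "metadata": metadata})
        results.sort(key=lambda r: r["score"], reverse=True)
        return results[:top_k]

    def delete(self, ids):
        for vec_id in ids:
            self._rows.pop(vec_id, None)


store = InMemoryVectorStore()
store.upsert([
    {"id": "a", "values": [1.0, 0.0], "metadata": {"category": "engineering"}},
    {"id": "b", "values": [0.0, 1.0], "metadata": {"category": "sales"}},
    {"id": "c", "values": [0.8, 0.6], "metadata": {"category": "engineering"}},
])
hits = store.query([1.0, 0.0], top_k=2)
```

Running application tests against a store like this keeps them fast and confirms that nothing above the interface depends on a specific backend.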

The Real Answer

Most teams reading this should start with pgvector. It’s not the fastest, it’s not the most scalable, but it eliminates an entire category of operational complexity. You can build, ship, and iterate on your AI features without managing a new piece of infrastructure. When — and only when — you hit pgvector’s limits, you’ll have enough production data and usage patterns to make an informed decision about what to migrate to.

The worst outcome is spending two weeks setting up Milvus for an application that serves 100 queries per day and stores 50,000 vectors. We’ve seen it happen. Start simple. Scale when the numbers demand it.
