
pgvector Guide

PostgreSQL Vector Extension


Overview

pgvector is an open-source PostgreSQL extension for vector similarity search. It brings vector database capabilities to PostgreSQL, so AI/ML workloads can run without separate infrastructure, making it ideal for teams already running PostgreSQL who need vector search.


pgvector Architecture

Extension Architecture

Key Components:

  • Vector Data Type: Store vectors as table columns (up to 2,000 dimensions for indexed columns; the type itself allows more)
  • IVFFlat Index: Inverted file index for approximate search
  • HNSW Index: Hierarchical navigable small world graph for the fastest approximate search
  • Vector Operators: Cosine distance, inner product, L2 distance (see the quick example below)
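
A quick taste of the type and operators (a throwaway sketch with 3-dimensional literals; real embeddings have hundreds of dimensions):

-- Distances between two ad-hoc vectors; no table required
SELECT
'[1, 2, 3]'::vector <=> '[4, 5, 6]'::vector AS cosine_distance,
'[1, 2, 3]'::vector <#> '[4, 5, 6]'::vector AS neg_inner_product, -- <#> is the negative inner product
'[1, 2, 3]'::vector <-> '[4, 5, 6]'::vector AS l2_distance;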

pgvector Installation

Installation

# Install pgvector extension
# macOS (Homebrew)
brew install pgvector
# Ubuntu/Debian (PGDG repository; match your PostgreSQL major version)
sudo apt install postgresql-16-pgvector
# From source
git clone --branch v0.5.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
# Enable extension in PostgreSQL
psql -d your_database -c "CREATE EXTENSION vector;"
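
To confirm the extension is active, a quick sanity check from any database where it was created:

-- Verify the installed extension and its version
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';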

Docker Setup

# Docker Compose with pgvector
# docker-compose.yml
version: '3.8'
services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: vector_db
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:

pgvector Operations

Table Creation

-- Create table with vector column
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT,
metadata JSONB,
embedding vector(1536), -- OpenAI embedding dimension
created_at TIMESTAMP DEFAULT NOW()
);
-- Create IVFFlat index (recommended for < 1M vectors; pick this or HNSW, not both)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Create HNSW index (recommended for > 1M vectors)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Index parameters:
-- IVFFlat:
-- lists: Number of lists (typically sqrt(rows); see the helper query below)
-- HNSW:
-- m: Number of bidirectional links (16-32)
-- ef_construction: Size of dynamic list (32-512)
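
The sqrt(rows) rule of thumb for lists can be computed directly; a small helper sketch against the documents table above:

-- Suggested IVFFlat lists value per the sqrt(rows) rule of thumb
SELECT GREATEST(ROUND(SQRT(COUNT(*)))::int, 1) AS suggested_lists
FROM documents;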

Insert and Query Vectors

-- Insert vectors into pgvector
INSERT INTO documents (content, metadata, embedding) VALUES
(
'Machine learning is a subset of artificial intelligence.',
'{"category": "AI", "author": "John Doe", "date": "2025-01-27"}'::jsonb,
'[0.1, 0.2, 0.3, ...]'::vector -- 1536 dimensions
),
(
'Data engineering involves building data pipelines.',
'{"category": "DE", "author": "Jane Smith", "date": "2025-01-26"}'::jsonb,
'[0.4, 0.5, 0.6, ...]'::vector
);
-- Bulk insert with COPY
COPY documents (content, metadata, embedding)
FROM '/path/to/documents.csv'
CSV HEADER;
-- Vector similarity search
-- Cosine distance (recommended for OpenAI embeddings)
SELECT
id,
content,
metadata,
1 - (embedding <=> '[0.1, 0.2, 0.3, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;
-- Inner product: <#> returns the negative inner product (ascending order = most similar); ranks the same as cosine for normalized vectors
SELECT
id,
content,
metadata,
embedding <#> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <#> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;
-- L2 distance (Euclidean, not recommended for high-dimensional vectors)
SELECT
id,
content,
metadata,
embedding <-> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;
-- Vector operators:
-- <=>: Cosine distance (recommended for OpenAI embeddings)
-- <#>: Negative inner product (fastest; equivalent ranking to cosine for normalized vectors)
-- <->: L2 distance (Euclidean; less useful for high-dimensional embeddings)
-- Combine vector search with metadata filtering
SELECT
id,
content,
metadata,
1 - (embedding <=> '[0.1, 0.2, 0.3, ...]'::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'AI'
AND metadata->>'date' >= '2025-01-01'
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;
-- Full-text search + vector search (hybrid)
-- Assumes a tsvector column text_search (see the note after this block)
SELECT
id,
content,
metadata,
ts_rank(text_search, query) AS text_score,
1 - (embedding <=> query_vector) AS vector_score,
(ts_rank(text_search, query) * 0.5 +
(1 - (embedding <=> query_vector)) * 0.5) AS combined_score
FROM documents,
to_tsquery('english', 'machine & learning') query,
(SELECT '[0.1, 0.2, 0.3, ...]'::vector) v(query_vector)
WHERE text_search @@ query
ORDER BY combined_score DESC
LIMIT 10;
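
The hybrid query assumes a tsvector column named text_search, which the documents table earlier does not define. One way to add it, sketched with a stored generated column (PostgreSQL 12+):

-- Add a generated tsvector column plus a GIN index for full-text search
ALTER TABLE documents
ADD COLUMN text_search tsvector
GENERATED ALWAYS AS (to_tsvector('english', coalesce(content, ''))) STORED;
CREATE INDEX ON documents USING gin (text_search);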

pgvector Index Types

Index Comparison

| Index Type | Description | Speed | Accuracy | Use Case |
|------------|-------------|-------|----------|----------|
| IVFFlat | Inverted file | Fast | Good | < 1M vectors |
| HNSW | Hierarchical NSW | Very fast | Excellent | > 1M vectors |
| None | Sequential scan | Slow | 100% | Exact results, small datasets |

IVFFlat Index

-- IVFFlat index configuration
-- Create IVFFlat index with cosine distance
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Index parameters:
-- lists: Number of inverted lists
-- - Rule of thumb: sqrt(number of rows)
-- - 1000 rows → 32 lists
-- - 1M rows → 1000 lists
-- - Higher lists = Better recall, slower indexing
-- Distance operators:
-- vector_cosine_ops: Cosine distance (<=>)
-- vector_l2_ops: L2 distance (<->)
-- vector_ip_ops: Inner product (<#>)
-- Rebuild index (after significant data changes)
REINDEX INDEX documents_embedding_idx;
-- Index maintenance
VACUUM ANALYZE documents;
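
IVFFlat also has a runtime recall knob analogous to HNSW's ef_search: the number of lists probed per query (default 1). Higher values improve recall at the cost of speed:

-- Probe more lists per query for better recall (session-level setting)
SET ivfflat.probes = 10;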

HNSW Index

-- HNSW index configuration (recommended for production)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (
m = 16, -- Number of bidirectional links (16-32)
ef_construction = 64 -- Size of dynamic list (32-512)
);
-- HNSW parameters:
-- m: Number of bidirectional links
-- - Higher: Better recall, more memory, slower indexing
-- - Recommended: 16-32
--
-- ef_construction: Size of dynamic candidate list
-- - Higher: Better recall, slower indexing
-- - Recommended: 64-512
--
-- ef (runtime): Search parameter (set per query)
-- - Higher: Better recall, slower search
-- - Recommended: ef >= top_k
-- Set runtime ef for better recall
SET hnsw.ef_search = 100;
-- Query with specific ef
SELECT id, content
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;
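
Note that SET changes hnsw.ef_search for the whole session. To scope it to a single query, a common pattern is SET LOCAL inside a transaction:

BEGIN;
SET LOCAL hnsw.ef_search = 200; -- reverts automatically at COMMIT/ROLLBACK
SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;
COMMIT;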

pgvector Performance

Performance Tuning

-- Performance optimization strategies
-- 1. Choose appropriate index type
-- IVFFlat: < 1M vectors, good recall
-- HNSW: > 1M vectors, excellent recall
-- 2. Configure index parameters
-- IVFFlat: lists = sqrt(rows)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000); -- For ~1M rows
-- HNSW: m = 16, ef_construction = 256
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);
-- 3. Use appropriate distance operator
-- Cosine (<=>): Recommended for OpenAI embeddings
-- Inner product (<#>): Faster, requires normalized vectors
-- L2 (<->): Not recommended for high-dimensional vectors
-- 4. Partition large tables
CREATE TABLE documents_partitioned (
id BIGSERIAL,
content TEXT,
metadata JSONB,
embedding vector(1536),
created_at TIMESTAMP NOT NULL DEFAULT NOW() -- partition key must exist as a column
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE documents_2025_q1 PARTITION OF documents_partitioned
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
-- Create index per partition
CREATE INDEX ON documents_2025_q1
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);
-- 5. Use connection pooling
-- PgBouncer for connection pooling
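
With the date-partitioned layout above, filtering on the partition key lets the planner prune partitions, so only the relevant per-partition HNSW indexes are searched; a sketch:

-- Partition pruning: only partitions covering the date range are scanned
SELECT id, content
FROM documents_partitioned
WHERE created_at >= '2025-01-01'
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;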

Query Performance

-- Optimize query performance
-- 1. Use LIMIT to reduce search space
SELECT id, content
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10; -- Only search top 10
-- 2. Use metadata filters to reduce search space
SELECT id, content
FROM documents
WHERE metadata->>'category' = 'AI'
ORDER BY embedding <=> query_vector
LIMIT 10;
-- 3. Use materialized views for frequent queries
CREATE MATERIALIZED VIEW ai_documents AS
SELECT id, content, metadata, embedding
FROM documents
WHERE metadata->>'category' = 'AI';
CREATE INDEX ON ai_documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);
-- 4. Use EXPLAIN ANALYZE to optimize queries
EXPLAIN ANALYZE
SELECT id, content
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;
-- Look for:
-- - Index Scan using the vector index (not Seq Scan)
-- - Order By: (embedding <=> query_vector)
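
A materialized view is a static snapshot, so it must be refreshed when the source table changes. CONCURRENTLY avoids blocking readers but requires a unique index on the view:

-- One-time setup: a unique index is required for CONCURRENTLY
CREATE UNIQUE INDEX ON ai_documents (id);
-- Refresh without blocking concurrent reads
REFRESH MATERIALIZED VIEW CONCURRENTLY ai_documents;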

pgvector Cost Optimization

Self-Hosted Cost

| Component | Cost | Notes |
|-----------|------|-------|
| PostgreSQL | $0 (self-hosted) | Hardware only |
| Cloud SQL | $0.10-1.00/hour | Managed PostgreSQL |
| RDS | $0.10-1.00/hour | Managed PostgreSQL |
| Storage | $0.10/GB/month | EBS/Persistent Disk |

Cost Optimization

-- Cost optimization strategies
-- 1. Use appropriate data types
-- vector(1536) instead of vector(2048) to save space
-- Lower dimensions = Lower storage cost
-- 2. Partition old data
-- Move old partitions to cold storage
CREATE TABLE documents_archive (
LIKE documents INCLUDING ALL
);
-- Insert old data
INSERT INTO documents_archive
SELECT * FROM documents
WHERE created_at < '2024-01-01';
-- Delete from main table
DELETE FROM documents
WHERE created_at < '2024-01-01';
-- 3. Use compression
-- PostgreSQL TOAST compresses large values automatically
-- Optionally switch column compression to lz4 (PostgreSQL 14+):
-- ALTER TABLE documents ALTER COLUMN content SET COMPRESSION lz4;
-- 4. Delete unused vectors
DELETE FROM documents
WHERE created_at < CURRENT_DATE - INTERVAL '90 days';
-- 5. Use connection pooling
-- Reduce connection overhead
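
If the table uses the date-partitioned layout from the performance section, whole time ranges can be aged out far more cheaply than row-by-row DELETEs; a sketch:

-- Detach an old partition instantly, then archive or drop it
ALTER TABLE documents_partitioned DETACH PARTITION documents_2025_q1;
-- DROP TABLE documents_2025_q1; -- once the data is archived elsewhere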

pgvector for RAG

RAG Implementation

# RAG pipeline with pgvector
import psycopg2
from openai import OpenAI
import numpy as np

# Connect to PostgreSQL
conn = psycopg2.connect(
    host="localhost",
    database="vector_db",
    user="postgres",
    password="postgres"
)
cursor = conn.cursor()

# Initialize OpenAI
client = OpenAI(api_key="your-api-key")

def embed_text(text):
    """Generate an embedding using OpenAI"""
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return np.array(response.data[0].embedding)

def rag_pipeline(question: str) -> str:
    """RAG pipeline with pgvector"""
    # 1. Embed the question
    query_vector = embed_text(question)
    query_vector_str = "[" + ",".join(map(str, query_vector)) + "]"

    # 2. Search pgvector for relevant documents
    query = """
        SELECT id, content, metadata,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT 5
    """
    cursor.execute(query, (query_vector_str, query_vector_str))
    results = cursor.fetchall()

    # 3. Augment the prompt with the retrieved context
    context = "\n".join(
        f"Document {i + 1}: {row[1]}"
        for i, row in enumerate(results)
    )
    augmented_prompt = f"""
Context:
{context}

Question: {question}
Answer:
"""

    # 4. Generate an answer with the LLM
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": augmented_prompt}
        ]
    )
    answer = response.choices[0].message.content

    # 5. Add citations (ids of the retrieved rows)
    citations = [row[0] for row in results]
    return f"{answer}\n\nCitations: {citations}"

# Example usage
answer = rag_pipeline("What is machine learning?")
print(answer)

pgvector Monitoring

Metrics

-- Monitor pgvector performance
-- 1. Index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan AS index_scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%'
ORDER BY idx_scan DESC;
-- 2. Table size
SELECT
pg_size_pretty(pg_total_relation_size('documents')) AS total_size,
pg_size_pretty(pg_relation_size('documents')) AS table_size,
pg_size_pretty(pg_total_relation_size('documents') - pg_relation_size('documents')) AS index_size;
-- 3. Query performance
EXPLAIN ANALYZE
SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;
-- Look for:
-- - Execution time
-- - Index Scan vs Seq Scan
-- - Rows Removed by Filter (when combined with WHERE clauses)
-- 4. Vector statistics
SELECT
COUNT(*) AS total_vectors,
AVG(dimensions) AS avg_dimensions,
MIN(dimensions) AS min_dimensions,
MAX(dimensions) AS max_dimensions
FROM (
SELECT vector_dims(embedding) AS dimensions -- vector_dims(), not array_length(): vector is not an array
FROM documents
) t;
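
If the pg_stat_statements extension is enabled (it requires shared_preload_libraries configuration), aggregate timings for vector queries are also available; a sketch using PostgreSQL 13+ column names:

-- Slowest vector-distance queries by mean execution time
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%<=>%'
ORDER BY mean_exec_time DESC
LIMIT 10;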

pgvector Best Practices

DO

-- 1. Use HNSW for production
-- Fastest and most efficient
-- 2. Use cosine distance for OpenAI embeddings
-- <=> operator
-- 3. Partition large tables
-- Partition by date for better performance
-- 4. Use connection pooling
-- PgBouncer for production
-- 5. Monitor index usage
-- Ensure queries use indexes

DON’T

-- 1. Don't use L2 distance for high-dimensional vectors
-- Not suitable for > 100 dimensions
-- 2. Don't ignore index parameters
-- m and ef_construction matter
-- 3. Don't forget to vacuum
-- VACUUM ANALYZE for performance
-- 4. Don't skip monitoring
-- Essential for production
-- 5. Don't use low-quality, low-dimensional embeddings
-- Most modern embedding models output 768+ dimensions

pgvector vs. Alternatives

| Feature | pgvector | Pinecone | Milvus | Weaviate |
|---------|----------|----------|--------|----------|
| Database | PostgreSQL extension | Managed service | Standalone | Standalone |
| Open Source | Yes | No | Yes | Yes |
| Managed | Via managed PostgreSQL (RDS, Cloud SQL) | Yes | Yes (optional) | Yes (optional) |
| Index Types | IVFFlat, HNSW | HNSW | IVF, HNSW, FLAT | HNSW |
| Max Dimensions | 2000 (indexed) | 20000 | 32768 | Unlimited |
| Metadata | JSONB columns | Metadata store | Dynamic schema | Schema |
| Best For | Self-hosted, PostgreSQL shops | RAG, managed service | Open-source, on-prem | Knowledge graphs |


Key Takeaways

  1. PostgreSQL extension: No separate infrastructure needed
  2. Vector data type: Store vectors as columns (up to 2,000 dimensions for indexed columns)
  3. Index types: IVFFlat (< 1M vectors), HNSW (> 1M vectors)
  4. Distance operators: Cosine (<=>), inner product (<#>), L2 (<->)
  5. Self-hosted: Free (hardware only); also available via managed PostgreSQL (RDS, Cloud SQL)
  6. Performance: HNSW for fastest queries
  7. Integration: Seamless integration with PostgreSQL ecosystem
  8. Use When: Self-hosted, PostgreSQL already in stack, < 10M vectors

Back to Module 5