pgvector Guide
PostgreSQL Vector Extension
Overview
pgvector is an open-source PostgreSQL extension for vector similarity search. It brings vector database capabilities to PostgreSQL, enabling AI/ML workloads without separate infrastructure, which makes it a natural fit for teams already running PostgreSQL who need vector search.
pgvector Architecture
Extension Architecture
Key Components:
- Vector Data Type: Store vectors as columns (up to 16,000 dimensions; indexes support up to 2,000)
- IVFFlat Index: Inverted file index for approximate search
- HNSW Index: Hierarchical navigable small world for fastest search
- Vector Operators: Cosine distance, inner product, L2 distance
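To make these components concrete, here is a minimal sketch (table and values are illustrative) showing the vector type and all three distance operators side by side:

-- A tiny 3-dimensional example, purely for illustration
CREATE TABLE items (id BIGSERIAL PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1, 2, 3]'), ('[4, 5, 6]');

-- The three distance operators against the same query vector
SELECT id,
       embedding <-> '[3, 1, 2]' AS l2_distance,
       embedding <#> '[3, 1, 2]' AS neg_inner_product,  -- <#> returns the NEGATIVE inner product
       embedding <=> '[3, 1, 2]' AS cosine_distance
FROM items;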
pgvector Installation
Installation
# Install pgvector extension
# macOS (Homebrew)
brew install pgvector

# Ubuntu/Debian (PGDG packages are versioned per PostgreSQL major version)
sudo apt install postgresql-16-pgvector

# From source
git clone --branch v0.5.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

# Enable extension in PostgreSQL
psql -d your_database -c "CREATE EXTENSION vector;"
Docker Setup
# Docker Compose with pgvector
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: vector_db
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
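Once the container is up, it is worth verifying the extension from psql (a quick sanity check, not part of the compose file):

-- Confirm pgvector is installed and check its version
CREATE EXTENSION IF NOT EXISTS vector;
SELECT extversion FROM pg_extension WHERE extname = 'vector';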
pgvector Operations
Table Creation
-- Create table with vector column
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    metadata JSONB,
    embedding vector(1536),  -- OpenAI embedding dimension
    created_at TIMESTAMP DEFAULT NOW()
);
-- Create IVFFlat index (recommended for < 1M vectors)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Create HNSW index (recommended for > 1M vectors)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Index parameters:
-- IVFFlat:
--   lists: Number of lists (rows / 1000 up to ~1M rows, sqrt(rows) beyond)
-- HNSW:
--   m: Number of bidirectional links (16-32)
--   ef_construction: Size of dynamic candidate list (32-512)
Insert Vectors
-- Insert vectors into pgvector
INSERT INTO documents (content, metadata, embedding) VALUES
(
    'Machine learning is a subset of artificial intelligence.',
    '{"category": "AI", "author": "John Doe", "date": "2025-01-27"}'::jsonb,
    '[0.1, 0.2, 0.3, ...]'::vector  -- 1536 dimensions
),
(
    'Data engineering involves building data pipelines.',
    '{"category": "DE", "author": "Jane Smith", "date": "2025-01-26"}'::jsonb,
    '[0.4, 0.5, 0.6, ...]'::vector
);

-- Bulk insert with COPY
COPY documents (content, metadata, embedding)
FROM '/path/to/documents.csv'
CSV HEADER;
Vector Search
-- Vector similarity search
-- Cosine distance (normalized vectors, recommended for OpenAI)
SELECT id, content, metadata,
       1 - (embedding <=> '[0.1, 0.2, 0.3, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- Inner product (faster than cosine, requires normalized vectors)
-- Note: <#> returns the NEGATIVE inner product, so smaller values mean more similar
SELECT id, content, metadata,
       embedding <#> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <#> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- L2 distance (Euclidean, not recommended for high-dimensional vectors)
SELECT id, content, metadata,
       embedding <-> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- Vector operators:
-- <=>: Cosine distance (recommended for OpenAI embeddings)
-- <#>: Negative inner product (fastest, requires normalized vectors)
-- <->: L2 distance (Euclidean, not recommended for high-dim)
Hybrid Search
-- Combine vector search with metadata filtering
SELECT id, content, metadata,
       1 - (embedding <=> '[0.1, 0.2, 0.3, ...]'::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'AI'
  AND metadata->>'date' >= '2025-01-01'
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- Full-text search + vector search (hybrid)
-- Assumes a tsvector column on the table, e.g.:
-- ALTER TABLE documents ADD COLUMN text_search tsvector
--     GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
SELECT id, content, metadata,
       ts_rank(text_search, query) AS text_score,
       1 - (embedding <=> query_vector) AS vector_score,
       (ts_rank(text_search, query) * 0.5 + (1 - (embedding <=> query_vector)) * 0.5) AS combined_score
FROM documents,
     to_tsquery('english', 'machine & learning') query,
     '[0.1, 0.2, 0.3, ...]'::vector query_vector
WHERE text_search @@ query
ORDER BY combined_score DESC
LIMIT 10;
pgvector Index Types
Index Comparison
| Index Type | Description | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| IVFFlat | Inverted file | Fast | Good | < 1M vectors |
| HNSW | Hierarchical NSW | Very fast | Excellent | > 1M vectors |
| None | Sequential scan | Slow | 100% | Exact match, small datasets |
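The "None" row deserves a note: without an index, every query is an exact brute-force scan, which is also a handy baseline for measuring the recall of an approximate index. A sketch using standard planner settings:

-- Force an exact scan for a recall check (session-local, rolled back with the transaction)
BEGIN;
SET LOCAL enable_indexscan = off;
SELECT id FROM documents ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector LIMIT 10;
COMMIT;
-- Compare this result set against the same query with the index enabled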
IVFFlat Index
-- IVFFlat index configuration
-- Create IVFFlat index with cosine distance
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Index parameters:
-- lists: Number of inverted lists
--   - Rule of thumb: rows / 1000 up to ~1M rows, sqrt(rows) beyond
--   - 100K rows → 100 lists
--   - 1M rows → 1000 lists
--   - Higher lists = finer partitioning, slower indexing; recall is tuned
--     at query time with ivfflat.probes (see below)
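IVFFlat recall is governed at query time by how many lists are probed. The default is 1; raising it trades query speed for recall:

-- Probe more lists for better recall (slower queries); default is 1
SET ivfflat.probes = 10;

SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;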
-- Distance operators:
-- vector_cosine_ops: Cosine distance (<=>)
-- vector_l2_ops: L2 distance (<->)
-- vector_ip_ops: Inner product (<#>)
-- Rebuild index (after significant data changes)
REINDEX INDEX documents_embedding_idx;
-- Index maintenance
VACUUM ANALYZE documents;
HNSW Index
-- HNSW index configuration (recommended for production)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (
    m = 16,               -- Number of bidirectional links (16-32)
    ef_construction = 64  -- Size of dynamic candidate list (32-512)
);

-- HNSW parameters:
-- m: Number of bidirectional links
--   - Higher: Better recall, more memory, slower indexing
--   - Recommended: 16-32
--
-- ef_construction: Size of dynamic candidate list at build time
--   - Higher: Better recall, slower indexing
--   - Recommended: 64-512
--
-- hnsw.ef_search (runtime): Search parameter, set per session or transaction
--   - Higher: Better recall, slower search
--   - Recommended: ef_search >= top_k (the LIMIT)

-- Set runtime ef_search for better recall (default is 40)
SET hnsw.ef_search = 100;

-- Query runs with the ef_search set above
SELECT id, content
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;
pgvector Performance
Performance Tuning
-- Performance optimization strategies
-- 1. Choose appropriate index type
-- IVFFlat: < 1M vectors, good recall
-- HNSW: > 1M vectors, excellent recall

-- 2. Configure index parameters
-- IVFFlat: size lists to the row count
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);  -- For ~1M rows

-- HNSW: m = 16, ef_construction = 256
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);

-- 3. Use appropriate distance operator
-- Cosine (<=>): Recommended for OpenAI embeddings
-- Inner product (<#>): Faster, requires normalized vectors
-- L2 (<->): Not recommended for high-dimensional vectors

-- 4. Partition large tables
CREATE TABLE documents_partitioned (
    id BIGSERIAL,
    content TEXT,
    metadata JSONB,
    embedding vector(1536),
    created_at TIMESTAMP DEFAULT NOW()  -- partition key must be a column of the table
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE documents_2025_q1 PARTITION OF documents_partitioned
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');

-- Create index per partition
CREATE INDEX ON documents_2025_q1
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);

-- 5. Use connection pooling
-- PgBouncer in front of PostgreSQL reduces per-connection overhead
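Before sizing a pooler, it helps to see how many connections the database actually holds (a quick diagnostic in plain SQL, not PgBouncer configuration itself):

-- Count connections by state; many idle connections suggest pooling will help
SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE datname = 'vector_db'
GROUP BY state;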
Query Performance
-- Optimize query performance
-- 1. Use LIMIT to reduce search space
SELECT id, content
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;  -- Only return the top 10

-- 2. Use metadata filters to reduce search space
SELECT id, content
FROM documents
WHERE metadata->>'category' = 'AI'
ORDER BY embedding <=> query_vector
LIMIT 10;

-- 3. Use materialized views for frequent queries
CREATE MATERIALIZED VIEW ai_documents AS
SELECT id, content, metadata, embedding
FROM documents
WHERE metadata->>'category' = 'AI';

CREATE INDEX ON ai_documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);

-- 4. Use EXPLAIN ANALYZE to optimize queries
EXPLAIN ANALYZE
SELECT id, content
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;

-- Look for:
-- - Index Scan using <index name> (rather than Seq Scan)
-- - Order By: (embedding <=> ...) in the index scan node
-- - Execution time
pgvector Cost Optimization
Self-Hosted Cost
| Component | Cost | Notes |
|---|---|---|
| PostgreSQL | $0 (self-hosted) | Hardware only |
| Cloud SQL | $0.10-1.00/hour | Managed PostgreSQL |
| RDS | $0.10-1.00/hour | Managed PostgreSQL |
| Storage | $0.10/GB/month | EBS/Persistent disk |
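For capacity planning, raw vector storage is easy to estimate: each vector takes roughly 4 bytes per dimension plus a small per-row overhead. A back-of-envelope sketch (the row count is illustrative):

-- ~6 GB of raw vector data for 1M OpenAI-sized (1536-dim) embeddings,
-- before indexes, other columns, and general row overhead
SELECT pg_size_pretty((4 * 1536 + 8)::bigint * 1000000) AS raw_vector_data;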
Cost Optimization
-- Cost optimization strategies
-- 1. Use appropriate data types
-- vector(1536) instead of vector(2048) to save space
-- Lower dimensions = lower storage cost

-- 2. Partition old data
-- Move old partitions to cold storage
CREATE TABLE documents_archive (
    LIKE documents INCLUDING ALL
);

-- Insert old data
INSERT INTO documents_archive
SELECT * FROM documents
WHERE created_at < '2024-01-01';

-- Delete from main table
DELETE FROM documents
WHERE created_at < '2024-01-01';

-- 3. Use compression
-- PostgreSQL TOAST compression is automatic for large values
-- PostgreSQL 14+ also supports per-column LZ4 compression
-- (ALTER TABLE ... ALTER COLUMN ... SET COMPRESSION lz4)

-- 4. Delete unused vectors
DELETE FROM documents
WHERE created_at < CURRENT_DATE - INTERVAL '90 days';

-- 5. Use connection pooling
-- Reduce connection overhead
pgvector for RAG
RAG Implementation
# RAG pipeline with pgvector
import psycopg2
from openai import OpenAI
import numpy as np

# Connect to PostgreSQL
conn = psycopg2.connect(
    host="localhost",
    database="vector_db",
    user="postgres",
    password="postgres"
)
cursor = conn.cursor()

# Initialize OpenAI
client = OpenAI(api_key="your-api-key")

def embed_text(text):
    """Generate embedding using OpenAI"""
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return np.array(response.data[0].embedding)

def rag_pipeline(question: str) -> str:
    """RAG pipeline with pgvector"""
    # 1. Embed question
    query_vector = embed_text(question)
    query_vector_str = "[" + ",".join(map(str, query_vector)) + "]"

    # 2. Search pgvector for relevant documents
    query = """
        SELECT id, content, metadata,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT 5
    """
    cursor.execute(query, (query_vector_str, query_vector_str))
    results = cursor.fetchall()

    # 3. Augment prompt with context
    context = "\n".join([
        f"Document {i+1}: {row[1]}\n"
        for i, row in enumerate(results)
    ])
    augmented_prompt = f"""
Context:
{context}

Question: {question}

Answer:
"""

    # 4. Generate answer with LLM
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": augmented_prompt}
        ]
    )
    answer = response.choices[0].message.content

    # 5. Add citations
    citations = [row[0] for row in results]
    return f"{answer}\n\nCitations: {citations}"

# Example usage
answer = rag_pipeline("What is machine learning?")
print(answer)
pgvector Monitoring
Metrics
-- Monitor pgvector performance
-- 1. Index usage
SELECT schemaname, relname, indexrelname,
       idx_scan AS index_scans,
       idx_tup_read AS tuples_read,
       idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%embedding%'
ORDER BY idx_scan DESC;

-- 2. Table size
SELECT pg_size_pretty(pg_total_relation_size('documents')) AS total_size,
       pg_size_pretty(pg_relation_size('documents')) AS table_size,
       pg_size_pretty(pg_total_relation_size('documents') - pg_relation_size('documents')) AS index_size;

-- 3. Query performance
EXPLAIN ANALYZE
SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- Look for:
-- - Execution time
-- - Index Scan vs Seq Scan
-- - Rows Removed by Filter

-- 4. Vector statistics (dimensions are fixed by the column type, so min = max here)
SELECT COUNT(*) AS total_vectors,
       AVG(dimensions) AS avg_dimensions,
       MIN(dimensions) AS min_dimensions,
       MAX(dimensions) AS max_dimensions
FROM (
    SELECT vector_dims(embedding) AS dimensions  -- array_length() does not work on the vector type
    FROM documents
) t;
pgvector Best Practices
DO
-- 1. Use HNSW for production
--    Fastest queries and best recall at scale

-- 2. Use cosine distance for OpenAI embeddings
--    <=> operator

-- 3. Partition large tables
--    Partition by date for better performance

-- 4. Use connection pooling
--    PgBouncer for production

-- 5. Monitor index usage
--    Ensure queries use indexes
DON'T
-- 1. Don't use L2 distance for high-dimensional embeddings
--    Cosine or inner product is the usual choice

-- 2. Don't ignore index parameters
--    m and ef_construction matter

-- 3. Don't forget to vacuum
--    VACUUM ANALYZE for performance

-- 4. Don't skip monitoring
--    Essential for production

-- 5. Don't use low-dimensional vectors
--    768+ dimensions recommended for quality
pgvector vs. Alternatives
| Feature | pgvector | Pinecone | Milvus | Weaviate |
|---------|----------|----------|--------|----------|
| Database | PostgreSQL extension | Managed service | Standalone | Standalone |
| Open Source | Yes | No | Yes | Yes |
| Managed | Via managed PostgreSQL (RDS, Cloud SQL) | Yes | Yes (optional) | Yes (optional) |
| Index Types | IVFFlat, HNSW | HNSW | IVF, HNSW, FLAT | HNSW |
| Max Dimensions | 2,000 (indexed) | 20,000 | 32,768 | Unlimited |
| Metadata | JSONB columns | Metadata store | Dynamic schema | Schema |
| Best For | Self-hosted, PostgreSQL shops | RAG, managed service | Open-source, on-prem | Knowledge graphs |
Key Takeaways
- PostgreSQL extension: No separate infrastructure needed
- Vector data type: Store vectors as columns (up to 16,000 dimensions; 2,000 for indexed columns)
- Index types: IVFFlat (< 1M vectors), HNSW (> 1M vectors)
- Distance operators: Cosine (<=>), inner product (<#>), L2 (<->)
- Self-hosted: Free (hardware only); also available via managed PostgreSQL (RDS, Cloud SQL)
- Performance: HNSW for fastest queries
- Integration: Seamless integration with PostgreSQL ecosystem
- Use When: Self-hosted, PostgreSQL already in stack, < 10M vectors
Back to Module 5