pgvector Guide
PostgreSQL Vector Extension
Overview
pgvector is an open-source PostgreSQL extension for vector similarity search. It brings vector database capabilities to PostgreSQL, enabling AI/ML workloads without separate infrastructure, which makes it a natural fit for teams already running PostgreSQL who need vector search.
pgvector Architecture
Extension Architecture
Key Components:
- Vector Data Type: Store vectors as columns (up to 16,000 dimensions; indexes support up to 2,000)
- IVFFlat Index: Inverted file index for approximate search
- HNSW Index: Hierarchical navigable small world for fastest search
- Vector Operators: Cosine distance, inner product, L2 distance
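To make these components concrete, here is a minimal sketch (table and values are illustrative) showing the vector type and all three distance operators side by side:

-- A tiny 3-dimensional example, purely for illustration
CREATE TABLE items (id BIGSERIAL PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1, 2, 3]'), ('[4, 5, 6]');

-- The three distance operators against the same query vector
SELECT id,
       embedding <-> '[3, 1, 2]' AS l2_distance,
       embedding <#> '[3, 1, 2]' AS neg_inner_product,  -- <#> returns the NEGATIVE inner product
       embedding <=> '[3, 1, 2]' AS cosine_distance
FROM items;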
pgvector Installation
Installation
# Install pgvector extension
# macOS (Homebrew)
brew install pgvector

# Ubuntu/Debian (PGDG packages are versioned per PostgreSQL major version)
sudo apt install postgresql-16-pgvector

# From source
git clone --branch v0.5.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

# Enable extension in PostgreSQL
psql -d your_database -c "CREATE EXTENSION vector;"
Docker Setup
# Docker Compose with pgvector
# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: vector_db
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:
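Once the container is up, it is worth verifying the extension from psql (a quick sanity check, not part of the compose file):

-- Confirm pgvector is installed and check its version
CREATE EXTENSION IF NOT EXISTS vector;
SELECT extversion FROM pg_extension WHERE extname = 'vector';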
pgvector Operations
Table Creation
-- Create table with vector column
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    metadata JSONB,
    embedding vector(1536),  -- OpenAI embedding dimension
    created_at TIMESTAMP DEFAULT NOW()
);
-- Create IVFFlat index (recommended for < 1M vectors)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Create HNSW index (recommended for > 1M vectors)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Index parameters:
-- IVFFlat:
--   lists: Number of lists (rows / 1000 up to ~1M rows, sqrt(rows) beyond)
-- HNSW:
--   m: Number of bidirectional links (16-32)
--   ef_construction: Size of dynamic candidate list (32-512)
Insert Vectors
-- Insert vectors into pgvector
INSERT INTO documents (content, metadata, embedding) VALUES
(
    'Machine learning is a subset of artificial intelligence.',
    '{"category": "AI", "author": "John Doe", "date": "2025-01-27"}'::jsonb,
    '[0.1, 0.2, 0.3, ...]'::vector  -- 1536 dimensions
),
(
    'Data engineering involves building data pipelines.',
    '{"category": "DE", "author": "Jane Smith", "date": "2025-01-26"}'::jsonb,
    '[0.4, 0.5, 0.6, ...]'::vector
);

-- Bulk insert with COPY
COPY documents (content, metadata, embedding)
FROM '/path/to/documents.csv'
CSV HEADER;
Vector Search
-- Vector similarity search
-- Cosine distance (normalized vectors, recommended for OpenAI)
SELECT id, content, metadata,
       1 - (embedding <=> '[0.1, 0.2, 0.3, ...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- Inner product (faster than cosine, requires normalized vectors)
-- Note: <#> returns the NEGATIVE inner product, so smaller values mean more similar
SELECT id, content, metadata,
       embedding <#> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <#> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- L2 distance (Euclidean, not recommended for high-dimensional vectors)
SELECT id, content, metadata,
       embedding <-> '[0.1, 0.2, 0.3, ...]'::vector AS distance
FROM documents
ORDER BY embedding <-> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- Vector operators:
-- <=>: Cosine distance (recommended for OpenAI embeddings)
-- <#>: Negative inner product (fastest, requires normalized vectors)
-- <->: L2 distance (Euclidean, not recommended for high-dim)
Hybrid Search
-- Combine vector search with metadata filtering
SELECT id, content, metadata,
       1 - (embedding <=> '[0.1, 0.2, 0.3, ...]'::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'AI'
  AND metadata->>'date' >= '2025-01-01'
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- Full-text search + vector search (hybrid)
-- Assumes a tsvector column on the table, e.g.:
-- ALTER TABLE documents ADD COLUMN text_search tsvector
--     GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
SELECT id, content, metadata,
       ts_rank(text_search, query) AS text_score,
       1 - (embedding <=> query_vector) AS vector_score,
       (ts_rank(text_search, query) * 0.5 + (1 - (embedding <=> query_vector)) * 0.5) AS combined_score
FROM documents,
     to_tsquery('english', 'machine & learning') query,
     '[0.1, 0.2, 0.3, ...]'::vector query_vector
WHERE text_search @@ query
ORDER BY combined_score DESC
LIMIT 10;
pgvector Index Types
Index Comparison
| Index Type | Description | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| IVFFlat | Inverted file | Fast | Good | < 1M vectors |
| HNSW | Hierarchical NSW | Very fast | Excellent | > 1M vectors |
| None | Sequential scan | Slow | 100% | Exact match, small datasets |
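The "None" row deserves a note: without an index, every query is an exact brute-force scan, which is also a handy baseline for measuring the recall of an approximate index. A sketch using standard planner settings:

-- Force an exact scan for a recall check (session-local, rolled back with the transaction)
BEGIN;
SET LOCAL enable_indexscan = off;
SELECT id FROM documents ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector LIMIT 10;
COMMIT;
-- Compare this result set against the same query with the index enabled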
IVFFlat Index
-- IVFFlat index configuration
-- Create IVFFlat index with cosine distance
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Index parameters:
-- lists: Number of inverted lists
--   - Rule of thumb: rows / 1000 up to ~1M rows, sqrt(rows) beyond
--   - 100K rows → 100 lists
--   - 1M rows → 1000 lists
--   - Higher lists = finer partitioning, slower indexing; recall is tuned
--     at query time with ivfflat.probes (see below)
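IVFFlat recall is governed at query time by how many lists are probed. The default is 1; raising it trades query speed for recall:

-- Probe more lists for better recall (slower queries); default is 1
SET ivfflat.probes = 10;

SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;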
-- Distance operators:
-- vector_cosine_ops: Cosine distance (<=>)
-- vector_l2_ops: L2 distance (<->)
-- vector_ip_ops: Inner product (<#>)
-- Rebuild index (after significant data changes)
REINDEX INDEX documents_embedding_idx;
-- Index maintenance
VACUUM ANALYZE documents;
HNSW Index
-- HNSW index configuration (recommended for production)
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (
    m = 16,               -- Number of bidirectional links (16-32)
    ef_construction = 64  -- Size of dynamic candidate list (32-512)
);

-- HNSW parameters:
-- m: Number of bidirectional links
--   - Higher: Better recall, more memory, slower indexing
--   - Recommended: 16-32
--
-- ef_construction: Size of dynamic candidate list at build time
--   - Higher: Better recall, slower indexing
--   - Recommended: 64-512
--
-- hnsw.ef_search (runtime): Search parameter, set per session or transaction
--   - Higher: Better recall, slower search
--   - Recommended: ef_search >= top_k (the LIMIT)

-- Set runtime ef_search for better recall (default is 40)
SET hnsw.ef_search = 100;

-- Query runs with the ef_search set above
SELECT id, content
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;
pgvector Performance
Performance Tuning
-- Performance optimization strategies
-- 1. Choose appropriate index type
-- IVFFlat: < 1M vectors, good recall
-- HNSW: > 1M vectors, excellent recall

-- 2. Configure index parameters
-- IVFFlat: size lists to the row count
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);  -- For ~1M rows

-- HNSW: m = 16, ef_construction = 256
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);

-- 3. Use appropriate distance operator
-- Cosine (<=>): Recommended for OpenAI embeddings
-- Inner product (<#>): Faster, requires normalized vectors
-- L2 (<->): Not recommended for high-dimensional vectors

-- 4. Partition large tables
CREATE TABLE documents_partitioned (
    id BIGSERIAL,
    content TEXT,
    metadata JSONB,
    embedding vector(1536),
    created_at TIMESTAMP DEFAULT NOW()  -- partition key must be a column of the table
) PARTITION BY RANGE (created_at);

-- Create partitions
CREATE TABLE documents_2025_q1 PARTITION OF documents_partitioned
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');

-- Create index per partition
CREATE INDEX ON documents_2025_q1
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);

-- 5. Use connection pooling
-- PgBouncer in front of PostgreSQL reduces per-connection overhead
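Before sizing a pooler, it helps to see how many connections the database actually holds (a quick diagnostic in plain SQL, not PgBouncer configuration itself):

-- Count connections by state; many idle connections suggest pooling will help
SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE datname = 'vector_db'
GROUP BY state;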
Query Performance
-- Optimize query performance
-- 1. Use LIMIT to reduce search space
SELECT id, content
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;  -- Only return the top 10

-- 2. Use metadata filters to reduce search space
SELECT id, content
FROM documents
WHERE metadata->>'category' = 'AI'
ORDER BY embedding <=> query_vector
LIMIT 10;

-- 3. Use materialized views for frequent queries
CREATE MATERIALIZED VIEW ai_documents AS
SELECT id, content, metadata, embedding
FROM documents
WHERE metadata->>'category' = 'AI';

CREATE INDEX ON ai_documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 256);

-- 4. Use EXPLAIN ANALYZE to optimize queries
EXPLAIN ANALYZE
SELECT id, content
FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;

-- Look for:
-- - Index Scan using <index name> (rather than Seq Scan)
-- - Order By: (embedding <=> ...) in the index scan node
-- - Execution time
pgvector Cost Optimization
Self-Hosted Cost
| Component | Cost | Notes |
|---|---|---|
| PostgreSQL | $0 (self-hosted) | Hardware only |
| Cloud SQL | $0.10-1.00/hour | Managed PostgreSQL |
| RDS | $0.10-1.00/hour | Managed PostgreSQL |
| Storage | $0.10/GB/month | EBS/Persistent disk |
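For capacity planning, raw vector storage is easy to estimate: each vector takes roughly 4 bytes per dimension plus a small per-row overhead. A back-of-envelope sketch (the row count is illustrative):

-- ~6 GB of raw vector data for 1M OpenAI-sized (1536-dim) embeddings,
-- before indexes, other columns, and general row overhead
SELECT pg_size_pretty((4 * 1536 + 8)::bigint * 1000000) AS raw_vector_data;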
Cost Optimization
-- Cost optimization strategies
-- 1. Use appropriate data types
-- vector(1536) instead of vector(2048) to save space
-- Lower dimensions = lower storage cost

-- 2. Partition old data
-- Move old partitions to cold storage
CREATE TABLE documents_archive (
    LIKE documents INCLUDING ALL
);

-- Insert old data
INSERT INTO documents_archive
SELECT * FROM documents
WHERE created_at < '2024-01-01';

-- Delete from main table
DELETE FROM documents
WHERE created_at < '2024-01-01';

-- 3. Use compression
-- PostgreSQL TOAST compression is automatic for large values
-- PostgreSQL 14+ also supports per-column LZ4 compression
-- (ALTER TABLE ... ALTER COLUMN ... SET COMPRESSION lz4)

-- 4. Delete unused vectors
DELETE FROM documents
WHERE created_at < CURRENT_DATE - INTERVAL '90 days';

-- 5. Use connection pooling
-- Reduce connection overhead
pgvector for RAG
RAG Implementation
# RAG pipeline with pgvector
import psycopg2
from openai import OpenAI
import numpy as np

# Connect to PostgreSQL
conn = psycopg2.connect(
    host="localhost",
    database="vector_db",
    user="postgres",
    password="postgres"
)
cursor = conn.cursor()

# Initialize OpenAI
client = OpenAI(api_key="your-api-key")

def embed_text(text):
    """Generate embedding using OpenAI"""
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return np.array(response.data[0].embedding)

def rag_pipeline(question: str) -> str:
    """RAG pipeline with pgvector"""
    # 1. Embed question
    query_vector = embed_text(question)
    query_vector_str = "[" + ",".join(map(str, query_vector)) + "]"

    # 2. Search pgvector for relevant documents
    query = """
        SELECT id, content, metadata,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT 5
    """
    cursor.execute(query, (query_vector_str, query_vector_str))
    results = cursor.fetchall()

    # 3. Augment prompt with context
    context = "\n".join([
        f"Document {i+1}: {row[1]}\n"
        for i, row in enumerate(results)
    ])
    augmented_prompt = f"""
Context:
{context}

Question: {question}

Answer:
"""

    # 4. Generate answer with LLM
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": augmented_prompt}
        ]
    )
    answer = response.choices[0].message.content

    # 5. Add citations
    citations = [row[0] for row in results]
    return f"{answer}\n\nCitations: {citations}"

# Example usage
answer = rag_pipeline("What is machine learning?")
print(answer)
pgvector Monitoring
Metrics
-- Monitor pgvector performance
-- 1. Index usage
SELECT schemaname, relname, indexrelname,
       idx_scan AS index_scans,
       idx_tup_read AS tuples_read,
       idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%embedding%'
ORDER BY idx_scan DESC;

-- 2. Table size
SELECT pg_size_pretty(pg_total_relation_size('documents')) AS total_size,
       pg_size_pretty(pg_relation_size('documents')) AS table_size,
       pg_size_pretty(pg_total_relation_size('documents') - pg_relation_size('documents')) AS index_size;

-- 3. Query performance
EXPLAIN ANALYZE
SELECT id, content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- Look for:
-- - Execution time
-- - Index Scan vs Seq Scan
-- - Rows Removed by Filter

-- 4. Vector statistics (dimensions are fixed by the column type, so min = max here)
SELECT COUNT(*) AS total_vectors,
       AVG(dimensions) AS avg_dimensions,
       MIN(dimensions) AS min_dimensions,
       MAX(dimensions) AS max_dimensions
FROM (
    SELECT vector_dims(embedding) AS dimensions  -- array_length() does not work on the vector type
    FROM documents
) t;
pgvector Best Practices
DO
-- 1. Use HNSW for production
--    Fastest queries and best recall at scale

-- 2. Use cosine distance for OpenAI embeddings
--    <=> operator

-- 3. Partition large tables
--    Partition by date for better performance

-- 4. Use connection pooling
--    PgBouncer for production

-- 5. Monitor index usage
--    Ensure queries use indexes
DON'T
-- 1. Don't use L2 distance for high-dimensional embeddings
--    Cosine or inner product is the usual choice

-- 2. Don't ignore index parameters
--    m and ef_construction matter

-- 3. Don't forget to vacuum
--    VACUUM ANALYZE for performance

-- 4. Don't skip monitoring
--    Essential for production

-- 5. Don't use low-dimensional vectors
--    768+ dimensions recommended for quality
pgvector vs. Alternatives
| Feature | pgvector | Pinecone | Milvus | Weaviate |
|---------|----------|----------|--------|----------|
| Database | PostgreSQL extension | Managed service | Standalone | Standalone |
| Open Source | Yes | No | Yes | Yes |
| Managed | Via managed PostgreSQL (RDS, Cloud SQL) | Yes | Yes (optional) | Yes (optional) |
| Index Types | IVFFlat, HNSW | HNSW | IVF, HNSW, FLAT | HNSW |
| Max Dimensions | 2,000 (indexed) | 20,000 | 32,768 | Unlimited |
| Metadata | JSONB columns | Metadata store | Dynamic schema | Schema |
| Best For | Self-hosted, PostgreSQL shops | RAG, managed service | Open-source, on-prem | Knowledge graphs |
Key Takeaways
- PostgreSQL extension: No separate infrastructure needed
- Vector data type: Store vectors as columns (up to 16,000 dimensions; 2,000 for indexed columns)
- Index types: IVFFlat (< 1M vectors), HNSW (> 1M vectors)
- Distance operators: Cosine (<=>), inner product (<#>), L2 (<->)
- Self-hosted: Free (hardware only); also available via managed PostgreSQL (RDS, Cloud SQL)
- Performance: HNSW for fastest queries
- Integration: Seamless integration with PostgreSQL ecosystem
- Use When: Self-hosted, PostgreSQL already in stack, < 10M vectors
Back to Module 5