
Milvus Guide

Open-Source Vector Database


Overview

Milvus is an open-source vector database built for scalable similarity search and AI applications. It offers high-performance vector search, a cloud-native architecture, and support for multiple index types and distance metrics.


Milvus Architecture

Cloud-Native Architecture

Key Components:

  • Proxy: API gateway that receives client requests and load-balances them
  • Root Coordinator: Handles DDL/DCL requests and global timestamp allocation
  • Query Node: Executes search queries
  • Index Node: Builds and manages vector indices
  • Object Storage (MinIO/S3): Stores segment and index files
  • Etcd: Metadata storage and coordination
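
From a client's point of view, only the Proxy is visible: every request enters through it on port 19530, while the coordinators, worker nodes, and storage stay internal. A minimal sketch, assuming a local standalone deployment:

from pymilvus import MilvusClient

# All client traffic goes through the proxy (default port 19530).
client = MilvusClient(uri="http://localhost:19530")

# List collections served by the cluster behind the proxy.
print(client.list_collections())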

Milvus Installation

Docker Deployment

docker-compose.yml
version: '3.5'
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - etcd:/etcd
    command: etcd --advertise-client-urls=http://127.0.0.1:2379 --listen-client-urls=http://0.0.0.0:2379 --data-dir=/etcd
  minio:
    image: minio/minio:latest
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio:/data
    command: server /data --console-address ":9001"
  milvus-standalone:
    image: milvusdb/milvus:latest
    command: ["milvus", "run", "standalone"]
    ports:
      - "19530:19530"
      - "9091:9091"   # metrics endpoint
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    depends_on:
      - etcd
      - minio
  attu:
    image: zilliz/attu:v0.3.0
    environment:
      MILVUS_URL: milvus-standalone:19530
    ports:
      - "3000:3000"
    depends_on:
      - milvus-standalone
volumes:
  etcd:
  minio:

Milvus Standalone

Terminal window
# Install Milvus standalone
# Using Docker (recommended)
# Assumes etcd and MinIO are already reachable on the same Docker network
# (e.g. the services from the docker-compose.yml above; replace "milvus"
# with the name of that network)
docker run -d --name milvus-standalone \
--network milvus \
-p 19530:19530 \
-p 9091:9091 \
-v /path/to/milvus:/var/lib/milvus \
-e ETCD_ENDPOINTS=etcd:2379 \
-e MINIO_ADDRESS=minio:9000 \
milvusdb/milvus:latest \
milvus run standalone
# Access Attu UI (web interface, if the attu container is running)
# http://localhost:3000

Milvus Operations

Collection Creation

from pymilvus import MilvusClient, FieldSchema, CollectionSchema, DataType

# Connect to Milvus
client = MilvusClient(uri="http://localhost:19530")

# Define collection schema
collection_name = "ml_documents"
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=768),
    FieldSchema("text", DataType.VARCHAR, max_length=65535),
    FieldSchema("category", DataType.VARCHAR, max_length=100),
    FieldSchema("author", DataType.VARCHAR, max_length=100),
    FieldSchema("date", DataType.VARCHAR, max_length=50)
]

# Create collection
schema = CollectionSchema(
    fields=fields,
    description="ML document collection",
    enable_dynamic_field=True
)
client.create_collection(
    collection_name=collection_name,
    schema=schema
)

# Create index on the vector field
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_name="vector_index",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 128}
)
client.create_index(
    collection_name=collection_name,
    index_params=index_params
)
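
One step the snippet above does not show: a collection has to be loaded into memory on the query nodes before it can serve searches. A short follow-up, assuming the same client:

# Load the collection so query nodes can serve searches against it.
client.load_collection(collection_name=collection_name)

# Optional sanity checks on schema and load state.
print(client.describe_collection(collection_name=collection_name))
print(client.get_load_state(collection_name=collection_name))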

Insert Vectors

# Insert vectors into Milvus
import numpy as np

# Generate random vectors (replace with actual embeddings)
vectors = [np.random.rand(768).tolist() for _ in range(100)]

# Prepare data (the INT64 primary key is auto-generated, so no "id" field)
entities = [
    {
        "text": f"Document {i}",
        "category": "AI" if i % 2 == 0 else "ML",
        "author": f"Author {i}",
        "date": "2025-01-27",
        "embedding": vectors[i]
    }
    for i in range(100)
]

# Insert entities
client.insert(
    collection_name=collection_name,
    data=entities
)

# Flush to ensure data is persisted to object storage
client.flush(collection_name=collection_name)
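
The random vectors above are placeholders. A hedged sketch of inserting real embeddings instead, assuming the sentence-transformers package and the all-mpnet-base-v2 model (which produces 768-dimensional vectors matching the schema):

from sentence_transformers import SentenceTransformer

# Assumed model choice; any model works as long as its output
# dimension matches the collection's dim (768 here).
model = SentenceTransformer("all-mpnet-base-v2")

texts = [f"Document {i}" for i in range(100)]
embeddings = model.encode(texts)

entities = [
    {
        "text": texts[i],
        "category": "AI" if i % 2 == 0 else "ML",
        "author": f"Author {i}",
        "date": "2025-01-27",
        "embedding": embeddings[i].tolist()
    }
    for i in range(len(texts))
]
client.insert(collection_name=collection_name, data=entities)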

Search Operations

# Search vectors
# Generate query vector (replace with actual embedding)
query_vector = np.random.rand(768).tolist()

# Search parameters
search_params = {
    "metric_type": "COSINE",
    "params": {"nprobe": 16}
}

# Execute search
results = client.search(
    collection_name=collection_name,
    data=[query_vector],
    anns_field="embedding",
    limit=10,
    search_params=search_params,
    output_fields=["text", "category", "author", "date"]
)

# Process results (one hit list per query vector)
for hit in results[0]:
    entity = hit["entity"]
    distance = hit["distance"]
    print(f"Distance: {distance:.4f}, Text: {entity['text']}, Category: {entity['category']}")

Milvus Index Types

Index Comparison

Index Type | Description                  | Speed     | Accuracy        | Use Case
FLAT       | Brute-force search           | Slow      | 100%            | Exact match, small datasets
IVF_FLAT   | Inverted file                | Fast      | High            | General purpose
IVF_SQ8    | IVF with scalar quantization | Faster    | Slight loss     | Large datasets
IVF_PQ     | Product quantization         | Fastest   | Noticeable loss | Very large datasets, memory-constrained
HNSW       | Hierarchical NSW graph       | Very fast | High            | High-throughput, low-latency search

Index Configuration

# Configure HNSW index (recommended for production)
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_name="hnsw_index",
    index_type="HNSW",
    metric_type="COSINE",
    params={
        "M": 16,               # Number of bi-directional links per node
        "efConstruction": 512  # Size of the dynamic candidate list at build time
    }
)
client.create_index(
    collection_name=collection_name,
    index_params=index_params
)
# HNSW configuration parameters:
# - M: number of bidirectional links per node (typically 4-64)
#   Higher = better recall, larger index, slower build
# - efConstruction: size of the candidate list during build (typically 8-512)
#   Higher = better recall, slower indexing
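
To confirm the index was created with the intended parameters, the client can list and describe indexes on the collection:

# Check which indexes exist and inspect their parameters.
print(client.list_indexes(collection_name=collection_name))
print(client.describe_index(collection_name=collection_name, index_name="hnsw_index"))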

Milvus Performance

Scaling Strategies

Scaling Options:

  • Standalone: Single node, development/testing
  • Cluster: Multiple nodes, production on-premises
  • Cloud-native: Kubernetes, or fully managed (Zilliz Cloud)
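
The client-side connection looks the same in every mode; only the URI (and, for a managed cluster, the credentials) change. A hedged sketch with placeholder endpoint and API key:

from pymilvus import MilvusClient

# Standalone / self-hosted cluster: point at the proxy.
local_client = MilvusClient(uri="http://localhost:19530")

# Managed (e.g. Zilliz Cloud): placeholder URI and key, replace with your own.
managed_client = MilvusClient(
    uri="https://your-cluster-endpoint.example.com",  # hypothetical endpoint
    token="YOUR_API_KEY"                              # hypothetical credential
)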

Performance Tuning

# Tune Milvus for performance
# 1. Choose an appropriate index type
#    - FLAT: < 10K vectors (exact search)
#    - IVF_FLAT: 10K-1M vectors
#    - IVF_SQ8 / IVF_PQ: 1M+ vectors where memory is a concern
#    - HNSW: latency-sensitive workloads where memory allows
# 2. Configure index parameters
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {
        "M": 32,               # More links = better recall, larger index
        "efConstruction": 512  # Larger candidate list = better recall, slower build
    }
}
# 3. Optimize search parameters
#    (ef applies to HNSW, nprobe to IVF indexes - set the one that matches your index)
search_params = {
    "metric_type": "COSINE",
    "params": {
        "ef": 512,     # HNSW search list size; must be >= limit (top_k)
        "nprobe": 16   # IVF only: clusters probed (higher = better recall, slower)
    }
}
# 4. Use load balancing
# Deploy multiple query nodes so searches run in parallel
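
A practical way to validate these settings is to measure search latency directly from the client; a minimal sketch, assuming the collection and client defined earlier:

import time
import numpy as np

# Measure end-to-end search latency from the client side.
params = {"metric_type": "COSINE", "params": {"nprobe": 16}}  # match your index type
latencies = []
for _ in range(50):
    q = np.random.rand(768).tolist()
    start = time.perf_counter()
    client.search(
        collection_name=collection_name,
        data=[q],
        anns_field="embedding",
        limit=10,
        search_params=params
    )
    latencies.append((time.perf_counter() - start) * 1000)

latencies.sort()
print(f"p50: {latencies[len(latencies) // 2]:.1f} ms, "
      f"p99: {latencies[int(len(latencies) * 0.99)]:.1f} ms")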

Milvus Cost Optimization

Self-Hosted vs. Managed

Deployment               | Cost                 | Complexity | Use Case
Self-hosted (Docker)     | Free (hardware only) | High       | Learning, testing
Self-hosted (Kubernetes) | Cloud costs only     | Very high  | Production, full control
Zilliz Cloud (managed)   | $0.20-1.00/hour      | Low        | Production, managed

Cost Optimization

# Cost optimization strategies
# 1. Use an appropriate index type
#    HNSW gives the fastest queries but the largest memory footprint;
#    IVF variants trade some recall for lower memory cost
# 2. Expire old data
#    Milvus supports collection-level TTL for automatic expiry
# 3. Use quantization (IVF_SQ8 / IVF_PQ)
#    Reduces memory footprint at a small accuracy cost
# 4. Use partitioning
#    Partition collections so searches touch less data
# 5. Scale query nodes
# Add query nodes for parallel queries
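
Partitioning (point 4) is available directly from the client API; a short sketch, assuming the collection from earlier and a hypothetical "ai_docs" partition name:

import numpy as np

# Create a partition and scope inserts and searches to it.
client.create_partition(collection_name=collection_name, partition_name="ai_docs")

client.insert(
    collection_name=collection_name,
    partition_name="ai_docs",
    data=[{
        "text": "Partitioned document",
        "category": "AI",
        "author": "Author 0",
        "date": "2025-01-27",
        "embedding": np.random.rand(768).tolist()
    }]
)

results = client.search(
    collection_name=collection_name,
    data=[np.random.rand(768).tolist()],
    anns_field="embedding",
    limit=5,
    partition_names=["ai_docs"]  # only this partition is searched
)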

Milvus Monitoring

Metrics

milvus_metrics:
  - name: "Query latency (P50)"
    metric: milvus_query_latency_p50
    alert: "If > 100ms"
  - name: "Query latency (P99)"
    metric: milvus_query_latency_p99
    alert: "If > 500ms"
  - name: "Index size"
    metric: milvus_index_size
    alert: "If > 10M vectors"
  - name: "Memory usage"
    metric: milvus_memory_usage_bytes
    alert: "If > 100GB"
  - name: "Query QPS"
    metric: milvus_query_qps
    alert: "If > 1000 qps"

Milvus Best Practices

DO

# 1. Use HNSW for production
# Fastest and most efficient
# 2. Set appropriate index parameters
# M=32, efConstruction=512
# 3. Partition large collections
# Partition by date, category, etc.
# 4. Use load balancing
# Multiple query nodes for HA
# 5. Monitor performance
# Track query latency and throughput

DON’T

# 1. Don't use FLAT for large datasets
# Too slow for production
# 2. Don't ignore index parameters
# M and efConstruction matter
# 3. Don't forget to backup
# Back up metadata and vectors
# 4. Don't skip monitoring
# Essential for production
# 5. Don't pick vector dimensions arbitrarily
# The dimension is set by your embedding model; the collection schema must match it

Key Takeaways

  1. Open-source: Free to deploy, self-hosted option
  2. Scalable: Cluster mode for high throughput
  3. Index types: FLAT, IVF, HNSW for different use cases
  4. Cloud-native: Kubernetes-ready, cloud-managed option
  5. Cost: Self-hosted (free) or managed (pay-per-hour)
  6. Performance: HNSW for fastest queries
  7. Flexibility: Support for multiple index types and metrics
  8. Use When: Open-source preference, on-premises, Kubernetes
