Milvus Guide
Open-Source Vector Database
Overview
Milvus is an open-source vector database built for scalable similarity search and AI applications. It provides high-performance vector search, cloud-native architecture, and supports multiple index types and distance metrics.
Milvus Architecture
Cloud-Native Architecture
Key Components:
- Proxy: API gateway that validates and routes client requests
- Root Coordinator: Handles DDL/DCL requests (collections, partitions, indexes) and global metadata
- Query Node: Executes search and query requests
- Index Node: Builds vector indexes
- Object Storage: Stores persisted vector data, index files, and logs (e.g., MinIO or S3)
- Etcd: Metadata storage and service coordination
Milvus Installation
Docker Deployment
version: '3.5'

services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - etcd:/etcd
    command: etcd --advertise-client-urls=http://0.0.0.0:2379

  minio:
    image: minio/minio:latest
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio:/minio
    command: server /data

  milvus-standalone:
    image: milvusdb/milvus:latest
    ports:
      - "19530:19530"
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
      common.storageType: minio
    depends_on:
      - etcd
      - minio

  attu:
    image: zilliz/attu:v0.3.0
    ports:
      - "3000:3000"
    depends_on:
      - milvus-standalone

volumes:
  etcd:
  minio:
Milvus Standalone
# Install Milvus standalone
# Using Docker (recommended)
docker run -d --name milvus-standalone \
  -p 19530:19530 \
  -v /path/to/milvus:/milvus \
  -e ETCD_ENDPOINTS=etcd:2379 \
  -e MINIO_ADDRESS=minio:9000 \
  milvusdb/milvus:latest

# Access Attu UI (web interface)
# http://localhost:3000
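To confirm the deployment is reachable before moving on, a minimal connectivity check with pymilvus (a sketch; assumes pymilvus is installed via pip install pymilvus and port 19530 is exposed):

# Verify that Milvus is reachable
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# An empty list on a fresh install confirms the server responds
print(client.list_collections())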
Milvus Operations
Collection Creation
from pymilvus import MilvusClient, FieldSchema, CollectionSchema, DataType

# Connect to Milvus
client = MilvusClient(uri="http://localhost:19530")

# Define collection schema
collection_name = "ml_documents"

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=100),
    FieldSchema(name="author", dtype=DataType.VARCHAR, max_length=100),
    FieldSchema(name="date", dtype=DataType.VARCHAR, max_length=50),
]

# Create collection
schema = CollectionSchema(
    fields=fields,
    description="ML document collection",
    enable_dynamic_field=True,
)

client.create_collection(
    collection_name=collection_name,
    schema=schema,
)

# Create index on the vector field
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_name="vector_index",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 128},
)

client.create_index(
    collection_name=collection_name,
    index_params=index_params,
)
Insert Vectors
# Insert vectors into Milvus
import numpy as np

# Generate random vectors (replace with actual embeddings)
vectors = [np.random.rand(768).tolist() for _ in range(100)]

# Prepare data (the "id" field is omitted because auto_id=True)
entities = [
    {
        "text": f"Document {i}",
        "category": "AI" if i % 2 == 0 else "ML",
        "author": f"Author {i}",
        "date": "2025-01-27",
        "embedding": vectors[i],
    }
    for i in range(100)
]

# Insert entities
client.insert(
    collection_name=collection_name,
    data=entities,
)

# Flush to ensure data is persisted
client.flush(collection_name)
Search Operations
# Search vectors

# Generate a query vector (replace with an actual embedding)
query_vector = np.random.rand(768).tolist()

# Search parameters
search_params = {
    "metric_type": "COSINE",
    "params": {"nprobe": 16},
}

# Load the collection into memory before searching
client.load_collection(collection_name)

# Execute search
results = client.search(
    collection_name=collection_name,
    data=[query_vector],
    limit=10,
    search_params=search_params,
    output_fields=["text", "category", "author", "date"],
)

# Process results (one hit list per query vector)
for hit in results[0]:
    entity = hit["entity"]
    distance = hit["distance"]
    print(f"Distance: {distance:.4f}, Text: {entity['text']}, Category: {entity['category']}")
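Metadata filters can be combined with vector search. A short sketch of a filtered search (assumes the collection, query_vector, and search_params defined above; the filter expression on the category field is illustrative):

# Restrict the ANN search to documents whose category is "AI"
results = client.search(
    collection_name=collection_name,
    data=[query_vector],
    filter='category == "AI"',
    limit=5,
    search_params=search_params,
    output_fields=["text", "category"],
)

for hit in results[0]:
    print(hit["distance"], hit["entity"]["text"])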
Milvus Index Types
Index Comparison
| Index Type | Description | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| FLAT | Brute-force search | Slow | 100% | Exact match, small datasets |
| IVF_FLAT | Inverted file index | Fast | High (tunable via nprobe) | General purpose |
| IVF_SQ8 | IVF with scalar quantization | Faster | Slight loss | Large datasets |
| IVF_PQ | Product quantization | Fastest | Noticeable loss | Real-time |
| HNSW | Hierarchical NSW | Very fast | High | High-throughput |
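Because the index type is set per vector field, moving between the options above means dropping and rebuilding the index. A sketch of switching the example collection from IVF_FLAT to HNSW (assumes the client, collection, and index name from the earlier examples):

# Release the collection before changing its index
client.release_collection(collection_name)
client.drop_index(collection_name, index_name="vector_index")

# Rebuild with HNSW
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_name="vector_index",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 16, "efConstruction": 200},
)
client.create_index(collection_name, index_params=index_params)

# Reload for querying
client.load_collection(collection_name)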
Index Configuration
# Configure HNSW index (recommended for production)
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_name="hnsw_index",
    index_type="HNSW",
    metric_type="COSINE",
    params={
        "M": 16,               # Number of bi-directional links per node
        "efConstruction": 512, # Size of the candidate list during index build
    },
)

client.create_index(
    collection_name=collection_name,
    index_params=index_params,
)

# HNSW configuration parameters:
# - M: number of bidirectional links per node (typically 4-64)
#   Higher = better recall, slower indexing, more memory
# - efConstruction: size of the dynamic candidate list during build (typically 8-512)
#   Higher = better recall, slower indexing
Milvus Performance
Scaling Strategies
Scaling Options:
- Standalone: Single node, development/testing
- Cluster: Multiple nodes, production on-premises
- Cloud-native: Kubernetes, cloud-managed (Milvus Cloud)
Performance Tuning
# Tune Milvus for performance
# 1. Choose an appropriate index type
#    - FLAT: < 10K vectors
#    - IVF_FLAT: 10K-1M vectors
#    - IVF_SQ8: 1M-10M vectors
#    - HNSW: > 10M vectors
# 2. Configure index parameters
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="HNSW",
    metric_type="COSINE",
    params={
        "M": 32,                # More links = better recall, more memory
        "efConstruction": 512,  # Larger build-time candidate list = better recall
    },
)
# 3. Optimize search parameters (see the latency check after this list)
search_params = {
    "metric_type": "COSINE",
    "params": {
        "ef": 512,     # HNSW: search-time candidate list (higher = better recall, slower)
        "nprobe": 16,  # IVF indexes only: number of clusters probed (higher = better recall, slower)
    },
}
# 4. Use load balancing
#    Deploy multiple query nodes for parallel queries
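To see how these settings behave on real data, a rough single-query latency check (a sketch; assumes the client, collection, and query_vector from the earlier examples; the search parameters mirror the IVF_FLAT example above, so substitute your tuned values):

# Measure the latency of one search request
import time

client.load_collection(collection_name)

start = time.perf_counter()
client.search(
    collection_name=collection_name,
    data=[query_vector],
    limit=10,
    search_params={"metric_type": "COSINE", "params": {"nprobe": 16}},
)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Single-query latency: {elapsed_ms:.1f} ms")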
Milvus Cost Optimization
Self-Hosted vs. Managed
| Deployment | Cost | Complexity | Use Case |
|---|---|---|---|
| Self-hosted (Docker) | Free (hardware only) | High | Learning, testing |
| Self-hosted (K8s) | Cloud costs only | Very high | Production, control |
| Zilliz Cloud (managed Milvus) | $0.20-1.00/hour | Low | Production, managed |
Cost Optimization
# Cost optimization strategies

# 1. Use an appropriate index type
#    HNSW for production (fastest, lowest cost per query)

# 2. Delete old data
#    Milvus supports data retention policies

# 3. Use scalar quantization (IVF_SQ8)
#    Reduces memory footprint, faster search
# 4. Use partitioning
#    Partition collections for better performance (see the partition sketch after this list)

# 5. Scale query nodes
#    Add query nodes for parallel queries
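As mentioned in point 4, partitions let queries touch only a slice of the collection. A sketch of category-based partitioning (assumes the client, collection, entities, query_vector, and search_params from the earlier examples; the partition name ai_docs is illustrative):

# Create a partition and write into it
client.create_partition(collection_name, partition_name="ai_docs")

client.insert(
    collection_name=collection_name,
    data=entities,
    partition_name="ai_docs",
)

# Search only within that partition
results = client.search(
    collection_name=collection_name,
    data=[query_vector],
    limit=10,
    partition_names=["ai_docs"],
    search_params=search_params,
)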
Milvus Monitoring
Metrics
milvus_metrics:
  - name: "Query latency (P50)"
    metric: milvus_query_latency_p50
    alert: "If > 100ms"
  - name: "Query latency (P99)"
    metric: milvus_query_latency_p99
    alert: "If > 500ms"
  - name: "Index size"
    metric: milvus_index_size
    alert: "If > 10M vectors"
  - name: "Memory usage"
    metric: milvus_memory_usage_bytes
    alert: "If > 100GB"
  - name: "Query QPS"
    metric: milvus_query_qps
    alert: "If > 1000 qps"
Milvus Best Practices
DO
# 1. Use HNSW for production
#    Fastest and most efficient

# 2. Set appropriate index parameters
#    e.g., M=32, efConstruction=512

# 3. Partition large collections
#    Partition by date, category, etc.

# 4. Use load balancing
#    Multiple query nodes for HA

# 5. Monitor performance
#    Track query latency and throughput
DON'T
# 1. Don't use FLAT for large datasets
#    Too slow for production

# 2. Don't ignore index parameters
#    M and efConstruction matter

# 3. Don't forget to back up
#    Back up metadata and vectors

# 4. Don't skip monitoring
#    Essential for production

# 5. Don't use low-dimensional vectors
#    768+ dimensions recommended for quality
Key Takeaways
- Open-source: Free to deploy, self-hosted option
- Scalable: Cluster mode for high throughput
- Index types: FLAT, IVF, HNSW for different use cases
- Cloud-native: Kubernetes-ready, cloud-managed option
- Cost: Self-hosted (free) or managed (pay-per-hour)
- Performance: HNSW for fastest queries
- Flexibility: Support for multiple index types and metrics
- Use When: Open-source preference, on-premises, Kubernetes