
Milvus Guide

Open-Source Vector Database


Overview

Milvus is an open-source vector database built for scalable similarity search and AI applications. It offers high-performance vector search, a cloud-native architecture, and support for multiple index types and distance metrics.


Milvus Architecture

Cloud-Native Architecture

Key Components:

  • Proxy: API gateway that receives client requests and load-balances them
  • Root Coordinator: Handles DDL/DCL requests and global timestamp allocation
  • Query Node: Executes search queries
  • Index Node: Builds and manages vector indices
  • Object Storage (MinIO/S3): Stores segment and index files
  • Etcd: Metadata storage and coordination
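
From a client's point of view, only the Proxy is visible: every request enters through it on port 19530, while the coordinators, worker nodes, and storage stay internal. A minimal sketch, assuming a local standalone deployment:

from pymilvus import MilvusClient

# All client traffic goes through the proxy (default port 19530).
client = MilvusClient(uri="http://localhost:19530")

# List collections served by the cluster behind the proxy.
print(client.list_collections())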

Milvus Installation

Docker Deployment

docker-compose.yml
version: '3.5'
services:
  etcd:
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - etcd:/etcd
    command: etcd --advertise-client-urls=http://127.0.0.1:2379 --listen-client-urls=http://0.0.0.0:2379 --data-dir=/etcd
  minio:
    image: minio/minio:latest
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio:/data
    command: server /data --console-address ":9001"
  milvus-standalone:
    image: milvusdb/milvus:latest
    command: ["milvus", "run", "standalone"]
    ports:
      - "19530:19530"
      - "9091:9091"   # metrics endpoint
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    depends_on:
      - etcd
      - minio
  attu:
    image: zilliz/attu:v0.3.0
    environment:
      MILVUS_URL: milvus-standalone:19530
    ports:
      - "3000:3000"
    depends_on:
      - milvus-standalone
volumes:
  etcd:
  minio:

Milvus Standalone

Terminal window
# Install Milvus standalone
# Using Docker (recommended)
# Assumes etcd and MinIO are already reachable on the same Docker network
# (e.g. the services from the docker-compose.yml above; replace "milvus"
# with the name of that network)
docker run -d --name milvus-standalone \
--network milvus \
-p 19530:19530 \
-p 9091:9091 \
-v /path/to/milvus:/var/lib/milvus \
-e ETCD_ENDPOINTS=etcd:2379 \
-e MINIO_ADDRESS=minio:9000 \
milvusdb/milvus:latest \
milvus run standalone
# Access Attu UI (web interface, if the attu container is running)
# http://localhost:3000

Milvus Operations

Collection Creation

from pymilvus import MilvusClient, FieldSchema, CollectionSchema, DataType

# Connect to Milvus
client = MilvusClient(uri="http://localhost:19530")

# Define collection schema
collection_name = "ml_documents"
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=768),
    FieldSchema("text", DataType.VARCHAR, max_length=65535),
    FieldSchema("category", DataType.VARCHAR, max_length=100),
    FieldSchema("author", DataType.VARCHAR, max_length=100),
    FieldSchema("date", DataType.VARCHAR, max_length=50)
]

# Create collection
schema = CollectionSchema(
    fields=fields,
    description="ML document collection",
    enable_dynamic_field=True
)
client.create_collection(
    collection_name=collection_name,
    schema=schema
)

# Create index on the vector field
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_name="vector_index",
    index_type="IVF_FLAT",
    metric_type="COSINE",
    params={"nlist": 128}
)
client.create_index(
    collection_name=collection_name,
    index_params=index_params
)
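
One step the snippet above does not show: a collection has to be loaded into memory on the query nodes before it can serve searches. A short follow-up, assuming the same client:

# Load the collection so query nodes can serve searches against it.
client.load_collection(collection_name=collection_name)

# Optional sanity checks on schema and load state.
print(client.describe_collection(collection_name=collection_name))
print(client.get_load_state(collection_name=collection_name))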

Insert Vectors

# Insert vectors into Milvus
import numpy as np

# Generate random vectors (replace with actual embeddings)
vectors = [np.random.rand(768).tolist() for _ in range(100)]

# Prepare data (the INT64 primary key is auto-generated, so no "id" field)
entities = [
    {
        "text": f"Document {i}",
        "category": "AI" if i % 2 == 0 else "ML",
        "author": f"Author {i}",
        "date": "2025-01-27",
        "embedding": vectors[i]
    }
    for i in range(100)
]

# Insert entities
client.insert(
    collection_name=collection_name,
    data=entities
)

# Flush to ensure data is persisted to object storage
client.flush(collection_name=collection_name)
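
The random vectors above are placeholders. A hedged sketch of inserting real embeddings instead, assuming the sentence-transformers package and the all-mpnet-base-v2 model (which produces 768-dimensional vectors matching the schema):

from sentence_transformers import SentenceTransformer

# Assumed model choice; any model works as long as its output
# dimension matches the collection's dim (768 here).
model = SentenceTransformer("all-mpnet-base-v2")

texts = [f"Document {i}" for i in range(100)]
embeddings = model.encode(texts)

entities = [
    {
        "text": texts[i],
        "category": "AI" if i % 2 == 0 else "ML",
        "author": f"Author {i}",
        "date": "2025-01-27",
        "embedding": embeddings[i].tolist()
    }
    for i in range(len(texts))
]
client.insert(collection_name=collection_name, data=entities)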

Search Operations

# Search vectors
# Generate query vector (replace with actual embedding)
query_vector = np.random.rand(768).tolist()

# Search parameters
search_params = {
    "metric_type": "COSINE",
    "params": {"nprobe": 16}
}

# Execute search
results = client.search(
    collection_name=collection_name,
    data=[query_vector],
    anns_field="embedding",
    limit=10,
    search_params=search_params,
    output_fields=["text", "category", "author", "date"]
)

# Process results (one hit list per query vector)
for hit in results[0]:
    entity = hit["entity"]
    distance = hit["distance"]
    print(f"Distance: {distance:.4f}, Text: {entity['text']}, Category: {entity['category']}")

Milvus Index Types

Index Comparison

Index Type | Description                  | Speed     | Accuracy        | Use Case
FLAT       | Brute-force search           | Slow      | 100%            | Exact match, small datasets
IVF_FLAT   | Inverted file                | Fast      | High            | General purpose
IVF_SQ8    | IVF with scalar quantization | Faster    | Slight loss     | Large datasets
IVF_PQ     | Product quantization         | Fastest   | Noticeable loss | Very large datasets, memory-constrained
HNSW       | Hierarchical NSW graph       | Very fast | High            | High-throughput, low-latency search

Index Configuration

# Configure HNSW index (recommended for production)
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_name="hnsw_index",
    index_type="HNSW",
    metric_type="COSINE",
    params={
        "M": 16,               # Number of bi-directional links per node
        "efConstruction": 512  # Size of the dynamic candidate list at build time
    }
)
client.create_index(
    collection_name=collection_name,
    index_params=index_params
)
# HNSW configuration parameters:
# - M: number of bidirectional links per node (typically 4-64)
#   Higher = better recall, larger index, slower build
# - efConstruction: size of the candidate list during build (typically 8-512)
#   Higher = better recall, slower indexing
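
To confirm the index was created with the intended parameters, the client can list and describe indexes on the collection:

# Check which indexes exist and inspect their parameters.
print(client.list_indexes(collection_name=collection_name))
print(client.describe_index(collection_name=collection_name, index_name="hnsw_index"))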

Milvus Performance

Scaling Strategies

Scaling Options:

  • Standalone: Single node, development/testing
  • Cluster: Multiple nodes, production on-premises
  • Cloud-native: Kubernetes, or fully managed (Zilliz Cloud)
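
The client-side connection looks the same in every mode; only the URI (and, for a managed cluster, the credentials) change. A hedged sketch with placeholder endpoint and API key:

from pymilvus import MilvusClient

# Standalone / self-hosted cluster: point at the proxy.
local_client = MilvusClient(uri="http://localhost:19530")

# Managed (e.g. Zilliz Cloud): placeholder URI and key, replace with your own.
managed_client = MilvusClient(
    uri="https://your-cluster-endpoint.example.com",  # hypothetical endpoint
    token="YOUR_API_KEY"                              # hypothetical credential
)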

Performance Tuning

# Tune Milvus for performance
# 1. Choose an appropriate index type
#    - FLAT: < 10K vectors (exact search)
#    - IVF_FLAT: 10K-1M vectors
#    - IVF_SQ8 / IVF_PQ: 1M+ vectors where memory is a concern
#    - HNSW: latency-sensitive workloads where memory allows
# 2. Configure index parameters
index_params = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {
        "M": 32,               # More links = better recall, larger index
        "efConstruction": 512  # Larger candidate list = better recall, slower build
    }
}
# 3. Optimize search parameters
#    (ef applies to HNSW, nprobe to IVF indexes - set the one that matches your index)
search_params = {
    "metric_type": "COSINE",
    "params": {
        "ef": 512,     # HNSW search list size; must be >= limit (top_k)
        "nprobe": 16   # IVF only: clusters probed (higher = better recall, slower)
    }
}
# 4. Use load balancing
# Deploy multiple query nodes so searches run in parallel
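
A practical way to validate these settings is to measure search latency directly from the client; a minimal sketch, assuming the collection and client defined earlier:

import time
import numpy as np

# Measure end-to-end search latency from the client side.
params = {"metric_type": "COSINE", "params": {"nprobe": 16}}  # match your index type
latencies = []
for _ in range(50):
    q = np.random.rand(768).tolist()
    start = time.perf_counter()
    client.search(
        collection_name=collection_name,
        data=[q],
        anns_field="embedding",
        limit=10,
        search_params=params
    )
    latencies.append((time.perf_counter() - start) * 1000)

latencies.sort()
print(f"p50: {latencies[len(latencies) // 2]:.1f} ms, "
      f"p99: {latencies[int(len(latencies) * 0.99)]:.1f} ms")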

Milvus Cost Optimization

Self-Hosted vs. Managed

Deployment               | Cost                 | Complexity | Use Case
Self-hosted (Docker)     | Free (hardware only) | High       | Learning, testing
Self-hosted (Kubernetes) | Cloud costs only     | Very high  | Production, full control
Zilliz Cloud (managed)   | $0.20-1.00/hour      | Low        | Production, managed

Cost Optimization

# Cost optimization strategies
# 1. Use an appropriate index type
#    HNSW gives the fastest queries but the largest memory footprint;
#    IVF variants trade some recall for lower memory cost
# 2. Expire old data
#    Milvus supports collection-level TTL for automatic expiry
# 3. Use quantization (IVF_SQ8 / IVF_PQ)
#    Reduces memory footprint at a small accuracy cost
# 4. Use partitioning
#    Partition collections so searches touch less data
# 5. Scale query nodes
# Add query nodes for parallel queries
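
Partitioning (point 4) is available directly from the client API; a short sketch, assuming the collection from earlier and a hypothetical "ai_docs" partition name:

import numpy as np

# Create a partition and scope inserts and searches to it.
client.create_partition(collection_name=collection_name, partition_name="ai_docs")

client.insert(
    collection_name=collection_name,
    partition_name="ai_docs",
    data=[{
        "text": "Partitioned document",
        "category": "AI",
        "author": "Author 0",
        "date": "2025-01-27",
        "embedding": np.random.rand(768).tolist()
    }]
)

results = client.search(
    collection_name=collection_name,
    data=[np.random.rand(768).tolist()],
    anns_field="embedding",
    limit=5,
    partition_names=["ai_docs"]  # only this partition is searched
)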

Milvus Monitoring

Metrics

milvus_metrics:
  - name: "Query latency (P50)"
    metric: milvus_query_latency_p50
    alert: "If > 100ms"
  - name: "Query latency (P99)"
    metric: milvus_query_latency_p99
    alert: "If > 500ms"
  - name: "Index size"
    metric: milvus_index_size
    alert: "If > 10M vectors"
  - name: "Memory usage"
    metric: milvus_memory_usage_bytes
    alert: "If > 100GB"
  - name: "Query QPS"
    metric: milvus_query_qps
    alert: "If > 1000 qps"

Milvus Best Practices

DO

# 1. Use HNSW for production
# Fastest and most efficient
# 2. Set appropriate index parameters
# M=32, efConstruction=512
# 3. Partition large collections
# Partition by date, category, etc.
# 4. Use load balancing
# Multiple query nodes for HA
# 5. Monitor performance
# Track query latency and throughput

DON’T

# 1. Don't use FLAT for large datasets
# Too slow for production
# 2. Don't ignore index parameters
# M and efConstruction matter
# 3. Don't forget to backup
# Back up metadata and vectors
# 4. Don't skip monitoring
# Essential for production
# 5. Don't pick vector dimensions arbitrarily
# The dimension is set by your embedding model; the collection schema must match it

Key Takeaways

  1. Open-source: Free to deploy, self-hosted option
  2. Scalable: Cluster mode for high throughput
  3. Index types: FLAT, IVF, HNSW for different use cases
  4. Cloud-native: Kubernetes-ready, cloud-managed option
  5. Cost: Self-hosted (free) or managed (pay-per-hour)
  6. Performance: HNSW for fastest queries
  7. Flexibility: Support for multiple index types and metrics
  8. Use When: Open-source preference, on-premises, Kubernetes
