Similarity Search Optimization Guides

This section contains optimization guides for vector similarity search workloads on Intel hardware. These guides help users of popular vector search solutions achieve optimal performance on Intel Xeon processors.

Overview

Vector similarity search is a core component of modern AI applications including:

  • Retrieval-Augmented Generation (RAG)
  • Semantic search
  • Recommendation systems
  • Image and video similarity
  • Anomaly detection

Intel Scalable Vector Search (SVS)

Intel Scalable Vector Search (SVS) is a high-performance library for vector similarity search, optimized for Intel hardware. SVS can be used directly as a standalone library, and is integrated into popular solutions to bring these optimizations to a wider audience.

SVS features:

  • Vamana Algorithm: Graph-based approximate nearest neighbor search
  • Vector Compression: LVQ and LeanVec for significant memory reduction
  • Hardware Optimization: Best performance on servers with AVX-512 support

Understanding LVQ and LeanVec Compression

Traditional vector compression methods face limitations in graph-based search. Product Quantization (PQ) typically requires keeping full-precision vectors for re-ranking, which defeats much of the compression benefit. Standard scalar quantization with global bounds uses the available quantization levels inefficiently: a single range has to cover every vector in the dataset, so any individual vector occupies only a fraction of those levels.

LVQ (Locally-adaptive Vector Quantization)

LVQ addresses these limitations by applying per-vector normalization and scalar quantization, adapting the quantization bounds individually for each vector. This local adaptation ensures efficient use of the available bit range, resulting in high-quality compressed representations.
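As a rough illustration of per-vector bounds, the sketch below quantizes a single vector using its own minimum and maximum. It is a minimal NumPy sketch, not the SVS implementation: the names `lvq_encode` and `lvq_decode` are hypothetical, and any dataset-level preprocessing or bit packing that SVS performs is left out.

```python
import numpy as np

def lvq_encode(x: np.ndarray, bits: int = 8):
    """Quantize one vector using its own min/max as the quantization bounds."""
    if not 1 <= bits <= 8:
        raise ValueError("this sketch stores codes in uint8, so bits must be 1..8")
    lo, hi = float(x.min()), float(x.max())
    levels = (1 << bits) - 1
    # Guard against constant vectors, where hi == lo.
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = np.clip(np.rint((x - lo) / step), 0, levels).astype(np.uint8)
    # The per-vector constants (lo, step) must be stored next to the codes.
    return codes, lo, step

def lvq_decode(codes: np.ndarray, lo: float, step: float) -> np.ndarray:
    """Reconstruct an approximation of the original vector."""
    return lo + step * codes.astype(np.float32)
```

Because the bounds come from the vector itself, the smallest component maps to code 0 and the largest to the top code, so every vector spans the full code range and the reconstruction error per component stays within half a quantization step.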

Key benefits:

  • Minimal decompression overhead enables fast, on-the-fly distance computations
  • Significantly reduces memory bandwidth and storage requirements
  • Maintains high search accuracy and throughput
  • SIMD-optimized layout (Turbo LVQ) for efficient distance computations

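At 8 bits per dimension, LVQ achieves a four-fold reduction of vector size relative to float32 while maintaining search accuracy. A typical 768-dimensional float32 vector requires 3072 bytes (768 × 4); it shrinks to about 768 bytes at 8 bits per dimension and about 384 bytes at 4 bits per dimension, plus a few bytes for the per-vector quantization constants.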

LeanVec (LVQ with Dimensionality Reduction)

LeanVec builds on LVQ by first applying linear dimensionality reduction, then compressing the reduced vectors with LVQ. This two-step approach significantly cuts memory and compute costs, enabling faster similarity search and index construction with minimal accuracy loss. It is especially effective for high-dimensional deep learning embeddings.

Best suited for:

  • High-dimensional vectors (768+ dimensions)
  • Text embeddings from large language models
  • Cases where maximum memory savings are needed
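The two LeanVec steps (linear reduction, then LVQ-style quantization) can be sketched as follows. This is an illustration under stated assumptions: PCA is used here as a stand-in for the projection, since the source does not describe how LeanVec learns its matrix, and `fit_projection` and `quantize_rows` are hypothetical names rather than SVS functions.

```python
import numpy as np

def fit_projection(data: np.ndarray, target_dim: int):
    """Learn a linear projection with PCA (an assumed stand-in for LeanVec's projection)."""
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:target_dim].T  # projection has shape (dim, target_dim)

def quantize_rows(rows: np.ndarray, bits: int = 8):
    """Per-vector scalar quantization of each row, using that row's own min/max."""
    levels = (1 << bits) - 1
    lo = rows.min(axis=1, keepdims=True)
    hi = rows.max(axis=1, keepdims=True)
    step = np.maximum(hi - lo, 1e-12) / levels
    codes = np.clip(np.rint((rows - lo) / step), 0, levels).astype(np.uint8)
    return codes, lo, step

# Step 1: reduce dimensionality. Step 2: quantize the reduced vectors.
rng = np.random.default_rng(0)
data = rng.standard_normal((500, 128))
mean, proj = fit_projection(data, target_dim=32)
reduced = (data - mean) @ proj
codes, lo, step = quantize_rows(reduced, bits=8)
```

In this simplified setup a query would be centered and projected with the same `mean` and `proj` before being compared against the decoded, reduced vectors.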

Two-Level Compression

Both LVQ and LeanVec support two-level compression schemes:

  1. Level 1: Fast candidate retrieval using compressed vectors
  2. Level 2: Re-ranking for accuracy (LVQ encodes the residuals left by Level 1; LeanVec encodes the full-dimensionality data)
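For the LVQ case, the two levels can be sketched as a coarse code plus a quantized residual. The example below is a minimal NumPy sketch with hypothetical function names; it uses a brute-force scan in place of the graph traversal that SVS performs, and it is not the library's actual encoding or memory layout.

```python
import numpy as np

def quantize(x, lo, hi, bits):
    """Uniform scalar quantization of x over [lo, hi]; returns (codes, step)."""
    levels = (1 << bits) - 1
    step = max(hi - lo, 1e-12) / levels
    codes = np.clip(np.rint((x - lo) / step), 0, levels).astype(np.uint8)
    return codes, step

def encode_two_level(x, bits1=4, bits2=8):
    """Level 1 quantizes x with per-vector bounds; Level 2 quantizes the residual."""
    lo, hi = float(x.min()), float(x.max())
    c1, step1 = quantize(x, lo, hi, bits1)
    residual = x - (lo + step1 * c1)  # bounded by +/- step1 / 2
    c2, step2 = quantize(residual, -step1 / 2, step1 / 2, bits2)
    return c1, c2, lo, step1, step2

def decode_level1(c1, c2, lo, step1, step2):
    """Coarse reconstruction; ignores the Level 2 codes."""
    return lo + step1 * c1

def decode_full(c1, c2, lo, step1, step2):
    """Coarse reconstruction plus the decoded residual."""
    return lo + step1 * c1 + (-step1 / 2 + step2 * c2)

def two_stage_search(query, encoded, k=5, n_candidates=50):
    """Brute-force stand-in: rank by Level 1, then re-rank the best candidates."""
    coarse = np.array([np.sum((query - decode_level1(*e)) ** 2) for e in encoded])
    candidates = np.argsort(coarse)[:n_candidates]
    fine = np.array([np.sum((query - decode_full(*encoded[i])) ** 2) for i in candidates])
    return candidates[np.argsort(fine)[:k]]
```

The first stage touches only the small Level 1 codes for every vector, while the more precise reconstruction is computed only for the shortlisted candidates.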

The naming convention reflects bits per dimension at each level:

  • LVQ4x8: 4 bits for Level 1, 8 bits for Level 2 (12 bits total per dimension)
  • LVQ8: Single-level, 8 bits per dimension
  • LeanVec4x8: 4-bit Level 1 encoding of the reduced-dimensionality data + 8-bit Level 2 encoding of the full-dimensionality data
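To make the naming convention concrete, the following sketch parses a scheme name and estimates the code payload per vector. The helper names are hypothetical, and the estimate covers only the quantized codes: per-vector constants, padding, and the graph itself are not counted. For LeanVec, the reduced dimensionality is a parameter the caller must supply.

```python
import re

def bits_per_level(name: str):
    """Parse e.g. 'LVQ4x8' -> ('LVQ', 4, 8) or 'LVQ8' -> ('LVQ', 8, 0)."""
    m = re.fullmatch(r"(LVQ|LeanVec)(\d+)(?:x(\d+))?", name)
    if m is None:
        raise ValueError(f"unrecognized compression name: {name!r}")
    return m.group(1), int(m.group(2)), int(m.group(3) or 0)

def payload_bytes(name: str, dims: int, reduced_dims=None) -> int:
    """Estimate bytes of quantized codes per vector (constants and padding excluded)."""
    family, b1, b2 = bits_per_level(name)
    if family == "LeanVec":
        if reduced_dims is None:
            raise ValueError("LeanVec needs the reduced dimensionality")
        # Level 1 covers the reduced vector, Level 2 the full-dimensional vector.
        bits = b1 * reduced_dims + b2 * dims
    else:
        # Both LVQ levels cover every dimension of the original vector.
        bits = (b1 + b2) * dims
    return (bits + 7) // 8  # round up to whole bytes

for scheme in ("LVQ4", "LVQ8", "LVQ4x8"):
    print(scheme, payload_bytes(scheme, dims=768))
```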

Vector Compression Selection

| Compression | Best For | Observations |
|---|---|---|
| LVQ4x4 | Fast search and low memory use | Consider LeanVec for even faster search |
| LeanVec4x8 | Fastest search and ingestion | LeanVec dimensionality reduction might reduce recall |
| LVQ4 | Maximum memory saving | Recall might be insufficient |
| LVQ8 | Faster ingestion than LVQ4x4 | Search likely slower than LVQ4x4 |
| LeanVec8x8 | Improved recall when LeanVec4x8 is insufficient | LeanVec dimensionality reduction might reduce recall |
| LVQ4x8 | Improved recall when LVQ4x4 is insufficient | Slightly worse memory savings |

Rule of thumb:

  • Dimensions < 768 → Use LVQ (LVQ4x4, LVQ4x8, or LVQ8)
  • Dimensions ≥ 768 → Use LeanVec (LeanVec4x8 or LeanVec8x8)
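The rule of thumb, combined with the recall notes from the selection table, could be encoded as a small helper like the one below. The function name and the `need_higher_recall` flag are hypothetical; treat the result as a starting point to validate against measured recall and latency, not as a fixed policy.

```python
def suggest_compression(dims: int, need_higher_recall: bool = False) -> str:
    """Pick a starting compression scheme from the vector dimensionality."""
    if dims <= 0:
        raise ValueError("dims must be positive")
    if dims >= 768:
        # High-dimensional vectors: prefer LeanVec.
        return "LeanVec8x8" if need_higher_recall else "LeanVec4x8"
    # Lower-dimensional vectors: prefer LVQ.
    return "LVQ4x8" if need_higher_recall else "LVQ4x4"
```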

Available Guides

| Software | Description | Guide |
|---|---|---|
| Redis | Redis Query Engine with SVS-VAMANA | Redis Guide |

References