
A look at vector search


Why is vector search needed? #

With the rise of LLMs, retrieval-augmented generation (RAG) has become a common application. RAG takes a question, gathers relevant context, and uses the LLM to answer the question.

For example,

Question: What is the longest river in the world?

Context:

  • The Amazon river is 3976 miles long.
  • The Yangtze river is 3917 miles long.
  • The Nile river is 4130 miles long.
  • …(possibly more information)

Answer: The Nile river is the longest in the world.
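
To make that flow concrete, here is a minimal sketch of the final step, assembling the retrieved context into a prompt. `llm_complete` is a hypothetical stand-in for whatever LLM API is in use.

```python
def answer(question: str, context: list[str]) -> str:
    # Build a prompt that grounds the model in the retrieved context.
    bullets = "\n".join(f"- {c}" for c in context)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{bullets}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm_complete(prompt)  # hypothetical LLM call
```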

That second step, gathering context, requires a few things to happen.

Even before the question is asked, knowledge (such as a set of documents) is converted to embeddings. These embeddings are represented as vectors and stored in a vector database.
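
Here’s a minimal sketch of that indexing step, assuming the sentence-transformers library and a particular model name (both assumptions; any embedding model works the same way):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "The Amazon river is 3976 miles long.",
    "The Yangtze river is 3917 miles long.",
    "The Nile river is 4130 miles long.",
]

# Each document becomes one embedding vector. A real system would
# store these in a vector database; a plain NumPy array stands in
# for one here.
index = model.encode(documents)  # shape: (len(documents), dim)
```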

When a question comes in, it is converted to a vector, and the pieces of knowledge most relevant to that question are fetched.
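
Continuing the sketch above, the query step might look like this (dot product is used as the similarity score here; the measures below cover the options):

```python
query = model.encode(["What is the longest river in the world?"])[0]

scores = index @ query                   # one score per stored vector
top_k = np.argsort(scores)[::-1][:2]     # indices of the two best matches
context = [documents[i] for i in top_k]  # passed to the LLM as context
```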

What does vector search answer? #

The fundamental question is:

Given a query vector, which vectors in the database are most similar?

But what does similarity mean? And how are the most similar vectors found?

A quick note on vector similarity #

Two vectors should be similar if the information they represent is similar. There are a few ways to quantify this:

  • Cosine similarity
  • Euclidean distance
  • Dot product similarity

Each of those weighs magnitude and direction differently, but all of them yield a number (a similarity score) that’s useful for vector search.

Here’s a great overview of how to calculate these.
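
As a quick illustration, here’s each measure computed with NumPy for two small vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 3.0, 4.0])

dot = a @ b                                             # dot product: magnitude and direction
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))  # direction only
euclidean = np.linalg.norm(a - b)                       # distance: smaller = more similar
```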

Vector search approaches #

Although there are several vector search algorithms, in practice only a few come up (a brute-force k-NN sketch follows the list):

  • k-nearest neighbors (k-NN)
  • locality-sensitive hashing (LSH)
  • hierarchical navigable small worlds (HNSW)
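
LSH and HNSW are approximate methods and beyond a short sketch, but brute-force k-NN is simple enough to write out. It is the exact baseline the approximate methods trade accuracy against:

```python
import numpy as np

def knn(index: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k vectors in `index` closest to `query`,
    by Euclidean distance (smaller distance = more similar)."""
    distances = np.linalg.norm(index - query, axis=1)
    return np.argsort(distances)[:k]
```

This scans every stored vector, so each query costs O(n·d) for n vectors of dimension d; LSH and HNSW exist precisely to avoid that full scan.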