Vector Database

A vector database is a database optimised for storing and querying high-dimensional vectors (embeddings) using similarity search rather than exact match. It returns the nearest results based on cosine similarity, dot product, or Euclidean distance, which is what makes retrieval-augmented generation (RAG), semantic search, and AI-native accounts receivable (AR) workflows possible at scale.

Key Takeaways

  • Vector databases index embeddings so you can query by semantic similarity, not exact keyword match, which is the foundation of any AI-native RAG system.
  • Traditional SQL databases and keyword search engines cannot do this efficiently: high-dimensional vector similarity does not fit B-tree or inverted-text indexes.
  • Specialised indexes like HNSW and IVF trade a small amount of accuracy for huge speed gains, which is what makes sub-100ms similarity search at millions of vectors possible.
  • Major options include Pinecone, Weaviate, Chroma, Qdrant, Milvus, and pgvector (PostgreSQL extension); pgvector is often the right starting point for finance teams who already run Postgres.
  • In AR, vector databases power RAG over contracts, dispute history, and customer comms, enable semantic dispute and duplicate detection, and let agentic systems retrieve the right context before acting.

What a vector database is and why it is specialised

A vector database is a database built for one specific job: storing high-dimensional numerical vectors (typically embeddings of 384 to 3,072 dimensions) and finding the ones most similar to a query vector. Instead of asking "does this row equal that value?", you ask "which rows are closest to this point in vector space?". Closeness is measured with cosine similarity, dot product, or Euclidean distance.

Relational databases struggle with this because B-tree indexes are designed for ordered, one-dimensional keys, and a nearest-neighbour search across hundreds of dimensions gives them no single ordering to exploit. Keyword search engines like Elasticsearch are excellent at lexical matching but do not understand that "the bill was wrong" and "pricing dispute" mean the same thing. A vector database closes that gap by indexing meaning rather than exact tokens, which is the prerequisite for any AI-native retrieval pattern.

How vector indexes work at a high level

Brute-force comparison of a query vector against millions of stored vectors is too slow for production. Vector databases use approximate nearest neighbour (ANN) indexes that give up a small amount of recall in exchange for very large speed gains.

  • HNSW (Hierarchical Navigable Small World): builds a multi-layer graph where each node links to its closest neighbours. Queries traverse the graph from the top layer down, jumping quickly to the right neighbourhood. Excellent latency, higher memory usage.
  • IVF (Inverted File): clusters vectors into buckets at index time, then only searches the buckets nearest to the query. Lower memory, slightly higher latency than HNSW.
  • Product quantization: compresses vectors by splitting them into sub-vectors and quantising each, dramatically reducing storage at the cost of some accuracy.
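To make the IVF idea above concrete, here is a minimal sketch in plain Python. It assumes toy two-dimensional vectors and hand-picked centroids (a real index would learn centroids with k-means and work in hundreds of dimensions); the function names and the `nprobe` parameter name are illustrative choices, not any specific product's API.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Index time: assign every vector index to the bucket of its nearest centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        buckets[nearest].append(idx)
    return buckets

def ivf_search(query, vectors, centroids, buckets, k=2, nprobe=1):
    """Query time: scan only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in buckets[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]

def exact_search(query, vectors, k=2):
    """Brute-force baseline: compare the query against every stored vector."""
    return sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))[:k]

# Toy data: two obvious clusters (assumed values for illustration only).
vectors = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids = [(0.1, 0.1), (5.0, 5.0)]  # hand-picked; a real index learns these
buckets = build_ivf(vectors, centroids)

query = (4.8, 5.0)
print(ivf_search(query, vectors, centroids, buckets))  # scans 3 vectors instead of 6
print(exact_search(query, vectors))
```

Here the approximate and exact searches agree. A query that falls near a bucket boundary can miss neighbours sitting in an unprobed bucket, which is the recall cost described above; raising `nprobe` recovers them at the price of scanning more vectors.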

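You also choose a distance metric. Cosine similarity is the default for most embedding models because it ignores magnitude and focuses on direction. On normalised (unit-length) vectors, dot product produces the same ranking as cosine similarity and is cheaper to compute because it skips the length calculation. Euclidean distance is rarely the default for modern text embeddings; on normalised vectors it yields the same ranking as cosine, so choose it mainly when the embedding model's documentation calls for it.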
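The relationship between the three metrics is easier to see with numbers. The sketch below uses small hand-written vectors (assumed for illustration) and the standard library only; it shows that cosine similarity ignores magnitude, and that on unit-length vectors the dot product equals the cosine while squared Euclidean distance equals 2 minus twice the dot product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalise(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the magnitude
c = [4.0, 3.0]  # different direction, same magnitude as a

# Cosine sees a and b as identical in direction; dot product and
# Euclidean distance are both affected by b's larger magnitude.
print(cosine_similarity(a, b), dot(a, b), euclidean(a, b))

# After normalisation the three metrics carry the same information.
ua, uc = normalise(a), normalise(c)
print(cosine_similarity(a, c), dot(ua, uc), euclidean(ua, uc) ** 2)
```

This is why the metric has to match how the embedding model was trained and whether its outputs are normalised: on unit vectors the choice affects speed only, while on raw vectors it can change the ranking.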

Major options and trade-offs

The market is now mature enough that the choice is rarely about capability, and more about operational fit.

  • Pinecone: managed SaaS, the simplest path to production, but you ship your data to their cloud.
  • Weaviate: open-source plus managed cloud, strong hybrid search (vector plus keyword), good schema model.
  • Chroma: developer-friendly, lightweight, popular for prototypes and small-to-mid workloads.
  • Qdrant: open-source, written in Rust, very high performance, strong filtering.
  • Milvus: open-source, designed for very large scale (billions of vectors), more operational overhead.
  • pgvector: a PostgreSQL extension that adds vector columns and ANN indexes to a database your team already runs. Co-locates vectors with structured business data, which is a huge win for finance use cases where access control and joins matter.
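As a rough illustration of what co-locating vectors with business data looks like in pgvector, the sketch below holds the SQL as Python strings and only prints them; it does not connect to a database. The table name `contract_chunks`, its columns, and the 1,536-dimension setting are hypothetical, and the `%s` placeholders assume a psycopg-style driver.

```python
# Hypothetical schema: table and column names are illustrative only.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE contract_chunks (
    id          bigserial PRIMARY KEY,
    customer_id bigint NOT NULL,
    content     text NOT NULL,
    embedding   vector(1536)  -- must match the embedding model's dimension
);

-- ANN index: HNSW over cosine distance
CREATE INDEX ON contract_chunks USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine distance operator; smaller means more similar.
# The WHERE clause is an ordinary relational filter on the same table.
QUERY_SQL = """
SELECT id, content
FROM contract_chunks
WHERE customer_id = %s
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""

def to_vector_literal(values):
    """Format a Python sequence in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

if __name__ == "__main__":
    print(SETUP_SQL)
    print(QUERY_SQL)
    print(to_vector_literal([0.1, 0.2, 0.3]))
```

The point of the design is that the similarity search and the customer filter live in one query against one database, so existing row-level access rules and joins to invoice or customer tables keep working.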

Use cases in finance and AR

For an AR or O2C team, the value of a vector database is not the technology itself but what it unlocks.

  • RAG knowledge base: index your contracts, master service agreements, dunning playbooks, and dispute history so an AI-native assistant can ground every answer in your real documents.
  • Semantic search across customer history: a collector typing "customer disputing freight charge" finds historic cases described as "delivery fee complaint" or "shipping surcharge issue".
  • Duplicate detection: cluster incoming dispute claims or contract uploads to surface near-duplicates that exact-match logic misses.
  • Anomaly detection: embed payment behaviour patterns and flag customers whose vectors drift away from their historic cluster, an early warning of churn or credit risk.
  • Recommendations: surface insights such as "customers similar to this one resolved disputes 12 days faster when offered a payment plan".
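The duplicate-detection use case above can be sketched as a similarity threshold over embeddings. The example below assumes three-dimensional toy vectors, hypothetical claim IDs, and an arbitrary threshold of 0.95; real embeddings come from a model, and the right threshold has to be tuned on your own data.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def near_duplicates(embeddings, threshold=0.95):
    """Return pairs of IDs whose cosine similarity is at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs

# Toy embeddings (assumed values): the first two claims describe the same issue.
claims = {
    "claim-001": [0.90, 0.10, 0.00],
    "claim-002": [0.88, 0.12, 0.01],
    "claim-003": [0.00, 0.20, 0.95],
}

print(near_duplicates(claims))
```

Comparing every pair is fine for a handful of records but grows quadratically. In practice you would instead run a nearest-neighbour query against the vector index for each incoming claim and apply the threshold to the results.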

Implementation considerations

Vector databases look simple until you put them in production. A few things bite teams in the first 90 days.

  • Dimensions are fixed at index time: if you switch embedding models, you must re-embed every record and rebuild the index. Plan for this cost up front.
  • Metadata is critical: store customer ID, document type, region, and dates alongside each vector so you can pre-filter or post-filter results. Without filtering, you fetch globally relevant but contextually wrong chunks.
  • Hybrid search beats pure vector in most finance contexts: combining keyword and vector retrieval catches both "invoice 12345" (exact match) and "the unpaid bill from last quarter" (semantic).
  • Cost scales with vector count and dimensions. A million 1,536-dimension vectors stored as 32-bit floats is roughly 6 GB of raw data before indexing overhead, and managed services price on storage plus query volume. Budget in euros per million vectors per month and benchmark before committing.
  • Data residency: some managed vector databases are US-hosted only. For European finance teams, pgvector inside an EU-region Postgres or a self-hosted Qdrant/Weaviate cluster is often the safer call.
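The storage figure in the cost point above is simple arithmetic, and it is worth rerunning for your own vector count and dimensions. This sketch assumes 4 bytes per value (32-bit floats) and counts raw vector data only; index structures and metadata add overhead that varies by product.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, excluding index and metadata overhead."""
    return num_vectors * dimensions * bytes_per_value

size = raw_vector_bytes(1_000_000, 1536)
print(size / 1e9)  # 6.144 (GB, decimal)

# Doubling the dimensions doubles the raw footprint.
print(raw_vector_bytes(1_000_000, 3072) / 1e9)
```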

Production guidance and common pitfalls

The pragmatic path for most finance organisations: start with pgvector if you already run Postgres. You get vector search next to your customer master and invoice tables, with the same backup, access control, and compliance posture you already trust. Move to a dedicated vector database only when query latency, vector count, or specialised features (high-end hybrid search, very large scale) demand it.

The pitfalls that derail projects are almost always the same: dimension mismatch after a model upgrade, the wrong distance metric for the embedding model in use, missing metadata filtering so the system retrieves irrelevant chunks at scale, no monitoring of recall and latency over time, and underestimating the operational cost of backup and restore (which is materially harder than for SQL). Treat the vector database as production infrastructure from day one, not as a notebook experiment that happens to be live.
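Monitoring recall needs a baseline: for a sample of queries, compare what the ANN index returns against an exact (brute-force) search over the same data. A minimal sketch of the recall@k calculation, with made-up result IDs standing in for real query results:

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours that the ANN index also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)

# Hypothetical IDs for one query: exact search vs. the ANN index.
exact = [17, 4, 92, 8, 51]
approx = [17, 92, 4, 33, 51]

print(recall_at_k(approx, exact, k=5))  # 0.8
```

Averaging this over a fixed query sample on a schedule, alongside latency percentiles, gives an early signal when index parameters or data growth start to erode result quality.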

Frequently asked questions

Do I need a vector database to use AI in AR?

If you are doing anything beyond a single prompt to an LLM (such as RAG over your contracts, semantic search across dispute history, or agentic workflows that retrieve context before acting), yes. The vector database is what lets the AI find the right piece of your data before it answers. Without it, you are limited to whatever fits in the model's context window, which does not scale to a real enterprise document set.

Can I just use PostgreSQL instead of a dedicated vector database?

Often yes. The pgvector extension adds vector columns and HNSW/IVF indexes to Postgres, and for many finance teams that is the right starting point. You keep vectors next to your structured data, you reuse existing backup and access controls, and you avoid a second system to operate. You typically only need a dedicated vector database when you outgrow Postgres on scale or latency, or when you need very advanced hybrid search.

What is the difference between a vector database and Elasticsearch?

Elasticsearch is built for keyword and full-text search using inverted indexes. It is excellent when the user's query shares words with the document. A vector database searches by meaning, so it finds documents that are semantically related even when the wording is completely different. Modern Elasticsearch does support vector search, and most vector databases now support keyword search, so the line is blurring. Hybrid search (both at once) is the production sweet spot.
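The text does not prescribe how to merge keyword and vector results. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a recommendation from the source. The document IDs are hypothetical, and `k=60` is the conventional default constant for RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists; each appearance scores 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists for the same query from two retrievers.
keyword_hits = ["inv-12345", "inv-12399", "note-7"]
vector_hits = ["note-7", "inv-12345", "email-3"]

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents that appear in both lists rise to the top, while a strong hit from only one retriever still survives in the fused ranking. RRF works on ranks alone, so it sidesteps the problem that keyword scores and vector distances are not on comparable scales.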

How do HNSW and IVF differ in practice?

HNSW gives lower query latency and higher recall but uses more memory and is slower to build. IVF uses less memory and builds faster but typically has slightly higher query latency for the same recall target. For most AR use cases (hundreds of thousands to a few million vectors, sub-100ms latency target), HNSW is the default choice. IVF or IVF plus product quantization becomes attractive at very large scale or when memory cost dominates.

What does a vector database cost?

Costs come from three places: storage of the vectors and index, query volume, and (for managed services) a base platform fee. As a rough order of magnitude, expect tens to low hundreds of euros per month for a few hundred thousand vectors on a managed service, scaling roughly linearly with vector count. Self-hosted pgvector on an existing Postgres instance, or Qdrant on infrastructure you already run, can be effectively free at small scale beyond the underlying compute. Always benchmark with your real embedding dimensions before committing.

What is the most common mistake teams make with vector databases?

Not storing enough metadata alongside each vector. Without fields like customer ID, document type, region, and date, you cannot filter the search, and the database happily returns the globally most-similar chunk even when it belongs to a different customer or a superseded contract version. The fix is cheap if you do it at ingestion and very expensive if you discover it after a few million vectors are already indexed.
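The failure mode is easy to reproduce. In the sketch below the globally most-similar chunk belongs to a different customer, and the runner-up is a superseded contract version; a metadata pre-filter removes both before ranking. The record fields, customer names, and two-dimensional embeddings are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filtered_search(query_vec, records, customer_id, k=2):
    """Pre-filter on metadata, then rank the surviving chunks by similarity."""
    candidates = [
        r for r in records
        if r["customer_id"] == customer_id and not r["superseded"]
    ]
    candidates.sort(key=lambda r: cosine_similarity(query_vec, r["embedding"]), reverse=True)
    return [r["chunk_id"] for r in candidates[:k]]

def unfiltered_search(query_vec, records, k=2):
    """What you get with no metadata: the globally most-similar chunks."""
    ranked = sorted(records, key=lambda r: cosine_similarity(query_vec, r["embedding"]), reverse=True)
    return [r["chunk_id"] for r in ranked[:k]]

records = [
    {"chunk_id": "c1", "customer_id": "ACME", "superseded": False, "embedding": [0.90, 0.10]},
    {"chunk_id": "c2", "customer_id": "GLOBEX", "superseded": False, "embedding": [1.00, 0.00]},
    {"chunk_id": "c3", "customer_id": "ACME", "superseded": True, "embedding": [0.99, 0.05]},
    {"chunk_id": "c4", "customer_id": "ACME", "superseded": False, "embedding": [0.20, 0.90]},
]
query = [1.0, 0.0]

print(unfiltered_search(query, records))        # wrong customer and stale version win
print(filtered_search(query, records, "ACME"))  # only current ACME chunks
```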
