Embeddings

Embeddings are numerical vector representations of text, images or audio that capture semantic meaning, so that items with similar meaning sit close together in mathematical space and can be searched, compared and classified by what they mean rather than the exact words they use.

Key Takeaways

  • Embeddings turn text into fixed-length vectors of floating-point numbers (typically 384 to 3072 dimensions) where semantic similarity equals geometric closeness.
  • They are the foundation of Retrieval Augmented Generation, semantic search and similarity matching across contracts, disputes and customer correspondence.
  • In AR they power email triage, deduction classification, dispute deduplication and contract clause search at roughly one-thousandth the cost of LLM inference.
  • Leading models in 2026 include OpenAI text-embedding-3, Cohere Embed v3, Voyage AI voyage-finance-2 and open-source families like BGE and E5.
  • Production pitfalls cluster around chunking strategy, dimensionality mismatch between models, language coverage and silent drift when the embedding model is upgraded.

What embeddings are and how they capture meaning

An embedding is a list of numbers that represents the meaning of a piece of content. Feed the sentence "customer disputes invoice due to short shipment" into an embedding model and you receive back a vector of perhaps 1536 floating-point numbers. Feed in "buyer raised a claim for missing units" and you receive a different vector, but one that sits very close to the first in mathematical space. The two sentences share no keywords, yet an embedding model recognises they describe the same business event.

This is the breakthrough that makes modern AI useful in finance. Traditional keyword search treats "remittance" and "payment advice" as unrelated strings. Embeddings treat them as near-identical concepts. That semantic understanding is the connective tissue beneath nearly every agentic AR capability, from email routing to dispute resolution to credit decisioning.

How embeddings work technically

An embedding model is a neural network, usually a transformer variant, that has been trained on enormous volumes of text to predict context. During training it learns that words appearing in similar contexts should produce similar internal representations. The outputs of the final layer are then pooled into a single fixed-length vector for the whole input.

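The dimensions are not human-readable. You cannot point to position 412 and say it represents urgency. What matters is the geometry of the full vector. Similarity is usually measured with cosine similarity, which returns a score between minus one and one, or with the dot product, which gives the same score only when the vectors are normalised to unit length. Scores above 0.8 often indicate strong semantic overlap, although the right threshold varies by model and should be calibrated on labelled examples.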
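A minimal sketch of the two measures in plain Python may help. The four-dimensional vectors are made-up toy values rather than real model output, and the variable names are chosen only for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; always within [-1, 1]."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)


def normalize(v):
    """Scale v to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


# Toy vectors (hypothetical values, not produced by any real model).
short_shipment = [0.9, 0.1, 0.3, 0.0]
missing_units = [0.8, 0.2, 0.4, 0.1]
payment_received = [0.0, 0.9, 0.1, 0.8]

print(cosine_similarity(short_shipment, missing_units))     # high: related meaning
print(cosine_similarity(short_shipment, payment_received))  # low: unrelated meaning
```

Normalising once at indexing time lets a search use the cheaper dot product while still ranking by cosine similarity.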

Common dimensionalities are 384, 768, 1024, 1536 and 3072. Higher dimensions capture more nuance but cost more to store and search. The same input text produces the same vector from the same model version, apart from negligible floating-point variation on some hosted services, which makes embeddings effectively deterministic and cacheable.
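Because the same text and model yield the same vector, repeated inputs need not be embedded twice. The sketch below shows one possible in-memory cache; `EmbeddingCache`, `fake_embed` and the model name `toy-model-v1` are hypothetical stand-ins, and a real deployment would replace `fake_embed` with a call to its chosen model.

```python
import hashlib


class EmbeddingCache:
    """In-memory cache keyed by model name and exact input text."""

    def __init__(self, model_name, embed_fn):
        self.model_name = model_name
        self.embed_fn = embed_fn  # hypothetical callable: str -> list of floats
        self._store = {}
        self.misses = 0

    def _key(self, text):
        # Include the model name so vectors from different models never collide.
        payload = f"{self.model_name}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, text):
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self.embed_fn(text)
        return self._store[key]


def fake_embed(text):
    """Placeholder for a real model call; returns a deterministic toy vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:4]]


cache = EmbeddingCache("toy-model-v1", fake_embed)
first = cache.get("invoice 1042 short paid")
second = cache.get("invoice 1042 short paid")  # served from the cache
print(cache.misses)
```

Keying on the model name as well as the text means a model upgrade naturally invalidates old entries instead of mixing vectors from two spaces.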

Why embeddings matter for finance and AR

Finance teams sit on mountains of unstructured language. Master service agreements, dunning correspondence, dispute case notes, remittance advice, credit memos and customer emails all carry critical information that keyword systems struggle to navigate. Embeddings change the economics of working with that text.

They are the retrieval engine inside RAG architectures, the matching layer behind "find me similar disputes" queries, and the routing logic that decides whether an incoming email is a payment confirmation, a short-pay justification or a logistics complaint. Without embeddings, every AI-native AR feature collapses back to brittle rules and regex patterns.

Use cases in AR

Several concrete patterns dominate AR deployments today.

  • Email triage and classification. Incoming customer messages are embedded and compared against labelled examples of disputes, queries, remittance advices and payment promises. The closest cluster wins, and the message routes to the right queue or agent.
  • Dispute deduplication and playbook retrieval. A new deduction is embedded and matched against historical cases. The collector sees the three most similar prior disputes, how they were resolved and what evidence won the chargeback. Resolution time falls because the team is not solving the same problem twice.
  • Contract clause search. Every clause across thousands of MSAs is embedded once and indexed. Treasury can then ask "show me all early-payment-discount terms longer than thirty days" and receive ranked results in seconds rather than chasing PDFs through a shared drive.
  • Customer similarity for credit decisions. A new applicant is embedded across their profile, industry, geography and trading history, then compared to the closest historical customers. The credit team inherits a richer baseline than a pure DSO or D&B score can provide.
  • Knowledge base Q&A. Internal AR policies, escalation matrices and SOX controls are embedded so that an agentic copilot can answer "what is our policy on writing off deductions under 500 euros" without anyone reading the wiki.
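The email triage pattern above can be sketched as a nearest-centroid classifier: average the labelled example vectors for each queue, then route a new message to the queue whose average it most resembles. This is one simple way to realise "the closest cluster wins", not the only one. The queue names and three-dimensional vectors are hypothetical toy values standing in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def route(message_vector, labelled_examples):
    """Return (queue, score) for the queue whose centroid is most similar."""
    scores = {
        queue: cosine(message_vector, centroid(vectors))
        for queue, vectors in labelled_examples.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy vectors standing in for embedded, labelled emails (hypothetical values).
labelled_examples = {
    "dispute": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "remittance_advice": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
    "payment_promise": [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8]],
}

incoming = [0.85, 0.15, 0.05]  # pretend this is the embedded incoming email
queue, score = route(incoming, labelled_examples)
print(queue, round(score, 3))
```

A production router would probably also apply a minimum score and send low-confidence messages to a human review queue rather than forcing a match.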

Major embedding models and cost

Several model families dominate enterprise deployments in 2026. OpenAI text-embedding-3-small produces 1536-dimensional vectors by default and text-embedding-3-large produces 3072-dimensional vectors by default, both with strong general-purpose performance. Cohere Embed v3 is widely used for multilingual workloads. Voyage AI offers voyage-3 and the domain-specialised voyage-finance-2, which is tuned on financial language and often outperforms general models on contract and credit text. Open-source families including BGE, E5 and GTE allow on-premise deployment for organisations that cannot send finance data to a hosted API.

Cost is one of the most attractive properties of embeddings. Pricing typically sits between 0.0001 and 0.001 euros per 1000 tokens, roughly three orders of magnitude cheaper than LLM inference. A finance organisation can embed a full year of customer correspondence and a complete contract repository for a few hundred euros. The vectors themselves are then stored in a vector database such as pgvector or a managed service, ready for instant retrieval.
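A back-of-envelope calculation shows how the price range above turns into a budget. The corpus sizes below are hypothetical figures chosen only to illustrate the arithmetic; substitute real volumes and the actual price of the chosen model.

```python
def embedding_cost_eur(total_tokens, price_per_1k_tokens_eur):
    """Cost in euros of embedding total_tokens at a per-1000-token price."""
    return total_tokens / 1000 * price_per_1k_tokens_eur


# Hypothetical corpus sizes (assumptions, not measured data).
emails_per_year = 2_000_000
tokens_per_email = 400
contracts = 5_000
tokens_per_contract = 20_000

total_tokens = emails_per_year * tokens_per_email + contracts * tokens_per_contract

# Price range quoted in the text: 0.0001 to 0.001 euros per 1000 tokens.
low = embedding_cost_eur(total_tokens, 0.0001)
high = embedding_cost_eur(total_tokens, 0.001)
print(f"{total_tokens:,} tokens: {low:,.0f} to {high:,.0f} euros")
```

With these assumed volumes the corpus comes to 900 million tokens and a one-off embedding cost of roughly 90 to 900 euros, consistent with the "few hundred euros" figure.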

Common pitfalls and production patterns

Embeddings look simple until production exposes the edges. Five issues recur.

  • Dimensionality mismatch. Vectors from different models live in different spaces. A 768-dimensional embedding cannot be compared to a 1536-dimensional embedding, and even two 1536-dim models produce incompatible geometries. Choose one model per index and document the decision.
  • Chunking strategy. Most documents are too large to embed whole. Chunks that are too small lose context, chunks that are too large dilute meaning. Paragraph-level chunking with light overlap is a sensible default for finance content.
  • Language coverage. An English-trained model applied to German remittance advice will silently underperform. Validate that the chosen model handles every language in the dataset, or run language-specific indexes.
  • Embedding drift. When the provider releases a new model version, the vector space shifts. Existing indexes must be re-embedded or the similarity scores quietly degrade. Treat model upgrades as schema migrations.
  • Metadata discipline. Store source document, chunk position, customer ID and timestamp alongside every vector. Retrieval without metadata produces answers that are impossible to audit, which is unacceptable in a finance control environment.
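The chunking and metadata points above can be combined in one small sketch: split a document into paragraphs, group them into overlapping windows, and attach the audit fields to every chunk. The `Chunk` fields mirror the metadata listed above. The function name, the window size of two paragraphs and the overlap of one are illustrative assumptions, not recommended settings.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Chunk:
    text: str
    source_document: str
    chunk_position: int
    customer_id: str
    timestamp: str  # ISO 8601 time at which the chunk was created


def chunk_paragraphs(document, source_document, customer_id,
                     paragraphs_per_chunk=2, overlap=1):
    """Split on blank lines and emit overlapping windows of paragraphs."""
    if not 0 <= overlap < paragraphs_per_chunk:
        raise ValueError("overlap must be smaller than paragraphs_per_chunk")
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    step = paragraphs_per_chunk - overlap
    timestamp = datetime.now(timezone.utc).isoformat()
    chunks = []
    for position, start in enumerate(range(0, len(paragraphs), step)):
        window = paragraphs[start:start + paragraphs_per_chunk]
        chunks.append(Chunk("\n\n".join(window), source_document,
                            position, customer_id, timestamp))
        # Stop once a window has reached the final paragraph.
        if start + paragraphs_per_chunk >= len(paragraphs):
            break
    return chunks


# Hypothetical three-paragraph contract excerpt.
sample = "Payment terms.\n\nEarly payment discount.\n\nLate payment interest."
for chunk in chunk_paragraphs(sample, "msa_example.pdf", "CUST-001"):
    print(chunk.chunk_position, repr(chunk.text))
```

Each chunk's text would then be embedded and stored together with its metadata, so every retrieved passage can be traced back to a document, a position and a customer.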

Done well, embeddings become the quiet substrate beneath every agentic AR workflow. Done badly, they produce a search engine that feels worse than the keyword system it replaced.

Frequently asked questions

What exactly is an embedding in plain language?

An embedding is a list of numbers that represents the meaning of a piece of text. Two sentences with similar meaning produce two lists of numbers that are mathematically close together, which lets software find related content based on what it means rather than the exact words used.

How are embeddings different from keyword search?

Keyword search matches strings. If a customer writes "short shipment" and your knowledge base uses the phrase "missing units", keyword search returns nothing. Embeddings recognise both phrases describe the same concept and return the relevant result, because both map to similar regions of vector space.

Which embedding model should an AR team start with?

For general English and multilingual workloads, OpenAI text-embedding-3-small or Cohere Embed v3 are sensible defaults. For finance-heavy content such as contracts and credit memos, Voyage AI voyage-finance-2 often outperforms general models. Open-source options like BGE work well for on-premise deployments where data cannot leave the network.

How much do embeddings cost to run at enterprise scale?

Embedding pricing typically falls between 0.0001 and 0.001 euros per 1000 tokens. A full year of customer correspondence and a complete contract repository can usually be embedded for a few hundred euros. The ongoing cost is dominated by storage and re-embedding when documents change, not by the initial vectorisation.

Where are embeddings stored and how are they searched?

Embeddings live in vector databases such as pgvector, Pinecone, Weaviate or Chroma. These systems index vectors using approximate-nearest-neighbour algorithms so that a similarity search across millions of vectors returns results in tens of milliseconds. The vector database is the retrieval engine behind every RAG-based AR copilot.
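For intuition, the exact search that approximate-nearest-neighbour indexes speed up is a brute-force scan: score every stored vector against the query and keep the best few. The sketch below does this in plain Python. The dispute IDs and three-dimensional vectors are hypothetical toy data, and a real system would delegate this step to the vector database.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=3):
    """Return the k (doc_id, score) pairs most similar to query, best first."""
    scored = ((doc_id, cosine(query, vector)) for doc_id, vector in index)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy index of historical disputes (hypothetical IDs and vectors).
index = [
    ("DSP-001", [0.9, 0.1, 0.2]),  # short shipment
    ("DSP-002", [0.2, 0.9, 0.1]),  # pricing error
    ("DSP-003", [0.7, 0.1, 0.6]),  # damaged goods
    ("DSP-004", [0.8, 0.2, 0.3]),  # missing units
]

query = [0.9, 0.1, 0.25]  # pretend this is the embedded new deduction
for doc_id, score in top_k(query, index):
    print(doc_id, round(score, 3))
```

A full scan costs time proportional to the number of stored vectors, which is why large indexes trade a little recall for approximate search.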

What is the biggest mistake teams make when deploying embeddings?

Mixing dimensions and models inside the same index. Vectors from different embedding models live in different mathematical spaces and cannot be compared meaningfully. The second most common mistake is poor chunking, where documents are split into pieces that are either too small to carry context or too large to be specific. Both issues silently degrade retrieval quality without producing obvious errors.
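One inexpensive safeguard against the first mistake is to have the index record which model produced its vectors and refuse anything else. The class below is a hypothetical sketch of that check; `VectorIndex` and the model names are illustrative, and many vector databases enforce only the dimension count, so the model check would typically live in application code.

```python
class VectorIndex:
    """Toy index that accepts vectors from exactly one model and dimension."""

    def __init__(self, model_name, dimensions):
        self.model_name = model_name
        self.dimensions = dimensions
        self.vectors = {}

    def is_compatible(self, vector, model_name):
        # Same dimension is necessary but not sufficient: two models with the
        # same dimension still produce incomparable vector spaces.
        return model_name == self.model_name and len(vector) == self.dimensions

    def add(self, doc_id, vector, model_name):
        if not self.is_compatible(vector, model_name):
            raise ValueError(
                f"index expects {self.model_name} with {self.dimensions} dimensions, "
                f"got {model_name} with {len(vector)}"
            )
        self.vectors[doc_id] = vector


index = VectorIndex("toy-model-v1", dimensions=3)
index.add("clause-1", [0.1, 0.2, 0.3], "toy-model-v1")

print(index.is_compatible([0.1, 0.2], "toy-model-v1"))       # wrong dimension
print(index.is_compatible([0.1, 0.2, 0.3], "toy-model-v2"))  # wrong model
```

The same check makes a model upgrade explicit: the new model needs a new index, and the old one is retired only after everything has been re-embedded.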
