Vector Search and RAG (vectorstore, rag)

Two source modules for working with embeddings: vectorstore stores dense vectors and ranks them by similarity, and rag layers chunking, indexing, retrieval, and prompt-context assembly on top for retrieval-augmented generation. Both are pure Geblang and run on the bytecode VM.

They build on the embedding vectors produced by the llm module (client.embed(text, opts)), but vectorstore is independent of any provider and rag accepts any embedder you supply.

vectorstore

A vector store keeps (id, vector, metadata) records and answers nearest-neighbour queries. Two implementations share one VectorStore interface:

Store Use
MemoryVectorStore(metric = "cosine") Brute-force in-memory search; mutex-guarded so one shared store is safe under concurrent requests. Ideal up to ~1e4-1e5 vectors.
SqliteVectorStore(conn, table = "vectors", metric = "cosine") Persistent store backed by a db Connection. Vectors are stored as little-endian float32 BLOBs and metadata as JSON; the table is created if absent and add upserts by id.
PgVectorStore(conn, table = "vectors", dimension, metric = "cosine") Postgres + pgvector backend with real approximate-nearest-neighbour search. See the pgvector section below.
HnswVectorStore(metric = "cosine", m = 16, efSearch = 20) In-process HNSW index: sublinear approximate search with no external service. See the HNSW section below.

Metric is "cosine" (default), "dot", or "euclidean"; all scores follow "higher = closer". Vectors are list<any> so they accept the decimal numbers that come back from JSON-parsed embeddings as well as float literals. Vectors are stored packed as float32 and ranked by the native vecmath kernel (below), so search is well off the interpreted path.

import vectorstore;

let store = vectorstore.MemoryVectorStore();
store.add("cats", [0.1, 0.2, 0.9], {"text": "about cats"});
store.add("cars", [0.9, 0.1, 0.1], {"text": "about cars"});

let hits = store.search([0.1, 0.2, 0.8], 1);
io.println(hits[0].record.metadata["text"]);   # about cats
io.println(hits[0].score);                      # similarity, higher = closer

VectorStore interface

Method Description
add(id, vector, metadata) Adds or replaces a record by id.
addAll(records) Adds or replaces a list of VectorRecord.
get(id) Returns the VectorRecord, or null.
delete(id) Removes id; returns true if it existed.
search(query, k) Top k records by descending similarity.
searchWhere(query, k, filter) Top k among records for which the callable filter(record) is true (in-process only).
searchFilter(query, k, criteria) Top k among records matching a portable dict criteria (pushed down server-side by external backends).
count() Number of stored records.
clear() Removes everything.

searchWhere takes an arbitrary callable and runs in process:

let hits = store.searchWhere(queryVector, 5, func(any rec): bool {
    return (rec as vectorstore.VectorRecord).metadata["lang"] == "en";
});

searchFilter takes a portable dict of criteria that external backends can push down to the database. A scalar value means equality; a nested operator dict supports eq, ne, gt, gte, lt, lte, and in. Multiple keys are ANDed.

/* lang == "en" AND year >= 2020 */
let hits = store.searchFilter(queryVector, 5, {"lang": "en", "year": {"gte": 2020}});

A persistent store is a drop-in replacement:

import db;
import vectorstore;

let conn  = db.connect("sqlite", "vectors.db");
let store = vectorstore.SqliteVectorStore(conn);
store.add("doc-1", embedding, {"text": "..."});
let hits = store.search(queryVector, 5);

The exported helper vectorstore.score(metric, a, b) computes a single similarity score between two vectors.

Postgres with pgvector

PgVectorStore is a production-scale backend using the pgvector extension for real approximate-nearest-neighbour search. It rides on the db module (no new dependency) and is a drop-in VectorStore.

import db;
import vectorstore;

let conn  = db.connect("postgres", dsn);
let store = vectorstore.PgVectorStore(conn, "items", 1536);   /* dimension required */
store.add("doc-1", embedding, {"source": "handbook", "year": 2024});

let hits = store.searchFilter(queryVector, 5, {"source": "handbook"});

On construction it runs CREATE EXTENSION IF NOT EXISTS vector, creates the table with a typed vector(D) column and a jsonb metadata column, and builds a metric-matched HNSW index (vector_cosine_ops / vector_l2_ops / vector_ip_ops). Searches use the index-backed distance operator (<=> / <-> / <#>) with ORDER BY embedding <op> query LIMIT k, and searchFilter pushes the criteria down to a SQL WHERE over the jsonb metadata (containment for equality/in, numeric casts for ranges). The dimension is fixed at table creation, so pass it to the constructor. The Postgres server must have the pgvector extension available.

In-process HNSW

HnswVectorStore gives sublinear approximate-nearest-neighbour search in memory, with no external service. It is the middle ground between the brute-force MemoryVectorStore (exact but O(n)) and a database backend: ideal when you have more vectors than brute force handles comfortably but do not want to run Postgres.

import vectorstore;

let store = vectorstore.HnswVectorStore("cosine");   /* or "dot" / "euclidean" */
store.add("doc-1", embedding, {"text": "..."});
let hits = store.search(queryVector, 5);

Results are approximate: tune recall versus speed with the constructor's m (graph degree, default 16) and efSearch (search breadth, default 20). The index holds the vectors; metadata is kept alongside in memory. searchFilter and searchWhere over-fetch from the index and then filter, so a very selective filter may return fewer than k hits; raise k or widen efSearch if needed. The store is not persistent; rebuild it from your source data on startup, or use PgVectorStore when you need durability.

rag

rag turns documents into retrievable, prompt-ready context. It is built on a small Embedder interface so it is not tied to any one provider:

interface Embedder { func embed(string text): list<any>; }

LlmEmbedder adapts an llm client to that interface; the options dict carries the embedding model.

import db;
import llm;
import rag;
import vectorstore;

let store    = vectorstore.MemoryVectorStore();
let embedder = rag.LlmEmbedder(
    llm.client({"provider": "openai", "apiKey": key}),
    {"model": "text-embedding-3-small"}
);

rag.index(store, embedder, "handbook", longText, {"source": "handbook"}, {});

let hits   = rag.retrieve(store, embedder, "how do I reset my password?", 4);
let prompt = "Answer using only this context:\n" + rag.context(hits, {})
           + "\n\nQuestion: how do I reset my password?";
let answer = llm.client({"provider": "openai", "apiKey": key})
    .chat([{"role": "user", "content": prompt}], {"model": "gpt-4o-mini"})["content"];

Functions

Function Description
chunk(text, opts) Splits text into overlapping chunks; returns list<string>.
index(store, embedder, docId, text, metadata, opts) Chunks, embeds, and stores a document. Returns the number of chunks.
retrieve(store, embedder, query, k) Embeds the query and returns the top k SearchHits.
context(hits, opts) Assembles hits into a prompt-ready block.

chunk options:

Key Default Meaning
by "words" "words", "chars", or "paragraphs".
size 200 words / 1000 chars Window size for the chosen unit.
overlap 40 words / 200 chars Overlap between consecutive windows (ignored for paragraphs).

index stores each chunk under id "<docId>#<i>" and attaches the caller's metadata plus text, docId, and chunk (the index), so retrieved hits carry their own source text.

context options: withSources (default true) prefixes each chunk with [n] (docId): ; separator (default a blank line) joins the chunks. Pass {"withSources": false} for the bare chunk text.

Testing without a network

rag depends only on the Embedder interface, so tests can supply a deterministic stub embedder and avoid any API calls:

class StubEmbedder implements rag.Embedder {
    func embed(string text): list<any> {
        let v = [];
        for (k in ["cat", "dog", "car"]) {
            if (text.lower().contains(k as string)) { v = v.push(1.0f); }
            else { v = v.push(0.0f); }
        }
        return v;
    }
}

vecmath

The float32 similarity kernel underpinning the stores. It scores in native code rather than the interpreted loop, and accepts vectors as either a list of numbers or a packed little-endian float32 BLOB (the stored form).

Function Description
score(metric, a, b) Similarity (higher = closer) between two vectors for "cosine" / "dot" / "euclidean".
topK(vectors, query, k, metric) Ranks vectors (a list of lists or float32 blobs) against query, returns up to k {index, score} dicts in descending order.
import vecmath;

vecmath.score("cosine", [1.0, 0.0], [1.0, 0.0]);   # 1.0
let hits = vecmath.topK([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], 1, "cosine");
hits[0]["index"];                                   # 0

vectorstore.score(metric, a, b) delegates to vecmath.score; you rarely call vecmath directly unless building your own ranking.