TSemanticIndex - Semantic Search over DBF/SQL

Source: source/classes/tsemanticindex.prg

TSemanticIndex brings meaning-based search to ordinary FiveWin data apps. Instead of matching keywords with LIKE '%word%', it finds the records whose text is closest in meaning to a query. For example, searching "unhappy about late delivery" surfaces a note that reads "the order arrived two weeks late and the customer is furious", even though the two share almost no words.

It works by turning text into embedding vectors and ranking records by the cosine similarity between their vectors and the query's vector. It is the first of the "FWAI" utility classes; see the PyTorch-lite roadmap.

How it works

Index:  DBF/SQL text field -> embed each record -> unit-normalize -> stored (vectors + ids)
Query:  query text -> embed -> unit-normalize -> dot product vs all stored vectors -> top-N records by score

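Vectors are stored unit-normalized, so cosine similarity reduces to a plain dot product. Scores fall between -1.0 and 1.0. A score of 1.0 means the two vectors point the same way (identical meaning), and higher scores mean closer meaning.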
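The sketch below shows why a plain dot product is enough. It normalizes two small vectors and then compares them. VecNormalize() and VecDot() are hypothetical helpers written for this illustration; they are not TSemanticIndex methods. The 2-dimensional vectors stand in for real embeddings, such as MiniLM's 384 dimensions.

```harbour
// Sketch: unit-normalize two vectors, then compare them with a plain dot product.
// VecNormalize() and VecDot() are hypothetical helpers for illustration only;
// they are not part of TSemanticIndex.

PROCEDURE Main()
   LOCAL aA := VecNormalize( { 3, 4 } )    // length 5 -> { 0.6, 0.8 }
   LOCAL aB := VecNormalize( { 4, 3 } )    // length 5 -> { 0.8, 0.6 }

   ? "a . a =", VecDot( aA, aA )           // same direction: 1.0 (up to rounding)
   ? "a . b =", VecDot( aA, aB )           // 0.6*0.8 + 0.8*0.6 = 0.96
   RETURN

// Scale aVec to length 1; a zero vector comes back as all zeros.
FUNCTION VecNormalize( aVec )
   LOCAL nNorm := 0, nX, aOut := {}

   FOR EACH nX IN aVec
      nNorm += nX * nX
   NEXT
   nNorm := Sqrt( nNorm )
   FOR EACH nX IN aVec
      AAdd( aOut, iif( nNorm > 0, nX / nNorm, 0 ) )
   NEXT
   RETURN aOut

// Dot product of two equal-length numeric arrays.
FUNCTION VecDot( aA, aB )
   LOCAL nSum := 0, i

   FOR i := 1 TO Len( aA )
      nSum += aA[ i ] * aB[ i ]
   NEXT
   RETURN nSum
```

Every stored vector already has length 1. The cosine formula (a . b) / (|a| |b|) therefore collapses to a . b, so scoring a query against a record is a single multiply-add loop.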

Backend-agnostic

New( bEmbed ) takes a codeblock, { |cText| ... }, that returns the embedding vector for cText as an array of numbers. The embeddings can therefore come from any source: the HuggingFace Inference API, a local model, or your own function. To use HuggingFace MiniLM through TEmbeddings:

oEmb := TEmbeddings():New()                 // needs HF_API_KEY (384-dim MiniLM)
oIdx := TSemanticIndex():New( { |c| oEmb:GetEmbeddings( c ) } )

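For private/offline data, plug a local embedding model behind the same codeblock (see the roadmap's local-vs-cloud guidance); the index code does not change. Use the same model for indexing and querying, though. Vectors from different models are not comparable, so switching backends means rebuilding the index.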
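As an example of "your own function", here is a minimal sketch of an offline embedding codeblock target. ToyEmbed() is hypothetical, and so are its 64 buckets and its hash. It counts hashed words and captures no meaning, so it is only useful for exercising the index plumbing without an API key. It does not normalize its output, because the flowchart shows the index unit-normalizing each vector itself.

```harbour
// Toy offline embedding: hashes each lower-cased, space-separated word into one of
// 64 buckets and counts occurrences. ToyEmbed() is a hypothetical example of
// "your own function"; it matches shared words only and captures no meaning.

PROCEDURE Main()
   LOCAL aVec := ToyEmbed( "Late delivery late order" )
   LOCAL nTotal := 0, nX

   FOR EACH nX IN aVec
      nTotal += nX
   NEXT
   ? "dimensions:", Len( aVec )     // 64
   ? "words counted:", nTotal       // 4
   RETURN

FUNCTION ToyEmbed( cText )
   LOCAL nDims := 64
   LOCAL aVec := Array( nDims )
   LOCAL aWords := hb_ATokens( Lower( cText ), " " )   // splits on spaces only
   LOCAL cWord, nHash, i

   AFill( aVec, 0 )
   FOR EACH cWord IN aWords
      IF ! Empty( cWord )                  // skip empty tokens from repeated spaces
         nHash := 0
         FOR i := 1 TO Len( cWord )         // simple polynomial hash, kept below nDims
            nHash := ( nHash * 31 + Asc( SubStr( cWord, i, 1 ) ) ) % nDims
         NEXT
         aVec[ nHash + 1 ] += 1            // Harbour arrays are 1-based
      ENDIF
   NEXT
   RETURN aVec
```

To try it, pass it where the TEmbeddings call was: oIdx := TSemanticIndex():New( { |c| ToyEmbed( c ) } ). Because it only counts words, queries surface records that share words with them, much like keyword search. Swap in a real embedding model before judging result quality.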

Methods

Method                         Description
New( bEmbed )                  Create an index that uses the given embedding codeblock.
Add( cText, uId )              Embed cText and store it under id uId.
AddVector( aVec, uId )         Store a precomputed vector under uId (skips embedding).
IndexDbf( cField, bId )        Walk the current work area, indexing cField; the id is Eval( bId ) when bId is given, otherwise RecNo().
Search( cQuery, nTop )         Return { { uId, nScore }, ... } for the nTop best matches, best first.
Size()                         Return the number of indexed records.
Save( cFile ) / Load( cFile )  Persist / restore the index (ids + vectors).
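The Search step from the flowchart can be sketched without the class: take the dot product against every stored vector, then keep the best nTop. TopNByDot() and the in-memory aIds / aVecs arrays are hypothetical stand-ins for whatever TSemanticIndex keeps internally; this is not its actual source. The query vector is assumed to be unit-normalized already, and the result uses the same { { uId, nScore }, ... } shape that Search() returns.

```harbour
// Hypothetical brute-force search: score every stored vector, sort best first, keep nTop.
// aIds / aVecs stand in for the index's stored ids and unit-normalized vectors.

PROCEDURE Main()
   LOCAL aIds  := { 101, 102, 103 }
   LOCAL aVecs := { { 1, 0 }, { 0.6, 0.8 }, { 0, 1 } }   // stored, unit length
   LOCAL aHits := TopNByDot( { 0.8, 0.6 }, aIds, aVecs, 2 )
   LOCAL aHit

   FOR EACH aHit IN aHits
      ? aHit[ 1 ], aHit[ 2 ]      // 102 (0.96) first, then 101 (0.8)
   NEXT
   RETURN

FUNCTION TopNByDot( aQuery, aIds, aVecs, nTop )
   LOCAL aScored := {}, i, j, nScore

   FOR i := 1 TO Len( aVecs )
      nScore := 0
      FOR j := 1 TO Len( aQuery )            // dot product = cosine for unit vectors
         nScore += aQuery[ j ] * aVecs[ i ][ j ]
      NEXT
      AAdd( aScored, { aIds[ i ], nScore } )
   NEXT
   ASort( aScored,,, { |x, y| x[ 2 ] > y[ 2 ] } )   // highest score first
   IF Len( aScored ) > nTop
      ASize( aScored, nTop )                 // drop everything past nTop
   ENDIF
   RETURN aScored
```

This scan touches every stored vector on each query, so its cost grows linearly with Size().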

Example: search customer notes

LOCAL oEmb, oIdx, aHits, h                    // at the top of the enclosing FUNCTION/PROCEDURE

USE customers
oEmb := TEmbeddings():New()
oIdx := TSemanticIndex():New( { |c| oEmb:GetEmbeddings( c ) } )
oIdx:IndexDbf( "NOTES" )                     // one entry per record, id = RecNo()
oIdx:Save( "customers.idx" )                 // build once, reuse later

aHits := oIdx:Search( "unhappy about late delivery", 10 )
for each h in aHits
   customers->( dbGoto( h[ 1 ] ) )           // h[1] = RecNo, h[2] = score
   ? customers->NAME, h[ 2 ]
next

Notes

See Also