Tutorial: AI Integration
HarbourBuilder provides two complementary AI components: TOllama for connecting to local large language models via the Ollama service, and TTransformer for loading and running transformer model weights directly in-process. This tutorial covers both approaches with complete working examples.
Part 1: TOllama — Chat with a Local LLM
Step 1: Prerequisites
- Install Ollama on your machine.
- Pull a model: run `ollama pull llama3` in your terminal.
- Ensure Ollama is running (it listens on `http://localhost:11434` by default).
TOllama connects to a locally running Ollama instance. All inference happens on your machine — no cloud services, no API keys, no data leaves your computer.
Step 2: Create the Chat UI
Build a form with a Memo for the conversation history, a TextBox for user input, and a Send button.
```
#include "hbbuilder.ch"

function Main()
   local oForm, oMemo, oGetPrompt, oBtnSend, oOllama
   local cPrompt := ""

   // --- Connect to Ollama ---
   DEFINE OLLAMA oOllama ;
      URL "http://localhost:11434" ;
      MODEL "llama3"

   // --- Build the UI ---
   DEFINE FORM oForm TITLE "AI Chat" ;
      SIZE 700, 550 FONT "Segoe UI", 10

   @ 10, 10 MEMO oMemo ;
      OF oForm SIZE 660, 420 ;
      READONLY

   @ 440, 10 GET oGetPrompt VAR cPrompt ;
      OF oForm SIZE 540, 28

   @ 440, 560 BUTTON oBtnSend PROMPT "Send" ;
      OF oForm SIZE 110, 32 ;
      ACTION SendMessage( oOllama, oMemo, oGetPrompt )

   ACTIVATE FORM oForm CENTERED

return nil
```
Step 3: Send a Message and Get a Response
Use oOllama:Chat() for a simple request/response, or oOllama:ChatStream()
to receive the response token by token as it is generated.
Simple (blocking) chat completion:
```
static function SendMessage( oOllama, oMemo, oGetPrompt )
   local cPrompt := oGetPrompt:GetValue()
   local cResponse

   if Empty( cPrompt )
      return nil
   endif

   // Show user message
   oMemo:Append( "You: " + cPrompt + Chr( 13 ) + Chr( 10 ) )
   oGetPrompt:SetValue( "" )

   // Get AI response
   cResponse := oOllama:Chat( cPrompt )
   oMemo:Append( "AI: " + cResponse + Chr( 13 ) + Chr( 10 ) + Chr( 13 ) + Chr( 10 ) )

return nil
```
Step 4: Streaming Response to a Memo
For a better user experience, stream the response so tokens appear in the Memo as they are generated — just like a real chat interface.
```
static function SendMessageStream( oOllama, oMemo, oGetPrompt )
   local cPrompt := oGetPrompt:GetValue()

   if Empty( cPrompt )
      return nil
   endif

   oMemo:Append( "You: " + cPrompt + Chr( 13 ) + Chr( 10 ) )
   oMemo:Append( "AI: " )
   oGetPrompt:SetValue( "" )

   // Stream tokens one by one into the Memo
   oOllama:ChatStream( cPrompt, { |cToken| oMemo:Append( cToken ) } )

   oMemo:Append( Chr( 13 ) + Chr( 10 ) + Chr( 13 ) + Chr( 10 ) )

return nil
```
TOllama maintains a conversation history internally. Each call to Chat() or
ChatStream() includes previous messages as context. Call oOllama:ClearHistory()
to start a fresh conversation.
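For example, SetSystem() and ClearHistory() can be combined to scope a conversation. A minimal sketch using the methods described above (the prompt text is illustrative):

```
// Guide the model's behavior for the whole session, then reset
// between conversations.
oOllama:SetSystem( "You are a concise assistant for database questions." )

cAnswer := oOllama:Chat( "What is an index?" )         // first turn
cAnswer := oOllama:Chat( "When should I avoid one?" )  // sent with the prior turn as context

oOllama:ClearHistory()   // the next Chat() starts a fresh conversation
```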
Step 5: TOllama Properties and Methods
| Member | Type | Description |
|---|---|---|
| `cUrl` | Property | Ollama server URL (default `"http://localhost:11434"`). |
| `cModel` | Property | Model name (e.g. `"llama3"`, `"mistral"`, `"codellama"`). |
| `nTemperature` | Property | Sampling temperature (0.0 – 2.0, default 0.7). |
| `Chat( cPrompt )` | Method | Send a prompt and return the full response as a string. |
| `ChatStream( cPrompt, bCallback )` | Method | Stream the response, calling `bCallback` with each token. |
| `SetSystem( cText )` | Method | Set the system prompt that guides the model's behavior. |
| `ClearHistory()` | Method | Reset the conversation history. |
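The properties in the table can also be adjusted between requests. A sketch, assuming they behave as ordinary property assignments after `DEFINE OLLAMA`:

```
// Lower temperature for more deterministic output, and switch
// models between requests (both models must already be pulled).
oOllama:nTemperature := 0.2
oOllama:cModel := "codellama"

cReply := oOllama:Chat( "Write a Harbour function that reverses a string." )
```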
Part 2: TTransformer — In-Process Inference
For scenarios where you need to run a model without an external service, HarbourBuilder provides TTransformer. This component loads model weights (GGUF format) directly into your application's memory and runs inference using the CPU (or GPU if available).
Step 1: Obtain a Model File
- Download a GGUF model file (e.g. `tinyllama-1.1b-chat.Q4_K_M.gguf` from Hugging Face).
- Place it in your project folder or a known path.
Smaller quantized models (Q4_K_M, Q5_K_M) run well on most hardware. A 1B–3B parameter model with Q4 quantization needs only 1–2 GB of RAM. Larger models require more memory and a GPU.
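As a rough rule of thumb, a quantized model needs about (parameters × bits per parameter ÷ 8) bytes of RAM, plus overhead for the context window. A sketch of that estimate; the bits-per-parameter figures are approximations for the GGUF quantization schemes, not a HarbourBuilder API:

```
// Rough RAM estimate in GB for a quantized model (approximation only,
// ignores KV-cache/context overhead).
static function EstimateRamGB( nParamsBillions, nBitsPerParam )
   // nBitsPerParam: roughly 4.5 for Q4_K_M, roughly 5.5 for Q5_K_M
   return nParamsBillions * nBitsPerParam / 8

// EstimateRamGB( 1.1, 4.5 ) --> about 0.6 GB for TinyLlama 1.1B at Q4_K_M
```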
Step 2: Load the Model
```
#include "hbbuilder.ch"

function Main()
   local oForm, oMemo, oGetPrompt, oBtnRun, oTransformer
   local cPrompt := ""

   // --- Load the transformer model ---
   DEFINE TRANSFORMER oTransformer ;
      MODEL "models/tinyllama-1.1b-chat.Q4_K_M.gguf" ;
      CONTEXT 2048 ;
      GPU_LAYERS 0   // Set > 0 to offload layers to GPU

   if .not. oTransformer:lLoaded
      MsgAlert( "Failed to load model: " + oTransformer:cError )
      return nil
   endif

   // --- Build the UI ---
   DEFINE FORM oForm TITLE "TTransformer Demo" ;
      SIZE 700, 550 FONT "Segoe UI", 10

   @ 10, 10 MEMO oMemo ;
      OF oForm SIZE 660, 420 ;
      READONLY

   @ 440, 10 GET oGetPrompt VAR cPrompt ;
      OF oForm SIZE 540, 28

   @ 440, 560 BUTTON oBtnRun PROMPT "Run" ;
      OF oForm SIZE 110, 32 ;
      ACTION RunInference( oTransformer, oMemo, oGetPrompt )

   ACTIVATE FORM oForm CENTERED

return nil
```
Step 3: Run Inference
Use Generate() for a full response or GenerateStream() to stream tokens
into a Memo control as they are produced.
```
static function RunInference( oTransformer, oMemo, oGetPrompt )
   local cPrompt := oGetPrompt:GetValue()

   if Empty( cPrompt )
      return nil
   endif

   oMemo:Append( "Prompt: " + cPrompt + Chr( 13 ) + Chr( 10 ) )
   oMemo:Append( "Response: " )
   oGetPrompt:SetValue( "" )

   // Stream output token by token
   oTransformer:GenerateStream( cPrompt, ;
      { |cToken| oMemo:Append( cToken ) }, ;  // on each token
      256 )                                   // max tokens

   oMemo:Append( Chr( 13 ) + Chr( 10 ) + Chr( 13 ) + Chr( 10 ) )

return nil
```
Step 4: TTransformer Properties and Methods
| Member | Type | Description |
|---|---|---|
| `cModel` | Property | Path to the GGUF model file. |
| `nContext` | Property | Context window size in tokens (default 2048). |
| `nGpuLayers` | Property | Number of layers to offload to GPU (0 = CPU only). |
| `lLoaded` | Property | `.T.` if the model loaded successfully. |
| `Generate( cPrompt, nMaxTokens )` | Method | Run inference and return the full output string. |
| `GenerateStream( cPrompt, bCallback, nMaxTokens )` | Method | Stream output tokens via callback. |
| `Tokenize( cText )` | Method | Return an array of token IDs for the given text. |
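Tokenize() is useful for checking that a prompt plus the requested output fits in the context window before generating. A sketch built from the members in the table (the `SafeGenerate` wrapper is illustrative, not part of the component):

```
// Refuse to run inference if the prompt would overflow the context window.
static function SafeGenerate( oTransformer, cPrompt, nMaxTokens )
   local aTokens := oTransformer:Tokenize( cPrompt )

   if Len( aTokens ) + nMaxTokens > oTransformer:nContext
      MsgAlert( "Prompt too long: " + hb_ntos( Len( aTokens ) ) + " tokens" )
      return ""
   endif

return oTransformer:Generate( cPrompt, nMaxTokens )
```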
When to Use Which
| | TOllama (Ollama service) | TTransformer (GGUF in-process) |
|---|---|---|
| Approach | Connects to a local Ollama server; keeps conversation history | Embedded inference with no external dependencies |
| Pros | Easy setup, many models, hot-swap models | No server needed, single executable, offline capable |
You can use TOllama for heavyweight chat tasks and TTransformer for lightweight, fast classification or embedding tasks — both in the same application.
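For example, a single dispatch function can route work to whichever engine suits the task. A sketch; the `Answer` helper and the classification prompt are illustrative:

```
// One application, two engines: long-form chat goes to Ollama,
// quick bounded classification goes to the in-process model.
static function Answer( oOllama, oTransformer, cPrompt, lChat )
   if lChat
      return oOllama:Chat( cPrompt )   // full conversational context
   endif
   // Small embedded model: short, fixed-format output
return oTransformer:Generate( "Classify as POSITIVE or NEGATIVE: " + cPrompt, 8 )
```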
See the AI Component Palette reference for the full list of AI components including TEmbedding, TRAGEngine, and TSpeechToText.