Tutorial: AI Integration

HarbourBuilder provides two complementary AI components: TOllama for connecting to local large language models via the Ollama service, and TTransformer for loading and running transformer model weights directly in-process. This tutorial covers both approaches with complete working examples.

Part 1: TOllama — Chat with a Local LLM

Step 1: Prerequisites

  1. Install Ollama on your machine.
  2. Pull a model: run ollama pull llama3 in your terminal.
  3. Ensure Ollama is running (it listens on http://localhost:11434 by default).
No API keys required

TOllama connects to a locally running Ollama instance. All inference happens on your machine — no cloud services, no API keys, no data leaves your computer.
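Before wiring up the UI, you can confirm the server is reachable from the terminal. Ollama's REST API lists locally available models at /api/tags, so a quick request there doubles as a health check:

```shell
# Quick health check: the request succeeds only if Ollama is listening
# on its default port; otherwise the fallback message is printed.
curl --silent --fail http://localhost:11434/api/tags \
  || echo "Ollama is not running - start it and try again"
```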

Step 2: Create the Chat UI

Build a form with a Memo for the conversation history, a GET for user input, and a Send button.

#include "hbbuilder.ch"

function Main()

   local oForm, oMemo, oGetPrompt, oBtnSend, oOllama
   local cPrompt := ""

   // --- Connect to Ollama ---
   DEFINE OLLAMA oOllama ;
      URL "http://localhost:11434" ;
      MODEL "llama3"

   // --- Build the UI ---
   DEFINE FORM oForm TITLE "AI Chat" ;
      SIZE 700, 550 FONT "Segoe UI", 10

   @ 10, 10 MEMO oMemo ;
      OF oForm SIZE 660, 420 ;
      READONLY

   @ 440, 10 GET oGetPrompt VAR cPrompt ;
      OF oForm SIZE 540, 28

   @ 440, 560 BUTTON oBtnSend PROMPT "Send" ;
      OF oForm SIZE 110, 32 ;
      ACTION SendMessage( oOllama, oMemo, oGetPrompt )

   ACTIVATE FORM oForm CENTERED

return nil

Step 3: Send a Message and Get a Response

Use oOllama:Chat() for a simple request/response, or oOllama:ChatStream() to receive the response token by token as it is generated.

Simple (blocking) chat completion:

static function SendMessage( oOllama, oMemo, oGetPrompt )

   local cPrompt  := oGetPrompt:GetValue()
   local cResponse

   if Empty( cPrompt )
      return nil
   endif

   // Show user message
   oMemo:Append( "You: " + cPrompt + Chr( 13 ) + Chr( 10 ) )
   oGetPrompt:SetValue( "" )

   // Get AI response
   cResponse := oOllama:Chat( cPrompt )

   oMemo:Append( "AI: " + cResponse + Chr( 13 ) + Chr( 10 ) + Chr( 13 ) + Chr( 10 ) )

return nil
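Chat() blocks until the full response arrives, so the form stops responding during a long generation. One simple safeguard is to disable the Send button while waiting; this is a sketch, and Disable()/Enable() stand in for whatever enable/disable methods your button control actually exposes:

```
static function SendMessage( oOllama, oMemo, oGetPrompt, oBtnSend )

   local cPrompt := oGetPrompt:GetValue()

   if Empty( cPrompt )
      return nil
   endif

   oMemo:Append( "You: " + cPrompt + Chr( 13 ) + Chr( 10 ) )
   oGetPrompt:SetValue( "" )

   oBtnSend:Disable()   // hypothetical method - prevents double-sends
   oMemo:Append( "AI: " + oOllama:Chat( cPrompt ) + Chr( 13 ) + Chr( 10 ) )
   oBtnSend:Enable()    // hypothetical method

return nil
```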

Step 4: Streaming Response to a Memo

For a better user experience, stream the response so tokens appear in the Memo as they are generated — just like a real chat interface.

static function SendMessageStream( oOllama, oMemo, oGetPrompt )

   local cPrompt := oGetPrompt:GetValue()

   if Empty( cPrompt )
      return nil
   endif

   oMemo:Append( "You: " + cPrompt + Chr( 13 ) + Chr( 10 ) )
   oMemo:Append( "AI: " )
   oGetPrompt:SetValue( "" )

   // Stream tokens one by one into the Memo
   oOllama:ChatStream( cPrompt, { |cToken| oMemo:Append( cToken ) } )

   oMemo:Append( Chr( 13 ) + Chr( 10 ) + Chr( 13 ) + Chr( 10 ) )

return nil
Conversation history

TOllama maintains a conversation history internally. Each call to Chat() or ChatStream() includes previous messages as context. Call oOllama:ClearHistory() to start a fresh conversation.
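For example, you can give the model standing instructions with SetSystem() and drop the accumulated context with ClearHistory():

```
// Standing instructions apply to every subsequent Chat()/ChatStream() call
oOllama:SetSystem( "You are a concise assistant. Answer in one paragraph." )

oOllama:Chat( "What is Harbour?" )
oOllama:Chat( "Who maintains it?" )  // answered with the first exchange as context

// Forget the conversation so far and start fresh
oOllama:ClearHistory()
```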

Step 5: TOllama Properties and Methods

| Member | Type | Description |
|--------|------|-------------|
| cUrl | Property | Ollama server URL (default "http://localhost:11434"). |
| cModel | Property | Model name (e.g. "llama3", "mistral", "codellama"). |
| nTemperature | Property | Sampling temperature (0.0 – 2.0, default 0.7). |
| Chat( cPrompt ) | Method | Send a prompt and return the full response as a string. |
| ChatStream( cPrompt, bCallback ) | Method | Stream the response, calling bCallback with each token. |
| SetSystem( cText ) | Method | Set the system prompt that guides the model's behavior. |
| ClearHistory() | Method | Reset the conversation history. |

Part 2: TTransformer — In-Process Inference

For scenarios where you need to run a model without an external service, HarbourBuilder provides TTransformer. This component loads model weights (GGUF format) directly into your application's memory and runs inference using the CPU (or GPU if available).

Step 1: Obtain a Model File

  1. Download a GGUF model file (e.g. tinyllama-1.1b-chat.Q4_K_M.gguf from Hugging Face).
  2. Place it in your project folder or a known path.
Model size matters

Smaller quantized models (Q4_K_M, Q5_K_M) run well on most hardware. A 1B–3B parameter model with Q4 quantization needs only 1–2 GB of RAM. Larger models require more memory and a GPU.
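As a rough rule of thumb (an approximation, not an exact formula), weight memory is parameter count times bits per weight divided by eight, plus some runtime overhead; the KV cache then grows on top of that with the context window:

```
// Rough RAM estimate in GB for a quantized GGUF model.
// Q4_K_M averages about 4.5 bits per weight; ~20% is added for
// runtime overhead. The KV cache (grows with CONTEXT) comes on top.
function GgufRamGB( nParamsBillion, nBitsPerWeight )
   hb_default( @nBitsPerWeight, 4.5 )
return nParamsBillion * nBitsPerWeight / 8 * 1.2

// GgufRamGB( 1.1 )  -> about 0.7 GB  (TinyLlama 1.1B at Q4)
// GgufRamGB( 7.0 )  -> about 4.7 GB  (a 7B model at Q4)
```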

Step 2: Load the Model

#include "hbbuilder.ch"

function Main()

   local oForm, oMemo, oGetPrompt, oBtnRun, oTransformer
   local cPrompt := ""

   // --- Load the transformer model ---
   DEFINE TRANSFORMER oTransformer ;
      MODEL "models/tinyllama-1.1b-chat.Q4_K_M.gguf" ;
      CONTEXT 2048 ;
      GPU_LAYERS 0  // Set > 0 to offload layers to GPU

   if .not. oTransformer:lLoaded
      MsgAlert( "Failed to load model: " + oTransformer:cError )
      return nil
   endif

   // --- Build the UI ---
   DEFINE FORM oForm TITLE "TTransformer Demo" ;
      SIZE 700, 550 FONT "Segoe UI", 10

   @ 10, 10 MEMO oMemo ;
      OF oForm SIZE 660, 420 ;
      READONLY

   @ 440, 10 GET oGetPrompt VAR cPrompt ;
      OF oForm SIZE 540, 28

   @ 440, 560 BUTTON oBtnRun PROMPT "Run" ;
      OF oForm SIZE 110, 32 ;
      ACTION RunInference( oTransformer, oMemo, oGetPrompt )

   ACTIVATE FORM oForm CENTERED

return nil

Step 3: Run Inference

Use Generate() for a full response or GenerateStream() to stream tokens into a Memo control as they are produced.

static function RunInference( oTransformer, oMemo, oGetPrompt )

   local cPrompt := oGetPrompt:GetValue()

   if Empty( cPrompt )
      return nil
   endif

   oMemo:Append( "Prompt: " + cPrompt + Chr( 13 ) + Chr( 10 ) )
   oMemo:Append( "Response: " )
   oGetPrompt:SetValue( "" )

   // Stream output token by token
   oTransformer:GenerateStream( cPrompt, ;
      { |cToken| oMemo:Append( cToken ) }, ;  // on each token
      256 )  // max tokens

   oMemo:Append( Chr( 13 ) + Chr( 10 ) + Chr( 13 ) + Chr( 10 ) )

return nil

Step 4: TTransformer Properties and Methods

| Member | Type | Description |
|--------|------|-------------|
| cModel | Property | Path to the GGUF model file. |
| nContext | Property | Context window size in tokens (default 2048). |
| nGpuLayers | Property | Number of layers to offload to GPU (0 = CPU only). |
| lLoaded | Property | .T. if the model loaded successfully. |
| Generate( cPrompt, nMaxTokens ) | Method | Run inference and return the full output string. |
| GenerateStream( cPrompt, bCallback, nMaxTokens ) | Method | Stream output tokens via callback. |
| Tokenize( cText ) | Method | Return an array of token IDs for the given text. |
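Tokenize() is handy for guarding against context-window overflow before generating; a sketch using the members listed above:

```
// Refuse prompts that would not fit in the context window
// together with the requested completion
static function SafeGenerate( oTransformer, cPrompt, nMaxTokens )

   local nPromptTokens := Len( oTransformer:Tokenize( cPrompt ) )

   if nPromptTokens + nMaxTokens > oTransformer:nContext
      MsgAlert( "Prompt uses " + hb_ntos( nPromptTokens ) + ;
                " tokens - too long for the context window" )
      return ""
   endif

return oTransformer:Generate( cPrompt, nMaxTokens )
```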

When to Use Which

Choose TOllama when you want full LLM chat with conversation history: it is easy to set up, gives access to many models, and lets you hot-swap models without redeploying. Choose TTransformer when you need embedded inference with no external dependencies: there is no server to run, everything ships in a single executable, and it works fully offline.
Combine both approaches

You can use TOllama for heavyweight chat tasks and TTransformer for lightweight, fast classification or embedding tasks — both in the same application.

Explore more

See the AI Component Palette reference for the full list of AI components including TEmbedding, TRAGEngine, and TSpeechToText.
