TLex
Source: source/classes/tlex.prg
Inherits from: TFile
TLex is a lexical analyzer (tokenizer) that reads source text from a file or string and breaks it into tokens. It supports configurable separator characters, skip-on-blank and skip-on-CRLF modes, identifier tables, and token ID mapping. Tokens are retrieved one at a time with nGetToken(), which returns the numeric ID of the matching entry in the aTokens array.
Key DATA Members
| DATA | Type | Description |
|---|---|---|
| aTokens | Array | Array of token definitions (each entry: { cTokenString, nTokenId }) |
| aIds | Array | Array of recognized identifier strings |
| lSkipBlank | Logical | Skip whitespace characters when tokenizing |
| lSkipCRLF | Logical | Skip carriage return and line feed characters |
| cToken | Character | The last token text extracted |
| cSeparators | Character | String of separator characters used to delimit tokens |
| uValue | Any | Optional value associated with the current token |
| cText | Character | The full source text being tokenized |
Methods
| Method | Description |
|---|---|
| New( cFile, aTokens, aIds, cSeparators ) | Create a TLex from a file, with token table, identifier table, and separators |
| nGetToken() | Extract the next token and return its numeric ID (0 = unrecognized, -1 = end) |
| lEoF() | Return .T. when the end of the source text has been reached |
| SetText( c ) | Set the source text to be tokenized (alternative to file-based constructor) |
| Add( cToken, nId ) | Add a new token string with its numeric ID to the token table |
Example: Tokenize IF-THEN-ELSE Script
#include "FiveWin.ch"

function Main()

   local oLex, nToken
   local cScript := "IF x > 10 THEN print 'Hello' ELSE stop"

   // Define tokens: { string, id }
   local aTokens := { ;
      { "IF", 1 }, ;
      { "THEN", 2 }, ;
      { "ELSE", 3 }, ;
      { ">", 4 }, ;
      { "print", 5 }, ;
      { "stop", 6 } }

   // Create lexer
   oLex := TLex():New( , aTokens )
   oLex:SetText( cScript )
   oLex:lSkipBlank := .T.

   // Tokenize
   while ! oLex:lEoF()
      nToken := oLex:nGetToken()
      if nToken == 0
         ? "Unrecognized:", oLex:cToken
      elseif nToken > 0
         ? "Token ID:", nToken, "Text:", oLex:cToken
      endif
   enddo

return nil
Notes
- TLex inherits from TFile, enabling file-based source text. Use SetText() to tokenize a string instead.
- The aTokens array maps string patterns to numeric IDs. nGetToken() returns the ID of the matched token, 0 for an unrecognized sequence, or -1 at end of input.
- cSeparators defines which characters delimit tokens. Default separators include space, tab, and common punctuation if not specified.
- The cToken data member always holds the last extracted token text, useful for diagnostic messages or symbol table lookups.
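
To make the aTokens mapping described in the notes more concrete, here is a minimal standalone sketch of a lookup over { cTokenString, nTokenId } pairs. It does not use TLex at all: LookupTokenId() is a hypothetical helper written for illustration, and the exact, case-sensitive comparison is an assumption, since this page does not document how nGetToken() matches text internally.

```harbour
// Hypothetical helper, NOT part of TLex: find cText in a table of
// { cTokenString, nTokenId } pairs and return its ID, or 0 when the
// text is not in the table (mirroring the "0 = unrecognized" convention).
// Assumption: matching is exact and case-sensitive (the == operator).
function LookupTokenId( aTokens, cText )

   local nPos := AScan( aTokens, { | aEntry | aEntry[ 1 ] == cText } )

return iif( nPos > 0, aTokens[ nPos ][ 2 ], 0 )

procedure Main()

   local aTokens := { { "IF", 1 }, { "THEN", 2 }, { "ELSE", 3 } }

   ? LookupTokenId( aTokens, "THEN" )   // text is in the table
   ? LookupTokenId( aTokens, "x" )      // text is not in the table

return
```

A linear AScan() keeps the sketch short; for a large token table a hash keyed by the token string would likely be a better fit.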