TYacc + TLex
Sources: source/classes/tyacc.prg, source/classes/tlex.prg
Standalone classes (TYacc, TLex)
TYacc and TLex implement a complete LALR(1) parser generator and lexical analyzer in pure Harbour. TYacc performs shift-reduce parsing using a generated state machine, while TLex handles tokenization by matching input against a set of token patterns. Together they enable parsing of structured languages, expressions, and configuration files.
TYacc Key DATA Members
| DATA | Type | Description |
|---|---|---|
| oLex | Object | Reference to the associated TLex tokenizer |
| lDebug | Logical | Enable debug trace of parser states and transitions |
| nToken | Numeric | Current look-ahead token identifier |
| nState | Numeric | Current parser state number |
| aStates / aValues | Array | Parser state stack and semantic value stack |
| bShift / bReduce | Code block | Callbacks executed on shift and reduce actions |
TYacc Methods
| Method | Description |
|---|---|
| New( oLex ) | Create a parser associated with the given lexer object |
| Parse( cText ) | Parse the input text and return the result value |
| Shift( n ) | Shift the next state onto the parser stack |
| Reduce( n ) | Reduce by grammar rule number n |
| Accept() | Signal successful parse completion |
| ProdValue( n ) | Retrieve the semantic value of a production symbol at position n |
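To make the roles of the state stack and the semantic value stack concrete, here is a minimal, standalone sketch of a table-driven shift-reduce loop in plain Harbour. It does not use TYacc and is not its implementation: the function `ParseSum`, the token constants, and the hand-built tables for the toy grammar `E -> E PLUS NUM | NUM` are all assumptions made for illustration.

```harbour
#define T_NUM       1
#define T_PLUS      2
#define T_EOF       3
#define ACT_ACCEPT 99

PROCEDURE Main()

   // Token stream for "12 + 30"; each entry is { id, value } and the
   // stream must end with T_EOF
   ? ParseSum( { { T_NUM, 12 }, { T_PLUS, NIL }, { T_NUM, 30 }, { T_EOF, NIL } } )   // 42

   RETURN

// Hypothetical helper: parses NUM ( PLUS NUM )* and returns the sum,
// or NIL on a syntax error.
FUNCTION ParseSum( aInput )

   // aAction[ state ][ token ]: > 0 shift to that state, < 0 reduce by
   // rule -n, 0 error, ACT_ACCEPT accept
   LOCAL aAction  := { {  3,  0,  0 }, ;
                       {  0,  4, ACT_ACCEPT }, ;
                       {  0, -2, -2 }, ;
                       {  5,  0,  0 }, ;
                       {  0, -1, -1 } }
   LOCAL aGoto    := { 2, 0, 0, 0, 0 }   // state to enter after reducing to E
   LOCAL aRuleLen := { 3, 1 }            // rule 1: E PLUS NUM, rule 2: NUM
   LOCAL aStates  := { 1 }               // parser state stack
   LOCAL aValues  := {}                  // semantic value stack
   LOCAL nPos     := 1
   LOCAL uResult  := NIL
   LOCAL nAct, nRule, nBase, uValue, nTop

   DO WHILE .T.
      nAct := aAction[ ATail( aStates ) ][ aInput[ nPos ][ 1 ] ]
      DO CASE
      CASE nAct == ACT_ACCEPT
         uResult := ATail( aValues )
         EXIT
      CASE nAct > 0                      // shift: push state and token value
         AAdd( aStates, nAct )
         AAdd( aValues, aInput[ nPos ][ 2 ] )
         nPos++
      CASE nAct < 0                      // reduce: pop the rule's right-hand side
         nRule := -nAct
         nBase := Len( aValues ) - aRuleLen[ nRule ]
         IF nRule == 1
            // 1-based positions 1 and 3 of the production
            uValue := aValues[ nBase + 1 ] + aValues[ nBase + 3 ]
         ELSE
            uValue := aValues[ nBase + 1 ]
         ENDIF
         ASize( aStates, Len( aStates ) - aRuleLen[ nRule ] )
         ASize( aValues, nBase )
         nTop := ATail( aStates )
         AAdd( aStates, aGoto[ nTop ] )
         AAdd( aValues, uValue )
      OTHERWISE                          // no table entry: syntax error
         EXIT
      ENDCASE
   ENDDO

   RETURN uResult
```

In this sketch, `aValues[ nBase + n ]` plays the part that the table above describes for `ProdValue( n )`: it reads the value of the n-th symbol on the right-hand side of the rule being reduced.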
TLex
TLex inherits from TFile and provides tokenization services: it reads input text and returns a stream of tokens with associated semantic values.
| DATA | Type | Description |
|---|---|---|
| aTokens | Array | Array of token pattern strings |
| aIds | Array | Array of numeric token IDs corresponding to each pattern |
| cToken | Character | Current matched token text |
| uValue | Various | Semantic value associated with the current token |

| Method | Description |
|---|---|
| New( cFile, aTokens, aIds, cSeparators ) | Create a lexer with token patterns, IDs, and separator characters |
| nGetToken() | Get the next token from the input; returns the token ID |
| SetText( c ) | Set the input text to be tokenized |
| Add( cToken, nId ) | Add a new token pattern with its numeric ID |
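The following standalone sketch shows one way first-match tokenization over an ordered pattern list can work. It is not TLex's code: the function `Tokenize` is hypothetical, and treating each pattern as a regular expression (matched here with Harbour's `hb_RegEx()`) is an assumption based on the pattern style used in this page's example.

```harbour
PROCEDURE Main()

   LOCAL aTok

   // Hypothetical IDs: 1 = NUM, 2 = PLUS
   FOR EACH aTok IN Tokenize( "12 + 34", { "[0-9]+", "\+" }, { 1, 2 }, " " )
      ? aTok[ 1 ], aTok[ 2 ]
   NEXT

   // Pattern order matters: with "=" listed first, "==" splits in two
   ? Len( Tokenize( "==", { "=", "==" }, { 1, 2 }, " " ) )   // two tokens
   ? Len( Tokenize( "==", { "==", "=" }, { 2, 1 }, " " ) )   // one token

   RETURN

// Hypothetical helper: returns an array of { nId, cText } pairs.
// Scanning stops at the first character that no pattern matches.
FUNCTION Tokenize( cText, aTokens, aIds, cSeparators )

   LOCAL aOut := {}
   LOCAL nPos := 1
   LOCAL n, aMatch, lFound

   DO WHILE nPos <= Len( cText )
      // Skip separator characters between tokens
      IF SubStr( cText, nPos, 1 ) $ cSeparators
         nPos++
         LOOP
      ENDIF
      lFound := .F.
      // Try the patterns in order; the first one that matches wins
      FOR n := 1 TO Len( aTokens )
         // "^" anchors the pattern at the current position
         aMatch := hb_RegEx( "^(" + aTokens[ n ] + ")", SubStr( cText, nPos ) )
         IF ! Empty( aMatch ) .AND. Len( aMatch[ 1 ] ) > 0
            AAdd( aOut, { aIds[ n ], aMatch[ 1 ] } )
            nPos += Len( aMatch[ 1 ] )
            lFound := .T.
            EXIT
         ENDIF
      NEXT
      IF ! lFound
         EXIT
      ENDIF
   ENDDO

   RETURN aOut
```

The two `Len()` calls in `Main()` show why more specific patterns should come before general ones when the first match decides the token.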
Example: Simple Expression Parser
    #include "FiveWin.ch"

    function Main()

       local oLex, oYacc
       local cExpr   := "12 + 34 * ( 56 - 7 )"
       local aTokens := { "\d+", "\+", "\-", "\*", "/", "\(", "\)" }
       local aIds    := { 1, 2, 3, 4, 5, 6, 7 } // NUM, PLUS, MINUS, MUL, DIV, LPAR, RPAR

       oLex := TLex():New( , aTokens, aIds, " " )
       oLex:SetText( cExpr )

       oYacc := TYacc():New( oLex )
       oYacc:lDebug := .T.   // trace parser states and shift/reduce decisions

       // Parse( cText ) expects the input text, not the lexer's current token.
       // The grammar tables must already be defined (see Notes).
       oYacc:Parse( cExpr )

    return nil
Notes
- TYacc uses a table-driven LALR(1) parsing algorithm. The grammar rules and state transitions must be defined by subclassing or by setting up the action tables programmatically.
- The `bShift` and `bReduce` code blocks allow custom semantic actions to be executed during parsing.
- TLex attempts token patterns in order; the first matching pattern determines the token. Place more specific patterns before general ones.
- `ProdValue( n )` accesses semantic values from the right-hand side of a grammar rule, where `n` is the 1-based symbol position in the production.
- Enable `lDebug` to trace the parser's state transitions and shift/reduce decisions for troubleshooting grammar definitions.