FiveTech Support Forums

FiveWin / Harbour / xBase community
Posts: 44158
Joined: Thu Oct 06, 2005 05:47 PM
Re: phpBB to LLM
Posted: Wed Dec 27, 2023 09:52 AM
Dear Anton,

many thanks for your help!

I am reviewing the results :-)
regards, saludos

Antonio Linares
www.fivetechsoft.com
Re: phpBB to LLM
Posted: Wed Dec 27, 2023 10:10 AM
Here is run.py to test the model:

run.py
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load the fine-tuned GPT-2 model and tokenizer
fine_tuned_model_path = "./fine-tuned-model"
model = GPT2LMHeadModel.from_pretrained(fine_tuned_model_path)
tokenizer = GPT2Tokenizer.from_pretrained(fine_tuned_model_path)

# Input prompt for text generation
prompt = "what is a star ?"

# Tokenize the input prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
attention_mask = torch.ones_like(input_ids)
pad_token_id = tokenizer.eos_token_id
max_new_tokens = 50

# Generate text using the fine-tuned model
output = model.generate(
    input_ids,
    attention_mask=attention_mask,
    pad_token_id=pad_token_id,
    max_new_tokens=max_new_tokens,  # same limit as max_length = prompt length + max_new_tokens
    num_beams=5,
    no_repeat_ngram_size=2,
)

# Decode the generated tokens back to text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the generated text
print("Generated Text:", generated_text)
regards, saludos

Antonio Linares
www.fivetechsoft.com
Re: phpBB to LLM
Posted: Tue Jan 02, 2024 05:40 AM
From posts.dbf and posts.fpt we now generate dataset.json, the dataset used for training. We are using just 20 different topics, so the dataset stays small and we can run quicker tests with it:

dataset.prg
#include "FiveWin.ch"

request dbfcdx

function Main()

    local aPosts := {}, n

    USE posts VIA "dbfcdx"

    INDEX ON posts->topic + posts->date + posts->time + posts->forum TO subject
    GO TOP

    for n = 1 to 20
       AAdd( aPosts, GetTopic() )
    next
    hb_memoWrit( "dataset.json", hb_jsonEncode( aPosts ) )
    XBrowser( aPosts )

return nil

function GetTopic()

    local hTopic := {=>}, cTopic := RTrim( posts->topic )

    hTopic[ "topic" ]    = RTrim( posts->topic ) 
    hTopic[ "messages" ] = {}

    AAdd( hTopic[ "messages" ], GetPost() )
    SKIP 
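    // "==" is an exact string comparison, so trim the padded field before comparing with cTopic
    while RTrim( posts->topic ) == cTopic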
       AAdd( hTopic[ "messages" ], GetPost() ) 
       SKIP 
    end

return hTopic    

function GetPost() 

    local hPost := {=>}

    hPost[ "topic" ]    = RTrim( posts->topic )
    hPost[ "forum" ]    = RTrim( posts->forum )
    hPost[ "username" ] = RTrim( posts->username )
    hPost[ "date" ]     = posts->date 
    hPost[ "time" ]     = posts->time
    hPost[ "text" ]     = posts->text

return hPost

The structure of the generated json file is as follows:
[
   {  "topic": the title of the topic,
      "messages":
      [ 
         {
            "topic": the title of the topic,
            "forum": the forum name,
            "username": name of the author,
            "date": date of the post,
            "time": time of the post,
            "text": text of the post
         },
        next posts for the same topic
      ]
   },
   next topic,
   ...
]
So basically it is a list of topics, each with the topic title and the list of messages for that topic.
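
As a rough sketch of how this file could be consumed on the Python side, each topic can be flattened into one plain-text block. This is not part of the original scripts: the function names load_topics, topic_to_text and build_corpus, the "username: text" layout, and the UTF-8 encoding of dataset.json are all assumptions made for illustration.

```python
import json


def load_topics(path):
    """Read the list of topics written by dataset.prg (file assumed to be UTF-8)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def topic_to_text(topic):
    """Flatten one topic into plain text: a title line, then one line per message."""
    lines = ["Topic: " + topic["topic"]]
    for msg in topic["messages"]:
        # Hypothetical layout: "username: text"; the real training format may differ
        lines.append(f'{msg["username"]}: {msg["text"].strip()}')
    return "\n".join(lines)


def build_corpus(topics, separator="\n\n"):
    """Join all flattened topics into a single training string."""
    return separator.join(topic_to_text(t) for t in topics)
```

With this, build_corpus(load_topics("dataset.json")) would return a single string with one block per topic, which could then be tokenized for fine-tuning.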
regards, saludos

Antonio Linares
www.fivetechsoft.com
Re: phpBB to LLM
Posted: Sun Jan 07, 2024 09:28 AM
regards, saludos

Antonio Linares
www.fivetechsoft.com