FiveTech Support Forums

random sentences generator
Posted: Thu Nov 30, 2023 04:09 PM
Inspired by the idea behind AI large language models, this is a very simple, easy-to-understand sentence generator. It is quite funny :-)

The more sentences you feed it, the more "inspired" it gets :-D

llml.prg
#include "FiveWin.ch"

static hTokens := {=>}

function Main()

   Tokenizer( "el gato subió al arbol y maulló hasta que llegó el bombero" )
   Tokenizer( "Me gusta aprender cosas nuevas todos los días" )
   Tokenizer( "El cielo es azul y el sol brilla" )
   Tokenizer( "La música es una forma de expresión artística" )
   Tokenizer( "El chocolate es un dulce que se hace con cacao" )
   Tokenizer( "La Tierra es el tercer planeta del sistema solar y tiene una luna" )
   Tokenizer( "El agua es un líquido transparente e inodoro que se compone de hidrógeno y oxígeno" )
   Tokenizer( "Los gatos son animales domésticos muy populares" )

   ? Generate( "el" )

return nil

function Tokenizer( cSentence )

   local aTokens := hb_ATokens( cSentence )
   local n

   // Case-insensitive keys, so "El" and "el" share one entry
   hb_HCaseMatch( hTokens, .F. )

   // For every word, record each word that follows it
   for n = 1 to Len( aTokens ) - 1
      if ! hb_HHasKey( hTokens, aTokens[ n ] )
         hTokens[ aTokens[ n ] ] = { aTokens[ n + 1 ] }
      else
         AAdd( hTokens[ aTokens[ n ] ], aTokens[ n + 1 ] )
      endif
   next

return nil

function Generate( cToken )

   local cSentence := cToken, n := 1

   while hb_hHasKey( hTokens, cToken ) .and. ! Empty( hTokens[ cToken ] ) .and. n++ < 20
      // Note: the two random picks below are independent, so the word that is
      // appended and the word used as the next key can differ
      cSentence += " " + hTokens[ cToken ][ hb_RandomInt( 1, Len( hTokens[ cToken ] ) ) ]
      cToken = hTokens[ cToken ][ hb_RandomInt( 1, Len( hTokens[ cToken ] ) ) ]
   end

return cSentence
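
To follow what Generate() works with, it can help to print the table that Tokenizer() builds. The following is a minimal console sketch in plain Harbour (no FiveWin required). BuildTable() is a hypothetical name used only here; it is meant to mirror what Tokenizer() does, applied to a single sentence:

```harbour
static hTokens := {=>}

procedure Main()

   local cKey

   hb_HCaseMatch( hTokens, .F. )   // "El" and "el" share one entry

   BuildTable( "el cielo es azul y el sol brilla" )

   // List every word together with the words seen right after it
   for each cKey in hb_HKeys( hTokens )
      ? cKey + " -> " + hb_ValToExp( hTokens[ cKey ] )
   next

return

// Hypothetical stand-in for Tokenizer(), kept local to this sketch
static procedure BuildTable( cSentence )

   local aTokens := hb_ATokens( cSentence )
   local n

   for n := 1 to Len( aTokens ) - 1
      if ! hb_HHasKey( hTokens, aTokens[ n ] )
         hTokens[ aTokens[ n ] ] := {}
      endif
      AAdd( hTokens[ aTokens[ n ] ], aTokens[ n + 1 ] )
   next

return
```

Each key maps to the list of words that followed it, duplicates included, so a word seen more often after a given key is more likely to be picked.
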
Some funny results:
"el chocolate es un líquido que llegó compone de expresión artística"
"el agua brilla"
"el gato planeta del sistema solar y el agua"
"el tercer es el y maulló"
"el sol es un y el hasta que se hace de hidrogeno artística"
"el sol planeta del sistema solar y el hasta que se el sol es azul dulce que se el chocolate"
regards, saludos

Antonio Linares
www.fivetechsoft.com
Re: random sentences generator
Posted: Thu Nov 30, 2023 05:22 PM
In this example we load the complete works of William Shakespeare into memory:

https://www.fivetechsoft.com/files/shakespeare.txt
#include "FiveWin.ch"

static hTokens := {=>}

function Main()

   local cText := hb_memoRead( "shakespeare.txt" )
   local cSentence

   for each cSentence in hb_ATokens( cText, "." )
      Tokenizer( cSentence )
   next

   ? Generate( "the" )

return nil
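
One thing to watch with a text file: sentences span several lines, and, as far as I know, hb_ATokens() without a delimiter argument splits on spaces only, so line breaks would stay inside the tokens. A minimal sketch of a cleanup step in plain Harbour follows. NormalizeText() is a hypothetical helper, not part of the code above, and the short sample text stands in for the file contents:

```harbour
procedure Main()

   // Sample text standing in for hb_MemoRead( "shakespeare.txt" )
   local cText := "To be, or not to be," + Chr( 13 ) + Chr( 10 ) + ;
                  "that is the question. Whether 'tis nobler in the mind"
   local cSentence

   cText := NormalizeText( cText )

   for each cSentence in hb_ATokens( cText, "." )
      ? hb_ValToExp( hb_ATokens( AllTrim( cSentence ) ) )
   next

return

// Hypothetical helper: turn line breaks and tabs into single spaces
function NormalizeText( cText )

   cText := StrTran( cText, Chr( 13 ), " " )
   cText := StrTran( cText, Chr( 10 ), " " )
   cText := StrTran( cText, Chr( 9 ), " " )

   // Collapse runs of spaces so that no empty tokens can appear
   do while "  " $ cText
      cText := StrTran( cText, "  ", " " )
   enddo

return cText
```

In the example above, the same call would go right after hb_memoRead(), before the text is split into sentences.
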
regards, saludos

Antonio Linares
www.fivetechsoft.com
Re: random sentences generator
Posted: Thu Nov 30, 2023 09:19 PM

Dear Antonio,

thank you very much.

I added

#xtranslate hb_HHasKey( [<x,...>] ) => HHasKey( <x> )

and translated:

Tokenizer("The tomcat climbed the tree and meowed until the firefighter arrived.")
Tokenizer("I like learning new things every day.")
Tokenizer("The sky is blue and the sun is shining.")
Tokenizer("Music is a form of artistic expression.")
Tokenizer("Chocolate is a sweet made from cocoa.")
Tokenizer("The Earth is the third planet in the solar system and has a moon.")
Tokenizer("Water is a clear and odorless liquid made of hydrogen and oxygen.")
Tokenizer("Cats are very popular pets.")

However, the only output I get is "el".

Should it work with xHarbour?

Best regards,

Otto

Re: random sentences generator
Posted: Fri Dec 01, 2023 03:39 AM

Dear Otto,

When you call Generate( <cInitialWord> ), you have to provide an initial word that exists in your sentences.

In your case:

? Generate( "the" )
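
The reason is that Generate() stops as soon as the current word is not a key in hTokens, so an unknown start word comes back unchanged. A small guard can make that case visible. This is only a sketch in plain Harbour: CanStart() is a hypothetical helper, and the two-entry table is made up for the demo:

```harbour
static hTokens := {=>}

procedure Main()

   hb_HCaseMatch( hTokens, .F. )

   // Tiny hand-made table, shaped like the one Tokenizer() builds
   hTokens[ "the" ] := { "sky", "sun" }
   hTokens[ "sky" ] := { "is" }

   ? CanStart( "The" )   // key exists (keys match case-insensitively)
   ? CanStart( "el" )    // no such key, so Generate() would return just "el"

return

// Hypothetical helper: .T. when cToken has at least one recorded follower
function CanStart( cToken )

return hb_HHasKey( hTokens, cToken ) .and. ! Empty( hTokens[ cToken ] )
```
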

regards, saludos

Antonio Linares
www.fivetechsoft.com
Re: random sentences generator
Posted: Fri Dec 01, 2023 05:01 AM
Dear Otto,

In this example you can visually review how we organize the tokens, so it's easier to understand how it works :-)

llml.prg
#include "FiveWin.ch"

static hTokens := {=>}

function Main()

   local n

   Tokenizer( "The cat climbed the tree and meowed until the firefighter arrived" )
   Tokenizer( "I like learning new things every day" )
   Tokenizer( "The sky is blue and the sun is shining" )
   Tokenizer( "Music is a form of artistic expression" )
   Tokenizer( "Chocolate is a sweet made from cocoa" )
   Tokenizer( "The Earth is the third planet in the solar system and has a moon" )
   Tokenizer( "Water is a clear and odorless liquid made of hydrogen and oxygen" )
   Tokenizer( "Cats are very popular pets" )
   Tokenizer( "Paris is the capital of France and a popular tourist destination" )
   Tokenizer( "A triangle is a polygon with three sides and three angles" )
   Tokenizer( "I like to read books and watch movies" )
   Tokenizer( "A bicycle is a vehicle that has two wheels and pedals" )
   Tokenizer( "Microsoft is a technology company that makes software and hardware products" )
   Tokenizer( "Apples are a type of fruit that can be red, green, or yellow" )
   Tokenizer( "Elephants are the largest land animals" )
   Tokenizer( "The color wheel consists of primary colors like red, blue, and yellow" )

   XBrowser( hTokens )

   for n = 1 to 5
      ? Generate( "The" )
      ? Generate( "A" )
      ? Generate( "are" )
   next

return nil

function Tokenizer( cSentence )

   local aTokens := hb_ATokens( cSentence )
   local n

   // Case-insensitive keys, so "The" and "the" share one entry
   hb_HCaseMatch( hTokens, .F. )

   // For every word, record each word that follows it
   for n = 1 to Len( aTokens ) - 1
      if ! hb_HHasKey( hTokens, aTokens[ n ] )
         hTokens[ aTokens[ n ] ] = { aTokens[ n + 1 ] }
      else
         AAdd( hTokens[ aTokens[ n ] ], aTokens[ n + 1 ] )
      endif
   next

return nil

function Generate( cToken )

   local cSentence := cToken, n := 1

   while hb_hHasKey( hTokens, cToken ) .and. ! Empty( hTokens[ cToken ] ) .and. n++ < 100
      // Pick one follower at random; it is appended and becomes the next key
      cSentence += " " + ( cToken := hTokens[ cToken ][ hb_RandomInt( 1, Len( hTokens[ cToken ] ) ) ] )
   end

聽 聽
return cSentence
regards, saludos

Antonio Linares
www.fivetechsoft.com
Re: random sentences generator
Posted: Fri Dec 01, 2023 06:13 AM
You can use ChatGPT to generate the sentences for you, with a prompt such as:
"write 50 simple sentences to feed artificial intelligence using the format Tokenizer( sentence )"
Tokenizer("The sun rises in the east.")
Tokenizer("Cats are furry animals.")
Tokenizer("Water freezes at zero degrees Celsius.")
Tokenizer("Mars is known as the red planet.")
Tokenizer("The moon orbits the Earth.")
Tokenizer("Bananas are a good source of potassium.")
Tokenizer("Football is a popular sport worldwide.")
Tokenizer("Stars twinkle in the night sky.")
Tokenizer("The Great Wall of China is a historic landmark.")
Tokenizer("Rivers flow from higher to lower elevations.")
Tokenizer("Bees play a crucial role in pollination.")
Tokenizer("Diamonds are a form of carbon.")
Tokenizer("The Pacific Ocean is the largest ocean on Earth.")
Tokenizer("The Statue of Liberty stands in New York Harbor.")
Tokenizer("The human body has 206 bones.")
Tokenizer("Photosynthesis is the process by which plants make food.")
Tokenizer("The Eiffel Tower is located in Paris.")
Tokenizer("Birds have feathers for flying.")
Tokenizer("The Amazon Rainforest is home to diverse wildlife.")
Tokenizer("Electricity is a form of energy.")
Tokenizer("The Sahara Desert is the largest hot desert.")
Tokenizer("Snowflakes have unique and intricate patterns.")
Tokenizer("Honey is produced by bees from flower nectar.")
Tokenizer("The Earth's atmosphere consists of nitrogen and oxygen.")
Tokenizer("Jupiter is the largest planet in our solar system.")
Tokenizer("A rainbow is formed by the refraction of light.")
Tokenizer("The Mona Lisa is a famous painting by Leonardo da Vinci.")
Tokenizer("Owls are nocturnal birds of prey.")
Tokenizer("Tigers are the largest cats in the world.")
Tokenizer("The Nile River is the longest river in Africa.")
Tokenizer("Penguins live in the Southern Hemisphere.")
Tokenizer("The Mona Lisa is displayed in the Louvre Museum.")
Tokenizer("Astronauts wear spacesuits for protection in space.")
Tokenizer("The Leaning Tower of Pisa is in Italy.")
Tokenizer("The human brain weighs about 3 pounds.")
Tokenizer("Ants are known for their organized colonies.")
Tokenizer("The Taj Mahal is a beautiful marble mausoleum.")
Tokenizer("Volcanoes erupt molten lava.")
Tokenizer("The speed of light is approximately 299,792 kilometers per second.")
Tokenizer("Chameleons can change color to blend into their surroundings.")
Tokenizer("The Celsius and Fahrenheit scales are used to measure temperature.")
Tokenizer("The Sydney Opera House is an iconic landmark in Australia.")
Tokenizer("The Hubble Space Telescope captures stunning images of the universe.")
Tokenizer("Pandas primarily eat bamboo.")
Tokenizer("The coldest temperature ever recorded on Earth was -128.6 degrees Fahrenheit.")
Tokenizer("The Mona Lisa has a mysterious smile.")
Tokenizer("Rainbows are a spectrum of light.")
Tokenizer("Dolphins are highly intelligent marine mammals.")
Tokenizer("The Berlin Wall divided East and West Berlin during the Cold War.")
Tokenizer("The human heart beats about 100,000 times per day.")
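
Note that these sentences end with a period, so the last word of each one is stored with it attached, for example "east." rather than "east". If that is not wanted, trailing punctuation can be stripped from each token before it goes into the table. A minimal sketch in plain Harbour, where StripTrailingPunct() is a hypothetical helper and not part of the code above:

```harbour
procedure Main()

   local aTokens := hb_ATokens( "The sun rises in the east." )
   local n

   for n := 1 to Len( aTokens )
      aTokens[ n ] := StripTrailingPunct( aTokens[ n ] )
   next

   ? hb_ValToExp( aTokens )

   // Punctuation inside a token is left alone
   ? StripTrailingPunct( "299,792" )

return

// Hypothetical helper: remove punctuation characters from the end of a word
function StripTrailingPunct( cWord )

   do while Len( cWord ) > 0 .and. Right( cWord, 1 ) $ ".,;:!?"
      cWord := Left( cWord, Len( cWord ) - 1 )
   enddo

return cWord
```

Stripping only at the end of each token keeps numbers such as "299,792" and "-128.6" intact.
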
regards, saludos

Antonio Linares
www.fivetechsoft.com
Re: random sentences generator
Posted: Fri Dec 01, 2023 06:54 AM

Dear Antonio,

Thank you, it works.

And thank you very much for your research and development work and for exploring new techniques for us.

Best regards,

Otto
