FiveTech Support Forums

FiveWin / Harbour / xBase community
Posts: 1091
Joined: Thu Nov 17, 2005 11:08 AM
EXTRACT PLAIN TEXT FROM HTML FILE
Posted: Fri May 10, 2024 02:35 PM

Hi,

Is there any freeware that lets me extract the plain text from an HTML file? Any other tips are also welcome.

Many Thanks

Marco

Marco Boschi
info@marcoboschi.it
Posts: 8515
Joined: Tue Dec 20, 2005 07:36 PM
Re: EXTRACT PLAIN TEXT FROM HTML FILE
Posted: Fri May 10, 2024 03:12 PM
João Santos - São Paulo - Brasil - Phone: +55(11)95150-7341
Re: EXTRACT PLAIN TEXT FROM HTML FILE
Posted: Fri May 10, 2024 03:49 PM
8)
Marco Boschi
info@marcoboschi.it
Re: EXTRACT PLAIN TEXT FROM HTML FILE
Posted: Fri May 10, 2024 03:54 PM
// C:\FWH\SAMPLES\HTML2TXT.PRG

#include "FiveWin.ch"

MEMVAR cINNText

FUNCTION Main()

   LOCAL cFile := ".\GMAP.HTML"

   IF FILE( "Boschi.txt" )

      FERASE( "Boschi.txt" )

   ENDIF

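   // Run the conversion while the wait message is shown. CONVERT_HTML2TXT()
   // returns NIL, so its result must not be passed to WinExec().
   MsgRun( "WAIT... Converting HTML to TEXT. ", ;
           "Please, Wait                     ", ;
           { || CONVERT_HTML2TXT( cFile ) } )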

   MemoEdit( MemoRead( "Boschi.txt" ) )

RETURN NIL

FUNCTION CONVERT_HTML2TXT( cFile )

   LOCAL oExplorer := TOLEAuto():New( "InternetExplorer.Application" )

   PRIV cINNText

   oExplorer:Navigate2( cFile )

   DO WHILE oExplorer:ReadyState <> 4

      hb_idleSleep( 1 )

   ENDDO

   cINNText := oExplorer:Document:Body:InnerText

   MemoWrit( "Boschi.txt", cINNText )

   // MemoEdit( MemoRead( "Boschi.txt" ) )

   oExplorer:Quit()

RETURN NIL

// FIN / END
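
Where Internet Explorer automation is not available, a rough fallback is to strip the tags directly in Harbour. This is only a minimal sketch, not part of the sample above: the function name StripTags() is hypothetical, and it simply drops everything between "<" and ">". It does not decode entities such as &amp;amp; and it keeps the contents of script and style blocks, so it suits simple pages only.

```harbour
// Hypothetical helper: removes everything between "<" and ">".
// Assumption: the HTML is simple (no "<" or ">" inside attribute values).

PROCEDURE Main()

   ? StripTags( "<p>Hello <b>world</b></p>" )   // prints: Hello world

RETURN

FUNCTION StripTags( cHtml )

   LOCAL cText  := ""
   LOCAL lInTag := .F.
   LOCAL cChar
   LOCAL n

   FOR n := 1 TO Len( cHtml )

      cChar := SubStr( cHtml, n, 1 )

      DO CASE
      CASE cChar == "<"
         lInTag := .T.          // a tag starts: stop copying
      CASE cChar == ">"
         lInTag := .F.          // the tag ends: resume copying
      CASE ! lInTag
         cText += cChar         // ordinary text outside any tag
      ENDCASE

   NEXT

RETURN cText
```

To use it on a file, something like MemoWrit( "Boschi.txt", StripTags( MemoRead( cFile ) ) ) should work for small pages.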
Regards, saludos.
João Santos - São Paulo - Brasil - Phone: +55(11)95150-7341
