FiveTech Support Forums

FiveWin / Harbour / xBase community
TGet() - UTF8 encoding fails [Solved]
Posts: 392
Joined: Tue Mar 10, 2009 11:54 AM
Re: TGet() - UTF8 encoding fails [Unsolved]
Posted: Fri Oct 13, 2023 07:32 AM

OK, I see.

Nevertheless, the encoding should not be changed; IMHO this is a bug!

Since the Upper() function doesn't work 'properly' for UTF8, I have to use my own U82Upper() function for that!

But what about the VARCHAR clause? Does the same apply there?

Windows 11 Pro 22H2 22621.1848

Microsoft (R) Windows (R) Resource Compiler Version 10.0.10011.16384

Harbour 3.2.0dev (r2008190002)

FWH 23.10 x86
Posts: 10733
Joined: Sun Nov 19, 2006 05:22 AM
Re: TGet() - UTF8 encoding fails [Unsolved]
Posted: Sat Oct 14, 2023 05:27 PM
"Since the Upper() function doesn't work 'properly' for UTF8, ..."
Yes, Harbour's Upper() function does not work with UTF8-encoded umlauts.
Even with ANSI-encoded umlauts, Harbour's Upper()/Lower() functions work only if the codepage is set to German.
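
To illustrate why a byte-oriented upper-case routine cannot convert UTF8 umlauts, here is a minimal C sketch. It is not Harbour's actual implementation: `ascii_upper` is a hypothetical helper that looks at one byte at a time and converts only a-z, which is roughly what a single-byte routine does when no matching codepage is active. In UTF8, "ü" is the two bytes C3 BC and "Ü" is C3 9C, so neither byte on its own looks like a lower-case letter.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption for illustration, not Harbour code):
   upper-cases ASCII a-z only, one byte at a time. */
static void ascii_upper( char * s, size_t len )
{
   size_t i;

   for( i = 0; i < len; ++i )
   {
      unsigned char c = ( unsigned char ) s[ i ];

      if( c >= 'a' && c <= 'z' )
         s[ i ] = ( char ) ( c - 'a' + 'A' );
      /* bytes >= 0x80 (all parts of a UTF8 multi-byte sequence)
         are never touched */
   }
}
```

Calling `ascii_upper()` on the bytes 61 C3 BC ("aü" in UTF8) yields 41 C3 BC: the "a" is converted, the umlaut is not.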

But a Unicode Get control does not have to depend on Harbour's Upper() function to convert to upper case when the picture clause "@!" is used. Windows has its own built-in upper/lower case conversion. A Unicode Get uses it by setting the ES_UPPERCASE style, so Windows performs the upper-case conversion automatically.
This explains how "üäö" is converted to "ÜÄÖ" inside the Get.

In the version to be released we are providing two new functions, WinUpper() and WinLower().
These functions are wrappers around the Windows API functions CharUpper() and CharLower().
They work with both ANSI- and UTF8-encoded text, and they preserve the encoding of the parameter:
if the parameter is an ANSI-encoded umlaut, the result is an ANSI-encoded umlaut, and
if the parameter is a UTF8-encoded umlaut, the result is a UTF8-encoded umlaut.
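
The idea of converting case while keeping the UTF8 encoding can be sketched in portable C: decode the bytes to a code point, convert the code point, then encode it back. This is only an illustration under stated assumptions. The function in this post does the conversion through UTF-16 and the Windows API; the sketch below swaps in a tiny hand-written mapping that covers only ASCII a-z and the Latin-1 letters U+00E0..U+00FE. The name `utf8_upper_latin1` is hypothetical and not part of FWH or Windows.

```c
#include <stddef.h>

/* Illustrative only: upper-cases ASCII a-z and the Latin-1 letters
   U+00E0..U+00FE (except U+00F7, the division sign) inside a UTF8 buffer.
   All other bytes are left unchanged. */
static void utf8_upper_latin1( char * s, size_t len )
{
   size_t i = 0;

   while( i < len )
   {
      unsigned char c = ( unsigned char ) s[ i ];

      if( c >= 'a' && c <= 'z' )
      {
         s[ i ] = ( char ) ( c - 0x20 );
         i += 1;
      }
      else if( ( c & 0xE0 ) == 0xC0 && i + 1 < len &&
               ( ( unsigned char ) s[ i + 1 ] & 0xC0 ) == 0x80 )
      {
         /* decode a two-byte sequence into a code point */
         unsigned int cp = ( ( c & 0x1Fu ) << 6 ) |
                           ( ( unsigned char ) s[ i + 1 ] & 0x3Fu );

         if( cp >= 0xE0 && cp <= 0xFE && cp != 0xF7 )
            cp -= 0x20;   /* e.g. U+00FC -> U+00DC */

         /* encode the code point back into two UTF8 bytes */
         s[ i ]     = ( char ) ( 0xC0 | ( cp >> 6 ) );
         s[ i + 1 ] = ( char ) ( 0x80 | ( cp & 0x3F ) );
         i += 2;
      }
      else
         i += 1;   /* longer sequences and stray bytes are skipped */
   }
}
```

With this sketch, the UTF8 bytes C3 BC C3 A4 C3 B6 ("üäö") become C3 9C C3 84 C3 96 ("ÜÄÖ"), so the result is still UTF8.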

Here is a preview of one of these functions.
Code (fw):
#include "fivewin.ch"

#xtranslate enc(<c>) => If(isutf8(<c>),"UTF8", "ANSI" )

function Main()

   local cAnsiLower  := "üäö"
   local cUtf8Lower  := AnsiToUtf8( cAnsiLower )
   local cUtf8Upper, cAnsiUpper

   cUtf8Upper  := winUpper( cUtf8Lower )
   cAnsiUpper  := winUpper( cAnsiLower )

   ? cUtf8Upper, STRTOHEX( cUtf8Upper, " " ), enc( cUtf8Upper )
      // --> "ÜÄÖ", "C3 9C C3 84 C3 96", "UTF8"
   ? cAnsiUpper, STRTOHEX( cAnsiUpper, " " ), enc( cAnsiUpper )
      // --> "ÜÄÖ", "DC C4 D6", "ANSI"

return nil

#pragma BEGINDUMP

#include <windows.h>
#include <hbapi.h>
#include <fwh.h>

LPSTR UTF16toUTF8( LPWSTR utf16 );

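HB_FUNC( WINUPPER )
{
   LPWSTR pStr;
   LPCSTR pRet;

   if( HB_ISCHAR( 1 ) )
   {
      /* wide (UTF-16) copy of the parameter, upper-cased in place by Windows */
      pStr = fw_parWide( 1 );
      CharUpperW( pStr );
      if( isutf8( hb_parc( 1 ), hb_parclen( 1 ) ) )
      {
         /* input was UTF8: return the result as UTF8 as well */
         pRet = UTF16toUTF8( pStr );
         hb_retc( pRet );
         hb_xfree( ( void * ) pRet );
      }
      else
      {
         /* input was not UTF8: return the result through fw_retWide() */
         fw_retWide( pStr );
      }
      hb_xfree( ( void * ) pStr );
   }
   else
   {
      hb_retc( "" );   /* non-character parameter: return an empty string */
   }
}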

#pragma ENDDUMP
This works without setting any codepage, regardless of whether FW_SetUnicode() is set to .F. or .T.
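
The function above decides which encoding to return by calling isutf8() on the original bytes. How FWH implements that check is not shown in this thread; the following is a minimal sketch of one way such a test could work, using the hypothetical name `looks_like_utf8`. As a choice made for this sketch only, it reports 1 only when the buffer is well-formed UTF8 and contains at least one multi-byte sequence, so pure ASCII text is reported as 0. It is also simplified: overlong forms and surrogates are not rejected.

```c
#include <stddef.h>

/* Hypothetical stand-in for an isutf8()-style check; not FWH's actual code.
   Returns 1 if every byte forms well-formed UTF8 and at least one
   multi-byte sequence is present, otherwise 0. */
static int looks_like_utf8( const char * s, size_t len )
{
   size_t i = 0;
   int multibyte = 0;

   while( i < len )
   {
      unsigned char c = ( unsigned char ) s[ i ];
      size_t extra;

      if( c < 0x80 )                   extra = 0;   /* ASCII */
      else if( ( c & 0xE0 ) == 0xC0 )  extra = 1;   /* 110xxxxx */
      else if( ( c & 0xF0 ) == 0xE0 )  extra = 2;   /* 1110xxxx */
      else if( ( c & 0xF8 ) == 0xF0 )  extra = 3;   /* 11110xxx */
      else return 0;            /* stray continuation or invalid lead byte */

      if( extra > len - i - 1 )
         return 0;              /* sequence is cut off at the end */

      for( ++i; extra > 0; --extra, ++i )
      {
         if( ( ( unsigned char ) s[ i ] & 0xC0 ) != 0x80 )
            return 0;           /* continuation bytes must be 10xxxxxx */
         multibyte = 1;
      }
   }
   return multibyte;
}
```

For the two results in the example, the bytes C3 9C C3 84 C3 96 pass this check and the bytes DC C4 D6 fail it, because C4 is not a valid continuation byte after DC. Any check of this kind is a heuristic: some ANSI byte sequences (for example C3 BC, which reads "Ã¼" in Windows-1252) are also well-formed UTF8.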
Regards



G. N. Rao.

Hyderabad, India
Posts: 392
Joined: Tue Mar 10, 2009 11:54 AM
Re: TGet() - UTF8 encoding fails [Unsolved]
Posted: Sun Oct 15, 2023 10:35 AM
Tested with:
Code (fw):
local cAnsiLower  := " Καλημέρα - Приве́ - ดีตอนเช้า"

I think that's really good :D

But of course I can't use it with TGet() and the picture clause "@!" because then the encoding changes :(
Posts: 10733
Joined: Sun Nov 19, 2006 05:22 AM
Re: TGet() - UTF8 encoding fails [Unsolved]
Posted: Sun Oct 15, 2023 10:43 AM
"... because then the encoding changes"
We intend to address all issues with your help and feedback.
Regards



G. N. Rao.

Hyderabad, India
Posts: 392
Joined: Tue Mar 10, 2009 11:54 AM
Re: TGet() - UTF8 encoding fails [Solved]
Posted: Sat Nov 04, 2023 09:36 AM
Dear Mr. Nageswara Rao,
now encoding is OK :D
Thanks
Frank
Posts: 10733
Joined: Sun Nov 19, 2006 05:22 AM
Re: TGet() - UTF8 encoding fails [Solved]
Posted: Sat Nov 04, 2023 09:48 AM

Thank you.

This was possible because of your feedback.

Regards



G. N. Rao.

Hyderabad, India
