With Delphi 6/7, how can I convert an AnsiString in a different CharSet, to hex String UTF-8?

Question

Asked by MyICQ on 08 April 2022 at 23:45

I need to draw a barcode (QR) with Delphi 6/7. The program can run in various Windows locales, and the data comes from an input box.

In this input box the user can choose a charset and type in their own language. This works fine. The input data only ever comes from a single codepage. An example configuration could be:

Windows is on Western Europe, Codepage 1252 for ANSI text
Input is done in Shift-JIS ANSI charset

I need to get the Shift-JIS across to the barcode. The most robust way is to use hex encoding.

So my question is: how do I go from Shift-JIS to a hex String in UTF-8 encoding, if the codepage is not the same as the Windows locale?

As an example: I have the string 能ラ. This needs to be converted to E883BDE383A9 as per UTF-8. I have tried this, but the result is different and meaningless:

String2Hex(UTF8Encode(ftext))

Unfortunately I can't just have an inputbox for WideStrings. But if I can find a way to convert the ANSI text to a WideString, the barcode module can work with Unicode Strings as well.

If it's relevant: I am using the TEC-IT TBarcode DLL.

**AmigoJack** · Accepted Answer · 2022-04-09T10:11:04+00:00

Creating and accessing a Unicode text control

This is easier than you may think. I did it back when Windows 2000 was brand new and convenient components like Tnt Delphi Unicode Controls were not yet available. It helps to know how to create a Windows GUI program by hand, without Delphi's VCL; if you don't, take this as an introduction.

First add a field to your form, so we can easily access the new control later:

type
  TForm1= class(TForm)
...
  private
    hEdit: THandle;  // Our new Unicode control
  end;

Now just create it at your favorite event - I chose FormCreate:

  // Creating a child control, type "edit"
  self.hEdit:= CreateWindowW( PWideChar(WideString('edit')), PWideChar(WideString('myinput')), WS_CHILD or WS_VISIBLE, 10, 10, 200, 25, Handle, 0, HINSTANCE, nil );
  if self.hEdit= 0 then begin  // Failed. Get error code so we know why it failed.
    //GetLastError();
    exit;
  end;

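  // Add a sunken 3D edge (well, historically speaking).
  // SetWindowLong returns the PREVIOUS value, which may legitimately be 0,
  // so clear the last error first and only treat "0 plus error code" as failure.
  SetLastError( 0 );
  if (SetWindowLong( self.hEdit, GWL_EXSTYLE, WS_EX_CLIENTEDGE )= 0) and (GetLastError()<> 0) then begin
    exit;
  end;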

  // Applying new extended style: the control's frame has changed
  if not SetWindowPos( self.hEdit, 0, 0, 0, 0, 0, SWP_FRAMECHANGED or SWP_NOMOVE or SWP_NOZORDER or SWP_NOSIZE ) then begin
    //GetLastError();
    exit;
  end;

  // The system's default font is no help, let's use this form's font (hopefully Tahoma)
  SendMessage( self.hEdit, WM_SETFONT, self.Font.Handle, 1 );
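
As a possible shortcut, the extended style can be passed at creation time via CreateWindowExW, which makes the separate SetWindowLong and SetWindowPos calls unnecessary. This is only a sketch of an alternative to the steps above, not what the answer originally used; the extra styles WS_TABSTOP and ES_AUTOHSCROLL are my own assumption of what is usually wanted for a single-line edit.

```pascal
  // Alternative sketch: create the Unicode edit with its sunken edge in one call.
  // Requires the Windows unit; self.hEdit is the field declared above.
  self.hEdit:= CreateWindowExW(
    WS_EX_CLIENTEDGE,                                        // Extended style, applied right away
    'edit',                                                  // Window class
    'myinput',                                               // Initial text
    WS_CHILD or WS_VISIBLE or WS_TABSTOP or ES_AUTOHSCROLL,  // Assumed extra styles
    10, 10, 200, 25,                                         // Left, top, width, height
    Handle, 0, HInstance, nil );
  if self.hEdit= 0 then begin  // Failed. Get error code so we know why it failed.
    //GetLastError();
    exit;
  end;

  // The font still has to be set manually
  SendMessage( self.hEdit, WM_SETFONT, self.Font.Handle, 1 );
```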

At some point you want to get the edit's content. Again: how is this done without Delphi's VCL but instead directly with the WinAPI? This time I used a button's Click event:

var
  sText: WideString;
  iLen, iError: Integer;
begin
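  // How many CHARACTERS to copy?
  SetLastError( 0 );  // A stale error code must not be mistaken for a failure
  iLen:= GetWindowTextLengthW( self.hEdit );
  if iLen= 0 then iError:= GetLastError() else iError:= 0;  // Could be empty, could be an error
  if iError<> 0 then begin
    exit;
  end;

  Inc( iLen );  // For the trailing #0
  SetLength( sText, iLen );  // Reserve space
  iLen:= GetWindowTextW( self.hEdit, @sText[1], iLen );  // Copy text, returns the CHARACTERS copied
  if iLen= 0 then begin  // Nothing copied: empty or failed
    //GetLastError();
    exit;
  end;
  SetLength( sText, iLen );  // Cut off the trailing #0, so Length( sText ) is correct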

  // Demonstrate that non-ANSI text was copied out of a non-ANSI control
  MessageBoxW( Handle, PWideChar(sText), nil, 0 );
end;

There are issues left, like not being able to reach this new control via Tab, but we are basically re-inventing Delphi's VCL already, so those are details to take care of another time.

Converting codepages

The WinAPI deals either in codepages (Strings) or in UTF-16 LE (WideStrings). For historical reasons (first UCS-2, later UTF-16) the wide variant can represent everything, so it is always the implied target when coming from a codepage:

// Converting an ANSI charset (String) to UTF-16 LE (Widestring)
function StringToWideString( s: AnsiString; iSrcCodePage: DWord ): WideString;
var
  iLenDest, iLenSrc: Integer;
begin
  iLenSrc:= Length( s );
  iLenDest:= MultiByteToWideChar( iSrcCodePage, 0, PChar(s), iLenSrc, nil, 0 );  // How many CHARACTERS are needed?
  SetLength( result, iLenDest );
  if iLenDest> 0 then begin  // Otherwise we get the error ERROR_INVALID_PARAMETER
    if MultiByteToWideChar( iSrcCodePage, 0, PChar(s), iLenSrc, PWideChar(result), iLenDest )= 0 then begin
      //GetLastError();
      result:= '';
    end;
  end;
end;

The source codepage is up to you, for example:

1252 for "Windows-1252" = ANSI Latin 1 Multilingual (Western Europe)
932 for "Shift-JIS X-0208" = IBM-PC Japan MIX (DOS/V) (DBCS) (897 + 301)
28595 for "ISO 8859-5" = Cyrillic
65001 for "UTF-8"

However, if you want to convert from one codepage to another, and neither source nor target is UTF-16 LE, then you must go via UTF-16 LE in two steps:

Convert from ANSI to WIDE
Convert from WIDE to a different ANSI

// Converting UTF-16 LE (Widestring) to an ANSI charset (String, hopefully you want 65001=UTF-8)
function WideStringToString( s: WideString; iDestCodePage: DWord= CP_UTF8 ): AnsiString;
var
  iLenDest, iLenSrc: Integer;
begin
  iLenSrc:= Length( s );
  iLenDest:= WideCharToMultiByte( iDestCodePage, 0, PWideChar(s), iLenSrc, nil, 0, nil, nil );
  SetLength( result, iLenDest );
  if iLenDest> 0 then begin  // Otherwise we get the error ERROR_INVALID_PARAMETER
    if WideCharToMultiByte( iDestCodePage, 0, PWideChar(s), iLenSrc, PChar(result), iLenDest, nil, nil )= 0 then begin
      //GetLastError();
      result:= '';
    end;
  end;
end;

Not every Windows installation supports every codepage, so conversion attempts may fail. It would be more robust to aim for a Unicode program right away, as that is what every Windows installation definitely supports (unless you still deal with Windows 95, Windows 98 or Windows ME).
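
To find out up front whether a codepage is usable, the WinAPI function IsValidCodePage can be asked before converting. The following is only a sketch: the codepage 932 and the message text are examples of mine, and the snippet assumes it runs inside a method of the form, so Handle is available.

```pascal
  // Sketch: refuse early if this Windows installation cannot convert Shift-JIS.
  // Requires the Windows unit. 932 is just the example codepage from the question.
  if not IsValidCodePage( 932 ) then begin
    MessageBoxW( Handle, 'Codepage 932 is not available on this system.', nil, MB_ICONWARNING );
    exit;
  end;
```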

Combining everything

Now you have everything you need to put it together:

you can have a Unicode text control to directly get it in UTF-16 LE
you can use an ANSI text control to then convert the input to UTF-16 LE
you can convert from UTF-16 LE (WIDE) to UTF-8 (ANSI)
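
For the hex string asked for in the question, the pieces could be chained roughly like this. It is a sketch under assumptions: BytesToHex and AnsiToUtf8Hex are hypothetical names of mine (the question's String2Hex is not shown, so I cannot rely on it), and StringToWideString and WideStringToString are the two functions from this answer. The original attempt String2Hex(UTF8Encode(ftext)) most likely failed because UTF8Encode expects a WideString, so Delphi 6/7 first converts ftext implicitly with the system codepage (1252 in the example) instead of 932.

```pascal
// Requires the Windows unit (DWord, CP_UTF8) plus StringToWideString
// and WideStringToString from this answer.

// Hypothetical helper: renders every byte of s as two uppercase hex digits.
function BytesToHex( const s: AnsiString ): AnsiString;
const
  HexDigits: array[0..15] of AnsiChar= '0123456789ABCDEF';
var
  i: Integer;
begin
  SetLength( result, Length( s )* 2 );
  for i:= 1 to Length( s ) do begin
    result[i* 2- 1]:= HexDigits[Ord( s[i] ) shr 4];   // High nibble
    result[i* 2]:= HexDigits[Ord( s[i] ) and $0F];    // Low nibble
  end;
end;

// Hypothetical wrapper: codepage text -> UTF-16 LE -> UTF-8 -> hex.
// Returns '' if one of the conversions fails.
function AnsiToUtf8Hex( const s: AnsiString; iSrcCodePage: DWord ): AnsiString;
begin
  result:= BytesToHex( WideStringToString( StringToWideString( s, iSrcCodePage ), CP_UTF8 ) );
end;
```

With ftext holding 能ラ in Shift-JIS, AnsiToUtf8Hex( ftext, 932 ) should produce the E883BDE383A9 from the question, provided codepage 932 is installed.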

Size

UTF-8 is mostly the best choice, but size-wise UTF-16 may need fewer bytes in total when your target audience is Asian: in UTF-8 both 能 and ラ need 3 bytes each, but in UTF-16 both only need 2 bytes each. For a QR barcode, size is an important factor, I guess.

Likewise, don't waste space by turning binary data (8 bits per byte) into hex text (each character carries only 4 bits, but itself needs 1 byte = 8 bits again), which doubles the size. Have a look at Base64, which encodes 6 bits into every byte, so 3 input bytes become 4 characters. You have encountered that concept countless times already, because it is used for email attachments.
