Working with Chinese text

Posted by Hilltopper 
Working with Chinese text
November 19, 2019 12:53AM
Hi,

I have a client who needs to store some text in Chinese. I have Simplified Chinese installed on my computer and can type it and store it in the database. However, this text is stored in HTML blocks, and I need to convert it to plain text using HTMLToText. The result shows the Chinese characters as ????????. I have tried the language constants mentioned in the documentation, but the explanation there is pretty thin. Has anyone done this, or does anyone have any ideas?

Here is an example of what I'm trying to convert. Please note that the sample is gibberish, as I do not speak Chinese.

<!-- Generated by XStandard version 3.0.0.0 on 2019-11-18T15:54:22 -->

<p>我呃嗯嗯我</p>


Thanks!

Steve



Edited 1 time(s). Last edit at 11/19/2019 12:55AM by Hilltopper.
Re: Working with Chinese text
November 19, 2019 06:45AM
Hi, you should not use ANSI but rather Unicode!

Kind regards,
Guenter Predl
office@windev.at
Re: Working with Chinese text
November 19, 2019 05:41PM
Hi Guenter,

My project default is ANSI, but I am using Unicode strings to handle the Chinese variables, and I am able to store the Chinese text in the database and display it on screen.

Any idea how I can configure HTMLToText to work with this data?


This returns ????:

HTMLTOTEXT("<!-- Generated by XStandard version 3.0.0.0 on 2019-11-18T15:54:22 -->

<p>我呃嗯嗯我</p>")

I have also tried the following, as well as 'charsetChinese', with the same result:

HTMLTOTEXT("<!-- Generated by XStandard version 3.0.0.0 on 2019-11-18T15:54:22 -->

<p>我呃嗯嗯我</p>",CharsetUTF8)

What am I missing?
Re: Working with Chinese text
November 19, 2019 05:54PM
Hi, the returned Chinese text is the "text"! It doesn't get any better than that. What's the problem? What did you expect?

Kind regards,
Guenter Predl
office@windev.at
Re: Working with Chinese text
November 19, 2019 06:24PM
I'm getting ????? back rather than the Chinese characters.

This is in a WD 24 project. I've tried with the project configuration set to both Unicode and ANSI strings.

Guenter, are you getting the Chinese characters as a result?

I've tested this both by using Info(HTMLToText(ChineseString)) and by copying the result to a Unicode text field on screen.



Edited 1 time(s). Last edit at 11/20/2019 03:28AM by Hilltopper.
Re: Working with Chinese text
November 20, 2019 07:45AM
Hi, I recommend asking PC Soft's free Technical Support! They have a Chinese version of WINDEV, and one of their Chinese-speaking employees should be able to help!

Kind regards,
Guenter Predl
office@windev.at
Peter Holemans
Re: Working with Chinese text
November 20, 2019 03:25PM
Hi Hilltopper,

I've been running international WX stuff since V17 including Chinese, Korean, etc. and using HTML text.
First of all, there are a couple of things I would do if you go with double-byte characters:
1) Make your project Unicode (in the configuration). For more than half a decade there has not been a single reason to still do ANSI projects, unless you want to lock yourself in with numerous constraints and additional complexity...
2) Make sure the encoding is part of the HTML string, as HTMLToText will use this information to correctly translate the HTML text.
Old pre-HTML5 style:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
New HTML5 style:
<head>
  <meta charset="UTF-8">
</head>

I hope this helps

Peter Holemans
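
For readers who want to see why a charset declaration in the HTML matters, here is a minimal sketch in Python (not WLanguage, and not how HTMLToText is implemented). It assumes the HTML arrives as UTF-8 bytes and that a converter falls back to a Western ANSI code page (Windows-1252 here) when no charset is declared. The names TextExtractor and html_bytes_to_text are hypothetical helpers made up for this illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document; tags and comments are skipped."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_bytes_to_text(raw: bytes, default_charset: str = "cp1252") -> str:
    # Look for charset=... in either the old http-equiv form or the HTML5 form.
    match = re.search(rb'charset=["\']?([\w-]+)', raw, re.IGNORECASE)
    # Assumption: without a declaration, fall back to a Western ANSI code page.
    charset = match.group(1).decode("ascii") if match else default_charset
    parser = TextExtractor()
    parser.feed(raw.decode(charset, errors="replace"))
    parser.close()
    return "".join(parser.parts).strip()


with_meta = '<head><meta charset="UTF-8"></head><p>我呃嗯嗯我</p>'.encode("utf-8")
without_meta = "<p>我呃嗯嗯我</p>".encode("utf-8")

print(html_bytes_to_text(with_meta))     # decoded using the declared charset
print(html_bytes_to_text(without_meta))  # decoded with the fallback, so garbled
```

With the declaration present, the bytes are decoded as UTF-8 and the Chinese text survives. Without it, the fallback code page is used and the characters come out garbled.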
Re: Working with Chinese text
November 20, 2019 10:05PM
Thanks Peter, that was helpful.

I am now able to get this working in a project set to use Unicode. Unfortunately, converting my entire project to Unicode isn't feasible at the moment: it's a huge system that I've been working on since WD 9, and I just don't have the bandwidth or budget to fix all the issues that arise from this switch.

For now, I guess I'll just create a simple REST webservice project that is configured to run in Unicode, and which can perform the HTMLToText on the string and return it.
Re: Working with Chinese text
November 21, 2019 02:57AM
Update:

using

ChangeCharset(charsetUTF8)

before the HTMLToText

did the trick
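
A likely explanation for the ???? symptom, sketched in Python rather than WLanguage: when text passes through a single-byte ANSI code page that has no Chinese characters, each one is replaced by "?". The sketch assumes the ANSI charset is Windows-1252. It is only an illustration of the mechanism and says nothing certain about what HTMLToText does internally.

```python
text = "我呃嗯嗯我"

# Assumption: the ANSI project uses a Western code page such as Windows-1252.
# Characters that the code page cannot represent are replaced by "?".
ansi_bytes = text.encode("cp1252", errors="replace")
ansi_round_trip = ansi_bytes.decode("cp1252")

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_round_trip = text.encode("utf-8").decode("utf-8")

print(ansi_round_trip)
print(utf8_round_trip)
```

The first line printed is a run of question marks, one per Chinese character, and the second is the original text. That matches the behaviour described in this thread before and after switching the charset to UTF-8.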