UTF-8 encoding stored as 1250 (ANSI - Centraal Europa) in local variable

Hi,

I receive an XML file through an HTTP request, and the file is UTF-8 encoded.

When I store the data in a variable in Workflow, the encoding is no longer UTF-8; it seems to be 1250 (ANSI - Centraal Europa).

How do I set the preferences so that Workflow accepts UTF-8?

I attached a sample workflow and XML file.

test_diacritical_mark.OL-workflow (29.7 KB) 20201209153605.xml (254 Bytes)

I also tested it with email output: the encoding is broken in the subject, but it is correct in the body.

When you test with the file below, the output is broken in the body as well.

I'm really out of options and don't know what to try next.

20201210153125.xml (254 Bytes)

Workflow does not fully support UTF-8.
You have to convert your file using the Translator task.

I can confirm that. In one project I constantly have to convert from UTF-8 to Latin-1 and back. If the data has to be generated as UTF-8 CSV for Excel, care has to be taken to add the BOM, and for certain Workflow conversions the BOM has to be removed again.
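To make the round trip described above concrete, here is a minimal sketch in Python. It is not Workflow's Translator task, only an illustration of the same byte-level steps; the function names are made up for this example, and replacing unmappable characters with "?" is an assumption about what you would want.

```python
import codecs


def utf8_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 bytes to Latin-1 bytes (hypothetical helper)."""
    # "utf-8-sig" decodes UTF-8 and silently drops a leading BOM if present.
    text = data.decode("utf-8-sig")
    # Characters that do not exist in Latin-1 are replaced with "?".
    return text.encode("latin-1", errors="replace")


def latin1_to_utf8(data: bytes, with_bom: bool = False) -> bytes:
    """Convert Latin-1 bytes to UTF-8, optionally prepending a BOM."""
    text = data.decode("latin-1")
    # "utf-8-sig" writes the BOM (EF BB BF) in front of the encoded text,
    # which is what Excel looks for when opening a UTF-8 CSV.
    return text.encode("utf-8-sig" if with_bom else "utf-8")


def strip_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, leaving all other bytes untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Note that the conversion to Latin-1 is lossy: a character such as "ő", which exists in code page 1250 but not in Latin-1, comes out as "?" in this sketch.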

These days so many projects are connected to web page requests (Ajax requests). It would be nice if Workflow could handle this encoding itself.

The worst case is when you can't reliably know which character encoding the input file will arrive in. Is there any way to determine the character encoding?

Except for XML, which can declare its encoding in the XML declaration, the other text-based data formats have no native way of storing the character encoding. Even the BOM that some software prepends to UTF-8 files is optional (the Unicode standard neither requires nor recommends it for UTF-8), so you can't rely on its presence or absence to identify the encoding.
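Since there is no reliable marker, the best you can do is guess. A rough sketch of such a heuristic in Python follows; it is not something Workflow provides. The function name `guess_encoding` and the `cp1250` fallback are assumptions for this example. It checks for a BOM, then for an XML encoding declaration, then tests whether the bytes are valid UTF-8.

```python
import codecs
import re

# Matches the encoding name in an XML declaration at the very start of the data.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def guess_encoding(data: bytes, fallback: str = "cp1250") -> str:
    """Return a best-guess encoding name for data (heuristic only)."""
    # 1. A BOM, if present, is a strong hint (but its absence proves nothing).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # 2. XML can name its own encoding in the declaration.
    match = _XML_DECL.match(data[:200])
    if match:
        return match.group(1).decode("ascii").lower()

    # 3. Text in a single-byte code page that contains accented characters
    #    is almost never valid UTF-8, so a clean strict decode is good evidence.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # 4. Nothing matched: fall back to the assumed default code page.
        return fallback
```

This cannot tell one single-byte code page from another (1250 versus 1252, for example), and pure ASCII input is reported as UTF-8, which is harmless because ASCII is a subset of both.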

Why not use the XML directly?