Reading non-English characters from a file

I have a plugin that auto replaces text from a CSV file.

Non-English characters get automatically converted to the placeholder character code 65533 (U+FFFD, the Unicode replacement character) when the file is read.

For example, take the name “González” from the data sheet.

An online text-to-ASCII converter shows these codes for it: 71 111 110 122 195 161 108 101 122

The 195 161 pair is the two-byte UTF-8 sequence for á.

However, when the plugin reads the CSV file, that string comes back with these character codes:
71 111 110 122 65533 108 101 122

So the text layer then uses the placeholder box where the á should go when the text is replaced.

I know the Arial font can display the characters. If I open the CSV file in Notepad on Windows and then copy/paste directly into the text layer, Arial displays them correctly.

The issue occurs when the file is read, before I do any parsing of the data at all.

This is the line of code that reads the file. Directly after it, the character code has already been changed to 65533 for those characters. Is there a way to read the file with a different character set so the codes are not automatically converted to 65533?

var csvDataRead=await CSVFile.read();

OK, I figured out the issue myself so I am posting what I found.

The CSV file needed to be saved as UTF-8; then the plugin reads it correctly and the text replacement works. The CSV file had previously been saved with the “Western Europe” character set.
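
For anyone who wants to see why the 65533 shows up, here is a minimal sketch. It assumes the read call decodes the file as UTF-8 (which the fix suggests, but I have not confirmed it), and it uses the standard TextDecoder available in browsers and Node; I have not checked whether the plugin runtime exposes it. The byte arrays and the helper name codesWhenReadAsUtf8 are made up for illustration.

```javascript
// "González" as it is stored on disk in each encoding (illustrative byte arrays).
// Western European (Windows-1252): á is the single byte 225.
const westernBytes = new Uint8Array([71, 111, 110, 122, 225, 108, 101, 122]);
// UTF-8: á is the two-byte sequence 195 161.
const utf8Bytes = new Uint8Array([71, 111, 110, 122, 195, 161, 108, 101, 122]);

// Hypothetical helper: decode the bytes as UTF-8 and return the character codes.
function codesWhenReadAsUtf8(bytes) {
  const text = new TextDecoder("utf-8").decode(bytes);
  return Array.from(text, (ch) => ch.codePointAt(0));
}

// Byte 225 on its own is not valid UTF-8, so the decoder substitutes U+FFFD.
const westernCodes = codesWhenReadAsUtf8(westernBytes);
console.log(westernCodes.join(" ")); // 71 111 110 122 65533 108 101 122

// The UTF-8 file decodes cleanly: 195 161 becomes the single code 225 (á).
const utf8Codes = codesWhenReadAsUtf8(utf8Bytes);
console.log(utf8Codes.join(" ")); // 71 111 110 122 225 108 101 122
```

The first output matches the codes I was seeing in the plugin, which is consistent with the file being read as UTF-8 while it was saved in the Western European character set.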
