Saving the configuration file in UTF-8 encoding with BOM

Segoro
Posts: 4
Location: Ukraine

Hello. On a Russian-language computer, the WinSCP.ini configuration file is saved in Windows-1251 encoding, and there is no way to change that. I understand that this was historically because that encoding is the one used for Cyrillic. However, it is already mid-2024, and we need to move away from legacy encodings. UTF-8 fully supports not only Cyrillic but also emoji. Please add a setting that lets the user choose which encoding is used when saving the WinSCP.ini configuration file.

martin
Site Admin
Posts: 41,041
Location: Prague, Czechia

Re: Saving the configuration file in UTF-8 encoding with BOM

We would probably need more details.
In general, the file does not really use any particular encoding, as it is purely ASCII.
If a value contains non-ASCII characters, those are effectively URL-encoded (percent-encoded) from their UTF-8 representation (although, for backward compatibility reasons, WinSCP can read ANSI-encoded values too).

For example, this is how a "☺ Русский" note is saved:
Note=%EF%BB%BF%E2%98%BA%20%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9
The leading EF BB BF is the UTF-8 byte order mark, and E2 98 BA is the UTF-8 encoding of ☺.
https://www.fileformat.info/info/unicode/char/263a/index.htm
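
To check this yourself, the value can be run through any URL-decoding routine. Below is a minimal Python sketch, assuming the value is exactly the one from the Note= line above. The variable names are only illustrative and have nothing to do with WinSCP's own code.

```python
from urllib.parse import unquote

# Value copied from the Note= line above.
raw = "%EF%BB%BF%E2%98%BA%20%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9"

# unquote() decodes the percent-escaped bytes as UTF-8 by default.
decoded = unquote(raw)

# The leading EF BB BF bytes decode to the byte order mark (U+FEFF);
# strip it to get the text of the note.
note = decoded.lstrip("\ufeff")

# The stored value itself contains only 7-bit ASCII characters.
print(raw.isascii())  # True
print(note)           # ☺ Русский
```

Because the stored value is plain ASCII, it should decode to the same text regardless of the system code page.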

Segoro

I've long understood that non-ASCII characters are URL-encoded. However, the statement “In general, the file does not really use any encoding, as it is purely ASCII” is problematic: if the system language is not English, the system itself saves the configuration in the encoding specific to that language. For example, on a Russian-language system the file is saved in Windows-1251 encoding.

Yes, Windows 11 now has an experimental feature: “Use UTF-8 for all languages,” and if enabled, WinSCP.ini is saved with UTF-8 encoding. But firstly, this is an experimental feature of Windows 11; secondly, not everyone knows about it; and thirdly, not everyone is using Windows 11 yet.

The advantage of UTF-8 with BOM is support for characters from any language. You can copy and transfer the program to any computer, regardless of its default language setting. It also allows the use of emojis for naming custom commands in WinSCP. This is very convenient when you have many commands, as it's easier to search for them visually in the interface rather than by the command name text.

Therefore, it would be beneficial if WinSCP saved its configuration file not based on the system language but forcibly in UTF-8 with BOM. Or even better, provide an option in the settings for users to choose the encoding according to their preference.

martin

As I have written before, the INI file is mostly ASCII (i.e. 7-bit). ASCII characters are encoded identically in UTF-8 and Windows-1251, so I'm not really sure what the problem is. Can you post an example of an INI file with 1251-encoded contents (and please point us to a specific line)?
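
Both points can be checked with a short script. Here is a rough Python sketch: the first part confirms that every 7-bit character maps to the same byte in UTF-8 and Windows-1251, and the second lists the lines that contain any byte outside the ASCII range. `find_non_ascii_lines` is a hypothetical helper name, and the sample bytes are made up for illustration, not taken from a real WinSCP.ini.

```python
# All 128 ASCII characters as one string.
ascii_chars = "".join(chr(i) for i in range(128))

# 7-bit characters map to identical bytes in both encodings.
same_bytes = ascii_chars.encode("utf-8") == ascii_chars.encode("cp1251")
print(same_bytes)  # True


def find_non_ascii_lines(data: bytes) -> list[tuple[int, bytes]]:
    """Return (line number, line) pairs for lines with bytes >= 0x80."""
    hits = []
    for number, line in enumerate(data.splitlines(), start=1):
        if any(byte >= 0x80 for byte in line):
            hits.append((number, line))
    return hits


# Made-up sample: line 2 holds a raw Windows-1251 value, line 3 is
# percent-encoded and therefore plain ASCII.
sample = (
    b"[Example]\r\n"
    + "Name=Русский\r\n".encode("cp1251")
    + b"Note=%D0%A0%D1%83\r\n"
)
for number, line in find_non_ascii_lines(sample):
    print(number, line)
```

To inspect a real file, read it in binary mode (for example with `pathlib.Path(...).read_bytes()`) and pass the bytes to the helper. Any line it reports is one worth posting here.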

Segoro

In an English-language operating system, your file may indeed be in ASCII (7-bit). However, in a Russian-language system, the WinSCP.ini file will be changed to Windows-1251 encoding as soon as even one setting is modified in the WinSCP interface. This occurs because your program doesn't actively control the WinSCP.ini text file. Since your program doesn't control it, the Windows operating system takes control over the encoding, and the file is saved in the encoding associated with the default language of that specific Windows system.

This is how it looks in systems with UTF-8


And this is how it will look if you move the program to a Russian-language computer (Windows-1251) or vice versa, from a Russian-language to an English-language system, for example when your home and work systems use different languages.

martin

Can you please attach an example INI file from that Russian system?
Are those custom commands? Do you have the same problem with other texts, like site names or bookmarks, or just with the custom commands?

Segoro

There are no problems with text display in the program itself. This only concerns the titles of commands written in Russian or marked with an emoji. You can test the file encoding yourself by saving WinSCP settings on an English-language system, a Russian-language system, or a Chinese one, for example. Then check which encoding WinSCP.ini was saved in, and see how Russian text or Chinese characters turn into gibberish on a system with a different language.
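
For reference, gibberish of this kind is what results when bytes written in one encoding are read back in another. Below is a small Python sketch of that general effect, assuming raw (not percent-encoded) text. It does not show what WinSCP itself writes to the file.

```python
text = "Русский"

# Bytes written as UTF-8 but read back as Windows-1251.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("cp1251", errors="replace")
print(misread)  # garbled, twice as many characters as the original

# Bytes written as Windows-1251 but read back as UTF-8.
cp1251_bytes = text.encode("cp1251")
misread_back = cp1251_bytes.decode("utf-8", errors="replace")
print(misread_back)  # contains U+FFFD replacement characters

# Reading with the same encoding that was used for writing restores the text.
print(utf8_bytes.decode("utf-8") == text)     # True
print(cp1251_bytes.decode("cp1251") == text)  # True
```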
