
Encoding Fix Tool

Prepares source files for conversion of encoding from EUC-KR to UTF-8.

Background

Most files in the source were originally written using the EUC-KR encoding. This would be harmless if the Korean (non-ASCII) characters appeared only in comments.

However, the original developers also used EUC-KR text in string literals. These strings are sent to the client or localized directly on the server, and they act as lookup keys.

If we simply converted the whole file from EUC-KR to UTF-8, the bytes of these keys would change and the lookups would break: not all references are server-side, and we want to stay compatible with existing systems (client, quests, etc.).
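To make the problem concrete, here is a small sketch comparing the bytes of one character in both encodings. The EUC-KR bytes C0 CC are the first two escaped bytes from the example further down in this README; the helper name `toHex` is only illustrative.

```typescript
// EUC-KR bytes of the first character in the README example ("이").
const eucKrKey = Uint8Array.from([0xc0, 0xcc]);
// The same character encoded as UTF-8.
const utf8Key = new TextEncoder().encode("이");

// Illustrative helper: formats bytes as upper-case hex pairs.
const toHex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(toHex(eucKrKey)); // C0 CC
console.log(toHex(utf8Key)); // EC 9D B4

// A byte-wise comparison, as a lookup keyed on the raw string would do, fails.
const sameKey =
  eucKrKey.length === utf8Key.length && eucKrKey.every((b, i) => b === utf8Key[i]);
console.log(sameKey); // false
```

Any system that still holds the key as EUC-KR bytes would no longer find a match after a plain re-encode.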

Therefore, inside string literals we replace every byte that is not valid UTF-8 with its hexadecimal escape sequence (for example, \xC0\xCC), so the compiled string keeps its original EUC-KR bytes.
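The idea can be sketched roughly as follows. This is not the code in index.ts, only a minimal illustration under simplifying assumptions: it works on raw bytes, recognizes `//` and `/* */` comments plus quoted literals, and escapes bytes >= 0x80 only inside double-quoted strings. The function name `escapeNonAsciiInStringLiterals` is hypothetical.

```typescript
// Hypothetical sketch, not the tool's actual implementation.
function escapeNonAsciiInStringLiterals(src: Uint8Array): Uint8Array {
  let state: "code" | "literal" | "lineComment" | "blockComment" = "code";
  let quote = 0; // byte that opened the current literal: '"' (0x22) or '\'' (0x27)
  const out: number[] = [];

  for (let i = 0; i < src.length; i++) {
    const b = src[i]!;
    const next = i + 1 < src.length ? src[i + 1]! : -1;

    if (state === "code") {
      if (b === 0x2f && (next === 0x2f || next === 0x2a)) {
        // "//" or "/*": copy both bytes and enter the matching comment state.
        state = next === 0x2f ? "lineComment" : "blockComment";
        out.push(b, next);
        i++;
      } else {
        if (b === 0x22 || b === 0x27) { state = "literal"; quote = b; }
        out.push(b);
      }
    } else if (state === "lineComment") {
      if (b === 0x0a) state = "code"; // newline ends the comment
      out.push(b);
    } else if (state === "blockComment") {
      if (b === 0x2a && next === 0x2f) { state = "code"; out.push(b, next); i++; }
      else out.push(b);
    } else if (b === 0x5c && next !== -1) {
      // Inside a literal: keep existing backslash escapes such as \" intact.
      out.push(b, next);
      i++;
    } else if (b >= 0x80 && quote === 0x22) {
      // Non-ASCII byte in a string literal: emit \xHH instead of the raw byte.
      const hex = b.toString(16).toUpperCase().padStart(2, "0");
      for (const ch of "\\x" + hex) out.push(ch.charCodeAt(0));
    } else {
      if (b === quote) state = "code"; // closing quote
      out.push(b);
    }
  }
  return Uint8Array.from(out);
}

// Usage: the literal becomes "\xC0\xCC", the comment bytes stay untouched.
const bytes = (s: string) => Array.from(s, (c) => c.charCodeAt(0));
const sample = Uint8Array.from([...bytes('LC_TEXT("'), 0xc0, 0xcc, ...bytes('"); // '), 0xc0, 0xcc]);
const escaped = escapeNonAsciiInStringLiterals(sample);
console.log(Array.from(escaped, (b) => String.fromCharCode(b)).join(""));
```

One caveat the sketch does not handle: in C and C++ a \x escape consumes all hexadecimal digits that follow it, so an escaped byte directly followed by a character such as A or 3 would need extra care (for example, splitting the literal).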

We leave comments untouched so that they can be converted in bulk with iconv:

find . -name '*.cpp' -exec iconv -f EUC-KR -t UTF-8//TRANSLIT -o {}_u {} \; -exec mv {}_u {} \;

Repeat for the desired file extensions.

Example result

Original File Content (read as UTF-8)

// this string literal should be converted
chA->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("檜剪 雖旎擎 寰噹等"));
// this line should stay untouched
DWORD dwOppList[8]; // 檜剪 雖旎擎 寰噹等

Original File Content (read as EUC-KR and converted to UTF-8)

// this string literal should be converted
chA->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("이거 지금은 안쓴데"));
// this line should stay untouched
DWORD dwOppList[8]; // 이거 지금은 안쓴데

After running this script (read as UTF-8)

// this string literal should be converted
chA->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("\xC0\xCC\xB0\xC5 \xC1\xF6\xB1\xDD\xC0\xBA \xBE\xC8\xBE\xB4\xB5\xA5"));
// this line should stay untouched
DWORD dwOppList[8]; // 檜剪 雖旎擎 寰噹等

After running iconv on the script output (read as UTF-8)

// this string literal should be converted
chA->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("\xC0\xCC\xB0\xC5 \xC1\xF6\xB1\xDD\xC0\xBA \xBE\xC8\xBE\xB4\xB5\xA5"));
// this line should stay untouched
DWORD dwOppList[8]; // 이거 지금은 안쓴데
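
One possible way to check where a file is in this pipeline is to test whether its bytes decode as strict UTF-8. The sketch below is not part of the tool; `isValidUtf8` is a hypothetical helper built on the standard `TextDecoder` with `fatal: true`, which throws on invalid input.

```typescript
// Hypothetical helper: returns true when the bytes decode as strict UTF-8.
function isValidUtf8(data: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(data);
    return true;
  } catch {
    return false;
  }
}

const ascii = (s: string) => Array.from(s, (c) => c.charCodeAt(0));
// Comment still in EUC-KR (script output, before iconv): C0 CC is not valid UTF-8.
const beforeIconv = Uint8Array.from([...ascii("// "), 0xc0, 0xcc]);
// The same comment after iconv: EC 9D B4 is the UTF-8 encoding of that character.
const afterIconv = Uint8Array.from([...ascii("// "), 0xec, 0x9d, 0xb4]);

console.log(isValidUtf8(beforeIconv)); // false
console.log(isValidUtf8(afterIconv)); // true
```

A file that has gone through both steps should pass this check, while the escaped string literals remain plain ASCII throughout.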

Usage

To install dependencies:

bun install

To run:

bun run index.ts

This project was created using bun init in bun v1.1.1. Bun is a fast all-in-one JavaScript runtime.