# Encoding Fix Tool

Prepares source files for converting their encoding from EUC-KR to UTF-8.

## Background

Most files in the source were originally written in the EUC-KR encoding. That would be harmless if EUC-KR-encoded characters appeared only in comments.

However, the original developers also used EUC-KR in string literals. These literals are sent to the client or localized directly on the server, and they act as lookup keys.

If we simply converted each whole file from EUC-KR to UTF-8, these lookups would break, because not all references are server-side and we want to stay compatible with existing systems (client, quests, etc.).

Therefore, inside string literals we replace the EUC-KR bytes, which are not valid UTF-8, with their escaped byte representation (for example `\xC0\xCC`), so each literal still yields the same bytes after the file is converted.

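The byte-to-escape step can be illustrated with a minimal sketch. The helper name `escapeHighBytes` is hypothetical and this is not the tool's actual code. It assumes the contents of a literal are already available as raw bytes, and it treats every byte at or above `0x80` as part of an EUC-KR sequence.

```typescript
// Hypothetical helper (an assumption for illustration, not the tool's code):
// write every non-ASCII byte as a C/C++ hex escape and copy ASCII unchanged.
function escapeHighBytes(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b >= 0x80) {
      // Bytes >= 0x80 always produce two hex digits, e.g. 0xC0 -> \xC0.
      out += "\\x" + b.toString(16).toUpperCase();
    } else {
      // ASCII bytes are already valid UTF-8.
      out += String.fromCharCode(b);
    }
  }
  return out;
}

// Raw EUC-KR bytes of the literal shown in the "Example result" section.
const literalBytes = new Uint8Array([
  0xc0, 0xcc, 0xb0, 0xc5, 0x20,
  0xc1, 0xf6, 0xb1, 0xdd, 0xc0, 0xba, 0x20,
  0xbe, 0xc8, 0xbe, 0xb4, 0xb5, 0xa5,
]);

const escaped = escapeHighBytes(literalBytes);
// prints \xC0\xCC\xB0\xC5 \xC1\xF6\xB1\xDD\xC0\xBA \xBE\xC8\xBE\xB4\xB5\xA5
console.log(escaped);
```

One caveat the sketch ignores: in C and C++ a `\x` escape consumes every hex digit that follows it, so an escaped byte directly followed by a character such as `a` or `5` would need extra handling, for example by splitting the literal into two adjacent literals.
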
We leave comments untouched so that they can be converted in bulk with `iconv`:

```bash
find . -name '*.cpp' -exec iconv -f EUC-KR -t UTF-8//TRANSLIT -o {}_u {} \; -exec mv {}_u {} \;
```

Repeat the command for each file extension you want to convert.

## Example result

Original File Content (read as UTF-8)

```cpp
// this string literal should be converted
chA->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("�̰� ������ �Ⱦ���"));
// this line should stay untouched
DWORD dwOppList[8]; // �̰� ������ �Ⱦ���
```

Original File Content (read as EUC-KR and converted to UTF-8)

```cpp
// this string literal should be converted
chA->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("이거 지금은 안쓴데"));
// this line should stay untouched
DWORD dwOppList[8]; // 이거 지금은 안쓴데
```

After running this script (read as UTF-8)

```cpp
// this string literal should be converted
chA->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("\xC0\xCC\xB0\xC5 \xC1\xF6\xB1\xDD\xC0\xBA \xBE\xC8\xBE\xB4\xB5\xA5"));
// this line should stay untouched
DWORD dwOppList[8]; // �̰� ������ �Ⱦ���
```

After running `iconv` on the script output (read as UTF-8)

```cpp
// this string literal should be converted
chA->ChatPacket(CHAT_TYPE_INFO, LC_TEXT("\xC0\xCC\xB0\xC5 \xC1\xF6\xB1\xDD\xC0\xBA \xBE\xC8\xBE\xB4\xB5\xA5"));
// this line should stay untouched
DWORD dwOppList[8]; // 이거 지금은 안쓴데
```

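The difference between the literal and the comment in these listings could come from a scanner along the following lines. This is a simplified sketch, not the tool's actual implementation. The names `ascii` and `escapeLiteralsInLine` are hypothetical, and it assumes a line is processed as raw bytes, only `//` comments and double-quoted literals occur, and character literals, block comments and raw strings can be ignored.

```typescript
// Turn an ASCII string into its byte values.
function ascii(text: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) bytes.push(text.charCodeAt(i));
  return bytes;
}

// Hypothetical sketch: escape non-ASCII bytes only inside double-quoted
// string literals; code and // comments are copied byte for byte.
function escapeLiteralsInLine(line: number[]): number[] {
  const QUOTE = 0x22;
  const BACKSLASH = 0x5c;
  const SLASH = 0x2f;
  const out: number[] = [];
  let inString = false;
  for (let i = 0; i < line.length; i++) {
    const b = line[i];
    if (!inString && b === SLASH && line[i + 1] === SLASH) {
      // A // comment starts here: keep the rest of the line untouched.
      return out.concat(line.slice(i));
    }
    if (inString && b === BACKSLASH && i + 1 < line.length) {
      // Existing escape such as \": copy both bytes so the quote is not
      // mistaken for the end of the literal.
      out.push(b, line[i + 1]);
      i++;
      continue;
    }
    if (b === QUOTE) {
      inString = !inString;
      out.push(b);
    } else if (inString && b >= 0x80) {
      // Non-ASCII byte inside a literal: write it as \xNN.
      const hex = ascii("\\x" + b.toString(16).toUpperCase());
      for (let k = 0; k < hex.length; k++) out.push(hex[k]);
    } else {
      out.push(b);
    }
  }
  return out;
}

// First word of the example, 이거, as EUC-KR bytes.
const korean = [0xc0, 0xcc, 0xb0, 0xc5];
const input = ascii('LC_TEXT("').concat(korean, ascii('"); // '), korean);
const output = escapeLiteralsInLine(input);
```

Here `output` holds the ASCII text `LC_TEXT("\xC0\xCC\xB0\xC5"); // ` followed by the four original bytes, which a later `iconv` pass can convert as part of the comment.
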
## Usage

To install dependencies:

```bash
bun install
```

To run:

```bash
bun run index.ts
```

This project was created using `bun init` in bun v1.1.1. [Bun](https://bun.sh) is a fast all-in-one JavaScript runtime.