Binary patching Adobe After Effects to work with UTF‑8 as the default Windows code page

or why you shouldn't assume the default Windows code page.

Since Windows 10 build 17134 (aka the April 2018 Update), you can change the default Windows ANSI code page to UTF‑8. This means the ANSI APIs (e.g. CreateWindowExA) can handle UTF‑8 strings.

Figure: a window with a garbled title, and a window with its title displayed correctly in Japanese.

However, not everything is rosy: in rare cases, certain applications break. I first ran into this with Adobe After Effects (hereafter AE). While scanning font families during startup, AE displays the following error and exits:

After Effects error: could not convert Unicode characters. ( 23 :: 46 )

Changing the default code page back from UTF‑8 results in AE starting up normally. Let's debug this.

We open AfterFX.exe in x64dbg and let it run until the error dialog appears, manually resuming on exceptions. Then we run "Search for → All Modules → String references" in the CPU tab, which takes a few minutes. Afterwards we filter for "could not convert Unicode characters", set breakpoints on all occurrences and restart AE.

We hit a breakpoint before the error dialog appears, in what seems to be an error-handling function: it uses the string $$$/AE/U/error_bad_unicode_conversion=could not convert Unicode characters. and calls U_ReportError. Going up the stack lands us in its caller, U_UTF16ToMBString, which calls WideCharToMultiByte, checks its return value and, on failure, calls the function we broke in.

Function where we hit one of the breakpoints.
Snippet of U_UTF16ToMBString with WideCharToMultiByte arguments tagged.

Inspection reveals that the arguments to WideCharToMultiByte are the same regardless of the system code page. lpDefaultChar and lpUsedDefaultChar are 0 (NULL). cbMultiByte is 16, cchWideChar is 8 (the full length of the font name ＭＳ Ｐゴシック), lpMultiByteStr points to a valid buffer, and lpWideCharStr starts with the fullwidth characters "ＭＳ Ｐ" (2D FF 33 FF 20 00 30 FF), the beginning of ＭＳ Ｐゴシック. CodePage is 0, aka CP_ACP, which selects the system default Windows ANSI code page. Yet, with UTF‑8 as the default system code page, WideCharToMultiByte returns 0 instead of 8 and GetLastError returns ERROR_INSUFFICIENT_BUFFER.
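To double-check how the memory view should be read, here is a small portable sketch (not taken from AE or x64dbg; decode_utf16le is a hypothetical helper) that reassembles the dumped bytes into UTF-16 code units, low byte first as on x86/x64:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reassembles UTF-16LE code units from a raw memory dump: on x86/x64 the low
// byte of each 16-bit unit is stored first.
std::vector<std::uint16_t> decode_utf16le(const std::vector<std::uint8_t>& bytes)
{
    assert(bytes.size() % 2 == 0 && "UTF-16 data must have an even byte count");
    std::vector<std::uint16_t> units;
    units.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i < bytes.size(); i += 2)
        units.push_back(static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8)));
    return units;
}
```

Fed the eight bytes above, it yields 0xFF2D, 0xFF33, 0x0020 and 0xFF30: fullwidth Ｍ, fullwidth Ｓ, an ordinary space and fullwidth Ｐ.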

The issue can be replicated using the following C++ code:

#include <cassert>
#include <Windows.h>

int main()
{
	// "ＭＳ Ｐゴシック": fullwidth M, fullwidth S, an ASCII space, fullwidth P and
	// four katakana, i.e. the 8 UTF-16 code units AE passes (cchWideChar = 8).
	const wchar_t input[] = L"\uFF2D\uFF33 \uFF30\u30B4\u30B7\u30C3\u30AF";
	char output[0x10]{}; // cbMultiByte = 16, as in AE

	// With Windows-1252 as the ANSI code page this writes "MS P????" and returns 8.
	// With UTF-8 as the ANSI code page the result needs 22 bytes, so the call fails;
	// run this with UTF-8 as the system ANSI code page and both asserts hold.
	int written = WideCharToMultiByte(CP_ACP, 0, input, 8, output, static_cast<int>(sizeof(output)), nullptr, nullptr);
	assert(written == 0);
	assert(GetLastError() == ERROR_INSUFFICIENT_BUFFER);
}

The input is converted to UTF‑8 because that is now our ANSI code page, but the 8 UTF-16 characters take 22 bytes in UTF‑8 (3 bytes for each fullwidth letter and katakana, 1 for the space), which does not fit in the 16-byte buffer. WideCharToMultiByte therefore fails and AE bails out. Funnily enough, the documentation for WideCharToMultiByte advises against using the ANSI code page, because it may differ from computer to computer, causing data corruption or unexpected results, exactly as happens here.
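The size mismatch can be checked with a portable sketch that counts how many bytes a UTF-16 string occupies once encoded as UTF‑8. utf8_length is a hypothetical helper, and it assumes well-formed UTF-16 input:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes needed to encode a UTF-16 string as UTF-8, which is what a
// CP_ACP conversion has to produce once the ANSI code page is UTF-8.
// Assumes well-formed input (every high surrogate followed by a low one).
std::size_t utf8_length(const std::u16string& s)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;                       // ASCII
        } else if (c < 0x800) {
            bytes += 2;                       // e.g. Latin-1 supplement, Greek, Cyrillic
        } else if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: together with the next unit it forms one
            // supplementary-plane code point, encoded in 4 bytes.
            assert(i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF);
            bytes += 4;
            ++i;
        } else {
            bytes += 3;                       // rest of the BMP, incl. fullwidth forms and kana
        }
    }
    return bytes;
}
```

For ＭＳ Ｐゴシック it returns 22, well above the 16 bytes AE provides.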

The fix is to pass a specific code page instead, for example Windows-1252, which on my machine is the ANSI code page when UTF‑8 is not the default.
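Why a fixed single-byte code page avoids the overflow can be illustrated with a deliberately simplified model of the conversion. to_cp1252_sketch is hypothetical and is not Windows' real best-fit table; it assumes fullwidth ASCII forms fold to plain ASCII, matching the "MS P????" result from the repro:

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical model of a UTF-16 -> Windows-1252 conversion with
// default flags. It covers only ASCII, the range U+00A0..U+00FF (identical in
// Windows-1252), and an assumed best-fit fold of fullwidth ASCII forms
// U+FF01..U+FF5E to plain ASCII; everything else becomes the default char '?'.
// The 1252-specific characters at 0x80..0x9F (e.g. U+20AC) are left out.
std::string to_cp1252_sketch(const std::u16string& s)
{
    std::string out;
    out.reserve(s.size());
    for (char16_t c : s) {
        if (c < 0x80 || (c >= 0xA0 && c <= 0xFF))
            out.push_back(static_cast<char>(c));
        else if (c >= 0xFF01 && c <= 0xFF5E)
            out.push_back(static_cast<char>(c - 0xFEE0)); // fullwidth -> ASCII
        else
            out.push_back('?');
    }
    // In this single-byte model each UTF-16 unit yields exactly one byte.
    assert(out.size() == s.size());
    return out;
}
```

In this model the eight characters of ＭＳ Ｐゴシック come out as "MS P????", eight bytes that fit comfortably in the 16-byte buffer.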

Helpfully, there is some int3 padding preceding U_UTF16ToMBString, which lets us write a few extra instructions there that set ecx (the CodePage argument) to 1252, the aforementioned Windows-1252 code page, and patch in a call to that code just before the call to WideCharToMultiByte. After assembling the required instructions in x64dbg and verifying that they work, we can apply the same changes to the module in question (U.dll) with a hex editor to make them permanent.
This fixes the bug and lets AE start on Windows both with and without UTF‑8 as the default code page.
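A sketch of how such a patch could be encoded, assuming a detour of mov ecx, 1252 followed by ret, reached through a 5-byte call rel32 at the patch site. The helper names and addresses are hypothetical, and a real patch must also preserve (or re-execute in the detour) whatever instructions the call overwrites:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical detour body: mov ecx, imm32 (B9 id) followed by ret (C3).
// Writing ecx zero-extends into rcx, which holds the first argument
// (CodePage) in the Windows x64 calling convention.
std::vector<std::uint8_t> encode_codepage_detour(std::uint32_t code_page)
{
    return { 0xB9,
             static_cast<std::uint8_t>(code_page),
             static_cast<std::uint8_t>(code_page >> 8),
             static_cast<std::uint8_t>(code_page >> 16),
             static_cast<std::uint8_t>(code_page >> 24),
             0xC3 };
}

// call rel32 (E8 cd): the displacement is relative to the address of the
// next instruction, i.e. the call site + 5.
std::vector<std::uint8_t> encode_call_rel32(std::uint64_t call_site, std::uint64_t target)
{
    const auto rel = static_cast<std::int64_t>(target - (call_site + 5));
    assert(rel >= std::numeric_limits<std::int32_t>::min() &&
           rel <= std::numeric_limits<std::int32_t>::max() &&
           "target must be within +/-2 GiB of the call site");
    const auto r = static_cast<std::uint32_t>(static_cast<std::int32_t>(rel));
    return { 0xE8,
             static_cast<std::uint8_t>(r), static_cast<std::uint8_t>(r >> 8),
             static_cast<std::uint8_t>(r >> 16), static_cast<std::uint8_t>(r >> 24) };
}
```

For example, encode_codepage_detour(1252) gives B9 E4 04 00 00 C3, and a call at 0x1000 to a detour at 0x0FF0 (in the padding before the function) encodes as E8 EB FF FF FF, since the displacement is 0x0FF0 - 0x1005 = -0x15.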

Picture of codepage_detour.
Patched call to codepage_detour before WideCharToMultiByte.