Unicode File IO by Zed Lopez


To treat a file as Unicode:

The file of reference is called "ref". The output-mode of the file of reference is unicode-mode.

Glk has separate *_uni functions for several file and stream handling calls.

For the non-uni calls, a character is always one byte long, and the byte is the fundamental unit. In text mode, you may only output the values 10, 32 to 126, and 160 to 255: linefeed, space, and the printable Latin-1 characters. (Behavior is undefined, hence implementation dependent, if you try to output an illegal character.) In binary mode, you may output any value from 0 to 255.
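As a rough illustration of the text-mode rule above, here is a minimal Python sketch that checks which byte values would be legal to output. It is not part of the extension or of Glk; the function names are hypothetical and exist only for this example.

```python
def is_legal_latin1_text_value(value: int) -> bool:
    """True if value is allowed in non-uni text mode: 10, 32-126, or 160-255."""
    return value == 10 or 32 <= value <= 126 or 160 <= value <= 255


def illegal_values(data: bytes) -> list[int]:
    """Return the distinct byte values in data that text mode does not allow, sorted."""
    return sorted({b for b in data if not is_legal_latin1_text_value(b)})


if __name__ == "__main__":
    # Tab (9) and carriage return (13) are outside the allowed ranges.
    print(illegal_values(b"a\tb\r\n"))
```

A check like this could be run over text before writing it in latin1-mode, since what an interpreter does with an illegal value is not defined.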

With the uni calls, binary mode uses the UTF-32 encoding form: every character is a 4-byte word. In text mode, version 0.7.5 of the Glk spec calls for UTF-8; in 0.7.4 and earlier, the spec left the behavior implementation dependent. (Any implementation can read the files it wrote itself; problems can arise when reading a file that a different interpreter wrote, or when an external application needs to read the file.)
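To make the size difference concrete, the following Python sketch encodes the same short string both ways. The sample text and the function name are made up for illustration, and big-endian byte order for the 4-byte words is an assumption here; check the Glk spec or your interpreter if the byte order matters to you.

```python
def encode_both(text: str) -> tuple[bytes, bytes]:
    """Return (UTF-8 bytes, UTF-32 bytes) for text.

    Assumption: the 4-byte words are big-endian and carry no byte-order mark.
    """
    return text.encode("utf-8"), text.encode("utf-32-be")


if __name__ == "__main__":
    utf8_bytes, utf32_bytes = encode_both("n\u00e9")  # "n" plus e-acute
    print(len(utf8_bytes), utf8_bytes.hex(" "))
    print(len(utf32_bytes), utf32_bytes.hex(" "))
```

UTF-8 spends one byte on "n" and two on the accented letter, while UTF-32 spends four bytes on every character regardless.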

Glk implementations that use UTF-8 for unicode text include:

- Glkote 2.20+
- WindowsGlk 1.47+
- cheapglk 1.05+
- remglk 0.2.5+
- garglk 2022.1+

Glk implementations that use UTF-32 for unicode text include:

- glkterm
- glktermw
- CocoaGlk
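If an external tool has to read a unicode text file without knowing which kind of implementation wrote it, one possible approach is to try UTF-32 first and fall back to UTF-8. The Python sketch below is only a heuristic under stated assumptions: the function name is hypothetical, the UTF-32 data is assumed to be big-endian with no byte-order mark, and the text is assumed to contain no NUL characters.

```python
def decode_glk_text(data: bytes) -> str:
    """Decode bytes from a Glk unicode text file of unknown encoding.

    Heuristic: UTF-8 text without NUL bytes never decodes as valid
    big-endian UTF-32, because each 4-byte group would start with a
    nonzero byte and so exceed the highest code point (0x10FFFF).
    """
    if len(data) % 4 == 0:
        try:
            return data.decode("utf-32-be")
        except UnicodeDecodeError:
            pass  # not UTF-32; fall through to UTF-8
    return data.decode("utf-8")


if __name__ == "__main__":
    print(decode_glk_text("caf\u00e9".encode("utf-8")))
    print(decode_glk_text("caf\u00e9".encode("utf-32-be")))
```

Trying UTF-8 first would not work as well, since UTF-32 data for plain ASCII text is also valid UTF-8 (it just decodes with NUL characters between the letters).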

The only IDE available that uses UTF-8 for unicode text is the beta release of the Windows IDE.

Some interpreters that use UTF-8 for unicode text (that is, interpreters bundled with Glk libraries that do so):

- Gargoyle 2022.1
- Quixe 2.1.3+
- Lectrote (since the earliest)

If you would prefer to test for the Glk library's unicode capabilities at runtime, you could write:

When play begins: if unicode is supported, now the output-mode of the file of reference is unicode-mode.

But if you want a Latin-1 fallback for when unicode is unavailable, you'd probably be better off with:

The file of ref-uni is called "refuni". The output-mode of the file of ref-uni is unicode-mode. The file of ref-latin is called "reflatin".

The output-file is initially the file of ref-latin.

When play begins: if unicode is supported, now the output-file is the file of ref-uni.

Beyond the ``if unicode is supported`` phrase, this extension adds:

``if <external file> is in/-- text mode``
``if <external file> is in/-- binary mode``

Otherwise, the extension only modifies functions from FileIO.i6t to use Glk unicode library functions for files whose output-mode is unicode-mode.

Chapter Changelog

2/220219 updated documentation

2/220218 changed ascii-mode -> latin1-mode, output_mode -> extf_output_mode; added some documentation