Unicode File IO by Zed Lopez
To treat a file as unicode:
The file of reference is called "ref". The output-mode of the file of reference is unicode-mode.
Glk has separate *_uni functions for several file and stream handling calls.
For the non-uni calls, a character is always exactly one byte, and the byte is the fundamental unit. In text mode, you may output only the values 10, 32 to 126, and 160 to 255: linefeed, space, and the printable Latin-1 characters. (If you try to output an illegal character, the behavior is undefined, hence implementation-dependent.) In binary mode, you may output any value from 0 to 255.
With the uni calls, binary mode uses the UTF-32 encoding form: every character is a 4-byte word. In text mode, version 0.7.5 of the Glk spec calls for UTF-8; versions 0.7.4 and earlier left the encoding implementation-dependent. (Any implementation can read the files it wrote itself; problems can arise when reading a file written by a different interpreter, or when an external application needs to read the file.)
Glk implementations that use UTF-8 for unicode text include:
- GlkOte 2.20+
- WindowsGlk 1.47+
- cheapglk 1.05+
- remglk 0.2.5+
- garglk 2022.1+
Glk implementations that use UTF-32 for unicode text include:
- glkterm
- glktermw
- CocoaGlk
The only IDE available that uses UTF-8 for unicode text is the beta release of the Windows IDE.
Some interpreters that use UTF-8 for unicode text (that is, interpreters bundled with Glk libraries that do so):
- Gargoyle 2022.1
- Quixe 2.1.3+
- Lectrote (all versions)
If you would rather test the Glk library's unicode capabilities at runtime, you could write:
When play begins: if unicode is supported, now the output-mode of the file of reference is unicode-mode.
But if you want a Latin-1 fallback when unicode is unavailable, you'd probably be better off with:
The file of ref-uni is called "refuni". The output-mode of the file of ref-uni is unicode-mode. The file of ref-latin is called "reflatin".
The output-file is an external file that varies. The output-file is initially the file of ref-latin.
When play begins: if unicode is supported, now the output-file is the file of ref-uni.
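After that, writing goes through the output-file variable, whichever file it ends up holding. A minimal sketch using Inform's standard write phrase (the rule and the text written are only illustrations):

```
When play ends: write "Café au lait[line break]" to the output-file.
```

The "é" is character 233, which is legal either way: the Latin-1 file should receive it as a single byte, while the unicode file passes it through the Glk unicode functions (two bytes with a UTF-8 Glk library, four with a UTF-32 one).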
Beyond the ``if unicode is supported`` phrase, this extension adds:
- ``if <external file> is in/-- text mode``
- ``if <external file> is in/-- binary mode``
Otherwise, the extension only modifies functions from FileIO.i6t so that files whose output-mode is unicode-mode are handled with the Glk unicode library functions.
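A minimal sketch of the mode phrases in use (the file of notes and the messages are hypothetical, for illustration only):

```
The file of notes is called "notes". The output-mode of the file of notes is unicode-mode.
When play begins: if the file of notes is in binary mode, say "Notes are in binary mode."; otherwise say "Notes are in text mode."
```

Combined with the encoding rules described earlier, a unicode-mode file in binary mode is written as UTF-32, while one in text mode is UTF-8 or UTF-32 depending on the Glk library.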
Chapter: Changelog
2/220219 updated documentation
2/220218 changed ascii-mode -> latin1-mode and output_mode -> extf_output_mode; added some documentation