Difference between revisions of "Manual:Unicode"
(Adding Unicode section for scripting) |
(rename MUD to game everywhere) |
||
(9 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
+ | {{TOC right}} | ||
+ | {{#description2:Manual on how to handle unicode text and symbols.}} | ||
=Unicode= | =Unicode= | ||
+ | |||
+ | Reference our manual page on [[Manual:Supported_Protocols#Encoding | Encoding]] for more information on CHARSET negotiation in Mudlet. | ||
+ | |||
+ | == Changing encoding == | ||
+ | |||
+ | Mudlet is being developed to support displaying text in many languages, but how the characters of a language are conveyed varies between games, as do the languages each game supports. Plain ''vanilla'' Telnet only supports 7-bit ASCII (128 code points, 95 of them printable) by default, but other languages can be supported if both sides agree on how their characters are converted into 8-bit bytes, which is what an [https://www.w3.org/International/questions/qa-what-is-encoding encoding] defines. Setting Mudlet (and the game server, if appropriate) to the correct encoding allows the correct display of characters like the Spanish ñ, the Russian я, and all other letters (or, more properly, [https://en.wikipedia.org/wiki/Grapheme graphemes]). | ||
+ | |||
+ | Go to Preferences > General to set the encoding: | ||
+ | |||
+ | [[File:Server_encoding.png|frame|none|Prefer UTF-8 if your game supports it.]] | ||
+ | |||
+ | The list of encodings supported by Mudlet is: | ||
+ | |||
+ | {| class="wikitable mw-collapsible mw-collapsed" | ||
+ | |- | ||
+ | ! scope="col"| Encoding | ||
+ | ! scope="col"| Mudlet version | ||
+ | |- | ||
+ | | ASCII | ||
+ | | 0.0.1 | ||
+ | |- | ||
+ | |- | ||
+ | | UTF-8 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-1 | ||
+ | | 0.0.1 | ||
+ | |- | ||
+ | |- | ||
+ | | CP850 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | CP866 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | CP874 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-10 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-11 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-13 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-14 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-15 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-16 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-2 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-3 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-4 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-5 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-6 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-7 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-8 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | ISO 8859-9 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | KOI8-R | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | KOI8-U | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | MACINTOSH | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | WINDOWS-1250 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | WINDOWS-1251 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | WINDOWS-1252 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | WINDOWS-1253 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | WINDOWS-1254 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | WINDOWS-1255 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | WINDOWS-1256 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | WINDOWS-1257 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |- | ||
+ | | WINDOWS-1258 | ||
+ | | 3.2.0 | ||
+ | |- | ||
+ | |||
+ | |} | ||
+ | |||
+ | == Scripting with Unicode == | ||
+ | |||
+ | Mudlet uses English throughout its Lua API so that scripts are international: a script written on a computer with a German default locale will work on a computer with an English default, for example. This means you can expect all API functions and error messages to be in English, and the decimal separator is always a period <code>.</code> Mudlet sets <code>os.setlocale("C")</code> by default, [https://www.lua.org/pil/22.2.html see background]. | ||
+ | |||
+ | Not all Lua functions beginning with <code>string.</code> will work with Unicode - Mudlet has <code>utf8.</code> equivalents for those. See [[Manual:String_Functions|String functions in Mudlet]] for a complete list. For example: | ||
+ | |||
+ | <syntaxhighlight lang="lua"> | ||
+ | print(string.len("слово")) -- prints 10: counts bytes, wrong for text | ||
+ | print(utf8.len("слово")) -- prints 5: counts characters, correct | ||
+ | </syntaxhighlight> | ||
+ | |||
+ | == Triggering with Unicode == | ||
+ | Mudlet's trigger engine fully supports Unicode in all types of patterns, including regex. | ||
+ | |||
+ | [[File:Mudlet unicode trigger highlight.png|thumb|right|Example of <code>(\w+)</code> matching a Cyrillic word]] | ||
+ | |||
+ | == Loading external Lua files == | ||
+ | |||
+ | Mudlet uses Unicode (UTF-8) for the trigger engine and Lua subsystem. If you are [https://www.lua.org/pil/8.html loading a file externally] with Lua, make sure it is saved in UTF-8 encoding. |
Latest revision as of 11:13, 5 January 2021
Unicode
Reference our manual page on Encoding for more information on CHARSET negotiation in Mudlet.
Changing encoding
Mudlet is being developed to support displaying text in many languages, but how the characters of a language are conveyed varies between games, as do the languages each game supports. Plain vanilla Telnet only supports 7-bit ASCII (128 code points, 95 of them printable) by default, but other languages can be supported if both sides agree on how their characters are converted into 8-bit bytes, which is what an encoding defines. Setting Mudlet (and the game server, if appropriate) to the correct encoding allows the correct display of characters like the Spanish ñ, the Russian я, and all other letters (or, more properly, graphemes).
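To make the idea of an encoding concrete, the sketch below writes the two letters mentioned above as raw bytes in a few of the encodings Mudlet lists, and dumps them as hexadecimal. The helper name hex_bytes is made up for this illustration and is not part of Mudlet's API; the byte values are the standard ones for each encoding.

```lua
-- Dump the bytes of a string as space-separated hexadecimal values.
-- (hex_bytes is a hypothetical helper, not a Mudlet function.)
local function hex_bytes(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = string.format("%02X", string.byte(s, i))
  end
  return table.concat(out, " ")
end

-- The Spanish ñ as raw bytes in two encodings.
local n_tilde_latin1 = "\241"      -- ISO 8859-1: one byte
local n_tilde_utf8   = "\195\177"  -- UTF-8: two bytes

-- The Russian я as raw bytes in three encodings.
local ya_koi8r   = "\209"          -- KOI8-R: one byte
local ya_win1251 = "\255"          -- WINDOWS-1251: one byte
local ya_utf8    = "\209\143"      -- UTF-8: two bytes

print(hex_bytes(n_tilde_latin1))   -- F1
print(hex_bytes(n_tilde_utf8))     -- C3 B1
print(hex_bytes(ya_koi8r))         -- D1
print(hex_bytes(ya_win1251))       -- FF
print(hex_bytes(ya_utf8))          -- D1 8F
```

Note that the single byte D1 means я in KOI8-R but Ñ in ISO 8859-1, which is why text looks garbled when the client and the game disagree on the encoding.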
Go to Preferences > General to set the encoding:
The list of encodings supported by Mudlet is:
Encoding | Mudlet version |
---|---|
ASCII | 0.0.1 |
UTF-8 | 3.2.0 |
ISO 8859-1 | 0.0.1 |
CP850 | 3.2.0 |
CP866 | 3.2.0 |
CP874 | 3.2.0 |
ISO 8859-10 | 3.2.0 |
ISO 8859-11 | 3.2.0 |
ISO 8859-13 | 3.2.0 |
ISO 8859-14 | 3.2.0 |
ISO 8859-15 | 3.2.0 |
ISO 8859-16 | 3.2.0 |
ISO 8859-2 | 3.2.0 |
ISO 8859-3 | 3.2.0 |
ISO 8859-4 | 3.2.0 |
ISO 8859-5 | 3.2.0 |
ISO 8859-6 | 3.2.0 |
ISO 8859-7 | 3.2.0 |
ISO 8859-8 | 3.2.0 |
ISO 8859-9 | 3.2.0 |
KOI8-R | 3.2.0 |
KOI8-U | 3.2.0 |
MACINTOSH | 3.2.0 |
WINDOWS-1250 | 3.2.0 |
WINDOWS-1251 | 3.2.0 |
WINDOWS-1252 | 3.2.0 |
WINDOWS-1253 | 3.2.0 |
WINDOWS-1254 | 3.2.0 |
WINDOWS-1255 | 3.2.0 |
WINDOWS-1256 | 3.2.0 |
WINDOWS-1257 | 3.2.0 |
WINDOWS-1258 | 3.2.0 |
Scripting with Unicode
Mudlet uses English throughout its Lua API so that scripts are international: a script written on a computer with a German default locale will work on a computer with an English default, for example. This means you can expect all API functions and error messages to be in English, and the decimal separator is always a period (.). Mudlet sets os.setlocale("C") by default, see background.
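As a small illustration of what the "C" locale means for numbers, the sketch below uses only standard Lua functions. The call to os.setlocale is repeated here only so the snippet behaves the same in a standalone Lua interpreter; inside Mudlet it should already be in effect.

```lua
-- Mudlet sets this at startup; repeated so the sketch is self-contained.
os.setlocale("C")

-- Numbers are always written with a period as the decimal separator.
local as_text = tostring(1.5)

-- A period is accepted when converting text back into a number.
local parsed_period = tonumber("1.5")

-- A comma, as used in German number formatting, is not accepted.
local parsed_comma = tonumber("1,5")

print(as_text)        -- 1.5
print(parsed_period)  -- 1.5
print(parsed_comma)   -- nil
```

A script that receives numbers formatted with a comma from a game would therefore need to replace the comma itself before calling tonumber.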
Not all Lua functions beginning with string. will work with Unicode; Mudlet has utf8. equivalents for those. See String functions in Mudlet for a complete list. For example:
print(string.len("слово")) -- prints 10: counts bytes, wrong for text
print(utf8.len("слово")) -- prints 5: counts characters, correct
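The byte count of 10 comes from UTF-8 storing each Cyrillic letter as two bytes. The following is a minimal sketch in plain Lua of how a character count can be derived from the bytes; count_utf8_chars is a hypothetical helper written for this explanation, not Mudlet's implementation, and it assumes the input is valid UTF-8. In real scripts the utf8. functions remain the better choice.

```lua
-- Count code points in a UTF-8 string by skipping continuation bytes
-- (0x80-0xBF), which never start a character.
-- Assumes valid UTF-8 input; hypothetical helper for illustration only.
local function count_utf8_chars(s)
  local count = 0
  for i = 1, #s do
    local b = string.byte(s, i)
    if b < 0x80 or b > 0xBF then
      count = count + 1
    end
  end
  return count
end

-- "слово" written as explicit UTF-8 bytes, so the result does not
-- depend on how this file itself is saved.
local word = "\209\129\208\187\208\190\208\178\208\190"

print(#word)                   -- 10 (bytes)
print(count_utf8_chars(word))  -- 5 (characters)
```

This counts code points rather than graphemes, so a letter followed by a combining accent would still count as two.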
Triggering with Unicode
Mudlet's trigger engine fully supports Unicode in all types of patterns, including regex.
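By contrast, Lua's own string patterns, which a script might apply to text after a trigger has fired, work on bytes and follow the "C" locale. The sketch below, using only standard Lua, shows the letter class %a failing on a Cyrillic word; it says nothing about how the trigger engine is implemented and is only meant to show why matching is usually best left to the trigger pattern.

```lua
-- Same locale that Mudlet uses; repeated so the sketch is self-contained.
os.setlocale("C")

-- "слово" written as explicit UTF-8 bytes.
local cyrillic = "\209\129\208\187\208\190\208\178\208\190"

-- %a matches ASCII letters only in the "C" locale.
local ascii_word = string.match("hello world", "%a+")
local cyrillic_word = string.match(cyrillic, "%a+")

print(ascii_word)     -- hello
print(cyrillic_word)  -- nil
```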
Loading external Lua files
Mudlet uses Unicode (UTF-8) for the trigger engine and Lua subsystem. If you are loading a file externally with Lua, make sure it is saved in UTF-8 encoding.
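One way to catch a wrongly saved file early is to check its bytes before loading it. The sketch below is a rough structural check written in plain Lua; is_valid_utf8 and file_is_utf8 are hypothetical helpers, not Mudlet functions, and the check does not reject every malformed sequence (overlong three-byte forms and surrogates pass, for example).

```lua
-- Return true if s looks like well-formed UTF-8: every lead byte is
-- followed by the right number of continuation bytes (0x80-0xBF).
local function is_valid_utf8(s)
  local i, n = 1, #s
  while i <= n do
    local b = string.byte(s, i)
    local extra
    if b < 0x80 then
      extra = 0                          -- ASCII
    elseif b >= 0xC2 and b <= 0xDF then
      extra = 1                          -- two-byte sequence
    elseif b >= 0xE0 and b <= 0xEF then
      extra = 2                          -- three-byte sequence
    elseif b >= 0xF0 and b <= 0xF4 then
      extra = 3                          -- four-byte sequence
    else
      return false                       -- stray continuation or invalid lead
    end
    for j = i + 1, i + extra do
      local c = string.byte(s, j)
      if not c or c < 0x80 or c > 0xBF then
        return false
      end
    end
    i = i + extra + 1
  end
  return true
end

-- Read a whole file in binary mode and run the check on its contents.
-- Returns nil plus an error message if the file cannot be opened.
local function file_is_utf8(path)
  local f, err = io.open(path, "rb")
  if not f then
    return nil, err
  end
  local data = f:read("*a")
  f:close()
  return is_valid_utf8(data)
end

print(is_valid_utf8("\209\143"))  -- true: я in UTF-8
print(is_valid_utf8("\241"))      -- false: ñ saved as ISO 8859-1
```

A script could call file_is_utf8 on a path before passing it to dofile, and warn the user to re-save the file if it returns false.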