It is obvious that a text editor needs to accept text typed on a keyboard; otherwise it is entirely useless. Similarly, an internationalized text editor needs to accept the characters used in various languages. The same is true of other software: shells, libraries such as readline, environments such as consoles and X terminal emulators, scripting languages such as perl, tcl/tk, python, and ruby, and applications such as word processors, drawing and painting programs, file managers such as Midnight Commander, web browsers, mailers, and so on. Without the ability to input internationalized text, this software is entirely useless.
There are many languages in the world, and the appropriate input method varies from language to language.
Some languages, such as English, don't need any special input method. Every character of the language can be entered with a single key on the keyboard. The keymap is all a user has to care about.
Other languages, such as German, need a simple extension. For example, u with umlaut can be entered with two strokes, ':' and 'u'. A way must be supplied to switch between the ordinary input mode (the key strokes ':' and 'u' input ':' and 'u') and the extension input mode (the key strokes ':' and 'u' yield u with umlaut). Most languages in the world can be entered with this method.
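The two-stroke scheme described above can be sketched as a small state machine. This is only an illustration, not the way any particular console or terminal implements it: the names composer, composer_feed, and umlaut_rules are hypothetical, and the output is assumed to be UTF-8.

```c
#include <string.h>

/* Hypothetical two-stroke table: ':' followed by a vowel yields an umlaut.
   The strings are UTF-8 byte sequences (an assumption of this sketch). */
struct compose_rule {
    char second;        /* key typed after ':' */
    const char *utf8;   /* resulting character */
};

static const struct compose_rule umlaut_rules[] = {
    { 'a', "\xC3\xA4" },   /* a with umlaut */
    { 'o', "\xC3\xB6" },   /* o with umlaut */
    { 'u', "\xC3\xBC" },   /* u with umlaut */
};

struct composer {
    int extension_mode;  /* 0: ordinary input mode, 1: extension input mode */
    int pending;         /* 1 while a ':' is held back waiting for the next key */
};

/* Feed one key stroke. The text to insert (possibly empty) is written to
   out, which must hold at least 3 bytes. */
void composer_feed(struct composer *c, char key, char *out)
{
    size_t i;

    out[0] = '\0';
    if (!c->extension_mode) {          /* ordinary mode: pass keys through */
        c->pending = 0;
        out[0] = key;
        out[1] = '\0';
        return;
    }
    if (!c->pending) {
        if (key == ':') {              /* first stroke: wait for the second */
            c->pending = 1;
            return;
        }
        out[0] = key;
        out[1] = '\0';
        return;
    }
    c->pending = 0;
    for (i = 0; i < sizeof umlaut_rules / sizeof umlaut_rules[0]; i++) {
        if (umlaut_rules[i].second == key) {
            strcpy(out, umlaut_rules[i].utf8);
            return;
        }
    }
    out[0] = ':';                      /* no rule matched: emit both keys */
    out[1] = key;
    out[2] = '\0';
}
```

With extension_mode set to 1, feeding ':' produces no text and the following 'u' produces u with umlaut. With extension_mode set to 0, the same two strokes produce ':' and 'u' unchanged.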
Other languages, such as Chinese and Japanese, need a complicated input method, since they use thousands of characters. Because developing a clever input method is a very difficult and challenging problem, only a few companies develop Japanese input methods. Typical Japanese input methods ship with tens of megabytes of conversion dictionaries. It is often very troublesome to set up an input method for these languages. [26] You also need practice to use these input methods.
Different technologies are used for these languages. This chapter introduces them.
Ideally, consoles and X terminal emulators should be responsible for supplying an input method. This is already the case for simple languages that don't need complicated input methods, so non-X software doesn't need to care about input methods for them.
There are a few Debian packages for consoles and X terminal emulators which supply input methods for particular languages.
Thai characters
Korean Hangul
Big5 traditional Chinese ideograms
CN-GB simplified Chinese ideograms
In addition, a few programs supply input methods for an existing console environment.
Japanese (needs SKK as a conversion engine)
Japanese (needs Wnn as a conversion engine; not available as a Debian package)
Japanese (needs Canna as a conversion engine; not available as a Debian package)
However, since input methods for complex languages were historically unavailable, a few non-X programs have been developed with their own input methods.
A text editor which can input Japanese (needs Canna as a conversion engine.)
A text editor which can input Japanese (needs Canna as a conversion engine.)
A text editor which can input Japanese (needs Canna as a conversion engine.)
You have to take care of the differences among the number of characters, columns, and bytes. For example, you can see immediately that bash cannot handle UTF-8 input properly when you invoke bash on a UTF-8 xterm and press the BackSpace key. This is because readline always erases one column on the screen and one byte in the internal buffer for each stroke of the BackSpace key. To solve this problem, wide characters should be used for internal processing: one stroke of BackSpace should erase wcwidth() columns on the screen and one wchar_t unit in the internal buffer.
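A minimal sketch of the wide-character approach follows. It is not readline's code: the line_buf structure and its two functions are hypothetical names used only for illustration, and the buffer size is arbitrary. The point is that BackSpace removes one wchar_t and asks wcwidth() how many columns to erase.

```c
#define _XOPEN_SOURCE 700   /* asks the C library to declare wcwidth() */
#include <stddef.h>
#include <wchar.h>

/* Repeated here in case <wchar.h> was already included without the
   feature macro above; it matches the standard declaration. */
int wcwidth(wchar_t wc);

#define LINE_MAX_CHARS 256

/* Hypothetical line buffer: one wchar_t per character, not one byte. */
struct line_buf {
    wchar_t text[LINE_MAX_CHARS];
    size_t  len;                 /* number of wchar_t units in use */
};

/* Append one character; returns 0 on success, -1 if the buffer is full. */
int line_buf_append(struct line_buf *lb, wchar_t wc)
{
    if (lb->len >= LINE_MAX_CHARS)
        return -1;
    lb->text[lb->len++] = wc;
    return 0;
}

/* Handle one BackSpace stroke: drop the last wchar_t and return the number
   of screen columns the caller should erase (0 if nothing was removed). */
int line_buf_backspace(struct line_buf *lb)
{
    int cols;

    if (lb->len == 0)
        return 0;
    cols = wcwidth(lb->text[--lb->len]);
    /* wcwidth() returns -1 for non-printable characters; erase nothing
       on screen in that case. */
    return cols > 0 ? cols : 0;
}
```

The width that wcwidth() reports depends on the current locale, so a real program would call setlocale(LC_ALL, "") at startup and convert incoming bytes with a function such as mbrtowc() before appending them.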
X11R5 was the first internationalized version of the X Window System. However, X11R5 supplied two sample implementations of international text input, Xsi and Ximp, and the existence of two different protocols was an annoying situation. X11R6 then adopted XIM, a new protocol for internationalized text input, as the standard. Internationalized X software should support text input using XIM.
These protocols are designed using a client-server model. The client calls the server when necessary, and the server converts key strokes into internationalized text.
Kinput and kinput2 are protocols for Japanese text input which existed before X11R5. Some software, such as kterm, supports the kinput2 protocol. kinput2 is also the name of the server software. Since the current version of kinput2 supports the XIM protocol, you don't need to support the kinput protocol.
***** Not written yet *****
Developing an XIM client is a bit complicated. You can read the source code of rxvt and xedit to study it.
"Programming for Japanese characters input" is a good introduction to XIM programming.
The following are examples of software that can work as XIM clients.
X terminal emulators such as krxvt, kterm, and so on.
Text editors such as xedit, gedit, and so on.
The web browser mozilla.
The following are examples of software that can work as XIM servers.
kinput and skkinput for Japanese.
Here I will explain how to use XIM input on a Debian system. This will help developers and package maintainers who want to test the XIM support of their software. Debian Woody or later is assumed.
First, the locale database has to be prepared. Uncomment the ja_JP.EUC-JP EUC-JP, ko_KR.EUC-KR EUC-KR, zh_CN.GB2312, and zh_TW BIG5 lines in /etc/locale.gen and invoke /usr/sbin/locale-gen. This prepares the locale database under /usr/lib/locale/. On systems other than Debian Woody or later, follow the appropriate procedure for that system to prepare the locale database.
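One way to confirm that a locale was actually generated is to ask the C library to load it. The helper below is a sketch for that purpose; locale_available is a hypothetical name, and only the standard setlocale() call is used.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the C library can load the named locale, 0 otherwise.
   The LC_CTYPE setting in effect before the call is restored. */
int locale_available(const char *name)
{
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    int ok;

    if (current != NULL) {
        /* setlocale() may reuse its return buffer, so keep a private copy */
        saved = malloc(strlen(current) + 1);
        if (saved != NULL)
            strcpy(saved, current);
    }
    ok = setlocale(LC_CTYPE, name) != NULL;
    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);
        free(saved);
    }
    return ok;
}
```

On a glibc system, a call such as locale_available("ja_JP.eucJP") is expected to return 1 once the locale database has been generated, and 0 if the corresponding line is still commented out.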
Basic Chinese, Japanese, and Korean X fonts are included in the xfonts-base package for Debian Woody and later.
An XIM server must be installed. For Japanese, the kinput2 and skkinput packages are available: kinput2 supports the Japanese input engines Canna and FreeWnn, and skkinput supports SKK. For Korean, ami is available. For traditional Chinese and simplified Chinese, xcin is available.
Of course, you also need XIM client software. xedit in the xbase-clients package is an example of an XIM client.
Then log in as a non-root user. The environment variables LC_ALL (or LANG) and XMODIFIERS must be set as follows.
for Japanese/kinput2: LC_ALL=ja_JP.eucJP and XMODIFIERS=@im=kinput2
for Korean/ami: LC_ALL=ko_KR.eucKR and XMODIFIERS=@im=Ami
for traditional Chinese/xcin: LC_ALL=zh_TW.Big5 and XMODIFIERS=@im=xcin
for simplified Chinese/xcin: LC_ALL=zh_CN.GB2312 and XMODIFIERS=@im=xcin-zh_CN.GB2312
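When testing, it can help to check which input method name the environment actually selects. The sketch below extracts the name after "@im=" from an XMODIFIERS-style string; im_name_from_modifiers is a hypothetical helper, not part of Xlib, and it assumes that any further modifier begins with '@'.

```c
#include <stddef.h>
#include <string.h>

/* Copy the input method name found in an XMODIFIERS-style string
   (for example "@im=kinput2") into out. Returns 0 on success, -1 if no
   "@im=" entry is present or the name does not fit in out. */
int im_name_from_modifiers(const char *mods, char *out, size_t outsize)
{
    const char *start, *end;
    size_t len;

    if (mods == NULL)
        return -1;
    start = strstr(mods, "@im=");
    if (start == NULL)
        return -1;
    start += 4;                       /* skip "@im=" */
    end = strchr(start, '@');         /* another modifier may follow */
    len = end ? (size_t)(end - start) : strlen(start);
    if (len == 0 || len >= outsize)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

A test program could call im_name_from_modifiers(getenv("XMODIFIERS"), name, sizeof name) and print the result before opening the input method, so that a missing or misspelled setting is noticed early.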
Then invoke the XIM server. Just run it in the background (with &). kinput2 and ami don't open a new window, while xcin opens a new window and outputs some messages.
Then invoke the XIM client and focus on an input area of the program. Hit Shift-Space or Control-Space and type something. Did some strange characters appear? This document is too brief to explain how to input valid CJK characters and sentences with these XIM servers. Please consult the documentation of each XIM server.
GNU Emacs and XEmacs take an entirely different approach to international input.
They supply their own input methods for various languages and use these instead of relying on the console or XIM. An input method can be selected with the M-x set-input-method command, and the selected input method can be switched on and off with the M-x toggle-input-method command.
GNU Emacs supplies input methods for British, Catalan, Chinese (array30, 4corner, b5-quick, cns-quick, cns-tsangchi, ctlau, ctlaub, ecdict, etzy, punct, punct-b5, py, py-b5, py-punct, py-punct-b5, qj, qj-b5, sw, tonepy, ziranma, zozy), Czech, Danish, Devanagari, Esperanto, Ethiopic, Finnish, French, German, Greek, Hebrew, Icelandic, IPA, Irish, Italian, Japanese (egg-wnn, skk), Korean (hangul, hangul3, hanja, hanja3), Lao, Norwegian, Portuguese, Romanian, Scandinavian, Slovak, Spanish, Swedish, Thai, Tibetan, Turkish, Vietnamese, Latin-{1,2,3,4,5}, Cyrillic (beylorussian, jcuken, jis-russian, macedonian, serbian, translit, translit-bulgarian, ukrainian, yawerty), and so on.
Introduction to i18n
31 May 2018, debian at tmail dot plala dot or dot jp (retired DD)