
Re: Multibyte encoding - what should a package provide?



Hi, 

From: sen_ml@eccosys.com
Subject: Re: Multibyte encoding - what should a package provide?
Date: Fri, 10 Sep 1999 00:22:19 +0900

> kubota> Please note, Unicode is not popular at all in Asia. I am sure
>                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> kubota> there are very very few people using Unicode in Japan. Instead,
> kubota> EUC-JP is popular for UNIX and SHIFT-JIS is the OS's coding
> kubota> system for Windows/Macintosh in Japan.
> 
> why is it not popular?  what are the reasons?  i keep hearing this, but
> i haven't come across an enumeration of those reasons.
> 
> any pointers to documents related to this would be much appreciated.

I said 'Asia', but I only know about Japan, Korea, and China.
What about other countries in Asia?  Are there any members from
these countries in the Debian Project?  If there are, please add
your comments.


1. Japan, Korea, and China have their own standard character codes.
   Unicode code points have no systematic relation to them (conversion
   needs lookup tables), and Unicode does not preserve compatibility
   with these standard codes.

2. Japan, Korea, and China have similar but different characters that
   share the same origin.  Unicode unified these similar characters
   (Han unification) for a technical reason: 16 bits are not enough to
   encode them all separately.  Although Japanese, Korean, and Chinese
   have similar characters, they are different.  Some of us don't care,
   and some do: for example, people whose names cannot be expressed
   correctly in Unicode, people who research languages, and so on.
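To make points 1 and 2 concrete, here is a small Python 3 sketch using
Python's standard `euc_jp` and `gb2312` codecs; the choice of characters
is mine, for illustration only.  It shows (a) that two neighbouring
kanji in the JIS X 0208 table (EUC-JP bytes B0A1 and B0A2) do not land
on neighbouring Unicode code points, so conversion is a table lookup
rather than arithmetic, and (b) that the character 直, encoded by the
Japanese and the Chinese standard with different bytes, decodes to one
and the same Unicode code point, so the text no longer records which
national form was meant.

```python
# Sketch (Python 3): how national codesets relate to Unicode.

def jis_neighbours_in_unicode():
    """Decode two adjacent JIS X 0208 kanji (row 16, cells 1 and 2) given as EUC-JP."""
    first = bytes([0xB0, 0xA1]).decode("euc_jp")
    second = bytes([0xB0, 0xA2]).decode("euc_jp")
    # Distance between the two characters in Unicode code point order.
    return first, second, ord(second) - ord(first)

def unified_han(char="直"):
    """Encode one character with a Japanese and a Chinese codeset, then decode both."""
    ja_bytes = char.encode("euc_jp")   # JIS X 0208 in EUC-JP form
    zh_bytes = char.encode("gb2312")   # GB 2312 in EUC-CN form
    ja_text = ja_bytes.decode("euc_jp")
    zh_text = zh_bytes.decode("gb2312")
    return ja_bytes, zh_bytes, ja_text, zh_text

if __name__ == "__main__":
    a, b, gap = jis_neighbours_in_unicode()
    # Adjacent in JIS, but not adjacent in Unicode.
    print(f"{a} U+{ord(a):04X}, {b} U+{ord(b):04X}, gap {gap}")
    ja_bytes, zh_bytes, ja_text, zh_text = unified_han()
    # Different bytes in the two national standards, a single Unicode character.
    print(ja_bytes.hex(), zh_bytes.hex(), ja_text == zh_text, f"U+{ord(ja_text):04X}")
```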

Some large-scale software, such as Tcl/Tk 8.1, uses Unicode as an
INTERNAL codeset.  Such software has an automatic code-conversion
facility that works for every input and output (keyboard, display,
files, and everything else).  Such an implementation is acceptable
because users never need to handle Unicode directly.
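The "Unicode inside, conversion at every boundary" design can be
sketched in a few lines of Python 3, where `str` plays the role of the
internal Unicode codeset.  This is only an illustration of the idea, not
how Tcl/Tk is implemented; the function names are hypothetical, and the
external codeset names (`euc_jp`, `shift_jis`) are Python codec names
that a real program would take from the locale or user settings.

```python
# Sketch (Python 3): Unicode as the INTERNAL codeset, converted at I/O boundaries.
# All function names are hypothetical, for illustration only.

def decode_input(raw: bytes, external: str) -> str:
    """Input boundary: external bytes (e.g. EUC-JP) -> internal Unicode str."""
    return raw.decode(external)

def encode_output(text: str, external: str) -> bytes:
    """Output boundary: internal Unicode str -> external bytes (e.g. Shift_JIS)."""
    return text.encode(external)

def count_characters(text: str) -> int:
    """Internal code works on characters and never sees the external encoding."""
    return len(text)

def filter_file(raw: bytes, input_encoding: str, output_encoding: str):
    """Read in one local codeset, process internally, write in another."""
    text = decode_input(raw, input_encoding)
    n = count_characters(text)
    return encode_output(text, output_encoding), n

if __name__ == "__main__":
    euc = "日本語".encode("euc_jp")   # what an EUC-JP file on UNIX would contain
    sjis, n = filter_file(euc, "euc_jp", "shift_jis")
    print(len(euc), "bytes in,", n, "characters,", len(sjis), "bytes out")
```

The user only ever sees EUC-JP going in and Shift_JIS coming out; the
Unicode form exists only inside the program.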

There is a family of codesets that can express many languages at the
same time: the ISO 2022-* series.  It preserves compatibility with the
character sets it includes (ASCII, Japanese, Korean, ...).  However,
ISO 2022-* is a STATEFUL codeset, and implementing it can be complex
work.
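What "stateful" means can be seen in a minimal Python 3 decoder for a
subset of ISO-2022-JP.  This sketch handles only two escape sequences,
ESC ( B (ASCII) and ESC $ B (JIS X 0208), and rejects everything else,
so it is far from a complete implementation; it borrows Python's
`euc_jp` codec for the JIS X 0208 table lookup, since JIS X 0208 bytes
with the high bit set are the EUC-JP form.

```python
# Sketch (Python 3): decoder for a SUBSET of ISO-2022-JP, showing the state.
# Handled designations (others, e.g. ESC ( J and ESC $ @, are rejected):
#   ESC ( B  -> ASCII, one byte per character
#   ESC $ B  -> JIS X 0208, two bytes per character

ESC = 0x1B
DESIGNATIONS = {
    b"\x1b(B": "ascii",
    b"\x1b$B": "jisx0208",
}

def decode_iso2022jp_subset(data: bytes) -> str:
    state = "ascii"  # decoding starts in ASCII
    out = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            seq = data[i:i + 3]
            if seq not in DESIGNATIONS:
                raise ValueError(f"unsupported escape sequence at offset {i}: {seq!r}")
            state = DESIGNATIONS[seq]  # the meaning of all following bytes changes here
            i += 3
        elif data[i] >= 0x80:
            raise ValueError(f"8-bit byte at offset {i} is not allowed in ISO-2022-JP")
        elif state == "ascii":
            out.append(chr(data[i]))
            i += 1
        else:
            pair = data[i:i + 2]
            if len(pair) < 2:
                raise ValueError("truncated two-byte character")
            # Set the high bit on both bytes to get EUC-JP, then decode.
            out.append(bytes(b | 0x80 for b in pair).decode("euc_jp"))
            i += 2
    return "".join(out)
```

For example, the bytes `F|` decode to the two ASCII characters "F|" in
the initial state, but to the single kanji 日 after ESC $ B.  A decoder
cannot interpret a byte without knowing the escape sequences that came
before it, which is what makes searching, cutting, and pasting such text
harder than with a stateless codeset.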

Ideal multilingualized software should be able to choose among
Unicode, ISO 2022-*, and other local codesets, as Mule and (X)Emacs do.
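One reason the choice matters is that a single local codeset cannot
hold mixed-language text, while Unicode (here UTF-8) and the ISO 2022
family (here ISO-2022-JP-2, which can designate both Japanese and Korean
character sets) can.  The Python 3 sketch below checks which codesets
can represent a given text; the candidate list and the function name are
illustrative assumptions using Python codec names, not how Mule or
(X)Emacs choose codesets.

```python
# Sketch (Python 3): which user-selectable codesets can represent a text?
# CANDIDATES and representable_in are hypothetical, for illustration only.

CANDIDATES = ["utf-8", "iso2022_jp_2", "euc_jp", "shift_jis", "euc_kr"]

def representable_in(text: str, codesets=CANDIDATES):
    """Return the codesets that can encode `text` and decode it back unchanged."""
    ok = []
    for name in codesets:
        try:
            if text.encode(name).decode(name) == text:
                ok.append(name)
        except UnicodeError:  # the codeset has no code for some character
            pass
    return ok

if __name__ == "__main__":
    print(representable_in("日本語"))          # Japanese only
    print(representable_in("日本語 한국어"))   # Japanese and Korean mixed
```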

As I mentioned on this mailing list a few days ago, I am writing a
document on I18N.  I have already posted two drafts to the Debian JP
mailing list (some chapters in the drafts are still empty), and the
discussion is now under way.

---
Tomohiro KUBOTA <kubota@debian.or.jp>

