Re: Multibyte encoding - what should a package provide?
On Wednesday 8 September 1999, at 11 h 27, the keyboard of sen_ml@eccosys.com
wrote:
> -the current approach in unicode and iso 10646 is to treat certain
> characters (appearance - glyphs?) from different languages as the same
> character (byte representation - code point?).
...
> the most often cited example i hear of is for kanji (roughly,
> ideographs) -- some kanji from different locales are treated as
> identical.
There is a classic example which is even simpler to grasp for users of European languages: the decimal dot in English (3.14) is the same glyph (== shape) as the dot which marks the end of a sentence.
In SGML style, they would be two different elements, because they have different semantics (for instance, when producing a French version, you have to replace the decimal dot with a comma, but not the end-of-sentence dot).
In Unicode, they are the same glyph and the same character. The rationale for this decision is explained in the standard (quick summary: it would be extremely long and complicated to list the possible semantics of a glyph).
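To make the point concrete, here is a minimal Python sketch. It first checks that every dot in a sample sentence is the same code point, then tries to pick out the decimal dots from context alone. The sample text, the `DECIMAL_DOT` pattern and the `to_french_decimals` helper are my own illustrations, not anything defined by Unicode or SGML, and the "digit on both sides" rule is only a heuristic.

```python
import re
import unicodedata

text = "Pi is about 3.14. That is all."

# Every dot in the text is the same code point, whatever role it plays.
dots = [ch for ch in text if ch == "."]
dot_names = {f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in dots}
print(dot_names)

# Hypothetical heuristic: treat a dot as "decimal" only when it sits
# between two digits. The character itself carries no such information.
DECIMAL_DOT = re.compile(r"(?<=\d)\.(?=\d)")

def to_french_decimals(s: str) -> str:
    """Replace dots that look like decimal separators with commas."""
    return DECIMAL_DOT.sub(",", s)

print(to_french_decimals(text))
```

Since the encoding does not distinguish the two uses, the semantics have to be guessed from the surrounding characters. That guess will be wrong for things like "version 1.2" or "section 2.3", which is exactly the kind of information explicit markup would carry.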