Debian Bug report logs - #54389
debiandoc2html produces invalid text for russian locales

Package: debiandoc-sgml; Maintainer for debiandoc-sgml is Debian XML SGML Team <debian-sgml@lists.debian.org>; Source for debiandoc-sgml is src:debiandoc-sgml.

Reported by: cit@sensi.org

Date: Fri, 7 Jan 2000 23:03:45 UTC

Severity: normal

Found in version 1.1.38

Fixed in version debiandoc-sgml/1.1.39

Done: Ardo van Rangelrooij <ardo@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to Ardo van Rangelrooij <ardo@debian.org>:
Bug#54389; Package debiandoc-sgml.


Acknowledgement sent to cit@sensi.org:
New Bug report received and forwarded. Copy sent to Ardo van Rangelrooij <ardo@debian.org>.


Message #5 received at maintonly@bugs.debian.org:

From: Dmitry Tsitelov <ppcit@spb.cityline.ru>
To: maintonly@bugs.debian.org
Subject: debiandoc2html produces invalid text for russian locales
Date: Sat, 08 Jan 2000 01:36:54 +0300
Package: debiandoc-sgml
Version: 1.1.38

debiandoc2html produces HTML files full of entities instead of plain
Russian characters for the Russian locale ru_RU.KOI8-R.

For example:

=== test.sgml ===
...
<p>Текст по-русски</p>
...
===

becomes

=== test.html/ch1.html ===
...
<p>
&ocirc;&Aring;&Euml;&Oacute;&Ocirc;
&ETH;&Iuml;-&Ograve;&Otilde;&Oacute;&Oacute;&Euml;&Eacute;
</p>
...
=== 

in the corresponding file. Tracing showed that debiandoc2html detects the
locale correctly and has no problems with the locale itself.


The problem is in .../Format/HTML.pm:

===
...
sub _cdata
{
    output( encode_entities( $_[0] ) );
}
...
===

encode_entities() produces invalid output for Russian text. Changing
this code to a plain output() call solves the problem for Russian, but
loses the entity encoding for Latin-1-based encodings.


Sincerely,
Dmitry Tsitelov
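
The entity names in the report line up byte for byte with an ISO-8859-1 reading of KOI8-R text. The following is a minimal Python sketch of that effect, not the HTML::Entities implementation: `encode_entities_latin1` is a hypothetical stand-in, and the sample string «Текст по-русски» is inferred from the byte values in the report rather than stated there.

```python
from html.entities import codepoint2name


def encode_entities_latin1(data: bytes) -> str:
    """Hypothetical stand-in for an entity encoder that assumes ISO-8859-1.

    Every byte is read as a Latin-1 character; markup characters and
    non-ASCII characters with a named HTML entity are replaced by it.
    """
    out = []
    for ch in data.decode("latin-1"):
        cp = ord(ch)
        if (cp > 127 or ch in '<>&"') and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(ch)
    return "".join(out)


# Assumed sample text, encoded the way a ru_RU.KOI8-R document would store it.
koi8_bytes = "Текст по-русски".encode("koi8-r")

# The encoder sees bytes such as 0xF4 and 0xC5 and emits the Latin-1
# entities for them, even though in KOI8-R they are Cyrillic letters.
print(encode_entities_latin1(koi8_bytes))
```

Run as is, this prints the same entity sequence as the ch1.html excerpt, with the two words separated by a space instead of a line break. A browser rendering that output shows accented Latin letters, which is why the page looks broken.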


Information forwarded to debian-bugs-dist@lists.debian.org, Ardo van Rangelrooij <ardo@debian.org>:
Bug#54389; Package debiandoc-sgml.


Acknowledgement sent to Fumitoshi UKAI <ukai@debian.or.jp>:
Extra info received and forwarded to list. Copy sent to Ardo van Rangelrooij <ardo@debian.org>.


Message #10 received at 54389@bugs.debian.org:

From: Fumitoshi UKAI <ukai@debian.or.jp>
To: 54389@bugs.debian.org
Subject: Re: Bug#54389: debiandoc2html produces invalid text for russian locales
Date: Sat, 08 Jan 2000 19:04:51 +0900
Hi, 

I've got the same bug.  debiandoc2html produces broken HTML output not
only for the Russian locale but also for the Japanese locale ja_JP.
I think it would be better to have something that overrides _cdata() in
/usr/lib/debiandoc-sgml/lib/DebianDoc_SGML/Locale/<locale>/HTML

PS.
It would be better to rename ja_JP to ja_JP.eucJP, because ja_JP.eucJP
is the canonical name. The other locales are already named this way;
only ja_JP is not. Note that Aliases.pm needs the same change.

Regards,
Fumitoshi UKAI
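
The two suggestions in this message, a per-locale override of the character-data handler and a canonical locale alias, could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the layout of debiandoc-sgml: `LOCALE_ALIASES`, `CDATA_OVERRIDES`, and all function names are hypothetical.

```python
import html

# Hypothetical alias table: maps a short locale name to its canonical form.
LOCALE_ALIASES = {"ja_JP": "ja_JP.eucJP"}


def canonical_locale(name: str) -> str:
    """Return the canonical locale name, or the name itself if no alias exists."""
    return LOCALE_ALIASES.get(name, name)


def cdata_entities(text: str) -> str:
    """Default handler: escape markup, then turn non-ASCII into numeric references."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def cdata_markup_only(text: str) -> str:
    """Override handler: escape only &, <, and >; leave other characters alone."""
    return html.escape(text, quote=False)


# Hypothetical per-locale overrides of the character-data handler.
CDATA_OVERRIDES = {
    "ru_RU.KOI8-R": cdata_markup_only,
    "ja_JP.eucJP": cdata_markup_only,
}


def cdata_for(locale_name: str):
    """Pick the handler for a locale, falling back to the default."""
    return CDATA_OVERRIDES.get(canonical_locale(locale_name), cdata_entities)


print(cdata_for("ja_JP")("日本語 <b>"))
print(cdata_for("en_US")("café & co"))
```

Resolving the alias before the lookup means the override table only needs one entry per canonical name.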


Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#54389; Package debiandoc-sgml.


Acknowledgement sent to Ardo van Rangelrooij <ardo@debian.org>:
Extra info received and forwarded to list.


Message #15 received at 54389@bugs.debian.org:

From: Ardo van Rangelrooij <ardo@debian.org>
To: Fumitoshi UKAI <ukai@debian.or.jp>
Cc: 54389@bugs.debian.org
Subject: Re: Bug#54389: debiandoc2html produces invalid text for russian locales
Date: 09 Jan 2000 20:42:09 +0100
Fumitoshi UKAI <ukai@debian.or.jp> writes:

> Hi, 
> 
> I've got the same bug.  debiandoc2html produces broken HTML output not
> only for the Russian locale but also for the Japanese locale ja_JP.
> I think it would be better to have something that overrides _cdata() in
> /usr/lib/debiandoc-sgml/lib/DebianDoc_SGML/Locale/<locale>/HTML

Read my other reply.

> PS.
> It would be better to rename ja_JP to ja_JP.eucJP, because ja_JP.eucJP
> is the canonical name. The other locales are already named this way;
> only ja_JP is not. Note that Aliases.pm needs the same change.

Thanks for the tip!  It's been applied. :-)

> Regards,
> Fumitoshi UKAI
> 

-- 
Ardo van Rangelrooij
home email: avrangel@flevonet.nl, ardo@debian.org
home page:  http://home.flevonet.nl/avrangel
PGP fp:     3B 1F 21 72 00 5C 3A 73  7F 72 DF D9 90 78 47 F9


Reply sent to Ardo van Rangelrooij <ardo@debian.org>:
You have taken responsibility.


Notification sent to cit@sensi.org:
Bug acknowledged by developer.


Message #20 received at 54389-close@bugs.debian.org:

From: Ardo van Rangelrooij <ardo@debian.org>
To: 54389-close@bugs.debian.org
Subject: Bug#54389: fixed in debiandoc-sgml 1.1.39
Date: 9 Jan 2000 19:53:08 -0000
We believe that the bug you reported is fixed in the latest version of
debiandoc-sgml, which has been installed in the Debian FTP archive:
debiandoc-sgml_1.1.39.dsc
  to dists/potato/main/source/text/debiandoc-sgml_1.1.39.dsc
  replacing debiandoc-sgml_1.1.38.dsc
debiandoc-sgml_1.1.39_all.deb
  to dists/potato/main/binary-all/text/debiandoc-sgml_1.1.39.deb
  replacing debiandoc-sgml_1.1.38.deb
debiandoc-sgml_1.1.39.tar.gz
  to dists/potato/main/source/text/debiandoc-sgml_1.1.39.tar.gz
  replacing debiandoc-sgml_1.1.38.tar.gz

Note that this package is not part of the released stable Debian
distribution.  It may have dependencies on other unreleased software,
or other instabilities.  Please take care if you wish to install it.
The update will eventually make its way into the next released Debian
distribution.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 54389@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Ardo van Rangelrooij <ardo@debian.org> (supplier of updated debiandoc-sgml package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----

Format: 1.6
Date: Sun,  9 Jan 2000 14:56:58 +0100
Source: debiandoc-sgml
Binary: debiandoc-sgml
Architecture: source all
Version: 1.1.39
Distribution: unstable
Urgency: low
Maintainer: Ardo van Rangelrooij <ardo@debian.org>
Description: 
 debiandoc-sgml - DebianDoc SGML DTD and formatting tools
Closes: 54389
Changes: 
 debiandoc-sgml (1.1.39) unstable; urgency=low
 .
   * Format/LaTeX.pm: fixed broken <appendix> handling (Closes: 54390)
     (thanks Dmitry!)
   * Format/HTML.pm: fixed broken SDATA handling (Closes: #54389) (thanks
     Dmitry and Fumitoshi UKAI!)
   * Locale/Alias.pm: changed ja_JP into ja_JP.eucJP (thanks Fumitoshi UKAI!)
Files: 
 5f82e50e7b2188acefe0a94dfa19edaa 576 text optional debiandoc-sgml_1.1.39.dsc
 cecbab9c31c15f2b8f78d1cfcc0576e0 58512 text optional debiandoc-sgml_1.1.39.tar.gz
 4d2384d9de71dcae586370baa0c46b1d 55118 text optional debiandoc-sgml_1.1.39_all.deb

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: latin1

iQCVAwUBOHiT3T6XMRfcxSjpAQG9XQP/c+Ef1oThRbXIDj4nYnjbUhgd8r1b5byX
TN+uf1DnGkZhBFM06r1HPmcChpMMyaCY8gwHisLil9krxK6fqJ1PmxLy5L9ExLoE
xmF5Dx2j8DtKQMMs7Jb4T2JX6GuvP2U0rXxdLO+RVvCUDdqQSf3fP/tZg4KkvVnW
Ehz8Pr91vEc=
=qktq
-----END PGP SIGNATURE-----



Acknowledgement sent to Ardo van Rangelrooij <ardo@debian.org>:
Extra info received and filed, but not forwarded.

You requested that the message be sent to the package maintainer(s) but either the Bug report is not associated with any package (probably because of a missing Package pseudo-header field in the original Bug report), or the package(s) specified do not have any maintainer(s).

Your message has *not* been sent to any package maintainers; it has merely been filed in the Bug tracking system. If you require assistance please contact owner@bugs.debian.org quoting the Bug number 54389.


Message #23 received at 54389-maintonly@bugs.debian.org:

From: Ardo van Rangelrooij <ardo@debian.org>
To: cit@sensi.org
Cc: 54389-maintonly@bugs.debian.org
Subject: Re: Bug#54389: debiandoc2html produces invalid text for russian locales
Date: 09 Jan 2000 20:41:23 +0100
Oops!  The function encode_entities apparently only handles ISO-8859-1
correctly. :-(  I changed the code back to the original version, which
simply checks for the characters ", <, >, and & and encodes them properly.

This is also fixed in the version uploaded today.
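
A minimal Python sketch of the approach described above, escaping only the four markup characters and passing every other byte through untouched. The function name `escape_markup` is hypothetical and this is not the code from Format/HTML.pm. It assumes an encoding such as KOI8-R or EUC-JP, where non-ASCII characters never use the byte values of these four ASCII characters.

```python
def escape_markup(data: bytes) -> bytes:
    """Escape &, <, > and " at the byte level; leave all other bytes as they are."""
    # Replace & first so the ampersands introduced by the later
    # replacements are not escaped a second time.
    return (
        data.replace(b"&", b"&amp;")
        .replace(b"<", b"&lt;")
        .replace(b">", b"&gt;")
        .replace(b'"', b"&quot;")
    )


# Russian text in KOI8-R survives unchanged apart from the markup characters.
koi8 = 'Текст <по> & "русски"'.encode("koi8-r")
print(escape_markup(koi8).decode("koi8-r"))

# The same function works for Japanese text in EUC-JP.
eucjp = "日本語 & <b>".encode("euc_jp")
print(escape_markup(eucjp).decode("euc_jp"))
```

Because the function never interprets bytes above 0x7F, it does not need to know which locale produced the text.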

Dmitry Tsitelov <ppcit@spb.cityline.ru> writes:

> Package: debiandoc-sgml
> Version: 1.1.38
> 
> debiandoc2html produces HTML files full of entities instead of plain
> Russian characters for the Russian locale ru_RU.KOI8-R.
> 
> For example:
> 
> === test.sgml ===
> ...
> <p>Текст по-русски</p>
> ...
> ===
> 
> becomes
> 
> === test.html/ch1.html ===
> ...
> <p>
> &ocirc;&Aring;&Euml;&Oacute;&Ocirc;
> &ETH;&Iuml;-&Ograve;&Otilde;&Oacute;&Oacute;&Euml;&Eacute;
> </p>
> ...
> === 
> 
> in the corresponding file. Tracing showed that debiandoc2html detects the
> locale correctly and has no problems with the locale itself.
> 
> 
> The problem is in .../Format/HTML.pm:
> 
> ===
> ...
> sub _cdata
> {
>     output( encode_entities( $_[0] ) );
> }
> ...
> ===
> 
> encode_entities() produces invalid output for Russian text. Changing
> this code to a plain output() call solves the problem for Russian, but
> loses the entity encoding for Latin-1-based encodings.
> 
> 
> Sincerely,
> Dmitry Tsitelov
> 

-- 
Ardo van Rangelrooij
home email: avrangel@flevonet.nl, ardo@debian.org
home page:  http://home.flevonet.nl/avrangel
PGP fp:     3B 1F 21 72 00 5C 3A 73  7F 72 DF D9 90 78 47 F9



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Mon Apr 29 14:42:51 2024; Machine Name: buxtehude
