
Bug#802721: lintian: mojibake in HTML documentation



tags -1 patch
--

The only UTF-8 sequence that appears in these files is the sequence
for a section character, when referring to a section in the Policy
Manual.  It appears repeatedly in a few of those files.  This is
Unicode code point U+00A7, which in UTF-8 becomes octal 302 247.  That
two-byte binary sequence could just be converted in sed to the HTML
name (HTML "entity") for that character: "&sect;".  The resulting file
will be plain ASCII.

Adding something like the following in a post-install script should
fix this issue:

find $(DESTDIR)/usr/share/doc/lintian/lintian.html/*.html -type f -exec \
sed -i 's/\o302\o247/\&sect;/g' {} \;

The "-type f" will not be necessary if it is certain that the *.html
files will always be ordinary files, not directories.  I have tried
the above without the "-type f" on the command line in bash and it
worked fine.
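
For anyone who wants to check the substitution before touching the
installed files, here is a minimal sketch that runs it on a scratch
file.  The function name sect_to_entity and the sample text are only
illustrative, and forcing LC_ALL=C is my own precaution so that sed
treats the two bytes individually; it assumes GNU sed (for "\o" and
"-i").

```shell
# Hypothetical helper: replace the UTF-8 bytes for U+00A7 (octal 302 247)
# with the HTML entity &sect; in each file given as an argument.
# Assumes GNU sed; LC_ALL=C makes sed match raw bytes.
sect_to_entity() {
    LC_ALL=C sed -i 's/\o302\o247/\&sect;/g' "$@"
}

# Try it on a scratch file containing the two-byte sequence.
tmp=$(mktemp)
printf 'See Policy Manual \302\247 4.9\n' > "$tmp"
sect_to_entity "$tmp"
cat "$tmp"

# Report whether any byte outside printable ASCII is left.
if LC_ALL=C grep -q '[^ -~]' "$tmp"; then
    echo "non-ASCII bytes remain"
else
    echo "plain ASCII"
fi
rm -f "$tmp"
```

The backslash before the ampersand matters: in a sed replacement a
bare "&" stands for the whole matched text, so without it the original
bytes would be written straight back.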


Paul Hardy

