
Filtering out some untranslatable strings



Hi everyone and Wolfgang in particular ;)

I have prepared a code snippet that removes some untranslatable strings from the manuals.
I tested it locally (as a separate script) on the Debian Edu bullseye manual, but more rigorous testing is probably advisable before taking it into production.
I suppose the snippet could be appended to scripts/get_manual, although I could not test this here.

---- begin code snippet -----

# create $name-stripped.xml,
# which omits some non-translatable strings
# ---remove untranslatable image names--- #
echo "removing image names"
sed -e 's#<imagedata.*</imageobject>#</imageobject>#g' "$xmlfile" > "$name-stripped.xml"
# ---remove paragraphs that just have a <ulink> and no other text--- #
echo "removing link paragraphs"
    #---# match lines that start with <para><ulink and end with '>' #---#
    #---# and keep only the <para> tag to prevent xml from being broken #---#
    sed -i '/^<para><ulink.*> *$/s#.*#<para>#' "$name-stripped.xml"
# ---remove FIXME: paragraphs--- #
# ---(currently that colon is missing in some FIXME paragraphs, which are therefore kept)--- #
echo "removing FIXME: paragraphs"
sed -i '/^FIXME:/d' "$name-stripped.xml"

---- end code snippet -----

For this to be useful, po4a.cfg also needs a small addition.
It should look like this (the pot_in line is the addition):
[po_directory] .

[type: docbook] debian-edu-bullseye-manual.xml \
pot_in:debian-edu-bullseye-manual-stripped.xml \
$lang:$lang.xml \
add_$lang:?./$lang.add \
opt:"-o nodefault='<inlinemediaobject> <imagedata>' -o untranslated='<listitem> <inlinemediaobject> <imagedata>' -M UTF-8 -k 15"

With this enabled, the Debian Edu bullseye manual has 1154 translatable strings
instead of the current 1210 (56 fewer).
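
One possible way to check such counts is to count the msgid lines in the generated POT file. The sketch below assumes a standard gettext template; the helper name count_strings and the two-entry sample file are made up for illustration, and the location of the real POT file is not given here.

```shell
#!/bin/sh
# Hypothetical helper: count the entries in a POT/PO file.
# Every entry has one "msgid" line; the header entry (msgid "") is subtracted.
count_strings() {
    total=$(grep -c '^msgid ' "$1")
    echo $((total - 1))
}

# Made-up two-entry template, only for illustration.
potfile=$(mktemp)
cat > "$potfile" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

#. type: Content of: <article><title>
msgid "Debian Edu manual"
msgstr ""

#. type: Content of: <article><para>
msgid "Some translatable text."
msgstr ""
EOF

count_strings "$potfile"
```

For the sample template this prints 2. Running the same helper on the templates generated with and without the pot_in line would show the difference between the two.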
More untranslatable strings could be moved out with some adaptations to the wiki
(e.g. by rewording paragraphs that contain a link, so that those links become separate paragraphs while the text stays meaningful).

-- 
Kind regards, Frans Spiesschaert
