[wp-hackers] no-code-duplication i18n for WordPress

Lionel Elie Mamane lionel at mamane.lu
Tue Mar 4 08:04:12 GMT 2008


(Sorry for the quasi-dupe on wp-hackers, just making sure the whole
discussion spans the two mailing lists.)

Hello,

I'm Lionel Elie Mamane. I recently got involved in the Debian
packaging of WordPress, and I'd like to propose an enhancement to
your full-i18n procedures, so that all languages are guaranteed to
use the same code and only the strings differ. I come bearing "proof of
concept" patches, available from
http://people.debian.org/~lmamane/wordpress/ .

HISTORY
=======

Originally, all I wanted was to have French WordPress packaged in
Debian, but then I saw there were code differences (in version 2.3.2)
that might have been security issues (or not) between the English and
French versions. I'm talking about differences like:

-       <input type="hidden" name="redirect_to" value="<?php echo attribute_escape($_SERVER["REQUEST_URI"]); ?>" />
+       <input type="hidden" name="redirect_to" value="<?php echo wp_specialchars($_SERVER["REQUEST_URI"]); ?>" />

Maybe attribute_escape and wp_specialchars do the same thing, I dunno,
but, well, ... this was not very confidence-inspiring.

Then, Nikolay Bachiyski and I started discussing why things were as
they were in WordPress and started brainstorming for a way to improve
them that would not get in the way of English version development too
much. That discussion is at http://bugs.debian.org/461617, but its
result is summarised below, so you don't need to go read it all unless
you want to.

I18N PROBLEMS IN CURRENT WORDPRESS
==================================

As you probably all know, the main problem is that some strings
(mainly error messages) are output before gettext is loaded / ready to
be used, so these strings cannot be translated by gettext and are
currently hard-coded in the source. Translating WordPress into a new
language entails forking the code and translating these hard-coded
strings, and that work has to be redone for every new release.

THE PROPOSAL
============

I absolutely want all language versions of WordPress to share the same
code. That is for several reasons:

 - less work Debian-wise to ship them all
 - security (and other) updates need to update only one copy of the
   code.

Only one copy means less work, and it also ensures that no copy is
forgotten. I feel it is too easy for localisation teams to miss one
code change or another when porting their version from release n to
release n+1.


To achieve that, the idea is that fully-localised versions are
generated automatically from the English version by replacing the
hard-coded strings in a build stage. You can still ship
already-translated source tarballs for your users. This allows Debian
to easily ship all supported languages (instead of 15 tarballs to
download and package, only one, followed by a build stage). It also
allows you to easily get security updates out for all languages at
once: fix the English source, run the build stage for every language,
tar the 15 resulting trees, upload, announce. To make implementation
easier, the strings to be statically translated are tagged with
special comments.
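
To make the "one English tree, one build per language" workflow
concrete, here is a minimal sketch in Python. It is not the script
from the patch: the function name build_localised_trees, the
"wordpress-<lang>" directory naming, the UTF-8 assumption and the idea
of passing the per-file translation step in as a function are all
assumptions made for illustration.

```python
import shutil
from pathlib import Path


def build_localised_trees(english_tree, output_root, catalogs, translate):
    """Build one localised source tree per language from the English tree.

    catalogs:  dict mapping a language code to its catalog (msgid -> msgstr)
    translate: function (php_source, catalog) -> translated php_source
    All names here are hypothetical; the English tree is never modified.
    """
    built = []
    for lang, catalog in sorted(catalogs.items()):
        target = Path(output_root) / f"wordpress-{lang}"
        if target.exists():
            shutil.rmtree(target)  # always rebuild from a clean copy
        shutil.copytree(english_tree, target)
        # Only PHP files carry hard-coded strings; other files are copied as-is.
        for php_file in target.rglob("*.php"):
            source = php_file.read_text(encoding="utf-8")  # assumes UTF-8 sources
            php_file.write_text(translate(source, catalog), encoding="utf-8")
        built.append(target)
    return built
```

Each resulting tree can then be tarred and uploaded. A security fix
only ever touches the English tree, and the loop is simply run again.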

I implemented a script to do this build stage; the patch (against
trunk as of Sunday) is at
http://people.debian.org/~lmamane/wordpress/no-code-dup-i18n-poc-v2.patch



For example, in the WordPress source,
 wp_die("Could not connect to DB", 'Fatal Error');
would become
 /* WP_I18N_START */ wp_die("Could not connect to DB", 'Fatal Error'); /* WP_I18N_END */
and the localisation teams get a .pot file with
 msgid "Could not connect to DB"
 msgstr ""
 msgid "Fatal Error"
 msgstr ""
they then do their po-work as usual, e.g.:
 msgid "Could not connect to DB"
 msgstr "Échec lors de la connexion à la base de données"
 msgid "Fatal Error"
 msgstr "Erreur fatale"
produce a .mo file, and the "translate-static" script I propose would go
through the source and replace
 /* WP_I18N_START */ wp_die("Could not connect to DB", 'Fatal Error'); /* WP_I18N_END */
by
 /* WP_I18N_START */ wp_die("Échec lors de la connexion à la base de données", 'Erreur fatale'); /* WP_I18N_END */
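
The extraction and replacement steps above can be sketched as follows.
This is an illustration in Python under stated assumptions, not the
script from the patch: the function names, the regular expressions and
the plain dict standing in for the compiled .mo catalog are all
hypothetical, and string literals are looked up exactly as written in
the source (PHP escape sequences inside a msgid are not decoded).

```python
import re

# One tagged region; the markers themselves are kept in the output.
TAGGED = re.compile(r"(/\* WP_I18N_START \*/)(.*?)(/\* WP_I18N_END \*/)", re.DOTALL)
# A single- or double-quoted PHP string literal, allowing backslash escapes.
LITERAL = re.compile(r"""(["'])((?:\\.|(?!\1).)*?)\1""", re.DOTALL)


def php_quote(text, quote):
    """Re-quote a translation so that it stays a valid PHP string literal."""
    text = text.replace("\\", "\\\\").replace(quote, "\\" + quote)
    if quote == '"':
        text = text.replace("$", "\\$")  # avoid accidental variable interpolation
    return quote + text + quote


def extract_msgids(source):
    """List the string literals found inside tagged regions, in order, no repeats."""
    msgids = []
    for region in TAGGED.finditer(source):
        for literal in LITERAL.finditer(region.group(2)):
            if literal.group(2) not in msgids:
                msgids.append(literal.group(2))
    return msgids


def translate_static(source, catalog):
    """Replace literals inside tagged regions by their translation, if any."""
    def replace_literal(match):
        quote, body = match.group(1), match.group(2)
        if not catalog.get(body):
            return match.group(0)  # untranslated strings stay in English
        return php_quote(catalog[body], quote)

    def replace_region(match):
        return match.group(1) + LITERAL.sub(replace_literal, match.group(2)) + match.group(3)

    return TAGGED.sub(replace_region, source)
```

Strings outside the WP_I18N_START / WP_I18N_END markers are left
untouched, so the rest of the code stays byte-for-byte identical to
the English version. Keeping the markers in the output means the
result can be compared against the English tree region by region.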



My current patch adds an "i18n-tools" directory to the WordPress source
tree, but we can also put it elsewhere (the code is
location-independent), no problem. For example in
http://svn.automattic.com/wordpress-i18n/tools/ .

It creates a README file in that directory that explains how it works.

There is a (partial) example .po file for French for this static
translation at http://people.debian.org/~lmamane/wordpress/fr.v2.po;
it is only a quick example for tests, and the translation is not
"release quality".

The patch also tags two strings in the code (those that are in the
fr.po), for tests and such.


Let me know what you think, whether I should finish that thing up and
tag all remaining strings, etc.



REMAINING ISSUES
================

The list of languages supported by the spellchecker in TinyMCE is
hardcoded; it should be constructed dynamically from what is
installed on the particular server. (Any volunteer to take care of
that, code-wise?)


Plugin descriptions will have to be converted to being translated
with gettext, too. (Any volunteer to take care of that, code-wise?)


Any other issue you see?


Best Regards,

-- 
Lionel

