Most efficient approach for multilingual PHP website - php

I am working on a large multilingual website and I am considering different approaches for making it multilingual. The possible alternatives I can think of are:
The Gettext functions with generation of .po files
One MySQL table with the translations and a unique string ID for each text
PHP-files with arrays containing the different translations with unique string IDs
As far as I have understood the Gettext functions should be most efficient, but my requirement is that it should be possible to change a text string in the original reference language (English) without the other translations of that string automatically reverting back to English just because a couple of words changed. Is this possible with Gettext?
What is the least resource demanding solution?
Is using the Gettext functions or PHP files with arrays more or less equally resource demanding?
Any other suggestions for more efficient solutions?

A few considerations:
1. Translations
Who will be doing the translations? People that are also connected to the site? A translation agency? When using Gettext you'll be working with .po files (generated from a .pot template). These files contain the message ID and the message string (the translation). Example:
msgid "A string to be translated would go here"
msgstr ""
Now, this looks just fine and understandable for anyone who needs to translate this. But what happens when you use keywords, like Mike suggests, instead of full sentences? If someone needs to translate a msgid called "address_home", he or she has no clue whether this should be a header "Home address" or a full sentence. In this case, make sure to add comments to the file right before you call the gettext function, like so:
/// This is a comment that will be included in the pot file for the translators
gettext("ready_for_lost_episode");
Using xgettext --add-comments=/// when creating the .po files will add these comments. However, I don't think Gettext is meant to be used this way. Also, if you need to add a comment for every text you want to display, a) you'll probably make an error at some point, b) your whole script will be filled with the texts anyway, only in comment form, and c) the comments need to be placed directly above the Gettext function, which isn't always convenient, depending on the position of the function in your code.
2. Maintenance
Once your site grows (even further) and your language files along with it, it might get pretty hard to maintain all the different translations this way. Every time you add a text, you need to create new files, send out the files to translators, receive the files back, make sure the structure is still intact (eager translators are always happy to translate the syntax as well, making the whole file unusable :)), and finish by importing the new translations. It's doable, sure, but be aware of possible problems on this end with large sites and many different languages.
Another option: combine your 2nd and 3rd alternative:
Personally, I find it more useful to manage the translations using a (simple) CMS, keeping the variables and translations in a database and exporting the relevant texts to language files yourself:
add variables to the database (e.g.: id, page, variable);
add translations to these variables (e.g.: id, varId, language, translation);
select relevant variables and translations, write them to a file;
include the relevant language file in your site;
create your own function to display a variables text:
text('var'); or maybe something like __('faq','register','lost_password_text');
Point 3 can be as simple as selecting all the relevant variables and translations from the database, putting them in an array and writing the serialized array to a file.
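The export step described in point 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: the rows are hard-coded instead of being selected from the database, and the function names export_language_file, load_language_file and the two-argument text() helper, as well as the .ser file naming, are made up for this example.

```php
<?php
// Hypothetical rows, shaped like a join of the variables and translations tables.
$rows = [
    ['page' => 'faq', 'variable' => 'lost_password_text', 'language' => 'en', 'translation' => 'Lost your password?'],
    ['page' => 'faq', 'variable' => 'lost_password_text', 'language' => 'nl', 'translation' => 'Wachtwoord vergeten?'],
    ['page' => 'faq', 'variable' => 'register_title', 'language' => 'en', 'translation' => 'Register'],
];

// Write the texts of one language and one page to a file as a serialized array.
// (Assumed helper name and file naming scheme.)
function export_language_file(array $rows, string $language, string $page, string $dir): string
{
    $texts = [];
    foreach ($rows as $row) {
        if ($row['language'] === $language && $row['page'] === $page) {
            $texts[$row['variable']] = $row['translation'];
        }
    }
    $file = $dir . '/' . $language . '_' . $page . '.ser';
    file_put_contents($file, serialize($texts));
    return $file;
}

// Read a language file back into an array; an unreadable file yields an empty array.
function load_language_file(string $file): array
{
    $texts = unserialize(file_get_contents($file));
    return is_array($texts) ? $texts : [];
}

// Display helper: falls back to the variable name when no translation exists.
function text(array $texts, string $variable): string
{
    return $texts[$variable] ?? $variable;
}

$file  = export_language_file($rows, 'nl', 'faq', sys_get_temp_dir());
$texts = load_language_file($file);

echo text($texts, 'lost_password_text'), "\n"; // the Dutch text
echo text($texts, 'register_title'), "\n";     // not translated yet, so the variable name is shown
```

Grouping the export by language and page keeps each file small, so a page only loads the texts it actually needs.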
Advantages:
Maintenance. Maintaining the texts can be a lot easier for big projects. You can group variables by page, sections or other parts within your site, by simply adding a column to your database that defines to which part of the site this variable belongs. That way you can quickly pull up a list of all the variables used in e.g. the FAQ page.
Translating. You can display the variable with all the translations of all the different languages on a single page. This might be useful for people who can translate texts into multiple languages at the same time. And it might be useful to see other translations to get a feel for the context so that the translation is as good as possible. You can also query the database to find out what has been translated and what hasn't. Maybe add timestamps to keep track of possible outdated translations.
Access. This depends on who will be translating. You can wrap the CMS with a simple login to grant access to people from a translation agency if need be, and only allow them to change certain languages or even certain parts of the site. If this isn't an option you can still output the data to a file that can be manually translated and import it later (although this might come with the same problems as mentioned before.). You can add one of the translations that's already there (English or another main language) as context for the translator.
All in all I think you'll find that you'll have a lot more control over the translations this way, especially in the long run. I can't tell you anything about speed or efficiency of this approach compared to the native gettext function. But, depending on the size of the language files, I don't think it'll be a big difference. If you group the variables by page or section, you can always include only the required parts.

After some testing I finally decided to go more or less along the lines of Alec's combination of the second and third alternative.
Gettext problem
I tried to set up the whole gettext system first to try it out, but it turned out to be much more complicated than I thought. The problem is that Windows and Unix systems use different language short names for setlocale(). For the moment I'm running my dev server on Windows with Wamp, while the final site will run on Linux. Even after going through a couple of dozen guides, forums, questions etc., and restarting the server after each modification, I couldn't get it set up properly in any easy way. Additionally, gettext is not thread-safe; to update the language file the server needs to be restarted or a hack needs to be used; and there is no easy way of handling different versions of language files or handling the original English text without modifying the source or using Mike's suggestion, which as Alec pointed out isn't optimal.
Solution
So I ended up with what I think is the best solution based on Alec's response:
Save all the translations in a DB with the fields language, page, var_key, version, revision and last_modified_time, where the version corresponds to versions of the original text (English), while the revision allows the translator to modify or correct the finalized translations within a version.
Use a kind of CMS for translation, which is connected to the DB and handles different versions and allows for an easy overview of which languages are translated, in which version and how complete the translations are.
When a revision of a version is finalized, cache files are generated: each file contains an array with only the var_key and text translation for one language and one page, and is named with the ISO 639-1 name of the language and the page name, like lang/en_index.php. These language files are then simply included and wrapped in a function t($var_key), which allows for using the DB during development and then switching to use only the cache files.
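A minimal sketch of the cache-file step, assuming the finalized texts have already been read from the DB into an array. The names write_cache_file and LangCache are invented for this illustration, as are the sample keys; only the lang/en_index.php naming and the t($var_key) wrapper come from the description above.

```php
<?php
// Assumed helper: writes a cache file such as lang/en_index.php that returns an array.
function write_cache_file(string $dir, string $language, string $page, array $texts): string
{
    $file = sprintf('%s/%s_%s.php', $dir, $language, $page);
    $code = "<?php\nreturn " . var_export($texts, true) . ";\n";
    file_put_contents($file, $code, LOCK_EX);
    return $file;
}

// Assumed holder for the texts of the currently loaded language file.
final class LangCache
{
    private static array $texts = [];

    public static function load(string $file): void
    {
        $texts = include $file;
        self::$texts = is_array($texts) ? $texts : [];
    }

    public static function get(string $varKey): string
    {
        // Return the key itself when it is missing, so gaps stay visible on the page.
        return self::$texts[$varKey] ?? $varKey;
    }
}

// The wrapper used in templates.
function t(string $var_key): string
{
    return LangCache::get($var_key);
}

$dir = sys_get_temp_dir() . '/lang';
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}

// In the real system these strings would come from the finalized revision in the DB.
$file = write_cache_file($dir, 'en', 'index', [
    'welcome_title' => 'Welcome',
    'login_button'  => 'Log in',
]);

LangCache::load($file);
echo t('welcome_title'), "\n";
```

Writing the array with var_export() produces a plain PHP file, so an opcode cache can keep it in memory; a serialized array would work as well.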
Performance
I never got around to testing gettext, but according to the link Mike posted, the difference in performance between using an array and gettext is totally acceptable for me given the benefits of a custom system as described above. However, I compared using an array with 20 translated text strings to retrieving the same 20 text strings from a MySQL DB. It turned out that using an array included from a file was around 6 times faster than retrieving all 20 strings at the same time from the MySQL DB. It was not a really scientific benchmark and the results may surely vary on different systems and setups, but it clearly shows exactly what I expected: that using a DB would be much slower than using an array directly, which is why I chose to generate cache files for the array instead of using the DB.
As a comparison I also tested how fast it was to only output simple echos with the same text. It turned out to be around 20 times faster than using arrays from an included file, but then it is not possible to translate without having different versions of the page for different languages, which defeats the purpose of dynamic pages. In that case it is better to also use a good cache system.
Performance test source files:
PHP: http://pastie.org/964082
MySQL table: http://pastie.org/964115
It is surely not perfect, but it at least gives an idea of the performance differences.

Rather than having to use the English text as the keys, you could use arbitrary keys and also provide English translations, i.e.
gettext key is 'hello'
You then have your various language translations of this, plus an English translation that is also 'hello'. If you want to update the English version of the string, you can leave the key alone and just update the English translation.
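The idea can be sketched with plain arrays instead of gettext catalogs. This is an illustration only; the $catalogs structure and the msg() function are assumptions made for the example, not part of gettext.

```php
<?php
// Hypothetical catalogs keyed by a neutral message key; English is just another translation.
$catalogs = [
    'en' => ['hello' => 'Hello', 'goodbye' => 'Goodbye'],
    'de' => ['hello' => 'Hallo'],
];

// Look up a key in the requested language, then in English, then return the key itself.
function msg(array $catalogs, string $language, string $key): string
{
    return $catalogs[$language][$key]
        ?? $catalogs['en'][$key]
        ?? $key;
}

// Rewording the English text does not touch the key, so the German entry stays valid.
$catalogs['en']['hello'] = 'Hello there';

echo msg($catalogs, 'de', 'hello'), "\n";   // German translation is still found
echo msg($catalogs, 'de', 'goodbye'), "\n"; // no German entry, falls back to English
echo msg($catalogs, 'en', 'hello'), "\n";   // updated English text
```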

Related

Templating different languages in a PHP webapp

I'm writing a webapp in German, so all buttons, text, tooltips etc. are in German for now. But I want to use some kind of template file for the webapp so I can quickly change to another language if needed. I thought about a text file that I explode with "\n" and load into a session variable so that all the text the user will need is always in his session. Another approach would be to parse such a file, e.g. an XML document like this:
<?xml version="1.0"?>
<phrase>
<placeholder></placeholder>
<value></value>
</phrase>
where every field has its own name/value that represents a text snippet or button or whatever on the website, and then cast it into an object and cache it for everyone. I think the second approach is the best for working with multiple languages in a webapp. Does anybody have some pointers on what I could do even better, or could you just post how you did this kind of language templating for multinational webpages/webapps in the past?
Since you are looking for a translation solution, I understand you don't use a framework to develop your site, since most of them provide you with solutions to handle translations.
Most frameworks and apps I've seen in PHP use arrays, where the original sentence is the key and the translation is the value. So, to make it easier to translate into several languages, the key is in English.
In case you use gettext as suggested, or another approach, it will also be useful to parse your code to catch all strings to be translated automatically, since doing it manually can become a mess when the code base grows and you want to keep your translations up to date.
Take a look at GNU Gettext, it's very handy for multilanguage support.
The main idea is that you just wrap your words or phrases into a function, like
echo _g('Hello');
so you do not have any engine changes. You will have to add translation files for each language you are using.
You've come up with 2 solutions for storing the data, but I suggest you need to think further about the architecture and take a more complete view of the lifecycle of each request.
Regarding architecture: neither solution scales up to describe an extensive vocabulary very well, although for one or two pages it will suffice. The alternative approach is to manage a translation database (such as gettext), which might be overkill and performs less optimally with small numbers of pages, but importantly its performance does not deteriorate significantly with large or multiple dictionaries. A compromise solution might be to have a dataset for each URL/language (which might be extracted from a consolidated database).
If it were me, I would not use either method you proposed for storing the data: parsing XML creates a significant overhead for each page request, and using \n as a delimiter precludes the use of \n within a translation. Using a serialized PHP array seems to be the least expensive solution.

What's the best way to add i18n to a web application?

I'm looking to i18n-ize a web app. The site will be constantly changing: text will be rewritten, new stuff added etc. The web app is written in PHP, but the same applies to any language.
Basically I want:
1) The code to be readable and maintainable
2) Translators to be sent an email when new stuff is added in English OR the English is changed
3) To know whether something is up-to-date or not.
4) Translators be able to update things online
I guess the best idea is to store everything in a database and handle things that way rather than PO files and gettext. But what's the best way:
$lang('contactus') has the disadvantage of being unreadable (code-wise), and slower to develop (as all English needs to be given a unique key and stored in the database)
$lang('Please contact us for more information') is readable and quicker, but if the English changes (typo, grammar edit, updated) then the translation disappears entirely.
How do other apps/frameworks handle it?
Avoid storing strings that need to be translated in the database. Did that myself and regretted it. Use external files that you can send to translators. Use strings in your preferred language as the keys. So as for your example $lang('Please contact us for more information') is the way to go. A text change in the English probably means a corresponding change in the translations so yeah it's a maintenance headache.
Plus there's more to it than just string translations. There are currency symbols and formats, number formats (decimal, digit grouping symbols, where the symbol appears), date formatting. Name formatting - in some locales people often have middle names in others they usually don't. Text reading left to right vs right to left as well. ugh.
For a web app it's sometimes just much simpler to have the web pages separated into language-locale directories and deal with it that way. You do have all your business logic separate from the view code, right? This is where embedding business logic directly within HTML, à la the typical PHP approach, really starts to hurt.
Make some translation function with an easy to type name, for example t($key) or use existing solution (for example Zend_Translate) or PHP gettext, that provides the _($key) function.
The format of your translation files should not matter so much - whatever fits your translation process best.
Use your original language, for example English, for the key. For longer strings it makes sometimes sense to invent artificial keys, for example "introduction_text_1"
Never use constants or any shortened translation keys, don't use the database, avoid using just arrays. The only professional way to handle translations is a gettext(-like) function.
You should only use _("Original english text queries"). This has the advantage that at least the default language text is still available, should translation data/files be inaccessible.
Don't worry about changes in the text strings. In reality this is rare. If you use the gettext syntax, there are even tools to adapt the language files automatically then. Not many good tools, mind you. But more than for homebrew translation methods, and it gets the job done. If your PHP interpreter doesn't support native gettext, search for the "php-gettext" emulation or "upgradephp".
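The fallback behaviour mentioned above (the default language text stays available when no translation is found) can be sketched without the gettext extension. The translate() function and the sample catalog are assumptions for illustration only; gettext's own _() likewise returns the original string when it finds no translation.

```php
<?php
// Hypothetical German catalog keyed by the original English text.
$translations = [
    'Please contact us for more information' => 'Bitte kontaktieren Sie uns für weitere Informationen',
];

// Return the translation if there is one, otherwise the English source text itself.
function translate(array $translations, string $english): string
{
    return $translations[$english] ?? $english;
}

// Exact match: the German text is returned.
echo translate($translations, 'Please contact us for more information'), "\n";

// The English was edited in the source: the lookup misses and the English text is shown
// until the catalog is updated.
echo translate($translations, 'Please contact us for more details'), "\n";
```

The second call shows the trade-off discussed in this thread: the page never breaks, but an edited English string silently loses its translation until the language files catch up.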

PHP Localization Best Practices? gettext?

We are in the process of making our website international, allowing multiple languages.
I've looked into php's "gettext" however, if I understand it right, I see a big flaw:
If my webpage has let's say "Hello World" as a static text. I can put the string as <?php echo gettext("Hello World"); ?>, generate the po/mo files using a tool. Then I would give the file to a translator to work on.
A few days later we want to change the text in English to say "Hello Small World"?
Do I change the value in gettext? Do I create an english PO file and change it there?
If you change the gettext string, it will be considered a new string and you'll instantly lose the current translation ...
It seems to me that gradually, the content of the php file will have old text everywhere.
Or people translating might have to be told "when you see Hello World, instead, translate Hello Small World".
I don't know I'm getting confused.
In other programming languages, I've seen that they use keywords such as web.home.featured.HelloWorld.
What is the best way to handle translations in PHP?
Thanks
You basically asked and answered your own question, the answer might just be having a slightly better understanding of how PO files work.
Within the PO file you have a msgid and a msgstr. The msgid is the value which is replaced with the msgstr within the PHP file depending on the localization.
Now you can make those msgids anything you would like; you could very well make it:
<?php echo _("web.home.featured.HelloWorld"); ?>
And then you would never touch this string again within the source, you only edit the string through the PO files.
So basically the answer to your question is you make the gettext values identifiers for what the string should say, however the translators typically use the default language files text as the basis for conversion, not the identifier itself.
I hope this is clear.
I know an answer has been accepted, and the above answer is good. But there is another issue with using permanent machine-style keys like thing.stuff.widget when working with Gettext.
While using permanent keys is a better approach to development, Gettext is not set up for that style of working and this can complicate your workflow.
If you present a translator with a PO file populated with keys in place of source text, they may not know what the English should be. So you'd have to provide them with a second file containing source language translations for them to compare to. Not the end of the world, but more fiddly for them and not how Gettext was designed. (square peg, round hole etc..)
I think PO is perfectly fine as a file format for translations in PHP, and especially recommended if you're not working with a framework that has a good l10n module, but that doesn't mean it's good for workflow and your translation process.
I suggest you arrive at a workflow that allows your programmers to work with permanent keys, your translators work with words, and gives you a MO file out the other end. Take a look at Loco for one solution to this.
Alternatively use a different interim file format that allows the use of keys and words. TMX is one example. If you still want to use Gettext at runtime you can convert the files.
Currently, I am dealing with the same issue. The common practice with gettext is to use the English text as the key. Recently, our copy editor changed a whole bunch of English text (other languages were hardly touched), so we have to change all the source code and all the PO files.
We are switching to a neutral key. Since we already have some sites on Java, we will use the same property name format.

Most effective way working with multiple natural languages

I am currently working with a CodeIgniter PHP based application and have come to the point where it's about to go live with multiple languages.
Is CodeIgniter's own language class the most effective way to handle languages?
Are there any specific language tools/libraries that are commonly used in PHP apps?
Thanks!
I've never used CI_Language but it appears to use language arrays to do the translation.
Overly simplified example of this type method:
$trans = array(
'MAIN_TITLE' => 'Title Here'
);
echo $trans['MAIN_TITLE'];
Personally I find this really annoying because you're then editing views that are cluttered with array key names instead of useful text. Which can be quite annoying at times. Not to mention you have to remember which keys correlate to which strings if you are using them in multiple places.
I use Gettext which I find much easier. You just have to wrap your strings with the translate method: _(). Then once you're done with your app, you open up PoEdit and create the new language file. PoEdit will parse all of my source files looking for strings wrapped like this <?php echo _('Title here') ?> and insert them into the .po language file. You can then go string by string and translate the text easily within PoEdit. The benefit of this is you have the source translation right there within PoEdit, instead of a meaningless array key name in some include file
This all makes my life much easier in that I can update my language files every Friday with one click. Any new or modified translations will automatically be added to my language file, and any unused translations will automatically be removed. I send the files off to my 3 international branches for translation, and my changes and updated language files are ready to be deployed Monday morning.
You may want to have a look at the PHP intl library: http://php.net/intl

What's the most efficient way to setup a multi-lingual website

I'm developing a website that will be available in different languages. It is a LAMP (Linux, Apache, MySQL, PHP) setup, and it makes use of Smarty, mostly for the template engine.
The way we currently translate is by a self-written smarty plugin, which will recognize certain tags in the HTML files, and will find the corresponding tag in an earlier defined language file.
The HTML could look as follows:
<p>Hi, welcome to $#gamedesc;!</p>
And the language file could look like this:
gamedesc:Poing 2009$;
welcome:this is another tag$;
Which would then output
<p>Hi, welcome to Poing 2009!</p>
This system is very basic, but it is pretty hard to control if, for example, I would like to keep track of what has been translated so far, or give certain users the rights to translate only certain tags.
I've been looking at some alternative ways to approach this, by either replacing the text-file with XML files which could store some extra meta-data, or by perhaps storing all the texts in the database, and retrieving it there.
My question is, what would be the best way to make this system both maintainable and perform well with high user-traffic? Are there perhaps any (lightweight) plugins I could take a look at?
You could give gettext a shot. It is the way it is done in most C/C++ Linux applications and it is available as an extension to PHP too. The idea is not very different from what you're already doing, but there are tools that ease the maintenance of translations (e.g. poedit).
For user rights to translations, gettext won't be of much help; I think you'll need to do it on your own or look at whether some frameworks have smarter solutions.
Maybe taking a look at the gettext library could give you some hints (http://php.net/manual/en/book.gettext.php). Hope it helps!
You will need to have a table in your database that you can use to store strings of text, each with a composite ID. The composite ID will be made up of the language ID and the text node ID.
You will need to give the user a chance to select a preferred language. You should make sure that you either have a default "this has not been translated" text for every language you use, or a default language that your entire site can be viewed in.
For every bit of text within your web site, rather than storing the text within the page, you just assign it an ID.
When serving the page, look up the text node ID and preferred language ID and load that string of text, or the string for the default.
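The lookup with a default-language fallback might be sketched as follows. A nested array stands in for the database table so the example runs on its own; the lookup_text() name, the sample rows and the wording of the "untranslated" marker are assumptions for illustration.

```php
<?php
// Stand-in for the database table: the composite ID is (language ID, text node ID).
$textTable = [
    'en' => [1 => 'Welcome', 2 => 'Contact us'],
    'fr' => [1 => 'Bienvenue'],
];

// Load the string for the preferred language, else the default language,
// else a visible "not translated" marker.
function lookup_text(array $table, string $languageId, int $nodeId, string $defaultLanguage = 'en'): string
{
    if (isset($table[$languageId][$nodeId])) {
        return $table[$languageId][$nodeId];
    }
    return $table[$defaultLanguage][$nodeId] ?? '[untranslated text ' . $nodeId . ']';
}

echo lookup_text($textTable, 'fr', 1), "\n"; // French string exists
echo lookup_text($textTable, 'fr', 2), "\n"; // falls back to the English string
echo lookup_text($textTable, 'fr', 3), "\n"; // unknown node, marker is shown
```

With a real table, the same fallback could be done in one query per page rather than one per string, which matters for the performance concerns raised in the question.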
In our project, http://pkp.sfu.ca/ojs, we use XML files to store translation key-value pairs. Browse our code: http://github.com/pkp/pkp-lib/blob/master/classes/i18n/PKPLocale.inc.php
We use that class to read the XML files for each locale and in our code we use Locale::translate('locale.key.name');. Similar to gettext, but using an XML file for easier updating.
Looking around at web stuff today I came across this website: http://translateth.is/
It looks simple to use... copy paste in some javascript.
