Which is better:
gettext
custom MySQL+cache based functionality
Gettext is a sort of built-in feature, so I assume it's tuned for performance. But using Poedit is a pain and impossible to show to any client.
A custom solution allows for a simple translation interface, but it might be heavy on PHP/DB usage.
I suppose what I'm asking is: which one would you use, and when?
Localization is difficult. It is really difficult. It's not just "pairs of words" => "Wortpaare", it's a lot more complex than that.
What most people forget when they look at gettext and go "Ugh, ugly" is that the localization process is a lot more important than the technical details of the implementation. That's because the actual translators are typically not programmers and are probably not even in-house. This causes a lot more headaches than you may think.
gettext is really old, is battle tested and has a huge toolchain behind it that is tuned to support this process. If you want to do i18n and l10n properly, you need a powerful system. gettext is that and has support from a wide range of tools. Your Homebrewed Translation System™ does not.
First of all, you need a robust system to extract translatable strings. Without being able to automatically and reproducibly extract translatable strings from source code, you have a mountain of work for each new string you want to translate. In gettext, xgettext does that.
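For illustration, here's a minimal sketch of what marked-up PHP source looks like; the file names and the exact xgettext flags are just examples:

    <?php
    // Translatable strings are wrapped in gettext's _() and ngettext()
    // calls, which is exactly what xgettext scans the source for.
    $count = 3;
    echo _('Welcome to our site');
    printf(ngettext('%d new message', '%d new messages', $count), $count);

    // Extracting the .pot template from the source tree:
    //   xgettext --from-code=UTF-8 -o messages.pot *.php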
Next, you need a tool to synchronize the extracted strings with already existing translations in a way that no translations are lost and that only slightly changed translations are kept if possible. In gettext, msgmerge does that.
Next, you want a way to add extra information to strings. You want to be able to group them by category, "domain" and context, you may want to add comments for the translator to the source code and you may want translators to be able to add comments to the translations. gettext supports all that.
Next, you want a file format that has good support from a variety of tools, since you may be sending your files to China to get them translated there. The reason you may be sending them away to external translators is also the reason you need a good synching tool to merge changes, since this can be a very asynchronous process. PO files are very well supported, because gettext is so old. There are many open source and commercial tools that support the localization process at many levels, depending on your specific needs.
Do not underestimate the task of localization, choose a tool that is well suited for the process and learn it. gettext is a great tool, if admittedly not the most beginner friendly.
For what it's worth, here's my gettext extension for Twig, which makes gettext for PHP even better.
Maybe you should look into Memcached, which you can use in combination with MySQL.
It's very useful for fetching data that doesn't change too often, like translations.
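As a rough sketch of that combination (the table and cache-key scheme are made up for illustration): look the string up in Memcached first, and hit MySQL only on a cache miss:

    <?php
    // Check Memcached first; fall back to MySQL on a miss and cache the result.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    function translate(string $key, string $lang, Memcached $mc, PDO $pdo): string
    {
        $cacheKey = "i18n:$lang:$key";
        $cached = $mc->get($cacheKey);
        if ($cached !== false) {
            return $cached;
        }

        $stmt = $pdo->prepare(
            'SELECT translation FROM translations WHERE lang = ? AND string_key = ?'
        );
        $stmt->execute([$lang, $key]);
        $text = $stmt->fetchColumn() ?: $key; // fall back to the key itself

        $mc->set($cacheKey, $text, 3600); // cache for an hour
        return $text;
    }

    echo translate('welcome_message', 'de', $mc, $pdo);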
Gettext is a very old format. It uses files to store translations. It's clumsy, especially when you have translations by the thousands, say 20,000. Managing a PO file with 20,000 translation strings is a nightmare; across 50 languages it's impossible. Then you have to actually compile it into an MO file. No thanks. It might have made sense back in the 1990s, not now.
Databases instead are powerful. Like really powerful. Name what you need and you can get it. In a second they can tell you exactly (there's a sketch of the first item after this list):
Which of the translation strings are not translated in which language
When was the translation first created and by whom
When was the translation last updated and by whom
Full history of every translation, with the person who made each change
You can have all texts pre-translated in materialized views and get them with one select statement
Order translation strings alphabetically, paginated for page-by-page viewing and editing
Set which user can update exactly which translations
With some simple HTML web forms, anyone anywhere in the world can translate your application in real time, within seconds, and with full history; they can comment on every translation pair, receive and read replies, flag translations as to-do, etc., etc.
Have analytics in seconds of who has made how many translations in the last day, week, month, or year, so you can give out incentives
Still want PO files? Your database can generate them on whatever schedule you need
Missing translations? Your database can send automatic emails or SMS to the translator responsible for that language.
A translation has been updated by the translator? Now the database can send an email to the responsible reviewer to approve it
Need the translation fast? Your database can call an API to have it translated right now, then send an email to the responsible person to review it
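Here's that sketch of the first item - finding untranslated strings - using PDO against a hypothetical schema of strings(id, source_text) and translations(string_id, lang, translation):

    <?php
    // List every source string with no French translation yet.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $stmt = $pdo->prepare(
        'SELECT s.id, s.source_text
           FROM strings s
           LEFT JOIN translations t
                  ON t.string_id = s.id AND t.lang = ?
          WHERE t.string_id IS NULL'
    );
    $stmt->execute(['fr']);

    foreach ($stmt as $row) {
        echo "Untranslated: {$row['source_text']}\n";
    }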
I am the project manager on a website that needs to be converted into multiple languages. I am trying to figure out which option is best to go with. I don't have a problem paying for something, but I just want to make sure it will work properly.
One option I have thought of is to (somehow) integrate Google Translate so that when users click on the language they want to read the page in, it updates the language for Google to translate into. I did work with Google Translate a little bit, but I found it to be a little clumsy. Maybe I am not using it properly.
Another alternative, definitely not the best idea but a backup if need be, is to put the content in a database and pull it depending on the user's language. The only problem is that changing one word in the English version would require a change in every other language.
I am open to any other idea. I can clarify the project more, if need be.
As someone who speaks several languages, I can assure you that Google Translate often misses the mark. In many cases their translations are embarrassing, especially when you try to translate individual words or phrases without a sufficient context. Some language pairs are better than others, but overall this is not an option at this point.
Compiled languages have the advantage of static i18n, where a different version of the code is compiled for each UI language.
Database-driven dynamic i18n is a bad option, and almost all programming frameworks try to avoid it. I would therefore recommend that you look for an i18n solution that works with properties (text) files to look up translated strings. In PHP this is gettext or intl.
Note also that i18n involves not only translation of text, but it also requires appropriate localization of dates, numbers, currencies, etc.
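For example, PHP's intl extension handles that non-text side of localization; a minimal sketch:

    <?php
    // Locale-aware currency and date formatting with the intl extension.
    $nf = new NumberFormatter('de_DE', NumberFormatter::CURRENCY);
    echo $nf->formatCurrency(1234.56, 'EUR'), "\n"; // 1.234,56 €

    $df = new IntlDateFormatter(
        'de_DE',
        IntlDateFormatter::LONG,
        IntlDateFormatter::NONE
    );
    echo $df->format(new DateTime('2015-03-14')), "\n"; // 14. März 2015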
"I don't have a problem paying for something, but I just want to make sure it will work properly."
Based on that statement of yours, I would suggest that hiring a firm that specializes in translation is your best bet; then just put up links that lead to the different language versions of your website.
Problems that you might encounter:
Adjusting content: some translations might be too short, some might be too long.
Using Google Translate can ruin your site, because it sometimes fails, especially for certain languages.
Though there are a lot of similar questions already asked here, I didn't find the answer I was looking for.
What's the best way to develop a multi-language application? It should be very fast, and I don't know how much text I will be translating.
Method 1: Create and keep all the text in an array for every language I want to support and include that file everywhere.
Method 2: Use gettext (.MO, .PO files).
Method 3: Store all the translations in a text file and write a function that goes through all the text and, when matched, displays its value.
Method 4: Store all the text and its translations in a database, but I don't think it will be faster than storage in the filesystem.
Method 5: Same as Method 1, but I will create multiple files per language just to keep everything structured.
Though all of these will work, which do you think will be the fastest method? And do let me know if I missed any.
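To make Methods 1 and 5 concrete, a minimal sketch (the file layout and keys are made up):

    <?php
    // lang/de.php - one array file per language; Method 5 just splits
    // this into several smaller files per language.
    return [
        'welcome' => 'Willkommen',
        'logout'  => 'Abmelden',
    ];

    <?php
    // index.php - pick the user's language and include the matching file.
    $lang = $_GET['lang'] ?? 'en';
    if (!in_array($lang, ['en', 'de', 'fr'], true)) {
        $lang = 'en'; // whitelist so no arbitrary path gets included
    }
    $texts = include __DIR__ . "/lang/$lang.php";
    echo $texts['welcome'];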
This is a complicated problem, and it's not always as obvious as you might think. In some cases, with right-to-left languages or for particular cultural reasons, you may need to develop separate layouts for a particular region.
Regardless of which method you choose, you will want to cache all of or parts of your pages and use a cached version if available before regenerating the page again.
I would probably avoid 3 and 4. You don't want to be reading from the disk more than you have to. If you can cache translation arrays in memcached, you can save yourself disk access in loading translation tables.
As a person managing localization projects for developers, I have to say that both sides (translators and developers) have been very happy with Gettext (.po files). It's relatively simple to implement in your code (basically a wrapper around any text you want localized), it's seamlessly fast, and most importantly: it scales and updates flawlessly.
The main advantage is that lots of tools exist for creating, updating, managing, and translating .po/.pot files, including the cross-platform PoEdit. When you have dozens of languages to do, it's as easy as extracting the latest .pot file and sending that to the translation team. They'll return individual files for each language. I haven't seen many systems that can scan and locate new strings as easily or present them to translators for use as simply.
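For reference, the runtime side in PHP is only a few lines; a minimal sketch where the "myapp" domain and the locale path are made up:

    <?php
    // The compiled catalog lives at locale/de_DE/LC_MESSAGES/myapp.mo,
    // produced with: msgfmt de.po -o myapp.mo
    $locale = 'de_DE.UTF-8';
    putenv("LC_ALL=$locale");
    setlocale(LC_ALL, $locale);

    bindtextdomain('myapp', __DIR__ . '/locale');
    bind_textdomain_codeset('myapp', 'UTF-8');
    textdomain('myapp');

    echo _('Welcome to our site'); // served from the German catalog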
I would recommend looking at PHP frameworks which support multiple languages. Investigate the most popular first: Zend, Symfony, and Yii. I have used Yii before, and it has multi-language support.
http://www.yiiframework.com/extension/yii-multilanguage/
I want to add multiple languages to my site.
I read somewhere that I can use a translator (Google or Babelfish), but I don't like that approach.
Can anyone suggest different ways?
You could learn the language and translate it yourself. Besides that you will need to use a translator.
You'll want to read up a bit on internationalization and localization (often referred to as i18n and L10n). You'll need code to support serving your various translations, based on your users' preferences. You'll also want to give some thought to handling things like date and currency formats.
As far as PHP tools, you've got the gettext stuff, which can be compiled into PHP. Gettext works, but it was designed to handle translating interface text for locally-installed software; it doesn't transition to web sites/apps terribly well.
There's also Zend_Translate, which is a pretty good library, and can easily be used without most of the rest of the Zend Framework. You might want to look at Zend_Locale and Zend_Date, as the three can play together nicely.
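As a rough sketch of the classic Zend Framework 1 usage (the path and locale are made up, and the Zend library just needs to be on your include path):

    <?php
    // Standalone Zend_Translate with the gettext adapter.
    require_once 'Zend/Translate.php';

    $translate = new Zend_Translate('gettext', '/path/to/locale/de.mo', 'de');
    echo $translate->_('Welcome to our site');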
You could integrate a translation interface into your site and let its users create their own translations. This way, you get the translation for free.
Or, as an alternative, you could open your website logic to a community (i.e. make it open source) and let it be translated by them...
Another way would be to hire someone to translate it into their language :)
If you have members on your site, do what FB is doing:
they ask the members to help translate into their language; they present the phrases to them and collect the translations plus votes (whether the translation is good or there's a better one).
I've been surprised by how little I've found on externalizing strings in PHP. Does everyone use gettext, or is there some other framework or tool that I'm not aware of?
Zend_Translate / Zend_Locale are nice and very flexible. They do not need the whole Zend Framework to be present. They support gettext .mo/.po files but also CSV and other formats.
Hope this library helps you:
The i18n package is a bunch of classes for internationalization. It gives you the possibility to maintain multilanguage webpages more easily. The translation strings are stored in flat text files, in special Gettext files (which are basically precompiled translation files), or in a MySQL database. And it works independently from PHP's setlocale function.
I would say that you should use gettext because it is mature and easy to set up. Also, by using gettext you will be able to extend its use to other types of sources than PHP. Consider the PO file format the standard for this.
I've been working in the i18n area for many years, and I can tell you that gettext will give you the best results with minimal effort if you have more than 50-100 strings in your project.
Once you've set the foundation for localizing your application, if you find yourself needing to manage and/or just get the actual translation done, we have (what I like to think is, obviously :)) a pretty cool tool called String - http://mygengo.com/string
String is great not just for managing translations, where you can invite others to projects to help with translation; you can also order translations right in the service. We've integrated our API into String to showcase the API and the ability to see status updates for numerous (100s...1000s) of jobs, translated by real people!
If you're interested in the API itself, we held a bounty contest not long ago with some fun winners for a number of platforms (Wordpress, Django, etc.): http://mygengo.com/services/api/lab/winners/
Just thought I'd share.
From your experience, is it better to use one language file or multiple smaller language files for each language in a PHP project using the gettext extension? I am not even sure if it is possible to use multiple files; it is hard for me to test, since the server caches the language files.
I am doing multiple languages on a social network site. So far I have done just the signup page, which is about 1 page out of 200 to go, and it has 35 text strings to translate. At this pace the language file for each language would be really large, so I was thinking maybe it would be better to have different language files for different pages, or perhaps sections, like a forums section and a blogs section. But if it makes no difference, then I would rather not waste my time making multiple smaller files for each language.
I realize every situation is different and the only real answer is to test it, but I am hoping to avoid that this time and just get some opinions from people more experienced. This is my first time using gettext, thanks.
I would make the language files module-based. With gettext you need to specify a locale for each language. It would fit best to have separate .po/.mo files for each module or for the big parts of your site.
That's my opinion. :-)
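For illustration, per-module catalogs map naturally onto gettext text domains; a minimal sketch with made-up domain names:

    <?php
    // Each module gets its own catalog (forum.mo, blog.mo, ...) under
    // the same locale directory.
    setlocale(LC_ALL, 'de_DE.UTF-8');
    bindtextdomain('forum', __DIR__ . '/locale');
    bindtextdomain('blog', __DIR__ . '/locale');
    textdomain('forum'); // default domain for plain _() calls

    echo _('New topic');               // looked up in forum.mo
    echo dgettext('blog', 'New post'); // explicit domain, no switching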
I typically automate the process and have multiple languages in multiple files by using a database to edit the site (using a simple db lookup). This lets me hire translators to come in and verify the current translation easily. Deploying to production then is simply turning the database into a set of language files.
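A sketch of that deployment step, with a hypothetical schema, dumping the database into one PHP array file per language:

    <?php
    // Export each language's translations to lang/<code>.php.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $langs = $pdo->query('SELECT DISTINCT lang FROM translations')
                 ->fetchAll(PDO::FETCH_COLUMN);

    foreach ($langs as $lang) {
        $stmt = $pdo->prepare(
            'SELECT string_key, translation FROM translations WHERE lang = ?'
        );
        $stmt->execute([$lang]);
        $pairs = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);

        file_put_contents(
            __DIR__ . "/lang/$lang.php",
            '<?php return ' . var_export($pairs, true) . ';'
        );
    }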
From experience, I would break the languages down on a per-file basis, as the management overhead otherwise becomes heavy and there is great scope for duplication and mistakes.
The other advantage is that, by using a directory structure and naming convention, the correct language can be selected programmatically more easily than with one large file, and it is easier to write management tools at a later stage in the project.
It is also worth looking at the formats other people use. Many of the frameworks use this sort of structure: Dashcode, Symfony, Zend, etc. There is also an XML format, XLIFF, which is built to handle translation and integrates with many of the tools that translators use.
Multiple files are the best way to go, but things can get disorganized.
We've just launched a free new service called String, which solves most of the problems of managing multiple language files - like a Basecamp for localization. You can either import existing files or start from scratch with keys and strings in the system. When you're ready, you can export the files again to run your app. It works with PHP (array), PHP (define), .po, YAML, INI, and .strings formats.
String allows you to collaborate with translators easily - you just invite them to a project and set their language permissions. Translators can leave comments and questions on each string if they need more info - and you can revert strings back using the History function if things aren't quite right.
Anyway, enough sales pitch!
Check it out at http://mygengo.com/string - we'd love your feedback.