In your experience, is it better to use one language file or multiple smaller language files per language in a PHP project using the gettext extension? I am not even sure whether multiple files are possible, and it is hard for me to test since the server caches the language files.
I am adding multiple languages to a social network site. So far I have only done the signup page, which is about 1 of 200 pages to go, and it alone has 35 text strings to translate. At this pace the language file for each language would be really large, so I was thinking it might be better to have different language files for different pages, or perhaps for sections like the forums section and the blogs section. But if it makes no difference, I would rather not waste my time making multiple smaller files for each language.
I realize every situation is different and the only real answer is to test, but I am hoping to avoid that this time and just get some opinions from people with more experience. This is my first time using gettext. Thanks.
I would make the language files module-based. With gettext you need to specify a locale for each language, and it fits best to have separate .po/.mo files for each module or major part of your site.
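For what it's worth, a minimal sketch of how that looks with PHP's gettext extension; the module names ("forums", "blogs") and the ./locale path are just examples:

<?php
// Each module keeps its own catalog under ./locale/<lang>/LC_MESSAGES/
$locale = 'de_DE.utf8';
putenv('LC_ALL=' . $locale);
setlocale(LC_ALL, $locale);

bindtextdomain('forums', './locale');
bindtextdomain('blogs', './locale');

// Default domain for plain gettext()/_() calls...
textdomain('forums');
echo _('New topic');

// ...and dgettext() reaches into another module's catalog explicitly.
echo dgettext('blogs', 'Write a post');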
That's my opinion. :-)
I typically automate the process and have multiple languages in multiple files by using a database to edit the site (with a simple DB lookup). This lets me hire translators to come in and easily verify the current translations. Deploying to production is then simply a matter of turning the database into a set of language files.
From experience, I would break the languages down on a per-file basis; otherwise the management overhead becomes heavy and there is great scope for duplication and mistakes.
The other advantage is that, by using a directory structure and naming convention, the correct language can be selected programmatically more easily than with one large file (a sketch follows below), and it is easier to write management tools at a later stage in the project.
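To illustrate, a rough sketch assuming the standard gettext layout of locale/<lang>/LC_MESSAGES/<domain>.mo; reading the user's choice from $_GET is just one possible source of the preference:

<?php
// Derive the available languages from the locale directory itself.
$available = array_map('basename', glob('./locale/*', GLOB_ONLYDIR));

// Pick the requested language if we ship it, otherwise fall back to English.
$requested = $_GET['lang'] ?? '';
$lang = in_array($requested, $available, true) ? $requested : 'en';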
It is also worth looking at the formats other people use. Many frameworks use this sort of structure: Dashcode, Symfony, Zend, etc. There is also an XML format, XLIFF, which was built to handle translation and integrates with many of the tools that translators use.
Multiple files are the best way to go, but things can get disorganized.
We've just launched a new free service called String which solves most of the problems of managing multiple language files - like a Basecamp for localization. You can either import existing files or start from scratch with keys and strings in the system. When you're ready, you can export the files again to run your app. It works with PHP (array), PHP (define), po, yaml, ini and .strings formats.
String allows you to collaborate with translators easily - you just invite them to a project and set their language permissions. Translators can leave comments and questions on each string if they need more info - and you can revert strings using the History function if things aren't quite right.
Anyway enough sales pitch!
Check it out at http://mygengo.com/string - we'd love your feedback.
Which is better:
gettext, or
a custom MySQL + cache based solution?
Gettext is more or less a built-in feature, so I assume it's tuned for performance. But using Poedit is a pain, and it's impossible to show to any client.
A custom solution allows for a simple translation interface, but it might be heavy on PHP/DB usage.
So, which one would you use, and when?
Localization is difficult. It is really difficult. It's not just "pairs of words" => "Wortpaare", it's a lot more complex than that. What most people forget when they look at gettext and go "Ugh, ugly" is that the localization process is a lot more important than the technical details of the implementation. That's because the actual translators are typically not programmers and are probably not even in-house. This causes a lot more headaches than you may think.

gettext is really old, it is battle-tested, and it has a huge toolchain behind it that is tuned to support this process. If you want to do i18n and l10n properly, you need a powerful system. gettext is that, and it has support from a wide range of tools. Your Homebrewed Translation System™ does not.
First of all, you need a robust system to extract translatable strings. Without being able to automatically and reproducibly extract translatable strings from source code, you have a mountain of work for each new string you want to translate. In gettext, xgettext does that.
Next, you need a tool to synchronize the extracted strings with already existing translations in a way that no translations are lost and that only slightly changed translations are kept if possible. In gettext, msgmerge does that.
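A tiny sketch of that workflow, with made-up file and catalog names. xgettext recognizes gettext() and its _() alias in PHP source out of the box:

<?php
// signup.php -- strings wrapped for extraction
echo _('Welcome to the site');
echo sprintf(_('Hello, %s'), $username);

// Extract a template catalog:
//   xgettext --from-code=UTF-8 -L PHP -o messages.pot signup.php
// Later, fold new strings into an existing translation without losing work:
//   msgmerge --update de.po messages.pot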
Next, you want a way to add extra information to strings. You want to be able to group them by category, "domain" and context, you may want to add comments for the translator to the source code and you may want translators to be able to add comments to the translations. gettext supports all that.
Next, you want a file format that has good support from a variety of tools, since you may be sending your files to China to get them translated there. The reason you may be sending them away to external translators is also the reason you need a good synching tool to merge changes, since this can be a very asynchronous process. PO files are very well supported, because gettext is so old. There are many open source and commercial tools that support the localization process at many levels, depending on your specific needs.
Do not underestimate the task of localization, choose a tool that is well suited for the process and learn it. gettext is a great tool, if admittedly not the most beginner friendly.
For what it's worth, here's my gettext extension for Twig, which makes gettext for PHP even better.
Maybe you should look into Memcached, which you can use in combination with MySQL.
It's very useful for caching data that doesn't change often, like translations.
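A rough sketch of how that combination might look, assuming the memcached extension, a server on localhost, and a hypothetical translations table; adjust the names to your own schema:

<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function translate($key, $lang, Memcached $mc, PDO $db) {
    $cacheKey = "i18n:$lang:$key";
    $text = $mc->get($cacheKey);
    if ($text === false) { // cache miss: fall through to MySQL
        $stmt = $db->prepare(
            'SELECT text FROM translations WHERE lang = ? AND msg_key = ?'
        );
        $stmt->execute([$lang, $key]);
        $text = $stmt->fetchColumn() ?: $key; // fall back to the key itself
        $mc->set($cacheKey, $text, 3600);     // cache for an hour
    }
    return $text;
}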
Gettext is a very old format. It uses files to store translations, and it's clumsy, especially when you have translations by the thousands, say 20,000. Managing a PO file with 20,000 translation strings is a nightmare; across 50 languages it is impossible. Then you have to actually compile it into an MO file. No thanks. It might have made sense back in the 1990s, not now.
Databases instead are powerful. Like really powerful. Name what you need and you can get it (a small query sketch follows this list). In a second they can tell you exactly:
Which of the translation strings are not translated in which language
When was the translation first created and by whom
When was the translation last updated and by whom
Full history of every translation, with the person who made each change
You can have all texts pre-translated in materialized views and get them with one select statement
Sort translation strings in alphabetical order, paginated for page-by-page viewing and editing
Set which user can update exactly which translations
With some simple HTML web forms, anyone anywhere in the world can translate your application in real time, within seconds, and with full history; comment on every translation pair, receive and read replies, flag translations as to-do, etc.
Get analytics in seconds on who has made how many translations in the last day, week, month or year - so you can hand out incentives
Still want PO files? Your database can generate them on whatever schedule you need
Missing translations? Your database can send automatic emails or SMS messages to the translator responsible for that language
A translation has been updated by the translator? Now the database can send an email to the responsible reviewer to approve it
Need the translation fast? Your database can call an API to have it translated right now, then email the responsible person to review it
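For illustration, the kind of query that is trivial in a database and painful with PO files - finding strings still missing a German translation. The table and column names here are invented, and $db is assumed to be a connected PDO instance:

<?php
$sql = "SELECT s.msg_key
          FROM source_strings s
     LEFT JOIN translations t
            ON t.string_id = s.id AND t.lang = 'de'
         WHERE t.id IS NULL";
foreach ($db->query($sql) as $row) {
    echo $row['msg_key'], PHP_EOL;
}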
Though there are a lot of similar questions already asked here, I didn't find the answer I was looking for.
What's the best way to develop a multi-language application? It should be very fast, and I don't know how much text I will need to translate.
Method 1: Create and keep all the text in an array for every language I want to support, and include that file everywhere.
Method 2: Use gettext (.MO, .PO files).
Method 3: Store all the translations in a text file and write a function that scans it and displays the matching value.
Method 4: Store all the text and its translations in a database, but I don't think it will be faster than the filesystem.
Method 5: Same as method 1, but with multiple files per language just to keep everything structured.
Though all of these will work, which do you think will be the fastest method? And let me know if I missed any.
This is a complicated problem, and it's not always as obvious as you might think. In some cases, with right-to-left languages or for particular cultural reasons, you may need to develop separate layouts for a particular region.
Regardless of which method you choose, you will want to cache all of or parts of your pages and use the cached version, if available, before regenerating the page.
I would probably avoid 3 and 4. You don't want to be reading from the disk more than you have to. If you can cache translation arrays in Memcached, you can save yourself the disk access of loading translation tables.
As a person managing localization projects for developers, I have to say that both sides (translators and developers) have been very happy with Gettext (.po files). It's relatively simple to implement in your code (basically a wrapper around any text you want localized), it's seamlessly fast, and most importantly: it scales and updates flawlessly.
The main advantage is that lots of tools exist for creating, updating, managing, and translating .po/.pot files, including the cross-platform PoEdit. When you have dozens of languages to do, it's as easy as extracting the latest .pot file and sending that to the translation team. They'll return individual files for each language. I haven't seen many systems that can scan and locate new strings as easily or present them to translators for use as simply.
I would recommend looking at PHP frameworks which support multiple languages. Investigate the most popular first: Zend, Symfony and Yii. I have used Yii before, and it has multi-language support.
http://www.yiiframework.com/extension/yii-multilanguage/
I'm in the process of creating a web app in PHP which will be available in many different languages (about 10 in total), and I'd like to know what you view as best practice for setting this up in more general terms.
My idea is to keep all languages under the same domain, with a path prefix such as "http://myservice.com/de", where a script performs a location check when the user enters the site and redirects them.
Editorial content will be shared between all languages as single posts in the database with a specific data column for each language.
Markup and scripts will all be documented in English, while pages and sections visible to the user will be translated into their respective language, drawn from a common word library file.
An .htaccess file handles all the rewrites needed to display articles in the appropriate language, e.g. "http://myservice.com/de/artikel/12345/" is rewritten to "http://myservice.com/article?id=12345&lang=de".
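Something like the following mod_rewrite rule would cover that example; the list of language prefixes and the internal target path are assumptions to adapt:

RewriteEngine On
# /de/artikel/12345/ -> /article?id=12345&lang=de
RewriteRule ^(de|en|fr)/artikel/([0-9]+)/?$ /article?id=$2&lang=$1 [L,QSA]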
What do you consider to be a clean and efficient multi-lingual setup?
Everybody has different opinions about how best to go about setting up an internationally-friendly website. However, I try not to reinvent the wheel by making my own system. Rather, I use the built in internationalisation and localisation tools in frameworks such as CakePHP.
From the CakePHP book;
One of the best ways for your applications to reach a larger audience is to cater for multiple languages. This can often prove to be a daunting task, but the internationalization and localization features in CakePHP make it much easier.
First, it’s important to understand some terminology. Internationalization refers to the ability of an application to be localized. The term localization refers to the adaptation of an application to meet specific language (or culture) requirements (i.e., a "locale"). Internationalization and localization are often abbreviated as i18n and l10n respectively; 18 and 10 are the number of characters between the first and last character.
http://book.cakephp.org/1.3/view/1228/Internationalization-Localization
Using the built-in tools, for me, offers an efficient way to translate applications without URL rewrites. It also means that a user can configure their localisation preferences and have them automatically applied every time they log in.
Such a method will also be considered more search-engine friendly because you won't get multilingual duplicates of the same content.
Hope this helps out.
The best advice I can think of is: don't do this yourself.
An existing open-source CMS (Content Management System) might be a good solution, rather than building one yourself. To name two leading CMS systems: Drupal and Joomla (there are MANY more options).
These systems offer many features that work either out of the box with some configuration, or via an extension plugin (there are thousands of plugins).
Internationalization is just one of them, probably with a richer and more robust feature set than you could build yourself.
Also, these systems offer an extensive API for extending them with your own business logic.
If you use ASP.NET (MVC 2 or 3), I suggest reading this article. I think it describes one of the best practices in .NET.
I am currently working on a project/website and I will need to make it available in several languages. The site was done with PHP/MySQL and a lot of JavaScript (jQuery). I have no idea where to start, and I was hoping somebody could give me some hints. I would like to hear opinions about the best approach to take, whether there are good tools for such a PHP site, and what to do with the existing scripts - or rather, with the text inside the scripts that needs to be translated as well. Has anybody done something like this before who could guide me down the right path? :)
Thanks.
There are a number of ways of tackling this. None of them is "the best way", and all of them have problems in the short term or the long term. The very first thing to say is that multilingual sites are not easy; translators are lovely people, but hard to work with, and most programmers see the problem as a technical one only. There is also another dimension, outside the scope of this answer, as to whether you are translating or localising. That involves looking at the target audience's cultural mores and then tailoring language, style, layout, colour, typeface, etc., to that culture. Finally, do not use MT (machine translation) for anything serious or anything that needs to be accurate, and when acquiring translators, ensure that they translate from a foreign language into their native language, which means they understand all the nuances of the target language.
Right, solutions. On the basis that you do not want to rewrite the site, simply clone the site you have and translate the copies into the target language. Assuming the code base is stable, you can use a VCS to manage any code changes. You can tweak individual parts of the site to fit the target language; for example, French text is on average 30% longer than the equivalent English text, so using one site to deliver this means you may (will) have formatting problems and will need to swap different CSS files in and out depending on the language. It might seem a clunky way to do it, but then how long are the sites going to exist? The management overhead of doing it this way may well be less than that of other options.
The second way, without rebuilding: replace all content in the current site with tags, put the different languages in files or DB tables, sniff the user's desired language (do you have registered users who can set a preference, do you want to use the browser's language tag, or will the URL - dot-com, dot-fr, dot-de - make the choice?), and then replace the tags with the target-language text (a bare-bones sketch follows). You then need to address the sizing issues and the image issues separately. This solution is in effect what frameworks like Symfony and Zend do to implement l10n.
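A bare-bones sketch of that tag-replacement idea: templates contain placeholders like {{welcome}}, and a per-language table fills them in. The table here is hard-coded, but it could equally come from a file or a database:

<?php
$strings = [
    'de' => ['welcome' => 'Willkommen', 'login' => 'Anmelden'],
    'en' => ['welcome' => 'Welcome',    'login' => 'Log in'],
];

function render($template, $lang, array $strings) {
    $map = [];
    foreach ($strings[$lang] as $tag => $text) {
        $map['{{' . $tag . '}}'] = $text;
    }
    return strtr($template, $map); // single pass, no re-replacement
}

echo render('<h1>{{welcome}}</h1>', 'de', $strings);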
Then you could rebuild with a framework, or with gettext, and possibly have a cleaner solution, but remember that frameworks were designed to solve other problems, not translation, and the translation component came into the framework as a partial solution, not a full one.
The big problem with all these solutions is ongoing maintenance, because you have not only a code base but also multiple language bases to maintain. Unless your all-in-one solution is really clever and effective, the ongoing task will be difficult.
I am building a website and it needs to be in 7 languages.
I was wondering if there is a good practice that can be applied to get a multilingual PHP script that is:
Easy for me
Easy for the translators
Also, what do you think: should I store it in a DB, in XML, or in a PHP file?
There are plenty of options for storing translations:
TMX: A relatively new XML format for translations. Seems to be gaining in popularity.
Gettext is another open format for translations. It has been the de facto standard for a long time.
ini files - easy to edit, very simple format
PHP files (arrays) - easy to edit for PHP programmers, good performance
CSV format - relatively simple to use.
I'd suggest you use something like Zend_Translate which supports multiple adapters and provides a basic approach to embedding translations in your application.
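For reference, a short sketch of Zend_Translate with the array adapter (ZF1-style); the key and translation here are illustrative, and the gettext, TMX, INI and CSV adapters plug in the same way:

<?php
require_once 'Zend/Translate.php';

// Adapter name, translation data, locale.
$translate = new Zend_Translate('array',
    array('welcome' => 'Willkommen'), 'de');

echo $translate->translate('welcome'); // "Willkommen"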
Contrary to daddz I would recommend against using gettext in PHP:
The locale setting is per-process. This means that when you are running PHP in-process on a multithreaded Apache or any other multithreaded web server, calling setlocale in one thread will affect the other threads.
Because you can't know which thread/process is handling which request, you'll run into awful problems with users intermittently getting the wrong locale.
The locale you set in PHP influences functions like printf or even strtotime. You will certainly get bitten by "strange" number formats arriving in your backend code if you work with gettext/setlocale (a quick demonstration follows this list).
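To make the last point concrete, a quick sketch of the LC_NUMERIC side effect; the locale name varies by system:

<?php
setlocale(LC_ALL, 'de_DE.utf8');
printf("%.2f\n", 1234.5); // prints "1234,50" -- comma, not a dot
// On PHP versions before 8.0, even a plain float-to-string cast
// was locale-sensitive and could emit "1,5" for 1.5.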
Use any of the other solutions linked to by Eran, or quickly do something yourself (PHP arrays work very nicely). Also use the intl extension, which will be in core PHP 5.3, for number and date formatting and collation.
In my experience, using gettext in a web-based solution has, over and over, proved to be quite like opening the proverbial can of worms.
I'd suggest Gettext.
It's cross-platform, open-source, widely used and available for PHP: PHP Gettext
I have built a multilingual CMS. All content was stored in a database, with main tables for common (non-language-specific) values and separate tables for the language-specific content.
For instance, imagine storing products: we have a 'products' table (containing unique_id, date created, image URLs, etc.) and a 'product_local' table (containing any language-specific fields).
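An illustrative query against that two-table layout; the exact column names are assumptions, and $db is taken to be a connected PDO instance:

<?php
$stmt = $db->prepare(
    'SELECT p.unique_id, p.image_url, l.name, l.description
       FROM products p
       JOIN product_local l ON l.product_id = p.unique_id
      WHERE l.lang = ?'
);
$stmt->execute(['de']);
$products = $stmt->fetchAll(PDO::FETCH_ASSOC);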
Using this method it is very easy to maintain content.
I have no experience with gettext, so no comment on that topic, but I have built a few multilingual sites using the following methods:
METHOD 1
I wouldn't say my format is the best, just that it's effective. I've also used arrays, depending on where the content is stored.
For example, I'll have an associative array of text with the indexes identifying which text:
$text['english']['welcome'] = "Welcome to my site. blah blah blah";
$text['english']['login'] = "Please enter your username and password to login";
And maybe set your language with a constant or config variable.
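Continuing that sketch, with LANG as an illustrative constant:

define('LANG', 'english');
echo $text[LANG]['welcome']; // "Welcome to my site. blah blah blah"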
METHOD 2
I've built two sites with identical structures and back-ends, but each used a different database and was maintained separately: data_french, data_english.
You may find this article on the topic an interesting read:
http://cubicspot.blogspot.com/2011/12/cross-platform-multilingual-support-in.html
The author advocates a "lazy programmer" strategy - do it only if you need multilingual stuff - and seems to recommend the PHP array approach with IANA language codes. The article is kind of vague though.
Check this forum thread; I think you'd probably need a different approach if you have somebody helping you with translation:
Most efficient approach for multilingual PHP website