I need to translate my site into multiple languages. I was thinking of using a database and putting the translations there.
database : translation
tables: language
column: id, english, french, german, italian, spanish
Or I was thinking about a PHP solution like:
english.php
french.php
german.php
italian.php
spanish.php
So you simply include the file you need.
Now, I can see pros and cons for both. What I want to know is: what is considered the industry standard for something like this?
You can use gettext, which is designed for this purpose. It is not a "standard", but it is fast enough.
The second option is to use a PHP file with a big array (really big, with one entry for each string); this is the most common solution.
As for the database content (the big problem here, don't forget): if all of your content must be translated, use one column per language; otherwise, use a language flag on each row in the database.
There is no industry standard. I have seen (and implemented) solutions using flat files, XML, PHP code, a database, and gettext files to store the localized strings. It's a matter of what is more suitable for you.
My go-to method for PHP is simply files containing arrays of strings, for example
en.php
return array (
'How are you?' => 'How are you?',
'Goodbye' => 'Goodbye',
);
de.php
return array (
'How are you?' => 'Wie gehts?',
'Goodbye' => 'Auf Wiedersehen',
);
This can be integrated into an application with reasonable granularity (there can be many such files, e.g. one for each component) and control (you can easily fall back to any other language if you don't find a string) and it is also very convenient to modify without need for special tools.
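The fallback behaviour mentioned above can be sketched roughly as follows. The arrays stand in for what `include 'en.php'` and `include 'de.php'` would return, and the helper name `t()` is hypothetical, not part of any library.

```php
<?php
// Stand-ins for the arrays returned by en.php and de.php (assumption:
// in a real project these would come from `$de = include 'de.php';`).
$en = [
    'How are you?' => 'How are you?',
    'Goodbye'      => 'Goodbye',
    'Welcome'      => 'Welcome',
];
$de = [
    'How are you?' => 'Wie gehts?',
    'Goodbye'      => 'Auf Wiedersehen',
    // 'Welcome' is deliberately missing to show the fallback.
];

/**
 * Hypothetical helper: look the string up in the preferred language,
 * then in the fallback language, and finally return the key itself.
 */
function t(string $key, array $preferred, array $fallback): string
{
    return $preferred[$key] ?? $fallback[$key] ?? $key;
}

echo t('Goodbye', $de, $en), "\n"; // found in the German array
echo t('Welcome', $de, $en), "\n"; // falls back to the English array
```

Because the English file maps each phrase to itself, returning the key as a last resort gives the same result as an English fallback whenever the keys are full English phrases.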
My favorite PHP framework (Yii) and a giant open source project I have worked on (Moodle) also use this approach.
Neither of the two solutions seems great to me. You should think about the long run when you choose a solution.
What if you decide to translate your website into languages other than those you planned for, such as Russian or Chinese? In the first case you have to add more and more columns; in the second you have to create more and more files. Another drawback: what if you have translated a page into Italian and Spanish but not yet into French?
I think a good approach is a database-based solution with a main language. You can do something like this:
Create a table 'page' (id, title, ...) where you store each page in the main language, as well as the translated pages
Create a table 'translation' (idsource, idtranslation, language)
Every time a page is requested, check which translations are available and offer those to the user
In database localization you have four main strategies. Each has particular advantages and disadvantages. For the long term I would definitely recommend cloning. You can see the four methods at the link below:
http://www.sisulizer.com/localization/software/server-desktop-database.shtml
There are two main things you want to be sure to implement. First, be sure you are integrating some form of translation memory. Your language vendor should be instructing you on how to do this, and probably doing it for you.
Second, for each additional language you target, your data will get at least 2x more complex. Keep this in mind as you move forward. This applies not only to your data but also to your file sets, management, etc.
Hope that helps. Let me know if you have further questions.
Russell
I have faced this problem several times while building websites.
I will explain it using PHP and Laravel as an example, but this problem is common among multiple platforms.
This was already addressed in a few questions (post1, post2, post3, post4 and some others) but the posts didn't really get a good answer.
The question is: What is the best way of structuring translated content inside of language files?
I'm currently using Laravel (I'm not mentioning the version because both Laravel 4 and Laravel 5 have similar localisation functionality, at least similar enough for the purposes of this topic).
The localisation system structures the content across language folders (en, es, de, fr...), inside which there can be multiple .php files, each containing a return statement that returns a multi-level dictionary structure.
/lang
    /en
        messages.php
    /es
        messages.php
and the files contain something like this:
<?php
return [
    'example1' => 'example message for value example-key',
    'example2' => [
        'sub-example' => 'example message for example2.sub-example',
    ],
];
and these are called like this:
// Laravel 5
trans('messages.example1'); // outputs 'example message for value example-key'
trans('messages.example2.sub-example'); // outputs 'example message for example2.sub-example'
// Laravel 4
Lang::get('messages.example1'); // outputs 'example message for value example-key'
Lang::get('messages.example2.sub-example'); // outputs 'example message for example2.sub-example'
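To make the dotted-key lookup concrete, here is a minimal sketch of how such a key can be resolved against a nested array. It is not Laravel's implementation: the helper `lookup()` is hypothetical, and it skips the first segment of a real Laravel key (the file name, `messages`), working on the contents of a single file only.

```php
<?php
// Same shape as a messages.php file: strings nested one level deep.
$messages = [
    'example1' => 'example message for example1',
    'example2' => [
        'sub-example' => 'example message for example2.sub-example',
    ],
];

/**
 * Hypothetical helper (not Laravel's implementation): walk a nested
 * array following a dot-separated key, returning the key itself when
 * any segment is missing or the result is not a string.
 */
function lookup(array $messages, string $dottedKey): string
{
    $node = $messages;
    foreach (explode('.', $dottedKey) as $segment) {
        if (!is_array($node) || !array_key_exists($segment, $node)) {
            return $dottedKey;
        }
        $node = $node[$segment];
    }
    return is_string($node) ? $node : $dottedKey;
}

echo lookup($messages, 'example2.sub-example'), "\n";
echo lookup($messages, 'example2.missing'), "\n"; // prints the key back
```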
A few methods of grouping come to mind:
by website content
example: homepage.php, page1.php, page2.php...
by logical domain:
example: auth.php, validation.php, pagination.php...
by html:
example: buttons.php, popup_messages.php, form_data.php...
by straight translation:
example: simple_words.php, phrases.php... which would then contain content like 'password-too-short' => 'your password is too short'
Some hybrid/combination of the ones mentioned before
All of these have some obvious benefits and drawbacks, and I won't go into them here. The fifth option is most likely the best solution, but there is still the problem of where to draw the line to get minimal duplication of phrases and content.
Another problem is how to handle an uppercase first character in some cases and lowercase in others, as well as punctuation at the ends of phrases.
I did research this problem, but there are no definitive guidelines and/or good examples available to learn from.
All opinions are welcome.
I tend to group functionality in my Laravel apps into self-contained ‘components’. For example, I’ve been working on email campaign functionality for an application recently so put the service provider class, models, service classes in a folder at app/Email.
Bearing this in mind, I organise my translations in a similar fashion. So even though we're not translating strings on this project, if we were I would create a resources/lang/en/email.php file and put the translated strings for the email component in there.
So in another project, my directory structure might look like this:
/resources
    /lang
        /en
            auth.php
            email.php
            events.php
            news.php
            pagination.php
            passwords.php
            validation.php
Hope this helps.
In my experience there is no reason to have different groups other than trying to use your translations somewhere else. I usually put all my project messages in a group named app and for each of my shared libraries I use a separate group name (because I might use them in other projects).
An example of a failed-login message in my website would be
trans('app.username_and_password_do_not_match')
and if it's in a third party library named Auth it would be
trans('auth.username_and_password_do_not_match')
And remember to write the full message as your message key instead of using short names (like app.login.fail). This way you don't need to check the website content for every translation.
I didn't fully understand your last problem so you might want to clarify it a bit.
I would go with option #4, so you'd have something like this:
/lang/
    /en
        messages.php
        words.php
    /fr
        messages.php
        words.php
    /de
        messages.php
        words.php
This does a few things:
It segments out everything very clearly. You know which language to find where. And you know what's in the file associated with the language.
The above makes maintenance easier in the future because you can find stuff.
It gives you files, by language, that can be translated separately.
It puts all the messages in one clearly defined place.
One thing to note is that if your app gets REALLY big and REALLY international, you may want to use full locale codes (language plus region) instead. For example, European Portuguese (pt_PT) and Brazilian Portuguese (pt_BR) are different, and with a global audience you'd probably want to cover both.
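Once regional folders such as pt_BR exist next to plain language folders, something has to decide which folder serves a given request. A minimal sketch of one possible fallback chain follows; the function names and the `$available` list are hypothetical, and the underscore-separated locale format is an assumption.

```php
<?php
/**
 * Hypothetical helper: build the list of locales to try, from most to
 * least specific, e.g. pt_BR -> pt -> en.
 */
function localeChain(string $locale, string $default = 'en'): array
{
    $chain = [$locale];
    $underscore = strpos($locale, '_');
    if ($underscore !== false) {
        $chain[] = substr($locale, 0, $underscore); // language without region
    }
    $chain[] = $default;
    return array_values(array_unique($chain));
}

/**
 * Pick the first locale in the chain that has a translation folder.
 * $available is assumed to list the folder names under /lang.
 */
function resolveLocale(string $requested, array $available, string $default = 'en'): string
{
    foreach (localeChain($requested, $default) as $candidate) {
        if (in_array($candidate, $available, true)) {
            return $candidate;
        }
    }
    return $default;
}

$available = ['en', 'fr', 'de', 'pt', 'pt_BR'];
echo resolveLocale('pt_BR', $available), "\n"; // exact match
echo resolveLocale('pt_PT', $available), "\n"; // no pt_PT folder, so pt is used
echo resolveLocale('it_IT', $available), "\n"; // nothing Italian, so the default is used
```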
As I've started building a project, there will be quite a few entries in the .po translation file. I use Poedit to build these.
My question is, what is the best practice for entries within this file? I was thinking, instead of referencing entries such as:
echo _('This is an entry.');
I was thinking of organizing them like:
echo _('error_pwd');
echo _('error_user_taken');
Which, once run through the translation file, would output something like:
Password incorrect. Please try again.
Username is already taken. Please try another.
So, all my translations can be organized by type, such as error_, msg_, status_, tip_, etc.
Has anyone seen it done this way, or have any suggestions on a more organized method?
In fact it doesn't matter!
It's just up to you.
However, I advise you not to split translations into sections.
There isn't any benefit in doing so. In fact, most projects use the one-file approach for all msgid entries.
Django does, for example.
Of course, if you still want to split translations by section, you might want to take a look at domains:
From PHP doc:
This function (textdomain()) sets the domain to search within when calls are made to
gettext(), usually the named after an application.
Also, as said earlier, the advantage of using a real phrase or word as the msgid (instead of an underscore or dotted-notation key) is that it remains as the default message if there is no translation for the entry.
And here goes some helpful links:
Django Project - i18n, Definition
PHP textdomain function
What is bindtextdomain, textdomain in gettext?
How to determine which catalog to be used
This is a standard approach in other frameworks, e.g. Symfony/Laravel:
trans('error.validation');
But it has a downside: if you forget to translate a phrase on your site, it will appear as the keyword 'error.validation'
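The difference between keyword identifiers and phrase identifiers when a translation is missing can be illustrated with a small sketch. The `tr()` function is hypothetical; it simply returns the identifier when no entry exists, which is also what gettext does with an untranslated msgid.

```php
<?php
// A German catalogue in which the validation message is not translated yet.
$de = [
    'Goodbye' => 'Auf Wiedersehen',
];

// Hypothetical lookup: return the identifier itself when no entry exists.
function tr(string $id, array $catalogue): string
{
    return $catalogue[$id] ?? $id;
}

// Keyword identifier: the visitor sees the raw key.
echo tr('error.validation', $de), "\n";
// Phrase identifier: the visitor sees readable English instead.
echo tr('Please check the highlighted fields.', $de), "\n";
```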
I am working on a website and the requirement is to make it available in two languages, i.e. Icelandic and English.
Just like Facebook, Google, and others: if a user selects a language, the site is shown in that language.
I am not allowed to use Google Translate.
Is there any other way to do this in PHP?
Thanks in advance
Well, I have never done it, but I have thought about it :). Here is how I would do something like this from scratch.
First, do not hardcode the strings that will be displayed to your clients. Create a dictionary; it can be in any format: a PHP file, an XML file, JSON. You can also extend it by adding a database. The main idea is to have a dictionary holding all the messages that will be displayed to the user, in all the languages you want to support.
If you do it using a normal PHP file, use an OOP class, say Message, add the languages you need as attributes of the class, and add some setters and getters
e.g.
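class Message
{
    private $english;
    private $french;
    // ... one property per language, set through the constructor or setters

    public function getEnglishMessage()
    {
        return $this->english;
    }
}
then in PHP, when you echo your messages, work out which language you want to use, and then
do something like this
echo $message->getEnglishMessage();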
Look, I've been very generic, now decide on the type of file that you'll use and build the dictionary
Hope it helps :-)
I use an es.php (Spanish; I'm not sure what the code for Icelandic is) and build all of the mod_rewrite rules off that. You treat it exactly the same as you would the index.php for English. For inputting data into the database, have a column for the language. All of your queries that fetch data will then have the language as a condition.
gettext is one way to go. But if you and your client have a good understanding, ask them to provide the content in the other language as well as in English. Then add a 'language' column to the DB table holding a flag such as 'ic' or 'en', and whenever you fetch data, include the language in the WHERE clause of your SQL query with the desired flag as its value.
Trying to work this out, but I don't know what's the best practice for this kind of things.
I'm working on a website using 3 languages: English, French & Dutch. There are categories on the website and the category names are different for the 3 languages.
For example:
Stars -> English
Sterren -> Dutch
Stars -> French
So I was thinking about adding them to the database. It's also easier for me to add more categories later if needed.
Now I'm facing the problem how to do this. My solution is:
**Cat_lang (category languages)**
cat_lang_id
language
**Categories**
categories_id
cat_lang_id
cat_title
Using cat_lang_id I can link both tables to get the language I need.
Is this the best solution for this problem?
Thanks in advance.
So that you can expand your website more easily in the future, I don't recommend having a cat_lang table. Stick with a languages table that contains language_id and language_name, and have your categories table point to it. Doing it that way allows you to have other entity types in your database (e.g. articles) that also contain multiple languages.
This is a flexible and reasonable solution. You see the same type of design in large scale ERP systems that have to handle dozens of languages and the possibility of more being added at any time.
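A minimal sketch of the languages/categories layout described above. It assumes the pdo_sqlite extension and uses an in-memory database so that it runs on its own. The composite primary key and the `categoryTitles()` helper are illustrative assumptions, not something the answers specify.

```php
<?php
// Assumes the pdo_sqlite extension; an in-memory database keeps the
// sketch self-contained. Details beyond the table and column names
// mentioned in the thread are illustrative.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE languages (
    language_id   INTEGER PRIMARY KEY,
    language_name TEXT NOT NULL
)');
$pdo->exec('CREATE TABLE categories (
    categories_id INTEGER NOT NULL,
    language_id   INTEGER NOT NULL REFERENCES languages (language_id),
    cat_title     TEXT NOT NULL,
    PRIMARY KEY (categories_id, language_id)
)');

$pdo->exec("INSERT INTO languages VALUES (1, 'English'), (2, 'Dutch'), (3, 'French')");
$pdo->exec("INSERT INTO categories VALUES (1, 1, 'Stars'), (1, 2, 'Sterren'), (1, 3, 'Stars')");

/** Fetch all category titles for one language, keyed by category id. */
function categoryTitles(PDO $pdo, string $languageName): array
{
    // The language is bound as a parameter, never concatenated into the SQL.
    $stmt = $pdo->prepare(
        'SELECT c.categories_id, c.cat_title
           FROM categories c
           JOIN languages l ON l.language_id = c.language_id
          WHERE l.language_name = :name
          ORDER BY c.categories_id'
    );
    $stmt->execute([':name' => $languageName]);
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}

print_r(categoryTitles($pdo, 'Dutch'));
```

Other entity types (articles, for example) could reference the same languages table in the same way.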
If I were doing a website in multiple languages, I would use Zend_Translate to do the translations. Basically, you create a Zend_Translate object which reads in data files. Then you make calls on that object to translate() giving it the english version and it will give the translation in the correct language. Zend_Translate will scan your source and find all references to requested translations which will make files that can be translated by hand.
You are going to have much more than just the category names to translate, so I would recommend an approach like this where you just read in the translate file.
If you don't plan a massive-scale website and don't plan to grow to 100 languages, you can use a simpler and 'less nice' solution: have only one table of categories, where you hard-code the language code in the column name, for instance:
**Categories**
categories_id
cat_title_fr
cat_title_en
cat_title_de
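Then in your code you set a $language_code variable at the beginning of each page using an include. You can even analyze the domain name in the $_SERVER variables to assign the correct language, and fall back to a default of your choice (if you leave the variable empty, your queries will return no text).
You then generate your queries like this (the old mysql_* functions were removed in PHP 7, so this assumes a PDO connection in $pdo):
$lang = in_array($language_code, array('fr', 'en', 'de'), true) ? $language_code : 'en'; // whitelist: column names cannot be bound as parameters
$result = $pdo->query("SELECT cat_title_" . $lang . " FROM categories");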
Yeah it is dirty because you hard code the language in your DB structure, but if you have the exact same categories in each language with just a translation of the name, it is simple to implement.
Besides, to add a language you just need to add a field to your table with the new translation; for instance, Spanish would be
cat_title_es
I'm developing a website in PHP and I'd like to let the user switch easily between German and English.
So, a translation strategy must be considered:
Should I store the data and its translations in a database table, e.g. (1, "Hello", "Hallo"), (2, "Good morning", "Guten Morgen"), etc.?
Or should I use the ".mo" Files to store it?
Which way is the best?
What are the pros and the cons?
After having just tackled this myself recently (12 languages and counting) on a production system and having run into some major performance issues along the way I would suggest a hybrid system.
1) Store the language strings and translations in a database--this will make it easy to interact with/update/remove items plus will be part of your normal backup routines.
2) Cache the languages into flat files on the server and draw those out as necessary to display on the page.
The benefits here are many--mostly it is fast! I am not dealing with connection overhead for MySQL or any traffic slowdowns during the transfer. (especially important if your DB server is not localhost).
This will also make it very easy to use. Store the data from your database in the file as a php serialized array and GZIP the contents of the file to shrink storage overhead (this also makes it faster in my benchmarking).
Example:
$lang = array(
    'hello' => 'Hallo',
    'good_morning' => 'Guten Morgen',
    'logout_message' => 'We are sorry to see you go, come again!'
);
$storage_lang = gzcompress( serialize( $lang ) );
// WRITE THIS INTO A FILE SUCH AS 'my_page.de'
When a user loads your system for the first time do a file_exists('/files/languages/my_page.de'). If the file exists then load the content, un-gzip, and un-serialize and it is ready to go.
Example
$file_contents = file_get_contents( 'my_page.de' );
$lang = unserialize( gzuncompress( $file_contents ) );
As you can see you can make the caching specific to each page in the system keeping the overhead even smaller and use the file extension to denote language... (my_page.en, my_page.de, my_page.fr)
If the file DOESN'T exist then query the DB, build your array, serialize it, gzip it and write the missing file--at the same time you have just constructed the array that the page needed so continue on to display the page and everyone is happy.
Finally, this allows you to build in update pages accessible to non-programmers but you also control when changes appear by deciding when to remove cache files so they can be rebuilt by the system.
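The whole read-through flow (use the cache file if it exists, otherwise query the database and write the file) could look roughly like this. It is a sketch under assumptions: `loadLanguage()` is a hypothetical name, the database query is replaced by a callable so the example runs on its own, and the zlib extension is required for gzcompress().

```php
<?php
/**
 * Return the strings for one page and language, reading the compressed
 * cache file when it exists and rebuilding it otherwise.
 * $loadFromDb is a hypothetical callable standing in for the real
 * database query; it must return an array of key => translation.
 */
function loadLanguage(string $cacheDir, string $page, string $lang, callable $loadFromDb): array
{
    $file = $cacheDir . '/' . $page . '.' . $lang;

    if (is_file($file)) {
        $raw = file_get_contents($file);
        $unpacked = $raw === false ? false : @gzuncompress($raw);
        if ($unpacked !== false) {
            $strings = unserialize($unpacked, ['allowed_classes' => false]);
            if (is_array($strings)) {
                return $strings; // cache hit
            }
        }
    }

    // Cache miss (or unreadable file): rebuild from the database.
    $strings = $loadFromDb($page, $lang);
    file_put_contents($file, gzcompress(serialize($strings)), LOCK_EX);
    return $strings;
}

// Usage with a fake "database" so the sketch is self-contained.
$cacheDir = sys_get_temp_dir() . '/lang_cache_' . uniqid();
mkdir($cacheDir);
$queries = 0;
$fakeDb = function (string $page, string $lang) use (&$queries): array {
    $queries++;
    return ['hello' => 'Hallo', 'good_morning' => 'Guten Morgen'];
};

$first  = loadLanguage($cacheDir, 'my_page', 'de', $fakeDb); // builds the cache file
$second = loadLanguage($cacheDir, 'my_page', 'de', $fakeDb); // reads the cache file
echo $second['hello'], ' (database queried ', $queries, " time(s))\n";
```

Deleting the cache file is then all it takes to make the next request rebuild it from the database.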
Warnings and Pitfalls
When I kept everything in the database directly we hit some MAJOR slowdowns when our traffic spiked.
Trying to keep them in flat-file arrays only was so much trouble because updates were painful and prone to errors.
Not GZIP compressing the contents of the cache files made the language system about 20% slower in my benchmarks.
Make sure all of your database fields containing languages use a UTF-8 collation such as utf8_general_ci (or at least one of the UTF-8 options; I find general_ci best for my use). If you don't, you will not be able to store non-Latin scripts in your database (like Chinese, Japanese, etc.). In MySQL, utf8mb4 covers the full Unicode range, whereas plain utf8 is limited to three bytes per character.
Extension:
In response to a comment below, be sure to set your database tables up with page level language strings in mind.
id  string        page         global
1   hello         NULL         1
2   good_morning  my_page.php  0
Anything that shows up in headers or footers can have a global flag that will be queried in every cache file created, otherwise query them by page to keep your system responsive.
PHP arrays are indeed the fastest way to load translations. However, you really don't want to update these files by hand in an editor. This might work in the beginning, and for one or two languages, but when your site grows this gets really hard to maintain.
I advise you to setup a few simple tables in a database where you keep the translations, and build a simple app that lets you update the translations (some forms to add and update texts). As for the database: use one table to store translation variables; use another to link translations to these variables.
Example:
`text`
id  variable
1   hello
2   bye
`text_translations`
id  textId  language  translation
1   1       en        hello
2   1       de        hallo
3   2       en        bye
4   2       de        tschüss
So what you do is:
create the variable in the first table
add translations for it in the second table (in whatever language you want)
After you've updated the translations, create/update a language file for each language that you're using:
select the variables you need and its translation (tip: use English if there's no translation)
create a big array with all this stuff, e.g.:
$texts = array('hello' => 'hallo', 'bye' => 'tschüss');
write the array to a file, e.g.:
file_put_contents('de.php', serialize($texts));
in your PHP/HTML create the array from the file (based on selected language by user), e.g.:
$texts = unserialize(file_get_contents('de.php'));
in your PHP/HTML use the variables, e.g.:
<h1><?php echo $texts['hello']; ?></h1>
or if you like/enabled PHP short tags:
<p><?=$texts['bye'];?></p>
This setup is very flexible, and with a few forms to update the translations it's easy to keep your site up to date in multiple languages.
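The export step, including the "use English if there's no translation" tip, can be sketched as below. Two things here are assumptions rather than the answer's method: the rows are hard-coded instead of being fetched from `text` and `text_translations`, and the file is written with var_export() as a PHP file that returns an array (instead of serialize()), so it can be loaded with a plain include. The function names are hypothetical.

```php
<?php
// Rows as they might come back from joining `text` and
// `text_translations` (the join itself is omitted here).
$rows = [
    ['variable' => 'hello', 'language' => 'en', 'translation' => 'hello'],
    ['variable' => 'hello', 'language' => 'de', 'translation' => 'hallo'],
    ['variable' => 'bye',   'language' => 'en', 'translation' => 'bye'],
    // 'bye' has no German translation yet.
];

/** Build variable => translation for $lang, using $fallback to fill gaps. */
function buildTexts(array $rows, string $lang, string $fallback = 'en'): array
{
    $texts = [];
    foreach ($rows as $row) {
        if ($row['language'] === $fallback) {
            $texts[$row['variable']] = $row['translation'];
        }
    }
    foreach ($rows as $row) {
        if ($row['language'] === $lang) {
            $texts[$row['variable']] = $row['translation']; // overrides the fallback
        }
    }
    return $texts;
}

/** Write the array as a PHP file that returns it, so it can be included. */
function writeLanguageFile(string $path, array $texts): void
{
    $code = "<?php\nreturn " . var_export($texts, true) . ";\n";
    file_put_contents($path, $code, LOCK_EX);
}

$path = sys_get_temp_dir() . '/de_' . uniqid() . '.php';
writeLanguageFile($path, buildTexts($rows, 'de'));

$texts = include $path;
echo $texts['hello'], ' / ', $texts['bye'], "\n";
```

A file written this way is ordinary PHP source, so an opcode cache can keep it compiled between requests.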
I'd also suggest Zend Framework Zend_Translate package.
The manual gives a good overview on How to decide which translation adapter to use. Even when not using ZF, this will give you some ideas about what is out there and what the pros and cons are.
Adapters for Zend_Translate
Array: Use PHP arrays. Small pages; simplest usage; only for programmers.
Csv: Use comma separated (.csv/.txt) files. Simple text file format; fast; possible problems with unicode characters.
Gettext: Use binary gettext (*.mo) files. GNU standard for linux; thread-safe; needs tools for translation.
Ini: Use simple ini (*.ini) files. Simple text file format; fast; possible problems with unicode characters.
Tbx: Use termbase exchange (.tbx/.xml) files. Industry standard for inter application terminology strings; XML format.
Tmx: Use tmx (.tmx/.xml) files. Industry standard for inter application translation; XML format; human readable.
Qt: Use qt linguist (*.ts) files. Cross platform application framework; XML format; human readable.
Xliff: Use xliff (.xliff/.xml) files. A simpler format as TMX but related to it; XML format; human readable.
XmlTm: Use xmltm (*.xml) files. Industry standard for XML document translation memory; XML format; human readable.
There are some factors you should consider.
Will the website be updated frequently? If yes, by whom: you or the owner? How much data/information are you dealing with? And are you doing this frequently (for many clients)?
I can hardly imagine that using a relational database would cause any serious speed impact unless you have VERY high traffic (several hundreds of thousands of pageviews per day).
Should you be doing this frequently (for lots of clients) think no further: build up a CMS (or use an existing one). If you really need to consider speed impact, you can customize it so that when you are done with the website you can export static HTML pages where possible.
If you are updating frequently, the same as above applies.
If the client has to update (and not you), again, you need a CMS.
If you are dealing with lots of information (many large articles), you need a CMS.
All in all, a CMS will help you build up your website structure fast, add content fast and not worry that much about code since it will be reusable.
Now, if you just need to create a small website fast, you can easily do this with hardcoded arrays and datafiles.
If you need to provide a web interface for adding/editing translations, then a database is a good idea.
If, however, your translations are static, I would use gettext or even plain PHP array.
Either way you can take advantage of Zend_Translate.
Small comparison, the first two from Zend tutorial:
Plain PHP arrays: Small pages; simplest usage; only for programmers.
Gettext: GNU standard for linux; thread-safe; needs tools for translation.
Database: Dynamic; Worst performance.
I would recommend PHP arrays, they can be built around a GUI for easy access.
Keep in mind that most people who use computers already know the common English terms found on websites, such as About Us, Home, Send, Delete, Read More, etc. The question is: do these really need to be translated?
Honestly, translating those words is often less about being 'required' and more about 'style'.
If translation really is wanted, then for common words that will never change it is better to use a PHP file that returns a language array, for just the local language and English. For content such as blog posts, news, and descriptions, use a database and store as many translations as are required. You have to translate these manually.
As for relying on Google Translate: think it over a thousand times first, at least for this decade.