How to efficiently translate website content with php, keeping it SEO friendly? - php

This is the scenario:
I have a website that I'll translate and eventually optimize for SEO.
Which method is best for translating the content (menu links, about 10 articles, alt tags, title tags, meta tags, html lang, etc.) while keeping it easily indexed by Google, Bing, Yandex and other search engines?
My first idea is to use a PHP translate function backed by arrays I write myself (I already have a prototype of it) that takes the content and displays it in the user's language.
Is this the right path? The problem is that I want a dynamic system that lets me add a new language in the future.
Maybe MySQL is the right choice?
The website doesn't use a CMS; I built it myself in PHP, though I have no problem relying on MySQL if I need to.
Thank you in advance :)

You've basically got 3 choices and there are pros and cons to each:
1: as Dainis Abols suggests, chuck it in the database - depending on how your server is set up this could be the slowest, most system-heavy route (it's all relative though, it's unlikely to make any difference unless you're getting millions of views an hour).
2: use PHP library files; I tend to use library files for small, single items like field labels (forename, surname etc) and store larger things like CMS-managed HTML in the database... this reduces the database calls but adds a small overhead for each dictionary you load into a script, <?php $this->page->dictionary->product = Dictionary::load("product"); ?> sort of thing.
3: finally, I personally think it's worth taking a look at PHP's implementation of gettext, though you'll need something like Poedit to maintain the PO files (the plain-text translation sources, which are compiled into binary MO files). This lets you maintain translations very rapidly, as you just enter the text in your PHP document by wrapping it in a simple underscore function:
e.g. <?= _("Hello World"); ?>
You then maintain the translations in PO files and deploy the compiled MO files - it's very efficient (potentially faster than doing it with native PHP files), however it does have some drawbacks when it comes to the nuances of natural language.
As an example, if you have a field label "title" <?= _("Title"); ?> then all instances of _("Title") will be translated in the same way.
This means you can't use "Title" as both a form label for a person's title and as the title of a book; for instance, in German, you may want to use Anrede for one "Title" and Titel for the other.
Although, to really use gettext you'd probably need to be running your own server - it can require an Apache restart when you change the compiled translation files :\
As for search engines, they read the output from your code, so it doesn't really make a lot of difference which method you use to perform the translations. Ideally, though, you want to keep the URLs RESTful: whether you're including PHP dictionaries, calling the database or using gettext (or changing your mind from one to another later), you can map the language to the URL with something like http://www.mysite.com/en_gb/widgets and change how the program works without changing the URLs.
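To make the URL idea concrete, here is a minimal sketch of pulling the locale out of a path like /en_gb/widgets. The function name split_locale_from_path() and the list of supported locales are made up for illustration; they are not part of any framework.

```php
<?php
// Hypothetical helper: split a request path such as "/en_gb/widgets"
// into a locale and the remaining route. The supported list is an assumption.
function split_locale_from_path(string $path, array $supported, string $default): array
{
    $segments  = explode('/', trim($path, '/'), 2);
    $candidate = strtolower($segments[0]);

    if (in_array($candidate, $supported, true)) {
        // Known locale prefix: strip it and route on the rest.
        return [$candidate, '/' . ($segments[1] ?? '')];
    }

    // No recognised prefix: keep the whole path and use the default locale.
    return [$default, '/' . trim($path, '/')];
}

[$locale, $route] = split_locale_from_path('/en_gb/widgets', ['en_gb', 'de_de'], 'en_gb');
echo $locale, ' ', $route, PHP_EOL;
```

Whatever storage you pick (arrays, database, gettext) then only needs the resolved locale, so the URLs can stay the same if you switch later.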

Store all texts in a database, with one field per language:
+----+---------+---------+
| id | text_en | text_de |
+----+---------+---------+
|  1 | English | Deutsch |
+----+---------+---------+
Now, when the user switches languages, just use the field for that language:
$lang = 'en';
$query = "SELECT text_".$lang." FROM texts WHERE id = 1";
Something like that. All your client-side texts are stored in the db at all times, so your output will look like:
<div id="header"><?=get_db_text_for_id(1)?></div>
Of course, you need precautions and some more fields, but that's the general idea.
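One of those precautions, sketched under the assumption that the table looks like the one above (texts with text_en and text_de columns): column names cannot be bound as query parameters, so the language code should be checked against a fixed list before it goes into the SQL string. The names SUPPORTED_LANGUAGES and build_text_query() are invented for this example.

```php
<?php
// Assumed table layout from the answer above: texts(id, text_en, text_de).
const SUPPORTED_LANGUAGES = ['en', 'de'];

function build_text_query(string $lang): string
{
    // Column names cannot be bound as parameters, so validate against a whitelist.
    if (!in_array($lang, SUPPORTED_LANGUAGES, true)) {
        $lang = 'en'; // fall back to a default language
    }

    // The id stays a named placeholder, to be bound by a prepared statement.
    return "SELECT text_{$lang} AS translated FROM texts WHERE id = :id";
}

echo build_text_query('de'), PHP_EOL;
```

The returned string is meant to be passed to a prepared statement (for example PDO::prepare()) with :id bound at execution time.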

Related

Best way to organize your localized translation file

As I've started building a project, there will be quite a few entries in the .po translation file. I use Poedit to build these.
My question is, what is the best practice for entries within this file? I was thinking, instead of referencing entries such as:
echo _('This is an entry.');
I was thinking of organizing them like:
echo _('error_pwd');
echo _('error_user_taken');
Which, once run through the translation file, would output something like:
Password incorrect. Please try again.
Username is already taken. Please try another.
So, all my translations can be organized by type, such as error_, msg_, status_, tip_, etc.
Has anyone seen it done this way, or have any suggestions on a more organized method?
In fact, it doesn't matter!
It's just up to you.
However, I advise you not to split translations into sections.
There's no benefit in doing so. In fact, most projects use a single file for all msgid entries.
Django, for example.
Of course, if you still want to split translations into sections, you might want to take a look at domains:
From PHP doc:
This function (textdomain()) sets the domain to search within when calls are made to
gettext(), usually the named after an application.
Also, as said earlier, the advantage of using a real phrase or word as the msgid (instead of an underscored or dotted key) is that it serves as the default message when there is no translation for the entry.
And here are some helpful links:
Django Project - i18n, Definition
PHP textdomain function
What is bindtextdomain, textdomain in gettext?
How to determine which catalog to be used
This is the standard approach in other frameworks, e.g. Symfony/Laravel:
trans('error.validation');
But it has a downside: if you forget to translate a phrase on your site, it will show up as the raw key 'error.validation'.
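One way to catch forgotten keys before they reach the page is to compare catalogs. This is only a sketch with plain arrays; the catalogs and the function name missing_translations() are illustrative and not part of gettext, Symfony or Laravel.

```php
<?php
// Hypothetical catalogs keyed by message id; the keys and strings are illustrative.
$en = [
    'error.validation' => 'Please check the form.',
    'error.pwd'        => 'Password incorrect. Please try again.',
];
$de = [
    'error.pwd' => 'Passwort falsch. Bitte erneut versuchen.',
];

// Return the ids present in the reference catalog but absent from the target.
function missing_translations(array $reference, array $target): array
{
    return array_keys(array_diff_key($reference, $target));
}

print_r(missing_translations($en, $de));
```

Run as part of a build or test step, a check like this flags untranslated keys without changing how the keys are organized.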

translating a website in php or mysql?

I need to translate my site into multiple languages. I was thinking of using a database called language and putting the translations there.
database : translation
tables: language
column: id, english, french, german, italian, spanish
Or I was thinking about a PHP solution like:
english.php
french.php
german.php
italian.php
spanish.php
So you simply include the file you need.
Now, I can see pros and cons for both; what I want to know is what is considered the industry standard for something like this?
You can use gettext; it is designed for exactly this. It's not a "standard", but it is fast enough.
The second option is a PHP file with a big array (really big, one entry for each string); this is the most common solution.
As for database content (the big problem here, don't forget): if all your content must be translated, use one column per language; otherwise use a language flag on each row.
There is no industry standard. I have seen (and implemented) solutions using flat files, XML, PHP code, a database, and gettext files to store the localized strings. It's a matter of what is more suitable for you.
My go-to method for PHP is simply files containing arrays of strings, for example
en.php
return array (
'How are you?' => 'How are you?',
'Goodbye' => 'Goodbye',
);
de.php
return array (
'How are you?' => 'Wie gehts?',
'Goodbye' => 'Auf wiedersehen',
);
This can be integrated into an application with reasonable granularity (there can be many such files, e.g. one for each component) and control (you can easily fall back to any other language if you don't find a string) and it is also very convenient to modify without need for special tools.
My favorite PHP framework (Yii) and a giant open source project I have worked on (Moodle) also use this approach.
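A minimal sketch of the fallback idea, assuming the per-language files have already been loaded into one array (in practice each entry would come from something like include 'de.php'). The translate() helper is a made-up name, not the Yii or Moodle API.

```php
<?php
// Stand-ins for the arrays that en.php and de.php would return via include.
$catalogs = [
    'en' => ['How are you?' => 'How are you?', 'Goodbye' => 'Goodbye'],
    'de' => ['How are you?' => 'Wie gehts?'], // 'Goodbye' not translated yet
];

function translate(array $catalogs, string $lang, string $text, string $fallback = 'en'): string
{
    // Try the requested language, then the fallback language,
    // and finally return the source text unchanged.
    return $catalogs[$lang][$text] ?? $catalogs[$fallback][$text] ?? $text;
}

echo translate($catalogs, 'de', 'How are you?'), PHP_EOL; // found in de
echo translate($catalogs, 'de', 'Goodbye'), PHP_EOL;      // falls back to en
```

Because the source phrase doubles as the key, a string missing from every file still renders as readable text.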
Neither of the two solutions seems great to me. You should think about the long run when choosing a solution.
What if you later decide to translate your website into languages you hadn't planned for, such as Russian or Chinese? In the first case you have to add more and more columns; in the second you have to create more and more files. Another drawback: what if a page has been translated into Italian and Spanish but not yet into French?
I think a good approach is a database-based solution with a main language. You can do something like this:
Create a table 'page' (id, title, ...) where you store the page in the main language, and the translated pages too
Create a table 'translation' (idsource, idtranslation, language)
On every request, check which translations are available and offer those to the user
In database localization you have four main strategies. Each has particular advantages and disadvantages. For the long term I would definitely recommend cloning. You can see the four methods at the link below:
http://www.sisulizer.com/localization/software/server-desktop-database.shtml
There are two main ideas you want to be sure to be implementing. The first, be sure you are integrating some form of translation memory. Your language vendor should be instructing you on how to do this and probably doing it for you.
The second, for each additional language you target, your data will get at least 2x more complex. Keep this in mind as you move forward. Not only your data, but your file sets, management, etc.
Hope that helps. Let me know if you have further questions.
Russell

Best way to build a multilingual site with codeigniter?

My current approach has been to use the _remap function provided by CodeIgniter to get the URI segment and check whether the language is "en" or "np".
Here is a sample:
function _remap($url_title){
$this->_identify_language($this->uri->segment(1));
$data ['sub_categories'] = $this->category_model->get_category_list_by_url($url_title)->result_array();
$data ['news'] = $this->news_model->get_news_list_by_url($url_title)->result_array();
$data ['url_title'] = $url_title;
$this->_render_front_view('main',$data);
}
I am using this technique in every controller, which is not very efficient.
I wanted to ask whether using sessions to store language codes would be better, or is my current technique good enough?
Are there any other ways I can do this multilingual thing?
My database is currently shaped for 2 languages and I have separated the fields, e.g. title_en, title_np. These are echoed according to the language in use.
Lots of parts to this.
Your URLs do not really need to be /en/ and /fr/ unless you want that for Google Analytics. Spidering doesn't make a lot of difference. Accept-Language headers can be just as reliable.
Globally parse this URL segment. You can use this method or the Accept-Language, but either way you need a hook, a MY_Controller or extend the Lang class.
Think about whether you want the different languages to be totally separate. For example, if I have an English page not translated to French, and the French page does not exist, should it show the English page or 404? You can store lang = fr in the database and take the value from a constant set in the hook/MY_Controller/etc.
WHERE lang = CURRENT_LANGUAGE
Structure your DB. title_en title_fr is one method, but it soon becomes unmanageable with lots of languages. Have a "pages" and "page_content" table, so that all generic information is in one table and all language-specific information (title, content, meta, etc.) is in the page_content table, which has a lang field.
There are a million ways to do all of this, but there is lots more to think about than just the URL. My favourite
I have been using this internationalization library for codeigniter and I find it suits my needs pretty well.
It extends the Lang class, and then in the constructor it parses the URI to figure out which language to use. So it is just loaded before you use any language files. You don't need to add any code to your controllers. It simply changes the setting in the language object. So you can retrieve the current language the same as you normally would:
$this->lang->lang();
If you have 500 news items and 2 languages, adding a URL prefix at the root will give you 1000 links. Say "/en/hello-world" and "/np/hello-world" have identical content and possibly the same title; that can be bad from an SEO perspective. I would use sessions or cookies to store the preference, to preserve link juice.
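For the Accept-Language route mentioned in the answers above, a rough sketch in plain PHP (not CodeIgniter API; the function name negotiate_language() is invented here, and matching only on the primary subtag is a simplification):

```php
<?php
// Pick the best supported language from an Accept-Language header value.
function negotiate_language(string $header, array $supported, string $default): string
{
    $weighted = [];
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', trim($part));
        $tag    = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue;
        }

        // Quality defaults to 1.0 when no "q=" parameter is given.
        $q = 1.0;
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }

        // Compare on the primary subtag only, e.g. "en-GB" becomes "en".
        $primary = explode('-', $tag)[0];
        if (!isset($weighted[$primary]) || $q > $weighted[$primary]) {
            $weighted[$primary] = $q;
        }
    }

    arsort($weighted); // highest quality first

    foreach ($weighted as $lang => $q) {
        if ($q > 0 && in_array($lang, $supported, true)) {
            return $lang;
        }
    }

    return $default;
}

echo negotiate_language('fr-CH, fr;q=0.9, np;q=0.95, en;q=0.8', ['en', 'np'], 'en'), PHP_EOL;
```

In a real request the header would come from $_SERVER['HTTP_ACCEPT_LANGUAGE'], and a language chosen explicitly via the URL, session or cookie would normally take priority over it.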

Categories in different languages (PHP)

Trying to work this out, but I don't know what's the best practice for this kind of things.
I'm working on a website using 3 languages: English, French & Dutch. There are categories on the website and the category names are different for the 3 languages.
For example:
Stars -> English
Sterren -> Dutch
Stars -> French
So I was thinking about adding them to the database. It's also easier for me to add more categories later if needed.
Now I'm facing the problem how to do this. My solution is:
**Cat_lang (category languages)**
cat_lang_id
language
**Categories**
categories_id
cat_lang_id
cat_title
Using cat_lang_id I can link both tables to get the language I need.
Is this the best solution for this problem?
Thanks in advance.
So that you can expand your website more easily in the future, I don't recommend having a cat_lang table. Stick with a languages table that contains language_id and language_name, and have your categories table point to it. Doing it that way allows you to have other entity types in your database (e.g. articles) that also contain multiple languages.
This is a flexible and reasonable solution. You see the same type of design in large scale ERP systems that have to handle dozens of languages and the possibility of more being added at any time.
If I were doing a website in multiple languages, I would use Zend_Translate to do the translations. Basically, you create a Zend_Translate object which reads in data files. Then you make calls on that object to translate() giving it the english version and it will give the translation in the correct language. Zend_Translate will scan your source and find all references to requested translations which will make files that can be translated by hand.
You are going to have much more than just the category names to translate, so I would recommend an approach like this where you just read in the translate file.
If you don't plan a massive-scale website and don't expect to grow to 100 languages, you can use a simpler, 'less nice' solution: a single categories table where you hard-code the language code in the column name, for instance:
**Categories**
categories_id
cat_title_fr
cat_title_en
cat_title_de
Then in your code you set a $language_code variable at the beginning of each page using an include. You can even inspect the domain name in the $_SERVER variables to assign the correct language, and fall back to a default of your choice (if you leave the variable empty your queries will return no text).
You then generate your queries like this:
// $db is an open mysqli connection (mysql_query() was removed in PHP 7); check $language_code against your known languages first
mysqli_query($db, "SELECT cat_title_".$language_code." FROM categories");
Yes, it is dirty because you hard-code the language in your DB structure, but if you have the exact same categories in each language with just a translation of the name, it is simple to implement.
Besides, to add a language you just need to add a column to your table with the new translation; for instance, Spanish would be
cat_title_es

What is the best way to put a translation system in php website?

I'm developing a website in PHP and I'd like to let the user switch between German and English easily.
So a translation strategy must be chosen:
Should I store the data and its translations in a database table ((1, "Hello", "hallo"), (2, "Good morning", "Guten Tag"), etc.)?
Or should I use the ".mo" Files to store it?
Which way is the best?
What are the pros and the cons?
After having just tackled this myself recently (12 languages and counting) on a production system and having run into some major performance issues along the way I would suggest a hybrid system.
1) Store the language strings and translations in a database--this will make it easy to interact with/update/remove items plus will be part of your normal backup routines.
2) Cache the languages into flat files on the server and draw those out as necessary to display on the page.
The benefits here are many--mostly it is fast! I am not dealing with connection overhead for MySQL or any traffic slowdowns during the transfer. (especially important if your DB server is not localhost).
This will also make it very easy to use. Store the data from your database in the file as a php serialized array and GZIP the contents of the file to shrink storage overhead (this also makes it faster in my benchmarking).
Example:
$lang = array(
'hello' => 'Hallo',
'good_morning' => 'Guten Tag',
'logout_message' => 'We are sorry to see you go, come again!'
);
$storage_lang = gzcompress( serialize( $lang ) );
// WRITE THIS INTO A FILE SUCH AS 'my_page.de'
When a user loads your system for the first time do a file_exists('/files/languages/my_page.de'). If the file exists then load the content, un-gzip and unserialize it, and it is ready to go.
Example
$file_contents = file_get_contents( '/files/languages/my_page.de' );
$lang = unserialize( gzuncompress( $file_contents ) );
As you can see you can make the caching specific to each page in the system keeping the overhead even smaller and use the file extension to denote language... (my_page.en, my_page.de, my_page.fr)
If the file DOESN'T exist then query the DB, build your array, serialize it, gzip it and write the missing file--at the same time you have just constructed the array that the page needed so continue on to display the page and everyone is happy.
Finally, this allows you to build in update pages accessible to non-programmers but you also control when changes appear by deciding when to remove cache files so they can be rebuilt by the system.
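Put together, the hit-or-rebuild flow described above might look roughly like this. It is a sketch, not the production code: load_language() is a made-up name, the $fetchFromDb callback stands in for the real database query, and the zlib extension is assumed to be available for gzcompress()/gzuncompress().

```php
<?php
// $fetchFromDb is a stand-in for the real database query (an assumption).
function load_language(string $cacheFile, callable $fetchFromDb): array
{
    if (is_file($cacheFile)) {
        $raw  = file_get_contents($cacheFile);
        $data = ($raw === false) ? false : @gzuncompress($raw);
        if ($data !== false) {
            // Only plain arrays are expected, so refuse to instantiate objects.
            $lang = unserialize($data, ['allowed_classes' => false]);
            if (is_array($lang)) {
                return $lang; // cache hit
            }
        }
    }

    // Cache miss or unreadable file: rebuild from the database and store it.
    $lang = $fetchFromDb();
    file_put_contents($cacheFile, gzcompress(serialize($lang)), LOCK_EX);

    return $lang;
}

// Usage with a temporary path; a real setup would use its own cache directory.
$cacheFile = sys_get_temp_dir() . '/my_page.de';
$lang = load_language($cacheFile, function (): array {
    return ['hello' => 'Hallo', 'good_morning' => 'Guten Tag'];
});
echo $lang['hello'], PHP_EOL;
```

Deleting the cache file is then all it takes to publish updated translations, which matches the "you control when changes appear" point.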
Warnings and Pitfalls
When I kept everything in the database directly we hit some MAJOR slowdowns when our traffic spiked.
Trying to keep them in flat-file arrays only was so much trouble because updates were painful and prone to errors.
Not GZIP compressing the contents of the cache files made the language system about 20% slower in my benchmarks.
Make sure all of your database fields containing languages use a UTF-8 collation such as utf8_general_ci (I find general_ci best for my use; on current MySQL the utf8mb4 variants cover the full Unicode range). If you don't, you will not be able to store non-Latin scripts (Chinese, Japanese, etc.) in your database.
Extension:
In response to a comment below, be sure to set your database tables up with page level language strings in mind.
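+----+--------------+-------------+--------+
| id | string       | page        | global |
+----+--------------+-------------+--------+
| 1  | hello        | NULL        | 1      |
| 2  | good_morning | my_page.php | 0      |
+----+--------------+-------------+--------+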
Anything that shows up in headers or footers can have a global flag that will be queried in every cache file created, otherwise query them by page to keep your system responsive.
PHP arrays are indeed the fastest way to load translations. However, you really don't want to update these files by hand in an editor. This might work in the beginning, and for one or two languages, but when your site grows this gets really hard to maintain.
I advise you to setup a few simple tables in a database where you keep the translations, and build a simple app that lets you update the translations (some forms to add and update texts). As for the database: use one table to store translation variables; use another to link translations to these variables.
Example:
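`text`
+----+----------+
| id | variable |
+----+----------+
| 1  | hello    |
| 2  | bye      |
+----+----------+
`text_translations`
+----+--------+----------+-------------+
| id | textId | language | translation |
+----+--------+----------+-------------+
| 1  | 1      | en       | hello       |
| 2  | 1      | de       | hallo       |
| 3  | 2      | en       | bye         |
| 4  | 2      | de       | tschüss     |
+----+--------+----------+-------------+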
So what you do is:
create the variable in the first table
add translations for it in the second table (in whatever language you want)
After you've updated the translations, create/update a language file for each language that you're using:
select the variables you need and their translations (tip: use English if there's no translation)
create a big array with all this stuff, e.g.:
$texts = array('hello' => 'hallo', 'bye' => 'tschüss');
write the array to a file, e.g.:
file_put_contents('de.php', serialize($texts));
in your PHP/HTML create the array from the file (based on selected language by user), e.g.:
$texts = unserialize(file_get_contents('de.php'));
in your PHP/HTML use the variables, e.g.:
<h1><?php echo $texts['hello']; ?></h1>
or if you like/enabled PHP short tags:
<p><?=$texts['bye'];?></p>
This setup is very flexible, and with a few forms to update the translations it's easy to keep your site up to date in multiple languages.
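The "use English if there's no translation" step could be sketched like this. The rows mimic what a JOIN of `text` and `text_translations` might return; the build_texts() helper and the row layout are assumptions for illustration.

```php
<?php
// Rows as they might come back from a JOIN of `text` and `text_translations`.
$rows = [
    ['variable' => 'hello', 'language' => 'en', 'translation' => 'hello'],
    ['variable' => 'hello', 'language' => 'de', 'translation' => 'hallo'],
    ['variable' => 'bye',   'language' => 'en', 'translation' => 'bye'],
    // no German row for 'bye' yet
];

function build_texts(array $rows, string $lang, string $fallback = 'en'): array
{
    $texts = [];
    foreach ($rows as $row) {
        $variable = $row['variable'];

        // Use the fallback language only while nothing is set for this variable.
        if ($row['language'] === $fallback && !isset($texts[$variable])) {
            $texts[$variable] = $row['translation'];
        }

        // A translation in the requested language always wins.
        if ($row['language'] === $lang) {
            $texts[$variable] = $row['translation'];
        }
    }

    return $texts;
}

print_r(build_texts($rows, 'de'));
```

The resulting array can be written out with serialize() as shown above, or as a plain PHP file using var_export() so it can be loaded with include.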
I'd also suggest Zend Framework Zend_Translate package.
The manual gives a good overview on How to decide which translation adapter to use. Even when not using ZF, this will give you some ideas about what is out there and what the pros and cons are.
Adapters for Zend_Translate
Array: Use PHP arrays. Small pages; simplest usage; only for programmers
Csv: Use comma separated (.csv/.txt) files. Simple text file format; fast; possible problems with unicode characters
Gettext: Use binary gettext (*.mo) files. GNU standard for linux; thread-safe; needs tools for translation
Ini: Use simple ini (*.ini) files. Simple text file format; fast; possible problems with unicode characters
Tbx: Use termbase exchange (.tbx/.xml) files. Industry standard for inter application terminology strings; XML format
Tmx: Use tmx (.tmx/.xml) files. Industry standard for inter application translation; XML format; human readable
Qt: Use qt linguist (*.ts) files. Cross platform application framework; XML format; human readable
Xliff: Use xliff (.xliff/.xml) files. A simpler format as TMX but related to it; XML format; human readable
XmlTm: Use xmltm (*.xml) files. Industry standard for XML document translation memory; XML format; human readable
There are some factors you should consider.
Will the website be updated frequently? If yes, by whom: you or the owner? How much data / information are you dealing with? And also, are you doing this frequently (for many clients)?
I can hardly imagine a relational database causing any serious speed impact unless you have VERY high traffic (several hundred thousand pageviews per day).
Should you be doing this frequently (for lots of clients) think no further: build up a CMS (or use an existing one). If you really need to consider speed impact, you can customize it so that when you are done with the website you can export static HTML pages where possible.
If you are updating frequently, the same as above applies.
If the client has to update (and not you), again, you need a CMS.
If you are dealing with lots of information (many large articles), you need a CMS.
All in all, a CMS will help you build up your website structure fast, add content fast and not worry that much about code since it will be reusable.
Now, if you just need to create a small website fast, you can easily do this with hardcoded arrays and datafiles.
If you need to provide a web interface for adding/editing translations, then a database is a good idea.
If, however, your translations are static, I would use gettext or even a plain PHP array.
Either way you can take advantage of Zend_Translate.
Small comparison, the first two from Zend tutorial:
Plain PHP arrays: Small pages; simplest usage; only for programmers.
Gettext: GNU standard for linux; thread-safe; needs tools for translation.
Database: Dynamic; Worst performance.
I would recommend PHP arrays, they can be built around a GUI for easy access.
Keep in mind that almost everybody who deals with computers knows some common English terms used in computing or on the internet, like About Us, Home, Send, Delete, Read More, etc. Question: do they really need to be translated?
OK, honestly, translating those words is not really about 'required'; it's all about 'style'.
Now, if it's really wanted, for common words that will never need to change, it's better to use a PHP file which outputs a lang array for just the local language and English. For content such as blog posts, news and descriptions, use the database and store as many translations as required. You must do that manually.
Relying on Google Translate? I think you have to think 1000 times. At least for this decade.
