Gettext: Translation of strings with HTML inside? - php

My current implementation is array-based and stores keys and values in a dictionary, for example:
$arr = array(
'message' => 'Paste a flickr URL below.',
);
I realize it was probably a bad idea to store HTML inside a string like this, but if I'm using gettext, how should I handle storing a similar string in my .mo/.po files? Should I store the words separately, such as 'Paste a', 'URL below' and 'flickr'?

You should store something like
"Paste a %1 URL below"
and replace all 'vars' using something simple like str_replace('%1', $link, $message);
$link can also be translatable
"%1"
although that might be overkill (does flickr translate between languages?).
The rationale is that different languages have different grammatical structures, so the order of the words won't always be the same.
Update:
As @alex and @chelmertz mention in the comments, try using the sprintf function, which is built for this very thing.
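A minimal sketch of the sprintf approach, assuming a simple link as the HTML part. The link markup and its URL are assumptions for illustration, since the original markup is not shown; in a gettext setup the template string would additionally be wrapped in _().

```php
<?php
// Assumed link markup; the real HTML from the question is not known.
$link = '<a href="https://www.flickr.com/">flickr</a>';

// Translatable template: translators see only the sentence and a placeholder.
$message = 'Paste a %s URL below.';

// sprintf substitutes the markup at output time.
$html = sprintf($message, $link);
echo $html, "\n";
```

Keeping the markup out of the template means translators cannot accidentally break the HTML.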

I'd go for this:
$arr = array(
'message' => _('Paste a %s URL below.'),
);
Having all translations as string literals inside gettext function calls lets you use the standard tools to update the *.po catalogues.

Related

Page crashing from '#' as a GET-parameter

I've been using URL parameters to make a landing page behind a search form more personal. I felt relatively bulletproof validating input like this:
$string = $_GET['city'];
$res = preg_replace("/[^a-zA-Z0-9]/", "", $string);
until I tried something like ?city=# as a value and my whole page crashed, and now I'm not so sure anymore.
What is the way to go to validate without writing a whole engine, or at least to stop my page crashing from #?
Thanks
PHP has a lot of built-in functionality that helps you avoid problems like this.
Whenever you create a URL to be displayed in the browser, it has to be URL-encoded. If you are just appending the query string to a fixed URL, you can build that string with http_build_query. For example:
$querystring = [
'param1' => 123,
'param2' => 'hello with a #'
];
$QS_encoded = http_build_query($querystring);
// Append the encoded query string to the link target (here, the current page).
echo '<a href="?' . htmlspecialchars($QS_encoded) . '">My link</a>';
A # in a URL starts the fragment (hash) part. The browser does not send the fragment to the server, so it will never be part of your $_GET superglobal.
If for any reason you want to type out a URL with a query string containing # by hand, you need to use the encoded version, %23, e.g. http://php.net/manual-lookup.php?pattern=%23
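To illustrate both points, here is a small sketch using standard PHP functions. The example.com URL is a placeholder, not taken from the question.

```php
<?php
// A raw # ends the query string and starts the fragment.
$url      = 'http://example.com/page?city=#abc';
$query    = parse_url($url, PHP_URL_QUERY);    // the "city" value is empty
$fragment = parse_url($url, PHP_URL_FRAGMENT); // everything after the #

// Encoding the value keeps the # inside the query string.
$encoded = urlencode('#');
$built   = http_build_query(['param2' => 'hello with a #']);

echo $query, "\n", $fragment, "\n", $encoded, "\n", $built, "\n";
```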
On a side note, you shouldn't use a regex for filtering data like this. PHP already has an extension for this: filter (filter_var, filter_input).
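A possible sketch of validation with the filter extension. The helper name validate_city is hypothetical, and note that it rejects bad input rather than stripping characters as the preg_replace in the question does. In a real request you would typically pass $_GET['city'] ?? null, or use filter_input.

```php
<?php
// Hypothetical helper: returns the city if it is purely alphanumeric, else null.
function validate_city($raw): ?string
{
    if (!is_string($raw)) {
        return null; // parameter missing or passed as an array
    }
    $result = filter_var($raw, FILTER_VALIDATE_REGEXP, [
        'options' => ['regexp' => '/\A[a-zA-Z0-9]+\z/'],
    ]);
    return $result === false ? null : $result;
}

var_dump(validate_city('Miami'));
var_dump(validate_city('#'));
var_dump(validate_city(null));
```

Treating a missing or malformed parameter as null lets the page fall back to a default instead of failing.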

Translate url parameters as place holder

I am wondering how to translate a URL in ZF2 that has parameters in it.
For example:
/{:language_link-schools-:city_link}
The reason I don't do:
/:language_link-{schools}-:city_link
is that in some languages, for example Spanish, the order of the words changes.
I am using PhpArray, and when I translate it, the parameters are not replaced, therefore the URL is rendered as (example in Spanish):
/:language_link-escuela-:city_link
Instead of the expected behaviour:
/ingles-escuela-miami
Edit:
The parameters are
:language_link and :city_link
So the idea is that in one language the rendered URL could be:
/:language_link-schools-:city_link
and in another language it could be:
/:language_link-:city_link-school
This is similar to what you do when translating a statement:
sprintf($this->translate('My name is %s'), $name) ;
PHP has a function called strtr, which replaces each occurrence of the given substrings with the corresponding value.
If your string looks like this: /:language_link-escuela-:city_link
then you can do the following:
<?php
$rawUrl = "/:language_link-escuela-:city_link";
$processedUrl = strtr($rawUrl, [
':language_link' => 'es',
':city_link' => 'barcelona',
]);
echo $processedUrl; // Output: /es-escuela-barcelona
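Building on that, one way to handle the changing word order is to translate the whole route template first and substitute the parameters afterwards. This is a plain PHP sketch, not the ZF2 router API; the templates, the build_url helper and the parameter values are all hypothetical.

```php
<?php
// Hypothetical translated route templates, one per locale.
$routeTemplates = [
    'en' => '/:language_link-schools-:city_link',
    'es' => '/escuelas-de-:language_link-en-:city_link',
];

// Hypothetical helper: pick the template for the locale, then fill it in.
function build_url(array $templates, string $locale, array $params): string
{
    $pairs = [];
    foreach ($params as $name => $value) {
        // Prefix each name with ":" and encode the value for use in a path.
        $pairs[':' . $name] = rawurlencode($value);
    }
    return strtr($templates[$locale], $pairs);
}

echo build_url($routeTemplates, 'en', ['language_link' => 'english', 'city_link' => 'miami']), "\n";
echo build_url($routeTemplates, 'es', ['language_link' => 'ingles', 'city_link' => 'miami']), "\n";
```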

Storing string templates

Template Strings.
This link might help a little bit:
Does PHP have a feature like Python's template strings?
My main issue is knowing whether there's a better way to store text strings.
Is this normally done with one folder (DIR) containing plenty of standalone files with different strings, where you grab the contents of whichever file you need, process it, and replace the {tags} with values?
Or is it better to define all of them inside one single file, as an array[]?
greetings.tpl.txt
['welcome'] = 'Welcome {firstname} {lastname}'.
['good_morning'] = 'Good morning {firstname}'.
['good_afternoon'] = 'Good afternoon {firstname}'.
Here's another example, https://github.com/oren/string-template-example/blob/master/template.txt
Thx in advance!
Answers whose solution is to use include("../file.php"); will not be accepted. I am looking for a solution that shows how to read a list of defined strings into an array. The definition is already array-based.
To add values to templates, you can use strtr. Example below:
$msg = strtr('Welcome {firstname} {lastname}', array(
'{firstname}' => $user->getFirstName(),
'{lastname}' => $user->getLastName()
));
Regarding storing strings, you can save one array per language and then load only the relevant one. E.g. you'll have a directory with 2 files:
language
en.php
de.php
Each file should contain the following:
<?php
return (object) array(
'WELCOME' => 'Welcome {firstname} {lastname}'
);
When you need translations, you can just do the following:
$dictionary = include('language/en.php');
And the dictionary will then have an object that you can address. Changing the example above, it will be something like this:
$dic = include('language/en.php');
$msg = strtr($dic->WELCOME, array(
'{firstname}' => $user->getFirstName(),
'{lastname}' => $user->getLastName()
));
To handle the case where the template is missing from the dictionary, you can use the null coalescing operator (PHP 7+) with a default text:
$dic = include('language/en.php');
$tpl = $dic->WELCOME ?? 'Welcome {firstname} {lastname}';
$msg = strtr($tpl, array(
'{firstname}' => $user->getFirstName(),
'{lastname}' => $user->getLastName()
));
If you want to be able to edit the texts in a database, what people usually do is write a simple export script (e.g. using var_export) that syncs from the database to the files.
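A rough sketch of such an export step, assuming the rows have already been fetched from the database. The $rows array stands in for a query result, and its column names are hypothetical; the sketch writes to a temporary file rather than to language/en.php.

```php
<?php
// Hypothetical rows, standing in for the result of a database query.
$rows = [
    ['name' => 'WELCOME', 'text' => 'Welcome {firstname} {lastname}'],
    ['name' => 'GOOD_MORNING', 'text' => 'Good morning {firstname}'],
];

// Turn the rows into a key => template dictionary.
$dictionary = array_column($rows, 'text', 'name');

// Write the dictionary as a PHP file that returns the array.
$file = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($file, '<?php return ' . var_export($dictionary, true) . ';' . "\n");

// Loading it back works exactly like include('language/en.php').
$loaded = include $file;
unlink($file);

echo $loaded['WELCOME'], "\n";
```

Because var_export emits valid PHP, the generated file needs no parsing code of its own when it is loaded.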
Hope this helps.
OK John I will elaborate.
The best way is to create one PHP file per language, containing the definition of an array of texts, using the printf format for string substitution.
If the amount of text is very large, you might consider partitioning it further (a few MB is usually fine).
This is efficient in production, assuming the OS has a well-tuned file cache. It is slightly more efficient still if you use numerical indexes for the array.
It is much more efficient to let PHP populate the array than to do it yourself by reading a text file. This is, after all, static text, I assume?
If production performance is not an issue, please disregard this post.
greetings_tpl_en.php
<?php
$text_tpl = array(
'welcome' => 'Welcome %s %s',
'good_morning' => 'Good morning %s',
'good_afternoon' => 'Good afternoon %s',
);
your.php
<?php
$language = "en";
require('greetings_tpl_' . $language . '.php');
....
printf($text_tpl['welcome'], $first_name, $last_name);
printf is a nice legacy from the C language. sprintf returns a string instead of outputting it.
You can find the full description of the php printf format here: http://php.net/manual/en/function.sprintf.php
(Do read Josef Kufner's post again once this is solved. +1 :c)
Hope this helps?
First, take a look at gettext. It is widely used and there are plenty of tools to handle the translation process, like xgettext and POEdit. It is more comfortable to use real English strings in the source code and then extract them with the xgettext tool. Gettext can handle the plural forms of practically all languages, which is not possible when using simple arrays of strings.
A very useful function to combine with gettext is sprintf() (or printf(), if you want to output the text directly).
Example:
printf(gettext('Welcome %s %s.'), $firstname, $lastname);
printf(ngettext('You have %d new message.', 'You have %d new messages.',
$number_of_new_messages), $number_of_new_messages);
Then, when you want to translate this into a language where the last name usually precedes the first name, you can use this: 'Welcome %2$s, %1$s.'
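The effect of the numbered placeholders can be checked without gettext at all. In this sketch the two format strings are written inline, whereas in practice the second one would come from the translation catalogue; the names are sample values.

```php
<?php
$firstname = 'Harry';
$lastname  = 'Potter';

// Source string: arguments are consumed in order.
$english = 'Welcome %s %s.';

// Translated string: %2$s and %1$s pick arguments by position.
// Single quotes matter here, otherwise PHP would try to interpolate $s.
$swapped = 'Welcome %2$s, %1$s.';

echo sprintf($english, $firstname, $lastname), "\n";
echo sprintf($swapped, $firstname, $lastname), "\n";
```

The calling code stays the same for every language; only the format string changes.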
The second example, the plural form, can be translated using more than two strings, because the localization file also defines how plural forms are arranged. While for English it is nplurals=2; plural=(n != 1);, in Czech, for example, it is nplurals=3; plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2; (three forms: the first for one item, the second for 2 to 4 items, and the third for the rest). Irish, for example, has five plural forms.
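To make the plural rules concrete, here they are written out as plain PHP functions. gettext evaluates these expressions itself, so this is only an illustration; the function names and the sample Czech forms are assumptions.

```php
<?php
// English rule: nplurals=2; plural=(n != 1);
function plural_index_en(int $n): int
{
    return $n != 1 ? 1 : 0;
}

// Czech rule: nplurals=3; plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
// Written with if statements, because PHP 8 rejects unparenthesized nested ternaries.
function plural_index_cs(int $n): int
{
    if ($n == 1) {
        return 0;
    }
    if ($n >= 2 && $n <= 4) {
        return 1;
    }
    return 2;
}

// Assumed Czech forms for "message", indexed as the rule expects.
$forms = ['%d zpráva', '%d zprávy', '%d zpráv'];

foreach ([1, 3, 5] as $n) {
    printf($forms[plural_index_cs($n)] . "\n", $n);
}
```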
To extract strings from the source code, use xgettext -L php .... I recommend writing a short script with the exact command that fits your project, something like:
# assuming this file is in locales directory
# and source code in src directory
find ../src -type f -iname '*.php' > "files.list"
xgettext -L php --from-code 'UTF-8' -f "files.list" -o messages.pot
You may want to add custom function names using -k argument.
You could store all the templates in one associative array and also the variables that are to replace the placeholders, like
$capt=array('welcome' => 'Welcome {firstname} {lastname}',
'good_morning' => 'Good morning {firstname}',
'good_afternoon' => 'Good afternoon {firstname}');
$vars=array('firstname'=>'Harry','lastname'=>'Potter', 'profession'=>'wizard');
Then, you could transform the sentences through a simple preg_replace_callback call like
function repl($a){ global $vars;
// Leave unknown placeholders untouched instead of raising a notice.
return isset($vars[$a[1]]) ? $vars[$a[1]] : $a[0];
}
function getcapt($type){ global $capt;
$str=$capt[$type];
$str=preg_replace_callback('/\{([^}]+)\}/','repl' ,$str);
echo "$str<br>";
}
getcapt('welcome');
getcapt('good_afternoon');
This example would produce
Welcome Harry Potter
Good afternoon Harry

Am I breaking any "php good practice" in the following php array which deals with 3 (human) languages?

This is the best way of dealing with a multilingual website that I can think of right now (I'm not sure) that doesn't involve gettext, zend_translate, or any PHP plugin or framework.
I think it's pretty straightforward: I have 3 languages and I write their "content" in different files (as arrays), and later I pull that content into my index.php, as you can see in the following picture:
[Image: screenshot of the code] http://img31.imageshack.us/img31/1471/codew.png
I just started with PHP and I would like to know whether I'm breaking PHP good practices, whether the code is vulnerable to XSS attacks, or whether I'm writing more code than necessary.
EDIT: I posted a picture so that you can see the files tree (I'm not being lazy)
EDIT2: I'm using Vim with the theme ir_black and NERDTree.
Looks all right to me, although I personally prefer creating and using a dictionary helper function:
<?php echo dictionary("showcase_li2"); ?>
That would enable you to switch methods easily later, and it generally gives you more control over your dictionary. Also, with an array you will have a scope problem: you will have to import it into every function using global $language, which is very annoying.
You will probably also reach the point when you have to insert values into an internationalized string:
You have %1 votes left in the next %2 hours.
Sie haben %1 stimmen übrig für die nächsten %2 stunden.
Sinulla on %1 ääntä jäljellä seuraavan %2 tunnin ajassa.
That is something a helper function can be very useful for:
<?php echo dictionary("xyz", $value1, $value2 ); ?>
$value1 and $value2 would be inserted into %1 and %2 in the dictionary string.
Such a helper function can easily be built with an unlimited number of parameters using func_get_args().
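A minimal sketch of such a helper. It uses the ...$values variadic syntax (PHP 5.6+) in place of func_get_args(), which does the same job here; the key name votes_left and the inline string table are hypothetical stand-ins for the real language file.

```php
<?php
function dictionary(string $key, ...$values): string
{
    // Hypothetical string table; a real one would be loaded from a language file.
    static $strings = [
        'votes_left' => 'You have %1 votes left in the next %2 hours.',
    ];

    // Fall back to the key itself when no string is defined.
    $text = $strings[$key] ?? $key;

    // Map %1, %2, ... to the extra arguments, in order.
    $pairs = [];
    foreach ($values as $i => $value) {
        $pairs['%' . ($i + 1)] = (string) $value;
    }

    return $pairs ? strtr($text, $pairs) : $text;
}

echo dictionary('votes_left', 3, 24), "\n";
```

Because strtr always prefers the longest matching key, %10 is not mistaken for %1 followed by a zero.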
It's OK generally. For instance, punBB's localization works this way. It is very fast, faster than calling a function or an object's method or property. But I see a problem with this approach: it doesn't easily support language fallbacks. That is, if you don't have a string in Chinese, it should be displayed in English instead.
This problem comes up when you upgrade your system and don't have time to translate everything into every language.
I'd rather use something like
lang.en.php
$langs['en'] = array(
...
);
lang.cn.php
$langs['cn'] = array(
...
);
[prepend].php (some common lib)
define('DEFAULT_LANG', 'en');
include_once('lang.' . DEFAULT_LANG . '.php');
include_once('lang.' . $user->lang . '.php');
$lang = array_merge($langs[DEFAULT_LANG], $langs[$user->lang]);
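The fallback behaviour of array_merge can be seen in isolation. In this sketch the language arrays are defined inline instead of being included from files, and the keys are made up for illustration.

```php
<?php
// Inline stand-ins for lang.en.php and lang.cn.php.
$langs = [
    'en' => ['hello' => 'Hello', 'bye' => 'Goodbye'],
    'cn' => ['hello' => '你好'], // 'bye' has not been translated yet
];

// For string keys, values from the later array override the earlier ones,
// and keys that exist only in the default language are kept.
$lang = array_merge($langs['en'], $langs['cn']);

echo $lang['hello'], ' / ', $lang['bye'], "\n";
```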
Looks all right to me also, but:
Seems that you have localization for multiple modules/sites, so why not break it down to multidimensional array?
$localization = array(
'module' => (object)array(
'heading' => 'oh, no!',
'perex' => 'oh, yes!'
)
);
I personally like to create a stdClass out of arrays with
$localization = (object)$localization;
so you can use
$localization->module->heading;
:) my 2 cents
The only way this could be an XSS hole is if you have register_globals=On and you don't set $lang['showcase_lil'] or other $lang entries. But I don't think you have to worry about this, so I think you're in the clear.
as an xss test:
http://127.0.0.1/whatever.php?lang[showcase_lil]=alert(/xss/)
Wouldn't it have been better to post code and briefly explain this issue to us?
Anyway, putting each language in its own file and loading it through some sort of language component seems okay. I'd prefer using some sort of gettext, but this is okay too, I guess.
You should make a function for calling the language keys rather than relying on an array, something like
<?php echo lang('yourKey'); ?>
One thing to watch for is interpolation; that's really the only place XSS could sneak in if your server settings are sensible. If you at any point need to do something along the lines of translating "$project->name has $project->member_count members", you'll have to make sure you escape all HTML that goes in there.
But other than that, you should be fine.
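A sketch of escaping interpolated values before they go into a translated string. The helper name lang_html is hypothetical, and it uses printf-style placeholders rather than the variable interpolation shown in the example sentence.

```php
<?php
// Hypothetical helper: fill a translated template, escaping every value for HTML.
function lang_html(string $template, ...$values): string
{
    $safe = array_map(
        function ($value) {
            return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
        },
        $values
    );
    return vsprintf($template, $safe);
}

// A project name containing markup is rendered as text, not as HTML.
echo lang_html('%s has %d members', '<b>Acme</b>', 12), "\n";
```

Only the inserted values are escaped, so the translated template itself can still contain trusted markup.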

How does gettext handle dynamic content?

In PHP (or maybe gettext in general), what does gettext do when it sees a variable or dynamic content?
I have 2 cases in mind.
1) Let's say I have <?=$user1?> poked John <?=$user2?>. Maybe in some language the order of the words is different. How does gettext handle that? (no, I'm not building facebook, that was just an example)
2) Let's say I store some categories in a database. They rarely change, but they are stored in a database. What would happen if I do <?php echo gettext($data['name']); ?> ? I would like the translators to translate those category names too, but does it have to be done in the database itself?
Thanks
Your best option is to use the sprintf() function. You would then use printf notation to handle dynamic content in your strings. Here is a function I found on here a while ago that handles this easily for you:
function translate()
{
$args = func_get_args();
$num = func_num_args();
$args[0] = gettext($args[0]);
if($num <= 1)
return $args[0];
return call_user_func_array('sprintf', $args);
}
Now for example 1, you would want to change the string to:
%s poked %s
Which you would input into the translate() function like this:
<?php echo translate('%s poked %s', $user1, $user2); ?>
You would parse out all translate() calls with poEdit and then translate the string "%s poked %s" into whatever language you wanted, without modifying the %s placeholders. Those would get replaced on output by the translate() function with user1 and user2 respectively. You can read more about sprintf() in the PHP Manual for more advanced usage.
For issue #2, you would need to create a static file containing the category names, which poEdit could parse. For example, misctranslations.php:
<?php
_('Cars');
_('Trains');
_('Airplanes');
Then have poEdit parse misctranslations.php. You would then be able to output the category name translation using <?php echo gettext($data['name']); ?>
To build a little on what Mark said... the only problem with the above solution is that the static list must always be maintained by hand, and if you add a new string before all the others or completely change an existing one, the software you use for translating might confuse the new strings and you could lose some translations.
I'm actually writing an article about this (too little time to finish it anytime soon!) but my proposed answer goes something like this:
Gettext allows you to store, inside the .po file, the line number at which the string appears in the code. If you change the string entirely, the .po editor will know that the string is not new but an old one (thanks to the line number).
My solution is to write a script that reads the database and creates a static file with all the gettext strings. The big difference from Mark's solution is that the primary key (let's call it ID) in the database matches the line number in the new file. That way, if you completely change one original string, the line numbers stay the same and your translation software will recognize the strings.
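One possible shape for such a generator, as a sketch only. The function name build_gettext_stub is hypothetical, the $categories array stands in for the database rows, and putting a complete PHP block on each line (so that ID n lands exactly on line n) is an assumption about how to satisfy the line-number requirement.

```php
<?php
// Hypothetical generator: $categories maps database ID => category name.
function build_gettext_stub(array $categories): string
{
    if (!$categories) {
        return '';
    }
    $lines = [];
    $maxId = max(array_keys($categories));
    for ($id = 1; $id <= $maxId; $id++) {
        if (isset($categories[$id])) {
            // Escape quotes and backslashes for a single-quoted PHP literal.
            $escaped = addcslashes($categories[$id], "'\\");
            $lines[] = "<?php _('" . $escaped . "'); ?>";
        } else {
            // Unused ID: keep an empty line so later IDs stay on their own line.
            $lines[] = '';
        }
    }
    return implode("\n", $lines) . "\n";
}

$stub = build_gettext_stub([1 => 'Cars', 2 => 'Trains', 4 => 'Airplanes']);
echo $stub;
```

The generated file is only meant to be scanned by the extraction tool, never executed.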
Of course there might be newer and more intelligent .po editors out there, but if yours is giving you trouble with new strings, this will solve it.
My 2 cents.
If you have somewhere in your code:
<?=sprintf(_('%s poked %s'), $user1, $user2)?>
and one of your languages needs to swap the arguments, it is very simple. Just translate the string like this:
msgid "%s poked %s"
msgstr "%2$s translation_of_poked %1$s"
