preg_match() return the longest match - php

Disclaimer: Unless I'm missing something, this is not the same as what is described in this topic, and I'll need some time to fully explain the situation.
A very long time ago I asked a question about how to use the REGEXP operator in a SQLite statement. The operator is not implemented by default, but it can be added at runtime.
Well... As you can see by the dates, that worked for me for years. Not entirely with the help of that topic (someone on the SQLite mailing list showed me a trick), but it worked well.
Because preg_match() works differently from ereg() (as I've heard), returning positively immediately after the first match, I took an additional precaution before populating that SQLite database: I sorted the REGEXES (I'm sure this is not the right plural form) from the longest (more specific) to the shortest (more generic).
Not a big deal, just a simple uasort() using strlen().
Consider a fictional movie management catalog with two URLs, management/actors and management/actors/add. This sorting trick saved me from false positives, where accessing the first URL made SQLite respond with the second result set only because both share the same immutable part, management/actors.
This is the current implementation:
$this -> dbh
-> sqliteCreateFunction(
'REGEXP',
function( $r, $s ) {
return ( preg_match( sprintf( '#^%s$#i', $r ), $s ) != 0 );
},
2
);
Here $this -> dbh is the class property holding the PDO instance.
The situation now is different because this sorting trick did not consider one possibility: Two routes where the most generic is also the longest route. For example: management/actors/add and management/actors/overview.
Conceptually, the second route is more generic because it refers to a simple dashboard listing all the actors, so it should go to the bottom. In fact it's just an alias for management/actors.
The first route is more specific because it routes to the form responsible for adding a new record, and thus should go to the top.
These routes are analyzed from PHP doc comments spread around Controller classes like this:
/**
* Overview
*
* !Route GET, management/actors
* !Route GET, management/actors/overview
*/
final public function overview() {}
/**
* Add
*
* !Route GET, management/actors/add
*/
final public function add() {}
And they come in this order:
management/actors
management/actors/overview
management/actors/add
Because I need to identify the Action method, they're structured in different array indexes with the class method name as key:
Array(
'overview' => array(
[0] => management/actors
[1] => management/actors/overview
),
'add' => array(
[0] => 'management/actors/add'
)
)
Like I said the sorting works and can make this structure become:
Array(
'overview' => array(
[0] => management/actors/overview
[1] => management/actors
),
'add' => array(
[0] => 'management/actors/add'
)
)
But the whole component fails to work because of the preg_match() implementation of the SQLite REGEXP operator.
So far, the only workaround I have found is to write the Controller classes like this:
/**
* Add
*
* !Route GET, management/projects/add
*/
final public function add() {}
/**
* Overview
*
* !Route GET, management/projects
* !Route GET, management/projects/overview
*/
final public function overview() {}
I mean, in the reverse order.
Fine! All applications have their own oddities in a certain way, and I could live with that. But in the future someone else may open this code to maintain it and may not be aware of this limitation.
That said (finally), I wish to know if there is a way to increase preg_match()'s gluttony so that it does not stop at the first positive occurrence and instead matches the longest possible string like ereg does, or at least like I think it does; I've never seen it in action.
Or an alternative solution, of course :p
As requested in comments, some examples of the REGEXES, listed in the order they're inserted in the SQLite database (after sorting):
management/projects/overview\b(.*?)
management/projects/add\b(.*?)
management/projects\b(.*?)
They're very simple REGEXES. They mainly match the Request URI. At the end of the REGEX I have only a word boundary to distinguish the fixed string from the variable part that might exist (like the Xdebug profiling GET argument).
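One possible alternative, sketched here under the assumption that all route patterns can be fetched into PHP: instead of relying on the order in which rows are inserted, test every pattern and keep the longest one that matches. The helper name pickMostSpecificRoute() is hypothetical, and pattern length is only a rough proxy for specificity, so this is an illustration rather than a definitive fix.

```php
<?php
// Hypothetical helper: returns the longest pattern that matches $uri, or null.
function pickMostSpecificRoute(array $patterns, string $uri): ?string
{
    $best = null;
    foreach ($patterns as $pattern) {
        // Same delimiters, anchors and flag as the REGEXP callback in the question.
        if (preg_match(sprintf('#^%s$#i', $pattern), $uri) !== 1) {
            continue;
        }
        // Keep the longest matching pattern seen so far.
        if ($best === null || strlen($pattern) > strlen($best)) {
            $best = $pattern;
        }
    }
    return $best;
}

// Patterns in declaration order, deliberately unsorted.
$patterns = [
    'management/projects\b(.*?)',
    'management/projects/overview\b(.*?)',
    'management/projects/add\b(.*?)',
];

$match = pickMostSpecificRoute($patterns, 'management/projects/add');
```

With this approach the order of the doc comments in the Controller classes no longer matters, at the cost of evaluating every pattern for each request.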

Related

Push into referenced associative, multidimensional array returned by class method

As part of my ongoing efforts to simplify the legacy codebase for a CodeIgniter3 application, I'm currently running into a problem. In short, I've dealt with an error statement earlier stating:
can't use method return value in write context
which might ring a bell for some readers. Nonetheless, I haven't seen this error since but I suspect that something is still going wrong. In short, I'm trying to push an associative array into another associative, multidimensional array which is the result of a method that returns a reference.
I've set up a system to easily alter the contents of a JSON, which is returned by reference through this method:
/**
* Function : items
* Target : Retrieves a reference to the items, decoded
*
* @author : Angev
* @since : 2.0
* @version : 1.0
*
* @return Referenced link to index 'items'.
*/
public function & items()
{
$list = (is_array($this->data)) ? $this->data : json_decode($this->data, true);
return $list['items'];
}
In short, the method $this->items() is part of a Model named 'CheckList'. The Checklist corresponds with a data-table in my database, which includes a 'data' column that represents the data belonging to a certain checklist: the JSON I'm trying to alter and which you can see being returned in this method. Another way of seeing the output of $this->items() is that it should return a reference to $this->data['items'].
This all works well. I've used this method many times during development - as a shorthand to accessing ['list'] - and it always returns exactly what I need it to return: a multidimensional array filled with unique indexes (strings) that contain the data belonging to each item of the checklist.
The problem however, arises in a method named update_checklist() in particular the following section:
$this->items()[$uid] = [
'parent_id' => $parent['id'],
... ,
];
I'd expect the method to add an index to the array returned by $this->items(), but it doesn't.
I'm not quite sure what goes wrong in this context. I saw the error message quoted at the top of this question earlier, but haven't seen it since.
However, no index is added to the array: whenever I do an immediate var_dump($this->items()) afterwards, it just shows the state of the array as it was before the execution of update_checklist().
In search of an answer, I've also tried wrapping the call in parentheses, but to no avail:
( $this->items() )[$uid] = . . .
To temporarily fix the problem, I've resorted to a more direct alteration of the ['items'] array by doing the following:
$this->data = json_decode($this->data, true);
$this->data['items'][$uid] = [
'parent_id' => $parent['id'],
... ,
];
Nonetheless, even though the code above works, I'm left wondering what the flaw is in my logic concerning the method reference return of $this->items() and why I cannot use this method when pushing into the referenced array.
How can I write the required changes to make $this->items()[] function as intended? Or I'd be interested in more clarity into the theory behind this structure and why it can't work.
As always, once you start formulating a question, you stumble upon the flaws in your logic. I've read over this question with a colleague and while discussing the solution just magically presented itself. I'll include the answer for future reference to anyone having the same problem.
/**
* Function : items
* Target : Retrieves a reference to the items, decoded
*
* @author : Angev
* @since : 2.0
* @version : 1.0
*
* @return Referenced link to index 'items'.
*/
public function & items()
{
$list = (is_array($this->data)) ? $this->data : json_decode($this->data, true);
return $list['items'];
}
The problem lies within the items() method. This method does return a reference, but the reference points into the local variable $list, which in turn has no connection to $this->data. So instead of referring to $this->data['items'], the method returns a reference into $list, which is essentially a copy of $this->data and not a real reference to it.
To fix the problem, the following code was used:
public function & items()
{
if(!is_array($this->data) ) $this->data = json_decode($this->data, true);
return $this->data['items'];
}
As expected, the method now returns a reference to the actual data object.
So in short, what I've learned is that if you let a method return a reference, you need to make sure that whatever the method returns is actually a reference instead of a copy of the data you're trying to reference to.
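The difference can be reproduced in isolation. The sketch below is a stripped-down, hypothetical version of the model (the class body, the itemsCopy() name and the sample data are assumptions made for illustration); it binds the returned reference to a variable explicitly so the effect is easy to observe.

```php
<?php
// Hypothetical minimal model; only the reference behaviour is illustrated.
class CheckList
{
    public $data = '{"items": {}}';

    // Returns a reference into a local copy: writes through it are lost.
    public function &itemsCopy()
    {
        $list = is_array($this->data) ? $this->data : json_decode($this->data, true);
        return $list['items'];
    }

    // Decodes in place, then returns a reference into the property itself.
    public function &items()
    {
        if (!is_array($this->data)) {
            $this->data = json_decode($this->data, true);
        }
        return $this->data['items'];
    }
}

$checklist = new CheckList();

$copy = &$checklist->itemsCopy();
$copy['a'] = ['parent_id' => 1];      // changes the temporary copy only
$afterCopyWrite = $checklist->data;   // still the original JSON string

$items = &$checklist->items();
$items['b'] = ['parent_id' => 2];     // changes $checklist->data['items']
```

After the second write, $checklist->data['items'] contains the key 'b', while the write through itemsCopy() left the property untouched.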
I'll leave this question open for now to allow others to share any knowledge or insights in this matter.

PHPStorm code completion fails for array within array

I have this basic code:
$test = array(
'nested' => array('test' => 'nada')
);
function doit()
{
global $test;
$test['nested'][''];
}
PHPStorm correctly suggests me 'nested' when I press Ctrl+Space
However, I found no way to make it suggest 'test' as member of the 'nested' array.
PHPStorm does not seem to be aware that nested is an array and also has members.
Is that a bug, did I do something wrong ?
Following a comment, I tried another approach to get completion support, with no luck either:
class test
{
public static $nested = array('test' => 'nada');
}
function doit()
{
$completeme = test::$nested;
$completeme['']; // no completion working
/** @var test::nested $completeme */
$completeme = test::$nested;
$completeme['']; // no completion working
}
Same issue for me at the IDE, this time it is an array inside a class.
test::nested[''] << this works, I get completion. But as soon as I make a copy of it I found no way to get completion again or to specify the type using phpdoc.
This feature is not implemented (original array keys support ticket).
AFAIK even remembering keys for first level array adds some noticeable overhead (memory + CPU -- depends on actual project and how heavily arrays/variables are used). Considering this + the fact that in majority of cases first level is enough, the implementation for other levels was simply put on hold.
https://youtrack.jetbrains.com/issue/WI-6845 -- star/vote/comment to get notified on progress.

Best practice multi language website

I've been struggling with this question for quite some months now, but I haven't been in a situation that I needed to explore all possible options before. Right now, I feel like it's time to get to know the possibilities and create my own personal preference to use in my upcoming projects.
Let me first sketch the situation I'm looking for
I'm about to upgrade/redevelop a content management system which I've been using for quite a while now. However, I feel multi-language support would be a great improvement to this system. Before, I did not use any frameworks, but I'm going to use Laravel 4 for the upcoming project. Laravel seems the best choice for a cleaner way to code PHP. Sidenote: Laravel 4 should be no factor in your answer. I'm looking for general ways of translation that are platform/framework independent.
What should be translated
As the system I am looking for needs to be as user friendly as possible the method of managing the translation should be inside the CMS. There should be no need to start up an FTP connection to modify translation files or any html/php parsed templates.
Furthermore, I'm looking for the easiest way to translate multiple database tables perhaps without the need of making additional tables.
What did I come up with myself
I've been searching, reading and trying things myself already, and there are a couple of options I have. But I still don't feel like I've reached a best practice method for what I am really seeking. Right now, this is what I've come up with, but this method also has its side effects.
PHP Parsed Templates: the template system should be parsed by PHP. This way I'm able to insert the translated parameters into the HTML without having to open the templates and modify them. Besides that, PHP parsed templates gives me the ability to have 1 template for the complete website instead of having a subfolder for each language (which I've had before). The method to reach this target can be either Smarty, TemplatePower, Laravel's Blade or any other template parser. As I said this should be independent to the written solution.
Database Driven: perhaps I don't need to mention this again, but the solution should be database driven. The CMS is aimed to be object oriented and MVC, so I would need to think of a logical data structure for the strings. As my templates would be structured as templates/Controller/View.php, perhaps this structure would make the most sense: Controller.View.parameter. The database table would have these fields along with a value field. Inside the templates we could use some sort of method like echo __('Controller.View.welcome', array('name' => 'Joshua')), where the parameter contains Welcome, :name. Thus the result would be Welcome, Joshua. This seems a good way to do this, because parameters such as :name are easy for the editor to understand.
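The :name substitution described above can be sketched in a few lines. This is only an illustration: the function name translate_key() and the in-memory $catalogue array are assumptions, standing in for whatever the CMS would load from the database or a cache file.

```php
<?php
// Hypothetical in-memory catalogue keyed by Controller.View.parameter.
$catalogue = ['Controller.View.welcome' => 'Welcome, :name'];

// Looks up $key and replaces every :placeholder with the matching value from $params.
function translate_key(array $catalogue, string $key, array $params = []): string
{
    // An unknown key is returned as-is so the gap stays visible in the page.
    $text = $catalogue[$key] ?? $key;

    $replacements = [];
    foreach ($params as $name => $value) {
        $replacements[':' . $name] = (string) $value;
    }

    return $replacements === [] ? $text : strtr($text, $replacements);
}

$greeting = translate_key($catalogue, 'Controller.View.welcome', ['name' => 'Joshua']);
// $greeting is "Welcome, Joshua"
```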
Low Database Load: Of course the above system would cause a lot of database load if these strings are loaded on the go. Therefore I would need a caching system that re-renders the language files as soon as they are edited/saved in the administration environment. Because files are generated, a good file system layout is also needed. I guess we can go with languages/en_EN/Controller/View.php or .ini, whatever suits you best. Perhaps an .ini is even parsed quicker in the end. This file should contain the data in the format parameter=value;. I guess this is the best way of doing this, since each View that is rendered can include its own language file if it exists. Language parameters should then be loaded into a specific view and not into a global scope, to prevent parameters from overwriting each other.
Database Table translation: this in fact is the thing I'm most worried about. I'm looking for a way to create translations of News/Pages/etc. as quickly as possible. Having two tables for each module (for example News and News_translations) is an option, but it feels like too much work to get a good system. One of the things I came up with is based on a data versioning system I wrote: there is one database table named Translations, and this table has a unique combination of language, tablename and primarykey. For instance: en_En / News / 1 (referring to the English version of the News item with ID=1). But there are 2 huge disadvantages to this method: first of all, this table tends to get pretty long with a lot of data in the database, and secondly it would be a hell of a job to use this setup to search the table. E.g. searching for the SEO slug of the item would be a full text search, which is pretty dumb. On the other hand, it's a quick way to create translatable content in every table, but I don't believe this pro outweighs the cons.
Front-end Work: The front-end would also need some thinking. Of course we would store the available languages in a database and (de)activate the ones we need. This way the script can generate a dropdown to select a language and the back-end can decide automatically what translations can be made using the CMS. The chosen language (e.g. en_EN) would then be used when getting the language file for a view or to get the right translation for a content item on the website.
So, there they are. My ideas so far. They don't even include localization options for dates etc. yet, but as my server supports PHP5.3.2+ the best option is to use the intl extension as explained here: http://devzone.zend.com/1500/internationalization-in-php-53/ - but this would be of use at a later stage of development. For now the main issue is how to follow best practices for translating the content of a website.
Besides everything I explained here, I still have another thing which I haven't decided yet, it looks like a simple question, but in fact it's been giving me headaches:
URL Translation? Should we do this or not? and in what way?
So.. if I have this url: http://www.domain.com/about-us and English is my default language. Should this URL be translated into http://www.domain.com/over-ons when I choose Dutch as my language? Or should we go the easy road and simply change the content of the page visible at /about. The latter doesn't seem a valid option because it would serve multiple versions of the content under the same URL, so the content would not be indexed properly.
Another option is using http://www.domain.com/nl/about-us instead. This generates at least a unique URL for each content. It would also make it easier to switch to another language, for example http://www.domain.com/en/about-us, and the URL provided is easier to understand for both Google and human visitors. Using this option, what do we do with the default language? Should the language identifier be removed from the URL for the default language? So redirecting http://www.domain.com/en/about-us to http://www.domain.com/about-us ... In my eyes this is the best solution, because when the CMS is set up for only one language there is no need to have this language identification in the URL.
And a third option is a combination of both options: using the "language-identification-less" URL (http://www.domain.com/about-us) for the main language, and using a URL with a translated SEO slug for sublanguages: http://www.domain.com/nl/over-ons & http://www.domain.com/de/uber-uns
I hope my question gets your heads cracking, it cracked mine for sure! It did help me already to work things out as a question here. It gave me a chance to review the methods I've used before and the ideas I'm having for my upcoming CMS.
I would like to thank you already for taking the time to read this bunch of text!
// Edit #1:
I forgot to mention: the __() function is an alias to translate a given string. Within this method there obviously should be some sort of fallback where the default text is loaded when there are no translations available yet. If the translation is missing it should either be inserted or the translation file should be regenerated.
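A minimal sketch of such a fallback, assuming the translations are already loaded into one array per locale. The function name lookup_with_fallback(), the $missing log and the sample catalogues are hypothetical; recording the missing keys is one way the CMS could later insert them or regenerate the file.

```php
<?php
// Returns the translation for $locale, falls back to $default, then to the key itself.
// Every key missing from the requested locale is appended to $missing as "locale:key".
function lookup_with_fallback(
    array $catalogues,
    string $locale,
    string $key,
    array &$missing,
    string $default = 'en_EN'
): string {
    if (isset($catalogues[$locale][$key])) {
        return $catalogues[$locale][$key];
    }

    $missing[] = $locale . ':' . $key;

    // Default-language text if it exists, otherwise the raw key.
    return $catalogues[$default][$key] ?? $key;
}

// Hypothetical sample data.
$catalogues = [
    'en_EN' => ['greeting' => 'Hello', 'bye' => 'Goodbye'],
    'nl_NL' => ['greeting' => 'Hallo'],
];
$missing = [];

$translated = lookup_with_fallback($catalogues, 'nl_NL', 'greeting', $missing);
$fallback   = lookup_with_fallback($catalogues, 'nl_NL', 'bye', $missing);
```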
Topic's premise
There are three distinct aspects in a multilingual site:
interface translation
content
url routing
While they are all interconnected in different ways, from the CMS point of view they are managed using different UI elements and stored differently. You seem to be confident in your implementation and understanding of the first two. The question was about the latter aspect - "URL Translation? Should we do this or not? and in what way?"
What can the URL be made of?
A very important thing is, don't get fancy with IDN. Instead favor transliteration (also: transcription and romanization). While at first glance IDN seems a viable option for international URLs, it actually does not work as advertised for two reasons:
some browsers will turn the non-ASCII chars like 'ч' or 'ž' into '%D1%87' and '%C5%BE'
if user has custom themes, the theme's font is very likely to not have symbols for those letters
I actually tried the IDN approach a few years ago in a Yii based project (horrible framework, IMHO). I encountered both of the above mentioned problems before scrapping that solution. Also, I suspect that it might be an attack vector.
Available options ... as I see them.
Basically you have two choices, that could be abstracted as:
http://site.tld/[:query]: where [:query] determines both language and content choice
http://site.tld/[:language]/[:query]: where [:language] part of URL defines the choice of language and [:query] is used only to identify the content
Query is Α and Ω ..
Lets say you pick http://site.tld/[:query].
In that case you have one primary source of language: the content of [:query] segment; and two additional sources:
value $_COOKIE['lang'] for that particular browser
list of languages in the HTTP Accept-Language header
First, you need to match the query to one of defined routing patterns (if your pick is Laravel, then read here). On successful match of pattern you then need to find the language.
You would have to go through all the segments of the pattern. Find the potential translations for all of those segments and determine which language was used. The two additional sources (cookie and header) would be used to resolve routing conflicts, when (not "if") they arise.
Take for example: http://site.tld/blog/novinka.
That's transliteration of "блог, новинка", that in English means approximately "blog", "latest".
As you can already notice, in Russian "блог" will be transliterated as "blog". Which means that for the first part of [:query] you (in the best case scenario) will end up with ['en', 'ru'] list of possible languages. Then you take next segment - "novinka". That might have only one language on the list of possibilities: ['ru'].
When the list has one item, you have successfully found the language.
But if you end up with 2 (example: Russian and Ukrainian) or more possibilities, or 0 possibilities, as the case might be, you will have to use the cookie and/or header to find the correct option.
And if all else fails, you pick the site's default language.
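The narrowing process can be sketched as an intersection of per-segment language lists. Everything named here is an assumption for illustration: detect_language(), the $segmentLanguages lookup table (which in practice would come from the translation storage) and the $preferred list (cookie value first, then Accept-Language entries).

```php
<?php
// $segmentLanguages maps a URL segment to the languages in which it is a known translation.
// $preferred holds the cookie language followed by Accept-Language entries, in priority order.
function detect_language(array $segments, array $segmentLanguages, array $preferred, string $default): string
{
    $candidates = null;
    foreach ($segments as $segment) {
        $languages = $segmentLanguages[$segment] ?? [];
        // Narrow the candidate list with every segment.
        $candidates = $candidates === null
            ? $languages
            : array_values(array_intersect($candidates, $languages));
    }
    $candidates = $candidates ?? [];

    // Exactly one candidate left: the language is found.
    if (count($candidates) === 1) {
        return $candidates[0];
    }

    // Zero or several candidates: let the cookie/header resolve the conflict.
    foreach ($preferred as $language) {
        if ($candidates === [] || in_array($language, $candidates, true)) {
            return $language;
        }
    }

    return $default;
}

// Hypothetical lookup table for the example URL http://site.tld/blog/novinka.
$segmentLanguages = ['blog' => ['en', 'ru'], 'novinka' => ['ru']];

$language = detect_language(['blog', 'novinka'], $segmentLanguages, [], 'en');
```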
Language as parameter
The alternative is to use URL, that can be defined as http://site.tld/[:language]/[:query]. In this case, when translating query, you do not need to guess the language, because at that point you already know which to use.
There is also a secondary source of language: the cookie value. But here there is no point in messing with the Accept-Language header, because you are not dealing with an unknown number of possible languages in case of a "cold start" (when the user opens the site for the first time with a custom query).
Instead you have 3 simple, prioritized options:
if [:language] segment is set, use it
if $_COOKIE['lang'] is set, use it
use default language
When you have the language, you simply attempt to translate the query, and if translation fails, use the "default value" for that particular segment (based on routing results).
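The three prioritized options translate almost directly into code. The sketch below assumes a hypothetical resolve_language() helper and adds a whitelist of supported languages, which is not part of the answer but seems a sensible guard against arbitrary segment or cookie values.

```php
<?php
// $segment is the [:language] part of the URL, or null when it is absent.
// $cookies stands in for $_COOKIE; $supported is an assumed whitelist.
function resolve_language(?string $segment, array $cookies, array $supported, string $default): string
{
    // 1. The URL segment wins when present and supported.
    if ($segment !== null && in_array($segment, $supported, true)) {
        return $segment;
    }

    // 2. Otherwise use the language remembered in the cookie.
    $cookie = $cookies['lang'] ?? null;
    if (is_string($cookie) && in_array($cookie, $supported, true)) {
        return $cookie;
    }

    // 3. Otherwise fall back to the site default.
    return $default;
}

$supported = ['en', 'ru', 'de'];
$language = resolve_language('ru', ['lang' => 'de'], $supported, 'en');
```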
Isn't there a third option?
Yes, technically you can combine both approaches, but that would complicate the process and only accommodate people who want to manually change the URL from http://site.tld/en/news to http://site.tld/de/news and expect the news page to change to German.
But even this case could probably be mitigated using the cookie value (which would contain information about the previous choice of language), implemented with less magic and hope.
Which approach to use?
As you might have already guessed, I would recommend http://site.tld/[:language]/[:query] as the more sensible option.
Also, in a real world situation you would have a 3rd major part in the URL: the "title". As in the name of a product in an online shop or the headline of an article on a news site.
Example: http://site.tld/en/news/article/121415/EU-as-global-reserve-currency
In this case '/news/article/121415' would be the query, and the 'EU-as-global-reserve-currency' is title. Purely for SEO purposes.
Can it be done in Laravel?
Kinda, but not by default.
I am not too familiar with it, but from what I have seen, Laravel uses a simple pattern-based routing mechanism. To implement multilingual URLs you will probably have to extend core class(es), because multilingual routing needs access to different forms of storage (database, cache and/or configuration files).
It's routed. What now?
As a result of all this you would end up with two valuable pieces of information: the current language and the translated segments of the query. These values can then be used to dispatch to the class(es) which will produce the result.
Basically, the following URL: http://site.tld/ru/blog/novinka (or the version without '/ru') gets turned into something like
$parameters = [
'language' => 'ru',
'classname' => 'blog',
'method' => 'latest',
];
Which you just use for dispatching:
$classname = $parameters['classname'];
$instance = new $classname;
$instance->{'get'.$parameters['method']}( $parameters );
.. or some variation of it, depending on the particular implementation.
Implementing i18n Without The Performance Hit Using a Pre-Processor as suggested by Thomas Bley
At work, we recently went through implementation of i18n on a couple of our properties, and one of the things we kept struggling with was the performance hit of dealing with on-the-fly translation. Then I discovered this great blog post by Thomas Bley, which inspired the way we're using i18n to handle large traffic loads with minimal performance issues.
Instead of calling functions for every translation operation, which as we know in PHP is expensive, we define our base files with placeholders, then use a pre-processor to cache those files (we store the file modification time to make sure we're serving the latest content at all times).
The Translation Tags
Thomas uses {tr} and {/tr} tags to define where translations start and end. Due to the fact that we're using TWIG, we don't want to use { to avoid confusion so we use [%tr%] and [%/tr%] instead. Basically, this looks like this:
return [%tr%]formatted_value[%/tr%];
Note that Thomas suggests using the base English in the file. We don't do this because we don't want to have to modify all of the translation files if we change the value in English.
The INI Files
Then, we create an INI file for each language, in the format placeholder = translated:
// lang/fr.ini
formatted_value = number_format($value * Model_Exchange::getEurRate(), 2, ',', ' ') . '€'
// lang/en_gb.ini
formatted_value = '£' . number_format($value * Model_Exchange::getStgRate())
// lang/en_us.ini
formatted_value = '$' . number_format($value)
It would be trivial to allow a user to modify these inside the CMS: just get the key pairs by a preg_split on \n or = and make the CMS able to write to the INI files.
The Pre-Processor Component
Essentially, Thomas suggests using a just-in-time 'compiler' (though, in truth, it's a preprocessor) function like this to take your translation files and create static PHP files on disk. This way, we essentially cache our translated files instead of calling a translation function for every string in the file:
// This function was written by Thomas Bley, not by me
function translate($file) {
$cache_file = 'cache/'.LANG.'_'.basename($file).'_'.filemtime($file).'.php';
// (re)build translation?
if (!file_exists($cache_file)) {
$lang_file = 'lang/'.LANG.'.ini';
$lang_file_php = 'cache/'.LANG.'_'.filemtime($lang_file).'.php';
// convert .ini file into .php file
if (!file_exists($lang_file_php)) {
file_put_contents($lang_file_php, '<?php $strings='.
var_export(parse_ini_file($lang_file), true).';', LOCK_EX);
}
// translate .php into localized .php file
$tr = function($match) use (&$lang_file_php) {
static $strings = null;
if ($strings===null) require($lang_file_php);
return isset($strings[ $match[1] ]) ? $strings[ $match[1] ] : $match[1];
};
// replace all [%tr%]abc[%/tr%] by the translated string
file_put_contents($cache_file, preg_replace_callback(
'/\[%tr%\](.*?)\[%\/tr%\]/', $tr, file_get_contents($file)), LOCK_EX);
}
return $cache_file;
}
Note: I didn't verify that the regex works, I didn't copy it from our company server, but you can see how the operation works.
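The pattern can be checked in isolation without touching the file system. The following sketch runs the same regular expression over a sample string; the $strings array and the sample source line are made up for the test, and keys without a translation are left as they are, mirroring the callback above.

```php
<?php
// Hypothetical translation table, standing in for the parsed INI file.
$strings = ['greeting' => 'Bonjour'];

// Sample template line containing one known and one unknown key.
$source = 'echo "[%tr%]greeting[%/tr%], [%tr%]unknown_key[%/tr%]";';

// Same pattern as in translate(): non-greedy capture between [%tr%] and [%/tr%].
$result = preg_replace_callback(
    '/\[%tr%\](.*?)\[%\/tr%\]/',
    function (array $match) use ($strings) {
        // Fall back to the key itself when no translation exists.
        return $strings[$match[1]] ?? $match[1];
    },
    $source
);
// $result is: echo "Bonjour, unknown_key";
```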
How to Call It
Again, this example is from Thomas Bley, not from me:
// instead of
require("core/example.php");
echo (new example())->now();
// we write
define('LANG', 'en_us');
require(translate('core/example.php'));
echo (new example())->now();
We store the language in a cookie (or session variable if we can't get a cookie) and then retrieve it on every request. You could combine this with an optional $_GET parameter to override the language, but I don't suggest subdomain-per-language or page-per-language because it'll make it harder to see which pages are popular and will reduce the value of inbound links as you'll have them more scarcely spread.
Why use this method?
We like this method of preprocessing for three reasons:
The huge performance gain from not calling a whole bunch of functions for content which rarely changes (with this system, 100k visitors in French will still only end up running translation replacement once).
It doesn't add any load to our database, as it uses simple flat-files and is a pure-PHP solution.
The ability to use PHP expressions within our translations.
Getting Translated Database Content
We just add a column for content in our database called language, then we use an accessor method for the LANG constant which we defined earlier on, so our SQL calls (using ZF1, sadly) look like this:
$query = select()->from($this->_name)
->where('language = ?', User::getLang())
->where('id = ?', $articleId)
->limit(1);
Our articles have a compound primary key over id and language so article 54 can exist in all languages. Our LANG defaults to en_US if not specified.
URL Slug Translation
I'd combine two things here, one is a function in your bootstrap which accepts a $_GET parameter for language and overrides the cookie variable, and another is routing which accepts multiple slugs. Then you can do something like this in your routing:
"/wilkommen" => "/welcome/lang/de"
... etc ...
These could be stored in a flat file which could be easily written to from your admin panel. JSON or XML may provide a good structure for supporting them.
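As an illustration of the flat-file idea, the slug map could be a JSON object that maps each translated slug to its internal route. The structure, the resolve_slug() helper and the "/bienvenue" entry are assumptions; the JSON is inlined here where a real setup would read it from a file written by the admin panel.

```php
<?php
// Hypothetical slug map; in practice this would be loaded with file_get_contents().
$json = '{"/wilkommen": "/welcome/lang/de", "/bienvenue": "/welcome/lang/fr"}';
$slugMap = json_decode($json, true);

// Returns the internal route for a translated slug, or the path unchanged if unknown.
function resolve_slug(array $slugMap, string $path): string
{
    return $slugMap[$path] ?? $path;
}

$route = resolve_slug($slugMap, '/wilkommen');
```

Unknown paths pass through untouched, so the regular routes keep working when no translated slug is defined.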
Notes Regarding A Few Other Options
PHP-based On-The-Fly Translation
I can't see that these offer any advantage over pre-processed translations.
Front-end Based Translations
I've long found these interesting, but there are a few caveats. For example, you have to make available to the user the entire list of phrases on your website that you plan to translate, this could be problematic if there are areas of the site you're keeping hidden or haven't allowed them access to.
You'd also have to assume that all of your users are willing and able to use Javascript on your site, but from my statistics, around 2.5% of our users are running without it (or using Noscript to block our sites from using it).
Database-Driven Translations
PHP's database connectivity speeds are nothing to write home about, and this adds to the already high overhead of calling a function on every phrase to translate. The performance & scalability issues seem overwhelming with this approach.
I suggest you don't reinvent the wheel and use gettext and the ISO list of language abbreviations. Have you seen how i18n/l10n is implemented in popular CMSes or frameworks?
With gettext you get a powerful tool in which many cases are already handled, such as plural forms of numbers. In English you have only 2 options: singular and plural. But in Russian, for example, there are 3 forms, and it's not as simple as in English.
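To show why Russian needs more than a singular/plural switch, here is the plural rule commonly used in gettext catalogues for Russian, written out as plain PHP. The function name and the sample word forms are chosen for illustration; with gettext itself the rule lives in the catalogue header and ngettext() applies it for you.

```php
<?php
// Returns 0, 1 or 2: the index of the plural form to use for the number $n in Russian.
function russian_plural_index(int $n): int
{
    $n = abs($n);
    if ($n % 10 === 1 && $n % 100 !== 11) {
        return 0; // 1, 21, 31, ... (but not 11)
    }
    if ($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) {
        return 1; // 2-4, 22-24, ... (but not 12-14)
    }
    return 2;     // 0, 5-20, 25-30, ...
}

// The three forms of the word "file".
$forms = ['файл', 'файла', 'файлов'];
$label = '22 ' . $forms[russian_plural_index(22)];
```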
Also, many translators already have experience working with gettext.
Take a look at CakePHP or Drupal. Both are multilingual enabled: CakePHP as an example of interface localization and Drupal as an example of content translation.
For l10n, querying the database per string is not an option at all. It would be tons of queries. The standard approach is to get all l10n data into memory at an early stage (or during the first call to the l10n function if you prefer lazy loading). The data can be read from a .po file or from the DB all at once. And then you just read the requested strings from an array.
If you need to implement an online tool to translate the interface, you can have all that data in the DB, but then still save all the data to a file to work with it. To reduce the amount of data in memory you can split all your translated messages/strings into groups and then load only the groups you need, if that is possible.
So you're totally right in your #3. With one exception: usually it is one big file, not a per-controller file or so, because opening one file is best for performance. You probably know that some high-load web apps compile all PHP code into one file to avoid file operations when include/require is called.
About URLs. Google indirectly suggests using translated URLs:
to clearly indicate French content:
http://example.ca/fr/vélo-de-montagne.html
Also, I think you need to redirect the user to the default language prefix, e.g. http://example.com/about-us will redirect to http://example.com/en/about-us
But if your site uses only one language, you don't need prefixes at all.
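As a rough sketch of that redirect rule (the function name and the list of supported languages are assumptions):

```php
<?php
// Hypothetical helper: returns the path to redirect to, or null when the
// requested path already starts with a supported language prefix.
function languageRedirectTarget(string $path, array $supported, string $default): ?string
{
    $trimmed = ltrim($path, '/');
    $segments = explode('/', $trimmed);
    if (in_array($segments[0], $supported, true)) {
        return null; // already prefixed, nothing to do
    }
    return '/' . $default . '/' . $trimmed;
}

$target = languageRedirectTarget('/about-us', array('en', 'fr'), 'en');
if ($target !== null) {
    // In a real request you would send: header('Location: ' . $target, true, 301);
    echo $target, "\n"; // /en/about-us
}
```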
Check out:
http://www.audiomicro.com/trailer-hit-impact-psychodrama-sound-effects-836925
http://nl.audiomicro.com/aanhangwagen-hit-effect-psychodrama-geluidseffecten-836925
http://de.audiomicro.com/anhanger-hit-auswirkungen-psychodrama-sound-effekte-836925
Translating content is a more difficult task. I think there will be some differences between types of content, e.g. articles, menu items, etc. But in #4 you're on the right track. Take a look at Drupal for more ideas. It has a clear enough DB schema and a good enough interface for translating: you create an article and select a language for it, and then you can translate it into other languages later.
I don't think URL slugs are a problem. You can just create a separate table for slugs, and that would be the right decision. With the right indexes, querying the table is not a problem even with a huge amount of data.
And it is not a full-text search but a plain string match if you use a varchar data type for the slug, and you can have an index on that field too.
PS Sorry, my English is far from perfect.
It depends on how much content your website has. At first I used a database like all other people here, but it can be time-consuming to script all the workings of a database. I don't say that this is an ideal method and especially if you have a lot of text, but if you want to do it fast without using a database, this method could work, though, you can't allow users to input data which will be used as translation-files. But if you add the translations yourself, it will work:
Let's say you have this text:
Welcome!
You can input this in a database with translations, but you can also do this:
$welcome = array(
"English"=>"Welcome!",
"German"=>"Willkommen!",
"French"=>"Bienvenue!",
"Turkish"=>"Hoşgeldiniz!",
"Russian"=>"Добро пожаловать!",
"Dutch"=>"Welkom!",
"Swedish"=>"Välkommen!",
"Basque"=>"Ongietorri!",
"Spanish"=>"¡Bienvenido!",
"Welsh"=>"Croeso!");
Now, if your website uses a cookie, you have this for example:
$_COOKIE['language'];
To make it easy let's transform it in a code which can easily be used:
$language=$_COOKIE['language'];
If your cookie language is Welsh and you have this piece of code:
echo $welcome[$language];
The result of this will be:
Croeso!
If you need to add a lot of translations for your website and a database is too consuming, using an array can be an ideal solution.
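Since the cookie value comes from the visitor, it may be missing or may name a language you do not have. A minimal sketch of a guarded lookup, assuming a hypothetical helper name and English as the fallback:

```php
<?php
// Hypothetical helper: returns the translation for the requested language,
// or the fallback language's text when the request is missing or unknown.
function pickTranslation(array $translations, ?string $requested, string $fallback): string
{
    if ($requested !== null && isset($translations[$requested])) {
        return $translations[$requested];
    }
    return $translations[$fallback];
}

$welcome = array('English' => 'Welcome!', 'Welsh' => 'Croeso!');
$cookie = isset($_COOKIE['language']) ? $_COOKIE['language'] : null;
$language = is_string($cookie) ? $cookie : null;
echo pickTranslation($welcome, $language, 'English'), "\n";
```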
I would suggest that you not really depend on the database for translation; it can be a really messy task and can become an extreme problem when it comes to data encoding.
I faced a similar issue a while ago and wrote the following class to solve my problem.
Object: Locale\Locale
<?php
namespace Locale;
class Locale{
// Following array stolen from Zend Framework
public $country_to_locale = array(
'AD' => 'ca_AD',
'AE' => 'ar_AE',
'AF' => 'fa_AF',
'AG' => 'en_AG',
'AI' => 'en_AI',
'AL' => 'sq_AL',
'AM' => 'hy_AM',
'AN' => 'pap_AN',
'AO' => 'pt_AO',
'AQ' => 'und_AQ',
'AR' => 'es_AR',
'AS' => 'sm_AS',
'AT' => 'de_AT',
'AU' => 'en_AU',
'AW' => 'nl_AW',
'AX' => 'sv_AX',
'AZ' => 'az_Latn_AZ',
'BA' => 'bs_BA',
'BB' => 'en_BB',
'BD' => 'bn_BD',
'BE' => 'nl_BE',
'BF' => 'mos_BF',
'BG' => 'bg_BG',
'BH' => 'ar_BH',
'BI' => 'rn_BI',
'BJ' => 'fr_BJ',
'BL' => 'fr_BL',
'BM' => 'en_BM',
'BN' => 'ms_BN',
'BO' => 'es_BO',
'BR' => 'pt_BR',
'BS' => 'en_BS',
'BT' => 'dz_BT',
'BV' => 'und_BV',
'BW' => 'en_BW',
'BY' => 'be_BY',
'BZ' => 'en_BZ',
'CA' => 'en_CA',
'CC' => 'ms_CC',
'CD' => 'sw_CD',
'CF' => 'fr_CF',
'CG' => 'fr_CG',
'CH' => 'de_CH',
'CI' => 'fr_CI',
'CK' => 'en_CK',
'CL' => 'es_CL',
'CM' => 'fr_CM',
'CN' => 'zh_Hans_CN',
'CO' => 'es_CO',
'CR' => 'es_CR',
'CU' => 'es_CU',
'CV' => 'kea_CV',
'CX' => 'en_CX',
'CY' => 'el_CY',
'CZ' => 'cs_CZ',
'DE' => 'de_DE',
'DJ' => 'aa_DJ',
'DK' => 'da_DK',
'DM' => 'en_DM',
'DO' => 'es_DO',
'DZ' => 'ar_DZ',
'EC' => 'es_EC',
'EE' => 'et_EE',
'EG' => 'ar_EG',
'EH' => 'ar_EH',
'ER' => 'ti_ER',
'ES' => 'es_ES',
'ET' => 'en_ET',
'FI' => 'fi_FI',
'FJ' => 'hi_FJ',
'FK' => 'en_FK',
'FM' => 'chk_FM',
'FO' => 'fo_FO',
'FR' => 'fr_FR',
'GA' => 'fr_GA',
'GB' => 'en_GB',
'GD' => 'en_GD',
'GE' => 'ka_GE',
'GF' => 'fr_GF',
'GG' => 'en_GG',
'GH' => 'ak_GH',
'GI' => 'en_GI',
'GL' => 'iu_GL',
'GM' => 'en_GM',
'GN' => 'fr_GN',
'GP' => 'fr_GP',
'GQ' => 'fan_GQ',
'GR' => 'el_GR',
'GS' => 'und_GS',
'GT' => 'es_GT',
'GU' => 'en_GU',
'GW' => 'pt_GW',
'GY' => 'en_GY',
'HK' => 'zh_Hant_HK',
'HM' => 'und_HM',
'HN' => 'es_HN',
'HR' => 'hr_HR',
'HT' => 'ht_HT',
'HU' => 'hu_HU',
'ID' => 'id_ID',
'IE' => 'en_IE',
'IL' => 'he_IL',
'IM' => 'en_IM',
'IN' => 'hi_IN',
'IO' => 'und_IO',
'IQ' => 'ar_IQ',
'IR' => 'fa_IR',
'IS' => 'is_IS',
'IT' => 'it_IT',
'JE' => 'en_JE',
'JM' => 'en_JM',
'JO' => 'ar_JO',
'JP' => 'ja_JP',
'KE' => 'en_KE',
'KG' => 'ky_Cyrl_KG',
'KH' => 'km_KH',
'KI' => 'en_KI',
'KM' => 'ar_KM',
'KN' => 'en_KN',
'KP' => 'ko_KP',
'KR' => 'ko_KR',
'KW' => 'ar_KW',
'KY' => 'en_KY',
'KZ' => 'ru_KZ',
'LA' => 'lo_LA',
'LB' => 'ar_LB',
'LC' => 'en_LC',
'LI' => 'de_LI',
'LK' => 'si_LK',
'LR' => 'en_LR',
'LS' => 'st_LS',
'LT' => 'lt_LT',
'LU' => 'fr_LU',
'LV' => 'lv_LV',
'LY' => 'ar_LY',
'MA' => 'ar_MA',
'MC' => 'fr_MC',
'MD' => 'ro_MD',
'ME' => 'sr_Latn_ME',
'MF' => 'fr_MF',
'MG' => 'mg_MG',
'MH' => 'mh_MH',
'MK' => 'mk_MK',
'ML' => 'bm_ML',
'MM' => 'my_MM',
'MN' => 'mn_Cyrl_MN',
'MO' => 'zh_Hant_MO',
'MP' => 'en_MP',
'MQ' => 'fr_MQ',
'MR' => 'ar_MR',
'MS' => 'en_MS',
'MT' => 'mt_MT',
'MU' => 'mfe_MU',
'MV' => 'dv_MV',
'MW' => 'ny_MW',
'MX' => 'es_MX',
'MY' => 'ms_MY',
'MZ' => 'pt_MZ',
'NA' => 'kj_NA',
'NC' => 'fr_NC',
'NE' => 'ha_Latn_NE',
'NF' => 'en_NF',
'NG' => 'en_NG',
'NI' => 'es_NI',
'NL' => 'nl_NL',
'NO' => 'nb_NO',
'NP' => 'ne_NP',
'NR' => 'en_NR',
'NU' => 'niu_NU',
'NZ' => 'en_NZ',
'OM' => 'ar_OM',
'PA' => 'es_PA',
'PE' => 'es_PE',
'PF' => 'fr_PF',
'PG' => 'tpi_PG',
'PH' => 'fil_PH',
'PK' => 'ur_PK',
'PL' => 'pl_PL',
'PM' => 'fr_PM',
'PN' => 'en_PN',
'PR' => 'es_PR',
'PS' => 'ar_PS',
'PT' => 'pt_PT',
'PW' => 'pau_PW',
'PY' => 'gn_PY',
'QA' => 'ar_QA',
'RE' => 'fr_RE',
'RO' => 'ro_RO',
'RS' => 'sr_Cyrl_RS',
'RU' => 'ru_RU',
'RW' => 'rw_RW',
'SA' => 'ar_SA',
'SB' => 'en_SB',
'SC' => 'crs_SC',
'SD' => 'ar_SD',
'SE' => 'sv_SE',
'SG' => 'en_SG',
'SH' => 'en_SH',
'SI' => 'sl_SI',
'SJ' => 'nb_SJ',
'SK' => 'sk_SK',
'SL' => 'kri_SL',
'SM' => 'it_SM',
'SN' => 'fr_SN',
'SO' => 'sw_SO',
'SR' => 'srn_SR',
'ST' => 'pt_ST',
'SV' => 'es_SV',
'SY' => 'ar_SY',
'SZ' => 'en_SZ',
'TC' => 'en_TC',
'TD' => 'fr_TD',
'TF' => 'und_TF',
'TG' => 'fr_TG',
'TH' => 'th_TH',
'TJ' => 'tg_Cyrl_TJ',
'TK' => 'tkl_TK',
'TL' => 'pt_TL',
'TM' => 'tk_TM',
'TN' => 'ar_TN',
'TO' => 'to_TO',
'TR' => 'tr_TR',
'TT' => 'en_TT',
'TV' => 'tvl_TV',
'TW' => 'zh_Hant_TW',
'TZ' => 'sw_TZ',
'UA' => 'uk_UA',
'UG' => 'sw_UG',
'UM' => 'en_UM',
'US' => 'en_US',
'UY' => 'es_UY',
'UZ' => 'uz_Cyrl_UZ',
'VA' => 'it_VA',
'VC' => 'en_VC',
'VE' => 'es_VE',
'VG' => 'en_VG',
'VI' => 'en_VI',
'VN' => 'vn_VN',
'VU' => 'bi_VU',
'WF' => 'wls_WF',
'WS' => 'sm_WS',
'YE' => 'ar_YE',
'YT' => 'swb_YT',
'ZA' => 'en_ZA',
'ZM' => 'en_ZM',
'ZW' => 'sn_ZW'
);
/**
* Stores the translations for specific languages
*
* @var array
*/
protected $translation = array();
/**
* Current locale
*
* @var string
*/
protected $locale;
/**
* Default locale
*
* @var string
*/
protected $default_locale;
/**
* Directory containing the locale files
*
* @var string
*/
protected $locale_dir;
/**
* Constructor.
*
* @param string $locale_dir
*/
public function __construct($locale_dir)
{
$this->locale_dir = $locale_dir;
}
/**
* Set the user-defined locale
*
* @param string $locale
*/
public function setLocale($locale = null)
{
$this->locale = $locale;
return $this;
}
/**
* Get the user-defined locale
*
* @return string
*/
public function getLocale()
{
return $this->locale;
}
/**
* Get the default locale
*
* @return string
*/
public function getDefaultLocale()
{
return $this->default_locale;
}
/**
* Set the default locale
*
* @param string $locale
*/
public function setDefaultLocale($locale)
{
$this->default_locale = $locale;
return $this;
}
/**
* Determine whether translations exist for a locale, or whether a translation key exists
*
* @param string $locale
* @param string $key
* @return boolean
*/
public function hasTranslation($locale, $key = null)
{
if (null == $key && isset($this->translation[$locale])) {
return true;
} elseif (isset($this->translation[$locale][$key])) {
return true;
}
return false;
}
/**
* Get the translations for the required locale, or the translation for a key
*
* @param string $locale
* @param string $key
* @return array|string
*/
public function getTranslation($locale, $key = null)
{
if (null == $key && $this->hasTranslation($locale)) {
return $this->translation[$locale];
} elseif ($this->hasTranslation($locale, $key)) {
return $this->translation[$locale][$key];
}
return array();
}
/**
* Set the translations for the required locale
*
* @param string $locale
* Language code
* @param array $trans
* translations array
*/
public function setTranslation($locale, $trans = array())
{
$this->translation[$locale] = $trans;
}
/**
* Remove translations for the required locale
*
* @param string $locale
*/
public function removeTranslation($locale = null)
{
if (null === $locale) {
// Reset to an empty array; unset() would remove the property itself
$this->translation = array();
} else {
unset($this->translation[$locale]);
}
}
/**
* Initialize locale
*
* @param string $locale
* @param string $default_locale
*/
public function init($locale = null, $default_locale = null)
{
// check if previously set locale exist or not
$this->init_locale();
if ($this->locale != null) {
return;
}
if ($locale == null || (! preg_match('#^[a-z]+_[a-zA-Z_]+$#', $locale) && ! preg_match('#^[a-z]+_[a-zA-Z]+_[a-zA-Z_]+$#', $locale))) {
$this->detectLocale();
} else {
$this->locale = $locale;
}
$this->init_locale();
}
/**
* Attempt to autodetect locale
*
* @return void
*/
private function detectLocale()
{
$locale = false;
// GeoIP
if (function_exists('geoip_country_code_by_name') && isset($_SERVER['REMOTE_ADDR'])) {
$country = geoip_country_code_by_name($_SERVER['REMOTE_ADDR']);
if ($country) {
$locale = isset($this->country_to_locale[$country]) ? $this->country_to_locale[$country] : false;
}
}
// Try detecting locale from browser headers
if (! $locale && isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
$languages = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
foreach ($languages as $lang) {
// Drop any quality value such as ";q=0.8" before inspecting the tag
$lang = explode(';', $lang);
$lang = str_replace('-', '_', trim($lang[0]));
if (strpos($lang, '_') === false) {
if (isset($this->country_to_locale[strtoupper($lang)])) {
$locale = $this->country_to_locale[strtoupper($lang)];
break;
}
} else {
$parts = explode('_', $lang);
if (count($parts) == 3) {
// language_Script_COUNTRY
$locale = strtolower($parts[0]) . '_' . ucfirst(strtolower($parts[1])) . '_' . strtoupper($parts[2]);
} else {
// language_COUNTRY
$locale = strtolower($parts[0]) . '_' . strtoupper($parts[1]);
}
break;
}
}
}
// Use the detected locale, or fall back to the default locale
$this->locale = $locale ? $locale : $this->default_locale;
}
/**
* Check if config for selected locale exists
*
* @return void
*/
private function init_locale()
{
if (! file_exists(sprintf('%s/%s.php', $this->locale_dir, $this->locale))) {
$this->locale = $this->default_locale;
}
}
/**
* Load a translation into the array
*
* @return void
*/
private function loadTranslation($locale = null, $force = false)
{
if ($locale == null)
$locale = $this->locale;
if (! $this->hasTranslation($locale)) {
$this->setTranslation($locale, include (sprintf('%s/%s.php', $this->locale_dir, $locale)));
}
}
/**
* Translate a key
*
* @param string $key
* Key to be translated
* @param string ...$args
* optional arguments, read via func_get_args()
* @return string
*/
public function translate($key)
{
$this->init();
$this->loadTranslation($this->locale);
if (! $this->hasTranslation($this->locale, $key)) {
if ($this->locale !== $this->default_locale) {
$this->loadTranslation($this->default_locale);
if ($this->hasTranslation($this->default_locale, $key)) {
$translation = $this->getTranslation($this->default_locale, $key);
} else {
// return key as it is or log error here
return $key;
}
} else {
return $key;
}
} else {
$translation = $this->getTranslation($this->locale, $key);
}
// Replace arguments
if (false !== strpos($translation, '{a:')) {
$replace = array();
$args = func_get_args();
for ($i = 1, $max = count($args); $i < $max; $i ++) {
$replace['{a:' . $i . '}'] = $args[$i];
}
// interpolate replacement values into the message, then return
return strtr($translation, $replace);
}
return $translation;
}
}
Usage
<?php
## /locale/en.php
return array(
'name' => 'Hello {a:1}',
'name_full' => 'Hello {a:1} {a:2}'
);
<?php
## usage, in a separate script
$locale = new \Locale\Locale(__DIR__ . '/locale');
$locale->setLocale('en'); // load en.php from locale dir
// to work with auto detection instead, comment out $locale->setLocale('en');
echo $locale->translate('name', 'Foo');
echo $locale->translate('name_full', 'Foo', 'Bar');
How it works
{a:1} is replaced by the 1st argument passed to the method: Locale::translate('key_name','arg1')
{a:2} is replaced by the 2nd argument passed to the method: Locale::translate('key_name','arg1','arg2')
How detection works
By default, if geoip is installed, the country code is taken from geoip_country_code_by_name; if geoip is not installed, detection falls back to the HTTP_ACCEPT_LANGUAGE header.
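The header fallback can also honour the quality values browsers send (for example "fr-CA,fr;q=0.8,en;q=0.5"). This is a standalone sketch, not part of the class above; the function name is an assumption.

```php
<?php
// Hypothetical helper: returns the language tags from an Accept-Language
// header, most preferred first, with "-" normalised to "_".
function parseAcceptLanguage(string $header): array
{
    $weighted = array();
    foreach (explode(',', $header) as $position => $part) {
        $pieces = explode(';', trim($part));
        $tag = trim($pieces[0]);
        if ($tag === '' || $tag === '*') {
            continue;
        }
        $q = 1.0; // a tag without an explicit q value counts as 1.0
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $weighted[] = array($tag, $q, $position);
    }
    // Sort by quality descending; keep header order for equal qualities
    usort($weighted, function ($a, $b) {
        if ($a[1] === $b[1]) {
            return $a[2] <=> $b[2];
        }
        return $b[1] <=> $a[1];
    });
    return array_map(function ($entry) {
        return str_replace('-', '_', $entry[0]);
    }, $weighted);
}
```

The first entry that matches a locale file you actually ship would then become the current locale.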
Just a sub answer:
Absolutely use translated urls with a language identifier in front of them: http://www.domain.com/nl/over-ons
Hybrid solutions tend to get complicated, so I would just stick with it. Why? Because the URL is essential for SEO.
About the db translation: Is the number of languages more or less fixed? Or rather unpredictable and dynamic? If it is fixed, I would just add new columns, otherwise go with multiple tables.
But generally, why not use Drupal? I know everybody wants to build their own CMS cause it's faster, leaner, etc. etc. But that is just really a bad idea!
I am not going to attempt to refine the answers already given. Instead I will tell you about the way my own OOP PHP framework handles translations.
Internally, my framework uses codes like en, fr, es, cn and so on. An array holds the languages supported by the website: array('en','fr','es','cn')
The language code is passed via $_GET (lang=fr) and if not passed or not valid, it is set to the first language in the array. So at any time during program execution and from the very beginning, the current language is known.
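A minimal sketch of that rule, assuming a hypothetical function name (this is not the framework's actual code):

```php
<?php
// Hypothetical helper: picks the language from the query string and falls
// back to the first supported language when it is missing or not valid.
function resolveLanguage(array $query, array $supported): string
{
    $lang = isset($query['lang']) ? $query['lang'] : null;
    if (is_string($lang) && in_array($lang, $supported, true)) {
        return $lang;
    }
    return $supported[0];
}

$current = resolveLanguage($_GET, array('en', 'fr', 'es', 'cn'));
```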
It is useful to understand the kind of content that needs to be translated in a typical application:
1) error messages from classes (or procedural code)
2) non-error messages from classes (or procedural code)
3) page content (usually stored in a database)
4) site-wide strings (like website name)
5) script-specific strings
The first type is simple to understand. Basically, we are talking about messages like "could not connect to the database ...". These messages only need to be loaded when an error occurs. My manager class receives a call from the other classes and, using the information passed as parameters, simply goes to the relevant class folder and retrieves the error file.
The second type of message is more like the messages you get when the validation of a form goes wrong ("You cannot leave ... blank" or "please choose a password with more than 5 characters"). These strings need to be loaded before the class runs.
For the actual page content, I use one table per language, each table prefixed by the code for the language. So en_content is the table with English-language content, es_content is for Spanish, cn_content for Chinese and fr_content is the French stuff.
The fourth kind of string is relevant throughout your website. This is loaded via a configuration file named using the code for the language, that is en_lang.php, es_lang.php and so on. In the global language file you will need to load the translated languages such as array('English','Chinese', 'Spanish','French') in the English global file and array('Anglais','Chinois', 'Espagnol', 'Francais') in the French file. So when you populate a dropdown for language selection, it is in the correct language ;)
Finally you have the script-specific strings. So if you write a cooking application, it might be "Your oven was not hot enough".
In my application cycle, the global language file is loaded first. In there you will find not just global strings (like "Jack's Website") but also settings for some of the classes: basically anything that is language- or culture-dependent. Some of the strings in there include masks for dates (MMDDYYYY or DDMMYYYY), or ISO language codes. In the main language file, I include strings for individual classes because there are so few of them.
The second and last language file that is read from disk is the script language file. lang_en_home_welcome.php is the language file for the home/welcome script. A script is defined by a mode (home) and an action (welcome). Each script has its own folder with config and lang files.
The script pulls the content from the database naming the content table as explained above.
If something goes wrong, the manager knows where to get the language-dependent error file. That file is only loaded in case of an error.
So the conclusion is obvious. Think about the translation issues before you start developing an application or framework. You also need a development workflow that incorporates translations. With my framework, I develop the whole site in English and then translate all the relevant files.
Just a quick final word on the way the translation strings are implemented. My framework has a single global, the $manager, which runs services available to any other service. So for example the form service gets hold of the html service and uses it to write the html. One of the services on my system is the translator service. $translator->set($service,$code,$string) sets a string for the current language. The language file is a list of such statements. $translator->get($service,$code) retrieves a translation string. The $code can be numeric like 1 or a string like 'no_connection'. There can be no clash between services because each has its own namespace in the translator's data area.
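To illustrate the set/get interface described above, here is a minimal sketch of such a translator service. The class is my reconstruction for illustration, not the framework's actual code; only the two method signatures come from the description.

```php
<?php
// Hypothetical translator service: strings are stored per service, so two
// services can reuse the same code without clashing.
class Translator
{
    private $strings = array();

    // Registers a string for the current language under a service namespace
    public function set(string $service, $code, string $string): void
    {
        $this->strings[$service][$code] = $string;
    }

    // Returns the string, or null when the service/code pair is unknown
    public function get(string $service, $code): ?string
    {
        return isset($this->strings[$service][$code])
            ? $this->strings[$service][$code]
            : null;
    }
}

// A language file would then be a list of statements such as:
$translator = new Translator();
$translator->set('database', 'no_connection', 'Could not connect to the database');
$translator->set('form', 1, 'You cannot leave this field blank');
```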
I post this here in the hope it will save somebody the task of reinventing the wheel like I had to do a few long years ago.
I had the same problem a while ago, before I started using the Symfony framework.
Just use a function __() which has the parameters pageId (or objectId, objectTable described in #2), target language and an optional fallback (default) language. The default language could be set in some global config so that it is easier to change later.
For storing the content in database i used following structure: (pageId, language, content, variables).
pageId - a FK to the page you want to translate. If you have other objects, like news, galleries or whatever, just split it into 2 fields: objectId, objectTable.
language - obviously it would store the ISO language string EN_en, LT_lt, EN_us etc.
content - the text you want to translate, together with the wildcards for variable replacement. Example: "Hello mr. %%name%%. Your account balance is %%balance%%."
variables - the JSON-encoded variables. PHP provides functions to quickly parse these. Example: {"name": "Laurynas", "balance": 15.23}.
You also mentioned a slug field. You could freely add it to this table just to have a quick way to search for it.
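A minimal sketch of how the content and variables fields could be combined at render time. The function name is an assumption, and the balance is passed as a JSON string here so that its formatting is preserved exactly.

```php
<?php
// Hypothetical helper: fills %%name%% style wildcards in $content using the
// JSON-encoded variables stored next to it.
function renderTranslation(string $content, string $variablesJson): string
{
    $vars = json_decode($variablesJson, true);
    if (!is_array($vars) || count($vars) === 0) {
        return $content; // nothing to replace, or the JSON was invalid
    }
    $replace = array();
    foreach ($vars as $name => $value) {
        $replace['%%' . $name . '%%'] = (string) $value;
    }
    return strtr($content, $replace);
}

echo renderTranslation(
    'Hello mr. %%name%%. Your account balance is %%balance%%.',
    '{"name": "Laurynas", "balance": "15.23"}'
), "\n";
```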
Your database calls must be reduced to minimum with caching the translations. It must be stored in PHP array, because it is the fastest structure in PHP language. How you will make this caching is up to you. From my experience you should have a folder for each language supported and an array for each pageId. The cache should be rebuilt after you update the translation. ONLY the changed array should be regenerated.
I think I answered that in #2.
Your idea is perfectly logical. This one is pretty simple and I think it will not cause you any problems.
URLs should be translated using the slugs stored in the translation table.
Final words
It is always good to research best practices, but do not reinvent the wheel. Just take the components from well-known frameworks and use them.
Take a look at the Symfony translation component. It could be a good code base for you.
I've been asking myself related questions over and over again, then got lost in formal languages... but just to help you out a little I'd like to share some findings:
I recommend taking a look at advanced CMSes:
Typo3 for PHP (I know there is a lot of stuff, but that's the one I think is most mature)
Plone in Python
If you find out that the web in 2013 should work differently, then start from scratch. That would mean putting together a team of highly skilled/experienced people to build a new CMS.
Maybe you'd like to take a look at Polymer for that purpose.
When it comes to coding multilingual websites with native language support, I think every programmer should have a clue about Unicode. If you don't know Unicode you'll most certainly mess up your data. Do not go with the many legacy ISO code pages; they only save you some memory. You can do literally everything with UTF-8, even store Chinese characters. UTF-8 is a variable-width encoding that uses 1 to 4 bytes per character, so most Chinese characters take 3 bytes; you do not need UTF-16 or UTF-32 for that.
If it's about URL encoding, again you shouldn't mix encodings, and be aware that, at least for the domain name, there are rules defined by the different lobbies that provide applications like browsers. E.g. a domain could look very similar, like:
ьankofamerica.com or bankofamerica.com samesamebutdifferent ;)
Of course you need the filesystem to work with all encodings. Another plus for Unicode: use a UTF-8 filesystem.
If it's about translations, think about the structure of documents, e.g. a book or an article. You have the DocBook specifications to understand those structures. But in HTML it's just about content blocks. So you'd like to have translation on that level, and also on the webpage level or the domain level.
So if a block doesn't exist, it's just not there; if a webpage doesn't exist, you'll get redirected to the upper navigation level. If a domain should be completely different in navigation structure, then it's a completely different structure to manage.
This can already be done with Typo3.
If it's about frameworks, the most mature one I know for the general stuff like MVC (a buzzword I really hate! Like "performance": if you want to sell something, use the words performance and feature-rich and you sell... what the hell) is Zend. It has proven to be a good thing for bringing standards to PHP chaos coders. But Typo3 also has a framework besides the CMS. It has recently been redeveloped and is called flow3 now. The frameworks of course cover database abstraction, templating and concepts for caching, but each has individual strengths.
If it's about caching... that can be awfully complicated / multilayered. In PHP you'll think about accelerators and opcode caches, but also HTML, httpd, MySQL, XML, CSS, JS ... any kind of cache. Of course some parts should be cached and dynamic parts like blog answers shouldn't. Some should be requested over AJAX with generated URLs. JSON, hashbangs etc.
Then, you'd like to have any little component on your website to be accessed or managed only by certain users, so conceptually that plays a big role.
Also you'd like to make statistics, maybe have distributed system / a facebook of facebooks etc. any software to be built on top of your over the top cms ... so you need different type of databases inmemory, bigdata, xml, whatsoever.
Well, I think that's enough for now. If you haven't heard of either Typo3 / Plone or the frameworks mentioned, you have enough to study. On that path you'll find a lot of solutions for questions you haven't asked yet.
If you then think "let's make a new CMS, because it's 2013 and PHP is about to die anyway", then you are welcome to join any other group of developers who are hopefully not getting lost.
Good luck!
And btw., what if people will not have any websites anymore in the future, and we'll all be on Google+? I hope developers become a little more creative and do something useful (so as not to be assimilated by the borgle).
//// Edit ///
Just a little thought for your existing application:
If you have a PHP/MySQL CMS and you want to embed multilanguage support, you could either use your table with an additional column for each language, or insert the translation with an object id and a language id in the same table, or create an identical table for each language and insert objects there, then make a select union if you want to have them all displayed. For the database use utf8_general_ci, and of course use UTF-8 text/encoding in the front end and back end.
I have used URL path segments for URLs in the way you already explained, like
domain.org/en/about
You can map the lang ID to your content table. Anyway, you need to have a map of parameters for your URLs, so you'd like to define a parameter to be mapped from a path segment in your URL; that would be e.g.
domain.org/en/about/employees/IT/administrators/
lookup configuration
pageid| url
1 | /about/employees/../..
1 | /../about/employees../../
map parameters to url pathsegment ""
$parameterlist['lang'] = array(0=>"nl",1=>"en"); // default nl if 0
$parameterlist['branch'] = array(1=>"IT",2=>"DESIGN");
$parameterlist['employertype'] = array(1=>"admin",2=>"engineer"); // could be a sql result
$websiteconfig[]=$userwhatever;
$websiteconfig[]=$parameterlist;
$someparameterlist[] = array("branch"=>$someid);
$someparameterlist[] = array("employertype"=>$someid);
function getURL($someparameterlist){
// todo foreach someparameter lookup pathsegment
$path = ''; // build the path from the looked-up segments here
return $path;
}
Per se, that's been covered already in the post above.
And not to forget, you'd need to "rewrite" the URL to your generating PHP file, which in most cases would be index.php
The real challenge when making a multilingual website is the content. How are you going to store different versions of the same article? Are you using a relational database or a non-relational one?
Using a relational DB such as MySQL, you can take advantage of the JSON data type to store all different versions of the same field altogether.
When using a non-relational DB you can simply store different versions in the same object identifiable by their keys.
If you are using Laravel you may find Laravel Translatable package to be useful when working with traditional relational databases.
If you're hosting static content, then Google's Firebase Hosting supports i18n hosting rules that return either Country, Language or Language + Country specific content including index.html, 404.html or manifest.json files (for progressive web apps).
Their hosting will select the Country based on the user's IP address and select the language based on their browser's Accept-Language header, and then apply prioritised rules to return each requested file's content:
1. Language code + Country code (for example, content from fr_ca/)
2. Country code only (for example, content from ALL_ca/)
3. Language code only (for example, content from fr/ or es_ALL/)
4. "Default" content that's outside the "i18n content" directory, like at the root of the public directory.
Rules 1 & 3 are applied in order of quality values for each language in the request's Accept-Language header.
Example
public/
    index.html  // Default homepage
    manifest.json  // Default manifest.json
    404.html  // Default custom 404 page
    localized-files/
        ALL_ca/
            index.html
        es_ALL/
            index.html
            404.html
            manifest.json  << Spanish
        fr/
            index.html
            404.html
            manifest.json  << French
        fr_ca/
            index.html
            manifest.json
// firebase.json
"hosting": {
"public": "public",
"ignore": [
"firebase.json",
"**/.*",
"**/node_modules/**"
],
"i18n": {
"root": "/localized-files" // <<< "i18n content" folder
}
...
}
Full details ...
Configure internationalization (i18n) rewrites
Database work:
Create Language Table ‘languages’:
Fields:
language_id(primary and auto incremented)
language_name
created_at
created_by
updated_at
updated_by
Create a table in database ‘content’:
Fields:
content_id(primary and auto incremented)
main_content
header_content
footer_content
leftsidebar_content
rightsidebar_content
language_id(foreign key: referenced to languages table)
created_at
created_by
updated_at
updated_by
Front End Work:
When the user selects a language from a dropdown or any other area, save the selected language id in the session, like:
$_SESSION['language']=1;
Now fetch data from the database table ‘content’ based on the language id stored in the session.
Details may be found here: http://skillrow.com/multilingual-website-in-php-2/
As a person who lives in Quebec, where almost every site is in French and English, I have tried many if not most multilanguage plugins for WP. The one and only useful solution that works nicely with all my sites is mQtranslate. I live and die with it!
https://wordpress.org/plugins/mqtranslate/
What about WORDPRESS + MULTI-LANGUAGE SITE BASIS(plugin) ?
the site will have structure:
example.com/eng/category1/....
example.com/eng/my-page....
example.com/rus/category1/....
example.com/rus/my-page....
The plugin provides an interface for translating all phrases, with simple logic:
(ENG) my_title - "Hello user"
(SPA) my_title - "Hola usuario"
Then it can be output:
echo translate('my_title', LNG); // LNG is auto-detected
P.S. However, check whether the plugin is still active.
A really simple option that works with any website where you can upload Javascript is www.multilingualizer.com
It lets you put all text for all languages onto one page and then hides the languages the user doesn't need to see. Works well.

Controlling the execution order of Frontcontroller plugins

The situation:
My application contains several modules, each of which should be as self-contained as possible.
On each request, the application should parse a file. Based on the contents of the file, some database entities should be created, updated or removed.
Current approach:
I register a front controller plugin in one of my module bootstraps which takes care of the above.
The problem:
Other modules should be able to perform some routines based on the database entities that are modified in the file parse routine described above; how can I register a front controller plugin that executes after this is done?
Zend_Controller_Plugin objects are executed in LIFO order. Should I take another approach?
Note: I do realize that registerPlugin() takes a second $stackIndex argument, but since there is no way of knowing the current position of the stack, this really is not a very clean way to solve the problem.
There is a way to know what stack indices have been used already. I recently wrote the following method for just such a case:
/**
*
* Returns the lowest free Zend_Controller_Plugin stack index above $minimalIndex
* @param int $minimalIndex
*
* @return int $lowestFreeIndex | $minimalIndex
*/
protected function getLowestFreeStackIndex($minimalIndex = 101)
{
$plugins = Zend_Controller_Front::getInstance()->getPlugins();
$usedIndices = array();
foreach ($plugins as $stackIndex => $plugin)
{
$usedIndices[$stackIndex] = $plugin;
}
krsort($usedIndices);
$highestUsedIndex = key($usedIndices);
if ($highestUsedIndex < $minimalIndex)
{
return $minimalIndex;
}
$lowestFreeIndex = $highestUsedIndex + 1;
return $lowestFreeIndex;
}
Basically, what you're asking for is the part Zend_Controller_Front::getInstance()->getPlugins(); With that you can do whatever you want, the array contains all the used stack indices as keys.
The function starts returning stack indices from 101 because the Zend Framework error controller plugin uses 100 and I need to register mine with higher indices. That's of course a bit of a magic number, but even the Zend Framework tutorials/manuals don't have a better solution for the 101 stack index problem. A class constant would make it a bit cleaner/more readable.
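As a sketch of the class-constant idea, here is the same logic written as a pure function over the used indices, so it can be tried without Zend Framework loaded. The constant and function names are assumptions; in the method above the input would be array_keys() of the getPlugins() result.

```php
<?php
// Assumption taken from the text above: the error controller plugin sits at 100
const ERROR_HANDLER_STACK_INDEX = 100;

// Hypothetical helper: like the method above, it returns an index above every
// used one (highest used + 1), but never lower than $minimalIndex.
function lowestFreeStackIndex(array $usedIndices, int $minimalIndex = ERROR_HANDLER_STACK_INDEX + 1): int
{
    if (count($usedIndices) === 0) {
        return $minimalIndex;
    }
    $highestUsedIndex = max($usedIndices);
    if ($highestUsedIndex < $minimalIndex) {
        return $minimalIndex;
    }
    return $highestUsedIndex + 1;
}
```

Naming the constant documents where 101 comes from, which is the readability gain mentioned above.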

Drupal 6 Views 2: Setting Date Arguments

Passing uid as an argument works fine with this code:
$bouts = views_get_view_result('Results', 'page_1', array($user->uid));
The key line in views_get_view_result that sets arguments is:
$view->set_arguments($args);
But what about passing date ranges?
Also, if something is specified as a filter on a view, is there a way to programmatically alter it?
views_get_view_result:
/**
* Investigate the result of a view.
* from Drupal.org.
*
* @param string $viewname
* The name of the view to retrieve the data from.
* @param string $display_id
* The display id. On the edit page for the view in question, you'll find
* a list of displays at the left side of the control area. "Defaults"
* will be at the top of that list. Hover your cursor over the name of the
* display you want to use. A URL will appear in the status bar of your
* browser. This is usually at the bottom of the window, in the chrome.
* Everything after #views-tab- is the display ID, e.g. page_1.
* @param array $args
* Array of arguments. (no keys, just args)
* @return
* array
* An array containing an object for each view item.
* string
* If the view is not found a message is returned.
*/
function views_get_view_result($viewname, $display_id = NULL, $args = NULL) {
$view = views_get_view($viewname);
if (is_object($view)) {
if (is_array($args)) {
$view->set_arguments($args);
}
if (is_string($display_id)) {
$view->set_display($display_id);
}
else {
$view->init_display();
}
$view->pre_execute();
$view->execute();
/* print "<pre> $viewname: $display_id";
print_r(get_class_methods($view)); */
return $view->result;
}
else {
return t('View %viewname not found.', array('%viewname' => $viewname));
}
}
As for passing date ranges: given the posted function definition, you could pass date ranges to it only if the view accepts them as arguments. I'm not 100% sure, but as far as I know date ranges can only be defined as filters, not as arguments, which leads to your second question:
Programmatically altering the views filter settings is possible, but a bit messy, given the rather complicated view object/array mashup structure. In your posted function above, the first line is
$view = views_get_view($viewname);
After that, $view contains the whole view object. The filter settings are defined per display, so assuming you have a view with only a default display, you will find the filter settings under
$view->display['default']->display_options['filters']
(Note the object/array notation mix - the display is a contained object of type views_display)
The 'filters' array contains one entry per filter, with varying elements depending on the filter type. For your purpose, I would suggest creating a dummy view with just the filter you are interested in, with preconfigured/hardcoded values. Using a debugger (or var_dump/print_r) you can then take a look at the filter array after view creation. From what you find there, you should be able to deduce how to inject your custom date range.
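As a rough sketch of what that injection could look like, the snippet below builds a stand-in object with the same object/array nesting, so it runs outside Drupal. The filter name 'date_filter' and its 'operator'/'value'/'min'/'max' keys are assumptions for illustration only; check the real keys with print_r on your own view first.

```php
<?php
// Stand-in for the object returned by views_get_view(), built by hand.
// The filter id and its keys below are hypothetical.
$display = new stdClass();
$display->display_options = array(
    'filters' => array(
        'date_filter' => array(
            'operator' => 'between',
            'value'    => array('min' => '2009-01-01', 'max' => '2009-12-31'),
        ),
    ),
);
$view = new stdClass();
$view->display = array('default' => $display);

/**
 * Overwrites the min/max values of one filter on one display.
 * Returns false if the display or filter does not exist.
 */
function set_filter_range($view, $displayId, $filterId, $min, $max)
{
    if (!isset($view->display[$displayId]->display_options['filters'][$filterId])) {
        return false;
    }
    $view->display[$displayId]->display_options['filters'][$filterId]['value']['min'] = $min;
    $view->display[$displayId]->display_options['filters'][$filterId]['value']['max'] = $max;
    return true;
}

$changed = set_filter_range($view, 'default', 'date_filter', '2010-01-01', '2010-01-31');
```

In the real function you would presumably make this change right after views_get_view() and before the view is executed.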
Disclaimer: Poking around in the view like this is a bit annoying and not too effective, but it works. As of yet, I have not found concise documentation of Views 2 that explains the innards in a straightforward way, and I find the official API documentation a bit lacking concerning usage from code. (Of course this could well be just me being too stupid ;)
If you're using Views 2, you can use the GUI to add a date argument. Then in the URL, you can put:
www.yousite.com/yourview/startDate--finishDate
For the startDate/finishDate, the format is YYYY-MM-DD-HH.
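If you want to build that argument in code instead of typing it into the URL, something like the following might do. It is only a sketch: it assumes the YYYY-MM-DD-HH format quoted above is what your date argument expects, it formats in UTC, and build_date_range_argument() is a made-up helper name.

```php
<?php
/**
 * Builds a "startDate--finishDate" string from two Unix timestamps.
 * Assumption: the view's date argument accepts the YYYY-MM-DD-HH format.
 */
function build_date_range_argument($startTimestamp, $finishTimestamp)
{
    $format = 'Y-m-d-H'; // formatted in UTC via gmdate()
    return gmdate($format, $startTimestamp) . '--' . gmdate($format, $finishTimestamp);
}

// Example: 1 Jan 2010 00:00 UTC to 31 Jan 2010 12:00 UTC.
$range = build_date_range_argument(
    gmmktime(0, 0, 0, 1, 1, 2010),
    gmmktime(12, 0, 0, 1, 31, 2010)
);

// In Drupal the string could then be passed like the uid in the question:
// $bouts = views_get_view_result('Results', 'page_1', array($range));
```

Here $range ends up as 2010-01-01-00--2010-01-31-12.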
GL!
