I have a multilanguage site and I would like PHP to automatically set the language depending on the location from where you enter the site.
I tried a couple of ways.
localeconv() does not return the local language at all,
nl_langinfo() was not helpful either,
mb_language() does not return the language I was looking for,
$_SERVER['HTTP_ACCEPT_LANGUAGE'] returned a list of several languages instead of just one,
setlocale(LC_ALL, 0) returned C for some reason.
So none of them gave me the correct information every time.
I guess that setlocale(LC_ALL, 0) is the best solution, but I don't know what the returned C means and I don't know what to expect for different languages.
I looked for a solution on many different sites (including SO) and found the approaches mentioned above. Unfortunately none of them did what I was looking for.
I use $language = substr($_SERVER['HTTP_ACCEPT_LANGUAGE'], 0, 2); to get the first (= preferred) entry of the language list, reduced to 2 characters, for example "en" or "de".
Another approach without substr: locale_accept_from_http($_SERVER['HTTP_ACCEPT_LANGUAGE'])
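A minimal sketch combining both approaches; the isset() guard and the 'en' default are my additions, not part of the original answers:
<?php
// Preferred language from the Accept-Language header, e.g. "en" or "de"
$header = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : 'en';

// 1) First two characters of the first (preferred) entry
$language = substr($header, 0, 2);

// 2) Let the intl extension pick the best match, e.g. "de_DE"
if (function_exists('locale_accept_from_http')) {
    $locale = locale_accept_from_http($header);
}
?>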
Short answer: Language and location are very different things. You shouldn't set the language based on the location.
Why?
Many countries have multiple languages. Additionally, if you are English and you log onto your favourite website while you are on holiday in Japan, you don't want to see it in Japanese.
As Johannes mentioned, better to use the browser's language ($_SERVER['HTTP_ACCEPT_LANGUAGE']) if you want to make that decision automatically.
Related
I am working on a website and the requirement is to make it in two languages, i.e. Icelandic and English.
Just like Facebook and Google, if a user selects a language, the site should be shown in that language.
I am not allowed to use Google Translate.
Is there any other way to do this in PHP?
Thanks in advance
Well, I have never done it, but I did think about it :). For me, I would do something like this from scratch:
First, do not hardcode the strings that are displayed to your clients. Create a dictionary; this dictionary can be in any format, be it a PHP file, an XML file or JSON. You can also extend the functionality by backing it with a database. The main idea is to create a dictionary holding all the messages that will be displayed to the user, in all the languages you want to support.
If you do it with a plain PHP file, use an OOP class, say Message, add one attribute per language you have to support, and add some setters and getters,
e.g.
class Message
{
    public $english;
    public $french;
    // ... one property per supported language

    public function getEnglishMessage()
    {
        return $this->english;
    }
}
Then in PHP, when you echo your messages, get the language you want to use and do something like this:
echo $message->getEnglishMessage();
Look, I've been very generic; now decide on the type of file you'll use and build the dictionary.
Hope it helps :-)
I use an es.php (Spanish; not sure what the code for Icelandic is) and build all of the mod_rewrite off that. You treat it exactly the same as you would the index.php for English. For inputting data into the database, have a column for language. All of your queries that fetch data then have the language as a condition.
The "gettext" is the way you can go with but if you and your client are in nice understanding ask him to provide the data in language other than english as well and then in DB table there will be a column 'language' in which 'ic' or 'en' flag will be the data, and during fetching the data anywhere, according to language your sql query will contain the language as a where condition with desired flag as its value.
I need to translate my site into multiple languages. I was thinking of using a database called language and putting the translations there.
database: translation
table: language
columns: id, english, french, german, italian, spanish
Or I was thinking about a PHP solution like:
english.php
french.php
german.php
italian.php
spanish.php
so you simply include the file you need.
Now, I can see pros and cons for both; what I want to know is: what is considered the industry standard for something like this?
You can use gettext; this function is meant for exactly this feature. It is not a "standard", but it is fast enough.
The second option is a PHP file with a big array (really big, one entry per string); this is the most common solution.
For the database content (the big problem here, don't forget): if all of your content must have a translation, use one column for each language; otherwise use a language flag on each row in the database.
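A minimal gettext sketch, assuming compiled .mo files live under ./locale/de_DE/LC_MESSAGES/messages.mo; the domain name 'messages' and the directory layout are assumptions:
<?php
putenv('LC_ALL=de_DE');
setlocale(LC_ALL, 'de_DE');
bindtextdomain('messages', __DIR__ . '/locale');  // where the .mo files live
textdomain('messages');

echo _('Goodbye');  // prints the German translation if found, otherwise "Goodbye"
?>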
There is no industry standard. I have seen (and implemented) solutions using flat files, XML, PHP code, a database, and gettext files to store the localized strings. It's a matter of what is more suitable for you.
My go-to method for PHP is simply files containing arrays of strings, for example
en.php
<?php
return array(
    'How are you?' => 'How are you?',
    'Goodbye'      => 'Goodbye',
);
de.php
<?php
return array(
    'How are you?' => 'Wie gehts?',
    'Goodbye'      => 'Auf wiedersehen',
);
This can be integrated into an application with reasonable granularity (there can be many such files, e.g. one for each component) and control (you can easily fall back to any other language if you don't find a string) and it is also very convenient to modify without need for special tools.
My favorite PHP framework (Yii) and a giant open source project I have worked on (Moodle) also use this approach.
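A rough sketch of how such files can be loaded with a fallback; the t() helper and the variable names are mine, not Yii's or Moodle's:
<?php
$lang     = 'de';
$strings  = include __DIR__ . "/$lang.php";   // de.php from above
$fallback = include __DIR__ . '/en.php';      // fall back to English

function t($key, array $strings, array $fallback)
{
    if (isset($strings[$key])) {
        return $strings[$key];
    }
    return isset($fallback[$key]) ? $fallback[$key] : $key;
}

echo t('How are you?', $strings, $fallback);  // "Wie gehts?"
?>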
Neither of the two solutions seems great to me. You should think about the long run when you choose a solution.
What if you decide to translate your website into languages other than the ones you planned for, such as Russian or Chinese? In the first case you have to add more and more columns, in the second you have to create more and more files. Another downside: what if you have translated a page into Italian and Spanish but not yet into French?
I think a good approach is a database-based solution with a main language. Then you can do something like this:
Create a table 'page' (id, title, ...) where you store the page in the main language and where you also keep the info about the translated pages
Create a table 'translation' (idsource, idtranslation, language)
Every time, check the available translations and serve those to the users (see the sketch below)
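A sketch of that lookup using PDO; the table and column names follow the answer, everything else is an assumption:
<?php
function getPage(PDO $db, $pageId, $language)
{
    // try to find a translated version of the page first
    $stmt = $db->prepare(
        'SELECT p.* FROM translation t
         JOIN page p ON p.id = t.idtranslation
         WHERE t.idsource = :id AND t.language = :lang'
    );
    $stmt->execute(array(':id' => $pageId, ':lang' => $language));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row) {
        return $row;
    }

    // no translation yet: fall back to the main-language page
    $stmt = $db->prepare('SELECT * FROM page WHERE id = :id');
    $stmt->execute(array(':id' => $pageId));
    return $stmt->fetch(PDO::FETCH_ASSOC);
}
?>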
In database localization you have four main strategies. Each has particular advantages and disadvantages. For the long term I would definitely recommend cloning. You can see the four methods at the link below:
http://www.sisulizer.com/localization/software/server-desktop-database.shtml
There are two main ideas you want to be sure to implement. First, be sure you are integrating some form of translation memory. Your language vendor should be instructing you on how to do this, and is probably doing it for you.
Second, for each additional language you target, your data will become at least 2x more complex. Keep this in mind as you move forward: not only your data, but also your file sets, management, etc.
Hope that helps. Let me know if you have further questions.
Russell
Currently I'm writing a short survey (html form) using php, mysql and jquery. I want the user to select their country from a drop-down list and then get the right currency (server side) so later on I can ask things referring to the right currency.
I don't really have a clear view of how to achieve this. I know I can find an up-to-date country list at: http://www.iso.org/iso/country_codes/iso_3166_code_lists.htm
I could make it into a PHP array, but then what?
http://snipplr.com/view/36437/php-country-code--to-html-currency-symbol-list/
It seems like nice code, but I would like to use something that stays up to date.
It's no problem for me to use a MySQL database, but it is a problem to install plug-ins/extensions (my hosting won't allow it).
Does somebody know a good (and maybe easy) way to achieve this?
You can use cURL or file_get_contents() to read the content from the URL, so it is always up to date.
Not answering your question, but:
As of PHP 5.3 the intl extension became a default. It contains NumberFormatter::formatCurrency(), which does what your linked country_currency() is trying to do, only properly. If PHP 5.3 is a viable minimum requirement (seeing that 5.2 is deprecated and no longer supported), use the intl functions.
With Locale::acceptFromHttp() you can check the browser's request headers to preselect the best matching locale.
Your ISO CountryCode list should still be helpful for a manual <select> on a certain level. But keep in mind that it's not quite accurate: Germany translates to de, which may not be specific enough seeing de_AT, de_CH, de_DE. Each of them may present Currency differently. €1,123.23, 1 123,23 €, and so on. You'll still need to know which currency you're processing, though. So you need the list of ISO country codes AND the map of countrycode to currency.
PHP Intl's NumberFormatter accepts English as a language for any country. So just use en_ plus your country code.
echo (new NumberFormatter('en_DE', NumberFormatter::CURRENCY))
->getTextAttribute(NumberFormatter::CURRENCY_CODE); // EUR
echo (new NumberFormatter('en_RS', NumberFormatter::CURRENCY))
->getTextAttribute(NumberFormatter::CURRENCY_CODE); // RSD
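A possible way to tie the two calls together; this assumes the Accept-Language header is present and the currency code is already known:
<?php
$locale = Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']);  // e.g. "de_DE"

$fmt = new NumberFormatter($locale, NumberFormatter::CURRENCY);
echo $fmt->getTextAttribute(NumberFormatter::CURRENCY_CODE);  // e.g. "EUR"
echo $fmt->formatCurrency(1123.23, 'EUR');                    // e.g. "1.123,23 €"
?>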
People search in my website and some of these searches are these ones:
tapoktrpasawe
qweasd qwa as
aıe qwo ıak kqw
qwe qwe qwe a
My question is: is there any way to detect strings similar to the ones above?
I suppose it is impossible to detect 100% of them, but any solution will be welcome :)
Edit: I mean the "gibberish searches". For example, some people search for strings like "asdqweasdqw", "paykaprkg", "iwepr wepr ow" in my search engine, and I want to detect such gibberish searches.
It doesn't matter whether the search returns 0 results or anything else; I can't use that logic.
Some new brands or products would be missed if I only considered "regular words".
Thank you for your help
You could build a model of character to character transitions from a bunch of text in English. So for example, you find out how common it is for there to be a 'h' after a 't' (pretty common). In English, you expect that after a 'q', you'll get a 'u'. If you get a 'q' followed by something other than a 'u', this will happen with very low probability, and hence it should be pretty alarming. Normalize the counts in your tables so that you have a probability. Then for a query, walk through the matrix and compute the product of the transitions you take. Then normalize by the length of the query. When the number is low, you likely have a gibberish query (or something in a different language).
If you have a bunch of query logs, you might first make a model of general English text, and then heavily weight your own queries in that model training phase.
For background, read about Markov Chains.
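A very small PHP sketch of that transition-matrix idea; $trainingText stands for a large chunk of ordinary English text, and the 1e-6 floor for unseen transitions is an arbitrary choice:
<?php
function buildTransitions($trainingText)
{
    $text = strtolower(preg_replace('/[^a-z ]/i', '', $trainingText));
    $counts = array();
    for ($i = 0, $n = strlen($text) - 1; $i < $n; $i++) {
        $a = $text[$i];
        $b = $text[$i + 1];
        if (!isset($counts[$a][$b])) {
            $counts[$a][$b] = 0;
        }
        $counts[$a][$b]++;
    }
    foreach ($counts as $a => $row) {            // normalize counts to probabilities
        $total = array_sum($row);
        foreach ($row as $b => $c) {
            $counts[$a][$b] = $c / $total;
        }
    }
    return $counts;
}

function avgLogProbability($query, array $transitions)
{
    $query = strtolower(preg_replace('/[^a-z ]/i', '', $query));
    $logProb = 0.0;
    $steps = 0;
    for ($i = 0, $n = strlen($query) - 1; $i < $n; $i++) {
        $p = isset($transitions[$query[$i]][$query[$i + 1]])
            ? $transitions[$query[$i]][$query[$i + 1]]
            : 1e-6;                              // unseen transition: tiny probability
        $logProb += log($p);
        $steps++;
    }
    return $steps ? $logProb / $steps : 0.0;     // very negative = likely gibberish
}
?>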
Edit, I implemented this here in Python:
https://github.com/rrenaud/Gibberish-Detector
and buggedcom rewrote it in PHP:
https://github.com/buggedcom/Gibberish-Detector-PHP
Sample output:
my name is rob and i like to hack True
is this thing working? True
i hope so True
t2 chhsdfitoixcv False
ytjkacvzw False
yutthasxcvqer False
seems okay True
yay! True
You could do what Stackoverflow does and calculate the entropy of the string.
Of course, this is just one of many heuristics SO uses to determine low-quality answers, and should not be relied upon as 100% accurate.
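A minimal sketch of the entropy idea; the threshold you compare against would have to be tuned on your own query logs:
<?php
function shannonEntropy($s)
{
    $len = strlen($s);
    if ($len === 0) {
        return 0.0;
    }
    $entropy = 0.0;
    foreach (count_chars($s, 1) as $count) {     // frequency of each byte actually used
        $p = $count / $len;
        $entropy -= $p * log($p, 2);
    }
    return $entropy;                              // bits per character
}

echo shannonEntropy('tapoktrpasawe');
?>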
Assuming you mean gibberish searches... it would be more trouble than it's worth. You are providing them with a search functionality; let them use it however they please. I'm sure there are some algorithms out there that detect strange character groupings, but it would probably be more resource/labour intensive than simply returning no results.
I had to solve a closely related problem for a source code mining project, and although the package is written in Python and not PHP, it seemed worth mentioning here in case it can still be useful somehow. The package is Nostril (for "Nonsense String Evaluator") and it is aimed at determining whether strings extracted during source-code mining are likely to be class/function/variable/etc. identifiers or random gibberish. It works well on real text too, not just program identifiers. Nostril uses n-grams (similar to the Gibberish Detector in the answer by Rob Neuhaus) in combination with a custom TF-IDF scoring function. It comes pretrained, and is ready to use out of the box.
Example: the following code,
from nostril import nonsense
real_test = ['bunchofwords', 'getint', 'xywinlist', 'ioFlXFndrInfo',
             'DMEcalPreshowerDigis', 'httpredaksikatakamiwordpresscom']
junk_test = ['faiwtlwexu', 'asfgtqwafazfyiur', 'zxcvbnmlkjhgfdsaqwerty']
for s in real_test + junk_test:
    print('{}: {}'.format(s, 'nonsense' if nonsense(s) else 'real'))
will produce the following output:
bunchofwords: real
getint: real
xywinlist: real
ioFlXFndrInfo: real
DMEcalPreshowerDigis: real
httpredaksikatakamiwordpresscom: real
faiwtlwexu: nonsense
asfgtqwafazfyiur: nonsense
zxcvbnmlkjhgfdsaqwerty: nonsense
The project is on GitHub and I welcome contributions.
I'd think you could detect these strings the same way you could detect "regular words." It's just pattern matching, no?
As to why users are searching for these strings, that's the bigger question. You may be able to stem off the gibberish searches some other way. For example, if it's comment spam phrases that people (or a script) is looking for, then install a CAPTCHA.
Edit: Another end-run around interpreting the input is to throttle it slightly. Allow a search every 10 seconds or so. (I recall seeing this on forum software, as well as various places on SO.) This will take some of the fun out of searching for sdfpjheroptuhdfj over and over again, and at the same time won't interfere with the users who are searching for, and finding, their stuff.
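A rough session-based sketch of that throttle; it is easy to bypass, so treat it as a nuisance filter only:
<?php
session_start();

$now = time();
if (isset($_SESSION['last_search']) && $now - $_SESSION['last_search'] < 10) {
    exit('Please wait a few seconds before searching again.');
}
$_SESSION['last_search'] = $now;

// ... run the actual search here ...
?>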
As some people commented, there are no hits in google for tapoktrpasawe or putjbtghguhjjjanika (Well, there are now, of course) so if you have a way to do a quick google search through an API, you could throw out any search terms that got no Google results and weren't the names of one of your products. Why you would want to do this is a whole other question - are you trying to save effort for your search library? Make your hand-review of "popular search terms" more meaningful? Or are you just frustrated at the inexplicable behaviour of some of the people out on the big wide internet? If it's the latter, my advice is just let it go, even if there is a way to prevent it. Some other weirdness will come along.
Short answer: gibberish search detection.
A probabilistic language model works.
Logic
A word is made up of a sequence of characters. If pairs of adjacent characters occur with the frequencies you would expect in English, and the summed frequency of all contiguous character pairs in the word crosses a threshold, the word can be considered a proper English word. In brief, this is the logic Markov chains are famous for.
Link
For the mathematics of gibberish detection and a better understanding, refer to the video https://www.youtube.com/watch?v=l15C8UJu17s. Thanks!
If the search is performed on products, you could cache their names or codes and check the query against that list before querying the database. Otherwise, if your site is for English users, you can build a dictionary of strings that aren't used in the English language, like qwkfagsd. This, agreeing with another answer, will be more resource-intensive than not doing it at all.
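A small sketch of that cached-list check; $productNames would come from a cache rather than being hardcoded, and the 'q' parameter is just an example:
<?php
$productNames = array('widget', 'gadget', 'sprocket');   // cached product names/codes

$query = strtolower(trim($_GET['q']));
if (in_array($query, $productNames, true)) {
    // known product: go straight to the database
} else {
    // unknown term: run the normal search, or apply one of the
    // gibberish checks from the other answers first
}
?>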
I have developed a website www.tenxian.com.
It has three language versions: English, Japanese and Chinese. How can I write an effective PHP program that automatically chooses a language version based on the IP address of the visitor?
If I use if-else, the code becomes very complicated; if I use switch-case, how do I write it, since the data to be handled are IP ranges, not individual numbers? Besides, I don't know these IP ranges.
What is the easiest way to do it?
Please, PLEASE, do not make the mistake of thinking that IP == language. Look at the browser's Accept-Language header first, then at the browser identification string, which might contain the OS language, and only then take the IP into account. In almost 100% of all cases the browser's Accept-Language header will be present and sufficient.
And always give the user the choice to switch to another language.
Quite apart from the simple case of a foreigner abroad, how do you determine the language for Belgium, where they speak French, Dutch and German? (Maybe that doesn't apply to your case, but just philosophically. :))
Check out GeoPlugin:
http://www.geoplugin.com/webservices/php
Yes, please don't do it... Google does this and it's freaking annoying. I always get the Thai version instead of the English one my browser asks for.
Use the HTTP headers from the browser.
<?php
// split() is deprecated/removed; use explode() instead
$ln = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
echo $ln[0];
?>
Perhaps this will help: www.countryipblocks.net
You'd probably want to use some form of IP geocoding database (example).
Assuming you can map IP ranges to one of your language choices, you could do this (as in the replies above): store all of the application's messages in an associative array of this form:
$MESSAGES[$USER_LANGUAGE][$msgId]
where $USER_LANGUAGE can be chinese, japanese, or english (or any other equivalent enum), and $msgId can be things like "login.successful", "login.fail", etc. Wherever you display messages to the user, do not display hardcoded strings; reference the message through its $msgId.
You can access it as a global variable, or you can create a function that takes the $msgId as a parameter and returns the message (sketched below); $USER_LANGUAGE can be a global variable as well (set the first time the user comes in).
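A sketch of that function; $MESSAGES and $USER_LANGUAGE follow the naming above, and the message ids and texts are just examples:
<?php
$MESSAGES = array(
    'english'  => array('login.successful' => 'You are now logged in.'),
    'japanese' => array('login.successful' => 'ログインしました。'),
);
$USER_LANGUAGE = 'english';   // set once, e.g. from the IP range or a user choice

function msg($msgId)
{
    global $MESSAGES, $USER_LANGUAGE;
    if (isset($MESSAGES[$USER_LANGUAGE][$msgId])) {
        return $MESSAGES[$USER_LANGUAGE][$msgId];
    }
    return $msgId;            // fall back to the id itself if no translation exists
}

echo msg('login.successful');
?>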
Take a look at the MaxMind GeoIP module for PHP (http://www.maxmind.com/app/php). As for your data structure, perhaps key it on the ISO 3166-1 country code, which apache_note("GEOIP_COUNTRY_CODE") returns.
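A sketch of mapping that country code to one of the three site languages; the mapping table is an assumption, and geoip_country_code_by_name() comes from the PECL geoip extension rather than the apache_note() approach above:
<?php
$countryToLanguage = array('JP' => 'ja', 'CN' => 'zh', 'TW' => 'zh');  // everything else: English

$code = geoip_country_code_by_name($_SERVER['REMOTE_ADDR']);           // e.g. "JP"
$lang = isset($countryToLanguage[$code]) ? $countryToLanguage[$code] : 'en';
?>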