PHP French and Russian characters after csv upload - php

I've created a CSV import that lets users upload a CSV file full of data into a MySQL database, which is then displayed on another web page.
Some users are French and others are Russian, so I need to handle both character sets, but I'm having problems with both.
I tried wrapping the variables in utf8_decode() calls, but nothing changes.
Is there a general solution for dealing with both character sets on the same page?
PS: on a previous page I handled it by calling utf8_decode() every time I was dealing with a French variable and doing nothing when dealing with a Russian variable. But in this case the trick doesn't work.
Thanks in advance. "The world of character sets is a weird beast..."
marko.c

You could convert everything to UTF-32 just to be sure. You could try something like:
if (!mb_detect_encoding($csv, 'UTF-32', true)) {
    // iconv() returns the converted string, so assign the result back to $csv.
    $csv = iconv(mb_detect_encoding($csv, mb_detect_order(), true), 'UTF-32', $csv);
}
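If the database and the page are UTF-8, a variant of the same idea is to normalize the upload to UTF-8 rather than UTF-32. This is only a minimal sketch: the helper name toUtf8() is made up, and it assumes that anything which is not already valid UTF-8 was exported as Windows-1252, which the question does not state.

```php
<?php
// Hypothetical helper: return $raw as UTF-8.
// Assumption: input that is not valid UTF-8 is in $fallback (Windows-1252 by default).
function toUtf8(string $raw, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave untouched
    }
    $converted = iconv($fallback, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $fallback");
    }
    return $converted;
}

$latin = "caf\xE9";                 // "café" in Windows-1252
echo bin2hex(toUtf8($latin)), "\n"; // 636166c3a9
```

A Russian file saved as Windows-1251 would need that name passed as the fallback; the bytes alone cannot reliably tell the two single-byte encodings apart.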

OK, so in the end the whole problem was in the CSV upload. Once I added the following line to the CSV upload script
mysql_query("SET NAMES 'UTF8'");
everything worked properly. There is no need for any UTF-8 encoding or decoding calls; both Russian and French characters simply work.
Cheers, thanks

Related

Bug with php file converted from ansi to utf-8

I have a few PHP script files encoded in ANSI. Now that I've converted my website to HTML5, I need everything in UTF-8, so that accents in these files are displayed correctly without any PHP conversion through iconv(). I used Notepad++ to set the encoding of my scripts to UTF-8 and saved the files. Most are fine and accents are displayed correctly, but the main script now blocks everything: the server returns only a white page, without any error message, even with ini_set('error_reporting', 'E_ALL')!
When I change the encoding back to ANSI in Notepad++ and save the file without any other change, it works again (except that the accents are not displayed correctly without iconv()).
I also tried using a PHP script to change the encoding with ...$file = iconv('ISO-8859-1','UTF-8', $file);... but the result is exactly the same!
I wrote a short PHP script to look for high character values, but the highest values seem to be the usual French accents like é, è, etc., which are also present in other files and pose no problem. I also removed other special characters, without any effect.
The problem is that the file is large, more than 4500 lines, and I'm not sure how to proceed to correct this. Has anyone had this problem, or does anyone have an idea?
The issue was with the "£" (pound) character, which I used a lot as a delimiter in preg_match("£(...)£", "...", $string) and preg_replace calls.
For some reason these characters were not accepted after conversion. I had to replace all of them, and only then did it work in UTF-8. Apparently they are not a problem now that the file is converted; I can use them again.
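A possible explanation, which the post itself does not confirm: PHP reads the pattern delimiter as a single byte, and "£" is one byte (0xA3) in ISO-8859-1 but two bytes (0xC2 0xA3) in UTF-8, so a two-byte "£" cannot act as a delimiter the way it did before. The sketch below only demonstrates the byte lengths and an ASCII delimiter that behaves the same in either encoding; the subject string is made up.

```php
<?php
// "£" written as explicit bytes so the result does not depend on this file's encoding.
$poundLatin1 = "\xA3";     // ISO-8859-1
$poundUtf8   = "\xC2\xA3"; // UTF-8
echo strlen($poundLatin1), "\n"; // 1
echo strlen($poundUtf8), "\n";   // 2

// An ASCII delimiter such as "#" is one byte in both encodings.
$found = preg_match('#(\d+) items#', 'Order: 12 items', $m);
echo $m[1], "\n"; // 12
```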

PHP get ASCII code of a character

How can I get the ASCII code of a character in PHP?
http://www.backbone.se/urlencodingUTF8.htm
When I try:
$h = dechex(ord('ñ'));
echo $h;
I'm getting C3 when I should be getting f1
I want for example:
for 'ñ' -> %f1
for 'º' -> %ba
How can I get that?
Thanks in advance!!
As they told you in the comments, ñ is not standard ASCII. It is present in the extended ASCII (ISO-8859-1) table, but your source file is probably saved in UTF-8, so ñ is stored as two bytes (0xC3 0xB1) and ord() only sees the first one. You could try setting your editor to save documents in a different charset.
However, what you're doing is really dangerous. Since, as you can see, the encoding can change, it's always a bad idea to put characters that are not part of standard ASCII directly in source files.
Why do you need to do that operation? Is there another way? Can't you use UTF-8?
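If the target really is the single-byte form asked for in the question, one option is to convert the UTF-8 bytes to ISO-8859-1 before encoding. This is a minimal sketch that assumes the input string is UTF-8; the helper name latin1UrlEncode() is made up for illustration.

```php
<?php
// Hypothetical helper: percent-encode a UTF-8 string using its ISO-8859-1 bytes.
function latin1UrlEncode(string $utf8): string
{
    $latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);
    if ($latin1 === false) {
        throw new InvalidArgumentException('Character not representable in ISO-8859-1');
    }
    return urlencode($latin1);
}

// Byte escapes keep the example independent of how this file itself is saved.
echo latin1UrlEncode("\xC3\xB1"), "\n"; // ñ -> %F1
echo latin1UrlEncode("\xC2\xBA"), "\n"; // º -> %BA
echo strlen("\xC3\xB1"), "\n";          // 2: ord() only looks at the first byte, 0xC3
```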

Converting odd character encoding back to utf-8

I have a database full of strings containing strange characters such as:
Design Tattoo Ãœbungshaut
MehrflÃ¤chiges Biozid Reinigungs- & Desinfektionsmittel
Where the Ãœ and Ã¤ should be, as I understand, an Ü and à when in proper UTF-8.
Is there a standard function to revert these multiple characters back to their proper UTF-8 form?
In PHP I have come across $url = iconv('utf-8', 'iso-8859-1', $url); which seems to get close but falls short. Perhaps I have the wrong parameters, but in any case I was wondering how well known this issue is and whether there is an established fix.
The original data was taken from the eCommerce system CubeCart which seems to have no problem converting it back to normal text FYI.
The data shown as example is UTF-8 encoded data mistakenly interpreted as ISO-8859-1 (or windows-1252). The problem combinations are in fact “Ü” and “ä” (“Ā” does not appear in German). So apparently what you need to do is to read the data as UTF-8 and display it that way, instead of converting it.
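The misreading can be reproduced in a few lines, which also suggests why the iconv() call in the question falls short: the second byte of "Ü" (0x9C) maps to "œ" in Windows-1252, and "œ" has no slot in ISO-8859-1. This is only an illustrative sketch; the strings are written as byte escapes so it does not depend on the file's own encoding.

```php
<?php
$good = "\xC3\x9C"; // "Ü" as UTF-8

// Misread the two UTF-8 bytes as Windows-1252 and re-encode as UTF-8: this yields "Ãœ".
$mojibake = iconv('Windows-1252', 'UTF-8', $good);
echo bin2hex($mojibake), "\n"; // c383c593

// Reversing with Windows-1252 (not ISO-8859-1) as the target restores the original bytes.
$repaired = iconv('UTF-8', 'Windows-1252', $mojibake);
echo bin2hex($repaired), "\n"; // c39c
```

Such a round trip is only a repair for rows that were already stored damaged; data that is stored correctly as UTF-8 just needs to be read and displayed as UTF-8.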
If the database and output are UTF-8, it could be because you're not using UTF-8 as the client character set.
If you're using mysqli, you can call set_charset() or run SET NAMES utf8 as a query before fetching data.
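With mysqli, that could look roughly like the sketch below. The function name and the connection parameters are placeholders, and utf8mb4 (MySQL's full UTF-8 charset) is swapped in here for the utf8 named above; use whichever matches your tables.

```php
<?php
// Hypothetical helper: host, credentials and database name are placeholders.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    if ($conn->connect_error) {
        throw new RuntimeException('Connect failed: ' . $conn->connect_error);
    }
    // Sets the connection character set, like SET NAMES, and also informs the client library.
    if (!$conn->set_charset('utf8mb4')) {
        throw new RuntimeException('Could not set charset: ' . $conn->error);
    }
    return $conn;
}
```

Call it once right after connecting, before any query that reads or writes text.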

PHP urlencode for chinese characters

I'm creating a PHP application that involves sending Chinese characters as URL parameters.
I have to send a query like:
http://xyz.com/?q=新
But the script at xyz.com won't automatically encode the Chinese character, so I need to explicitly send an encoded string as the parameter. It becomes:
http://xyz.com/?q=%E6%96%B0
The problem is, PHP won't encode the Chinese character properly.
I've tried urlencode() and rawurlencode(), but they give %D0%C2 (which doesn't work for my purpose) instead of %E6%96%B0 (which works well with xyz.com) as the output.
I'm using this website to create the latter encoded string.
I've also defined header('Content-Type: text/html; charset=gb2312'); to display Chinese characters properly.
Is there anything I can do to urlencode the Chinese character properly?
Thanks!
PS: I'm a relatively new programmer and don't understand Chinese.
You're URLencoding using the charset you specify in your header. %D0%C2 is 新 in gb2312; %E6%96%B0 is 新 in UTF-8. Switch your charset over to UTF-8 and you should fix this issue and still be able to display Simplified Chinese Han.
In order to reproduce your problem I created a simple PHP file:
<?php
var_dump(urlencode('新'));
?>
First I saved the file as UTF-8 and got %E6%96%B0. Afterwards I changed it to GB2312 and got %D0%C2.
At http://meyerweb.com/eric/tools/dencoder/ they seem to use JavaScript, which is UTF-8 capable and therefore returns %E6%96%B0, too.
PS: When changing from GB2312 to UTF-8, some editors might break some internationalized code. So please make sure to have a copy of your file before converting!
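If the page has to stay in GB2312 for other reasons, an alternative is to convert just the parameter to UTF-8 before encoding it. A minimal sketch, assuming the string arrives as GB2312 bytes; the variable names are illustrative.

```php
<?php
$gb = "\xD0\xC2";                      // 新 as GB2312 bytes
echo urlencode($gb), "\n";             // %D0%C2

$utf8 = iconv('GB2312', 'UTF-8', $gb); // same character, UTF-8 bytes
echo urlencode($utf8), "\n";           // %E6%96%B0
```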

First letter disappear if it has an accent (CSV file, UTF-8 encoded)

I'm currently working on a web application written in PHP with Zend Framework. I need to translate every page into French and English, so I use a CSV file to do it.
My problem is that when a word starts with an accented letter like É or À, the letter just disappears, but the rest of the word is displayed.
For example, if my CSV file contains Écriture, it displays criture. But if I have exécution, it displays exécution without any problems.
Every time I want to display text in my view, I just call <?php echo $this->translate('line to call in csv'); ?> and my text is displayed.
Like I said, my application is encoded in UTF-8, and I don't have any problems with special characters, except when they come first. I googled it but couldn't find anything so far.
Thanks in advance for your help!
UPDATE
I forgot to say that when I run my application in the Zend browser to debug it, everything's fine and my É displays. It's only in browsers like IE or FF that I have the problem.
UPDATE #2
I just found another post talking about fgetcsv(), and it looks like the function I use to translate from my CSV file is using fgetcsv(). Could that be the problem? And if it is, how can I fix it? It's coded like that in the Zend Translate library, and I'm not sure I want to start changing things there.
UPDATE #3
I continued my research and found reports of issues in PHP with UTF-8 encoded data. But Zend Framework is UTF-8 by default, so I'm sure there is a way to make this work. I'm still searching, but I hope someone has the solution!
I had the same problem, I tried AJ's solution and it worked:
Missing first character of fields in csv
The problem seems to be that fgetcsv() uses the locale settings; just use
setlocale(LC_ALL, 'en_US.UTF-8');
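The locale has to be installed on the server for this to take effect, so it is worth checking the return value. A minimal sketch, assuming a semicolon-delimited UTF-8 file; 'C.UTF-8' is added as a fallback candidate and the sample row is made up.

```php
<?php
// setlocale() tries each name in turn and returns false if none is installed.
$locale = setlocale(LC_ALL, 'en_US.UTF-8', 'C.UTF-8');
if ($locale === false) {
    error_log('No UTF-8 locale available; fgetcsv() may mangle multibyte fields');
}

// Parse one UTF-8 line from an in-memory stream instead of a real file.
$fh = fopen('php://memory', 'w+');
fwrite($fh, "\xC3\x89criture;Writing\n"); // "Écriture;Writing"
rewind($fh);
$row = fgetcsv($fh, 0, ';', '"', '\\');
fclose($fh);

echo count($row), "\n"; // 2
```

Comparing bin2hex($row[0]) with and without the setlocale() call is a quick way to see whether the first byte is being dropped on a given server.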
In the .csv file content, try to use
; as delimiter
and
" as enclosure.
Something like this inside the .csv file:
"key1";"value1" ##first line
"key1";"value1" ##second line
"key1";"value1" ##third line
This solved a similar issue for me.
View the CSV file in a hex editor and make sure it is encoded correctly:
"É" is 0xC3 0x89,
"À" is 0xC3 0x80
Do you have any strtoupper(), ucfirst() or similar functions in your code? If so, try mb_strtoupper($str, 'UTF-8').
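ucfirst() is not multibyte-aware, so a UTF-8 replacement has to be built from the mb_* functions. A minimal sketch; the helper name mbUcfirst() is made up and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: uppercase only the first character of a UTF-8 string.
function mbUcfirst(string $s): string
{
    $first = mb_substr($s, 0, 1, 'UTF-8');
    $rest  = mb_substr($s, 1, null, 'UTF-8');
    return mb_strtoupper($first, 'UTF-8') . $rest;
}

// "écriture" -> "Écriture", written as byte escapes to avoid depending on the file encoding.
var_dump(mbUcfirst("\xC3\xA9criture") === "\xC3\x89criture"); // bool(true)
```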
