Difference Firefox - Chrome when encoding umlauts - php

Chrome converts this: aöüß to %C3%A4%C3%B6%C3%BC%C3%9F
But Firefox converts it to this strange thing here: a%F6%FC%DF
I can't seem to find a way to convert the Firefox thing back to the original in PHP.
Urldecode and rawurldecode unfortunately don't work. Does anyone know how to deal with that? Thanks.

As Tei already guessed: Chrome is using UTF-8 (as probably recommended) for URL parameters while Firefox uses Latin-1. I don't think that you can control this behavior. Also this will be difficult to handle, because you pretty much need to guess the encoding that was used.
This is how the decoding works (browser-dependent, assuming that you're using UTF-8 in your application):
Chrome:
$text = urldecode($_GET['text']);
Firefox:
$text = utf8_encode(urldecode($_GET['text']));
This may be a solution that works in most cases:
function urldecode_utf8($text) {
$decoded = urldecode($text);
if (!mb_check_encoding($decoded, 'UTF-8')) {
$decoded = utf8_encode($decoded);
}
return $decoded;
}

Force your page to use UTF-8. Probably these codes are different encoded umlauts. One is something like Latin1, and the other perhaps is UTF-8.
The best way to force utf-8 is in a meta tag in the html.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Related

Converting t’to apostrophe in php [duplicate]

I've tried converting the text to or from utf8, which didn't seem to help.
I'm getting:
"It’s Getting the Best of Me"
It should be:
"It’s Getting the Best of Me"
I'm getting this data from this url.
To convert to HTML entities:
<?php
echo mb_convert_encoding(
file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'),
"HTML-ENTITIES",
"UTF-8"
);
?>
See docs for mb_convert_encoding for more encoding options.
Make sure your html header specifies utf8
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
That usually does the trick for me (obviously if the content IS utf8).
You don't need to convert to html entities if you set the content-type.
Your content is fine; the problem is with the headers the server is sending:
Connection:Keep-Alive
Content-Length:502
Content-Type:text/html
Date:Thu, 18 Feb 2010 20:45:32 GMT
Keep-Alive:timeout=1, max=25
Server:Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.7 with Suhosin-Patch
X-Powered-By:PHP/5.2.4-2ubuntu5.7
Content-Type should be set to Content-type: text/plain; charset=utf-8, because this page is not HTML and uses the utf-8 encoding. Chromium on Mac guesses ISO-8859-1 and displays the characters you're describing.
If you are not in control of the site, specify the encoding as UTF-8 to whatever function you use to retrieve the content. I'm not familiar enough with PHP to know how exactly.
I know the question was answered but setting meta tag didn't help in my case and selected answer was not clear enough, so I wanted to provide simpler answer.
So to keep it simple, store string into a variable and process that like this
$TVrageGiberish = "It’s Getting the Best of Me";
$notGiberish = mb_convert_encoding($TVrageGiberish, "HTML-ENTITIES", 'UTF-8');
echo $notGiberish;
Which should return what you wanted It’s Getting the Best of Me
If you are parsing something, you can perform conversion while assigning values to a variable like this, where $TVrage is array with all the values, XML in this example from a feed that has tag "Title" which may contain special characters such as ‘ or ’.
$cleanedTitle = mb_convert_encoding($TVrage->title, "HTML-ENTITIES", 'UTF-8');
If you're here because you're experiencing issues with junk characters in your WordPress site, try this:
Open wp-config.php
Comment out define('DB_CHARSET', 'utf8') and define('DB_COLLATE', '')
/** MySQL hostname */
define('DB_HOST', 'localhost');
/** Database Charset to use in creating database tables. */
//define('DB_CHARSET', 'utf8');
/** The Database Collate type. Don't change this if in doubt. */
//define('DB_COLLATE', '');
It sounds like you're using standard string functions on a UTF8 characters (’) that doesn't exist in ISO 8859-1. Check that you are using Unicode compatible PHP settings and functions. See also the multibyte string functions.
We had success going the other direction using this:
mb_convert_encoding($text, "HTML-ENTITIES", "ISO-8859-1");
Just try this
if $text contains strange charaters do this:
$mytext = mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8');
and you are done..
if all seems not to work, this could be your best solution.
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
==or==
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
try this :
html_entity_decode(mb_convert_encoding(stripslashes($text), "HTML-ENTITIES", 'UTF-8'))
For fopen and file_put_contents, this will work:
str_replace("’", "'", htmlspecialchars_decode(mb_convert_encoding($string_to_be_fixed, "HTML-ENTITIES", "UTF-8")));
You Should check encode encoding origin then try to convert to correct encode type.
In my case, I read csv files then import to db. Some files displays well some not. I check encoding and see that file with encoding ASCII displays well, other file with UTF-8 is broken. So I use following code to convert encoding:
if(mb_detect_encoding($content) == 'UTF-8') {
$content = iconv("UTF-8", "ASCII//TRANSLIT", $content);
file_put_contents($file_path, $content);
} else {
$content = mb_convert_encoding($content, 'UTF-8', 'UTF-8');
file_put_contents($file_path, $content);
}
After convert I push the content to file then process import to DB, now it displays well in front-end
If none of the above solutions work:
In my case I noticed that the single quote was a different style of single quote. Instead of ' my data had a ’. Notice the difference in the single quote? So I simply wrote a str_replace to replace it and it fixed the problem. Probably not the most elegant solution but it got the job done.
$string= str_replace("’","'",$string);
I looked at the link, and it looks like UTF-8 to me. i.e., in Firefox, if you pick View, Character Encoding, UTF-8, it will appear correctly.
So, you just need to figure out how to get your PHP code to process that as UTF-8. Good luck!
use this
<meta http-equiv="Content-Type" content="text/html; charset=utf8_unicode_ci" />
instead of this
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If nothing works try this mb_convert_encoding($elem->textContent, 'UTF-8', 'utf8mb4');

Header special char conversion?

I have some problems with char conversion on my php's page header.
I have to develop a snippet of code that, means WS (xml-rpc protocol), can interface with another snippet of code wrote in python.
This is python snippet's output:
Output={'metaTagKeyWords': '', 'metaTagTitle': '10% DISCOUNT FOR 3 NIGHTS','metaTagDescription': 'Questa \xc3\xa8 una prova: devo vedere che succede.\r\n\r\nProva prova.\r\n\r\nDaje.\r\n\r\nENGLISH VERSION !!!!\r\n'}
So I have to convert some char: first of all \xc3\xa8 that is the unicode conversion of "è" and, in a second time, the "\r\n\" chars.
I know how to procede with "\r\n\" chars, but I don't know how to convert the unicode char.
I have had alredy tried to do something like this:
htmlentities($data[$META_TITLE_KEY], ENT_QUOTES, 'UTF-8')
But it dind't work.
Moreover, I had alredy tried to convert in pyhon the string in UTF-8 (so that entity would be u'\xc3' or something like that, but the results are pretty the same.)
An additional info: that conversion have to be used on php file header, into "meta tag description" tag.
EDIT1:
It's seems to be that, what we belive as an UTF-8, is instead a LATIN-1. So, if i change in the header that part:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
in
<meta http-equiv="content-type" content="text/html;charset=ISO-8859-1" />
it works.
But I have to have a utf-8 charset; so I suppose that have to do something in python applicative logic (because when I go from editor to DB i encode something while when I return from DB to editor I decode something).
Stay tune for more info
EDIT2:
Maybe some function that i use to save my data onto Postrges DMB, convert data in latin-1 and then in utf-8. So, if I add this instruction:
d_meta[element] = codeDbToEditor(d_meta[element]).replace('\r\n', ' ').decode('latin-1')
everything seems to works.
Have I had the right "insipration"?
$str="Hello Loréane";
echo utf8_encode($str);
Hope It Helps

Getting ’ instead of an apostrophe(') in PHP

I've tried converting the text to or from utf8, which didn't seem to help.
I'm getting:
"It’s Getting the Best of Me"
It should be:
"It’s Getting the Best of Me"
I'm getting this data from this url.
To convert to HTML entities:
<?php
echo mb_convert_encoding(
file_get_contents('http://www.tvrage.com/quickinfo.php?show=Surviver&ep=20x02&exact=0'),
"HTML-ENTITIES",
"UTF-8"
);
?>
See docs for mb_convert_encoding for more encoding options.
Make sure your html header specifies utf8
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
That usually does the trick for me (obviously if the content IS utf8).
You don't need to convert to html entities if you set the content-type.
Your content is fine; the problem is with the headers the server is sending:
Connection:Keep-Alive
Content-Length:502
Content-Type:text/html
Date:Thu, 18 Feb 2010 20:45:32 GMT
Keep-Alive:timeout=1, max=25
Server:Apache/2.2.8 (Ubuntu) PHP/5.2.4-2ubuntu5.7 with Suhosin-Patch
X-Powered-By:PHP/5.2.4-2ubuntu5.7
Content-Type should be set to Content-type: text/plain; charset=utf-8, because this page is not HTML and uses the utf-8 encoding. Chromium on Mac guesses ISO-8859-1 and displays the characters you're describing.
If you are not in control of the site, specify the encoding as UTF-8 to whatever function you use to retrieve the content. I'm not familiar enough with PHP to know how exactly.
I know the question was answered but setting meta tag didn't help in my case and selected answer was not clear enough, so I wanted to provide simpler answer.
So to keep it simple, store string into a variable and process that like this
$TVrageGiberish = "It’s Getting the Best of Me";
$notGiberish = mb_convert_encoding($TVrageGiberish, "HTML-ENTITIES", 'UTF-8');
echo $notGiberish;
Which should return what you wanted It’s Getting the Best of Me
If you are parsing something, you can perform conversion while assigning values to a variable like this, where $TVrage is array with all the values, XML in this example from a feed that has tag "Title" which may contain special characters such as ‘ or ’.
$cleanedTitle = mb_convert_encoding($TVrage->title, "HTML-ENTITIES", 'UTF-8');
If you're here because you're experiencing issues with junk characters in your WordPress site, try this:
Open wp-config.php
Comment out define('DB_CHARSET', 'utf8') and define('DB_COLLATE', '')
/** MySQL hostname */
define('DB_HOST', 'localhost');
/** Database Charset to use in creating database tables. */
//define('DB_CHARSET', 'utf8');
/** The Database Collate type. Don't change this if in doubt. */
//define('DB_COLLATE', '');
It sounds like you're using standard string functions on a UTF8 characters (’) that doesn't exist in ISO 8859-1. Check that you are using Unicode compatible PHP settings and functions. See also the multibyte string functions.
We had success going the other direction using this:
mb_convert_encoding($text, "HTML-ENTITIES", "ISO-8859-1");
Just try this
if $text contains strange charaters do this:
$mytext = mb_convert_encoding($text, "HTML-ENTITIES", 'UTF-8');
and you are done..
if all seems not to work, this could be your best solution.
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
==or==
<?php
$content="It’s Getting the Best of Me";
$content = str_replace("’", "'", $content);
echo $content;
?>
try this :
html_entity_decode(mb_convert_encoding(stripslashes($text), "HTML-ENTITIES", 'UTF-8'))
For fopen and file_put_contents, this will work:
str_replace("’", "'", htmlspecialchars_decode(mb_convert_encoding($string_to_be_fixed, "HTML-ENTITIES", "UTF-8")));
You Should check encode encoding origin then try to convert to correct encode type.
In my case, I read csv files then import to db. Some files displays well some not. I check encoding and see that file with encoding ASCII displays well, other file with UTF-8 is broken. So I use following code to convert encoding:
if(mb_detect_encoding($content) == 'UTF-8') {
$content = iconv("UTF-8", "ASCII//TRANSLIT", $content);
file_put_contents($file_path, $content);
} else {
$content = mb_convert_encoding($content, 'UTF-8', 'UTF-8');
file_put_contents($file_path, $content);
}
After convert I push the content to file then process import to DB, now it displays well in front-end
If none of the above solutions work:
In my case I noticed that the single quote was a different style of single quote. Instead of ' my data had a ’. Notice the difference in the single quote? So I simply wrote a str_replace to replace it and it fixed the problem. Probably not the most elegant solution but it got the job done.
$string= str_replace("’","'",$string);
I looked at the link, and it looks like UTF-8 to me. i.e., in Firefox, if you pick View, Character Encoding, UTF-8, it will appear correctly.
So, you just need to figure out how to get your PHP code to process that as UTF-8. Good luck!
use this
<meta http-equiv="Content-Type" content="text/html; charset=utf8_unicode_ci" />
instead of this
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
If nothing works try this mb_convert_encoding($elem->textContent, 'UTF-8', 'utf8mb4');

Unicode and PHP - am I doing something wrong?

I'm using Kohana 3, which has full support for Unicode.
I have this as the first child of my <head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The Unicode character I am inserting into is é as in Café.
However, I am getting the triangle with a ? (as in could not decode character).
As far as I can tell in my own code, I am not doing any string manipulation on the text.
In fact, I have placed the accent straight into a view's PHP file and it is still not working.
I copied the character from this page: http://www.fileformat.info/info/unicode/char/00e9/index.htm
I've only just started examining PHP's Unicode limitations, so I could be doing something horribly wrong.
So, how do I display this character? Do I need to resort to the HTML entity?
Update
So this works
Caf<?php echo html_entity_decode('é', ENT_NOQUOTES, 'UTF-8'); ?>
Why does that work? If I copy the output accented e from that script and insert it into my document, it doesn't work.
View the http headers. You should see something like
Content-Type: text/html; charset=UTF-8
Browsers don't pay much attention to meta tags, if there was a real http header stating a different encoding.
update
Whatcha get from this?
echo bin2hex('é');
echo chr(0xc3) . chr(0xa9);
You should get c3a9é, otherwise I'd say file encoding issue.
I guess, you see �, the replacement character for invalid UTF-8 byte sequences. Your text is not UTF-8 encoded. Check your editor’s settings to control the encoding of the PHP file.
If you’re not sure about the encoding of your sources, you can enforce UTF-8 compatibilty as described here (German text): Force UTF-8.
You should never need entities except the basic ones.

Get source code with Chinese characters PHP

Well, I give up.
I've been messing around with all I could think of to retrieve data from a target website that has information in traditional Chinese encoding (charset=GB2312).
I've been using the simple_html_parser like always but it doesn't seem to return the Chinese characters, in fact all I get are some weird question marks embedded inside a rhomboid shape.
("�������ѯ�ؼ��֣�" Like so)
Declaring the encoding for the php file didn't do anything except of getting rid of some unwanted character showing at the start of the page.
By declaring it I mean:
header('Content-Type', 'text/html; charset=GB2312');
I can't get any data that's written in Chinese, also tried file_get_contents with the same luck. I'm probably missing something obvious since I can't find any related discussion elsewhere.
Thanks in advance.
Have you tried converting the encoding with mb_convert_encoding or iconv, e.g.
$str = mb_convert_encoding($content, 'UTF-8', 'GB2312');
or
$str = iconv("UTF-8", "GB2312//IGNORE", $content);
Get it in whatever character set the source uses, then convert it to something usable locally, such as UTF-8. Then send it to the browser.
set header('Content-Type: text/html; charset=utf-8');
It's working for me

Categories