utf-8 decoding problem in php - php

I got a .vcf file with parts encoded as UTF-8:
CATEGORIES;CHARSET=UTF-8:Straße & –dienste
Now "–" should be a "-" and "Straße" should convert to "Straße".
I tried
utf8_decode()
iconv()
mb_convert_encoding()
And have been playing with several output encoding options like
header('content-type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');
mb_http_output( "UTF-8" );
But I don't get the wanted results - instead: "StraÃ?e & â??dienste"
Anyone getting that knot out of my brain? Thanks a lot.

solved.
i had to convert the PHP file back to ISO-8859-1 (instead of UTF-8).
thought that would make no difference, but it does!

You may actually want to try utf8_encode(). I had a similar problem when retrieving UTF-8 encoded information from MySQL and displaying it on a UTF-8 HTML page.

forgot to mention: there is no MySQL...
plain php ;-)
echo "Straße & –dienste";
echo utf8_decode("Straße & –dienste");
should somehow become "Straße & -dienste"... but won't, won't, won't

I don't have an answer for you, as I'm not sure I understand fully what you are trying to do (read a .vcf file in PHP?)....
But a clue is this: "Straße" is "Straße" encoded in UTF-8, but then interpreted as Latin1 (or Windows-1252).

Related

Yet another character encoding issue

The language is Php. The editor is Php Storm. The editor encoding is utf-8. The file encoding also. mb_detect_encoding() also returns that the encoding is utf-8 but php does not recognizes č, ć, ž, đ and others. Does anyone know what the problem is?
I know that this is yet another character encoding question and that the solution is never clear in this case, but thank you for any answer.
EDIT
I use my own php framework and the index.php file is encoded to ANSI, not utf-8, but the rest of the files are utf-8. If I try to change from ANSI to utf-8, I get a content encoding
error.
I solved the problem with these lines of code:
ini_set('mbstring.internal_encoding','UTF-8');
ini_set( 'default_charset', 'UTF-8' );
ini_set('mbstring.func_overload',7);
header('Content-Type: text/html; charset=UTF-8');
Actually, setting the 'default_charset' is enough to work but since I spent the last 4 hours trying to solve this, the hell with it. Let it all fire.

get_meta_tags and persian phrases

I used this function,
$code = get_meta_tags('http://www.narenji.ir/');
and I've seen this
'مکانی برای آشنایی با ابزارها Ùˆ اخبار داغ دنیای Ùناوری'
How can I fix this issue?
Can I fix it without using JSON?
You must be missing some link here, your code just works:
Example
The key point is that you preserve the UTF-8 encoding so that Persian is supported. Otherwise you would need some other encoding (one that I do not yet know) that supports Persian and a library that is able to re-encode that.
Which encoding do you want to use for Persian output?
If you are executing your script from a browser, make sure you sending UTF-8 as your content encoding. Add a Content-Type header before echo'ing anything.
header('Content-Type:text/html; charset=utf-8');
utf8_decode() is built specifically for converting from UTF-8 to ISO-8859-1 (latin1). Persian characters are not in Latin1, so why would you feel it's necessary here??
working example: http://codepad.viper-7.com/tEjZAz

PHP urlencode for chinese characters

I'm creating a php application that involves sending chinese characters as url parameters.
I have to send query like :
http://xyz.com/?q=新
But the script at xyz.com won't automatically encode the chinese character. So, I need to explicitly send an encoded string as the paramter. It becomes:
http://xyz.com/?q=%E6%96%B0
The problem is, PHP won't encode the chinese character properly.
I've tried urlencode() and rawurlencode(). But they give %D0%C2 (doesn't work for my purpose) instead of %E6%96%B0 (works well with xyz.com) as the output.
I'm using this website to create the latter encoded string.
I've also defined header('Content-Type: text/html; charset=gb2312'); to display chinese characters properly.
Is there anything I can do to urlencode the chinese character properly?
Thanks!
PS: I'm a relatively new programmer and don't understand chinese.
You're URLencoding using the charset you specify in your header. %D0%C2 is 新 in gb2312; %E6%96%B0 is 新 in UTF-8. Switch your charset over to UTF-8 and you should fix this issue and still be able to display Simplified Chinese Han.
In order to reproduce your problem I created a simple PHP file:
<?php
var_dump(urlencode('新'));
?>
First I used UTF8 encoding and got %E6%96%B0. Afterwards I changed to GB2312 and got %D0%C2.
At http://meyerweb.com/eric/tools/dencoder/ they seem to use JavaScript, that's UTF8 capable and therefore returns %E6%96%B0, too.
PS: When changing from GB2312 to UTF8 some editors might break code some internationalized code. So please make sure to have a copy of your file before converting!

Character encoding in PHP

I never had this problem before, it was usually my database or the html page. But now i think its my php. I import text from a csv or from a text area and in both ways it goes wrong.
for example é changes to é. I used htmlentities to fix this but it didn't work. The htmlentities function didn't return é in html but é in html entities, so it already loses the real characters before htmlentities comes in to place... So does that mean my php file has the wrong encoding or something?
I hope someone can help me out..
Thanks!
Chris
A file is usually ISO-8859-1 (Latin) or UTF-8 ... ISO-8859-1 is 1 byte per char, UTF-8 is 1-4 bytes per char. So if you get 2 chars when you expect one, then you are reading UTF-8 and showing it as ISO-8859-1 ... if you get strange chars, then you are reading ISO-8859-1 and showing it as UTF-8.
If you provide more details, it would be easier to pinpoint, but in short, you have inconsistent charsets and need to convert one or the other so they're all the same. But from what it seems, you're using ISO-8859-1 in your project, but you are reading some UTF-8 from somewhere... use utf8_decode($text) if that data should be indeed be stored as UTF-8, or find the data and convert it manually.
EDIT: If you are using AJAX somewhere, then you will ALWAYS get UTF-8 from it, and you'll have to decode it yourself with utf8_decode() if you want to keep using ISO-8859-1.
Try opening your php file and change the encoding to UTF-8
if that doesn't help, add this to your php:
header('Content-Type: text/html; charset=utf-8');
Or this to your html:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Take a look at PHP's iconv().

Get source code with Chinese characters PHP

Well, I give up.
I've been messing around with all I could think of to retrieve data from a target website that has information in traditional Chinese encoding (charset=GB2312).
I've been using the simple_html_parser like always but it doesn't seem to return the Chinese characters, in fact all I get are some weird question marks embedded inside a rhomboid shape.
("�������ѯ�ؼ��֣�" Like so)
Declaring the encoding for the php file didn't do anything except of getting rid of some unwanted character showing at the start of the page.
By declaring it I mean:
header('Content-Type', 'text/html; charset=GB2312');
I can't get any data that's written in Chinese, also tried file_get_contents with the same luck. I'm probably missing something obvious since I can't find any related discussion elsewhere.
Thanks in advance.
Have you tried converting the encoding with mb_convert_encoding or iconv, e.g.
$str = mb_convert_encoding($content, 'UTF-8', 'GB2312');
or
$str = iconv("UTF-8", "GB2312//IGNORE", $content);
Get it in whatever character set the source uses, then convert it to something usable locally, such as UTF-8. Then send it to the browser.
set header('Content-Type: text/html; charset=utf-8');
It's working for me

Categories