Decoding strings in PHP: data treats UTF-8 bytes as Windows-1252

I am getting data from a web API that has a strange encoding. I am using PHP and can't seem to decode the input strings. I seem to be having this problem, which explains what's going on but doesn't really help me figure out how to fix it.
Can anyone help?
Thanks!

You may want to try analyzing the encoding using something like mb_detect_encoding().
http://www.php.net/mb_detect_encoding

You can use mb_detect_encoding() to detect the encoding of the strings.
If the result is not what you expect, you can use mb_convert_encoding() to convert the strings to UTF-8 or whatever encoding you need.
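For the case in the title (UTF-8 bytes that were read as Windows-1252 and then re-encoded as UTF-8), a minimal sketch of the reverse conversion follows. It assumes the mbstring extension is available and that the API text was double-encoded exactly once; `fix_double_encoded()` is a hypothetical helper name, not a PHP function.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 bytes decoded as Windows-1252".
// Assumes $text is valid UTF-8 that was double-encoded exactly once.
function fix_double_encoded(string $text): string
{
    // Map each character back to its single Windows-1252 byte; the resulting
    // byte string should be the original UTF-8 data.
    $bytes = mb_convert_encoding($text, 'Windows-1252', 'UTF-8');

    // Only accept the result if it is valid UTF-8; otherwise the input was
    // probably not double-encoded, so return it unchanged.
    return mb_check_encoding($bytes, 'UTF-8') ? $bytes : $text;
}

$broken = "Consult\u{00C3}\u{00B3}rio";  // what the API returned: "ConsultÃ³rio"
echo fix_double_encoded($broken), "\n"; // Consultório
```

Windows-1252 leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so characters whose UTF-8 form contains one of those bytes may already be lost and cannot be recovered this way.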

Related

Can't decode GET parameter

I have a simple URL that passes two parameters: name and cellphone. But when I use special characters, the parameter can't be decoded. I get ?? instead of the character.
I already tried urldecode($_GET['name']), rawurldecode(), html_entity_decode() and utf8_decode(), but none of these worked.
I have the UTF-8 meta tag in my HTML, and I also tried sending it as a header from PHP, but it didn't work.
The code is like this:
<?php echo $_GET['name']; ?>
You simply have to use the correct function, which is utf8_encode():
<?php echo utf8_encode($_GET['name']); ?>
Output:
Consultório
The function utf8_encode():
This function converts the string data from the ISO-8859-1 encoding to UTF-8.
See the documentation here.
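utf8_encode() is deprecated as of PHP 8.2, and mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1') performs the same conversion. A minimal sketch that also leaves values alone when they already arrive as UTF-8; the helper name `param_to_utf8()` is hypothetical, and the assumption is that anything not valid UTF-8 was sent as ISO-8859-1.

```php
<?php
// Hypothetical helper: normalise a request value to UTF-8.
// Assumes anything that is not valid UTF-8 was sent as ISO-8859-1,
// as with "name=Consult%F3rio".
function param_to_utf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already UTF-8 (e.g. from "Consult%C3%B3rio")
    }
    // The same conversion utf8_encode() performs.
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// PHP has already percent-decoded $_GET; rawurldecode() simulates that here.
$latin1 = rawurldecode('Consult%F3rio');
echo param_to_utf8($latin1), "\n"; // Consultório
```

In the page itself this would be used as `echo htmlspecialchars(param_to_utf8($_GET['name'] ?? ''), ENT_QUOTES, 'UTF-8');`.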
name=Consult%F3rio
This is the good old ISO-8859-1 encoding of Consultório, from the early days of the web. If the decoded version renders incorrectly, it is very likely that your application does not use ISO-8859-1 at all, so there is no benefit in using it for the links either. If your app uses UTF-8, the simplest solution is to switch the links to UTF-8 as well:
Consult%C3%B3rio
This is basically what you get from any built-in PHP function when it is fed UTF-8 data, because they work at the byte level:
var_dump(rawurlencode('Consultório')); // string(16) "Consult%C3%B3rio"
If this happens to be third-party data you can't control, please check Martin's answer.
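Switching the links to UTF-8 can be as simple as building the query string from UTF-8 text. A minimal sketch, assuming the PHP source file is saved as UTF-8; the cellphone value is made-up sample data.

```php
<?php
// Build the query string from UTF-8 text so the browser and PHP agree on
// the encoding. The cellphone value is made-up sample data.
$query = http_build_query([
    'name'      => 'Consultório',
    'cellphone' => '555-0100',
]);
echo $query, "\n"; // name=Consult%C3%B3rio&cellphone=555-0100

// On the receiving side, PHP decodes it the same way it fills $_GET.
parse_str($query, $params);
echo $params['name'], "\n"; // Consultório
```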

get_meta_tags and Persian phrases

I called this function:
$code = get_meta_tags('http://www.narenji.ir/');
and got this:
'مکانی برای آشنایی با ابزارها Ùˆ اخبار داغ دنیای Ùناوری'
How can I fix this issue?
Can I fix it without using JSON?
You must be missing a step somewhere; your code just works.
The key point is that you preserve the UTF-8 encoding so that Persian is supported. Otherwise you would need some other encoding that supports Persian (I don't know of one offhand) and a library that is able to re-encode the text.
Which encoding do you want to use for Persian output?
If you are running your script from a browser, make sure you are sending UTF-8 as your character encoding. Add a Content-Type header before echoing anything:
header('Content-Type: text/html; charset=utf-8');
utf8_decode() is built specifically for converting from UTF-8 to ISO-8859-1 (Latin-1). Persian characters are not in Latin-1, so why would you need it here?
Working example: http://codepad.viper-7.com/tEjZAz
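A small sketch of why the Latin-1 route cannot work for Persian: converting to ISO-8859-1, which is what utf8_decode() does, replaces every Persian letter with a substitute character. The word used is taken from the site's description; the explicit mb_substitute_character() call only makes the substitute deterministic.

```php
<?php
// Serve the page as UTF-8 before any output (no effect when run from the CLI).
header('Content-Type: text/html; charset=utf-8');

// Persian letters have no ISO-8859-1 equivalent, so converting to Latin-1
// (which is what utf8_decode() does) replaces each one with a substitute.
mb_substitute_character(0x3F); // use '?' as the substitute, explicitly

$persian = 'ابزار'; // a word from the site's description
$latin1  = mb_convert_encoding($persian, 'ISO-8859-1', 'UTF-8');

echo $latin1, "\n";  // ?????  (the Persian text is gone)
echo $persian, "\n"; // kept as UTF-8, it survives intact
```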

Unicode encoding in PHP with Hebrew

I am trying to get some information from a web page, but it is in a different encoding. Is there an easy way to convert it to UTF-8 and then use it?
For example, I am getting these URLs, which I will need to visit:
http://www.mega.co.il/jsfweb/cat/טופו/
http://www.mega.co.il/jsfweb/cat/גבינה_מלוחה/
http://www.mega.co.il/jsfweb/cat/גבינה_לארוח/
http://www.mega.co.il/jsfweb/cat/גבינה_מותכת/
http://www.mega.co.il/jsfweb/cat/גבינה_צהובה/
http://www.mega.co.il/jsfweb/cat/גבינה_לבנה/
http://www.mega.co.il/jsfweb/cat/קוטג/
How do I turn them into UTF-8 and then URL-encode them in PHP?
You can try html_entity_decode() to decode those entities. To change the encoding, use mb_convert_encoding(). I have no experience with Hebrew, so I don't know whether it will work.
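Putting the two suggestions together with rawurlencode() gives a possible pipeline: convert to UTF-8, decode entities, then percent-encode the path segment. This is a sketch under assumptions: `category_url()` is a hypothetical helper, and `$fromEncoding` must be set to whatever charset the page really uses (it defaults to UTF-8 here).

```php
<?php
// Hypothetical helper: turn a category name scraped from the page into a URL.
// $fromEncoding is an assumption: pass the page's real charset if not UTF-8.
function category_url(string $name, string $fromEncoding = 'UTF-8'): string
{
    // 1. Get the raw text into UTF-8.
    if (strcasecmp($fromEncoding, 'UTF-8') !== 0) {
        $name = mb_convert_encoding($name, 'UTF-8', $fromEncoding);
    }
    // 2. Turn entities such as &#1496; into real characters.
    $name = html_entity_decode($name, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // 3. Percent-encode the UTF-8 bytes for use as a path segment.
    return 'http://www.mega.co.il/jsfweb/cat/' . rawurlencode($name) . '/';
}

echo category_url('טופו'), "\n";
// http://www.mega.co.il/jsfweb/cat/%D7%98%D7%95%D7%A4%D7%95/
echo category_url('&#1496;&#1493;&#1508;&#1493;'), "\n"; // same URL
```

Converting before decoding entities matters: html_entity_decode() is told the text is UTF-8, so it should only see UTF-8 input.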

How do I decode the following string in PHP?

Does anyone know how to properly decode the following string in PHP?
=?iso-8859-1?Q?OLG=20Slots=20at=20Windsor=20=26=20Caesars=20Windsor=20w?=
I tried using
quoted_printable_decode()
but it did not produce the desired result.
This string was retrieved from an email header; it is the "Subject". Email clients (both web-based and desktop applications) are able to decode it properly.
Thanks for your time!
It is not URL-encoded; try this instead:
$subject = '=?iso-8859-1?Q?OLG=20Slots=20at=20Windsor=20=26=20Caesars=20Windsor=20w?=';
echo utf8_decode(imap_utf8($subject));
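The string is an RFC 2047 "encoded-word": `=?charset?encoding?text?=`, where `Q` means a quoted-printable style encoding (`=20` is a space, `=26` is `&`) and `B` means base64. As an alternative sketch that does not depend on the imap extension (no longer bundled with PHP as of 8.4), mbstring's mb_decode_mimeheader() decodes it into the internal encoding; iconv_mime_decode() is another option.

```php
<?php
// Decode an RFC 2047 encoded-word header into UTF-8 using mbstring.
mb_internal_encoding('UTF-8');

$subject = '=?iso-8859-1?Q?OLG=20Slots=20at=20Windsor=20=26=20Caesars=20Windsor=20w?=';
$decoded = mb_decode_mimeheader($subject);
echo $decoded, "\n"; // OLG Slots at Windsor & Caesars Windsor w
```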
The first bit suggests it's encoded in ISO-8859-1, which, for the plain characters used here, is identical to standard ASCII (ISO-8859-1 is an 8-bit superset of ASCII).
This means that you probably don't need to decode the string, you just need to understand it ;-)
Where did you find it? What do you think its meaning might be? (e.g. is it submitted form data, some sort of RPC encoding, something else?)
Thank you! I've just seen your edit. Have you tried base64-decoding it? At a guess, it's base64-encoded ASCII; try base64_decode().

How do I detect the encoding correctly with mb_detect_encoding?

I want to detect the encoding correctly, but mb_detect_encoding() always gives me a wrong result, even after I added lots of encodings to the encoding list (UTF-8, ISO-8859-*, ...).
You are trying to do something that only sometimes works. Encoding detection is not an exact "science", so the best thing you can do is avoid it.
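One reason detection disappoints is that ISO-8859-1 and Windows-1252 assign a meaning to (almost) every byte, so almost any input "matches" them. A narrower check tends to be more reliable: test whether the input is valid UTF-8 and, if not, assume one known legacy encoding. In this sketch `to_utf8()` is a hypothetical helper and the Windows-1252 fallback is an assumption; use whatever your source actually sends.

```php
<?php
// Hypothetical helper: accept valid UTF-8 as-is, otherwise convert from one
// assumed legacy encoding instead of guessing among many.
function to_utf8(string $bytes, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

echo to_utf8("Consult\xF3rio"), "\n"; // Consultório (converted)
echo to_utf8("Consultório"), "\n";    // Consultório (already UTF-8)
var_dump(mb_check_encoding("\xC3\x28", 'UTF-8')); // bool(false)
```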
