Unicode encoding in php with Hebrew - php

I am trying to get some information from a webpage, but it is in a different encoding. Is there an easy way to convert it to UTF-8 and then use it?
For example, I am getting these URLs, which I will need to visit:
http://www.mega.co.il/jsfweb/cat/טופו/
http://www.mega.co.il/jsfweb/cat/גבינה_מלוחה/
http://www.mega.co.il/jsfweb/cat/גבינה_לארוח/
http://www.mega.co.il/jsfweb/cat/גבינה_מותכת/
http://www.mega.co.il/jsfweb/cat/גבינה_צהובה/
http://www.mega.co.il/jsfweb/cat/גבינה_לבנה/
http://www.mega.co.il/jsfweb/cat/קוטג/
How do I turn that into UTF-8 and then urlencode it in PHP?

You can try html_entity_decode() to decode those entities. To change the encoding, use mb_convert_encoding(). I have no experience with Hebrew, so I don't know if it would work.
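A minimal sketch of the convert-then-urlencode step. It assumes the category name is either already UTF-8 or arrives as ISO-8859-8 bytes (an assumption, since the question does not say which encoding the page uses). The helper name categoryUrl() is hypothetical.

```php
<?php
// Hypothetical helper: build a category URL from a raw (unencoded) segment.
// $sourceEncoding is the encoding the segment currently uses (assumed);
// the segment is converted to UTF-8 first, then percent-encoded.
function categoryUrl(string $segment, string $sourceEncoding = 'UTF-8'): string
{
    if (strcasecmp($sourceEncoding, 'UTF-8') !== 0) {
        $segment = mb_convert_encoding($segment, 'UTF-8', $sourceEncoding);
    }
    // rawurlencode() works byte by byte, so the input must already be UTF-8.
    return 'http://www.mega.co.il/jsfweb/cat/' . rawurlencode($segment) . '/';
}

// UTF-8 bytes of the first category name from the question.
echo categoryUrl("\xD7\x98\xD7\x95\xD7\xA4\xD7\x95"), "\n";

// The same word as ISO-8859-8 bytes (assumed source encoding).
echo categoryUrl("\xE8\xE5\xF4\xE5", 'ISO-8859-8'), "\n";
```

Both calls should print the same percent-encoded URL. Only the path segment is encoded, so the slashes and the host stay intact.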

Related

Decoding strings on PHP: Data treats UTF-8 Bytes as Windows-1252

I am getting data from a web API which has a strange encoding. I am using PHP and can't seem to decode the input strings. I seem to be having this problem, which explains what's going on but doesn't really help me figure out how to fix it.
Can anyone help?
Thanks!
You may want to try analyzing the encoding using something like mb_detect_encoding().
http://www.php.net/mb_detect_encoding
You can use mb_detect_encoding() to detect the encoding of the strings.
If they are not what you are expecting, you can use mb_convert_encoding() to convert to something like UTF-8 or whatever you want.
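As a rough sketch of that idea: mb_detect_encoding() only makes a guess, so this version uses mb_check_encoding() to test whether the bytes are valid UTF-8 and converts only when they are not. The helper name ensureUtf8() and the Windows-1252 fallback are assumptions for illustration, not something the API in the question confirms.

```php
<?php
// Hypothetical helper: return $input as UTF-8.
// Valid UTF-8 is returned unchanged; anything else is assumed to be
// in $fallback (an assumption) and converted from it.
function ensureUtf8(string $input, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }
    return mb_convert_encoding($input, 'UTF-8', $fallback);
}

$legacy = "caf\xE9";      // "café" as Windows-1252 bytes (invalid as UTF-8)
$utf8   = "caf\xC3\xA9";  // "café" as UTF-8 bytes

var_dump(ensureUtf8($legacy) === $utf8); // bool(true)
var_dump(ensureUtf8($utf8) === $utf8);   // bool(true)
```

Checking validity first avoids converting a string that is already UTF-8 a second time, which is what produces doubled characters like "Ã©".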

PHP convert specific character

I have a string like this, "Divizia NaÅ¢ionalÄ", and want to convert it to "Divizia Naţională", which is the correct string. What happened, and how do I remove these special characters? Which PHP function can I use? And how do I convert it to "Divizia Nationala", which is the most readable form for everyone?
Use UTF8
Be sure your data source is UTF8
Learn how to force browsers to display UTF8
Don't use other encodings, convert everything to UTF8 as soon as possible
Your best support on this mission is iconv()
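A sketch of how iconv() can undo this kind of damage, assuming the string was garbled by reading its UTF-8 bytes as ISO-8859-1 and encoding them as UTF-8 a second time. That cause is an assumption about the question's data, and repairDoubleEncoding() is a hypothetical name.

```php
<?php
// Hypothetical helper: reverse one round of "UTF-8 read as ISO-8859-1".
function repairDoubleEncoding(string $garbled): string
{
    // Map every character back to the single byte it originally was.
    $fixed = iconv('UTF-8', 'ISO-8859-1', $garbled);
    // iconv() returns false if a character has no ISO-8859-1 equivalent,
    // which means the input was not garbled in the assumed way.
    return $fixed === false ? $garbled : $fixed;
}

// The correct text as UTF-8 bytes: "Divizia Naţională".
$correct = "Divizia Na\xC5\xA3ional\xC4\x83";

// Simulate the damage: treat each byte as an ISO-8859-1 character.
$garbled = iconv('ISO-8859-1', 'UTF-8', $correct);

var_dump($garbled === $correct);                       // bool(false)
var_dump(repairDoubleEncoding($garbled) === $correct); // bool(true)
```

This only repairs data that is already broken. The better fix is still the advice above: find the step where the wrong encoding is assumed and keep everything in UTF-8.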

get_meta_tags and persian phrases

I used this function,
$code = get_meta_tags('http://www.narenji.ir/');
and I've seen this
'مکانی برای آشنایی با ابزارها Ùˆ اخبار داغ دنیای Ùناوری'
How can I fix this issue?
Can I fix it without using JSON?
You must be missing some link here, your code just works:
Example
The key point is that you preserve the UTF-8 encoding so that Persian is supported. Otherwise you would need some other encoding (one that I do not yet know) that supports Persian and a library that is able to re-encode that.
Which encoding do you want to use for Persian output?
If you are executing your script from a browser, make sure you are sending UTF-8 as your content encoding. Add a Content-Type header before echoing anything.
header('Content-Type:text/html; charset=utf-8');
utf8_decode() is built specifically for converting from UTF-8 to ISO-8859-1 (Latin-1). Persian characters are not in Latin-1, so why would you feel it's necessary here?
working example: http://codepad.viper-7.com/tEjZAz

PHP urlencode for chinese characters

I'm creating a PHP application that involves sending Chinese characters as URL parameters.
I have to send a query like:
http://xyz.com/?q=新
But the script at xyz.com won't automatically encode the Chinese character. So, I need to explicitly send an encoded string as the parameter. It becomes:
http://xyz.com/?q=%E6%96%B0
The problem is, PHP won't encode the Chinese character properly.
I've tried urlencode() and rawurlencode(). But they give %D0%C2 (doesn't work for my purpose) instead of %E6%96%B0 (works well with xyz.com) as the output.
I'm using this website to create the latter encoded string.
I've also defined header('Content-Type: text/html; charset=gb2312'); to display Chinese characters properly.
Is there anything I can do to urlencode the Chinese character properly?
Thanks!
PS: I'm a relatively new programmer and don't understand Chinese.
You're URLencoding using the charset you specify in your header. %D0%C2 is 新 in gb2312; %E6%96%B0 is 新 in UTF-8. Switch your charset over to UTF-8 and you should fix this issue and still be able to display Simplified Chinese Han.
In order to reproduce your problem I created a simple PHP file:
<?php
var_dump(urlencode('新'));
?>
First I used UTF8 encoding and got %E6%96%B0. Afterwards I changed to GB2312 and got %D0%C2.
At http://meyerweb.com/eric/tools/dencoder/ they seem to use JavaScript, that's UTF8 capable and therefore returns %E6%96%B0, too.
PS: When changing from GB2312 to UTF8, some editors might break internationalized code. So please make sure to have a copy of your file before converting!
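If the file or the incoming data has to stay in GB2312, another option is to convert the value to UTF-8 just before encoding it. The sketch below uses byte escapes so the result does not depend on how the script file is saved. The helper urlencodeAsUtf8() is hypothetical, and 'EUC-CN' is the mbstring name assumed here for GB2312 data.

```php
<?php
// Hypothetical helper: percent-encode a value as UTF-8, whatever encoding
// it arrived in ($from is an assumption about the input).
function urlencodeAsUtf8(string $value, string $from = 'EUC-CN'): string
{
    return urlencode(mb_convert_encoding($value, 'UTF-8', $from));
}

$utf8 = "\xE6\x96\xB0"; // the character from the question as UTF-8 bytes
$gb   = "\xD0\xC2";     // the same character as GB2312 (EUC-CN) bytes

echo urlencode($utf8), "\n";     // %E6%96%B0
echo urlencode($gb), "\n";       // %D0%C2
echo urlencodeAsUtf8($gb), "\n"; // %E6%96%B0
```

urlencode() itself knows nothing about character sets. It encodes whatever bytes it is given, which is why the same character produces two different results.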

decoding ISO characters

I have Chinese characters encoded in ISO-8859-1, for example &#20860; = 兼
Those characters are taken from the database using AJAX and sent as JSON using json_encode.
I then use the Handlebars template to set the data on the page.
When I look at the AJAX page the characters are displayed correctly, although the source is still encoded.
But the final result displays the encoded characters.
I tried to decode on the JavaScript side with unescape, but there is no foreach in the template that gives me the possibility to decode the specific variable, so it crashes.
I tried to decode on the PHP side with htmlspecialchars_decode but without success.
Both pages are encoded in ISO-8859-1, and I can change them to UTF-8 if necessary, but the data in the database remains encoded in ISO-8859-1.
Thank you for your help.
You're simply representing your characters in HTML entities. If you want them as "actual characters", you'll need to use an encoding that can represent those characters, ISO-8859 won't do. htmlspecialchars_decode doesn't work because it only decodes a handful of characters that are special in HTML and leaves other characters alone. You'll need html_entity_decode to decode all entities, and you'll need to provide it with a character set to decode to which can handle Chinese characters, UTF-8 being the obvious best choice:
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
You'll then need to make sure the browser knows that you're sending it UTF-8. If you want to store the text in the database in UTF-8 as well (which you really should), best follow the guide How to handle UTF-8 in a web app which explains all the pitfalls.
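A short sketch of the difference between the two decoding functions, using the numeric entity for the character in the question. The 'name' key and the use of JSON_UNESCAPED_UNICODE are illustrative assumptions, not part of the original setup.

```php
<?php
$stored = '&#20860;'; // numeric entity for U+517C, as it might sit in the database

// htmlspecialchars_decode() only handles &amp; &lt; &gt; &quot; and similar,
// so the numeric entity is left untouched.
var_dump(htmlspecialchars_decode($stored) === '&#20860;'); // bool(true)

// html_entity_decode() with a UTF-8 target produces the real character.
$char = html_entity_decode($stored, ENT_COMPAT, 'UTF-8');
var_dump($char === "\xE5\x85\xBC"); // bool(true), the UTF-8 bytes of U+517C

// json_encode() expects UTF-8 input. JSON_UNESCAPED_UNICODE keeps the
// character literal instead of writing it as \u517c.
echo json_encode(['name' => $char], JSON_UNESCAPED_UNICODE), "\n";
```

Decoding on the PHP side means the JSON already contains real characters, so the template does not need to interpret entities at all.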
Are you including your text with the "double-stache" Handlebars syntax?
{{your expression}}
As the Handlebars documentation mentions, that syntax HTML-escapes its output, which would cause the results you're mentioning, where you're seeing the entity &#20860; instead of 兼.
Using three braces instead ("triple-stache") won't escape the output and will let the browser correctly interpret those numeric entities:
{{{your expression}}}
