How to Convert HTML Codes to the Corresponding Unicode Characters - PHP

I have googled a lot and explored this forum too, but this is my second day on this and I still could not find a solution.
My problem is that I want to convert the HTML codes
&#1576;&#1575;&#1582;
to their equivalent Unicode characters
خ ا ب
I do not want to convert all HTML entities to Unicode characters; I only want to convert the Arabic/Urdu HTML codes. These characters range from &#1563; to &#1785;. If there is no PHP function for this, how can I replace the codes with their equivalent Unicode characters in one go?

I think you're looking for:
html_entity_decode('&#1576;&#1575;&#1582;', ENT_QUOTES, 'UTF-8');
When you go from &#1576; to ب, that's called decoding. Doing the opposite is called encoding.
As for replacing only the characters from &#1563; to &#1785;, maybe try something like this:
<?php
// Random set of entities, two are outside the 1563 - 1785 range.
$entities = '&#1563;&#1564;&#60;&#1604;&#241;&#1784;&#1785;';
// Matches entities from 1500 to 1799, not perfect, I know.
preg_match_all('/&#1[5-7][0-9]{2};/', $entities, $matches);
$entityRegex = array(); // Will hold the entity code regular expression.
$decodedCharacters = array(); // Will hold the decoded characters.
foreach ($matches[0] as $entity)
{
// Convert the entity to human-readable character.
$unicodeCharacter = html_entity_decode($entity, ENT_QUOTES, 'UTF-8');
array_push($entityRegex, "/$entity/");
array_push($decodedCharacters, $unicodeCharacter);
}
// Replace all of the matched entities with the human-readable character.
$replaced = preg_replace($entityRegex, $decodedCharacters, $entities);
?>
That's as close as I can get to solving this. Hopefully, this helps a little. It's 5:00am where I am now, so I'm off to sleep! :)
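The regex in that answer also accepts entities from 1500 to 1799, outside the requested range. A minimal sketch that checks the exact bounds instead, assuming PHP 7.2+ with the mbstring extension (for mb_chr()); the function name decodeArabicEntities is hypothetical, and only decimal entities (not hex forms like &#x628;) are handled:

```php
<?php
// Decode only numeric entities whose code point lies in the Arabic/Urdu
// range U+061B (1563) .. U+06F9 (1785); every other entity is left as-is.
function decodeArabicEntities(string $text): string
{
    return preg_replace_callback(
        '/&#(\d+);/',
        function (array $m): string {
            $code = (int) $m[1];
            if ($code >= 1563 && $code <= 1785) {
                // mb_chr() returns the UTF-8 character for a code point.
                return mb_chr($code, 'UTF-8');
            }
            return $m[0]; // outside the range: keep the entity untouched
        },
        $text
    );
}

echo decodeArabicEntities('&#1563;&#60;&#1604;&#241;&#1785;'), "\n";
```

Because the callback compares the parsed integer, the bounds are exact, and the whole string is processed in a single preg_replace_callback() pass.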

Did you try declaring the UTF-8 encoding in the HTML head?
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />

Try this:
<?php
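$ttr = array(); // entity => UTF-8 character
// Request an ISO-8859-1 table so that utf8_encode() converts each key to UTF-8 exactly once
// (since PHP 5.4 the default table is already UTF-8, which utf8_encode() would double-encode).
$trans_tbl = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'ISO-8859-1');
foreach($trans_tbl as $k => $v)
{
$ttr[$v] = utf8_encode($k);
}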
$text = 'بب....;&#1582';
$text = strtr($text, $ttr);
echo $text;
?>
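The translation table used above only contains named entities (such as &eacute;), so numeric codes like &#1582; pass through strtr() untouched. A hedged sketch of the same "one strtr() call" idea with a table built for the requested numeric range, assuming PHP 7.2+ with mbstring for mb_chr(); the variable names are illustrative:

```php
<?php
// Build a lookup table covering only the Arabic/Urdu range
// U+061B (1563) .. U+06F9 (1785) and apply it in one strtr() pass.
// Only decimal entities such as &#1576; are in the table; hex forms
// such as &#x628; are not.
$map = [];
for ($code = 1563; $code <= 1785; $code++) {
    $map["&#{$code};"] = mb_chr($code, 'UTF-8');
}

$text = '&#1576;&#1575;&#1582; &amp; &#241;';
echo strtr($text, $map), "\n";
```

With an array argument, strtr() replaces all keys in a single pass and never re-scans replaced text, so entities outside the range (like &amp; and &#241; here) stay as they are.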
For MySQL, you can set the connection character set like this:
$mysqli = new mysqli($host, $user, $pass, $db);
if (!$mysqli->set_charset("utf8")) {
die("error");
}

Related

Create URL slugs for Chinese characters using PHP

My users sometimes use Chinese characters in the titles they enter.
My slugs are in the format of /stories/:id-:name where an example could be /stories/1-i-love-php.
How do I allow Chinese characters?
I have googled and found the Japanese version of this answer over here.
I don't quite understand Japanese, so I am asking about the Chinese version.
Thank you.
I have tested this with Bengali characters, so it may work for Chinese too. Try this:
first, save the file that contains the code with UTF-8 encoding, then write the code.
Code:
function to_slug($string, $separator = '-') {
$re = "/(\\s|\\".$separator.")+/mu";
$str = trim($string);
$subst = $separator;
$result = preg_replace($re, $subst, $str);
return $result;
}
$id=34;
$string_text="আড়াইহাজারে দেড় বছরের --- শিশুর -গলায় ছুরি";
$base_url="http://example.com/";
echo $target_url = $base_url.$id."-".to_slug($string_text);
var_dump($target_url);
Output:
http://example.com/34-আড়াইহাজারে-দেড়-বছরের-শিশুর-গলায়-ছুরি
string 'http://example.com/34-আড়াইহাজারে-দেড়-বছরের-শিশুর-গলায়-ছুরি' (length=136)
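The to_slug() function above only collapses whitespace and separators. Chinese titles usually contain no spaces at all, and punctuation such as full-width commas would survive into the slug. A hedged alternative, assuming PHP 7+ with PCRE Unicode support and mbstring: keep every Unicode letter, combining mark and digit, and collapse everything else into the separator. The name unicodeSlug is hypothetical:

```php
<?php
// Hypothetical helper: keep Unicode letters and digits (Latin, Chinese,
// Bengali, ...) and collapse every other run of characters into $separator.
function unicodeSlug(string $title, string $separator = '-'): string
{
    // \p{L} = any letter, \p{M} = combining marks (e.g. Bengali vowel signs),
    // \p{N} = any digit; the /u flag makes PCRE treat the input as UTF-8.
    $slug = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', $separator, $title);
    $slug = trim($slug, $separator);
    return mb_strtolower($slug, 'UTF-8');
}

$slug = unicodeSlug('I Love PHP, 我爱编程!');
echo $slug, "\n"; // i-love-php-我爱编程
// When building the actual link, percent-encode the non-ASCII part.
echo '/stories/1-' . rawurlencode($slug), "\n";
```

Keeping the raw characters in the stored slug and encoding only when emitting the href is one design choice; storing the percent-encoded form is another.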

HTML Special Characters (foreign languages)

Basically I have this string:
Český, Deutsch, English (US), Español (ES), Français (France), Italiano, 日本語, 한국어, Polski, 中文(繁體)
And I want to convert it into all possible HTML entities (there might be Russian characters too!).
I've tried calling htmlspecialchars and htmlentities with different charsets, but they return empty strings...
$l = htmlentities("Český, Deutsch, English (US), Español (ES), Français (France), Italiano, 日本語, 한국어, Polski, 中文(繁體) €", ENT_COMPAT, "BIG5-HKSCS");
$l = htmlentities($l, ENT_COMPAT, "KOI8-R");
$l = htmlentities($l, ENT_COMPAT, "EUC-JP");
$l = htmlentities($l, ENT_COMPAT, "Shift_JIS");
$l = htmlentities($l, ENT_COMPAT, "Shift_JIS");
echo $l;
returns an empty string.
Any help?
Here's my "unutf8" function, which converts multi-byte UTF-8 characters into HTML entities of the form &#12345; (code points below 256 are returned as single ISO-8859-1 bytes instead):
function unutf8($str) {
return preg_replace_callback("([\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3}|[\xF8-\xFB][\x80-\xBF]{4}|[\xFC-\xFD][\x80-\xBF]{5})",
function($m) {
$c = $m[0];
$out = bindec(ltrim(decbin(ord($c[0])),"1"));
$l = strlen($c);
for( $i=1; $i<$l; $i++) {
$out = ($out<<6) | bindec(ltrim(decbin(ord($c[$i])),"1"));
}
if( $out < 256) return chr($out);
return "&#".$out.";";
},$str);
}
It parses the string for valid UTF-8 byte sequences and converts each multi-byte sequence into the ordinal value of the character. It's very messy and I don't expect to win any awards for good coding with this, but it works.
Please note, however, that if you have unencoded characters, you WILL run into problems. For example, if for some reason you have the raw ISO-8859-1 bytes for é©©, the result will be &#39529;. Please make sure your string is valid UTF-8 before passing it to the function.
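Assuming the mbstring extension is available, mb_encode_numericentity() does the same job as unutf8 without decoding the bytes by hand. One difference: the convmap below also turns code points 0x80 to 0xFF into entities, where unutf8 returns raw bytes for them. The function name toNumericEntities is hypothetical:

```php
<?php
// Convert every non-ASCII character of a UTF-8 string into a decimal
// numeric entity (&#NNN;), leaving ASCII (including < > &) untouched.
function toNumericEntities(string $utf8): string
{
    // convmap: [first code point, last code point, offset, mask]
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

echo toNumericEntities('Český, Español, 日本語, Русский'), "\n";
```

Since the output is pure ASCII, it can be embedded in a page served in any ASCII-compatible charset.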
Use header() to set the charset in the HTTP Content-Type header to UTF-8:
header('Content-Type: text/html; charset=utf-8');
Also, make sure your HTML document declares UTF-8:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
Don't go for complicated solutions; just follow these small and simple steps:
1) mysql_set_charset("utf8", $conn); put this in your connection config code.
or
2) mysql_query("SET NAMES 'UTF8'");
then run your queries as usual.

PHP non-Latin to hex function

I have a website that's in Windows-1251 encoding, and it needs to stay that way. But I also need to be able to echo a few links that contain non-Latin, non-Cyrillic characters like šžāņūī...
I need a function that converts this
"māja un man tā patīk"
to
m&#257;ja un man t&#257; pat&#299;k
and that does not touch HTML: if there is <b>, it needs to stay as <b> and not become &lt;b&gt;.
And please, no advice about the encoding and how wrong it is.
$str = "<b>Obāchan</b> おばあちゃん";
$str = preg_replace_callback('/./u', function ($matches) {
$chr = $matches[0];
if (strlen($chr) > 1) {
$chr = mb_convert_encoding($chr, 'HTML-ENTITIES', 'UTF-8');
}
return $chr;
}, $str);
This expects the original $str to be UTF-8 encoded, i.e. your PHP file should be saved in UTF-8. It encodes all non-ASCII compatible code points to HTML entities. Since all HTML special characters are ASCII characters, they remain untouched. The resulting string is pure ASCII. Since the lower Win-1251 code points are ASCII compatible, the resulting string is also a valid Win-1251 string. The above $str converts to:
<b>Ob&#257;chan</b> &#12362;&#12400;&#12354;&#12385;&#12419;&#12435;
The main characters you probably don't want to encode are <, > and &; those are really the only special characters. So how about encoding everything first and then decoding just &lt;, &gt; and &amp;? I think you should be fine then.
This is untested:
$output =
htmlspecialchars_decode(
htmlentities($input, ENT_NOQUOTES, 'Windows-1251')
);
Let me know if it works.
What Evert suggests looks logical to me too! If you insist, this is one way to do it if there are only two letters that bother you. For more letters, the script will not be as effective and needs to change.
<?PHP
function myConvert($str)
{
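$chars = array(); // character => numeric entity
$chars['ā']='&#257;';
$chars['ī']='&#299;';
foreach ($chars as $key => $value)
$str = str_replace($key, $value, $str); // reassign so each replacement accumulates
echo $str;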
}
myConvert("māja un man tā patīk");
?>
Edit:
For many characters, maybe this one can help you:
<?PHP
function myConvert($str)
{
$final=null;
$parts = preg_split("/&#[0-9]*;/i", $str);//get all text parts
preg_match_all("/&#[0-9]*;/i", $str, $delimiters );//get delimiters;
$delimiters[0][]='';//make arrays equal size
foreach($parts as $key => $value)
$final.=$value.mb_convert_encoding
($delimiters[0][$key], "UTF-8", "HTML-ENTITIES");
return $final;
}
$fh = fopen("testFile.txt", 'w') ;
fwrite($fh, myConvert("m&#257;ja un man t&#257; pat&#299;k&#299;"));
fclose($fh);
?>
The desired output is written to the text file. This code, exactly as it is (not merged into some project), does what it claims to do: it converts codes like &#257; to the characters they represent.

How do I convert Arabic letters to HTML entities?

I need to convert Arabic letters to HTML entities. Codepage: ISO-8859-1.
سك - these are Arabic letters, for example.
htmlentities("سك")
returns:
س�
How can I get the HTML entities &#1587;&#1603; from these letters?
htmlentities() can only convert characters that have named entities. See this question on how to convert arbitrary characters into numeric entities.
You're probably not targeting the correct charset. Try: htmlentities('سك', ENT_QUOTES, 'UTF-8');
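One caveat, assuming UTF-8 input: even with the UTF-8 charset, htmlentities() leaves Arabic letters unchanged, because no named entities exist for them. A minimal sketch using mbstring's mb_encode_numericentity() with a convmap limited to the Arabic block (U+0600 to U+06FF, chosen here as an illustrative range) produces the numeric entities the question asks for; the result is pure ASCII, so it is also safe on an ISO-8859-1 page:

```php
<?php
// htmlentities() only knows named entities, and Arabic letters have none,
// so with UTF-8 input it returns them unchanged. mb_encode_numericentity()
// with a convmap limited to U+0600..U+06FF emits decimal numeric entities
// for those code points only.
$arabic = "\u{0633}\u{0643}"; // سك
$convmap = [0x0600, 0x06FF, 0, 0xFFFF];

echo htmlentities($arabic, ENT_QUOTES, 'UTF-8'), "\n";           // unchanged
echo mb_encode_numericentity($arabic, $convmap, 'UTF-8'), "\n"; // &#1587;&#1603;
```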
I'm using a function to make sure users can't post HTML code or quotation marks:
function cleartext($x1){
$x1 = str_replace('"','',$x1);
$x1 = str_replace("'",'',$x1);
$x1 = htmlentities($x1, ENT_QUOTES, 'UTF-8');
return $x1;
}
So thanks for (ENT_QUOTES, 'UTF-8'); it helped me find what I was looking for.

Problem in UTF Encoding in PHP

I use the following lines of code:
$revTerm = "". strrev($limitAry["term"]);
$revTerm = utf8_encode($revTerm);
$revTerm contains Norwegian characters such as ø, æ and å. Before reversing, they are shown correctly. I need to reverse them before displaying, so I use the first line.
When I display them this way, I get a bad XML format error (the XML is used to fill a grid).
When I try to use the second line, I don't get an error but the characters are not shown correctly. Could there be any other way to solve that?
If it may help, I use jqGrid to fill those data in.
strrev, like most PHP string functions, is not safe for multi-byte encodings.
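strrev() reverses bytes, so the two UTF-8 bytes of a character like ø come out swapped and form invalid UTF-8, which is likely what breaks the XML. A minimal sketch of a character-wise reversal, assuming the input is valid UTF-8; utf8_strrev is a hypothetical name, and on PHP 7.4+ mb_str_split() could replace the preg_split() call:

```php
<?php
// Reverse a UTF-8 string by code points rather than bytes.
// preg_split() with an empty pattern and the /u flag splits between
// code points; PREG_SPLIT_NO_EMPTY drops the empty edge pieces.
function utf8_strrev(string $s): string
{
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    return implode('', array_reverse($chars));
}

echo utf8_strrev('blåbær'), "\n";
```

This reverses code points, not grapheme clusters, so a decomposed character (a base letter followed by a combining mark) would still be split apart.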
Try this example:
$test = 'А роза упала на лапу Азора ウィキ';
$test = iconv('utf-8', 'utf-16le', $test);
$test = strrev($test);
// キィウ арозА упал ан алапу азор А
echo iconv('utf-16be', 'utf-8', $test);
Source (in Russian):
http://bolknote.ru/2012/04/02/~3625#56
Try this:
$revTerm = utf8_decode($limitAry["term"]);
$revTerm = strrev($revTerm);
$revTerm = utf8_encode($revTerm);
To use strrev, you have to decode your string to a single-byte encoding first.
