Convert UTF8 Text and Numbers to UTF16-BE for Clickatell - php

I use an online sms service (Clickatell) for a web app i use. My main language is Greek, so i need to convert in my php file the sms text to UTF-16BE before i send it. For example i need to convert the text
"Το ραντεβού σας έχει μεταφερθεί στις 12-12-2016 και ώρα 18:25"
to
03a403bf002003c103b103bd03c403b503b203bf03cd002003c303b103c2002003ad03c703b503b9002003bc03b503c403b103c603b503c103b803b503af002003c303c403b903c2002000310032002d00310032002d0032003000310036002003ba03b103b9002003ce03c103b1002000310038003a00320035
I need to conver everything inluding spaces, symbols and numbers.
I have found a few php commands but they are converting only the text.
$text=strtoupper(str_replace(array('"', '\u'), array('',''), json_encode('Το ραντεβού σας έχει μεταφερθεί στις 12-12-2016 και ώρα 18:25')));
When using the above code I get the below result:
03A403BF 03C103B103BD03C403B503B203BF03CD 03C303B103C2 03AD03C703B503B9 03BC03B503C403B103C603B503C103B803B503AF 03C303C403B903C2 12-12-2016 03BA03B103B9 03CE03C103B1 18:25
If you notice the date and time as well as all the spaces are not in unicode.
Can anyone tell me how to get my whole phrase in unicode? How can i do this with php?
Thank you in advance

I'm not sure what you mean by "they are converting only the text", but if you're looking to convert a UTF-8 string to UTF-16BE, then you can try:
iconv('UTF-8', 'UTF-16BE', $string);
or..
mb_convert_encoding($string, 'UTF-16BE', 'UTF-8');
Edit:
Since you've shared some code now, your technique for conversion is not sound, unless you really want it represented like you have it. Your result is basically a hex representation of the individual bytes, but not the bytes themselves.
Edit 2:
If you genuinely need it in the format specified, the following will do it for you:
$string = iconv('UTF-8', 'UTF-16BE', $string); // .. or mb_convert_encoding
$converted = '';
for ($i = 0; $i < strlen($string); $i++) {
$converted .= sprintf('%02X', ord($string[$i]));
}

Related

replace multibyte utf8 character in php

I am trying to preg_replace the multibytecharacter for euro in UTF (shown as ⬠in my html) to a "$" and the * for an "#"
$orig = "2 **** reviews ⬠19,99 price";
$orig = mb_ereg_replace(mb_convert_encoding('€', 'UTF-8', 'HTML-ENTITIES'), "$", $orig);
$orig = preg_replace("/[\$\;\?\!\{\}\(\)\[\]\/\*\>\<]/", "#", $orig);
$a = htmlentities($orig);
$b = html_entity_decode($a);
The "*" are being replaced but not the "â¬" .......
Also tried to replace it with
$orig = preg_replace("/[\xe2\x82\xac]/", "$", $orig);
Doesn't convert either....
Another plan which didnt work:
$orig= mb_ereg_replace(mb_convert_encoding('€', 'UTF-8', 'HTML-ENTITIES'), "$", $orig);
Brrr someone knows how to get rid of this utf8 euro character:
echo html_entity_decode('€');
(driving me nuts)
This could be caused by two reasons:
The actual source text is UTF8 encoded, but your PHP code not.
You can solve this by just using this line and save your file UTF8 encoded (try using notepad++).
str_replace('€', '$', $source);
The source text is corrupted: multibyte characters are converted to latin1 (wrong database charset?). You can try to convert them back to latin1:
str_replace('€', '$', utf8_decode($source))
Pasting my comment here as an answer so you can mark it!
Wouldn't
str_replace(html_entity_decode('€'), '$', $source)
work?
In your $orig string you do not have euro sign.
When I run this php file:
<?php
$orig = "â¬";
for($i=0; $i<strlen($orig); $i++)
echo '0x' . dechex(ord($orig{$i})) . ' ';
?>
If saved as utf-8 I get: 0xc3 0xa2 0xc2 0xac
If saved as latin-1 I get: 0xe2 0xac
In any case it is not € sign which is:0xE2 0x82 0xAC or unicode \u20AC ( http://www.fileformat.info/info/unicode/char/20ac/index.htm ).
0x82 is missing!!!!!
Run this program above, see what do you get and use this hex values to get rid of â¬.
For real € sign this works:
<?php
$orig = html_entity_decode('€', ENT_COMPAT, 'UTF-8');
$dest = preg_replace('~\x{20ac}~u', '$', $orig);
echo "($orig) ($dest)";
?>
BTW if UTF-8 file containing € is displayed as latin-1 you should get:
€ and not ⬠as in your example.
So in fact, you have problems with encoding and conversion between encodings. If you try to save € in latin1 middle character will be lost (for example my Komodo will alert me and then replace ‚ with ?). In other words, you somehow damaged your € sign - and then you tried to replace it as it was complete. :D

How to convert Arabic text to Hex using PHP

Kindly I need to convert the Arabic text to and from Hexadecimal like the following example Using PHP
مرحبا
06450631062D06280627
Regards,
Eco
If you just need to have the Arabic text written in the HTML document begin generated, I think the simplest way is to convert the sequence to character references, turning e.g. 0645 to م. This could be done as follows:
<?php
$str = '06450631062D06280627';
for($i = 0; $i < strlen($str)/4; $i++) {
echo "&#x", substr($str, 4*$i, 4), ";";
}
?>
I get a unicode string with following code.
$str = "Some Hexa String";
$replacedString = preg_replace("/\\\\u([0-9abcdef]{4})/", "&#x$1;", $str);
$unicodeString = mb_convert_encoding($replacedString, 'UTF-8', 'HTML-ENTITIES');
bin2hex($str); // Bin to Hex
pack("H*", $hexStr); // Hex to Bin

php non latin to hex function

I have website that's in win-1251 encoding and it needs to stay that way. But I also need to be able to echo few links that contain non latin, non cyrillic characters like šžāņūī...
I need a function that convert this
"māja un man tā patīk"
to
"māja un man tā patīk"
and that does not touch html, so if there is <b> it needs to stay as <b>, not > or <
And please no advices about the encoding and how wrong that is.
$str = "<b>Obāchan</b> おばあちゃん";
$str = preg_replace_callback('/./u', function ($matches) {
$chr = $matches[0];
if (strlen($chr) > 1) {
$chr = mb_convert_encoding($chr, 'HTML-ENTITIES', 'UTF-8');
}
return $chr;
}, $str);
This expects the original $str to be UTF-8 encoded, i.e. your PHP file should be saved in UTF-8. It encodes all non-ASCII compatible code points to HTML entities. Since all HTML special characters are ASCII characters, they remain untouched. The resulting string is pure ASCII. Since the lower Win-1251 code points are ASCII compatible, the resulting string is also a valid Win-1251 string. The above $str converts to:
<b>Obāchan</b> おばあちゃん
The main things you probably don't want to encode are <, > and &. Those are really the only special characters. So how about encoding everything first, and then just decode <, > and & I feel you should be fine.
This is untested:
$output =
htmlspecialchars_decode(
htmlentities($input, ENT_NOQUOTES, 'CP-1251')
);
let me know
What Evert suggest looks logical to me too! If you insist this is a way to do it if there are only two letters that bother you. For more letters the scrit will not be as effective and needs to change.
<?PHP
function myConvert($str)
{
$chars['ā']='ā';
$chars['ī']='ī';
foreach ($chars as $key => $value)
$output = str_replace($key, $value, $str);
echo $str;
}
myConvert("māja un man tā patīk");
?>
==================edited==============
For many characters maybe this one can help you:
<?PHP
function myConvert($str)
{
$final=null;
$parts = preg_split("/&#[0-9]*;/i", $str);//get all text parts
preg_match_all("/&#[0-9]*;/i", $str, $delimiters );//get delimiters;
$delimiters[0][]='';//make arrays equal size
foreach($parts as $key => $value)
$final.=$value.mb_convert_encoding
($delimiters[0][$key], "UTF-8", "HTML-ENTITIES");
return $final;
}
$fh = fopen("testFile.txt", 'w') ;
fwrite($fh, myConvert("māja un man tā patīkī"));
fclose($fh);
?>
The desired output is written in the text file. This code, exactly as it is -not merged in some project- does what it claims to do. Converts codes like ā to the analogous character they present.

How to convert some multibyte characters into its numeric html entity using PHP?

Test string:
$s = "convert this: ";
$s .= "–, —, †, ‡, •, ≤, ≥, μ, ₪, ©, ® y ™, ⅓, ⅔, ⅛, ⅜, ⅝, ⅞, ™, Ω, ℮, ∑, ⌂, ♀, ♂ ";
$s .= "but, not convert ordinary characters to entities";
$encoded = mb_convert_encoding($s, 'HTML-ENTITIES', 'UTF-8');
asssuming your input string is UTF-8, this should encode most everything into numeric entities.
Well htmlentities doesn't work correctly. Fortunately someone has posted code on the php website that seems to do the translation of multibyte characters properly
I did work on decoding ascii into html coded text (&#xxxx). https://github.com/hellonearthis/ascii2web

Char Encoding: Changing file from MacRoman to UTF-8 breaks string

I am working on a CakePHP site saved in MacRoman char encoding. I want to change all the files to UTF-8 for internationalisation. For all the other files in the site this works fine. However, in the core.php file there is a security salt, which is a string with special characters ("!:* etc.). When I save this file as UTF-8 the salt gets corrupted. I can roll this back with git, but it's an annoyance.
Does anyone know how I can convert the string from MacRoman to UTF-8?
You don't give enough information to confirm this, but I guess the salt is used in its binary form. In that case, changing the encoding of the file will corrupt the salt if this binary stream is changed, even if the characters are correctly converted.
Since the first 128 characters are similar in UTF-8 and Mac OS Roman, you don't have to worry if the salt is written using only these characters.
Let's say the salt is somewhere:
$salt = "a!c‡Œ";
You could write instead:
$salt = "a!c\xE0\xCE";
You could map all to their hexadecimal representation, as it might be easier to automate:
$salt = "\x61\x21\x63\xE0\xCE";
See the table here.
The following snippet can automate this conversion:
$res = "";
foreach (str_split($salt) as $c) {
$res .= "\\x".dechex(ord($c));
}
echo $res;
Thanks for the input, pointed me in the right direction. The solution is:
$salt = iconv('UTF-8', 'macintosh', $string);
For those who do not have access to iconv here is a function in PHP:
http://sebastienguillon.com/test/jeux-de-caracteres/MacRoman_to_utf8.txt.php
It will properly convert MacRoman text to UTF-8 and you can even decide how you want to break ligatures.
<?php
function MacRoman_to_utf8($str, $break_ligatures='none')
{
// $break_ligatures : 'none' | 'fifl' | 'all'
// 'none' : don't break any MacRoman ligatures, transform them into their utf-8 counterparts
// 'fifl' : break only fi ("\xDE" => "fi") and fl ("\xDF"=>"fl")
// 'all' : break fi, fl and also AE ("\xAE"=>"AE"), ae ("\xBE"=>"ae"), OE ("\xCE"=>"OE") and oe ("\xCF"=>"oe")
if($break_ligatures == 'fifl')
{
$str = strtr($str, array("\xDE"=>"fi", "\xDF"=>"fl"));
}
if($break_ligatures == 'all')
{
$str = strtr($str, array("\xDE"=>"fi", "\xDF"=>"fl", "\xAE"=>"AE", "\xBE"=>"ae", "\xCE"=>"OE", "\xCF"=>"oe"));
}
$str = strtr($str, array("\x7F"=>"\x20", "\x80"=>"\xC3\x84", "\x81"=>"\xC3\x85",
"\x82"=>"\xC3\x87", "\x83"=>"\xC3\x89", "\x84"=>"\xC3\x91", "\x85"=>"\xC3\x96",
"\x86"=>"\xC3\x9C", "\x87"=>"\xC3\xA1", "\x88"=>"\xC3\xA0", "\x89"=>"\xC3\xA2",
"\x8A"=>"\xC3\xA4", "\x8B"=>"\xC3\xA3", "\x8C"=>"\xC3\xA5", "\x8D"=>"\xC3\xA7",
"\x8E"=>"\xC3\xA9", "\x8F"=>"\xC3\xA8", "\x90"=>"\xC3\xAA", "\x91"=>"\xC3\xAB",
"\x92"=>"\xC3\xAD", "\x93"=>"\xC3\xAC", "\x94"=>"\xC3\xAE", "\x95"=>"\xC3\xAF",
"\x96"=>"\xC3\xB1", "\x97"=>"\xC3\xB3", "\x98"=>"\xC3\xB2", "\x99"=>"\xC3\xB4",
"\x9A"=>"\xC3\xB6", "\x9B"=>"\xC3\xB5", "\x9C"=>"\xC3\xBA", "\x9D"=>"\xC3\xB9",
"\x9E"=>"\xC3\xBB", "\x9F"=>"\xC3\xBC", "\xA0"=>"\xE2\x80\xA0", "\xA1"=>"\xC2\xB0",
"\xA2"=>"\xC2\xA2", "\xA3"=>"\xC2\xA3", "\xA4"=>"\xC2\xA7", "\xA5"=>"\xE2\x80\xA2",
"\xA6"=>"\xC2\xB6", "\xA7"=>"\xC3\x9F", "\xA8"=>"\xC2\xAE", "\xA9"=>"\xC2\xA9",
"\xAA"=>"\xE2\x84\xA2", "\xAB"=>"\xC2\xB4", "\xAC"=>"\xC2\xA8", "\xAD"=>"\xE2\x89\xA0",
"\xAE"=>"\xC3\x86", "\xAF"=>"\xC3\x98", "\xB0"=>"\xE2\x88\x9E", "\xB1"=>"\xC2\xB1",
"\xB2"=>"\xE2\x89\xA4", "\xB3"=>"\xE2\x89\xA5", "\xB4"=>"\xC2\xA5", "\xB5"=>"\xC2\xB5",
"\xB6"=>"\xE2\x88\x82", "\xB7"=>"\xE2\x88\x91", "\xB8"=>"\xE2\x88\x8F", "\xB9"=>"\xCF\x80",
"\xBA"=>"\xE2\x88\xAB", "\xBB"=>"\xC2\xAA", "\xBC"=>"\xC2\xBA", "\xBD"=>"\xCE\xA9",
"\xBE"=>"\xC3\xA6", "\xBF"=>"\xC3\xB8", "\xC0"=>"\xC2\xBF", "\xC1"=>"\xC2\xA1",
"\xC2"=>"\xC2\xAC", "\xC3"=>"\xE2\x88\x9A", "\xC4"=>"\xC6\x92", "\xC5"=>"\xE2\x89\x88",
"\xC6"=>"\xE2\x88\x86", "\xC7"=>"\xC2\xAB", "\xC8"=>"\xC2\xBB", "\xC9"=>"\xE2\x80\xA6",
"\xCA"=>"\xC2\xA0", "\xCB"=>"\xC3\x80", "\xCC"=>"\xC3\x83", "\xCD"=>"\xC3\x95",
"\xCE"=>"\xC5\x92", "\xCF"=>"\xC5\x93", "\xD0"=>"\xE2\x80\x93", "\xD1"=>"\xE2\x80\x94",
"\xD2"=>"\xE2\x80\x9C", "\xD3"=>"\xE2\x80\x9D", "\xD4"=>"\xE2\x80\x98", "\xD5"=>"\xE2\x80\x99",
"\xD6"=>"\xC3\xB7", "\xD7"=>"\xE2\x97\x8A", "\xD8"=>"\xC3\xBF", "\xD9"=>"\xC5\xB8",
"\xDA"=>"\xE2\x81\x84", "\xDB"=>"\xE2\x82\xAC", "\xDC"=>"\xE2\x80\xB9", "\xDD"=>"\xE2\x80\xBA",
"\xDE"=>"\xEF\xAC\x81", "\xDF"=>"\xEF\xAC\x82", "\xE0"=>"\xE2\x80\xA1", "\xE1"=>"\xC2\xB7",
"\xE2"=>"\xE2\x80\x9A", "\xE3"=>"\xE2\x80\x9E", "\xE4"=>"\xE2\x80\xB0", "\xE5"=>"\xC3\x82",
"\xE6"=>"\xC3\x8A", "\xE7"=>"\xC3\x81", "\xE8"=>"\xC3\x8B", "\xE9"=>"\xC3\x88",
"\xEA"=>"\xC3\x8D", "\xEB"=>"\xC3\x8E", "\xEC"=>"\xC3\x8F", "\xED"=>"\xC3\x8C",
"\xEE"=>"\xC3\x93", "\xEF"=>"\xC3\x94", "\xF0"=>"\xEF\xA3\xBF", "\xF1"=>"\xC3\x92",
"\xF2"=>"\xC3\x9A", "\xF3"=>"\xC3\x9B", "\xF4"=>"\xC3\x99", "\xF5"=>"\xC4\xB1",
"\xF6"=>"\xCB\x86", "\xF7"=>"\xCB\x9C", "\xF8"=>"\xC2\xAF", "\xF9"=>"\xCB\x98",
"\xFA"=>"\xCB\x99", "\xFB"=>"\xCB\x9A", "\xFC"=>"\xC2\xB8", "\xFD"=>"\xCB\x9D",
"\xFE"=>"\xCB\x9B", "\xFF"=>"\xCB\x87", "\x00"=>"\x20", "\x01"=>"\x20",
"\x02"=>"\x20", "\x03"=>"\x20", "\x04"=>"\x20", "\x05"=>"\x20",
"\x06"=>"\x20", "\x07"=>"\x20", "\x08"=>"\x20", "\x0B"=>"\x20",
"\x0C"=>"\x20", "\x0E"=>"\x20", "\x0F"=>"\x20", "\x10"=>"\x20",
"\x11"=>"\x20", "\x12"=>"\x20", "\x13"=>"\x20", "\x14"=>"\x20",
"\x15"=>"\x20", "\x16"=>"\x20", "\x17"=>"\x20", "\x18"=>"\x20",
"\x19"=>"\x20", "\x1A"=>"\x20", "\x1B"=>"\x20", "\x1C"=>"\x20",
"\1D"=>"\x20", "\x1E"=>"\x20", "\x1F"=>"\x20", "\xF0"=>""));
return $str;
}
?>
Have you tried mb-convert-encoding ?
Think it would be:
$str = mb_convert_encoding($str, "macintosh", "UTF-8");
Just curious, have you tried copying the salt, saving as UTF-8 and then pasting the salt back in place and saving again?

Categories