I use json_encode() to encode an array, and I get a string like this:
{"name":"\u00fe\u00fd\u00f0\u00f6\u00e7"}
Now I need to convert this to ISO-8859-9. I tried the following but it fails:
header('Content-type: application/json; charset=ISO-8859-9');
$json = json_encode($response);
$json = utf8_decode($json);
$json = mb_convert_encoding($json, "ISO-8859-9", "auto");
echo $json;
It doesn't seem to work. What am I missing?
Thank you for your time.
You can do:
$json = json_encode($response);
header('Content-type: application/json; charset=ISO-8859-9');
echo mb_convert_encoding($json, "ISO-8859-9", "UTF-8");
This assumes that the strings in $response are UTF-8. But I would strongly suggest that you just use UTF-8 all the way through.
Edit: Sorry, I just realised that won't work, since json_encode escapes non-ASCII code points as JavaScript-style \uXXXX escape sequences. You'll have to convert these back to UTF-8 byte sequences first. I don't think there is any built-in functionality for that, but you can use a slightly modified variation of this library to get there. Try the following:
function unicode_hex_to_utf8($hexcode) {
// $hexcode[1] holds the four hex digits captured after "\u"; together they form ONE code point
$src = hexdec($hexcode[1]);
if ($src <= 0x007f) {
return chr($src);
} elseif ($src <= 0x07ff) {
return chr(0xc0 | ($src >> 6)) . chr(0x80 | ($src & 0x3f));
} elseif ($src == 0xFEFF) {
return ''; // zap the BOM
} elseif ($src >= 0xD800 && $src <= 0xDFFF) {
return ''; // surrogate half (part of a character above U+FFFF): drop it
}
// remaining code points U+0800..U+FFFF need three bytes
return chr(0xe0 | ($src >> 12)) . chr(0x80 | (($src >> 6) & 0x3f)) . chr(0x80 | ($src & 0x3f));
}
print mb_convert_encoding(
preg_replace_callback(
"~\\\\u([1234567890abcdef]{4})~", 'unicode_hex_to_utf8',
json_encode($response)),
"ISO-8859-9", "UTF-8");
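On PHP 5.4 or later there may be a simpler route: json_encode() accepts the JSON_UNESCAPED_UNICODE flag, which leaves non-ASCII characters as raw UTF-8 instead of \uXXXX escapes, so the result can go straight into mb_convert_encoding(). Below is a minimal sketch, assuming the input strings are valid UTF-8 and mbstring is loaded; json_encode_iso8859_9() is a hypothetical helper name. Note that þ, ý and ð (U+00FE, U+00FD, U+00F0 from the example) have no slot in ISO-8859-9 and would come out as mbstring's substitute character ('?' by default), so the example uses the Turkish letters ş, ı and ğ, which occupy those same byte values (0xFE, 0xFD, 0xF0) in ISO-8859-9.

```php
<?php
// Sketch (hypothetical helper): encode without \uXXXX escapes, then transcode.
// JSON_UNESCAPED_UNICODE requires PHP >= 5.4; mb_convert_encoding() needs mbstring.
function json_encode_iso8859_9(array $data): string
{
    // Non-ASCII characters stay as raw UTF-8 bytes instead of \uXXXX escapes.
    $json = json_encode($data, JSON_UNESCAPED_UNICODE);
    // Characters missing from ISO-8859-9 become mbstring's substitute character.
    return mb_convert_encoding($json, 'ISO-8859-9', 'UTF-8');
}

$response = ['name' => 'şığöç'];
echo bin2hex(json_encode_iso8859_9($response)), PHP_EOL; // prints 7b226e616d65223a22fefdf0f6e7227d
```

In a real response you would send header('Content-type: application/json; charset=ISO-8859-9') and echo the converted string itself rather than its hex dump.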
As the PHP documentation states, the JSON encoding/decoding functions only work with UTF-8 encoded data, so trying to work around this can cause data problems: you may get unexpected or wrong results.
Related
I'm writing an attribute to an HDF5 file using UTF-8 encoding. As an example, I've written "äöüß" to the attribute "notes" in the file.
I'm now trying to parse the output of h5ls (or h5dump) to extract this data back. Either tool gives me an output like this:
ATTRIBUTE "notes" {
DATATYPE H5T_STRING {
STRSIZE 8;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): "\37777777703\37777777644\37777777703\37777777666\37777777703\37777777674\37777777703\37777777637"
}
}
I'm aware that, e.g., \37777777703\37777777644 somehow encodes ä as 0xC3 0xA4; however, I have a really hard time figuring out how this encoding works.
What's the magic formula behind this and how can I properly decode it back into äöüß?
The bytes are printed as octal (base 8) numbers, and masking each value with 0xFF (255) recovers the original byte. I've decoded them in the PHP backend using:
// single quotes: inside double quotes PHP would read \377 as an octal escape sequence
$line = 'This is the text including some UTF-8 bytes \37777777703\37777777644\37777777703\37777777666\37777777703\37777777674\37777777703\37777777637';
// extract UTF-8 Bytes
preg_match_all("/\\\\37777777(\\d{3})/", $line, $octbytes);
// parse extracted Bytes
for ($m = 0; $m < count($octbytes[1]); ) {
$B = octdec($octbytes[1][$m]);
// UTF-8 may span over 2 to 4 Bytes
if (($B & 0xF8) == 0xF0) { $numBytes = 4; }
else if (($B & 0xF0) == 0xE0) { $numBytes = 3; }
else if (($B & 0xE0) == 0xC0) { $numBytes = 2; }
else { $numBytes = 1; }
$hxstr = "";
$replaceStr = "";
for ($j = 0; $j < $numBytes; $j++) {
$match = $octbytes[1][$m+$j];
$dec = octdec($match) & 255;
$hx = strtoupper(dechex($dec));
$hxstr = $hxstr . $hx;
$replaceStr = $replaceStr . "\\37777777" . $match;
}
// pack extracted bytes into one hex string
$utfChar = pack("H*", $hxstr); // < this will be interpreted correctly
// replace Bytes in the input with the parsed chars
$line = str_replace($replaceStr, $utfChar, $line);
// go to next byte
$m+=$numBytes;
}
echo "The parsed line: $line";
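The numbers are consistent with each byte being read as a signed char, sign-extended to 32 bits and printed in octal: 0xC3 becomes 0xFFFFFFC3, which is 37777777703 in octal, and 0xFFFFFFC3 & 0xFF gives back 0xC3. Since the recovered bytes are already valid UTF-8, a shorter sketch can replace each escape with its byte directly, without counting sequence lengths. decode_h5_octal_escapes() is a hypothetical name, and it assumes every escape has the exact \37777777ddd form shown in the dump.

```php
<?php
// Sketch: turn h5dump-style escapes such as \37777777703 back into raw bytes.
// Assumption: each escape is one byte sign-extended to 32 bits, so only the
// last three octal digits (masked with 0xFF) carry the byte value.
function decode_h5_octal_escapes(string $text): string
{
    return preg_replace_callback(
        '/\\\\37777777([0-7]{3})/', // a literal backslash, then the fixed prefix
        function (array $m): string {
            return chr(octdec($m[1]) & 0xFF);
        },
        $text
    );
}

// Single quotes keep the backslashes literal.
$dumped = '\37777777703\37777777644\37777777703\37777777666\37777777703\37777777674\37777777703\37777777637';
echo decode_h5_octal_escapes($dumped), PHP_EOL; // prints äöüß on a UTF-8 terminal
```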
I want to convert Hindi/Devanagari text, for example "आए थे पर्यटक, खुद ही बह ग", into Unicode escape sequences like "\u0906\u090f \u0925\u0947 \u092a\u0930\u094d\u092f\u091f\u0915, \u0916\u0941\u0926 \u0939\u0940 \u092c\u0939 \u0917".
I am developing a Hindi website, and I have seen that most sites use escaped Unicode sequences inside their meta tags and schema.org markup.
So I decided to give it a try.
I can see Hindi (a.k.a. Devanagari) letters with their escaped Unicode sequences at http://www.endmemo.com/unicode/devanagari.php
and I have also seen a tool that does the same: https://www.mobilefish.com/services/unicode_escape_sequence_converter/unicode_escape_sequence_converter.php
but I cannot find any way to convert these Devanagari letters into escaped Unicode sequences in PHP.
I have tried a few things, but nothing works, and Google isn't much help because all the articles and forums talk about decoding Unicode escape sequences, not encoding them.
header( 'Content-Type: text/html; charset=utf-8' );
function encode2($str) {
$str = mb_convert_encoding($str , 'UTF-32', 'UTF-8');
$t = unpack("N*", $str);
$t = array_map(function($n) { return "&#$n;"; }, $t);
return implode("", $t);
}
$message = "आए थे पर्यटक, खुद ही बह गए";
$message_convert = encode2($message);
echo $message_convert;
echo "fdfdfdfdfdfdfd<br/>";
echo mb_convert_encoding($message, "HTML-ENTITIES", "auto");
I want this "आए थे पर्यटक, खुद ही बह ग" to "\u0906\u090f \u0925\u0947 \u092a\u0930\u094d\u092f\u091f\u0915, \u0916\u0941\u0926 \u0939\u0940 \u092c\u0939 \u0917"
Please help!
As suggested by @paskl, I tried:
$message = "आए थे पर्यटक, खुद ही बह गए";
$unicode = json_encode($message);
echo $unicode;
And I got (including the surrounding double quotes): "\u0906\u090f \u0925\u0947 \u092a\u0930\u094d\u092f\u091f\u0915, \u0916\u0941\u0926 \u0939\u0940 \u092c\u0939 \u0917\u090f"
I hope this helps others who want to convert Devanagari/Hindi letters into escaped Unicode sequences with PHP on their website.
Thanks to @paskl.
Unless you're looking to transmit this data as JSON, I wouldn't really recommend json_encode(), as it wraps your output in literal double quotes that you'd need to strip back off. However, there's no easy way to encode Unicode escapes in PHP in a way that is memory-efficient.
That said, here is the not-easy code:
// PHP < 7.2
// https://github.com/symfony/polyfill-mbstring/blob/master/Mbstring.php#L708-L730
if( ! function_exists("mb_ord") ) {
function mb_ord($s) {
if (1 === \strlen($s)) {
return \ord($s);
}
$code = ($s = unpack('C*', substr($s, 0, 4))) ? $s[1] : 0;
if (0xF0 <= $code) {
return (($code - 0xF0) << 18) + (($s[2] - 0x80) << 12) + (($s[3] - 0x80) << 6) + $s[4] - 0x80;
}
if (0xE0 <= $code) {
return (($code - 0xE0) << 12) + (($s[2] - 0x80) << 6) + $s[3] - 0x80;
}
if (0xC0 <= $code) {
return (($code - 0xC0) << 6) + $s[2] - 0x80;
}
return $code;
}
}
function ord2seqlen($ord) {
if($ord < 128){
return 1;
} else if($ord < 224) {
return 2;
} else if($ord < 240) {
return 3;
} else if($ord < 248) {
return 4;
} else {
throw new \Exception("No support for 5 or 6 byte sequences.");
}
}
function utf8_seq_iter($input) {
for($i=0,$c=strlen($input); $i<$c; ) {
$bytes = ord2seqlen(ord($input[$i]));
yield substr($input, $i, $bytes);
$i += $bytes;
}
}
function escape_codepoint($codepoint, $skip_low=true) {
$ord = mb_ord($codepoint);
if( $skip_low && $ord < 128 ) {
return $codepoint;
} else {
return sprintf("\\u%04x", $ord);
}
}
$input = "आए थे पर्यटक, खुद ही बह गए";
$output = '';
foreach( utf8_seq_iter($input) as $codepoint ) {
$output .= escape_codepoint($codepoint);
}
var_dump($output);
Output:
string(121) "\u0906\u090f \u0925\u0947 \u092a\u0930\u094d\u092f\u091f\u0915, \u0916\u0941\u0926 \u0939\u0940 \u092c\u0939 \u0917\u090f"
Edit: I've turned this into a small Composer package, available here:
https://packagist.org/packages/wrossmann/utf8_escape
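One limitation of escape_codepoint() above is that a code point beyond U+FFFF (an emoji, say) is printed with more than four hex digits (e.g. \u1f60c), which JavaScript and JSON parsers will not read as a single character. A minimal alternative sketch goes through UTF-16, so such characters come out as surrogate pairs, the same form json_encode() produces, but without the surrounding quotes or escaping of '/' and '"'. utf8_to_unicode_escapes() is a hypothetical name; the sketch assumes valid UTF-8 input and the mbstring extension, and, like the code above, leaves ASCII characters unescaped.

```php
<?php
// Sketch: escape every non-ASCII character as \uXXXX, emitting UTF-16
// surrogate pairs for code points above U+FFFF (the form JSON uses).
function utf8_to_unicode_escapes(string $input): string
{
    $out = '';
    // UTF-16BE yields one 16-bit big-endian unit per BMP character,
    // or two units (a surrogate pair) per supplementary character.
    $units = unpack('n*', mb_convert_encoding($input, 'UTF-16BE', 'UTF-8'));
    foreach ($units as $unit) {
        // ASCII stays as-is; everything else becomes a lowercase \uXXXX escape.
        $out .= $unit < 0x80 ? chr($unit) : sprintf('\u%04x', $unit);
    }
    return $out;
}

// Same string as the var_dump() output above.
echo utf8_to_unicode_escapes("आए थे पर्यटक, खुद ही बह गए"), PHP_EOL;
```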
iD;English [en];Chinese [zh];German [de];Hindi [hi];Hindi (TOGO) [hi_TG];Japanese [ja]
Source[local].AlarmGroup[AlarmText_02].ID[1310:90];Unwinder: Accu position difference too big. Check for laminate break;拆卷器: 蓄存器位置差过大。 检查复合片材是否中断;Laminatspeicher: Zu grosse Positionsänderung - Auf Laminatriss prüfen;290;;巻出装置: アキュムレーター位置の差が大きすぎます。 ラミネートが壊れていないか確認してください
Source[local].AlarmGroup[AlarmText_02].ID[1311:91];Unwinder: Accu level too small for auto splice;拆卷器: 自动拼接的蓄存器级别过小;Abwickler: Akku Füllstand zu klein für Autospleiss;291;;巻出装置: 自動紙継を行うにはアキュムレーターのレベルが小さすぎます
I am trying to read the CSV content shown above.
The CSV file is saved as "Unicode Text". It contains Chinese, German and Japanese text.
I am unable to read the foreign-language text in the correct format.
CSV reader code:
header('Content-Type: text/html; charset=utf-8');
$row = 1;
$up_file = 'text_SHOT_S.csv';
setlocale(LC_ALL, 'en_US.UTF-8');
if (($handle = fopen($up_file, "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
$num = count($data);
$row++;
for ($c=0; $c < $num; $c++) {
echo $data[$c].'<br>';
}
}
fclose($handle);}
Output of the code above:
iD戼㹲䔀渀最氀椀猀栀 嬀攀渀崀㰀牢>Chinese [zh]戼㹲䜀攀爀洀愀渀 嬀搀攀崀㰀牢>Hindi [hi]戼㹲䠀椀渀搀椀 ⠀吀伀䜀伀⤀ 嬀栀椀开吀䜀崀㰀牢>Japanese [ja] 戼㹲匀漀甀爀挀攀嬀氀漀挀愀氀崀⸀䄀氀愀爀洀䜀爀漀甀瀀嬀䄀氀愀爀洀吀攀砀琀开 ㈀崀⸀䤀䐀嬀㌀ 㨀㤀 崀㰀牢>Unwinder: Accu position difference too big. Check for laminate break戼㹲였睢桓ᩖ쐀墄桛䵖湏읝➏ə‰쀀൧࡙䝔偲⽧♦ⵔ굎㱥牢>Laminatspeicher: Zu grosse Positionsänderung - Auf Laminatriss prüfen戼㹲㈀㤀 㰀牢>戼㹲ff艹앑溈㩿 ꈀ괰ﰰ뼰ﰰ䴰湏湿䱝✰䵙夰丰縰夰Ȱ‰�촰ﰰ젰䰰쨰豘昰䐰樰䐰䬰먰赸垊昰估怰唰䐰ര㰀牢>Source[local].AlarmGroup[AlarmText_02].ID[1311:91]戼㹲唀渀眀椀渀搀攀爀㨀 䄀挀挀甀 氀攀瘀攀氀 琀漀漀 猀洀愀氀氀 昀漀爀 愀甀琀漀 猀瀀氀椀挀攀㰀牢>拆卷器: 自动拼接的蓄存器级别过小戼㹲䄀戀眀椀挀欀氀攀爀㨀 䄀欀欀甀 䘀ﰀ氀氀猀琀愀渀搀 稀甀 欀氀攀椀渀 昀ﰀ爀 䄀甀琀漀猀瀀氀攀椀猀猀㰀牢>291戼㹲㰀牢>巻出装置: 自動紙継を行うにはアキュムレーターのレベルが小さすぎます 戼㹲㰀牢
I either get garbage characters, or most of the content is converted to Chinese.
I also tried header('Content-Type: text/html; charset=iso-8859-1') and setlocale(LC_CTYPE, 'zh_CN.UTF-8', 'zh_ZH.big5');
I want the output to match the CSV content.
Thanks in advance.
To read the CSV content, I used PHPExcel and converted the UTF-16 file to UTF-8; after that, it reads the Chinese content properly.
Please refer to the link below for converting a UTF-16 file to UTF-8.
How to Convert an UTF-16 File to an UTF-8 file using PHP
To convert a file simply call the convert_file_to_utf8() function and pass to it the file path of the file you wish to convert. The function then uses the PHP function file_get_contents() to pack the input file’s contents into a string variable which is then passed to the main converter function which converts the string from UTF-16 to UTF-8 encoding if necessary. Finally, it uses file_put_contents() to stuff the resulting string back into the original file, overwriting the original file contents.
function utf16_to_utf8($str) {
if (strlen($str) < 2) {
return $str; // too short to contain a byte-order mark
}
$c0 = ord($str[0]);
$c1 = ord($str[1]);
if ($c0 == 0xFE && $c1 == 0xFF) {
$be = true;
} else if ($c0 == 0xFF && $c1 == 0xFE) {
$be = false;
} else {
return $str;
}
$str = substr($str, 2);
$len = strlen($str);
$dec = '';
// Note: surrogate pairs (code points above U+FFFF) are not combined; each half is encoded on its own.
for ($i = 0; $i < $len; $i += 2) {
$c = ($be) ? ord($str[$i]) << 8 | ord($str[$i + 1]) :
ord($str[$i + 1]) << 8 | ord($str[$i]);
if ($c <= 0x007F) {
$dec .= chr($c);
} else if ($c > 0x07FF) {
$dec .= chr(0xE0 | (($c >> 12) & 0x0F));
$dec .= chr(0x80 | (($c >> 6) & 0x3F));
$dec .= chr(0x80 | (($c >> 0) & 0x3F));
} else {
$dec .= chr(0xC0 | (($c >> 6) & 0x1F));
$dec .= chr(0x80 | (($c >> 0) & 0x3F));
}
}
return $dec;
}
function convert_file_to_utf8($csvfile) {
$utfcheck = file_get_contents($csvfile);
$utfcheck = utf16_to_utf8($utfcheck);
file_put_contents($csvfile, $utfcheck);
}
Before reading this answer, please read the comments.
Mudassir, you can check the exact charset of the file with Tortoise's file comparison tool.
Your software uses UTF-16 encoding, not UTF-8. If you can't change this, you can use mb_convert_encoding(): http://php.net/manual/en/function.mb-convert-encoding.php
http://php.net/manual/fr/mbstring.supported-encodings.php
I've tried this function with your file, and it works correctly. See the code:
header('Content-Type: text/html; charset=utf-8');
$row = 1;
$up_file = 'text_SHOT_S.csv';
setlocale(LC_ALL, 'en_US.UTF-8');
if (($handle = fopen($up_file, "r")) !== FALSE) {
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
$num = count($data);
$row++;
for ($c=0; $c < $num; $c++) {
// echo $data[$c].'<br>';
echo mb_convert_encoding($data[$c],'utf8','utf-16').'<br>';
}
}
fclose($handle);}
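Converting each field only after fgetcsv() has split the raw bytes may misalign characters: in UTF-16LE the ';' delimiter is stored as the two bytes 3B 00, so every field after the first can start with a stray 00 byte. A sketch that decodes the whole file to UTF-8 first and only then parses it could look like this. read_utf16_csv() is a hypothetical name; it assumes the file begins with a byte-order mark (Excel's "Unicode Text" export typically writes FF FE), that it fits in memory, and that mbstring is available.

```php
<?php
// Sketch: decode a UTF-16 CSV to UTF-8 *before* splitting it into fields.
function read_utf16_csv(string $path, string $delimiter = ';'): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Choose the byte order from the BOM and drop the BOM itself.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $text = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16LE');
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {
        $text = mb_convert_encoding(substr($raw, 2), 'UTF-8', 'UTF-16BE');
    } else {
        $text = $raw; // no BOM: assume the file is already UTF-8
    }
    // Parse the UTF-8 text from an in-memory stream so quoted fields still work.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $text);
    rewind($stream);
    $rows = [];
    while (($row = fgetcsv($stream, 0, $delimiter, '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($stream);
    return $rows;
}

// usage: foreach (read_utf16_csv('text_SHOT_S.csv') as $row) { echo implode(' | ', $row), '<br>'; }
```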
I was using emojione to convert emoticons, but there is a problem.
When someone uploads an emoticon from a mobile device, I get escaped Unicode like \ud83d\ude0c\ud83d\ude0c\ud83d\ude0c.
emojione doesn't convert this type of code.
Can anybody help me convert this code, or suggest another package to use?
I finally got it working:
$str = '\ud83d\ude0c\ud83d\ude0c\ud83d\ude0c';
$regex = '/\\\u([dD][89abAB][\da-fA-F]{2})\\\u([dD][c-fC-F][\da-fA-F]{2})
|\\\u([\da-fA-F]{4})/sx';
echo preg_replace_callback($regex, function($matches) {
if (isset($matches[3])) {
$cp = hexdec($matches[3]);
} else {
$lead = hexdec($matches[1]);
$trail = hexdec($matches[2]);
// http://unicode.org/faq/utf_bom.html#utf16-4
$cp = ($lead << 10) + $trail + 0x10000 - (0xD800 << 10) - 0xDC00;
}
// https://tools.ietf.org/html/rfc3629#section-3
// Characters between U+D800 and U+DFFF are not allowed in UTF-8
if ($cp > 0xD7FF && 0xE000 > $cp) {
$cp = 0xFFFD;
}
// https://github.com/php/php-src/blob/php-5.6.4/ext/standard/html.c#L471
// php_utf32_utf8(unsigned char *buf, unsigned k)
if ($cp < 0x80) {
return chr($cp);
} else if ($cp < 0xA0) {
return chr(0xC0 | $cp >> 6) . chr(0x80 | $cp & 0x3F);
}
return html_entity_decode('&#' . $cp . ';');
}, $str);
The output will be:
😌😌😌
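Since \uXXXX escapes with surrogate pairs are exactly JSON's string escape syntax, another option may be to let json_decode() do the work. This is a minimal sketch; decode_unicode_escapes() is a hypothetical name, and it assumes the input contains no raw double quotes, stray backslashes or control characters (those would need escaping first). On PHP 7 and later an unpaired surrogate makes json_decode() fail, so the function returns null in that case.

```php
<?php
// Sketch: decode \uXXXX escapes (including surrogate pairs) via PHP's JSON parser.
// Assumption: $escaped is JSON-safe text apart from the \uXXXX escapes.
function decode_unicode_escapes(string $escaped): ?string
{
    // Wrap the text in quotes so it becomes a JSON string literal.
    $decoded = json_decode('"' . $escaped . '"');
    return is_string($decoded) ? $decoded : null;
}

$str = '\ud83d\ude0c\ud83d\ude0c\ud83d\ude0c';
echo decode_unicode_escapes($str), PHP_EOL; // prints 😌😌😌
```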
I want to use the PHP function openssl_verify() to verify the signatures of different X.509 certificates.
I have everything it needs (certificate, $data, $signature, $pub_key_id) except the signature algorithm, which is stored in the certificate.
My simple question is: How can I extract signature algorithm from certificates?
How about this?
$cer = file_get_contents('certificate.cer');
$res = openssl_x509_read($cer);
openssl_x509_export($res, $out, FALSE);
$signature_algorithm = null;
if(preg_match('/^\s+Signature Algorithm:\s*(.*)\s*$/m', $out, $match)) $signature_algorithm = $match[1];
var_dump($signature_algorithm);
It produces the output:
string(21) "sha1WithRSAEncryption"
You would then have to map this to OPENSSL_ALGO_SHA1 yourself.
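The mapping step can be a simple lookup table. This sketch covers only a few common RSA signature names, which is an assumption about the certificates involved (ECDSA or RSA-PSS names would need their own entries); signature_name_to_openssl_algo() is a hypothetical name, and the constants are the OPENSSL_ALGO_* values openssl_verify() accepts.

```php
<?php
// Sketch: map OpenSSL's "Signature Algorithm" text to an OPENSSL_ALGO_* constant.
// Only common RSA names are listed; unknown names yield null.
function signature_name_to_openssl_algo(string $name): ?int
{
    $map = [
        'md5WithRSAEncryption'    => OPENSSL_ALGO_MD5,
        'sha1WithRSAEncryption'   => OPENSSL_ALGO_SHA1,
        'sha224WithRSAEncryption' => OPENSSL_ALGO_SHA224,
        'sha256WithRSAEncryption' => OPENSSL_ALGO_SHA256,
        'sha384WithRSAEncryption' => OPENSSL_ALGO_SHA384,
        'sha512WithRSAEncryption' => OPENSSL_ALGO_SHA512,
    ];
    // trim() guards against trailing whitespace captured by the regex above.
    return $map[trim($name)] ?? null;
}

// usage: $ok = openssl_verify($data, $signature, $pub_key_id, signature_name_to_openssl_algo($signature_algorithm));
```

Returning null for unknown names lets the caller refuse to verify rather than silently fall back to a default algorithm.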
Look at this question; you can do something similar. Try this:
private function GetCertSignatureAlgorithm($certSignatureBinary, $pubKeyResourceId)
{
if(false === openssl_public_decrypt($certSignatureBinary, $sigString, $pubKeyResourceId))
{
return false;
}
if (empty($sigString) ||
strlen($sigString) < 5)
{
return false;
}
if (ord($sigString[0]) !== 0x30 ||
ord($sigString[2]) !== 0x30 ||
ord($sigString[4]) !== 0x06)
{
return false;
}
$sigString = substr($sigString, 4);
$len = ord($sigString[1]);
$bytes = 0;
if ($len & 0x80)
{
$bytes = ($len & 0x7f);
$len = 0;
for ($i = 0; $i < $bytes; $i++)
{
$len = ($len << 8) | ord($sigString[$i + 2]);
}
}
$oidData = substr($sigString, 2 + $bytes, $len);
$hashOid = floor(ord($oidData[0]) / 40) . '.' . ord($oidData[0]) % 40;
$value = 0;
for ($i = 1; $i < strlen($oidData); $i++)
{
$value = $value << 7;
$value = $value | (ord($oidData[$i]) & 0x7f);
if (!(ord($oidData[$i]) & 0x80))
{
$hashOid .= '.' . $value;
$value = 0;
}
}
//www.iana.org/assignments/hash-function-text-names/hash-function-text-names.xml
//www.php.net/manual/en/openssl.signature-algos.php
switch($hashOid)
{
case '1.2.840.113549.2.5': return 'md5';
case '1.3.14.3.2.26': return 'sha1';
case '2.16.840.1.101.3.4.2.1': return 'sha256';
case '2.16.840.1.101.3.4.2.2': return 'sha384';
case '2.16.840.1.101.3.4.2.3': return 'sha512';
//not secure = not accepted
//case '1.2.840.113549.2.2': //'md2';
//case '1.2.840.113549.2.4': //'md4';
//case '1.3.14.3.2.18': //'sha';
}
throw new Exception('CertSignatureAlgorithm not found');
}
One way might be to run openssl x509 -text -noout < $certfile | grep "Signature Algorithm" from a shell.
Using phpseclib, a pure PHP X.509 parser...
<?php
include('File/X509.php');
$x509 = new File_X509();
$cert = $x509->loadX509(file_get_contents('sample.pem'));
echo $cert['signatureAlgorithm']['algorithm'];