I have $_SERVER['REDIRECT_SSL_CLIENT_S_DN'] content that contains some kind of hex data. How can I decode it?
$_SERVER['REDIRECT_SSL_CLIENT_S_DN'] = '../CN=\x00M\x00\xC4\x00,\x00I\x00S\x00,\x004\x000\x003\x001\x002\x000\x000\x002/SN=..';
$pattern = '/CN=(.*)\\/SN=/';
preg_match($pattern, $_SERVER['REDIRECT_SSL_CLIENT_S_DN'], $server_matches);
print_r($server_matches[1]);
The result is:
\x00M\x00\xC4\x00,\x00I\x00S\x00,\x004\x000\x003\x001\x002\x000\x000\x002
The result i need is:
MÄ,IS,40312002
I tried to decode it with chr(hexdec($value)) and it almost works, but in the HTML input I see a lot of question marks.
EDIT:
Additional test with results. Not perfect yet; the array reveals some errors: http://pastebin.com/BC4xxqmE
After using utf8_encode, you now have a multibyte string. This means you need to use PHP's multibyte (mb_) functions.
So, str_split won't work anymore. You need to use either mb_split or preg_split with the u flag.
$splitted = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY); // without the flag, the result starts and ends with an empty string
Here's a demo showing that your code is now working: http://ideone.com/nqeC0U
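To see why str_split() stops working once the data is UTF-8, here is a minimal sketch (assuming this PHP file is itself saved as UTF-8, with a made-up sample string): the byte-based split tears "Ä" into two halves, while preg_split('//u') with PREG_SPLIT_NO_EMPTY returns whole characters. On PHP 7.4+ mb_str_split() is a built-in alternative.

```php
<?php
// A UTF-8 string: "Ä" is stored as the two bytes 0xC3 0x84.
$string = "MÄ,IS";

// str_split() works on bytes, so "Ä" is torn into two invalid halves.
$bytes = str_split($string);

// preg_split() with the /u flag splits between characters instead;
// PREG_SPLIT_NO_EMPTY drops the empty strings at the start and end.
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), "\n"; // 6
echo count($chars), "\n"; // 5
print_r($chars);
```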
Have you tried a Unicode equivalent of chr()? chr() takes its input modulo 256, which is why you see all those question marks.
The code below is from one of the user comments on the chr() page of the PHP manual:
function unichr($u) {
return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
}
Update
//New function
function unichr($intval) {
return mb_convert_encoding(pack('n', $intval), 'UTF-8', 'UTF-16BE');
}
I tested with xC4 = 196 and it gives me an Ä:
http://codepad.viper-7.com/3htuwW
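One limit of the pack('n') version: 'n' packs only 16 bits, so it covers code points up to U+FFFF. The sketch below (the name unichr_full is made up for illustration) packs 32 bits with 'N' and decodes them as UTF-32BE, and also shows mb_chr(), which mbstring provides from PHP 7.2 on.

```php
<?php
// Hypothetical variant of unichr(): pack('N') writes 32 bits big-endian,
// which is exactly one UTF-32BE code unit, so the full Unicode range works.
function unichr_full(int $codepoint): string
{
    return mb_convert_encoding(pack('N', $codepoint), 'UTF-8', 'UTF-32BE');
}

echo unichr_full(0xC4), "\n";    // Ä
echo unichr_full(0x222B), "\n";  // ∫
echo unichr_full(0x1F600), "\n"; // an emoji outside the 16-bit range

// PHP 7.2+ alternative built into mbstring:
echo mb_chr(0xC4, 'UTF-8'), "\n"; // Ä
```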
Your input is in UTF-8; using that conversion is similar to utf8_decode, which converts to ISO-8859-1. UTF-8, however, supports far more characters than ISO-8859-1. This is why xC4 shows up as a question mark for you.
Try using something more powerful like iconv.
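Putting the pieces together for the original DN string, here is a sketch that assumes two things: the escapes are literal \xNN text, and the CN was stored as UTF-16BE (each character is a 0x00 byte followed by the Latin-1 byte, which is what the sample data looks like). It turns every \xNN back into a byte and then lets iconv convert from UTF-16BE; decode_dn_escapes is a hypothetical name.

```php
<?php
// Hypothetical helper. Assumption: the CN value is UTF-16BE and every
// non-printable byte was written out as the literal text "\xNN".
function decode_dn_escapes(string $escaped): string
{
    // Step 1: turn each literal backslash-x-hex-hex sequence back into one raw byte.
    $raw = preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',
        function (array $m): string {
            return chr(hexdec($m[1]));
        },
        $escaped
    );

    // Step 2: the bytes now come in pairs (0x00 'M', 0x00 0xC4, ...),
    // i.e. UTF-16BE, which iconv converts to UTF-8.
    return iconv('UTF-16BE', 'UTF-8', $raw);
}

$dn = '../CN=\x00M\x00\xC4\x00,\x00I\x00S\x00,\x004\x000\x003\x001\x002\x000\x000\x002/SN=..';

if (preg_match('~CN=(.*)/SN=~', $dn, $matches)) {
    echo decode_dn_escapes($matches[1]), "\n"; // MÄ,IS,40312002
}
```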
Related
I've got a string that is UTF-8 encoded according to mb_detect_encoding(). I want to trim it like this:
$string = trim($string);
But it has no effect.
When I look at the string with urlencode($string) it displays:
"++++++++++++++++String+more+text++++++++++++"
Following https://markushedlund.com/dev/trim-unicodeutf-8-whitespace-in-php/ I tried this code, but it had no effect either:
preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $string);
How do I trim this?
How can I find out what the space character stands for and then replace it? All I know is urlencode, but it just tells me it's a space by showing +++.
Update:
Thanks to @Stefanov.sm in the comments below, I learned that you can output the string as hex with bin2hex($string). Then I see a whole lot of 20202020, and 20 stands for a space in UTF-8.
Strangely, though, trim won't work, but this does:
$string = str_replace("\x20","",$string);
Maybe I can figure out why later. But at least the goal of getting rid of them is achieved.
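For the "+" question, a small diagnostic sketch with made-up strings: urlencode() writes an ordinary 0x20 space as "+", whereas a no-break space (U+00A0) would appear as %C2%A0. So "+" in the output, together with 20 from bin2hex(), points to plain spaces, which trim() does strip.

```php
<?php
$plain = "  String more text  ";
$nbsp  = "\u{00A0}String\u{00A0}"; // U+00A0 NO-BREAK SPACE, stored in UTF-8 as C2 A0

echo urlencode($plain), "\n";    // ++String+more+text++  (a plain space becomes "+")
echo rawurlencode($plain), "\n"; // %20%20String%20more%20text%20%20
echo urlencode($nbsp), "\n";     // %C2%A0String%C2%A0    (a no-break space does not)
echo bin2hex($nbsp), "\n";       // c2a0537472696e67c2a0

// trim() strips only " \t\n\r\0\x0B" by default, so it removes the 0x20 spaces ...
var_dump(trim($plain));          // string(16) "String more text"
// ... but leaves the two-byte no-break space alone.
var_dump(trim($nbsp) === $nbsp); // bool(true)
```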
The "+" signs stand for whitespace.
What you should try is the mb_detect_encoding function, to be sure of the encoding: https://www.php.net/manual/fr/function.mb-detect-encoding.php
<?php
mb_detect_encoding($str, 'UTF-8', true); // Returns "UTF-8" if $str is valid UTF-8, otherwise false
?>
Try explicitly naming "+" for removal:
$string = trim($string, "+ ");
Note the space after "+", which means "remove both spaces and plus-signs".
Encoding probably has nothing to do with this, unless those pluses are a misrepresentation of some other character.
You could try this multibyte trim function:
function mb_trim($str) {
return preg_replace("/^\s+|\s+$/u", "", $str);
}
No guarantee it will solve the problem, but it can't hurt.
I found it here: Multibyte trim in PHP?
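A sketch of a Unicode-aware trim built on the \pZ\pC pattern from the question. It is named utf8_trim (hypothetical) because PHP 8.4 added a native mb_trim(), so declaring a user function called mb_trim there fails with a redeclaration error. The sample string is made up and mixes a no-break space, an em space and a tab.

```php
<?php
// Hypothetical name, chosen to avoid clashing with PHP 8.4's built-in mb_trim().
function utf8_trim(string $str): string
{
    // \pZ = Unicode separators (includes U+00A0 and U+2003),
    // \pC = control and format characters (includes \t and \n).
    return preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
}

$s = "\u{00A0} Daniël\t\u{2003}";

var_dump(trim($s) === $s); // bool(true): trim() stops at the leading C2 A0 bytes
var_dump(utf8_trim($s));   // string(7) "Daniël"
```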
I'm trying to decode some special characters in PHP and can't seem to find a way to do it.
$str = 'This i"s an example';
This just returns some dots.
$str = preg_replace_callback("/(&#[0-9]+;)/", function($m) {
return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES");
}, $str);
Some other tests just return the same string.
$str = html_entity_decode($str, ENT_QUOTES, 'UTF-8');
$str = htmlspecialchars_decode($str, ENT_QUOTES);
Anyway, I've been trying all sorts of combinations but really have no idea how to convert this to UTF-8 characters.
What I'm expecting to see is this:
Thi’s i"s a’n e”xa“mple
And actually, if I take this directly and use htmlentities to encode it, I see different characters to begin with.
Thi’s i"s a’n e”xa“mple
Unfortunately I don't have control of the source and I'm stuck dealing with those characters.
Are they non-standard? Do I need to replace them manually with my own lookup table?
EDIT
Looking at this table here: https://brajeshwar.github.io/entities/
I see the characters I'm looking for are not listed. When I test a few characters from this table, they decode just fine. I guess PHP's list is incomplete by default?
If you check the Unicode standard for the characters you're referring to (http://www.unicode.org/charts/PDF/U0080.pdf), you will see that none of the code points in your string has a representable glyph: they are all control characters.
This means it is expected that they render as empty squares (or dots, depending on how your renderer treats them).
If it works for someone somewhere, that is non-standard behaviour, which one must not rely on precisely because it is non-standard.
Apparently your text was originally encoded in cp1250, so you should either treat it accordingly or re-encode the entities manually:
$str = 'This i"s an example';
$str = preg_replace_callback("/&#([0-9]+);/u", function($m) {
return iconv('cp1250', 'utf-8', chr($m[1]));
}, $str);
echo $str;
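A variant of the answer above as a reusable sketch. It swaps cp1250 for Windows-1252 (cp1252), the more common origin of such entities; both map the curly-quote bytes 0x91 to 0x94 to the same characters. Only entities in the C1 range 128 to 159 are reinterpreted, and everything else goes through html_entity_decode(). The function name and the sample input are assumptions, since the original entities were lost from the question.

```php
<?php
// Hypothetical helper. Assumption: the source wrote Windows-1252 byte values
// as numeric entities (&#146; for a right single quote, and so on).
function decode_cp1252_entities(string $str): string
{
    $str = preg_replace_callback('/&#(\d+);/', function (array $m): string {
        $n = (int) $m[1];
        if ($n < 128 || $n > 159) {
            return $m[0]; // outside the C1 range: leave for html_entity_decode()
        }
        // Reinterpret the number as a single cp1252 byte. Bytes that cp1252
        // leaves undefined (e.g. 0x81) may come out as a substitute character.
        return mb_convert_encoding(chr($n), 'UTF-8', 'Windows-1252');
    }, $str);

    return html_entity_decode($str, ENT_QUOTES, 'UTF-8');
}

echo decode_cp1252_entities('Thi&#146;s e&#148;xa&#147;mple &amp; more'), "\n";
// Thi’s e”xa“mple & more
```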
Hi, I'm having a problem converting special characters to upper case.
With regular strtoupper I get something like DANIëL, and when applying mb_strtoupper I get DANI?L.
Here's the code:
mb_strtoupper(rtrim($pieces[1], ","), 'UTF-8')
Mind you, I already have this running on the input:
iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $tr->TD[0])
Could this be the reason? Or is there something else?
Typical issue: trying to uppercase a Latin-1 string when the converter expects UTF-8.
Be sure to check your string's source. This sample will work if your text editor saves files in the Latin-1 code page, not in UTF-8:
$str = "daniël"; //or your rtrim($pieces[1],",")
$str = mb_convert_encoding($str,'UTF-8','Latin1');
echo mb_strtoupper($str, 'UTF-8');
//will echo DANIËL
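To make the cause visible, a sketch assuming this file is saved as UTF-8: mb_strtoupper() must be told the encoding the bytes are actually in. After converting to ISO-8859-1 (which the iconv call in the question does), pass 'ISO-8859-1'; passing 'UTF-8' makes mbstring treat the lone 0xEB byte as invalid, which is most likely where the "?" in DANI?L comes from. mb_convert_encoding() stands in for the iconv //TRANSLIT call only to avoid platform differences in transliteration support.

```php
<?php
// Assumption: this file is saved as UTF-8, so "daniël" holds UTF-8 bytes.
$utf8 = "daniël";
echo mb_strtoupper($utf8, 'UTF-8'), "\n"; // DANIËL

// Converting to ISO-8859-1 turns "ë" into the single byte 0xEB
// (same result as iconv('UTF-8', 'ISO-8859-1//TRANSLIT', ...) for this input).
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Tell mb_strtoupper() the real encoding of the bytes.
$upper = mb_strtoupper($latin1, 'ISO-8859-1');
echo bin2hex($upper), "\n"; // 44414e49cb4c  ("Ë" is 0xCB in ISO-8859-1)
```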
I have a string such as Panamá. I need to convert it to Panam\xE1 so it's readable in a JavaScript file I'm generating with PHP.
Is there a function to encode this in PHP? Any ideas would be appreciated.
My rule is: if you try to encode or escape data using preg_replace, massive mapping arrays or str_replace, STOP; you are probably doing it wrong.
All it takes is one missed or erroneous mapping (and you WILL miss some mappings), and you end up with code that doesn't work in all cases and that corrupts your data in some of them. Whole libraries have already been written to do the translations for you (e.g. iconv), and for escaping data you should use the proper PHP function.
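If you plan on outputting the data to a browser (the fact that you want to encode for JavaScript suggests this), then I suggest using UTF-8 encoding. If your data is in Latin-1, use the utf8_encode function (deprecated as of PHP 8.2; mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1') does the same job).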
Whether your PHP string contains ASCII characters or not, to send any data from PHP to JS you should ALWAYS use the json_encode function.
PHP code
$your_encoding = 'latin1';
$panama = "Panamá";
//Get your data in UTF-8 if it isn't already
$panama = iconv($your_encoding, "utf-8", $panama);
$panama_encoded = json_encode($panama);
echo "var js_panama = " . $panama_encoded . ";";
JS Output
var js_panama = "Panam\u00e1";
Even though JSON supports unicode, it may not be compatible with your non UTF-8 javascript file. This is not a problem because the json_encode PHP function will escape unicode characters by default.
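A quick sketch of the json_encode() behaviour described above (assuming UTF-8 input): by default non-ASCII characters come out as \uXXXX escapes, JSON_UNESCAPED_UNICODE keeps raw UTF-8, and invalid UTF-8 such as raw Latin-1 bytes makes json_encode() return false, which is why converting to UTF-8 first matters.

```php
<?php
$panama = "Panamá"; // assumes UTF-8 input (e.g. this file saved as UTF-8)

// Default: non-ASCII characters are escaped, so the JS file can stay ASCII-only.
echo "var js_panama = " . json_encode($panama) . ";\n";
// var js_panama = "Panam\u00e1";

// JSON_UNESCAPED_UNICODE keeps raw UTF-8; only safe if the JS file is served as UTF-8.
echo "var js_panama = " . json_encode($panama, JSON_UNESCAPED_UNICODE) . ";\n";
// var js_panama = "Panamá";

// Raw Latin-1 bytes are invalid UTF-8, so json_encode() fails.
var_dump(json_encode("Panam\xE1")); // bool(false)
```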
Assuming that your input is in the latin-1 encoding then ord and dechex will do what you want:
$result = preg_replace_callback(
'/[\x80-\xff]/',
function($match) {
return '\x'.dechex(ord($match[0]));
},
$input);
If your input is in any other encoding, you would need to know which encoding that is and adapt the solution accordingly. Note that in that case it would not always be possible to use the \x## notation in the JS output.
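A runnable version of the ord()/dechex() idea, assuming Latin-1 input and a made-up sample. It swaps dechex() for sprintf('%02X'), which gives the upper-case, zero-padded \xE1 form from the question.

```php
<?php
// Assumption: $input is Latin-1, so every byte 0x80-0xFF is one character.
$input = "Panam\xE1"; // "Panamá" in Latin-1

$result = preg_replace_callback(
    '/[\x80-\xff]/',
    function (array $match): string {
        // Single quotes keep "\x" literal; %02X prints two upper-case hex digits.
        return sprintf('\x%02X', ord($match[0]));
    },
    $input
);

echo $result, "\n"; // Panam\xE1
```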
This should work for you:
$str = "Panamá";
$str = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
$utf = iconv('UTF-8', 'UCS-4', current($m));
// JS \x takes exactly two hex digits, so this output is only valid up to U+00FF
return sprintf('\x%s', ltrim(strtoupper(bin2hex($utf)), "0"));
}, $str);
echo $str;
Output (Source Code):
Panam\xE1
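The \x form only covers U+0080 to U+00FF; above that JavaScript needs \uXXXX, and code points above U+FFFF need a UTF-16 surrogate pair. A sketch that handles all three cases (js_escape_non_ascii is a hypothetical name; mb_ord() needs PHP 7.2+, and the input is assumed to be UTF-8):

```php
<?php
// Hypothetical helper: escape every non-ASCII character for a JS string literal.
function js_escape_non_ascii(string $str): string
{
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $cp = mb_ord($m[0], 'UTF-8');
        if ($cp <= 0xFF) {
            return sprintf('\x%02X', $cp);   // e.g. \xE1
        }
        if ($cp <= 0xFFFF) {
            return sprintf('\u%04X', $cp);   // e.g. \u222B
        }
        // Above the BMP: split into a high and a low surrogate.
        $v = $cp - 0x10000;
        return sprintf('\u%04X\u%04X', 0xD800 + ($v >> 10), 0xDC00 + ($v & 0x3FF));
    }, $str);
}

echo js_escape_non_ascii("Panamá"), "\n";    // Panam\xE1
echo js_escape_non_ascii("∫ dx"), "\n";      // \u222B dx
echo js_escape_non_ascii("\u{1F600}"), "\n"; // \uD83D\uDE00
```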
I have a question: is there any way to display a Unicode symbol from its code point? For example, for the integral symbol (∫) the Unicode number and HTML code are 'U+222B' and '& #8747;' respectively. I can display the symbol by printing the HTML code like below.
echo "& #8747;"; //Displays Integral [∫] symbol if we remove space after Ampersand.
But can we achieve the same with the Unicode number? On one of my websites characters are not encoded properly; it just displays Unicode numbers like below.
%u03A8 %u0D24 etc.
Please share your thoughts. Thanks in advance.
%u03A8 %u0D24 etc.
This looks like the output of JavaScript's window.escape() function. Change your JavaScript code to call window.encodeURIComponent() instead, and decode its output on the PHP side using urldecode() if necessary.
If corrupted strings are already stored in your database, you could try to clean them up using code similar to this:
$s = preg_replace_callback('/(?:%u[0-9A-F]{4})+/', function ($m) {
return mb_convert_encoding(
hex2bin(str_replace('%u', '', $m[0])), 'UTF-8', 'UTF-16BE');
}, $s );
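The same clean-up wrapped in a function so it can be tested (unescape_percent_u is a hypothetical name, the inputs are made up, and the pattern here also accepts lower-case hex). Because escape() writes UTF-16 code units, decoding a whole run as UTF-16BE also reassembles surrogate pairs such as %uD83D%uDE00.

```php
<?php
// Hypothetical helper around the preg_replace_callback() clean-up above.
function unescape_percent_u(string $s): string
{
    return preg_replace_callback('/(?:%u[0-9A-Fa-f]{4})+/', function (array $m): string {
        // Concatenate the hex of the whole run, then decode it as UTF-16BE.
        $hex = str_replace('%u', '', $m[0]);
        return mb_convert_encoding(hex2bin($hex), 'UTF-8', 'UTF-16BE');
    }, $s);
}

echo unescape_percent_u('%u03A8 %u0D24'), "\n";       // Ψ ത
echo unescape_percent_u('smile: %uD83D%uDE00'), "\n"; // smile: followed by U+1F600
```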
Not sure if this will work for your customers, but it's worth a try:
echo mb_convert_encoding('&#' . intval(0x0D24) . ';', 'UTF-8', 'HTML-ENTITIES');
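On current PHP versions the 'HTML-ENTITIES' pseudo-encoding is deprecated (since PHP 8.2), so here is a sketch that goes straight from U+XXXX notation to the character with mb_chr() (PHP 7.2+), plus the html_entity_decode() equivalent for the numeric entity. char_from_unicode_notation is a hypothetical helper.

```php
<?php
// Hypothetical helper: "U+222B" -> "∫".
function char_from_unicode_notation(string $notation): string
{
    if (!preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $notation, $m)) {
        throw new InvalidArgumentException("Not U+XXXX notation: $notation");
    }
    $char = mb_chr(hexdec($m[1]), 'UTF-8');
    if ($char === false) {
        throw new InvalidArgumentException("Invalid code point: $notation");
    }
    return $char;
}

echo char_from_unicode_notation('U+222B'), "\n"; // ∫
echo char_from_unicode_notation('U+0D24'), "\n"; // ത

// Decoding the numeric HTML entity yields the same character:
echo html_entity_decode('&#8747;', ENT_QUOTES, 'UTF-8'), "\n"; // ∫
```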