I've run into an issue converting hexadecimal encoding into a string when a character's hexadecimal value has 3 digits (rather than the usual 2).
I currently use the following function, which has been working fine:
function hexToStr($hex){
    $string='';
    for ($i=0; $i < strlen($hex)-1; $i+=2){
        $string .= chr(hexdec($hex[$i].$hex[$i+1]));
    }
    return $string;
}
I'm aware I can also use:
pack("H*", $hex);
Example characters which don't convert: ă and ţ, which are represented as 103 and 163 in hexadecimal. Interestingly, websites like https://codebeautify.org/string-hex-converter don't recognise their own conversion of these characters.
Example hexadecimal:
7061726f6c65692070656e74727520636f6e74756c2064756d6e6561766f61737472103206120666f737420696e69163696174103
Expected output:
parolei pentru contul dumneavoastră a fost iniţiată
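One possible reading of the problem, offered as a sketch rather than a definitive answer: 103 and 163 look like the Unicode code points U+0103 and U+0163, not byte values, so the sample string mixes two-digit bytes with three-digit code points and cannot be split unambiguously. Assuming the text is UTF-8 and every byte is written as exactly two hex digits, the round trip works with the built-in functions:

```php
<?php
// Assumption: the text is UTF-8 and each byte is written as exactly two hex digits.
$text = "dumneavoastr\u{0103}";   // ends in U+0103 (a with breve)

// Encode: every byte becomes two hex digits, so U+0103 becomes the bytes c4 83.
$hex = bin2hex($text);
echo $hex, PHP_EOL;

// Decode: hex2bin() gives the same result as pack("H*", $hex).
$decoded = hex2bin($hex);
echo $decoded, PHP_EOL;

// A bare code point such as 0x103 has to be encoded to UTF-8 explicitly.
$fromCodePoint = mb_chr(0x103, 'UTF-8');
echo bin2hex($fromCodePoint), PHP_EOL;
```

If the producer of the hex string cannot be changed, the three-digit values would need some delimiter or fixed width before any decoder could tell them apart from ordinary bytes.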
I'd like to ask a dumb question.
How can I convert #48#49#50, splitting on the # character?
In PHP, I understand that the chr() function is used to convert an ASCII value to a character.
48 is 0.
49 is 1.
And 50 is 2.
How can I convert #48#49#50 to 012, and how do I store 012 in one variable?
E.g. $num = 012
We can try using preg_replace_callback() here with the regex pattern #\d+. As we capture each #\d+ match, use chr() on that match to generate the ASCII character replacement.
$input = "#48#49#50";
$out = preg_replace_callback(
    "/#(\d+)/",
    // $m[1] holds the captured digits; cast to int before passing to chr()
    function($m) { return chr((int) $m[1]); },
    $input
);
echo $out; // 012
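On the second half of the question (storing 012 in one variable), here is a small sketch of the difference between keeping the result as a string and turning it into a number. The variable names are only illustrative. Note that the literal 012 written directly in PHP source is an octal number, so it is not the same thing as the string "012".

```php
<?php
$input = "#48#49#50";

// Replace each #NN with the character whose ASCII code is NN.
$digits = preg_replace_callback(
    '/#(\d+)/',
    function (array $m): string { return chr((int) $m[1]); },
    $input
);

$asString = $digits;        // "012": the leading zero is preserved
$asInt    = (int) $digits;  // 12: the leading zero is dropped
$octal    = 012;            // 10: a leading 0 in source code means octal

var_dump($asString, $asInt, $octal);
```

Keeping the value as a string is the only way to preserve the leading zero.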
How to convert html entities to hex?
I used this code
$username = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
$char = current($m);
$utf = iconv('UTF-8', 'UCS-4', $char);
return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $username);
But it doesn't convert characters like < and others.
If you look at the regex used, [\x{80}-\x{10FFFF}], you'll see that it matches only characters whose code point lies between U+0080 and U+10FFFF.
If you take a look at an ASCII chart, you'll see that the hex values of < and > (0x3C and 0x3E) are lower than 0x80, so the pattern never matches them. Assuming you got the regex from the API publishers, they probably only want you to convert non-ASCII characters such as these so that they don't cause any problems. But you can edit the regex to make it work for other characters as well.
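As a sketch of what editing the regex could look like: the character class below adds <, >, &, and both quote characters to the original non-ASCII range. The function name toHexEntities is hypothetical, and mb_ord() (PHP 7.2+) is swapped in for the iconv/bin2hex combination from the question; it is an alternative, not the original author's method.

```php
<?php
// Hypothetical helper: convert selected ASCII characters and all non-ASCII
// characters to hexadecimal numeric entities.
function toHexEntities(string $text): string
{
    return preg_replace_callback(
        '/[<>&"\'\x{80}-\x{10FFFF}]/u',
        function (array $m): string {
            // mb_ord() returns the Unicode code point of the matched character.
            return sprintf('&#x%X;', mb_ord($m[0], 'UTF-8'));
        },
        $text
    );
}

echo toHexEntities("a<b"), PHP_EOL;       // a&#x3C;b
echo toHexEntities("caf\u{E9}"), PHP_EOL; // caf&#xE9;
```

Which ASCII characters to include is a design choice; the set above is an assumption based on the characters that are special in HTML.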
I'm currently using the substr() function, which works fine for text written in English. But when I apply it to text written in Greek, the text is cut with a strange character (a question mark inside a diamond shape) appearing before the three full stops (...).
Below is the code, thanks:
// $string is a varchar string written in Greek, fetched from the database
if (strlen($string) > 200) {
    echo substr($string, 0, 200).'...';
}
Use multibyte functions like so:
mb_internal_encoding( "UTF-8" );
if( mb_strlen( $string ) > 200 ) {
    echo mb_substr( $string, 0, 200 ) . "...";
}
The normal functions work on bytes and don't have the character awareness you are expecting from them. Common English characters in UTF-8 are all 1 byte per character, so the normal functions happen to work for them.
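To make the byte/character difference concrete, here is a minimal sketch using a three-letter Greek string (written with \u{...} escapes so it does not depend on the file's encoding). Cutting by bytes can land in the middle of a character, which is where the diamond question mark comes from.

```php
<?php
mb_internal_encoding('UTF-8');

$greek = "\u{03B1}\u{03B2}\u{03B3}";   // alpha, beta, gamma: 2 bytes each in UTF-8

$bytes = strlen($greek);      // 6: counts bytes
$chars = mb_strlen($greek);   // 3: counts characters

$cutByBytes = substr($greek, 0, 3);     // one and a half characters
$cutByChars = mb_substr($greek, 0, 2);  // exactly two characters

// The byte-based cut is no longer valid UTF-8; the character-based cut is.
var_dump(mb_check_encoding($cutByBytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($cutByChars, 'UTF-8')); // bool(true)
```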
I'm writing a basic function in PHP which takes an input string and converts a list of "weird" characters to URL-friendly ones. Writing the function is not the issue, but rather how it interprets strings with weird characters.
For example, right now I have this problem:
$string = "år";
echo $string[0]; // Output: �
echo $string[1]; // Output: �
echo $string[0] . $string[1]; // Output: å
echo $string[2]; // Output: r
So basically it interprets the letter "å" as two characters, which causes a problem for me, because I want to be able to look at each character of the string individually and replace it if needed.
I encode everything in UTF-8, and I know my issue has something to do with UTF-8 treating weird characters as two chars, as we've seen above.
But how do I work around this? Basically I want to achieve this:
$string = "år";
echo $string[0]; // Output: å
echo $string[1]; // Output: r
$string = "år";
mb_internal_encoding('UTF-8');
echo mb_substr($string, 0, 1); // å
echo mb_substr($string, 1, 1); // r
UTF-8 does not always use 1 byte per letter; it uses more bytes as needed, so your non-ASCII letters take more than one byte of memory. Array-like access to a string variable returns a single byte, not a letter. To get a whole letter, use the multibyte functions:
echo mb_substr($string, 0,1);// Output: å
echo mb_substr($string, 1,1);// Output: r
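For the original goal of looking at each character and replacing it if needed, here is a minimal sketch. It assumes the input is valid UTF-8; the helper name utf8Chars and the replacement map are illustrative only. preg_split() with the u modifier splits between characters rather than bytes, and strtr() with an array works with multibyte keys because it replaces whole substrings.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
function utf8Chars(string $s): array
{
    $parts = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false when the input is not valid UTF-8.
    return $parts === false ? [] : $parts;
}

$string = "\u{E5}r";            // "år"
$chars  = utf8Chars($string);   // two elements: the letter with the ring, then "r"

// Illustrative replacement map for URL-friendly output.
$map  = ["\u{E5}" => 'a', "\u{E4}" => 'a', "\u{F6}" => 'o'];
$slug = strtr($string, $map);

echo count($chars), PHP_EOL;    // 2
echo $slug, PHP_EOL;            // ar
```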
strlen($username);
Username can carry ASCII, Unicode or both.
Example:
Jam123 (ASCII) - 6 characters
ابت (Unicode) - 3 characters, but strlen returns 6 because each of these characters takes 2 bytes.
Jamت (Unicode and ASCII) - 4 characters, but counted as 5 (3 for ASCII and 2 for Unicode, even though I have only one Unicode character)
Username in all cases shouldn't go beyond 25 characters and shouldn't be less than 4 chars.
My main problem is when mixing Unicode and ASCII together: how can I keep track of the count so the condition statement can decide whether the username is not over 25 and not less than 4?
if(strlen($username) <= 25 && !(strlen($username) < 4))
3 characters in unicode will be counted as 6 bytes which causes trouble because it allows user to have a username of 3 unicode characters when the characters should be minimum of 4.
Numbers will always be in ASCII
Use mb_strlen(). It takes care of unicode characters.
Example:
mb_strlen("Jamت", "UTF-8"); // 4
You can use mb_strlen where you select your encoding.
http://sandbox.phpcode.eu/g/3a144/1
<?php
echo mb_strlen('ابت', 'UTF8'); // returns 3
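Applied to the 4 to 25 character rule from the question, the check could be sketched as follows. The function name and default limits are illustrative, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: validate username length in characters, not bytes.
function isValidUsernameLength(string $username, int $min = 4, int $max = 25): bool
{
    $length = mb_strlen($username, 'UTF-8');
    return $length >= $min && $length <= $max;
}

var_dump(isValidUsernameLength("\u{0627}\u{0628}\u{062A}")); // 3 characters: bool(false)
var_dump(isValidUsernameLength("Jam\u{062A}"));              // 4 characters: bool(true)
var_dump(isValidUsernameLength("Jam123"));                   // 6 characters: bool(true)
```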
A function to count words in a Unicode sentence/string:
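function mb_count_words($string)
{
    // Count runs of letters, digits, and dash punctuation (\p{Pd}), so a hyphenated word counts once.
    preg_match_all('/[\pL\pN\p{Pd}]+/u', $string, $matches);
    return count($matches[0]);
}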
or
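function mb_count_words($string, $format = 0, $charlist = '[]') {
    // Note: $charlist is not used by this function.
    $string = trim($string);
    // Split on runs of anything that is not a letter, digit, or apostrophe.
    // PREG_SPLIT_NO_EMPTY drops empty pieces, so '' gives no words and '0' gives one.
    $words = preg_split('~[^\p{L}\p{N}\']+~u', $string, -1, PREG_SPLIT_NO_EMPTY);
    switch ($format) {
        case 0:
            // Format 0: return the number of words.
            return count($words);
        default:
            // Any other format: return the list of words.
            return $words;
    }
}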
then do:
echo mb_count_words("chào buổi sáng");
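One caveat worth sketching: the patterns above treat combining marks as word breaks, so the count changes if the text arrives in decomposed form (base letter followed by separate accent characters). A variant that also accepts \p{M} (marks) inside a word avoids this. The function name countWordsUnicode is hypothetical, and this is only one way to handle it; normalizing the input first would be another.

```php
<?php
// Hypothetical variant: count runs of letters, combining marks, digits, and apostrophes.
function countWordsUnicode(string $text): int
{
    $count = preg_match_all('/[\p{L}\p{M}\p{N}\']+/u', $text);
    // preg_match_all() returns false on error, e.g. invalid UTF-8.
    return $count === false ? 0 : $count;
}

// The same three-word Vietnamese phrase in precomposed and decomposed form.
$precomposed = "ch\u{E0}o bu\u{1ED5}i s\u{E1}ng";
$decomposed  = "cha\u{0300}o buo\u{0302}\u{0309}i sa\u{0301}ng";

echo countWordsUnicode($precomposed), PHP_EOL; // 3
echo countWordsUnicode($decomposed), PHP_EOL;  // 3

// Without \p{M}, every combining mark splits a word in the decomposed form.
$naive = preg_match_all('/[\p{L}\p{N}\']+/u', $decomposed);
echo $naive, PHP_EOL;                          // 6
```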