PHP functions chr and ord with special characters

In PHP, when I use the ord function to get the ASCII code of a character, I get this behavior:
ord("a") // returns 97
chr(97) // returns a
But when I use a special character like Œ, the results are different:
ord("Œ") // returns 197
chr(197) // returns �
All of my pages are encoded in UTF-8. The same thing happens with most special characters.
Has anyone seen this problem before? How can I fix it?

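ord() and chr() both work on single bytes: ord() returns the value of the first byte of a string, and chr() returns a one-byte string. That matches ASCII, which is a single-byte encoding, but Œ is not an ASCII character. In UTF-8 it is stored as two bytes (197 and 146), so ord("Œ") only sees the first one, and chr(197) produces a lone byte that is not valid UTF-8 on its own.
You can get each byte of a multibyte character by specifying its byte offset, as follows: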
$oethel = "Œ";
$firstByte = ord($oethel[0]); // 197
$secondByte = ord($oethel[1]); // 146
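Reversing the process this way, however, does not work in PHP versions before 7.1, because assigning to a byte offset of an empty string silently converts that string to an array (since PHP 7.1, an empty string behaves like any other string here):
$newOethel = "";
$newOethel[0] = chr(197);
$newOethel[1] = chr(146);
echo $newOethel;
// Output in PHP versions before 7.1: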
// PHP Notice: Array to string conversion
// Array
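A minimal sketch of two ways to get Œ back, assuming the script file and the page are both UTF-8: concatenating the chr() results works in any PHP version, while mb_ord() and mb_chr() (PHP 7.2+, mbstring extension) work with the Unicode code point instead of individual bytes.

```php
<?php
// Build the character from its two UTF-8 bytes by concatenation
// instead of writing to string offsets.
$newOethel = chr(197) . chr(146); // bytes 0xC5 0x92
echo $newOethel, "\n";            // Œ (when the output is read as UTF-8)

// Work with the code point instead of bytes (PHP 7.2+, mbstring).
echo mb_ord("Œ", "UTF-8"), "\n";  // 338, i.e. U+0152
echo mb_chr(338, "UTF-8"), "\n";  // Œ
```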

The black diamond with a question mark is a display problem.
See the explanation of the black diamond in https://stackoverflow.com/a/38363567/1766831. It describes two cases; check which one fits yours.

Related

How to change ASCII alphabet to UTF-8 in PHP

I have an ASCII string and I'd like to change its encoding to UTF-8.
Is there a simple function to change ASCII to UTF-8 in PHP?
And vice versa, I'd like to change UTF-8 letters back to ASCII.
Please advise.
I have tried:
<?php
// utf-8
$str = "ＣＨＯＮＫＩＯＫ";
// I don't even know how to type these UTF-8 characters in PHP; I just copied and pasted the string.
// strlen($str) => 24 bytes
// mb_detect_encoding($str) => utf-8
$str2 = "CHONKIOK";
// strlen($str2) => 8 bytes
// mb_detect_encoding($str2) => ascii
// change ascii to utf-8
$str = mb_convert_encoding($str2, "UTF-8");
echo mb_detect_encoding($str);
// returns ascii
What you are doing is correct.
The mb_detect_encoding documentation states that it detects the most likely character encoding.
As the entire ASCII set is contained within UTF-8 at exactly the same positions, the function is telling you that it's an ASCII string, because it technically is: the bytes of this string are identical whether it is encoded as ASCII or as UTF-8.
As you've found, when the string includes characters outside the ASCII set, it reports the next most probable encoding.
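A small sketch illustrating the point above, assuming the mbstring extension and a UTF-8 encoded source file: converting pure ASCII to UTF-8 changes no bytes, while the fullwidth string is 24 bytes for 8 characters and is not valid ASCII, so strict detection settles on UTF-8.

```php
<?php
$ascii = "CHONKIOK";
$converted = mb_convert_encoding($ascii, "UTF-8", "ASCII");

// Converting pure ASCII to UTF-8 leaves every byte unchanged.
var_dump($converted === $ascii); // bool(true)

// Fullwidth letters take 3 bytes each in UTF-8, so they are not valid ASCII.
$fullwidth = "ＣＨＯＮＫＩＯＫ";
echo strlen($fullwidth), " ", mb_strlen($fullwidth, "UTF-8"), "\n"; // 24 8
echo mb_detect_encoding($fullwidth, ["ASCII", "UTF-8"], true), "\n"; // UTF-8
```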
What exactly should I do to obtain the string "ＣＨＯＮＫＩＯＫ" from "CHONKIOK"?
The characters you're after are called "Fullwidth Latin" characters.
Given that the C character you provided is code point 65,315 (U+FF23) and a regular C is code point 67, you could possibly obtain the strings you're after by adding the difference of 65,248 (0xFEE0). This only works because the Fullwidth Forms block repeats the printable ASCII characters (U+0021 to U+007E) in the same order at U+FF01 to U+FF5E.
You can get the code point of a character using mb_ord, add 65,248, and convert it back to a character using mb_chr (both functions are available since PHP 7.2).
That might look something like:
$str_input = "ABC abc 123";
$convertable = "ABCDEFG12349abcdefg";
$str_output = "";
// Indexes bytes, so $str_input must contain only ASCII characters.
for ($i = 0; $i < strlen($str_input); $i++) {
    $char = mb_ord($str_input[$i], "UTF-8");
    // str_contains() requires PHP 8.0+.
    if (str_contains($convertable, $str_input[$i])) $char += 65248;
    $str_output .= mb_chr($char, "UTF-8");
}
echo $str_output; // outputs "ＡＢＣ ａｂｃ １２３"
Just be sure to include the whole alphabet (and all the digits) in $convertable.
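A sketch of the same 65,248 (0xFEE0) offset idea without a $convertable list, assuming PHP 7.4+ (for mb_str_split) with mbstring; toFullwidth() and toHalfwidth() are hypothetical helper names. It iterates over characters rather than bytes, so multibyte input is safe, and it also covers the "vice versa" direction. The space is left alone because its fullwidth counterpart (U+3000) lies outside the U+FF01 to U+FF5E range.

```php
<?php
// Shift printable ASCII (U+0021..U+007E) up into U+FF01..U+FF5E.
function toFullwidth(string $s): string
{
    $out = '';
    foreach (mb_str_split($s, 1, 'UTF-8') as $ch) {
        $cp = mb_ord($ch, 'UTF-8');
        if ($cp >= 0x21 && $cp <= 0x7E) {
            $cp += 0xFEE0;
        }
        $out .= mb_chr($cp, 'UTF-8');
    }
    return $out;
}

// Shift U+FF01..U+FF5E back down to printable ASCII.
function toHalfwidth(string $s): string
{
    $out = '';
    foreach (mb_str_split($s, 1, 'UTF-8') as $ch) {
        $cp = mb_ord($ch, 'UTF-8');
        if ($cp >= 0xFF01 && $cp <= 0xFF5E) {
            $cp -= 0xFEE0;
        }
        $out .= mb_chr($cp, 'UTF-8');
    }
    return $out;
}

echo toFullwidth("CHONKIOK"), "\n";    // ＣＨＯＮＫＩＯＫ
echo toHalfwidth("ＣＨＯＮＫＩＯＫ"), "\n"; // CHONKIOK
```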
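Try this to convert to UTF-8:
utf8_encode(string $string): string
Try this to convert to ASCII:
utf8_decode(string $string): string
Note that utf8_encode() actually converts ISO-8859-1 (Latin-1) to UTF-8, and utf8_decode() converts UTF-8 to ISO-8859-1 rather than ASCII, replacing characters that Latin-1 cannot represent (such as the fullwidth letters) with "?". Both functions are deprecated as of PHP 8.2.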
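As a sketch, assuming the mbstring extension is available, the same ISO-8859-1 to UTF-8 conversions can be written with mb_convert_encoding(), which is not deprecated. bin2hex() is used only to make the bytes visible.

```php
<?php
$latin1 = "\xE9"; // "é" in ISO-8859-1 (one byte)

// Equivalent of utf8_encode(): ISO-8859-1 -> UTF-8.
$utf8 = mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");
echo bin2hex($utf8), "\n"; // c3a9

// Equivalent of utf8_decode(): UTF-8 -> ISO-8859-1.
$back = mb_convert_encoding($utf8, "ISO-8859-1", "UTF-8");
echo bin2hex($back), "\n"; // e9

// A fullwidth letter has no Latin-1 equivalent, so it does not become "C";
// it is replaced by the substitute character instead.
var_dump(mb_convert_encoding("Ｃ", "ISO-8859-1", "UTF-8") === "C"); // bool(false)
```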

PHP - Convert an escaped character into hex

I'm having trouble finding a way to convert an escaped character within a string to its hexadecimal value.
For example:
$tab = "\t";
$hexTab = escapedToHex($tab);
echo $hexTab; // prints "09"
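Use ord() to get the ASCII value and then dechex() to convert it to hex. Note that dechex() does not zero-pad, so for a tab it returns "9"; pad the result to get "09":
dechex(ord($tab)); // "9"
str_pad(dechex(ord($tab)), 2, "0", STR_PAD_LEFT); // "09"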
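A sketch of the question's escapedToHex() (a hypothetical name from the question, not a built-in) using bin2hex(), which always writes two lowercase hex digits per byte, so padding comes for free and multi-byte strings also work. sprintf('%02x', ...) is shown as a single-byte alternative.

```php
<?php
function escapedToHex(string $s): string
{
    return bin2hex($s); // two lowercase hex digits per byte
}

$tab = "\t";
echo escapedToHex($tab), "\n";         // 09
echo sprintf('%02x', ord($tab)), "\n"; // 09 (single-byte alternative)
echo escapedToHex("\r\n"), "\n";       // 0d0a (one pair per byte)
```

Use strtoupper() on the result if uppercase digits are preferred.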

Using PHP to Convert Decimal to ASCII using SPACE delimiters

Using PHP, how do you convert a string of space-separated decimal numbers into the corresponding characters, without the spaces? (Unless, of course, one of the numbers is 32, the decimal code for a space.)
Example: 84 104 97 110 107 32 121 111 117
I have checked out the related questions, and most of them just ask what the built-in function for converting decimal to ASCII is. I know chr() and ord(), and I think the solution should use explode() and implode() along with a string replace. I am just horrible at for and foreach loops, so the logic breaks my mind. :)
The closest SO topic I found is this one, which is basically the opposite of what I am asking for:
Using PHP to Convert ASCII Character to Decimal Equivalent
This would be a situation where the strtok function actually could be used for something.
The strtok function tokenizes a string based on a character. In this case the token delimiter is a space. Each time you call strtok it returns the next token in the string.
The chr function is used to convert the ordinal (decimal) number to its ASCII character equivalent.
function myParseString($str) {
    $output = ''; // What we will return
    $token = strtok($str, ' '); // Initialize the tokenizer
    // Loop until there are no more tokens left
    while ($token !== false) {
        $output .= chr($token); // Add the token to the output
        $token = strtok(' '); // Advance the tokenizer, getting the next token
    }
    // All the tokens have been consumed, return the result!
    return $output;
}
$str = '84 104 97 110 107 32 121 111 117';
echo myParseString($str);
(And you are welcome.)
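For comparison, here is a sketch of the explode()/implode() approach the asker had in mind; decimalsToString() is a hypothetical name, the arrow function needs PHP 7.4+, and it assumes the codes are separated by single spaces (explode() would produce empty tokens for repeated spaces, which strtok() skips).

```php
<?php
function decimalsToString(string $str): string
{
    $codes = explode(' ', trim($str)); // ["84", "104", ...]
    $chars = array_map(
        fn (string $code): string => chr((int) $code), // decimal code -> one-byte string
        $codes
    );
    return implode('', $chars); // join without separators
}

echo decimalsToString('84 104 97 110 107 32 121 111 117'), "\n"; // Thank you
```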

How to use PHP similar_text() with Arabic

I'm trying to use PHP's similar_text() with Arabic, but it's not working.
However, it works great with English.
<?php
$var = similar_text("ياسر","عمار",$per);
echo $var;
?>
Output: 5
That's the wrong result; it should be 2. Is there a version of similar_text() that works with Arabic letters?
Here's one I'm using:
//from http://www.phperz.com/article/14/1029/31806.html
function mb_split_str($str) {
    // u: "." matches a whole UTF-8 character; s: "." also matches newlines
    preg_match_all("/./us", $str, $arr);
    return $arr[0];
}
//based on http://www.phperz.com/article/14/1029/31806.html, added percent
function mb_similar_text($str1, $str2, &$percent) {
    $arr_1 = array_unique(mb_split_str($str1));
    $arr_2 = array_unique(mb_split_str($str2));
    // Number of distinct characters of $str2 that also occur in $str1
    $similarity = count($arr_2) - count(array_diff($arr_2, $arr_1));
    // Use character counts (mb_strlen), not byte counts (strlen), for the percentage
    $percent = ($similarity * 200) / (mb_strlen($str1, "UTF-8") + mb_strlen($str2, "UTF-8"));
    return $similarity;
}
So
$var = mb_similar_text('عمار', 'ياسر', $per);
output: $var = 2, $per = 50
Because Arabic text is stored as multibyte strings, byte-based PHP functions such as similar_text() cannot be used on it directly.
echo(strlen("عمار"));
The above code outputs: 8
echo(mb_strlen("عمار", "UTF-8"));
Using the mb_strlen function with the UTF-8 encoding specified, the output is: 4 (the correct number of characters).
You can use the mb_ functions to make your own version of the similar_text function: http://php.net/manual/en/ref.mbstring.php
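One possible sketch of such a function, assuming PHP 7.4+ (for mb_str_split) and UTF-8 input. It reproduces the way similar_text() works (find the first longest common run, then recurse on the parts to its left and right) but compares characters instead of bytes. mb_similar_text_chars() and similar_chars_count() are hypothetical names, not built-ins.

```php
<?php
function mb_similar_text_chars(string $s1, string $s2, &$percent = null): int
{
    $a = mb_str_split($s1, 1, 'UTF-8');
    $b = mb_str_split($s2, 1, 'UTF-8');
    $sim = similar_chars_count($a, $b);
    $total = count($a) + count($b);
    // Same percentage formula as similar_text(): sim * 2 * 100 / (len1 + len2).
    $percent = $total > 0 ? $sim * 200 / $total : 0;
    return $sim;
}

// Find the first longest common run of characters, then recurse on the parts
// to its left and to its right, adding up the lengths of all runs found.
function similar_chars_count(array $a, array $b): int
{
    $max = 0;
    $posA = 0;
    $posB = 0;
    $lenA = count($a);
    $lenB = count($b);
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $k = 0;
            while ($i + $k < $lenA && $j + $k < $lenB && $a[$i + $k] === $b[$j + $k]) {
                $k++;
            }
            if ($k > $max) {
                $max = $k;
                $posA = $i;
                $posB = $j;
            }
        }
    }
    if ($max === 0) {
        return 0;
    }
    return $max
        + similar_chars_count(array_slice($a, 0, $posA), array_slice($b, 0, $posB))
        + similar_chars_count(array_slice($a, $posA + $max), array_slice($b, $posB + $max));
}

$var = mb_similar_text_chars("ياسر", "عمار", $per);
echo $var, " ", $per, "\n"; // 2 50
```

For ASCII input it should agree with similar_text(), including its sensitivity to argument order.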
For the record, and hopefully to help, I want to clarify how the similar_text() function behaves when given multibyte strings (including Arabic strings).
The function simply treats each byte of the input strings as an individual character, so it supports neither multibyte characters nor Unicode.
In UTF-8, the byte streams of the عمار and ياسر strings are as follows (bytes in hexadecimal are separated by ., and : marks the end of a character):
D8.B9:D9.85:D8.A7:D8.B1 <-- Byte stream for عمار
D9.8A:D8.A7:D8.B3:D8.B1 <-- Byte stream for ياسر
similar_text() first finds the longest common run of bytes, D8.A7:D8 (three bytes), then repeats the search on the bytes to its left (finding D9) and to its right (finding B1). That makes five matching bytes, which is why the function returns 5 in this case (every two hexadecimal digits represent one byte).
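A quick way to inspect the bytes PHP actually compares is bin2hex(); this sketch assumes the file is saved as UTF-8.

```php
<?php
echo bin2hex("عمار"), "\n";              // d8b9d985d8a7d8b1
echo bin2hex("ياسر"), "\n";              // d98ad8a7d8b3d8b1
echo similar_text("ياسر", "عمار"), "\n"; // 5 (matching bytes, not characters)
```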

How do I get the number of characters in PHP?

mb_strlen() only gives me the number of bytes, which is not what I want.
It should work with multibyte characters.
mb_strlen($text, "UTF-8");
You may make use of mb_strlen.
Use mb_strlen() after setting mb_internal_encoding('UTF-8').
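A short sketch of why the encoding matters: without an explicit encoding argument, mb_strlen() uses the internal encoding, so one plausible cause of getting a byte count is an internal encoding that treats each byte as a character. "8bit" below is used only to demonstrate that effect.

```php
<?php
mb_internal_encoding("UTF-8");
$name = "Perú";
echo mb_strlen($name), "\n";          // 4, uses the internal encoding
echo mb_strlen($name, "UTF-8"), "\n"; // 4, explicit encoding
echo mb_strlen($name, "8bit"), "\n";  // 5, every byte counted as a character
```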
strlen(): Returns the number of bytes rather than the number of characters in a string.
$name = "Perú"; // With accent mark
echo strlen($name); // Display 5, because "ú" requires 2 bytes.
$name = "Peru"; // Without accent mark
echo strlen($name); // Display 4
mb_strlen(): Returns the number of characters in a string in the given character encoding (the internal encoding if none is given). A multibyte character is counted as 1.
$name = "Perú"; // With accent mark
echo mb_strlen($name); // Display 4, because "ú" is counted as 1.
$name = "Peru"; // Without accent mark
echo mb_strlen($name); // Display 4
iconv_strlen(): Returns the character count of a string, as an integer.
$name = "Perú"; // With accent mark
echo iconv_strlen($name); // Display 4.
$name = "Peru"; // Without accent mark
echo iconv_strlen($name); // Display 4
mb_strlen(): pass it the string being measured for length.
<?php
$str = 'abcdef';
echo strlen($str); // 6
$str = ' ab cd ';
echo strlen($str); // 7
?>
Directly from the documentation.
If you are using UTF-8 encoding, step through all bytes in the string and count the bytes that are not continuation bytes (continuation bytes have the bit pattern 10xxxxxx, i.e. the eighth bit set and the seventh bit not set).
This solution does not need the mb extension.
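A minimal sketch of that byte-counting approach, assuming the input is valid UTF-8; utf8_char_count() is a hypothetical helper name. Every character has exactly one leading byte, so counting the bytes whose top two bits are not 10 gives the number of characters.

```php
<?php
function utf8_char_count(string $s): int
{
    $count = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        // Count the byte unless it is a continuation byte (10xxxxxx).
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

echo utf8_char_count("Perú"), "\n"; // 4
echo utf8_char_count("عمار"), "\n"; // 4
```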
I'm not sure about mb_strlen, but I just use plain old strlen myself...
