PHP: Split multibyte string (word) into separate characters

PHP: Split multibyte string (word) into separate characters - php

Trying to split this string "主楼怎么走" into separate characters (I need an array) using mb_split with no luck... Any suggestions?
Thank you!

try a regular expression with 'u' option, for example
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

An ugly way to do it is:
mb_internal_encoding("UTF-8"); // this IS A MUST!! PHP has trouble with multibyte
// when no internal encoding is set!
$string = ".....";
$chars = array();
for ($i = 0; $i < mb_strlen($string); $i++ ) {
$chars[] = mb_substr($string, $i, 1); // only one char to go to the array
}
You should also try your way with mb_split with setting the internal_encoding before it.

You can use grapheme functions (PHP 5.3 or intl 1.0) and IntlBreakIterator (PHP 5.5 or intl 3.0). The following code shows the diffrence among intl and mbstring and PCRE functions.
// http://www.php.net/manual/function.grapheme-strlen.php
$string = "a\xCC\x8A" // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5)
."o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS' (U+00F6)
$expected = ["a\xCC\x8A", "o\xCC\x88"];
$expected2 = ["a", "\xCC\x8A", "o", "\xCC\x88"];
var_dump(
$expected === str_to_array($string),
$expected === str_to_array2($string),
$expected2 === str_to_array3($string),
$expected2 === str_to_array4($string),
$expected2 === str_to_array5($string)
);
function str_to_array($string)
{
$length = grapheme_strlen($string);
$ret = [];
for ($i = 0; $i < $length; $i += 1) {
$ret[] = grapheme_substr($string, $i, 1);
}
return $ret;
}
function str_to_array2($string)
{
$it = IntlBreakIterator::createCharacterInstance('en_US');
$it->setText($string);
$ret = [];
$prev = 0;
foreach ($it as $pos) {
$char = substr($string, $prev, $pos - $prev);
if ('' !== $char) {
$ret[] = $char;
}
$prev = $pos;
}
return $ret;
}
function str_to_array3($string)
{
$it = IntlBreakIterator::createCodePointInstance();
$it->setText($string);
$ret = [];
$prev = 0;
foreach ($it as $pos) {
$char = substr($string, $prev, $pos - $prev);
if ('' !== $char) {
$ret[] = $char;
}
$prev = $pos;
}
return $ret;
}
function str_to_array4($string)
{
$length = mb_strlen($string, "UTF-8");
$ret = [];
for ($i = 0; $i < $length; $i += 1) {
$ret[] = mb_substr($string, $i, 1, "UTF-8");
}
return $ret;
}
function str_to_array5($string) {
return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
}
When working on production environment, you need to replace invalid byte sequence with the substitute character since almost all grapheme and mbstring functions can't handle invalid byte sequence. If you have an interest, see my past answer: https://stackoverflow.com/a/13695364/531320
If you don't take of perfomance, htmlspecialchars and htmlspecialchars_decode can be used. The merit of this way is supporting various encoding other than UTF-8.
function str_to_array6($string, $encoding = 'UTF-8')
{
$ret = [];
str_replace_callback($string, function($char, $index) use (&$ret) { $ret[] = $char; return ''; }, $encoding);
return $ret;
}
function str_replace_callback($string, $callable, $encoding = 'UTF-8')
{
$str_size = strlen($string);
$string = str_scrub($string, $encoding);
$ret = '';
$char = '';
$index = 0;
for ($pos = 0; $pos < $str_size; ++$pos) {
$char .= $string[$pos];
if (str_check_encoding($char, $encoding)) {
$ret .= $callable($char, $index);
$char = '';
++$index;
}
}
return $ret;
}
function str_check_encoding($string, $encoding = 'UTF-8')
{
$string = (string) $string;
return $string === htmlspecialchars_decode(htmlspecialchars($string, ENT_QUOTES, $encoding));
}
function str_scrub($string, $encoding = 'UTF-8')
{
return htmlspecialchars_decode(htmlspecialchars($string, ENT_SUBSTITUTE, $encoding));
}
If you want to learn the specification of UTF-8, the byte manipulation is the good way to practice.
function str_to_array6($string)
{
// REPLACEMENT CHARACTER (U+FFFD)
$substitute = "\xEF\xBF\xBD";
$size = strlen($string);
$ret = [];
for ($i = 0; $i < $size; $i += 1) {
if ($string[$i] <= "\x7F") {
$ret[] = $string[$i];
} elseif ("\xC2" <= $string[$i] && $string[$i] <= "\xDF") {
if (!isset($string[$i+1])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+1] < "\x80" || "\xBF" < $string[$i+1]) {
$ret[] = $substitute;
} else {
$ret[] = substr($string, $i, 2);
$i += 1;
}
} elseif ("\xE0" <= $string[$i] && $string[$i] <= "\xEF") {
$left = "\xE0" === $string[$i] ? "\xA0" : "\x80";
$right = "\xED" === $string[$i] ? "\x9F" : "\xBF";
if (!isset($string[$i+1])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+1] < $left || $right < $string[$i+1]) {
$ret[] = $substitute;
} else {
if (!isset($string[$i+2])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+2] < "\x80" || "\xBF" < $string[$i+2]) {
$ret[] = $substitute;
$i += 1;
} else {
$ret[] = substr($string, $i, 3);
$i += 2;
}
}
} elseif ("\xF0" <= $string[$i] && $string[$i] <= "\xF4") {
$left = "\xF0" === $string[$i] ? "\x90" : "\x80";
$right = "\xF4" === $string[$i] ? "\x8F" : "\xBF";
if (!isset($string[$i+1])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+1] < $left || $right < $string[$i+1]) {
$ret[] = $substitute;
} else {
if (!isset($string[$i+2])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+2] < "\x80" || "\xBF" < $string[$i+2]) {
$ret[] = $substitute;
$i += 1;
} else {
if (!isset($string[$i+3])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+3] < "\x80" || "\xBF" < $string[$i+3]) {
$ret[] = $substitute;
$i += 2;
} else {
$ret[] = substr($string, $i, 4);
$i += 3;
}
}
}
} else {
$ret[] = $substitute;
}
}
return $ret;
}
The result of benchmark between these functions is here.
grapheme
0.12967610359192
IntlBreakIterator::createCharacterInstance
0.17032408714294
IntlBreakIterator::createCodePointInstance
0.079245090484619
mbstring
0.081080913543701
preg_split
0.043133974075317
htmlspecialchars
0.25599694252014
byte maniplulation
0.13132810592651
The benchmark code is here.
$string = '主楼怎么走';
foreach (timer([
'grapheme' => 'str_to_array',
'IntlBreakIterator::createCharacterInstance' => 'str_to_array2',
'IntlBreakIterator::createCodePointInstance' => 'str_to_array3',
'mbstring' => 'str_to_array4',
'preg_split' => 'str_to_array5',
'htmlspecialchars' => 'str_to_array6',
'byte maniplulation' => 'str_to_array7'
],
[$string]) as $desc => $time) {
echo $desc, PHP_EOL,
$time, PHP_EOL;
}
function timer(array $callables, array $arguments, $repeat = 10000) {
$ret = [];
$save = $repeat;
foreach ($callables as $key => $callable) {
$start = microtime(true);
do {
array_map($callable, $arguments);
} while($repeat -= 1);
$stop = microtime(true);
$ret[$key] = $stop - $start;
$repeat = $save;
}
return $ret;
}

The documentation of str_split() says
Note:
str_split() will split into bytes, rather than characters when dealing with a multi-byte encoded string.
PHP 7.4.0 adds a new function mb_str_split(). Use this instead.

Assuming you have set the desired encoding and regular expression encoding for the MB functions (such as to UTF-8), you could use my method from my String class library.
/**
* Splits a string into pieces (on whitespace by default).
^
* #param string $pattern
* #param string $target
* #param int $limit
* #return array
*/
public function split(string $target, string $pattern = '\s+', int $limit = -1): array
{
return mb_split($pattern, $target, $limit);
}
By wrapping the mb_split() function in a method, I make it much easier to use. Simply invoke it with the desired value for the variable $pattern.
Remember, set the character encoding appropriately for your task.
mb_internal_encoding('UTF-8'); // For example.
mb_regex_encoding('UTF-8'); // For example.
In the case of my wrapper method, supply the empty string to the method like so.
$string = new String('UTF-8', 'UTF-8'); // Sets the internal and regex encodings.
$string->split($yourString, "")
In the direct PHP case ...
$characters = mb_split("", $string);

Related

How to return a sqrt result in a PHP public function [duplicate]

Trying to split this string "主楼怎么走" into separate characters (I need an array) using mb_split with no luck... Any suggestions?
Thank you!

try a regular expression with 'u' option, for example
$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

An ugly way to do it is:
mb_internal_encoding("UTF-8"); // this IS A MUST!! PHP has trouble with multibyte
// when no internal encoding is set!
$string = ".....";
$chars = array();
for ($i = 0; $i < mb_strlen($string); $i++ ) {
$chars[] = mb_substr($string, $i, 1); // only one char to go to the array
}
You should also try your way with mb_split with setting the internal_encoding before it.

You can use grapheme functions (PHP 5.3 or intl 1.0) and IntlBreakIterator (PHP 5.5 or intl 3.0). The following code shows the diffrence among intl and mbstring and PCRE functions.
// http://www.php.net/manual/function.grapheme-strlen.php
$string = "a\xCC\x8A" // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5)
."o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS' (U+00F6)
$expected = ["a\xCC\x8A", "o\xCC\x88"];
$expected2 = ["a", "\xCC\x8A", "o", "\xCC\x88"];
var_dump(
$expected === str_to_array($string),
$expected === str_to_array2($string),
$expected2 === str_to_array3($string),
$expected2 === str_to_array4($string),
$expected2 === str_to_array5($string)
);
function str_to_array($string)
{
$length = grapheme_strlen($string);
$ret = [];
for ($i = 0; $i < $length; $i += 1) {
$ret[] = grapheme_substr($string, $i, 1);
}
return $ret;
}
function str_to_array2($string)
{
$it = IntlBreakIterator::createCharacterInstance('en_US');
$it->setText($string);
$ret = [];
$prev = 0;
foreach ($it as $pos) {
$char = substr($string, $prev, $pos - $prev);
if ('' !== $char) {
$ret[] = $char;
}
$prev = $pos;
}
return $ret;
}
function str_to_array3($string)
{
$it = IntlBreakIterator::createCodePointInstance();
$it->setText($string);
$ret = [];
$prev = 0;
foreach ($it as $pos) {
$char = substr($string, $prev, $pos - $prev);
if ('' !== $char) {
$ret[] = $char;
}
$prev = $pos;
}
return $ret;
}
function str_to_array4($string)
{
$length = mb_strlen($string, "UTF-8");
$ret = [];
for ($i = 0; $i < $length; $i += 1) {
$ret[] = mb_substr($string, $i, 1, "UTF-8");
}
return $ret;
}
function str_to_array5($string) {
return preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
}
When working on production environment, you need to replace invalid byte sequence with the substitute character since almost all grapheme and mbstring functions can't handle invalid byte sequence. If you have an interest, see my past answer: https://stackoverflow.com/a/13695364/531320
If you don't take of perfomance, htmlspecialchars and htmlspecialchars_decode can be used. The merit of this way is supporting various encoding other than UTF-8.
function str_to_array6($string, $encoding = 'UTF-8')
{
$ret = [];
str_replace_callback($string, function($char, $index) use (&$ret) { $ret[] = $char; return ''; }, $encoding);
return $ret;
}
function str_replace_callback($string, $callable, $encoding = 'UTF-8')
{
$str_size = strlen($string);
$string = str_scrub($string, $encoding);
$ret = '';
$char = '';
$index = 0;
for ($pos = 0; $pos < $str_size; ++$pos) {
$char .= $string[$pos];
if (str_check_encoding($char, $encoding)) {
$ret .= $callable($char, $index);
$char = '';
++$index;
}
}
return $ret;
}
function str_check_encoding($string, $encoding = 'UTF-8')
{
$string = (string) $string;
return $string === htmlspecialchars_decode(htmlspecialchars($string, ENT_QUOTES, $encoding));
}
function str_scrub($string, $encoding = 'UTF-8')
{
return htmlspecialchars_decode(htmlspecialchars($string, ENT_SUBSTITUTE, $encoding));
}
If you want to learn the specification of UTF-8, the byte manipulation is the good way to practice.
function str_to_array6($string)
{
// REPLACEMENT CHARACTER (U+FFFD)
$substitute = "\xEF\xBF\xBD";
$size = strlen($string);
$ret = [];
for ($i = 0; $i < $size; $i += 1) {
if ($string[$i] <= "\x7F") {
$ret[] = $string[$i];
} elseif ("\xC2" <= $string[$i] && $string[$i] <= "\xDF") {
if (!isset($string[$i+1])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+1] < "\x80" || "\xBF" < $string[$i+1]) {
$ret[] = $substitute;
} else {
$ret[] = substr($string, $i, 2);
$i += 1;
}
} elseif ("\xE0" <= $string[$i] && $string[$i] <= "\xEF") {
$left = "\xE0" === $string[$i] ? "\xA0" : "\x80";
$right = "\xED" === $string[$i] ? "\x9F" : "\xBF";
if (!isset($string[$i+1])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+1] < $left || $right < $string[$i+1]) {
$ret[] = $substitute;
} else {
if (!isset($string[$i+2])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+2] < "\x80" || "\xBF" < $string[$i+2]) {
$ret[] = $substitute;
$i += 1;
} else {
$ret[] = substr($string, $i, 3);
$i += 2;
}
}
} elseif ("\xF0" <= $string[$i] && $string[$i] <= "\xF4") {
$left = "\xF0" === $string[$i] ? "\x90" : "\x80";
$right = "\xF4" === $string[$i] ? "\x8F" : "\xBF";
if (!isset($string[$i+1])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+1] < $left || $right < $string[$i+1]) {
$ret[] = $substitute;
} else {
if (!isset($string[$i+2])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+2] < "\x80" || "\xBF" < $string[$i+2]) {
$ret[] = $substitute;
$i += 1;
} else {
if (!isset($string[$i+3])) {
$ret[] = $substitute;
return $ret;
} elseif ($string[$i+3] < "\x80" || "\xBF" < $string[$i+3]) {
$ret[] = $substitute;
$i += 2;
} else {
$ret[] = substr($string, $i, 4);
$i += 3;
}
}
}
} else {
$ret[] = $substitute;
}
}
return $ret;
}
The result of benchmark between these functions is here.
grapheme
0.12967610359192
IntlBreakIterator::createCharacterInstance
0.17032408714294
IntlBreakIterator::createCodePointInstance
0.079245090484619
mbstring
0.081080913543701
preg_split
0.043133974075317
htmlspecialchars
0.25599694252014
byte maniplulation
0.13132810592651
The benchmark code is here.
$string = '主楼怎么走';
foreach (timer([
'grapheme' => 'str_to_array',
'IntlBreakIterator::createCharacterInstance' => 'str_to_array2',
'IntlBreakIterator::createCodePointInstance' => 'str_to_array3',
'mbstring' => 'str_to_array4',
'preg_split' => 'str_to_array5',
'htmlspecialchars' => 'str_to_array6',
'byte maniplulation' => 'str_to_array7'
],
[$string]) as $desc => $time) {
echo $desc, PHP_EOL,
$time, PHP_EOL;
}
function timer(array $callables, array $arguments, $repeat = 10000) {
$ret = [];
$save = $repeat;
foreach ($callables as $key => $callable) {
$start = microtime(true);
do {
array_map($callable, $arguments);
} while($repeat -= 1);
$stop = microtime(true);
$ret[$key] = $stop - $start;
$repeat = $save;
}
return $ret;
}

The documentation of str_split() says
Note:
str_split() will split into bytes, rather than characters when dealing with a multi-byte encoded string.
PHP 7.4.0 adds a new function mb_str_split(). Use this instead.

Assuming you have set the desired encoding and regular expression encoding for the MB functions (such as to UTF-8), you could use my method from my String class library.
/**
* Splits a string into pieces (on whitespace by default).
^
* #param string $pattern
* #param string $target
* #param int $limit
* #return array
*/
public function split(string $target, string $pattern = '\s+', int $limit = -1): array
{
return mb_split($pattern, $target, $limit);
}
By wrapping the mb_split() function in a method, I make it much easier to use. Simply invoke it with the desired value for the variable $pattern.
Remember, set the character encoding appropriately for your task.
mb_internal_encoding('UTF-8'); // For example.
mb_regex_encoding('UTF-8'); // For example.
In the case of my wrapper method, supply the empty string to the method like so.
$string = new String('UTF-8', 'UTF-8'); // Sets the internal and regex encodings.
$string->split($yourString, "")
In the direct PHP case ...
$characters = mb_split("", $string);

How can I split the string in php?

I have a string like $str.
$str = "00016cam 321254300022cam 321254312315300020cam 32125433153";
I want to split it in array like this. The numbers before 'cam' is the string length.
$splitArray = ["00016cam 3212543", "00022cam 3212543123153", "00020cam 32125433153"]
I have tried following code:
$lengtharray = array();
while ($str != null)
{
$sublength = substr($str, $star, $end);
$star += (int)$sublength; // echo $star."<br>"; // echo $sublength."<br>";
if($star == $total)
{
exit;
}
else
{
}
array_push($lengtharray, $star); // echo
print_r($lengtharray);
}

You can try this 1 line solution
$str = explode('**', preg_replace('/\**cam/', 'cam', $str)) ;

If your string doesn't contain stars then I'm afraid you need to write a simple parser that will:
take characters from the left until it's not numeric
do substr having the length
repeat previous steps on not consumed string
<?php
$input = "00016cam 321254300022cam 321254312315300020cam 32125433153";
function parseArray(string $input)
{
$result = [];
while ($parsed = parseItem($input)) {
$input = $parsed['rest'];
$result[] = $parsed['item'];
}
return $result;
}
function parseItem(string $input)
{
$sLen = strlen($input);
$len = '';
$pos = 0;
while ($pos < $sLen && is_numeric($input[$pos])) {
$len .= $input[$pos];
$pos++;
}
if ((int) $len == 0) {
return null;
}
return [
'rest' => substr($input, $len),
'item' => substr($input, 0, $len)
];
}
var_dump(parseArray($input));

this code works for me. hope helps.
$str = "**00016**cam 3212543**00022**cam 3212543123153**00020**cam 32125433153";
$arr = explode("**", $str);
for ($i=1; $i < sizeof($arr); $i=$i+2)
$arr_final[]=$arr[$i].$arr[$i+1];

String compression in php

Here is my input
aaaabbaaaababbbcccccccccccc
And this is my expected output
a4b2a4b1a1b3c12
I tried like doing foreach and then concating the count of values. It seems like brutforcing. Is there any way to do it efficiently in php .
Help pls

You can use regular expression to get the result
preg_match_all('/(.)\1*/', $str, $m, PREG_SET_ORDER);
$m = array_map(function($i) { return $i[1] . strlen($i[0]); } , $m);
echo implode('', $m); // a4b2a4b1a1b3c12
demo

Here's an example of how to do it with a few for loops (encoding and decoding):
$input = 'aaaabbaaaababbbcccccccccccc';
$encoded = SillyEncoding::encode($input);
$decoded = SillyEncoding::decode($encoded);
echo "input = \t", var_export($input, true), "\n";
echo "encoded = \t", var_export($encoded, true), "\n";
echo "decoded = \t", var_export($decoded, true), "\n";
Output:
input = 'aaaabbaaaababbbcccccccccccc'
encoded = 'a4b2a4b1a1b3c12'
decoded = 'aaaabbaaaababbbcccccccccccc'
The SillyEncoding class:
class SillyEncoding
{
private static $digits = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
static function encode($string)
{
$output = '';
if (strlen($string) > 0) {
$count = 0;
$char = $string[0];
for ($i = 0; isset($string[$i]); ++$i) {
if (isset(self::$digits[$string[$i]])) {
throw new \InvalidArgumentException(sprintf('The input string must not contain a digit at offset %d, got "%s"', $i, $string[$i]));
}
if ($string[$i] === $char) {
++$count;
} else {
$output .= "{$char}{$count}";
$count = 1;
$char = $string[$i];
}
if (!isset($string[$i + 1])) {
$output .= "{$char}{$count}";
}
}
}
return $output;
}
static function decode($string)
{
$output = '';
$length = strlen($string);
if ($length > 0) {
$char = $string[0];
$count = null;
if ($length < 2) {
throw new \InvalidArgumentException(sprintf('Input string must be empty or at least 2 bytes long, got %d bytes', $length));
}
if (isset(self::$digits[$string[0]])) {
throw new \InvalidArgumentException(sprintf('Input string must not start with a digit, got "%s"', $string[0]));
}
for ($i = 1; isset($string[$i]); ++$i) {
$isDigit = isset(self::$digits[$string[$i]]);
if ($isDigit) {
$count .= $string[$i];
}
if (!$isDigit || !isset($string[$i + 1])) {
if (null === $count) {
throw new \InvalidArgumentException(sprintf('Expected a digit at offset %d, got "%s"', $i, $string[$i]));
}
$count = (int) $count;
for ($j = 0; $j < $count; ++$j) {
$output .= $char;
}
$char = $string[$i];
$count = null;
}
}
}
return $output;
}
}
A few things to note:
this isn't an efficient compression algorithm - it might reduce the size if there are many repeated characters, but if you feed it "normal" text the output will be about twice the size of the input
the input cannot contain any digits whatsoever (OP: "the input will be strictly alphabets")

$str = str_split('aaaabbaaaababbbcccccccccccc');
$count = 0;
$a=$result=$b='';
for ($i=0; $i <= count($str); $i++) {
if($a==$str[$i]){
$count++;
$result = $a.$count;
} else{
if ($count > 0) {
$b .= $result;
}
$count = 1;
$a = $str[$i];
$result = $a.$count;
}
}
print_r($b);
See result

Optimal function to create a random UTF-8 string in PHP? (letter characters only)

I wrote this function that creates a random string of UTF-8 characters. It works well, but the regular expression [^\p{L}] is not filtering all non-letter characters it seems. I can't think of a better way to generate the full range of unicode without non-letter characters.. short of manually searching for and defining the decimal letter ranges between 65 and 65533.
function rand_str($max_length, $min_length = 1, $utf8 = true) {
static $utf8_chars = array();
if ($utf8 && !$utf8_chars) {
for ($i = 1; $i <= 65533; $i++) {
$utf8_chars[] = mb_convert_encoding("&#$i;", 'UTF-8', 'HTML-ENTITIES');
}
$utf8_chars = preg_replace('/[^\p{L}]/u', '', $utf8_chars);
foreach ($utf8_chars as $i => $char) {
if (trim($utf8_chars[$i])) {
$chars[] = $char;
}
}
$utf8_chars = $chars;
}
$chars = $utf8 ? $utf8_chars : str_split('abcdefghijklmnopqrstuvwxyz');
$num_chars = count($chars);
$string = '';
$length = mt_rand($min_length, $max_length);
for ($i = 0; $i < $length; $i++) {
$string .= $chars[mt_rand(1, $num_chars) - 1];
}
return $string;
}

\p{L} might be catching too much. Try to limit to {Ll} and {LU} -- {L} includes {Lo} -- others.

With PHP7 and IntlChar there is now a better way:
function utf8_random_string(int $length) : string {
$r = "";
for ($i = 0; $i < $length; $i++) {
$codePoint = mt_rand(0x80, 0xffff);
$char = \IntlChar::chr($codePoint);
if ($char !== null && \IntlChar::isprint($char)) {
$r .= $char;
} else {
$i--;
}
}
return $r;
}

PHP functions question

I'm fairly new to PHP functions I really dont know what the bottom functions do, can some one give an explanation or working example explaining the functions below. Thanks.
PHP functions.
function mbStringToArray ($str) {
if (empty($str)) return false;
$len = mb_strlen($str);
$array = array();
for ($i = 0; $i < $len; $i++) {
$array[] = mb_substr($str, $i, 1);
}
return $array;
}
function mb_chunk_split($str, $len, $glue) {
if (empty($str)) return false;
$array = mbStringToArray ($str);
$n = 0;
$new = '';
foreach ($array as $char) {
if ($n < $len) $new .= $char;
elseif ($n == $len) {
$new .= $glue . $char;
$n = 0;
}
$n++;
}
return $new;
}

The first function takes a multibyte string and converts it into an array of characters, returning the array.
The second function takes a multibyte string and inserts the $glue string every $len characters.

function mbStringToArray ($str) { // $str is a function argument
if (empty($str)) return false; // empty() checks if the argument is not equal to NULL (but does exist)
$len = mb_strlen($str); // returns the length of a multibyte string (ie UTF-8)
$array = array(); // init of an array
for ($i = 0; $i < $len; $i++) { // self explanatory
$array[] = mb_substr($str, $i, 1); // mb_substr() substitutes from $str one char for each pass
}
return $array; // returns the result as an array
}
That should help you to understand the second function

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

PHP: Split multibyte string (word) into separate characters - php

Trying to split this string "主楼怎么走" into separate characters (I need an array) using mb_split with no luck... Any suggestions? Thank you!

try a regular expression with 'u' option, for example $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

The documentation of str_split() says Note: str_split() will split into bytes, rather than characters when dealing with a multi-byte encoded string. PHP 7.4.0 adds a new function mb_str_split(). Use this instead.

Related

How to return a sqrt result in a PHP public function [duplicate]

How can I split the string in php?

String compression in php

Optimal function to create a random UTF-8 string in PHP? (letter characters only)

PHP functions question

Categories

Resources