Count number of letters-numbers in an UTF8 string?


Basically, I have a string such as $str = 'По, своей 12' and I need something that returns the number of letters and digits in it (ignoring spaces and punctuation).
How can I achieve this? Maybe with preg_replace using \p{L} and [0-9]?
countChars('По, своей 12'); //> Should return: 9
Note: mb_strlen() counts spaces and punctuation too, and I don't want that.

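I did it:
function countChars($haystack) {
    // Remove everything that is not a Unicode letter or an ASCII digit,
    // then count the characters that are left.
    $count = preg_replace('/[^\p{L}0-9]/u', '', $haystack);
    return mb_strlen($count, 'UTF-8');
}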
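As a possible variant of the answer above, the matches can be counted directly instead of deleting everything else first. This is only a sketch: the function name countLettersAndDigits is made up for illustration, and it assumes that decimal digits from any script (\p{Nd}) should count, which is slightly broader than the 0-9 used above.

```php
<?php
// Hypothetical helper (not from the original answer):
// counts Unicode letters and decimal digits in a UTF-8 string.
function countLettersAndDigits(string $text): int
{
    // preg_match_all() returns the number of matches,
    // or false on error (for example, invalid UTF-8 input).
    $count = preg_match_all('/[\p{L}\p{Nd}]/u', $text);

    return $count === false ? 0 : $count;
}

echo countLettersAndDigits('По, своей 12'); // 9
```

Counting matches avoids building an intermediate string, and it does not depend on the mbstring extension.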

Related

convert ASCII value to a character based on specific char in php

I'd like to ask a dumb question.
How can I convert #48#49#50, using the # character as the separator?
I understand that in PHP the chr() function converts an ASCII value to a character.
48 is 0.
49 is 1.
And 50 is 2.
How can I convert #48#49#50 to 012 and store 012 in one variable?
E.g. $num = "012"

We can try using preg_replace_callback() here with the regex pattern #(\d+). For each match, the callback receives the captured digits in $m[1] and passes them to chr() to produce the replacement character.
$input = "#48#49#50";
$out = preg_replace_callback(
    "/#(\d+)/",
    function ($m) { return chr((int) $m[1]); },
    $input
);
echo $out; // 012
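For comparison, the same conversion can be sketched without a regex by splitting on # and converting each piece. The helper name decodeHashCodes is an assumption made for this example. Note that the result is the string "012", not a number.

```php
<?php
// Hypothetical helper: turns "#48#49#50" into "012" without a regex.
function decodeHashCodes(string $input): string
{
    $out = '';
    // explode() on "#48#49#50" yields ["", "48", "49", "50"];
    // the empty first piece is skipped.
    foreach (explode('#', $input) as $code) {
        if ($code !== '') {
            $out .= chr((int) $code);
        }
    }

    return $out;
}

$num = decodeHashCodes('#48#49#50');
var_dump($num); // string(3) "012"
```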

Separate Unicode and ASCII Characters with White Space in PHP

I'm writing a class to handle Sinhala Unicode text in PHP. I want to split a string that mixes Unicode and ASCII characters into separate words, delimited by white space.
Example:
$inputstr = "ලංකාABCDE TEST1දිස්ත්‍රික් වාණිජ්‍යTEMP මණ්ඩලය # MNOPQ";
function separatestring($inputstr)
{
//do some code
return $inputstr;
}
echo separatestring($inputstr);
//OUTPUT String = ලංකා ABCDE TEST1 දිස්ත්‍රික් වාණිජ්‍ය TEMP මණ්ඩලය # MNOPQ
I have tried preg_replace with a regex and several looping methods, but none of them worked. Please help me with this. Thanks, all!

This works for me:
$inputstr = "ලංකාABCDE TEST1දිස්ත්‍රික් වාණිජ්‍යTEMP මණ්ඩලය # MNOPQ";
function separatestring($inputstr)
{
$re = '#\s+|(?<=[^\x20-\x7f])(?=[\x20-\x7f])'
. '|(?<=[\x20-\x7f])(?=[^\x20-\x7f])#';
$array = preg_split($re, $inputstr);
return array_filter($array);
}
echo implode(" ", separatestring($inputstr));
//OUTPUT String = ලංකා ABCDE TEST1 දිස්ත්‍රික් වාණිජ්‍ය TEMP මණ්ඩලය # MNOPQ
The regexp for splitting means the following:
#: start of the regexp (delimiter character),
\s+: split on one or more whitespace characters (the whitespace is consumed as the separator),
|: or,
(?<=[^\x20-\x7f])(?=[\x20-\x7f]): split on a border between non-ASCII and ASCII characters (nothing is consumed as a separator),
|: or,
(?<=[\x20-\x7f])(?=[^\x20-\x7f]): split on a border between ASCII and non-ASCII characters (nothing is consumed as a separator),
#: end of the regexp (delimiter character).
Unfortunately, my regular expression is not too elegant, so empty strings are sometimes returned (because a space is also an ASCII character, the border between a space and a non-ASCII character produces an extra split). I've added array_filter to fix this, but a more elegant solution might exist.
I've written separatestring in such a way that it returns an array. If you want a string, replace the return statement this way:
return implode(" ", array_filter($array));
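One way the empty strings might be avoided is the PREG_SPLIT_NO_EMPTY flag of preg_split(), combined with a character class that leaves the space out of the ASCII side. The sketch below is an assumption-laden variant, not the answer's own code: the name separateScripts is made up, and it uses the u flag so that the classes match whole code points. Unlike array_filter() without a callback, it also keeps a token that is just "0".

```php
<?php
// Hypothetical variant of separatestring() that returns a string.
function separateScripts(string $input): string
{
    $re = '/\s+'
        . '|(?<=[^\x00-\x7F])(?=[\x21-\x7E])'    // non-ASCII, then printable ASCII
        . '|(?<=[\x21-\x7E])(?=[^\x00-\x7F])/u'; // printable ASCII, then non-ASCII

    // PREG_SPLIT_NO_EMPTY drops empty pieces (e.g. from leading whitespace).
    $parts = preg_split($re, $input, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        return $input; // e.g. invalid UTF-8: leave the input untouched
    }

    return implode(' ', $parts);
}

echo separateScripts('абвABC 12где # x'); // абв ABC 12 где # x
```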

PHP: str_word_count(åäöåäöåäöåäö) returns the integer value of 12

I am using special symbols such as å ä ö on my website which measures the lengths of different texts. Thing is, I have noticed that PHP counts the symbols "å" "ä" "ö" as 1 word each. So åäö counts as 3 words, and åäöåäöåäöåäöåäö counts as 15 words. Well this is clearly not correct and I cannot find an answer to this problem anywhere. I'd be thankful for a useful answer, thank you!

If there's a limited set of word characters that you need to take into account, just supply those into str_word_count with its third param (charlist):
$charlist = 'åäö';
echo str_word_count('åäöåäöåäöåäöåäö', 0, $charlist); // 1
Alternatively, you can write your own Unicode-ready str_word_count function. One possible approach is to match every word-like substring with a regular expression and count the matches:
function mb_str_word_count($str) {
return preg_match_all('#[\p{L}\p{N}][\p{L}\p{N}\'-]*#u', $str);
}
Basically, this function counts all the substrings in the target string that start with either Letter or Number character, followed by any number (incl. zero) of Letters, Numbers, hyphens and single quote symbols (matching the description given in str_word_count() docs).
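If the words themselves are needed rather than just the count (roughly what str_word_count() does with format 1), the same pattern can be reused with a matches array. This is a sketch; the name unicodeWordList is hypothetical.

```php
<?php
// Hypothetical companion to mb_str_word_count(): returns the matched words.
function unicodeWordList(string $str): array
{
    preg_match_all('#[\p{L}\p{N}][\p{L}\p{N}\'-]*#u', $str, $matches);

    // $matches[0] holds the full matches; fall back to an empty list on error.
    return $matches[0] ?? [];
}

print_r(unicodeWordList("åäö don't stop-now 42"));
// The list contains: åäö, don't, stop-now, 42
```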

You can try adding
setlocale(LC_ALL, 'en_US.utf8');
before your call to str_word_count,
or roll your own with
// number of spaces + 1 = number of words, assuming single spaces between words
substr_count(trim($str), ' ') + 1;

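This works for me; I hope it's useful.
When using str_word_count() you need to use utf8_decode(utf8_encode(...)).
function cortar($str)
{
    // Fewer than 20 words: return the string unchanged.
    if (20 > $count = str_word_count($str)) {
        return $str;
    }
    else
    {
        // Split into words, treating the listed characters as part of a word.
        $array = str_word_count($str, 1, '.,-0123456789()+=?¿!"<>*ñÑáéíóúÁÉÍÓÚ#|/%$#¡');
        $s = '';
        $c = 0;
        foreach ($array as $e) {
            if (20 > $c) {
                if (19 > $c) {
                    // Words 1 to 19 are followed by a space.
                    $s .= $e . ' ';
                }
                else
                {
                    // The 20th word gets no trailing space.
                    $s .= $e;
                }
            }
            $c += 1;
        }
        // Note: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.
        return utf8_decode(utf8_encode($s));
    }
}
The function returns the first 20 words.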
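A shorter way to get the same kind of truncation might be to split on whitespace with the u flag and slice the result. The sketch below makes a different assumption from the function above: a "word" is any run of non-whitespace characters, so punctuation stays attached to its word. The name limitWords and the $maxWords parameter are made up for this example.

```php
<?php
// Hypothetical helper: keeps at most $maxWords whitespace-separated words.
function limitWords(string $text, int $maxWords = 20): string
{
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // On error, or when the text is already short enough, return it unchanged.
    if ($words === false || count($words) <= $maxWords) {
        return $text;
    }

    return implode(' ', array_slice($words, 0, $maxWords));
}

echo limitWords('ñandú está aquí ahora', 2); // ñandú está
```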

If the string has no line breaks and the words are separated by single spaces, a simple workaround is to trim() the string and then count the spaces.
$string = "Wörk has to be done.";
// 1 space is 2 words, 2 spaces are 3 words etc.
if (substr_count(trim($string), ' ') > 2)
{
    // more than 3 words
    // ...
}

How to trim special chars from string?

I want to remove all non-alphanumeric characters from the left and right ends of a string, leaving the ones in the middle of the string.
I've asked a similar question here, and a good solution is:
$str = preg_replace('/^\W*(.*\w)\W*$/', '$1', $str);
But it also removes characters like ąĄćĆęĘ, and it should not, because they are still letters.
The above example gives:
~~AAA~~ => AAA (OK)
~~AA*AA~~ => AA*AA (OK)
~~ŚAAÓ~~ => AA (BAD)

Make sure you use the u flag (Unicode) in your regex, so that \w and \W treat non-ASCII letters as word characters.
The following works with your input:
$str = preg_replace('/^\W*(.*\w)\W*$/u', '$1', '~~ŚAAÓ~~' );
// str = ŚAAÓ
But this won't work (don't use it):
$str = preg_replace('/^\W*(.*\w)\W*$/', '$1', '~~ŚAAÓ~~' );
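Another way to express the same trim is to strip runs of non-letters and non-digits at each end using Unicode properties. This is a sketch under one assumption that differs from \w: an underscore counts as a character to trim. The name trimNonAlnum is hypothetical.

```php
<?php
// Hypothetical helper: removes non-letter, non-digit characters
// from both ends of a UTF-8 string, leaving the middle untouched.
function trimNonAlnum(string $str): string
{
    $result = preg_replace('/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/u', '', $str);

    // preg_replace() returns null on error (e.g. invalid UTF-8).
    return $result ?? $str;
}

echo trimNonAlnum('~~ŚAAÓ~~'), "\n";  // ŚAAÓ
echo trimNonAlnum('~~AA*AA~~'), "\n"; // AA*AA
```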

You can pass in a list of valid characters and tell the function to remove any character that is not in that list:
$str = preg_replace('/[^a-zA-Z0-9*]+/', '', $str);
The square brackets define a character class. The caret (^) at the start negates it, so the class matches any character that is not listed. We then list our valid characters (lowercase a to z, uppercase A to Z, digits 0 to 9, and an asterisk). The plus sign after the closing bracket means one or more characters. Note that this removes unlisted characters everywhere in the string, not only at its ends.
Edit:
If this is the list of all characters you want to keep, then (with the u flag, so that the multibyte characters in the class are treated as whole characters):
$str = preg_replace('/[^ĄąĆćŻżŹźŃńŁłÓó*]+/u', '', $str);

ONE question regular expression PHP

Count the '_' characters at the start of a line.
Example:
$subject = '_abcd_abc'; // return 1
or
$subject = '__abcd_abc'; // return 2
or
$subject = '___abcd_abc'; // return 3
Can anyone help me?
I use PHP.

strspn() returns the length of the initial segment of a string that consists only of the given characters, so it counts the leading underscores directly (and returns 0 if there are none):
echo strspn('___abcd_abc', '_');
// -> 3
You can also do this without a regex using strlen and ltrim:
strlen($str) - strlen(ltrim($str, "_"));
This takes the string length, then subtracts the length of the string with the leading underscores stripped; the result is the number of leading underscores.
See also: strspn(), ltrim(), strlen()

Try this:
return preg_match('/^_+/', $str, $match) ? strlen($match[0]) : 0;
If preg_match finds a match, $match[0] will contain that match and strlen($match[0]) returns the length of the match; otherwise the expression will return 0.
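To check that the approaches above agree, including on a string with no leading underscore, a small comparison can be sketched as follows. The wrapper name countLeadingUnderscores is made up for this example.

```php
<?php
// Hypothetical wrapper around strspn() for readability.
function countLeadingUnderscores(string $str): int
{
    return strspn($str, '_');
}

foreach (['_abcd_abc', '__abcd_abc', '___abcd_abc', 'abcd_abc'] as $s) {
    $viaLtrim = strlen($s) - strlen(ltrim($s, '_'));
    $viaRegex = preg_match('/^_+/', $s, $m) ? strlen($m[0]) : 0;

    // Prints the string followed by the three counts, which should be equal.
    printf("%s: %d %d %d\n", $s, countLeadingUnderscores($s), $viaLtrim, $viaRegex);
}
```

All three give 1, 2, 3 and 0 for the four inputs; strspn() is the shortest option because it needs no subtraction and no regex engine.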
