How to check if a string starts with "_" in PHP? [duplicate]

This question already has answers here:
startsWith() and endsWith() functions in PHP
(34 answers)
Closed 8 years ago.
Example: I have a $variable = "_foo", and I want to make absolutely sure that $variable does not start with an underscore "_". How can I do that in PHP? Is there some access to the char array behind the string?

$variable[0] != "_"
How does it work?
In PHP you can get a particular character of a string with array index notation. $variable[0] is the first character of the string (if $variable is a string).

You might check out the substr function in PHP and grab the first character that way:
http://php.net/manual/en/function.substr.php
if (substr('_abcdef', 0, 1) === '_') { ... }

Since someone mentioned efficiency, I've benchmarked the functions given so far out of curiosity:
function startsWith1($str, $char) {
return strpos($str, $char) === 0;
}
function startsWith2($str, $char) {
return stripos($str, $char) === 0;
}
function startsWith3($str, $char) {
return substr($str, 0, 1) === $char;
}
function startsWith4($str, $char){
return $str[0] === $char;
}
function startsWith5($str, $char){
return (bool) preg_match('/^' . $char . '/', $str);
}
function startsWith6($str, $char, $encoding = null) {
if (is_null($encoding)) $encoding = mb_internal_encoding();
return mb_substr($str, 0, mb_strlen($char, $encoding), $encoding) === $char;
}
Here are the results on my average dual-core machine with 100,000 runs each:
// Testing '_string'
startsWith1 took 0.385906934738
startsWith2 took 0.457293987274
startsWith3 took 0.412894964218
startsWith4 took 0.366240024567 <-- fastest
startsWith5 took 0.642996072769
startsWith6 took 1.39859509468
// Tested "string"
startsWith1 took 0.384965896606
startsWith2 took 0.445554971695
startsWith3 took 0.42377281189
startsWith4 took 0.373164176941 <-- fastest
startsWith5 took 0.630424022675
startsWith6 took 1.40699005127
// Tested 1000 char random string [a-z0-9]
startsWith1 took 0.430691003799
startsWith2 took 4.447286129
startsWith3 took 0.413349866867
startsWith4 took 0.368592977524 <-- fastest
startsWith5 took 0.627470016479
startsWith6 took 1.40957403183
// Tested 1000 char random string [a-z0-9] with '_' prefix
startsWith1 took 0.384054899216
startsWith2 took 4.41522812843
startsWith3 took 0.408898115158
startsWith4 took 0.363884925842 <-- fastest
startsWith5 took 0.638479948044
startsWith6 took 1.41304707527
As you can see, treating the haystack as an array to read the character at the first position is always the fastest solution. It also performs at the same speed regardless of string length. strpos is faster than substr for short strings but slower for long strings when the string does not start with the prefix; the difference is irrelevant, though. stripos is incredibly slow with long strings. preg_match performs about the same regardless of string length, but is only mediocre in speed. The mb_substr solution performs worst, though it is probably more reliable.
Given that these numbers are for 100,000 runs, it should be obvious that we are talking about 0.0000x seconds per call. Picking one over the other for efficiency is a worthless micro-optimization, unless your app is doing startsWith checking for a living.
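The timing harness itself is not shown above. A minimal sketch of how such a comparison could be run follows. The bench() helper and the $candidates array are hypothetical names, and hrtime() assumes PHP 7.3 or later. Absolute numbers will differ by machine and PHP version.

```php
<?php
// Hypothetical helper: time $runs calls of $fn($str, $char) and return seconds.
function bench(callable $fn, string $str, string $char, int $runs = 100000): float {
    $start = hrtime(true);               // monotonic clock, in nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $fn($str, $char);
    }
    return (hrtime(true) - $start) / 1e9;
}

// Three of the approaches discussed above, wrapped as closures.
$candidates = [
    'strpos' => function ($s, $c) { return strpos($s, $c) === 0; },
    'substr' => function ($s, $c) { return substr($s, 0, 1) === $c; },
    'index'  => function ($s, $c) { return $s !== '' && $s[0] === $c; },
];

foreach ($candidates as $name => $fn) {
    printf("%-6s took %.6f s\n", $name, bench($fn, '_string', '_'));
}
```

Running each candidate several times and comparing the spread gives a better picture than a single run.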

This is the simplest answer if you are not concerned about performance:
if (strpos($string, '_') === 0) {
# code
}
If strpos returns 0 it means that what you were looking for begins at character 0, the start of the string.
It is documented thoroughly here: http://uk3.php.net/manual/en/function.strpos.php
(PS $string[0] === '_' is the best answer)

function starts_with($s, $prefix){
// returns a bool
return strpos($s, $prefix) === 0;
}
starts_with($variable, "_");

Here’s a better starts with function:
function mb_startsWith($str, $prefix, $encoding=null) {
if (is_null($encoding)) $encoding = mb_internal_encoding();
return mb_substr($str, 0, mb_strlen($prefix, $encoding), $encoding) === $prefix;
}

To build on pinusnegra's answer, and in response to Gumbo's comment on that answer:
function has_leading_underscore($string) {
return $string[0] === '_';
}
Running on PHP 5.3.0, the following works and returns the expected value, even without checking if the string is at least 1 character in length:
echo has_leading_underscore('_somestring').', ';
echo has_leading_underscore('somestring').', ';
echo has_leading_underscore('').', ';
echo has_leading_underscore(null).', ';
echo has_leading_underscore(false).', ';
echo has_leading_underscore(0).', ';
echo has_leading_underscore(array('_foo', 'bar'));
/*
* output: true, false, false, false, false, false, false
*/
I don't know how other versions of PHP will react, but if they all work, then this method is probably more efficient than the substr route.
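On other versions, reading offset 0 of an empty string or of a non-string value can raise a notice or warning, and PHP 8.0 added a built-in str_starts_with(). A defensive sketch that covers both cases is shown below. The name has_leading_underscore_safe is made up for illustration.

```php
<?php
// Hypothetical helper: true only for strings whose first character is "_".
function has_leading_underscore_safe($value): bool {
    if (!is_string($value) || $value === '') {
        return false;                    // no offset access on '', null, arrays, ...
    }
    if (function_exists('str_starts_with')) {
        return str_starts_with($value, '_');   // built in since PHP 8.0
    }
    return $value[0] === '_';            // fallback for older versions
}

var_dump(has_leading_underscore_safe('_somestring')); // bool(true)
var_dump(has_leading_underscore_safe('somestring'));  // bool(false)
var_dump(has_leading_underscore_safe(''));            // bool(false)
var_dump(has_leading_underscore_safe(null));          // bool(false)
```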

Related

Compressing string using ASCII char codes using backtracking in php

I want to compress a string using the ASCII codes of its characters.
I want to compress them using number patterns. Because ASCII codes are numbers, I want to find sub-patterns in the list of ASCII char codes.
Theory
This will be the format for every pattern I find:
[nnn][n][nn], where:
[nnn] is the ASCII code of the first char in a group of numbers that follow the same pattern.
[n] is a custom number for a certain pattern/rule (I will explain more below).
[nn] shows how many times this rule applies.
The number patterns are not fixed yet. But let me give you some examples:
same char
linear growth (every ASCII code is greater by one than the previous)
linear decrease (every ASCII code is smaller by one than the previous)
Now let's see some situations:
"adeflk" becomes "097.1.01-100.2.03-108.3.02"
same char once, linear growth three times, linear decrease twice.
"rrrrrrrrrrr" becomes "114.1.11"
same char eleven times.
"tsrqpozh" becomes "116.3.06-122.1.01-104.1.01"
linear decrease six times, same char once, same char once.
I added dots ('.') and dashes ('-') so you can see them easily.
Indeed, these examples do not show good compression. I want to use this algorithm for large strings, and by adding more rules (number patterns) we increase the chances of producing a result shorter than the original.
I know the existing compression solutions. I want this one because the result contains only digits, which helps me.
What I've tried
// recursive function
function run (string $data, array &$rules): string {
if (strlen($data) == 1) {
// str_pad for having always ASCII code with 3 digits
return (str_pad(ord($data), 3, '0', STR_PAD_LEFT) .'.'. '1' .'.'. '01');
}
$ord = ord($data); // first char
$strlen = strlen($data);
$nr = str_pad($ord, 3, '0', STR_PAD_LEFT); // str_pad for having always ASCII code with 3 digits
$result = '';
// compares every rule
foreach ($rules as $key => $rule) {
for ($i = 1; $i < $strlen; $i++) {
// check for how many times this rule matches
if (!$rule($ord, $data, $i)) {
// save the shortest result (so we can compress)
if (strlen($r = ($nr .'.'. $key .'.'. $i .' - '. run(substr($data, $i), $rules))) < strlen($result)
|| !$result) {
$result = $r;
}
continue 2; // we are going to next rule
}
}
// if comes here, it means entire $data follow this rule ($key)
if (strlen($r = (($nr .'.'. $key .'.'. $i))) < strlen($result)
|| !$result) {
$result = $r; // entire data follow this $rule
}
}
return $result; // it will return the shortest result it got
}
// ASCII compressor
function compress (string $data): string {
$rules = array( // ASCII rules
1 => function (int $ord, string $data, int $i): bool { // same char
return ($ord == ord($data[$i]));
},
2 => function (int $ord, string $data, int $i): bool { // linear growth
return (($ord+$i) == ord($data[$i]));
},
3 => function (int $ord, string $data, int $i): bool { // progressive growth
return ((ord($data[$i-1])+$i) == ord($data[$i]));
},
4 => function (int $ord, string $data, int $i): bool { // linear decrease
return (($ord-$i) == ord($data[$i]));
},
5 => function (int $ord, string $data, int $i): bool { // progressive decrease
return ((ord($data[$i-1])-$i) == ord($data[$i]));
}
);
// we use base64_encode because we want only ASCII chars
return run(base64_encode($data), $rules);
}
I added dots ('.') and dashes ('-') only to make testing easier.
Results
compress("ana ar") => "089.1.1 - 087.1.1 - 053.1.1 - 104.1.1 - 073.1.1 - 071.4.2 - 121.1.01"
Which is OK, and it runs fast without a problem.
compress("ana aros") => Fatal error: Maximum execution time of 15 seconds exceeded
If the string is a bit longer, it takes far too long. It runs fast and normally for 1-7 chars, but with more chars in the string, the timeout above happens.
The algorithm is not perfect yet and does not return the exact 6-digit pattern, indeed. I'm stuck on the performance problem before I can get there.
Question
How can I improve the performance of this backtracking so that it runs acceptably now and also with more rules?

Searching for gradients / infix repetitions is not a good match for compressing a natural language. Natural language is significantly easier to compress using a dictionary based approach (both dynamic dictionaries bundled with the compressed data, as well as pre-compiled dictionaries trained on a reference set work), as even repeating sequences in ASCII encoding usually don't follow any trivial geometric pattern, but appear quite random when observing only the individual characters ordinal representations.
That said, the reason your algorithm is so slow, is because you are exploring all possible patterns, which results in a run time exponential in the input length, precisely O(5^n). For your self-set goal of finding the ideal compression in a set of 5 arbitrary rules, that's already as good as possible. If anything, you can only reduce the run time complexity by a constant factor, but you can't get rid of the exponential run time. In other terms, even if you apply perfect optimizations, that only makes the difference of increasing the maximum input length you can handle by maybe 30-50%, before you inevitably run into timeouts again.
@noam's solution doesn't even attempt to find the ideal pattern, but simply greedily uses the first matching pattern to consume the input. As a result it will incorrectly ignore better matches, but in return it only has to look at each input character once, resulting in a linear run time complexity O(n).
Of course there are some details in your current solution which make it a lot easier to solve, just based on simple observations about your rules. Be wary though that these assumptions will break when you try to add more rules.
Specifically, you can avoid most of the backtracking if you are smart about the order in which you try your rules:
Try to start a new geometric pattern ord(n[i])=ord(n[0])+i first, and accept as match only when it matched at least 3 characters ahead.
Try to continue current geometric pattern.
Try to continue current gradient pattern.
Try to start new gradient ord(n[i])=ord(n[0])+i, and accept as match only when it matched at least 2 characters ahead.
Try to start / continue simple repetition last, and always accept.
Once a character from the input has been accepted by any of these rules (meaning it has been consumed by a sequence), you no longer need to backtrack from it or check any other rule for it, as you have already found the best possible representation for it. You still need to re-check the rules for every following character you add to the sequence, as a suffix of the gradient rule may be required as the prefix for a geometric rule.
Generally speaking, the property of your rule set which allows this is that for every rule with a higher priority, no match for that rule can have a better match in any following rule. If you like, you can easily prove that formally for every pair of possible rules you have in your set.
If you want to test your implementation, you should specifically test patterns such as ABDHIK. Even though H is a match for the currently running geometric sequence ABDH, using it as the starting point of the new geometric sequence HIK is unconditionally the better choice.
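The greedy, linear-time idea mentioned above can be sketched for the three rules listed in the question's Theory section (same char, linear growth, linear decrease). This is only an illustration, not the asker's or either answerer's code. The function name greedy_compress is made up, the base64 step from the question is left out, and the cap of 99 per run is an assumption derived from the two-digit count field. Because it always takes the longest run at the current position, it does not guarantee the shortest overall output.

```php
<?php
// Greedy sketch: at each position take the rule with the longest run.
// Rule ids follow the question's Theory section: 1 = same char,
// 2 = linear growth (+1 per char), 3 = linear decrease (-1 per char).
function greedy_compress(string $data): string {
    $steps = [1 => 0, 2 => 1, 3 => -1];  // rule id => difference between neighbours
    $n = strlen($data);
    $parts = [];
    $pos = 0;
    while ($pos < $n) {
        $bestRule = 1;
        $bestLen = 1;
        foreach ($steps as $rule => $step) {
            $len = 1;
            // Extend the run while the next char follows the rule
            // (assumed cap of 99 so the count fits in two digits).
            while ($pos + $len < $n && $len < 99
                && ord($data[$pos + $len]) === ord($data[$pos + $len - 1]) + $step) {
                $len++;
            }
            if ($len > $bestLen) {
                $bestRule = $rule;
                $bestLen = $len;
            }
        }
        $parts[] = sprintf('%03d.%d.%02d', ord($data[$pos]), $bestRule, $bestLen);
        $pos += $bestLen;                // each character is consumed exactly once
    }
    return implode('-', $parts);
}

echo greedy_compress('adeflk'), "\n";   // 097.1.01-100.2.03-108.3.02
echo greedy_compress('tsrqpozh'), "\n"; // 116.3.06-122.1.01-104.1.01
```

For these inputs the output matches the examples given in the question. With more rules, ties and overlapping runs would need the priority ordering discussed above.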

I came up with an initial solution to your problem. Please note:
You will never get a sequence of just one letter, because every 2 consecutive letters are a "linear growth" with a certain difference.
My solution is not very clean. You can, for example, combine $matches and $rules into a single array.
My solution is naive and greedy. For example, in the example adeflk, the sequence def is a sequence of 3, but because my solution is greedy, it will consider ad as a sequence of 2, and ef as another sequence of 2. That being said, you can still improve my code.
The code is hard to test. You should probably make use of OOP and divide the code into many small methods that are easy to test separately.
<?php
function compress($string, $rules, $matches) {
if ($string === '') {
return getBestMatch($matches);
}
$currentCharacter = $string[0];
$matchFound = false;
foreach ($rules as $index => &$rule) {
if ($rule['active']) {
$soFarLength = strlen($matches[$index]);
if ($soFarLength === 0) {
$matchFound = true;
$matches[$index] = $currentCharacter;
} elseif ($rule['callback']($currentCharacter, $matches[$index])) {
$matches[$index] .= $currentCharacter;
$matchFound = true;
} else {
$rule['active'] = false;
}
}
}
if ($matchFound) {
return compress(substr($string, 1), $rules, $matches);
} else {
return getBestMatch($matches) . startNewSequence($string);
}
}
function getBestMatch($matches) {
$rule = -1;
$length = -1;
foreach ($matches as $index => $match) {
if (strlen($match) > $length) {
$length = strlen($match);
$rule = $index;
}
}
if ($length <= 0) {
return '';
}
return ord($matches[$rule][0]) . '.' . $rule . '.' . $length . "\n";
}
function startNewSequence($string) {
$rules = [
// rule number 1 - all characters are the same
1 => [
'active' => true,
'callback' => function ($a, $b) {
return $a === substr($b, -1);
}
],
// rule number 2 - ASCII code of current letter is one more than the last letter ("linear growth")
2 => [
'active' => true,
'callback' => function ($a, $b) {
return ord($a) === (1 + ord(substr($b, -1)));
}
],
// rule number 3 - ASCII code is a geometric progression. The ord() of each character increases with each step.
3 => [
'active' => true,
'callback' => function ($a, $b) {
if (strlen($b) == 1) {
return ord($a) > ord($b);
}
$lastCharOrd = ord(substr($b, -1));
$oneBeforeLastCharOrd = ord(substr($b, -2, 1));
$lastDiff = $lastCharOrd - $oneBeforeLastCharOrd;
$currentOrd = ord($a);
return ($currentOrd - $lastCharOrd) === ($lastDiff + 1);
}
],
// rule number 4 - ASCII code of current letter is one less than the last letter ("linear decrease")
4 => [
'active' => true,
'callback' => function ($a, $b) {
return ord($a) === (ord(substr($b, -1)) - 1);
}
],
// rule number 5 - ASCII code is a negative geometric progression. The ord() of each character decreases by one
// with each step.
5 => [
'active' => true,
'callback' => function ($a, $b) {
if (strlen($b) == 1) {
return ord($a) < ord($b);
}
$lastCharOrd = ord(substr($b, -1));
$oneBeforeLastCharOrd = ord(substr($b, -2, 1));
$lastDiff = $lastCharOrd - $oneBeforeLastCharOrd;
$currentOrd = ord($a);
return ($currentOrd - $lastCharOrd) === ($lastDiff - 1);
}
],
];
$matches = [
1 => '',
2 => '',
3 => '',
4 => '',
5 => '',
];
return compress($string, $rules, $matches);
}
echo startNewSequence('tsrqpozh');

Function based on palindrome using php code [duplicate]

This question already has answers here:
palindrome condition checking using function arguments [closed]
(5 answers)
Closed 9 years ago.
I can not understand these steps.
function Palindrome($str) {
if ((strlen($str) == 1) || (strlen($str) == 0)) {
echo " THIS IS PALINDROME";
}
else {
if (substr($str,0,1) == substr($str,(strlen($str) - 1),1)) {
return Palindrome(substr($str,1,strlen($str) -2));
}
else { echo " THIS IS NOT A PALINDROME"; }
}
}
Palindrome("456");

if ((strlen($str) == 1) || (strlen($str) == 0)) {
echo " THIS IS PALINDROME";
}
If strlen($str) <= 1 this is obviously a palindrome.
else {
if (substr($str,0,1) == substr($str,(strlen($str) - 1),1)) {
return Palindrome(substr($str,1,strlen($str) -2));
}
If strlen($str) > 1 and the first and last characters of the string are equal, call the same Palindrome function on the inner string (that is, the string without its first and last characters).
else { echo " THIS IS NOT A PALINDROME"; }
}
If the first and last characters are not equal, this is not a palindrome.
The principle is to test only the outer characters, and to call the same function again and again on smaller parts of the string, until it has tested every pair of characters that have to be equal if we're dealing with a palindrome.
This is called recursion.

Palindrome("456") gets $str == "456". So, looking at branches:
if ((strlen($str) == 1) || (strlen($str) == 0)) -> false
if (substr($str,0,1) == substr($str,(strlen($str) - 1),1)) is the same as if ("4" == "6"), which is false, so we go to the last branch, outputting that "456" is not a palindrome.
Let's see what would happen for Palindrome("454"), which gets $str == "454". So, looking at branches:
if ((strlen($str) == 1) || (strlen($str) == 0)) -> false
if (substr($str,0,1) == substr($str,(strlen($str) - 1),1)) is the same as if ("4" == "4"), which is true, so we call Palindrome(substr($str,1,strlen($str) -2)), which is the same as Palindrome("5").
Now, inside that function call, we get a new variable $str == "5". Performing the same steps, our first if is true, so we echo that it is a palindrome.
For recursion, it is crucial to remember that each function call has its own local variables. In other words, when you call Palindrome(...) and inside that function call Palindrome(...) is called again, there are two $str variables in memory, one belonging to the first (outer) call and one to the second (inner) call. Of course, each sees only its own, but once you exit the inner call, you have an unchanged $str in the outer call. That's why we had $str == "454" in the first call and $str == "5" in the second. These are named the same, but are two variables existing in memory (until you exit the second (inner) call of Palindrome()).

It's recursive...
So it checks the outermost pair of characters. If they match, it continues with the next pair inward, e.g.
NURSESRUN
Will check:
Are the first and last chars equal? (N=N?)
Yes. Are the second and second from last equal? (U=U?) - by calling itself again. This is recursion.
If it runs into non-equal chars, it quits and reports 'NOT A PALINDROME'.
If it runs out of checks (zero-length string for an even number of chars, string of length 1 for an odd number), it reaches the 'terminating condition' (no more recursion) and reports 'THIS IS A PALINDROME'.
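The recursion described in the answers above can also be written so that it returns a bool instead of echoing, which makes each step easier to follow and to test. The following is a sketch. The name is_palindrome is hypothetical and not part of the original question.

```php
<?php
// Hypothetical variant of Palindrome() that returns a bool instead of echoing.
function is_palindrome(string $str): bool {
    $len = strlen($str);
    if ($len <= 1) {
        return true;                      // terminating condition: nothing left to compare
    }
    if ($str[0] !== $str[$len - 1]) {
        return false;                     // outer characters differ
    }
    // Each call gets its own $str: the inner string without its first and last character.
    return is_palindrome(substr($str, 1, $len - 2));
}

var_dump(is_palindrome('454'));       // bool(true)
var_dump(is_palindrome('456'));       // bool(false)
var_dump(is_palindrome('NURSESRUN')); // bool(true)
```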

Regex expression for matching all duplicate substrings of any length

Let's say we have a string: "abcbcdcde"
I want to identify all substrings that are repeated in this string using regex (i.e. no brute-force iterative loops).
For the above string, the result set would be: {"b", "bc", "c", "cd", "d"}
I must confess that my regex is far more rusty than it should be for someone with my experience. I tried using a backreference, but that'll only match consecutive duplicates. I need to match all duplicates, consecutive or otherwise.
In other words, I want to match any character(s) that appears for the >= 2nd time. If a substring occurs 5 times, then I want to capture each of occurrences 2-5. Make sense?
This is my pathetic attempt thus far:
preg_match_all( '/(.+)(.*)\1+/', $string, $matches ); // Way off!
I tried playing with look-aheads but I'm just butchering it. I'm doing this in PHP (PCRE) but the problem is more or less language-agnostic. It's a bit embarrassing that I'm finding myself stumped on this.

Your problem is recursi ... you know what, forget about recursion! =p it wouldn't really work well in PHP and the algorithm is pretty clear without it as well.
function find_repeating_sequences($s)
{
$res = array();
while ($s) {
$i = 1; $pat = $s[0];
while (false !== strpos($s, $pat, $i)) {
$res[$pat] = 1;
// expand pattern and try again
$pat .= $s[$i++];
}
// move the string forward
$s = substr($s, 1);
}
return array_keys($res);
}
Out of interest, I wrote Tim's answer in PHP as well:
function find_repeating_sequences_re($s)
{
$res = array();
preg_match_all('/(?=(.+).*\1)/', $s, $matches);
foreach ($matches[1] as $match) {
$length = strlen($match);
if ($length > 1) {
for ($i = 0; $i < $length; ++$i) {
for ($j = $i; $j < $length; ++$j) {
$res[substr($match, $i, $j - $i + 1)] = 1;
}
}
} else {
$res[$match] = 1;
}
}
return array_keys($res);
}
I've let them fight it out in a small benchmark of 800 bytes of random data:
$data = base64_encode(openssl_random_pseudo_bytes(600));
Each code is run for 10 rounds and the execution time is measured. The results?
Pure PHP - 0.014s (10 runs)
PCRE - 40.86s <-- ouch!
It gets weirder when you look at 24k bytes (or anything above 1k really):
Pure PHP - 4.565s (10 runs)
PCRE - 0.232s <-- WAT?!
It turns out that the regular expression broke down after 1k characters and so the $matches array was empty. These are my .ini settings:
pcre.backtrack_limit => 1000000 => 1000000
pcre.recursion_limit => 100000 => 100000
It's not clear to me how a backtrack or recursion limit would have been hit after only 1k of characters. But even if those settings are "fixed" somehow, the results are still obvious, PCRE doesn't seem to be the answer.
I suppose writing this in C would speed it up somewhat, but I'm not sure to what degree.
Update
With some help from hakre's answer I put together an improved version that increases performance by ~18% after optimizing the following:
Remove the substr() calls in the outer loop to advance the string pointer; this was a leftover from my previous recursive incarnations.
Use the partial results as a positive cache to skip strpos() calls inside the inner loop.
And here it is, in all its glory (:
function find_repeating_sequences3($s)
{
$res = array();
$p = 0;
$len = strlen($s);
while ($p != $len) {
$pat = $s[$p]; $i = ++$p;
while ($i != $len) {
if (!isset($res[$pat])) {
if (false === strpos($s, $pat, $i)) {
break;
}
$res[$pat] = 1;
}
// expand pattern and try again
$pat .= $s[$i++];
}
}
return array_keys($res);
}

You can't get the required result in a single regex because a regex will match either greedily (finding bc...bc) or lazily (finding b...b and c...c), but never both. (In your case, it does find c...c, but only because c is repeated twice.)
But once you've found a repeated substring of length > 1, it logically follows that all the smaller "substrings of that substring" must also be repeated. If you want to get them spelled out for you, you need to do this separately.
Taking your example (using Python because I don't know PHP):
>>> results = set(m.group(1) for m in re.finditer(r"(?=(.+).*\1)", "abcbcdcde"))
>>> results
{'d', 'cd', 'bc', 'c'}
You could then go and apply the following function to each of your results:
def substrings(s):
    return [s[start:stop] for start in range(len(s))
            for stop in range(start+1, len(s)+1)]
For example:
>>> substrings("123456")
['1', '12', '123', '1234', '12345', '123456', '2', '23', '234', '2345', '23456',
'3', '34', '345', '3456', '4', '45', '456', '5', '56', '6']

The closest I can get is /(?=(.+).*\1)/
The purpose of the lookahead is to allow the same characters to be matched more than once (for instance, c and cd). However, for some reason it doesn't seem to be getting the b...

Interesting question. I basically took the function in Jack's answer and tried to see whether the number of tests could be reduced.
I first tried to search only half the string; however, it turned out that creating the search pattern via substr each time was far too expensive. The way it is done in Jack's answer, appending one character per iteration, looks much better. Then I ran out of time, so I could not look further into it.
However, while looking for such an alternative implementation I at least found out that some of the differences in the algorithm I had in mind could be applied to Jack's function as well:
There is no need to cut the beginning of the string in each outer iteration as the search is already done with offsets.
If the rest of the subject to look for repetition is smaller than the repetition needle, you do not need to search for the needle.
If the needle has already been searched for, you don't need to search for it again.
Note: This is a memory trade-off. If you have many repetitions, you will use a similar amount of memory. However, if you have few repetitions, then this variant uses more memory than before.
The function:
function find_repeating_sequences($string) {
$result = array();
$start = 0;
$max = strlen($string);
while ($start < $max) {
$pat = $string[$start];
$i = ++$start;
while ($max - $i > 0) {
$found = isset($result[$pat]) ? $result[$pat] : false !== strpos($string, $pat, $i);
if (!$result[$pat] = $found) break;
// expand pattern and try again
$pat .= $string[$i++];
}
}
return array_keys(array_filter($result));
}
So just see this as an addition to Jack's answer.

substr return empty string

I have a problem with the $length parameter of the substr function.
My code:
$string='I love stackoverflow.com';
function arabicSubStr($value,$start,$length=false){
return mb_substr($value,$start,$length,'UTF-8');
}
echo arabicSubStr($string,7);//outputs nothing
echo substr($string,7);//outputs stackoverflow.com
The reason for the problem is:
If length is given and is 0, FALSE or NULL an empty string will be returned.
So, how can I fix the problem?
I don't want to use strlen($string).
EDIT
I know the reason is that I've defined $length as false.
What I want to know is what I should put in the $length parameter to avoid this error.
I tried -1, but it returns: stackoverflow.co

Since the reason you're getting an empty string is specified entirely by the content of your question (using 0, FALSE or NULL), I assume you just want a way to get the rest of the string.
In which case, I'd use something like:
function arabicSubStr ($value, $start, $length = -1) {
if ($length == -1)
$length = mb_strlen ($value, 'UTF-8') - $start;
return mb_substr ($value, $start, $length, 'UTF-8');
}
You need to do it this way since there is no sentinel value of length that means "the rest of the string". Positive numbers (and zero) will limit the size to that given, negative numbers will strip off the end of the string (as you show in your question edit).
If you really don't want to use a string length function, you could try a value of 9999 (or even higher) and hope that:
the mb_substr() function will only use it as a maximum value; and
you won't pass in any strings 10K or more.
In other words, something along the lines of:
function arabicSubStr ($value, $start, $length = 9999){
return mb_substr ($value, $start, $length, 'UTF-8');
}
Keep in mind, though, that I haven't tested that; I don't have a PHP environment at my current location.

It's because you have $length set to false as the default parameter for your function, which effectively means you want it to return a substring of 0 length.
Unfortunately, if you have to set the final parameter (the charset) which I imagine you do, then you have to calculate the length of the string first, so something like:
function arabicSubStr($value,$start,$length=false){
$length = ($length) ? $length : mb_strlen($value,'UTF-8') - $start;
return mb_substr($value,$start,$length,'UTF-8');
}
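For newer PHP versions there may be a shorter route: according to the PHP manual, since PHP 5.4.8 mb_substr() treats a null length as "up to the end of the string" (earlier versions treated null like 0, which is the behaviour quoted in the question). Assuming such a version and the mbstring extension, a sketch could look like this:

```php
<?php
// Assumes PHP >= 5.4.8, where a null $length means "up to the end of the string".
function arabicSubStr($value, $start, $length = null) {
    return mb_substr($value, $start, $length, 'UTF-8');
}

echo arabicSubStr('I love stackoverflow.com', 7), "\n";    // stackoverflow.com
echo arabicSubStr('I love stackoverflow.com', 7, 5), "\n"; // stack
```

Only the default value changes compared with the question's function; callers that pass an explicit length are unaffected.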

simplest, shortest way to count capital letters in a string with php?

I am looking for the shortest, simplest and most elegant way to count the number of capital letters in a given string.

function count_capitals($s) {
return mb_strlen(preg_replace('![^A-Z]+!', '', $s));
}

$str = "AbCdE";
preg_match_all("/[A-Z]/", $str); // 3

George Garchagudashvili's solution is amazing, but it fails if the lower case letters contain diacritics or accents.
So I made a small fix to improve his version, so that it also works with accented lower case letters:
public static function countCapitalLetters($string){
$lowerCase = mb_strtolower($string);
return strlen($lowerCase) - similar_text($string, $lowerCase);
}
You can find this method and lots of other common string operations in the TurboCommons library:
https://github.com/edertone/TurboCommons/blob/70a9de1737d8c10e0f6db04f5eab0f9c4cbd454f/TurboCommons-Php/src/main/php/utils/StringUtils.php#L373
EDIT 2019
The method to count capital letters in turbocommons has evolved to a method that can count upper case and lower case characters on any string. You can check it here:
https://github.com/edertone/TurboCommons/blob/1e230446593b13a272b1d6a2903741598bb11bf2/TurboCommons-Php/src/main/php/utils/StringUtils.php#L391
Read more info here:
https://turbocommons.org/en/blog/2019-10-15/count-capital-letters-in-string-javascript-typescript-php
And it can also be tested online here:
https://turbocommons.org/en/app/stringutils/count-capital-letters

I'd give another solution, maybe not elegant, but helpful:
$mixed_case = "HelLo wOrlD";
$lower_case = strtolower($mixed_case);
$similar = similar_text($mixed_case, $lower_case);
echo strlen($mixed_case) - $similar; // 4

It's not the shortest, but it is arguably the simplest as a regex doesn't have to be executed. Normally I'd say this should be faster as the logic and checks are simple, but PHP always surprises me with how fast and slow some things are when compared to others.
function capital_letters($s) {
$u = 0;
$d = 0;
$n = strlen($s);
for ($x=0; $x<$n; $x++) {
$d = ord($s[$x]);
if ($d > 64 && $d < 91) {
$u++;
}
}
return $u;
}
echo 'caps: ' . capital_letters('HelLo2') . "\n";
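The [A-Z] and ord() based answers above only count ASCII capitals. If accented or other non-ASCII upper case letters should be counted too, one option is the Unicode property \p{Lu} together with the /u modifier. This is a sketch that assumes UTF-8 input and a PCRE build with Unicode support. The function name count_capitals_unicode is hypothetical.

```php
<?php
// Hypothetical helper: counts code points in the Unicode category "uppercase letter".
function count_capitals_unicode(string $s): int {
    // /u treats pattern and subject as UTF-8; preg_match_all() returns false on invalid UTF-8.
    $count = preg_match_all('/\p{Lu}/u', $s);
    return $count === false ? 0 : $count;
}

echo count_capitals_unicode('HelLo wOrlD'), "\n";   // 4
echo count_capitals_unicode("\u{00C9}cOle"), "\n";  // 2 (the string is "ÉcOle")
```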
