Fast case counting - php

I have a large number of strings to process in PHP. I want to "fix" them to be title case (using ucwords(strtolower($str))), but only if they are already all upper case or all lower case. If they are already mixed case, I'd rather just leave them as they are.
What is the fastest way to check for this? It seems like looping through the string with for would be a rather slow way to go about it.
Here's what I have, which I think will be too slow:
function fixCase($str)
{
$uc = 0;
$lc = 0;
for($i=0;$i<strlen($str);$i++)
{
if ($str[$i] >= 'a' && $str[$i] <= 'z')
$lc++;
else if ($str[$i] >= 'A' && $str[$i] <= 'Z')
$uc++;
}
if ($uc == 0 || $lc == 0)
{
return ucwords(strtolower($str));
}
return $str; // mixed case: leave the string unchanged
}

Just use a case-sensitive string comparison:
function fixCase($str)
{
if (
(strcmp($str, strtolower($str)) === 0) ||
(strcmp($str, strtoupper($str)) === 0) )
{
$str = ucwords(strtolower($str));
}
return $str;
}

There's not going to be any amazing optimization, because by the nature of the problem you need to look at every character.
Personally, I would just loop over the characters of the string with this sort of algorithm:
Look at the first character in the string, set a variable indicating whether it was upper or lowercase.
Now examine each character sequentially. If you get to the end of the string and they've all been the same case as the first character, fix the string's case as you like.
If any character is a different case than the first character was, break the loop and return the string.
Edit: actual code, I think this is about as good as you're going to get.
// returns 0 if non-alphabetic char, 1 if uppercase, 2 if lowercase
function getCharType($char)
{
if ($char >= 'A' && $char <= 'Z')
{
return 1;
}
else if ($char >= 'a' && $char <= 'z')
{
return 2;
}
else
{
return 0;
}
}
function fixCase($str)
{
$firstCharType = 0; // stays 0 if the string contains no letters at all
for ($i = 0; $i < strlen($str); $i++)
{
$charType = getCharType($str[$i]);
if ($charType != 0)
{
$firstCharType = $charType;
break;
}
}
for ($i = $i + 1; $i < strlen($str); $i++)
{
$charType = getCharType($str[$i]);
if ($charType != $firstCharType && $charType != 0)
{
return $str;
}
}
if ($firstCharType == 1) // uppercase, need to convert to lower first
{
return ucwords(strtolower($str));
}
else if ($firstCharType == 2) // lowercase, can just ucwords() it
{
return ucwords($str);
}
else // there were no letters at all in the string, just return it
{
return $str;
}
}

You could try the string case test function I posted here
function getStringCase($subject)
{
if (!empty($subject))
{
if (preg_match('/^[^A-Za-z]+$/', $subject))
return 0; // no alphabetic characters
else if (preg_match('/^[^A-Z]+$/', $subject))
return 1; // lowercase
else if (preg_match('/^[^a-z]+$/', $subject))
return 2; // uppercase
else
return 3; // mixed-case
}
else
{
return 0; // empty
}
}
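Building on the regex idea, the check and the fix can be combined in one function. This is only a sketch: fixCaseRegex is a made-up name, it assumes plain ASCII letters, and a string with no letters at all is simply passed through ucwords(strtolower()) since that leaves it unchanged anyway.

```php
<?php
// Hypothetical helper, not part of the answer above.
// Title-cases $str only when it is not already mixed case (ASCII letters only).
function fixCaseRegex(string $str): string
{
    $hasLower = preg_match('/[a-z]/', $str) === 1;
    $hasUpper = preg_match('/[A-Z]/', $str) === 1;

    // Both kinds of letters present: mixed case, leave it alone.
    if ($hasLower && $hasUpper) {
        return $str;
    }

    return ucwords(strtolower($str));
}

echo fixCaseRegex('THIRD STRING'), "\n";  // Third String
echo fixCaseRegex('first String'), "\n";  // first String
```

Each preg_match stops at the first matching letter, so a mixed-case string is usually rejected after looking at only a few characters.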

If the reason you want to avoid fixing already mixed-case strings is efficiency, then you are likely wasting your time. Just convert every string, no matter its current condition:
function fixCase($str)
{
return ucwords(strtolower($str));
}
I would be very surprised if it ran any slower than the accepted answer for strings the length of those you would generally want to title case, and it's one less condition you need to worry about.
If, however, there is good reason to avoid converting already mixed-case strings, for example you want to preserve some intended meaning in the casing, then yes, jcinacio's answer is certainly the simplest and very efficient.

Wouldn't it be easier to check whether string == lowercase(string) or string == uppercase(string)? If so, perform your operation; otherwise, leave the string as it is.

Well, I decided to test the two answers proposed so far against my original solution. I wouldn't have thought the results would turn out this way, but I guess native methods are /that/ much faster overall.
Code:
function method1($str)
{
if (strcmp($str, strtolower($str)) == 0)
{
return ucwords($str);
}
else if (strcmp($str, strtoupper($str)) == 0)
{
return ucwords(strtolower($str));
}
else
{
return $str;
}
}
// returns 0 if non-alphabetic char, 1 if uppercase, 2 if lowercase
function getCharType($char)
{
if ($char >= 'A' && $char <= 'Z')
{
return 1;
}
else if ($char >= 'a' && $char <= 'z')
{
return 2;
}
else
{
return 0;
}
}
function method2($str)
{
for ($i = 0; $i < strlen($str); $i++)
{
$charType = getCharType($str[$i]);
if ($charType != 0)
{
$firstCharType = $charType;
break;
}
}
for ($i = $i + 1; $i < strlen($str); $i++)
{
$charType = getCharType($str[$i]);
if ($charType != $firstCharType && $charType != 0)
{
return $str;
}
}
if ($firstCharType == 1) // uppercase, need to convert to lower first
{
return ucwords(strtolower($str));
}
else if ($firstCharType == 2) // lowercase, can just ucwords() it
{
return ucwords($str);
}
else // there were no letters at all in the string, just return it
{
return $str;
}
}
function method0($str)
{
$uc = 0;
$lc = 0;
for($i=0;$i<strlen($str);$i++)
{
if ($str[$i] >= 'a' && $str[$i] <= 'z')
$lc++;
else if ($str[$i] >= 'A' && $str[$i] <= 'Z')
$uc++;
}
if ($uc == 0 || $lc == 0)
{
return ucwords(strtolower($str));
}
}
function test($func,$s)
{
$start = gettimeofday(true);
for($i = 0; $i < 1000000; $i++)
{
$s4 = $func($s);
}
$end = gettimeofday(true);
echo "$func Time: " . ($end-$start) . " - Avg: ".sprintf("%.09f",(($end-$start)/1000000))."\n";
}
$s1 = "first String";
$s2 = "second string";
$s3 = "THIRD STRING";
test("method0",$s1);
test("method0",$s2);
test("method0",$s3);
test("method1",$s1);
test("method1",$s2);
test("method1",$s3);
test("method2",$s1);
test("method2",$s2);
test("method2",$s3);
Results:
method0 Time: 19.2899270058 - Avg: 0.000019290
method0 Time: 20.8679389954 - Avg: 0.000020868
method0 Time: 24.8917310238 - Avg: 0.000024892
method1 Time: 3.07466816902 - Avg: 0.000003075
method1 Time: 2.52559089661 - Avg: 0.000002526
method1 Time: 4.06261897087 - Avg: 0.000004063
method2 Time: 19.2718701363 - Avg: 0.000019272
method2 Time: 35.2485661507 - Avg: 0.000035249
method2 Time: 29.3357679844 - Avg: 0.000029336

Note that anything that looks only at [A-Z] will be incorrect as soon as there are accented characters or umlauts. Optimizing for speed is meaningless if the result is incorrect (hey, if the result doesn't have to be correct, I can write you a REALLY fast implementation...)
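For what it's worth, here is a rough sketch of the same compare-against-lowercase-and-uppercase idea with the multibyte string functions. It assumes the mbstring extension is loaded and that the input is UTF-8; fixCaseMb is a made-up name, and I have not benchmarked it.

```php
<?php
// Hypothetical helper; assumes ext-mbstring and UTF-8 encoded input.
function fixCaseMb(string $str): string
{
    $lower = mb_strtolower($str, 'UTF-8');
    $upper = mb_strtoupper($str, 'UTF-8');

    // Only touch strings that are entirely lower case or entirely upper case.
    if ($str === $lower || $str === $upper) {
        return mb_convert_case($lower, MB_CASE_TITLE, 'UTF-8');
    }

    return $str; // mixed case: keep as is
}

echo fixCaseMb('ÜBER ALLES'), "\n"; // Über Alles
echo fixCaseMb('McDonald'), "\n";   // McDonald
```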


Generate List of Unique Four-Digit Numbers Without Repeating Digits and Without Forward-Sequential Digits

I needed to generate a list of four-digit numbers for use as codes. The digits should not repeat, and each next digit should not be sequential. There were some similar questions, but none close enough to answer mine, so I chose to share my function instead. It did not matter if reversed numbers were in the list, e.g. 1357 > 7531.
It occurred to me that there may be an opportunity for a recursive function, possibly to return five- or six-digit numbers. Improvements to my function are most welcome.
public function codeList() {
$data = [];
for ($ii=0; $ii < 10; $ii++) {
for ($jj=0; $jj < 10; $jj++) {
for ($kk=0; $kk < 10; $kk++) {
for ($ll=0; $ll < 10; $ll++) {
$str = "{$ii}{$jj}{$kk}{$ll}";
$arr = str_split($str);
if (count($arr) === count(array_unique($arr))) {
if (($arr[0] + 1 != $arr[1]) && ($arr[1] + 1 != $arr[2]) && ($arr[2] + 1 != $arr[3])) {
$data[] = $str;
}
}
}
}
}
}
return $data;
} # END FUNCTION codeList
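A recursive version along the lines suggested above might look like this. It is only a sketch: codeListN and its parameters are made-up names, it is written as a plain function rather than a class method, and it applies the same two rules as codeList() (no repeated digit, no digit that is exactly one more than the digit before it).

```php
<?php
// Hypothetical generalization of codeList() to codes of $length digits.
function codeListN(int $length, string $prefix = ''): array
{
    // A complete code has been built.
    if (strlen($prefix) === $length) {
        return [$prefix];
    }

    $codes = [];
    for ($digit = 0; $digit < 10; $digit++) {
        $d = (string) $digit;

        // Rule 1: digits must not repeat.
        if (strpos($prefix, $d) !== false) {
            continue;
        }
        // Rule 2: a digit must not be exactly one more than the previous digit.
        if ($prefix !== '' && (int) substr($prefix, -1) + 1 === $digit) {
            continue;
        }

        $codes = array_merge($codes, codeListN($length, $prefix . $d));
    }

    return $codes;
}

$fourDigit = codeListN(4);
$sixDigit  = codeListN(6);
```

Because invalid prefixes are dropped as soon as they appear, this never builds the full set of 10^n candidate strings.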

converting hexadecimal to decimal php

I have to convert hexadecimal to decimal with PHP (without using hexdec) for my homework, but my code does not convert properly.
For example, when I use the function HexToDez ("1F4");, the answer should be 500, not 1.
Why is it not working?
The code:
<?php
function Replace ($i)
{
switch (strToLower ($i))
{
case "a" : return 10;
case "b" : return 11;
case "c" : return 12;
case "d" : return 13;
case "e" : return 14;
case "f" : return 15;
default : return $i;
}
}
function HexToDez($i) # 1F4
{
$input=$i;
$num=strlen ($input) ;
$pos=0;
$output="";
$hochzahl="";
while($pos<$num)
{
$mid = substr ($input, $pos, 1);
$pos++;
return $end=Replace ($mid);
}
while ($end != 0){
$zahl = $input%10;
$output += $zahl*pow(16, $hochzahl);
$end = $end/10;
$hochzahl++;
}
echo $output;
}
?>
Here is a "classic" algorithm for you to consider; check the comments:
function HexToDez($s) {
$output = 0;
for ($i=0; $i<strlen($s); $i++) {
$c = $s[$i]; // you don't need substr to get 1 symbol from string
if ( ($c >= '0') && ($c <= '9') )
$output = $output*16 + ord($c) - ord('0'); // two things: 1. multiply by 16 2. convert digit character to integer
elseif ( ($c >= 'A') && ($c <= 'F') ) // care about upper case
$output = $output*16 + ord($c) - ord('A') + 10; // note that we're adding 10
elseif ( ($c >= 'a') && ($c <= 'f') ) // care about lower case
$output = $output*16 + ord($c) - ord('a') + 10;
}
return $output;
}
echo HexToDez("1F4"); // outputs 500
Also, you can use the intval function to do the same; just prefix your number with 0x to give it a hex representation, like 0x###:
function HexToDez($s) {
return intval('0x'.$s, 16);
}
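To see why the multiply-and-add step works: "1F4" is 1*16*16 + 15*16 + 4 = 256 + 240 + 4 = 500. The same step can also be sketched with a lookup string instead of ord() arithmetic. hexToDecSketch is a made-up name, and throwing on invalid input is my own assumption about what should happen there.

```php
<?php
// Hypothetical variant of the classic algorithm above, using a digit lookup string.
function hexToDecSketch(string $hex): int
{
    if ($hex === '') {
        throw new InvalidArgumentException('Empty input');
    }

    $digits = '0123456789abcdef';
    $value = 0;

    foreach (str_split(strtolower($hex)) as $char) {
        // The position of the character in $digits is its numeric value.
        $digit = strpos($digits, $char);
        if ($digit === false) {
            throw new InvalidArgumentException("Not a hex digit: $char");
        }
        // Shift what we have so far one hex place to the left, then add the digit.
        $value = $value * 16 + $digit;
    }

    return $value;
}

echo hexToDecSketch('1F4'); // 500
```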

Unable to read blank spaces in string

What's wrong with this code? I want to count the number of blank spaces without using any built-in function, but it won't detect or count the blank spaces:
$string = "can you look into this??";
$i = 0;
$breakPoints = 0;
while ($string[$i] != '' & $string[$i + 1] != '') {
if ($string[$i] == "" || empty($string[$i])) {
die("cdsd");
$breakposition = $string[$i];
$breakPoints++;
} else {
print_r($string[$i]);
}
$i++;
}
echo($breakPoints);
It's always going into the else part and never goes into the if statement. I even tried using isset() but that also didn't work. Where am I making a mistake?
Just loop while the string offset isset() and check if it equals a space. No need to do anything with $i+1:
$string = "can you look into this??";
$i = 0;
$breakPoints = 0;
while (isset($string[$i])) {
if ($string[$i] == " ") {
$breakposition = $string[$i];
$breakPoints++;
} else {
print_r($string[$i]);
}
$i++;
}
echo($breakPoints);
This outputs:
canyoulookintothis??4
Once you've got your code right, you will still run into a string index error, and you will need the isset() built-in function to check the index before performing operations.
In other words, the index i will eventually point beyond the last character of the string, which causes a PHP error. You can use isset() to check for that and break out of the loop. Example:
$string = "can you look into this??";
$i = 0;
$breakPoints = 0;
while (isset($string[$i])) {
if ($string[$i] == " ") {
$breakPoints++;
} else {
if($string[$i] != ''){
print_r($string[$i]);
}
}
$i++;
}
echo("<br />Number of spaces: ".$breakPoints);
A space is not empty; it takes up a character.
So use this:
$string = "can you look into this??";
$i = 0;
$breakPoints = 0;
while (isset($string[$i])) {
if ($string[$i] == " ") {
echo " ";
$breakposition = $string[$i];
$breakPoints++;
} else {
print_r($string[$i]);
}
$i++;
}
echo($breakPoints);
Or try this code, which uses preg_match_all:
$matches = [];
$numSpaces = preg_match_all('/[ ]/', $string, $matches);
Or use this:
$numSpaces = substr_count($string, ' ');

PHP - Check if a string is a rotation of another string

I need to write a code block which checks whether one string is a rotation of another.
I've looked at loads of posts on here, but they are all in Java or C++ and I need to do it in PHP.
I have tried a few different things, working from the C++ and Java examples, but I am not having any luck. Here is my current code:
<?php
function isSubstring($s1, $s2) {
if(strlen($s1) != strlen($s2)) {
return false;
}
if(WHAT TO PUT HERE) {
echo "it is!";
} else {
echo "nope";
}
}
isSubstring("hello", "helol");
?>
There are many ways to do this. Here is one more, using the built-in function count_chars on both strings and then comparing the resulting arrays:
function isSubstring($s1, $s2) {
if (strlen($s1) != strlen($s2)) {
echo "nope";
return;
}
$s1cnt = count_chars($s1, 1);
$s2cnt = count_chars($s2, 1);
if($s1cnt === $s2cnt) {
echo "it is!";
} else {
echo "nope";
}
}
Edit: as MonkeyZeus pointed out, beware of comparisons with multibyte characters. It may bite a little bit:
isSubstring('crढap', 'paࢤrc');
will report a match. ढ is a three-byte UTF-8 Devanagari character (E0 A4 A2) and ࢤ is also a three-byte character (Arabic, E0 A2 A4), and the count_chars function counts the individual bytes. So it is safe to use only if the characters all come from one language; otherwise, get some headache pills...
It seems to me that to manage this kind of things we need to have chars that are made of 3 bytes.
I would go for something like this:
function isSubstring($s1, $s2)
{
// If the strings match exactly then no need to proceed
if($s1 === $s2)
{
echo "it is!";
return;
}
elseif(strlen($s1) !== strlen($s2))
{
// Strings must be of equal length or else no need to proceed
echo "nope";
return;
}
// Put each character into an array
$s1 = str_split($s1);
$s2 = str_split($s2);
// Sort alphabetically based on value
sort($s1);
sort($s2);
// Triple check the arrays against one-another
if($s1 === $s2)
{
echo "it is!";
}
else
{
echo "nope";
}
}
Here is a multibyte safe function to compare the two strings:
function mb_isAnagram($s1, $s2) {
if (strlen($s1) != strlen($s2)) {
return false;
} else {
$c1 = preg_split('//u', $s1, -1, PREG_SPLIT_NO_EMPTY); // -1 means no limit
$c2 = preg_split('//u', $s2, -1, PREG_SPLIT_NO_EMPTY);
sort($c1);
sort($c2);
if ($c1 === $c2) {
return true;
} else {
return false;
}
}
}
You could split each string and sort it, like this:
$split1 = unpack("C*",$s1);
asort($split1);
Then you can traverse both arrays comparing the values.
<?php
function isRotationalString($str1,$str2){
$len = strlen($str1);
if($str1 === $str2){
return true;
}else{
if($len == strlen($str2)){
$flag = true;
$start = false; // stays false if $str1[0] never occurs in $str2
for($i=0;$i<$len;$i++){
if($str1[0]==$str2[$i]){
$tst = $i;$start = true;break;
}
}
if($start){
for($j=0;$j<$len;$j++){
$m = $j+$tst;
if($m < $len){
if($str1[$j] != $str2[$m]){
$flag = false;break;
}
}else{
if($m>=$len)
{
$k = $m - $len;
if($str1[$j] != $str2[$k]){
$flag = false;break;
}
}
}
}
}else{
$flag = false;
}
return $flag;
}
return false; // the lengths differ, so it cannot be a rotation
}
}
echo isRotationalString("abcd","bcda")?'It is':'It is not';
?>
The script above checks whether one string is a rotation of another:
isRotationalString("abcd","bcda") => It is
isRotationalString("abcd","cbda") => It is not
This is the function for string rotation.
echo isRotationalString("abcdef","efabcd")?'It is':'It is not';
function isRotationalString($str1,$str2){
$len = strlen($str1);
if($str1 === $str2){
return true;
} else {
if($len == strlen($str2)) {
$stringMatchedArr1 = $stringMatchedArr2 = [];
for($i=0; $i<$len; $i++) {
$substr = substr($str1,$i );
$pos = strpos($str2, $substr);
if($pos !== false) {
$stringMatchedArr1[] = $substr;
}
}
for($j=1; $j <= $len; $j++) {
$substr = substr($str1, 0, $j );
$pos = strpos($str2, $substr);
if($pos !== false) {
$stringMatchedArr2[] = $substr;
}
}
foreach($stringMatchedArr2 as $string1) {
foreach($stringMatchedArr1 as $string2) {
if($string1.$string2 == $str1)
return true;
}
}
}
}
}
I would split both strings into arrays of characters, sort the arrays, and compare them. Note that sort() works in place and returns a boolean, so it needs variables:
$a = str_split($s1); $b = str_split($s2); sort($a); sort($b);
if ($a == $b) {
That would do the trick in a couple of lines.
Edit: Thanks Don't Panic, edited my answer!
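For the rotation check itself, the trick from the Java and C++ posts carries over directly: if the lengths match, s2 is a rotation of s1 exactly when s2 occurs somewhere inside s1 concatenated with itself. A minimal sketch, assuming a byte-wise comparison is good enough; isRotation is a made-up name.

```php
<?php
// Hypothetical helper: true if $s2 is a rotation of $s1 (compared byte by byte).
function isRotation(string $s1, string $s2): bool
{
    if (strlen($s1) !== strlen($s2)) {
        return false;
    }
    if ($s1 === '') {
        return true; // two empty strings; also avoids calling strpos with an empty needle
    }

    // Every rotation of $s1 appears as a substring of $s1 . $s1.
    return strpos($s1 . $s1, $s2) !== false;
}

var_dump(isRotation('abcd', 'bcda'));   // bool(true)
var_dump(isRotation('hello', 'helol')); // bool(false)
```

Unlike a sort-and-compare test, this rejects anagrams that are not rotations, and it also copes with repeated characters such as isRotation('aab', 'aba').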

Rewrite a large number of for loops into something shorter

I have the following code:
for($a=1; $a<strlen($string); $a++){
for($b=1; $a+$b<strlen($string); $b++){
for($c=1; $a+$b+$c<strlen($string); $c++){
for($d=1; $a+$b+$c+$d<strlen($string); $d++){
$tempString = substr_replace($string, ".", $a, 0);
$tempString = substr_replace($tempString, ".", $a+$b+1, 0);
$tempString = substr_replace($tempString, ".", $a+$b+$c+2, 0);
$tempString = substr_replace($tempString, ".", $a+$b+$c+$d+3, 0);
echo $tempString."<br>";
}
}
}
}
What it does is make all possible combinations of a string with several dots.
Example:
t.est123
te.st123
tes.t123
...
test12.3
Then, I add one more dot:
t.e.st123
t.es.t123
...
test1.2.3
The way I'm doing it now, I need to create lots and lots of for loops, one set for each number of dots. I don't know how to turn that example into a function or some other easier way of doing this.
Your problem is a combination problem. Note: I'm not a math freak, I only researched this information because of interest.
http://en.wikipedia.org/wiki/Combination#Number_of_k-combinations
Also known as n choose k. The Binomial coefficient is a function which gives you the number of combinations.
A function I found here: Calculate value of n choose k
function choose($n, $k) {
if ($k == 0) {return 1;}
return($n * choose($n - 1, $k - 1)) / $k;
}
// 6 positions between characters (test123), 4 dots
echo choose(6, 4); // 15 combinations
To get all combinations you also have to choose between different algorithms.
Good post: https://stackoverflow.com/a/127856/1948627
UPDATE:
I found a site with an algorithm in different programming languages. (But not PHP)
I've converted it to PHP:
function bitprint($u){
$s= [];
for($n= 0;$u > 0;++$n, $u>>= 1) {
if(($u & 1) > 0) $s[] = $n;
}
return $s;
}
function bitcount($u){
for($n= 0;$u > 0;++$n, $u&= ($u - 1));
return $n;
}
function comb($c, $n){
$s= [];
for($u= 0;$u < 1 << $n;$u++) {
if(bitcount($u) == $c) $s[] = bitprint($u);
}
return $s;
}
echo '<pre>';
print_r(comb(4, 6));
It outputs an array with all combinations (positions between the chars).
The next step is to replace the string with the dots:
$string = 'test123';
$sign = '.';
$combs = comb(4, 6);
// get all combinations (Th3lmuu90)
/*
$combs = [];
for($i=0; $i<strlen($string); $i++){
$combs = array_merge($combs, comb($i, strlen($string)-1));
}
*/
foreach ($combs as $comb) {
$a = $string;
for ($i = count($comb) - 1; $i >= 0; $i--) {
$a = substr_replace($a, $sign, $comb[$i] + 1, 0);
}
echo $a.'<br>';
}
// output:
t.e.s.t.123
t.e.s.t1.23
t.e.st.1.23
t.es.t.1.23
te.s.t.1.23
t.e.s.t12.3
t.e.st.12.3
t.es.t.12.3
te.s.t.12.3
t.e.st1.2.3
t.es.t1.2.3
te.s.t1.2.3
t.est.1.2.3
te.st.1.2.3
tes.t.1.2.3
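If you would rather keep the shape of the original nested loops, they can also be folded into one recursive function, with one level of recursion per dot. This is a sketch under my own naming (insertDots and its parameters are made up), not a drop-in replacement for comb():

```php
<?php
// Hypothetical helper: all ways of inserting $dots dots between the characters of $string.
// $start is the first index at which the next dot may be inserted.
function insertDots(string $string, int $dots, int $start = 1): array
{
    if ($dots === 0) {
        return [$string];
    }

    $results = [];
    // The last allowed position is just before the final character.
    for ($pos = $start; $pos < strlen($string); $pos++) {
        $withDot = substr_replace($string, '.', $pos, 0);
        // The next dot has to come at least one character after this dot.
        $results = array_merge($results, insertDots($withDot, $dots - 1, $pos + 2));
    }

    return $results;
}

foreach (insertDots('test123', 4) as $line) {
    echo $line, "<br>";
}
```

For 'test123' and 4 dots this produces 15 strings, matching choose(6, 4), although not in the same order as the list above.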
This is quite an unusual question, but I can't help but try to wrap my head around what you are trying to do. My guess is that you want to see how many combinations of a string there are with a dot moving between characters, finally coming to rest right before the last character.
My understanding is you want a count and a printout of string similar to what you see here:
t.est
te.st
tes.t
t.es.t
te.s.t
t.e.s.t
count: 6
To facilitate this functionality I came up with a class. This way you can port it to other parts of your code, and it can handle multiple strings. The caveat here is that the strings must be at least two characters long and must not contain a period. Here is the code for the class:
class DotCombos
{
public $combos;
private function combos($string)
{
$rebuilt = "";
$characters = str_split($string);
foreach($characters as $index => $char) {
if($index == 0 || $index == count($characters)) {
continue;
} else if(isset($characters[$index]) && $characters[$index] == ".") {
break;
} else {
$rebuilt = substr($string, 0, $index) . "." . substr($string, $index);
print("$rebuilt\n");
$this->combos++;
}
}
return $rebuilt;
}
public function allCombos($string)
{
if(strlen($string) < 2) {
return null;
}
$this->combos = 0;
for($i = 0; $i < count(str_split($string)) - 1; $i++) {
$string = $this->combos($string);
}
}
}
To make use of the class you would do this:
$combos = new DotCombos();
$combos->allCombos("test123");
print("Count: $combos->combos");
The output would be:
t.est123
te.st123
tes.t123
test.123
test1.23
test12.3
t.est12.3
te.st12.3
tes.t12.3
test.12.3
test1.2.3
t.est1.2.3
te.st1.2.3
tes.t1.2.3
test.1.2.3
t.est.1.2.3
te.st.1.2.3
tes.t.1.2.3
t.es.t.1.2.3
te.s.t.1.2.3
t.e.s.t.1.2.3
Count: 21
Hope that is what you are looking for (or at least helps)....
