CamelCase to LowerDashCase - Without Regex - php

I have the following functions:
function dashesToCamelCase($string)
{
return str_replace(' ', '', ucwords(str_replace('-', ' ', $string)));
}
function camelCaseToDashes($string) {
//This method should not have Regex
$string = preg_replace('/\B([A-Z])/', '-$1', $string);
return strtolower($string);
}
And here is a Test:
$testArray = ['UserProfile', 'UserSettings', 'Settings', 'SuperLongString'];
foreach ($testArray as $testData) {
$dashed = camelCaseToDashes($testData);
$orignal = dashesToCamelCase($dashed);
echo '<pre>' . $dashed . ' | ' . $orignal . '</pre>';
}
This is the expected output:
user-profile | UserProfile
user-settings | UserSettings
settings | Settings
super-long-string | SuperLongString
Now my Question: The method camelCaseToDashes now is using Regex. Can you imagine a better (faster) implementation without Regex?

You should really first check a profiler graph before optimizing the wrong thing.
Visualize the actual execution time your preg_replace takes.
Replacing a regex with various PHP string function workarounds is not usually an optimization.

Try this:
$text = 'CamelCaseString';
$result = '';
for ($i = 0; $i < strlen($text); $i++) {
$result .= $i > 0 && ord($text[$i]) >= 64 && ord($text[$i]) <= 90 ? '-'.$text[$i] : $text[$i];
}
var_dump(strtolower($result));
You can create a benchmark to see performance of each solution.
Actually, according to my benchmark, solution with regular expression seems to be roughly 10× faster:
$ php test2.php
regex: 0.41732287406921
without regex: 3.5226600170135
I guess, that regular expression would be better, because it's implemented in C. My algorithm is implemented in PHP, which has significantly worse performance. With few optimizations I was able to improve time of 100,000 iterations of my algorithm to 2.5 second (compare to 0.4 second of 100,000 iterations of regexp).
Improved version:
$text = 'CamelCaseString';
$result = '';
$lenght = strlen($text);
for ($i = 0; $i < $lenght; $i++) {
$ord = ($text[$i]);
$result .= $i > 0 && $ord >= 64 && $ord <= 90 ? '-'.$text[$i] : $text[$i];
}
$ php test2.php
regex: 0.40657687187195
without regex: 2.5361099243164
Also interesting thing:
As you can see on a profile of your function, function preg_replace takes only 12 % of total time, strtolower takes under 1 %. There is no other code in the function regex. But it's possible that this is overhead of Xdebug. Profile was visualized by qCacheGrind.

If you want a ridiculous gain try this:
function dashesToCamelCase2($string) {
return strtr(ucwords(strtr($string, '-', ' ')), ' ', '');
}
function camelCaseToDashes2($string) {
return strtolower(preg_replace('/(?=[A-Z])\B/', '-', $string));
}

Related

PHP : non-preg_match version of: preg_match("/[^a-z0-9]/i", $a, $match)?

Supposedly string is:
$a = "abc-def"
if (preg_match("/[^a-z0-9]/i", $a, $m)){
$i = "i stopped scanning '$a' because I found a violation in it while
scanning it from left to right. The violation was: $m[0]";
}
echo $i;
example above: should indicate "-" was the violation.
I would like to know if there is a non-preg_match way of doing this.
I will likely run benchmarks if there is a non-preg_match way of doing this perhaps 1000 or 1 million runs, to see which is faster and more efficient.
In the benchmarks "$a" will be much longer.
To ensure it is not trying to scan the entire "$a" and to ensure it stops soon as it detects a violation within the "$a"
Based on information I have witnessed on the internet, preg_match stops when the first match is found.
UPDATE:
this is based on the answer that was given by "bishop" and will likely to be chosen as the valid answer soon ( shortly ).
i modified it a little bit because i only want it to report the violator character. but i also commented that line out so benchmark can run without entanglements.
let's run a 1 million run based on that answer.
$start_time = microtime(TRUE);
$count = 0;
while ($count < 1000000){
$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
$input = 'abc-def';
$validLen = strspn($input, $allowed);
if ($validLen < strlen($input)){
#echo "violation at: ". substr($input, $validLen,1);
}
$count = $count + 1;
};
$end_time = microtime(TRUE);
$dif = $end_time - $start_time;
echo $dif;
the result is: 0.606614112854
( 60 percent of a second )
let's do it with the preg_match method.
i hope everything is the same. ( and fair )..
( i say this because there is the ^ character in the preg_match )
$start_time = microtime(TRUE);
$count = 0;
while ($count < 1000000){
$input = 'abc-def';
preg_match("/[^a-z0-9]/i", $input, $m);
#echo "violation at:". $m[0];
$count = $count + 1;
};
$end_time = microtime(TRUE);
$dif = $end_time - $start_time;
echo $dif;
i use "dif" in reference to the terminology "difference".
the "dif" was.. 1.1145210266113
( took 11 percent more than a whole second )
( if it was 1.2 that would mean it is 2x slower than the php way )
You want to find the location of the first character not in the given range, without using regular expressions? You might want strspn or its complement strcspn:
$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
$input = 'abc-def';
$validLen = strspn($input, $allowed);
if (strlen($input) !== $validLen) {
printf('Input invalid, starting at %s', substr($input, $validLen));
} else {
echo 'Input is valid';
}
Outputs Input invalid, starting at -def. See it live.
strspn (and its complement) are very old, very well specified (POSIX even). The standard implementations are optimized for this task. PHP just leverages that platform implementation, so PHP should be fast, too.

What is the backend difference between "filter_var" and "preg_replace" in PHP?

I have numbers coming out of a database (very controlled input) that will have underscores before and after them. They are stored like this:
_51_ _356_
They will not be stored in any other format, but there will be times where I need to get just the numbers out of them. I have chosen to use either
$x = filter_var($myNumber, FILTER_SANITIZE_NUMBER_INT);
or
$y = preg_replace("/[^0-9]/","",$myNumber);
I am not sure of the nuances between the 2 in the backend, but they both produce exactly what I need (I think so, anyway), so it doesn't matter to me which I use. What are the pros and cons of using each of these options? (For example, does one use an array or other weird thing that I might need to know about? One uses way too many resources?)
Well, there isn't big difference in your case. I think preg_replace is more expensive in resource, since it had to parse the regex pattern.
Alternatively you can use trim:
echo trim('_12_', '_');
It will remove the '_' in both side, I think this is the most readable manner to do.
Filters don't use regular expressions, but work in a similar way: iterate a string char-by-char and remove characters that don't match the pattern:
for (i = 0; i < Z_STRLEN_P(value); i++) {
if ((*map)[str[i]]) {
buf[c] = str[i];
++c;
}
}
#http://lxr.php.net/xref/PHP_5_6/ext/filter/sanitizing_filters.c#filter_map_apply
and the FILTER_SANITIZE_NUMBER_INT is defined as [^0-9+-]:
/* strip everything [^0-9+-] */
const unsigned char allowed_list[] = "+-" DIGIT;
filter_map map;
filter_map_init(&map);
filter_map_update(&map, 1, allowed_list);
filter_map_apply(value, &map);
#http://lxr.php.net/xref/PHP_5_6/ext/filter/sanitizing_filters.c#php_filter_number_int
Of course, [^0-9+-] is not a right expression to filter integer numbers, therefore be prepared for surprises:
$x = filter_var("+++123---", FILTER_SANITIZE_NUMBER_INT);
var_dump($x); // WTF?
My suggestion is to stick to regular expressions: they are explicit and far less buggy than filters.
I wanted to try some various methods for this, so set up the following benchmark. It looks like for your case, trim is definitely the best option as it only has to look at the beginning and end of the string instead of each character. Here are my test results on 10,000,000 random integers surrounded by underscores running PHP 7.0.18.
preg_replace: 1.9469740390778 seconds.
filter_var: 1.6922700405121 seconds.
str_replace: 0.72129797935486 seconds.
trim: 0.37275195121765 seconds.
And here is my code if anyone wants to run similar tests:
<?php
$ints = array();//array_fill(0, 10000000, '_1029384756_');
for($i = 0; $i < 10000000; $i++) {
$ints[] = '_'.mt_rand().'_';
}
$start = microtime(true);
foreach($ints as $v) {
preg_replace('/[^0-9]/', '', $v);
}
$end = microtime(true);
echo 'preg_replace in '.($end-$start).' seconds.',PHP_EOL;
$start = microtime(true);
foreach($ints as $v) {
filter_var($v, FILTER_SANITIZE_NUMBER_INT);
}
$end = microtime(true);
echo 'filter_var in '.($end-$start).' seconds.',PHP_EOL;
$start = microtime(true);
foreach($ints as $v) {
str_replace('_', '', $v);
}
$end = microtime(true);
echo 'str_replace in '.($end-$start).' seconds.',PHP_EOL;
$start = microtime(true);
foreach($ints as $v) {
trim($v, '_');
}
$end = microtime(true);
echo 'trim in '.($end-$start).' seconds.',PHP_EOL;

Count consecutive occurence of specific, identical characters in a string - PHP

I am trying to calculate a few 'streaks', specifically the highest number of wins and losses in a row, but also most occurences of games without a win, games without a loss.
I have a string that looks like this; 'WWWDDWWWLLWLLLL'
For this I need to be able to return:
Longest consecutive run of W charector (i will then replicate for L)
Longest consecutive run without W charector (i will then replicate for L)
I have found and adapted the following which will go through my array and tell me the longest sequence, but I can't seem to adapt it to meet the criteria above.
All help and learning greatly appreciated :)
function getLongestSequence($sequence){
$sl = strlen($sequence);
$longest = 0;
for($i = 0; $i < $sl; )
{
$substr = substr($sequence, $i);
$len = strspn($substr, $substr{0});if($len > $longest)
$longest = $len;
$i += $len;
}
return $longest;
}
echo getLongestSequence($sequence);
You can use a regular expression to detect sequences of identical characters:
$string = 'WWWDDWWWLLWLLLL';
// The regex matches any character -> . in a capture group ()
// plus as much identical characters as possible following it -> \1+
$pattern = '/(.)\1+/';
preg_match_all($pattern, $string, $m);
// sort by their length
usort($m[0], function($a, $b) {
return (strlen($a) < strlen($b)) ? 1 : -1;
});
echo "Longest sequence: " . $m[0][0] . PHP_EOL;
You can achieve the maximum count of consecutive character in a particular string using the below code.
$string = "WWWDDWWWLLWLLLL";
function getLongestSequence($str,$c) {
$len = strlen($str);
$maximum=0;
$count=0;
for($i=0;$i<$len;$i++){
if(substr($str,$i,1)==$c){
$count++;
if($count>$maximum) $maximum=$count;
}else $count=0;
}
return $maximum;
}
$match="W";//change to L for lost count D for draw count
echo getLongestSequence($string,$match);

Performance of tokenizing CSS in PHP

This is a noob question from someone who hasn't written a parser/lexer ever before.
I'm writing a tokenizer/parser for CSS in PHP (please don't repeat with 'OMG, why in PHP?'). The syntax is written down by the W3C neatly here (CSS2.1) and here (CSS3, draft).
It's a list of 21 possible tokens, that all (but two) cannot be represented as static strings.
My current approach is to loop through an array containing the 21 patterns over and over again, do an if (preg_match()) and reduce the source string match by match. In principle this works really good. However, for a 1000 lines CSS string this takes something between 2 and 8 seconds, which is too much for my project.
Now I'm banging my head how other parsers tokenize and parse CSS in fractions of seconds. OK, C is always faster than PHP, but nonetheless, are there any obvious D'Oh! s that I fell into?
I made some optimizations, like checking for '#', '#' or '"' as the first char of the remaining string and applying only the relevant regexp then, but this hadn't brought any great performance boosts.
My code (snippet) so far:
$TOKENS = array(
'IDENT' => '...regexp...',
'ATKEYWORD' => '#...regexp...',
'String' => '"...regexp..."|\'...regexp...\'',
//...
);
$string = '...CSS source string...';
$stream = array();
// we reduce $string token by token
while ($string != '') {
$string = ltrim($string, " \t\r\n\f"); // unconsumed whitespace at the
// start is insignificant but doing a trim reduces exec time by 25%
$matches = array();
// loop through all possible tokens
foreach ($TOKENS as $t => $p) {
// The '&' is used as delimiter, because it isn't used anywhere in
// the token regexps
if (preg_match('&^'.$p.'&Su', $string, $matches)) {
$stream[] = array($t, $matches[0]);
$string = substr($string, strlen($matches[0]));
// Yay! We found one that matches!
continue 2;
}
}
// if we come here, we have a syntax error and handle it somehow
}
// result: an array $stream consisting of arrays with
// 0 => type of token
// 1 => token content
Use a lexer generator.
The first thing I would do would be to get rid of the preg_match(). Basic string functions such as strpos() are much faster, but I don't think you even need that. It looks like you are looking for a specific token at the front of a string with preg_match(), and then simply taking the front length of that string as a substring. You could easily accomplish this with a simple substr() instead, like this:
foreach ($TOKENS as $t => $p)
{
$front = substr($string,0,strlen($p));
$len = strlen($p); //this could be pre-stored in $TOKENS
if ($front == $p) {
$stream[] = array($t, $string);
$string = substr($string, $len);
// Yay! We found one that matches!
continue 2;
}
}
You could further optimize that by pre-calculating the length of all your tokens and storing them in the $TOKENS array, so that you don't have to call strlen() all the time. If you sorted $TOKENS into groups by length, you could reduce the number of substr() calls further as well, as you could take a substr($string) of the current string being analyzed just once for each token length, and run through all the tokens of that length before moving on to the next group of tokens.
the (probably) faster (but less memory friendly) approach would be to tokenize the whole stream at once, using one big regexp with alternatives for each token, like
preg_match_all('/
(...string...)
|
(#ident)
|
(#ident)
...etc
/x', $stream, $tokens);
foreach($tokens as $token)...parse
Don't use regexp, scan character by character.
$tokens = array();
$string = "...code...";
$length = strlen($string);
$i = 0;
while ($i < $length) {
$buf = '';
$char = $string[$i];
if ($char <= ord('Z') && $char >= ord('A') || $char >= ord('a') && $char <= ord('z') || $char == ord('_') || $char == ord('-')) {
while ($char <= ord('Z') && $char >= ord('A') || $char >= ord('a') && $char <= ord('z') || $char == ord('_') || $char == ord('-')) {
// identifier
$buf .= $char;
$char = $string[$i]; $i ++;
}
$tokens[] = array('IDENT', $buf);
} else if (......) {
// ......
}
}
However, that makes the code unmaintainable, therefore, a parser generator is better.
It's an old post but still contributing my 2 cents on this.
one thing that seriously slows down the original code in the question is the following line :
$string = substr($string, strlen($matches[0]));
instead of working on the entire string, take just a part of it (say 50 chars) which are enough for all the possible regexes. then, apply the same line of code on it. when this string shrinks below a preset length, load some more data to it.

Swap every pair of characters in string

I'd like to get all the permutations of swapped characters pairs of a string. For example:
Base string: abcd
Combinations:
bacd
acbd
abdc
etc.
Edit
I want to swap only letters that are next to each other. Like first with second, second with third, but not third with sixth.
What's the best way to do this?
Edit
Just for fun: there are three or four solutions, could somebody post a speed test of those so we could compare which is fastest?
Speed test
I made speed test of nickf's code and mine, and results are that mine is beating the nickf's at four letters (0.08 and 0.06 for 10K times) but nickf's is beating it at 10 letters (nick's 0.24 and mine 0.37)
Edit: Markdown hates me today...
$input = "abcd";
$len = strlen($input);
$output = array();
for ($i = 0; $i < $len - 1; ++$i) {
$output[] = substr($input, 0, $i)
. substr($input, $i + 1, 1)
. substr($input, $i, 1)
. substr($input, $i + 2);
}
print_r($output);
nickf made beautiful solution thank you , i came up with less beautiful:
$arr=array(0=>'a',1=>'b',2=>'c',3=>'d');
for($i=0;$i<count($arr)-1;$i++){
$swapped="";
//Make normal before swapped
for($z=0;$z<$i;$z++){
$swapped.=$arr[$z];
}
//Create swapped
$i1=$i+1;
$swapped.=$arr[$i1].$arr[$i];
//Make normal after swapped.
for($y=$z+2;$y<count($arr);$y++){
$swapped.=$arr[$y];
}
$arrayswapped[$i]=$swapped;
}
var_dump($arrayswapped);
A fast search in google gave me that:
http://cogo.wordpress.com/2008/01/08/string-permutation-in-php/
How about just using the following:
function swap($s, $i)
{
$t = $s[$i];
$s[$i] = $s[$i+1];
$s[$i+1] = $t;
return $s;
}
$s = "abcd";
$l = strlen($s);
for ($i=0; $i<$l-1; ++$i)
{
print swap($s,$i)."\n";
}
Here is a slightly faster solution as its not overusing substr().
function swapcharpairs($input = "abcd") {
$pre = "";
$a="";
$b = $input[0];
$post = substr($input, 1);
while($post!='') {
$pre.=$a;
$a=$b;
$b=$post[0];
$post=substr($post,1);
$swaps[] = $pre.$b.$a.$post;
};
return $swaps;
}
print_R(swapcharpairs());

Categories