I'm using PHP, and am hoping to be able to create a regex that finds and returns the street number portion of an address.
Example:
1234- South Blvd. Washington D.C., APT #306, ZIP45234
In the above example, only 1234 would be returned.
Seems like this should be incredibly simple, but I've yet to be successful. Any help would be greatly appreciated.
Try this:
$str = "1234- South Blvd. Washington D.C., APT #306, ZIP4523";
preg_match("~^(\d+)~", $str, $m);
var_dump($m[1]);
OUTPUT:
string(4) "1234"
I know you asked for a regex, but it may be more efficient to do this without one (I haven't benchmarked it yet). Here is a function that you might find useful:
function removeStartInt(&$str)
{
$num = '';
$strLen = strlen($str);
for ($i = 0; $i < $strLen; $i++)
{
if (ctype_digit($str[$i]))
$num .= $str[$i];
else
break;
}
if ($num === '')
return null;
$str = substr($str, strlen($num));
return intval($num);
}
It also removes the number from the string. If you do not want that, simply change (&$str) to ($str) and remove the line: $str = substr($str, strlen($num));.
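If the number is always at the very start of the string, PHP's built-in strspn() can measure the run of leading digits without a loop or a regex. A minimal sketch, assuming ASCII digits at position 0; the function name leadingNumber is hypothetical:

```php
<?php
// Hypothetical helper: return the leading run of ASCII digits, or null if there is none.
function leadingNumber(string $str): ?string
{
    // strspn() returns the length of the initial segment made only of the listed characters
    $len = strspn($str, '0123456789');
    return $len > 0 ? substr($str, 0, $len) : null;
}

echo leadingNumber("1234- South Blvd. Washington D.C., APT #306, ZIP45234"), PHP_EOL;
```

It returns a string rather than an int so that leading zeros survive; wrap the result in intval() if you need a number.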
I am trying to calculate a few 'streaks': specifically the highest number of wins and losses in a row, but also the longest runs of games without a win and games without a loss.
I have a string that looks like this: 'WWWDDWWWLLWLLLL'
For this I need to be able to return:
Longest consecutive run of the W character (I will then replicate for L)
Longest consecutive run without the W character (I will then replicate for L)
I have found and adapted the following, which goes through my string and tells me the longest sequence, but I can't seem to adapt it to meet the criteria above.
All help and learning greatly appreciated :)
function getLongestSequence($sequence){
$sl = strlen($sequence);
$longest = 0;
for($i = 0; $i < $sl; )
{
$substr = substr($sequence, $i);
$len = strspn($substr, $substr[0]);
if($len > $longest)
$longest = $len;
$i += $len;
}
return $longest;
}
$sequence = 'WWWDDWWWLLWLLLL';
echo getLongestSequence($sequence);
You can use a regular expression to detect sequences of identical characters:
$string = 'WWWDDWWWLLWLLLL';
// The regex matches any character -> . in a capture group ()
// plus as many identical characters as possible following it -> \1+
$pattern = '/(.)\1+/';
preg_match_all($pattern, $string, $m);
// sort by their length
usort($m[0], function($a, $b) {
return (strlen($a) < strlen($b)) ? 1 : -1;
});
echo "Longest sequence: " . $m[0][0] . PHP_EOL;
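For the two requirements in the question, a pattern can target one letter directly: W+ matches runs of wins and [^W]+ matches runs that contain no win. A minimal sketch, assuming the results string only uses W, D and L; longestRun is a hypothetical helper name:

```php
<?php
// Hypothetical helper: length of the longest substring matching $pattern (0 if none).
function longestRun(string $results, string $pattern): int
{
    // preg_match_all() returns the match count (or false on error); $m[0] holds the matched strings
    if (!preg_match_all($pattern, $results, $m)) {
        return 0;
    }
    return max(array_map('strlen', $m[0]));
}

$results = 'WWWDDWWWLLWLLLL';
echo longestRun($results, '/W+/'), PHP_EOL;    // longest winning streak
echo longestRun($results, '/[^W]+/'), PHP_EOL; // longest run without a win
echo longestRun($results, '/L+/'), PHP_EOL;    // longest losing streak
echo longestRun($results, '/[^L]+/'), PHP_EOL; // longest run without a loss
```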
You can get the maximum count of a consecutive character in a string using the code below.
$string = "WWWDDWWWLLWLLLL";
function getLongestSequence($str,$c) {
$len = strlen($str);
$maximum=0;
$count=0;
for($i=0;$i<$len;$i++){
if(substr($str,$i,1)==$c){
$count++;
if($count>$maximum) $maximum=$count;
}else $count=0;
}
return $maximum;
}
$match="W";//change to L for lost count D for draw count
echo getLongestSequence($string,$match);
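The loop above only counts runs of $c itself. To also cover the question's second requirement (the longest run *without* a given result), one option is to flip the comparison with a flag. A minimal sketch; the $without parameter and the name longestStreak are assumptions, not part of the answer above:

```php
<?php
// Hypothetical variant: longest run of $c, or of anything other than $c when $without is true.
function longestStreak(string $str, string $c, bool $without = false): int
{
    $maximum = 0;
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        // A character extends the streak when it equals $c (or differs from $c in "without" mode)
        if (($str[$i] === $c) !== $without) {
            $count++;
            $maximum = max($maximum, $count);
        } else {
            $count = 0; // streak broken, start counting again
        }
    }
    return $maximum;
}

echo longestStreak('WWWDDWWWLLWLLLL', 'W', true), PHP_EOL; // longest run without a win
```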
I have the following functions:
function dashesToCamelCase($string)
{
return str_replace(' ', '', ucwords(str_replace('-', ' ', $string)));
}
function camelCaseToDashes($string) {
//This method should not have Regex
$string = preg_replace('/\B([A-Z])/', '-$1', $string);
return strtolower($string);
}
And here is a Test:
$testArray = ['UserProfile', 'UserSettings', 'Settings', 'SuperLongString'];
foreach ($testArray as $testData) {
$dashed = camelCaseToDashes($testData);
$original = dashesToCamelCase($dashed);
echo '<pre>' . $dashed . ' | ' . $original . '</pre>';
}
This is the expected output:
user-profile | UserProfile
user-settings | UserSettings
settings | Settings
super-long-string | SuperLongString
Now my question: camelCaseToDashes currently uses a regex. Can you think of a better (faster) implementation without one?
You should really first check a profiler graph before optimizing the wrong thing.
Visualize the actual execution time your preg_replace takes.
Replacing a regex with various PHP string function workarounds is not usually an optimization.
Try this:
$text = 'CamelCaseString';
$result = '';
for ($i = 0; $i < strlen($text); $i++) {
$result .= $i > 0 && ord($text[$i]) >= 65 && ord($text[$i]) <= 90 ? '-'.$text[$i] : $text[$i];
}
var_dump(strtolower($result));
You can create a benchmark to see performance of each solution.
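A benchmark can be as simple as timing many calls with microtime(true). A minimal sketch comparing the regex version from the question with a loop-based alternative; the loop function camelCaseToDashesLoop (which uses ctype_upper()), the bench() helper and the iteration count are assumptions for illustration. Both functions give the same result for plain alphanumeric identifiers like the test array, and absolute timings will vary by machine and PHP version:

```php
<?php
// Regex version from the question
function camelCaseToDashes(string $string): string
{
    return strtolower(preg_replace('/\B([A-Z])/', '-$1', $string));
}

// Hypothetical loop version: put a dash before every uppercase letter except the first character
function camelCaseToDashesLoop(string $string): string
{
    $result = '';
    $len = strlen($string);
    for ($i = 0; $i < $len; $i++) {
        $ch = $string[$i];
        if ($i > 0 && ctype_upper($ch)) {
            $result .= '-';
        }
        $result .= $ch;
    }
    return strtolower($result);
}

// Hypothetical helper: seconds spent calling $fn on every input, $iterations times
function bench(callable $fn, array $inputs, int $iterations): float
{
    $start = microtime(true);
    for ($n = 0; $n < $iterations; $n++) {
        foreach ($inputs as $input) {
            $fn($input);
        }
    }
    return microtime(true) - $start;
}

$inputs = ['UserProfile', 'UserSettings', 'Settings', 'SuperLongString'];
$iterations = 10000; // raise this for more stable numbers
printf("regex:    %.4f\n", bench('camelCaseToDashes', $inputs, $iterations));
printf("no regex: %.4f\n", bench('camelCaseToDashesLoop', $inputs, $iterations));
```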
Actually, according to my benchmark, the solution with the regular expression seems to be roughly 8× faster:
$ php test2.php
regex: 0.41732287406921
without regex: 3.5226600170135
I guess the regular expression wins because the regex engine is implemented in C, while my algorithm runs as PHP code, which is significantly slower. With a few optimizations I was able to bring 100,000 iterations of my algorithm down to 2.5 seconds (compared with 0.4 seconds for 100,000 iterations of the regex).
Improved version:
$text = 'CamelCaseString';
$result = '';
$length = strlen($text);
for ($i = 0; $i < $length; $i++) {
$ord = ord($text[$i]);
$result .= $i > 0 && $ord >= 65 && $ord <= 90 ? '-'.$text[$i] : $text[$i];
}
$ php test2.php
regex: 0.40657687187195
without regex: 2.5361099243164
Also interesting:
In a profile of your function, preg_replace takes only 12% of the total time and strtolower under 1%, even though the function contains no other code. It's possible that the rest is Xdebug overhead. The profile was visualized with qCacheGrind.
If you want a ridiculous gain try this:
function dashesToCamelCase2($string) {
return strtr(ucwords(strtr($string, '-', ' ')), [' ' => '']);
}
function camelCaseToDashes2($string) {
return strtolower(preg_replace('/(?=[A-Z])\B/', '-', $string));
}
I need to trim words from the beginning and end of a string. The problem is that sometimes the words are abbreviated, i.e. only the first three letters (followed by a dot).
I tried hard to find a suitable regular expression. Basically I need to catch three or more initial characters, up to the length of the word being removed, but I cannot find a regular expression that will match a variable length and keep the order of characters.
For example, if I need to trim 'insurance' from the sentence 'insur. companies are rich', then the pattern /^[insurance]{3,9}/ comes to mind, but this pattern will also catch words like 'sensace', because the order of characters (and how often they occur) inside [] does not matter to a regex.
Also, at the end of the string I need to remove serial numbers that are abbreviated from the beginning: say 'XK-25F14' is sometimes presented as '25F14'. So I decided to go purely with character-by-character comparison.
So I ended up with the following PHP function:
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($pos < strlen($dirt) && $func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// Reverse $s and $dirt so it will trim from the end of string
$s = strrev($s);
if ($reverse)
return trimWords($s, strrev($dirt), $case_insensitive, false);
// After second run return back-reversed string
return trim($s, ' .-');
}
I'm happy with this function, but it has one drawback: it trims only one occurrence of a word. How can I make it trim more occurrences, e.g. remove both 'Insurance' and 'insur.' from 'Insurance insur. companies'?
I'm also curious: is there really no regular expression that will match a variable length and respect the order of characters in the pattern?
Final solution
Thanks to mrhobo I ended up with a function based on a regular expression. This function can be easily improved and should also be the most efficient for this task.
I modified my previous function and it is two times quicker than the regex, but it can remove only one word per run. To remove a word from both the beginning and the end it has to run twice, which makes its performance the same as the regex, and to remove more than one occurrence of a word it has to run multiple times, which makes it slower and slower.
The final function goes like this.
function trimWords($string, $word, $case_insensitive = false, $min_abbrv = 3)
{
$exc = substr($word, $min_abbrv);
$pat = null;
$i = strlen($exc);
while ($i--)
$pat = '(?>'.preg_quote($exc[$i], '#').$pat.')?';
$pat = substr($word, 0, $min_abbrv).$pat;
$pat = '#(?<begin>^)?(?:\W*\b'.$pat.'\b\W*)+(?(begin)|$)#';
if ($case_insensitive)
$pat .= 'i';
return preg_replace($pat, '', $string);
}
NOTE: with this function it does not matter whether the abbreviation ends with a dot or not; it wipes out any shorter form of the word and also removes all non-word characters around it.
EDIT: I just tried creating a replacement pattern like insu(r|ra|ran|ranc|rance), and the function with atomic groups is ~30% faster; with longer words it could be even more efficient.
Matching a word and all possible abbreviations from the nth letter isn't quite an easy task in regex.
Here is how I would do it for the word insurance from the 4th letter:
insu(?>r(?>a(?>n(?>c(?>(?<last>e))?)?)?)?)?(?(last)|\.)
http://regex101.com/r/aL2gV4
It works by using atomic groups to force the regex engine as far forward as possible past the last 'rance' letters, using the nested pattern (?>a(?>b)?)?. If the last letter is matched, we're not dealing with an abbreviation, so no dot is required; otherwise the dot is required. This is coded by (?(last)|\.).
To trim, I would create a function to build the above regex for an abbreviation. Then you can write a while loop that replaces each of the abbreviation regexes with empty space until there are no more matches.
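Following that description, the pattern can be generated from the word instead of written by hand. A minimal sketch, assuming an abbreviation keeps at least $minLen leading letters and must end with a dot; buildAbbrevPattern and trimAbbrev are hypothetical names, and the \b word boundaries and the i flag are additions not present in the pattern above. Because preg_replace() already replaces every non-overlapping match, one call per word stands in for the while loop:

```php
<?php
// Hypothetical helper: regex matching $word, or any prefix of it with at least $minLen
// letters followed by a dot, using the nested atomic groups described above.
function buildAbbrevPattern(string $word, int $minLen): string
{
    $prefix = preg_quote(substr($word, 0, $minLen), '/');
    $rest = (string) substr($word, $minLen);
    if ($rest === '') {
        // Nothing left to abbreviate: match the whole word only
        return '/\b' . $prefix . '\b/i';
    }
    // Innermost group captures the final letter as "last"
    $nested = '(?>(?<last>' . preg_quote(substr($rest, -1), '/') . '))?';
    // Wrap the remaining letters from the inside out: (?>r(?>a(...)?)?)?
    for ($i = strlen($rest) - 2; $i >= 0; $i--) {
        $nested = '(?>' . preg_quote($rest[$i], '/') . $nested . ')?';
    }
    // Full word: require a word boundary; abbreviation: require a dot
    return '/\b' . $prefix . $nested . '(?(last)\b|\.)/i';
}

// Hypothetical helper: remove every occurrence of each word (or its abbreviation)
function trimAbbrev(string $str, array $words, int $minLen = 3): string
{
    foreach ($words as $word) {
        $str = preg_replace(buildAbbrevPattern($word, $minLen), '', $str);
    }
    // Collapse the gaps left behind and trim the ends
    return trim(preg_replace('/\s+/', ' ', $str));
}

echo buildAbbrevPattern('insurance', 4), PHP_EOL;
echo trimAbbrev('Insurance insur. companies are rich', ['insurance'], 4), PHP_EOL;
```

Generating the pattern keeps the nesting correct for any word length, which is error-prone to do by hand.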
Non regex version
Here is my non regex version that removes multiple words and abbreviated words from a string:
function trimWords($str, $word, $min_abbrv, $case_insensitive = false) {
$len = 0;
$word_len = strlen($word);
$strlen = strlen($str);
$cmp = $case_insensitive ? 'strncasecmp' : 'strncmp';
$dot = 0;
for ($i = 0; $i < $strlen; $i++) {
if ($len < $word_len && $cmp($str[$i], $word[$len], 1) == 0) {
$len++;
} else if ($len > 0) {
if ($len == $word_len || ($len >= $min_abbrv && ($dot = $str[$i] == '.'))) {
$i -= $len;
$len += $dot;
$str = substr($str, 0, $i) . substr($str, $i+$len);
$strlen = strlen($str);
$dot = 0;
}
$len = 0;
}
}
return $str;
}
Example:
$string = 'ins. <- "ins." / insu. insuranc. insurance / insurance. <- "."';
echo trimWords($string, 'insurance', 4);
Output is:
ins. <- "ins." / / . <- "."
I wrote a function that constructs the regular expression pattern as mrhobo suggested, plus a simple test, and benchmarked it against my function based on pure PHP string comparison.
Here is the code:
$string = 'Insur. companies are nasty rich';
$dirt = 'insurance';
$cycles = 500000;
$start = microtime(true);
$i = $cycles;
while ($i) {
$i--;
regexpStyle($string, $dirt, true);
}
$stop = microtime(true);
$i = $cycles;
while ($i) {
$i--;
trimWords($string, $dirt, true);
}
$end = microtime(true);
$res1 = $stop - $start;
$res2 = $end - $stop;
$winner = $res1 < $res2 ? '<<<' : '>>>';
echo 'regexp: '.$res1.' '.$winner.' string operations: '.$res2;
function trimWords($s, $dirt, $case_insensitive = false, $reverse = true)
{
$pos = 0;
$func = $case_insensitive ? 'strncasecmp' : 'strncmp';
// Get number of initial characters, that match in both strings
while ($pos < strlen($dirt) && $func($s, $dirt, $pos + 1) === 0)
$pos++;
// If more than 2 initial characters match, then remove the match
if ($pos > 2)
$s = substr($s, $pos);
// Strip leftover spaces, dots and dashes
return trim($s, ' .-');
}
function regexpStyle($s, $dirt, $case_insensitive, $min_abbrev = 3)
{
$ss = substr($dirt, $min_abbrev);
$arr = str_split($ss);
$patt = '(?>(?<last>'.array_pop($arr).'))?';
$i = count($arr);
while ($i)
$patt = '(?>'.$arr[--$i].$patt.')?';
$patt = '#^'.substr($dirt, 0, $min_abbrev).$patt.'(?(last)|\.)#';
$patt .= $case_insensitive ? 'i' : null;
return trim(preg_replace($patt, '', $s));
}
and the winner is... moment of silence... it is...
a draw
regexp: 8.5169589519501 >>> string operations: 8.0951890945435
but I have a strong feeling that the regex approach could be better utilized.
Pattern search within a string.
For example:
$string = "111111110000";
FindOut($string);
The function should return 0.
function FindOut($str){
$items = str_split($str, 3);
print_r($items);
}
If I understand you correctly, your problem comes down to finding out whether a substring of 3 characters occurs in the string twice without overlapping. This will get you the first occurrence's position if it does:
function findPattern($string, $minlen=3) {
$max = strlen($string)-$minlen;
for($i=0;$i<=$max;$i++) {
$pattern = substr($string,$i,$minlen);
if(substr_count($string,$pattern)>1)
return $i;
}
return false;
}
Or am I missing something here?
What you have here can conceptually be solved with a sliding window. For your example, you have a sliding window of size 3.
For each character in the string, you take the substring of the current character and the next two characters as the current pattern. You then slide the window up one position, and check if the remainder of the string has what the current pattern contains. If it does, you return the current index. If not, you repeat.
Example:
1010101101
|-|
So, pattern = 101. Now, we advance the sliding window by one character:
1010101101
 |-|
And see if the rest of the string has 101, checking every combination of 3 characters.
Conceptually, this should be all you need to solve this problem.
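The sliding-window idea maps directly onto strpos(), whose third argument sets the offset where the search starts. A minimal sketch, assuming overlapping repeats count (so '1111' has a repeat of '111' at index 0); firstRepeatedWindow is a hypothetical name:

```php
<?php
// Hypothetical helper: index of the first window of $size chars that occurs again later, or -1.
function firstRepeatedWindow(string $str, int $size = 3): int
{
    $last = strlen($str) - $size; // last index where a full window still fits
    for ($i = 0; $i <= $last; $i++) {
        $window = substr($str, $i, $size);
        // Search the rest of the string, starting one character after this window's start
        if (strpos($str, $window, $i + 1) !== false) {
            return $i;
        }
    }
    return -1;
}

echo firstRepeatedWindow('111111110000'), PHP_EOL;
```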
Edit: I really don't like it when people just ask for code, but since this seemed like an interesting problem, here is my implementation of the above algorithm. It allows the window size to vary (instead of being fixed at 3); the function is only briefly tested and omits obvious error checking:
function findPattern( $str, $window_size = 3) {
// Start the index at 0 (beginning of the string)
$i = 0;
// while( (the current pattern in the window) is not empty / false)
while( ($current_pattern = substr( $str, $i, $window_size)) != false) {
$possible_matches = array();
// Get the combination of all possible matches from the remainder of the string
for( $j = 0; $j < $window_size; $j++) {
$possible_matches = array_merge( $possible_matches, str_split( substr( $str, $i + 1 + $j), $window_size));
}
// If the current pattern is in the possible matches, we found a duplicate, return the index of the first occurrence
if( in_array( $current_pattern, $possible_matches)) {
return $i;
}
// Otherwise, increment $i and grab a new window
$i++;
}
// No duplicates were found, return -1
return -1;
}
It should be noted that this certainly isn't the most efficient algorithm or implementation, but it should help clarify the problem and give a straightforward example on how to solve it.
It looks like you want to use a substring function to walk along the string and check every three-character window, rather than just breaking it into chunks of 3:
function fp($s, $len = 3){
$max = strlen($s) - $len; //borrowed from lafor as it was a terrible oversight by me
$parts = array();
for($i=0; $i <= $max; $i++){
$three = substr($s, $i, $len);
if(array_key_exists("$three",$parts)){
return $parts["$three"];
//if we've already seen it before then this is the first duplicate, we can return it
}
else{
$parts["$three"] = $i; //save the index of the starting position.
}
}
return false; //if we get this far then we didn't find any duplicate strings
}
Based on the str_split documentation, calling str_split on "1010101101" will result in:
Array(
[0] => 101
[1] => 010
[2] => 110
[3] => 1
)
None of these will match each other.
You need to look at each 3-long slice of the string (starting at index 0, then index 1, and so on).
I suggest looking at substr, which you can use like this:
substr($input_string, $index, $length)
And it will get you the section of $input_string starting at $index of length $length.
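To see the difference from str_split(), here is a short loop that collects every 3-long slice of the example string with substr(). A minimal sketch; the variable names are illustrative:

```php
<?php
$input = '1010101101';
$length = 3;
$slices = [];
// Every start index from 0 up to the last one that still leaves $length characters
for ($index = 0; $index <= strlen($input) - $length; $index++) {
    $slices[] = substr($input, $index, $length);
}
print_r($slices);
```

Unlike the four str_split() chunks, these eight overlapping slices make the repeats visible: '101' starts at indexes 0, 2, 4 and 7.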
A quick and dirty implementation of such a pattern search:
function findPattern($string){
$matches = 0;
$substrStart = 0;
while($matches < 2 && $substrStart+ 3 < strlen($string) && $pattern = substr($string, $substrStart++, 3)){
$matches = substr_count($string,$pattern);
}
if($matches < 2){
return null;
}
return $substrStart-1;
}
I'd like to get all the permutations of swapped character pairs of a string. For example:
Base string: abcd
Combinations:
bacd
acbd
abdc
etc.
Edit
I want to swap only letters that are next to each other: first with second, second with third, but not third with sixth.
What's the best way to do this?
Edit
Just for fun: there are three or four solutions now. Could somebody post a speed test so we can compare which is fastest?
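A rough timing harness: wrap each approach in a function and time repeated calls with microtime(true). A minimal sketch; swapPairsSubstr and swapPairsInPlace are hypothetical wrappers modeled on the substr()-based and swap()-based answers in this thread, timeIt is a hypothetical helper, and results will depend on machine and PHP version:

```php
<?php
// Modeled on the substr() answer: build each variant from slices of the input
function swapPairsSubstr(string $input): array
{
    $output = [];
    $len = strlen($input);
    for ($i = 0; $i < $len - 1; ++$i) {
        $output[] = substr($input, 0, $i)
            . substr($input, $i + 1, 1)
            . substr($input, $i, 1)
            . substr($input, $i + 2);
    }
    return $output;
}

// Modeled on the swap() answer: copy the string and exchange two adjacent offsets
function swapPairsInPlace(string $input): array
{
    $output = [];
    $len = strlen($input);
    for ($i = 0; $i < $len - 1; ++$i) {
        $s = $input; // strings are copied on write, so $input stays intact
        $t = $s[$i];
        $s[$i] = $s[$i + 1];
        $s[$i + 1] = $t;
        $output[] = $s;
    }
    return $output;
}

// Seconds spent calling $fn($input) $runs times
function timeIt(callable $fn, string $input, int $runs): float
{
    $start = microtime(true);
    for ($n = 0; $n < $runs; $n++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

foreach (['abcd', 'abcdefghij'] as $input) {
    printf("%-10s substr: %.4f  swap: %.4f\n", $input,
        timeIt('swapPairsSubstr', $input, 10000),
        timeIt('swapPairsInPlace', $input, 10000));
}
```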
Speed test
I ran a speed test of nickf's code and mine: mine beats nickf's at four letters (0.08 and 0.06 for 10K runs), but nickf's beats mine at 10 letters (nickf's 0.24, mine 0.37).
Edit: Markdown hates me today...
$input = "abcd";
$len = strlen($input);
$output = array();
for ($i = 0; $i < $len - 1; ++$i) {
$output[] = substr($input, 0, $i)
. substr($input, $i + 1, 1)
. substr($input, $i, 1)
. substr($input, $i + 2);
}
print_r($output);
nickf made a beautiful solution, thank you. I came up with a less beautiful one:
$arr=array(0=>'a',1=>'b',2=>'c',3=>'d');
for($i=0;$i<count($arr)-1;$i++){
$swapped="";
//Make normal before swapped
for($z=0;$z<$i;$z++){
$swapped.=$arr[$z];
}
//Create swapped
$i1=$i+1;
$swapped.=$arr[$i1].$arr[$i];
//Make normal after swapped.
for($y=$z+2;$y<count($arr);$y++){
$swapped.=$arr[$y];
}
$arrayswapped[$i]=$swapped;
}
var_dump($arrayswapped);
A quick Google search turned up this:
http://cogo.wordpress.com/2008/01/08/string-permutation-in-php/
How about just using the following:
function swap($s, $i)
{
$t = $s[$i];
$s[$i] = $s[$i+1];
$s[$i+1] = $t;
return $s;
}
$s = "abcd";
$l = strlen($s);
for ($i=0; $i<$l-1; ++$i)
{
print swap($s,$i)."\n";
}
Here is a slightly faster solution, as it's not overusing substr():
function swapcharpairs($input = "abcd") {
$swaps = array();
$pre = "";
$a="";
$b = $input[0];
$post = substr($input, 1);
while($post!='') {
$pre.=$a;
$a=$b;
$b=$post[0];
$post=substr($post,1);
$swaps[] = $pre.$b.$a.$post;
};
return $swaps;
}
print_R(swapcharpairs());