Group regex with fixed part - php

$txt = "toto1 555.4545.555.999.7465.432.674";
$rgx = "/([\w]+)\s([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/";
preg_match($rgx, $txt, $res);
var_dump($res);
I would like to simplify this pattern and avoid repeating "([0-9]+)", because I don't know in advance how many numbers there will be.
Can anyone tell me how?

Here is a direct answer to the question, as you have stated it:
/[\w]+\s[0-9]+(?:\.[0-9]+)+/
However, note that I have removed all of the numbered capture groups. This could be problematic, depending on what you're actually trying to achieve.
It is not possible to "count" with capture groups in regular expressions, so you would need to write some other code (i.e. not just one match, with one regex, and using back-references) to deal with this if you wish to run any queries like "What digits appear after the fifth "."?"
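To illustrate the limitation described above, here is a minimal sketch (the variant of the pattern with a capture group inside the repetition is my own assumption, not part of the answer). It shows that a repeated capture group keeps only its last iteration, which is why the individual numbers cannot be recovered from a single match:

```php
<?php
// Sketch: a capture group inside a repetition is overwritten on every pass.
$txt = "toto1 555.4545.555.999.7465.432.674";

// Group 1: the word, group 2: the first number,
// group 3: sits inside (?:...)+ and is therefore reused for each ".number".
$rgx = '/(\w+)\s([0-9]+)(?:\.([0-9]+))+/';

preg_match($rgx, $txt, $res);

// Only three groups come back, no matter how many numbers were matched.
echo $res[1], PHP_EOL; // the word
echo $res[2], PHP_EOL; // the first number
echo $res[3], PHP_EOL; // only the LAST repeated number
```

The intermediate numbers are matched but never stored, so they have to be extracted in a second step.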

There are two ways you can do this. If you just need to verify that the string matches the pattern, this regex will do the job: \w+\s(?:[0-9]+\.?)+
However, if you need to split the string into its component parts (in my interpretation, the beginning word followed by the sequence of dot-separated numbers), then you could use this pattern: (\w+)\s((?:[0-9]+\.?)+)
The second pattern will return the beginning word, toto1, in group 1, followed by the dot-separated numbers, 555.4545.555.999.7465.432.674, in group 2, which you could then split in PHP if required: $sequence = explode('.', $matches[2]);
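Put together, the two steps from this answer might look like the sketch below. The variable names $word and $sequence are only illustrative, and the / delimiters are an assumption since the answer shows the bare pattern:

```php
<?php
$txt = "toto1 555.4545.555.999.7465.432.674";

$word = null;
$sequence = [];

// Step 1: one match, two groups (the word and the whole number sequence).
if (preg_match('/(\w+)\s((?:[0-9]+\.?)+)/', $txt, $matches)) {
    $word = $matches[1];
    // Step 2: split the sequence on dots to get each number separately.
    $sequence = explode('.', $matches[2]);
}

echo $word, PHP_EOL;
print_r($sequence);
```

This keeps the regex short while still giving indexed access to every number, for example $sequence[4] for the digits after the fourth dot.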

What you need can be obtained with a preg_split with a regex matching 1 or more whitespaces or dots:
$txt = "toto1 555.4545.555.999.7465.432.674";
$rgx = '/[\s.]+/';
$res = preg_split($rgx, $txt);
print_r($res);
If you need a regex approach, you can use a \G based regex with preg_match_all:
'~(?|([\w]+)|(?!\A)\G[\s.]*([0-9]+))~'
$txt = "toto1 555.4545.555.999.7465.432.674";
$rgx = '~(?|(\w+)|(?!\A)\G[\s.]*([0-9]+))~';
preg_match_all($rgx, $txt, $res);
print_r($res[1]);
Pattern details:
The (?|...) is a branch reset group to reset group IDs in all the branches
(\w+) - Group 1 matches 1+ word chars
| - or (then goes Branch 2)
(?!\A)\G - the end of the previous successful match
[\s.]* - zero or more whitespaces or dots
([0-9]+) - Group 1 (again!) matching 1 or more digits.

Related

PHP (preg_replace) regex strip image sizes from filename

I'm working on an open-source plugin for WordPress and am facing an odd issue.
Consider the following filenames:
/wp-content/uploads/buddha_-800x600-2-800x600.jpg
/wp-content/uploads/cutlery-tray-800x600-2-800x600.jpeg
/wp-content/uploads/custommade-wallet-800x600-2-800x600.jpeg
/wp-content/uploads/UI-paths-800x800-1.jpg
The current regex I have:
(-[0-9]{1,4}x[0-9]{1,4}){1}
This will remove both matches from the filename, for example buddha_-800x600-2-800x600.jpg will become buddha_-2.jpg which is invalid.
I have tried a variety of regex:
.*(-\d{1,4}x\d{1,4}) // will strip out everything
(-\d{1,4}x\d{1,4}){1}|.*(-\d{1,4}x\d{1,4}){1} // same as above
(-\d{1,4}x\d{1,4}){1}|(-\d{1,4}x\d{1,4}){1} // will strip out all size matches
Unfortunately my knowledge of regex is quite limited. Can someone advise how to achieve this?
The goal is to remove only what is relevant, which would result in:
/wp-content/uploads/buddha_-800x600-2.jpg
/wp-content/uploads/cutlery-tray-800x600-2.jpeg
/wp-content/uploads/custommade-wallet-800x600-2.jpeg
/wp-content/uploads/UI-paths-1.jpg
Much appreciated!
You can use a capture group with a backreference to match strings where there are 2 of the same parts and replace that with a single part.
Or match the dimensions to be removed.
((-\d+x\d+)-\d+)\2|-\d+x\d+
( Capture group 1
(-\d+x\d+) Capture group 2, match - 1+ digits x and 1+ digits
-\d+ Match - and 1+ digits
)\2 Close group 1, followed by a backreference to what is captured in group 2
| Or
-\d+x\d+ Match the dimensions format
For example
$pattern = '~((-\d+x\d+)-\d+)\2|-\d+x\d+~';
$strings = [
"/wp-content/uploads/buddha_-800x600-2-800x600.jpg",
"/wp-content/uploads/cutlery-tray-800x600-2-800x600.jpeg",
"/wp-content/uploads/custommade-wallet-800x600-2-800x600.jpeg",
"/wp-content/uploads/UI-paths-800x800-1.jpg",
];
foreach ($strings as $s) {
echo preg_replace($pattern, '$1', $s) . PHP_EOL;
}
Output
/wp-content/uploads/buddha_-800x600-2.jpg
/wp-content/uploads/cutlery-tray-800x600-2.jpeg
/wp-content/uploads/custommade-wallet-800x600-2.jpeg
/wp-content/uploads/UI-paths-1.jpg
I would try something like this. You can test it yourself. Here is the code:
$a = [
'/wp-content/uploads/buddha_-800x600-2-800x600.jpg',
'/wp-content/uploads/cutlery-tray-800x600-2-800x600.jpeg',
'/wp-content/uploads/custommade-wallet-800x600-2-800x600.jpeg',
'/wp-content/uploads/UI-paths-800x800-1.jpg'
];
foreach($a as $img)
echo preg_replace('#-\d+x\d+((-\d+|)\.[a-z]{3,4})#i', '$1', $img).'<br>';
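It checks for a trailing -(number)x(number), optionally followed by -(number), right before (dot)(extension), and keeps only that optional part and the extension.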
This is a clear case of « Match the rejection, revert the match ».
So, you just have to think about the pattern you want to remove:
[0-9]+x[0-9]+
or, written more concisely:
\d+x\d+
The next step is to build the groups extractor:
^(.*[^0-9])[0-9]+x[0-9]+([^x]*\.[a-z]+)$
We added the extension of the file as a suffix for the extract.
The rejection of the "x" char is a (bad…) trick to ensure the match of the last size only. It won’t work in the case of an alphanumeric suffix between the size and the extension (toto-800x1024-ex.jpg for instance).
And then, the replacement string:
$1$2
For clarity, of course, we are only working on an already extracted filename. But if you want to handle the whole path, the pattern becomes:
^/(.*[^0-9])[0-9]+x[0-9]+([^/x]*\.[a-z]+)$
If you want to split the filename and the folder name:
^/(.*/)([^/]+[^0-9])[0-9]+x[0-9]+([^/x]*)(\.[a-z]+)$
^/(.*/)([^/]+\D)\d+x\d+([^/x]*)(\.[a-z]+)$
$folder=$1;
$filename="$2$3$4";
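As a runnable sketch of the folder/filename split, the snippet below uses a variant of the last pattern. Two things are my assumptions rather than the answer's: the helper name strip_last_size is hypothetical, and the pattern also consumes the hyphen in front of the size (the patterns above leave that hyphen in group 2, so rebuilding the name from them would give a doubled or trailing hyphen). It also keeps the leading slash inside the folder group.

```php
<?php
// Hypothetical helper: splits a path into folder and filename,
// dropping the LAST "-<width>x<height>" part of the filename.
function strip_last_size(string $path): ?array
{
    // 1: folder (with trailing slash)   2: name before the size
    // 3: suffix without "/" or "x"      4: extension
    $pattern = '~^(.*/)([^/]+)-\d+x\d+([^/x]*)(\.[a-z]+)$~i';

    if (!preg_match($pattern, $path, $m)) {
        return null; // no size found, or an "x" appears after the size
    }

    return [
        'folder'   => $m[1],
        'filename' => $m[2] . $m[3] . $m[4],
    ];
}

$paths = [
    '/wp-content/uploads/buddha_-800x600-2-800x600.jpg',
    '/wp-content/uploads/cutlery-tray-800x600-2-800x600.jpeg',
    '/wp-content/uploads/custommade-wallet-800x600-2-800x600.jpeg',
    '/wp-content/uploads/UI-paths-800x800-1.jpg',
];

foreach ($paths as $p) {
    $parts = strip_last_size($p);
    echo $parts['folder'], $parts['filename'], PHP_EOL;
}
```

The "x" restriction mentioned in the answer still applies: a name such as toto-800x1024-ex.jpg does not match, and the helper returns null for it.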

split a value into two and then reverse the value in php

I have a value like this: 73b6424b. I want to split it into two parts, 73b6 and 424b, swap them, and concatenate the result as 424b73b6. I have already done it this way:
$substr_device_value = '73b6424b';
$first_value = substr($substr_device_value, 0, 4);
$second_value = substr($substr_device_value, 4, 4);
$final_value = $second_value.$first_value;
Is there an easier way than what I have done? If so, please suggest an approach.
You may use
preg_replace('~^(.{4})(.{4})$~', '$2$1', $s)
Details
^ - matches the string start position
(.{4}) - captures any 4 chars into Group 1 ($1)
(.{4}) - captures any 4 chars into Group 2 ($2)
$ - end of string.
The '$2$1' replacement pattern swaps the values.
NOTE: If you want to pre-validate the data before swapping, you may replace . pattern with a more specific one, say, \w to only match word chars, or [[:alnum:]] to only match alphanumeric chars, or [0-9a-z] if you plan to only match strings containing digits and lowercase ASCII letters.
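A short sketch of both variants, using the sample value from the question (the variable names are illustrative only). It also shows what happens when the input does not fit the pattern: preg_replace returns the subject unchanged.

```php
<?php
$s = '73b6424b';

// Swap the two 4-character halves.
$swapped = preg_replace('~^(.{4})(.{4})$~', '$2$1', $s);
echo $swapped, PHP_EOL;

// Stricter variant from the note: only digits and lowercase ASCII letters.
$validated = preg_replace('~^([0-9a-z]{4})([0-9a-z]{4})$~', '$2$1', $s);
echo $validated, PHP_EOL;

// Input that does not match (too short) comes back untouched.
$tooShort = preg_replace('~^([0-9a-z]{4})([0-9a-z]{4})$~', '$2$1', '73b6');
echo $tooShort, PHP_EOL;
```

If a non-matching value should be treated as an error instead of passing through silently, a preg_match check before the swap would be one way to detect it.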

Using regex to extract first half of string

I have variable strings like the below:
The.Test.String.A01Y18.123h.WIB-DI.DO5.1.K.314-ECO
The.Regex.F05P78.123h.WIB-DI.DO5.1.K.314-EYT
Word.C05F78.342T.DSW-RF.EF5.2.F.342-DDF
I would like to extract the following part of each string dynamically in PHP. I was looking at using regex but haven't had much success:
The.Test.String.A01Y18
The.Regex.F05P78
Word.C05F78
And ultimately to:
The Test String A01Y18
The Regex F05P78
Word C05F78
The first part of the text is variable in length, with each word separated by a period. The next part always has the same length and follows this pattern:
one letter, 2 digits, one letter, 2 digits (e.g. C05F78)
Anything in the string after that is what I would like to remove.
This should do it:
$x=array(
"The.Test.String.A01Y18.123h.WIB-DI.DO5.1.K.314-ECO",
"The.Regex.F05P78.123h.WIB-DI.DO5.1.K.314-EYT",
"Word.C05F78.342T.DSW-RF.EF5.2.F.342-DDF"
);
for ($i=0, $tmp_count=count($x); $i<$tmp_count; ++$i) {
echo str_replace(".", " ", preg_replace("/^(.+?)([a-z]{1}[0-9]{2}[a-z]{1}[0-9]{2})\..+$/i", "\\1\\2", $x[$i]))."<br />";
}
Using this regular expression should work, replacing each of your strings with the first capturing group:
^((?:\w+\.)+\w\d{2}\w\d{2}).*
See demo at http://regex101.com/r/fR3pM6
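A minimal sketch of how that expression could be applied in PHP, assuming / delimiters and a str_replace for the final dot-to-space step asked for in the question (the $titles array is just an illustrative name):

```php
<?php
$strings = [
    "The.Test.String.A01Y18.123h.WIB-DI.DO5.1.K.314-ECO",
    "The.Regex.F05P78.123h.WIB-DI.DO5.1.K.314-EYT",
    "Word.C05F78.342T.DSW-RF.EF5.2.F.342-DDF",
];

$titles = [];
foreach ($strings as $s) {
    // Keep group 1 (words plus the 6-character code) and drop the rest.
    $head = preg_replace('/^((?:\w+\.)+\w\d{2}\w\d{2}).*/', '$1', $s);
    // Turn the remaining periods into spaces.
    $titles[] = str_replace('.', ' ', $head);
}

print_r($titles);
```

Since \w also matches digits and underscores, replacing the two \w in the code part with [A-Za-z] would be a stricter choice if the code always uses letters there.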
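This works too:
preg_match('/.*[\w\d]{6}/', $stringVariable, $matches)
The greedy .* consumes as much as possible, and [\w\d]{6} then matches the last run of 6 consecutive word characters (letters, digits or underscores; the \d is redundant because \w already includes digits). It works for these samples only because nothing after the code contains 6 word characters in a row.
Result:
Match 1: The.Test.String.A01Y18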
Match 2: The.Regex.F05P78
Match 3: Word.C05F78

Regex - matching all between second set of brackets ([])

I have the following string, from which I need to match only the last seven digits between [] brackets. The string looks like this:
[15211Z: 2012-09-12] ([5202900])
I only need to match 5202900, the part of the string contained between ([ and ]). A similar number could appear anywhere in the string, so something like (\d{7}) won't work.
I also tried the following regex:
([[0-9]{1,7}])
but this includes the [] in the match.
If you just want the 7 digits, not the brackets, but want to make sure that the digits are surrounded with brackets:
(?<=\[)\d{7}(?=\])
FYI: This is called a positive lookahead and positive lookbehind.
Good source on the topic: http://www.regular-expressions.info/lookaround.html
Try matching \(\[(\d{7})\]\), so you match this whole regular expression, then you take group 1, the one between unescaped parentheses. You can replace {7} with a '*' for zero or more, + for 1 or more or a precise range like you already showed in your question.
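The two suggestions so far can be compared side by side. This is only a sketch with assumed / delimiters and illustrative variable names; the difference is where the digits end up in the result array:

```php
<?php
$str = '[15211Z: 2012-09-12] ([5202900])';

// Lookaround: the brackets are checked but not consumed,
// so the whole match (index 0) is just the digits.
preg_match('/(?<=\[)\d{7}(?=\])/', $str, $viaLookaround);
echo $viaLookaround[0], PHP_EOL;

// Capture group: the whole match includes "([" and "])",
// the digits alone are in group 1.
preg_match('/\(\[(\d{7})\]\)/', $str, $viaGroup);
echo $viaGroup[0], PHP_EOL;
echo $viaGroup[1], PHP_EOL;
```

Neither pattern is confused by the first bracketed part, because [15211Z does not contain seven digits directly followed by a closing bracket.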
You can try to use
\[(\d{1,7})\]
If first pattern looks like yours (not only digits), then this should work for you to extract group of digits surrounded by brackets like ([123]):
\(\[(\d+)\]\)
From your details, lookbehind and lookahead seem to be a good option. You can also use this one:
(\d{7})\]\)$
Since the seven-digit pattern is expected at the end of the line, the engine needs to do less work to find the match.
Hope it helps!
Here is a benchmark (in Perl, but I think the result is much the same in PHP) that compares the lookaround approach and the capture group:
use Benchmark qw(:all);
my $str = q/[15211Z: 2012-09-12] ([5202900])/;
my $count = -3;
cmpthese($count, {
'lookaround' => sub {
$str =~ /(?<=\[)\d{7}(?=\])/;
},
'capture group' => sub {
$str =~ /\[(\d{7})\]/;
},
});
Result:
                  Rate lookaround capture group
lookaround    274914/s         --          -70%
capture group 931043/s       239%            --
As we can see, capture is more than 3 times faster than lookaround.

php regex - need 2 groups captured

I need 2 groups captured: 1 - "expr" (can be empty); 2 - "essi".
See the code below:
$s = 'regular expr<span>essi</span>on contains';
function my_func($matches){
//I need 2 groups captured
//$matches[1] - "expr" (see $s before span) - can be empty, but I still need to capture it
//$matches[2] - "essi" (between spans)
}
$pattern = "???";
echo preg_replace_callback($pattern, my_func, $s);
$pattern = "~(\w*)<span>(\w+)</span>~";
This should do the trick.
If the second group should be able to match empty strings as well, replace the + by another *. Note that \w will match letters, digits and underscores. If that is too much or insufficient, replace it by an appropriate character class.
One more thing: I think the syntax for preg_replace_callback requires you to hand in the function name as a string.
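A minimal sketch of the complete call, with the callback passed by name as a string. The body of my_func is purely an assumption to make the two groups visible, since the question does not say what the callback should return:

```php
<?php
$s = 'regular expr<span>essi</span>on contains';

function my_func(array $matches): string
{
    // $matches[1] - "expr" (text right before <span>, may be empty)
    // $matches[2] - "essi" (text between the span tags)
    // Illustrative replacement only: uppercase group 1, bracket group 2.
    return strtoupper($matches[1]) . '[' . $matches[2] . ']';
}

$pattern = '~(\w*)<span>(\w+)</span>~';

$result = preg_replace_callback($pattern, 'my_func', $s);
echo $result, PHP_EOL;

// Group 1 is still captured (as an empty string) when nothing precedes <span>.
$empty = preg_replace_callback($pattern, 'my_func', '<span>essi</span>on');
echo $empty, PHP_EOL;
```

An anonymous function could be passed in place of the 'my_func' string if the callback is not needed anywhere else.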
