preg_match_all for Unknown Sets of 3 Integers - php

I am using preg_match_all, but I have a problem I am not sure can be solved using this method. The following line is part of what I am retrieving:
XXC033-101-143-147-175-142115-
The 3-digit sets (033, 101, 143, etc.) are what I want to refer to. However, the number of sets (each always exactly three digits) is unknown and can range anywhere from 1 to 10. If I knew there would always be exactly 2 sets, I would have the following:
if (preg_match_all('#([A-Z]{2}C)([0-9]{3})-([0-9]{3})-([0-9]{6})#', $wwalist, $matches))
...rest of code...
Is there any way to do this when I have no way of knowing the number of 3-digit sets? They will always be between ([A-Z]{2}C) and -([0-9]{6}).
Any help would be greatly appreciated! Thanks!

Use
'#([A-Z]{2}C)([0-9]{3}-){1,10}([0-9]{6})#'
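{1,10} specifies that the preceding subpattern enclosed in parentheses, [0-9]{3}-, will repeat 1 to 10 times. Note that a repeated capturing group only keeps its last repetition, so this pattern matches the whole line but captures only the final set (175- for the sample line), not each set separately.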
In addition:
If it can repeat 0 or more times with no upper limit, use *.
If it can repeat 1 or more times with no upper limit, use +.
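A minimal PHP sketch of using the {1,10} idea to get every set, assuming the sample line from the question. Because a repeated group only remembers its last repetition, this variant wraps the whole repetition in one outer capture (making the inner group non-capturing) and then splits that capture with explode().

```php
<?php
// Sample line from the question ($wwalist is the question's variable name).
$wwalist = 'XXC033-101-143-147-175-142115-';

// Outer group 2 captures the whole run of 1-10 "ddd-" blocks at once;
// the inner (?:...) only repeats and captures nothing itself.
$pattern = '#([A-Z]{2}C)((?:[0-9]{3}-){1,10})([0-9]{6})#';

$sets = [];
if (preg_match_all($pattern, $wwalist, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // e.g. "033-101-143-147-175-" -> ["033", "101", "143", "147", "175"]
        $sets[] = explode('-', rtrim($m[2], '-'));
    }
}

print_r($sets);
```

Each entry of $sets is one matched line, so this also works when the input contains several such lines.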

Targeting only the 3-digit substrings, individually/optionally capture the groups like this:
Pattern:
/[A-Z]{2}C\K(\d{3}-)(\d{3}-)?(\d{3}-)?(\d{3}-)?(\d{3}-)?(\d{3}-)?(\d{3}-)?(\d{3}-)?(\d{3}-)?(\d{3}-)?/
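The pattern above could be exercised from PHP roughly like this (a sketch assuming the question's sample line; $numbers is a hypothetical name). The pattern is built with str_repeat() so it is identical to the one shown, and the trailing hyphen that each group captures is stripped afterwards.

```php
<?php
$line = 'XXC033-101-143-147-175-142115-';

// One mandatory and nine optional "ddd-" groups, exactly as in the answer.
// \K drops the "XXC" prefix from the reported full match.
$pattern = '/[A-Z]{2}C\K(\d{3}-)' . str_repeat('(\d{3}-)?', 9) . '/';

$numbers = [];
if (preg_match($pattern, $line, $m)) {
    foreach (array_slice($m, 1) as $group) {
        // Skip any group that did not participate (reported as "").
        if ($group !== '') {
            // Each captured group still carries its hyphen, so strip it.
            $numbers[] = rtrim($group, '-');
        }
    }
}

print_r($numbers);
```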

Related

Retrieve 0 or more matches from comma-separated list inside parentheses using regex

I am trying to retrieve matches from a comma-separated list that is located inside parentheses, using a regular expression. (I also retrieve the version number in the first capture group, though that's not important to this question.)
What's worth noting is that the expression should ideally handle all possible cases, where the list could be empty or could have more than 3 entries, i.e. 0 or more matches in the second capture group.
The expression I have right now looks like this:
SomeText\/(.*)\s\(((,\s)?([\w\s\.]+))*\)
The string I am testing this on looks like this:
SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)
The result is:
1. [6-11] `1.0.4`
2. [32-52] `, Macbook Pro Retina`
3. [32-34] `, `
4. [34-52] `Macbook Pro Retina`
The desired result would look like this:
1. [6-11] `1.0.4`
2. [32-52] `debug`
3. [32-34] `OS X 10.11.2`
4. [34-52] `Macbook Pro Retina`
As far as I can see, the expression should work on the test string. What is the cause of the weird results, and how could I improve the expression?
I know there are other ways of solving this problem, but I would like to use a single regular expression if possible. Please don't suggest other options.
When dealing with a varying number of groups, regex isn't the best tool. Solve it in two steps.
First, break down the statement using a simple regex:
SomeText\/([\d.]*) \(([^)]*)\)
1. [9-14] `1.0.4`
2. [16-55] `debug, OS X 10.11.2, Macbook Pro Retina`
Then just explode the second result by ',' (and trim each piece) to get your groups.
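A short PHP sketch of the two-step approach, assuming the test string from the question. explode() on ',' leaves a leading space on each entry, so the pieces are trimmed; an empty list "()" would otherwise produce one empty string, which is filtered out here.

```php
<?php
$input = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

$version = null;
$items = [];

// Step 1: capture the version and the raw text between the parentheses.
if (preg_match('/SomeText\/([\d.]*) \(([^)]*)\)/', $input, $m)) {
    $version = $m[1];
    // Step 2: split on commas, trim the space after each comma, and drop
    // the single empty string that explode() returns for an empty list.
    $items = array_values(array_filter(
        array_map('trim', explode(',', $m[2])),
        'strlen'
    ));
}

print_r($items);
```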
Probably the \G anchor works best here for binding the match to an entry point. This regex is designed for input that is always similar to the sample that is provided in your question.
(?<=SomeText\/|\G(?!^))[(,]? *\K[^,)(]+
(?<=SomeText\/|\G(?!^)) the lookbehind defines where each match is glued to: right after SomeText/ or where the previous match ended
\G matches where the previous match ended; (?!^) prevents it from matching at the start of the string
[(,]? * matches an optional opening parenthesis or comma followed by any number of spaces
\K resets the start of the reported match
[^,)(]+ matches the wanted characters, which are none of ( ) ,
Demo at regex101 (grab matches of $0)
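For reference, the \G/\K pattern could be used from PHP with preg_match_all() roughly as follows (a sketch against the question's test string). The first match is the version, and it carries the space before "(", because [^,)(]+ also accepts spaces.

```php
<?php
$input = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

// \G(?!^) inside the lookbehind lets each match continue where the previous
// one ended; \K drops the optional "(" or "," and spaces from the match.
preg_match_all('/(?<=SomeText\/|\G(?!^))[(,]? *\K[^,)(]+/', $input, $m);

// $m[0] holds every match; trim each one, then split off the version.
$parts   = array_map('trim', $m[0]);
$version = array_shift($parts);
$items   = $parts;

print_r($items);
```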
Another idea, using capture groups:
SomeText\/([^(]*)\(|\G(?!^),? *([^,)]+)
This one, without a lookbehind, is a bit more accurate (it also requires the opening parenthesis), performs better (needs fewer steps), and is probably easier to understand and maintain.
SomeText\/([^(]*)\( is the entry anchor; the version is captured here to $1
|\G(?!^),? *([^,)]+) or glued to the previous match: captures to $2 one or more characters that are neither , nor ), preceded by an optional comma and spaces.
Another demo at regex101
Actually, stribizhev was close:
(?:SomeText\/([^() ]*)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\))
I just had to make that one character class match at least one character.
(?:SomeText\/([0-9.]+)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\)) is a little clearer, as long as the version number always consists of digits and periods.
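A sketch of running this last pattern in PHP, assuming the same test string. With preg_match_all()'s default PREG_PATTERN_ORDER, $m[1] holds the version (only the first match fills it) and $m[2] holds one list entry per match.

```php
<?php
$input = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

$pattern = '/(?:SomeText\/([0-9.]+)\s*\(|(?!^)\G),?\s*([^(),]+)(?=[^()]*\))/';

$version = null;
$items = [];
if (preg_match_all($pattern, $input, $m)) {
    // Group 1 only participates in the first match; later matches leave it "".
    $version = $m[1][0];
    // Group 2 collects one list entry per match.
    $items = $m[2];
}

print_r($items);
```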
I wanted to come up with something more elegant than this (though this does actually work):
SomeText\/(.*)\s\(([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?([^\,]+)?\,?\s?\)
Obviously, the
([^\,]+)?\,?\s?
is repeated 6 times.
(It can be repeated any number of times and it will work for any number of comma-separated items equal to or below that number of times).
I tried to shorten the long, repetitive list of ([^\,]+)?\,?\s? above to
(?:([^\,]+)\,?\s?)*
but it doesn't work and my knowledge of regex is not currently good enough to say why not.
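The reason the shortened form fails is that a quantified capturing group has only one capture slot: every iteration of (?:...)* overwrites it, so after the match only the last list entry remains. A quick PHP check, assuming the question's test string, illustrates this:

```php
<?php
$input = 'SomeText/1.0.4 (debug, OS X 10.11.2, Macbook Pro Retina)';

// The shortened pattern: * repeats the non-capturing group, but there is
// still only ONE capture slot for ([^\,]+).
preg_match('/SomeText\/(.*)\s\((?:([^\,]+)\,?\s?)*\)/', $input, $m);

// Each iteration overwrote group 2, so only the final entry survives.
echo $m[2], PHP_EOL;
```

Spelling the group out six times, as above, works precisely because it gives six separate capture slots.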
This should solve your problem. Use the code you already have and add something like this. It will find the commas in your string and delete them.
Use trim() to delete whitespace at the start or the end.
$line = trim(str_replace(',', '', $line)); // remove every comma, then trim surrounding whitespace
I hope this helps!

Regex for exact number / string match in between

I want to make a regex that finds an exact number within a string.
e.g. finding the number 2 in 3, 5, 25, 22,2, 15
What I have is /*,2,*/.
But with this regex it matches 22, 25, or just anything with a 2 in it. I want it to match only where the number 2 itself stands between commas, or stands alone without commas.
Update:
Both the number (needle) I look for and the string (haystack) I search in can vary.
E.g. if the number I seek is 2, I want to find it in 2,3,44,23,22,1 or 3,4,22,5,2 or 2, and I should be able to find one match for each of these groups of numbers.
You should probably use boundaries (\b) so a leading/trailing comma isn't required.
/\b2\b/
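Since both the needle and the haystack can vary, the \b pattern could be built at runtime. A minimal sketch, where containsNumber() is a hypothetical helper and preg_quote() escapes the needle in case it ever contains regex metacharacters:

```php
<?php
// Hypothetical helper: does $haystack contain $needle as a whole number?
function containsNumber(string $haystack, int $needle): bool
{
    // \b on both sides rejects digits glued to the needle (22, 25, 12, ...).
    $pattern = '/\b' . preg_quote((string) $needle, '/') . '\b/';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(containsNumber('3,4,22,5,2', 2));
```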
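You should do this instead:
,(\d), #for any single digit
,(2), #for 2 in particular
(This requires a comma on both sides, so it will not match a 2 at the very start or end of the list, or a lone 2.)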
Demo: http://regex101.com/r/vP6jI1

Regex match number consisting of specific range, and length?

I'm trying to match a number that may consist of [1-4], with a length of {1,1}.
I've tried multiple variations of the following, which won't work:
/^string\-(\d{1,1})[1-4]$/
Any guidelines? Thanks!
You should just use:
/^string-[1-4]$/
This matches the start of the string, followed by the literal "string-", followed by a single digit from 1 to 4, and then the end of the string. It will match only this string and nothing else.
If this is part of a larger string and all you want is the one part you can use something like:
/string-[1-4]\b/
which matches pretty much the same as above just as part of a larger string.
You can (in either option) also wrap the character class ([1-4]) in parentheses to get that as a separate part of the matches array (when using preg_match/preg_match_all).
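A small PHP sketch of both options with the character class captured, using made-up subjects; matchStringDigit() is an illustrative helper, not part of any library:

```php
<?php
// Anchored form: the whole subject must be "string-" plus one digit 1-4.
// The parentheses capture the digit so it appears as $m[1].
function matchStringDigit(string $subject): ?int
{
    if (preg_match('/^string-([1-4])$/', $subject, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Unanchored form with \b, for finding the token inside a larger string;
// "string-34" is rejected because 3 is followed by another digit.
preg_match_all('/string-([1-4])\b/', 'a string-2, string-5, string-34 and string-4', $all);
$digits = $all[1];

print_r($digits);
```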
This is not hard:
/^string-([1-4]{1})$/

Preg_match cuts last zeros from some numbers

I think it only happens when I write a regex. I have a simple regex to validate a set of pagination numbers that will later be submitted to a database, like 5, 10, 25, 50, 100, 250. Example:
/all|5|10|25|50|100|250/
When I test it, my regex above cuts the 0 only from the numbers 50, 100 and 250, but not from 10!?
Online example:
http://viper-7.com/IbKFKw
What am I doing wrong here? What am I really missing this time?
This is because in the string 50, the regex first matches 5, which is valid. Likewise, in 100 it first matches 10, and in 250 it first matches 25; each is a valid alternative, so matching ends there.
You might try adding anchors:
/^(?:all|5|10|25|50|100|250)$/
This forces the regex to match the whole string, and hence, return the correct match you are looking for.
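A PHP sketch reproducing the truncation and the anchored fix (isValidPageSize() is a hypothetical helper name). The loop records what the unanchored pattern returns for each value; the function validates the whole string instead.

```php
<?php
$pattern = '/all|5|10|25|50|100|250/';

// Unanchored: preg_match accepts the first alternative that matches at the
// earliest position, so longer values get cut short.
$found = [];
foreach (['5', '10', '25', '50', '100', '250'] as $value) {
    preg_match($pattern, $value, $m);
    $found[$value] = $m[0];
}

// Anchored validation: the whole string must equal one alternative.
// Note: $ also matches before a trailing "\n"; use \z if that matters.
function isValidPageSize(string $value): bool
{
    return preg_match('/^(?:all|5|10|25|50|100|250)$/', $value) === 1;
}

print_r($found);
```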
The alternatives are tried from left to right, so matching 5 takes precedence over 50. But there is no alternative 1 that could cut the 0 off 10. You can simply reorder them:
/all|250|100|50|25|10|5/
Alternatively, add the 0 optionally to the relevant alternatives (and since ? is greedy, the 0 will be matched if present):
/all|50?|100?|250?/
or
/all|(?:5|10|25)0?/
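To check that the reordered and optional-zero variants return the full numbers even without anchors, a short PHP loop (a sketch; the array keys are just descriptive labels):

```php
<?php
$patterns = [
    'reordered'     => '/all|250|100|50|25|10|5/',
    'optional zero' => '/all|50?|100?|250?/',
    'grouped'       => '/all|(?:5|10|25)0?/',
];

// Record the full match each pattern produces for each page size.
$results = [];
foreach ($patterns as $name => $pattern) {
    foreach (['5', '10', '25', '50', '100', '250'] as $value) {
        preg_match($pattern, $value, $m);
        $results[$name][$value] = $m[0];
    }
}

print_r($results);
```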
If this is not for matching but for validation (i.e. checking against the entire string), then go with Jerry's suggestion and use anchors to make sure that there are no undesired characters around your number:
/^(?:all|5|10|25|50|100|250)$/
(Of course inside (?:...) you could also use any of my above patterns, but now precedence is irrelevant because incomplete matches are disallowed.)

How to match those numbers?

I have an array of numbers, for example:
10001234
10002345
Now I have a number which should be matched against all of those numbers inside the array. The number could be 10001234 (which would be easy to match), but it could also be 100001234 (4 zeros instead of 3) or 101234 (one zero instead of 3), for example. Any combination is possible. The only fixed part is the 1234 at the end.
I can't just take the last 4 characters, because the fixed part can also be 3, 5, 6, ... digits long, like 1000123456.
What's a good way to match that? Maybe it's easy and I don't see the wood for the trees :D.
Thanks!
If the first digit is always 1, you can use this:
$Num = 1000436346;
echo (int) ltrim((string) $Num, "1"); // strip leading 1s; the int cast then drops the leading zeros
output:
436346
$number % 10000
This returns the remainder of dividing the number by 10000, i.e. its last four digits.
The question doesn't make the criteria for the match very clear. However, I'll give it a go.
First, my assumptions:
The number always starts with a 1 followed by an unknown number of 0s.
After that, we have a sequence of digits which could be anything (but presumably not starting with zero?), which you want to extract from the string?
Given the above, we can formulate an expression fairly easily:
$input = '10002345';
if (preg_match('/10+(\d+)/', $input, $matches)) {
    $output = $matches[1];
}
$output now contains the second part of the number, i.e. 2345.
If you need to match more than just a leading 1, you can replace it in the expression with \d to match any digit, and add a plus sign after it to allow more than one digit there (although we're still relying on there being at least one zero between the first part of the number and the second).
$input = '10002345';
if (preg_match('/\d+0+(\d+)/', $input, $matches)) {
    $output = $matches[1];
}
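If the goal is to compare the candidate against every number in the array, one possible sketch, under the same assumption that each number is a 1 followed by one or more 0s and then the fixed part, is to strip that prefix from both sides and compare what remains. significantPart() and findMatches() are hypothetical helpers, the numbers are assumed to be stored as strings, and a fixed part that itself starts with 0 would be ambiguous under this assumption.

```php
<?php
// Hypothetical helper: strip the assumed "1 followed by one or more 0s" prefix.
function significantPart(string $number): string
{
    return preg_replace('/^10+/', '', $number);
}

// Hypothetical helper: array entries whose significant part equals the candidate's.
function findMatches(array $numbers, string $candidate): array
{
    $key = significantPart($candidate);
    return array_values(array_filter(
        $numbers,
        fn (string $n): bool => significantPart($n) === $key
    ));
}

print_r(findMatches(['10001234', '10002345'], '101234'));
```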
