I need to parse some user input. It arrives as clauses, for example:
total>=100
name="foo"
bar!="baz"
I have a list of all of the available operators (<, >, <=, !=, = etc) and was using this to build a regex pattern.
My goal is to get each clause split into 3 pieces:
$result=["total", ">=", "100"]
$result=["name", "=", "foo"]
$result=["bar", "!=", "baz"]
My pattern takes all the operators and builds something like this (condensed for length; this example only matches > and >=):
preg_split("/(?<=>)|(?=>)|(?<=>=)|(?=>=)/", $clause,3)
So a lookbehind and a lookahead for each operator. I had preg_split restrict to 3 groups in case a string contained an operator character (name="<wow>").
My regex works well, except for any operator that contains another operator. For example, >= is never split correctly because > is matched and split first. The same goes for !=, which is matched by =.
Here's what I'm getting:
$result=["total", ">", "=100"]
$result=["bar", "!", "=baz"]
Is it possible to use regex to do what I'm attempting? I need to keep track of the operator and can't simply split the string on it (hence the lookahead/behind solution).
One possibility I considered would be to force a space or unusual character around all the operators, so that > and >= would become, say, {>} and {>=}. If the regex had to match the brackets, it wouldn't be able to match early like it does now. However, this isn't an elegant solution, and it seems like some of the regex masters here might know a better way.
Is regex the best solution or should I use string functions?
This question is somewhat similar, but I don't believe the answer's pseudocode is accurate, as I couldn't get it to work well: "How to manipulate and validate string containing conditions that will be evaluated by php".
I'd suggest matching instead of splitting, as the result will still be an array.
^(.*?)([!<>=|]=?)(.*?)$
PHP code:
$re = "/^(.*?)([!<>=|]=?)(.*?)$/m";
$str = "total>=100\nname=\"foo\"\nbar!=\"baz\"";
preg_match_all($re, $str, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => total>=100
[1] => name="foo"
[2] => bar!="baz"
)
[1] => Array
(
[0] => total
[1] => name
[2] => bar
)
[2] => Array
(
[0] => >=
[1] => =
[2] => !=
)
[3] => Array
(
[0] => 100
[1] => "foo"
[2] => "baz"
)
)
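Since the question starts from a list of operators, the matching approach can also be generated from that list. The following is a minimal sketch rather than the answerer's code: splitClause() and the $operators array are hypothetical names, and it assumes that a longer operator should always win over a shorter one it contains.

```php
<?php
// Hypothetical helper (not from the answers): split one clause into
// [left side, operator, right side] using a caller-supplied operator list.
function splitClause(string $clause, array $operators): ?array
{
    // Try longer operators first so ">=" wins over ">" and "!=" over "=".
    usort($operators, fn(string $a, string $b): int => strlen($b) <=> strlen($a));

    // Escape each operator so its characters are taken literally.
    $alternation = implode('|', array_map(
        fn(string $op): string => preg_quote($op, '/'),
        $operators
    ));

    // The lazy first group stops at the earliest operator; everything after
    // that operator, including further operator characters, is the value.
    $pattern = '/^(.*?)(' . $alternation . ')(.*)$/s';
    if (preg_match($pattern, $clause, $m) !== 1) {
        return null; // no operator found
    }
    return [$m[1], $m[2], $m[3]];
}

$operators = ['<', '>', '<=', '>=', '!=', '='];
foreach (['total>=100', 'name="<wow>"', 'bar!="baz"'] as $clause) {
    print_r(splitClause($clause, $operators));
}
```

Because only the first operator is consumed, a value such as "<wow>" stays in one piece, which is what the limit of 3 in the original preg_split call was meant to achieve.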
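You can try this regexp (the flags are written regex101-style; PHP has no g modifier, so in PHP drop the g and call preg_match_all instead):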
/^(.*)([><!]?[=]+|[>]+|[<]+)(.*)$/mgU
I have tried it here: https://regex101.com/ with input:
xxx>"sdads"
yyy<"sadasd"
name="foo"
total>=100
total<=100
total<=100
bar!="baz"
and it matched everything in the right place.
Using the regex: /([^<=>!]*)([<=>!]{1,2})(.*)/ with preg_match on each line will get you the desired result; at least for your examples, but likely much more.
One piece of syntax that is useful here, and that you may not have known about, is the character class [].
[...] means match any single character listed in the brackets
[^...] means match any single character NOT listed in the brackets
Code example
$test = 'total>=100';
$regex = '/([^<=>!]*)([<=>!]{1,2})(.*)/';
preg_match($regex, $test, $match);
print_r($match);
Result:
Array
(
[0] => total>=100
[1] => total
[2] => >=
[3] => 100
)
Related
I'm trying to split a string into its operator and its float/int value.
Example:
$input = ">=2.54";
Output should be:
array(0=>">=",1=>"2.54");
Operators cases : >,>=,<,<=,=
I tried something like this:
$input = '0.2>';
$exploded = preg_split('/[0-9]+\./', $input);
but it's not working.
Here is a working version using preg_split:
$input = ">=2.54";
$parts = preg_split("/(?<=[\d.])(?=[^\d.])|(?<=[^\d.])(?=[\d.])/", $input);
print_r($parts);
This prints:
Array
(
[0] => >=
[1] => 2.54
)
Here is an explanation of the regex used, which says to split when:
(?<=[\d.])(?=[^\d.]) a digit/dot precedes and a non digit/dot follows
| OR
(?<=[^\d.])(?=[\d.]) a non digit/dot precedes and a digit/dot follows
That is, we split at the interface between a number, possibly a decimal, and an arithmetic symbol.
Try :
$input = ">=2.54";
preg_match("/([<>]?=?) ?(\d*(?:\.\d+)?)/",$input,$exploded);
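As a hedged sketch of the same preg_match idea, the variant below anchors the pattern so that partial or empty matches are rejected. parseComparison() is a hypothetical name, and the sketch assumes the operator always comes before the number, as in the ">=2.54" example.

```php
<?php
// Hypothetical helper: parse "<operator><number>" such as ">=2.54".
function parseComparison(string $input): ?array
{
    // Anchored at both ends, so "0.2>" or a bare ">=" is rejected
    // instead of being half-matched. Two-character operators come first.
    if (preg_match('/^(>=|<=|>|<|=)\s*(\d+(?:\.\d+)?)$/', $input, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

var_export(parseComparison('>=2.54'));
```

Returning null for anything that does not fit lets the caller tell bad input apart from a valid comparison.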
If you want to split between the operators, you might use an alternation to match the variations of the operators, and use \K to reset the starting point of the reported match.
This will give you the position to split on. Then assert using lookarounds that there is a digit on the left or on the right.
\d\K(?=[<>=])|(?:>=?|<=?|=)\K(?=\d)
Explanation
\d\K(?=[<>=]) Match a digit, forget what was matched and assert either <, > or = on the right
| Or
(?:>=?|<=?|=)\K(?=\d) Match an operator, forget what was matched and assert a digit on the right
For example
$strings = [
">=2.54",
"=5",
"0.2>"
];
$pattern = '/\d\K(?=[<>=])|(?:>=?|<=?|=)\K(?=\d)/';
foreach ($strings as $string) {
print_r(preg_split($pattern, $string));
}
Output
Array
(
[0] => >=
[1] => 2.54
)
Array
(
[0] => =
[1] => 5
)
Array
(
[0] => 0.2
[1] => >
)
I'm trying to get all substrings matched with a multiplier:
$list = '1,2,3,4';
preg_match_all('|\d+(,\d+)*|', $list, $matches);
print_r($matches);
This example returns, as expected, the last match in [1]:
Array
(
[0] => Array
(
[0] => 1,2,3,4
)
[1] => Array
(
[0] => ,4
)
)
However, I would like to get all strings matched by (,\d+), to get something like:
Array
(
[0] => ,2
[1] => ,3
[2] => ,4
)
Is there a way to do this with a single function such as preg_match_all()?
According to Kobi (see comments above):
PHP has no support for captures of the same group
Therefore this question has no solution.
It's true that PHP (or better to say PCRE) doesn't store values of repeated capturing groups for later access (see PCRE docs):
If a capturing subpattern is matched repeatedly, it is the last portion of the string that it matched that is returned.
But in most cases the \G anchor does the job. \G matches at the position where the current match attempt starts: 1) the beginning of the input string on the first attempt (like \A, or ^ when the m modifier is not set), and 2) the end of the previous match afterwards. That said, you can use it like the following:
preg_match_all('/^\d+|\G(?!^)(,?\d+)\K/', $list, $matches);
or if capturing group doesn't matter:
preg_match_all('/\G,?\d+/', $list, $matches);
after which $matches will hold this:
Array
(
[0] => Array
(
[0] => 1
[1] => ,2
[2] => ,3
[3] => ,4
)
)
Note: the benefit of using \G over the other answers (like explode() or lookbehind solution or just preg_match_all('/,?\d+/', ...)) is that you are able to validate the input string to be only in the desired format ^\d+(,\d+)*$ at the same time while exporting the matches:
preg_match_all('/(?:^(?=\d+(?:,\d+)*$)|\G(?!^),)\d+/', $list, $matches);
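To see the validation effect, here is a small sketch that runs the pattern above against one well-formed and one malformed list. matchList() and the sample inputs are made up for illustration.

```php
<?php
// Hypothetical wrapper around the validating \G pattern from the answer.
function matchList(string $list): array
{
    // The lookahead at ^ checks the whole string once; \G then forces every
    // later match to start exactly where the previous one ended.
    $pattern = '/(?:^(?=\d+(?:,\d+)*$)|\G(?!^),)\d+/';
    preg_match_all($pattern, $list, $matches);
    return $matches[0];
}

foreach (['1,2,3,4', '1,2,x,4'] as $list) {
    $found = matchList($list);
    echo $list, ' => ', count($found), ' match(es): ', implode(' ', $found), "\n";
}
```

The well-formed list yields four matches (1, then ,2 ,3 ,4), while the malformed one yields none, because the lookahead fails at the start and \G cannot match anywhere else.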
Using lookbehind is a way to do the job:
$list = '1,2,3,4';
preg_match_all('|(?<=\d),\d+|', $list, $matches);
print_r($matches);
All the ,\d+ are in group 0.
output:
Array
(
[0] => Array
(
[0] => ,2
[1] => ,3
[2] => ,4
)
)
Splitting is only an option when the character to split isn't used in the patterns to match itself.
I had a situation where a badly formatted comma separated line has to be parsed into any of a number of known options.
i.e. options '1,2', '2', '2,3'
subject '1,2,3'.
Splitting on ',' will result in '1', '2', and '3', only one of which ('2') is a valid match. This happens because the separator is also part of the options.
The naïve regex would be something like '~^(1,2|2|2,3)(?:,(1,2|2|2,3))*$~i', but this runs into the problem of same-group captures.
My "solution" was to just expand the regex to match the maximum number of matches possible:
'~^(1,2|2|2,3)(?:,(1,2|2|2,3))?(?:,(1,2|2|2,3))?$~i'
(If more options were available, just repeat the '(?:,(1,2|2|2,3))?' bit.)
This does result in empty string results for "unused" matches.
It's not the cleanest solution, but works when you have to deal with badly formatted input data.
Why not just:
$ar = explode(',', $list);
print_r($ar);
From http://www.php.net/manual/en/regexp.reference.repetition.php :
When a capturing subpattern is repeated, the value captured is the substring that matched the final iteration.
Also similar thread:
How to get all captures of subgroup matches with preg_match_all()?
I have a regex code that splits strings between [.!?], and it works, but I'm trying to add something else to the regex code. I'm trying to make it so that it doesn't match [.] that's between numbers. Is that possible? So, like the example below:
$input = "one.two!three?4.000.";
$inputX = preg_split("~(?>[.!?]+)\K(?!$)~", $input);
print_r($inputX);
Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4. [4] => 000. )
Need Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4.000. )
You should be able to split on this:
(?<=(?<!\d(?=[.!?]+\d))[.!?])(?![.!?]|$)
https://regex101.com/r/kQ6zO4/1
It uses lookarounds to determine where to split. It looks behind to try to match anything in the set [.!?] one or more times as long as it isn't preceded by and succeeded by a digit.
It also won't return the last empty match by ensuring the last set isn't the end of the string.
UPDATE:
This should be much more efficient actually:
(?!\d+\.\d+).+?[.!?]+\K(?!$)
https://regex101.com/r/eN7rS8/1
Here is another possibility using preg_split flags:
$input = "one.two!three???4.000.";
$inputX = preg_split("~(\d+\.\d+[.!?]+|.*?[.!?]+)~", $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($inputX);
It includes the delimiter in the split and ignores empty matches. The regex can be simplified to ((?:\d+\.\d+|.*?)[.!?]+), but I think what is in the code sample above is more efficient.
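Matching the sentences directly, instead of splitting, is another option. This is only a sketch under the assumption that every sentence ends in at least one of . ! ?; splitSentences() is a hypothetical name, and any trailing text without a terminator is silently dropped.

```php
<?php
// Hypothetical helper: return each sentence together with its closing punctuation.
function splitSentences(string $text): array
{
    // A sentence body is a run of decimal numbers (digits.digits) or of
    // characters other than . ! ?, followed by one or more terminators.
    preg_match_all('~(?:\d+\.\d+|[^.!?])+[.!?]+~', $text, $m);
    return $m[0];
}

print_r(splitSentences('one.two!three???4.000.'));
```

Trying the digits.digits alternative before the single-character one is what keeps the dot in 4.000 from being treated as a terminator.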
Let me start by saying that the first number before the first - is the ID I need to extract. Everything from the first - to the first / is the 'name' I need to extract. I do not care about anything after that.
Test String:
1-gc-communications/edit/profile_picture
Expected Output:
Array ( [0] => 1 [1] => gc-communications [2] => /edit/profile_picture )
The best I could come up with was the following patterns (along with their results - with a limit of 3)
Pattern: /-|edit\/profile_picture/
Result: Array ( [0] => 1 [1] => gc [2] => communications/edit/profile_picture )
^ This one is flawed because it splits on both dashes.
Pattern: /~-~|edit\/profile_picture/
Result: Array ( [0] => 1-gc-communications/ [1] => )
^ major fail.
I know I can do a 2-element limit and just break on the first / and then do a preg_split on the result array, but I would love a way to make this work with one line.
If this is a no-go I am open to other "one liner" solutions.
Try this one
$str = '1-gc-communications/edit/profile_picture';
$match = preg_split('#([^-]+)-([^/]+)/(.*)#', $str, 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($match);
It returns:
array (
0 => '',
1 => '1',
2 => 'gc-communications',
3 => 'edit/profile_picture',
4 => '',
)
the first number before the first - will be the ID I need to extract. from the first - to the first / will be the 'name' I need to extract. Everything after that I do not care for.
This task seems a great candidate for sscanf(), which is specifically designed for parsing (scanning) a formatted string. Not only is the syntax brief, you also know that you do not need to make repeated matches with the pattern. The output values, in case it matters, are already cast as an integer or a string for convenience. The rest of the string, from the first slash onward, is simply ignored.
Code:
$str = '1-gc-communications/edit/profile_picture';
var_export(
    sscanf($str, '%d-%[^/]')
    # %d    : greedily match one or more numeric characters
    # %[^/] : greedily match one or more non-slash characters
);
Output:
array (
0 => 1, #<-- integer-typed
1 => 'gc-communications', #<-- string-typed
)
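sscanf() can also assign the parsed values straight into variables when extra arguments are passed; it then returns the number of values it assigned. A short sketch, where $id, $name and $assigned are illustrative names:

```php
<?php
$str = '1-gc-communications/edit/profile_picture';

// With extra arguments, sscanf() fills them by reference and returns
// how many values were assigned.
$assigned = sscanf($str, '%d-%[^/]', $id, $name);

if ($assigned === 2) {
    var_dump($id, $name);
} else {
    echo "Input did not match the expected id-name/... layout\n";
}
```

Checking the return value gives a cheap way to detect input that does not follow the expected layout.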
I'm using PHP and I have text like:
first [abc] middle [xyz] last
I need to get what's inside and outside of the brackets. Searching in StackOverflow I found a pattern to get what's inside:
preg_match_all('/\[.*?\]/', $m, $s)
Now I'd like to know the pattern to get what's outside.
Regards!
You can use preg_split for this as:
$input ='first [abc] middle [xyz] last';
$arr = preg_split('/\[.*?\]/',$input);
print_r($arr);
Output:
Array
(
[0] => first
[1] => middle
[2] => last
)
This allows some surrounding spaces in the output. If you don't want them you can use:
$arr = preg_split('/\s*\[.*?\]\s*/',$input);
preg_split splits the string based on a pattern. The pattern here is [ followed by anything followed by ]. The regex to match anything is .*. Because [ and ] are regex metacharacters used for character classes, we need to escape them to match them literally, which gives \[.*\]. By default .* is greedy and will try to match as much as possible; in this case it would match abc] middle [xyz. To avoid this, we make it non-greedy by appending a ?, which gives \[.*?\]. Since "anything" here actually means anything other than ], we can also use \[[^]]*?\]
EDIT:
If you want to extract words that are both inside and outside the [], you can use:
$arr = preg_split('/\[|\]/',$input);
which splits the string on a [ or a ]
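If the inside and outside pieces need to be told apart, one possible approach is to capture the bracket contents and let preg_split return them in between. This is a sketch, not part of the answer above; $inside and $outside are illustrative names.

```php
<?php
$input = 'first [abc] middle [xyz] last';

// Capturing the bracket contents makes preg_split return them between the
// outside pieces; the \s* parts trim the spaces around each bracket pair.
$parts = preg_split('/\s*\[(.*?)\]\s*/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

// Even indexes hold text outside the brackets, odd indexes text inside.
$outside = [];
$inside = [];
foreach ($parts as $i => $part) {
    if ($i % 2 === 0) {
        $outside[] = $part;
    } else {
        $inside[] = $part;
    }
}

print_r($outside);
print_r($inside);
```

The even/odd rule only holds as long as empty pieces are kept, so PREG_SPLIT_NO_EMPTY is deliberately not used here.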
$inside = '\[.+?\]';
$outside = '[^\[\]]+';
$or = '|';
preg_match_all(
"~ $inside $or $outside~x",
"first [abc] middle [xyz] last",
$m);
print_r($m);
or less verbose
preg_match_all("~\[.+?\]|[^\[\]]+~", $str, $matches)
Use preg_split instead of preg_match.
preg_split('/\[.*?\]/', 'first [abc] middle [xyz] last');
Result:
array(3) {
[0]=>
string(6) "first "
[1]=>
string(8) " middle "
[2]=>
string(5) " last"
}
Everyone says that you should use preg_split, but only one person replied with an expression that meets your needs, and I think that one is a little too verbose (though he has since updated his answer to address that).
This expression is what most of the replies have stated.
/\[.*?\]/
But that only prints out
Array
(
[0] => first
[1] => middle
[2] => last
)
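You stated that you wanted what's inside and outside the brackets, so an update would be:
/[\[.*?\]]/
Note that the outer [ ] make this a character class, so it splits on any single [, ], ., * or ? character; it works here only because the text contains no ., * or ?. This gives you: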
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
But as you can see, it's capturing whitespace as well, so let's go a step further and get rid of that:
/[\s]*[\[.*?\]][\s]*/
This will give you the desired result:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
This, I think, is the expression you're looking for.
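One thing worth checking before relying on these patterns: because they are character classes, the ., * and ? inside them are literal characters to split on, not quantifiers. The sketch below uses a made-up input containing a period to show the difference from a class holding only the two brackets.

```php
<?php
// Made-up input that contains a period outside the brackets.
$input = 'v1.0 [abc] end';

// Class containing [ . * ? ]: the period is treated as a delimiter too.
$loose = preg_split('/[\[.*?\]]/', $input);

// Class containing only the two brackets.
$strict = preg_split('/[\[\]]/', $input);

print_r($loose);
print_r($strict);
```

With the first pattern the version number is cut in two, while the second keeps "v1.0 " intact.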