I'm trying to use a regex to match any characters other than '+' between the words 'begin' and 'end', but it's not matching for some reason. What am I doing wrong?
$content = "begin+text1+end begin text2 end";
$regex = "~begin(^\++?)end~";
preg_match_all($regex, $content, $matches);
print_r($matches);
Result:
Array ( [0] => Array ( ) [1] => Array ( ) )
Expected Result:
Array ( [0] => Array ( begin text2 end ) [1] => Array ( text2 ) )
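Outside a character class, ^ is an anchor (start of the subject), so (^\++?) can never match in the middle of the string. You need to put the ^ inside a character class to create a negated character class: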
"~begin([^+]+)end~"
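As a quick check, here is a minimal sketch that runs the corrected pattern against the string from the question. The helper name extractBetween is made up for this illustration. Note that the captured group keeps the spaces around text2, so the sketch trims them.

```php
<?php
// Hypothetical helper (not from the original post): returns the trimmed text
// between "begin" and "end" for every pair that has no "+" in between.
function extractBetween(string $content): array
{
    // [^+]+ is a negated character class: one or more characters that are not "+".
    preg_match_all('~begin([^+]+)end~', $content, $matches);

    // Group 1 still contains the surrounding spaces, so trim each capture.
    return array_map('trim', $matches[1]);
}

print_r(extractBetween("begin+text1+end begin text2 end"));
```

Only the second begin ... end pair is returned, because the first one contains + characters.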
If you change your pattern slightly you should get the expected result:
(?=[^\s\d]+)begin([^+\n]+)end
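This pattern uses a positive lookahead, (?=[^\s\d]+), which asserts that the text at the current position starts with one or more characters that are neither whitespace nor digits. Since the literal begin follows immediately, that assertion always holds here, so the lookahead can be dropped without changing the result. The capture group then grabs everything between begin and end that doesn't contain a + or a line break \n.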
Result:
Array ( [0] => Array ( [0] => begin text2 end ) [1] => Array ( [0] => text2 ) )
Example:
https://regex101.com/r/qS6iL1/2
Related
I'm trying to find simple key-value-pairs in strings, given as JSON-objects, while using preg_replace_callback().
Unfortunately, the values given can be of type string, number, boolean, null, array and, worst of all, object. My own attempts at solving this problem resulted in either an incomplete selection or over-selecting multiple JSON occurrences as one.
Here are the things I tried:
String:
text text {"key":{"key":"value"}} text
Regex:
\{"(.+?)"\:(.+?)\}
Match:
{"key":"value"
Above: This ignores the inner }-bracket
String:
text text {"key":{"key":"value"}} text
Regex:
\{"(.+?)"\:(.+)\}
Match:
{"key":"value"}
Above: This would (theoretically) work, but when there are multiple JSON occurrences, I get:
{"key":"value"}} {"key":{"key":"value"}
Next attempt:
String:
text text {"key":{"key":"value"}} {"key":{"key":"value"}} text
Regex:
\{"(.+?)"\:(?:(\{(?:.+?)\})|(?:")?(.+?)(?:")?)\}
Match:
{"key":"value"}
Above: Again, that would theoretically work. But when taking, for example, the following string:
text text {"key":{"key":{"key":"value"}}} text
The result is...
{"key":{"key":"value"}
Missing one bracket
PCRE supports recursive matching for this kind of nested structure. Here is a demo:
$data = 'text text
{"key":{"key":"value{1}","key2":false}}
{"key":{"key":"value2"}}
{"key":{"key":{"key":"value3"}}} text';
$pattern = '(
  \{ # JSON object start
    (
      \s*
      "[^"]+" # key
      \s*:\s* # colon
      (
        # value
        (?:
          "[^"]+" | # string
          \d+(?:\.\d+)? | # number
          true |
          false |
          null
        ) |
        (?R) # pattern recursion
      )
      \s*
      ,? # comma
    )*
  \} # JSON object end
)x';
preg_replace_callback(
    $pattern,
    function ($match) {
        var_dump(json_decode($match[0]));
    },
    $data
);
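A looser variant of the same recursion idea is sketched below: it only balances braces with (?R) and then lets json_decode() decide whether each chunk is real JSON. The helper name findJsonObjects and the sample string are assumptions made for this illustration, and the sketch assumes that no unbalanced { or } appears inside a JSON string value.

```php
<?php
// Hypothetical helper: finds brace-balanced chunks in $text and keeps only
// those that json_decode() accepts as objects.
// Assumption: string values contain no unbalanced "{" or "}".
function findJsonObjects(string $text): array
{
    // An opening brace, then any mix of non-brace runs and recursive
    // matches of the whole pattern, then the closing brace.
    preg_match_all('/\{(?:[^{}]++|(?R))*\}/', $text, $matches);

    $objects = [];
    foreach ($matches[0] as $candidate) {
        $decoded = json_decode($candidate, true);
        if (is_array($decoded)) {
            $objects[] = $decoded;
        }
    }
    return $objects;
}

$found = findJsonObjects('text {"key":{"key":{"key":"value3"}}} text {"a":1}');
print_r($found);
```

Leaving validation to json_decode() keeps the pattern short, at the cost of decoding every candidate.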
With the additional requirements of using preg_replace_callback() and not knowing the depth of the JSON objects ahead of time, perhaps this is another possible approach. The {1,} quantifier means "one or more" (the same as +), so the pattern can consume a run of closing braces of any length:
<?php
// ref: https://stackoverflow.com/q/66379119/1167750
$str = 'text text {"key":{"key":"value1"}} {"key":{"key":"value2"}} {"key":{"key":{"key":"value3"}}} text';
function callback($array) {
// Your function here...
print_r($array);
echo "Found:\n";
echo "{$array[0]}\n";
}
preg_replace_callback('/\{"(.+?)"\:(.+?)\}{1,}/', 'callback', $str);
?>
Output (PHP 7.3.19):
$ php q18.php
Array
(
[0] => {"key":{"key":"value1"}}
[1] => key
[2] => {"key":"value1"
)
Found:
{"key":{"key":"value1"}}
Array
(
[0] => {"key":{"key":"value2"}}
[1] => key
[2] => {"key":"value2"
)
Found:
{"key":{"key":"value2"}}
Array
(
[0] => {"key":{"key":{"key":"value3"}}}
[1] => key
[2] => {"key":{"key":"value3"
)
Found:
{"key":{"key":{"key":"value3"}}}
Previous idea:
Would something like this be helpful for your use case(s)?
<?php
// ref: https://stackoverflow.com/q/66379119/1167750
$str = 'text text {"key":{"key":"value1"}} {"key":{"key":"value2"}} {"key":{"key":{"key":"value3"}}} text';
preg_match_all('/\{"(.+?)"\:(.+?)\}{1,3}/', $str, $matches);
print_r($matches);
echo "Found:\n";
print_r($matches[0]);
?>
Output (PHP 7.3.19):
$ php q18.php
Array
(
[0] => Array
(
[0] => {"key":{"key":"value1"}}
[1] => {"key":{"key":"value2"}}
[2] => {"key":{"key":{"key":"value3"}}}
)
[1] => Array
(
[0] => key
[1] => key
[2] => key
)
[2] => Array
(
[0] => {"key":"value1"
[1] => {"key":"value2"
[2] => {"key":{"key":"value3"
)
)
Found:
Array
(
[0] => {"key":{"key":"value1"}}
[1] => {"key":{"key":"value2"}}
[2] => {"key":{"key":{"key":"value3"}}}
)
If you know the maximum depth of these nested structures ahead of time, you can adjust the {1,3} part to a different setting, for example {1,4} or {1,5}. More information on that quantifier can be found in the documentation.
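One caveat may be worth checking before relying on the brace-counting quantifier: it only absorbs closing braces that directly follow each other. The sketch below uses a made-up sample string in which a second key follows the nested object, and tests whether the matched text still decodes as JSON.

```php
<?php
// Same pattern as in the answer above; the subject string is a made-up example.
$pattern = '/\{"(.+?)"\:(.+?)\}{1,}/';
$subject = 'text {"a":{"b":1},"c":2} text';

preg_match($pattern, $subject, $m);

// The lazy (.+?) stops at the first "}", and \}{1,} takes only that one brace
// because a comma follows, so the match ends before ,"c":2}.
$isValidJson = json_decode($m[0]) !== null;

echo $m[0], "\n";
var_dump($isValidJson);
```

For input shaped like this, the match is cut off inside the object, so a recursive pattern is likely the safer choice.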
If I do the regex matching
preg_match('/^[*]{2}((?:[^*]|[*][^*]*[*])+?)[*]{2}(?![*]{2})/s', "**A** **B**", $matches);
I get the result for $matches I want of
Array ( [0] => **A** [1] => A )
but I am not sure how to modify the regex to yield the same result in $matches from the input text without the space in the middle, that is, "**A****B**".
It looks like the regex matching
preg_match('/^[*]{2}((?:[^*]|[*][^*]*[*])+?)[*]{2}/s', "**A****B**", $matches);
yields the result for $matches I want of
Array ( [0] => **A** [1] => A )
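The reason this works can be sketched as follows: the group is lazy (+?), so it already stops at the first closing **, whereas the negative lookahead rejected that first closing ** whenever another ** followed directly. The helper name firstBold is hypothetical and only wraps the pattern above.

```php
<?php
// Hypothetical helper: returns the text inside the first **...** pair at the
// start of $text, or null when the text does not start with such a pair.
function firstBold(string $text): ?string
{
    // Pattern from the answer above, without the trailing negative lookahead.
    $pattern = '/^[*]{2}((?:[^*]|[*][^*]*[*])+?)[*]{2}/s';

    return preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
}

var_dump(firstBold("**A** **B**"));
var_dump(firstBold("**A****B**"));
```

Both calls return A, with or without the space between the two bold runs.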
I have a string that looks something like this:
535354 345356 3543674 34667 2345347 -3536 4532452 (234536 2345634 -4513453) (2345 -13254 13545)
The text between () is always at the end of the string (at least for now).
I need to split it into an array similar to this:
[0] => [0] 535354,345356,3543674,34667,2345347,-3536,4532452
[1] => [0] 234536,2345634,-4513453
=> [1] 2345,-13254,13545
What expression should I use for preg_match_all?
The best I could get with my limited knowledge is /([0-9]{1,}){1,}.*(?=(\(.*\)))/U but I still get some unwanted elements.
You may use a regex that matches chunks of numbers outside of parentheses as well as those inside them: "~(?<=\()\s*$numrx\s*(?=\))|\s*$numrx~", where $numrx stands for the number regex (which can be enhanced further).
The -?\d+(?:\s+-?\d+)* pattern matches an optional -, 1 or more digits, and then 0 or more sequences of 1 or more whitespace characters followed by an optional - and 1 or more digits. (?<=\()\s*$numrx\s*(?=\)) matches the same, but only if preceded by ( and followed by ).
See this PHP snippet:
$s = "535354 345356 3543674 34667 2345347 -3536 4532452 (234536 2345634 -4513453) (2345 -13254 13545)";
$numrx = "-?\d+(?:\s+-?\d+)*";
preg_match_all("~(?<=\()\s*$numrx\s*(?=\))|\s*$numrx~", $s, $m);
$res = array();
foreach ($m[0] as $k) {
array_push($res,explode(" ",trim($k)));
}
print_r($res);
Output:
Array
(
[0] => Array
(
[0] => 535354
[1] => 345356
[2] => 3543674
[3] => 34667
[4] => 2345347
[5] => -3536
[6] => 4532452
)
[1] => Array
(
[0] => 234536
[1] => 2345634
[2] => -4513453
)
[2] => Array
(
[0] => 2345
[1] => -13254
[2] => 13545
)
)
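If the two-level layout from the question is needed (one list for the numbers outside parentheses, one for the parenthesized groups), a capturing group can tell the two cases apart. This is only a sketch: the helper name splitNumberGroups is made up, and it swaps the lookarounds for a captured group so that each match reveals whether it came from inside parentheses.

```php
<?php
// Hypothetical helper: returns [outside, inside], where each entry is a list
// of comma-joined number strings.
function splitNumberGroups(string $s): array
{
    $numrx = '-?\d+(?:\s+-?\d+)*';

    // Alternative 1 captures a parenthesized run in group 1;
    // alternative 2 matches a bare run of numbers.
    preg_match_all("~\((\s*$numrx\s*)\)|$numrx~", $s, $sets, PREG_SET_ORDER);

    $outside = [];
    $inside = [];
    foreach ($sets as $set) {
        $isInside = ($set[1] ?? '') !== '';
        $numbers = preg_split('/\s+/', trim($isInside ? $set[1] : $set[0]));
        if ($isInside) {
            $inside[] = implode(',', $numbers);
        } else {
            $outside[] = implode(',', $numbers);
        }
    }
    return [$outside, $inside];
}

$s = "535354 345356 3543674 34667 2345347 -3536 4532452 (234536 2345634 -4513453) (2345 -13254 13545)";
print_r(splitNumberGroups($s));
```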
You can use this regex in preg_match_all:
$re = '/\d+(?=[^()]*[()])/';
RegEx Breakup:
\d+ # match 1 or more digits
(?= # lookahead start
[^()]* # match anything but ( or )
[()] # match ( or )
) # lookahead end
I am trying to get the value after the dots, and I would like to get all of them (each as their own key/value).
The following is what I am running:
$string = "div.cat.dog#mouse";
preg_match_all("/\.(.+?)(\.|#|$)/", $string, $matches);
and when I do a dump of $matches I am getting this:
Array
(
[0] => Array
(
[0] => .cat.
)
[1] => Array
(
[0] => cat
)
[2] => Array
(
[0] => .
)
)
Item [1] contains only 1 value. What I was expecting was for it to contain (for this case) 2 items, cat and dog. How come dog isn't getting picked up by preg_match_all?
Use lookahead:
\.(.+?)(?=\.|#|$)
The problem in your regex is that you're matching a DOT on the left-hand side and a DOT, HASH or end of input on the right-hand side of the match. After that match, the internal pointer has already moved past the second DOT, leaving no DOT to start the match for the next word.
(?=\.|#|$) is a positive lookahead that doesn't consume these characters but only looks ahead, so the pointer stops right after cat and the following DOT is still available for the next match.
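A short sketch comparing the two patterns on the string from the question; the helper name dotParts is made up for this illustration.

```php
<?php
// Hypothetical helper: runs a pattern and returns whatever group 1 captured.
function dotParts(string $pattern, string $selector): array
{
    preg_match_all($pattern, $selector, $matches);
    return $matches[1];
}

$selector = "div.cat.dog#mouse";

// Consuming version: the DOT after "cat" is eaten by the first match.
$consuming = dotParts('/\.(.+?)(\.|#|$)/', $selector);

// Lookahead version: the terminator is only inspected, not consumed.
$lookahead = dotParts('/\.(.+?)(?=\.|#|$)/', $selector);

print_r($consuming);
print_r($lookahead);
```

The first call finds only cat, while the lookahead version finds both cat and dog.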
I have the following text and I would like to preg_match_all what is within the {'s and }'s if it contains only a-zA-Z0-9 and :
some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf
I am trying to match {SOMETHING21} {SOMETHI32NG:MORE} {TEXT:GET:2} there can be several :'s within the tag.
What I currently have is:
preg_match_all('/\{([a-zA-Z0-9\-]+)(\:([a-zA-Z0-9\-]+))*\}/', $from, $matches, PREG_SET_ORDER);
It works as expected for {SOMETHING21} and {SOMETHI32NG:MORE} but for {TEXT:GET:2} it only matches TEXT and 2
So it only matches the first and last word within the tag, and leaves the middle ones out of the $matches array. Is this even possible or should I just match them and then explode on : ?
-- edit --
Well the question isn't if I can get the tags, the question is if I can get them grouped without having to explode the results again. Even though my current regex finds all the results the subpattern does not come back with all the matches in $matches.
I hope the following will clear it up a bit more:
\{ // the match has to start with {
([a-zA-Z0-9\-]+) // after the { the match needs to have alphanum consisting of 1 or more characters
(
\: // if we have : it should be followed by alphanum consisting of 1 or more characters
([a-zA-Z0-9\-]+) // <---- !! this is what it is about !! even though this subexpression is between brackets it is not put into $matches if more than one of these is found
)* // there could be none or more of the previous subexpression
\} // the match has to end with }
A repeated capturing group doesn't keep every value it matched; you only get the last one.
So you have to match the pattern:
preg_match_all('/{([a-z\d-]+(?::[a-z\d-]+)*)}/i', $from, $matches);
and then split each element in $matches[1] on :.
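Put together, the match-then-split approach might look like the sketch below. The helper name parseTags is hypothetical; the pattern is the one above, with the braces escaped for readability.

```php
<?php
// Hypothetical helper: returns one array of colon-separated parts per {TAG}.
function parseTags(string $from): array
{
    // Group 1 captures the whole colon-separated list inside the braces.
    preg_match_all('/\{([a-z\d-]+(?::[a-z\d-]+)*)\}/i', $from, $matches);

    // Split each captured list on ":" to recover the individual parts.
    return array_map(
        function ($tag) {
            return explode(':', $tag);
        },
        $matches[1]
    );
}

$from = "some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf";
print_r(parseTags($from));
```

Each tag yields its own array, so {TEXT:GET:2} comes back as TEXT, GET and 2 rather than only the first and last part.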
I used non-capture groupings to eliminate the inner groups, and just capture the outer complete colon-separated list.
$from = "some text,{SOMETHING21} {SOMETHI32NG:MORE}some msdf{TEXT:GET:2}sdfssdf sdf sdf";
preg_match_all('/\{((?:[a-zA-Z0-9\-]+)(?:\:(?:[a-zA-Z0-9\-]+))*)\}/', $from, $matches, PREG_SET_ORDER);
print_r($matches);
Result:
Array
(
[0] => Array
(
[0] => {SOMETHING21}
[1] => SOMETHING21
)
[1] => Array
(
[0] => {SOMETHI32NG:MORE}
[1] => SOMETHI32NG:MORE
)
[2] => Array
(
[0] => {TEXT:GET:2}
[1] => TEXT:GET:2
)
)
Maybe I didn't understand the requirement, but...
preg_match_all('/{[A-Za-z0-9:-]+}/', $from, $matches, PREG_PATTERN_ORDER);
results in:
Array
(
[0] => Array
(
[0] => {SOMETHING21}
[1] => {SOMETHI32NG:MORE}
[2] => {TEXT:GET:2}
)
)