Capturing date and time between square brackets in PHP using preg_match


I need some way of capturing date and time between square brackets. So for the following string:
$str= '10.1.1.107 - - [27/Oct/2016:06:40:58 +0000] "GET /advise/asi/3571502300/sky/2/con/113 HTTP/1.1"';
I'm trying to get the advise and con values as follows:
preg_match("/advise\/([a-zA-Z0-9\-]+)\/sky\/2\/.*con\/([0-9]+)/", $str, $matches);
The function returns the following $matches:
Array (
[0] =>
array(2) {
[0]=>
"3571502300"
[1]=>
"113"
}
)
Then I want to get date and time between square brackets, I have the following regular expression:
/\[([0[1-9]|[1-2][0-9]|3[0-1]\/Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec\/20\d\d:\d{2}:\d{2}:\d{2}\+0000)]\]\/advise\/([a-zA-Z0-9\-]+)\/sky\/2\/.* con\/([0-9]+)/
But it captures nothing.
Is my regular expression wrong?
I want to get an array like this:
Array (
[0] =>
array(3) {
[0]=>
27/Oct/2016:06:40:58 +0000
[1]=>
"3571502300"
[2]=>
"113"
}
)

$re = '/\[(?P<dt>\d\d\/[A-Z][a-z]{2}\/\d{4}(?:\:\d\d){3} \+\d{4})\] ' .
'"[A-Z]{3,4} \/advise\/asi\/(?P<asi>\d+)\/sky\/\d+\/con\/(?P<con>\d+)/';
preg_match($re, $str, $m);
var_dump($m['dt'], $m['asi'], $m['con']);
// or, if you prefer numeric indices:
//var_dump($m[1], $m[2], $m[3]);
Output
string(26) "27/Oct/2016:06:40:58 +0000"
string(10) "3571502300"
string(3) "113"
Description
The values are captured using named subpatterns in the form:
(?P<name>pattern)
where name is the key name in the matches array.
(?:\:\d\d){3} is a non-capturing group for the part after the year (in particular, :06:40:58).
The rest is simple.
Errors in your Regular Expression
Note that in the sample code above the square brackets are escaped with a backslash: \[, \], since in regular expressions they mean a set of characters. You didn't escape the square brackets, so the characters between are interpreted as a set of characters.
The part sky\/2\/.* con\/ is wrong because the original string doesn't contain spaces before con/.
You have hard-coded the timezone offset (\+0000). Although it is unlikely that the timezone will change on your host, it is still possible. So it is better to write it in a more generic form, e.g. \+\d{4}.
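Once the dt group has been captured, it can be turned into a date object. This is only a minimal sketch, assuming the timestamp always follows the day/Mon/year:hour:min:sec +offset layout from the question; the $dt and $date variable names are illustrative and not part of the original code:

```php
<?php
$dt = '27/Oct/2016:06:40:58 +0000'; // the value captured as $m['dt']

// d = day, M = short month name, Y = year, H:i:s = time, O = offset such as +0000
$date = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', $dt);

if ($date === false) {
    echo "Could not parse the timestamp\n";
} else {
    echo $date->format('Y-m-d H:i:s'), "\n"; // 2016-10-27 06:40:58
}
```

Because the offset is parsed from the string, this keeps working if the log ever switches to a different timezone offset.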

You need to group your alternative versions, otherwise the or affects the whole regex.
For example:
^12|34$
Allows 12 or 34 but
^1(2|3)4$
Allows 124 or 134.
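A small sketch of the difference, using throwaway patterns and inputs chosen only for illustration ($loose, $strict and the result variables are hypothetical names):

```php
<?php
// Without grouping, | splits the whole pattern: "^12" OR "34$".
$loose = '/^12|34$/';
// With a non-capturing group, both anchors apply to each alternative.
$strict = '/^(?:12|34)$/';

$looseHit   = preg_match($loose, '1299');  // 1: "^12" matches the start of "1299"
$strictMiss = preg_match($strict, '1299'); // 0: the whole string must be 12 or 34
$strictHit  = preg_match($strict, '34');   // 1

var_dump($looseHit, $strictMiss, $strictHit);
```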
Your string also has a space between the timezone offset and the seconds so you need to add that literally (or you could use the \h metacharacter).
Demo: https://regex101.com/r/ykuAP9/3
So the regex should be:
~\[((?:0[1-9]|[1-2][0-9]|3[0-1])/(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/20\d\d:\d{2}:\d{2}:\d{2} \+0000)\]~
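To check the grouped version against the sample line, something like the following should do. It is a sketch rather than the answerer's exact code: the pattern is split over several string pieces only for readability, and it still hard-codes the +0000 offset.

```php
<?php
$str = '10.1.1.107 - - [27/Oct/2016:06:40:58 +0000] "GET /advise/asi/3571502300/sky/2/con/113 HTTP/1.1"';

// ~ is used as the delimiter so the slashes in the date need no escaping.
$re = '~\[((?:0[1-9]|[12][0-9]|3[01])/'
    . '(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/'
    . '20\d\d:\d{2}:\d{2}:\d{2} \+0000)\]~';

if (preg_match($re, $str, $m)) {
    echo $m[1], "\n"; // 27/Oct/2016:06:40:58 +0000
}
```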

Related

PHP preg_split curly brackets

Who can help me out?
I have a string like this:
$string = '<p>{titleInformation}<p>';
I want to split this string so that I get the following array:
array (
0 => '<p>',
1 => '{titleInformation}',
2 => '<p>',
)
I'm new to regular expressions and I tried multiple patterns with the preg_match_all() function, but I can't get the correct one. I also looked at the question "PHP preg_split if not inside curly brackets", but I don't have spaces in my string.
Thank you in advance.

Use preg_match() with capture groups. You need to escape the curly braces because they have special meaning in regular expressions.
preg_match('/(.*?)(\\{[^}]*\\})(.*)/', $string, $match);
var_dump($match);
Result:
array(4) {
[0]=>
string(24) "<p>{titleInformation}<p>"
[1]=>
string(3) "<p>"
[2]=>
string(18) "{titleInformation}"
[3]=>
string(3) "<p>"
}
$match[0] contains the match for the entire regexp, elements 1-3 contain the parts that you want.

In my opinion, the best function to call for your task is: preg_split(). It has a flag called PREG_SPLIT_DELIM_CAPTURE which allows you to retain your chosen delimiter in the output array. It is a very simple technique to follow and using negated character classes ([^}]*) is a great way to speed up your code. Further benefits of using preg_split() versus preg_match() include:
improved efficiency due to fewer capture groups
shorter pattern which is easier to read
no useless "fullstring" match in the output array
Code:
$string = '<p>{titleInformation}<p>';
var_export(
preg_split('/({[^}]*})/', $string, 0, PREG_SPLIT_DELIM_CAPTURE)
);
Output:
array (
0 => '<p>',
1 => '{titleInformation}',
2 => '<p>',
)
If this answer doesn't work for all of your use cases, please edit your question to include the sample input strings and ping me -- I will update my answer.

With preg_split it can be done this way, although the braces themselves are consumed as delimiters and do not appear in the result:
preg_split('/[{}]+/', $myString);

PHP: How to split a string in substrings that match and don't match a regex pattern?

I'm new here and need your help. I want to match a string that is a URI and get an array of substrings.
This is my input string:
/part-of-{INTERESETINGSTUFF}/an-url/{MORESTUFF}
I want this output array:
array(4) {
[0]=>
string(9) "/part-of-"
[1]=>
string(19) "{INTERESETINGSTUFF}"
[2]=>
string(8) "/an-url/"
[3]=>
string(11) "{MORESTUFF}"
}
I'm already able to preg_match everything with curly brackets by the pattern \{[^}]+\}. But how can I achieve my desired result?
Best regards!

What about this? See it on regex101:
\/.*?(?={|\/)\/?|{.*?}
Then store the matches in an array.
\/.*? matches all characters following /
(?={|\/) stops match if next character is { or /
\/? match optional / at the end of match
{.*?} matches brackets and everything between
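In PHP this pattern could be applied with preg_match_all. The sketch below uses ~ as the delimiter (so the slashes need no escaping) and escapes the braces for clarity; both are my choices rather than part of the pattern above:

```php
<?php
$str = '/part-of-{INTERESETINGSTUFF}/an-url/{MORESTUFF}';

// Either a slash-led path chunk (up to the next { or /) or a {...} placeholder.
preg_match_all('~/.*?(?=\{|/)/?|\{.*?\}~', $str, $matches);

// $matches[0] holds the full matches in order of appearance.
print_r($matches[0]);
```

Note that a path chunk at the very end of the string (with no { or / after it) would not satisfy the lookahead, so this is only suited to inputs shaped like the example.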

You can use a very simple regex, ({[^{}]*}), which matches {, then zero or more characters other than { and }, then }, together with preg_split and the PREG_SPLIT_DELIM_CAPTURE option (which puts the captured delimiters into the resulting array):
$str = '/part-of-{INTERESETINGSTUFF}/an-url/{MORESTUFF}';
$res = array_filter(preg_split('/({[^{}]*})/', $str, -1, PREG_SPLIT_DELIM_CAPTURE));
print_r($res);
// => Array( [0] => /part-of- [1] => {INTERESETINGSTUFF} [2] => /an-url/ [3] => {MORESTUFF} )
The array_filter is used to remove all empty elements from the resulting array.
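As a possible alternative to array_filter, preg_split has its own PREG_SPLIT_NO_EMPTY flag. A sketch, assuming the same input string (the $parts name is illustrative):

```php
<?php
$str = '/part-of-{INTERESETINGSTUFF}/an-url/{MORESTUFF}';

// Combine both flags: keep the captured delimiters, drop empty pieces.
$parts = preg_split(
    '/(\{[^{}]*\})/',
    $str,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts);
```

One reason to prefer the flag: array_filter without a callback removes every falsy value, so a path segment that is exactly "0" would be dropped as well.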

php preg_match_all returning array of arrays

I want to replace some template tags:
$tags = '{name} text {first}';
preg_match_all('~\{(\w+)\}~', $tags, $matches);
var_dump($matches);
output is:
array(2) {
[0]=> array(2) {
[0]=> string(6) "{name}"
[1]=> string(7) "{first}"
}
[1]=> array(2) {
[0]=> string(4) "name"
[1]=> string(5) "first"
}
}
why are there inside 2 arrays? How to achieve only second one?

The short answer:
Is there an alternative? Of course there is: lookaround assertions allow you to use zero-width (non-captured) single char matches easily:
preg_match_all('/(?<=\{)\w+(?=})/', $tags, $matches);
var_dump($matches);
Will dump this:
array(1) {
[0]=>
array(2) {
[0]=>
string(4) "name"
[1]=>
string(5) "first"
}
}
The pattern:
(?<=\{): positive lookbehind - only match the rest of the pattern if there's a { character in front of it (but don't capture it)
\w+: word characters are matched
(?=}): only match preceding pattern if it is followed by a } character (but don't capture the } char)
It's that simple: the pattern uses the {} delimiter chars as conditions for the matches, but doesn't capture them
Explaining this $matches array structure a bit:
The reason why $matches looks the way it does is quite simple: when using preg_match(_all), the first entry in the match array will always be the entire string matched by the given regex. That's why I used zero-width lookaround assertions, instead of groups. Your expression matches "{name}" in its entirety, and extracts "name" through grouping.
The matches array will hold the full match on index 0, and add groups at every subsequent index, in your case that means that:
$matches[0] will contain all substrings matching /\{\w+\}/ as a pattern.
$matches[1] will contain all substrings that were captured (/\{(\w+)\}/ captures (\w+)).
If you were to have a regex like this: /\{((\w)([^}]+))}/ the matches array will look something like this:
[
0 => [
'{name}',//as if you'd written /\{\w[^}]+}/
],
1 => [
'name',//matches group (\w)([^}]+), as if you wrote (\w[^}]+)
],
2 => [
'n',//matches (\w) group
],
3 => [
'ame',//and this is the ([^}]+) group obviously
]
]
Why? Simply because the pattern contains 3 capture groups. Like I said: the first index in the matches array will always be the full match, regardless of capture groups. The groups are then appended to the array in the order they appear in the expression. So if we analyze the expression:
\{: not captured, but part of the pattern, so it will only be in the $matches[0] values
((\w)([^}]+)): Start of first matching group, \w[^}]+ match is grouped here, $matches[1] will contain these values
(\w): Second group, a single \w char (i.e. the first character after {). $matches[2] will therefore contain all first characters after a {
([^}]+): Third group, matches rest of string after {\w until a } is encountered, this will make out the $matches[3] values
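Running the three-group pattern against the string from the question shows the layout described above. This is just a quick check, reusing the $tags value from the question:

```php
<?php
$tags = '{name} text {first}';

preg_match_all('/\{((\w)([^}]+))}/', $tags, $matches);

// One sub-array per group, each with one entry per match found.
print_r($matches[0]); // full matches: {name}, {first}
print_r($matches[1]); // outer group:  name, first
print_r($matches[2]); // (\w):         n, f
print_r($matches[3]); // ([^}]+):      ame, irst
```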
To better understand, and be able to predict the way $matches will get populated, I'd strongly recommend you use this site: regex101. Write your expression there, and it'll break it all down for you on the right hand side, listing the groups. For example:
/\{((\w)([^}]+))}/
Is broken down like this:
/\{((\w)([^}]+))}/
\{ matches the character { literally
1st Capturing group ((\w)([^}]+))
2nd Capturing group (\w)
\w match any word character [a-zA-Z0-9_]
3rd Capturing group ([^}]+)
[^}]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
} the literal character }
} matches the character } literally
Looking at the capturing groups, you can now confidently say what $matches will look like, and you can safely say that $matches[2] will be an array of single characters.
Of course, this may leave you wondering why $matches is a 2D array. Well, that again is really quite easy: what you can predict is how many match indexes a $matches array will contain: 1 for the full pattern, then +1 for each capture group. What you can't predict, though, is how many matches you'll find.
So what preg_match_all does is really quite simple: fill $matches[0] with all substrings that match the entire pattern, then extract each group substring from these matches and append that value onto the respective $matches arrays. In other words, the number of arrays that you can find in $matches is a given: it depends on the pattern. The number of keys you can find in the sub-arrays of $matches is an unknown, it depends on the string you're processing. If preg_match_all were to return a 1D array, it would be a lot harder to process the matches, now you can simply write this:
$total = count($matches);
foreach ($matches[0] as $k => $full) {
echo $full . ' contains: ' . PHP_EOL;
for ($i=1;$i<$total;++$i) {
printf(
'Group %d: %s' . PHP_EOL,
$i, $matches[$i][$k]
);
}
}
If preg_match_all created a flat array, you'd have to keep track of the number of groups in your pattern. Whenever the pattern changes, you'd also have to make sure to update the rest of the code to reflect the changes made to the pattern, making your code harder to maintain and more error-prone, too.

That's because your regex could have multiple capture groups: if you have more (..) groups, you would have more entries in your array. The first one, [0], is always the whole match.
If you want a different ordering of the array, you could pass PREG_SET_ORDER as the fourth argument to preg_match_all. Doing this would result in the following:
array(2) {
[0]=> array(2) {
[0]=> string(6) "{name}"
[1]=> string(4) "name"
}
[1]=> array(2) {
[0]=> string(7) "{first}"
[1]=> string(5) "first"
}
}
This could be easier if you loop over your result in a foreach loop.
If you are only interested in the first capture group, you should stay with the default PREG_PATTERN_ORDER and just use $matches[1].
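A sketch of such a loop with PREG_SET_ORDER, building a lookup from each tag to its name. The $pairs array is a hypothetical helper introduced only for this illustration:

```php
<?php
$tags = '{name} text {first}';

preg_match_all('~\{(\w+)\}~', $tags, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    // $match[0] is the full match, $match[1] the first capture group.
    $pairs[$match[0]] = $match[1];
}

print_r($pairs);
```

With this ordering each iteration sees one complete match together with its groups, which tends to read more naturally when every tag is processed on its own.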

How preg_match_all() processes strings?

I'm still learning a lot about PHP and string alteration is something that is of interest to me. I've used preg_match before for things like validating an email address or just searching for inquiries.
I just came from this post What's wrong in my regular expression? and was curious as to why the preg_match_all function produces 2 strings, 1 w/ some of the characters stripped and then the other w/ the desired output.
From what I understand about the function is that it goes over the string character by character using the RegEx to evaluate what to do with it. Could this RegEx have been structured in such a way as to bypass the first array entry and just produce the desired result?
So you don't have to go to the other thread, here is the code:
$str = 'text^name1^Jony~text^secondname1^Smith~text^email1^example-
free#wpdevelop.com~';
preg_match_all('/\^([^^]*?)\~/', $str, $newStr);
for($i=0;$i<count($newStr[0]);$i++)
{
echo $newStr[0][$i].'<br>';
}
echo '<br><br><br>';
for($i=0;$i<count($newStr[1]);$i++)
{
echo $newStr[1][$i].'<br>';
}
This will output
^Jony~^Smith~^example-free#wpdevelop.com~JonySmithexample-free#wpdevelop.com
I'm curious if the reason for 2 array entries was due to the original syntax of the string or if it is the normal processing response of the function. Sorry if this shouldn't be here, but I'm really curious as to how this works.
thanks,
Brodie

It's standard behavior for preg_match and preg_match_all - the first string in the "matched values" array is the FULL string that was caught by the regex pattern. The subsequent array values are the 'capture groups', whose existence depends on the placement/position of () pairs in the regex pattern.
In your regex's case, /\^([^^]*?)\~/, the full matching string would be
^    Jony    ~
|     |      |
^ ([^^]*?)   ~   -> $newStr[0] = ^Jony~
                 -> $newStr[1] = Jony (due to the `()` capture group).

Could this RegEx have been structured in such a way as to bypass the first array entry and just produce the desired result?
Absolutely. Use assertions. This regex:
preg_match_all('/(?<=\^)[^^]*?(?=~)/', $str, $newStr);
Results in:
Array
(
[0] => Array
(
[0] => Jony
[1] => Smith
[2] => example-free#wpdevelop.com
)
)

As the manual states, this is the expected result (for the default PREG_PATTERN_ORDER flag). The first entry of $newStr contains all full pattern matches, the next result all matches for the first subpattern (in parentheses) and so on.

The first array in the result of preg_match_all returns the strings that match the whole pattern you passed to the preg_match_all() function, in your case /\^([^^]*?)\~/. Subsequent arrays in the result contain the matches for the parentheses in your pattern. Maybe it is easier to understand with an example:
$string = 'abcdefg';
preg_match_all('/ab(cd)e(fg)/', $string, $matches);
The $matches array will be
array(3) {
[0]=>
array(1) {
[0]=>
string(7) "abcdefg"
}
[1]=>
array(1) {
[0]=>
string(2) "cd"
}
[2]=>
array(1) {
[0]=>
string(2) "fg"
}
}
The first array will contain the match of the entire pattern, in this case 'abcdefg'. The second array will contain the match for the first set of parentheses, in this case 'cd'. The third array will contain the match for the second set of parentheses, in this case 'fg'.

[0] contains entire match, while [1] only a portion (the part you want to extract)...
You can do var_dump($newStr) to see the array structure, you'll figure it out.
$str = 'text^name1^Jony~text^secondname1^Smith~text^email1^example-
free#wpdevelop.com~';
preg_match_all('/\^([^^]*?)\~/', $str, $newStr);
$newStr = $newStr[1];
foreach($newStr as $key => $value)
{
echo $value."\n";
}
This will result in... (weird result, haven't modified expression)
Jony
Smith
example-
free#wpdevelop.com

Whenever you have trouble picturing what preg_match_all does, you should use an evaluator such as the preg_match_all tester at regextester.net.
This shows you the result in realtime and you can configure things like the result order, meta instructions, offset capturing and many more.

PHP preg_match_all RegEx conflict

if (preg_match_all ("/\[protected\]\s*(((?!\[protected\]|\[/protected\]).)+)\s*\[/protected\]/g", $text, $matches)) {
var_dump($matches);
var_dump($text);
}
The text is
<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>
But when $matches is var_dumped (outside the if statement), it gives NULL.
Help people!

You're using / (slash) as the regex delimiter, but you also have unescaped slashes in the regex. Either escape them or (preferably) use a different delimiter.
There's no g modifier in PHP regexes. If you want a global match, you use preg_match_all(); otherwise you use preg_match().
...but there is an s modifier, and you should be using it. That's what enables . to match newlines.
After changing your regex to this:
'~\[protected\]\s*((?:(?!\[/?protected\]).)+?)\s*\[/protected\]~s'
...I get this output:
array(2) {
[0]=>
array(1) {
[0]=>
string(42) "[protected]<br> STUFFFFFF<br>
[/protected]"
}
[1]=>
array(1) {
[0]=>
string(18) "<br> STUFFFFFF<br>"
}
}
string(93) "<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>"
Additional changes:
I switched to using single-quotes around the regex; double-quotes are subject to $variable interpolation and {embedded code} evaluation.
I shortened the lookahead expression by using an optional slash (/?).
I switched to using a reluctant plus (+?) so the whitespace following the closing tag doesn't get included in the capture group.
I changed the innermost group from capturing to non-capturing; it was only saving the last character in the matched text, which seems pointless.
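Put together, the corrected call might look like the sketch below. The text is written here as a double-quoted string with explicit \n line breaks, which is my assumption about how the input is laid out:

```php
<?php
$text = "<p>SDGDSFGDFGdsgdfog<br>\n[protected]<br> STUFFFFFF<br>\n[/protected]<br> SDGDSFGDFGdsgdfog</p>";

// ~ as delimiter avoids escaping "/"; the s modifier lets . match newlines.
$re = '~\[protected\]\s*((?:(?!\[/?protected\]).)+?)\s*\[/protected\]~s';

if (preg_match_all($re, $text, $matches)) {
    // $matches[1] holds the content between the tags, one entry per block.
    var_dump($matches[1]);
}
```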

$text= '<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>';
if (preg_match_all ("/\[protected\]\s*(((?!\[protected\]|\[\/protected\]).)+)\s*\[\/protected\]/x", $text, $matches)) {
var_dump($matches);
var_dump($text);
}
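There is no g modifier in preg_match; you can read more at Pattern Modifiers. Using the x modifier works fine, though. (The x modifier only makes the engine ignore whitespace in the pattern; what actually fixes the regex here is escaping the slashes.)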
