Who can help me out?
I have a string like this:
$string = '<p>{titleInformation}<p>';
I want to split this string so that I get the following array:
array (
0 => '<p>',
1 => '{titleInformation}',
2 => '<p>',
)
I'm new to regular expressions and tried multiple patterns with the preg_match_all() function, but I can't get the right one. I also looked at the question "PHP preg_split if not inside curly brackets", but I don't have spaces in my string.
Thank you in advance.
Use preg_match() with capture groups. Escape the curly braces to be safe, because braces have special meaning in regular expressions (quantifiers such as {2,}).
preg_match('/(.*?)(\\{[^}]*\\})(.*)/', $string, $match);
var_dump($match);
Result:
array(4) {
[0]=>
string(24) "<p>{titleInformation}<p>"
[1]=>
string(3) "<p>"
[2]=>
string(18) "{titleInformation}"
[3]=>
string(3) "<p>"
}
$match[0] contains the match for the entire regexp; elements 1-3 contain the parts that you want.
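If you want exactly the three-element array from the question, one option (a sketch, not the only way) is to drop the full-match element with array_slice(), which also reindexes the result from 0:

```php
<?php
$string = '<p>{titleInformation}<p>';

// Group 1: text before the braces, group 2: the {...} token, group 3: the rest
preg_match('/(.*?)(\{[^}]*\})(.*)/', $string, $match);

// Discard $match[0] (the full-string match); numeric keys are renumbered from 0
$parts = array_slice($match, 1);
var_export($parts);
```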
In my opinion, the best function to call for your task is: preg_split(). It has a flag called PREG_SPLIT_DELIM_CAPTURE which allows you to retain your chosen delimiter in the output array. It is a very simple technique to follow and using negated character classes ([^}]*) is a great way to speed up your code. Further benefits of using preg_split() versus preg_match() include:
improved efficiency due to fewer capture groups
shorter pattern which is easier to read
no useless "fullstring" match in the output array
Code:
$string = '<p>{titleInformation}<p>';
var_export(
preg_split('/({[^}]*})/', $string, 0, PREG_SPLIT_DELIM_CAPTURE)
);
Output:
array (
0 => '<p>',
1 => '{titleInformation}',
2 => '<p>',
)
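If a placeholder can appear at the very start or end of the string, PREG_SPLIT_DELIM_CAPTURE on its own leaves empty strings at the edges of the array. Combining it with PREG_SPLIT_NO_EMPTY (flags are OR-ed together with |) removes them. The input string below is hypothetical, chosen only to show the edge case:

```php
<?php
$string  = '{greeting}<p>{titleInformation}';   // hypothetical input
$pattern = '/(\{[^}]*\})/';

// Delimiters are kept, but an empty piece appears before the first token and after the last
$withEmpty = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE);

// Same split with empty pieces dropped
$clean = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

var_export($withEmpty);
var_export($clean);
```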
If this answer doesn't work for all of your use cases, please edit your question to include the sample input strings and ping me -- I will update my answer.
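With preg_split() it can be done this way, though note that the braces are consumed as delimiters, so the middle element comes back as titleInformation without its braces:
preg_split('/[{}]+/', $string);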
I have an input that follows this format:
[d/D/d1/d2/d3/d4/d5/d6/d7/D1/D2/D3/D4/D5/D6/D7]+[\.]+[r1/r2/r3/r4/r5/r6/R1/R2/R3/R4/R5/R6]+[\.]+[number 1 to 37]+[#]+[number 0 - 9 ]
An example would be "d2.r1.4#100.37#1.9#2.3#1" (it can have as many 1-37#0-9 groups as needed).
How do I write a regex that allows the last part of the string to be dynamic, matching as many of these groups as are entered?
I've tried this expression:
[dD1-7]+\.[rR1-5]+\.
and I'm not sure how to match the dynamic group that comes after the "d2.r1." part.
Assuming you merely need to validate the string (and not capture/extract specific substrings), the following pattern provides the same result as Emma's answer but with a tighter syntax.
The i pattern modifier means you only have to write the two letters in lowercase. I don't use any excess non-capturing groups. Two-character character classes don't need a hyphen. \d is the shorter way of expressing [0-9].
Wrapping the final/repeating characters in parentheses then writing * means the sequence in the parentheses may repeat zero or more times.
Code:
$inputs = [
'd2.r1.4#100.37#1.9#2.3#1',
'd2.r1.4#100.37#1.9#2.38#1.8#22',
'd2.r1.4#100.37#1.9#2.3#1.12#2.30#2',
];
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';
foreach ($inputs as $input) {
echo "\n{$input}: ";
var_export((bool)preg_match($pattern, $input));
}
Output:
d2.r1.4#100.37#1.9#2.3#1: true
d2.r1.4#100.37#1.9#2.38#1.8#22: false
d2.r1.4#100.37#1.9#2.3#1.12#2.30#2: true
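The pattern above only validates. If you also need the individual number#value pairs, note that a repeated group such as (...)* keeps only its last iteration, so it cannot hand back every pair. One approach (a sketch, not part of the answer above) is to validate first and then collect the pairs with a second, simpler preg_match_all():

```php
<?php
$input   = 'd2.r1.4#100.37#1.9#2.3#1';
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';

$pairs = [];
if (preg_match($pattern, $input)) {
    // Second pass: every ".<number>#<digits>" segment. The ".r1" part is
    // skipped because \d+ cannot match the letter r.
    preg_match_all('/\.(\d+)#(\d+)/', $input, $m, PREG_SET_ORDER);
    foreach ($m as $match) {
        $pairs[] = [(int)$match[1], (int)$match[2]];
    }
}
var_export($pairs);
```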
I'm guessing that an expression similar to
^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$
or one with some slight changes would likely work here.
Test
$re = '/^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$/m';
$str = 'd2.r1.4#100.37#1.9#2.3#1
d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1
d2.r1.4#100.38#1.9#2.3#1
d2.r1.4#100.0#1.9#2.3#1
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(2) {
[0]=>
array(1) {
[0]=>
string(24) "d2.r1.4#100.37#1.9#2.3#1"
}
[1]=>
array(1) {
[0]=>
string(63) "d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1"
}
}
If you wish to simplify, modify, or explore the expression, you can paste it into regex101.com, which explains it in the top-right panel and lets you see how it matches against sample inputs.
I'd like to extract the numbers specifically with a PHP regular expression. I don't understand regex very well, although I'm currently experimenting on the regex101 website. The thing is, I have this:
66
28006 MadridVer teléfono
(Literally that; in reality there are a lot more spaces, and "28006 MadridVer teléfono" is actually on the next line.) I'd like to extract the number 28006, or at least split the results of the expression so that 28006 ends up on its own in one of the groups. What would my PHP regex look like? Maybe, apart from matching spaces, I should also match a new line or something. But I'm totally lost here (yes, I'm still an absolute regex novice).
I don't see a need for regex.
Remove the new line and explode on space.
Then use array_filter to remove empty values from the array and rearrange the array with array_values.
$str = "66
28006 MadridVer teléfono";
$str = str_replace("\n", " ", $str);
$arr = explode(" ", $str);
$arr = array_values(array_filter($arr));
var_dump($arr);
Returns:
array(4) {
[0]=>
string(2) "66"
[1]=>
string(5) "28006"
[2]=>
string(9) "MadridVer"
[3]=>
string(9) "teléfono"
}
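If the real text contains runs of spaces, tabs or \r\n line endings, as the question suggests, the str_replace()/explode() approach above would leave a stray \r on "66". A single preg_split() on \s+ handles any whitespace in one step. And if only the postal code is wanted, preg_match() can target it directly, assuming (an assumption, not stated in the question) that it is always exactly five digits. The input below is a hypothetical version of the question's text with extra whitespace:

```php
<?php
$str = "66\r\n    28006   MadridVer teléfono";   // hypothetical input

// Split on any run of whitespace (spaces, tabs, \r, \n); trim() first so
// leading/trailing whitespace does not produce empty elements.
$parts = preg_split('/\s+/', trim($str));
var_export($parts);

// Assumes the code is exactly five digits standing on its own
if (preg_match('/\b(\d{5})\b/', $str, $m)) {
    echo "\n", $m[1], "\n";
}
```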
I need some way of capturing date and time between square brackets. So for the following string:
$str= '10.1.1.107 - - [27/Oct/2016:06:40:58 +0000] "GET /advise/asi/3571502300/sky/2/con/113 HTTP/1.1"';
I'm trying to get the advise and con values as follows:
preg_match("/advise\/([a-zA-Z0-9\-]+)\/sky\/2\/.*con\/([0-9]+)/", $str, $matches);
The function returns the following $matches:
Array (
[0] =>
array(2) {
[0]=>
"3571502300"
[1]=>
"113"
}
)
Then I want to get date and time between square brackets, I have the following regular expression:
/\[([0[1-9]|[1-2][0-9]|3[0-1]\/Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec\/20\d\d:\d{2}:\d{2}:\d{2}\+0000)]\]\/advise\/([a-zA-Z0-9\-]+)\/sky\/2\/.* con\/([0-9]+)/
But it captures nothing.
Is my regular expression wrong?
I want to get an array like this:
Array (
[0] =>
array(3) {
[0]=>
27/Oct/2016:06:40:58 +0000
[1]=>
"3571502300"
[2]=>
"113"
}
)
$re = '/\[(?P<dt>\d\d\/[A-Z][a-z]{2}\/\d{4}(?:\:\d\d){3} \+\d{4})\] ' .
'"[A-Z]{3,4} \/advise\/asi\/(?P<asi>\d+)\/sky\/\d+\/con\/(?P<con>\d+)/';
preg_match($re, $str, $m);
var_dump($m['dt'], $m['asi'], $m['con']);
// or, if you prefer numeric indices:
//var_dump($m[1], $m[2], $m[3]);
Output
string(26) "27/Oct/2016:06:40:58 +0000"
string(10) "3571502300"
string(3) "113"
Description
The values are captured using named subpatterns in the form:
(?P<name>pattern)
where name is the key name in the matches array.
(?:\:\d\d){3} is a non-capturing group for the part after the year (in particular, :06:40:58).
The rest is simple.
Errors in your Regular Expression
Note that in the sample code above the literal square brackets are escaped with a backslash (\[, \]), because in regular expressions an unescaped [ starts a character set. In your pattern, [0[1-9] is parsed as one character set (matching a single 0, [ or 1-9) rather than "0 followed by 1-9", and the extra ] before \] makes the pattern require two closing brackets in a row.
The part sky\/2\/.* con\/ is wrong because the original string doesn't contain spaces before con/.
You have hard-coded the timezone offset (\+0000). Although it is unlikely that the timezone will change on your host, it is still possible, so it is better to write it in a more generic form, e.g. \+\d{4} (or [+-]\d{4} to also allow negative offsets).
You need to group your alternatives; otherwise the | applies to the whole regex.
For example:
^12|34$
matches any string that starts with 12 or ends with 34, but
^1(2|3)4$
matches only 124 or 134.
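To see the precedence difference concretely, here is a small sketch that runs both patterns against a few made-up sample strings. The ungrouped pattern accepts anything that merely starts with 12 or ends with 34:

```php
<?php
// Hypothetical sample strings, chosen to show where the two patterns differ
$samples = ['129', '934', '124', '134', '999'];

$results = [];
foreach ($samples as $s) {
    $results[$s] = [
        // | splits the whole pattern: "^12" OR "34$"
        'ungrouped' => (bool)preg_match('/^12|34$/', $s),
        // | is confined to the group: 1, then 2 or 3, then 4
        'grouped'   => (bool)preg_match('/^1(2|3)4$/', $s),
    ];
}
var_export($results);
```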
Your string also has a space between the timezone offset and the seconds so you need to add that literally (or you could use the \h metacharacter).
Demo: https://regex101.com/r/ykuAP9/3
So the regex should be:
~\[((?:0[1-9]|[1-2][0-9]|3[0-1])/(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/20\d\d:\d{2}:\d{2}:\d{2} \+0000)\]~
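A quick check of a grouped date pattern against the log line from the question. In this sketch the day is written as 0[1-9]|[12][0-9]|3[01] (a literal 0 followed by 1-9, not a character set), and the second line with day 05 is a hypothetical input added to exercise that branch:

```php
<?php
$re = '~\[((?:0[1-9]|[12][0-9]|3[01])/(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)/20\d\d:\d{2}:\d{2}:\d{2} \+0000)\]~';

$lines = [
    '10.1.1.107 - - [27/Oct/2016:06:40:58 +0000] "GET /advise/asi/3571502300/sky/2/con/113 HTTP/1.1"',
    // Hypothetical line with a single-digit day
    '10.1.1.107 - - [05/Oct/2016:06:40:58 +0000] "GET /advise/asi/3571502300/sky/2/con/113 HTTP/1.1"',
];

$dates = [];
foreach ($lines as $line) {
    if (preg_match($re, $line, $m)) {
        $dates[] = $m[1];   // group 1: the text between the brackets
    }
}
var_export($dates);
```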
I'm still learning a lot about PHP and string alteration is something that is of interest to me. I've used preg_match before for things like validating an email address or just searching for inquiries.
I just came from the post "What's wrong in my regular expression?" and was curious as to why the preg_match_all function produces two sets of strings, one that still includes the ^ and ~ characters and another with the desired output.
From what I understand, the function goes over the string character by character, using the RegEx to decide what to do with it. Could this RegEx have been structured in such a way as to bypass the first array entry and just produce the desired result?
So you don't have to go to the other thread, here is the code:
$str = 'text^name1^Jony~text^secondname1^Smith~text^email1^example-
free#wpdevelop.com~';
preg_match_all('/\^([^^]*?)\~/', $str, $newStr);
for($i=0;$i<count($newStr[0]);$i++)
{
echo $newStr[0][$i].'<br>';
}
echo '<br><br><br>';
for($i=0;$i<count($newStr[1]);$i++)
{
echo $newStr[1][$i].'<br>';
}
This will output (each <br> shown as a line break):
^Jony~
^Smith~
^example-free#wpdevelop.com~

Jony
Smith
example-free#wpdevelop.com
I'm curious whether the two array entries are due to the original syntax of the string or whether this is the function's normal behavior. Sorry if this shouldn't be here, but I'm really curious as to how this works.
thanks,
Brodie
It's standard behavior for preg_match and preg_match_all - the first string in the "matched values" array is the FULL string that was caught by the regex pattern. The subsequent array values are the 'capture groups', whose existence depends on the placement/position of () pairs in the regex pattern.
In your regex's case, /\^([^^]*?)\~/, the first full match breaks down like this:
    ^       Jony       ~
    |        |         |
    ^    ([^^]*?)      ~    -> $newStr[0][0] = ^Jony~
                            -> $newStr[1][0] = Jony (due to the `()` capture group)
Could this RegEx have been structured in such a way as to bypass the first array entry and just produce the desired result?
Absolutely. Use lookaround assertions. This regex:
preg_match_all('/(?<=\^)[^^]*?(?=~)/', $str, $newStr);
Results in:
Array
(
[0] => Array
(
[0] => Jony
[1] => Smith
[2] => example-free#wpdevelop.com
)
)
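Another way to keep only the wanted part in $newStr[0] is PCRE's \K, which discards everything matched so far from the reported match. This is a sketch of that alternative (not part of the answer above); it uses the question's string with the line break removed from the email address:

```php
<?php
$str = 'text^name1^Jony~text^secondname1^Smith~text^email1^example-free#wpdevelop.com~';

// \^ must match, but \K drops it from the reported match;
// (?=~) requires a following ~ without consuming it.
preg_match_all('/\^\K[^^~]*(?=~)/', $str, $newStr);
print_r($newStr[0]);
```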
As the manual states, this is the expected result (for the default PREG_PATTERN_ORDER flag). The first entry of $newStr contains all full pattern matches, the next result all matches for the first subpattern (in parentheses) and so on.
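For comparison, passing PREG_SET_ORDER instead groups the results per match, so each entry holds a full match followed by its subpatterns. A minimal sketch using a shortened version of the question's string:

```php
<?php
$str = 'text^name1^Jony~text^secondname1^Smith~';

// Default PREG_PATTERN_ORDER: [ [all full matches], [all group-1 matches] ]
preg_match_all('/\^([^^]*?)~/', $str, $byPattern);

// PREG_SET_ORDER: [ [full match, group 1] for each match ]
preg_match_all('/\^([^^]*?)~/', $str, $bySet, PREG_SET_ORDER);

var_export($byPattern);
var_export($bySet);
```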
The first array in the result of preg_match_all() contains the strings that match the whole pattern you passed to the function, in your case /\^([^^]*?)\~/. Subsequent arrays in the result contain the matches for the parenthesized groups in your pattern. Maybe it is easier to understand with an example:
$string = 'abcdefg';
preg_match_all('/ab(cd)e(fg)/', $string, $matches);
The $matches array will be
array(3) {
[0]=>
array(1) {
[0]=>
string(7) "abcdefg"
}
[1]=>
array(1) {
[0]=>
string(2) "cd"
}
[2]=>
array(1) {
[0]=>
string(2) "fg"
}
}
The first array will contain the match of the entire pattern, in this case 'abcdefg'. The second array will contain the match for the first set of parentheses, in this case 'cd'. The third array will contain the match for the second set of parentheses, in this case 'fg'.
[0] contains the entire match, while [1] contains only a portion (the part you want to extract).
You can run var_dump($newStr) to see the array structure; it should make this clear.
$str = 'text^name1^Jony~text^secondname1^Smith~text^email1^example-
free#wpdevelop.com~';
preg_match_all('/\^([^^]*?)\~/', $str, $newStr);
$newStr = $newStr[1];
foreach($newStr as $key => $value)
{
echo $value."\n";
}
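This will output the following (the email is split across two lines because the string literal itself contains a line break after "example-"; the expression was not modified):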
Jony
Smith
example-
free#wpdevelop.com
Whenever you have trouble picturing what preg_match_all does, use an online evaluator such as the preg_match_all tester at regextester.net.
It shows the result in real time, and you can configure things like the result order, meta instructions, offset capturing and more.
I'm trying to use PHP's split() (preg_split() is also an option if your answer works with it) to split up a string on 2 or more \r\n's. My current effort is:
split("(\r\n){2,}",$nb);
The problem with this is it matches every time there is 2 or 3 \r\n's, then goes on and finds the next one. This is ineffective with 4 or more \r\n's.
I need all instances of two or more \r\n's to be treated the same as two \r\n's. For example, I'd need
Hello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow
to become
array('Hello','My','Name is\r\nShadow');
preg_split() should do it with
$pattern = "/(\\r\\n){2,}/";
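Applied to the sample string from the question, that pattern gives the requested array. Because {2,} is greedy, a run of any length is consumed as one delimiter, while a single \r\n stays inside its element. A minimal sketch:

```php
<?php
$nb = "Hello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow";
$pattern = "/(\\r\\n){2,}/";

// Two or more consecutive \r\n sequences act as a single separator
$parts = preg_split($pattern, $nb);
var_export($parts);
```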
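What about the following suggestion:
$nb = implode("\r\n", array_filter(explode("\r\n", $nb)));
Note that this returns a string rather than an array: it collapses every run of \r\n into a single \r\n, so single and double line breaks can no longer be told apart.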
It works for me:
$nb = "Hello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow";
$parts = split("(\r\n){2,}",$nb);
var_dump($parts);
var_dump($parts === array('Hello','My',"Name is\r\nShadow"));
Prints:
array(3) {
[0]=>
string(5) "Hello"
[1]=>
string(2) "My"
[2]=>
string(15) "Name is
Shadow"
}
bool(true)
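Note the double quotes in the second test, which are needed to get the actual characters represented by \r\n. Also note that split() was deprecated in PHP 5.3 and removed in PHP 7.0, so on current PHP versions use preg_split() instead.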
Adding the PREG_SPLIT_NO_EMPTY flag to preg_split() with Tomalak's pattern of "/(\\r\\n){2,}/" accomplished this for me.
\R is shorthand for matching newline sequences across different operating systems. You can prevent empty elements being created at the start and end of your output array by using the PREG_SPLIT_NO_EMPTY flag or you could call trim() on the string before splitting.
Code:
$string = "\r\n\r\nHello\r\n\r\nMy\r\n\r\n\r\n\r\n\r\n\r\nName is\r\nShadow\r\n\r\n\r\n\r\n";
var_export(preg_split('~\R{2,}~', $string, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_split('~\R{2,}~', trim($string)));
Output from either technique:
array (
0 => 'Hello',
1 => 'My',
2 => 'Name is
Shadow',
)