REGEX condition with non-capture groups - php

I've got the Regular Expression:
([A-Za-z0-9_]+?)[ ]?(\()?(?(2)([A-Za-z0-9=\-\/°%= ]*)\))_([A-Za-z0-9]*)$
(the (?(2)...) part is the condition)
It should match the following:
name (unit)_type
name(unit)_type
long_name_type
name_type
The problem is that I've got 4 capture groups instead of 3:
[1] => Array
(
[0] => name
)
[2] => Array
(
[0] => (
)
[3] => Array
(
[0] => unit
)
[4] => Array
(
[0] => type
)
However when I change the capture group for parenthesis to non-capture group like that:
([A-Za-z0-9_]+?)[ ]?(?:\()?(?(2)([A-Za-z0-9=\-\/°%= ]*)\))_([A-Za-z0-9]*)$
(the change is the (?:\()? group)
It does not work.
Is there any chance to get matches like that?
[1] => Array
(
[0] => name
)
[2] => Array
(
[0] => unit
)
[3] => Array
(
[0] => type
)
EDIT:
After all your tips I've simplified it like that:
(\w+?) *(?:\(([A-Za-z0-9\/°%= -]*)\))?_([A-Za-z0-9]*)$

It doesn't look like you really need that regex condition.
Why not simply use an optional non-capture group:
([A-Za-z0-9_]+?)[ ]?(?:\(([A-Za-z0-9=\-\/°% ]*)\))?_([A-Za-z0-9]*)$
(the optional non-capture group is the (?:\( ... \))? part)
regex101 demo
[Note: you have 2 = signs in the character class, I removed one of them since it's redundant to use two in a character class]
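To make the optional non-capture group concrete, here is a minimal sketch that runs the pattern above against the four sample inputs from the question. The helper name parse_label is hypothetical, and the "~" delimiter and the "u" modifier are my own choices (they assume the script is saved as UTF-8), not something from the original answer.

```php
<?php
// Hypothetical helper: splits a label such as "name (unit)_type" into its parts.
function parse_label(string $label): ?array
{
    // "~" is the delimiter so "/" needs no escaping; "u" treats "°" as one UTF-8 character.
    $pattern = '~([A-Za-z0-9_]+?)[ ]?(?:\(([A-Za-z0-9=\-/°% ]*)\))?_([A-Za-z0-9]*)$~u';
    if (!preg_match($pattern, $label, $m)) {
        return null; // the label does not have the expected shape
    }
    // Group 2 is an empty string when the optional "(unit)" part is absent.
    return ['name' => $m[1], 'unit' => $m[2], 'type' => $m[3]];
}

foreach (['name (unit)_type', 'name(unit)_type', 'long_name_type', 'name_type'] as $label) {
    print_r(parse_label($label));
}
```

Only three groups come back, because the parenthesis itself is matched inside (?: ... ) and never captured.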

Looks like you could simplify it down quite a bit using \w and eliminating some unnecessary character classes. You can then use your non-capture groups:
(\w+?) *(?:\(([A-Za-z0-9\/°%= -]*)\))?_([A-Za-z0-9]*)$
Working example: http://regex101.com/r/wZ8nP8
Also, you don't need to escape - in a character class if it's at the beginning or end.
Per a suggestion by @nhahtdh, I changed the last section back to a character class so that it excludes _. I also noticed that the previous example broke long_name.

Related

PHP Regex containing its limiters as occurrences

I have this string:
{include="folder/file" vars="key:value"}
I have a regex to catch the file and the vars like this:
|\{include\=[\'\"](.*)\/(.*)[\'\"](.*)\}|U
First (.*) = folder
Second (.*) = file
Third (.*) = params (and I have some functions to parse it)
But there are some cases where I need to catch params that themselves contain braces {}. Like this:
{include="file" vars="key:{value}"}
The regex works, but it only captures up to the first closing brace. Like this:
{include="file" vars="key:{value}
So part of the code is left out.
How can I allow those braces as part of the result instead of treating them as the closing limiter?
Thanks!
You can use this regex:
\{include=['"](?:(.*)\/(.*?)|(\w+))['"] vars="(.*?)"\}
Working demo
MATCH 1
1. [10-16] `folder`
2. [17-21] `file`
4. [29-38] `key:value`
MATCH 2
3. [51-55] `file`
4. [63-74] `key:{value}`
Bearing in mind what @naomik said, I think I should change my regex.
What I want to make now is detecting this structure:
{word="value" word="value" ... n times}
I have this regex: (\w+)=['"](.*?)['"]
it detects :
{include="folder/file"}
{include="folder/file" vars="key:value"}
{vars="key:{value}" include="folder/file"} (order changed)
It works fine, BUT I don't know how to add the opening and closing braces to the regex. When I add them, it doesn't work the way I want anymore.
Live Demo
Another robust regexp that covers your first question:
preg_match_all('/\{include=["\']([^"\']+)["\'] vars=["\']([^"]+)["\']\}/', $str, $matches);
You'll get this kind of result in $matches:
Array
(
[0] => Array
(
[0] => {include="folder/file" vars="key:{value}"}
[1] => {include="folder/file" vars="key:value"}
[2] => {include="folder/file" vars="key:value"}
[3] => {include="file" vars="key:{value}"}
)
[1] => Array
(
[0] => folder/file
[1] => folder/file
[2] => folder/file
[3] => file
)
[2] => Array
(
[0] => key:{value}
[1] => key:value
[2] => key:value
[3] => key:{value}
)
)
You can access what matters this way: $matches[1][0] and $matches[2][0] for the first element, $matches[1][1] and $matches[2][1] for the second, etc.
It does not store folder and file as separate results. For that, you'll have to write a separate piece of code. There is no elegant way to write a regex that covers both include="folder/file" and include="file".
It does not support swapping the order of include and vars. If you want to support this, you'll have to split your input data into chunks (line by line, or the text between braces) before you try to match the content with something like this:
preg_match_all('/(\w+)=["\']([^"\']+)["\']/', $chunk, $matches);
Then $matches will contain something like this:
Array
(
[0] => Array
(
[0] => vars="key:{value}"
[1] => include="folder/file"
)
[1] => Array
(
[0] => vars
[1] => include
)
[2] => Array
(
[0] => key:{value}
[1] => folder/file
)
)
Then you know that $matches[1][0] contains 'vars', so you can get the vars value from $matches[2][0]. Likewise, $matches[1][1] contains 'include', so you can get 'folder/file' from $matches[2][1].
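The two-step idea above (isolate each {...} block first, then read its word="value" pairs in any order) can be sketched roughly as follows. The function name parse_tags and both patterns are my own assumptions rather than code from the answers; the sketch also assumes a value never contains the quote character that delimits it.

```php
<?php
// Hypothetical helper: returns one associative array of attributes per {...} tag.
function parse_tags(string $text): array
{
    // A tag is "{", one or more word="value" pairs, optional whitespace, "}".
    // \2 refers back to the quote that opened the value, so {} inside a value is fine.
    $tagPattern  = '~\{((?:\s*\w+=(["\']).*?\2)+)\s*\}~';
    $pairPattern = '~(\w+)=(["\'])(.*?)\2~';

    $tags = [];
    preg_match_all($tagPattern, $text, $blocks);
    foreach ($blocks[1] as $body) {
        preg_match_all($pairPattern, $body, $pairs, PREG_SET_ORDER);
        $attrs = [];
        foreach ($pairs as $pair) {
            $attrs[$pair[1]] = $pair[3]; // attribute name => attribute value
        }
        $tags[] = $attrs;
    }
    return $tags;
}

print_r(parse_tags('a {include="folder/file"} b {vars="key:{value}" include="folder/file"}'));
```

Because each tag ends up as a name => value map, the order of include and vars no longer matters to the calling code.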

Regex - everything except slash

I wrote a small routing system, but I have a problem with it. My regex treats a slash as an ordinary character and I can't figure out how to make it work. For example:
I defined the route home/[a-zA-Z0-9_], which shows a profile, but I also defined home/user/\d. When I request the second case, home/user/45, the first route is used instead: it takes user/45 as one string. How can I exclude the / using regex?
Have you tried something like this?
[^\/]+
Or better
[a-zA-Z0-9_]+[^\/]+
If you go to regexr.com and put your string ( home/user/45 ) it will select only home,user,45 (excepting the slash /)
You should match the following pattern:
/home\/user\/(\d+)/
And replace it with the following:
home/user$1
In the first regex, I used a delimiter: a slash. If a delimiter isn't required, remove the first and the last slash.
Try home/(?!user/)[a-zA-Z0-9_] for the first case.
Try one of these:
<?php
$path = '/home/user/45';
preg_match_all('/home\/(\w+)/', $path, $matches);
/* Would set $matches to Array:
(
[0] => Array
(
[0] => home/user
)
[1] => Array
(
[0] => user
)
)
*/
preg_match_all('/home\/(\w+)\/(\d+)/', $path, $matches);
/* Would set $matches to Array:
(
[0] => Array
(
[0] => home/user/45
)
[1] => Array
(
[0] => user
)
[2] => Array
(
[0] => 45
)
)
*/
?>
Since you are relying on the regex for routing, you really should anchor the pattern. Another trick to make patterns more readable is to use a different delimiter character:
|^/home/(\w+)/?$|
|^/home/user/(\d+)/?$|
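As a rough illustration of how the two anchored patterns could be used in a router, here is a small sketch. The function match_route, the route names, and the rule of trying the more specific route first are assumptions for the example, not part of the asker's routing system.

```php
<?php
// Hypothetical router: returns the first route whose anchored pattern matches.
function match_route(string $path): ?array
{
    // "|" is the delimiter, so the slashes in the path need no escaping.
    // The more specific user route is listed before the generic profile route.
    $routes = [
        'user'    => '|^/home/user/(\d+)/?$|',
        'profile' => '|^/home/(\w+)/?$|',
    ];
    foreach ($routes as $name => $pattern) {
        if (preg_match($pattern, $path, $m)) {
            return ['route' => $name, 'param' => $m[1]];
        }
    }
    return null; // no route matched
}

var_dump(match_route('/home/user/45'));
var_dump(match_route('/home/alice'));
```

Since \w cannot match a slash and both patterns are anchored with ^ and $, user/45 can no longer be swallowed as a single profile name.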

how to remove unuseful array items in preg_match_all result?

Some of the groups in the regex are not useful to me, and I don't want them to show up in my $result array. How can I do that? I remember that preg_match can leave an unneeded "(xxx)" out of the result, but I don't remember how to write it.
<?php
$url='http://www.new_pm.com/fr/lookbook/2.html';
preg_match_all('#([a-z]{2})?(lookbook)/?(\d+)?(\.html)?#',$url,$result);
print_r($result);
/* -------
Array
(
[0] => Array
(
[0] => lookbook/2.html
)
[1] => Array // I don't want $result has this item
(
[0] =>
)
[2] => Array
(
[0] => lookbook
)
[3] => Array
(
[0] => 2
)
[4] => Array // I don't want $result has this item
(
[0] => .html
)
)
------- */
?>
Every time you add parentheses into a pattern, that captures whatever was matched inside those parentheses and returns it in the result. Not only can this be annoying as in your case, it's also unnecessary overhead. For those reasons, whenever you don't actually need the result, either remove the parentheses (if possible) or use a non-capturing group (?:...) if you do need the grouping:
#(?:[a-z]{2})?(lookbook)/?(\d+)?(?:\.html)?#
Note that (\d+)? is the same as (\d*) (not in all cases and all flavors, but in your case it is):
#(?:[a-z]{2})?(lookbook)/?(\d*)(?:\.html)?#
Working demo.
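For reference, here is the second pattern run against the URL from the question, as a minimal sketch. Only the two groups that are still capturing show up after the full match.

```php
<?php
$url = 'http://www.new_pm.com/fr/lookbook/2.html';

// (?:...) groups without capturing, so the language prefix and ".html" leave no entry.
preg_match_all('#(?:[a-z]{2})?(lookbook)/?(\d*)(?:\.html)?#', $url, $result);

// $result[0] holds the full match, $result[1] "lookbook", $result[2] the number.
print_r($result);
```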

split regular expression php

I have a string like that :
0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)
I want to turn it into an array.
First I tried adding separators so that I could use PHP's explode function:
;0,d(Hi),i(Hello);4,d(who),i(where);540,d(begin),i(began)
It works, but I want to minimize the separators to save disk space.
So I'd like to know whether preg_split and a regular expression can produce an array like this without any separators:
Array ( [0] => Array ( [0] => 0 [1] => d(hi) [2] => i(Hello) )
[1] => Array ( [0] => 4 [1] => d(who) [2] => i(where) )
[2] => Array ( [0] => 540 [1] => d(begin) [2] => i(began) )
)
I tried some code and regexes, but I saw that whatever the regular expression matches is missing from the final result (as with explode, the delimiter does not appear in the final array).
Moreover, I have some difficulty building the regex. Here is the one I came up with:
$modif = preg_split("/[0-9]+(d(.+))?(i(.+))?/", $data);
I should add that d() and i() may each be absent (but at least one of them is always present).
Thanks
If you do
preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $subject, $result, PREG_SET_ORDER);
on your original string, then you'll get an array where
$result[$i][0]
contains the ith match (i. e. $result[0][0] would be 0d(Hi)i(Hello)) and where
$result[$i][$c]
contains the cth capturing group of the ith match (i. e. $result[0][1] is 0, $result[0][2] is d(Hi) and $result[0][3] is i(Hello)).
Is that what you wanted?
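To get from the PREG_SET_ORDER result to the nested array shown in the question, something along these lines should work. The function name split_records is hypothetical, and the filtering step is my own addition: it drops the full match and any optional group that did not take part.

```php
<?php
// Hypothetical helper: turns "0d(Hi)i(Hello)4d(who)..." into one row per number.
function split_records(string $data): array
{
    preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $data, $sets, PREG_SET_ORDER);

    $rows = [];
    foreach ($sets as $set) {
        // Skip index 0 (the full match) and keep only groups that actually matched.
        // An explicit comparison is used so that the string "0" is not filtered out.
        $parts = array_filter(array_slice($set, 1), static function ($s) {
            return $s !== '';
        });
        $rows[] = array_values($parts);
    }
    return $rows;
}

print_r(split_records('0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)'));
```

A plain array_filter without a callback would also remove the leading "0", which is why the callback compares against the empty string instead.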

Unexpected result with very simple regexp

I am fairly new to regexp and have encountered one that delivers an unexpected result when trying to match the name parts in a name of the form firstname-firstname firstname:
preg_match_all('/([^- ])*/i', 'aNNA-äöå Åsa', $result);
gives a print_r($result) that looks like this:
Array
(
[0] => Array
(
[0] => aNNA
[1] =>
[2] => äöå
[3] =>
[4] => Åsa
[5] =>
)
[1] => Array
(
[0] => A
[1] =>
[2] => å
[3] =>
[4] => a
[5] =>
)
)
Now $result[0] has the items I would want and expect as a result, but where the heck does $result[1] come from? I see it's the word endings, but how come they are matched?
And as a little side question, how do I prevent the empty matches ($result[0][1], $result[0][3], ...), or better still: why do they show up? They are neither not-hyphen nor not-space characters.
Have a try with:
preg_match_all('/([^- ]+)/', 'aNNA-äöå Åsa', $result);
Your regex:
/([^- ])*/i
means: find one char that is not - or space and keep it in a group, 0 or more times
This one:
/([^- ]+)/
means: find one or more chars that are not - or space and keep them in a group
Moreover, there's no need for the case-insensitive flag.
The * means "0 or more of the preceding." At a "-" or a space the character class matches zero characters, which still satisfies *, so the pattern matches the empty string at that position and you get an empty entry. To require at least one character, use + instead:
preg_match_all('/([^- ])+/i', 'aNNA-äöå Åsa', $result);
("+" means "1 or more of the preceding." Because the quantifier sits outside the group here, $result[1] still holds only the last character of each word: a repeated capture group keeps just its final iteration.)
http://php.net/manual/en/function.preg-match-all.php says:
Orders results so that $matches[0] is an array of full pattern
matches, $matches[1] is an array of strings matched by the first
parenthesized subpattern, and so on.
Check the URL for more details
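To see the difference between the variants side by side, here is a small sketch. The "u" modifier is my own addition, so that the accented letters are treated as whole UTF-8 characters; it assumes the script is saved as UTF-8.

```php
<?php
$name = 'aNNA-äöå Åsa';

// Quantifier outside the group, zero allowed: empty matches appear at "-", " " and the end.
preg_match_all('/([^- ])*/u', $name, $star);

// Quantifier inside the group: the group captures the whole word.
preg_match_all('/([^- ]+)/u', $name, $plus);

// Quantifier outside the group, at least one: full matches are words,
// but the group keeps only its last iteration (the final character).
preg_match_all('/([^- ])+/u', $name, $outer);

print_r($star[0]);
print_r($plus[1]);
print_r($outer[1]);
```

Putting the + inside the parentheses is what makes $matches[1] equal to $matches[0] here.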
