How to remove unneeded array items from a preg_match_all result? - php

Some of the groups in the regex are not useful to me, and I don't want them to show up in my $result array. How can I do that? I remember that preg_match can leave unneeded "(xxx)" groups out of the result, but I don't remember how to write it now.
<?php
$url='http://www.new_pm.com/fr/lookbook/2.html';
preg_match_all('#([a-z]{2})?(lookbook)/?(\d+)?(\.html)?#',$url,$result);
print_r($result);
/* -------
Array
(
[0] => Array
(
[0] => lookbook/2.html
)
[1] => Array // I don't want $result has this item
(
[0] =>
)
[2] => Array
(
[0] => lookbook
)
[3] => Array
(
[0] => 2
)
[4] => Array // I don't want $result has this item
(
[0] => .html
)
)
------- */
?>

Every time you add parentheses to a pattern, they capture whatever was matched inside them and return it in the result. Not only can this be annoying, as in your case, it's also unnecessary overhead. So whenever you don't actually need the result, either remove the parentheses (if possible) or, if you do need the grouping, use a non-capturing group (?:...):
#(?:[a-z]{2})?(lookbook)/?(\d+)?(?:\.html)?#
Note that (\d+)? is equivalent to (\d*) here (not in every case or every regex flavor, but in yours it is):
#(?:[a-z]{2})?(lookbook)/?(\d*)(?:\.html)?#
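A minimal sketch of the non-capturing version applied to the URL from the question, assuming the pattern is used exactly as written above: only the whole match, lookbook and the page number remain in $result.

```php
<?php
// URL from the question.
$url = 'http://www.new_pm.com/fr/lookbook/2.html';

// (?:...) groups still make the language code and ".html" optional,
// but they no longer add entries to $result.
preg_match_all('#(?:[a-z]{2})?(lookbook)/?(\d*)(?:\.html)?#', $url, $result);

// $result[0] = full matches, $result[1] = "lookbook", $result[2] = page number.
print_r($result);
```
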

Related

PHP Regex containing its delimiters as occurrences

I have this string:
{include="folder/file" vars="key:value"}
I have a regex to catch the file and the vars like this:
|\{include\=[\'\"](.*)\/(.*)[\'\"](.*)\}|U
First (.*) = folder
Second (.*) = file
Third (.*) = params (and I have some functions to parse it)
But in some cases I need to capture params that contain braces {}, like this:
{include="file" vars="key:{value}"}
The regex works, but it only captures the result up to the first closing brace, like this:
{include="file" vars="key:{value}
So part of the content is left out.
How can I allow those braces as part of the result instead of having them treated as the closing delimiter?
Thanks!
You can use this regex:
\{include=['"](?:(.*)\/(.*?)|(\w+))['"] vars="(.*?)"\}
MATCH 1
1. [10-16] `folder`
2. [17-21] `file`
4. [29-38] `key:value`
MATCH 2
3. [51-55] `file`
4. [63-74] `key:{value}`
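To check the alternation from PHP, here is a sketch that runs the pattern over the two sample tags from the question (joined with a newline only for this example). In the default PREG_PATTERN_ORDER, a group that took no part in a match comes back as an empty string, so for each tag either groups 1 and 2 or group 3 are filled.

```php
<?php
// The two sample tags from the question, one per line.
$str = '{include="folder/file" vars="key:value"}' . "\n"
     . '{include="file" vars="key:{value}"}';

// Groups 1-2: folder and file, group 3: bare file name, group 4: vars.
// Without the "s" modifier, "." does not cross the newline between tags.
$pattern = '/\{include=[\'"](?:(.*)\/(.*?)|(\w+))[\'"] vars="(.*?)"\}/';
preg_match_all($pattern, $str, $m);

// Unused alternatives show up as "" in the result.
print_r($m);
```
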
Having in mind what @naomik said, I think I should change my regex.
What I want now is to detect this structure:
{word="value" word="value" ... n times}
I have this regex: (\w+)=['"](.*?)['"]
It detects:
{include="folder/file"}
{include="folder/file" vars="key:value"}
{vars="key:{value}" include="folder/file"} (order changed)
It works fine, BUT I don't know how to add the opening and closing braces to the regex. When I add them, it no longer works the way I want.
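Another robust regex that covers your first question (note that the literal braces are escaped; unescaped { and } around the whole pattern would be taken as the pattern delimiters, not as characters to match):
preg_match_all('#\{include=["\']([^"\']+)["\'] vars=["\']([^"]+)["\']\}#', $str, $matches);
You'll get this kind of result in $matches: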
Array
(
[0] => Array
(
[0] => {include="folder/file" vars="key:{value}"}
[1] => {include="folder/file" vars="key:value"}
[2] => {include="folder/file" vars="key:value"}
[3] => {include="file" vars="key:{value}"}
)
[1] => Array
(
[0] => folder/file
[1] => folder/file
[2] => folder/file
[3] => file
)
[2] => Array
(
[0] => key:{value}
[1] => key:value
[2] => key:value
[3] => key:{value}
)
)
You can access what matters this way: $matches[1][0] and $matches[2][0] for the first element, $matches[1][1] and $matches[2][1] for the second, and so on.
This regex does not capture the folder and the file separately; for that, you'll have to write an extra bit of code. There is no elegant way to write a regex that covers both include="folder/file" and include="file".
It also does not support swapping the order of include and vars. If you want to support that, you'll have to split your input into chunks (line by line, or the text between braces) before you try to match the content with something like this:
preg_match_all('/(\w+)=["\']([^"\']+)["\']/', $chunk, $matches);
Then $matches will contain something like this:
Array
(
[0] => Array
(
[0] => vars="key:{value}"
[1] => include="folder/file"
)
[1] => Array
(
[0] => vars
[1] => include
)
[2] => Array
(
[0] => key:{value}
[1] => folder/file
)
)
Then you know that $matches[1][0] contains 'vars', so you can get the vars value from $matches[2][0]. $matches[1][1] contains 'include', so you can get 'folder/file' from $matches[2][1].
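The two-step approach described above (first isolate each {...} tag, then read its key="value" pairs) can be sketched as follows. It assumes values are always double-quoted, as in the question's examples; the variable names are illustrative only.

```php
<?php
// Sample tags from the question; attribute order differs between tags.
$input = '{include="folder/file"} {include="folder/file" vars="key:value"} '
       . '{vars="key:{value}" include="folder/file"}';

// Step 1: a tag is "{", one or more key="value" pairs, optional spaces, "}".
// [^"]* lets a quoted value contain braces, so its "}" does not end the tag.
preg_match_all('/\{(?:\s*\w+="[^"]*")+\s*\}/', $input, $tagMatches);

// Step 2: turn the pairs of each tag into an associative array.
$tags = [];
foreach ($tagMatches[0] as $tag) {
    preg_match_all('/(\w+)="([^"]*)"/', $tag, $pairs);
    $tags[] = array_combine($pairs[1], $pairs[2]);
}

print_r($tags);
```

Because the names become array keys, the order of include and vars inside a tag no longer matters.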

Regex - Does not contain certain Characters preg_match

I need a regex that matches if an array element contains a certain word anywhere in it. For example, given this array:
Array
(
[1] => Array
(
[0] => http://www.test1.com
[1] => 4
[2] => 4
)
[2] => Array
(
[0] => http://www.test2.fr/blabla.html
[1] => 2
[2] => 2
)
[3] => Array
(
[0] => http://www.stuff.com/admin/index.php
[1] => 2
[2] => 2
)
[4] => Array
(
[0] => http://www.test3.com/blabla/bla.html
[1] => 2
[2] => 2
)
[5] => Array
(
[0] => http://www.stuff.com/bla.html
[1] => 2
[2] => 2
)
)
I want to return everything except the arrays that have the word stuff in them, but when I test with this, it doesn't quite work:
return !preg_match('/(stuff)$/i', $element[0]);
Any solution for that?
Thanks
You don't need a regular expression for a simple substring search. Use array_filter() together with strpos():
$result = array_filter($array, function ($elem) {
    // Keep only the elements whose URL does not contain "stuff".
    return strpos($elem[0], 'stuff') === false;
});
Now, to answer your question: because of the $ anchor, your current pattern only matches strings where stuff appears at the very end. You don't want that, so remove the $ anchor from your regex.
The updated regex should look like this:
return !preg_match('/stuff/i', $element[0]);
If the actual use case is different from what your question shows and involves more than a simple substring search, then preg_match() is the right tool. Like strpos() above, it can be used with array_filter() to create a new array that satisfies your requirements.
Here's how you'd do it with a callback function:
$result = array_filter($array, function ($elem) {
    // preg_match() returns 1 on a match, so negate it to drop "stuff" entries.
    return !preg_match('/stuff/i', $elem[0]);
});
Note: the actual regex might be more complex; I've used /stuff/ as an example.
Your pattern will only match a string where stuff appears at the end of the string or line. To fix this, just get rid of the end anchor ($):
return !preg_match('/stuff/i', $element[0]);
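For a quick check against the data from the question, here is a sketch of a filter that keeps every entry except the stuff.com ones. The array literal is rebuilt from the print_r dump, and stripos() is shown as the case-insensitive, regex-free equivalent.

```php
<?php
// Array rebuilt from the print_r() dump in the question.
$array = [
    1 => ['http://www.test1.com', 4, 4],
    2 => ['http://www.test2.fr/blabla.html', 2, 2],
    3 => ['http://www.stuff.com/admin/index.php', 2, 2],
    4 => ['http://www.test3.com/blabla/bla.html', 2, 2],
    5 => ['http://www.stuff.com/bla.html', 2, 2],
];

// Regex version: keep the entries whose URL does NOT contain "stuff".
$byRegex = array_filter($array, function ($elem) {
    return !preg_match('/stuff/i', $elem[0]);
});

// Plain-string version: stripos() is the case-insensitive strpos().
$byString = array_filter($array, function ($elem) {
    return stripos($elem[0], 'stuff') === false;
});

// array_filter() preserves the original keys, so 1, 2 and 4 remain.
print_r(array_keys($byRegex));
```

Wrap the result in array_values() if you need the keys renumbered from 0.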

REGEX condition with non-capture groups

I've got the Regular Expression:
([A-Za-z0-9_]+?)[ ]?(\()?(?(2)([A-Za-z0-9=\-\/°%= ]*)\))_([A-Za-z0-9]*)$
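Here (?(2)...) is the condition: it tries to match the unit and the closing parenthesis only if group 2 (the opening parenthesis) matched.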
It should match the following:
name (unit)_type
name(unit)_type
long_name_type
name_type
The problem is that I've got 4 capture groups instead of 3:
[1] => Array
(
[0] => name
)
[2] => Array
(
[0] => (
)
[3] => Array
(
[0] => unit
)
[4] => Array
(
[0] => type
)
However, when I change the capture group for the opening parenthesis, (\()?, into a non-capture group, (?:\()?, like this:
([A-Za-z0-9_]+?)[ ]?(?:\()?(?(2)([A-Za-z0-9=\-\/°%= ]*)\))_([A-Za-z0-9]*)$
it does not work.
Is there any way to get matches like this?
[1] => Array
(
[0] => name
)
[2] => Array
(
[0] => unit
)
[3] => Array
(
[0] => type
)
EDIT:
After all your tips, I've simplified it like this:
(\w+?) *(?:\(([A-Za-z0-9\/°%= -]*)\))?_([A-Za-z0-9]*)$
It doesn't look like you really need that regex condition.
Why not simply use an optional non-capture group:
([A-Za-z0-9_]+?)[ ]?(?:\(([A-Za-z0-9=\-\/°% ]*)\))?_([A-Za-z0-9]*)$
(compared with your pattern, the parenthesized unit part is now wrapped in an optional non-capture group, (?:...)?)
[Note: you have two = signs in the character class; I removed one, since a duplicate character in a character class is redundant]
Looks like you could simplify it down quite a bit using \w and eliminating some unnecessary character classes. You can then use your non-capture groups:
(\w+?) *(?:\(([A-Za-z0-9\/°%= -]*)\))?_([A-Za-z0-9]*)$
Working example: http://regex101.com/r/wZ8nP8
Also, you don't need to escape - in a character class if it's at the beginning or end.
Per a suggestion by @nhahtdh, I fixed up the last section to exclude _ (back to a character class). I also noticed that the previous example broke long_name.
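A quick check of the simplified pattern against the four sample names from the question. The ~ delimiter and the u modifier are assumptions of this sketch, not part of the original pattern: u makes PCRE read the pattern as UTF-8, so ° is treated as one character rather than two bytes.

```php
<?php
// Simplified pattern from the answer, with "~" delimiters and the "u" modifier.
$pattern = '~(\w+?) *(?:\(([A-Za-z0-9\/°%= -]*)\))?_([A-Za-z0-9]*)$~u';

$inputs = ['name (unit)_type', 'name(unit)_type', 'long_name_type', 'name_type'];

$parsed = [];
foreach ($inputs as $input) {
    if (preg_match($pattern, $input, $m)) {
        // $m[2] is "" when the optional "(unit)" part is absent,
        // because group 3 after it did participate in the match.
        $parsed[$input] = [$m[1], $m[2], $m[3]];
    }
}

print_r($parsed);
```

Each input yields exactly three groups (name, unit, type), which is the shape asked for in the question.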

split regular expression php

I have a string like this:
0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)
and I want to turn it into an array.
First I tried adding separators so I could use PHP's explode() function:
;0,d(Hi),i(Hello);4,d(who),i(where);540,d(begin),i(began)
It works, but I want to get rid of the separators to save disk space.
So I'd like to know whether preg_split() or a regular expression can give me a nested array like this without any separators:
Array ( [0] => Array ( [0] => 0 [1] => d(Hi) [2] => i(Hello) )
[1] => Array ( [0] => 4 [1] => d(who) [2] => i(where) )
[2] => Array ( [0] => 540 [1] => d(begin) [2] => i(began) )
)
I tried some code and regexes, but I noticed that the text matched by the regular expression is missing from the final result (just like with explode(), where the final array doesn't contain the delimiter).
Moreover, I'm having some difficulty building the regex. Here is the one I wrote:
$modif = preg_split("/[0-9]+(d(.+))?(i(.+))?/", $data);
I should point out that d() and i() are each optional, but at least one of them is always present.
Thanks
If you run
preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $subject, $result, PREG_SET_ORDER);
on your original string, you'll get an array where $result[$i][0] contains the i-th match (i.e. $result[0][0] is 0d(Hi)i(Hello)) and $result[$i][$c] contains the c-th capturing group of the i-th match (i.e. $result[0][1] is 0, $result[0][2] is d(Hi) and $result[0][3] is i(Hello)).
Is that what you wanted?
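Building on that answer, a sketch that converts the PREG_SET_ORDER result into the nested shape asked for in the question. The post-processing (dropping the full match and any optional group that did not participate) is an addition of this sketch, not part of the answer.

```php
<?php
$subject = '0d(Hi)i(Hello)4d(who)i(where)540d(begin)i(began)';

// Pattern from the answer: number, then optional d(...) and optional i(...).
preg_match_all('/(\d+)(d\([^()]*\))?(i\([^()]*\))?/', $subject, $result, PREG_SET_ORDER);

// For each match, drop the full match ([0]) and any empty optional group.
$rows = array_map(function ($set) {
    $groups = array_slice($set, 1);
    return array_values(array_filter($groups, function ($value) {
        return $value !== '';
    }));
}, $result);

print_r($rows);
```

The explicit !== '' check matters: array_filter() without a callback would also drop the string "0", which is a valid leading number here.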

Regular Expression with WordPress shortcodes

I'm trying to find all shortcodes within a string which looks like this:
 [a_col] One
 [/a_col]
outside
[b_col]
Two
[/b_col] [c_col] Three [/c_col]
I need the content (e.g. "Three") and the letter from the col tag (a, b or c).
Here's the expression I'm using:
preg_match_all('#\[(a|b|c)_col\](.*)\[\/\1_col\]#m', $string, $hits);
but $hits contains only the last one.
The content can contain any character, even "[" or "]".
EDIT:
I would like to get "outside" as well, which can be any string (except these col tags). How can I handle that, or should I parse it in a second step?
This will capture anything in the content, as well as attributes, and will allow any characters in the content.
<?php
$input = '[a_col some="thing"] One[/a_col]
[b_col] Two [/b_col]
[c_col] [Three] [/c_col] ';
preg_match_all('#\[(a|b|c)_col([^\[]*)\](.*?)\[\/\1_col\]#msi', $input, $matches);
print_r($matches);
?>
EDIT:
You may then want to trim the matches, since there may be some whitespace. Alternatively, you can have the regex strip the whitespace around the content:
preg_match_all('#\[(a|b|c)_col([^\[]*)\]\s*(.*?)\s*\[\/\1_col\]#msi', $input, $matches);
OUTPUT:
Array
(
[0] => Array
(
[0] => [a_col some="thing"] One[/a_col]
[1] => [b_col] Two [/b_col]
[2] => [c_col] [Three] [/c_col]
)
[1] => Array
(
[0] => a
[1] => b
[2] => c
)
[2] => Array
(
[0] => some="thing"
[1] =>
[2] =>
)
[3] => Array
(
[0] => One
[1] => Two
[2] => [Three]
)
)
It might also help to capture the attribute names and values stored in $matches[2] with the pattern below. Let $atts be the first element of $matches[2]; of course, you would iterate over the array of attributes and do this for each one.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);
This gives an array where the names are stored in $att_matches[1] and their corresponding values are stored in $att_matches[3].
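Put together, a sketch of the attribute step: the names and values from the pattern above are paired with array_combine(). The second attribute, size, is hypothetical and only there to show a single-quoted value.

```php
<?php
// Attribute string as captured in $matches[2]; "size" is a hypothetical extra.
$atts = 'some="thing" size=\'large\'';

// Pattern from the answer: name, optional tabs/spaces, "=", quote, value, same quote.
preg_match_all('#([^="\'\s]+)[\t ]*=[\t ]*("|\')(.*?)\2#', $atts, $att_matches);

// Names are in [1], values in [3]; [2] only holds the quote character.
$attributes = array_combine($att_matches[1], $att_matches[3]);

print_r($attributes);
```
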
Use ((.|\n)*) instead of (.*) to capture across multiple lines:
<?php
$string = "
[a_col] One
[/a_col]
[b_col]
Two
[/b_col] [c_col] Three [/c_col]";
preg_match_all('#\[(a|b|c)_col\]((.|\n)*)\[\/\1_col\]#m', $string, $hits);
echo "<textarea style='width:90%;height:90%;'>";
print_r($hits);
echo "</textarea>";
?>
I don't have an environment to test with here, but you could use lookbehind and lookahead assertions with a backreference to match the tags around the content, something like this:
(?<=\[(\w)_col\]).*?(?=\[\/\1_col\])
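None of the answers above covers the "outside" text from the question's EDIT. One possible second step, sketched here as an assumption rather than taken from the answers, is to reuse the same shortcode pattern with preg_split(), which returns the text between the matches. The sample string is adapted from the question without its leading indentation.

```php
<?php
// Sample input adapted from the question, including the "outside" text.
$string = "[a_col] One\n[/a_col]\noutside\n[b_col]\nTwo\n[/b_col] [c_col] Three [/c_col]";

// Lazy .*? plus the "s" modifier, so content may span lines and contain "[" or "]".
$shortcode = '#\[(a|b|c)_col[^\]]*\](.*?)\[/\1_col\]#s';

// 1) The shortcodes themselves: letter in [1], content in [2].
preg_match_all($shortcode, $string, $hits);

// 2) Everything outside the shortcodes: split on them, trim, drop blank pieces.
$outside = array_values(array_filter(
    array_map('trim', preg_split($shortcode, $string)),
    function ($piece) {
        return $piece !== '';
    }
));

print_r($hits[1]);
print_r($outside);
```
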
