preg_match() behaving strangely? - php

I want to test two regular expressions against a URL:
$reg1 = "/(^(((www\.))|(?!(www\.)))domain\.com\/paramsindex\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})|()\/?$)/";
$reg2 = "/(^(((www\.))|(?!(www\.)))domain\.com\/paramsassoc\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})|()\/?$)/";
$uri = "www.domain.com/paramsindex/cont/meth/par1/par2/par3/";
$r1 = preg_match($reg1, $uri);
echo "<p>First regex returned: {$r1}</p>";
$r2 = preg_match($reg2, $uri);
echo "<p>Second regex returned: {$r2}</p>";
These two patterns are not the same. The difference is this:
www.domain.com/paramsindex/cont/meth/par1/par2/par3/
vs.
www.domain.com/paramsassoc/cont/meth/par1/par2/par3/
And yet preg_match() returns 1 for both of them.
Now you will say this is a long regex, so why use it? The thing is, I could build a shorter regex, but it is built on the fly and... it just needs to be like that.
What bothers me is that in Rubular the regexes work as they should.
I used Rubular when testing them, and now in PHP they won't work. I know Rubular is a Ruby regex editor, but I thought it would be the same :(
Rubular test: here
What is the problem here? How should I write that regex in PHP so that preg_match() can see the difference? The regex should stay as close as possible to the one I already wrote. Is there a simple fix? Something I'm overlooking?

That behavior is by design: preg_match() returns 1 when a match is found. If you want to capture the matches, see the matches parameter at: http://php.net/manual/en/function.preg-match.php
Edit: For example
$matches = array();
$r2 = preg_match($reg2, $uri, $matches);
echo "<p>Second regex returned: ";
print_r($matches);
echo "</p>";
I'll leave the above to document my own stupidity for not answering the right question.
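At the end of your regex you have |()\/?$)/, which adds a second alternative that matches an optional slash at the end of the string. Because the slash is optional, that branch matches any URL at all. Take the |() out and, from my tests, it looks like you're golden.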
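A minimal sketch of that fix, assuming "take it out" means removing just the |() part: these are the question's two patterns with only |() deleted, so the optional slash and the $ anchor now belong to the main branch instead of forming a match-anything alternative. Single quotes are used so PHP passes every backslash through untouched.

```php
<?php
// The question's patterns with the stray "|()" alternative removed.
$reg1 = '/(^(((www\.))|(?!(www\.)))domain\.com\/paramsindex\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})\/?$)/';
$reg2 = '/(^(((www\.))|(?!(www\.)))domain\.com\/paramsassoc\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})\/?$)/';
$uri = 'www.domain.com/paramsindex/cont/meth/par1/par2/par3/';

$r1 = preg_match($reg1, $uri); // 1: the paramsindex path matches
$r2 = preg_match($reg2, $uri); // 0: paramsassoc no longer matches
echo "First regex returned: {$r1}\n";
echo "Second regex returned: {$r2}\n";
```

With that change, only the paramsindex pattern should match the sample URI.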

Always remember to group your operands!
I can imagine this one is quite hard to spot, but it all comes down to your use of the alternation operator |. You are not grouping its operands correctly, and that produces the result described in your post.
In this case, |() means the pattern matches either the full regular expression to the left of the | or nothing at all (followed by an optional slash at the end of the string).
To solve this, put parentheses around the operands that should be ORed.
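The same effect in isolation, as a small sketch with made-up patterns (cat and dog are illustrative only):

```php
<?php
// Without grouping, | splits the WHOLE pattern into "^cat" OR "dog$".
$ungrouped = '/^cat|dog$/';
// With grouping, | only chooses between the words; both anchors always apply.
$grouped = '/^(cat|dog)$/';

$a = preg_match($ungrouped, 'catapult'); // 1: "^cat" alone is enough
$b = preg_match($grouped, 'catapult');   // 0: subject must be exactly cat or dog
var_dump($a, $b);
```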
An easy way to see where everything goes wrong is to run the snippet below:
$reg1 = "/(^(((www\.))|(?!(www\.)))domain\.com\/paramsindex\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})|()\/?$)/";
$reg2 = "/(^(((www\.))|(?!(www\.)))domain\.com\/paramsassoc\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})|()\/?$)/";
$uri = "www.domain.com/paramsindex/cont/meth/par1/par2/par3/";
var_dump (preg_match($reg1, $uri, $match1));
var_dump (preg_match($reg2, $uri, $match2));
print_r ($match1);
print_r ($match2);
Output:
int(1)
int(1)
Array
(
[0] => www.domain.com/paramsindex/cont/meth/par1/par2/par3
[1] => www.domain.com/paramsindex/cont/meth/par1/par2/par3
[2] => www.
[3] => www.
[4] => www.
[5] =>
[6] => cont
[7] => meth
[8] => par1/par2/par3
[9] => par1
[10] => par1
[11] =>
[12] => /par3
[13] => par3
)
Array
(
[0] => /
[1] => /
[2] =>
[3] =>
[4] =>
[5] =>
[6] =>
[7] =>
[8] =>
[9] =>
[10] =>
[11] =>
[12] =>
[13] =>
[14] =>
[15] =>
)
As you can see, $reg2 matches only the trailing / of $uri and every capture group is empty, which confirms what I described earlier.
If you give a short description of what you are trying to do, I can provide you with a fully functional (and probably somewhat neater than your current) regular expression.

Your RegEx is a mess and you will have to change it if you want it to work.
Check out the Rubular for your paramsindex: http://www.rubular.com/r/3ptjQ5aIrD
Now, for paramsassoc: http://www.rubular.com/r/o7GCbCsHyX
They both return a result. Sure, it's an array full of empty strings, but it is a result nonetheless.
That is why both calls return 1.
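To see what that "result" actually is, the PREG_OFFSET_CAPTURE flag reports where the match was found. A sketch using the question's original second pattern, unchanged:

```php
<?php
// Original second pattern from the question, including the "|()\/?$" branch.
$reg2 = '/(^(((www\.))|(?!(www\.)))domain\.com\/paramsassoc\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})|()\/?$)/';
$uri  = 'www.domain.com/paramsindex/cont/meth/par1/par2/par3/';

// With PREG_OFFSET_CAPTURE each entry becomes [matched text, byte offset].
$r2 = preg_match($reg2, $uri, $m, PREG_OFFSET_CAPTURE);
printf("returned %d, matched '%s' at offset %d\n", $r2, $m[0][0], $m[0][1]);
```

The only thing matched is the final / at offset 51, produced by the |()\/?$ branch rather than by the domain/path part of the pattern.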

Related

Regex - Does not contain certain Characters preg_match

I need a regex that matches if an array element contains a certain word anywhere in it. For example, with this array:
Array
(
[1] => Array
(
[0] => http://www.test1.com
[1] => 4
[2] => 4
)
[2] => Array
(
[0] => http://www.test2.fr/blabla.html
[1] => 2
[2] => 2
)
[3] => Array
(
[0] => http://www.stuff.com/admin/index.php
[1] => 2
[2] => 2
)
[4] => Array
(
[0] => http://www.test3.com/blabla/bla.html
[1] => 2
[2] => 2
)
[5] => Array
(
[0] => http://www.stuff.com/bla.html
[1] => 2
[2] => 2
)
)
I want to return all the entries except those whose URL contains the word stuff, but when I test with this, it doesn't quite work:
return !preg_match('/(stuff)$/i', $element[0]);
Any solution for that?
Thanks
You don't need a regular expression for a simple search. Use array_filter() in conjunction with stripos() (the case-insensitive variant of strpos()):
$result = array_filter($array, function ($elem) {
// keep only the elements whose URL does not contain "stuff"
return (stripos($elem[0], 'stuff') === FALSE);
});
Now, to answer your question, your current regex pattern will only match strings that contain stuff at the end of the line. You don't want that, so get rid of the "end of the line" anchor $ from your regex.
The updated regex should look like below:
return !preg_match('/stuff/i', $element[0]);
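A quick sketch of the difference, using one of the URLs from the question's array:

```php
<?php
$url = 'http://www.stuff.com/bla.html';

// With the $ anchor, "stuff" would have to be the last thing in the string.
$anchored = preg_match('/(stuff)$/i', $url); // 0
// Without it, "stuff" may appear anywhere.
$anywhere = preg_match('/stuff/i', $url);    // 1
var_dump($anchored, $anywhere);
```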
If the actual use case differs from what your question shows, and the operation involves more than a simple substring search, then preg_match() is the right tool. As shown above, it can be used with array_filter() to create a new array that satisfies your requirements.
Here's how you'd do it with a callback function:
$result = array_filter($array, function ($elem) {
return !preg_match('/stuff/i', $elem[0]);
});
Note: the actual regex might be more complex; I've used /stuff/ as an example. The ! negation keeps only the elements whose URL does not contain stuff, which is what the question asks for.
Your pattern will only match a string where stuff appears at the end of the string or line. To fix this, just get rid of the end anchor ($):
return !preg_match('/stuff/i', $element[0]);
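Putting the un-anchored pattern into array_filter() with the sample data from the question gives a sketch like this (only the URL column is used):

```php
<?php
// Sample data from the question.
$array = [
    1 => ['http://www.test1.com', 4, 4],
    2 => ['http://www.test2.fr/blabla.html', 2, 2],
    3 => ['http://www.stuff.com/admin/index.php', 2, 2],
    4 => ['http://www.test3.com/blabla/bla.html', 2, 2],
    5 => ['http://www.stuff.com/bla.html', 2, 2],
];

// Keep an element only if its URL does NOT contain "stuff" (case-insensitive).
// array_filter() preserves the original keys.
$result = array_filter($array, function ($element) {
    return !preg_match('/stuff/i', $element[0]);
});

print_r(array_keys($result)); // 1, 2, 4
```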

Splitting a string using regex

I would like to split a string on any character that is a space or punctuation (excluding apostrophes). The following regex works as intended.
/[^a-z']/i
Words like I'll and Didn't are accepted, which is great.
The problem is with words like 'ere and 'im. I would like to remove the leading apostrophe and get the words im and ere.
Ideally, I would like to handle this within the regex pattern itself, if possible.
Thanks in advance.
Try this:
$str = "Words like I'll and Didn't are accepted, which is great.
The problem is with words like 'ere and 'im";
print_r(preg_split("/'?[^a-z']+'?/i", $str));
//Array ( [0] => Words [1] => like [2] => I'll [3] => and [4] => Didn't ...
// [16] => ere [17] => and [18] => im )
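One thing to watch with this pattern: if the string ends in punctuation, preg_split() returns an empty last element. A sketch on a made-up sentence, using the PREG_SPLIT_NO_EMPTY flag (an addition, not part of the answer above) to drop it:

```php
<?php
$str = "Didn't 'im say 'ello?";
$pattern = "/'?[^a-z']+'?/i";

// Without flags, the trailing "?" leaves an empty string at the end.
$plain = preg_split($pattern, $str);
// Limit -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops empty pieces.
$clean = preg_split($pattern, $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($plain); // Didn't, im, say, ello, ""
print_r($clean); // Didn't, im, say, ello
```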

Simple regex for comma

Hello, can someone help me with this regex, please?
Here is my $lang_file:
define(words_picture,"Снимка");
define(words_amount,"бр.");
define(words_name,"Име");
define(words_price_piece,"Ед. цена");
define(words_total,"Обща цена");
define(words_del,"Изтрий");
define(words_delivery,"Доставка,но няма");
This is my code:
$fh = fopen($lang_file, 'r');
$data = str_replace($rep,"",fread($fh, filesize($lang_file)));
fclose($fh);
preg_match_all('/define\((.*?)\)/i', $data,$defines,PREG_PATTERN_ORDER);
When I print $defines, I get this:
[0] => words_picture,"Снимка"
[1] => words_amount,"бр."
[2] => words_name,"Име"
[3] => words_price_piece,"Ед. цена"
[4] => words_total,"Обща цена"
[5] => words_del,"Изтрий"
[6] => words_delivery,"Доставка" // here is the part that is missing, and I need it :-)
So when there is a comma inside the string, it breaks the string there and doesn't return the correct value.
Try (koko.*?) as the match. That'll return koko for koko,goko. If you want it to return koko,goko, remove the ?. Make it (koko.*). That will return koko,goko for koko,goko.
Here's a site that I use to test my regex against a number of cases:
http://www.cyber-reality.com/regexy.html
Based on your edit, I'd say you're looking for (koko.*). If your code worked for everything else, use this:
preg_match_all('/define\((.*)\)/i', $data, $defines, PREG_PATTERN_ORDER);
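A stricter alternative, sketched under two assumptions: every value in the language file is a double-quoted string without escaped quotes inside, and two lines of the file are inlined here instead of reading $lang_file. Capturing the name and the value separately means a comma inside the value cannot split anything:

```php
<?php
// Two lines from the question's language file, inlined for the demo.
$data = <<<'TXT'
define(words_total,"Обща цена");
define(words_delivery,"Доставка,но няма");
TXT;

// Group 1: constant name. Group 2: everything up to the closing quote,
// so commas inside the value are kept. /u treats the input as UTF-8.
preg_match_all('/define\((\w+),"([^"]*)"\)/u', $data, $m, PREG_SET_ORDER);

$defines = [];
foreach ($m as $match) {
    $defines[$match[1]] = $match[2];
}
print_r($defines);
```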

Unexpected result with very simple regexp

I am fairly new to regular expressions and have run into one that gives an unexpected result when I try to match the parts of a name of the form firstname-firstname firstname:
preg_match_all('/([^- ])*/i', 'aNNA-äöå Åsa', $result);
gives a print_r($result) that looks like this:
Array
(
[0] => Array
(
[0] => aNNA
[1] =>
[2] => äöå
[3] =>
[4] => Åsa
[5] =>
)
[1] => Array
(
[0] => A
[1] =>
[2] => å
[3] =>
[4] => a
[5] =>
)
)
Now $result[0] has the items I want and expect, but where on earth does $result[1] come from? I can see it holds the word endings, but how come they are matched?
And as a little side question: how do I prevent the empty matches ($result[0][1], $result[0][3], ...)? Or, even better, why do they show up at all? The hyphen and the space match neither "not hyphen" nor "not space".
Have a try with:
preg_match_all('/([^- ]+)/', 'aNNA-äöå Åsa', $result);
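Your regex:
/([^- ])*/i
means: find one character that is neither - nor a space, capture it in a group, and repeat that 0 or more times. Because the quantifier sits outside the group, each repetition overwrites the capture, so the group only keeps the last character of each run; that is where the word endings in $result[1] come from.
This one:
/([^- ]+)/
means: find one or more characters that are neither - nor a space, and capture them all in a group.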
Moreover, there's no need for the case-insensitive /i flag.
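A sketch contrasting the two placements of the quantifier on the name from the question. It assumes the PHP file is saved as UTF-8 and adds the /u flag (not used in the question) so that ä, ö, å and Å are handled as single characters rather than as byte sequences:

```php
<?php
$name = 'aNNA-äöå Åsa';

// Quantifier OUTSIDE the group: each repetition overwrites group 1,
// so only the last character of every run survives in $m[1].
preg_match_all('/([^- ])+/u', $name, $m);

// Quantifier INSIDE the group: group 1 captures the whole run.
preg_match_all('/([^- ]+)/u', $name, $n);

print_r($m[1]); // A, å, a
print_r($n[1]); // aNNA, äöå, Åsa
```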
The * means "0 or more of the preceding." At the position of the "-", the pattern can still succeed by matching zero characters, so it produces an empty match there (and likewise at the space and at the end of the string). Since the group does not capture anything in such a match, you get an empty entry. The expression giving you the expected behavior would be:
preg_match_all('/([^- ])+/i', 'aNNA-äöå Åsa', $result);
("+" means "1 or more of the preceding.")
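A small sketch of where the empty matches come from, on a made-up ASCII subject so character encoding does not get in the way:

```php
<?php
// "*" allows a zero-length match, so at the "-", at the space and at the
// end of the subject the pattern succeeds without consuming anything.
preg_match_all('/[^- ]*/', 'ab-c d', $star);
// "+" requires at least one character, so the zero-length matches disappear.
preg_match_all('/[^- ]+/', 'ab-c d', $plus);

print_r($star[0]); // ab, "", c, "", d, ""
print_r($plus[0]); // ab, c, d
```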
http://php.net/manual/en/function.preg-match-all.php says:
Orders results so that $matches[0] is an array of full pattern matches, $matches[1] is an array of strings matched by the first parenthesized subpattern, and so on.
Check the URL for more details.

I need help with this regex pattern

Hi, I have a problem with my regex pattern:
preg_match_all('/!!\d{3}/', '!!333!!333 !!333 test', $result);
I want this to match !!333 but not !!333!!333. How can I modify this regex so that it only matches a maximum of 5 characters: two ! followed by three digits?
/^!!\d{3}$/
You need the anchors: ^ matches the beginning of the string and $ the end. It's like saying: "It must begin at the start of the string and it must end at the end of it." If you omit one (or both), the pattern allows arbitrary characters at the beginning and/or the end.
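A minimal check of the anchored pattern against a few whole strings (the subjects are illustrative):

```php
<?php
$pattern = '/^!!\d{3}$/';

// Anchored: the whole subject must be exactly two "!" and three digits.
$exact  = preg_match($pattern, '!!333');      // 1
$double = preg_match($pattern, '!!333!!333'); // 0: extra characters after the digits
$long   = preg_match($pattern, '!!3333');     // 0: four digits
var_dump($exact, $double, $long);
```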
Update
As I found out in the comments, the question was misleading. I now suggest splitting the string before applying the pattern:
$string = '!!333!!333 !!333 test';
$result = array();
foreach (explode(' ', $string) as $index => $item) {
if (preg_match('/^!!\d{3}$/', $item)) {
$result[$index] = $item;
}
}
This also preserves the index of each item. If you don't need it, remove the $index part or just ignore it ;)
It's much easier than trying to find a pattern that fulfills your request all at once.
^!!\d{3}$
You need to anchor your pattern.
If you want to match a string with !!333 in it, you may want something like:
(^|\s)!!\d{3}($|\s)
With that further explanation from you, we can refine it:
(^|\s)!!\d{3}(?=$|\s)
This does not consume the trailing whitespace, so several matches on the same line can follow one another.
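A sketch comparing the consuming and lookahead versions on a made-up subject of three space-separated tokens:

```php
<?php
$subject = '!!333 !!333 !!333';

// Trailing ($|\s) consumes the space, so the next token loses its (^|\s).
$consuming = preg_match_all('/(^|\s)!!\d{3}($|\s)/', $subject, $m1);
// Lookahead (?=$|\s) only checks for the space without consuming it.
$lookahead = preg_match_all('/(^|\s)!!\d{3}(?=$|\s)/', $subject, $m2);

echo $consuming, "\n"; // 2
echo $lookahead, "\n"; // 3
```

The consuming version skips the middle token because its leading space was already used by the previous match. Note that both versions still include a leading space in the full match when one is present.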
I find the easiest and most descriptive way to do this is with negative lookaheads and lookbehinds.
See:
preg_match_all('/(?<![^\s])!!\d{3}(?![^\s])/', '!!333 !!333!!333 !!333 test !!333', $result);
This says: match anything of the form !![0-9][0-9][0-9] that is not immediately preceded or followed by a non-whitespace character, so only whitespace (or the start or end of the string) may surround it. Note that these lookaheads/lookbehinds are not part of the match; they are "zero-width assertions", so you only get "!!333" etc. in your match, not " !!333" etc.
It returns:
Array
(
[0] => Array
(
[0] => !!333
[1] => !!333
[2] => !!333
)
)
Also
preg_match_all(
'/(?<![^\s])!!\d{3}(?![^\s])/',
'!!333 !!555 !!333 !!123 !!555 !!456 !!333 !!333 !!444 !!444 !!123 !!123 !!123!!123',
$result);
returns
[0] => Array
(
[0] => !!333
[1] => !!555
[2] => !!333
[3] => !!123
[4] => !!555
[5] => !!456
[6] => !!333
[7] => !!333
[8] => !!444
[9] => !!444
[10] => !!123
[11] => !!123
)
That is, all but the last two, which are joined into one token that is too long.
See Lookahead tutorial.
