Capturing a pattern of unknown repetition in PCRE - php

This may be a quick question for experienced regular expressionists, but I'm having trouble getting my match to execute correctly.
Suppose I had a string that looked like this:
http://aaa-bbbb-cc-ddddd-eee-.sub.dom
I would like to capture all of the "aaa", "bbbb", "cc", and "ddddd" substrings, but I'm not sure how many there will be (e.g., having all triplets up through "zzz").
This is the regular expression I'm trying to use right now:
/http:\/\/(\w*?\-)+\.sub\.dom/
I wrote it this way because:
I want to match substrings, but I want each to terminate when a - is parsed
I want to capture one or more of these substrings
But it seems to save only the last match that it makes (in the above case, it would only capture "eee-").
Is there a good way to capture all of the matched substrings?
More information: I'm using PHP's PCRE function preg_replace_callback. Thanks!

No, it is not possible to match an unknown number of capture groups.
If you try to repeat a capture group, it will always contain the last value captured.
Could you explain a bit more broadly what you're trying to do? Perhaps there is another simple way to do it (possibly without regular expressions).

If you want the items in the subdomain, and then all matches between the dashes... This should work:
$string = "http://aaa-bbbb-cc-ddddd-eee-.sub.dom";
preg_match("/^http:\/\/([\w-]+?)\..*$/i", $string, $match);
$parts = explode('-', $match[1]);
print_r($parts);
Short of that you will probably have to build a small parsing script to parse the string yourself if that doesn't do it for you.
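One detail about the snippet above: the captured text ends with a dash, so explode() returns an empty string as its last element. A minimal sketch that drops the empty piece, assuming the segments are always separated by single dashes and sit before the first dot (the function name dash_segments is made up for illustration):

```php
<?php
// Hypothetical helper: returns the dash-separated pieces that sit between
// "http://" and the first dot of the host name.
function dash_segments(string $url): array
{
    // [\w-]+ cannot cross a dot, so the capture stops at the first "."
    if (!preg_match('~^http://([\w-]+)\.~', $url, $m)) {
        return [];
    }
    // PREG_SPLIT_NO_EMPTY discards the empty piece after the trailing dash
    return preg_split('/-/', $m[1], -1, PREG_SPLIT_NO_EMPTY);
}

print_r(dash_segments('http://aaa-bbbb-cc-ddddd-eee-.sub.dom'));
```

This keeps the regex responsible only for locating the block, and leaves the variable-length part to the split.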

Related

Correct regex for this pattern

I've got some issues understanding this regex.
I tried writing a pattern, but it does not work as intended.
What I want is [A-Za-z]{2,3}[0-9]{2,30}
That is 2-3 letters in the beginning and 2-30 numbers after that
FA1321321
BFA18098097
I want to use it to validate an input field but can't figure out what the regex should look like.
Can anyone who can help me out also explain a bit about it?
Your regex is correct - just make sure to surround it with / in PHP, and perhaps ^, $ if you want it to strictly match the entire string (no extra characters before/after).
$pattern = '/^[A-Za-z]{2,3}[0-9]{2,30}$/';
$found = preg_match($pattern, $your_str);
From the PHP documentation:
preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
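Because preg_match() can return 1, 0, or FALSE, a strict comparison against 1 is the safest check when validating. A small sketch, assuming a helper called is_valid_code (a name invented here); it also swaps $ for \z, since $ on its own still accepts a single trailing newline:

```php
<?php
// Hypothetical helper: true only for 2-3 letters followed by 2-30 digits.
function is_valid_code(string $input): bool
{
    // \z anchors at the very end of the string; "=== 1" treats both
    // "no match" (0) and "error" (false) as invalid input.
    return preg_match('/^[A-Za-z]{2,3}[0-9]{2,30}\z/', $input) === 1;
}

var_dump(is_valid_code('FA1321321'));   // bool(true)
var_dump(is_valid_code('BFA18098097')); // bool(true)
var_dump(is_valid_code('ABCD12'));      // bool(false): four leading letters
```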

regular expressions checking two strings

Hi, I wonder if anyone can help. I'm trying to check for an occurrence of one of two possible strings using regex, but my knowledge of regex is very limited, so I'm not having much success.
I'm trying to look for 'Email' and 'eMailConfirm'; this is what I have so far, and it is working for Email.
The subject is the id of an input field, so it could be 'name', 'Email', or 'eMailConfirm'.
$subject = $getPromoOuter['label'];
$pattern = '/^Email/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 0);
I tried a number of potential expressions to try and incorporate the second string but I can't seem to get it to play (plus a few guesswork ones based on others)
Any idea how I can combine those two strings and check for an occurrence of either?
Thanks for looking
I'll just place an answer here, as I do think I have a good idea what your requirement is.
Your current regex is /^Email/ which matches any string which starts with 'Email'. (whether or not it has to start with it is unclear to me).
In case you need to match either Email or eMailConfirm, not at the start of the string, you should go for
/Email|eMailConfirm/
If the matches do need to be at the front of the string, just prepend both with a '^' character: /^Email|^eMailConfirm/
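The anchored alternation can also be written with a single ^ and a non-capturing group, which some people find easier to extend. A rough sketch, assuming the ids from the question and a made-up helper name is_email_field; the match stays case-sensitive, as in the patterns above:

```php
<?php
// Hypothetical helper: does the field id start with one of the two names?
function is_email_field(string $id): bool
{
    // (?:...) groups the alternatives so that ^ applies to both of them
    return preg_match('/^(?:Email|eMailConfirm)/', $id) === 1;
}

foreach (['name', 'Email', 'eMailConfirm'] as $id) {
    echo $id, ': ', is_email_field($id) ? 'match' : 'no match', PHP_EOL;
}
```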

PHP regular expression : match the closest one

I have a string like this
<div><span style="">toto</span> some character <span>toto2</span></div>
My regex:
/(<span .*>)(.*)(<\/span>)/
I used preg_match and it returns the entire string
<span style="">toto</span> some character <span>toto2</span>
I want it to return:
<span style="">toto</span>
and
<span>toto2</span>
What do I need to do to achieve this? Thanks.
How about this:
/(<span[^>]*>)(.*?)(<\/span>)/
Check the docs here at PHP preg_match Repetition:
By default, the quantifiers are "greedy", that is, they match as much as possible
and
However, if a quantifier is followed by a question mark, then it becomes lazy, and instead matches the minimum number of times possible
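To see the difference on the string from the question, here is a short sketch that runs a greedy and a lazy version side by side. It uses preg_match_all for the lazy pattern so that both spans come back; the variable names are only illustrative:

```php
<?php
$html = '<div><span style="">toto</span> some character <span>toto2</span></div>';

// Greedy: .* runs to the end and backtracks to the LAST </span>
preg_match('/<span[^>]*>.*<\/span>/', $html, $greedy);

// Lazy: .*? stops at the FIRST </span>; preg_match_all collects every span
preg_match_all('/<span[^>]*>.*?<\/span>/', $html, $lazy);

echo $greedy[0], PHP_EOL;
print_r($lazy[0]);
```

The greedy call yields one long match covering both spans, while the lazy call yields the two spans separately.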
Even though I guess all previous answers are correct, I just want to add that since you only want to capture the whole expressions (i.e. from <span> to </span>), you don't have to capture everything inside the regexp with ().
The following does what you expect without capturing additional expressions
/(<span\w*[^>]*>[^<]*<\/span>)/
(tested on http://rubular.com/)
EDIT : of course there might be some differences between PHP and ruby regexp implementations, but the idea is the same :)

Regex matching optional section

So I have two possible strings here for example.
/user/name
and
/user/name?redirect=1
I'm trying to figure out the proper regex to match either with a result of:
Array ([0] => /user/name [1] => user [2] => name)
I think the part I'm having an issue with is that the question mark and the GET query after it are optional and will only be there some of the time. I've tried many different things and can't seem to come up with a regex to match the strings whether the query part is there or not.
Don't use a regex. Use parse_url() and explode():
$result = parse_url("/here/is/a/path?query=string");
$pieces = explode("/", $result['path']);
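Note that explode() on a path with a leading slash gives an empty first element. A possible way to get just the segments, sketched with a made-up helper name path_segments and the two URLs from the question:

```php
<?php
// Hypothetical helper: returns the non-empty path segments of a URL.
function path_segments(string $url): array
{
    // PHP_URL_PATH makes parse_url return only the path (query is dropped)
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }
    // Drop the empty piece produced by the leading "/" and reindex from 0
    return array_values(array_filter(explode('/', $path), fn($s) => $s !== ''));
}

print_r(path_segments('/user/name'));
print_r(path_segments('/user/name?redirect=1'));
```

Both calls give the same two segments, with or without the query string.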
? is the "zero-or-one" quantifier. So you could append (\?.*)? to your regex, which will optionally match zero or one instances of a literal question-mark followed by any number of characters.
In regex you can mark something as optional using the ? quantifier. So for instance, the regex n?ever matches ever and never.
In your case, you might want something like /([A-Za-z0-9]+)/([A-Za-z0-9]+)(\?redirect=1)?
This will match /.../... (given the "..." consist of letters and numbers) or /.../...?redirect=1
If there are more possible flags that could come after the question mark than simply redirect=1, try the more general:
/([A-Za-z0-9]+)/([A-Za-z0-9]+)(\?[A-Za-z0-9]+=[A-Za-z0-9]+)?(&[A-Za-z0-9]+=[A-Za-z0-9]+)*
preg_match('{^/(user)/(name)(?=\?redirect=1$|$)}', $subject, $matches);
This is a look ahead assertion. It won't be included in the match itself.
But like the other answers suggest you shouldn't use regex to parse URLs. Just posting the actual answer to the specific question for completeness.

preg_match returning weird results

I am searching a string for urls...and my preg_match is giving me an incorrect amount of matches for my demo string.
String:
Hey there, come check out my site at www.example.com
Function:
preg_match("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#ise", $string, $links);
echo count($links);
The result comes out as 3.
Can anybody help me solve this? I'm new to REGEX.
$links is the array of sub matches:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
The matches of the two groups plus the match of the full regular expression results in three array items.
Maybe you rather want all matches using preg_match_all.
If you use preg_match_all (as Gumbo suggested), please note that if you run your regex against this string, it will match both the value of your anchor's "href" attribute and the link text, which in this case happens to contain a URL. This makes TWO matches.
It would be wise to run an array_unique on your resultset :)
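A small sketch of both points follows. The sample string with an anchor tag is an assumption (the question's markup is not shown above), and the URL pattern is deliberately simplified rather than the one from the question:

```php
<?php
// Assumed input: a link whose text repeats its href value.
$string = 'Visit <a href="http://www.example.com">http://www.example.com</a>';

// preg_match: the array holds the full match plus one entry per group
preg_match('~(https?)://([^\s"<>]+)~', $string, $m);
echo count($m), PHP_EOL;          // 3: full match, scheme, rest

// preg_match_all: $all[0] lists every full match in the subject
preg_match_all('~https?://[^\s"<>]+~', $string, $all);
echo count($all[0]), PHP_EOL;     // 2: href value and link text

// array_unique collapses the duplicate URL
$unique = array_values(array_unique($all[0]));
print_r($unique);
```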
In addition to the advice on how to use preg_match, I believe there is something seriously wrong with the regular expression you are using. You may want to try something like this instead:
preg_match("_([a-zA-Z]+://)?([0-9a-zA-Z$-\_.+!*'(),]+\.)?([0-9a-zA-Z]+)+\.([a-zA-Z]+)_", $string, $links);
This should handle most cases (although it wouldn't work if there was a query string after the top-level domain). In the future, when writing regular expressions, I recommend the following web-sites to help: http://www.regular-expressions.info/ and especially http://regexpal.com/ for testing them as you're writing them.
