PHP Regex, get multiple value with preg_match - php

I have this text string:
$text="::tower_unit::7::/tower_unit::<br/>::tower_unit::8::/tower_unit::<br/>::tower_unit::9::/tower_unit::";
Now I want to get the values 7, 8, and 9.
How do I do that with preg_match_all?
I've tried this:
$pattern="/::tower_unit::(.*)::\/tower_unit::/i";
preg_match($pattern,$text,$matches);
print_r($matches);
but the result is still wrong...

A literal slash has to be escaped whenever / is also the pattern delimiter. Since your pattern includes slashes, it's easier to use a different regex delimiter, as suggested in the comments:
$pattern="#::tower_unit::(\d+)::/tower_unit::#";
preg_match_all($pattern,$text,$matches);
I also converted (.*) to (\d+), which is better if the token you're looking for will always be a number. Plus, you might want to drop the i modifier if the text is always lowercase.
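As a quick check, the snippet below runs that pattern against the string from the question. It is only a sketch; the $values variable name is introduced here for illustration and is not part of the original code.

```php
<?php
// Sample input copied from the question.
$text = '::tower_unit::7::/tower_unit::<br/>::tower_unit::8::/tower_unit::<br/>::tower_unit::9::/tower_unit::';

// With "#" as the delimiter, the slash in "/tower_unit" needs no escaping.
$pattern = '#::tower_unit::(\d+)::/tower_unit::#';

// preg_match_all() collects every match; $matches[1] holds the first capture group.
preg_match_all($pattern, $text, $matches);
$values = $matches[1]; // illustrative name, not from the question

print_r($values); // the captured strings "7", "8" and "9"
```

Note that the captured values are strings; cast them with (int) if numbers are needed.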

Your regex is "greedy".
Use the following one
$pattern="#::tower_unit::(.*?)::/tower_unit::#i";
or
$pattern="#::tower_unit::(.*)::/tower_unit::#iU";
You can also use \d+ instead of .*? or .* if you wish.
Also, the function should be preg_match_all, because preg_match stops after the first match.
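To see what "greedy" means here, the following sketch compares the three variants on the string from the question. The variable names $greedy, $lazy and $ungreedy are made up for this illustration.

```php
<?php
// Sample input copied from the question.
$text = '::tower_unit::7::/tower_unit::<br/>::tower_unit::8::/tower_unit::<br/>::tower_unit::9::/tower_unit::';

// Greedy: (.*) runs to the LAST "::/tower_unit::", so the whole string is one match.
preg_match_all('#::tower_unit::(.*)::/tower_unit::#i', $text, $greedy);

// Lazy: (.*?) stops at the FIRST "::/tower_unit::" it can reach.
preg_match_all('#::tower_unit::(.*?)::/tower_unit::#i', $text, $lazy);

// The U modifier inverts greediness, so a plain (.*) behaves lazily here.
preg_match_all('#::tower_unit::(.*)::/tower_unit::#iU', $text, $ungreedy);

echo count($greedy[1]), "\n";   // a single, over-long match
print_r($lazy[1]);              // "7", "8", "9"
print_r($ungreedy[1]);          // same as the lazy variant
```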

Related

greedy character matching at end of string

I'm trying to match the following string:
controller1/action1/something
With the following regex:
(?P<controller>[[:alnum:]]+)/(?P<action>[[:alnum:]]+)/(.*)
For some reason it doesn't find the last part of the string: something. But it works when I change the * to + at the end of the regex:
(?P<controller>[[:alnum:]]+)/(?P<action>[[:alnum:]]+)/(.+)
With that regex it does find the something string. But I want to use .* (or .*?) because I want this regex to succeed also when there is nothing at the end.
So it should also succeed when the string is: controller1/action1/
So why doesn't it work with (.*) or (.*?) but works with .+? The difference should simply be that the first says "zero or more characters" and the last "one or more". I simply want to check for "zero or more".
PS. I don't want to use ^ and $ to denote the beginning and end of the string because of a more complex problem. Simply stated, this pattern doesn't always occur at the end of the string.
You wrote: "So it should also succeed when the string is: controller1/action1/"
I suspect this input is part of some bigger string, and that's why .* isn't working for you. I suggest you post some real examples of your input text.
Meanwhile can you try this regex:
"#(?P<controller>[^/]+)/(?P<action>[^/]+)/([^/]*)#"
You just have to make the last group optional to make it match controller1/action1/
(?P<controller>[[:alnum:]]+)/(?P<action>[[:alnum:]]+)/(.+)?
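As a rough illustration, the sketch below applies the [^/]* pattern suggested above to both inputs from the question. The $results array and the loop are assumptions added for the demonstration only.

```php
<?php
// Pattern from the answer above: the last group may be empty.
$pattern = '#(?P<controller>[^/]+)/(?P<action>[^/]+)/([^/]*)#';

$results = []; // illustrative container, not part of the original code
foreach (['controller1/action1/something', 'controller1/action1/'] as $input) {
    if (preg_match($pattern, $input, $m)) {
        // Named groups are also numbered, so the unnamed third group is $m[3].
        $results[$input] = [$m['controller'], $m['action'], $m[3]];
    }
}

print_r($results);
```

With the (.+)? variant, the third group does not take part in the match for controller1/action1/, and preg_match drops trailing unmatched groups, so something like $m[3] ?? '' would be needed when reading it.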

PHP regex lookbehind with wildcard

I have two strings in PHP:
$string = '<a href="http://localhost/image1.jpeg" /></a>';
and
$string2 = '[caption id="attachment_5" align="alignnone" width="483"]<a href="http://localhost/image1.jpeg" /></a>[/caption]';
I'm trying to match strings of the first type. That is strings that are not surrounded by '[caption ... ]' and '[/caption]'. So far, I would like to use something like this:
$pattern = '/(?<!\[caption.*\])(?!\[\/caption\])(<a.*><img.*><\/a>)/';
but PHP matches out the first string as well with this pattern even though it is NOT preceded by '[caption' and zero or more characters followed by ']'. What gives? Why is this and what's the correct pattern?
Thanks.
Variable length look-behind is not supported in PHP, so this part of your pattern is not valid:
(?<!\[caption.*\])
It should be warning you about this.
In addition, .* always matches the largest possible amount. Thus your pattern may result in a match that overlaps multiple tags. Instead, use [^>]* (match any run of characters that are not a closing bracket), because closing brackets should not occur inside the img tag.
To solve the look-behind problem, why not just check for the closing tag only? This should be sufficient (assuming the caption tags are only used in a way similar to what you have shown).
$pattern = '|(<a[^>]*><img[^>]*></a>)(?!\[/caption\])|';
When matching patterns that contain /, use another character as the pattern delimiter to avoid leaning toothpick syndrome. You can use nearly any non-alphanumeric character around the pattern.
Update: the previous regex is based on the example regex you gave, rather than the example data. If you want to match links that don't contain images, do this:
$pattern = '|(<a[^>]*>[^<]*</a>)(?!\[/caption\])|';
Note that this doesn't allow any tags in the middle of the link. If you allow tags (such as by using .*?), a regex could match something starting within the [caption] and ending elsewhere.
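A small check of the second pattern against the two strings from the question might look like the sketch below. The variable names $plain and $captioned are hypothetical.

```php
<?php
// The two sample strings from the question.
$string  = '<a href="http://localhost/image1.jpeg" /></a>';
$string2 = '[caption id="attachment_5" align="alignnone" width="483"]<a href="http://localhost/image1.jpeg" /></a>[/caption]';

// "|" is the delimiter, so the slashes in </a> and [/caption] need no escaping.
$pattern = '|(<a[^>]*>[^<]*</a>)(?!\[/caption\])|';

// preg_match() returns 1 on a match and 0 when nothing matches.
$plain     = preg_match($pattern, $string);   // link without a caption
$captioned = preg_match($pattern, $string2);  // link followed by [/caption]

var_dump($plain, $captioned);
```

The negative lookahead rejects the second string because the closing </a> is immediately followed by [/caption].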
I don't see how your regexp could match either string, since you're looking for <a.*><img.*><\/a>, and both anchors don't contain an <img... tag. Also, the two subexpressions looking for and prohibiting the caption-bits look oddly positioned to me. Finally, you need to ensure your tag-matching bits don't act greedy, i.e. don't use .* but [^>]*.
Do you mean something like this?
$pattern = '/(<a[^>]*>(<img[^>]*>)?<\/a>)(?!\[\/caption\])/';
Test it on regex101.
Edit: Removed useless lookahead as per dan1111's suggestion and updated regex101 link.
Lookbehind doesn't allow patterns of non-fixed length, i.e. ones using the quantifiers *, + or ?. I think /<a.*><\/a>(?!\[\/caption\])/ is enough for your requirement.

How to match 2nd instance in regex

get_by_my_column
If I only want to match the get_by portion of the above string, how can I do this? I keep reading on this regex cheatsheet that I should use \n but I can't figure out how to implement it properly...
I've tried variations of the following...
/((_){2})/
/(_+){2}/
Use /(\w+?_\w+?)_\w+/ with non-greedy quantifiers; your substring should be in capture group 1.
Just /\w+?_\w+?/ won't work (edit): you do need that second underscore in the pattern to force the non-greedy \w+? to extend up to it.
Do you need to use a regex for this? You could use explode() and just grab the first two elements of the resulting array.
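The explode() idea could be sketched as follows, next to the non-greedy regex from the earlier answer for comparison. The names $parts and $prefix are made up for this example.

```php
<?php
$string = 'get_by_my_column';

// Split on underscores, keep the first two pieces, and glue them back together.
$parts  = explode('_', $string);                    // ['get', 'by', 'my', 'column']
$prefix = implode('_', array_slice($parts, 0, 2));  // first two elements only

// Regex alternative: capture group 1 holds the part before the second underscore.
preg_match('/(\w+?_\w+?)_\w+/', $string, $m);

echo $prefix, "\n"; // get_by
echo $m[1], "\n";   // get_by
```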
Try
preg_match('/(^[a-z]+[_][a-z]+)/', $string, $results);
This matches a string that starts with a group of letters followed by an underscore followed by another set of letters.
Edit: (lowercase letters)
Try /^get_by/. The ^ adds the condition that g must be the first character of the string.

Replacing multiple slashes with exception in regex

There are quite a few questions on removing multiple slashes using regex in PHP. However, I have a special case I would like to exclude.
I have a full URL as my input: http://localhost/path/to/whatever
I have written a regex to convert backslashes to forward slashes, and then remove multiple consecutive slashes:
$cleaned = preg_replace('/(\\\+)|(\/+)/', "/", trim($input));
This works fine for the most part. However, I need to be able to exclude the :// case; otherwise the expression produces the following, which is not the intended result:
http:/localhost/path/to/whatever
I have tried using /(\\\+)|^[:](\/+)/, but this doesn't seem to work.
How can I exclude the :// case in my expression?
$cleaned = preg_replace('~(?<!https:|http:)[/\\\\]+~', "/", trim($input));
The subexpression inside the lookbehind can't use quantifiers, so the obvious approach - (?<!https?:) - won't work. But it can be made up of two or more fixed-length alternatives with different lengths. For example:
(?<!https:|http:) # OK
Be aware that the alternation has to be at the top level of the lookbehind, so this won't work:
(?<!(https:|http:)) # error
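To try this out, the sketch below wraps the replacement in a small helper. The cleanUrl() function name and the messy sample URL are assumptions made up for the example.

```php
<?php
// Hypothetical helper wrapping the preg_replace() call from the answer above.
function cleanUrl(string $input): string
{
    // Collapse any run of slashes or backslashes into one "/",
    // except a run that starts right after "http:" or "https:".
    return preg_replace('~(?<!https:|http:)[/\\\\]+~', '/', trim($input));
}

// Made-up input with doubled slashes and a backslash.
$cleaned = cleanUrl('http://localhost//path\\to///whatever');

echo $cleaned, "\n"; // http://localhost/path/to/whatever
```

The first slash after the scheme is skipped by the lookbehind, and the second one is matched on its own and replaced by a single slash, which is why :// survives.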
There is something called a "negative lookbehind" (lookbehinds and lookaheads both come in positive and negative forms).
http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html
With this you could add an exception with something like
(?<!^https:|^http:)
Then your expression will only match in places NOT preceded by "http:" or "https:" at the start of the string.
Simply a negative look-behind for a colon, preceding two or more forward or backward slashes:
$cleaned = preg_replace('/(?<!:)(?:\\/|\\\\){2,}/', "/", trim($input));

Regular Expression (preg_match) - id and random characters

I want to find URL like following with preg_match.
http://www.website.com/THE_ID_WHICH_I_WANT/RANDOM_CHARACTERS_AND_NUMBERS.RANDOM_SOMETHING.html
This is how far I got:
preg_match_all('%http://www.website\.com\/(\w+)%', $string, $matches);
But I also want it to capture the random characters.
Thank you.
For matching anything it's customary to use .+ or the non-greedy .*?
You might want to use \S+ which matches anything that isn't a space character. And even then it might be too much. But you didn't really elaborate about the context in which you want to use it.
preg_match_all('%http://www\.website\.com/(\w+)/(.*)\.html%', $string, $matches);
The above is assuming that you want to separate "THE_ID_WHICH_I_WANT" from the other random characters.
Example: http://regexr.com?2v9t7
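For illustration, here is how the two capture groups come out for a made-up URL of the shape described in the question. The sample text and the values abc123 and x9Yz_77.foo are hypothetical.

```php
<?php
// Hypothetical input text containing one URL of the described shape.
$string = 'Visit http://www.website.com/abc123/x9Yz_77.foo.html today';

// "%" is the delimiter; the dots in the host name are escaped so they match literally.
$pattern = '%http://www\.website\.com/(\w+)/(.*)\.html%';

preg_match_all($pattern, $string, $matches);

echo $matches[1][0], "\n"; // abc123       (the ID)
echo $matches[2][0], "\n"; // x9Yz_77.foo  (everything up to the final .html)
```

Because (.*) is greedy, a line containing several such URLs would be swallowed as one match; in that case (.*?) or (\S+?) would be the safer choice.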
