PHP Regex, get multiple value with preg_match

I have this text string:
$text="::tower_unit::7::/tower_unit::<br/>::tower_unit::8::/tower_unit::<br/>::tower_unit::9::/tower_unit::";
Now I want to get the values 7, 8, and 9.
How can I do that with preg_match_all?
I've tried this:
$pattern="/::tower_unit::(.*)::\/tower_unit::/i";
preg_match($pattern,$text,$matches);
print_r($matches);
but the result is still wrong...

When / is the regex delimiter, every slash inside the pattern has to be escaped. Since your pattern includes slashes, it's easier to use a different regex delimiter, as suggested in the comments:
$pattern="#::tower_unit::(\d+)::/tower_unit::#";
preg_match_all($pattern,$text,$matches);
I also converted (.*) to (\d+), which is better if the token you're looking for will always be a number. Plus, you might want to drop the i modifier if the text is always lowercase. Note that you need preg_match_all rather than preg_match to get all three values.

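Your regex is "greedy": (.*) matches as much as possible, so it runs from the first ::tower_unit:: to the last ::/tower_unit::.
Use a lazy quantifier instead:
$pattern="#::tower_unit::(.*?)::/tower_unit::#i";
or invert the greediness with the U modifier:
$pattern="#::tower_unit::(.*)::/tower_unit::#iU";
If you wish, you can use \d+ instead of .*? or .*
Also, the function should be preg_match_all, not preg_match.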
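To make the difference concrete, here is a minimal sketch that runs the greedy and the lazy pattern against the string from the question. The variable names ($greedy, $count, $values) are only for illustration.

```php
<?php
$text = "::tower_unit::7::/tower_unit::<br/>::tower_unit::8::/tower_unit::<br/>::tower_unit::9::/tower_unit::";

// Greedy (.*) with preg_match: a single match that spans from the first
// opening marker to the last closing marker.
preg_match('#::tower_unit::(.*)::/tower_unit::#', $text, $greedy);

// Lazy (.*?) with preg_match_all: one match per unit.
$count = preg_match_all('#::tower_unit::(.*?)::/tower_unit::#', $text, $matches);

// With the default PREG_PATTERN_ORDER, $matches[1] holds every capture of group 1.
$values = $matches[1];
print_r($values); // 7, 8 and 9 as strings
```

Swapping (.*?) for (\d+) should give the same $values here, since the captured tokens are all digits.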

Related

greedy character matching at end of string

I'm trying to match the following string:
controller1/action1/something
With the following regex:
(?P<controller>[[:alnum:]]+)/(?P<action>[[:alnum:]]+)/(.*)
For some reason it doesn't find the last part of the string: something. But it works when I change the * to + at the end of the regex:
(?P<controller>[[:alnum:]]+)/(?P<action>[[:alnum:]]+)/(.+)
With that regex it does find the something string. But I want to use .* (or .*?) because I want this regex to succeed also when it doesn't have something at the end.
So it should also succeed when the string is: controller1/action1/
So why does (.+) work when (.*) and (.*?) don't? The difference should simply be that the first says "one or more characters" and the others "zero or more". I simply want to check for "zero or more".
PS. I don't want to use ^ and $ to denote the beginning and end of the string due to a more complex problem. Simply stated, this pattern doesn't always occur at the end of a string.

So it should also succeed when the string is: controller1/action1/
I suspect .* isn't working for you because this input is part of some bigger string. I suggest you post some real examples of your input text.
Meanwhile can you try this regex:
"#(?P<controller>[^/]+)/(?P<action>[^/]+)/([^/]*)#"

You just have to make the last group optional to make it match controller1/action1/
(?P<controller>[[:alnum:]]+)/(?P<action>[[:alnum:]]+)/(.+)?
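As a rough check of the [^/]* pattern suggested earlier, the sketch below runs it against both inputs from the question. It also shows one way the last group can come back empty: an unanchored lazy (.*?) at the very end of a pattern is satisfied by zero characters. Whether that is what actually happened in the question is an assumption, and the variable names are only for illustration.

```php
<?php
$input = 'controller1/action1/something';

// A lazy (.*?) with nothing after it stops as early as it can: at zero characters.
preg_match('#(?P<controller>[[:alnum:]]+)/(?P<action>[[:alnum:]]+)/(.*?)#', $input, $lazy);

// [^/]* is greedy, but it cannot run past the next slash.
$pattern = '#(?P<controller>[^/]+)/(?P<action>[^/]+)/([^/]*)#';
preg_match($pattern, $input, $full);

// The same pattern still succeeds when nothing follows the last slash.
$ok = preg_match($pattern, 'controller1/action1/', $empty);

var_dump($lazy[3], $full[3], $ok, $empty[3]);
```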

PHP regex lookbehind with wildcard

I have two strings in PHP:
$string = '<a href="http://localhost/image1.jpeg" /></a>';
and
$string2 = '[caption id="attachment_5" align="alignnone" width="483"]<a href="http://localhost/image1.jpeg" /></a>[/caption]';
I'm trying to match strings of the first type. That is strings that are not surrounded by '[caption ... ]' and '[/caption]'. So far, I would like to use something like this:
$pattern = '/(?<!\[caption.*\])(?!\[\/caption\])(<a.*><img.*><\/a>)/';
but PHP matches out the first string as well with this pattern even though it is NOT preceded by '[caption' and zero or more characters followed by ']'. What gives? Why is this and what's the correct pattern?
Thanks.

Variable length look-behind is not supported in PHP, so this part of your pattern is not valid:
(?<!\[caption.*\])
It should be warning you about this.
In addition, .* always matches the largest possible amount. Thus your pattern may result in a match that overlaps multiple tags. Instead, use [^>]* (match any run of characters that are not a closing bracket), because closing brackets should not occur inside the img tag.
To solve the look-behind problem, why not just check for the closing tag only? This should be sufficient (assuming the caption tags are only used in a way similar to what you have shown).
$pattern = '|(<a[^>]*><img[^>]*></a>)(?!\[/caption\])|';
When matching patterns that contain /, use another character as the pattern delimiter to avoid leaning toothpick syndrome. You can use nearly any non-alphanumeric character around the pattern.
Update: the previous regex is based on the example regex you gave, rather than the example data. If you want to match links that don't contain images, do this:
$pattern = '|(<a[^>]*>[^<]*</a>)(?!\[/caption\])|';
Note that this doesn't allow any tags in the middle of the link. If you allow tags (such as by using .*?), a regex could match something starting within the [caption] and ending elsewhere.
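Here is a small sketch of the lookahead-only approach, run against the two strings from the question. It uses the second pattern (links without images), since neither sample string contains an img tag. The variable names $plain and $captioned are only for illustration.

```php
<?php
$string  = '<a href="http://localhost/image1.jpeg" /></a>';
$string2 = '[caption id="attachment_5" align="alignnone" width="483"]<a href="http://localhost/image1.jpeg" /></a>[/caption]';

// | is the delimiter, so the slashes in </a> and [/caption] need no escaping.
$pattern = '|(<a[^>]*>[^<]*</a>)(?!\[/caption\])|';

// The bare link is not followed by [/caption], so it matches.
$plain = preg_match($pattern, $string, $m);

// The captioned link is followed by [/caption], so the lookahead rejects it.
$captioned = preg_match($pattern, $string2);

var_dump($plain, $captioned);
```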

I don't see how your regexp could match either string, since you're looking for <a.*><img.*><\/a>, and neither anchor contains an <img... tag. Also, the two subexpressions looking for and prohibiting the caption-bits look oddly positioned to me. Finally, you need to ensure your tag-matching bits aren't greedy, i.e. don't use .* but [^>]*.
Do you mean something like this?
$pattern = '/(<a[^>]*>(<img[^>]*>)?<\/a>)(?!\[\/caption\])/';
Test it on regex101.
Edit: Removed useless lookahead as per dan1111's suggestion and updated regex101 link.

Lookbehind doesn't allow patterns that are not fixed length, i.e. ones that use the quantifiers *, + or ?. I think /<a.*><\/a>(?!\[\/caption\])/ is enough for your requirement.

How to match 2nd instance in regex

get_by_my_column
If I only want to match the get_by portion of the above string, how can I do this? I keep reading on this regex cheatsheet that I should use \n but I can't figure out how to implement it properly...
I've tried variations of the following...
/((_){2})/
/(_+){2}/

/(\w+?_\w+?)_\w+/ (use non-greedy quantifiers; your substring will be in capture group 1)
or just /\w+?_\w+?/ (edit: this won't work; you do need that second underscore in the regex to force the non-greedy \w+? to extend up to it)

Do you need to use a regex for this? You could use explode() and just grab the first two elements of the resulting array.

Try
preg_match('/(^[a-z]+[_][a-z]+)/', $string, $results);
This matches a string that starts with a group of letters followed by an underscore followed by another set of letters.
Edit: (lowercase letters)

Try /^get_by/. The ^ requires that g is the first character of the string.
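For comparison, here is a short sketch of two of the suggestions above side by side: the non-greedy regex with capture group 1, and the explode() route. The variable names $viaRegex and $viaExplode are only for illustration.

```php
<?php
$string = 'get_by_my_column';

// Regex route: lazy \w+? on both sides of the first underscore; the second
// underscore in the pattern forces the second \w+? to extend up to it.
preg_match('/(\w+?_\w+?)_\w+/', $string, $m);
$viaRegex = $m[1];

// explode() route: split on underscores and rejoin the first two pieces.
$parts = explode('_', $string);
$viaExplode = implode('_', array_slice($parts, 0, 2));

var_dump($viaRegex, $viaExplode); // both are "get_by"
```

The explode() version avoids regex entirely, which may be easier to read if the separator is always a single underscore.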

Replacing multiple slashes with exception in regex

There are quite a few questions on removing multiple slashes using regex in PHP. However, I have a special case I would like to exclude.
I have a full URL as my input: http://localhost/path/to/whatever
I have written a regex to convert backslashes to forward slashes, and then remove multiple consecutive slashes:
$cleaned = preg_replace('/(\\\+)|(\/+)/', "/", trim($input));
This works fine for the most part. However, I need to be able to exclude the :// case, otherwise the expression produces the following, which is not the intended result:
http:/localhost/path/to/whatever
I have tried using /(\\\+)|^[:](\/+)/, but this doesn't seem to work.
How can I exclude the :// case in my expression?

$cleaned = preg_replace('~(?<!https:|http:)[/\\\\]+~', "/", trim($input));
The subexpression inside the lookbehind can't use quantifiers, so the obvious approach - (?<!https?:) - won't work. But it can be made up of two or more fixed-length alternatives with different lengths. For example:
(?<!https:|http:) # OK
Be aware that the alternation has to be at the top level of the lookbehind, so this won't work:
(?<!(https:|http:)) # error
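As a quick sanity check of the replacement above, here is a sketch with a made-up input that mixes doubled slashes and a backslash. The sample URLs are assumptions for illustration, not taken from the question.

```php
<?php
// In a single-quoted PHP string, \\ is one literal backslash.
$input = 'http://localhost//path\\to///whatever';

// Collapse every run of slashes or backslashes into one forward slash,
// except a run that starts right after "http:" or "https:".
$pattern = '~(?<!https:|http:)[/\\\\]+~';
$cleaned = preg_replace($pattern, '/', trim($input));

// The same pattern applied to an https URL.
$secure = preg_replace($pattern, '/', 'https://example.com//a');

var_dump($cleaned, $secure);
```

Note that the second slash of :// is matched and replaced with itself. Only the first slash is protected by the lookbehind, which is enough to keep the pair intact.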

There is something called a "negative lookbehind" (lookbehind also comes in a positive form, and there is lookahead as well)
http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html
With this you could add an exception with something like
(?<!^https:|^http:)
Then your expression will only match in places NOT preceded by "http:" or "https:" at the start of the string

Simply a negative look-behind for a colon, preceding two or more forward or backward slashes:
$cleaned = preg_replace('/(?<!:)(?:\\/|\\\\){2,}/', "/", trim($input));

Regular Expression (preg_match) - id and random characters

I want to find URLs like the following with preg_match.
http://www.website.com/THE_ID_WHICH_I_WANT/RANDOM_CHARACTERS_AND_NUMBERS.RANDOM_SOMETHING.html
This is how far I got:
preg_match_all('%http://www.website\.com\/(\w+)%', $string, $matches);
But I also want it to capture the random characters.
Thank you.

For matching anything it's customary to use .+ or the non-greedy .*?
You might want to use \S+ which matches anything that isn't a space character. And even then it might be too much. But you didn't really elaborate about the context in which you want to use it.

preg_match_all('%http://www\.website\.com/(\w+)/(.*)\.html%', $string, $matches);
The above is assuming that you want to separate "THE_ID_WHICH_I_WANT" from the other random characters.
Example: http://regexr.com?2v9t7
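Here is a minimal sketch of that pattern in use. The sample URL and the surrounding sentence are made up for illustration, following the shape described in the question.

```php
<?php
// Hypothetical input: one URL of the described shape inside other text.
$string = 'See http://www.website.com/12345/abc123.xyz.html for details';

preg_match_all('%http://www\.website\.com/(\w+)/(.*)\.html%', $string, $matches);

// Group 1 is the ID, group 2 is everything between the ID and ".html".
$id   = $matches[1][0];
$rest = $matches[2][0];

var_dump($id, $rest);
```

Because (.*) is greedy, several URLs on the same line would be swallowed into one match. Using \S*? in place of .* is one way to guard against that, at the cost of rejecting URLs that contain spaces.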

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
