PHP preg_match Everything Between Slashes - php

I need the help of a regex wizard, or someone who knows more about this than me (which means there's lots of candidates :)
I am trying to match everything that occurs between the first and second slash, excluding those slashes, or nothing if there's no starting and trailing slash:
$subject = '/1234-abcd/blahblah';
$pattern = '/^\/(.*)\//';
preg_match($pattern, $subject, $matches);
print_r($matches);
Here are the results:
Array
(
[0] => /1234-abcd/
[1] => 1234-abcd
)
I'm close. $matches[1] has the result I'm after, but it shows up as the first captured subpattern rather than as the first array item ($matches[0]).
How do I exclude the starting and trailing slashes in this regex pattern?
Thanks!

You can use this regex:
$pattern = '#(?<=/)[^/]+#';
Use preg_match_all instead of preg_match if you want every segment; with preg_match, $matches[0] holds only the first one (1234-abcd).
PS: Note that you can also use explode to split your input by / and avoid using regex altogether.
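A minimal sketch of these options, using the subject from the question. The helper name firstSegment is hypothetical, and the \K-based pattern is a variation on the answer's lookbehind that also enforces the "nothing if there's no starting and trailing slash" rule:

```php
<?php
// Hypothetical helper (not from the original answer): returns the text between
// the first and second slash, or null when the subject does not start with a
// slash or has no second slash.
function firstSegment(string $subject): ?string
{
    // ^/      the subject must start with a slash
    // \K      drop everything matched so far from the reported match
    // [^/]+   one or more non-slash characters (the segment)
    // (?=/)   a slash must follow, but is not part of the match
    if (preg_match('#^/\K[^/]+(?=/)#', $subject, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

$subject = '/1234-abcd/blahblah';

var_dump(firstSegment($subject));      // string(9) "1234-abcd"
var_dump(firstSegment('1234-abcd/'));  // NULL (no leading slash)

// The answer's lookbehind pattern with preg_match_all returns every segment.
preg_match_all('#(?<=/)[^/]+#', $subject, $all);
print_r($all[0]);                      // 1234-abcd, blahblah

// explode() alternative: index 0 is the empty string before the first slash.
$parts = explode('/', $subject);
var_dump($parts[1]);                   // string(9) "1234-abcd"
```

Using # as the pattern delimiter avoids having to escape the slashes inside the pattern.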

Related

Preg_match for alternative values in regex php not work [duplicate]

The first question is this:
I am using http://www.phpliveregex.com/ to check that my regex is right, and there it finds more than one matching line.
I am doing this regex:
$lines = explode('\n', $text);
foreach($lines as $line) {
$matches = [];
preg_match("/[0-9]+[A-Z][a-z]+ [A-Z][a-z]+S[0-9]+\-[0-9]+T[0-9]+/uim", $line, $matches);
print_r($matches);
}
on the $text which looks like this: http://pastebin.com/9UQ5wNRu
The problem is that printed matches is only one match:
Array
(
[0] => 3Bajus StanislavS2415079249-2615T01
)
Why is this happening? Any ideas on how to fix the problem?
The second question:
You may have noticed that the text (from pastebin) contains Slovak characters outside the regular alphabet. How can I match those characters and select the users that have this format:
{number}{first_name}{space}{last_name}{id_number}
How can I do that?
OK, the first issue is fixed. Thank you, @chris85. I should have used preg_match_all and run it on the whole text. Now I get an array of all students whose names contain only non-Slovak (English) letters.
preg_match stops after one match. You need to use preg_match_all for a global search. Note also that '\n' in single quotes is a literal backslash followed by n, not a newline, so explode('\n', $text) most likely never split the text into lines and the loop ran once on the whole text; use "\n" in double quotes if you want to split by line.
[A-Z] does not include any characters outside that range. Since you are using the i modifier, that character class is actually [A-Za-z], which may or may not be what you want. You can use \p{L} in place of it to match letters from any language.
Demo: https://regex101.com/r/L5g3C9/1
So your PHP code should be:
preg_match_all("/^[0-9]+\p{L}+ \p{L}+S[0-9]+\-[0-9]+T[0-9]+$/uim", $text, $matches);
print_r($matches);
You can also use T-Regx library:
pattern("^[0-9]+\p{L}+ \p{L}+S[0-9]+\-[0-9]+T[0-9]+$", 'uim')->match($text)->all();
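A small sketch of the difference, assuming UTF-8 input. The first record is the one shown in the question; the other two lines are hypothetical sample data, since the pastebin content is not reproduced here:

```php
<?php
// Sample data: the first record appears in the question; the second and third
// lines are hypothetical, added to exercise non-ASCII letters and a non-match.
$text = "3Bajus StanislavS2415079249-2615T01\n"
      . "7Kováč ĽubošS1234567890-1234T02\n"
      . "not a student record";

// Single quotes keep '\n' as a literal backslash followed by "n",
// so nothing is split and the result has a single element.
$unsplit = explode('\n', $text);
// Double quotes produce a real newline character.
$lines = explode("\n", $text);

// \p{L} matches a letter in any script; it needs the u modifier.
// With the m modifier, ^ and $ anchor at each line of $text.
$pattern = '/^[0-9]+\p{L}+ \p{L}+S[0-9]+-[0-9]+T[0-9]+$/uim';
$count = preg_match_all($pattern, $text, $matches);

// For comparison: ASCII-only classes skip the record with accented letters.
preg_match_all('/^[0-9]+[A-Za-z]+ [A-Za-z]+S[0-9]+-[0-9]+T[0-9]+$/m', $text, $asciiOnly);

print_r($matches[0]);    // both student records
print_r($asciiOnly[0]);  // only the first record
```

Writing the pattern in single quotes keeps PHP from interpreting any backslash sequences before PCRE sees them.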

PHP match for strings between two (starting, ending) delimiters

String;
RandomValue1:|RandomSentence1.|RandomValue2:|RandomSentence2.|
I'm trying to match RandomSentence1. and RandomSentence2.. I figured the "." in the sentence could help with the matching, since every sentence ends with a period, so I'm OK with the match not including the period. I've never been very good at regex, but I'm always willing to try and learn. From the results on here I haven't been able to come up with anything that works. I'd be coding this in PHP; I believe either preg_match() or preg_split() would be the function to use here.
I initially tried; .*:\|.*\.\|
But that just matches the entire string since it ends with .|.
Then I tried this; .*:\|\s*(.*?)\s*\|
But that only matched the RandomSentence2.
These are adaptions of what I've found online.
This should work for a regex to capture all. Look for NOT . or | followed by . and |:
preg_match_all('/([^.|]+\.)\|/', $string, $matches);
print_r($matches[1]);
An alternate if you want to do something with the other entries would be to split and then find what you want. Split on | then grep for array values ending in .:
$matches = preg_grep('/\.$/', explode('|', $string));
Since you already know there is a dot at the end, you can just match everything with something simple: (?<=\|)[^|.]+(?=\.\|)
https://regex101.com/r/ZsHcWq/1
Expanded:
(?<= \| )
[^|.]+
(?= \.\| )
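The three approaches above can be compared side by side on the string from the question. This is only a sketch; the variable names are illustrative:

```php
<?php
$string = 'RandomValue1:|RandomSentence1.|RandomValue2:|RandomSentence2.|';

// 1. Capture a run without . or | that is followed by ".|"; the period is kept.
preg_match_all('/([^.|]+\.)\|/', $string, $m);
$withPeriod = $m[1];

// 2. Split on | and keep the pieces ending in a period.
//    preg_grep() preserves the original keys (1 and 3 here), so reindex.
$viaSplit = array_values(preg_grep('/\.$/', explode('|', $string)));

// 3. Lookarounds: the delimiters are asserted but not matched,
//    so the period is dropped from the result.
preg_match_all('/(?<=\|)[^|.]+(?=\.\|)/', $string, $m);
$withoutPeriod = $m[0];

print_r($withPeriod);     // RandomSentence1., RandomSentence2.
print_r($viaSplit);       // RandomSentence1., RandomSentence2.
print_r($withoutPeriod);  // RandomSentence1, RandomSentence2
```

The first and third patterns exclude the period from the character class, so they assume a sentence contains no period other than the final one.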

Regular expression searching special tag

I have a special tag in text, [Attachment: image;upload;url]. To parse it I need to find all of these tags, so I wrote this regular expression:
preg_match_all("/.*(\[Attachment: (.*);upload;(.*)\]).*/", $text, $matches);
It works fine and returns this:
Array
(
[0] => Array
(
Text
)
[1] => Array
(
[Attachment: image;upload;url]
)
[2] => Array
(
image
)
[3] => Array
(
url
)
)
But there is one problem: when the text contains two or more tags, it returns info only about the last tag found.
You should match only the tags, not the surrounding text:
"/\[Attachment: ([^;]*);upload;([^\]]*)\]/"
Instead of the negated character classes you could also use .*? for non-greedy matching; however, I prefer the negated character classes.
Remove the .* parts from both ends of the regex. The leading .* is greedy, so it swallows everything up to the last tag, and the trailing .* then matches to the end of the line, including any other substrings that you want to find. (By default . does not match newlines in PHP, so this happens per line.) After that it looks for more matches from where the previous one ended, but can't find any.
This regex should do it:
$regex = '/\[Attachment: (.*?);(.*?);(.*?)\]/';
preg_match_all($regex, $string, $matches);
For me, this came back with what you wanted (3 results).
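A short sketch of the tag-only pattern applied to text with two tags. The input string and its URLs are hypothetical placeholders, and the $attachments structure is just one possible way to collect the results:

```php
<?php
// Hypothetical input with two tags; the URLs are placeholders.
$text = 'Intro [Attachment: image;upload;http://example.com/a.png] middle '
      . '[Attachment: file;upload;http://example.com/b.pdf] end';

// [^;]* stops at the next semicolon and [^\]]* at the closing bracket,
// so each match covers exactly one tag.
$pattern = '/\[Attachment: ([^;]*);upload;([^\]]*)\]/';

// PREG_SET_ORDER groups the results per tag instead of per capture group.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$attachments = [];
foreach ($matches as $match) {
    $attachments[] = ['type' => $match[1], 'url' => $match[2]];
}
print_r($attachments);

// For comparison: the question's pattern with .* on both sides
// consumes the whole line in one match and reports only the last tag.
preg_match_all('/.*(\[Attachment: (.*);upload;(.*)\]).*/', $text, $greedy);
print_r($greedy[2]);  // file
```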

PHP- Regular expression - how to read from Right to left

I have below example
$game = "hello999hello888hello777last";
preg_match('/hello(.*?)last/', $game, $match);
The above code returns 999hello888hello777, but what I need is the value just before last, i.e. 777. So I need the regular expression to read from right to left.
$game = strrev($game);
How about that? :D
Then just reverse the regular expression ^__^
Why not just reverse the string? Use PHP's strrev and then just reverse your regular expression.
$game = "hello999hello888hello777last";
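preg_match('/tsal(.*?)olleh/', strrev($game), $match);
// $match[1] is captured from the reversed string; use strrev($match[1]) to restore the original order.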
This will return the last set of digits before the string last
$game = "hello999hello888hello777last";
preg_match('/hello(\d+)last$/', $game, $match);
print_r($match);
Output Example:
Array
(
[0] => hello777last
[1] => 777
)
So you would need $match[1] for the 777 value.
Your problem is that although .*? matches reluctantly, i.e. as few characters as possible, it still starts matching right after the first hello, and since it matches any character, it will match right across "boundaries" (last and hello in your case).
Therefore you need to be more explicit about the fact that it's not legal to match across boundaries, and that's what lookahead assertions are for:
preg_match('/hello((?:(?!hello|last).)*)last(?!.*(?:hello|last))/', $game, $match);
Now the match between hello and last is prohibited from containing hello and/or last, and it's not allowed to have hello or last after the match.
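The approaches from the answers can be sketched together on the string from the question. The last variant, a greedy prefix, is an additional alternative that does not appear in the answers above:

```php
<?php
$game = 'hello999hello888hello777last';

// Lazy quantifier alone: the match still starts at the FIRST hello.
preg_match('/hello(.*?)last/', $game, $m);
$lazy = $m[1];              // "999hello888hello777"

// Reverse the string and the literals, then reverse the capture back.
preg_match('/tsal(.*?)olleh/', strrev($game), $m);
$reversed = strrev($m[1]);  // "777"

// Anchored digits: assumes the value is numeric and last ends the string.
preg_match('/hello(\d+)last$/', $game, $m);
$digits = $m[1];            // "777"

// Lookahead version: the captured text may not contain hello or last,
// and neither word may appear after the match.
preg_match('/hello((?:(?!hello|last).)*)last(?!.*(?:hello|last))/', $game, $m);
$tempered = $m[1];          // "777"

// Greedy prefix (not from the answers): the leading .* consumes as much as
// possible, so hello is matched at its last possible position.
preg_match('/.*hello(.*?)last/', $game, $m);
$greedyPrefix = $m[1];      // "777"
```

Which variant fits depends on the data: the \d+ version is the simplest when the value is always numeric, while the lookahead version makes no assumption about what sits between hello and last.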
