Using preg_match to match this specific pattern - php

I have a preg_match call for specific patterns, but it doesn't match the string I'm testing. What am I doing wrong?
<?php
$string = "tell me about cats";
preg_match("~\b(?:tell me about|you know(?: of| about)?|what do you think(?: of| about)?|(?:what|who) is|(?:whats|whos)) ((?:[a-z]+ ){1,2})$~", $string, $match);
print_r($match);
?>
Expected Result:
array(0 => tell me about 1 => cats)
Actual Result:
array()

You have an extra space in ((?:[a-z]+ ){1,2}): it requires a space after every word, but there is no space after "cats", so the entire regex fails.
Also, you don't have a capturing group for the first part (because it uses (?:..)). Make it a capturing group and make the spaces optional using ? (if you want to capture at most two words):
\b(tell me about|you know(?: of| about)?|what do you think(?: of| about)?|(?:what|who) is|(?:whats|whos)) ((?:[a-z]+ ?){1,2})$
PHP Code
$string = "tell me about cats";
preg_match("~\b(tell me about|you know(?: of| about)?|what do you think(?: of| about)?|(?:what|who) is|(?:whats|whos)) ((?:[a-z]+ ?){1,2})$~", $string, $match);
print_r($match);
NOTE: $match[1] and $match[2] will contain your result. $match[0] holds the entire match found by the regex in the string.
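To check the corrected pattern against more than one phrase, here is a small sketch. Only "tell me about cats" comes from the question; the other two inputs are hypothetical, chosen to show a two-word topic and a phrase with no topic (which should not match).

```php
<?php
// Corrected pattern: group 1 is the trigger phrase, group 2 the topic (one or two words).
$pattern = '~\b(tell me about|you know(?: of| about)?|what do you think(?: of| about)?|(?:what|who) is|(?:whats|whos)) ((?:[a-z]+ ?){1,2})$~';

// Hypothetical sample inputs (only the first one is from the question).
$inputs = ["tell me about cats", "what do you think of big dogs", "who is"];

$results = [];
foreach ($inputs as $input) {
    // Store [phrase, topic] on success, null when the pattern does not match.
    $results[$input] = preg_match($pattern, $input, $m) ? [$m[1], $m[2]] : null;
}
print_r($results);
```

"who is" fails because the pattern requires a space and at least one word after the trigger phrase.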

Related

Find a pattern in a string

I am trying to detect a string inside the following pattern: [url('example')] in order to replace the value.
I thought of using one regex to get the strings inside the square brackets and then another to get the text inside the parentheses, but I am not sure if that's the best way to do it.
// detect all strings inside brackets
preg_match_all("/\[([^\]]*)\]/", $text, $matches);
// loop through the results to get the string inside the parentheses
foreach ($matches[1] as $inner) {
    preg_match('#\((.*?)\)#', $inner, $parens);
}
To match the string between the parentheses, you could use a single pattern that returns only the match:
\[url\(\K[^()]+(?=\)])
The pattern matches:
\[url\( Match [url(
\K Clear the current match buffer
[^()]+ Match 1+ chars other than ( and )
(?=\)]) Positive lookahead, assert )] to the right
For example
$re = "/\[url\(\K[^()]+(?=\)])/";
$text = "[url('example')]";
if (preg_match($re, $text, $match)) {
    var_dump($match[0]);
}
Output
string(9) "'example'"
Another option could be using a capture group. You can place the ' inside or outside the group to capture the value:
\[url\(([^()]+)\)]
For example
$re = "/\[url\(([^()]+)\)]/";
$text = "[url('example')]";
if (preg_match($re, $text, $match)) {
    var_dump($match[1]);
}
Output
string(9) "'example'"
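Since the goal in the question was to replace the value, the first pattern can be passed straight to preg_replace. A minimal sketch; the second [url(...)] in the input and the replacement value 'replaced' are hypothetical.

```php
<?php
// First pattern from above: \K drops "[url(" from the reported match and the
// lookahead requires ")]" without consuming it, so only the value is replaced.
$re = '/\[url\(\K[^()]+(?=\)])/';
$text = "[url('example')] and [url('other')]";

// Hypothetical replacement value; preg_replace replaces every match.
$result = preg_replace($re, "'replaced'", $text);
echo $result, PHP_EOL; // [url('replaced')] and [url('replaced')]
```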

Matching all of a certain character after a Positive Lookbehind

I have been trying to get this regex right all morning and I have hit a wall. In the following string I want to match every forward slash that follows .com/<first_word>, except any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because a lookbehind cannot contain variable-length quantifiers and is a zero-length assertion, I am wondering if I have gone down the wrong path and should look for another regex combination.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.
If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
This replaces with a | each / that comes after the first / following the leading .com.
The negative lookbehind (?<!^) is needed so that \G cannot match at the start of the string; without it, a string that does not start with a .com URL, like /foo/bar/baz/abcd, would also be replaced.
Use \K here along with \G, and grab the groups:
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo.
https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
Replace
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);
Another \G and \K based idea.
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
The (?: non-capturing group sets the entry point ^\S+\.com/\w, or glues subsequent matches to it with \G(?!^).
\w*+\K/ possessively matches any number of word characters up to a slash; \K resets the start of the match, so only the / is matched.
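This last pattern can be used from PHP the same way as in the first answer. A sketch on the question's example string, with | as an arbitrary replacement character:

```php
<?php
// Pattern from this answer: the first alternative finds the entry point after
// ".com/" plus one word character; \G(?!^) continues where the previous match
// ended. \w*+ skips the rest of the segment and \K drops it from the match,
// so only the "/" itself is matched.
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";

// "|" is just an arbitrary replacement character for demonstration.
$replaced = preg_replace($re, '|', $str);
echo $replaced, PHP_EOL; // http://example.com/foo|12|jacket Input/Output

// preg_match_all returns how many slashes were matched.
$count = preg_match_all($re, $str, $matches);
echo $count, PHP_EOL; // 2
```

The chain of \G matches breaks at the space after "jacket", which is why the / in Input/Output is left alone.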

PHP exploding url from text, possible?

I need to extract the YouTube URL from this line:
[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]
Is it possible? I need to remove [embed] and [/embed].
preg_match is what you need.
<?php
$str = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
preg_match("/\[embed\](.*)\[\/embed\]/", $str, $matches);
echo $matches[1]; //https://www.youtube.com/watch?v=L3HQMbQAWRc
$string = '[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]';
$string = str_replace(['[embed]', '[/embed]'], '', $string);
See str_replace
Why not use str_replace? :) Quick and easy.
http://php.net/manual/de/function.str-replace.php
Just for good measure, you can also use positive lookbehinds and lookaheads in your regular expressions:
(?<=\[embed\])(.*)(?=\[\/embed\])
You'd use it like this:
$string = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
$pattern = '/(?<=\[embed\])(.*)(?=\[\/embed\])/';
preg_match($pattern, $string, $matches);
echo $matches[1];
Here is an explanation of the regex:
(?<=\[embed\]) is a positive lookbehind: it asserts that the match is preceded by [embed], without including [embed] in the match.
(.*) is a capturing group: . matches any character (except a newline), and the quantifier * repeats it between zero and unlimited times, as many times as possible. This is what is matched between the lookbehind and the lookahead. These are the droids you're looking for.
(?=\[\/embed\]) is a positive lookahead: it asserts that the match is followed by [/embed], without including [/embed] in the match.
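Because * matches "as many times as possible", the greedy (.*) matters when one string holds more than one [embed] block: it runs from the first [embed] to the last [/embed]. A sketch comparing it with a lazy (.*?) and preg_match_all; the second video URL is a hypothetical placeholder.

```php
<?php
// Hypothetical input with two embeds on one line (the second URL is made up).
$string = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed] [embed]https://www.youtube.com/watch?v=abc[/embed]";

// Greedy: (.*) runs from the first [embed] to the LAST [/embed].
preg_match('/(?<=\[embed\])(.*)(?=\[\/embed\])/', $string, $greedy);

// Lazy: (.*?) stops at the first [/embed]; preg_match_all collects every URL.
preg_match_all('/(?<=\[embed\])(.*?)(?=\[\/embed\])/', $string, $lazy);

echo $greedy[1], PHP_EOL;
print_r($lazy[1]);
```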

PHP Replace consecutive occurrence of characters in sentence

I want to reduce runs of the same character in each WORD to three when there are more than three (three being the maximum possible in German, two in English, so I know the output example is grammatically wrong).
Example input:
Hellooooo Louis, whaaaaaat's up pal?
Expected output:
Hellooo Louis, whaaat's up pal?
I tried to change:
preg_replace('/(\w)\1+/', '$1', $word);
to
preg_replace('/(\w)\3+/', '$1', $word);
However, it doesn't output anything.
You can use the following regex:
((\w)\2{2})\2+
Replace with $1.
PHP code:
$re = "#((\w)\\2{2})\\2+#";
$str = "Hellooooo Louis, whaaaaaat's up pal?";
$subst = '$1';
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
Hellooo Louis, whaaat's up pal?
EXPLANATION:
We capture the character with (\w) into Group 2. Then \2{2} checks that it is followed by the same character exactly two more times, and these three characters are captured into Group 1. Finally, \2+ matches any further identical characters, which are dropped when replacing with $1.
Here is a way to go:
preg_replace('/((\w)\2\2)\2+/', '$1', $word);
You can also use \K to reset the start of the match and replace with an empty string, which is a bit more efficient:
(\w)\1\1\K\1+
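A sketch applying both answers' patterns to the question's sample sentence, to show they produce the same result:

```php
<?php
$str = "Hellooooo Louis, whaaaaaat's up pal?";

// Group-based version: keep the first three characters ($1), drop the rest.
$viaGroup = preg_replace('/((\w)\2{2})\2+/', '$1', $str);

// \K version: the three characters are matched but excluded from the match,
// so only the surplus repeats are replaced with an empty string.
$viaK = preg_replace('/(\w)\1\1\K\1+/', '', $str);

echo $viaGroup, PHP_EOL; // Hellooo Louis, whaaat's up pal?
echo $viaK, PHP_EOL;     // Hellooo Louis, whaaat's up pal?
```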

Regex for a record that starts with a number and ends with a particular string

I get results like the ones below using file_get_contents.
30049988.html" >Title1
297816.html" >Title2
2979922.html" >Title3
29736.html" >Title4
22833.html" >Title5
I want to remove the ugly part (number.html" >) and get the titles only, how can I achieve it?
You could use the preg_replace function:
preg_replace('~.*?>~', '', $string);
.*? will do a non-greedy match of zero or more characters.
OR
preg_replace('~^\d+\.html" >~m', '', $string);
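When $string holds all the lines at once (as file_get_contents returns them), the ^ anchor needs the m modifier so that it matches at the start of every line rather than only at the start of the string. A sketch using the sample lines from the question; splitting into an array is an extra step, not part of the original answer.

```php
<?php
// Sample lines from the question, as one multi-line string.
$string = <<<EOF
30049988.html" >Title1
297816.html" >Title2
2979922.html" >Title3
29736.html" >Title4
22833.html" >Title5
EOF;

// The m modifier lets ^ match at the start of every line.
$titles = preg_replace('~^\d+\.html" >~m', '', $string);

// Split on any line break to get one title per element.
$list = preg_split('~\R~', $titles);
print_r($list);
```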
The preg_replace method will work, but to answer the original question for anyone else wondering, here is a preg_match_all version:
<?php
$string = <<<EOF
30049988.html" >Title1
297816.html" >Title2
2979922.html" >Title3
29736.html" >Title4
22833.html" >Title5
EOF;
preg_match_all('~[^>]+>([^\\n]+)$~smU', $string, $matches);
if (empty($matches[1])) {
echo 'No results found ..'. PHP_EOL;
exit;
}
foreach ($matches[1] as $match) {
echo $match.PHP_EOL;
}
You can try this regex:
(?=T)(\w+)
How this works
(?=T) - This is a positive lookahead. It checks that the next character is T, and only then proceeds.
(\w+) - This captures all word characters starting from the T.
Output:
Title1
Title2
Title3
Title4
Title5
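For completeness, here is the lookahead regex used from PHP with preg_match_all on the sample lines from the question. Note that it only works because every title in the sample starts with an uppercase T; titles starting with any other letter would be missed.

```php
<?php
// Sample lines from the question.
$string = <<<EOF
30049988.html" >Title1
297816.html" >Title2
2979922.html" >Title3
29736.html" >Title4
22833.html" >Title5
EOF;

// (?=T) only lets a match start at an uppercase "T"; (\w+) then captures the title.
preg_match_all('/(?=T)(\w+)/', $string, $matches);
print_r($matches[1]);
```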
