PHP Regex Negation For Youtube URLs - php

Let's say I have HTML in a database that looks like this:
Hello world!
<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>
Blah blah blah...
https://www.youtube.com/watch?v=df82vnx07s
Blah blah blah...
<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>
Now I want to use PHP regex to grab the 2nd and 3rd URLs, but ignore the first.
The regex equation I have so far is:
\s*[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)
It works pretty well, but I don't know how to make it exclude/negate the first type of URL, one which starts with: href="
Please help, thanks!

You can use the "negative lookbehind" regular expression feature to accomplish what you're after. I've modified the very beginning of your regex by adding ((?<!href=[\'"])http) to implement one. Hope it helps!
$regex = '/((?<!href=[\'"])http)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)/';
$useCases = [
    1 => 'ABC',
    2 => "<a href='https://www.youtube.com/watch?v=m7t75u72vd'>ABC</a>",
    3 => 'https://www.youtube.com/watch?v=df82vnx07s',
    4 => '<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>'
];
foreach ($useCases as $index => $useCase) {
    $matches = [];
    preg_match($regex, $useCase, $matches);
    if ($matches) {
        echo 'The regex was matched in usecase #' . $index . PHP_EOL;
    }
}
// Echoes:
// The regex was matched in usecase #3
// The regex was matched in usecase #4

All you need is to add a (?![^<]*>) negative lookahead. It fails the match if the match is followed by zero or more characters other than < and then a >, i.e. if the URL still sits inside a tag, as in an href attribute:
[a-zA-Z\/:.]*youtu(?:be\.com\/watch\?v=|\.be\/)([a-zA-Z0-9\-_]+)(?![^<]*>)
Note that I also escaped the . symbols so they match literal dots, and used a non-capturing group for the be.com/.be alternation. You may replace ([a-zA-Z0-9\-_]+) with [a-zA-Z0-9_-]+ if you don't need the capture, and you may also replace the [a-zA-Z\/\/:\.]* part with a more precise pattern, like https?:\/\/[a-zA-Z.]*.
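A minimal sketch of the lookahead approach on sample HTML modelled on the question. The href URL is borrowed from the first answer's test case, and the pattern uses the stricter https?:// prefix mentioned above; both are illustrative choices, not part of the original answer.

```php
<?php
// Sample HTML modelled on the question (the href URL comes from the first answer's test case).
$html = <<<'HTML'
Hello world!
<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>
Blah blah blah...
https://www.youtube.com/watch?v=df82vnx07s
Blah blah blah...
<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>
HTML;

// (?![^<]*>) rejects a match that is followed by a ">" before any "<",
// i.e. a URL that still sits inside a tag (here: the href attribute).
$regex = '~https?://[a-zA-Z.]*youtu(?:be\.com/watch\?v=|\.be/)([a-zA-Z0-9_-]+)(?![^<]*>)~';

preg_match_all($regex, $html, $m);
print_r($m[1]); // video IDs: df82vnx07s, nvs70fh17f3fg
```

The URL inside the anchor is skipped because every way of ending the match there still leaves `">` ahead before the next `<`.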

Example solution:
(?![^<]*>)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)

Related

Replace a character between two words

I have a string like blah blah [START]Hello-World[END] blah blah.
I want to replace - with , between [START] and [END].
So the result should be blah blah [START]Hello,World[END] blah blah.
I would suggest using preg_replace_callback:
$string = "blah-blah [START]Hello-World. How-are-you?[END] blah-blah" .
" [START]More-text here, [END] end of-message";
$string = preg_replace_callback('/(\[START\])(.*?)(\[END\])/', function($matches) {
    return $matches[1] . str_replace("-", ",", $matches[2]) . $matches[3];
}, $string);
echo $string;
Output:
blah-blah [START]Hello,World. How,are,you?[END] blah-blah [START]More,text here, [END] end of-message
The regular expression captures three parts: "[START]", "[END]" and the text between them. preg_replace_callback passes these fragments to the callback as $matches[1], $matches[2] and $matches[3]; the callback runs a simple str_replace on the middle part and returns the three fragments joined together.
This way every occurrence of the hyphen (or whatever character you replace) inside each block is replaced, not only the first one.
You will have to use regular expressions to accomplish what you need.
$string = "blah blah [START]Hello-World[END] blah blah";
$string = preg_replace('/\[START\](.*)-(.*)\[END\]/', '[START]$1,$2[END]', $string);
Here's what the regular expression does:
\[START\] The backslashes escape the square brackets, so this matches the literal text [START].
(.*) Captures everything after [START] up to the dash; it is referenced later as $1. Because .* is greedy, it runs up to the last dash before [END].
- Matches the character you want to replace, in our case the dash.
(.*) Captures everything after the dash; it is referenced as $2.
\[END\] Matches the closing [END].
In the replacement part, [START]$1,$2[END], $1 and $2 are the texts captured above, so the matched block is rebuilt with a comma in place of the dash.
The var_dump of $string would be:
string(43) "blah blah [START]Hello,World[END] blah blah"
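Because both (.*) groups are greedy, this pattern replaces only one dash per match, the last one before [END]; the preg_replace_callback answer above is the one to use when a block can contain several dashes. A quick sketch to check this (the second input string is made up for illustration):

```php
<?php
// The pattern from the answer above, applied to two inputs.
// Both (.*) groups are greedy, so only ONE dash per match is replaced:
// the last one before [END].
$pattern = '/\[START\](.*)-(.*)\[END\]/';

$one  = preg_replace($pattern, '[START]$1,$2[END]', 'blah blah [START]Hello-World[END] blah blah');
$many = preg_replace($pattern, '[START]$1,$2[END]', 'x [START]How-are-you[END] y'); // made-up input

echo $one, PHP_EOL;   // blah blah [START]Hello,World[END] blah blah
echo $many, PHP_EOL;  // x [START]How-are,you[END] y
```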

Make two simple regex's into one

I am trying to make a regex that skips past .txt, then past the "-", and gets the first digit after it. In the example, that would be 1.
$record_pattern = '/.txt.+/';
preg_match($record_pattern, $decklist, $record);
print_r($record);
.txt?n=chihoi%20%283-1%29
I want to write this as one expression but can only seem to do it as two. This is my first time working with regexes.
You can use this:
$record_pattern = '/\.txt.+-(\d)/';
Now, the first group contains what you want.
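A minimal sketch of this one-pattern version, run against the sample string from the question. Note that the greedy .+ makes the group capture the digit after the last "-" that is followed by a digit, which is the only one in this input:

```php
<?php
// $decklist holds the sample string from the question.
$decklist = '.txt?n=chihoi%20%283-1%29';

if (preg_match('/\.txt.+-(\d)/', $decklist, $record)) {
    // $record[0] is the whole match, $record[1] the captured digit.
    echo $record[1], PHP_EOL; // 1
}
```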
Your regex would be:
\.txt[^-]*-\K\d
You don't need any groups. The pattern matches from .txt up to the literal -, and \K then discards everything matched so far (in our case the string .txt?n=chihoi%20%283-). Matching continues with the first digit right after the -, so that digit is the whole reported match.
Your PHP code would be:
<?php
$mystring = ".txt?n=chihoi%20%283-1%29";
$regex = '~\.txt[^-]*-\K\d~';
if (preg_match($regex, $mystring, $m)) {
    $yourmatch = $m[0];
    echo $yourmatch; //=> 1
}
?>

Looping within a regular expression

Can a regex find a pattern in this?
{{foo.bar1.bar2.bar3}}
where the groups would be
$1 = foo, $2 = bar1, $3 = bar2, $4 = bar3, and so on.
It would be like re-applying the expression over and over until it fails to match.
The current expression I am working on is:
(?:\{{2})([\w]+).([\w]+)(?:\}{2})
Here's a link to it on RegExr:
http://regexr.com?3203h
--
OK, I guess I didn't explain well what I'm trying to achieve.
Let's say I am trying to replace every
.barX inside a {{foo . . . }}.
My expected result would be:
$foo->bar1->bar2->bar3
This should work, assuming no braces are allowed within the match:
preg_match_all(
'%(?<= # Assert that the previous character(s) are either
\{\{ # {{
| # or
\. # .
) # End of lookbehind
[^{}.]* # Match any number of characters besides braces/dots.
(?= # Assert that the following regex can be matched here:
(?: # Try to match
\. # a dot, followed by
[^{}]* # any number of characters except braces
)? # optionally
\}\} # Match }}
) # End of lookahead%x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
I'm not a PHP person, but I managed to construct this piece of code here:
preg_match_all("([a-z0-9]+)",
"{{foo.bar1.bar2.bar3}}",
$out, PREG_PATTERN_ORDER);
foreach($out[0] as $val)
{
echo($val);
echo("<br>");
}
The code above prints the following:
foo
bar1
bar2
bar3
It should allow you to exhaustively search a given string by using a simple regular expression. I think that you should also be able to get what you want by removing the braces and splitting the string.
I don't think so, but it's relatively painless to just split the string on periods like so:
$str = "{{foo.bar1.bar2.bar3}}";
$str = str_replace(array("{","}"), "", $str);
$values = explode(".", $str);
print_r($values); // Yields an array with values foo, bar1, bar2, and bar3
EDIT: In response to your question edit, you could replace all barX in a string by doing the following:
$str = "{{foo.bar1.bar2.bar3}}";
$newStr = preg_replace("#bar\d#", "hi", $str);
echo $newStr; // outputs "{{foo.hi.hi.hi}}"
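For the edited goal in the question, turning {{foo.bar1.bar2.bar3}} into $foo->bar1->bar2->bar3, the whole placeholder can be rewritten in one pass with preg_replace_callback. This is a sketch, assuming the names inside the braces contain only word characters and no spaces:

```php
<?php
$str = '{{foo.bar1.bar2.bar3}}';

// Match {{name.name...}} where every name is made of word characters,
// then rebuild it as $name->name->...
$newStr = preg_replace_callback('/\{\{(\w+(?:\.\w+)*)\}\}/', function ($m) {
    return '$' . str_replace('.', '->', $m[1]);
}, $str);

echo $newStr, PHP_EOL; // $foo->bar1->bar2->bar3
```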
I don't know the correct syntax in PHP for pulling out the results, but you could do:
\{{2}(\w+)(?:\.(\w+))*\}{2}
That captures the first name in the first capturing group and the repeated names in the second capturing group. Note that this only holds in engines that record every repetition of a group: Expresso, which is built on .NET, can show all of them, while RegExr cannot. PCRE, which PHP's preg_* functions use, keeps only the last repetition, so in PHP the second group would contain just bar3.
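A short PHP check of what the repeated group actually captures under PCRE:

```php
<?php
preg_match('/\{{2}(\w+)(?:\.(\w+))*\}{2}/', '{{foo.bar1.bar2.bar3}}', $m);

// Group 1 holds the first name; group 2 holds only the LAST repetition,
// because PCRE overwrites a repeated group's capture on each iteration.
echo $m[1], PHP_EOL; // foo
echo $m[2], PHP_EOL; // bar3
```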

preg_match_all ignore words

I am trying to create a regex that captures email addresses ending in .info/.com and not containing aaa/bbb.
Is this the correct syntax?
Eg: // search email ending in .com/.info containing no aaa/bbb
preg_match_all('#((?=.*@.*(?:com|info))(!.*(?:aaa|bbb)).*)#ui', $html, $emails);
To get this:
caaac@ccc.com = no
ccc@ccbbb.com = no
cccc@cccc.com = good (address syntax correct + term absent before or after the @)
Thank you for your reply.
This syntax works fine (thanks to STEMA), except for a string that includes spaces,
e.g.:
$string = "email1@address.com blah email2@aaaaess.com blah email3@address.info embbbil4@adress.com";
preg_match_all("#^(?!.*aaa)(?!.*bbb).*@.*\.(?:com|info)$#im", $string, $matches);
Cordially
Simply use a positive expression and check that it did not match anything.
if (preg_match(...) == 0)
Also, there is no need to use preg_match_all if you are just interested whether a pattern matched or not.
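One way to read the suggestion above, as a sketch: write positive patterns for the wanted ending and for the forbidden terms, and accept an address only when the forbidden-term pattern finds nothing (preg_match returns 0). The candidate list is taken from the question's examples and is assumed to be already split into single addresses:

```php
<?php
// Addresses from the question's examples, assumed to be already extracted one per entry.
$candidates = ['caaac@ccc.com', 'ccc@ccbbb.com', 'cccc@cccc.com'];

$good = [];
foreach ($candidates as $email) {
    // Positive pattern for the wanted ending (.com or .info).
    $endsOk = preg_match('/\.(?:com|info)$/i', $email) === 1;
    // Positive pattern for the unwanted terms; the address passes only if it matches nothing.
    $hasBadTerm = preg_match('/aaa|bbb/i', $email) === 1;
    if ($endsOk && !$hasBadTerm) {
        $good[] = $email;
    }
}
print_r($good); // only cccc@cccc.com
```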
If I understand your requirements right, this is the regex you can use together with @Tomalak's answer:
preg_match('#.*@.*(?:aaa|bbb)|\.(?:com|info)$#ui', $html, $emails);
This pattern matches the stuff you don't want.
.*@.*(?:aaa|bbb) matches aaa or bbb after the @.
The \.(?:com|info)$ part is the other alternative; it matches if your email address ends with .com or .info.
Update:
.*(?:aaa|bbb).*\.(?:com|info)$
This will match aaa or bbb and the string has to end with .com or .info
Here's the solution:
#(?<=^|\s)(?![\w@]*(?:aaa|bbb|(?:[0-9]\S*){3,}))[a-z0-9-_.]*@[a-z0-9-_.]*\.(?:com|net|org|info|biz)(?=\s|$)#im
Function:
function get_emails($str){
    // \S* (not .*) keeps the "3+ digits" check inside a single address
    preg_match_all('#(?<=^|\s)(?![\w@]*(?:aaa|bbb|(?:[0-9]\S*){3,}))[a-z0-9-_.]*@[a-z0-9-_.]*\.(?:com|net|org|info|biz)(?=\s|$)#im', $str, $output);
    if (is_array($output[0]) && count($output[0]) > 0) {
        return array_unique($output[0]);
    }
    return array(); // no match: empty array instead of null
}
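A sketch that runs this kind of pattern against the test string from the question, with "@" in the addresses. One detail matters here: inside the digit-count group it uses \S* rather than .*. With .*, the count is not confined to one address, so email1@address.com would be rejected because the digits of the later addresses on the same line are counted too. The class is written [a-z0-9_.-], the same set as [a-z0-9-_.] with the dash moved to the end.

```php
<?php
// Test string from the question.
$string = "email1@address.com blah email2@aaaaess.com blah email3@address.info embbbil4@adress.com";

// (?<=^|\s)       : start of line or after whitespace
// (?![\w@]*(...)) : reject tokens containing aaa, bbb, or 3+ digits within the token
// \S*             : keeps the digit count inside one whitespace-delimited address
$pattern = '#(?<=^|\s)(?![\w@]*(?:aaa|bbb|(?:[0-9]\S*){3,}))[a-z0-9_.-]*@[a-z0-9_.-]*\.(?:com|net|org|info|biz)(?=\s|$)#im';

preg_match_all($pattern, $string, $output);
$found = array_values(array_unique($output[0]));
print_r($found); // email1@address.com, email3@address.info
```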
Cordially

Regex Problem in PHP

I'm attempting to utilize the following Regex pattern:
$regex = '/Name: [a-zA-Z ]*] [0-9]/';
When testing it in Rubular, it works fine, but when using PHP the expression never returns true, even when it should. Incidentally, if I remove the "[0-9]" part, it works fine. Is there some difference in PHP's regex syntax that I'm overlooking?
Edit:
I'm looking for the characters "Name:" then a name containing any number of letters or spaces, then a "]", then a space, then a single number. So
"Name: Chris] 5" would return true and
"Name: Chris] [lorem ipsum]" should return false.
I also tried escaping the second bracket "\]" but this did not fix the problem.
Without examples it's not clear what your use case is, but it seems like you want something like this?
$regex = '/Name\:\ ([\w]+)\ ([\w]+)/';
Update: try this:
$regex = '/Name\:\ [\w\s]+?\]\ [\d]{1}/';
For me this matches
Name: Foo Bar] 2
..but not these:
Name: Foo Bar] foo
Name: Foo Baz 5
I'm also using shorthand character classes:
[\w] matches a word character, i.e. [a-zA-Z0-9_] (letters, digits and underscore)
[\s] matches any whitespace character
[\d] matches any digit
For safety I'm also using '?' to match non-greedily, to make sure the [\w\s]+ part doesn't consume too much of the string.
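I think this might be because of the spaces in the regex; also, you want to escape the second ]. Try:
$regex = '/Name:\s[a-zA-Z ]*\]\s[0-9]/';
If you use the x modifier, note that it ignores unescaped whitespace outside character classes, so the spaces must be escaped:
$regex = '/Name:\ [a-zA-Z ]*\]\ [0-9]/x';
More on modifiers: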
PHP: Possible modifiers in regex patterns - Manual
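Under /x, unescaped spaces outside a character class are ignored, so an /x variant of this pattern only works once those spaces are escaped (spaces inside [a-zA-Z ] are kept). A quick check on the question's passing example:

```php
<?php
$subject = 'Name: Chris] 5';

// Under /x this effectively becomes /Name:[a-zA-Z ]*\][0-9]/, which fails
// because the subject has a space between "]" and "5".
$plainSpaces = preg_match('/Name: [a-zA-Z ]*\] [0-9]/x', $subject);

// Escaped spaces stay significant under /x.
$escapedSpaces = preg_match('/Name:\ [a-zA-Z ]*\]\ [0-9]/x', $subject);

var_dump($plainSpaces, $escapedSpaces); // int(0) int(1)
```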
Your regex works nicely for me with the two examples you gave.
$arr = array('Name: Chris] 5', 'Name: Chris] [lorem ipsum]');
foreach ($arr as $str) {
    if (preg_match('/Name: [a-zA-Z ]*] [0-9]/', $str)) {
        echo "$str : OK\n";
    } else {
        echo "$str : KO\n";
    }
}
Output:
Name: Chris] 5 : OK
Name: Chris] [lorem ipsum] : KO
Maybe there is more than one space between ] and the digit, so your regex should be:
/Name: [a-zA-Z ]*]\s+[0-9]/
I ended up resolving the issue by attempting different regexes that did basically the same thing. This is what ended up working:
$regex = '/Name: [\w ]*][^[]{2}/';
Evidently the brackets weren't the problem, but there was something in my original code that wasn't working properly. Thank you everyone for all the help.
