preg_match_all ignore words

I am trying to write a regex that captures email addresses ending in .com/.info that do not contain aaa/bbb.
Is this the correct syntax?
E.g.: // search for emails ending in .com/.info that contain no aaa/bbb
preg_match_all('#((?=.*@.*(?:com|info))(!.*(?:aaa|bbb)).*)#ui', $html, $emails);
Expected results:
caaac@ccc.com = no
ccc@ccbbb.com = no
cccc@cccc.com = good (address syntax correct + term absent before or after the @)
Thank you for your reply.
Update: the syntax below works fine (thank you to STEMA), except for a string that contains spaces.
E.g.:
$string = "email1@address.com blah email2@aaaaess.com blah email3@address.info embbbil4@adress.com";
preg_match_all("#^(?!.*aaa)(?!.*bbb).*@.*\.(?:com|info)$#im", $string, $matches);
Cordially

Simply use a positive expression and check that it did not match anything.
if (preg_match(...) == 0)
Also, there is no need to use preg_match_all if you are just interested whether a pattern matched or not.
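A minimal sketch of that idea, assuming the unwanted terms are aaa and bbb and that the candidate addresses have already been split into an array. The helper name filter_addresses is hypothetical and only serves the illustration.

```php
<?php
// Hypothetical helper: keep only candidates that the "unwanted" pattern does NOT match.
function filter_addresses(array $candidates): array
{
    $unwanted = '/aaa|bbb/i';
    $kept = [];
    foreach ($candidates as $candidate) {
        // preg_match returns 1 on a match, 0 on no match, and false on error.
        if (preg_match($unwanted, $candidate) === 0) {
            $kept[] = $candidate;
        }
    }
    return $kept;
}

$result = filter_addresses(['caaac@ccc.com', 'ccc@ccbbb.com', 'cccc@cccc.com']);
print_r($result);
```

Comparing with === 0 rather than == 0 keeps a regex error (false) from being mistaken for "no match".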

If I understand your requirements correctly, this is the regex you can use together with @Tomalak's answer.
preg_match('#.*@.*(?:aaa|bbb)|\.(?:com|info)$#ui', $html, $emails);
This pattern matches the stuff you don't want.
.*@.*(?:aaa|bbb) matches aaa or bbb after the @
\.(?:com|info)$ is the other part; it matches if your email address ends with .com or .info
You can see it online here on Regexr
Update:
.*(?:aaa|bbb).*\.(?:com|info)$
This will match aaa or bbb and the string has to end with .com or .info
See it online here on Regexr

Here's the solution:
#(?<=^|\s)(?![\w@]*(?:aaa|bbb|(?:[0-9].*){3,}))[a-z0-9-_.]*@[a-z0-9-_.]*\.(?:com|net|org|info|biz)(?=\s|$)#im
Function:
function get_emails($str) {
    preg_match_all('#(?<=^|\s)(?![\w@]*(?:aaa|bbb|(?:[0-9].*){3,}))[a-z0-9-_.]*@[a-z0-9-_.]*\.(?:com|net|org|info|biz)(?=\s|$)#im', $str, $output);
    // $output[0] holds the full matches; return an empty array when there are none.
    if (!empty($output[0])) {
        return array_unique($output[0]);
    }
    return [];
}
Cordially

Related

Find next word after colon in regex

I am getting a result back from a Laravel console command, like:
Some text as: 'Nerad'
Now I tried
$regex = '/(?<=\bSome text as:\s)(?:[\w-]+)/is';
preg_match_all( $regex, $d, $matches );
but it returns an empty result.
My guess is that something is wrong with the single quotes, and that I need to change the regex to handle them.
Any ideas?

Note that you get no match because the ' before Nerad is not matched, nor checked with the lookbehind.
If you need to check the context, but avoid including it into the match, in PHP regex, it can be done with a \K match reset operator:
$regex = '/\bSome text as:\s*\'\K[\w-]+/i';
See the regex demo
The output array structure will be cleaner than when using a capturing group and you may check for unknown width context (lookbehind patterns are fixed width in PHP PCRE regex):
$re = '/\bSome text as:\s*\'\K[\w-]+/i';
$str = "Some text as: 'Nerad'";
if (preg_match($re, $str, $match)) {
echo $match[0];
} // => Nerad
See the PHP demo

Just anchor at the end of the string and capture the word in a group. Group 1 will hold the required string.
/:\s*'(\w+)'$/
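A short sketch of how that pattern could be used from PHP, assuming the input is exactly the sample line from the question. The variable names are illustrative only.

```php
<?php
$str = "Some text as: 'Nerad'";

// The single quotes inside the pattern are escaped because the PHP string is single-quoted.
$found = preg_match('/:\s*\'(\w+)\'$/', $str, $m);

if ($found === 1) {
    // Group 1 holds the word between the quotes.
    echo $m[1], PHP_EOL;
}
```

Because the pattern is anchored with $, it only works when the quoted word is the last thing in the string.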

PHP Regex Negation For Youtube URLs

Let's say I have HTML in a database that looks like this:
Hello world!
<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>
Blah blah blah...
https://www.youtube.com/watch?v=df82vnx07s
Blah blah blah...
<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>
Now I want to use a PHP regex to grab the 2nd and 3rd URLs, but ignore the first.
The regex I have so far is:
\s*[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)
It works pretty well, but I don't know how to make it exclude the first type of URL, the one that is preceded by: href="
Please help, thanks!

You can use the "negative lookbehind" regular expression feature to accomplish what you're after. I've modified the very beginning of your regex by adding ((?<!href=[\'"])http) to implement one. Hope it helps!
$regex = '/((?<!href=[\'"])http)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)/';
$useCases = [
    1 => 'ABC',
    2 => "<a href='https://www.youtube.com/watch?v=m7t75u72vd'>ABC</a>",
    3 => 'https://www.youtube.com/watch?v=df82vnx07s',
    4 => '<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>'
];
foreach ($useCases as $index => $useCase) {
    $matches = [];
    preg_match($regex, $useCase, $matches);
    if ($matches) {
        echo 'The regex was matched in usecase #' . $index . PHP_EOL;
    }
}
// Echoes:
// The regex was matched in usecase #3
// The regex was matched in usecase #4

All you need is to add a (?![^<]*>) negative lookahead that will fail the match if the match is followed with 0+ chars other than < followed with >:
[a-zA-Z\/:.]*youtu(?:be\.com\/watch\?v=|\.be\/)([a-zA-Z0-9\-_]+)(?![^<]*>)
                                                                ^^^^^^^^^^
See the regex demo
Note that I also escaped the . symbols so they match literal dots, and used a non-capturing group for the be part. You may replace ([a-zA-Z0-9\-_]+) with [a-zA-Z0-9_-]+ if you are not interested in the capture, and you may also replace the [a-zA-Z\/\/:\.]* part with a more precise pattern, like https?:\/\/[a-zA-Z.]*.
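A runnable sketch of the lookahead approach, assuming the first URL in the question sits inside an href attribute. The sample HTML and the variable names are made up for illustration, and the pattern uses ~ as the delimiter so the slashes need no escaping.

```php
<?php
// Sample input: one URL inside a tag attribute, two in plain text.
$html = <<<'HTML'
<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>
https://www.youtube.com/watch?v=df82vnx07s
<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>
HTML;

// (?![^<]*>) rejects a match when a ">" follows before any "<",
// which is the case for text that sits inside a tag.
$re = '~https?://[a-zA-Z.]*youtu(?:be\.com/watch\?v=|\.be/)([a-zA-Z0-9_-]+)(?![^<]*>)~';

preg_match_all($re, $html, $m);
print_r($m[1]); // video IDs found outside of tags
```

This is a heuristic, not an HTML parser: a stray ">" in plain text after a URL would also suppress that match.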

Example solution:
(?![^<]*>)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)
Visualization with an explanation

Regex take everything after word and before character in PHP

I'm trying to get a regex to take everything after "test" and before "@" in an email, so "test-12345@example.com" would become 12345.
This is as far as I've got; it returns the digits right before the "@" symbol. (Working in PHP)
!(\d+)@!

Either you can use a capturing group with the regex
test-(\d+)@
and read $1, or use a lookbehind and a lookahead like (?<=test-)\d+(?=@), which will match just 12345
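Both variants side by side, as a small sketch that assumes the separator is the usual @ of an email address and uses the sample input from the question:

```php
<?php
$input = 'test-12345@example.com';

// Variant 1: capturing group, the digits end up in index 1.
preg_match('/test-(\d+)@/', $input, $withGroup);

// Variant 2: lookbehind and lookahead, the digits are the whole match (index 0).
preg_match('/(?<=test-)\d+(?=@)/', $input, $withLookaround);

echo $withGroup[1], PHP_EOL;
echo $withLookaround[0], PHP_EOL;
```

The lookaround version keeps the match array to a single element, while the capturing group is easier to extend when more parts of the address are needed.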

(?<=test-)[^@]+
You can try this. No need to use groups. See the demo.
https://regex101.com/r/eZ0yP4/28

You want everything between test and @, so don't use \d.
$myRegexPattern = '#test([^@]*)@#i';
preg_match($myRegexPattern, $input, $matches);
$whatYouNeed = $matches[1];

Try this
$input = 'test-12345#example.com';
$regexPattern = '/^test(.*?)@/';
preg_match ($regexPattern, $input, $matches);
$whatYouNeed = $matches[1];
var_dump($whatYouNeed);

How to use preg_match to extract data?

I am pretty new to preg_match. I searched a lot for an answer before posting this question and found many posts about extracting data such as YouTube IDs, but nothing that fits my needs. If it's a silly question, please forgive me.
I need to get the ID from a string with preg_match. the string is in the format
[#1234] Subject
How can I extract only "1234" from the string?

One solution is:
\[#(\d+)\]
This matches the left square bracket and pound sign [#, then captures one or more digits, then the closing right square bracket ].
You would use it like:
preg_match( '/\[#(\d+)\]/', '[#1234] Subject', $matches);
echo $matches[1]; // 1234
You can see it working in this demo.

You can try this:
preg_match('~(?<=\[#)\d+(?=])~', $txt, $match);
(?<=..) is a lookbehind (only a check)
(?=..) is a lookahead

Your regular expression:
preg_match('/^\[\#([0-9]+)\].+/i', $string, $array);

That's a way you could do it:
<?php
$subject = "[#1234] Subject";
$pattern = '/^\[\#([0-9]+)/';
preg_match($pattern, $subject, $matches);
echo $matches[1]; // 1234
?>

To get only the integer you can use subpatterns http://php.net/manual/en/regexp.reference.subpatterns.php
$string="[#1234] Subject";
$pattern="/\[#(?P<my_id>\d+)](.*?)/s";
preg_match($pattern,$string,$match);
echo $match['my_id'];

regex function[filename] pattern and function[string_with_escaped_characters] pattern

I'm trying to script and parse a file,
Please help with regex in php to find and replace the following patterns:
From: "This is a foo[/www/bar.txt] within a foo[/etc/bar.txt]"
To: "This is a bar_txt_content within a bar2_txt_content"
Something along those lines:
$subject = "This is a foo[/www/bar.txt] within a foo[/etc/bar.txt]";
$pattern = '/regex-needed/';
preg_match($pattern, $subject, $matches);
foreach($matches as $match) {
$subject = str_replace('foo['.$match[0].']', file_get_contents($match[0]), $subject);
}
And my second request is to have:
From: 'This is a foo2[bar bar \] bar bar].'
To: "This is a returned."
Something along those lines:
$subject = 'This is a foo2[bar bar \] bar bar].';
$pattern = '/regex-needed/';
preg_match($pattern, $subject, $matches);
foreach($matches as $match) {
$subject = str_replace('foo2['.$match[0].']', my_function($match[0]), $subject);
}
Please help in constructing these patterns...

If you always have a structure like foo[ ... ]
then it is very easy:
foo\[([^]]+)\]
That is .NET syntax, but I'm sure the expression is simple enough for you to convert.
Description of the regex:
Match the characters “foo” literally «foo»
Match the character “[” literally «[»
Match the regular expression below and capture its match into backreference number 1 «([^]]+)»
Match any character that is NOT a “]” «[^]]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “]” literally «]»
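A possible PHP translation of that expression, sketched with preg_replace_callback. The $contents lookup table is a hypothetical stand-in for file_get_contents() so the example runs without real files.

```php
<?php
$subject = 'This is a foo[/www/bar.txt] within a foo[/etc/bar.txt]';

// Hypothetical stand-in for reading the files from disk.
$contents = [
    '/www/bar.txt' => 'bar_txt_content',
    '/etc/bar.txt' => 'bar2_txt_content',
];

$result = preg_replace_callback(
    '/foo\[([^]]+)\]/',
    function (array $m) use ($contents) {
        // $m[1] is the text between the brackets; unknown paths are left untouched.
        return $contents[$m[1]] ?? $m[0];
    },
    $subject
);

echo $result, PHP_EOL;
```

Using a callback avoids the separate match-then-str_replace loop from the question, since every occurrence is replaced in one pass.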

Luc,
this should help you get started.
http://php.net/manual/en/function.preg-replace.php
You may have to setup a loop and increase the counter, using preg_replace with a limit of 1 to replace only the first instance.
In order to match foo[/www/bar.txt]:
the regex should be something like:
foo\[\/www\/([A-Za-z0-9]*)\.txt\]
The backslashes are there to cancel the special meaning of some characters in your regexp.
It will match foo[/www/[some file name].txt], and ${1} will contain the file name without the .txt, because round brackets form groups that can be used in the replacement expression. ${1} will contain what was matched in the first pair of round brackets, ${2} what was matched in the second, and so on.
Therefore your replaced expression should be something like "${1}_txt_content". Or in the second iteration "${1}2_txt_content".
[A-Za-z0-9]* means any alphanumeric character 0 or more times, you may want to replace the * with a + if you want at least 1 character.
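So try:
$pattern = '/foo\[\/www\/([A-Za-z0-9]*)\.txt\]/';
// Single quotes keep PHP from treating ${1} as a variable.
$replace = '${1}_txt_content';
$total_count = 1;
do {
    // Replace one match per pass and keep the result for the next pass.
    $subject = preg_replace($pattern, $replace, $subject, 1, $count);
    // PHP concatenates strings with ".", not "+".
    $replace = '${1}' . ++$total_count . '_txt_content';
} while ($count != 0);
echo $subject;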
(warning, this is my first ever PHP program, so it may have mistakes as I cannot test it ! but I hope you get the idea)
Hope that helps !
Tony
PS: I am not a PHP programmer but I know this works in C#, for example, and looking at the PHP documentation it seems that it should work.
PS2: I always keep this website bookmarked for reference when I need it: http://www.regular-expressions.info/

$pattern = '/\[([^\]]+)\]/';
preg_match_all($pattern, $subject, $matches);
print_r($matches[1]);

found the correct regex I needed for escaping:
'/foo\[[^\[]*[^\\\]\]/'
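A sketch of how that pattern might be applied to the second request, adapted here to the foo2 prefix and given a capturing group. Both adaptations are assumptions, and $my_function is a hypothetical stand-in for my_function() from the question.

```php
<?php
$subject = 'This is a foo2[bar bar \] bar bar].';

// Hypothetical stand-in for my_function() from the question.
$my_function = function (string $argument): string {
    return 'returned';
};

// [^\\\\] in the PHP string becomes [^\\] in the regex: the closing "]"
// only counts when the character before it is not a backslash.
$pattern = '/foo2\[([^\[]*[^\\\\])\]/';

$result = preg_replace_callback(
    $pattern,
    function (array $m) use ($my_function) {
        // $m[1] is the bracket content, escaped "\]" included.
        return $my_function($m[1]);
    },
    $subject
);

echo $result, PHP_EOL;
```

Note that the pattern requires at least one character between the brackets and does not handle an escaped backslash directly before the closing bracket.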
