regular expression to detect mentions but not detecting emails

I have the following code:
preg_match('/#([^# ]+)/', $image->caption->text, $matches)
I want to detect mentions in a string. The problem is that the pattern also matches email addresses: for example, aksdjasd#yahoo.com counts as a match. What I want is to require a space before the # sign. How do I express that in this regex?
EDIT:
I also want to detect #mentions at the beginning of the string.

before the # sign there should be a space
You can use lookbehind (edited based on OP's comment below):
preg_match('/(?<= |^)#[^# ]+/', $image->caption->text, $matches);

My take:
preg_match('/(?<=\W|^)#(\w+)/', "#Easy? No.#anubhava try harder! #\t", $matches);
preg_match('/(?<=\W|^)#(\w+)/', "Easy? No.#anubhava try harder! #\t", $matches);
preg_match('/(?<=\W|^)#(\w+)/', "Easy? No.anubhava try harder! e#m #\t", $matches);
These calls find #Easy in the first string and #anubhava in the second, and find nothing in the third (neither the lone # before the tab nor the email-like e#m).
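The three calls above can be checked in one pass. This is only a minimal sketch: the $samples and $firstMentions names are made up for illustration, and because preg_match stops at the first match, only the first mention in each string is collected.

```php
<?php
// Same pattern as above: "#" must follow a non-word character or start the string.
$pattern = '/(?<=\W|^)#(\w+)/';

// Illustrative inputs, copied from the calls above.
$samples = [
    "#Easy? No.#anubhava try harder! #\t",
    "Easy? No.#anubhava try harder! #\t",
    "Easy? No.anubhava try harder! e#m #\t",
];

$firstMentions = [];
foreach ($samples as $sample) {
    // preg_match returns 1 on a match and 0 otherwise; group 1 is the name without "#".
    $firstMentions[] = preg_match($pattern, $sample, $m) === 1 ? $m[1] : null;
}

var_dump($firstMentions); // "Easy", "anubhava", NULL
```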

You can use a lookbehind to check what precedes the match.
Try: preg_match('/(?<![^\s])#([^#\s]+)/', $image->caption->text, $matches);
Note that you also want \s in the character class that follows the #. Otherwise you could end up matching #stuff<NEWLINE><nonmatch>. Some quick testing showed that checking for a literal space alone causes problems in several cases.
This uses a negative lookbehind because a positive lookbehind for whitespace would not match a mention at the very start of the string, where there is nothing in front of the #.
Checking for a space alone is risky; it is safer to check for any whitespace character, while still allowing the mention to start the string.
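To collect every mention rather than only the first, the same negative lookbehind can be used with preg_match_all. A minimal sketch, assuming the caption is available as a plain string; the find_mentions helper and the sample caption are made up for illustration.

```php
<?php
// Hypothetical helper: return the names of all "#mentions" that start the
// string or follow a whitespace character.
function find_mentions(string $text): array
{
    // (?<![^\s]) fails only when the previous character is non-whitespace,
    // so "aksdjasd#yahoo.com" is skipped because "d" precedes the "#".
    preg_match_all('/(?<![^\s])#([^#\s]+)/', $text, $matches);

    // Group 1 holds each mention without the leading "#".
    return $matches[1];
}

$caption = "#first hello aksdjasd#yahoo.com and\n#second";
print_r(find_mentions($caption)); // first, second
```

The mention after the newline is found because the lookbehind accepts any whitespace, not just a space.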

Related

(preg_match ('#^/thank-you/hello/#', $_SERVER['REQUEST_URI'])

So basically I'm trying to select all content that is in /thank-you/hello/, so this can be /thank-you/hello/x/, /thank-you/hello/y/, /thank-you/hello/z/, etc.
This is what I'm using right now:
preg_match('#^/thank-you/hello/#', $_SERVER['REQUEST_URI'])
This block of code only works for stuff that is in /thank-you/hello/.
How should I change this snippet to include all the other folders that are after /hello/?

I suggest you read more about regex.
I also recommend regex101 for testing and studying patterns.
In your pattern, you can stand in for the part that varies with .*?
.: Matches any character other than newline (or including line terminators with the /s flag)
a*: Matches zero or more consecutive a characters.
a?: Matches an a character or nothing. Placed after another quantifier, as in .*?, the ? instead makes that quantifier lazy, so it matches as few characters as possible.
These descriptions may seem a little incomplete without examples; regex101 shows examples for each.
Example:
preg_match('#^/thank-you/hello/.*?/#', $_SERVER['REQUEST_URI']);
This may not be exactly what you want, and your requirements may change later, so you may need to adjust it.
I think everyone should learn regex so that they can implement exactly what they need.
I do not think it is a good idea to use patterns whose meaning you do not know.
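A pattern anchored with ^ but not with $ only constrains the start of the string, so it accepts anything that follows the prefix. The sketch below compares that with a variant that insists on at least one subfolder. It assumes the request path is passed in as a plain string instead of being read from $_SERVER; the helper names and sample paths are made up for illustration.

```php
<?php
// Hypothetical helper: true for any path that begins with the prefix.
function is_under_hello(string $uri): bool
{
    // No "$" anchor, so whatever follows the prefix is accepted.
    return preg_match('#^/thank-you/hello/#', $uri) === 1;
}

// Hypothetical helper: true only when a non-empty subfolder follows the prefix.
function has_subfolder(string $uri): bool
{
    // [^/]+ requires at least one character that is not a slash before the next "/".
    return preg_match('#^/thank-you/hello/[^/]+/#', $uri) === 1;
}

var_dump(is_under_hello('/thank-you/hello/x/')); // true
var_dump(has_subfolder('/thank-you/hello/'));    // false
```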

Regex For PHP Code?

I have the following code
<?php
drupal_set_message("Your registration submission has been received.");
drupal_goto("/events-initiatives/events-listing");
?>
I want to remove everything except the text Your registration submission has been received. This message will change, so I need a wildcard. For example, it should also match:
<?php
drupal_set_message("Testing!!!");
drupal_goto("/events-initiatives/events-listing");
?>
But I can't figure out how to write the regex for this PHP code. My current attempt is
preg_replace('#(<?php drupal_set_message(").*?("); drupal_goto("/guidelines-resources/professionals/lending-library"); ?>)#', '$1$2', $string);
but that isn't working; it seems to have problems with the ( in it.
Any idea how I could do this?

From looking at your original post, (before your regex was changed into a PHP snippet) I'd suggest you are looking for a regex along these lines:
#<\?php\s+drupal_set_message\(".*?"\);\s+drupal_goto\("/guidelines-resources/professionals/lending-library"\);\s+\?>#
Note that this regex:
escapes all special characters (e.g., ?, ( and )) with preceding backslashes
replaces each single space with \s+, which matches one or more consecutive whitespace characters
EDIT
After rereading your question, if the only thing you want left is the text that is passed as an argument to drupal_set_message, then try this:
$pattern = '#\bdrupal_set_message\("(.*?)"\)#';
$found = preg_match($pattern, $subject, $matches);
// if found, $matches[1] will contain the argument to drupal_set_message
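Applied to the snippet from the question, that pattern might be used as follows. This is a sketch under assumptions: the extract_message helper is hypothetical, and the source code to search is taken to be available as a string.

```php
<?php
// Hypothetical helper: return the first argument passed to drupal_set_message,
// or null when no such call is found in the given source text.
function extract_message(string $source): ?string
{
    // The parentheses of the call are escaped; (.*?) lazily captures the text
    // between the double quotes.
    $pattern = '#\bdrupal_set_message\("(.*?)"\)#';

    return preg_match($pattern, $source, $matches) === 1 ? $matches[1] : null;
}

// Sample input mirroring the code in the question.
$source = '<?php' . "\n"
    . 'drupal_set_message("Testing!!!");' . "\n"
    . 'drupal_goto("/events-initiatives/events-listing");';

echo extract_message($source), "\n"; // Testing!!!
```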

You can escape the special characters (though really, just the open and close parentheses) with backslashes. On a side note, if you have a decent IDE then it should have sophisticated regex-capable search-and-replace; use it (although if you do, you'll probably need to also escape the forward slashes, as those are the most likely delimiters that your IDE would use).

Regex ignore URL already in HTML tags

I'm having a little problem with my Regex
I've made a custom BBcode for my website, however I also want URLs to be parsed too.
I'm using preg_replace and this is the pattern used to identify URLS:
/([\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])/is
Which works great, however if a URL is within a [img][/img] block, the above pattern also picks it up and produces a result like this:
//[img]http://url.com/toimg.jeg[/img] will produce this result:
<img src="<a href="http://url.com/toimg.jeg" target="_blank">/>
//When it should produce:
<img src="http://url.com/toimg.jeg"/>
I tried using this:
/([^"][\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/][^"])/is
With no luck.
Any help will be appreciated.
Edit:
For the solution, see the 2nd comment on stema's answer.

Try this
(?<!href=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])
To make it more general, you can simplify the lookbehind so that it checks only for ="
(?<!=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])
(?<!href=") is a negative lookbehind assertion; it ensures that your pattern is not preceded by href=".
\b is a word boundary that anchors the start of your link to a transition from a non-word character to a word character. Without it, the lookbehind would be useless, because the pattern would simply match from "ttp://..." on.
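A minimal sketch of the general lookbehind inside a preg_replace call, assuming the [img] BBcode has already been converted to an img tag before URLs are linked. The linkify function name and the sample URLs are made up for illustration, the replacement markup is borrowed from the output shown in the question, and the hyphen in the character class is escaped so that it is unambiguously literal.

```php
<?php
// Hypothetical helper: wrap bare URLs in <a> tags, skipping any URL that
// directly follows =" (for example inside src="..." or href="...").
function linkify(string $html): string
{
    // (?<!=") rejects URLs that sit in an attribute value; \b prevents the
    // engine from starting one character later, at "ttp://".
    $pattern = '/(?<!=")\b(\w+:\/\/[\w\-?&;#~=.\/]+[\w\/])/i';

    return preg_replace($pattern, '<a href="$1" target="_blank">$1</a>', $html);
}

echo linkify('<img src="http://url.com/toimg.jeg"/> see http://example.com/page'), "\n";
```

The URL inside src is left alone, while the bare URL after "see" is wrapped in a link.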

Is this a PHP preg_match bug or am I doing something wrong?

I use the following regular expression to match valid domain names:
/^([a-z0-9]([a-z0-9]*-*[a-z0-9])*\.)+[a-z][a-z]+$/
This works fine. But when I replace part of it with a domain name to match the domain name itself and sub domains of it, it doesn't work. For example, if I use
/^([a-z0-9]([a-z0-9]*-*[a-z0-9])*\.)*mycarbrokedown.be$/
to match ns1.mycarbrokedown.be, preg_match returns 0.
I've used a couple online testers which confirm that my regular expression does match my string. Curiously, regextester.com doesn't return anything when I use the preg option.
All of this leads me to think that it's a bug in PHP. As I have no idea what's causing the bug, I haven't been able to find matching bug reports.
What's going on here?

Please try this regex and let me know the results. It might be that you have whitespace or something else at the start or end of the string, which would explain why it doesn't match, so I removed ^ and $. This regex is from the RegexBuddy library.
preg_match_all('/([a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}/i', $subject, $result, PREG_PATTERN_ORDER);
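When a fixed domain name is spliced into a pattern, its dots are regex metacharacters unless they are escaped, and preg_match returns false (not 0) when matching itself fails. The sketch below takes both points into account. It is not a diagnosis of the original problem: the matches_domain helper and the simplified label pattern are assumptions made for illustration.

```php
<?php
// Hypothetical helper: true when $host is $domain itself or a subdomain of it.
function matches_domain(string $host, string $domain): bool
{
    // One DNS-style label: starts and ends with a letter or digit, hyphens
    // allowed in between. A simplified stand-in for the pattern in the question.
    $label = '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?';

    // preg_quote escapes the dots in the domain so they only match a literal ".".
    $pattern = '/^(?:' . $label . '\.)*' . preg_quote($domain, '/') . '$/i';

    $result = preg_match($pattern, $host);
    if ($result === false) {
        // false signals an error (e.g. a backtracking limit), not "no match".
        throw new RuntimeException('preg_match failed, error code ' . preg_last_error());
    }

    return $result === 1;
}

var_dump(matches_domain('ns1.mycarbrokedown.be', 'mycarbrokedown.be')); // true
```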

Need a good regex to convert URLs to links but leave existing links alone

I have a load of user-submitted content. It is HTML, and may contain URLs. Some of them will be <a>'s already (if the user is good) but sometimes users are lazy and just type www.something.com or at best http://www.something.com.
I can't find a decent regex to capture URLs but ignore ones that are immediately to the right of either a double quote or '>'. Anyone got one?

Jan Goyvaerts, creator of RegexBuddy, has written a response to Jeff Atwood's blog that addresses the issues Jeff had and provides a nice solution.
\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]
In order to ignore matches that occur right next to a " or >, you could add (?<![">]) to the start of the regex, so you get
(?<![">])\b(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]
This will match full addresses (http://...) and addresses that start with www. or ftp. - you're out of luck with addresses like ars.userfriendly.org...

This thread is old as the hills, but I came across it while working on my own problem: That is, convert any urls into links, but leave alone any that are already within anchor tags. After a while, this is what has popped out:
(?!(?!.*?<a)[^<]*<\/a>)(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&#/%=~_|$?!:,.]*[A-Z0-9+&#/%=~_|$]
With the following input:
http://www.google.com
http://google.com
www.google.com
<p>http://www.google.com<p>
this is a normal sentence. let's hope it's ok.
www.google.com
This is the output of a preg_replace:
http://www.google.com
http://google.com
www.google.com
<p>http://www.google.com<p>
this is a normal sentence. let's hope it's ok.
www.google.com
Just wanted to contribute back to save somebody some time.

I made a slight modification to the Regex contained in the original answer:
(?<![.*">])\b(?:(?:https?|ftp|file)://|[a-z]\.)[-A-Z0-9+&@/%=~_|$?!:,.]*[A-Z0-9+&@/%=~_|$]
which allows for more subdomains, and also runs a fuller check on tags. To apply this with PHP's preg_replace, you can use:
$convertedText = preg_replace( '#(?<![.*">])\b(?:(?:https?|ftp|file)://|[a-z]\.)[-A-Z0-9+&@/%=~_|$?!:,.]*[A-Z0-9+&@/%=~_|$]#i', '<a href="\0" target="_blank">\0</a>', $originalText );
Note, I removed # from the regex, in order to use it as a delimiter for preg_replace. It's pretty rare that # would be used in a URL anyway.
Obviously, you can modify the replacement text, and remove target="_blank", or add rel="nofollow" etc.
Hope that helps.

To skip existing ones just use a look-behind - add (?<!href=") to the beginning of your regular expression, so it would look something like this:
/(?<!href=")http:\/\/\S*/
Obviously this isn't a complete solution for finding all types of URLs, but this should solve your problem of messing with existing ones.

if (preg_match('/\b(?<!=")(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[A-Z0-9+&@#\/%=~_|](?!.*".*>)(?!.*<\/a>)/i', $subject)) {
# Successful match
} else {
# Match attempt failed
}

Shameless plug: You can look here (regular expression replace a word by a link) for inspiration.
The question asked to replace some word with a certain link, unless there already was a link. So the problem you have is more or less the same thing.
All you need is a regex that matches a URL (in place of the word). The simplest assumption would be this: a URL (optionally) starts with "http://", "ftp://" or "mailto:" and lasts as long as there are no white-space characters, line breaks, tag brackets or quotes.
Beware, long regex ahead. Apply case-insensitively.
(href\s*=\s*['"]?)?((?:http://|ftp://|mailto:)?[^.,<>"'\s\r\n\t]+(?:\.(?![.<>"'\s\r\n])[^.,!<>"'\s\r\n\t]+)+)
Be warned - this will also match URLs that are technically invalid, and it will recognize things.formatted.like.this as a URL. Whether it is too permissive depends on your data. I can fine-tune the regex if you have examples where it returns false positives.
The regex will produce two match groups. Group 2 will contain the matched thing, which is most likely a URL. Group 1 will contain either an empty string or an 'href="'. You can use it as an indicator that the match occurred inside the href attribute of an existing link, so you don't have to touch that one.
Once you confirm that this does the right thing for you most of the time (with user-supplied data, you can never be sure), you can do the rest in two steps, as I proposed in the other question:
Make a link around every URL there is (unless there is something in match group 1!). This will produce double nested <a> tags for things that have a link already.
Scan for incorrectly nested <a> tags, removing the innermost one.
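The first of those two steps could be sketched with preg_replace_callback, using group 1 as the "already linked" indicator. This is only an illustration under assumptions: the link_bare_urls helper and the sample text are made up, \s stands in for the separate \s\r\n\t of the original character classes, and the second step (removing nested <a> tags) is not covered.

```php
<?php
// Hypothetical helper: wrap URL-like text in <a> tags unless the match began
// inside an href attribute (i.e. match group 1 is not empty).
function link_bare_urls(string $html): string
{
    $pattern = '~(href\s*=\s*[\'"]?)?((?:http://|ftp://|mailto:)?[^.,<>"\'\s]+'
        . '(?:\.(?![.<>"\'\s])[^.,!<>"\'\s]+)+)~i';

    return preg_replace_callback($pattern, function (array $m): string {
        if ($m[1] !== '') {
            // Inside an existing href: return the match untouched.
            return $m[0];
        }
        // Bare URL-like text: wrap group 2 in a link.
        return '<a href="' . $m[2] . '">' . $m[2] . '</a>';
    }, $html);
}

echo link_bare_urls('Visit http://example.com/a or <a href="http://example.org/x">here</a>'), "\n";
```

In this sample the existing link has plain link text, so no nested tags arise; link text that itself looks like a URL would still be wrapped and would need the clean-up step.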
