Regexp for extracting a mailto: address - php

I'd like a regular expression that takes a block of text and finds the strings matching this format:
....
For every string that matches this format, it should extract the email address found after the mailto:. Any thoughts?
This is needed for an internal app and not for any spammer purposes!

If you want to match the whole anchor element, from the opening tag to the closing tag:
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>(.*?)\<\/a\>`ism';
preg_match_all($r,$html, $matches, PREG_SET_ORDER);
To make it faster and shorter, match only the opening tag:
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>`ism';
preg_match_all($r,$html, $matches, PREG_SET_ORDER);
The second capture group holds the email address.
Example:
$html ='<div><a href="mailto:test@live.com">test</a></div>';
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>(.*?)\<\/a\>`ism';
preg_match_all($r,$html, $matches, PREG_SET_ORDER);
var_dump($matches);
Output:
array(1) {
[0]=>
array(5) {
[0]=>
string(39) "<a href="mailto:test@live.com">test</a>"
[1]=>
string(1) " "
[2]=>
string(13) "test@live.com"
[3]=>
string(0) ""
[4]=>
string(4) "test"
}
}
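As a rough sketch of how the shorter pattern can be used to collect only the addresses: the snippet below runs it over a made-up piece of markup (the names and addresses are hypothetical) and pulls the second capture group out of every match with array_column(). The redundant backslashes from the pattern above are dropped here; that is only a cosmetic assumption and should not change what it matches.

```php
<?php
// Hypothetical sample markup; the addresses are invented for illustration.
$html = '<p><a href="mailto:alice@example.com">Alice</a> and '
      . '<a class="c" href="mailto:bob@example.org">Bob</a></p>';

// Same idea as the shorter pattern above: group 2 is the text after "mailto:".
$r = '`<a([^>]+)href="mailto:([^">]+)"([^>]*)>`is';
preg_match_all($r, $html, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each match is its own array, so column 2 is the address.
$emails = array_column($matches, 2);
print_r($emails);
```

With PREG_SET_ORDER every element of $matches is one complete match, which is why array_column() can pick out the address directly.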

There are plenty of different options on regexp.info
One example would be:
\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b
The "mailto:" is trivial to prepend to that.

/(mailto:)(.+)(\")/
The second matching group will be the email address.

You can also use PHP's built-in filter extension: http://us3.php.net/manual/en/book.filter.php
It includes filters specifically for validating and sanitizing email addresses, such as FILTER_VALIDATE_EMAIL.
Greets
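A minimal sketch of how the filter could be combined with a regex: treat whatever the regex captured as a candidate and keep only the values that filter_var() accepts. The candidate strings below are made up for illustration.

```php
<?php
// Candidate strings, e.g. the capture group from a mailto: regex.
// These values are invented samples.
$candidates = ['test@live.com', 'not-an-email', 'test@@live.com'];

$valid = [];
foreach ($candidates as $candidate) {
    // filter_var() returns the filtered value on success and false on failure.
    if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
        $valid[] = $candidate;
    }
}
print_r($valid);
```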

For me, ~<mailto(.*?)>~ worked.
It returns an array containing the matches found.
Here you can test it: https://regex101.com/r/rTmKR4/1

Related

What is the pattern to search for any string that follows the format "CEC0000-0000"?

The zeros can be any digits, but each group must be exactly four digits long, so it could be CEC0152-2005,
with a "-" between the two groups, of course.
I used www.txt2re.com to generate this pattern, but it didn't help me.
Maybe,
^[A-Z]{3}[0-9]{4}-[0-9]{4}$
or,
^CEC[0-9]{4}-[0-9]{4}$
might work fine.
Test
$re = '/^[A-Z]{3}[0-9]{4}-[0-9]{4}$/m';
$str = 'CEC0152-2005
CEC0152-2019
CEC0152-1999
CEC0152-19991';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(3) {
[0]=>
array(1) {
[0]=>
string(12) "CEC0152-2005"
}
[1]=>
array(1) {
[0]=>
string(12) "CEC0152-2019"
}
[2]=>
array(1) {
[0]=>
string(12) "CEC0152-1999"
}
}
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also see how it matches against sample inputs.
RegEx Circuit
jex.im can be used to visualize regular expressions.
If after the dash we'd have a four-digit year,
^[A-Z]{3}[0-9]{4}-[12][0-9]{3}$
^CEC[0-9]{4}-[12][0-9]{3}$
might also work fine, I guess.
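A quick way to check the year-constrained variant is to run it over a few hand-picked strings. This is only a sketch; the sample values are assumptions chosen to hit each branch (valid, wrong leading year digit, too many digits, wrong prefix).

```php
<?php
// Year-constrained pattern from above: the part after the dash must start with 1 or 2.
$re = '/^CEC[0-9]{4}-[12][0-9]{3}$/';

// Invented sample inputs.
$samples = ['CEC0152-2005', 'CEC0152-3005', 'CEC0152-19991', 'ABC0152-2005'];

$results = [];
foreach ($samples as $s) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$s] = preg_match($re, $s) === 1;
}
var_dump($results);
```

Only the first sample should pass; the others fail on the year digit, the length, and the prefix respectively.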

preg_replace replace only once even if the match is found

My HTML form code replaces some words with <-#word#-> using the code
$string = preg_replace("/($p)/i", '<-#$1#->', $string);
The problem is that if the form has errors, the word is wrapped again each time someone resubmits it and becomes <-#<-#<-#word#->#->#->. Is it possible to do the replacement only if the word has not already been replaced?
This is what I tried, using ^ as a NOT operator, but it is not working:
$string = preg_replace("/^(<-#)($p)^(#->)/i", '<-#$1#->', $string);
You could use negative lookarounds to assert that what is directly on the left is not <-# and what is directly on the right is not #->:
(?<!<-#)(word)(?!#->)
Your code could look like:
$string = preg_replace("/(?<!<-#)($p)(?!#->)/i", '<-#$1#->', $string);
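To see that the lookaround version does not wrap a word twice, one can apply it two times in a row and compare the results. The sketch below assumes $p is the plain word "word"; passing it through preg_quote() is an extra precaution that is not part of the original code.

```php
<?php
// Assumption: the search term is a plain word; preg_quote() escapes any regex
// metacharacters (and the "/" delimiter) so it is matched literally.
$p = preg_quote('word', '/');

$wrap = function (string $s) use ($p): string {
    // Wrap $p only when it is not already preceded by "<-#" and followed by "#->".
    return preg_replace("/(?<!<-#)($p)(?!#->)/i", '<-#$1#->', $s);
};

$once  = $wrap('a word here'); // first submission
$twice = $wrap($once);         // simulated resubmission

var_dump($once, $twice);
```

Both calls should give the same string, because on the second pass the lookbehind sees <-# directly before the word and the match is rejected.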
Another approach is to use preg_match_all() to capture the text inside any number of repeated wrappers and then rebuild it with a single pair:
$string = '<-#<-#<-#Any alphanumeric input that user may wish#->#->#->';
preg_match_all("/(<-#)+([A-Za-z0-9_\s]+)(#->)+/s", $string, $matches);
$string = '<-#' . $matches[2][0] . '#->';
var_dump($string);
which outputs:
string(47) "<-#Any alphanumeric input that user may wish#->"
var_dump($matches); would return:
array(4) {
[0]=>
array(1) {
[0]=>
string(59) "<-#<-#<-#Any alphanumeric input that user may wish#->#->#->"
}
[1]=>
array(1) {
[0]=>
string(3) "<-#"
}
[2]=>
array(1) {
[0]=>
string(41) "Any alphanumeric input that user may wish"
}
[3]=>
array(1) {
[0]=>
string(3) "#->"
}
}

Regexp for string which shouldn't contain two known chars

For example
I have a string like "12345%67890"
Regexp [^%]* gives me 12345.
How can I get the same result if the delimiter is not "%" but "<%", for example? Thanks a lot.
A bit more information:
I have a huge text in which I replace tokens delimited by %, such as %test%, using preg_match_all and preg_replace. If a % appears as something other than a delimiter, everything breaks, e.g. %test 90% test%. So I've decided to change the delimiter to something more distinctive, like <% test 90% test %>.
Based on your new information it sounds like you control the output, which makes this all kind of weird.
In any case, here's a regex that will capture the contents of the wrapper you've created:
<%(.+?)%>
Notice the ? for a lazy match.
Code sample:
$string = "asdfar <%test123%>farasr%<5 sara><%90% is cool%%><%ooooaaaah%>>>%<%>%%";
preg_match_all('/<%(.+?)%>/', $string, $matches);
var_dump($matches);
Output:
array(2) {
[0]=>
array(3) {
[0]=>
string(11) "<%test123%>"
[1]=>
string(16) "<%90% is cool%%>"
[2]=>
string(13) "<%ooooaaaah%>"
}
[1]=>
array(3) {
[0]=>
string(7) "test123"
[1]=>
string(12) "90% is cool%"
[2]=>
string(9) "ooooaaaah"
}
}
Seems to me you should be doing a split, not a match:
$subject = "12345<%67890";
$result = preg_split('/<%/', $subject);
print_r($result);
output:
Array
(
[0] => 12345
[1] => 67890
)
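For the literal question (take everything up to the first "<%", the way [^%]* does for a single character), a negative lookahead can stand in for the negated character class. The sketch below is one possible way to do it, not the only one; strstr() with its third argument is shown as a regex-free alternative.

```php
<?php
$subject = '12345<%67890';

// (?:(?!<%).)* consumes one character at a time as long as "<%" does not start
// at the current position. Note that "." does not match newlines without the s flag.
preg_match('/^(?:(?!<%).)*/', $subject, $m);
$viaRegex = $m[0];

// strstr() with before_needle = true returns the part before the first "<%".
$viaStrstr = strstr($subject, '<%', true);

var_dump($viaRegex, $viaStrstr);
```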

PHP: regex to match complete matching brackets?

In PHP I have the following string:
$text = "test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value}{blabla}{option:second{B}.Value}
{option:third{C}.Value}{option:fourth{D}}
{option:fifth}
test 2
";
I need to get all {option...} out of this string (5 in total in this string). Some have multiple nested brackets in them, and some don't. Some are on the same line, some are not.
I already found this regex:
(\{(?>[^{}]+|(?1))*\})
so the following works fine :
preg_match_all('/(\{(?>[^{}]+|(?1))*\})/imsx', $text, $matches);
The text that's not inside curly brackets is filtered out, but the matches also include the blabla-items, which I don't need.
Is there any way this regex can be changed to only include the option-items?
This problem is far better suited to a proper parser, however you can do it with regex if you really want to.
This should work as long as you're not embedding options inside other options.
preg_match_all(
'/{option:((?:(?!{option:).)*)}/',
$text,
$matches,
PREG_SET_ORDER
);
Quick explanation.
{option: // literal "{option:"
( // begin capturing group
(?: // don't capture the next bit
(?!{option:). // everything NOT literal "{option:"
)* // zero or more times
) // end capture group
} // literal closing brace
var_dumped output with your sample input looks like:
array(5) {
[0]=>
array(2) {
[0]=>
string(23) "{option:first{A}.Value}"
[1]=>
string(14) "first{A}.Value"
}
[1]=>
array(2) {
[0]=>
string(24) "{option:second{B}.Value}"
[1]=>
string(15) "second{B}.Value"
}
[2]=>
array(2) {
[0]=>
string(23) "{option:third{C}.Value}"
[1]=>
string(14) "third{C}.Value"
}
[3]=>
array(2) {
[0]=>
string(18) "{option:fourth{D}}"
[1]=>
string(9) "fourth{D}"
}
[4]=>
array(2) {
[0]=>
string(14) "{option:fifth}"
[1]=>
string(5) "fifth"
}
}
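One caveat worth checking: in the question's text as shown, {blabla} sits between two options on the same line, and a greedy match that only stops before the next "{option:" may then run on through {blabla}. A possible alternative, sketched below, reuses the recursive pattern from the question for the nested braces and simply requires the outer group to start with "{option:". It assumes braces inside an option are balanced.

```php
<?php
$text = "test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value}{blabla}{option:second{B}.Value}
{option:third{C}.Value}{option:fourth{D}}
{option:fifth}
test 2
";

// \{option:                      literal opener
// (?: [^{}]++                    a run of non-brace characters
//   | (\{(?:[^{}]++|(?1))*\})    or a balanced {...} group (group 1, recursive)
// )*
// \}                             the brace that closes the option
$re = '/\{option:(?:[^{}]++|(\{(?:[^{}]++|(?1))*\}))*\}/';

preg_match_all($re, $text, $matches);
print_r($matches[0]);
```

Only $matches[0] (the full matches) is used here; the inner group exists just to make the recursion possible.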
Try this regular expression - it was tested using .NET regular expressions, it may work with PHP as well:
\{option:.*?{\w}.*?}
Please note: I'm assuming that there is only one pair of brackets inside, and that this pair contains only a single alphanumeric character.
I modified your initial expression to search for the string '(option:)' appended with non-whitespace characters (\S*), bounded by curly braces '{}'.
\{(option:)\S*\}
Given your input text, the following entries are matched in regexpal:
test 1
{blabla:database{test}}
{blabla:testing}
{option:first{A}.Value} {option:second{B}.Value}
{option:third{C}.Value}
{option:fourth{D}}
{option:fifth}
test 2
If you don't have multiple pairs of brackets on the same level, this should work:
/(\{option:(([^{]*(\{(?>[^{}]+|(?4))*\})[^}]*)|([^{}]+))\})/imsx

PHP preg_match_all same line

Having trouble with a regular expression (they are not my strong suit). I'm trying to match all strings between {{ and }}, but if a set of brackets occurs on the same line, it counts that as a single match... Example:
$string = "
Hello, kind sir
{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}
welcome to
{{SHOULD_MATCH3}}
";
preg_match_all("/{{(.*)}}/", $string, $matches);
var_dump($matches); // returns arrays with 2 results instead of 3
returns:
array(2) {
[0]=>
array(2) {
[0]=>
string(35) "{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}"
[1]=>
string(17) "{{SHOULD_MATCH3}}"
}
[1]=>
array(2) {
[0]=>
string(31) "SHOULD_MATCH1}} {{SHOULD_MATCH2"
[1]=>
string(13) "SHOULD_MATCH3"
}
}
Any help? Thanks!
Replace the * quantifier with its non-greedy form *?.
This will make it match as little as possible while still allowing the expression to match as a whole, which is different from its current behavior of matching as much as possible.
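Applied to the sample from the question, the change looks roughly like this. The braces are escaped here as a precaution; that is an assumption about style, not something the original pattern required.

```php
<?php
$string = "
Hello, kind sir
{{SHOULD_MATCH1}} {{SHOULD_MATCH2}}
welcome to
{{SHOULD_MATCH3}}
";

// .*? is lazy: it stops at the first "}}" instead of the last one on the line.
preg_match_all('/\{\{(.*?)\}\}/', $string, $matches);

// $matches[1] holds the text between each pair of double braces.
print_r($matches[1]);
```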
You can use one of the following patterns:
{{(.+?)}
{{([^}]+)
{{(\w+)
{{([[:digit:][:upper:]_]+)
{{([\p{Lu}\p{N}_]+)
