I have text like in the example below
$text = "rami#gmail.com:Password
This email is from Gmail
email subscription is valid
omar#yahoo.com:password
this email is from yahoo
email subscription is valid ";
I want to be able to retrieve all email:password occurrences in the text without the rest of the description.
I tried preg_match but it returned 0 results and explode returns all text with the description.
Any help is greatly appreciated
$text = "rami#gmail.com:Password
This email is from Gmail
email subscription is valid
omar#yahoo.com:password
this email is from yahoo
email subscription is valid ";
You can use regex to capture the email and passwords separately.
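The pattern captures everything up to the colon (that part must contain a #), then lazily captures everything after the colon up to the end of the line, allowing optional trailing whitespace before the newline.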
preg_match_all("/(.*#.*):(.*?)\s*\n/", $text, $matches);
$matches = array_combine(["match", "email", "password"], $matches);
var_dump($matches);
Output:
array(3) {
["match"]=>
array(2) {
[0]=>
string(24) "rami#gmail.com:Password
"
[1]=>
string(25) "omar#yahoo.com:password
"
}
["email"]=>
array(2) {
[0]=>
string(14) "rami#gmail.com"
[1]=>
string(14) "omar#yahoo.com"
}
["password"]=>
array(2) {
[0]=>
string(8) "Password"
[1]=>
string(8) "password"
}
}
https://3v4l.org/baeQ0
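If a lookup table is more convenient than three parallel arrays, the two capture groups can be paired up. This is only a minimal sketch built on the pattern above; the variable name $credentials is an assumption, and it presumes each email appears once (later duplicates would overwrite earlier ones).

```php
<?php
// Sample text from the question ("#" stands in for "@", as in the post).
$text = "rami#gmail.com:Password\n"
      . "This email is from Gmail\n"
      . "email subscription is valid\n"
      . "omar#yahoo.com:password\n"
      . "this email is from yahoo\n"
      . "email subscription is valid ";

// Same pattern as above: group 1 is the email, group 2 the password.
preg_match_all('/(.*#.*):(.*?)\s*\n/', $text, $matches);

// Pair each email (key) with its password (value).
$credentials = array_combine($matches[1], $matches[2]);

var_export($credentials);
```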
It's difficult to be confident/precise when dealing with unrealistic input strings, but this pattern extracts (does not validate) the email:password lines for you.
Match from the start of the line, match the known characters and in the negated character classes include whitespace characters to prevent matching the next line. You could use \n instead of \s if you like.
Code: (Demo)
$text = "rami#gmail.com:Password
This email is from Gmail
email subscription is valid
omar#yahoo.com:password
this email is from yahoo
email subscription is valid ";
var_export(preg_match_all('~^[^#\s]+#[^:\s]+:\S+~m', $text, $matches) ? $matches[0]: "none");
Output:
array (
0 => 'rami#gmail.com:Password',
1 => 'omar#yahoo.com:password',
)
If spaces are allowed in a password, you cannot logically trim any spaces from its right side. An alternative pattern that allows spaces and also provides separate capture groups could look like this: (See Demo with a fringe case where the password characters require specific pattern logic to prevent greedy matching in the first capture group.)
var_export(preg_match_all('~([^#\s]+#[^:\s]+):(.*)~', $text, $matches, PREG_SET_ORDER) ? $matches: "none");
I am favoring negated character classes [^...] over . (any character dot) because it allows the use of greedy quantifiers -- this affords the pattern greater efficiency (in terms of step count, anyhow).
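To make the fringe case concrete, here is a small sketch comparing a dot-based pattern with the negated-character-class pattern on a password that itself contains # and :. The sample line is made up for illustration.

```php
<?php
// Hypothetical line whose password contains both "#" and ":".
$line = "rami#gmail.com:Pa#ss:word\n";

// Dot-based pattern: the greedy .* runs to the LAST "#" on the line,
// so part of the password leaks into the email group.
preg_match('/(.*#.*):(.*?)\s*\n/', $line, $greedy);

// Negated classes stop at the first "#" and the first ":".
preg_match('~([^#\s]+#[^:\s]+):(.*)~', $line, $negated);

echo $greedy[1], "\n";   // rami#gmail.com:Pa#ss
echo $negated[1], "\n";  // rami#gmail.com
echo $negated[2], "\n";  // Pa#ss:word
```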
I have an input that goes like this
[d/D/d1/d2/d3/d4/d5/d6/d7/D1/D2/D3/D4/D5/D6/D7]+[\.]+[r1/r2/r3/r4/r5/r6/R1/R2/R3/R4/R5/R6]+[\.]+[number 1 to 37]+[#]+[number 0 - 9 ]
An example would be "d2.r1.4#100.37#1.9#2.3#1(can have as many 1-37 # 0-9 as needed)"
How do I write a regex that lets the last part of the string be dynamic (matching as many groups as are supplied)?
I've tried this expression:
[dD1-7]+\.[rR1-5]+\.
and I'm not sure how to match the dynamic group that comes after the "d2.r1." part.
Assuming you merely need to validate the string (and not capture/extract specific substrings), the following pattern provides the same result as Emma's answer but with a tighter syntax.
The i pattern modifier means you only have to write the two letters in lowercase. I don't use any excess non-capturing groups. Two-character character classes don't need a hyphen. \d is the shorter way of expressing [0-9].
Wrapping the final/repeating characters in parentheses then writing * means the sequence in the parentheses may repeat zero or more times.
Code: (Demo)
$inputs = [
'd2.r1.4#100.37#1.9#2.3#1',
'd2.r1.4#100.37#1.9#2.38#1.8#22',
'd2.r1.4#100.37#1.9#2.3#1.12#2.30#2',
];
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';
foreach ($inputs as $input) {
echo "\n{$input}: ";
var_export((bool)preg_match($pattern, $input));
}
Output:
d2.r1.4#100.37#1.9#2.3#1: true
d2.r1.4#100.37#1.9#2.38#1.8#22: false
d2.r1.4#100.37#1.9#2.3#1.12#2.30#2: true
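If the individual number#count pairs are needed after validation, note that a repeated group only retains its last iteration, so a second pass is one option. The following is a rough sketch under that assumption; $pairs and the second pattern are illustrative names and not part of the answer above.

```php
<?php
$input   = 'd2.r1.4#100.37#1.9#2.3#1';
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';

$pairs = [];
if (preg_match($pattern, $input)) {
    // The string is already validated, so a loose pattern is enough here.
    // ".r1" is skipped because \d+ must follow the dot immediately.
    preg_match_all('/\.(\d+)#(\d+)/', $input, $sets, PREG_SET_ORDER);
    foreach ($sets as [, $number, $count]) {
        $pairs[] = [(int) $number, (int) $count];
    }
}

var_export($pairs);
```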
I'm guessing that maybe some expression similar to,
^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$
or with some slight changes, would likely work here.
Test
$re = '/^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$/m';
$str = 'd2.r1.4#100.37#1.9#2.3#1
d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1
d2.r1.4#100.38#1.9#2.3#1
d2.r1.4#100.0#1.9#2.3#1
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(2) {
[0]=>
array(1) {
[0]=>
string(24) "d2.r1.4#100.37#1.9#2.3#1"
}
[1]=>
array(1) {
[0]=>
string(63) "d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1"
}
}
If you wish to simplify/modify/explore the expression, it is explained in the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
Yes, there are lots of hashtag regexes available here, but none suits my needs, and no one has actually been able to solve the problem.
The Regex should consider the following hashtags as valid:
#validhashtag
#valid_hashtag
#validhashtag_with_space_before_or_after
#valid_hashtag_chars_öÖäÄüÜß
...and not valid should be:
ipsum#notvalid //Not valid: Connected to Word
http://google.com/#results //Not valid: Same as above
#not-valid
#not!valid
Allowed Characters should be:
a-Z,0-9,öÖäÄüÜß,_
Max length should be 50 characters.
The main problem is the part where the hashtag is "connected" to another part of the text. I don't know how to solve that problem.
This is what I attempted to do
/([\p{Pc}\p{N}\p{L}\p{Mn}]{1,50})/u
That one works pretty well but doesn't handle the "word#hashtag" problem.
I think your original expression is pretty great, we'd just modify that with:
^\s*#([\p{Pc}\p{N}\p{L}\p{Mn}]{1,50})$
Demo
Test
$re = '/^\s*#([\p{Pc}\p{N}\p{L}\p{Mn}]{1,50})$/um';
$str = '#validhashtag
#valid_hashtag
#validhashtag_with_space_before_or_after
#valid_hashtag_chars_öÖäÄüÜß
ipsum#notvalid //Not valid: Connected to Word
http://google.com/#results //Not valid: Same as above
#not-valid
#not!valid';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
Output
array(4) {
[0]=>
array(2) {
[0]=>
string(13) "#validhashtag"
[1]=>
string(12) "validhashtag"
}
[1]=>
array(2) {
[0]=>
string(14) "#valid_hashtag"
[1]=>
string(13) "valid_hashtag"
}
[2]=>
array(2) {
[0]=>
string(41) " #validhashtag_with_space_before_or_after"
[1]=>
string(39) "validhashtag_with_space_before_or_after"
}
[3]=>
array(2) {
[0]=>
string(35) "#valid_hashtag_chars_öÖäÄüÜß"
[1]=>
string(34) "valid_hashtag_chars_öÖäÄüÜß"
}
}
You may use either of the two below:
/(?<!\S)#\w+(?!\S)/u
/(?<!\S)#[\w\p{M}\p{Pc}]+(?!\S)/u
See the regex demo. If you want to restrict the word part length, keep your {1,50} quantifier - /(?<!\S)#\w{1,50}(?!\S)/u.
Also note: \w, even with the u modifier, does not match the same chars that are considered "word" chars in .NET, Java, or Python re regex. You may decide to include other classes to fill the gap and use [\w\p{M}\p{Pc}]+ instead of just \w, where \p{M} matches any diacritics and \p{Pc} matches any connector punctuation.
Details
(?<!\S) - a whitespace or start of string required right before
# - a # sign
\w+ - 1+ word chars (NOTE: if you want to restrict its length from 1 to 50, replace + with {1,50}) (also, note that the u modifier lets the PCRE engine match any Unicode letters and digits with the \w shorthand)
[\w\p{M}\p{Pc}]+ - matches 1+ word chars, diacritics (\p{M}) or connector punctuation chars (\p{Pc}, considered as word chars in .NET regex)
(?!\S) - a whitespace or end of string required right after.
PHP demo:
$s = "#validhashtag
#valid_hashtag
#validhashtag_with_space_before_or_after
#valid_hashtag_chars_öÖäÄüÜß
...and not valid shoulw be:
ipsum#notvalid //Not valid: Connected to Word
http://google.com/#results //Not valid: Same as above
#not-valid
#not!valid";
if (preg_match_all('~(?<!\S)#\w+(?!\S)~u', $s, $matches)) {
print_r($matches[0]);
}
Output:
Array
(
[0] => #validhashtag
[1] => #valid_hashtag
[2] => #validhashtag_with_space_before_or_after
[3] => #valid_hashtag_chars_öÖäÄüÜß
)
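The length limit from the question can be combined with the extended character class. Below is a minimal sketch, assuming UTF-8 input; the sample strings are made up, and the 51-character tag is there only to show the limit being enforced.

```php
<?php
// Extended "word" class plus the 1..50 length limit from the question.
$pattern = '~(?<!\S)#[\w\p{M}\p{Pc}]{1,50}(?!\S)~u';

$tooLong  = '#' . str_repeat('a', 51);  // 51 chars after "#": rejected
$maxLen   = '#' . str_repeat('b', 50);  // exactly 50 chars: accepted
$text     = "#ok_\u{F6}\u{D6} ipsum#notvalid #not-valid " . $tooLong . ' ' . $maxLen;

preg_match_all($pattern, $text, $matches);

print_r($matches[0]);
```

Because (?!\S) demands whitespace or the end of the string right after the tag, an over-long tag is rejected outright rather than truncated to its first 50 characters.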
if (preg_match_all ("/\[protected\]\s*(((?!\[protected\]|\[/protected\]).)+)\s*\[/protected\]/g", $text, $matches)) {
var_dump($matches);
var_dump($text);
}
The text is
<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>
But when I var_dump $matches (outside the if statement), it gives NULL.
Help, people!
You're using / (slash) as the regex delimiter, but you also have unescaped slashes in the regex. Either escape them or (preferably) use a different delimiter.
There's no g modifier in PHP regexes. If you want a global match, you use preg_match_all(); otherwise you use preg_match().
...but there is an s modifier, and you should be using it. That's what enables . to match newlines.
After changing your regex to this:
'~\[protected\]\s*((?:(?!\[/?protected\]).)+?)\s*\[/protected\]~s'
...I get this output:
array(2) {
[0]=>
array(1) {
[0]=>
string(42) "[protected]<br> STUFFFFFF<br>
[/protected]"
}
[1]=>
array(1) {
[0]=>
string(18) "<br> STUFFFFFF<br>"
}
}
string(93) "<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>"
Additional changes:
I switched to using single-quotes around the regex; double-quotes are subject to $variable interpolation and {embedded code} evaluation.
I shortened the lookahead expression by using an optional slash (/?).
I switched to using a reluctant plus (+?) so the whitespace following the closing tag doesn't get included in the capture group.
I changed the innermost group from capturing to non-capturing; it was only saving the last character in the matched text, which seems pointless.
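As a quick check of the points above, here is a sketch that runs the revised pattern over a made-up string with two blocks, one of which spans a line break. The sample text is an assumption for illustration.

```php
<?php
// Hypothetical input: two protected blocks, the first containing a newline.
$text = "a [protected]one\ntwo[/protected] b [protected] three [/protected] c";

// "~" delimiters avoid escaping "/"; the s modifier lets "." match newlines.
$pattern = '~\[protected\]\s*((?:(?!\[/?protected\]).)+?)\s*\[/protected\]~s';

if (preg_match_all($pattern, $text, $matches)) {
    // Group 1 holds each block's content, with surrounding whitespace trimmed.
    var_dump($matches[1]);
}
```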
$text= '<p>SDGDSFGDFGdsgdfog<br>
[protected]<br> STUFFFFFF<br>
[/protected]<br> SDGDSFGDFGdsgdfog</p>';
if (preg_match_all ("/\[protected\]\s*(((?!\[protected\]|\[\/protected\]).)+)\s*\[\/protected\]/x", $text, $matches)) {
var_dump($matches);
var_dump($text);
}
There is no g modifier in PHP's preg functions; you can read more at Pattern Modifiers. With the slashes escaped and g replaced (here by the x modifier), the pattern works fine, though.
I need to use a regex pattern, but what is the right PHP syntax to "decode" it? My pattern is similar to BBCode, i.e. ['something'], where the 'something' could be any length, though realistically I doubt it would be more than 10 letters/digits. What is the correct PHP syntax to "unscramble" it, i.e.
if ($row->xyz =['something'] ):
do this
else:
do that
endif;
Thanks in advance
A basic regexp to match BBCode style tags would look something like this:
preg_match('/\[[\/]?[A-Za-z0-9]+\]/', $row->xyz)
That will match anything that starts with a "[", ends with a "]", and has one or more alphanumeric characters in the middle (with an optional "/" for an end-tag.) Note it has flaws - for example, if you have a nested "[...]" in a larger "[...]", it will only grab the inner one. (i.e. [foo[bar]] will return only "[bar]".)
Example:
<?php
$regexp = '/\[[\/]?[A-Za-z0-9]+\]/';
$testString = '[i]An italic string with some [b]bold[/b] text.[/i]';
preg_match_all($regexp, $testString, $result);
var_dump($result);
?>
Result:
array(1) {
[0]=> array(4) {
[0]=> string(3) "[i]"
[1]=> string(3) "[b]"
[2]=> string(4) "[/b]"
[3]=> string(4) "[/i]"
}
}
Of course, I'm not sure this is what you actually mean you want to do, but it is what you say you want to do. Are you sure you want to find BBCodes, rather than find strings that are wrapped in them?
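If the goal is instead to find the strings wrapped in the tags, a backreference can pair each opening tag with its closing tag. This is only a sketch of that idea, not part of the pattern above, and it does not descend into nested tags.

```php
<?php
$testString = '[i]An italic string with some [b]bold[/b] text.[/i]';

// Group 1 is the tag name; \1 requires the same name in the closing tag.
// Group 2 lazily captures whatever sits between the pair.
$regexp = '/\[([A-Za-z0-9]+)\](.*?)\[\/\1\]/s';

preg_match_all($regexp, $testString, $result, PREG_SET_ORDER);

foreach ($result as [, $tag, $content]) {
    echo $tag, ' => ', $content, "\n";
}
```

Here the outer [i]...[/i] pair is matched as a whole, so the inner [b]bold[/b] stays part of its content; handling nesting would need a recursive pass over each captured content.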
I'd like a reg exp which can take a block of string, and find the strings matching the format:
....
And for all strings which match this format, it will extract out the email address found after the mailto:. Any thoughts?
This is needed for an internal app and not for any spammer purposes!
If you want to match the whole thing from :
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>(.*?)\<\/a\>`ism';
preg_match_all($r,$html, $matches, PREG_SET_ORDER);
To make it faster and shorter:
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>`ism';
preg_match_all($r,$html, $matches, PREG_SET_ORDER);
The 2nd matching group will be whatever email it is.
Example:
$html ='<div><a href="mailto:test#live.com">test</a></div>';
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>(.*?)\<\/a\>`ism';
preg_match_all($r,$html, $matches, PREG_SET_ORDER);
var_dump($matches);
Output:
array(1) {
[0]=>
array(5) {
[0]=>
string(39) "<a href="mailto:test#live.com">test</a>"
[1]=>
string(1) " "
[2]=>
string(13) "test#live.com"
[3]=>
string(0) ""
[4]=>
string(4) "test"
}
}
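With PREG_SET_ORDER, each match is its own row, so the addresses can be pulled into a flat list with array_column. This is a rough sketch using the shorter pattern; the second link in the sample HTML is made up, and # stands in for @ as elsewhere in this thread.

```php
<?php
// Hypothetical markup with two mailto links.
$html = '<div><a href="mailto:test#live.com">test</a> '
      . '<a class="x" href="mailto:omar#yahoo.com">Omar</a></div>';

// Shorter pattern from above: group 2 is the address after "mailto:".
$r = '`\<a([^>]+)href\=\"mailto\:([^">]+)\"([^>]*)\>`ism';
preg_match_all($r, $html, $matches, PREG_SET_ORDER);

// Take column 2 (the second capture group) from every match row.
$emails = array_column($matches, 2);

var_export($emails);
```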
There are plenty of different options on regexp.info
One example would be:
\b[A-Z0-9._%+-]+#(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b
The "mailto:" is trivial to prepend to that.
/(mailto:)(.+)(\")/
The second matching group will be the email address.
You can use PHP's built-in filter extension: http://us3.php.net/manual/en/book.filter.php
(it has filters specifically for validating or sanitizing email addresses, e.g. FILTER_VALIDATE_EMAIL)
Greets
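A minimal sketch of the filter approach, assuming the candidate strings have already been extracted. The sample values are made up, and they use a real @ because the filter validates actual address syntax.

```php
<?php
// Hypothetical candidates, e.g. values pulled out of mailto: links.
$candidates = ['test@live.com', 'not an email', 'missing-at.example.com'];

// filter_var() returns the value when it is valid and false otherwise.
$valid = [];
foreach ($candidates as $candidate) {
    if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
        $valid[] = $candidate;
    }
}

var_export($valid);
```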
For me, ~<mailto(.*?)>~ worked;
it will return an array containing the elements found.
Here you can test it: https://regex101.com/r/rTmKR4/1