I have the following input string which consists of multiple lines:
BYTE $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$13,$14,$01,$19,$20,$01,$20,$17,$08,$09,$0C,$05,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66 // comment
BYTE $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
I use the following preg_match statement to match the data part (so only the hexadecimal values) and not the preceding white space and text, nor the trailing white space and comment sections:
preg_match('/(\$.*?) /s', $sFileContents, $aResult);
The output is this:
output: Array
(
[0] => $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$13,$14,$01,$19,$20,$01,$20,$17,$08,$09,$0C,$05,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
[1] => $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$13,$14,$01,$19,$20,$01,$20,$17,$08,$09,$0C,$05,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
)
As you can see, the match itself appears to be correct, but the first input line shows up twice and the second line is never matched. I expected the 's' modifier to let me match past the end of the line, but I cannot seem to get past the first one.
Does anyone have an idea of how to proceed?
You can easily match the data from all lines:
preg_match_all('/\$[\dA-Fa-f,\$]+/', $sFileContents, $aResult);
echo "<pre>".print_r($aResult,true);
Output:
Array
(
[0] => Array
(
[0] => $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$13,$14,$01,$19,$20,$01,$20,$17,$08,$09,$0C,$05,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
[1] => $66,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$20,$66
)
)
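To see why the original preg_match() call printed the first line twice, here is a minimal sketch on two shortened lines in the same shape as the question's input (the shortened hex data is made up for illustration). preg_match() stops after the first match and puts the whole match in [0] and capture group 1 in [1]; preg_match_all() keeps scanning.

```php
<?php
// Two shortened lines shaped like the question's input:
// a "BYTE" prefix, hex data, and an optional trailing comment.
$sFileContents = implode("\n", [
    '    BYTE $66,$20,$13,$14 // comment',
    '    BYTE $66,$20,$20,$66',
]);

// preg_match() stops after the FIRST match: [0] is the whole match
// (including the trailing space), [1] is capture group 1.
preg_match('/(\$.*?) /s', $sFileContents, $first);
var_dump($first[1]); // string(15) "$66,$20,$13,$14"

// preg_match_all() continues through the string; [0] holds every full match.
preg_match_all('/\$[\dA-Fa-f,\$]+/', $sFileContents, $all);
print_r($all[0]); // one data run per line
```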
You don't need the s (DOTALL) flag for this. You can use:
preg_match_all('/(\$[0-9A-Fa-f]{2}(?:,\$[0-9A-Fa-f]{2})+)/', $input, $m);
print_r($m[1]);
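The practical difference between the two patterns shows up when a comment also contains a "$" token. A small sketch, assuming a hypothetical line whose trailing comment mentions "$FF": the loose character-class pattern picks up the comment token as well, while the stricter pattern requires at least two comma-separated $hh pairs.

```php
<?php
// Hypothetical line whose trailing comment also contains a "$hex" token.
$input = 'BYTE $66,$20,$13 // was $FF';

// Loose pattern: any run of hex digits, commas and dollar signs.
preg_match_all('/\$[\dA-Fa-f,\$]+/', $input, $loose);

// Strict pattern: "$hh" pairs separated by commas, at least two of them.
preg_match_all('/(\$[0-9A-Fa-f]{2}(?:,\$[0-9A-Fa-f]{2})+)/', $input, $strict);

print_r($loose[0]);  // "$66,$20,$13" and "$FF"
print_r($strict[1]); // only "$66,$20,$13"
```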
I am trying to get information out of a textarea that contains certain strings (e.g. [name]) and find each item enclosed in square brackets using regex (so far I have tried preg_match, preg_split, preg_quote and preg_match_all). The problem seems to be in the regex pattern I am providing.
My current regex:
$menuItems = preg_match_all('/[^[][([^[].*)]/U', $_SESSION['emailBody'], $menuItems);
I have tried many other patterns, e.g.:
/(?[...]\w+): (?[...]\d+)/
Any help that can be provided with this is greatly appreciated.
EDIT:
Sample input:
[email] address [to] name [from] someone
Message displayed on var_dump of the $menuItems variable:
array(1) { [0]=> string(0) "" }
EDIT 2:
Thank you to everyone for the help and support with this, I am pleased to say that it is all up and running perfectly!
From the comment stream above, you can simplify the regular expression as follows:
preg_match_all('/\[(.*)\]/U', $_SESSION['emailBody'], $menuItems);
One thing to note:
preg_match_all() fills the array passed as its 3rd parameter with the matches. Your example line then overwrites that array with the return value of preg_match_all() (the number of matches, an integer).
You should then be able to iterate over the results by using the following loop:
foreach ($menuItems[1] as $menuItem) {
// ...
}
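A quick sketch on the sample input from the question, showing what the U (ungreedy) modifier does here and that the return value is only a count. Without U, .* is greedy and runs to the last "]" in the string.

```php
<?php
$body = '[email] address [to] name [from] someone';

// With the U (ungreedy) modifier, .* stops at the first "]".
$count = preg_match_all('/\[(.*)\]/U', $body, $menuItems);

// Without U, .* is greedy and runs to the LAST "]" in the string.
preg_match_all('/\[(.*)\]/', $body, $greedy);

var_dump($count); // int(3): the return value is a count, not the matches
foreach ($menuItems[1] as $menuItem) {
    echo $menuItem, "\n"; // email, to, from
}
print_r($greedy[1]); // a single item: "email] address [to] name [from"
```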
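Escape the square brackets and remove the dot (also drop the leading [^[], which requires a character before each opening bracket, so "[email]" at the very start of the text would be missed):
preg_match_all('/\[([^[]*)\]/U', $_SESSION['emailBody'], $menuItems);
// \[ and \] escaped, "." before "*" removed; the count returned by
// preg_match_all() is no longer assigned over $menuItems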
preg_match_all doesn't return the matches; it returns the number of matches (or false on error). Pass an array as the last parameter to receive them:
preg_match_all('/\[([^[\]]*)\]/U', $_SESSION['emailBody'], $matches);
The matches are then in the array $matches:
print_r($matches);
Working example:
$str = '[email] address [to] name [from] someone';
preg_match_all('/\[([^[\]]*)\]/U', $str, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => [email]
[1] => [to]
[2] => [from]
)
[1] => Array
(
[0] => email
[1] => to
[2] => from
)
)
Here is a simple solution. This regex captures all items enclosed in square brackets, including the brackets themselves.
If you don't want the brackets in the result, change the regex to $regex = "/(?:\\[(\\w+)\\])/mi";
$subject = "[email] address [to] name [from] someone";
$regex = "/(\\[\\w+\\])/mi";
$matches = array();
preg_match_all($regex, $subject, $matches); // no "&": call-time pass-by-reference is a fatal error since PHP 5.4
print_r($matches);
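One design note on \w+: it only accepts word characters, so a tag containing a hyphen or a space is silently skipped. A short sketch, assuming a hypothetical "[reply-to]" tag added to the sample input, comparing it with the negated class [^[\]]* used in another answer:

```php
<?php
// Sample input plus a hypothetical "[reply-to]" tag.
$subject = '[email] address [to] name [from] someone [reply-to] x';

preg_match_all('/\[(\w+)\]/', $subject, $word);    // word characters only
preg_match_all('/\[([^[\]]*)\]/', $subject, $any); // anything except brackets

print_r($word[1]); // email, to, from ("reply-to" is skipped)
print_r($any[1]);  // email, to, from, reply-to
```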
My regex is:
$regex = '/(?<=Α: )(([\w-\.]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';
My content among others is:
Q: Email Address
A: name#example.com
Rad Software Regular Expression Designer says that it should work.
Various online sites return the correct results.
If I remove the (?<=Α: ) lookbehind the regex returns all emails correctly.
When I run it from PHP, it returns no matches.
What's going on?
I've also used the same kind of lookbehind (i.e. (?<=Email: )) with different content, and it works just fine in that case.
You are most likely not using the DOTALL flag s, which makes the dot match newlines as well. Note also that the lookbehind below uses a plain ASCII A:
$str = <<< EOF
Q: Email Address
A: name#example.com
EOF;
// [\w.-]: hyphen placed last, since PCRE2 (PHP >= 7.3) rejects [\w-\.]
if (preg_match_all('/(?<=A: )(([\w.-]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4}))/s',
$str, $arr))
print_r($arr);
OUTPUT:
Array
(
[0] => Array
(
[0] => name#example.com
)
[1] => Array
(
[0] => name#example.com
)
[2] => Array
(
[0] => name
)
[3] => Array
(
[0] => example.
)
[4] => Array
(
[0] => com
)
)
This is my newer monster script for checking whether an e-mail address "validates" or not. You can feed it strange input and break it, but in production it has handled virtually every case I've encountered; most of the false positives actually come from typos.
<?php
$pattern = '!^[^#\s]+#[^.#\s]+\.[^#\s]+$!';
$examples = array(
'email#email.com',
'my.email#email.com',
'e.mail.more#email.co.uk',
'bad.email#..email.com',
'bad.email#google',
'#google.com',
'my#email#my.com',
'my email#my.com',
);
foreach($examples as $test_mail){
if(preg_match($pattern,$test_mail)){
echo ("$test_mail - passes\n");
} else {
echo ("$test_mail - fails\n");
}
}
?>
Output
email#email.com - passes
my.email#email.com - passes
e.mail.more#email.co.uk - passes
bad.email#..email.com - fails
bad.email#google - fails
#google.com - fails
my#email#my.com - fails
my email#my.com - fails
Unless there's a reason for the lookbehind, you can match all of the emails in the string with preg_match_all(). Since you're working with a string, you need to modify the regex slightly:
$string_only_pattern = '!\s([^#\s]+#[^.#\s]+\.[^#\s]+)\s!s';
$mystring = '
email#email.com - passes
my.email#email.com - passes
e.mail.more#email.co.uk - passes
bad.email#..email.com - fails
bad.email#google - fails
#google.com - fails
my#email#my.com - fails
my email#my.com - fails
';
preg_match_all($string_only_pattern,$mystring,$matches);
print_r ($matches[1]);
Output from the string-only version:
Array
(
[0] => email#email.com
[1] => my.email#email.com
[2] => e.mail.more#email.co.uk
[3] => email#my.com
)
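Note that item [3] is a partial extraction from the line "my email#my.com". One way to avoid that, sketched here on three of the sample lines, is to anchor each candidate at the start of a line with the m modifier instead of requiring surrounding whitespace:

```php
<?php
$mystring = implode("\n", [
    'email#email.com - passes',
    'my email#my.com - fails',
    'e.mail.more#email.co.uk - passes',
]);

// With the m modifier, ^ anchors at the start of every line, so a line
// whose first token is not an address ("my email#...") yields nothing.
preg_match_all('!^[^#\s]+#[^.#\s]+\.[^#\s]+(?=\s)!m', $mystring, $matches);
print_r($matches[0]);
```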
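The problem is that your regular expression contains Α (GREEK CAPITAL LETTER ALPHA), which only looks like A, while the content contains the Latin A. So the lookbehind doesn't match.
I changed the regex to the following (also moving the hyphen to the end of the character class, since PCRE2 in PHP 7.3 and later rejects [\w-\.]):
$regex = '/(?<=A: )(([\w.-]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4}))/';
and it works.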
Outside of the regex issue itself, you should really consider not writing your own e-mail address regex parser. See the Stack Overflow post "Using a regular expression to validate an email address" for why; the upshot is that the RFC is long and demanding on your regex abilities.
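If the goal is validation rather than extraction, PHP ships a built-in validator, filter_var() with FILTER_VALIDATE_EMAIL. A minimal sketch; note that it expects a real "@" separator, whereas the addresses elsewhere in this thread show "#" in its place, and the candidate addresses below are made up:

```php
<?php
// Made-up candidates; filter_var() needs a real "@" separator.
$candidates = ['name@example.com', 'my email@my.com', 'my@email@my.com'];

$results = [];
foreach ($candidates as $email) {
    // filter_var() returns the address on success and false on failure.
    $results[$email] = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    echo $email, ' - ', $results[$email] ? 'passes' : 'fails', "\n";
}
```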
The A in your subject is the "normal" character with code 65 (the same in ASCII and Unicode). But the A you use in the lookbehind of your pattern has code 913 (U+0391, the Greek capital Alpha). They look the same but are different characters.
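A small sketch that makes the difference visible: bin2hex() shows the UTF-8 bytes of each letter, and the same lookbehind is tried once with each. The content string mirrors the question's sample (keeping its "#" separator), and the character class uses [\w.-] so the pattern compiles under PCRE2.

```php
<?php
$content = "Q: Email Address\nA: name#example.com";

// Same glyph on screen, different bytes in UTF-8.
echo bin2hex("\u{0391}"), ' vs ', bin2hex('A'), "\n"; // ce91 vs 41

$results = [];
foreach (['greek' => "\u{0391}", 'latin' => 'A'] as $name => $letter) {
    // Hyphen placed last in the class: PCRE2 (PHP >= 7.3) rejects [\w-\.].
    $regex = '/(?<=' . $letter . ': )([\w.-]+#(?:\w+\.)+[a-zA-Z]{2,4})/';
    $results[$name] = preg_match($regex, $content, $m) ? $m[1] : null;
}
var_dump($results); // only the Latin A finds the address
```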