Is it possible to shorten this code using preg_replace instead of preg_match?
I am using it to remove quoted text from an email body.
Quoted text is the part of the original message that gets included when you reply to an email.
# Get rid of any quoted text in the email body
# stripSignature removes signatures from the email
# $body is the body of an email (All headers removed)
$body_array = explode("\n", $this->stripSignature($body));
$new_body = "";
foreach($body_array as $key => $value)
{
# Remove hotmail sig
if($value == "_________________________________________________________________")
{
break;
# Original message quote
}
elseif(preg_match("/^-*(.*)Original Message(.*)-*/i",$value,$matches))
{
break;
# Check for date wrote string
}
elseif(preg_match("/^On(.*)wrote:(.*)/i",$value,$matches))
{
break;
# Check for From Name email section
}
elseif(preg_match("/^On(.*)$fromName(.*)/i",$value,$matches))
{
break;
# Check for To Name email section
}
elseif(preg_match("/^On(.*)$toName(.*)/i",$value,$matches))
{
break;
# Check for To Email email section
}
elseif(preg_match("/^(.*)$toEmail(.*)wrote:(.*)/i",$value,$matches))
{
break;
# Check for From Email email section
}
elseif(preg_match("/^(.*)$fromEmail(.*)wrote:(.*)/i",$value,$matches))
{
break;
# Check for quoted ">" section
}
elseif(preg_match("/^>(.*)/i",$value,$matches))
{
break;
# Check for date wrote string with dashes
}
elseif(preg_match("/^---(.*)On(.*)wrote:(.*)/i",$value,$matches))
{
break;
# Add line to body
}
else {
$new_body .= "$value\n";
}
}
This almost works, but it keeps the first line "On Mon, Jul 30, 2012 at 10:54 PM, Persons Name wrote:"
$body = preg_replace('/(^\w.+:\n)?(^>.*(\n|$))+/mi', "", $body);
There's probably a more elegant way of doing this, but this should work (assuming the regexps are correct):
$search = array(
"/^-*.*Original Message.*-*/i",
"/^On.*wrote:.*/i",
"/^On.*$fromName.*/i",
"/^On.*$toName.*/i",
"/^.*$toEmail.*wrote:.*/i",
"/^.*$fromEmail.*wrote:.*/i",
"/^>.*/i",
"/^---.*On.*wrote:.*/i"
);
$body_array = explode("\n", $this->stripSignature($body));
$body = implode("\n", array_filter(preg_replace($search, '', $body_array)));
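// or just (on the whole body; each pattern then needs the m modifier,
// e.g. "/^>.*/im", so that ^ matches at the start of every line)
$body = str_replace(
array("\0\n", "\n\0"), '', preg_replace($search, "\0", $body)
);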
Edit: As inhan pointed out, the dots in the email addresses (and potentially other special characters) need to be escaped using preg_quote() prior to being inserted into the patterns.
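A minimal sketch of the escaping step mentioned in the edit, assuming the four values come from the parsed headers. The helper name buildQuotePatterns and the sample names and addresses are hypothetical, not part of the original code.

```php
<?php
// Hypothetical helper: builds the quote-detection patterns with every
// interpolated value escaped, so "." in an address matches a literal dot.
function buildQuotePatterns($fromName, $toName, $fromEmail, $toEmail)
{
    // The second argument also escapes the "/" delimiter used below.
    $q = function ($value) {
        return preg_quote($value, '/');
    };

    return array(
        '/^-*.*Original Message.*-*/i',
        '/^On.*wrote:.*/i',
        '/^On.*' . $q($fromName) . '.*/i',
        '/^On.*' . $q($toName) . '.*/i',
        '/^.*' . $q($toEmail) . '.*wrote:.*/i',
        '/^.*' . $q($fromEmail) . '.*wrote:.*/i',
        '/^>.*/i',
        '/^---.*On.*wrote:.*/i',
    );
}

// Sample values are made up for illustration.
$patterns = buildQuotePatterns('Jane Doe', 'John Roe', 'jane@example.com', 'john@example.com');
$lines = array('Thanks!', 'On Monday Jane Doe said hello', '> quoted');

// Blank out matching lines, then drop the lines that became empty.
$kept = array_values(array_filter(preg_replace($patterns, '', $lines), 'strlen'));
```

Without preg_quote(), the dot in an address would match any character, and a name containing characters such as "(" or "+" could make the pattern invalid.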
Related
I have a strange issue. My SaaS receives leads from various third-party companies via email (I know, it's old school). I use the pipe-to-a-program feature in cPanel to send each email to a script that parses out the To, From, Subject and message. The message contains an XML document. Here is the PHP code for this:
$lines = explode("\n", $email_content);
// initialize variable which will assigned later on
$from = "";
$subject = "";
$headers = "";
$message = "";
$is_header= true;
//loop through each line
for ($i=0; $i < count($lines); $i++) {
if ($is_header) {
// header information: everything other than the main message body ends up here.
$headers .= $lines[$i]."\n";
// Split out the subject portion
if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
$subject = $matches[1];
}
//Split out the sender information portion
if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
$from = $matches[1];
}
} else {
// content/main message body information
//$message .= $lines[$i]."\n";
$message .= $lines[$i];
}
if (trim($lines[$i])=="") {
// empty line, header section has ended
$is_header = false;
}
}
Then I just insert the variables into my MySQL database. For some of the emails this works perfectly, but in others the XML gets odd characters inserted. An example is below:
<?xml version=3D"1.0" encoding=3D"UTF-8"?>=0D=0A<?adf version=3D"1.0"?>==0D=0A<adf>
Notice the '3D' and '=0D=0A'. I believe this may have to do with the encoding? Some emails are UTF-8, some are UTF-16, and the ones that work do not have a UTF charset assigned.
I am inserting the XML part into a 'TEXT' column, as MySQL does not seem to have an XML type.
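The "=3D" and "=0D=0A" sequences are what quoted-printable transfer encoding produces ("=3D" is "=", "=0D=0A" is CR LF), so one likely cause is that those emails carry a "Content-Transfer-Encoding: quoted-printable" header and the body is stored without being decoded. A minimal sketch under that assumption, using the $headers and $message variables built by the loop above. The helper name decodeBody is hypothetical, and it only looks at the top-level headers, so multipart messages would need per-part handling.

```php
<?php
// Hypothetical helper: undo the transfer encoding named in the headers.
// $headers is the raw header block, $body the raw body with its line
// breaks intact (quoted-printable soft line breaks end with "=").
function decodeBody($headers, $body)
{
    $encoding = '';
    if (preg_match('/^Content-Transfer-Encoding:\s*(\S+)/im', $headers, $m)) {
        $encoding = strtolower($m[1]);
    }

    if ($encoding === 'quoted-printable') {
        return quoted_printable_decode($body);
    }
    if ($encoding === 'base64') {
        return base64_decode($body);
    }
    return $body; // 7bit, 8bit or no header: leave untouched
}

// Sample input modelled on the snippet in the question.
$headers = "Subject: lead\nContent-Transfer-Encoding: quoted-printable\n";
$body = "<?xml version=3D\"1.0\" encoding=3D\"UTF-8\"?>=0D=0A<adf>";
$xml = decodeBody($headers, $body);
```

Note that the loop above appends body lines without "\n". A quoted-printable body uses a trailing "=" as a soft line break, which would explain the "==0D=0A" in the sample, so the line breaks should be kept (the commented-out variant with "\n") before decoding.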
I want to match only the first word of the string, i.e. 'bot'.
For example:
It should run a function if the user types "bot hi" and should do nothing if they type "hi bot there".
if(preg_match('[bot ]', strtolower($message))) {
$msg = str_replace('bot ', '', $message);
//Some message response
}
The above code runs even if I type "hi bot there".
You should use ^ to anchor the match at the beginning of the string:
if ( preg_match("/^bot (.*)/i", $message) ) {
// drop only the leading "bot " (4 characters), whatever its case
$msg = substr($message, 4);
//Some message response
}
You can check this with strpos():
$str = "bot hi";
if (0 === strpos($str, 'bot')) {
echo("str starts with bot");
} else {
echo("str does not start with bot");
}
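A minimal sketch of a case-insensitive prefix check that needs neither a regex nor strtolower(), assuming the command word is always followed by a space. The helper name stripBotPrefix is hypothetical.

```php
<?php
// Hypothetical helper: returns the text after a leading "bot " (any case),
// or null when the message does not start with that word.
function stripBotPrefix($message)
{
    $prefix = 'bot ';
    // strncasecmp() compares only the first strlen($prefix) bytes.
    if (strncasecmp($message, $prefix, strlen($prefix)) === 0) {
        return substr($message, strlen($prefix));
    }
    return null;
}

$reply  = stripBotPrefix('bot hi');        // remainder after the prefix
$ignore = stripBotPrefix('hi bot there');  // null: "bot" is not at the start
```

Including the trailing space in the prefix keeps words such as "bottle" from being treated as the command.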
See the code below, which comes from:
http://www.damnsemicolon.com/php/php-parse-email-body-email-piping
//get rid of any quoted text in the email body
$body_array = explode("\n",$body);
$message = "";
foreach($body_array as $key => $value){
//remove hotmail sig
if($value == "_________________________________________________________________"){
break;
//original message quote
} elseif(preg_match("/^-*(.*)Original Message(.*)-*/i",$value,$matches)){
break;
//check for date wrote string
} elseif(preg_match("/^On(.*)wrote:(.*)/i",$value,$matches)) {
break;
//check for From Name email section
} elseif(preg_match("/^On(.*)$fromName(.*)/i",$value,$matches)) {
break;
//check for To Name email section
} elseif(preg_match("/^On(.*)$toName(.*)/i",$value,$matches)) {
break;
//check for To Email email section
} elseif(preg_match("/^(.*)$toEmail(.*)wrote:(.*)/i",$value,$matches)) {
break;
//check for From Email email section
} elseif(preg_match("/^(.*)$fromEmail(.*)wrote:(.*)/i",$value,$matches)) {
break;
//check for quoted ">" section
} elseif(preg_match("/^>(.*)/i",$value,$matches)){
break;
//check for date wrote string with dashes
} elseif(preg_match("/^---(.*)On(.*)wrote:(.*)/i",$value,$matches)){
break;
//add line to body
} else {
$message .= "$value\n";
}
}
//compare before and after
echo "$body<br><br><br>$message";
$body contains the complete email body, including the quoted area if this is a reply; this loop removes the quoted area so that only the new reply is left in $message. But as suggested there, the loop is slow and it is better to use preg_replace instead. How can I do that?
What should the patterns be replaced with? Should I remove the foreach loop too? I wrote the version below without the foreach loop, but it seems wrong. Please advise.
$patterns = array(
"_________________________________________________________________",
"/^-*(.*)Original Message(.*)-*/i",
"/^On(.*)wrote:(.*)/i",
"/^On(.*)$fromName(.*)/i",
"/^On(.*)$toName(.*)/i",
"/^(.*)$toEmail(.*)wrote:(.*)/i",
"/^(.*)$fromEmail(.*)wrote:(.*)/i",
"/^>(.*)/i",
"/^---(.*)On(.*)wrote:(.*)/i");
$message = preg_replace($patterns, '', $body);
You have already narrowed it down to a workable solution. There are only a few things to fix:
As @mario commented, you need to set the /m modifier so that each ^ matches at the beginning of every line.
Your first pattern needs to be enclosed in delimiters, and anchored with ^ and to the end of the line, to maintain the same meaning as in the original code.
Include the newline characters in order to remove the whole line.
Make sure the variables $fromName, $fromEmail, etc. are set.
Once you get a match, match everything from there to the end of the body with (?s:.*).
Code:
$patterns = array(
"/^_{30,}$(?s:.*)/m",
"/^.*Original Message(?s:.*)/im",
"/^(?:---.*)?On .* wrote:(?s:.*)/im",
"/^On .* $fromName(?s:.*)/im",
"/^On .* $toName(?s:.*)/im",
"/^.*$toEmail(.*)wrote:(?s:.*)/im",
"/^.*$fromEmail.* wrote:(?s:.*)/im",
"/^>.*/ims",
);
$message = preg_replace($patterns, '', $body);
echo "$body<br><br><br>$message";
A word of advice:
Take into account that it will also strip lines like:
only thing I wrote: ...
I have a .txt file where I would like to find an EXACT match of a single email entered in a form.
The directives I currently use (see below) work with a standard form. But when I use them in conjunction with an AJAX call and jQuery, the script confirms that the email exists as soon as it finds the first partial occurrence.
For example:
If that person enters "bobby@" it says not found, good.
If someone enters their full email address and it exists in the file, it says "found", very good.
Now, if someone enters just "bobby", it says "found", not good.
I tried the three examples below, with the same results.
if ( !preg_match("/\b{$email}\b/i", $emails )) {
echo "Sorry, not found";
}
and...
if ( !preg_match( "/(?:^|\W){$email}(?:\W|$)/", $emails )) {
echo "Sorry, not found";
}
and...
if ( !preg_match('/^'.$email.'$/', $emails )) {
echo "Sorry, not found";
}
my AJAX
$.ajax({
type: "POST",
url: "email_if_exist.php",
data: "email="+ usr,
success: function(msg){
my text file
Bobby Brown bobby@somewhere.com
Guy Slim guy@somewhere.com
Slim Jim slim@somewhere.com
I thought of using a jQuery function to only accept a full email address, but with no success partly because I didn't know where to put it in the script.
I've spent a lot of time in searching for a solution to this and I am now asking for some help.
Cheers.
Because your text file contains "bobby" in it, any regex like the ones you are suggesting will always find "bobby". I would suggest checking for the presence of the @ symbol BEFORE you run the regex, as any valid email will always have @ in it. Try something like this:
if (strpos($email, '@') !== false) {
if ( !preg_match("/\b{$email}\b/i", $emails )) {
echo "Sorry, not found";
}
}
EDIT: Looking at this 4 years later... I would make the regex match to the end of the line, using the m modifier to specify multiline so the $ matches newline or EOF. The PHP line would be:
if ( !preg_match("/\b{$email}$/im", $emails )) {
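A minimal sketch of the end-of-line match with the address escaped first, assuming each line of the file ends with the address as in the sample file. The helper name emailExists is hypothetical. Escaping matters because "." in an address would otherwise match any character, and requiring whitespace (or the start of the text) before the address stops a fragment of an address from matching.

```php
<?php
// Hypothetical helper: true only when $email appears as a complete,
// whitespace-delimited token at the end of some line in $emails.
function emailExists($email, $emails)
{
    if ($email === '') {
        return false; // an empty pattern would match every line end
    }
    $quoted = preg_quote($email, '/');
    // (?<!\S)  : preceded by whitespace or the start of the text
    // \r?$ + m : followed by the end of a line (LF or CRLF files)
    return preg_match('/(?<!\S)' . $quoted . '\r?$/im', $emails) === 1;
}

// Sample data modelled on the text file in the question.
$emails = "Bobby Brown bobby@somewhere.com\nGuy Slim guy@somewhere.com\n";
$full    = emailExists('bobby@somewhere.com', $emails);
$partial = emailExists('bobby', $emails);
```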
If you're just checking to see if the user exists, this should work:
$users = trim(preg_replace('/\s+/', ' ', $users));
$userArray = explode(' ', $users);
$exists = in_array($email, $userArray);
Here $users holds the contents of the example file and $email is the queried e-mail.
This replaces every run of whitespace (newlines included) with a single space and then splits on spaces into an array; if the e-mail is in the array, the user exists.
Hope I helped!
'/^'.$email.'$/' is quite close. Since you want the check to be "true" only if the full email address is in the file, you should include the "limits" of the email in the pattern: whitespace before it and the end of the line after it:
'/ '.$email.'$/m'
(Yes, I've just replaced ^, the start of line, with a whitespace character, and added the m modifier so that $ matches at the end of every line rather than only at the end of the file.)
If every line of your text file ends with the email,
you can test for a match on "email + end of line",
like this:
if( preg_match("/.+{$email}(?:\r\n|\r|\n)/", $textFileEmails) )
{
/// code
}
The code below first validates the email with PHP's built-in functions and then checks for its occurrence in the file.
$email = 'bobby@somewhere.com';
$found = false;
//PHP has a built-in function to validate an email
if(filter_var($email, FILTER_VALIDATE_EMAIL)){
//Grab lines from the file
$lines = file('myfile.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
//Grab words from the line
$words = explode(" ", $line);
//If email found within the words set the flag as true.
if(in_array($email, $words)) {
$found = true;
//If the aim is only to find the email, we can break out here.
break;
}
}
}
if(false === $found) {
echo 'Not found!';
} else {
echo 'Found you!';
}
If your file is formatted as in your example (first_name, last_name, email@address.tld),
it's really easy to break it up on load for searching.
I don't know why you would use preg_match for this, but if you were advised to use preg, use it to verify the email address. You're better off using PHP's equivalent of indexOf (strpos) to search the file, but the method below works for your fixed file format.
Object Orientated File Reader and searcher
class Search{
private $users = array();
public function __construct($password_file){
$file = file_get_contents($password_file);
$lines = explode("\n", $file);
foreach($lines as $line){
// each line is "first_name last_name email"
$user = explode(" ", trim($line));
if(count($user) < 3){
continue; // skip blank or malformed lines
}
$this->users[] = array("first_name" => $user[0], "last_name" => $user[1], "email" => $user[2]);
}
}
public function searchByEmail($email){
foreach($this->users as $key => $user){
if($user['email'] == $email){
// return user array
return $user;
// or you could return user id
//return $key;
}
}
return false;
}
}
Then to use
$search = new Search($passwdFile);
$user = $search->searchByEmail($_POST['email']);
echo ($user)? "found":"Sorry, not found";
Using preg_match to validate email then check
If you want to use preg and your own file search system.
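function validateEmail($email) {
// "-" goes last in the character class so it is not read as a range; the dot before the TLD is escaped
$v = '/^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]+$/';
return (bool)preg_match($v, $email);
}
then use like
if(validateEmail($_POST['email'])){
// strpos() takes the haystack ($emails) first and the needle second
echo (strpos($emails, $_POST['email']) !== false)? "found":"Sorry, not found";
}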
I want to forward my bounced emails to a PHP script to deal with them. I am using:
#!/usr/bin/php -q
<?php
// read from stdin
$fd = fopen("php://stdin", "r");
$email = "";
while (!feof($fd)) {
$email .= fread($fd, 1024);
}
fclose($fd);
// handle email
$lines = explode("\n", $email);
// empty vars
$from = "";
$subject = "";
$headers = "";
$message = "";
$splittingheaders = true;
for ($i=0; $i < count($lines); $i++) {
if ($splittingheaders) {
// this is a header
$headers .= $lines[$i]."\n";
// look out for special headers
if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
$subject = $matches[1];
}
if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
$from = $matches[1];
}
} else {
// not a header, but message
$message .= $lines[$i]."\n";
}
if (trim($lines[$i])=="") {
// empty line, header section has ended
$splittingheaders = false;
}
}
?>
Works perfect! But how do I collect the "To" field in the bounced message? I've tried just adding a $to variable but it doesn't work.
Any help would be great,
thanks,
EDIT: Actually I need to get the "TO" field within the body of the message. - The email that it bounced back from. How do I pull apart the body of the message to take specific info? Should I create a special header with the person's email so that it is easier to get this info?
If you can create a custom header, that would be easiest. Otherwise, you need to match against your entire body for a particular pattern; and if your body text can vary, it could be difficult to make sure you are always matching the correct text.
Custom headers should begin with X-, so maybe do something like:
if (preg_match("/^X-Originally-To: (.*)/", $lines[$i], $matches)) {
$originallyto = $matches[1];
}
But X- headers are non-standard, so it is best to pick a name that is either:
Commonly used exclusively for the same purpose, or
Not likely to be used by anyone else at all
One other thing you should be aware of: lines in a message should always end in "\r\n", so you might want to split on both characters (instead of just "\n") to ensure more consistent behaviour.
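A minimal sketch of splitting on either line ending, assuming the raw message is in $email as in the script above. The helper name splitLines and the sample message are hypothetical.

```php
<?php
// Hypothetical helper: split raw message text on CRLF, bare CR or bare LF,
// so no stray "\r" is left at the end of header values.
function splitLines($raw)
{
    return preg_split('/\r\n|\r|\n/', $raw);
}

// Sample message with mixed line endings, made up for illustration.
$email = "To: someone@example.com\r\nSubject: Bounce\r\n\r\nBody line\n";
$lines = splitLines($email);
```

With explode("\n", ...) on a CRLF message, each captured header value such as $matches[1] would end in "\r"; the blank-line test still works there only because of the trim() call.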