I have a title="" attribute in an anchor that contains HTML. I'm trying to remove the title attribute entirely but for whatever reason the preg replace I'm using will not work. I've tried:
$output = preg_replace( '/title=\"(.*?)\"/', '', $output );
$output = preg_replace( '/\title="(.*?)"/', '', $output );
$output = preg_replace( '` title="(.+)"`', '', $output );
None of the above works, but I can use something like:
$output = str_replace( 'title', 'class', $output );
Just to prove that I was able to do something ( and I wasn't uploading the wrong file or something ). Output looks like this:
<a href="#" title="<table border=\"0\" width=\"100%\" cellspacing=\"0\" cellpadding=\"0\">
<tbody>
<tr>
<td colspan=\"2\" align=\"center\" valign=\"top\"></td>
</tr>
<tr>
<td valign=\"top\" width=\"50%\">
table content
</td>
<td valign=\"top\" width=\"50%\">
table content
</td>
</tr>
</tbody>
</table>">Link Title</a>
So what I'm trying to do is filter $output and remove the title attribute entirely including everything inside the title attribute. Why will the preg_replace() above not work and what are my options?
I would not use a regex to do operations on [x]html, I'd use a html parser instead.
But if you still want to use a regex then you can use a regex like this:
title="[\s\S]*?"
You can have this code:
$re = "/title=\"[\\s\\S]*?\"/";
$str = "Link Title";
$subst = "";
$result = preg_replace($re, $subst, $str);
Update: Andrei P.'s comment gives a clear example of why you shouldn't use a regex to parse HTML.
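One caveat with the pattern above: the lazy [\s\S]*? stops at the first double quote, and the sample title contains backslash-escaped quotes (\"), so only part of the attribute would be removed. The attempts in the question also fail because . does not match newlines without the s modifier. Below is a minimal sketch that skips over escaped quotes, assuming the markup really does escape inner quotes with a backslash as in the question. The function name strip_title_attribute is made up for illustration.

```php
<?php
// Hypothetical helper: removes title="..." attributes whose values may
// contain backslash-escaped quotes and newlines.
function strip_title_attribute(string $html): string
{
    // (?:[^"\\]|\\.)* consumes either an ordinary character or a
    // backslash plus the character it escapes; the s modifier lets
    // "." match newlines inside the attribute value.
    $pattern = '/\s+title="(?:[^"\\\\]|\\\\.)*"/s';

    // preg_replace() returns null on error; fall back to the input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$output = '<a href="#" title="<table border=\"0\">' . "\n"
        . '<td valign=\"top\">table content</td>' . "\n"
        . '</table>">Link Title</a>';

echo strip_title_attribute($output), "\n";
```

An HTML parser would not help much with this particular input, since a backslash does not escape a quote in HTML and the parser would end the attribute at the first inner quote.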
So I wanted to enable UBB Code on my Website using preg_replace
$text = $dbentry['text'];
$bbformat = array(
"/\\r?\\n/i" => "<br>",
"/\[i\](.*?)\[\/i\]/i" => "<i>$1</i>",
"/\[code\](.*?)\[\/code\]/i" => "<div align=\"center\"><center><table border=\"1\" cellpadding=\"0\" cellspacing=\"0\" width=\"600\"><tr bgcolor=\"magenta\"><td>Codeblock:</td></tr><tr><td><form><textarea rows=\"10\" cols=\"82\">$1</textarea></form></td></tr></table></center></div>",
....
);
foreach($bbformat as $match=>$replacement){
$text = preg_replace($match, $replacement, $text);
}
echo $text;
This works, however, it replaces text between two [code][/code] elements too, resulting in <br> or <i></i> codes in the textarea, where they do not belong:
I would like to skip any text that is between [code][/code] UBB elements and output the raw text as it is stored in the DB:
How can this be achieved in PHP?
I actually tried "/\[code\][^>]+\[\/code\](*SKIP)(*FAIL)|\\r?\\n/i" => "<br>", (which I read of here) with the result, that the latter argument "/\[code\](.*?)\[\/code\]/i" to replace [code][/code] itself with <textarea></textarea> is also skipped and the output on the website is [code]bl \n ah[i]test[/i][/code] instead of <textarea>bl \n ah[i]test[/i]</textarea>. I could workaround this by adding these two arguments to the array:
"/\[code\]/i" => "<div align=\"center\"><center><table border=\"1\" cellpadding=\"0\" cellspacing=\"0\" width=\"600\"><tr bgcolor=\"magenta\"><td>Codeblock:</td></tr><tr><td><form><textarea rows=\"10\" cols=\"82\">",
"/\[\/code\]/i" => "</textarea></form></td></tr></table></center></div>",
This replaces [code] and [/code] independently, with the ugly result that the website looks like garbage if someone forgets to write [/code]. It looks like the pattern "/\[code\](.*?)\[\/code\]/i" no longer matches within the same array because of (*SKIP)(*FAIL), so why does replacing each tag on its own work?
There must be a better solution...what am I missing?
preg_replace processes the whole string each time and knows nothing about UBB, so it replaces every \n with <br>, including those inside [code] blocks. The only way I can see to handle this in PHP is something like preg_replace_callback_array:
$bbformat = array(
"/\\r?\\n/i" => function ($m) {
return "<br>";
},
"/\[i\](.*?)\[\/i\]/i" => function ($m) {
return "<i>".$m[1]."</i>";
},
"/\[code\](.*?)\[\/code\]/i" => function ($m) {
return "<div align=\"center\"><center><table border=\"1\" cellpadding=\"0\" cellspacing=\"0\" width=\"600\"><tr bgcolor=\"magenta\"><td>Codeblock:</td></tr><tr><td><form><textarea rows=\"10\" cols=\"82\">".str_replace("<br>", "\n", $m[1])."</textarea></form></td></tr></table></center></div>";
},
);
$text = preg_replace_callback_array($bbformat, $text);
echo $text;
But as you can see, you must convert the <br> back to \n inside the code block, which may consume a lot of memory in some cases.
However, you can adjust the regex itself so that the \n => <br> replacement is skipped when the newline is inside a [code] block, by changing your first pattern from /\\r?\\n/i to /\\r?\\n+(?![^\[].*\])/i, or even /\\r?\\n+(?![^\[code\]].*\[\/code\])/i if you want to keep your [code] block.
So your final code may look like this:
$bbformat = array(
"/\\r?\\n+(?![^\[code\]].*\[\/code\])/i" => "<br>",
"/\[i\](.*?)\[\/i\]/i" => "<i>$1</i>",
"/\[code\](.*?)\[\/code\]/i" => "<div align=\"center\"><center><table border=\"1\" cellpadding=\"0\" cellspacing=\"0\" width=\"600\"><tr bgcolor=\"magenta\"><td>Codeblock:</td></tr><tr><td><form><textarea rows=\"10\" cols=\"82\">$1</textarea></form></td></tr></table></center></div>",
);
foreach($bbformat as $match=>$replacement){
$text = preg_replace($match, $replacement, $text);
}
echo $text;
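The lookahead patterns are fragile: [^\[code\]] is a negated character class rather than a literal [code] tag, and without the s modifier . does not cross newlines. That second point is probably also why "/\[code\](.*?)\[\/code\]/i" stopped matching once the newlines inside the block were no longer turned into <br> first. An alternative sketch is to split the text on complete [code] blocks and format only the pieces outside them. The function name render_ubb and the trimmed-down textarea markup are assumptions for illustration, and the htmlspecialchars call is an addition that is not in the original code.

```php
<?php
// Hypothetical helper: formats UBB text but leaves [code] contents raw.
function render_ubb(string $text): string
{
    // With one capture group and PREG_SPLIT_DELIM_CAPTURE, the result
    // alternates: even indexes are normal text, odd indexes are
    // complete [code]...[/code] blocks.
    $parts = preg_split(
        '/(\[code\].*?\[\/code\])/is',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // Strip "[code]" (6 chars) and "[/code]" (7 chars).
            $raw  = substr($part, 6, -7);
            $out .= '<textarea rows="10" cols="82">'
                  . htmlspecialchars($raw)
                  . '</textarea>';
        } else {
            $part = preg_replace('/\r?\n/', '<br>', $part);
            $part = preg_replace('/\[i\](.*?)\[\/i\]/is', '<i>$1</i>', $part);
            $out .= $part;
        }
    }
    return $out;
}

echo render_ubb("a\nb[code]x\n[i]y[/i][/code]c[i]d[/i]"), "\n";
```

Because only complete [code]...[/code] pairs are split out, a forgotten [/code] leaves the opening tag as plain text instead of producing an unclosed textarea.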
I have the following html, which I have extracted from an email using imap_fetchbody,
<div dir=\"ltr\"><br><div class=\"gmail_quote\"><div dir=\"ltr\"><br><div class=\"gmail_quote\"><div class=\"\">
---------- Forwarded message ----------<br>
<span style=\"font-family:"Helvetica","sans-serif"\"><\/span>
From: <span style=\"font-family:"Helvetica","sans-serif"\">"
<span>xyz<\/span>" <<a href=\"mailto:support#xyz.com\" target=\"_blank\">support#<span>xyz<\/span>.com<\/a>><\/span><br>
\r\n\r\n\r\n\r\nDate: Fri, Apr 18, 2014 at 7:17 PM<br>
Subject: Bla bla xyz<br><\/div><div><div class=\"h5\">To: XYZ <<a href=\"mailto:xyz#gmail.com\" target=\"_blank\">xyz#gmail.com<\/a>><br><br><br>\r\n\r\n<div dir=\"ltr\">\r\n\r\n\r\n\r\n
<div class=\"gmail_quote\"><div><div><div dir=\"ltr\"><div class=\"gmail_quote\"><div dir=\"ltr\"><div><div class=\"gmail_quote\">
<div dir=\"ltr\"><div><div><div class=\"gmail_quote\"><div style=\"word-wrap:break-word\" lang=\"EN-US\">\r\n\r\n\r\n\r\n
<div>
<div>
<div>
<blockquote style=\"margin-top:5pt;margin-bottom:5pt\">
<div><div>
<table style=\"width:100%;background:none repeat scroll 0% 0% rgb(207,207,207)\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\" width=\"100%\">
<tbody>
<tr>\r\n\r\n\r\n\r\n
<td style=\"width:325pt;padding:0in\" width=\"650\">\r\n\r\n<div align=\"center\"><table style=\"width:325pt;background:none repeat scroll 0% 0% rgb(207,207,207)\" cellpadding=\"0\" cellspacing=\"0\" border=\"0\" width=\"650\">\r\n\r\n\r\n\r\n
<tbody><tr>
<td style=\"padding:0in 0in 5.25pt\"><p style=\"text-align:center\" align=\"center\">
<span style=\"font-size:7.5pt;font-family:"Arial","sans-serif";color:rgb(64,64,64)\">If you are unable to see this message,
<a href=\"http:\/\/click.e.xyz.com\/?qs=3771d7c90c958f02a4b2e78494f12a3116ddb15df79b8d04cdf5aeba42012b118\" target=\"_blank\">
<span style=\"color:rgb(64,64,64)\">click here<\/span><\/a> to view.<br>
\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\nTo ensure delivery to your inbox, please add <a href=\"mailto:support#xyz.com\" target=\"_blank\">support#xyz.com<\/a> to your address book. <\/span><\/p>
<\/td>
<\/tr>
<\/tbody>
<\/table>
<\/div><\/div><\/div><\/div>
I want to get rid of all the \,\r, \n and still keep < and > of the html as is.
I have tried stripslashes, stripcslashes, nl2br, htmlspecialchars_decode. But I am not able to achieve what I want.
Here is what I have tried along with imap_qprint function,
$text = stripslashes(imap_qprint($text));
$body = preg_replace('/(\v|\s)+/', ' ', $text );
Result: it doesn't remove all the whitespace characters.
Match the following regex:
(\\r|\\n|\\) with the g modifier
and replace with
'' (empty string)
Demo: http://regex101.com/r/mS3wM2
$html = preg_replace('/[\\\\\r\n]/', '', $html);
Match a single character present in the list below «[\\\r\n]»
A \ character «\\»
A carriage return character «\r»
A line feed character «\n»
UPDATE:
Based on your comment I've updated my answer:
$html = preg_replace('%\\\\/%sm', '/', $html);
$html = preg_replace('/\\\\"/sm', '"', $html);
$html = preg_replace('/[\r\n]/sm', '', $html);
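The three calls above handle \/, \" and real line breaks, but the sample in the question also contains the literal two-character sequences \r and \n. A minimal sketch that covers both cases in two passes follows. The function name clean_escaped_html is hypothetical, and the sketch assumes a backslash followed by r or n never occurs in legitimate content.

```php
<?php
// Hypothetical helper: undoes the escaping shown in the question.
function clean_escaped_html(string $html): string
{
    // Pass 1: drop literal "\r" / "\n" sequences (backslash + letter)
    // as well as real carriage returns and line feeds.
    $html = preg_replace('/\\\\[rn]|[\r\n]/', '', $html);

    // Pass 2: turn \" into " and \/ into /.
    return preg_replace('/\\\\(["\/])/', '$1', $html);
}

// Single quotes keep the backslashes literal, as in the fetched body.
$sample = '<div dir=\"ltr\">\r\n<span>xyz<\/span>' . "\r\n";

echo clean_escaped_html($sample), "\n";
```

The order matters: removing the \r and \n sequences first avoids leaving stray r and n characters behind once the backslashes are gone.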
You could use something like this to interpret the escape sequences:
function interpret_escapes($str) {
return preg_replace_callback('/\\\\(.)/u', function($matches) {
$map = ['n' => "\n", 'r' => "\r", 't' => "\t", 'v' => "\v", 'e' => "\e", 'f' => "\f"];
return isset($map[$matches[1]]) ? $map[$matches[1]] : $matches[1];
}, $str);
}
If string functions can do the trick, always favor string functions over regexes. Performance will be better than with a regex, and they're easier to read in the code:
$message = str_replace("\r\n", '', $message ); // replace all newlines, use double quotes!
$message = stripslashes( $message );
First you have to remove the newlines. As far as I can tell, the \r and \n always come together, so I replace them in one go. After that, stripslashes will remove all escaping backslashes.
You have to call stripslashes after removing the newlines; otherwise \r\n would turn into rn, making them harder to find.
This works perfectly in my tests:
echo '<textarea style="width:100%; height: 33%;">'.$message.'</textarea>';
echo '<hr />';
$message = str_replace("\r\n", '', $message); // use double quotes!
echo '<textarea style="width:100%; height: 33%;">'.$message.'</textarea>';
echo '<hr />';
$message = stripslashes($message);
echo '<textarea style="width:100%; height: 33%;">'.$message.'</textarea>';
If you can open the file in vi, it is as easy as running this in command mode:
:%s/\\r\|\\n//g
I'm using &nbsp; but getting � in my email. How can I fix this?
Some part of the code:
$review = "
<table width='100%' border='0'>
<tr>
<td class='text_font' >Customer Name: $account_nam</td>
</tr>
</table>";
It displays as
Customer Name:�� Sample Account Name
Also this one:
<td width='20%' style='border-bottom: 1px solid #CECECE;'></td>
<td> has no value but it display the funny symbol �
I guess that your content is encoded as UTF-8 while the mail client assumes an ISO encoding, or vice versa.
Be sure to send the correct encoding in the mail header.
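As a rough sketch of what declaring the encoding could look like with PHP's mail(), assuming the message body really is UTF-8 (the helper name build_html_mail_headers and the example address are made up for illustration):

```php
<?php
// Hypothetical helper: builds headers that declare an HTML body in UTF-8.
function build_html_mail_headers(string $from): string
{
    // Mail headers are separated by CRLF.
    return implode("\r\n", [
        'MIME-Version: 1.0',
        'Content-Type: text/html; charset=UTF-8',
        'From: ' . $from,
    ]);
}

$headers = build_html_mail_headers('noreply@example.com');
echo $headers, "\n";

// The headers would then be passed as the fourth argument:
// mail($to, $subject, $review, $headers);
```

The declared charset has to match the bytes actually sent; if the script file or the database content is stored in a different encoding, the body needs converting first.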
Producing a margin with blank spaces is not a good idea anyway; why not use separate table cells for field and value:
$review = "
<table width='100%' border='0'>
<tr>
<td class='text_font' >Customer Name:</td>
<td>" . $account_name . "</td>
</tr>
</table>";
It is most likely an encoding issue. Possibly the editor you are using is saving the text in an encoding that is not recognized. I have had similar issues using Notepad++ and not thinking about encoding.
Chances are you have imported your text from MS Word. Word has a few funky characters such as angled quotation marks.
You can use htmlentities as Daryl suggests in the comments, but I've found in the past that you have to str_replace those characters out like so:
$text = str_replace("“", '"', $text);
$text = str_replace("”", '"', $text);
$text = str_replace("‘", "'", $text);
$text = str_replace("’", "'", $text);
$text = str_replace("…", '...', $text);
$text = str_replace("–", '-', $text);
I'm trying to match this (the name in particular):
<tr>
<th class="name">Name:</th>
<td>John Smith</td>
</tr>
Like this:
preg_match('/<th class="name">Name:<\/th>.+?<td>(.+)<\/td>/s', $a, $b);
However, while it matches the name, it doesn't stop at the end of the name. It keeps going for another 150 or so characters. Why is this? I only want to match the name.
Make last quantifier non-greedy: preg_match('/<th class="name">Name:<\/th>.+?<td>(.+?)<\/td>/s', $a, $b);
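A small sketch of the difference, using a made-up second table row (the Age row is an assumption added only to give the greedy pattern a later </td> to run to):

```php
<?php
$a = '<tr><th class="name">Name:</th>' . "\n"
   . '<td>John Smith</td></tr>' . "\n"
   . '<tr><th>Age:</th><td>42</td></tr>';

// Greedy (.+): expands as far as possible, up to the LAST </td>.
preg_match('/<th class="name">Name:<\/th>.+?<td>(.+)<\/td>/s', $a, $greedy);

// Lazy (.+?): stops at the FIRST </td> after the opening <td>.
preg_match('/<th class="name">Name:<\/th>.+?<td>(.+?)<\/td>/s', $a, $lazy);

echo strlen($greedy[1]), "\n"; // longer than the name alone
echo $lazy[1], "\n";
```

With the s modifier the dot also matches newlines, which is why the greedy version can run across several rows.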
Don't use regex to parse HTML. It's very easy with DOMDocument:
<?php
$html = <<<HTML
<tr>
<th class="name">Name:</th>
<td>John Smith</td>
</tr>
<tr>
<th class="name">Somthing:</th>
<td>Foobar</td>
</tr>
HTML;
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses parser warnings
$ret = array();
foreach($dom->getElementsByTagName('tr') as $tr) {
$ret[trim($tr->getElementsByTagName('th')->item(0)->nodeValue,':')] = $tr->getElementsByTagName('td')->item(0)->nodeValue;
}
print_r($ret);
/*
Array
(
[Name] => John Smith
[Somthing] => Foobar
)
*/
?>
preg_match('/<th class="name">Name:<\/th>\s*<td>(.+?)<\/td>/s', $line, $matches);
Match only whitespace between the </th> and <td>, and non-greedy match for the name.
preg_match('/<th class="name">Name:<\/th>.+?<td>(?P<name>.*?)<\/td>/s', $str, $match);
echo $match['name'];
Here is your match
preg_match('!<tr>\s*<th[^>]*>Name:</th>\s*<td>([^<]*)</td>\s*</tr>!s', $a, $b);
it will work perfectly.
I'm having some trouble with the delimiter for explode. I have a rather chunky string as a delimiter, and it seems to break when I add another letter (the start of a word), but it doesn't get fixed when I remove the first letter, which would indicate it isn't about length.
To wit, the (working) code is:
$boom = htmlspecialchars("<td width=25 align=\"center\" ");
$arr[1] = explode($boom, $arr[1]);
The full string I'd like to use is <td width=25 align=\"center\" class=\", and when I start adding class, explode breaks and nothing gets done. That happens as soon as I add c, and it doesn't go away if I remove <, which it would if it were just a matter of string length.
The problem isn't dire, since I can just replace class=" with "" after the explode and get the same result, but this has given me headaches to diagnose, and it seems like a really weird problem. For what it's worth, I'm using PHP 5.3.0 in XAMPP 1.7.2.
Thanks in advance!
You could try converting every occurrence of the delimiter in the original string
"<td width=25 align=\"center\" "
into something more manageable like:
"banana"
and then explode on that word
Have you tried adding htmlspecialchars to the explode?
$arr[1] = explode($boom, htmlspecialchars($arr[1]));
I get unexpected results without it, but with it it works perfectly.
$s = '<td width=25 align="center" class="asdjasd">sdadasd</td><td width=25 align="center" >asdasD</td>';
$boom = htmlspecialchars("<td width=25 align=\"center\" class=");
$sex = explode($boom, $s);
print_r($sex);
Outputs:
Array
(
[0] => <td width=25 align="center" class="asdjasd">sdadasd</td><td width=25 align="center" >asdasD</td>
)
Whereas
$s = '<td width=25 align="center" class="asdjasd">sdadasd</td><td width=25 align="center" >asdasD</td>';
$boom = htmlspecialchars("<td width=25 align=\"center\" class=");
$sex = explode($boom, htmlspecialchars($s));
print_r($sex);
Outputs
Array
(
[0] =>
[1] => "asdjasd">sdadasd</td><td width=25 align="center" >asdasD</td>
)
This is because $boom is htmlspecialchars-encoded: < and > get transformed into &lt; and &gt;, which cannot be found in the unencoded string, so explode just returns the whole string.
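A short sketch that makes the mismatch visible, using a shortened version of the sample string (the variable names are chosen for illustration):

```php
<?php
$s         = '<td width=25 align="center" class="asdjasd">sdadasd</td>';
$delimiter = '<td width=25 align="center" class=';

// htmlspecialchars() rewrites <, > and " as entities, so the encoded
// delimiter no longer occurs in the raw markup.
$encoded = htmlspecialchars($delimiter);

$rawParts     = explode($delimiter, $s); // delimiter found: two pieces
$encodedParts = explode($encoded, $s);   // delimiter not found: one piece

echo $encoded, "\n";
echo count($rawParts), ' ', count($encodedParts), "\n";
```

Either both the delimiter and the subject should be encoded or neither; mixing the two forms is what makes explode return the input unchanged.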