I need to replace spaces with &nbsp; inside HTML elements.
Example:
<table atrr="zxzx"><tr>
<td>adfa a adfadfaf></td><td><br /> dfa dfa</td>
</tr></table>
should become
<table atrr="zxzx"><tr>
<td>adfa&nbsp;a&nbsp;adfadfaf></td><td><br />&nbsp;dfa&nbsp;dfa</td>
</tr></table>
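If you're working with PHP, you can do the following (note that this also replaces the spaces inside the tags themselves, such as the one in <table atrr="zxzx">):
$content = str_replace(' ', '&nbsp;', $content);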
Use a regex to capture the text between tags:
(?:<\/?\w+)(?:\s+\w+(?:\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+)?)+\s*|\s*)\/?>([^<]*)?
then replace ' ' with '&nbsp;' in the captured text.
Also, to catch the text before and after the HTML:
^([^<>]*)<?
>([^<>]*)$
Edit:
Here you go:
<?php
$data="dasdad asd a <table atrr=\"zxzx\"><tr><td>adfa a adfadfaf></td><td><br /> dfa dfa</td></tr></table> asdasd s ";
$exp="/((?:<\\/?\\w+)(?:\\s+\\w+(?:\\s*=\\s*(?:\\\".*?\\\"|'.*?'|[^'\\\">\\s]+)?)+\\s*|\\s*)\\/?>)([^<]*)?/";
$ex1="/^([^<>]*)(<?)/i";
$ex2="/(>)([^<>]*)$/i";
// 1. Text that follows each tag
$data = preg_replace_callback($exp, function ($matches) {
    return $matches[1] . str_replace(" ", "&nbsp;", $matches[2]);
}, $data);
// 2. Text before the first tag
$data = preg_replace_callback($ex1, function ($matches) {
    return str_replace(" ", "&nbsp;", $matches[1]) . $matches[2];
}, $data);
// 3. Text after the last tag
$data = preg_replace_callback($ex2, function ($matches) {
    return $matches[1] . str_replace(" ", "&nbsp;", $matches[2]);
}, $data);
echo $data;
?>
It works. I modified it slightly, but it would also work without the modifications (though I don't think you'd understand the code ;) )
Since tokenizing HTML with regular expressions can be quite complicated (especially when allowing for SGML quirks), you should use an HTML DOM parser such as the one in PHP's DOM extension. Then you can query the DOM, get all text nodes and apply your replacement function to them:
$doc = new DOMDocument();
$doc->loadHTML($str);
$body = $doc->getElementsByTagName('body')->item(0);
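// Insert the non-breaking space character U+00A0 itself: text nodes hold characters, not entity references like &nbsp;
mapOntoTextNodes($body, function(DOMText $node) { $node->nodeValue = str_replace(' ', "\u{00A0}", $node->nodeValue); });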
The mapOntoTextNodes function is a custom function I defined in my answer to "How to replace text URLs and exclude URLs in HTML tags?"
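The linked mapOntoTextNodes is not reproduced here. Below is a minimal sketch of what such a helper could look like, assuming it simply walks the tree depth-first and calls the callback on every DOMText node; the function body, the sample $str and the body-only serialization loop are illustrative assumptions, not the original answer's code. It writes the U+00A0 character into the text nodes rather than the string "&nbsp;".

```php
<?php
// Hypothetical helper (not the linked original): call $fn on every DOMText
// node below $node, depth-first.
function mapOntoTextNodes(DOMNode $node, callable $fn): void
{
    // Snapshot the children first so the callback can safely modify nodes.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            $fn($child);
        } elseif ($child->hasChildNodes()) {
            mapOntoTextNodes($child, $fn);
        }
    }
}

$str = '<table atrr="zxzx"><tr><td>adfa a adfadfaf</td><td><br /> dfa dfa</td></tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect libxml parse warnings instead of printing them
$doc->loadHTML($str);
libxml_clear_errors();

$body = $doc->getElementsByTagName('body')->item(0);
mapOntoTextNodes($body, function (DOMText $node) {
    // U+00A0 is the character that the &nbsp; entity stands for.
    $node->nodeValue = str_replace(' ', "\u{00A0}", $node->nodeValue);
});

// Serialize only the children of <body>, skipping the <html><body> wrapper loadHTML() adds.
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

The spaces inside the tags (for example in `<table atrr="zxzx">`) are untouched because they are not text nodes. Depending on the document's encoding, saveHTML() may write the U+00A0 characters as `&nbsp;`, as a numeric character reference, or as raw UTF-8; all of them render as non-breaking spaces.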
Using the following code:
$text = "أطلقت غوغل النسخة المخصصة للأجهزة الذكية العاملة بنظام أندرويد من الإصدار “25″ لمتصفحها الشهير كروم.ولم تحدث غوغل تطبيق كروم للأجهزة العاملة بأندرويد منذ شهر تشرين الثاني العام الماضي، وهو المتصفح الذي يستخدمه نسبة 2.02% من أصحاب الأجهزة الذكية حسب دراسة سابقة. ";
$tags = "غوغل, غوغل النسخة, كروم";
$tags = explode(",", $tags);
foreach($tags as $k=>$v) {
$text = preg_replace("/\b{$v}\b/u","$0",$text, 1);
}
echo $text;
Will give the following result:
I love PHP">love PHP</a>, but I am facing a problem
Note that my text is in Arabic.
The way to go is to do everything in one pass. The idea is to build a pattern with an alternation of the tags. For this to work, you must first sort the tags in reverse order, because the regex engine stops at the first alternative that succeeds (otherwise 'love' will always match even when it is followed by 'php', and 'love php' will never be matched).
To limit the replacement to the first occurrence of each word, remove a tag from the array once it has been found, and inside the replacement callback check whether the tag is still present in the array:
$text = 'I love PHP, I love love but I am facing a problem';
$tagsCSV = 'love, love php, facing';
$tags = explode(', ', $tagsCSV);
rsort($tags);
$tags = array_map('preg_quote', $tags);
$pattern = '/\b(?:' . implode('|', $tags) . ')\b/iu';
$text = preg_replace_callback($pattern, function ($m) use (&$tags) {
$mLC = mb_strtolower($m[0], 'UTF-8');
if (false === $key = array_search($mLC, $tags))
return $m[0];
unset($tags[$key]);
return '<a href="index.php?s=news&tag=' . rawurlencode($mLC)
. '">' . $m[0] . '</a>';
}, $text);
Note: when you build a URL you must encode special characters; that is why I use preg_replace_callback instead of preg_replace, so that I can call rawurlencode.
(If you have to deal with a UTF-8 encoded string, you need to add the u modifier to the pattern and replace strtolower with mb_strtolower, as done above.)
The preg_split way:
$tags = explode(', ', $tagsCSV);
rsort($tags);
$tags = array_map('preg_quote', $tags);
$pattern = '/\b(' . implode('|', $tags) . ')\b/iu';
$items = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$itemsLength = count($items);
$i = 1;
while ($i<$itemsLength && count($tags)) {
if (false !== $key = array_search(mb_strtolower($items[$i], 'UTF-8'), $tags)) {
$items[$i] = '<a href="index.php?s=news&tag=' . rawurlencode($tags[$key])
. '">' . $items[$i] . '</a>';
unset($tags[$key]);
}
$i+=2;
}
$result = implode('', $items);
Instead of calling preg_replace multiple times, call it a single time with a regexp that matches any of the tags:
$tags = array_map('trim', explode(",", $tags)); // trim() removes the space left after each comma
$tags_re = '/\b(' . implode('|', $tags) . ')\b/u';
$text = preg_replace($tags_re, '$0', $text, 1);
This turns the list of tags into the regexp /\b(love|love php|facing)\b/u. x|y in a regexp means to match either x or y.
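The order of the alternatives matters here, as the preg_replace_callback answer points out: PCRE tries them from left to right and takes the first one that lets the whole pattern match at the current position. A small sketch (the sample sentence is made up) comparing the unsorted tag list with an rsort()ed one:

```php
<?php
$text = 'I love PHP';

// Unsorted: "love" is tried first, succeeds, and "love php" is never reached.
preg_match('/\b(love|love php|facing)\b/iu', $text, $m);
$first = $m[1];

// rsort() puts "love php" before its prefix "love".
$tags = ['love', 'love php', 'facing'];
rsort($tags); // ['love php', 'love', 'facing']
preg_match('/\b(' . implode('|', $tags) . ')\b/iu', $text, $m);
$second = $m[1];

echo $first, "\n";  // love
echo $second, "\n"; // love PHP
```

rsort() does not sort by length in general; it only guarantees that a tag comes before any other tag that is a prefix of it, which is exactly what alternation needs.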
I have this code:
preg_match_all('#href="/mp3/(.*?).html#', $content, $salida);
and I need to replace "_" with " " (a space) in the output array, something like this:
$salida = str_replace('_', ' ', $salida);
Obviously that code does not work.
I think what you're looking for is preg_replace_callback:
$salida = preg_replace_callback(
'(href="/mp3/.*?\.html)',
function($m) {return str_replace("_", " ", $m[0]);},
$content);
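If what you need is the cleaned-up array rather than modified HTML, you can also apply str_replace() to the capture group: preg_match_all() stores the full matches in $salida[0] and the first group in $salida[1], and str_replace() accepts a flat array as its subject. A minimal sketch; the sample $content is made up for illustration:

```php
<?php
// Made-up sample markup for illustration.
$content = '<a href="/mp3/some_song_name.html">x</a> <a href="/mp3/other_track.html">y</a>';

preg_match_all('#href="/mp3/(.*?)\.html#', $content, $salida);

// $salida is an array of arrays, so apply str_replace() to the
// capture group ($salida[1]) rather than to $salida itself.
$names = str_replace('_', ' ', $salida[1]);

print_r($names);
```

This prints the two names with their underscores replaced by spaces, leaving $content itself unchanged.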
I want to use PHP to replace chr(10) with <br /> within
<!CDATA[[Text
test
test]]>
But I'm very poor at regex.
$xml = "cc\n<!CDATA[[Text\ntest\ntest]]>\naa\nbb\n";
$callback = function($m) {
return '<!CDATA[[' . preg_replace("~" . chr(10) . "~s", '<br />', $m[1]) . ']]>';
};
echo preg_replace_callback('~<!CDATA\[\[(.+?)\]\]>~s', $callback, $xml);
P.S. You could probably do it without preg_replace_callback, but it looks nicer than putting all the logic into preg_replace.
Why use RegEx?
$final = str_replace(chr(10), '<br />', $cdata);
Using htmlentities(), is there a way to let only <b> and <i> through, so that they render as bold and italic text? I know there was a way of doing this, but I have forgotten it.
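It's pretty easy: escape everything, then turn the escaped <b> and <i> tags back into markup:
<?php
$string = htmlentities($text);
$string = str_replace(array("&lt;i&gt;", "&lt;b&gt;", "&lt;/i&gt;", "&lt;/b&gt;"), array("<i>", "<b>", "</i>", "</b>"), $string);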
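A self-contained version of the same idea, wrapped in a hypothetical helper allowBoldItalic() (the name is illustrative). It assumes UTF-8 input and restores only the exact lowercase tags <b>, </b>, <i> and </i>; everything else, including <script>, stays escaped:

```php
<?php
// Hypothetical helper: escape all HTML, then un-escape only bare <b>/<i> tags.
function allowBoldItalic(string $text): string
{
    $escaped = htmlentities($text, ENT_QUOTES, 'UTF-8');
    return str_replace(
        ['&lt;b&gt;', '&lt;/b&gt;', '&lt;i&gt;', '&lt;/i&gt;'],
        ['<b>', '</b>', '<i>', '</i>'],
        $escaped
    );
}

echo allowBoldItalic('<b>bold</b> <i>italic</i> <script>alert(1)</script>');
// <b>bold</b> <i>italic</i> &lt;script&gt;alert(1)&lt;/script&gt;
```

Tags with attributes or uppercase names (such as <B> or <b class="x">) stay escaped as well, which is usually what you want for a whitelist like this.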
I use a helper function:
# Sanitizer function - removes forbidden tags, including script tags
function strip_tags_attributes( $str,
$allowedTags = array('<a>','<b>','<blockquote>','<br>','<cite>','<code>','<del>','<div>','<em>','<ul>','<ol>','<li>','<dl>','<dt>','<dd>','<img>','<ins>','<u>','<q>','<h3>','<h4>','<h5>','<h6>','<samp>','<strong>','<sub>','<sup>','<p>','<table>','<tr>','<td>','<th>','<pre>','<span>'),
$disabledEvents = array('onclick','ondblclick','onkeydown','onkeypress','onkeyup','onload','onmousedown','onmousemove','onmouseout','onmouseover','onmouseup','onunload') )
{
if( empty($disabledEvents) ) {
return strip_tags($str, implode('', $allowedTags));
}
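// Inside each tag that survived strip_tags(): remove javascript: URLs and the
// listed event attributes, then collapse runs of whitespace to one space.
$eventPattern = '/(' . implode('|', $disabledEvents) . ')=["\'][^"\']*["\']/i';
return preg_replace_callback('/<(.*?)>/is', function ($m) use ($eventPattern) {
    return '<' . preg_replace(
        array('/javascript:[^"\']*/i', $eventPattern, '/\s+/'),
        array('', '', ' '),
        $m[1]
    ) . '>';
}, strip_tags($str, implode('', $allowedTags)));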
}
For your example, remove everything except <b> and <i> from the $allowedTags array.
I'm using this function from here, which is:
// highlight search keywords
function highlight($title, $search) {
preg_match_all('~\w+~', $search, $m);
if(!$m)
return $title;
$re = '~\\b(' . implode('|', $m[0]) . ')\\b~i';
return preg_replace($re, '<span style="background-color: #ffffcc;">$0</span>', $title);
}
It works great, but only for titles. I want to be able to pass an array that contains $title and $description.
I was trying something like this:
$replacements = array($title, $description);
// highlight search keywords
function highlight($replacements, $search) {
preg_match_all('~\w+~', $search, $m);
if(!$m)
return $replacements;
$re = '~\\b(' . implode('|', $m[0]) . ')\\b~i';
return preg_replace($re, '<span style="background-color: #ffffcc;">$0</span>', $replacements);
}
It isn't working. It's passing an array as the title, and not highlighting the description (although it is actually returning a description). Any idea how to get this working?
I would personally leave the original function operating on a single string rather than an array. It would make your calling code nice and clear:
$titleHighlighted = highlight($title, $searchKeywords);
$descriptionHighlighted = highlight($description, $searchKeywords);
However, I would rewrite your function to use str_ireplace rather than preg_replace:
function highlight($contentBlock, array $keywords) {
$highlightedContentBlock = $contentBlock;
foreach ($keywords as $singleKeyword) {
$highlightedKeyword = '<span class = "keyword">' . $singleKeyword . '</span>';
$highlightedContentBlock = str_ireplace($singleKeyword, $highlightedKeyword, $highlightedContentBlock);
}
return $highlightedContentBlock;
}
This rewritten function should be simpler to read and does not have the overhead of compiling regular expressions. You can call it as many times as you like for any content block (title, description, etc.):
$title = "The quick brown fox jumps over ... ";
$searchKeywords = array("quick", "fox");
$titleHighlighted = highlight($title, $searchKeywords);
echo $titleHighlighted; // The <span class = "keyword">quick</span> brown ...
Have you tried changing $m[0] to $m[0][0]?
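For reference, preg_replace() accepts an array as its subject and returns an array with the same keys, so an array-based highlight() can work as long as the caller unpacks the result. A minimal sketch, assuming PHP 7.1+ for the [$a, $b] = ... destructuring; the sample title and description are made up. It also tests $m[0] rather than $m, because preg_match_all() always fills $m with an array, so !$m is never true:

```php
<?php
// Array variant of the question's highlight(): preg_replace() processes every element.
function highlight(array $replacements, string $search): array
{
    preg_match_all('~\w+~', $search, $m);
    // $m[0] holds all matched words; an empty list means there is nothing to highlight.
    if (empty($m[0])) {
        return $replacements;
    }
    $re = '~\b(' . implode('|', $m[0]) . ')\b~i';
    return preg_replace($re, '<span style="background-color: #ffffcc;">$0</span>', $replacements);
}

$title = 'The quick brown fox';
$description = 'A fox that is quick';
[$titleHighlighted, $descriptionHighlighted] = highlight([$title, $description], 'quick fox');
echo $titleHighlighted, "\n", $descriptionHighlighted, "\n";
```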