Php search replacer - php

I have a search string $str (something like "test"), a wrap string $wrap (something like "|"), and a text string $text (something like "This is a test Text").
$str occurs once in $text. What I want now is a function that wraps $str with the wrapper defined in $wrap and outputs the modified text (even if $str occurs more than once in $text).
However, it should not output the whole text, just 1-2 of the words before $str, then 1-2 of the words after $str, and "..." (only if $str isn't the first or last word). It should also be case-insensitive.
Example:
$str = "Text"
$wrap = "<span>|</span>"
$text = "This is a really long Text where the word Text appears about 3 times Text"
Output would be:
"...long <span>Text</span> where...word <span>Text</span> appears...times <span>Text</span>"
My code (which obviously doesn't work):
$tempar = preg_split("/$str/i", $text);
if (count($tempar) <= 2) {
$result = "... ".substr($tempar[0], -7).$wrap.substr($tempar[1], 7)." ...";
} else {
$amount = substr_count($text, $str);
for ($i = 0; $i < $amount; $i++) {
$result = $result.".. ".substr($tempar[$i], -7).$wrap.substr($tempar[$i+1], 0, 7)." ..";
}
}
If you have a tip or a solution, don't hesitate to let me know.

I have taken your approach and made it more flexible. If $str or $wrap changes you could have escaping issues within the regex pattern so I have used preg_quote.
Note that I added $placeholder to make it clearer, but you can use $placeholder = "|" if you don't like [placeholder].
function wrapInString($str, $text, $element = 'span') {
$placeholder = "[placeholder]"; // The string that will be replaced by $str
$wrap = "<{$element}>{$placeholder}</{$element}>"; // Dynamic string that can handle more than just span
$strExp = preg_quote($str, '/');
$matches = [];
$matchCount = preg_match_all("/(\w+\s+)?(\w+\s+)?({$strExp})(\s+\w+)?(\s+\w+)?/i", $text, $matches);
$response = '';
for ($i = 0; $i < $matchCount; $i++) {
if (strlen($matches[1][$i])) {
$response .= '...';
}
if (strlen($matches[2][$i])) {
$response .= $matches[2][$i];
}
$response .= str_replace($placeholder, $matches[3][$i], $wrap);
if (strlen($matches[4][$i])) {
$response .= $matches[4][$i];
}
if (strlen($matches[5][$i]) && $i == $matchCount - 1) {
$response .= '...';
}
}
return $response;
}
$text = "text This is a really long Text where the word Text appears about 3 times Text";
var_dump(wrapInString("text", $text));
string(107) "<span>text</span> This...long <span>Text</span> where...<span>Text</span> appears...times <span>Text</span>"
To make the replacement case insensitive you can use the i regex option.

If I understand your question correctly, all you need is a little implode and explode magic:
$text = "This is a really long Text where the word Text appears about 3 times Text";
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));
If you specifically need the span tags to be displayed as text rather than rendered as HTML, write them as entities:
$arr = explode("Text", $text);
print_r(implode('&lt;span&gt;Text&lt;/span&gt;', $arr));
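The explode/implode approach above is case-sensitive and always inserts the literal "Text". A minimal sketch of a case-insensitive variant follows, assuming a non-empty search string; the helper name wrapAll is hypothetical and not part of the answer.

```php
<?php
// Hypothetical helper: wraps every case-insensitive occurrence of $str in <span> tags.
// Assumes $str is non-empty (an empty pattern would match at every position).
function wrapAll(string $str, string $text): string
{
    // preg_quote() keeps characters such as "." or "/" in $str from acting as regex syntax
    $pattern = '/' . preg_quote($str, '/') . '/i';
    // $0 is the matched text, so "Text" and "text" each keep their own casing
    return preg_replace($pattern, '<span>$0</span>', $text);
}

echo wrapAll('text', 'a Text and text');
// prints: a <span>Text</span> and <span>text</span>
```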

Use the pattern below to get your word and the 1-2 words before and after it:
/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i
In PHP code it can be:
$str = "Text";
$wrap = "<span>|</span>";
$text = "This is a really long Text where the word Text appears about 3 times Text";
$temp = str_replace('|', $str, $wrap); // <span>Text</span>
// find the pattern and 1-2 words before and after
// (to make it case-sensitive, delete 'i' from the pattern)
if(preg_match_all('/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i', $text, $match)) {
$res = array_map(function($x) use($str, $temp) { return '... '.str_replace($str, $temp, $x) . ' ...';}, $match[0]);
echo implode(' ', $res);
}
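The snippet above hard-codes "text" in the pattern and replaces with a case-sensitive str_replace. One possible variation builds the pattern from $str and fills the "|" slot of $wrap with the text as it was actually matched. The function name snippetAround is hypothetical, and the "... " separators simply follow the answer above rather than the exact format asked for in the question.

```php
<?php
// Hypothetical helper: returns each match of $str with up to two words of context.
function snippetAround(string $str, string $wrap, string $text): string
{
    $quoted = preg_quote($str, '/');
    // Group 1: one or two preceding words (or start of text); group 2: the match;
    // group 3: one or two following words (or end of text).
    $pattern = '/((?:\w+\s+){1,2}|^)(' . $quoted . ')((?:\s+\w+){1,2}|$)/i';
    if (!preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        return '';
    }
    $parts = [];
    foreach ($matches as $m) {
        // Put the matched text (original casing) into the "|" slot of the wrapper
        $parts[] = '... ' . $m[1] . str_replace('|', $m[2], $wrap) . $m[3] . ' ...';
    }
    return implode(' ', $parts);
}

echo snippetAround('text', '[|]', 'A Text here');
// prints: ... A [Text] here ...
```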

Related

PHP Preg Replace. Remove strings inside {~ string ~} pattern, but skip <pre>{~ string ~}</pre> [duplicate]

I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their description. It uses a PHP PREG_REPLACE function.
The issue is that it also replaces acronyms contained in a <pre> tag, which I use to present source code.
Could you modify this expression so that it won't replace acronyms contained inside <pre> tags (not only directly inside, but anywhere within them)? Is it possible?
The PHP code is:
$text = preg_replace(
"|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
, "<acronym title=\"$fulltext\">$acronym</acronym>"
, $text
);
You can use a PCRE SKIP/FAIL regex trick (also works in PHP) to tell the regex engine to only match something if it is not inside some delimiters:
(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b
This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.
Here is a sample PHP demo:
<?php
$acronym = "ASCII";
$fulltext = "American Standard Code for Information Interchange";
$re = "/(?s)<pre[^<]*>.*?<\\/pre>(*SKIP)(*F)|\\b$acronym\\b/";
$str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
$subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
<pre>ASCII
Sometext
Moretext</pre>More text 
<acronym title="American Standard Code for Information Interchange">ASCII</acronym>
More text<pre>More
lines
ASCII
lines</pre>
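Because $acronym and $fulltext are interpolated straight into the pattern and the replacement, special characters in either could break the call. A hedged sketch of the same SKIP/FAIL technique with both values escaped; the function name markAcronym is hypothetical and not part of the plugin.

```php
<?php
// Hypothetical wrapper around the (*SKIP)(*F) pattern shown above.
function markAcronym(string $text, string $acronym, string $fulltext): string
{
    // First alternative consumes a whole <pre>...</pre> block and then forces a failure,
    // so the engine skips past it; only acronyms outside such blocks reach the second alternative.
    $pattern = '~<pre\b[^>]*>.*?</pre>(*SKIP)(*F)|\b' . preg_quote($acronym, '~') . '\b~s';
    $title = htmlspecialchars($fulltext, ENT_QUOTES);
    // A callback avoids "$" or "\" in the title being read as replacement references
    return preg_replace_callback($pattern, function (array $m) use ($title) {
        return '<acronym title="' . $title . '">' . $m[0] . '</acronym>';
    }, $text);
}

echo markAcronym('<pre>ASCII</pre> ASCII', 'ASCII', 'Standard Code');
// prints: <pre>ASCII</pre> <acronym title="Standard Code">ASCII</acronym>
```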
It is also possible to use preg_split and keep the code block as a group, only replace the non-code block part then combine it back as a complete string:
function replace($s) {
return str_replace('"', '&quot;', $s); // do something with `$s`
}
$text = 'Your text goes here...';
$parts = preg_split('#(<\/?[-:\w]+(?:\s[^<>]+?)?>)#', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE); // -1 means no limit; passing null is deprecated since PHP 8.1
$text = "";
$x = 0;
foreach ($parts as $v) {
if (trim($v) === "") {
$text .= $v;
continue;
}
if ($v[0] === '<' && substr($v, -1) === '>') {
if (preg_match('#^<(\/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
$x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
}
$text .= $v; // this is a HTML tag…
} else {
$text .= !$x ? replace($v) : $v; // process or skip…
}
}
echo $text; // at top level, output the result (use return if you wrap this in a function)
Taken from here.

Preg Replace in PHP for Heading Tags

I have Markdown text content that I have to convert without using library functions, so I used preg_replace for this. It works fine for some cases. For headings, for example,
Heading
=======
should be converted to <h1>Heading</h1> and also
##Sub heading should be converted to <h2>Sub heading</h2>
###Sub heading should be converted to <h3>Sub heading</h3>
I have tried
$text = preg_replace('/##(.+?)\n/s', '<h2>$1</h2>', $text);
The above code works, but I need to count the hash symbols and assign the heading tag based on that count.
Can anyone help me, please?
Try using preg_replace_callback.
Something like this -
$regex = '/(#+)(.+?)\n/s';
$line = "##Sub heading\n ###sub-sub heading\n";
$line = preg_replace_callback(
$regex,
function($matches){
$h_num = strlen($matches[1]);
return "<h$h_num>".$matches[2]."</h$h_num>";
},
$line
);
echo $line;
The output would be something like this -
<h2>Sub heading</h2> <h3>sub-sub heading</h3>
EDIT
For the combined problem of using = for headings and # for sub-headings, the regex gets a bit more complicated, but the principle remains the same using preg_replace_callback.
Try this -
$regex = '/(?:(#+)(.+?)\n)|(?:(.+?)\n\s*=+\s*\n)/';
$line = "Heading\n=======\n##Sub heading\n ###sub-sub heading\n";
$line = preg_replace_callback(
$regex,
function($matches){
//var_dump($matches);
if($matches[1] == ""){
return "<h1>".$matches[3]."</h1>";
}else{
$h_num = strlen($matches[1]);
return "<h$h_num>".$matches[2]."</h$h_num>";
}
},
$line
);
echo $line;
Whose Output is -
<h1>Heading</h1><h2>Sub heading</h2> <h3>sub-sub heading</h3>
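The combined regex above needs a trailing newline after every heading and consumes it. As a rough alternative sketch, the two heading styles can be handled in two passes with the m (multiline) flag so that ^ and $ anchor to line boundaries. The function name markdownHeadings and the cap at six hashes are assumptions made for this illustration.

```php
<?php
// Hypothetical helper: converts "=" underlined headings and "#" prefixed headings.
function markdownHeadings(string $text): string
{
    // Pass 1: a line followed by a line made only of "=" becomes <h1>
    $text = preg_replace('/^(.+)\n=+[ \t]*$/m', '<h1>$1</h1>', $text);
    // Pass 2: one to six leading "#" pick the heading level
    return preg_replace_callback('/^(#{1,6})[ \t]*(.+?)[ \t]*$/m', function (array $m) {
        $level = strlen($m[1]);
        return "<h{$level}>{$m[2]}</h{$level}>";
    }, $text);
}

echo markdownHeadings("Heading\n=======\n##Sub heading\n###Sub sub heading");
// prints three lines: <h1>Heading</h1>, <h2>Sub heading</h2>, <h3>Sub sub heading</h3>
```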
Do a preg_match_all like this:
$string = "#####asdsadsad";
// \G anchors each match to the end of the previous one, so only the leading run of # is counted
preg_match_all("/\G#/", $string, $matches);
var_dump ($matches);
And based on count of matches you can do whatever you want.
Or, use the preg_replace_callback function.
$input = "#This is my text";
$pattern = '/^(#+)(.+)/';
$mytext = preg_replace_callback($pattern, 'parseHashes', $input);
var_dump($mytext);
function parseHashes($input) {
var_dump($input);
$matches = array();
preg_match_all('/(#)/', $input[1], $matches);
var_dump($matches[0]);
var_dump(count($matches[0]));
$cnt = count($matches[0]);
if ($cnt <= 6 && $cnt > 0) {
return '<h' . $cnt . ' class="if you want class here">' . $input[2] . '</h' . $cnt . '>';
} else {
//This is not a valid h tag. Do whatever you want.
return $input[0]; // leave the original text unchanged
}
}

filtering bad words from text

This function filters emails from the text and returns the matched patterns:
function parse($text, $words)
{
$resultSet = array();
foreach ($words as $word){
$pattern = 'regex to match emails';
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE );
$this->pushToResultSet($matches);
}
return $resultSet;
}
In a similar way, I want to match bad words in the text and return them as $resultSet.
Here is the code to filter bad words:
$badwords = array('shit', 'fuck'); // Here we can use all bad words from database
$text = 'Man, I shot this f*ck, sh/t! fucking fu*ker sh!t f*cking sh\t ;)';
echo "filtered words <br>";
echo $text."<br/>";
$words = explode(' ', $text);
foreach ($words as $word)
{
$bad= false;
foreach ($badwords as $badword)
{
if (strlen($word) >= strlen($badword))
{
$wordOk = false;
for ($i = 0; $i < strlen($badword); $i++)
{
if ($badword[$i] !== $word[$i] && ctype_alpha($word[$i]))
{
$wordOk = true;
break;
}
}
if (!$wordOk)
{
$bad= true;
break;
}
}
}
echo $bad ? 'beep ' : ($word . ' '); // Here $bad words can be returned and replace with *.
}
This replaces bad words with "beep".
But I want to push the matched bad words to $this->pushToResultSet() and return them, as in the first email-filtering code.
Can I do this with my bad-word filtering code?
Roughly converting David Atchley's answer to PHP, does this work as you want it to?
$blocked = array('fuck','shit','damn','hell','ass');
$text = 'Man, I shot this f*ck, damn sh/t! fucking fu*ker sh!t f*cking sh\t ;)';
$matched = preg_match_all("/(".implode('|', $blocked).")/i", $text, $matches);
$filter = preg_replace("/(".implode('|', $blocked).")/i", 'beep', $text);
var_dump($filter);
var_dump($matches);
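Since the question asks for the matches to come back as a result set with offsets, here is a minimal sketch of the same alternation idea using PREG_OFFSET_CAPTURE. The function name findBlockedWords is hypothetical. Like the snippet above it matches substrings, not whole words, and it will not catch obfuscated spellings such as "sh!t".

```php
<?php
// Hypothetical helper: returns [matched text, byte offset] pairs for every blocked word found.
function findBlockedWords(string $text, array $blocked): array
{
    if ($blocked === []) {
        return []; // an empty alternation would match at every position
    }
    // Quote each word so characters like "." or "*" are treated literally
    $alternatives = implode('|', array_map(function ($word) {
        return preg_quote($word, '/');
    }, $blocked));
    preg_match_all('/(?:' . $alternatives . ')/i', $text, $matches, PREG_OFFSET_CAPTURE);
    return $matches[0];
}

print_r(findBlockedWords('damn it, Damn', ['damn', 'hell']));
// two entries: ['damn', 0] and ['Damn', 9]
```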
JSFiddle for working example.
Yes, you can match bad words (saving for later), replace them in the text and build the regex dynamically based on an array of bad words you're trying to filter (you might store it in DB, load from JSON, etc.). Here's the main portion of the working example:
var blocked = ['fuck','shit','damn','hell','ass'],
matchBlocked = new RegExp("("+blocked.join('|')+")", 'gi'),
text = $('.unfiltered').text(),
matched = text.match(matchBlocked),
filtered = text.replace(matchBlocked, 'beep');
Please see the JSFiddle link above for the full working example.

Inverse htmlentities / html_entity_decode

Basically, I want to turn a string like this:
<code> &lt;div&gt; blabla &lt;/div&gt; </code>
into this:
&lt;code&gt; <div> blabla </div> &lt;/code&gt;
How can I do it?
The use case (because some people were curious):
A page like this with a list of allowed HTML tags and examples. For example, <code> is an allowed tag, and this would be the sample:
<code>&lt;?php echo "Hello World!"; ?&gt;</code>
I wanted a reverse function because there are many such tags with samples, all stored in an array that I iterate over in one loop instead of handling each one individually...
My version using regular expressions:
$string = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
$new_string = preg_replace(
'/(.*?)(<.*?>|$)/se',
'html_entity_decode("$1").htmlentities("$2")',
$string
);
It tries to match every tag and textnode and then apply htmlentities and html_entity_decode respectively.
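Note that the /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7. A sketch of the same idea with preg_replace_callback follows; the function name invertEntities is hypothetical.

```php
<?php
// Hypothetical helper: decodes entities in text runs and encodes real tags.
function invertEntities(string $string): string
{
    // Group 1 is a (possibly empty) run of text, group 2 is the tag that follows it
    // or the end of the string.
    return preg_replace_callback('/(.*?)(<.*?>|$)/s', function (array $m) {
        return html_entity_decode($m[1]) . htmlentities($m[2]);
    }, $string);
}

echo invertEntities('<code> &lt;div&gt; blabla &lt;/div&gt; </code>');
// prints: &lt;code&gt; <div> blabla </div> &lt;/code&gt;
```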
There isn't an existing function, but have a look at this.
So far I've only tested it on your example, but this function should work on all htmlentities
function html_entity_invert($string) {
$matches = $store = array();
preg_match_all('/(&(#?\w){2,6};)/', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $i => $match) {
$key = '__STORED_ENTITY_' . $i . '__';
$store[$key] = html_entity_decode($match[0]);
$string = str_replace($match[0], $key, $string);
}
return str_replace(array_keys($store), $store, htmlentities($string));
}
Update:
Thanks to @Mike for taking the time to test my function with other strings. I've updated my regex from /(\&(.+)\;)/ to /(\&([^\&\;]+)\;)/, which should take care of the issue he raised.
I've also added {2,6} to limit the length of each match to reduce the possibility of false positives.
Changed the regex from /(\&([^\&\;]+){2,6}\;)/ to /(&([^&;]+){2,6};)/ to remove unnecessary escaping.
Whooa, brainwave! Changed the regex from /(&([^&;]+){2,6};)/ to /(&(#?\w){2,6};)/ to reduce probability of false positives even further!
Replacing alone will not be good enough for you, whether with regular expressions or simple string replacement: if you replace the &lt; and &gt; entities and then the < and > signs (or vice versa), you will end up with everything in one encoding (all &lt; and &gt;, or all < and >).
So if you want to do this, you will have to swap one set out first (I chose a placeholder), do a replace, then put the first set back in with another replace.
$str = "<code> &lt;div&gt; blabla &lt;/div&gt; </code>";
$search = array("<", ">");
// placeholders for < and >
$replace = array("[", "]");
// first replace: substitute [ and ] for < and >
$str = str_replace($search, $replace, $str);
// second replace: decode the original &lt; and &gt; entities
$search = array("&lt;", "&gt;");
$replace = array("<", ">");
$str = str_replace($search, $replace, $str);
// third replace: turn [ and ] into &lt; and &gt;
$search = array("[", "]");
$replace = array("&lt;", "&gt;");
$str = str_replace($search, $replace, $str);
echo $str;
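The placeholder step is needed because str_replace applies its replacements one after another. As an alternative sketch, strtr with an array performs all substitutions in a single pass and never re-scans text it has already replaced, so no placeholder is required (and literal [ or ] in the input cannot be mistaken for placeholders). The function name swapAngleBrackets is hypothetical, and only the &lt; and &gt; entities are handled here.

```php
<?php
// Hypothetical helper: swaps literal angle brackets with their entities in one pass.
function swapAngleBrackets(string $str): string
{
    // strtr() tries the longest key first at each position and does not
    // revisit output it has already produced.
    return strtr($str, [
        '<'    => '&lt;',
        '>'    => '&gt;',
        '&lt;' => '<',
        '&gt;' => '>',
    ]);
}

echo swapAngleBrackets('<code> &lt;div&gt; blabla &lt;/div&gt; </code>');
// prints: &lt;code&gt; <div> blabla </div> &lt;/code&gt;
```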
I think I have a small solution: why not break the HTML into an array of tags and text, then compare and change each piece as needed?
function invertHTML($str) {
$res = array();
for ($i=0, $j=0; $i < strlen($str); $i++) {
if ($str[$i] == "<") { // curly-brace string offsets were removed in PHP 8
if (isset($res[$j]) && strlen($res[$j]) > 0){
$j++;
$res[$j] = '';
} else {
$res[$j] = '';
}
$pos = strpos($str, ">", $i);
$res[$j] .= substr($str, $i, $pos - $i+1);
$i += ($pos - $i);
$j++;
$res[$j] = '';
continue;
}
$res[$j] .= $str[$i];
}
$newString = '';
foreach($res as $html){
$change = html_entity_decode($html);
if($change != $html){
$newString .= $change;
} else {
$newString .= htmlentities($html);
}
}
return $newString;
}
Modified .... with no errors.
So, although other people on here have recommended regular expressions, which may be the absolute right way to go ... I wanted to post this, as it is sufficient for the question you asked.
Assuming that you are always using html'esque code:
$str = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
xml_parse_into_struct(xml_parser_create(), $str, $nodes);
$xmlArr = array();
foreach($nodes as $node) {
echo htmlentities('<' . $node['tag'] . '>') . html_entity_decode($node['value']) . htmlentities('</' . $node['tag'] . '>');
}
Gives me the following output:
&lt;CODE&gt; <div> blabla </div> &lt;/CODE&gt;
Fairly certain that this wouldn't support going backwards again, as the other solutions posted would, in the sense of:
$orig = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
$modified = '&lt;CODE&gt; <div> blabla </div> &lt;/CODE&gt;';
$modifiedAgain = '<code> &lt;div&gt; blabla &lt;/div&gt; </code>';
I'd recommend using a regular expression, e.g. preg_replace():
http://www.php.net/manual/en/function.preg-replace.php
http://www.webcheatsheet.com/php/regular_expressions.php
http://davebrooks.wordpress.com/2009/04/22/php-preg_replace-some-useful-regular-expressions/
Edit: It appears that I haven't fully answered your question. There is no built-in PHP function to do what you want, but you can do find and replace with regular expressions or even simple expressions: str_replace, preg_replace

highlight the word in the string, if it contains the keyword

How do I write a script that highlights the whole word if it contains the keyword? Example: keyword "fun", string "the bird is funny", result "the bird is *funny*". I do the following:
$str = "my bird is funny";
$keyword = "fun";
$str = preg_replace("/($keyword)/i","<b>$1</b>",$str);
but it highlights only the keyword: my bird is <b>fun</b>ny
Try this:
preg_replace("/\w*?$keyword\w*/i", "<b>$0</b>", $str)
\w*? matches any word characters before the keyword (as few as possible) and \w* any word characters after the keyword.
And I recommend you to use preg_quote to escape the keyword:
preg_replace("/\w*?".preg_quote($keyword)."\w*/i", "<b>$0</b>", $str)
For Unicode support, use the u flag and \p{L} instead of \w:
preg_replace("/\p{L}*?".preg_quote($keyword)."\p{L}*/ui", "<b>$0</b>", $str)
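Putting the pieces above together, a small sketch of a reusable function; the name highlightWords is hypothetical, and it assumes a non-empty keyword and ASCII word characters (see the \p{L} variant above for Unicode).

```php
<?php
// Hypothetical helper: wraps every whole word that contains $keyword in <b> tags.
function highlightWords(string $str, string $keyword): string
{
    // \w*? and \w* extend the match to the surrounding word characters;
    // preg_quote() makes sure the keyword is matched literally.
    $pattern = '/\w*?' . preg_quote($keyword, '/') . '\w*/i';
    return preg_replace($pattern, '<b>$0</b>', $str);
}

echo highlightWords('my bird is funny', 'fun');
// prints: my bird is <b>funny</b>
```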
You can do the following:
$str = preg_replace("/\b([a-z]*{$keyword}[a-z]*)\b/i","<b>$1</b>",$str);
Example:
$str = "Its fun to be funny and unfunny";
$keyword = 'fun';
$str = preg_replace("/\b([a-z]*{$keyword}[a-z]*)\b/i","<b>$1</b>",$str);
echo "$str"; // prints 'Its <b>fun</b> to be <b>funny</b> and <b>unfunny</b>'
<?php
$str = "my bird is funny";
$keyword = "fun";
$look = explode(' ',$str);
foreach($look as $find){
if(strpos($find, $keyword) !== false) {
if(!isset($highlight)){
$highlight[] = $find;
} else {
if(!in_array($find,$highlight)){
$highlight[] = $find;
}
}
}
}
if(isset($highlight)){
foreach($highlight as $replace){
$str = str_replace($replace,'<b>'.$replace.'</b>',$str);
}
}
echo $str;
?>
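Here I have added a multi-keyword search for your reference:
$keyword = ".in#.com#dot.com#1#2#3#4#5#6#7#8#9#one#two#three#four#five#Six#seven#eight#nine#ten#dot.in#dot in#";
// split first and drop empty entries, then quote each keyword (preg_quote escapes "#" since PHP 7.3)
$keywords = array_filter(explode('#', $keyword), 'strlen');
$pattern = implode('|', array_map(function ($k) { return preg_quote($k, '/'); }, $keywords));
$str = "PHP is dot .com the amazon.in 123455454546 dot in scripting language of choice.";
$str = preg_replace("/($pattern)/i","<b>$0</b>",$str);
echo $str;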
Basically, since this is HTML, what you have to do is iterate over the text nodes and split those containing the search string into up to three nodes (before match, after match, and the highlighted match). If the "after match" node exists, it must be processed too. Here is a PHP 7 example using the PHP DOM extension. The following function accepts a preg_quote()d UTF-8 search string (or a regex-compatible expression like apple|orange). It will enclose every match in a given tag with a given class.
function highlightTextInHTML($regex_compatible_text, $html, $replacement_tag = 'span', $replacement_class = 'highlight') {
$d = new DOMDocument('1.0','utf-8');
$d->loadHTML('<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>' . $html);
$xpath = new DOMXPath($d);
$process_node = function(&$node) use($regex_compatible_text, $replacement_tag, $replacement_class, &$d, &$process_node) {
$i = preg_match("~(?<before>.*?)(?<search>($regex_compatible_text)+)(?<after>.*)~ui", $node->textContent, $m);
if($i) {
$x = $d->createElement($replacement_tag);
$x->setAttribute('class', $replacement_class);
$x->textContent = $m['search'];
$parent_node = $node->parentNode;
$before = null;
$after = null;
if(!empty($m['after'])) {
$after = $d->createTextNode($m['after']);
$parent_node->replaceChild($after, $node);
$parent_node->insertBefore($x, $after);
} else {
$parent_node->replaceChild($x, $node);
}
if(!empty($m['before'])) {
$before = $d->createTextNode($m['before']);
$parent_node->insertBefore($before, $x);
}
if($after) {
$process_node($after);
}
}
};
$node_list = $xpath->query('//text()');
foreach ($node_list as $node) {
$process_node($node);
}
return preg_replace('~(^.*<body>)|(</body>.*$)~mis', '', $d->saveHTML());
}
Search for and highlight a word in your string, text, body, or paragraph:
<?php
$body_text = 'This is simple code for highlighting the word in a given body or text'; // the body of your page
$searh_letter = 'this'; // the string you want to search for
$result_body = do_Highlight($body_text, $searh_letter); // the result with your search word highlighted
echo $result_body; // display the result
// Highlights the first occurrence of the search word in the body of your page, paragraph, or string
function do_Highlight($body_text, $searh_letter) {
$pos = stripos($body_text, $searh_letter); // position of the first occurrence (case-insensitive, so 'this' finds 'This'), used to split the text
if ($pos === false) {
return $body_text; // nothing to highlight
}
$lword = strlen($searh_letter); // length of the search string, added to $pos to find where the rest of the text starts
$split_search = $pos + $lword;
$string0 = substr($body_text, 0, $pos);
$string1 = substr($body_text, $pos, $lword);
$string2 = substr($body_text, $split_search);
$body = $string0."<font style='color:#FF0000; background-color:white;'>".$string1." </font> ".$string2;
return $body;
}
?>
