find and replace country from string - php

I have an array of countries: the key is the country code and the value is the country name. I also have a string posted by users, and I want to find any country name in it and replace it with
<span class="country">$1</span>
To make it clearer, say I have this text:
Canada is a cold place
I want it to become:
<span class="country">canada</span> is a cold place
where my countries array is used to find and replace.
The reason behind this is that I want to use microformats, so I need to mark up specific text in a string.
I already have similar preg_replace code:
$style = array(
'/\[b\](.*)?\[\/b\]/isU' => '<b>$1</b>',
'/\[i\](.*)?\[\/i\]/isU' => '<i>$1</i>',
'/\[u\](.*)?\[\/u\]/isU' => '<u>$1</u>',
'/\[em\](.*)?\[\/em\]/isU' => '<em>$1</em>',
'/\[li\](.*)?\[\/li\]/isU' => '<li>$1</li>',
'/\[code\](.*)?\[\/code\]/isU' => '<div class="tx_code">$1</div>',
'/\[q\](.*)?\[\/q\]/isU' => '<q>$1</q>',
'/[\r\n]{3}+/' => "\n"
);
$text = preg_replace(array_keys($style),array_values($style),$text);
That works; I need something like it for countries.
Keep in mind that it should not be case sensitive: some users may post canada, others Canada.
Thanks.

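Try this:
function findword($text,array $List){
$pattern = array();
foreach($List as $Val)
// $1 and $3 put back the delimiters around the match, $2 is the country as the user typed it
$pattern['%([^\da-zA-Z]+)('.preg_quote($Val, '%').')([^\da-zA-Z]+)%si'] = '$1<span class="country">$2</span>$3';
// pad with spaces so a country at the very start or end still has a delimiter, then trim the padding
$text = trim(preg_replace(array_keys($pattern), array_values($pattern), ' '.$text.' '));
return $text;
}
echo findword('Canada is a cold place',array('Canada'));
output:
<span class="country">Canada</span> is a cold place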
Edit: if you want to replace every occurrence in the text, even inside other words, you can use this:
function findword($text,array $List){
foreach($List as $Val)
$pattern['~'.$Val.'~si'] = '<span class="country">'.$Val.'</span>';
$text = preg_replace(array_keys($pattern), array_values($pattern), ' '.$text.' ');
return $text;
}
echo findword('Canadaisacold place',array('Canada'));
output:
<span class="country">Canada</span>isacold place
Edit 2: I also wrote a version using DOMDocument, which works well on HTML:
class XmlRead{
static function Clean($html){
$html=preg_replace_callback("~<script(.*?)>(.*?)</script>~si",function($m){
//print_r($m);
// $m[2]=preg_replace("/\/\*(.*?)\*\/|[\t\r\n]/s"," ", " ".$m[2]." ");
$m[2]=preg_replace("~//(.*?)\n~si"," ", " ".$m[2]." ");
//echo $m[2];
return "<script ".$m[1].">".$m[2]."</script>";
}, $html);
$search = array(
"/\/\*(.*?)\*\/|[\t\r\n]/s" => "",
"/ +\{ +|\{ +| +\{/" => "{",
"/ +\} +|\} +| +\}/" => "}",
"/ +: +|: +| +:/" => ":",
"/ +; +|; +| +;/" => ";",
"/ +, +|, +| +,/" => ","
);
$html = preg_replace(array_keys($search), array_values($search), $html);
preg_match_all('!(<(?:code|pre|script).*>[^<]+</(?:code|pre|script)>)!',$html,$pre);
$html = preg_replace('!<(?:code|pre).*>[^<]+</(?:code|pre)>!', '#pre#', $html);
$html = preg_replace('#<!--[^\[].+-->#', '', $html);
$html = preg_replace('/[\r\n\t]+/', ' ', $html);
$html = preg_replace('/>[\s]+</', '><', $html);
$html = preg_replace('/\s+/', ' ', $html);
if (!empty($pre[0])) {
foreach ($pre[0] as $tag) {
$html = preg_replace('!#pre#!', $tag, $html,1);
}
}
return($html);
}
function loadNprepare($content,$encod='') {
$content=self::Clean($content);
//$content=html_entity_decode(html_entity_decode($content));
// $content=htmlspecialchars_decode($content,ENT_HTML5);
$DataPage='';
if(preg_match('~<body(.*?)>(.*?)</body>~si',$content,$M)){
$DataPage=$M[2];
}else{
$DataPage =$content;
}
$HTML=$DataPage;
$HTML="<!doctype html><html><head><meta charset=\"utf-8\"><title>Untitled Document</title></head><body>".$HTML."</body></html>";
$dom= new DOMDocument;
$HTML = str_replace("&", "&amp;", $HTML); // disguise &s going IN to loadXML()
// $dom->substituteEntities = true; // collapse &s going OUT to transformToXML()
$dom->recover = TRUE;
@$dom->loadHTML('<?xml encoding="UTF-8">' .$HTML);
// dirty fix
foreach ($dom->childNodes as $item)
if ($item->nodeType == XML_PI_NODE)
$dom->removeChild($item); // remove hack
$dom->encoding = 'UTF-8'; // insert proper
return $dom;
}
function GetBYClass($Doc,$ClassName){
$finder = new DomXPath($Doc);
return($finder->query("//*[contains(@class, '$ClassName')]"));
}
function findword($text,array $List){
foreach($List as $Val)
$pattern['%(\#)?([^\da-zA-Z]+)'.$Val.'([^\da-zA-Z]+)%si'] = '<span class="country">'.$Val.'</span>';
$text = preg_replace(array_keys($pattern), array_values($pattern), ' '.$text.' ');
return $text;
}
function FindAndReplace($node,array $List) {
if($node==NULL)return false;
if (XML_TEXT_NODE === $node->nodeType || XML_CDATA_SECTION_NODE === $node->nodeType) {
$node->nodeValue=$this->findword($node->nodeValue,$List);
return;
}else{
if(is_object($node->childNodes) or is_array($node->childNodes)) {
foreach($node->childNodes as $childNode) {
$this->FindAndReplace($childNode,$List);
}
}
}
}
function DOMinnerHTML($element)
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
$innerHTML=html_entity_decode(html_entity_decode($innerHTML));
return $innerHTML;
}
function DOMRemove(DOMNode $from) {
$from->parentNode->removeChild($from);
}
}
$XmlRead=new XmlRead();
$Doc=$XmlRead->loadNprepare('Canada is a cold place');
$XmlRead->FindAndReplace($Doc,array('Canada'));
$Body=$Doc->getElementsByTagName('body')->item(0);
echo $XmlRead->DOMinnerHTML($Body);
output
<span class="country">Canada</span>is a cold place
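The class above writes the span markup into a text node's value and then decodes entities on output. A possible alternative, sketched here under the assumption that the input is well-formed XML or HTML already loaded into a DOMDocument, is to split each text node and create real span elements instead. The helper name wrap_words_in_dom is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper, not part of the answer above: wraps each listed word
// found in the document's text nodes in <span class="country">.
function wrap_words_in_dom(DOMDocument $doc, array $words): void
{
    if ($words === []) {
        return;
    }
    $quoted  = array_map(fn($w) => preg_quote($w, '/'), $words);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    $xpath = new DOMXPath($doc);
    // Copy the node list to an array first, because the tree changes inside the loop.
    foreach (iterator_to_array($xpath->query('//text()')) as $textNode) {
        // With PREG_SPLIT_DELIM_CAPTURE the odd-numbered parts are the matched words.
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('class', 'country');
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } elseif ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<p>Canada is a <b>cold</b> place</p>');
wrap_words_in_dom($doc, ['Canada']);
echo $doc->saveXML($doc->documentElement);
// <p><span class="country">Canada</span> is a <b>cold</b> place</p>
```

Because the spans are created as elements, existing tags and attributes are never touched by the regex, and no entity decoding is needed afterwards.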

I wrote my own, and it has worked best so far:
if($microformat){
foreach ($this->countries as $co){
$text = preg_replace('/(\#)?\b'.$co.'\b/isU','<span class="country">$0</span>',$text);
}
}
Thank you all.
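A variation on the loop above, as a rough sketch: it builds one case-insensitive pattern for the whole country list and passes each name through preg_quote, so names containing regex metacharacters cannot break the pattern. The function name wrap_countries and the sample array are assumptions made for illustration.

```php
<?php
// Hypothetical helper: wraps every country name found in $text in a span.
// $countries is assumed to be the code => name array described in the question.
function wrap_countries(string $text, array $countries): string
{
    if ($countries === []) {
        return $text;
    }
    // Longest names first, so a longer name wins over a shorter one it starts with.
    usort($countries, fn($a, $b) => strlen($b) <=> strlen($a));
    $quoted  = array_map(fn($c) => preg_quote($c, '/'), $countries);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    // $m[0] is the text as the user typed it, so the original casing is kept.
    return preg_replace_callback(
        $pattern,
        fn($m) => '<span class="country">' . $m[0] . '</span>',
        $text
    );
}

echo wrap_countries('canada is colder than Mexico', ['CA' => 'Canada', 'MX' => 'Mexico']);
// <span class="country">canada</span> is colder than <span class="country">Mexico</span>
```

A single pass also avoids one preg_replace call per country, which may matter with a list of a couple of hundred names.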

Related

PHP DOM Text Replace

I need a little help with doing PHP DOM text replacement dynamically. In my research, I found a snippet of PHP DOM code that looks promising, but the writer does not explain how to use it. The link to the code is: http://be2.php.net/manual/en/class.domtext.php
So far, here's what I did, approaching the code as a newcomer to DOM:
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($myXmlString);
$search = 'FirstName lastname';
$replace = 'Jack Daniels';
$newTxt = domTextReplace( $search, $replace, DOMNode &$doc, $isRegEx = false );
Print_r($newTxt);
I would like domTextReplace() to return $newTxt. How can I get it to do so?
Here is a working example that uses that function:
<?php
$myXmlString = '<root><name>FirstName lastname</name></root>';
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($myXmlString);
$search = 'FirstName lastname';
$replace = 'Jack Daniels';
// The function doesn't return any value
domTextReplace($search, $replace, $doc, $isRegEx = false);
// Now the text is replaced in $doc
$xmlOutput = $doc->saveXML();
// I put xml header to display the results correctly on the browser
header("Content-type: text/xml");
print_r($xmlOutput);
// I copied here the function for everyone to find it quick
function domTextReplace( $search, $replace, DOMNode &$domNode, $isRegEx = false ) {
if ( $domNode->hasChildNodes() ) {
$children = array();
// since looping through a DOM being modified is a bad idea we prepare an array:
foreach ( $domNode->childNodes as $child ) {
$children[] = $child;
}
foreach ( $children as $child ) {
if ( $child->nodeType === XML_TEXT_NODE ) {
$oldText = $child->wholeText;
if ( $isRegEx ) {
$newText = preg_replace( $search, $replace, $oldText );
} else {
$newText = str_replace( $search, $replace, $oldText );
}
$newTextNode = $domNode->ownerDocument->createTextNode( $newText );
$domNode->replaceChild( $newTextNode, $child );
} else {
domTextReplace( $search, $replace, $child, $isRegEx );
}
}
}
}
This is the output:
<root>
<name>Jack Daniels</name>
</root>
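If a return value is what is wanted, one option is a small wrapper that does the replacement and returns the serialized XML. This is only a sketch: the name replace_text_in_xml is hypothetical, it uses an XPath query over text nodes rather than the recursive function above, and it only handles plain string search.

```php
<?php
// Hypothetical wrapper: replaces $search in every text node and returns the XML.
function replace_text_in_xml(string $xml, string $search, string $replace): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()') as $textNode) {
        // Only the node's text changes; the tree structure stays the same.
        $textNode->nodeValue = str_replace($search, $replace, $textNode->nodeValue);
    }
    // Serialize just the root element, without the XML declaration.
    return $doc->saveXML($doc->documentElement);
}

echo replace_text_in_xml(
    '<root><name>FirstName lastname</name></root>',
    'FirstName lastname',
    'Jack Daniels'
);
// <root><name>Jack Daniels</name></root>
```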

php htmlentities tags exceptions leave working only certains

I have no problem to disallow all HTML tags with this code that works fine:
while($row = $result->fetch_array()){
echo "<span class='names'>".htmlentities($row['username'])."</span>:<span class='messages'>".htmlentities($row['msg'])."</span><br>";
}
But what if I want to allow some tags exceptions?
The result that I want is to disable any tag except <p><b><h2>
Example: (allowing <b> and disallowing <div>)
<b>sometext</b><div>sometext</div>
Expected Result:
sometext <div>sometext</div>
This code does the job, parsing the HTML code using DOMDocument. It seemed somehow more reliable than regular expressions (what happens if the user inserts an attribute in a forbidden tag? maybe containing <>?), especially after reading this question; it requires more work though, and is not necessarily faster.
<?php
$allowed = ['strong']; // your allowed tags
$text = "<div>\n" .
" <div style=\"color: #F00;\">\n" .
" Your <strong>User Text</strong> with DIVs.\n".
" </div>\n" .
" more <strong>text</strong>\n" .
"</div>\n";
echo selective_escape($text, $allowed);
/* outputs:
&lt;div&gt;
&lt;div style="color: #F00;"&gt;
Your <strong>User Text</strong> with DIVs.
&lt;/div&gt;
more <strong>text</strong>
&lt;/div&gt;
*/
/** Escapes HTML entities everywhere but in the allowed tags.
*/
function selective_escape($text, $allowed_tags) {
$doc = new DOMDocument();
/* DOMDocument normalizes the document structure when loading,
adding a bunch of <p> around text where needed. We don't need
this as we're working only on small pieces of HTML.
So we pretend this is a piece of XML code.
*/
// $doc->loadHTML($text);
$doc->loadXML("<?xml version=\"1.0\"?><body>" . $text . "</body>\n");
// find the body
$body = $doc->getElementsByTagName("body")->item(0);
// do stuff
$child = $body->firstChild;
while ($child != NULL) {
$child = selective_escape_node($child, $allowed_tags);
}
// output the innerHTML. need to loop again
$retval = "";
$child = $body->firstChild;
while ($child != NULL) {
$retval .= $doc->saveHTML($child);
$child = $child->nextSibling;
}
return $retval;
}
/** Escapes HTML for tags that are not in $allowed_tags for a DOM tree.
* @returns the next sibling to process, or NULL if we reached the last child.
*
* The function replaces a forbidden tag with two text nodes wrapping the
* children of the old node.
*/
function selective_escape_node($node, $allowed_tags) {
// preprocess children
if ($node->hasChildNodes()) {
$child = $node->firstChild;
while ($child != NULL) {
$child = selective_escape_node($child, $allowed_tags);
}
}
// check if there is anything to do on $node as well
if ($node->nodeType == XML_ELEMENT_NODE) {
if (!in_array($node->nodeName, $allowed_tags)) {
// move children right before $node
$firstChild = NULL;
while ($node->hasChildNodes()) {
$child = $node->firstChild;
if ($firstChild == NULL) $firstChild = $child;
$node->removeChild($child);
$node->parentNode->insertBefore($child, $node);
}
// now $node has no children.
$outer_html = $node->ownerDocument->saveHTML($node);
// two cases. either ends in "/>", or in "</TAGNAME>".
if (substr($outer_html, -2) === "/>") {
// strip off "/>"
$outer_html = substr($outer_html, 0, strlen($outer_html) - 2);
} else {
// find the closing tag
$close_tag = strpos($outer_html, "></" . $node->nodeName . ">");
if ($close_tag === false) {
// uh-oh. something wrong
return NULL;
} else {
// strip "></TAGNAME>"
$outer_html = substr($outer_html, 0, $close_tag);
}
}
// put a textnode before the first child
$txt1 = $node->ownerDocument->createTextNode($outer_html . ">");
// and another before $node
$txt2 = $node->ownerDocument->createTextNode("</" . $node->nodeName . ">");
// note that createTextNode automatically escapes "<>".
$node->parentNode->insertBefore($txt1, $firstChild);
$node->parentNode->insertBefore($txt2, $node);
// pick the next node to process
$next = $node->nextSibling;
// remove node
$node->parentNode->removeChild($node);
return $next;
}
}
// go to next sibling
return $node->nextSibling;
}
?>
Here is your result.
Note the places marked in the comments where you set which tags are allowed:
function strip_html_tags( $text )
{
$text = preg_replace(
array(
// Remove invisible content
'#<b[^>]*?>.*?</b>#siu', // HERE IS YOUR ALLOWED TAG, KEPT WITH ITS CONTENT
'#<head[^>]*?>.*?</head>#siu',
'#<style[^>]*?>.*?</style>#siu',
'#<script[^>]*?.*?</script>#siu',
'#<object[^>]*?.*?</object>#siu',
'#<embed[^>]*?.*?</embed>#siu',
'#<applet[^>]*?.*?</applet>#siu',
'#<noframes[^>]*?.*?</noframes>#siu',
'#<noscript[^>]*?.*?</noscript>#siu',
'#<noembed[^>]*?.*?</noembed>#siu',
// Add line breaks before and after blocks
'#</?((address)|(blockquote)|(center)|(del))#iu',
'#</?((h[1-9])|(ins)|(isindex)|(p)|(pre))#iu',
'#</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))#iu',
'#</?((table)|(th)|(td)|(caption))#iu',
'#</?((form)|(button)|(fieldset)|(legend)|(input))#iu',
'#</?((label)|(select)|(optgroup)|(option)|(textarea))#iu',
'#</?((frameset)|(frame)|(iframe))#iu',
),
array(
"\$0", // RETURNED STATEMENT
' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
"\$0", "\$0", "\$0", "\$0", "\$0", "\$0",
"\$0", "\$0",
),
$text );
$to_strip = strip_tags( $text, '<b>' ); // STRIP YOUR BOLD TAGS
// to allow another tag, add it here, add a pattern like '#<b[^>]*?>.*?</b>#siu' above, and return "\$0" for it in the replacement array
return $to_strip;
}
$e = '<b>from_bold_text</b><div>from_div_text</div>';
echo strip_html_tags($e);
RESULT:
from_bold_text<div>from_div_text</div>
shell:~$ php ar.php
<b>sometext</b>sometext
shell:~$ cat ar.php
<?php
$t ="<b>sometext</b><div>sometext</div>";
$text = htmlentities($t, ENT_QUOTES, "UTF-8");
$text = htmlspecialchars_decode($text);
$text = strip_tags($text, "<p><b><h2>");
echo $text;
shell:~$ php ar.php
<b>sometext</b>sometext
Note: strip_tags will NOT remove the content inside the tags; only the tags themselves are removed.
$text = 'sometextsometext';
$text2 = strip_tags($text, '');
var_dump($text2); // it will show allowed tags and values.
To remove the content as well, use a regex or a function such as this strip_tags_content from the PHP manual:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
$tags = array_unique($tags[1]);
if(is_array($tags) AND count($tags) > 0) {
if($invert == FALSE) {
return preg_replace('#<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>#si', '', $text);
}
else {
return preg_replace('#<('. implode('|', $tags) .')\b.*?>.*?</\1>#si', '', $text);
}
}
elseif($invert == FALSE) {
return preg_replace('#<(\w+)\b.*?>.*?</\1>#si', '', $text);
}
return $text;
}
?>
Sample text:
$text = '<b>sample</b> text with <div>tags</div>';
Result for strip_tags($text):
sample text with tags
Result for strip_tags_content($text):
text with
Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with
Result for strip_tags_content($text, '<b>', TRUE);
text with <div>tags</div>
Your expectation:
$text = '<b>sometext_from_bold</b><div>sometext_from_div</div>';
// here goes function function strip_tags_content($text, $tags = '', $invert = FALSE) {
.... }
// Your results
echo strip_tags_content($text, '<b>', FALSE);
RESULT:
<b>sometext_from_bold</b>
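For the behaviour the question asks for (forbidden tags shown as text, allowed tags left working), another possible approach is to escape everything first and then turn the allowed tags back into markup. The sketch below assumes that allowed tags carry no attributes; escape_except is a hypothetical name.

```php
<?php
// Hypothetical helper: escapes all HTML, then re-enables bare allowed tags.
function escape_except(string $html, array $allowed): string
{
    $escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
    if ($allowed === []) {
        return $escaped;
    }
    $names = implode('|', array_map(fn($t) => preg_quote($t, '#'), $allowed));
    // Only bare tags such as <b> or </b> are restored.
    // Tags with attributes stay escaped, which keeps event handlers out.
    return preg_replace('#&lt;(/?)(' . $names . ')&gt;#i', '<$1$2>', $escaped);
}

echo escape_except('<b>sometext</b><div>sometext</div>', ['p', 'b', 'h2']);
// <b>sometext</b>&lt;div&gt;sometext&lt;/div&gt;
```

Since the input is escaped before any tag is restored, text that a user typed as a literal entity is double-escaped and cannot be mistaken for an allowed tag.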

preg_replace inside loop replacing last match instead of each

Sorry about the title; I honestly don't know how to explain it properly.
I am making a small shortcode function that needs to replace shortcodes with HTML output.
preg_match_all finds everything I need, but preg_replace replaces the same match over and over again.
Here is the demo:
https://eval.in/139727
I am sure I made a mess in those foreach loops, but I just can't figure it out.
$text = 'Some text and some [link link="linkhref1" text="Text1"],[link link="linkhref2" text="Text2"]';
function shortcodes($text) {
$shortcodes = array(
'link' => array(
"check" => "[link",
"type" => "link",
"match" => "#\[link(.*?)link\=\"(.*?)\"(.*?)text\=\"(.*?)\"#Ui",
"replace" => "/\[link(.*?)\]/s"
)
);
foreach ($shortcodes as $index => $shortcode) {
if (strpos($text, $shortcode['check']) !== false) {
$text = shortcode_replace($shortcode, $text);
}
}
return $text;
}
function shortcode_replace($shortcode, $text) {
$replacement = '';
preg_match_all($shortcode['match'], $text, $matches);
switch ($shortcode['type']) {
case "link":
foreach ($matches[4] as $index => $match) {
$link = $matches[2][$index];
$linktext = $matches[4][$index];
$replacement .= '<a href="' . $link . '">' . $linktext . '</a>';
$text = preg_replace($shortcode['replace'], $replacement, $text);
}
}
return $text;
}
echo shortcodes($text);
any help is appreciated!
There was a problem in the regex, so I've changed it. Also, you don't need preg_replace there.
<?php
$text = 'Some text and some [link link="linkhref1" text="Text1"],[link link="linkhref2" text="Text2"]';
function shortcodes($text) {
$shortcodes = array(
'link' => array(
"check" => "[link",
"type" => "link",
"match" => "#\[link(\s+)link\=\"([^\"]+)\"(\s+)text\=\"([^\"]+)\"\]#Ui",
"replace" => "/\[link(.*?)\]/s"
)
);
foreach ($shortcodes as $index => $shortcode) {
if (strpos($text, $shortcode['check']) !== false) {
$text = shortcode_replace($shortcode, $text);
}
}
return $text;
}
function shortcode_replace($shortcode, $text) {
$replace = '';
preg_match_all($shortcode['match'], $text, $matches);
switch ($shortcode['type']) {
case "link":
// var_dump($matches); // uncomment to inspect the captured groups
foreach ($matches[4] as $index => $match) {
$link = $matches[2][$index];
$linktext = $matches[4][$index];
$replace = '<a href="' . $link . '">' . $linktext . '</a>';
$text = str_replace($matches[0][$index], $replace, $text);
}
}
return $text;
}
echo shortcodes($text);
Here is a working version :
<?php
$text = 'Some text and some [link link="linkhref1" text="Text1"],[link link="linkhref2" text="Text2"]';
function shortcodes($text) {
$shortcodes = array(
'link' => array(
"check" => "[link",
"type" => "link",
"match" => "#\[link(.*?)link\=\"(.*?)\"(.*?)text\=\"(.*?)\"#",
"replace" => "/\[link(.*?)\]/s"
)
);
foreach ($shortcodes as $index => $shortcode) {
if (strpos($text, $shortcode['check']) !== false) {
$text = shortcode_replace($shortcode, $text);
}
}
return $text;
}
function shortcode_replace($shortcode, $text) {
$replace = '';
preg_match_all($shortcode['match'], $text, $matches);
switch ($shortcode['type']) {
case "link":
foreach ($matches[4] as $index => $match) {
$link = $matches[2][$index];
$linktext = $matches[4][$index];
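$replace = '<a href="' . $link . '">' . $linktext . '</a>'; // plain assignment: .= would carry earlier links into later replacements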
$whatToReplace = '[link link="'.$link.'" text="'.$linktext.'"]';
$text = str_replace($whatToReplace, $replace, $text);
}
}
return $text;
}
echo shortcodes($text);
I'm not very good at regular expressions, so I modified "match" to match all the links (with what you had, it didn't).
You need to identify the exact [link ] to replace, not all of them. In my opinion, this is the correct way to identify the link. You could also identify it using strpos() (getting the start and end of the string), or by finding where [link starts and where the first ] after it is.
A better option may be to build a regexp from the unique values of each link, which can also compensate for extra spaces between attributes.
Hopefully this is of help to you.
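Another way to make sure each shortcode is replaced by its own output is preg_replace_callback, which calls a function once per match. The sketch below assumes the intended output is an anchor tag built from the link and text attributes; render_link_shortcodes is a hypothetical name.

```php
<?php
// Hypothetical helper: turns [link link="..." text="..."] into an anchor tag.
function render_link_shortcodes(string $text): string
{
    // \s+ tolerates extra spaces between attributes; [^"]* stops at the closing quote.
    $pattern = '#\[link\s+link="([^"]*)"\s+text="([^"]*)"\s*\]#i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1] is the href and $m[2] the link text of this particular match.
        return '<a href="' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '">'
             . htmlspecialchars($m[2], ENT_QUOTES, 'UTF-8') . '</a>';
    }, $text);
}

$text = 'Some text and some [link link="linkhref1" text="Text1"],[link link="linkhref2" text="Text2"]';
echo render_link_shortcodes($text);
// Some text and some <a href="linkhref1">Text1</a>,<a href="linkhref2">Text2</a>
```

The callback receives each match separately, so there is no need to collect matches first and search for them again.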

How to use preg_replace() to apply highlight_string() to content between BBCode-like tags?

I'm trying to run the preg_replace() function to replace the content in between two custom tags (i.e. [xcode]) within the string / content of the page.
What I want to do with the content between these custom tags is to run it through highlight_string() function and to remove those custom tags from the output.
Any idea how to do it?
So you want sort of a BBCode parser. The example below replaces [xcode] tags with whatever markup you like.
<?php
function highlight($text) {
$text = preg_replace('#\[xcode\](.+?)\[\/xcode\]#msi', '<em>\1</em>', $text);
return $text;
}
$text = '[xcode]Lorem ipsum[/xcode] dolor sit [xcode]amet[/xcode].';
echo highlight($text);
?>
Use preg_replace_callback() if you want to pass the matched text to a function:
<?php
function parse($text) {
$text = preg_replace_callback('#\[xcode\](.+?)\[\/xcode\]#msi',
function($matches) {
return highlight_string($matches[1], 1);
}
, $text);
return $text;
}
$text = '[xcode]Lorem ipsum[/xcode] dolor sit [xcode]amet[/xcode].';
echo parse($text);
?>
I'll include the source code of a BBCode parser that I made a long time ago. Feel free to use it.
<?php
function bbcode_lists($text) {
$pattern = "#\[list(\=(1|a))?\](.*?)\[\/list\]#msi";
while (preg_match($pattern, $text, $matches)) {
$points = explode("[*]", $matches[3]);
array_shift($points);
for ($i = 0; $i < count($points); $i++) {
$nls = explode("\n", $points[$i]); // split() was removed in PHP 7
$brs = count($nls) - 2;
$points[$i] = preg_replace("[\r\n]", "<br />", $points[$i], $brs);
}
$replace = ($matches[2] != '1') ? ($matches[2] != 'a') ? '<ul>' : '<ol style="list-style:lower-alpha">' : '<ol style="list-style:decimal">';
$replace .= "<li>";
$replace .= implode("</li><li>", $points);
$replace .= "</li>";
$replace .= ($matches[2] == '1' || $matches[2] == 'a' ) ? '</ol>' : '</ul>';
$text = preg_replace($pattern, $replace, $text, 1);
$text = preg_replace("[\r\n]", "", $text);
}
return $text;
}
function bbcode_parse($text) {
$text = preg_replace("[\r\n]", "<br />", $text);
$smilies = Array(
':)' => 'smile.gif',
':d' => 'tongue2.gif',
':P' => 'tongue.gif',
':lol:' => 'lol.gif',
':D' => 'biggrin.gif',
';)' => 'wink.gif',
':zzz:' => 'zzz.gif',
':confused:' => 'confused.gif'
);
foreach ($smilies as $key => $value) {
$text = str_replace($key, '<img src="/images/smilies/' . $value . '" alt="' . $key . '" />', $text);
}
if (!(!strpos($text, "[") && !strpos($text, "]"))) {
$bbcodes = Array(
'#\[b\](.*?)\[/b\]#si' => '<strong>$1</strong>',
'#\[i\](.*?)\[/i\]#si' => '<em>$1</em>',
'#\[u\](.*?)\[/u\]#si' => '<span class="u">$1</span>',
'#\[s\](.*?)\[/s\]#si' => '<span class="s">$1</span>',
'#\[size=(.*?)\](.*?)\[/size\]#si' => '<span style="font-size:$1">$2</span>',
'#\[color=(.*?)\](.*?)\[/color\]#si' => '<span style="color:$1">$2</span>',
'#\[url=(.*?)\](.*?)\[/url\]#si' => '<a href="$1">$2</a>',
'#\[url\](.*?)\[/url\]#si' => '<a href="$1">$1</a>',
'#\[img\](.*?)\[/img\]#si' => '<img src="$1" alt="" />',
'#\[code\](.*?)\[/code\]#si' => '<div class="code">$1</div>'
);
$text = preg_replace(array_keys($bbcodes), $bbcodes, $text);
$text = bbcode_lists($text);
$quote_code = Array("'\[quote=(.*?)\](.*?)'i", "'\[quote](.*?)'i", "'\[/quote\]'i");
$quote_html = Array('<blockquote><p class="quotetitle">Quote \1:</p>\2', '<blockquote>\2', '</blockquote>');
$text = preg_replace($quote_code, $quote_html, $text);
}
return $text;
}
?>
Basically:
preg_replace_callback('~\[tag\](.+?)\[/tag\]~', function($matches) { whatever }, $text);
This doesn't handle nested tags, though.
Complete example:
$text = "hello [xcode] <? echo bar ?> [/xcode] world";
echo preg_replace_callback(
'~\[xcode\](.+?)\[/xcode\]~',
function($matches) {
return highlight_string($matches[1], 1);
},
$text
);
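One detail worth noting: the pattern above has no s modifier, so the dot will not match newlines and a code block spanning several lines is left untouched. A minimal sketch with the s and i modifiers added follows; render_xcode is a hypothetical name, and the exact markup returned by highlight_string depends on the PHP version.

```php
<?php
// Hypothetical helper: highlights the content of every [xcode]...[/xcode] block.
function render_xcode(string $text): string
{
    return preg_replace_callback(
        '~\[xcode\](.+?)\[/xcode\]~is', // s: dot matches newlines, i: tag case is ignored
        function (array $m): string {
            // The second argument makes highlight_string return the HTML instead of printing it.
            return highlight_string(trim($m[1]), true);
        },
        $text
    );
}

echo render_xcode("hello [xcode]<?php\necho 'bar';\n[/xcode] world");
```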
<?php
$string = 'The quick brown fox jumped over the lazy dog.';
$patterns = array();
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
$replacements = array();
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';
echo preg_replace($patterns, $replacements, $string);
?>
The above example will output:
The bear black slow jumped over the lazy dog.
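The surprising output comes from preg_replace pairing patterns and replacements by their order in the arrays, not by their keys: $replacements was filled in the order 2, 1, 0, so 'bear' is used for the first pattern. Sorting both arrays by key first, as in this short sketch, restores the intended pairing.

```php
<?php
$string = 'The quick brown fox jumped over the lazy dog.';
$patterns = ['/quick/', '/brown/', '/fox/'];
// Filled out of order on purpose, as in the example above.
$replacements = [2 => 'bear', 1 => 'black', 0 => 'slow'];

// Put both arrays into key order so position and key agree.
ksort($patterns);
ksort($replacements);

echo preg_replace($patterns, $replacements, $string);
// The slow black bear jumped over the lazy dog.
```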
http://php.net/manual/en/function.preg-replace.php
OR
str_replace should help you
http://php.net/manual/en/function.str-replace.php
Thanks to user187291's suggestion and preg_replace_callback specification I've ended up with the following outcome which does the job spot on! :
function parseTagsRecursive($input)
{
$regex = '~\[xcode\](.+?)\[/xcode\]~';
if (is_array($input)) {
$input = highlight_string($input[1], true);
}
return preg_replace_callback($regex, 'parseTagsRecursive', $input);
}
$text = "hello [xcode] <? echo bar ?> [/xcode] world and [xcode] <?php phpinfo(); ?> [/xcode]";
echo parseTagsRecursive($text);
The output of parsing the $text variable through this function is:
hello <? echo bar ?> world and <?php phpinfo(); ?>
Thank you everyone for input!

Close open HTML tags in a string

Situation is a string that results in something like this:
<p>This is some text and here is a <strong>bold text then the post stop here....</p>
Because the function returns a teaser (summary) of the text, it stops after certain words. Where in this case the tag strong is not closed. But the whole string is wrapped in a paragraph.
Is it possible to convert the above result/output to the following:
<p>This is some text and here is a <strong>bold text then the post stop here....</strong></p>
I do not know where to begin. The problem is that I found a function on the web that does this with a regex, but it puts the closing tag after the end of the string. The result won't validate, because I want all open/close tags inside the paragraph tags. The function I found produces this, which is also wrong:
<p>This is some text and here is a <strong>bold text then the post stop here....</p></strong>
The tag could be strong, italic, anything, which is why I cannot simply append a closing tag manually in the function. Is there any pattern that can do this for me?
Here is a function I've used before, which works pretty well:
function closetags($html) {
preg_match_all('#<(?!meta|img|br|hr|input\b)\b([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
$openedtags = $result[1];
preg_match_all('#</([a-z]+)>#iU', $html, $result);
$closedtags = $result[1];
$len_opened = count($openedtags);
if (count($closedtags) == $len_opened) {
return $html;
}
$openedtags = array_reverse($openedtags);
for ($i=0; $i < $len_opened; $i++) {
if (!in_array($openedtags[$i], $closedtags)) {
$html .= '</'.$openedtags[$i].'>';
} else {
unset($closedtags[array_search($openedtags[$i], $closedtags)]);
}
}
return $html;
}
Personally though, I would not do it using regexp but a library such as Tidy. This would be something like the following:
$str = '<p>This is some text and here is a <strong>bold text then the post stop here....</p>';
$tidy = new Tidy();
$clean = $tidy->repairString($str, array(
'output-xml' => true,
'input-xml' => true
));
echo $clean;
A small modification to the original answer: while the original answer handled tags correctly, I found that my truncation could leave me with chopped-up tags. For example:
This text has some <b>in it</b>
Truncating at character 21 results in:
This text has some <
The following code builds on the closetags function above and fixes this.
function truncateHTML($html, $length)
{
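$truncated = substr($html, 0, $length);
// a tag was cut only if the last "<" in the kept part has no ">" after it
$lastOpen = strrpos($truncated, '<');
$lastClose = strrpos($truncated, '>');
if ($lastOpen !== false && ($lastClose === false || $lastClose < $lastOpen))
{
// extend the cut to the end of the split tag, or drop the tag if it never closes
$end = strpos($html, '>', $lastOpen);
$html = ($end !== false) ? substr($html, 0, $end + 1) : substr($truncated, 0, $lastOpen);
}
else
{
$html = $truncated;
}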
preg_match_all('#<(?!meta|img|br|hr|input\b)\b([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
$openedtags = $result[1];
preg_match_all('#</([a-z]+)>#iU', $html, $result);
$closedtags = $result[1];
$len_opened = count($openedtags);
if (count($closedtags) == $len_opened)
{
return $html;
}
$openedtags = array_reverse($openedtags);
for ($i=0; $i < $len_opened; $i++)
{
if (!in_array($openedtags[$i], $closedtags))
{
$html .= '</'.$openedtags[$i].'>';
}
else
{
unset($closedtags[array_search($openedtags[$i], $closedtags)]);
}
}
return $html;
}
$str = "This text has <b>bold</b> in it</b>";
print "Test 1 - Truncate with no tag: " . truncateHTML($str, 5) . "<br>\n";
print "Test 2 - Truncate at start of tag: " . truncateHTML($str, 20) . "<br>\n";
print "Test 3 - Truncate in the middle of a tag: " . truncateHTML($str, 16) . "<br>\n";
print "Test 4: - Truncate with less text: " . truncateHTML($str, 300) . "<br>\n";
Hope it helps someone out there.
And what about using PHP's native DOMDocument class? It inherently parses HTML and corrects syntax errors...
E.g.:
$fragment = "<article><h3>Title</h3><p>Unclosed";
$doc = new DOMDocument();
$doc->loadHTML($fragment);
$correctFragment = $doc->getElementsByTagName('body')->item(0)->C14N();
echo $correctFragment;
However, there are several disadvantages of this approach.
Firstly, it wraps the original fragment within the <body> tag. You can get rid of it easily by something like (preg_)replace() or by substituting the ...->C14N() function by some custom innerHTML() function, as suggested for example at http://php.net/manual/en/book.dom.php#89718.
The second pitfall is that PHP throws an 'invalid tag in Entity' warning if HTML5 or custom tags are used (nevertheless, it will still proceed correctly).
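Both pitfalls can be worked around in a few lines. The sketch below is only one way to do it: libxml_use_internal_errors keeps the parser warnings from being emitted, and the children of body are serialized one by one instead of using C14N(). The name repair_fragment is hypothetical.

```php
<?php
// Hypothetical helper: lets DOMDocument close unclosed tags in an HTML fragment.
function repair_fragment(string $fragment): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML('<body>' . $fragment . '</body>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize only the children of <body>, so the wrapper tags are not returned.
    $html = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

echo repair_fragment('<article><h3>Title</h3><p>Unclosed');
```

This sketch does not deal with character encoding; non-ASCII input may need an explicit encoding hint before loading.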
This PHP method always worked for me. It will close all un-closed HTML tags.
function closetags($html) {
preg_match_all('#<([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
$openedtags = $result[1];
preg_match_all('#</([a-z]+)>#iU', $html, $result);
$closedtags = $result[1];
$len_opened = count($openedtags);
if (count($closedtags) == $len_opened) {
return $html;
}
$openedtags = array_reverse($openedtags);
for ($i=0; $i < $len_opened; $i++) {
if (!in_array($openedtags[$i], $closedtags)){
$html .= '</'.$openedtags[$i].'>';
} else {
unset($closedtags[array_search($openedtags[$i], $closedtags)]);
}
}
return $html;
}
There are numerous other variables that need to be addressed to give a full solution, but are not covered by your question.
However, I would suggest using something like HTML Tidy, in particular its repairFile or repairString methods.
If the tidy module is installed, use the PHP tidy extension:
tidy_repair_string($html)
Using a regular expression isn't an ideal approach for this. You should use an html parser instead to create a valid document object model.
As a second option, depending on what you want, you could use a regex to remove any and all html tags from your string before you put it in the <p> tag.
I wrote this code, which does the job quite well.
It's old school but efficient, and I've added a flag to remove unfinished tags such as " blah blah http://stackoverfl"
public function getOpennedTags(&$string, $removeInclompleteTagEndTagIfExists = true) {
$tags = array();
$tagOpened = false;
$tagName = '';
$tagNameLogged = false;
$closingTag = false;
foreach (str_split($string) as $c) {
if ($tagOpened && $c == '>') {
$tagOpened = false;
if ($closingTag) {
array_pop($tags);
$closingTag = false;
$tagName = '';
}
if ($tagName) {
array_push($tags, $tagName);
}
}
if ($tagOpened && $c == ' ') {
$tagNameLogged = true;
}
if ($tagOpened && $c == '/') {
if ($tagName) {
//orphan tag
$tagOpened = false;
$tagName = '';
} else {
//closingTag
$closingTag = true;
}
}
if ($tagOpened && !$tagNameLogged) {
$tagName .= $c;
}
if (!$tagOpened && $c == '<') {
$tagNameLogged = false;
$tagName = '';
$tagOpened = true;
$closingTag = false;
}
}
if ($removeInclompleteTagEndTagIfExists && $tagOpened) {
// a tag has been cut, for example ' blabh blah <a href="sdfoefzofk', so closing the tag will not help...
// let's remove this ugly piece of tag
$pos = strrpos($string, '<');
$string = substr($string, 0, $pos);
}
return $tags;
}
Usage example :
$tagsToClose = $stringHelper->getOpennedTags($val);
$tagsToClose = array_reverse($tagsToClose);
foreach ($tagsToClose as $tag) {
$val .= "</$tag>";
}
This works for me to close any open HTML tags in a script.
<?php
function closetags($html) {
preg_match_all('#<([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
$openedtags = $result[1];
preg_match_all('#</([a-z]+)>#iU', $html, $result);
$closedtags = $result[1];
$len_opened = count($openedtags);
if (count($closedtags) == $len_opened) {
return $html;
}
$openedtags = array_reverse($openedtags);
for ($i=0; $i < $len_opened; $i++) {
if (!in_array($openedtags[$i], $closedtags)) {
$html .= '</'.$openedtags[$i].'>';
} else {
unset($closedtags[array_search($openedtags[$i], $closedtags)]);
}
}
return $html;
}
An up-to-date solution with parsing HTML would be:
function fix_html($html) {
$dom = new DOMDocument();
$dom->loadHTML( mb_convert_encoding( $html, 'HTML-ENTITIES', 'UTF-8' ), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
return $dom->saveHTML();
}
LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD is needed to avoid adding an implied doctype, html and body; the rest looks pretty obvious :)
UPDATE:
After some testing I noticed that the solution above breaks otherwise correct markup from time to time. The following works well, though:
function fix_html($html) {
$dom = new DOMDocument();
$dom->loadHTML( mb_convert_encoding( $html, 'HTML-ENTITIES', 'UTF-8' ) );
$return = '';
foreach ( $dom->getElementsByTagName( 'body' )->item(0)->childNodes as $v ) {
$return .= $dom->saveHTML( $v );
}
return $return;
}
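The regex-based closetags functions above always append missing closing tags at the very end of the string, which is the nesting problem described in the question. A stack-based variant can insert them before the enclosing closing tag instead. This is a rough sketch, assuming simple markup without ">" inside attribute values; close_open_tags and the list of void elements are illustrative choices.

```php
<?php
// Hypothetical helper: closes unclosed tags at the point where their parent closes.
function close_open_tags(string $html): string
{
    $void  = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'track', 'wbr'];
    $stack = []; // names of the tags that are currently open

    $out = preg_replace_callback(
        '#<(/?)([a-z][a-z0-9]*)\b[^>]*>#i',
        function (array $m) use (&$stack, $void): string {
            $name = strtolower($m[2]);
            if (in_array($name, $void, true) || substr($m[0], -2) === '/>') {
                return $m[0]; // void or self-closing: nothing to track
            }
            if ($m[1] === '') {
                $stack[] = $name; // opening tag
                return $m[0];
            }
            // Closing tag: look for the nearest matching opener.
            for ($i = count($stack) - 1; $i >= 0 && $stack[$i] !== $name; $i--);
            if ($i < 0) {
                return ''; // stray closing tag: dropped in this sketch
            }
            // Close everything opened after that opener, then the tag itself.
            $prefix = '';
            while (count($stack) - 1 > $i) {
                $prefix .= '</' . array_pop($stack) . '>';
            }
            array_pop($stack);
            return $prefix . $m[0];
        },
        $html
    );

    // Anything still open at the end is closed in reverse order.
    while ($stack) {
        $out .= '</' . array_pop($stack) . '>';
    }
    return $out;
}

echo close_open_tags('<p>This is some text and here is a <strong>bold text then the post stop here....</p>');
// <p>This is some text and here is a <strong>bold text then the post stop here....</strong></p>
```

For anything beyond teaser-sized fragments, a real parser such as DOMDocument or Tidy is likely the safer choice.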
