Replace markdown with HTML in WordPress


I am trying to replace all the wiki markdown text in my custom post. Example:
== My Subheading ==
=== My sub Subheading ===
==== another heading ====
I am trying to change that content into HTML headings, like below:
My Subheading
My sub Subheading
another heading
So I tried to use the function below, but it didn't work.
I am seeing:
Parse error: syntax error, unexpected token "{", expecting "("
I am not very familiar with WP custom functions. Can you please help me?
function the_content{
private $patterns, $replacements;
public function __construct($analyze=false) {
$this->patterns=array(
"/\r\n/",
// Headings
"/^==== (.+?) ====$/m",
"/^=== (.+?) ===$/m",
"/^== (.+?) ==$/m",
// Formatting
"/\'\'\'\'\'(.+?)\'\'\'\'\'/s",
"/\'\'\'(.+?)\'\'\'/s",
"/\'\'(.+?)\'\'/s",
// Special
"/^----+(\s*)$/m",
"/\[\[(file|img):((ht|f)tp(s?):\/\/(.+?))( (.+))*\]\]/i",
"/\[((news|(ht|f)tp(s?)|irc):\/\/(.+?))( (.+))\]/i",
"/\[((news|(ht|f)tp(s?)|irc):\/\/(.+?))\]/i",
// Indentations
"/[\n\r]: *.+([\n\r]:+.+)*/",
"/^:(?!:) *(.+)$/m",
"/([\n\r]:: *.+)+/",
"/^:: *(.+)$/m",
// Ordered list
"/[\n\r]?#.+([\n|\r]#.+)+/",
"/[\n\r]#(?!#) *(.+)(([\n\r]#{2,}.+)+)/",
// Unordered list
"/[\n\r]?\*.+([\n|\r]\*.+)+/",
"/[\n\r]\*(?!\*) *(.+)(([\n\r]\*{2,}.+)+)/",
// List items
"/^[#\*]+ *(.+)$/m",
"/^(?!<li|dd).+(?=(<a|strong|em|img)).+$/mi",
"/^[^><\n\r]+$/m",
);
$this->replacements=array(
"\n",
// Headings
"<h3>$1</h3>",
"<h2>$1</h2>",
"<h1>$1</h1>",
//Formatting
"<strong><em>$1</em></strong>",
"<strong>$1</strong>",
"<em>$1</em>",
// Special
"<hr/>",
"<img src=\"$2\" alt=\"$6\"/>",
"$7",
"$1",
// Indentations
"\n<dl>$0\n</dl>",
"<dd>$1</dd>",
"\n<dd><dl>$0\n</dl></dd>",
"<dd>$1</dd>",
// Ordered list
"\n<ol>\n$0\n</ol>",
"\n<li>$1\n<ol>$2\n</ol>\n</li>",
// Unordered list
"\n<ul>\n$0\n</ul>",
"\n<li>$1\n<ul>$2\n</ul>\n</li>",
// List items
"<li>$1</li>",
// Newlines
"$0<br/>",
"$0<br/>",
);
if($analyze) {
foreach($this->patterns as $k=>$v) {
$this->patterns[$k].="S";
}
}
}
public function parse($input) {
if(!empty($input))
$output=preg_replace($this->patterns,$this->replacements,$input);
else
$output=false;
return $output;
}
}
Mainly, I am trying to use a filter on the_content that converts the markdown text to simple HTML using regex replacements.

There were a few issues.
You weren't declaring your class properly (it is a class, not a "function").
There was trailing whitespace after == My Subheading == (before the end-of-line anchor), so the patterns need to allow zero or more horizontal spaces before $.
It is worth reading PHP's PSR-12 coding standard; it will help you write clean, consistent, and professional code.
Code:
$text = <<<TEXT
== My Subheading ==
=== My sub Subheading ===
==== another heading ====
TEXT;
class ContentParser
{
    private $patterns = [];
    private $replacements = [];
    public function __construct() {
        $this->patterns = [
            // Headings
            "/^==== (.+?) ====\h*$/m",
            "/^=== (.+?) ===\h*$/m",
            "/^== (.+?) ==\h*$/m",
        ];
        $this->replacements = [
            // Headings
            "<h3>$1</h3>",
            "<h2>$1</h2>",
            "<h1>$1</h1>",
        ];
    }
    public function parse($input) {
        if (!empty($input)) {
            $output = preg_replace($this->patterns, $this->replacements, $input);
        } else {
            $output = false; // I don't recommend returning a boolean when otherwise returning strings
        }
        return $output;
    }
}
$object = new ContentParser();
var_export($object->parse($text));
Output (the single quotes come from var_export(); use echo instead to print the string as-is):
'<h1>My Subheading</h1>
<h2>My sub Subheading</h2>
<h3>another heading</h3>'
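The question mentions a filter on the_content. As a minimal sketch (the function name wiki_headings_to_html is hypothetical, and only the three heading levels above are handled), the same patterns can live in a plain function that such a filter could call. The add_filter() line is left commented out so the snippet also runs outside WordPress.

```php
<?php
// Hypothetical helper: converts only the three wiki heading levels shown above.
function wiki_headings_to_html(string $content): string
{
    $patterns = [
        '/^==== (.+?) ====\h*$/m', // four "=" -> <h3>
        '/^=== (.+?) ===\h*$/m',   // three "=" -> <h2>
        '/^== (.+?) ==\h*$/m',     // two "=" -> <h1>
    ];
    $replacements = ['<h3>$1</h3>', '<h2>$1</h2>', '<h1>$1</h1>'];
    // The space required right after the opening "=" run keeps one level's
    // pattern from matching another level's line; \h* allows trailing blanks.
    $result = preg_replace($patterns, $replacements, $content);
    // preg_replace() returns null on a PCRE error; fall back to the input then.
    return $result ?? $content;
}

// In a theme or plugin this could be registered as a content filter:
// add_filter('the_content', 'wiki_headings_to_html');

echo wiki_headings_to_html("== My Subheading ==\n=== My sub Subheading ===\n==== another heading ====");
```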

Related

Replace multiple items between tags in a string

Trying to write a function that will replace #something# and #anything# in any string with items in my db that match the name "something" and "anything".
This should work no matter how many different #some-name# tokens there are in my string. Below is what I have so far. It partly works, but only the last token (#anything#) is replaced with the correct code when I load the page in my browser.
Please keep in mind that I'm learning, so I may be completely off on how to go about this. If there is a better way, I'm all ears.
HTML (String)
<p>This is "#something#" I wanted to replace with code from my database. Really, I could have "#anything#" between my pound sign tags and it should be replaced with text from my database</p>
OUTPUT I'm Getting
This is "#something#" I want to replace with code from my database. Really, I could have "Any Name" between my pound sign tags and it should be replaced with text from my database
DESIRED OUTPUT
This is "The Code" I want to replace with code from my database. Really, I could have "Any Name" between my pound sign tags and it should be replaced with text from my database
FUNCTION in CMS class a.php
public function get_snippets($string) {
$regex = "/#(.*?)#/";
preg_match_all($regex, $string, $names);
$names = $names[1];
foreach ($names as $name){
$find_record = Snippet::find_snippet_code($name);
$db_name = $find_record->name;
if($name == $db_name) {
$snippet_name = "/#".$name."#/";
$code = $find_record->code;
}
}
echo preg_replace($snippet_name, $code, $string);
}
FUNCTION in Snippet class b.php
public static function find_snippet_code($name) {
global $database;
$result_array = static::find_by_sql("SELECT * from ".static::$table_name." WHERE name = '{$name}'");
return !empty($result_array) ? array_shift($result_array) : false;
}

It's because your preg_replace occurs outside of the foreach() loop, so it only happens once.
Here is a working example based on your code which returns $string.
Note that I also use PREG_SET_ORDER which gives me each match as its own array:
function get_snippets($string) {
    $regex = '/#([^#]+)#/';
    $num_matches = preg_match_all($regex, $string, $matches, PREG_SET_ORDER);
    if ($num_matches > 0) {
        foreach ($matches as $match) {
            // Each match is an array consisting of the token we matched and the 'name' without the special characters
            list($token, $name) = $match;
            // See if there is a matching record for 'name'
            $find_record = Snippet::find_snippet_code($name);
            // find_snippet_code() returns false when nothing matches, so check that first;
            // the name comparison might be redundant
            if ($find_record !== false && $find_record->name == $name) {
                // Replace all instances of #token# with the code from the matched record;
                // str_replace() avoids treating the token as a regex pattern
                $string = str_replace($token, $find_record->code, $string);
            }
        }
    }
    return $string;
}

What you're looking for is preg_replace_callback():
public function get_snippets($string)
{
    $regex = "/#(.*?)#/";
    return preg_replace_callback($regex, function($match) {
        $find_record = Snippet::find_snippet_code($match[1]);
        return $find_record === false ? '' : $find_record->code;
    }, $string);
}
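For a database-free illustration of the preg_replace_callback() idea, the sketch below swaps Snippet::find_snippet_code() for a plain array lookup; the array and the name replace_snippets are assumptions made only so the example runs on its own. It also differs from the answer in one deliberate way: unknown tokens are left in place rather than replaced with an empty string.

```php
<?php
// Assumption: an in-memory array stands in for Snippet::find_snippet_code(),
// so the example runs without a database.
function replace_snippets(string $string, array $snippets): string
{
    return preg_replace_callback(
        '/#([^#]+)#/',
        function (array $match) use ($snippets): string {
            // $match[0] is the whole token ("#something#"),
            // $match[1] is the bare name ("something").
            // Unknown names are left untouched instead of being removed.
            return $snippets[$match[1]] ?? $match[0];
        },
        $string
    );
}

echo replace_snippets(
    '<p>This is "#something#" ... "#anything#"</p>',
    ['something' => 'The Code', 'anything' => 'Any Name']
);
```

Because the callback's return value is inserted literally, characters such as $ or \ in the stored code are not interpreted as backreferences, which can happen with a preg_replace() replacement string.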

PHP - simple shortcode parser, output order is wrong

I have a simple function that parses shortcode tags and their attributes,
but there is a problem with the output.
For example, this is my content in a string with a shortcode inside it:
$content = 'This is lorem ispium test [gallery image="10"] and text continues...';
I want the output to look like this:
This is lorem ispium test
----------------------------------------------
| This is output of gallery |
-----------------------------------------------
and text continues...
But the shortcode is not rendered where it is called; instead, its output appears at the top, like this:
----------------------------------------------
| This is output of gallery |
-----------------------------------------------
This is lorem ispium test and text continues...
How do I render the shortcode where it was called?
function shortcode($content) {
$shortcodes = implode('|', array_map('preg_quote', get('shortcodes')));
$pattern = "/(.?)\[($shortcodes)(.*?)(\/)?\](?(4)|(?:(.+?)\[\/\s*\\2\s*\]))?(.?)/s";
echo preg_replace_callback($pattern, array($this,'handleShortcode'), $content);
}
function handleShortcode($matches) {
$prefix = $matches[1];
$suffix = $matches[6];
$shortcode = $matches[2];
// allow for escaping shortcodes by enclosing them in double brackets ([[shortcode]])
if($prefix == '[' && $suffix == ']') {
return substr($matches[0], 1, -1);
}
$attributes = array(); // Parse attributes into into this array.
if(preg_match_all('/(\w+) *= *(?:([\'"])(.*?)\\2|([^ "\'>]+))/', $matches[3], $match, PREG_SET_ORDER)) {
foreach($match as $attribute) {
if(!empty($attribute[4])) {
$attributes[strtolower($attribute[1])] = $attribute[4];
} elseif(!empty($attribute[3])) {
$attributes[strtolower($attribute[1])] = $attribute[3];
}
}
}
//callback to gallery
return $prefix. call_user_func(array($this,$shortcode), $attributes, $matches[5], $shortcode) . $suffix;
}
function gallery($att, $cont){
//gallery output
}
Please note: this is not related to WordPress; it is a custom script.

I believe that the problem may be in your function gallery($att, $cont).
If that function uses echo or print instead of return, its output is sent immediately, before preg_replace_callback() has finished building the string, so it makes perfect sense that it shows up before the actual content.
EDIT:
If you can't change the gallery code, then yes, you can use output buffering.
function handleShortcode($matches) {
    ...
    ob_start();
    call_user_func(array($this,$shortcode), $attributes, $matches[5], $shortcode);
    $gallery_output = ob_get_contents();
    ob_end_clean();
    return $prefix . $gallery_output . $suffix;
}
Related readings:
PHP ob_start
PHP ob_get_contents
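To make the echo-versus-return effect concrete, here is a small self-contained sketch. The gallery() body, the simplified [gallery ...] pattern and the render_shortcodes() name are all assumptions for illustration. It uses ob_get_clean(), which returns the buffer contents and then deletes the buffer, i.e. the ob_get_contents() plus ob_end_clean() pair in one call.

```php
<?php
// Hypothetical gallery renderer that echoes instead of returning,
// which is the behaviour the answer suspects.
function gallery(array $att, ?string $cont): void
{
    echo '[gallery output for image ' . ($att['image'] ?? '?') . ']';
}

function render_shortcodes(string $content): string
{
    // Simplified pattern: only self-contained [gallery key="value" ...] tags.
    return preg_replace_callback('/\[gallery([^\]]*)\]/', function (array $m): string {
        // Collect key="value" pairs from the tag's attribute text.
        preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $m[1], $pairs, PREG_SET_ORDER);
        $att = [];
        foreach ($pairs as $pair) {
            $att[strtolower($pair[1])] = $pair[2];
        }
        ob_start();            // start capturing anything gallery() echoes
        gallery($att, null);
        return ob_get_clean(); // stop capturing and hand the text back as the replacement
    }, $content);
}

echo render_shortcodes('This is lorem ispium test [gallery image="10"] and text continues...');
```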

preg_replace: Remove comments only within curly brackets

I have this:
$text = 'This is some text /*Comment 1 */ . Some more text{ This is to let you know that /* this is a comment*/. A comment /*this one is */ can be anything }. So the next thing { This is to let you know that /* this is a comment*/. A comment /*this one is */ can be anything } is another topic. /*Final comment*/';
Need this:
$text = 'This is some text /*Comment 1 */ . Some more text{ This is to let you know that . A comment can be anything }. So the next thing { This is to let you know that . A comment can be anything } is another topic. /*Final comment*/';
Tried this:
$text = preg_replace("/\/\*.*?\*\//", "", $text);
The problem is that what I tried removes all the comments. I only want the comments inside { } to be removed. How can I do this?

You can use the following regular expression to tokenize the string:
$tokens = preg_split('~(/\*.*?\*/|[{}])~s', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
Then iterate the tokens to find opening { and the comments inside them:
$level = 0;
for ($i = 1, $n = count($tokens); $i < $n; $i += 2) { // iterate only the special tokens
    $token = &$tokens[$i];
    switch ($token) {
        case '{':
            $level++;
            break;
        case '}':
            if ($level < 1) {
                echo 'parse error: unexpected "}"';
                break 2;
            }
            $level--;
            break;
        default: // the only other captured token is a comment
            if ($level > 0) {
                unset($tokens[$i]);
            }
            break;
    }
}
if ($level > 0) {
    echo 'parse error: expecting "}"';
} else {
    $str = implode('', $tokens);
}
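Wrapped into a reusable function, the same tokenizer idea might look like the sketch below. The function name and the choice to return null on unbalanced braces (instead of echoing an error) are assumptions. Unlike a simple \{[^}]+\} pattern, the depth counter also handles nested braces such as '{ {/*n*/} }'.

```php
<?php
// Hypothetical wrapper around the tokenizer approach above.
// Assumption: returns null on unbalanced braces instead of echoing an error.
function strip_comments_in_braces(string $str): ?string
{
    // With one capture group, even indexes hold plain text and odd indexes
    // hold the captured delimiters: "{", "}" or a block comment.
    $tokens = preg_split('~(/\*.*?\*/|[{}])~s', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
    $level = 0; // current brace nesting depth
    foreach ($tokens as $i => $token) {
        if ($i % 2 === 0) {
            continue; // plain text is always kept
        }
        if ($token === '{') {
            $level++;
        } elseif ($token === '}') {
            if ($level === 0) {
                return null; // unexpected "}"
            }
            $level--;
        } elseif ($level > 0) {
            $tokens[$i] = ''; // a comment inside braces: drop it
        }
    }
    return $level === 0 ? implode('', $tokens) : null; // null means a "}" is missing
}

echo strip_comments_in_braces('a /*x*/ {b /*y*/ c} /*z*/');
```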

This is probably the safest way:
<?php
$text = 'This is some text /*Comment 1 */ . Some more text{ This is to let you know that /* this is a comment*/. A comment /*this one is */ can be anything }. So the next thing { This is to let you know that /* this is a comment*/. A comment /*this one is */ can be anything } is another topic. /*Final comment*/';
$text = preg_replace_callback('#\{[^}]+\}#msi', 'remove_comments', $text);
var_dump($text);
function remove_comments($text) {
return preg_replace('#/\*.*?\*/#msi', '', $text[0]);
}
?>
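It finds each {...} block and removes the comments inside it, so multiple comments within one block are all removed. Note that nested braces are not handled, because [^}]+ stops at the first closing brace.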

PHP: Display the first 500 characters of HTML

I have a large chunk of HTML in a PHP variable, like:
$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
I want to display only the first 500 characters of this code. The character count should include only the text inside the HTML tags and exclude the tags and attributes themselves.
However, trimming the code must not break the DOM structure of the HTML.
Are there any tutorials or working examples available?

If it's only the text you want, you can also do this:
substr(strip_tags($html_code),0,500);

Ooohh... I know this one, though I can't get it exactly right off the top of my head: you want to load the HTML you've got into a DOMDocument
http://www.php.net/manual/en/class.domdocument.php
then grab the text from the entire document node (a DOMDocument is also a DOMNode: http://www.php.net/manual/en/class.domnode.php).
This won't be exactly right, but hopefully this will steer you onto the right track.
Try something like:
$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
$dom = new DOMDocument();
$dom->loadHTML($html_code);
$text_to_strip = $dom->textContent;
$stripped = mb_substr($text_to_strip,0,500);
echo "$stripped"; // The Sameple text.Another sample text.....
Edit: OK, that should work; I just tested it locally.
Edit 2:
Now that I understand you want to keep the tags but limit the text, let's see. You're going to want to loop over the content until you reach 500 characters. This is probably going to take a few edits and passes for me to get right, but hopefully it helps. (Sorry I can't give this my undivided attention.)
The first case is when the text is 500 characters or fewer: nothing to worry about. Starting with the above code, we can do the following.
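if (mb_strlen($text_to_strip) > 500) {
    // this is where we do our work.
    $characters_so_far = 0;
    // loadHTML() wraps the fragment in <html><body>, so walk the body's children
    $body = $dom->getElementsByTagName('body')->item(0);
    // copy the live NodeList first; removing nodes while iterating it skips siblings
    foreach (iterator_to_array($body->childNodes) as $ChildNode) {
        // should check if $ChildNode->hasChildNodes();
        // probably put some of this stuff into a function
        $characters_in_next_node = mb_strlen($ChildNode->textContent);
        if ($characters_so_far + $characters_in_next_node > 500) {
            // remove the node (this drops the whole node rather than trimming it)
            $ChildNode->parentNode->removeChild($ChildNode);
        }
        $characters_so_far += $characters_in_next_node;
    }
    $final_out = $dom->saveHTML();
} else {
    $final_out = $html_code;
}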
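For the "keep the tags, limit the text" case, one possible approach is sketched below. It is a different technique from the node-removal loop above: it walks every text node in document order and trims the one that crosses the limit. The function name truncate_html, the wrapper <div>, and the UTF-8 encoding hint are assumptions, it needs the mbstring extension, and elements whose text is cut away entirely are kept as empty tags rather than removed.

```php
<?php
// Sketch: keep at most $limit characters of text while leaving the tag structure intact.
// Assumptions: UTF-8 input; text past the limit is cut off, elements left empty are kept.
function truncate_html(string $html, int $limit): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about unknown tags
    // The XML encoding hint makes libxml read the fragment as UTF-8; the wrapper
    // <div> gives a single known parent for the fragment's top-level nodes.
    $dom->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $root = $dom->getElementsByTagName('div')->item(0); // first div in document order = wrapper

    $remaining = $limit;
    $walk = function (DOMNode $node) use (&$walk, &$remaining): void {
        // Copy the live child list so removing nodes does not skip siblings.
        foreach (iterator_to_array($node->childNodes) as $child) {
            if ($child instanceof DOMText) {
                $length = mb_strlen($child->data);
                if ($remaining <= 0) {
                    $node->removeChild($child);                       // past the limit: drop the text
                } elseif ($length > $remaining) {
                    $child->data = mb_substr($child->data, 0, $remaining); // cut inside this node
                    $remaining = 0;
                } else {
                    $remaining -= $length;                             // fits entirely
                }
            } elseif ($child->hasChildNodes()) {
                $walk($child);                                         // descend into elements
            }
        }
    };
    $walk($root);

    // Serialize only the wrapper's children (its "inner HTML").
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo truncate_html('<div class="a">Hello world</div><br><span>More text</span>', 8);
```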

I'm pasting below a PHP class I wrote a long time ago; I know it works. It's not exactly what you're after, as it deals with words instead of a character count, but I figure it's pretty close and someone might find it useful.
class HtmlWordManipulator
{
var $stack = array();
function truncate($text, $num=50)
{
if (preg_match_all('/\s+/', $text, $junk) <= $num) return $text;
$text = preg_replace_callback('/(<\/?[^>]+\s+[^>]*>)/', array($this, '_truncateProtect'), $text);
$words = 0;
$out = array();
$text = str_replace('<',' <',str_replace('>','> ',$text));
$toks = preg_split('/\s+/', $text);
foreach ($toks as $tok)
{
if (preg_match_all('/<(\/?[^\x01>]+)([^>]*)>/',$tok,$matches,PREG_SET_ORDER))
foreach ($matches as $tag) $this->_recordTag($tag[1], $tag[2]);
$out[] = trim($tok);
if (! preg_match('/^(<[^>]+>)+$/', $tok))
{
if (!strpos($tok,'=') && !strpos($tok,'<') && strlen(trim(strip_tags($tok))) > 0)
{
++$words;
}
else
{
/*
echo '<hr />';
echo htmlentities('failed: '.$tok).'<br /)>';
echo htmlentities('has equals: '.strpos($tok,'=')).'<br />';
echo htmlentities('has greater than: '.strpos($tok,'<')).'<br />';
echo htmlentities('strip tags: '.strip_tags($tok)).'<br />';
echo str_word_count($text);
*/
}
}
if ($words > $num) break;
}
$truncate = $this->_truncateRestore(implode(' ', $out));
return $truncate;
}
function restoreTags($text)
{
foreach ($this->stack as $tag) $text .= "</$tag>";
return $text;
}
private function _truncateProtect($match)
{
return preg_replace('/\s/', "\x01", $match[0]);
}
private function _truncateRestore($strings)
{
return preg_replace('/\x01/', ' ', $strings);
}
private function _recordTag($tag, $args)
{
// XHTML
if (strlen($args) and $args[strlen($args) - 1] == '/') return;
else if ($tag[0] == '/')
{
$tag = substr($tag, 1);
for ($i=count($this->stack) -1; $i >= 0; $i--) {
if ($this->stack[$i] == $tag) {
array_splice($this->stack, $i, 1);
return;
}
}
return;
}
else if (in_array($tag, array('p', 'li', 'ul', 'ol', 'div', 'span', 'a')))
$this->stack[] = $tag;
else return;
}
}
truncate() is what you want: you pass it the HTML and the number of words you want it trimmed down to. It ignores HTML while counting words, but then rewraps everything in HTML, even closing trailing tags left open by the truncation.
Please don't judge me on the complete lack of OOP principles; I was young and stupid.
Edit:
So it turns out the usage is more like this:
$manipulator = new HtmlWordManipulator();
$content = $manipulator->restoreTags($manipulator->truncate($myHtml, $numOfWords));
Stupid design decision, but it allowed me to inject HTML inside the unclosed tags.

I'm not up to coding a full solution, but if someone wants to, here's the idea, sketched with DOMDocument and DOMXPath:
$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
$aggregate = '';
$document = new DOMDocument();
$document->loadHTML($html_code);
$xpath = new DOMXPath($document);
foreach ($xpath->query('//body//text()') as $textNode) {
    $aggregate .= $textNode->nodeValue; // This is the text, not HTML. Each text node
                                        // holds only the text directly in its tag,
                                        // not the text of the tag's children.
}

How to remove php code from a string?

I have a string containing PHP code, and I need to remove the PHP code from it. For example:
<?php $db1 = new ps_DB() ?><p>Dummy</p>
should return <p>Dummy</p>
And a string with no PHP, for example <p>Dummy</p>, should be returned unchanged.
I know this can be done with a regular expression, but after 4 hours I still haven't found a solution.

<?php
function filter_html_tokens($a){
return is_array($a) && $a[0] == T_INLINE_HTML ?
$a[1]:
'';
}
$htmlphpstring = '<a>foo</a> something <?php $db1 = new ps_DB() ?><p>Dummy</p>';
echo implode('',array_map('filter_html_tokens',token_get_all($htmlphpstring)));
?>
As ircmaxell pointed out: this would require valid PHP!
A regex route would be the following. It allows for short tags (no 'php' after <?) and for a missing closing ?> at the end of the string/file (Zend recommends omitting it in pure-PHP files), and of course it uses an ungreedy (U) and DOTALL (s) pattern:
preg_replace('/<\\?.*(\\?>|$)/Us', '',$htmlphpstring);

Well, you can use DomDocument to do it...
function stripPHPFromHTML($html) {
    $dom = new DomDocument();
    $dom->loadHtml($html);
    removeProcessingInstructions($dom);
    $simple = simplexml_import_dom($dom->getElementsByTagName('body')->item(0));
    return $simple->children()->asXml();
}
function removeProcessingInstructions(DOMNode $node) {
    // copy the live NodeList first; removing children while iterating it skips siblings
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMProcessingInstruction) {
            $node->removeChild($child);
        } elseif ($child->hasChildNodes()) {
            removeProcessingInstructions($child);
        }
    }
}
Those two functions will turn
$str = '<?php echo "foo"; ?><b>Bar</b>';
$clean = stripPHPFromHTML($str);
into the equivalent of
$clean = '<b>Bar</b>';
Edit: Actually, after looking at Wrikken's answer, I realized that both methods have a disadvantage... Mine requires somewhat valid HTML markup (Dom is decent, but it won't parse <b>foo</b><?php echo $bar). Wrikken's requires valid PHP (any syntax errors and it'll fail). So perhaps a combination of the two (try one first. If it fails, try the other. If both fail, there's really not much you can do without trying to figure out the exact reason they failed)...
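A minimal sketch of the "try one, then the other" idea, with two stated substitutions: the fallback is the regex from the first answer rather than DomDocument, and token_get_all() is called with the TOKEN_PARSE flag (PHP 7+) so that invalid PHP raises a ParseError instead of being tokenized without any syntax check. The function name strip_php_with_fallback is hypothetical.

```php
<?php
// Hypothetical combination of the two approaches.
// Assumption: the regex from the first answer is the fallback, not DomDocument.
function strip_php_with_fallback(string $str): string
{
    try {
        // TOKEN_PARSE makes token_get_all() throw ParseError on invalid PHP.
        $html = '';
        foreach (token_get_all($str, TOKEN_PARSE) as $token) {
            // T_INLINE_HTML tokens are the text outside the PHP tags.
            if (is_array($token) && $token[0] === T_INLINE_HTML) {
                $html .= $token[1];
            }
        }
        return $html;
    } catch (ParseError $e) {
        // Invalid PHP: remove from each opening tag to the next closing tag
        // (or to the end of the string) with an ungreedy, DOTALL pattern.
        return preg_replace('/<\?.*(\?>|$)/Us', '', $str) ?? $str;
    }
}

echo strip_php_with_fallback('<a>foo</a> something <?php $db1 = new ps_DB() ?><p>Dummy</p>');
```

Note that TOKEN_PARSE only checks syntax; it does not need the ps_DB class to exist.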

A simple solution is to explode into arrays using the php tags to remove any content between and implode back to a string.
function strip_php($str) {
    $newstr = '';
    //split on opening tag
    $parts = explode('<?', $str);
    foreach ($parts as $i => $part) {
        if ($i === 0) {
            // the text before the first opening tag is never PHP code, keep it
            $newstr .= $part;
            continue;
        }
        //split on closing tag
        $partlings = explode('?>', $part);
        //remove content before closing tag
        $partlings[0] = '';
        //append to string
        $newstr .= implode('', $partlings);
    }
    return $newstr;
}
This is slower than regex but doesn't require valid html or php; it only requires all php tags to be closed.
For files which don't always include a final closing tag and for general error checking you could count the tags and append a closing tag if it's missing or notify if the opening and closing tags don't add up as expected, e.g. add the code below at the start of the function. This would slow it down a bit more though :)
$tag_diff = substr_count($str, '<?') - substr_count($str, '?>');
//Append if there's one less closing tag
if($tag_diff == 1) $str .= '?>';
//Parse error if the tags don't add up
if($tag_diff < 0 || $tag_diff > 1) die('Error: Tag mismatch.
(Opening minus closing tags = '.$tag_diff.')<br><br>
Dumping content:<br><hr><br>'.htmlentities($str));

This is an enhanced version of strip_php suggested by @jon that is able to replace the PHP part of the code with another string:
/**
 * Remove PHP code part from a string.
 *
 * @param string $str String to clean
 * @param string $replacewith String to use as replacement
 * @return string Result string without php code
 */
function dolStripPhpCode($str, $replacewith='')
{
    $newstr = '';
    //split on each opening tag
    $parts = explode('<?php',$str);
    if (!empty($parts))
    {
        $i=0;
        foreach($parts as $part)
        {
            if ($i == 0) // The first part is never php code
            {
                $i++;
                $newstr .= $part;
                continue;
            }
            //split on closing tag
            $partlings = explode('?>', $part);
            if (!empty($partlings))
            {
                //remove content before closing tag
                if (count($partlings) > 1) $partlings[0] = '';
                //append to out string
                $newstr .= $replacewith.implode('',$partlings);
            }
        }
    }
    return $newstr;
}

If you are using PHP, you can use a regular expression to remove anything that matches a PHP block.
The following statement removes each <?php ... ?> block (the s modifier lets . match newlines, and the lazy .*? stops at the nearest closing tag instead of the last one):
preg_replace('/<\?php.*?\?>/s', '', '<?php $db1 = new ps_DB() ?><p>Dummy</p>');
If it doesn't find any match, it won't replace anything.
