markdown: render linebreaks within block elements as <br> - php

I know this has been asked (Python Markdown nl2br extension, etc) but none of those answers is doing it for me.
I would like to render markdown so that linebreaks occurring within a <p> element will be rendered as <br>. Example: they type
Here is line one.
And line two.
New paragraph.
should render as
<p>Here is line one.<br>And line two.</p>
<p>New paragraph.</p>
I know that if you want that, you should type two spaces at the end of the line you want to <br>. I am trying to make it so my users don't have to do that, but rather, enter text as though they were using a typewriter (for those who know what that is). One hard return, new line; two hard returns, new paragraph.
I've been working with https://parsedown.org/ and have also experimented with https://commonmark.thephpleague.com; also the Python markdown module with nl2br extension (tried their example verbatim, did not work for me). Whatever I do, I end up with either too many or not enough linebreaks, depending.
I have tried what I thought would be clever and elegant: style my markdown's <p> with white-space: "pre" (also tried pre-line). That works, unless the user has done it "right" with two spaces, in which case you get the unwanted double <br> effect.
Also tried nl2br($markdown) with likewise unreliable results.
I want non-technical users to be able to use some basic formatting as easily as possible, and markdown seems just the thing, but for this detail. I don't want to write a CMS just to work around this. For example, I've thought of adding a boolean markdown property on the entity and letting them choose, yadda yadda... don't wanna go there. I've thought of doing some string-replacement or regexp magic, either at database-write time or just before rendering. But again, hoping to avoid getting too complicated. (To make it a little more challenging, I will also have to import a few thousand legacy records that are non-markdown, and potentially deal with issues around old ones versus new.)
Maybe I'm overlooking a simple, sane way out. Any thoughts as to the best strategy?
Update: by popular demand, code examples of what does not work. It's a Zend MVC application that involves Doctrine entities I call MOTD and MOTW (Message Of The Day and Message Of The Week, respectively); these have a string property called content. Generically I think of these entities as Notes and they implement a NoteInterface. When I retrieve these from the database (via a NotesService class that internally uses a custom Doctrine repository class), it's time to render the content as markdown before the controller assigns it to the view:
// from NotesService.php
use Parsedown;
// stuff omitted...
/**
 * gets MOT(D|W) by date
 *
 * @param DateTime $date
 * @param string $type
 * @param boolean $render_markdown
 * @return NoteInterface|null
 */
public function getNoteByDate(DateTime $date, string $type, bool $render_markdown = true) : ?NoteInterface
{
    $entity = $this->getRepository()->findByDate($date, $type);
    if ($entity && $render_markdown) {
        $content = $entity->getContent();
        $entity->setContent($this->parsedown($content));
    }
    return $entity;
}
The point of the boolean $render_markdown is for when we want raw markdown, i.e., when it's going to populate a textarea element of a form.
And the parsedown() method, quite simply:
public function parsedown(string $content) : string
{
    if (! $this->parseDown) {
        $this->parseDown = new Parsedown();
    }
    // nope...
    // return nl2br($this->parseDown->text($content));
    return $this->parseDown->text($content);
}
Inside a viewscript, I just go, e.g.,
if ($this->notes['motd']):
// echo nl2br($this->notes['motd']->getContent());
echo $this->notes['motd']->getContent();
else:
?><p class="font-italic no-note">no MOTD for this date</p><?php
endif;
Now, if in the editing form they input this as content:
here is a line
and here is another
now, new paragraph.
and then we save it in the database, when you select it back out and run it through $parsedown->text($content), you get this HTML:
<p>here is a line
and here is another</p>
<p>now, new paragraph.</p>
Please note, the example input above does not have any space characters preceding the linebreaks. When you do type two spaces before the linebreaks, yeah, it works great. But I don't think my users want to think about that. So using nl2br() helps, except when it results in too many consecutive <br>s in the HTML.
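To make the "too many <br>s" case concrete, here is a small sketch. The two HTML strings are hand-written stand-ins for what a Markdown parser might emit (an assumption, not captured Parsedown output); only the behaviour of nl2br() itself is taken as given.

```php
<?php
// Assumed parser output when the user DID type two trailing spaces:
// a <br /> has already been emitted, and the newline is still present.
$withHardBreak = "<p>here is a line<br />\nand here is another</p>";

// Assumed parser output when the user typed no trailing spaces.
$withoutHardBreak = "<p>here is a line\nand here is another</p>";

// nl2br() inserts "<br />" before every newline, whatever precedes it.
$a = nl2br($withHardBreak);
$b = nl2br($withoutHardBreak);

echo $a, "\n\n", $b, "\n";
```

The second string ends up with the single break that is wanted, while the first ends up with two in a row, which is the doubled-break effect described above.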
My latest thinking is, use a CSS solution and an input filter that strips <space><space> at the end of lines. When it works, I'll add the story to my memoir. :-)

There may be some more desirable way to achieve this, but finally I decided to
(1) filter the input (at create|update time) with regexp pattern substitution to remove trailing '  ' (two consecutive space characters) from lines. I happen to be using Zend Framework's Zend\Filter\PregReplace, but it's a de facto wrapper for preg_replace('/( {2,})(\R)/m', '$2', $content).
(2) Use CSS to make newlines act like <br> when I display these entities, e.g.,
#motd .card-body p { white-space: pre-line }
Seems to be working for me.
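For anyone not using Zend\Filter, step (1) can be sketched in plain PHP roughly as follows. The function name strip_hard_break_spaces is made up for illustration; the pattern is the same idea as the one above, with only the line break captured.

```php
<?php
/**
 * Remove runs of two or more spaces that sit directly before a line break,
 * so Markdown's "two trailing spaces" hard break never fires and the CSS
 * rule (white-space: pre-line) is the only thing producing line breaks.
 *
 * Hypothetical helper name, not part of any library.
 */
function strip_hard_break_spaces(string $markdown): string
{
    // \R matches any line-break sequence (\n, \r\n, \r); $1 puts it back.
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('/ {2,}(\R)/', '$1', $markdown) ?? $markdown;
}

$input = "Here is line one.  \nAnd line two.\n\nNew paragraph.";
echo strip_hard_break_spaces($input), "\n";
```

A single trailing space is left alone, since Markdown does not treat it as a hard break.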

Related

Importing PHP string with quotes

So I'm importing ExpressionEngine fields into a PHP array. I want to display one field, called {gearboxx_body}, unless that field has more than 300 characters, in which case I want to display a field called {article_blurb}. I'm pretty sure there isn't a way to do this just in ExpressionEngine fields and conditionals, so I tried some PHP, which I'm just starting to learn:
<?php
$info = array('{gearboxx_body}','{article_blurb}');
if (mb_strlen($info[0]) <= 300) {
    echo($info[0]);
}
else {
    echo($info[1]);
}
?>
So that works well, but there's a problem. If the tag includes any apostrophes or quote marks, it ends the string and the page won't load. So what can I do about this? I've tried to replace the quote marks in the string, but I have to have loaded the string from the fields first, and as soon as I do that the page is already broken.
Hopefully that made sense. Any suggestions?
I would recommend you handle this in an EE plugin rather than in the template:
Faster to render (because you don't need the overhead of PHP in the templates)
More secure and reliable
Faster to develop once you get the basics of EE development down which is a useful life skill
All around best-practice
The plugin I have in mind takes three parameters:
body, blurb and character limit.
Let's say you call your plugin "Blurby". In the template you would just have this:
{exp:blurby body="{gearboxx_body}" blurb="{article_blurb}" char_limit="300"}
It variably returns either of your fields based on the logic you define in the plugin itself.
See plugin developer documentation.
Alternatively you could use the dreaded HEREDOC syntax to set variables before passing them into your array:
$body = <<<EOT
{gearboxx_body}
EOT;
$blurb = <<<EOT
{article_blurb}
EOT;
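Putting that together, a minimal sketch might look like the following. Two things here are assumptions rather than part of the answer above: it uses nowdoc (<<<'EOT') instead of heredoc, so that a stray $ in the field text is not treated as a PHP variable, and the sample strings stand in for whatever ExpressionEngine would substitute for the tags. The helper name pick_text is made up.

```php
<?php
// Sample text standing in for {gearboxx_body}; in the template, the tag
// itself would sit on this line and be replaced before PHP runs.
$body = <<<'EOT'
It's a "quoted" body that costs $5.
EOT;

// Sample text standing in for {article_blurb}.
$blurb = <<<'EOT'
Short blurb.
EOT;

// Hypothetical helper: show the body unless it is longer than $limit.
function pick_text(string $body, string $blurb, int $limit = 300): string
{
    return mb_strlen($body) <= $limit ? $body : $blurb;
}

echo pick_text($body, $blurb), "\n";
```

Quotes and apostrophes no longer end the string, although field text containing a line that starts with EOT would still terminate it early.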

PHP preg_replace markdown issue - detecting duplicates

In a project I am building I would like to use markdown as follows
*text* = <em>text</em>
**text** = <strong>text</strong>
***text*** = <strong><em>text</em></strong>
As those are the only three markdown formats I require, I would like to remain lightweight and avoid importing the entire PHP markdown library as that would introduce features I do not require and create issues.
So I have been trying to build some simple regex replaces. Using preg_replace I run:
'/(\*\*\*)(.*?)\1/' to '<strong><em>\2</em></strong>'
'/(\*\*)(.*?)\1/' to '<strong>\2</strong>'
'/(\*)(.*?)\1/' to '<em>\2</em>',
And this works great! em, bold, and the combo all work fine...
But if the user makes a mistake or enters too many stars, everything breaks.
i.e.
****hello**** = <strong><em><em>hello</em></strong></em>
*****hello***** = <strong><em><strong>hello</em></strong></strong>
******hello****** = <strong><em></em></strong>hello<strong><em></em></strong>
etc
When ideally it would create
****hello**** = *<strong><em>hello</em></strong>*
*****hello***** = **<strong><em>hello</em></strong>**
******hello****** = ***<strong><em>hello</em></strong>***
etc
Ignoring the un-required stars (so it would become clear to the user they made a mistake, and more importantly, the rendered HTML remains valid).
I presume there must be some way to modify my regex to do this, but I cannot for the life of me work it out, even after a whole day trying!
I would also be happy with the result of
******hello****** = <strong><em>hello</em></strong>
So please, can anybody help me?
Also please consider uneven stars. In this case the below scenario would be ideal.
***hello* = **<em>hello</em>
And the time when a star should be part of the body and not detected, such as if a user inputs:
'terms and conditions may apply*'
or
'I give the film 5* out of 10'
Many many thanks
Try a different capturing pattern, one that matches anything except * one or more times:
'/(\*\*\*)([^*]+)\1/'
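Applied to all three rules, that suggestion might look like the sketch below. The function name mini_markdown is made up, and extending [^*]+ to the two- and one-star patterns is an assumption; the answer only shows the three-star case.

```php
<?php
// Hypothetical helper: three emphasis rules, longest marker first.
function mini_markdown(string $text): string
{
    $patterns = [
        '/(\*\*\*)([^*]+)\1/',
        '/(\*\*)([^*]+)\1/',
        '/(\*)([^*]+)\1/',
    ];
    $replacements = [
        '<strong><em>$2</em></strong>',
        '<strong>$2</strong>',
        '<em>$2</em>',
    ];
    // With arrays, preg_replace() applies each pattern in order to the
    // result of the previous one. It returns null on error.
    return preg_replace($patterns, $replacements, $text) ?? $text;
}

echo mini_markdown('***hello*** **bold** and *em*'), "\n";
echo mini_markdown('***hello*'), "\n";
echo mini_markdown('I give the film 5* out of 10'), "\n";
```

Because the content can no longer contain a star, a lone star in the body is left untouched and the uneven ***hello* case comes out as **<em>hello</em>. It narrows the problem rather than solving it: ****hello**** still gets an extra <em> wrapped around it by the single-star pass, and two unrelated lone stars in one text would be paired up.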

Changing/deleting html from file_get_contents

I'm currently using this code:
$blog= file_get_contents("http://powback.tumblr.com/post/" . $post);
echo $blog;
And it works. But tumblr has added a script that activates each time you enter a password-field. So my question is:
Can I remove certain parts with file_get_contents? Or just remove everything above the <html> tag? Could I possibly kill a whole div so it won't load at all? And if so, how?
edit:
I managed to do it the simple way, by skipping 766 characters. The script now works as intended!
$blog = file_get_contents("http://powback.tumblr.com/post/" . $post, NULL, NULL, 766);
After file_get_contents returns, you have in your hands a string. You can do anything you want to it, including cutting out parts of it.
There are two ways to actually do the cutting:
Using string functions like str_replace, preg_replace and others; the exact recipe depends on what you need to do. This approach is kind of frowned upon because you are working at the wrong level of abstraction, but in some cases it has an unmatched performance to time spent ratio.
Parsing the HTML into a DOM tree, modifying it appropriately (this time working at the appropriate level of abstraction) and then turn it back into a string and echo it. This can be more convenient to work with if your requirements are not dead simple and is easier to maintain, but it typically requires more code to be written.
If you want to do something that's most naturally expressed in HTML document terms ("cutting out this <div>"), don't be tempted by the string functions; go with the second approach.
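As a rough sketch of the second approach, PHP's bundled DOM extension can do the parsing and removal. The function name strip_nodes, the sample markup, and the id values are all invented for illustration; the real page's structure would determine the XPath expression.

```php
<?php
// Hypothetical helper: remove every node matching an XPath expression.
function strip_nodes(string $html, string $xpath): string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parse warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = (new DOMXPath($doc))->query($xpath);
    if ($matches !== false) {
        // Copy the node list first so removal does not disturb iteration.
        foreach (iterator_to_array($matches) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    return $doc->saveHTML();
}

// Invented sample page standing in for the fetched blog post.
$html = '<html><body><div id="keep">post text</div>'
      . '<div id="login"><script>alert(1)</script></div></body></html>';

$clean = strip_nodes($html, '//div[@id="login"] | //script');
echo $clean;
```

Unlike skipping a fixed number of characters, this keeps working if the unwanted markup moves or changes length, as long as the selector still matches.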
At that point, $blog is just a string, so you can use normal PHP functions to alter it. Look into these 2:
http://php.net/manual/en/function.str-replace.php
http://us2.php.net/manual/en/function.preg-replace.php
You can parse your output using Simple HTML DOM Parser and display only the contents that you really want to display.

VIM: Show PHP function / class in command line?

Is there any way to show the current PHP function or class name in the VIM command line? I found a plugin for showing C function names in the status line but it does not work for PHP and in any case I prefer the command line be used to save valuable vertical lines.
Thanks.
EDIT
While looking for something completely unrelated in TagList's help I've just found these two functions:
Tlist_Get_Tagname_By_Line()
Tlist_Get_Tag_Prototype_By_Line()
Adding this in my statusbar works beautifully:
%{Tlist_Get_Tagname_By_Line()}
Also, did you read the Vim Wiki? It has a bunch of tips trying to address the same need. There is also this (untested) plugin.
ENDEDIT
If you are short on vertical space maybe you won't mind using a bit of horizontal space?
TagList and TagBar both show a vertical list of the tags used in the current buffer (and other opened documents in TagList's case) that you can use to navigate your code.
However, I'm not particularly a fan of having all sorts of information (list of files, VCS status, list of tags, list of buffers/tabs…) displayed at all times: being able to read the name of the function you are in is only useful when you actually need to know it, otherwise it's clutter. Vim's own [{ followed by <C-o> are enough for me.
I don't know anything about PHP, and I'm not trying to step on anyone's toes, but having looked at some PHP code I came up with this function which I think takes a simpler approach than the plugins that have been mentioned.
My assumption is that PHP functions are declared using the syntax function MyFunction(){} and classes declared using class MyClass{} (possibly preceded by public). The following function searches backwards from the cursor position to find the most recently declared class or function (and sets startline). Then we search forward for the first {, and find the matching }, setting endline. If the starting cursor line is in between startline and endline, we return the startline text. Otherwise we return an empty string.
function! PHP_Cursor_Position()
  let pos = getpos(".")
  let curline = pos[1]
  let win = winsaveview()
  let decl = ""
  let startline = search('^\s*\(public\)\=\s*\(function\|class\)\s*\w\+','cbW')
  call search('{','cW')
  sil exe "normal %"
  let endline = line(".")
  if curline >= startline && curline <= endline
    let decl = getline(startline)
  endif
  " setpos() accepts the list returned by getpos(); cursor() expects [lnum, col]
  call setpos('.', pos)
  call winrestview(win)
  return decl
endfunction
set statusline=%{PHP_Cursor_Position()}
Because it returns nothing when it is outside a function/class, it does not display erroneous code on the statusline, as the suggested plugin does.
Of course, I may well be oversimplifying the problem, in which case ignore me, but this seems like a sensible approach.

"Safe" markdown processor for PHP?

Is there a PHP implementation of markdown suitable for using in public comments?
Basically it should only allow a subset of the markdown syntax (bold, italic, links, block-quotes, code-blocks and lists), and strip out all inline HTML (or possibly escape it?)
I guess one option is to use the normal markdown parser, and run the output through an HTML sanitiser, but is there a better way of doing this..?
We're using PHP markdown Extra for the rest of the site, so we'd already have to use a secondary parser (the non-"Extra" version, since things like footnote support are unnecessary). It also seems nicer to parse only the *bold* text and have everything else escaped to &lt;a href="etc"&gt;, than to generate <b>bold</b> text and try to strip the bits we don't want.
Also, on a related note, we're using the WMD control for the "main" site, but for comments, what other options are there? WMD's javascript preview is nice, but it would need the same "neutering" as the PHP markdown processor (it can't display images and so on, otherwise someone will submit and their working markdown will "break")
Currently my plan is to use the PHP-markdown -> HTML sanitiser method, and edit WMD to remove the image/heading syntax from showdown.js - but it seems like this has been done countless times before..
Basically:
Is there a "safe" markdown implementation in PHP?
Is there a HTML/javascript markdown editor which could have the same options easily disabled?
Update: I ended up simply running the markdown() output through HTML Purifier.
This way the Markdown rendering was separate from output sanitisation, which is much simpler (two mostly-unmodified code bases) more secure (you're not trying to do both rendering and sanitisation at once), and more flexible (you can have multiple sanitisation levels, say a more lax configuration for trusted content, and a much more stringent version for public comments)
PHP Markdown has a sanitizer option, but it doesn't appear to be advertised anywhere. Take a look at the top of the Markdown_Parser class in markdown.php (starts on line 191 in version 1.0.1m). We're interested in lines 209-211:
# Change to `true` to disallow markup or entities.
var $no_markup = false;
var $no_entities = false;
If you change those to true, markup and entities, respectively, should be escaped rather than inserted verbatim. There doesn't appear to be any built-in way to change those (e.g., via the constructor), but you can always add one:
function do_markdown($text, $safe=false) {
    $parser = new Markdown_Parser;
    if ($safe) {
        $parser->no_markup = true;
        $parser->no_entities = true;
    }
    return $parser->transform($text);
}
Note that the above function creates a new parser on every run rather than caching it like the provided Markdown function (lines 43-56) does, so it might be a bit on the slow side.
JavaScript Markdown Editor Hypothesis:
Use a JavaScript-driven Markdown Editor, e.g., based on showdown
Remove all icons and visual clues from the Toolbar for unwanted items
Set up a JavaScript filter to clean-up unwanted markup on submission
Test and harden all JavaScript changes and filters locally on your computer
Mirror those filters in the PHP submission script, to catch same on the server-side.
Remove all references to unwanted items from Help/Tutorials
I've created a Markdown editor in JavaScript, but it has enhanced features. That took a big chunk of time and SVN revisions. But I don't think it would be that tough to alter a Markdown editor to limit the HTML allowed.
How about running htmlspecialchars on the user entered input, before processing it through markdown? It should escape anything dangerous, but leave everything that markdown understands.
I'm trying to think of a case where this wouldn't work but can't think of anything off hand.
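A quick way to probe that idea is to look at what the escaping step alone does to a few inputs, before any Markdown parser sees them. The helper name escape_before_markdown and the sample inputs are made up for this sketch; no Markdown library is involved.

```php
<?php
// Hypothetical helper: the escaping step that would run before Markdown.
function escape_before_markdown(string $input): string
{
    return htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$samples = [
    '<script>alert(1)</script>',     // inline HTML: gets neutralised
    '**bold**',                      // plain Markdown: passes through
    '> quoted',                      // blockquote marker: gets escaped too
    '[click](javascript:alert(1))',  // link syntax: passes through untouched
];

foreach ($samples as $sample) {
    echo escape_before_markdown($sample), "\n";
}
```

Two things stand out. The > that starts a blockquote is turned into &gt;, so blockquotes would stop working unless handled separately, and a link with a javascript: target contains nothing for htmlspecialchars to escape, so whether it ends up rendered depends entirely on the Markdown parser.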
