How to store Markdown comments - php

I want to use Markdown for my website's commenting system but I have stumbled upon the following problem: What should I store in the database - the original comment in Markdown, the parsed comment in HTML, or both?
I need the HTML version for viewing and the Markdown version if the user needs to edit his comment. If I store the Markdown version, I have to parse the comments at runtime. If I store the HTML version, I need to convert the comment back to Markdown when the user needs to edit it (I found Markdownify for this but it isn't flawless). If I store both versions, I'm doubling the used space.
What would be the best option? Also, how does Stack Overflow handle this?

Store both. It goes against the rules for database normalization, but I think it's worth it for the speed optimisation in this case - parsing large amounts of text is a very slow operation.
You only need to store it twice, but you might need to serve it thousands of times, so it's a space-time trade-off.

Store the original markdown and parse at runtime. There are a few problems with storing the converted version in the database.
If user wants to edit their comment, you have to backwards convert parsed into original markdown
Space in database (always go by the rule that if you don't need to store it, don't)
Changes made to markdown parser would have to be run on every comment in the database, instead of just showing up at runtime.

Just render the Markdown to HTML at runtime.
If your site runs into performance issues, the Markdown will be one of the last things you'll look into tweaking. And even then, I doubt it'll make sense.
Just take a look at the realtime JavaScript renderer that SO uses. It's fast.
Edit:
Sorry, I should've been more clear. I meant just render in PHP. You'll save yourself a lot of headache -- and you probably have more important things to worry about.

Related

Clean HTML from Word documents

Ok, so my company has a client that has an interface for posting content - standard MySQL database, PHP-based, etc.
Anyway, they've continually had an intern or someone, post content to this interface straight from an MS Word doc - the interface is coded poorly, and takes this input as is, with no formatting.
My company has now been contracted out to fix this particular problem, as it is continually breaking their site, and my company has repeatedly had to manually go into the database, and delete the offending values.
Is there a quick and easy way to do this, or am I going to have to just do a replace operation on each offending character?
I see htmlentities() may be a partial solution - but as far as I know, that won't remove everything.
What's a good solution to this problem? Is there anything out there to make this easier?
We're also considering writing a content validator as well, probably just server-side (though maybe client-side, if my week is going slowly enough/I finish the rest of this quickly enough).
It depends on how many clients (or potential clients) you are supporting and how much time you have to invest. Options
Write your own function to strip out the metadata
Teach your clients to remove it themselves such as paste in notepad first,
or supply a knowledge base article to explain how to do it in the software. Perhaps a "Help" section or icon they can click on.
htttp://support.microsoft.com/default.aspx?scid=kb;en-us;223396
Use a WYSIWYG editor such as TinyMCE which has built in functionality to remove it
But like I said in the comments, unless you are using your own function, prepare for clients to continue to paste directly and wonder why there is a problem.

Is it possible to prevent standard HTML comments from showing up in source code?

I'm going to assume the answer is 'no' here, but since I haven't actually found that answer, I'm asking.
Quite basically, all I want to do is leave some HTML commenting in my files for 'author eyes only', simply to make editing the file later a much more pleasant experience.
<!-- Doing it like this --> leaves nice clean comments but they show up when viewing the page source after output.
I am using PHP, so technically I could <?PHP /* wrap comments in PHP tags */ ?> which would prevent them from being output at all, but if possible I'd like to avoid all of the extra arbitrary PHP tagging that would be needed for commenting throughout the file. After all, the point of commenting is to make the document feel less cluttered and more organized.
Is there anything else I could try or are these my best options?
No, anything in html will show up.
You could, have a script that parses the code, and removes the comments, before it puts it up on the server, and then you would have the original, and the uncommented source.
A tool to accomplish this:
http://code.google.com/p/htmlcompressor/
I guess these are your best options, yes, unless you run the entire HTML output through some sort of cleanup module before being sent to the client.
Anything not wrapped in server side syntax will will be output to the client if not modified on its way out (through template engines, for example). This goes for most (probably all) server side languages).
You could definitely write a parser that uses regex to strip out HTML comments, but unless you're already dealing with a roll-your-own CMS, most likely the work involved in this far outweighs the benefits of not using PHP comments as you suggested.

Should I use strip tags on the fly or store on server already stripped

I am using WMD like stackoverflow uses. I have read the upvoted posts on the topic which all advise to store both the markdown and the html_purified versions on the server. The reason for this is the markdown is needed for the user to edit his post, and the to purify on the fly is too expensive for the server so better to convert it and store it on the server.
https://stackoverflow.com/a/519322/1240324
However, when I show on the index page a bit of every question, then I dont want the markdown or the html version. I would like to call strip_tags on every question excerpt. But would this be heavy for the server to do it on the fly? Should I store a third field for a stripped tagged version?
strip_tags() isnt particularly expensive. I wouldn't worry about it.

How should I store textual data that won't change very often?

As an exercise in web design and development, I am building my website from the ground up, using PHP, MySQL, JavaScript and no frameworks. So far, I've been following a model-view-controller design. However, there is one hurdle that I am quickly approaching that I'm not sure how I'm going to solve, but I'm sure it's been addressed before with varying degrees of success.
On my website, I'm going to have a resume and an "about me" bio section. These probably won't be changing very often.
For my resume, I think that XML that can be rendered into HTML (or any other format) is the best option, and in that case, I could even build a "resume manager" using PHP that can edit the underlying XML. A resume also seems like it could be built on top of MySQL, as well, and generated into XML or HTML or whatever output format I choose.
However, I'm not sure how to store my about me/bio. My initial idea was a plain text document that can be read it, parsed, and the line breaks converted to paragraphs. However, I'm not sold on that being the best idea. My other idea was using MySQL, but I think that might be overkill for a single page. What I do know, however
What techniques have you used when storing text for a page that will not change very often? How did they work out for you - what problems or successes did you have?
Like McWafflestix said, use HTML, if you want to output HTML. Simplest case within PHP:
<?php
create_header_stuff();
include('static_about.html');
create_footer_stuff();
?>
and in static_about.html something like
<div id="about">
...
</div>
Cheers,
Just use a static page, if the information won't change very often. Just using static HTML gives you more control over the display format.
Generally treating infrequently changing information the same as frequently changing information works well if you add one other component: caching.
Whatever solution you decide on for the back end, store the output in a cache and then check to see if the data has changed. Version numbers or modified dates work well here. If it hasn't changed, just give the cached data. If it has changed then you rebuild the content, cache it and display.
As far as structure goes, I tend to use text blobs in a database if there is any risk that there will be more dynamic databases. XML is a great protocol for communicating between services and as an intermediate step, but I tend to use a database under all my projects because eventually I end up using it for other things anyway.

Best way to handle LARGE strings of text going into a database?

I have built a number of solutions in the past in which people enter data via a webform, validation checks are applied, regex in some cases and everything gets stored in a database. This data is then used to drive output on other pages.
I have a special case here where a user wants to copy/paste HUGE amounts of text (multiple paragraphs with various headers, links and etc throughout) -- what is the best way to handle this before it goes into a database to provide the best output when it needs to come back out?
So far the best I have come up with is sticking all the output from these fields in PRE tags and using regex to add links where appropriate. I have a database put together with a list of special keywords that need to be bold or have other styles applied to them which works fine. So I can make this work using these approaches but it just seems to me that there is probably a much more graceful way of doing it.
Nicholas
There are a lot of ways you could format the text for output. You could simply use pre tags as you mentioned (if you are worried about wrapping, the CSS white-space property does also support the pre-wrap value, but browser support for this is currently sketchy at best).
There are also a large number of markup languages you could use for more advanced formatting options (some of which are listed here). Stack Overflow itself uses Markdown, which I personally enjoy using very much.
However, as the data is being pasted from another source, a markup language may interfere with the formatting of the text - in which case you could roll your own solution, perhaps using regular expressions and functions like htmlentities and nl2br.
Whatever you decide, I would recommend storing the input in its original form in the database so you can retroactively amend your formatting routines at any time.
If you're expecting a good deal of formatting, you should probably go with a WYSIWYG editor. These editors produce word-like toolbars which product (hopefully) valid (x)html-markup which can be directly stored into a text field in your database. Here are a couple examples:
FCKeditor - Massive amount of options/tools
Tinymce - A nice alternative.
Markdown - What stackoverflow.com uses
Both FCKeditor and Tinymce have been thoroughly tested and have proven to be reliable. I don't have any experience with markdown but it seems solid.
I've always hated 'forum' formatting tags like [code], [link], etc. Stackoverflow and others have shown that providing an open wysisyg editor is safe, reliable, and very easy to use. Just take the output it gives you, run it through some sort of escape function to check for any kind of injection, xss, etc and store in a text field.

Categories