Database for Content - OK to store HTML? - php

Basic question is - is it safe to store HTML in a database if I restrict who can submit to it?
I have a pretty simple question. I provide video tutorials and other content. Without spending months writing a proper BBCode parser, I would need to store HTML so the content looks exactly the way I want when I pull it from the database.
Basically I plan to store all information in the database about a tutorial series and each episode. I would like to have some formatting for the descriptions for both so I can add multiple paragraphs, ordered and unordered lists, links to required resources, and so on.
I am using PHP and creating my own database. I am using phpMyAdmin to store the information in the table right now. I will use a user with read-only rights when I pull the information in the PHP code.
What is the best way to do this? Thank you!

As others have pointed out, there's nothing dangerous about storing HTML in the DB, but when you display it you need to know the HTML is safe. Seeing as you're the only one editing the HTML, I see no problem.
However, I wouldn't store HTML at all. If all you need are headings, paragraphs, lists, links, images and so on, I'd say Markdown is a perfect fit. The benefit of Markdown is that it looks just like normal text (i.e. you could send your articles as e-mails or save them as .txt documents), it takes up a lot less space than HTML, and you won't have to rewrite your stored content if your HTML markup changes later.
http://michelf.ca/projects/php-markdown/
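For reference, converting stored Markdown to HTML with that library is a one-liner. A minimal sketch, assuming the michelf/php-markdown package is installed via Composer and the description lives in a $row['description'] column (those names are illustrative, not from the question):

<?php
// Sketch: render a Markdown description with Michel Fortin's PHP Markdown.
require __DIR__ . '/vendor/autoload.php';

use Michelf\Markdown;

$markdown = $row['description'];   // raw Markdown pulled from the database
$html = Markdown::defaultTransform($markdown);
echo $html;                        // only safe to echo directly because you authored it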

From a security point of view, storing your HTML in a database is no less secure than storing it anywhere else - provided you are the only author of that HTML. Then again, if other people can author HTML on your website, it doesn't matter where you store it - only how you sanitize it and how and where you display it.
Now whether or not it is an efficient way to store HTML is a completely different matter. If I were you I would use some decent templating system and store HTML in files.

Storing HTML code is fine. But if it does not come from a trusted source, you need to check it and allow only a secure subset of markup. The HTML Tidy library will help you with that.
Also, you need to account for a future change in website design, so do not use too much markup - only basic tags. To make it look the way you want, use global CSS rules and semantically named classes in the markup.
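A minimal sketch of such a cleanup pass with PHP's tidy extension, assuming ext-tidy is installed (note that tidy repairs and normalizes markup but does not enforce a tag whitelist on its own, so pair it with a sanitizer such as HTML Purifier for truly untrusted input):

<?php
// Sketch: repair and normalize untrusted markup with PHP's tidy extension.
$config = [
    'clean'          => true,  // replace presentational markup with CSS where possible
    'output-xhtml'   => true,
    'show-body-only' => true,  // return a fragment rather than a full document
];

$tidy = new tidy();
$tidy->parseString($dirtyHtml, $config, 'utf8');
$tidy->cleanRepair();

$cleanHtml = (string) $tidy;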
But even better is to use Markdown or another wiki-like syntax. There are nice JS editors for Markdown with real-time preview (like the one here at Stack Overflow), and you can avoid HTML altogether.

My initial answer to "should I store HTML in a DB" is generally no. Sure, it's safe if you know what you're storing, but are you really considering best practices when you ask only that question? The true answer is: it depends.
I'm sure there are things like WordPress that store HTML in a database. However, as a professional website designer, I like to remember the separation-of-concerns principle. How reusable is HTML stored in your database for a mobile app? Is your back end now in charge of display as well as data? Do you still have multiple implementation options for a front end, or are you stuck with whatever the back end produces? What if you want a different color and you've stacked ul within ul within ul - how easy is the CSS styling now? How easy is it to change or update that HTML?
I could be wrong, but even Sitecore and Kentico may store an HTML template in a database somewhere; the data associated with that template, though, is a model, not markup baked into the template.
So, when you are considering this question, you may want to store your models in one place and your templates in another. That way, when you say "hey, let's build a mobile app", you can grab your data and go, rather than creating yet another table to store the same data.

I made a really big mistake by storing text data in MongoDB GridFS with compression and using mongodump for daily backups. The GridFS store holds only 1 GB of text files, but because of how that backup works, memory usage kept rising - sometimes by 1 GB a day, reaching 20 GB after a month.
With MongoDB you should snapshot the data folder rather than run mongodump. The likely reason for the growth is that mongodump copies otherwise-unused data from disk into memory before producing the BSON dump, so text that hasn't been touched in a long time gets loaded into memory anyway. Even now, my MongoDB sits at about 200 MB of RAM, but after running mongodump it can rise to 3 GB.
So I think the best solution is to use the filesystem for storing HTML files, especially since even a RAID controller like the PERC H700 has useful caching features, including read-ahead. The filesystem has its own limitations, though, such as network access; and in my experience some data became corrupted over time and needed chkdsk to repair, since many GB of data were added or removed daily. Also consider the proper RAID cache policy, such as write-through, to prevent data loss on power failure.
SQLite is not designed for extremely large data sets and is missing many caching features, so you shouldn't use it here.
An imperfect but workable solution is MariaDB, or your own caching script in Node.js backed by memcached or a Linux ramdisk with perhaps 1 GB of hot cache. A home-grown in-process Node.js cache can develop memory leaks over time, so I would use Node.js only for the network connections, do I/O through filesystem locks, and keep the most-used "hot" files cached in RAM while leaving the rest as plain files.

Is it bad to use JSON files instead of real databases?

I have no access to create/edit any databases, but due to the massive amount of content I need some sort of management system, which is why I created my own. Here's how it works:
Each blog post has its own .php file, which loads static parts of the website like the header or the menu bar. But there are many category pages that display previews of the respective posts. It would be very annoying to have to edit the same preview on 10 pages because of a misspelled word or the like. That's why I store those previews (not the full content, since there's no need to) in a JSON file.
Is that bad practice? Could this lead to long loading times if the number of previews rose? And could I prevent that by splitting the previews across multiple JSON files?
Thanks for your advice!
If you are working on a small scale, using JSON files is fine. However, it would definitely be beneficial to switch to a database management system for storage whenever possible in PHP (or in the majority of languages, for that matter).
It can be considered bad practice if JSON is used to store large amounts of data, or if a lot of data is stored in the same file. In that case, yes, using multiple JSON files rather than one large file is indeed more viable, since the input stream does not have to read over as much data when loading any one file.
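For what it's worth, reading such a preview file in PHP stays simple even as it grows a little. A minimal sketch - the file name previews.json and its fields are assumptions, not taken from the question:

<?php
// Sketch: load and render post previews from a (hypothetical) previews.json.
$raw = file_get_contents(__DIR__ . '/previews.json');
$previews = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);

foreach ($previews as $p) {
    printf("<article><h2>%s</h2><p>%s</p></article>\n",
        htmlspecialchars($p['title']),
        htmlspecialchars($p['excerpt']));
}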

How to use a database with a CDN to improve performance

I have a question similar to this one. I am using jwplayer to play my videos, which are stored on a CDN. Due to some requirements, I have to save each subtitle to the CDN first, then save both the video file URL and the associated subtitle URLs [eng, chinese, japanese, etc.] in the DB.
When I make an Ajax call from my JS file to the PHP file to retrieve the data, it takes a long time and causes a performance issue.
I was wondering if there is any DB option on the CDN so that, instead of saving those details in my DB, I could save this info (the subtitles associated with one video file) directly on the CDN. Since retrieving from the CDN is much faster, it would surely improve performance.
CDNs just bring static information closer to the users, caching that information in points of presence (PoPs) around the globe; this is mostly done by web servers sitting within those PoPs. So whatever you can't retrieve with an HTTP GET will likely be a problem. For example, the legacy RTMP protocol (also video) is supported by long-established CDNs (Level3/Akamai/EdgeCast) but not by newer entrants like Cloudflare or CloudFront, because it requires add-ons to the web server and clutters workflows.
Technically, any static database can be dumped to a file, and the file can be cached by a CDN. But then, again, it would be your code taking care of the db->file->db metamorphosis. Therefore, if something is static, you don't really want to use a database for it (to be future/CDN-proof). Subtitles are just text files, so let them be files in asset folders. I appreciate that the high-level architecture might be beyond your control here (due to a specific ingest system, for instance), but in that case the answer is that you won't be able to do what you're trying, and the resulting performance will suffer.
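To illustrate the "subtitles as static files" idea: if the subtitle files follow a naming convention, the player config can be built from the video ID alone, with no Ajax round trip or DB lookup. A sketch - the base URL, path scheme, and language list here are all assumptions:

<?php
// Sketch: derive subtitle URLs from a naming convention instead of a DB query.
const CDN_BASE = 'https://cdn.example.com/subtitles'; // hypothetical

function subtitle_tracks(string $videoId, array $langs = ['en', 'zh', 'ja']): array
{
    $tracks = [];
    foreach ($langs as $lang) {
        // e.g. https://cdn.example.com/subtitles/abc123.en.vtt
        $tracks[$lang] = sprintf('%s/%s.%s.vtt', CDN_BASE, rawurlencode($videoId), $lang);
    }
    return $tracks;
}

// The player setup can then be emitted directly into the page:
echo json_encode(subtitle_tracks('abc123'));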
If you have the bucks you can look at Continuent.
http://www.continuent.com/solutions/pricing-and-services

Scalable way to store files on server (PHP)?

I'm creating my first web application - a really simplistic online text editor.
What I need to do is find the best way to store text based files - a lot of them.
These text files can be over 10,000 words in size (text words, not computer words); in essence, I want the text documents to be limitless in size.
I was thinking about storing the text files in my MySQL database - but thought there was a better way.
Instead, I'm planning on storing the text files in an XML-based format in a directory on my server.
The rows in the database define the name of the xml based text file and the user who created the text along with basic metadata.
An ID is generated using a V4 GUID generator, which gives the text an ID; the text is stored in the "/store" directory on my server. The text definitions on my server contain this ID, and the Android app I'm developing gets the contents of a text file by retrieving the text definition and then downloading the text to the local device using the GUID from the definition.
I just think this is a botch job - how can I improve this system?
There have been cases of GUIDs colliding.
I don't want this to happen. A "slim" possibility isn't good enough - I need to make sure there is absolutely no chance in a GUID collision.
I was planning on checking the database for texts with the same ID before storing a text under a particular ID. However, I believe that with over 20,000 pieces of text in my database this would take a long time and put unneeded stress on the server.
How can I make GUID safe?
What happens when a GUID collides?
The server backend is going to be written in PHP.
You've got several questions here, so I'll try to answer them all.
Is XML with GUID the best way to do this?
"Best" is usually subjective. This is certainly one way to do it, but you're probably adding unneeded overhead. If it's just text you're storing, why not put it in the SQL with varchar(MAX)?
Are GUID collisions possible?
Yes, but the chance of that happening is small. Ridiculously small. There are much bigger things to worry about.
How can I make GUIDs safe?
Stop worrying about them.
What happens when a GUID collides?
This depends on how you're using them. In this case, the old data stored in the location indicated by the GUID would probably be overwritten by the new data.
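For completeness, a sketch of generating a version-4 GUID in PHP and letting a UNIQUE index act as the collision backstop instead of a SELECT-before-INSERT (the documents table and its columns are made up for illustration):

<?php
// Sketch: random v4 UUID; a UNIQUE index on documents.guid catches collisions.
function uuid_v4(): string
{
    $b = random_bytes(16);
    $b[6] = chr((ord($b[6]) & 0x0f) | 0x40); // set version to 4
    $b[8] = chr((ord($b[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($b), 4));
}

do {
    $guid = uuid_v4();
    try {
        $stmt = $pdo->prepare('INSERT INTO documents (guid, owner) VALUES (?, ?)');
        $stmt->execute([$guid, $userId]);
        break; // inserted without conflict
    } catch (PDOException $e) {
        if ($e->getCode() !== '23000') { throw $e; } // 23000 = duplicate key; retry
    }
} while (true);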
Well, I don't know if I'd use a GUID; I would probably just use the auto_increment key on the DB table and name the files after that, because unless you have deleted records from the DB without cleaning up the filesystem, those keys will always be unique. I don't know if the GUID is a requirement on the Android side, though.
There's nothing wrong with using MySQL to store the documents!
What is storing them in XML going to provide you with? Adding an additional format layer will only increase the processing time when they are read and formatted.
Placing them as files on disk would be no different from storing them in an RDBMS, and in the longer term it would probably cause you further issues down the line (file access, disk seeks, locking, and race conditions come to mind).

Easiest and fastest way to template, possibly in a PDF

I have been looking extensively for a simple solution to a not-very-complicated problem.
I have a great deal of data in a sql database which needs to be printed (for example, each entry would have name, address, phone number, etc).
The vast majority of the data on the eventual printed page is static; only a small handful of fields need to be 'variables' in the 'template'. Quite beneficially, the areas the variable data would be dropped into are fixed in both position and dimensions, so there is no need to adjust the spacing of the other static/redundant data on the page.
I would like some form of 'accounting', in the sense that, since the number of pages printed is going to be on the order of tens of thousands, I would like to know which SQL entries have been printed so far.
I would not like to 'reinvent the wheel' and write a php front end which loops through arrays and deposits the sql data onto the right place on the page before or after it is rendered as pdf...
I would prefer to print directly from the server (*nix), and would be very enthusiastic if there is a way to do this without actually having to render tens of thousands of individual PDFs. With today's open-source software packages, which route is the best to take?
(so far, it is looking like if there isn't a simple way, I am going to need to learn LaTeX, Cheetah, and some python)
Dabo's report writer is a banded reporting engine like Crystal. It takes as input a set of data (the output of cur.fetchall(), for example) and a report template (an XML string or file), and outputs a PDF or set of PDFs (it can output a stream of bytes instead of writing to a file directly, if desired).
Dabo's main purpose is a desktop-application framework on top of wxPython, but the reporting can be done on the web with no desktop interaction, though it does help to design the reports on the desktop using the included report designer.
http://dabodev.com
There will be some installation hurdles and a learning curve, but you'll find this to be an easy task once you are ramped up.

Linking an image to a PHP file

Here's a bit of history first: I recently finished an application that allows me to upload images and store them in a directory; it also stores each file's information in a database. The database stores the location and name and gives each file an ID (auto_increment).
Okay, so what I'm doing now is allowing people to insert images into posts. I'm throwing a few ideas around for the best way to do this: since the application I designed allows people to move files around, I don't want images in posts to break when an image is moved to a different directory (hence the storing of IDs).
What I'm thinking of doing is when linking to images, instead of linking to the file directly, I link it like so:
<img src="/path/to/functions.php?method=media&id=<IMG_ID_HERE>" alt="" />
So it takes the ID, searches the database, determines the MIME type and so on from there, then spits out the image.
So really, my question is: Is this the most efficient way?
Note that on a single page there could be from 3 to 30 images, all making a call to this function.
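For reference, the passthrough script being described might look like the sketch below; the connection details and the table and column names are assumptions, not from the question:

<?php
// Sketch: look up an image by ID and stream it with the right MIME type.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'reader', 'secret');

$id = (int) ($_GET['id'] ?? 0);
$stmt = $pdo->prepare('SELECT location, mime_type FROM images WHERE id = ?');
$stmt->execute([$id]);
$image = $stmt->fetch(PDO::FETCH_ASSOC);

if (!$image || !is_file($image['location'])) {
    http_response_code(404);
    exit;
}

header('Content-Type: ' . $image['mime_type']);
header('Content-Length: ' . filesize($image['location']));
header('Cache-Control: public, max-age=86400'); // let browsers cache repeat views
readfile($image['location']);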
Doing that should be fine as long as you are aware of the memory limits configured in both PHP and the web server (though you'll run into those merely by receiving the file in the first place).
Otherwise, if you're strict about this being just for images, it could prove more efficient to go with Mike B's approach: designate a static area, just drop the images off in there, and record those locations in the records for their associated posts. It's less work and less to worry about - and I'm willing to bet your web server is better at serving files than most developers' custom application code.
Normally I would recommend keeping the src of an image static (instead of pointing it at a PHP script), but if you're allowing users to move images around the filesystem, you need a way to track them.
Some form of caching would help reduce the number of database calls required to fetch the filesystem location of each image. It should be pretty easy to put an indefinite TTL on the cache and invalidate the entry when an image is moved.
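A sketch of that idea using APCu, assuming the extension is available (the table, column, and key names are illustrative):

<?php
// Sketch: cache id->path lookups indefinitely; delete the key when a file moves.
function image_path(int $id, PDO $pdo): ?string
{
    $key = "img_path_$id";
    $path = apcu_fetch($key, $hit);
    if ($hit) {
        return $path;
    }
    $stmt = $pdo->prepare('SELECT location FROM images WHERE id = ?');
    $stmt->execute([$id]);
    $path = $stmt->fetchColumn() ?: null;
    apcu_store($key, $path); // no TTL; invalidated explicitly on move
    return $path;
}

// In the move-file code path: apcu_delete("img_path_$id");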
I don't think you should worry about that, what you have planned sounds fine.
But if you want to go out of your way to minimise requests, you could instead do the following: when someone embeds an image in a post, replace the image tag with a special character sequence like [MYIMAGE=1234]. Then, when a page with one or more posts is viewed, search through the posts for all the [MYIMAGE=] sequences, query the database once for all of those images' locations, and output the posts with the [MYIMAGE=] sequences replaced by the appropriate image tags. You may also want to make sure users cannot directly add [MYIMAGE=] tags to their submitted content.
The way you have suggested will work, and it's arguably the nicest solution, but I should warn you that I've tried something similar before and it completely fell apart under load. The database seemed to be keeping up, but the script would start to time out and the image wouldn't arrive. That was probably down to some particular server configuration, but it's worth bearing in mind.
Depending on how much access you have to the server it's running on, you could just create a symlink whenever the user moves a file. It's a little messy but it'll be fast and reliable, and will also handle collisions if a user moves a file to where another one used to be.
Use the format proposed by Hammerite: [MYIMAGE=1234] tags (or something similar).
You can then fetch the ID-to-path mappings before display and replace the [MYIMAGE] tags with proper tags that link to the images directly. This will yield much better performance than outputting images through PHP.
You could even bypass the database completely, and simply use image paths like (for example) /images/hash(IMAGEID).jpg.
(If there are different file formats, use [MYIMAGE=1234.png], so you can append png/jpg/whatever without a database call)
If the need arises to change the image locations, output method, or anything else, you only need to change the method where [MYIMAGE] tags are converted to full file paths.
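A sketch of that conversion step, batching the lookups into one query as suggested above (the table and column names are hypothetical):

<?php
// Sketch: expand [MYIMAGE=1234] placeholders into <img> tags in one DB round trip.
function expand_image_tags(string $post, PDO $pdo): string
{
    preg_match_all('/\[MYIMAGE=(\d+)\]/', $post, $m);
    if (empty($m[1])) {
        return $post;
    }
    $ids = array_unique(array_map('intval', $m[1]));
    $in = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $pdo->prepare("SELECT id, location FROM images WHERE id IN ($in)");
    $stmt->execute(array_values($ids));
    $paths = $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // id => location

    return preg_replace_callback('/\[MYIMAGE=(\d+)\]/', function ($m) use ($paths) {
        $id = (int) $m[1];
        return isset($paths[$id])
            ? '<img src="' . htmlspecialchars($paths[$id]) . '" alt="" />'
            : ''; // unknown ID: drop the placeholder
    }, $post);
}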
