I need to design a website with friendly URLs that specific users can configure. The URLs must be stored in a database so they can be managed (which user created each one, which module and data it should load, etc.), and there must be some permissions system (view, actions, etc.).
The question is: would it perform better if I created a PHP file for each path (like /section/subsection/index.php) to be served directly by Apache, would it be better to look each path up in the database on every request, or does it depend on the kind of page?
There would be three kinds of pages:
- Mostly static (once created, they won't need to connect to the database)
- Periodically updated (I can delete those pages whenever they become outdated)
- Mostly dynamic (like loading user events, which would require database queries)
Is there any existing benchmark on this?
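For context, here is a minimal sketch of the database-lookup variant being weighed, assuming a hypothetical pages table with path and module columns (all names are illustrative, not from the question):

    <?php
    // front controller (index.php): look the requested path up in the database
    $pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

    $path = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');

    // hypothetical schema: pages(path, module, data, owner_id)
    $stmt = $pdo->prepare('SELECT module, data FROM pages WHERE path = ?');
    $stmt->execute(array($path));
    $page = $stmt->fetch(PDO::FETCH_ASSOC);

    if ($page === false) {
        http_response_code(404);
        exit('Not found');
    }

    // hand off to the module responsible for rendering this kind of page;
    // basename() keeps a malicious path from escaping the modules directory
    require __DIR__ . '/modules/' . basename($page['module']) . '.php';

The per-path-file variant skips that query entirely, which is the trade-off the question is about.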
Simple answer: Do what is easier to code. The performance difference is too small to matter.
I find it easier to keep static pages as files and to build constructed pages with PHP. Sometimes I have one PHP file building one page; sometimes I have one PHP file building many pages, usually minor variants of one page, but then I have to pass arguments to say which variant I am building.
Without a more substantial description of your setup, the number of entries in each section/subsection, the kind of load, etc., no one can answer this question for you.
And really, except for some pathological use cases, no one should. You should answer your question with benchmark data from your own implementation.
The only clear drawback I can see in the solution you describe with PHP files is code duplication, but that is easily managed if you automate the creation of the files.
I've taken a look at the PHP script behind my father's website, which was built by a hired programmer. Now, I'm not claiming to be better than him, but I think his technique might not be the best.
The website has a dynamic page body, meaning that my dad can, via a dedicated admin page, modify the HTML content of most of the webpages on the site. Right now it's done via the database: the pages are all stored in the database, and every request involves a query that fetches the page from the database and renders it.
Now, I think this approach is quite bad, mostly because it requires an additional query to the database (even if not that expensive when cached).
Wouldn't it be more efficient to store the pages as HTML files and just modify the file itself when required? That way, editing the file is, I think, faster, and loading the content of an HTML file per request is a lot easier and faster than performing a query.
Is it? Is there any other (more efficient) way to handle this situation?
this is the question of the century :) there is no exact answer to it, just performance tips. people have been working on optimizing page load times for the last 30 years.
No, it isn't better to have fixed HTML pages under a hypothetical '/mypages' folder.
What if the user wants about 500 pieces of content on his website? He'll end up with 500 files.
Yeah, sure, they'll be served faster, but is that enough to outweigh the massive problems below?
What about page translation? This would be a nightmare with static HTML files.
Pages are rendered that way because they're dynamic; that is, stuff can be "entered" by third parties/plugins (say) and applied to multiple pieces of content at once. What about applying the same thing to numerous HTML files and then changing it again?
What if you want to change the <head> section or the <script> tags being loaded? You'd be forced to do that in all 500 files on every change.
What about PHP included in those .html files? This doesn't apply if you're not putting PHP into them, but if an included PHP file is renamed or removed, you'd need to change all the files in one massive update.
Think of templates; the reason modern CMSs (and admin pages) are dynamic today is that they can change classes/styles, etc. without touching the content itself. A single change to the theme in use, or to a single class, would (again) cause a massive update.
A database is files too, but it runs faster. If you're worried about performance, you can have the database cache queries (like SELECT data FROM content WHERE id=1) so that, in performance terms, the query costs almost nothing.
I can think of more.
There are several good reasons why a CMS should use a Database to store/fetch the dynamic content. Just as there are several reasons why you might prefer not to rely on a DB.
Pro DB:
Security: it's an obvious, and slightly ambivalent, argument, but nonetheless: if you decide to store your content as separate files on your server, they'll need to live in a directory that doesn't allow public access. If not, users might be able to access the chunks of your site separately, which comes across as unprofessional, and people with ignoble intentions will have an easy time altering your site's content too. Of course, there are many ways to prevent this and increase overall security. Database systems, when left to their own devices, aren't exactly safe either, but they provide an extra obstacle to attackers at minimal effort. (Note: the security argument stands or falls with how well your script filters out injection and how securely you set up your server.)
Disk usage: when using separate files to compose each requested page, the server has to hit its disk on every request. Again, caching solves this issue to some extent, but it's easier and (in general) better, performance-wise, to cache DB query results, either on your database server, in PHP, or, better still, both.
Logging: by this I mean that a database-driven CMS is a lot easier to manage when you alter content. If you altered the content and want to undo/roll back the changes, a DB is the easiest way to implement such a feature. With HTML files, you'll soon find yourself wading through tons of files called site_menu_block_YYYY-mm-dd.html.backup. Even if this is handled by a script, it'll almost certainly be slower than using a DB.
Translation: as vlzvl pointed out, if you're using static pages, you'll either end up with each page N times, once per language (and when altering the stylesheets you'll have to alter N files too, which is expensive), or your scripts will parse an HTML template file plus an XML file with the actual contents on each request. The latter loses you the SEO benefit of the HTML files, causes extra server load, and slows down your site.
Pro HTML:
I can only give one solid pro argument here: it's a lot easier to get an SEO-friendly site this way; just allow search engines to index the separate files. It does, however, drastically decrease the overall security of your CMS.
That said, I think I'm right in saying that all major CMSs use both methods, depending on what type of data they're dealing with. HTML headers, for example, are often partially stored as separate files, just like JS files and stylesheets.
This all depends on the content. When you have a lot of varied content, like news, it's easier to store each news entry in a database and load the data into a template. But when you have one-off content (like a huge article or an info page), you can use an HTML file to store the data.
It also depends on whether you want a multi-language site or not. You can surely create a multilingual site with only HTML files, but the same question as above applies: what content do you have, many varied entries or a few mostly fixed ones?
What I have done so far is both at the same time: when a client wanted a site with a news script, multiple languages, and so on, I built it so the user could log in to post news entries and store them in different languages, but changes to the other pages were made via HTML, with a separate HTML file for every language.
EDIT:
It depends on the user, too. If the user doesn't know how to use HTML but wants to change the site himself, then the only available option is to give him an admin center to make changes. Or if you don't want to give too much power to the user :D
My opinion: most CMSs perform lots of DB accesses and load many files in order to compile one page that is eventually sent to the visitor. Still, for most sites there's no performance problem (i.e. most pages load in under 1s), so I'd be surprised if you had a problem there with just one single DB access.
But for ease of handling: why use a database when your site doesn't really need it?
I made a couple of sites for clients where I use one index.php that loads one menu file and one footer file, the same for all pages, and in between it loads the individually selected HTML file. So in order to edit a page, you open and edit the corresponding HTML file in any editor. Very simple.
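As a rough illustration of that setup (file and parameter names here are hypothetical, not taken from the answer above), the index.php might look like:

    <?php
    // index.php: shared menu and footer wrapped around an editable HTML page
    $page = isset($_GET['page']) ? $_GET['page'] : 'home';

    // basename() stops requests like ?page=../../etc/passwd
    $file = __DIR__ . '/pages/' . basename($page) . '.html';

    include __DIR__ . '/menu.html';

    if (is_file($file)) {
        readfile($file);   // the client-editable content
    } else {
        http_response_code(404);
        echo '<p>Page not found.</p>';
    }

    include __DIR__ . '/footer.html';

Editing a page is then just opening pages/<name>.html in any editor, as described.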
Currently I am wondering whether or not to use a MySQL DB to provide the content on my website.
An example of what I mean is loosely based on the samples here: http://www.asual.com/jquery/address/samples/
Find the sample called SEO.
Anyone with an HTML5-capable browser will notice the URL is 'pretty' and is what you'd expect to find on any standard website.
Anyone with IE8, or a browser which isn't WebKit-based, will see the hashbang (#!) used for SEO.
The problem is this: the content is pulled from a MySQL DB. I have approximately 30 pages (some are PACKED with content), and I'm wondering whether all this tedious modification of my website is necessary.
I use jQuery, MySQL, and PHP through a single-page interface, so my content is not indexable at all. What are your views?
Help me!!
P.S. Would it be easier to put PHP includes in my DB content to fetch pages, without having to upload all my pages into the DB?
your question is made up of a lot of questions. :)
to mysql or not to mysql: most of the PHP-using web world uses mysql as the database to store content. i don't see much of a problem there. 30 pages is peanuts.
jquery and php for a single-page interface being indexable: depends on the search engine. i've read somewhere (too lazy to look things up) that google uses a javascript-enabled crawler. not sure if they use it in production already.
PHP includes in DB content: textpattern uses this approach. your worry is a problem of scale.
if your PHP code can serve pages properly, it wouldn't matter where it pulls content from. DB or filesystem wouldn't matter at this point.
just do it.
There is no such question.
MySQL is okay.
It's a general-purpose solution for storing site data; everyone uses it without a single problem, and even Wikipedia is happy with it.
MySQL is irrelevant to whatever problems your site has.
Your problem lies somewhere else, but you forgot to state it. Let me suggest that you ask another question pointing to the real problem you have, not to some guess you've made about its causes.
Well, if you can, avoid storing pages inside MySQL, unless you want to give the administrator the ability to edit the pages.
Aside from that, there is no problem with storing pages in a DB, be it MySQL or another one. A lot of CMSs do it (Drupal, Joomla, etc.).
You might encounter some performance issues on your DB server if your traffic becomes high, but this is another problem.
In my tests and comparisons, MySQL connections and queries do slow down responses. If your site is simple and you are only doing updates yourself, then using a template engine and storing content in files is not a bad choice.
If you decide to put it into SQL, then eventually you might need to add a cache, hopefully nginx rather than a PHP-level cache, so that shouldn't be a problem either.
The deciding factor is how you want to edit the content. I've found that my team and I are much more comfortable editing HTML files in Notepad++, Vim, or Coda. If the content is inside a database, you get a WYSIWYG editor that performs poorly compared to a desktop app.
Always use SQL when the content is generated by your users, and do use some lightweight CMS.
I am using the one bundled with Agile Toolkit myself and templates look like this:
https://github.com/atk4/atk4-web/tree/master/templates/jui
"would it be easier to provide PHP Includes in my DB content"
I think you'll find your site far easier to maintain for years IF you keep a very clear separation of duties: data goes in a database, presentation and code go in files.
While there is some contention over whether it is a good idea to store templates in a database, my gut feeling is that you should avoid that temptation unless you have a very good reason.
But storing code (your PHP include statements) in the database is almost certainly not the best way forward.
A colleague and I were discussing the best way to build a website last week. We have different ideas about how to store the site's content. The way I have always approached this is to store any sort of text or image link (not the image file itself) in a database. That way, if I need to change a letter or a sentence, I just go into the database; I don't have to touch the actual web page itself.
My colleague agreed with this up to a point. He thinks there are performance issues with retrieving content from the database, especially if every character of content comes from the database. When he builds a website, any content that won't change often (if at all) is hard-coded onto the page, and any content that will be changed or added regularly comes from the database.
I can't see the benefit of doing it his way, if only because every time we make a change to an ASPX page we need to recompile the site to upload it. So if one page has a misspelt "The" (say, "Teh"), we have to change it on the page, recompile the whole site, and then upload it.
My colleague, likewise, thinks that if everything came from the database there would be performance issues with the site and the database, and that the overall loading speed of the page in the browser would decrease.
What we were both left wondering is: if a website drew everything from the database (not HTML code as such, more like the content for headers, footers, links, etc.), would it slow the website down? And beyond that, if there is a performance issue, which is better: a 100% database-driven website with its performance issues, or a website with hard-coded content, which means 10-20 minutes spent compiling and uploading the site just for the sake of a one-word or one-letter change?
I'm interested to see whether anyone else has come across this, or has their own thoughts on the subject.
Cheers
Naturally it's a bit slower to retrieve information from a database than directly from the file system. But do you really care? If you design your application correctly, then:
a) you can implement caching so that the database is not hit for every page
b) the performance difference will be tiny anyway, particularly compared to the time to transmit the page from the server to the client
A 100% database approach opens up the potential for more flexibility and features in your application.
This is a classic case of putting caching / performance considerations before features / usability. Bottlenecks rarely occur where or when you expect them to - so focus on developing a powerful application and then implement caching later - when it's needed and where it's needed.
I'm not suggesting that storing templates as static files is a bad idea, just that performance shouldn't be your primary driver in making these assessments. Static templates may be more secure or easier to edit with your development tools, for example.
Hardcode the strings in the code (unless you plan to support multiple languages).
It is not worth:
- the extra code required for maintaining the strings
- the added complexity
- the possible performance penalty
Would you extract the string "Cancel" from a button?
If so, would you be using the same string on multiple cancel buttons? Or one for each?
If you decided to rename one button to "Cancel registration", how would you identify which "Cancel" to update in the database? You would be forced to set up a working process around how to deal with this, and in my opinion it's just not worth it.
This is a general programming question.
What is the best way to make a light blogging system that can handle images, BBCode-ish styling, and text without a database back end? "Light" means no more than 50 to 100 posts in extreme cases.
What language(s) should be used? Is there any preferred data format for the information? How does security play out?
EDIT: The client has no database and is on a shared server. I can't change that. Therefore, no DB.
EDIT2:
Someone mentioned SQL Compact: does that require anything more than copying files to the server? The key here, again, is that things shouldn't require any more permissions than FTP access.
If you're looking to do it yourself: store each post as a file in a directory. Then, to sort and limit the posts, you rely partly on the file names to order and limit them, and potentially (in the case of a search) on reading every last file. Don't go letting users make 10,000 posts, though. But yeah, the above is considered a flat-file data format; a sketch follows below. You can get fancy by using a standard format like JSON, YAML, or XML within each post file, and even fancier by requesting these with Ajax calls in mostly client-side code.
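A minimal sketch of the file-name trick, assuming a hypothetical posts/ directory where each file name leads with the date, so lexical order is chronological order (all names are illustrative):

    <?php
    // e.g. posts/2011-03-01-hello-world.json, posts/2011-03-05-second.json
    $files = glob(__DIR__ . '/posts/*.json');

    // names lead with the date, so sorting them newest-first sorts the posts
    rsort($files);

    // "limit" is just a slice of the sorted list (here: 10 posts per page)
    foreach (array_slice($files, 0, 10) as $file) {
        $post = json_decode(file_get_contents($file), true);
        printf("<h2>%s</h2>\n<div>%s</div>\n",
               htmlspecialchars($post['title']),
               $post['body']);   // body assumed to be pre-sanitized HTML
    }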
Now, if the reason you want to work with flat files is just that you don't want to install a database server, there's nothing stopping you from reading a file local to the server as a Berkeley DB, a Lucene index, or an SQLite DB from within your webapp, using the appropriate client library. You'll find any of these approaches a little saner (a bit faster, a bit more readable in code) than the aforementioned, with the same requirements for installing on the server (read/write file permissions). Many web frameworks and languages (like PHP) ship with an API to these client libraries; SQLite and Lucy (C Lucene) in particular.
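For instance, SQLite through PHP's bundled PDO driver needs nothing beyond a writable file, which fits the FTP-only constraint above (the schema is made up for illustration):

    <?php
    // a single file on disk acts as the whole database
    $db = new PDO('sqlite:' . __DIR__ . '/blog.db');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // hypothetical schema, created on first run
    $db->exec('CREATE TABLE IF NOT EXISTS posts (
                   id INTEGER PRIMARY KEY,
                   created TEXT,
                   title TEXT,
                   body TEXT
               )');

    // fetch the ten newest posts
    $posts = $db->query('SELECT title, body FROM posts
                         ORDER BY created DESC LIMIT 10')
                ->fetchAll(PDO::FETCH_ASSOC);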
If you're just looking for examples of this being done: I first came across Blosxom (I think in 1999 or 2000), a Perl script that runs either as a CGI script per request or as a cron job. It builds a dated index of "posts" based on whatever you throw into the directory it's meant to scan. It also builds an RSS feed.
Jekyll and Blogofile are my favorite kind of solution for this: "compiling" the pages before upload.
I'm going to go out on a limb here and say that it's not always the destination, but the journey.
If you're going to set out to do this, I recommend using a language you are comfortable with. Personally, that would be C#/.NET for me, but from your tagging I'll assume PHP would be the server-side scripting language you'd choose.
I would lay out how I want my application to behave. If there is going to be a lot of data, you should consider (as dlamblin mentioned) a DB of some sort for lookup and retrieval. (A light blog? Not so much data. 1,000 users who can edit? Maybe you should consider a DB.) Once you've decided how to store the data, decide how to present it.
Write some proof of concept code for each of the features you want to implement (blog templating, bbcode, user authentication, text searching...) and start to work them all together.
search for flat-file CMSes on google, for example:
http://www.flatcms.org/
this has already been done, so there is no need to create such a CMS again. there are plenty of them.
I concur with dusoft that this has already been done.
DotNetBlogEngine.net is an ASP.NET (C#) based blogging system that has a nice XML back-end as an option.
This doesn't answer your question directly, but check out Unify.
If you do not want to write a new one or want to get some inspiration:
Flatpress
Simple PHP Blog
Ninja Designs are working on a DB-free WordPress clone
You could either use XML or use SQL Compact (which lets you handle things much like SQL Server, except that instead of a database server you work with files on disk).
I want to implement a two-pass cache system:
The first pass generates a PHP file with all of the common stuff (e.g. news items) hardcoded. The database then has a cache table linking these to the pages (e.g. "index.php page=1 style=default"); it also stores an up-to-date flag which, if false, causes the first pass to rerun the next time the page is viewed.
The second pass fills in the minor details, such as how long ago something happened, and mutable items like "You are logged in as...".
However, I'm not sure of an efficient implementation that supports both cached and non-cached pages (e.g., search) without a lot of code and several queries.
Right now, each time the page is loaded, the PHP script runs and regenerates the page. For pages like search this is fine, because most searches are different, but other pages such as the index are virtually the same for each hit, yet generate a large number of queries and run quite a long script.
The problem is that some parts of the page do change on a per-user basis, such as the "You are logged in as..." section, so simply saving the generated pages would still result in tens of thousands of nearly identical pages.
The main concern is reducing the load on the server, since I'm on shared hosting and at this point can't afford to upgrade, but the site is using a sizeable portion of the server's CPU and putting a fair load on the MySQL server.
So basically, minimising how much has to be done for each page request, and not regenerating things like the news items on the index all the time, seems a good start, compared to, say, search, which is a far less static page.
I actually considered hard-coding the news items as plain HTML, but that would mean maintaining them in several places (since they may appear in searches, the comments live on a page dedicated to each news item (i.e. news.php), etc.).
I second Ken's recommendation of PEAR's Cache_Lite library; you can use it to easily cache either parts of pages or entire pages.
If you're running your own server(s), I'd strongly recommend memcached instead. It's much faster since it runs entirely in memory and is used extensively by a lot of high-volume sites. It's a very easy, stable, trouble-free daemon to run. In terms of your PHP code, you'd use it much the same way as Cache_Lite, to cache various page sections or full pages (or other arbitrary blobs of data), and it's very easy to use since PHP has a memcache interface built in.
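As an example of how that looks in PHP code, a fragment-cache sketch using the Memcache extension (the key name, TTL, and build_news_section() helper are all hypothetical):

    <?php
    $memcache = new Memcache();
    $memcache->connect('localhost', 11211);

    // try the cache first
    $html = $memcache->get('homepage:news');
    if ($html === false) {
        // miss: rebuild the fragment and cache it for 5 minutes
        $html = build_news_section();
        $memcache->set('homepage:news', $html, 0, 300);
    }
    echo $html;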
For super-high-traffic full-page caching, take a look at Varnish or Squid as a caching reverse proxy. (Pages served by Varnish will easily come out 100x faster than anything that hits the PHP interpreter.)
Keep in mind that with caching, you really only need to cache things that are frequently accessed. It can be a trap to develop a really sophisticated caching strategy when you don't need it. For a page like your home page that's getting hit several times a second, you definitely want to optimize for speed; for a page that gets maybe a few hits an hour, like a month-old blog post, caching is a bad idea: you only waste your time and make things more complicated and bug-prone.
I recommend not reinventing the wheel... there are template engines that support caching, like Smarty.
For server-side caching, use something like Cache_Lite (and let someone else worry about file locking, expiry dates, and file corruption).
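A minimal Cache_Lite sketch (the cache directory, lifetime, id, and build_homepage() helper are arbitrary/hypothetical):

    <?php
    require_once 'Cache/Lite.php';

    $cache = new Cache_Lite(array(
        'cacheDir' => '/tmp/cache/',   // must be writable by the web server
        'lifeTime' => 300,             // seconds before an entry expires
    ));

    // each cached block is identified by an id
    if ($html = $cache->get('homepage')) {
        echo $html;                    // cache hit
    } else {
        $html = build_homepage();      // rebuild on a miss
        $cache->save($html, 'homepage');
        echo $html;
    }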
If you roll it yourself, you want to save the results to a file and use logic like this to pull them back out (a PHP rendering of the idea, with a hypothetical build_page() doing the actual work):
    <?php
    $cacheFile = 'cache/page.html';   // one cache file per page

    if (file_exists($cacheFile)) {
        include $cacheFile;                     // serve the cached copy
    } else {
        ob_start();                             // generate results
        build_page();                           // render the page
        $html = ob_get_clean();                 // render to HTML (as string)
        file_put_contents($cacheFile, $html);   // write to file
        echo $html;                             // output string
    }
To be clear, you don't need two passes because you can save parts of the page and leave the rest dynamic.
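To illustrate that: cache only the expensive shared fragment and keep the per-user bits live (the fragment path and render_news_items() helper are hypothetical):

    <?php
    session_start();

    // the news list is the same for everyone, so cache it as a fragment
    $fragment = 'cache/news_fragment.html';
    if (!file_exists($fragment) || filemtime($fragment) < time() - 300) {
        // stale or missing: rebuild it
        file_put_contents($fragment, render_news_items());
    }

    // per-user parts stay dynamic on every request
    $user = isset($_SESSION['username']) ? $_SESSION['username'] : 'guest';
    echo 'You are logged in as ' . htmlspecialchars($user);

    readfile($fragment);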
As always with this type of question, my response is:
Why do you need the caching?
Is your application consuming too much IO on your database?
What metrics have you run?
You are talking about adding an extra level of complexity to your app, so you need to be very sure that you actually need it.
You might actually benefit from the built-in MySQL query cache, if the database is the contention point in your system. The other option is to use Memcache.
I would recommend using an existing caching mechanism. Depending on what you really need, you might be looking at APC, memcached, various template-caching libs... It's easier and faster to tune already-written, tested code to suit your needs than to write everything from scratch (usually, although there may be situations where you don't have a choice).