I'm working on a blog-based website with more than 50k posts and need suggestions to increase the site's speed.
I have two options:
1: Pick the post data from the MySQL database and display it using PHP
2: A static web page for each post (using a DOM parser I can update the post contents)
Which one is faster, database or file system? Or do you have any other suggestions to speed up my website? I'm using GoDaddy shared hosting.
I would suggest:
pagination for the site;
a fetch-only-what-you-need style when querying the database;
running some load tests to find where your site needs improving.
Sorry, I looked up GoDaddy and they do not allow memcached :(
Use the database and implement memcached to cache recently shown pages.
Even with 50K posts, I imagine most fetches are for a small subset of posts from a specific time period, usually recent posts.
If this is the case, a memcache solution would beat any disk-based storage.
Automatically generating static pages for often-retrieved posts is another way.
But base storage in a database is the easiest.
You can't get reliable performance on shared hosting, so just go with what's easiest to work with. Today you may get fast access to the file system, but tomorrow they relocate your app to another silo and the database becomes faster. It's a lot easier to extend a database to add new features, so I'd go with that.
But if you really care about performance, you have to run tests and measure it.
You can use page caching for the whole page.
Query caching for caching the results of database queries.
Using the file system will only give you trouble with updates, deletes, inserts, etc.
I am running an application (built on PHP & MySQL) on a VPS. I have an article table with millions of records in it. Whenever a user logs in, I display the last 50 records for each section.
So every time a user logs in or refreshes the page, it executes SQL queries to fetch those records. There are now lots of users on the website, and because of that my page speed has dropped significantly.
I did some research on caching and found that I can read the MySQL data per section and number of articles, e.g. (section 1, 50 articles), and store it in a disk file, cache/md5(section no.).
Then, when a future request comes in for that section, I just get the data from cache/md5(section no).
The above solution looks great, but before I go ahead I would really like to clarify a few doubts with the experts:
Will it really speed up my application? (I know disk I/O is faster than a MySQL query, but I don't know by how much.)
I currently use pagination on my page: display the first 5 articles, and when the user clicks "display more", show the next 5 articles, and so on. This is easily done in a MySQL query, but I have no idea how to do it if I store all 50 records in a cache file. If someone could share some info, that would be great.
Any alternative solution if you believe the above will not work?
Any open-source application that does this, if you know of one? (PHP)
Thank you in advance
Regards,
Raj
I ran into the same issue, where every page load resulted in 2+ queries being run. Thankfully they were very similar queries being run over and over, so caching (as in your situation) is very helpful.
You have a couple options:
offload the database to a separate VPS on the same network to scale it up and down as needed
cache the data from each query and try to retrieve from the cache before hitting the database
In the end we chose both, installing Memcached and its PHP extension for query caching purposes. Memcached is a key-value store (much like PHP's associative array) with a set expiration time, measured in seconds, for each value stored. Since it stores everything in RAM, the tradeoff for volatile cache data is extremely fast read/write times, much better than the filesystem.
Our implementation was basically to run every query through a filter: if it's a SELECT statement, cache it by setting the Memcached key to "namespace_[md5 of query]" and the value to a serialized version of an array with all the resulting rows. Caching for 120 seconds (2 minutes) should be more than enough to help with the server load.
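A minimal sketch of such a filter, assuming the Memcached PECL extension and mysqli; the function name and the "namespace_" prefix are illustrative only:

```php
<?php
// Minimal sketch of the SELECT-caching filter described above.
// Assumes a Memcached server and a mysqli connection; the function
// name and the "namespace_" key prefix are illustrative only.
function cached_query(mysqli $db, Memcached $mc, string $sql, int $ttl = 120): array
{
    if (stripos(ltrim($sql), 'SELECT') !== 0) {
        $db->query($sql);        // not a SELECT: run it directly, skip the cache
        return [];
    }

    $key  = 'namespace_' . md5($sql);
    $rows = $mc->get($key);
    if ($rows !== false) {
        return $rows;            // cache hit
    }

    $rows = $db->query($sql)->fetch_all(MYSQLI_ASSOC);
    $mc->set($key, $rows, $ttl); // Memcached serializes the array for us
    return $rows;
}
```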
If Memcached isn't a viable solution, store all 50 articles for each section as an RSS feed. You can pull all the articles at once, grabbing the content of each article with SimpleXML and wrapping it in your site's article template HTML, as per the site design. Once the data is there, use CSS to display only X articles, and JavaScript for pagination.
Since two processes modifying the same file at the same time would be a bad idea, have adding a new story to a section trigger an event, which would add the story to a message queue. That message queue would be processed by a worker which does two consecutive things, also using SimpleXML:
Remove the oldest story at the end of the XML file
Add a newer story given from the message queue to the top of the XML file
If you'd like, RSS feeds according to section can be a publicly facing feature.
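A rough sketch of what that worker could look like, assuming one standard RSS file per section; the file path and story fields are illustrative:

```php
<?php
// Rough sketch of the queue worker described above: drop the oldest
// <item> from a section's RSS file and add the new story.
// The file path and story fields are assumptions for illustration.
function update_section_feed(string $file, array $story): void
{
    $rss   = simplexml_load_file($file);
    $items = $rss->xpath('/rss/channel/item');

    // Remove the oldest story (assumed to be the last <item> in the feed).
    if (!empty($items)) {
        $oldest = $items[count($items) - 1];
        unset($oldest[0]);               // self-reference unset removes the node
    }

    // Add the new story. SimpleXML appends at the end, so a real worker
    // would rebuild the document in date order before saving.
    $item = $rss->channel->addChild('item');
    $item->addChild('title', htmlspecialchars($story['title']));
    $item->addChild('description', htmlspecialchars($story['body']));
    $item->addChild('pubDate', date(DATE_RSS));

    $rss->asXML($file);
}
```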
I have a few sites with Twitter & Facebook feeds, and one that references a health club schedule (a quite large, complicated data tree). I am starting to get into caching to improve page load times, and am also interested in keeping bandwidth usage down, as these sites are hosted on our own VPS.
Right now I have the Twitter and Facebook data each serializing/unserializing to a simple data file, rewritten every 10 minutes. Would it be better to write this data to the MySQL database? And if so, what is a good method for accomplishing this?
Also, the Twitter feed result contains only what I need, so it is nice and small (the 3 most recent tweets). But the Facebook result is larger and I sort through it with PHP for display. Should I store THAT result or the raw feed? Does it matter?
For the other, larger JSON object, would the file vs. MySQL recommendation be the same?
I appreciate any insights and would be happy to show an example of the JSON schedule object if it makes a difference.
P.S. APC is not a viable option, as it seemed to break all my WordPress installs yesterday. However, we are running on FastCGI.
If it's just a cache I would go for a file, but I don't think it will really matter, unless of course you have thousands or millions of these cache files; then MySQL would be the way to go. If you are doing anything else with the cache (like storing multiple versions or searching the text), I would go for MySQL.
As for speed, only cache what you're using. So store the processed results, not the raw ones. Why process it every time? Try to cache it in a format as close as possible to the actual output.
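For example, a minimal file cache for the already-processed feed markup might look like this; the path, the 10-minute lifetime and the render callback are only illustrative:

```php
<?php
// Minimal file-cache sketch for the processed feed HTML, along the
// lines suggested above; path, TTL and callback names are assumptions.
function get_feed_html(string $cacheFile, callable $buildHtml, int $ttl = 600): string
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);    // still fresh
    }

    // Rebuild: fetch the raw feed, process it, and store only the final
    // markup so none of the processing is repeated on the next hit.
    $html = $buildHtml();
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}

// Usage (illustrative): the callback does the expensive fetch + formatting.
// echo get_feed_html('/tmp/fb_feed.html', 'render_facebook_feed');
```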
Since you use a VPS, I don't think you'll have an enormous number of visitors, so APC (although very nice) isn't really needed. If you do want a memory cache, you could take a look at XCache:
http://xcache.lighttpd.net/
I've been working on a website lately and want to speed up my application.
I want to cache my users' pages, but the pages are dynamic: if someone posts a new feed, the homepage is updated with that new feed. If I cache the homepage for one user and a friend of his posts a new feed, I want that cache to expire, so that the next time he visits the homepage the application contacts the database, fetches the new feeds, and caches them.
I'm using memcache with PHP and MySQL for my DB.
I have tables called friends, feeds and users.
Would it be efficient to cache every user's friends, and when a user posts a feed, have my app fetch his/her friends and cache a notification under their user IDs, so that when those friends log in, the app checks on every page whether there is a notification to act on (in this case, deleting the cached homepage)?
Regards,
Resul
Profile your application and locate places where you access data that is expensive to fetch (or calculate). Those places are good places to start with memcached, unless you're doing more writes than reads (where you'd likely have to update the cache more often than you could make use of it).
Caching everything you ever access could well lead to nothing but a rather full memcached holding mostly data that is rarely accessed (while potentially pushing out of the cache the things you actually should cache). In many cases you shouldn't use memcached as a 1:1 copy of your database in key-value form.
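For the expensive spots that profiling turns up, the usual pattern is a read-through cache along these lines; the key name, lifetime and loader callback below are illustrative:

```php
<?php
// Read-through pattern for expensive fetches: try the cache first and
// only hit the database on a miss. Key, TTL and loader are illustrative.
function cache_through(Memcached $mc, string $key, int $ttl, callable $load)
{
    $value = $mc->get($key);
    if ($mc->getResultCode() === Memcached::RES_NOTFOUND) {
        $value = $load();            // expensive DB query or computation
        $mc->set($key, $value, $ttl);
    }
    return $value;
}

// Usage (illustrative): cache a user's home feed for 5 minutes; to expire
// it when a friend posts, simply $mc->delete("homefeed:$friendId").
// $feed = cache_through($mc, "homefeed:$userId", 300, 'load_feed_from_db');
```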
Before you even start on server-side optimizations, you should run YSlow and try to get an A rating. Take a hard look at your JavaScript too. If you are using jQuery, then getting rid of it would vastly improve the overall performance of the site. Front-end optimization is usually much more important.
The next step would be cleaning up the server-side code. Try testing your SQL queries with EXPLAIN and see if you are missing some indexes. Then do some profiling on the PHP side with Xdebug to see where the bottlenecks are.
And only then start messing with caching. As for Memcached: unless your website runs on top of a cluster of servers, you do not need it; it might even be harmful. If your site lives on a single box, you will get much better results with APC, which, unlike Memcached, is not distributed by nature.
Write a class that handles all the DB queries, caches the tables, and runs the queries against the cached tables instead of your DB. Update your cache each time you do an INSERT or UPDATE on a table.
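A bare-bones sketch of that idea, caching per-table result sets and dropping them on writes; the class and method names are illustrative, and the array cache here lives only for the current request (a persistent store such as memcache or APC could back it instead):

```php
<?php
// Bare-bones sketch of the wrapper class described above: SELECT results
// are cached per table, and any write to a table drops that table's cache.
// Class and method names are illustrative only; the in-memory array cache
// lasts for one request, a persistent store could be swapped in.
class CachingDb
{
    /** @var array<string, array<string, array>> cached result sets, per table */
    private $cache = [];
    private $db;

    public function __construct(mysqli $db)
    {
        $this->db = $db;
    }

    public function select(string $table, string $sql): array
    {
        if (!isset($this->cache[$table][$sql])) {
            $this->cache[$table][$sql] = $this->db->query($sql)->fetch_all(MYSQLI_ASSOC);
        }
        return $this->cache[$table][$sql];
    }

    public function write(string $table, string $sql): void
    {
        $this->db->query($sql);
        unset($this->cache[$table]);   // invalidate everything cached for that table
    }
}
```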
I have a website that lets each user create a web page (to advertise his product). Once the page is created it will never be modified again.
Now, my question: is it better to keep the page content (only a few parts are editable) in a MySQL database and generate the page with queries every time it is accessed, or to create a static web page containing all the info and store it on the server?
If I store every page on disk, I may reach something like 200,000 files.
If I store each page in the MySQL database I would have to run a query each time the page is requested, and with about 200,000 entries and 5-6 queries/second I think the website will be slow...
So what's better?
MySQL will be able to handle the load if you design the tables properly (normalized and indexed). But since the content of a page doesn't change after creation, it's better to cache the page statically. You can organize the files into buckets (folders) so that no single folder ends up with too many files in it.
Remember to cache only the content areas and not the templates, unless each user has complete control over how his/her page shows up.
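One common way to bucket the files is by hash prefix, so no directory ever holds more than a tiny fraction of the 200,000 pages; the two-level layout and base directory below are just one possible convention:

```php
<?php
// Bucket static page files into folders by md5 prefix, as suggested above,
// so no single directory holds an unmanageable number of entries.
// The base directory and two-level layout are illustrative.
function static_page_path(string $pageId, string $baseDir = '/var/www/pages'): string
{
    $hash = md5($pageId);
    // e.g. /var/www/pages/ab/cd/abcdef...html
    return sprintf('%s/%s/%s/%s.html',
        $baseDir, substr($hash, 0, 2), substr($hash, 2, 2), $hash);
}

$path = static_page_path('user-4711-product-page');   // illustrative page id
if (!is_dir(dirname($path))) {
    mkdir(dirname($path), 0755, true);                 // create bucket folders on demand
}
file_put_contents($path, '<html>...rendered page...</html>');
```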
200,000 files writable by the Apache process is not a good idea.
I recommend using a database.
Database imports/exports are easier, not to mention the difference in maintenance costs.
Databases use caching, and if nothing has changed they will return the last result without running the query again. This doesn't stand, thanks JohnP.
If you ever want to redesign your pages later, you should use MySQL to store them, as you can't really change static pages (unless you dig into regexps) after generating them.
As for the time issue: it's not an issue if you set your indexes right.
If the data is small to moderate, prefer static hardcoding, i.e. putting the data directly in the HTML; but if it is huge, computational, or dynamic and changing, you have no option but to use a database connection.
I believe that a proper caching technique with the right attributes (a long expiration time) would be better than static pages or retrieving everything from MySQL every time.
Static content is usually a good thing if you have a lot of traffic, but 5-6 queries a second is not hard for the database at all, so with your current load it doesn't matter.
You can spread the static files across different directories by file name and set up rewrite rules in your web server (mod_rewrite on Apache, basic location matching with regexps on Nginx, and similar on other web servers). That way you won't even have to invoke the PHP interpreter.
A database and proper caching. 200,000 pages times, what, 5KB each? That's 1GB, easy to keep in RAM. Besides, 5-6 queries per second is easy for a database. Program first, then benchmark.
// insert quip about premature optimisation
First of all, the website I run is hosted, and I don't have access to install anything interesting like memcached.
I have several web pages displaying HTML tables. The data for these HTML tables is generated using expensive and complex MySQL queries. I've optimized the queries as far as I can and put indexes in place to improve performance. The problem is that under high traffic the MySQL server gets hammered and struggles.
Interestingly, the data within the MySQL tables doesn't change very often. In fact it changes only after a certain 'event' that takes place every few weeks.
So what I have done now is this:
Save the HTML table once generated to a file
When the URL is accessed, check whether the saved file exists
If the file is older than 1 hour, run the query and save a new file; if not, output the saved file
This ensures that for the vast majority of requests the page loads very fast, and the data is at most 1 hour old. For my purposes this isn't too bad.
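In code, the approach described above comes down to something like this; the cache path and the hypothetical build_table_html() helper are illustrative:

```php
<?php
// Sketch of the scheme above: serve the saved copy of the generated
// table and regenerate it once it is more than an hour old.
// $cacheFile and build_table_html() are illustrative names.
$cacheFile = __DIR__ . '/cache/report_table.html';
$maxAge    = 3600; // 1 hour

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
    readfile($cacheFile);                      // fresh enough: no query at all
} else {
    $html = build_table_html($mysqli);         // the expensive queries + rendering
    file_put_contents($cacheFile, $html, LOCK_EX);
    echo $html;
}
```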
What I would really like is a guarantee that if any data changes in the database, the cache file is deleted. This could be done by finding all the scripts that run change queries on the table and adding code to remove the cache file, but it's flimsy, as all future changes would also need to take care of this mechanism.
Is there an elegant way to do this?
I don't have anything but vanilla PHP and MySQL (recent versions). I'd like to play with memcached, but I can't.
Ok - serious answer.
If you have any sort of database abstraction layer (hopefully you do), you could maintain a field in the database for the last time anything was updated, and manage that from a single point in your abstraction layer.
e.g. (pseudocode): On any update set last_updated.value = Time.now()
Then compare this to the time of the cached file at runtime to see if you need to re-query.
If you don't have an abstraction layer, create a wrapper function to any SQL update call that does this, and always use the wrapper function for any future functionality.
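A hedged sketch of that single point of update, assuming a one-row site_meta table with a last_updated column (both names are assumptions for illustration):

```php
<?php
// Sketch of the single-point-of-update idea: every write goes through one
// wrapper that bumps a last_updated row, and the cache file is stale
// whenever that timestamp is newer than the file. The site_meta table
// and last_updated column are assumptions for illustration.
function db_write(mysqli $db, string $sql): void
{
    $db->query($sql);
    $db->query("UPDATE site_meta SET last_updated = NOW()");
}

function cache_is_fresh(mysqli $db, string $cacheFile): bool
{
    if (!is_file($cacheFile)) {
        return false;
    }
    $row = $db->query("SELECT UNIX_TIMESTAMP(last_updated) AS ts FROM site_meta")->fetch_assoc();
    return $row !== null && filemtime($cacheFile) >= (int) $row['ts'];
}
```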
There are only two hard things in Computer Science: cache invalidation and naming things. —Phil Karlton
Sorry, doesn't help much, but it is sooooo true.
You have most of the ends covered, but a last_modified field and cron job might help.
There's no way of deleting files from within MySQL; Postgres would give you that facility, but MySQL can't.
You can cache your output to a string using PHP's output buffering functions. Google it and you'll find a nice collection of websites explaining how it's done.
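The core of it is only a few lines; the cache path and one-hour lifetime below are only examples:

```php
<?php
// Quick sketch of page caching with output buffering, as mentioned above;
// the cache path and the one-hour lifetime are only examples.
$cacheFile = __DIR__ . '/cache/' . md5($_SERVER['REQUEST_URI']) . '.html';

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < 3600) {
    readfile($cacheFile);
    exit;
}

ob_start();                      // capture everything the page prints
// ... run the queries and render the HTML table here ...
$html = ob_get_clean();          // stop buffering and grab the output

file_put_contents($cacheFile, $html, LOCK_EX);
echo $html;
```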
I'm wondering, however: how do you know that the data expires after an hour? Or are you assuming the data won't change dramatically enough within 60 minutes to warrant constant page regeneration?