I have a few sites with Twitter & Facebook feeds, and one that references a health club schedule (quite a large, complicated data tree). I am starting to get into caching to improve page load times, and I'm also interested in keeping bandwidth usage down, as these sites are hosted on our own VPS.
Right now I have the Twitter and Facebook feeds each serialized/unserialized to a simple data file, rewritten every 10 minutes. Would it be better to write this data to the MySQL database? And if so, what is a good method for accomplishing this?
Also, the Twitter feed result contains only what I need, so it is nice and small (the 3 most recent tweets). The Facebook result, however, is larger, and I sort through it with PHP for display - should I store THAT processed result or the raw feed? Does it matter?
For the other, larger JSON object, would the file vs. MySQL recommendation be the same?
I appreciate any insights and would be happy to show an example of the JSON schedule object if it makes a difference.
P.S. APC is not a viable option as it seemed to break all my WordPress installs yesterday. However, we are running on FastCGI.
If it's just a cache I would go for a file, but I don't think it will really matter. Unless of course you have thousands or millions of these cache files, then MySQL should be the way to go. If you are doing anything else with the cache (like storing multiple versions or searching in the text) then I would go for MySQL.
As for speed, only cache what you're actually using. So store the processed results, not the raw ones - why process them every time? Try to cache the data in a format as close to the final output as possible.
Since you use a VPS, I don't think you'll have an enormous number of visitors, so APC (although very nice) isn't really needed. If you do want a memory cache, you could take a look at XCache:
http://xcache.lighttpd.net/
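To make the file approach concrete, here is a minimal sketch of the pattern you already use, with a TTL check and the "store the processed result, not the raw feed" advice baked in (the file path and the fetch_and_process_tweets() helper are illustrative, not an existing API):

```php
<?php
// Minimal file cache for already-processed feed data with a 10-minute TTL.
$cacheFile = __DIR__ . '/cache/tweets.ser';
$ttl       = 600; // 10 minutes

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
    // Fresh enough: read the processed tweets straight from disk.
    $tweets = unserialize(file_get_contents($cacheFile));
} else {
    // Stale or missing: rebuild the processed data and rewrite the file.
    $tweets = fetch_and_process_tweets(); // hypothetical helper that hits the API
    file_put_contents($cacheFile, serialize($tweets), LOCK_EX);
}
```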
Related
I'm working on a site that has a store locator built in.
Having developed similar sites in the past, I have run into trouble when search peaks hit the database (MySQL) hard.
All of these past location search engines queried the database to get their results.
Now I have taken a different approach, but since I'm not 100% sure about it, I thought that asking this great community could either make me feel more secure about this direction or convince me to stick with what I did before.
So for this new search, instead of hitting the database for requests, I'm serving the search with a JSON file that regenerates (querying the database) only when something is updated, created or deleted on the locations list.
My doubt is: can a high load of requests against the JSON file have the same effect as a high load of query requests against the database?
Is serving the search results from a JSON file, to lower the impact on the db (and server resources), a good approach, or is it a bad idea?
Maybe someone out there has had to make the same decision and can share the experience with me, or maybe you just know how things really are and can recommend a particular approach.
Flat files are the poor man's db and can be even more problematic than a heavily pounded database. For example, reading and writing the file still requires a lock, and it will not scale, since the same file may not be accessible to all app servers.
My suggestion would be any one of the following:
Benchmark your current hardware, identify bottlenecks, scale out or up accordingly.
Implement a caching layer; this will save on costly queries for read-only data.
Consider higher-performance storage solutions such as Aerospike or Redis.
Implement a real full-text search engine such as Elasticsearch or Solr.
Response to comment #1:
You could accomplish the same thing without having to read/write a flat file (which must be accessible by all app servers) by caching the data. Here's just a quick-and-dirty rundown of how I would do it:
Zip + 10 miles:
Query the database, pull the store data, json_encode it, and store it in the cache using a key construct like 92562_10. Now when other users enter 92562 + 10 miles, they will pull the data from the cache instead of the database (or flat file).
City, State + 50 miles:
Same as above, except key construct may look like murrieta_ca_50.
But with the caching layer you get better performance, and the cache server will be available to all your app servers, which would be much easier than having to install/configure NFS to share the file on a network.
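Here's a rough sketch of that cache-keyed-by-location idea using the Memcached extension; the build_store_results() helper and the one-hour TTL are placeholders for your own query and policy:

```php
<?php
// Cache store-locator results under a zip_radius key so repeat searches
// skip the database. build_store_results() is a hypothetical query helper.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$zip    = '92562';
$radius = 10;
$key    = $zip . '_' . $radius; // e.g. "92562_10"

$json = $cache->get($key);
if ($json === false) {
    // Cache miss: hit the database once, encode, keep it for an hour.
    $rows = build_store_results($zip, $radius);
    $json = json_encode($rows);
    $cache->set($key, $json, 3600);
}

header('Content-Type: application/json');
echo $json;
```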
I am running an application (built on PHP & MySQL) on a VPS. I have an article table with millions of records in it. Whenever a user logs in, I display the last 50 records for each section.
So every time a user logs in or refreshes the page, it executes a SQL query to get those records. There are now lots of users on the website, and because of that my page speed has dropped significantly.
I did some research on caching and found that I can read the MySQL data per section and number of articles, e.g. (section = 1, number of articles = 50), and store it in a disk file such as cache/md5(section no.).
Then, when I get future requests for that section, I can just get the data from cache/md5(section no).
The above solution looks great, but before I go ahead I would really like to clarify a few doubts with the experts here:
Will it really speed up my application? (I know disk I/O is faster than a MySQL query, but I don't know by how much.)
I am currently using pagination on my page: display the first 5 articles, and when the user clicks "display more", display the next 5 articles, etc. This is easy to do in a MySQL query, but I have no idea how I should do it if I store all 50 records in a cache file. If someone could share some info, that would be great.
Any alternative solution if you believe the above will not work?
Any open-source application (PHP) you know of?
Thank you in advance
Regards,
Raj
I ran into the same issue where every page load results in 2+ queries being run. Thankfully they're very similar queries being run over and over so caching (like your situation) is very helpful.
You have a couple options:
offload the database to a separate VPS on the same network to scale it up and down as needed
cache the data from each query and try to retrieve from the cache before hitting the database
In the end we chose both, installing Memcached and its PHP extension for query caching purposes. Memcached is a key-value store (much like PHP's associative array) with a set expiration time, measured in seconds, for each value stored. Since it stores everything in RAM, the tradeoff for volatile cache data is extremely fast read/write times, much better than the filesystem.
Our implementation was basically to run every query through a filter; if it's a select statement, cache it by setting the Memcached key to "namespace_[md5 of query]" and the value to a serialized version of an array with all resulting rows. Caching for 120 seconds (2 minutes) should be more than enough to help with the server load.
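A bare-bones sketch of that filter, assuming PDO plus the Memcached PHP extension (the key prefix and helper wiring are examples, not our exact code):

```php
<?php
// Run SELECTs through a cache: key is a namespace prefix plus the md5 of
// the query (and its parameters), value is the full result set, TTL 120s.
function cached_select(PDO $pdo, Memcached $cache, $sql, array $params = array())
{
    $key = 'articles_' . md5($sql . serialize($params));

    $rows = $cache->get($key);
    if ($rows !== false) {
        return $rows; // served from RAM, no query
    }

    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $cache->set($key, $rows, 120); // 2 minutes
    return $rows;
}

// Usage: last 50 articles for section 1.
// $articles = cached_select($pdo, $cache,
//     'SELECT * FROM article WHERE section = ? ORDER BY id DESC LIMIT 50', array(1));
```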
If Memcached isn't a viable solution, store all 50 articles for each section as an RSS feed. You can pull all articles at once, grabbing the content of each article with SimpleXML and wrapping it in your site's article template HTML, as per the site design. Once the data is there, use CSS styling to only display X articles, using JavaScript for pagination.
Since two processes modifying the same file at the same time would be a bad idea, have adding a new story to a section trigger an event, which would add the story to a message queue. That message queue would be processed by a worker which does two consecutive things, also using SimpleXML:
Remove the oldest story at the end of the XML file
Add a newer story given from the message queue to the top of the XML file
If you'd like, the per-section RSS feeds can also be a publicly facing feature.
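For reference, a rough sketch of what that queue worker could look like with SimpleXML (plus a small DOM call to prepend); the feed path and $story fields are made up for illustration:

```php
<?php
// Worker step for one queued story: drop the oldest <item> and move the
// new one to the top of the section's feed file.
$feedFile = __DIR__ . '/feeds/section-1.xml';
$story    = array(
    'title'       => 'New article title',
    'link'        => 'http://example.com/articles/123',
    'description' => 'Article summary...',
);

$xml   = simplexml_load_file($feedFile);
$items = $xml->channel->item;

// 1. Remove the oldest story at the end of the feed.
unset($items[count($items) - 1]);

// 2. Append the new <item>, then move it above the current first item.
$new = $xml->channel->addChild('item');
$new->addChild('title', htmlspecialchars($story['title']));
$new->addChild('link', $story['link']);
$new->addChild('description', htmlspecialchars($story['description']));

$channel = dom_import_simplexml($xml->channel);
$channel->insertBefore(dom_import_simplexml($new), dom_import_simplexml($items[0]));

$xml->asXML($feedFile);
```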
I'm hoping to develop a LAMP application that will centre around a small table, probably less than 100 rows, maybe 5 fields per row. This table will need to have the data stored within accessed rapidly, maybe up to once a second per user (though this is the 'ideal', in practice, this could probably drop slightly). There will be a number of updates made to this table, but SELECTs will far outstrip UPDATES.
Available hardware isn't massively powerful (it'll be launched on a VPS with perhaps 512MB RAM) and it needs to be scalable - there may only be 10 concurrent users at launch, but this could rise to the thousands (and, as we all hope with these things, maybe 10,000s, but at that level there will be more powerful hardware available).
As such I was wondering if anyone could point me in the right direction for a starting point - all the data retrieved will be the same for all users, so I'm trying to investigate if there is any way of sharing this data across all users, rather than performing 10,000 identical selects a second. Soooo:
1) Would the MySQL query cache store these results and allow access to the data WITHOUT requiring a re-select for each user?
2) (Apologies for how broad this question is, I'd appreciate even the briefest of responses greatly!) I've been looking into the APC cache as we already use this for an opcode cache - is there a method of caching the data in the APC cache, and just doing one MySQL select per second to update this cache - and then just accessing APC for each user? Or perhaps an alternative cache?
Failing all of this, I may look into having a separate script which handles the queries and outputs the data, and somehow just piping this one script's data to all users. This isn't a fully formed thought and I'm not sure of the implementation, but perhaps a combo of AJAX to pull the outputted data from... "Somewhere"... :)
Once again, apologies for the breadth of these question - a couple of brief pointers from anyone would be very, very greatly appreciated.
Thanks again in advance
If you're doing something like an AJAX chat which polls the server constantly, you may want to look at node.js instead, which keeps an open connection between server and browser. This way, you can have changes pushed to the user when they happen and you won't need to do all that redundant checking once per second. This can scale very well to thousands of users and is written in javascript on the server-side, so not too difficult.
The problem with using the MySQL query cache is that the entire cache for a table gets invalidated on any write to that table. You're better off using a caching solution like memcached or APC if you're trying to control that behavior more precisely. And yes, APC would be able to cache that information.
One other thing to keep in mind is that you need to know when to invalidate the cache as well, so you don't have stale data.
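As a sketch of what that could look like with APC's data cache - at most one SELECT per second, every other request served from shared memory (fetch_rows_from_mysql() is a hypothetical stand-in for your actual query):

```php
<?php
// Cache the small shared table in APC with a 1-second TTL so at most one
// request per second actually runs the SELECT.
function get_shared_table()
{
    $rows = apc_fetch('shared_table', $hit);
    if ($hit) {
        return $rows; // served from APC shared memory, no SELECT
    }

    $rows = fetch_rows_from_mysql();     // hypothetical: the one real query
    apc_store('shared_table', $rows, 1); // expire after 1 second
    return $rows;
}
```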
You can use APC, XCache, or Memcache for database query caching, or you can use Varnish or Squid for gateway caching...
Hi, this is more of an information request really.
I'm currently working on a pretty large event listing website and have started thinking about some caching for the data sets being used.
I have been messing with APC this week and have seen some real improvements during testing; however, what I'm struggling to get my head around is the best practices and techniques required when trying to cache data that changes frequently.
Say, for example, the user hits the home page. By default this displays the latest 10 events happening, and if that user is logged in those events are location-specific. Is it possible to deploy some kind of caching system when dealing with logged-in states and data that changes frequently? The system currently allows the user to "show more events", which is an AJAX request to pull extra results from the db.
I haven't really found anything on this, as I'm not sure what to search for, but I'm really interested to know the techniques used in advanced caching systems, especially ones that deal with frequently changing data and user-specific data.
I mean, is it even worth it? Are there other performance boosters for dealing with this sort of requirement?
Any articles or tips and info on this will be greatly appreciated!! Please let me know if any other info is required!!
Your basic solutions are:
file cache
memcached/redis
APC
Each is used for a slightly different goal.
File cache is usually something that you utilize when you can pre-render files or parts of them. It is used in templating solutions, partial views (MVC), CSS frameworks. That sort of stuff.
Memcached and Redis are more or less equal, except Redis is more of a NoSQL-oriented thing. They are used for distributed caching (multiple servers, same cached data) and for storing sessions, if you have a cluster of web servers.
APC is good for two things: opcode caching and data caching. It is faster than memcached, but works on each server separately.
Bottom line: in a huge project you will use all of them, each for a different task.
So you have opcode caching, which speeds things up by saving already compiled PHP files in cache.
Then you have data caching, where you save variables or objects that take time to get, like data built from SQL queries.
Then you have output caching, which is where you save entire blocks of your webpages in files, and output those files instead of building that block of your webpage on each request.
I once wrote a blog post about how to do output caching:
http://www.spotlesswebdesign.com/blog.php?id=17
If it's location-specific, and there are a billion locations, your best bet is probably output caching, assuming you have a lot of disk space. But you will have to use your head as to what is best, as each situation is very different when it comes to how best to apply caching.
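For illustration, a bare-bones output-caching sketch along those lines; the cache path, the TTL, and the render_latest_events() helper are all made up:

```php
<?php
// Capture a rendered block once and replay it from disk until it expires.
$locationId = 'murrieta_ca'; // e.g. taken from the logged-in user's profile
$cacheFile  = __DIR__ . '/cache/events_' . md5($locationId) . '.html';
$ttl        = 300; // 5 minutes

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
    readfile($cacheFile); // serve the saved block, skip rendering entirely
} else {
    ob_start();
    render_latest_events($locationId); // hypothetical: echoes the HTML block
    $html = ob_get_clean();

    file_put_contents($cacheFile, $html, LOCK_EX);
    echo $html;
}
```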
If done correctly, using memcached or similar solutions can give huge boosts to site performance. By altering the cached data directly instead of rehydrating it from the database you can bypass the database entirely for data that either doesn't need to be saved or can be trivially rebuilt. Since the database is often the most critical component in web applications, any load you can take off it is a bonus.
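A small sketch of that "alter the cached data directly" pattern with the Memcached extension (the key names and $newEvent are illustrative):

```php
<?php
// Update the cached copy in place instead of re-querying the database.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

// Cheap counter that never touches the database.
$cache->add('page_views', 0);    // only sets the key if it doesn't exist yet
$cache->increment('page_views');

// Get-modify-set for structured data: prepend a new event to the cached
// list rather than rebuilding the whole list from SQL.
$newEvent = array('title' => 'Example event', 'starts' => '2012-06-01 19:00');
$events   = $cache->get('latest_events');
if ($events === false) {
    $events = array();
}
array_unshift($events, $newEvent);
$cache->set('latest_events', array_slice($events, 0, 10), 300);
```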
On the other hand, making sure your database queries are as light and efficient as possible will have a much larger impact on performance than most cache tweaks.
I need help finding the right caching solution for a client's site. The current site is CentOS, PHP, MySQL, and Apache using Smarty templates (I know they suck, but it was built by someone else). The current models/methods use a fairly good OO structure, but there are WAY too many queries being done for some of the simple page functions. I'm looking to find some sort of caching solution, but I'm a noob when it comes to this and don't know what is available that would fit the current site setup.
It is an auction type site with say 10 auctions displayed on one page at one time -- the time and current bid on each auction being updated via an ajax call returning json every 1 second (it's a penny auction site like beezid.com so updates every second are necessary). As you can see, if the site gets any sort of traffic the number of simultaneous requests could be huge. Obviously this data changes every second because the json data returned has the updated time left in the auction, and possibly updated bid amounts and bid users for each auction.
What I want is the ability to cache certain pages for a given amount of time or based on some other variable changing. For example, memory-caching the page that displays 10 auctions and only updating that cached copy when one of the auctions ends. Or even the script above that returns JSON string data every second: if I was able to cache the first request to this page in memory, serve the following requests from memory, and then re-cache it again after 1 second, that could potentially reduce the server load a lot. But I don't know if this is even possible, or if the overhead of doing something like this outweighs any request load savings.
I looked into XCache a bit, but I couldn't find a way to set a particular cache time on a specific page or based on other variables. Maybe I missed something, but does anyone have a recommendation for a caching scheme that would work for these requirements?
Mucho thanks for any input you might have...
Caching can be done using many methods. Memcached springs to mind as being suited to your task, but if the site is ultra busy you may run out of RAM.
When I do caching I often use a simple file cache. While it does involve at least one stat call to determine the freshness of the cached content, it is still fast and marginally better than calling a SQL server.
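Applied to your JSON endpoint, a 1-second file micro-cache could look something like this (build_auction_json() is a hypothetical stand-in for your existing query code):

```php
<?php
// 1-second file micro-cache in front of the auction JSON endpoint: the
// first request each second does the SQL work, everyone else reads disk.
$cacheFile = sys_get_temp_dir() . '/auctions_page1.json';

clearstatcache(); // don't trust a stale stat cache for the freshness check
if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < 1) {
    $json = file_get_contents($cacheFile);
} else {
    $json = build_auction_json(); // hypothetical: the expensive queries
    file_put_contents($cacheFile, $json, LOCK_EX);
}

header('Content-Type: application/json');
echo $json;
```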
If you must call a SQL server, then it may pay to use a MEMORY (HEAP) table to store much of the precomputed data. This technique is no more efficient than memcached (probably less so), but it saves you installing memcached.
DC
Zend_Cache can do what you want, and a lot more. It supports a lot of backends, including xcache and memcache, and allows you to cache data, full pages, partial pages, and well, just about anything you can imagine :p.
And in case you are wondering: you can use the Zend_Cache component by itself; you don't have to use the complete Zend Framework for your application.
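For example, a quick sketch of Zend_Cache used standalone with the File backend (the lifetime, cache dir, cache id, and the load_auctions_from_db() helper are just examples; it assumes the Zend library is on your include path):

```php
<?php
// Zend_Cache used on its own with the File backend (ZF1-style API).
require_once 'Zend/Cache.php';

$frontendOptions = array(
    'lifetime'                => 60,   // seconds
    'automatic_serialization' => true, // so arrays/objects can be cached
);
$backendOptions = array('cache_dir' => '/tmp/zend_cache/');

$cache = Zend_Cache::factory('Core', 'File', $frontendOptions, $backendOptions);

$auctions = $cache->load('auction_page_1');
if ($auctions === false) {
    $auctions = load_auctions_from_db(); // hypothetical query helper
    $cache->save($auctions, 'auction_page_1');
}
```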