I need help finding the right caching solution for a client's site. The current stack is CentOS, PHP, MySQL and Apache, with Smarty templates (I know they suck, but it was built by someone else). The current models/methods use a fairly good OO structure, but there are WAY too many queries being run for some of the simple page functions. I'm trying to find some sort of caching solution, but I'm a noob when it comes to this and don't know what is available that would fit the current site setup.
It is an auction-type site with, say, 10 auctions displayed on one page at a time. The time left and the current bid on each auction are updated via an AJAX call that returns JSON every second (it's a penny auction site like beezid.com, so updates every second are necessary). As you can see, if the site gets any sort of traffic, the number of simultaneous requests could be huge. Obviously this data changes every second, because the JSON returned has the updated time left in the auction, and possibly updated bid amounts and bidders for each auction.
What I want is the ability to cache certain pages for a given amount of time, or until some other variable changes. For example, caching in memory the page that displays 10 auctions and only updating the cached copy when one of the auctions ends. Or even the script above that returns JSON every second: if I were able to cache the first request to this page in memory, serve the following requests from memory, and then re-cache it after 1 second, that could potentially reduce the server load a lot. But I don't know if this is even possible, or if the overhead of doing something like this outweighs any savings in request load.
I looked into XCache a bit, but I couldn't find a way to set a particular cache time on a specific page, or to cache based on other variables. Maybe I've missed something, but does anyone have a recommendation for a caching scheme that would work for these requirements?
Mucho thanks for any input you might have...
Caching can be done in many ways. Memcached springs to mind as being suited to your task, but if the site is ultra busy you may run out of RAM.
When I do caching I often use a simple file cache. It does involve at least one stat call to determine the freshness of the cached content, but it is still fast and marginally better than calling a SQL server.
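A minimal sketch of that kind of file cache, assuming the cached content is a string (for example the JSON for the auction list). The function name file_cache_get and the $produce callback are made up for illustration; they are not from any library.

```php
<?php
/**
 * Return the cached content in $path if it is younger than $ttl seconds,
 * otherwise call $produce(), store its result, and return it.
 */
function file_cache_get(string $path, int $ttl, callable $produce): string
{
    clearstatcache(true, $path);      // PHP remembers stat results within a request
    $mtime = @filemtime($path);       // the one stat call; false if the file is missing
    if ($mtime !== false && (time() - $mtime) < $ttl) {
        $cached = @file_get_contents($path);
        if ($cached !== false) {
            return $cached;           // still fresh: no SQL needed
        }
    }

    $fresh = $produce();              // e.g. run the queries and json_encode() the rows

    // Write to a temporary file and rename it into place, so a concurrent
    // reader never sees a half-written cache file.
    $tmp = $path . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $fresh);
    rename($tmp, $path);

    return $fresh;
}
```

With a $ttl of 1, only the first request in any given second pays for the queries; the others just read the file. Whether that beats your current load is something to measure rather than assume.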
If you must call a SQL server, it may pay to use a MEMORY (HEAP) table to store much of the precomputed data. This technique is no more efficient than memcached, and probably less so, but it saves you installing memcached.
DC
Zend_Cache can do what you want, and a lot more. It supports a lot of backends, including xcache and memcache, and allows you to cache data, full pages, partial pages, and well, just about anything you can imagine :p.
And in case you are wondering: you can use the Zend_Cache component by itself; you don't have to use the complete Zend Framework for your application.
Related
I have an application that is fetching several e-commerce websites using Curl, looking for the best price.
This process returns a table comparing the prices of all searched websites.
But now we have a problem: the number of stores is starting to increase, and the loading time is currently unacceptable from a user-experience standpoint (currently a 10 s page load).
So we decided to create a database and store all the filtered Curl results in it, in order to reduce the DNS calls and improve page load time.
I want to know: given all that, is there still an advantage to implementing a Memcache module?
I mean, will it help even more, or is it just a waste of time?
The Memcache idea was inspired by this topic, from someone who had a similar problem: Memcache to deal with high latency web services APIs - good idea?
Memcache could be helpful, but (in my opinion) it's kind of a weird way to approach the issue. If it was me, I'd go about it this way:
Firstly, I would indeed cache everything I could in my database. When the user searches, or whatever interaction triggers this, I'd show them a "searching" page with whatever results the server currently has, and a progress bar that fills up as the asynchronous searches complete.
I'd use AJAX to add additional results as they become available. I'm imagining that the search takes about ten seconds - it might take longer, and that's fine. As long as you've got a progress bar, your users will appreciate and understand that Stuff Is Going On.
Obviously, the more searches go through your system, the more up-to-date data you'll have in your database. I'd use cached results that are under a half-hour old, and I'd also record search terms and make sure I kept the top 100 (or so) searches cached at all times.
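A rough sketch of those two rules, assuming you store a Unix timestamp with each cached result and keep a count per search term. The function names and the array shape are hypothetical.

```php
<?php
/**
 * True if a result fetched at $fetchedAt (Unix time) is still young enough
 * to reuse. The default of 1800 seconds is the "under a half-hour" rule.
 */
function is_fresh(int $fetchedAt, int $now, int $maxAge = 1800): bool
{
    return ($now - $fetchedAt) < $maxAge;
}

/**
 * Given a map of search term => number of searches, return the $limit most
 * frequent terms, most popular first. These are the ones worth keeping
 * cached at all times.
 */
function top_searches(array $counts, int $limit = 100): array
{
    arsort($counts);   // sort by count, descending, keeping the terms as keys
    return array_keys(array_slice($counts, 0, $limit, true));
}
```

Anything that is_fresh() rejects gets searched again in the background while the user watches the progress bar.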
Know your customers and have what they want available. This doesn't have much to do with any specific technology, but it is all about your ability to predict what they want (or write software that predicts for you!)
Oh, and there's absolutely no reason why PHP can't handle the job. Tying together a bunch of unrelated interfaces is one of the things PHP is best at.
The solution lies outside PHP alone. Do not bother hacking together a result in PHP at request time when a cron job could easily be used to populate your database and your PHP script can simply query that database.
If you plan to stick with PHP only, I suggest you change your script to read from the database you have already populated. To populate it, have a cron job hit a PHP script that is not accessible to users and that performs all of your Curl work.
I'm hoping to develop a LAMP application that will centre around a small table, probably less than 100 rows, maybe 5 fields per row. This table will need to have the data stored within accessed rapidly, maybe up to once a second per user (though this is the 'ideal', in practice, this could probably drop slightly). There will be a number of updates made to this table, but SELECTs will far outstrip UPDATES.
Available hardware isn't massively powerful (it'll be launched on a VPS with perhaps 512 MB RAM) and it needs to be scalable - there may only be 10 concurrent users at launch, but this could rise to the thousands (and, as we all hope with these things, maybe 10,000s, but at this level there will be more powerful hardware available).
As such I was wondering if anyone could point me in the right direction for a starting point - all the data retrieved will be the same for all users, so I'm trying to investigate if there is any way of sharing this data across all users, rather than performing 10,000 identical selects a second. Soooo:
1) Would the mysql_query_cache cache these results and allow access to the data, WITHOUT requiring a re-select for each user?
2) (Apologies for how broad this question is, I'd appreciate even the briefest of responses greatly!) I've been looking into the APC cache as we already use this for an opcode cache - is there a method of caching the data in the APC cache, and just doing one MySQL select per second to update this cache - and then just accessing the APC for each user? Or perhaps an alternative cache?
Failing all of this, I may look into having a separate script which handles the queries and outputs the data, and somehow just piping this one script's data to all users. This isn't a fully formed thought and I'm not sure of the implementation, but perhaps a combo of AJAX to pull the outputted data from... "Somewhere"... :)
Once again, apologies for the breadth of these questions - a couple of brief pointers from anyone would be very, very greatly appreciated.
Thanks again in advance
If you're doing something like an AJAX chat which polls the server constantly, you may want to look at node.js instead, which keeps an open connection between server and browser. This way, you can have changes pushed to the user when they happen and you won't need to do all that redundant checking once per second. This can scale very well to thousands of users and is written in JavaScript on the server side, so it is not too difficult.
The problem with using the MySQL query cache is that every cached result for a table gets invalidated on any write to that table. You're better off using a caching solution like memcached or APC if you're trying to control that behavior more precisely. And yes, APC would be able to cache that information.
One other thing to keep in mind is that you need to know when to invalidate the cache as well, so you don't have stale data.
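As a sketch of what that could look like: the class below does one load per TTL and lets you drop the entry after a write. It uses the APCu functions (apcu_fetch, apcu_store, apcu_delete), which are the current equivalent of the old apc_* user-cache calls; that swap is my assumption about your setup. The class name SharedCache is made up, and the array fallback is only there so the example runs without the extension (it is not shared between requests).

```php
<?php
final class SharedCache
{
    /** Fallback storage, used only when APCu is not loaded or not enabled. */
    private static array $local = [];

    private static function hasApcu(): bool
    {
        return function_exists('apcu_enabled') && apcu_enabled();
    }

    /** Return the cached value for $key, calling $load at most once per $ttl seconds. */
    public static function remember(string $key, int $ttl, callable $load)
    {
        if (self::hasApcu()) {
            $value = apcu_fetch($key, $hit);
            if ($hit) {
                return $value;
            }
        } elseif (isset(self::$local[$key]) && self::$local[$key][1] > microtime(true)) {
            return self::$local[$key][0];
        }

        $value = $load();   // e.g. the single SELECT on the small table

        if (self::hasApcu()) {
            apcu_store($key, $value, $ttl);
        } else {
            self::$local[$key] = [$value, microtime(true) + $ttl];
        }
        return $value;
    }

    /** Call right after an UPDATE so nobody is served the old rows. */
    public static function forget(string $key): void
    {
        if (self::hasApcu()) {
            apcu_delete($key);
        } else {
            unset(self::$local[$key]);
        }
    }
}
```

With a TTL of 1 second, 10,000 users polling once a second would cost roughly one SELECT per second per web server instead of 10,000.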
You can use APC, XCache or memcache for database query caching, or you can use Varnish or Squid for gateway caching.
I am considering enabling Memcache support for my large-scale REST service. However I have some questions regarding best approaches for these key-value stores.
The setup:
A database wrapper which has functions for select, update, etc.
A REST framework which contains all the API functions (getUser, createUser, etc.)
In my head, the ideal approach would be to integrate Memcache into the database wrapper so that, for example, every SQL query gets md5-hashed and saved in the cache (this is, by the way, what most online resources suggest). However, there is an obvious problem with this approach: if a search query has been cached, and one of the users in the search result is updated after the result was cached, the change won't show up in the next request (because the old result is still in the cache).
As I see it, I have several ways of handling this:
Implement Memcache in the REST framework for each function (getUser, createUser, etc.) and thereby explicitly handle updating the cache when users get updated. This could end up as redundant code.
Let the cached values expire very quickly and live with the fact that some requests show old cached values.
Do a more advanced implementation of Memcache in the database wrapper so that I can identify which parts (e.g. users) to update in, for example, a search request.
Could you guide me as to which of the above, or a completely different approach, to take?
Thanks in advance.
Enabling cache for a web application is not something to take lightly.
Maybe you have done this already, but... I recommend you first come up with a goal based on business needs or forecasts (e.g. must accept 1000 requests per second), then properly stress-test your system to get numbers before you start changing anything, and then identify your bottleneck.
http://en.wikipedia.org/wiki/Performance_tuning
I usually use profiling tools such as XHProf (by Facebook).
https://github.com/facebook/xhprof
Caching all your data to mirror your database might not be the best approach.
Find out how much memory you can allocate for your cache. If your architecture only allows you to allocate 100 MB for memcache, that will affect your decisions about what you cache and for how long.
The best cache is to cache forever. But we all know that data changes. You can start by caching data that is requested often and requires the most resources to fetch.
Always make sure you are not spending effort on something that will yield only a small improvement.
Without understanding your architecture in depth, it would be hazardous for anyone to recommend the caching strategy that best fits your needs.
Maybe you should cache the resulting output of your web services instead? Using a reverse proxy, for example (what @Darrel is talking about), or using output buffering...
http://en.wikipedia.org/wiki/Reverse_proxy
http://php.net/manual/en/book.outcontrol.php
Optimize your database queries before you think about caching. Make sure you use a PHP opcode cache (like APC) and all those things that are standard practice.
http://phplens.com/lens/php-book/optimizing-debugging-php.php
http://blog.digitalstruct.com/2008/01/31/performance-tuning-overview/
If you want to cache data and prevent stale/old data from being served, the trick is to identify your data (by primary key, maybe?) and, when the data is updated or deleted, delete or update the cache entry for that identifier.
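<?php
// $memcache is assumed to be an already connected Memcached (or Memcache) instance.
$cacheKey = 'user:' . $userId; // prefix the key so user IDs cannot collide with other cached items

// After inserting into the DB, you can also put the row in the cache
$memcache->set($cacheKey, $userData);

// After updating or deleting the user, delete the cached copy (or overwrite it with set())
$memcache->delete($cacheKey);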
A lot of sites will show stale data. When I am on Stack Overflow and my reputation increases, and I then go into the Stack Overflow chat, the reputation shown is my old reputation. When I reached a reputation of 20 (the reputation required to chat) I still could not chat for another 5 minutes, because the chat system had my old reputation data and did not yet know my reputation had increased enough to allow me to chat. Some data can be stale, while other types of data should never be stale. Consider that when caching data.
Conclusion
Your approaches can all be valid depending on the factors I talk about above. In fact, you can use a combination of them for the different types of data you want to cache, depending on how long it is acceptable to show old data for each. Maybe the categories or the list of countries (since they do not change often) can be cached for a long time, while the reputation (or whatever data changes all the time for all users) should be cached for a short period only.
What are the advantages and disadvantages of caching in web development in PHP, and how does it affect the database?
Caching works in many different ways, but for PHP specifically I can think of a few ways;
Database calls; they are slow, require computation, and can be quite intensive. If you've got repeated calls, caching the query is golden. There are two levels: the PHP side, where you control the cache, and the database side, where the database server does.
Running PHP code means the web server calls the PHP interpreter, which parses the code and then runs it. A PHP opcode cache can cache the parsing part and go straight to the running part. Then there's the next generation, which compiles PHP code directly to C++ and runs it from there (like Facebook does).
Computations; if you're doing math or heavy lifting in a repeated operation, you can cache the result instead of calculating it every time.
Advantages;
speed
less resources used
reuse
being smart
Disadvantages;
stale data
overhead
complexity
I'll only deal with the disadvantages here;
First, stale data; this means that when you use cached content/data you are at risk of presenting old data that's no longer relevant to the new situation. If you've cached a query of products, but in the meantime the product manager has deleted four products, the users will get listings for products that don't exist. There's a great deal of complexity in figuring out how to deal with this, but mostly it's about creating hashes/identifiers for cache entries that mean something to the state of the data in the cache, or business logic that resets the cache (or updates it, or appends to it) with the new bits of data. This is a complicated field, and depends very much on your requirements.
Then overhead is all the business logic you use to make sure your data sits somewhere between being fast and being stale, which leads to complexity, and complexity leads to more code that you need to maintain and understand. You'll easily lose track of where data exists in the caching layers, at what level, and how to fix the stale data if you get it. It can easily get out of hand, so instead of doing caching with complex logic you fall back on simple timestamps, and just say that a query is cached for a minute or so, and hope for the best (which, admittedly, can be quite effective and not too crazy). You could give your cache entries lifetimes (say, it will live X minutes in the cache) vs. access limits (it will live for 10 requests) vs. timed expiry (it will live until 10pm) and variations thereof. The more variation, the more complexity, of course.
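To make those three expiry styles concrete, here is a small sketch. The ExpiringEntry class is made up for illustration and keeps everything in one PHP object; a real cache backend would store the expiry alongside the value.

```php
<?php
/** One cache entry that can expire by time, by number of reads, or both. */
final class ExpiringEntry
{
    private $value;
    private ?int $expiresAt;   // Unix time at which the entry dies, or null for "never"
    private ?int $readsLeft;   // remaining reads, or null for "unlimited"

    public function __construct($value, ?int $expiresAt = null, ?int $readsLeft = null)
    {
        $this->value = $value;
        $this->expiresAt = $expiresAt;
        $this->readsLeft = $readsLeft;
    }

    /** Returns [hit, value]. A miss means the caller has to rebuild the data. */
    public function read(int $now): array
    {
        if ($this->expiresAt !== null && $now >= $this->expiresAt) {
            return [false, null];
        }
        if ($this->readsLeft !== null) {
            if ($this->readsLeft <= 0) {
                return [false, null];
            }
            $this->readsLeft--;
        }
        return [true, $this->value];
    }
}

// Lifetime: lives for 5 minutes from now.
$byLifetime = new ExpiringEntry('data', time() + 5 * 60);

// Access: lives for 10 requests.
$byAccess = new ExpiringEntry('data', null, 10);

// Timed: lives until the next 10pm (in the server's time zone).
$tenPm = strtotime('today 22:00');
if ($tenPm <= time()) {
    $tenPm = strtotime('tomorrow 22:00');
}
$untilTenPm = new ExpiringEntry('data', $tenPm);
```

The plain lifetime version is the one most cache backends give you for free as a TTL; the other two are where the extra code (and the extra complexity) comes in.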
However, having said that, caching can turn a bog of a system into quite a snappy little vixen without too much effort or complexity. A little can get you a long way, and writing systems that use caching as a core component is something I'd recommend.
The main advantage, and also the goal, of caching is speeding up loading and minimizing system resources needed to load a page.
The main disadvantage lies in the implementation: the developers have to build it correctly, then maintain a proper caching system for the website and keep it manageable by the admin.
The above statements are purely said in general terms.
Caching is used to reduce hefty/slow operations (heavy calculations/parsing/database operations) which will consistently produce the same result. Caching this result will reduce the server load and speed up the application (because the hefty/slow operation does not need to be executed).
The disadvantage is that it'll often increase complexity of the application, because the cache should be purged/altered when the result of the operation will no longer be the result cached.
Simple example: a website whose navigation is stored in the database could cache the navigation once the navigation has been fetched from the database, thus reducing the total amount of db-calls, because we no longer need to execute a query to retrieve the navigation.
When the navigation changes (e.g. a page has been added), the cached value for the navigation should be rebuilt, because the navigation that has been cached does not yet reflect the latest change: the new page is not present there.
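The navigation example could be sketched like this. NavigationCache is a hypothetical class, and it keeps the cached copy in an object property purely for brevity; on a real site the copy would live in a shared store (a file, APC, memcached) so that it survives between requests.

```php
<?php
/** Caches the navigation and rebuilds it only after it has been invalidated. */
final class NavigationCache
{
    private $loadFromDb;            // callable that runs the real query
    private ?array $cached = null;  // null means "nothing cached yet"

    public function __construct(callable $loadFromDb)
    {
        $this->loadFromDb = $loadFromDb;
    }

    public function get(): array
    {
        if ($this->cached === null) {
            $this->cached = ($this->loadFromDb)();   // the only place the DB is hit
        }
        return $this->cached;
    }

    /** Call whenever a page is added, renamed, or removed. */
    public function invalidate(): void
    {
        $this->cached = null;
    }
}

// Usage, with an array standing in for the database table:
$pages = ['Home', 'About'];
$queries = 0;
$nav = new NavigationCache(function () use (&$pages, &$queries) {
    $queries++;
    return $pages;
});

$nav->get();
$nav->get();             // served from the cache, no second query
$pages[] = 'Contact';    // a page is added...
$nav->invalidate();      // ...so the cached copy is dropped
$latest = $nav->get();   // rebuilt, and now includes the new page
```

The extra complexity mentioned above is exactly that invalidate() call: every code path that changes the navigation has to remember to make it.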
When a page is cached, instead of regenerating the page every time, the server stores a copy of what it sends to your browser. The next time a visitor requests the same page, the script will know it has already generated one recently, and simply send that to the browser without all the hassle of re-running database queries or searches.
Advantage of Caching:
Reduce load on Web Servers and Database
Page downloads faster
Disadvantage:
Because information is stored in the cache, it makes the page heavier.
Sometimes updated information does not show because the cache has not been updated.
The advantages and disadvantages of caching in web development depend entirely on the context!
The main advantage is reduced data retrieval time, either from the database or at page load.
The disadvantage is the separate maintenance, or the reliance on third-party services or tools.
Currently, I have a website that people can open up during a certain team's hockey games. When the hockey team scores, a designated person clicks a button in a secure location. This updates a single entry in MySQL database with the current timestamp.
On the front-end of the website, there is an asynchronous call that runs every 15 seconds to a PHP script to query the database for that timestamp. The script then compares the current time to the timestamp pulled and if it's within 15 seconds of the current timestamp, it triggers an event on the webpage that includes playing the sound of an air horn and playing a short clip of the team's goal song.
I usually get a good amount of traffic to the site during the team's games, however many people complain about the (up to) 15 second delay after the goal is scored for the sound to be triggered. I'd like to find a way to remedy that.
Obviously, I don't think querying the database every single second for every single user who is on the page (think 100+) is going to work; I'll likely kill my database. So, is there another way I can achieve my result? Would it be possible to place a PHP variable into the server's memory that can be pulled by each session without the negative consequences of using a database or file system read?
EDIT: My host doesn't have memcached available for me to use and I cannot install it. It's disappointing because that sounds like it would have been the optimal solution. Does anyone have an alternative idea I could look into that doesn't use memcached?
In this situation, something like memcached (also available in an objectified form as memcache) is most likely the perfect solution, one of its design goals being "to decrease database load in dynamic web applications".
You can read more about memcached at its main web site, or simply use the links above to investigate PHP's modules.
For something like this you want to use a technique known as Comet. It's not particularly difficult, but requires a bit of effort.
Basically you'll keep a live connection open to each of the browsers, instead of having them re-open the connection every 15 seconds. This allows you to write to the connection immediately.
Google for "Comet" and "PHP" and you should find some good resources. http://www.zeitoun.net/articles/comet_and_php/start looks thorough.
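One common form of Comet is long polling: the PHP script holds the request open and only answers once the timestamp changes. A rough sketch is below; the function name wait_for_goal, its parameters, and the timings are all assumptions for illustration. Keep in mind that with Apache each held-open request typically occupies a worker for the whole wait, so test it against your expected number of viewers.

```php
<?php
/**
 * Hold the request open until the goal timestamp becomes newer than
 * $lastSeen, or until $timeout seconds have passed.
 * Returns the new timestamp, or null on timeout (the browser then reconnects).
 */
function wait_for_goal(
    callable $readTimestamp,     // returns the latest goal time as a Unix timestamp
    int $lastSeen,               // the newest goal the browser already knows about
    float $timeout = 25.0,
    int $pollMicros = 250000     // how long to sleep between checks on the server
): ?int {
    $deadline = microtime(true) + $timeout;
    do {
        $current = $readTimestamp();
        if ($current > $lastSeen) {
            return $current;     // answer immediately: a goal was just scored
        }
        usleep($pollMicros);
    } while (microtime(true) < $deadline);

    return null;
}
```

The browser calls the script, waits for the response, plays the horn if it gets a timestamp back, and then immediately calls the script again. The delay drops from up to 15 seconds to roughly the server-side poll interval.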
http://memcached.org/ is what you're looking for. It is made for exactly this: fast RAM-based access to objects, arrays, and variables in your system. It cuts out the load on MySQL as long as you're doing concurrent updates.
If you're not locking, just reading, "100+" requests isn't even that heavy. Have you considered just doing a stress test?