I have installed Redis on my server and implemented object caching for data returned within a PHP-based web application. The PHP model essentially executes a reasonably complex query and returns a detailed array of data. I have tested the caching and found everything to be working as expected. I first check whether the key exists in Redis; if it does, Redis returns the data, and the model unserializes and returns the previously cached data. If the cache has expired, the model executes the SQL query, returns the data, and sets the key and serialized value in Redis.
So here are my questions.
I'm not sure how to benchmark this, as it is all browser based. What tools are out there that would allow me to get a reasonable benchmark comparing caching versus not caching? I'm thinking of perhaps a PHP script that calls the API 1000 times via curl.
I implemented this in Redis because I once read that caching with Redis works across multiple sessions or IP addresses accessing the site. For example, if the API is accessed 1000 times an hour by multiple IP addresses/users, I am assuming this approach will reduce the load on the MySQL server and let Redis do the work of returning the cached data instead. Can anyone shed some light on this? Are my assumptions valid?
All comments are welcome!
Thanks!
Dave
To benchmark the web site, I would use something like Siege rather than writing a specific PHP script.
Regarding Redis usage, caching things in in-memory stores like memcached or Redis is now very common. Both memcached and Redis are suitable for this purpose, although for pure caching, memcached is arguably easier to set up. 1000 requests an hour represents only about 0.28 TPS (one request every 3.6 seconds) - any data store (including MySQL) will support such traffic without any issue. Now, multiply this traffic by 100 or 1000, and a caching layer (memcached or Redis) becomes mandatory to protect your database.
To use Redis for caching, you may want to check the EXPIRE command and have a look at the maxmemory-policy parameter in the configuration file.
I have done extensive testing of cache backends for the Zend_Cache library. The tests were done using multiple php-cli processes and randomized data, and considered read performance, write performance, and cache-tag cleaning performance. If you are testing just the cache backend, web server performance is not relevant, so I recommend testing via the CLI to simplify things. Also, testing with only one process will not give you an accurate picture of a backend's characteristics under heavy load.
MySQL is very fast itself, and if you are doing single-record indexed queries, MySQL's own query cache will be very fast. I'd only recommend adding an additional caching layer for things that are slow: aggregated results of multiple queries, or generated chunks of HTML. You can use Zend_Cache without including the entire Zend Framework, so I highly recommend you check out both Cm_Cache_Backend_Redis and Cm_Cache_Backend_File.
I want to use session values that are stored on the server with PHP - can anyone kindly explain the process to achieve this?
I want to build a chat app, and I am planning to use those session values for it.
Assume userA and userB are logged in and their user IDs are stored in the session; based on this scenario I want to build the chat app.
I have the app working, but I used JavaScript's setInterval function to poll for chats, and I want to avoid hitting the database every 3 milliseconds.
Kindly help me out.
Thanks in advance
You're basically attempting to use PHP session files as a file cache.
Instead, you should use an object caching system such as Memcached or Redis. If memory caching isn't an option (shared hosting, etc.), then you could implement your own file cache (or use something like PHPFastCache, which supports file caching).
Note: File caching for a chat app may or may not speed up your application. It depends on how you implement it and a number of other factors.
Hi, put the session value in a hidden input box:
<input type='hidden' id='session_value' value='<?php echo htmlspecialchars($_SESSION['value']); ?>'>
Then use the id to fetch the session value in a script:
<script>
var session_value = document.getElementById("session_value").value;
</script>
3 ms is an insanely short delay for running a polling chat system. I suggest increasing it to at least 200 ms, but preferably around 1000 ms.
$_SESSION values are per user and are not suited to sharing a chat stream, for a number of reasons. It sounds like what you really want is simply to update the chat feed.
Unless the database is hosted on another server, it and $_SESSION will be roughly equivalent, since the database is effectively backed by files as well. In fact, the database will generally be faster than reading raw file storage, since queries are normally cached and indexing helps look up records quicker. In addition, you won't have to worry about concurrent connections to the files either.
If anything, enable OPcache and install APCu for your PHP installation to help speed up the serving of requests. OPcache caches your compiled opcode in memory, so that subsequent requests for a file won't need to recompile it.
APCu can take the place of your file cache, again storing your rendered data in memory.
Additionally, many database frameworks such as Doctrine can also utilize APCu for query and result caching.
Instead of using the InnoDB or MyISAM storage engine for your chat messages, I suggest trying the MEMORY storage engine.
That way, instead of going through file system I/O, your database works entirely in memory. The general pattern here is few writes, many reads: one person writes to the database, and everyone reads that data. Just keep in mind that the MEMORY storage engine is temporary - the data is lost if the server restarts or power is lost.
For more information see: https://dev.mysql.com/doc/refman/5.6/en/memory-storage-engine.html
Overall, if you are able, I would suggest looking at WebSockets (e.g. Socket.IO) instead of either database or file-based caching. This puts the load on the clients instead of the server, and everything occurs in real time instead of polling for changes.
For some examples see:
Ratchet http://socketo.me/
React http://reactphp.org/
Node.js http://tutorialzine.com/2014/03/nodejs-private-webchat/
We are developing a high-end web application with PHP/MySQL and would like to explore memcached usage further.
From the tutorials, we understand that we need to write to the memcached server along with writing to the MySQL tables. But we are confused about how to use this when loading a paginated list of data, where we might need to filter the data based on different fields. In this scenario, can we rely on memcached alone, without using the MySQL database? If not, how can memcached help scale a PHP application?
First of all, memcached does not persist your data to disk, so relying on it to keep your data is not good practice.
Memcached helps scale your PHP application by taking load off your database - serving results of queries - or, even better, taking load off the web server by storing entire rendered HTML where you can (there are actually many good use cases for memcached).
Consider the following flow (sketched in code below):
1. the client browses to your page
2. the PHP application requests the data from memcached by a key (the key can be the SQL query, the URL... it depends)
3. if memcached has the data, use it
4. if not, the PHP application queries the data from MySQL
5. after the data is retrieved, save it in memcached so that on the next request it is available at step 2
To handle updates, make sure to define a TTL for each item you store in memcached (of course, there are also other ways to invalidate the cache).
I have a situation where my Linux server will be running a website which gets some of its data from a 3rd-party server through a SOAP interface. The data isn't exactly real-time, but it does change every 5 minutes or so. I was told not to have our website hammer their website for data, which I can completely understand.
So I wondered if this was a good candidate for a caching scheme of some type, where, when a user comes to our web page to display the data, if the cached copy is less than 5 minutes old (for example), the page gets the data from our server instead of polling the 3rd party for it. This way, if 100 users come to our website at once, our server won't access the 3rd-party website 100 times for the same exact data within a given time frame.
Is this a practical thing to do in PHP? Or should it be written in a faster language when it comes to caching? Are there cache packages for this sort of situation which can be used along with a PHP Joomla application? Thanks!
I think memcached is a good choice.
You can set a timeout when you store content in the memcached server; if the key-value pair is missing, retrieve the data from the 3rd-party server and store it again.
There is a memcached extension for PHP; check the doc here.
There are lots of ways to solve the problem - we can't say which is the right one without knowing a lot more about the constraints you are working within or how the service is used. If you are using Joomla, then you're obviously not that bothered about performance - it would be really hard to write anything which has a measurable impact on your HTML generation times. This does not need to "be written in a faster language", but...
can you install additional software?
have you got access to cron?
at what rate is the service consumed?
how many web servers do you have consuming the service - do they have a shared filesystem? Are they on the same subnet?
Is the SOAP response cacheable?
how do you deal with non-availability of the service?
For a very scalable solution, I would suggest running a simple forward proxy (e.g. Squid), but do make sure that it's not accessible from the internet. Sven (see comment elsewhere) is right that POST is sometimes not cacheable - but you can cache the response of a surrogate script on your own site, accessed via GET and returning appropriate caching instructions - and this could return the data as a serialized PHP array/object, which is much less expensive to process. Indeed, whichever method you choose, I would recommend caching the parsed response, not the XML. This also allows you to override poor caching information from the service.
If the rate is less than around 1 request per minute, then the cron solution is overkill. But if it's more than 20 per minute, then it makes a lot of sense. If you don't have access to cron / can't install your own software, then you might consider simply caching the response and refreshing the cache on demand. Don't bother with memcache unless you are already using it - APC is faster on a single server, but memcache is distributed. If you have multiple servers, then use whatever cluster storage you are currently sharing your data in (distributed filesystem / database cluster / shared filesystem...).
Don't try to use locking / mutexes around the cache refresh unless you really have to (i.e. only if accessing the service more than once every 5 minutes is a mortal sin) - this gets real complicated real quick, and it's too easy to introduce bugs.
Do make sure you buffer and validate any responses before writing them to the cache.
Yes, just use HTTP. Most of the heavy lifting has already been built into your web server.
Since SOAP is just a simple HTTP POST request with an XML body, you could set up your website or HTTP API in front of the SOAP endpoint to act as a translator to regular HTTP, attaching the appropriate HTTP caching headers to the transformed response body, and then configure an Nginx reverse proxy in front of it.
Notably, if the transformation is simple, you could just use XSLT to transform the response body from the SOAP API and remove the web service layer entirely.
Your problem is a very small one, which does not require a complicated solution.
You could write a small cron job that executes every five minutes, sends the request to the SOAP server, and stores the result in a local file. Any script that needs the data then reads the local file. This results in just 288 requests to the SOAP server per day (one every five minutes), and gives excellent performance for any script call that needs the results, because they are already on your server.
If you do not have cron jobs available and cannot fake them, any other cache will do. You really don't need fancy stuff like Memcached unless it is already available; storing the result in a cache file will work as well. Note that whenever you really do have to fetch the SOAP result from the origin, it will take some more time and might affect the perceived performance of your site.
There are plenty of frameworks which also offer cache support, and if you use one, you should investigate whether such support is included. I'm not sure if Joomla has something appropriate for you. Otherwise, you can implement something yourself; it isn't that hard.
Cache functionality comes in various flavours:
memory-based, where a separate process on the server holds data in RAM (or overflows to disk) and you query it like you would a database; very efficient and powerful, and will have options to manage storage use and clear up after themselves, but requires setting up additional software on the server; e.g. memcached, redis
file-based, where you just write the data to disk; less efficient, but can be implemented in "user-land" code, i.e. pure PHP; beware of filling up your disk with variant caches that have expired but not been cleaned up; many frameworks have an implementation of this built in
database-backed, where you push data into an RDBMS (e.g. MySQL, PostgreSQL) or fully-featured NoSQL store (e.g. MongoDB); might make sense if you have a large amount of data, and can trade a bit of performance; as with files, you need to make sure that stale data is cleaned up
In each case, the basic idea is that you create a "key" that can tell one request from another (e.g. the name of the SOAP call and its input parameters, serialized), and pick a "lifetime" (how long you want to carry on using the same copy of the data). The caching engine or library then checks for a cache with that key, and if it is still within its "lifetime" returns the previously cached data. If there is a "cache miss" (there is no cache for that key, or it has expired), you perform the costly operation (in your case, the SOAP call) and save to the cache, using the same key.
You can do more complex things, like pre-caching things in the background so that there is never a cache miss, or having some code paths which accept stale data in order to return quickly, but these can generally be implemented on top of whatever you're using as the main caching solution.
Edit: Another important decision is at what level of granularity to cache the data, relative to processing it.
At one extreme, you could cache each individual SOAP call: simple to set up, but it means re-processing the same data repeatedly, and it can cause problems if two responses are related but cached independently and get out of sync. At the other extreme, you can cache whole rendered pages: pages load very fast once cached, but creating variations based on the same data without repeating work becomes tricky.
In between are various points in your code where you have processed and combined data into meaningful chunks: if your application is well written, these are the inputs and outputs of major functions, or possibly even complete model objects. This is more work to implement, as you have to choose the right keys (avoiding two contexts overwriting each other's caches, while ignoring variables that have no impact on the data in question) and the right values (avoiding repeats of costly work, without storing huge blobs of data that are slow to unserialize and use up the capacity of your cache store).
As with anything else, no approach suits all needs, and a complex application will probably involve caching at multiple levels for different purposes.
Here's a little background. Currently I have:
3 web servers
one db server, which also hosts memcache for PHP sessions for the 3 web servers.
I have the PHP configs on the 3 servers pointing to the memcache server for sessions. It was working fine until a lot of connections were being produced for reads etc., which then caused connection timeouts.
So I'm currently looking at clustering memcache across the web servers for sessions; my only concern is how to make sure that memcache on all the servers has the same session information.
Someone pointed me to http://github.com/trs21219/Memcached-Library because I am using CodeIgniter, but how do I converge my PHP sessions onto this, since memcache seems to be a key-value store? Thanks in advance.
Has anyone checked out http://repcached.sourceforge.net/ and does it work?
I'm not sure you have the same expectations of memcache that its designers had.
First, however, memcache distribution works differently than you expect: there is no mechanism to replicate stored information. Each memcache instance is a simple key-value store, as you've noticed. The distribution is done by the client code, which has a list of all configured memcache instances and hashes each key to direct it to one of them. It is possible for the client to store a value everywhere and retrieve it locally, or to hash it multiple times for redundancy, but these are not straightforward exercises.
But the other issue is that memcache is designed for reasonably short-lived data that memcache is allowed to throw away at any time. This makes it really good for caching frequently accessed data that can be a little stale (say, up to a few minutes old) but is expensive to retrieve (such as taking almost a minute to generate from a query).
PHP sessions don't really qualify for this, in my experience. A database can easily support many thousands of PHP sessions with barely visible traffic, but you need a lot of memcache storage to support the same number: 50kB per session and 5000 sessions means close to 250MB, and then there is all the other data you want to put in there. Not enough storage and you get lots of unexplained logouts (as memcache discards session data when under memory pressure) and thus lots of annoyed users who have to keep logging in again.
We've found GREAT advantage in using MongoDB instead of MySQL for most things, including session handling. It's far faster, far smaller, far easier. We keep MySQL around for transactional needs, but everything else goes into Mongo now. We've relegated memcache to simply caching pages and other data that isn't critical whether it's there or not, much like Smarty does.
There is no need to use 3rd-party libraries to organize a memcached "cluster".
http://ru.php.net/manual/en/memcached.addserver.php
Just use this function to add several servers into the pool; after that, data will be stored and distributed across those servers. The server used for storing/retrieving the data for a specific key is selected according to the consistent key distribution option.
So in this case you don't need to worry about "how to go about making sure that memcache on all the servers have the same information for sessions"
I have a PHP application that calls web service APIs to get some objects before rendering a web page that incorporates those objects. In some cases these APIs are really slow (seconds), and that is not acceptable from a user experience point of view. Two things I know I can do...
Use ajax and make these calls in the background
Time out the call and degrade gracefully if it is taking too long
Neither is ideal, so I was thinking about using memcache (the PHP extension for memcached) to cache the objects I get from the 3rd-party web service. The objects will be loaded many times by different users loading the same page, so this seems to make sense.
The objects are relatively small (~1k).
Does this sound like a reasonable approach? I know memcached was originally designed to alleviate database load, so I'm wondering whether there is a gotcha somewhere that I'm not seeing.
Thanks.
This is a perfectly legitimate use of memcache. It is not only for database load reduction; it is for caching and object storage in general. :)
Also note that PHP has two interfaces for memcached. Confusingly, they are named "memcache" and "memcached". Read these to pick between the two:
https://serverfault.com/questions/63383/memcache-vs-memcached
http://code.google.com/p/memcached/wiki/PHPClientComparison
I'd highly recommend memcache for this situation as it will:
Reduce DNS calls.
Reduce page latency.
Reduce bandwidth usage.
Your only real task is to determine how often the data you are dealing with changes. This will help you optimize the expiry time for your cache key(s).
This approach may not work in your situation, but you could use a cron job to call a PHP script that loads the required information and then caches it in a faster data source (an XML file or a database).
This may not work if the information is updated very often, or if there is a lot of different data that needs to be loaded, but it is an option. I've used this approach for other tasks that take a long time to complete and have found it to be a reasonable solution.