Performance difference of caching PHP Objects on file - php

Is there difference between caching PHP objects on disk rather than not? If cached, objects would only be created once for ALL the site visitors, and if not, they will be created once for every visitor. Is there a performance difference for this or would I be wasting time doing this? Thank you :)

The performance comes down to subsequent use so even if your object is cached once and used by all ... or cached once per visitor ... it is the subsequent use that counts. If your cached object is used 10,000,000 times a day then you will save.
If the cached object is used once or not at all, the gain is negligible.

It's like this:
File is way faster then Database.
Memory is way faster then files (when not swapping disk space).
So cache to memory what you need the most, and try and think when you need to make the effort to cache to file.
Remember: premature optimisation usually isn't the best thing to do.

You'll have to take in account that dynamic content must be cached whenever it changes. But, to static content, it's much more faster to use cached content, especially in database schemas, configurations, and that kind of thing.

Assuming that they are large objects, and especially if they are being built based on data pulled from a database, caching is a good idea. For some testing I did with a particular instance, it was quicker to build an object, write it to a file, and load from the file, than to go to the DB every time.
However: there are better ways than writing to a file. You would probably be better off storing the object in memory with memcached or APC, and don't forget that you may have issues with locks on the file is several people are hitting the site at once.

Caching objects on disk will give you the additional overhead of a disk write whenever there is a cache miss, whereas on a cache hit you will be able to save the step of instantiating the object.
If construction of the object is enough of a burden that saving it sometimes will more than outweighs the additional burden of a disk write, then go ahead and cache on disk.
I'd be curious, however, to know more about why instantiating your objects is so costly that you would want to do this.

#Aiden Bell
Errm. Depends on the object. If one instance can be shared then you save storage ... but it depends if the object is altered by the user. You will be reading/writing the object creating a bottle-neck. Without more information, you will have to lookup the theory here. – Aiden Bell 11 mins ago
--
These instances don't have data that can be altered by the user, however, they would contain data that would be used by users throughout the site.
Basically, when it comes down to it, the main question is:
Multiple objects in memory, PER user (each user has his own set of instantiated objects)
VS
Single objects in cached in file for all users (all users use the same objects, for example, same error handler class, same template handler class, and same database handle class)
--
Hi, anyone else have any ideas?

Related

PHP persistent cache data

I need to hold a semi-static large object in cache so I don't need to request it every time from database. Something like $_SESSION, but not tied to a session, because the data are common to all users.
I can cache client side that data, once I got it, but I would like to avoid disturbing the database with select queries of large data that (almost) never changes.
Also, I cannot add modules (like APC cache) in this environment.
I could store my data into a file, say a JSON, which I read with php instead of querying db, but accessing filesystem is also disturbing if php needs to do it many times per seconds AND filesize is not tiny.
Is there a built in way in php to store objects in memory, common to all php instances?
EDIT: Could I use $_session as storing space, forcing session_id to be always the same? Is it dangerous? I don't use sessions for the application itself. I tried and it works
Most Operating systems will store the result of reading from disk in its cache.
This means that the disk will not be hit each time. File based storage is actually pretty quick for multiple reads of the same file as its really just coming direct from memory.
as long as "pretty large" still means fits in memory this way should be fine

Is it good practice to store large Data in session variable?

I am currently programming a php site, which atm needs to query a large amount of data (about 4 - 5MB) everytime. I already have a session going and wanted to ask, if its good practice to store that data in the session variable?
The current plan is to also maintain a table in the Database containing when a table has changed last. If that timestamp would be newer, then the data would be queried again, if not, use the data of the session variable as its still consistent...
Is this a good way to avoid querying too much data? And what speed impacts would the site have when a session is about 5MB in size?
Thanks in advance!
It's not really good practice (it will make PHP chew far more memory than it really should), but I'm not sure how it will affect performance.
I suppose the real question is this: Why do you need to store so much in the session? If it's information that is meant to be accessible between sessions, then you should be storing it in a database and loading it 'at need'.
If it's binary data (images, files, etc.) that are only relevant while the session is valid, then store it in a temporary file for the user (look at tempnam() and sys_get_temp_dir()), then store the temporary filename on the session.
No, it's not good practice to do this.
Points to consider:
By defailt, the session data is stored on disk in a temp folder. Every time you call session_start() (ie every page load), it will have to load the whole of that data into memory and populate it into the session array. If you're loading large amounts of data, this could have performance implications.
Also, since you're loading this large chunk of data every time, it means that each page load will take more memory. This reduces the number of concurrent users that your server can support.
If you're doing this for caching purposes to reduce hits to your DB, there are much better solutions available. APCu, Memcache, Redis and others can all do a much better job of caching your data than your proposed custom-written solution. There are also wrapper libraries available that make it even easier and allow you to mix and match between caching solutions. If you're using a framework like Laravel or Symphony, there may be caching classes built into your framework. Alternatively, you could try a stand-alone library like phpFastCache. But also, don't forget that modern DB engines have their own caching mechanisms built in, so repeated calls to the same or similar queries should be reasonbly fast anyway.

Identify data to cache in which layer - PHP/MySQL

Think you are the proud owner of Facebook, then
which data you want to store in app layer [memcached/ APC] and which data in MySQL cache ?
Please explain also why you think so.
[I want to have an idea on which data to cache where]
For memcache, store session data. You have to typically query from a large table or from the filesystem to get it, depending on how it's stored. Putting that on memory removes hitting the disk for a relatively small amount data (that is typically critical to one's web application).
For your database cache, put stuff in there that is not changing so often. We're talking about wall posts, comments, etc. They are queried a lot and rarely change, all things considered. You may also want to consider doing a flat file cache, so you can purge individual files with greater ease, and divide it up as you see fit.
I generally don't directly cache any arbitrary data with APC, usually I will just let it cache stuff automatically and get lessened memory loads.
This is only one way to do it, but as far as the industry goes, this is a somewhat well-used model.

PHP APC To cache or not to cache?

I don't really have any experience with caching at all, so this may seem like a stupid question, but how do you know when to cache your data? I wasn't even able to find one site that talked about this, but it may just be my searching skills or maybe too many variables to consider?
I will most likely be using APC. Does anyone have any examples of what would be the least amount of data you would need in order to cache it? For example, let's say you have an array with 100 items and you use a foreach loop on it and perform some simple array manipulation, should you cache the result? How about if it had a 1000 items, 10000 items, etc.?
Should you be caching the results of your database query? What kind of queries should you be caching? I assume a simple select and maybe a couple joins statement to a mysql db doesn't need caching, or does it? Assuming the mysql query cache is turned on, does that mean you don't need to cache in the application layer, or should you still do it?
If you instantiate an object, should you cache it? How to determine whether it should be cached or not? So a general guide on what to cache would be nice, examples would also be really helpful, thanks.
When you're looking at caching data that has been read from the database in APC/memcache/WinCache/redis/etc, you should be aware that it will not be updated when the database is updated unless you explicitly code to keep the database and cache in synch. Therefore, caching is most effective when the data from the database doesn't change often, but also requires a more complex and/or expensive query to retrieve that data from the database (otherwise, you may as well read it from the database when you need it)... so expensive join queries that return the same data records whenever they're run are prime candidates.
And always test to see if queries are faster read from the database than from cache. Correct database indexing can vastly improve database access times, especially as most databases maintain their own internal cache as well, so don't use APC or equivalent to cache data unless the database overheads justify it.
You also need to be aware of space usage in the cache. Most caches are a fixed size and you don't want to overfill them... so don't use them to store large volumes of data. Use the apc.php script available with APC to monitor cache usage (though make sure that it's not publicly accessible to anybody and everybody that accesses your site.... bad security).
When holding objects in cache, the object will be serialized() when it's stored, and unserialized() when it's retrieved, so there is an overhead. Objects with resource attributes will lose that resource; so don't store your database access objects.
It's sensible only to use cache to store information that is accessed by many/all users, rather than user-specific data. For user session information, stick with normal PHP sessions.
The simple answer is that you cache data when things get slow. Obviously for any medium to large sized application, you need to do much more planning than just a wait and see approach. But for the vast majority of websites out there, the question to ask yourself is "Are you happy with the load time". Of course if you are obsessive about load time, like myself, you are going to want to try to make it even faster regardless.
Next, you have to identify what specifically is the cause of the slowness. You assumed that your application code was the source but its worth examining if there are other external factors such as large page file size, excessive requests, no gzip, etc. Use a site like http://tools.pingdom.com/ or an extension like yslow as a start for that. (quick tip make sure keepalives and gzip are working).
Assuming the problem is the duration of execution of your application code, you are going to want to profile your code with something like xdebug (http://www.xdebug.org/) and view the output with kcachegrind or wincachegrind. That will let you know what parts of your code are taking long to run. From there you will make decisions on what to cache and how to cache it (or make improvements in the logic of your code).
There are so many possibilities for what the problem could be and the associated solutions, that it is not worth me guessing. So, once you identify the problem you may want to post a new question related to solving that specific problem. I will say that if not used properly, the mysql query cache can be counter productive. Also, I generally avoid the APC user cache in favor of memcached.

PHP Object Caching performance

Is there difference between caching PHP objects on disk rather than not? If cached, objects would only be created once for ALL the site visitors, and if not, they will be created once for every visitor. Is there a performance difference for this or would I be wasting time doing this?
Basically, when it comes down to it, the main question is:
Multiple objects in memory, PER user (each user has his own set of instantiated objects)
VS
Single objects in cached in file for all users (all users use the same objects, for example, same error handler class, same template handler class, and same database handle class)
To use these objects, each PHP script would have to deserialize them anyway. So it's definitely not for the sake of saving memory that you'd cache them on disk -- it won't save memory.
The reason to cache these objects is when it's too expensive to create the object. For an ordinary PHP object, this is not the case. But if the object represents the result of an costly database query, or information fetched from a remote web service, for instance, it could be beneficial to cache it locally.
Disk-based cache isn't necessarily a big win. If you're using PHP and concerned about performance, you must be running apps in an opcode-caching environment like APC or Zend Platform. These tools also provide caching you can use to save PHP objects in your application. Memcached is also a popular solution for a fast memory cache for application data.
Also keep in mind not all PHP objects can be serialized, so saving them in a cache, whether disk-based or in-memory, isn't possible for all data. Basically, if the object contains a reference to a PHP resource, you probably can't serialize it.
Is there difference between caching PHP objects on disk rather than not?
As with all performance tweaking, you should measure what you're doing instead of just blindly performing some voodoo rituals that you don't fully understand.
When you save an object in $_SESSION, PHP will capture the objects state and generate a file from it (serialization). Upon the next request, PHP will then create a new object and re-populate it with this state. This process is much more expensive than just creating the object, since PHP will have to make disk I/O and then parse the serialized data. This has to happen both on read and write.
In general, PHP is designed as a shared-nothing architecture. This has its pros and its cons, but trying to somehow sidestep it, is usually not a very good idea.
Unfortunately there is not right answer for this. The same solution for the same website on the same server can provide better performance or a lot worse. It really depends on too many factors (application, software, hardware, configuration, server load, etc).
The points to remember are:
- the slowest part of a server is the hard drive.
- object creation is WAY better than disk access.
=> Stay as far as possible from the HD and cache data in RAM if possible.
If you do not have performance issue, I would advice to do... nothing.
If you have performance issue: benchmark, benchmark, benchmark. (The only real way to find a better solution).
Interesting video on that topic: YouTube Scalability
I think you would be wasting time, unless the data is static and complex to generate.
Say you had an object representing an ACL (Access Control List) stating which user levels have permissions for certain resources.
Populating this ACL might take considerable time, especially if data comes from a database. The cached ACL could be instantiated much quicker.
I have used caching SQL query results, and time-intensive calculation results and have had impressive results. right now I'm working on an application that fetches more than 200 database records (which have a a lot of SQL functions and calculation in them) from a table with more than 200,000 records, calculate results from the fetched data, for each request. I use Zend_Cache component of Zend Framework to cache the calculated results, so next time I do not need to:
connect to database
wait for database server to find my records, calculation my sql functions, return results
fetch at least 200 (could even rich 1000) records into memory
step over all these data and calculate what I want from them
I just do:
call for Zend_Cache::load() method, that will do some file reading.
that will save me at least 4-5 seconds on each request (very inaccurate, I did not profile it actually. but the performance gain is quite visible)
Can be useful in certain cases, but comes with careful study of implications and after other kind of performance improvements (like DB queries, data structure, algorithms, etc.).
The query you cache should be constant (and limited in number) and the data, pretty static. To be effective (and worth it), your hard disk access needs to be far quicker than your DB query for that data.
I once used that by serializing cached objects in files, on relatively static content on a home page taking 200+ hits/s with a heavily loaded single-instance DB, with unavoidable queries (at my level). Gained about 40% performance on that home page.
Code -when developing that from scratch- is very quick and straightforward, with pile_put/get_contents and un/serialize. You can name your file after, say, the md5 checksum of your query.
Having the objects cached in memory is usually better then on the disk:
http://code.google.com/p/php-object-cache/
However, benchmark for yourself and compare the results. Thats they only you can know for sure.

Categories