Is there a difference between caching PHP objects on disk and not caching them? If cached, objects would only be created once for ALL the site visitors; if not, they will be created once for every visitor. Is there a performance difference, or would I be wasting my time doing this?
Basically, when it comes down to it, the main question is:
Multiple objects in memory, PER user (each user has his own set of instantiated objects)
VS
Single objects cached in a file for all users (all users use the same objects: for example, the same error handler class, the same template handler class, and the same database handle class)
To use these objects, each PHP script would have to deserialize them anyway. So it's definitely not for the sake of saving memory that you'd cache them on disk -- it won't save memory.
The reason to cache these objects is when it's too expensive to create the object. For an ordinary PHP object, this is not the case. But if the object represents the result of a costly database query, or information fetched from a remote web service, for instance, it could be beneficial to cache it locally.
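A minimal sketch of that idea, assuming a hypothetical buildExpensiveReport() helper (the file path and TTL are also invented):

$cacheFile = '/tmp/report.cache'; // illustrative path
$ttl = 300;                       // seconds the cached copy stays valid

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
    $report = unserialize(file_get_contents($cacheFile));
} else {
    $report = buildExpensiveReport(); // placeholder for the costly query or web-service call
    file_put_contents($cacheFile, serialize($report), LOCK_EX);
}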
Disk-based cache isn't necessarily a big win. If you're using PHP and concerned about performance, you must be running apps in an opcode-caching environment like APC or Zend Platform. These tools also provide caching you can use to save PHP objects in your application. Memcached is also a popular solution for a fast memory cache for application data.
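With the old APC extension, for instance, the user cache is just a key/value store. A hedged sketch (the key, TTL, and loadConfigFromDb() are invented):

$config = apc_fetch('site_config', $hit);
if (!$hit) {
    $config = loadConfigFromDb();           // placeholder for the expensive load
    apc_store('site_config', $config, 600); // keep for 10 minutes
}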
Also keep in mind not all PHP objects can be serialized, so saving them in a cache, whether disk-based or in-memory, isn't possible for all data. Basically, if the object contains a reference to a PHP resource, you probably can't serialize it.
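One common workaround, where it applies, is to drop the resource in __sleep() and re-acquire it in __wakeup(). A sketch (the class name and DSN are invented):

class CachedDb
{
    private $dsn = 'sqlite::memory:'; // illustrative DSN
    private $pdo;

    public function __construct()
    {
        $this->pdo = new PDO($this->dsn);
    }

    public function __sleep()
    {
        return array('dsn'); // leave the unserializable connection out
    }

    public function __wakeup()
    {
        $this->pdo = new PDO($this->dsn); // re-acquire the connection on unserialize
    }
}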
Is there a difference between caching PHP objects on disk and not caching them?
As with all performance tweaking, you should measure what you're doing instead of just blindly performing some voodoo rituals that you don't fully understand.
When you save an object in $_SESSION, PHP will capture the object's state and generate a file from it (serialization). Upon the next request, PHP will then create a new object and re-populate it with this state. This process is much more expensive than just creating the object, since PHP will have to perform disk I/O and then parse the serialized data. This has to happen on both read and write.
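A sketch of what that looks like (ShoppingCart is a hypothetical class):

session_start();
$_SESSION['cart'] = new ShoppingCart(); // serialized into the session file at shutdown

// ...on the next request...
session_start();           // reads the session file and unserializes the stored state
$cart = $_SESSION['cart']; // the ShoppingCart class must be defined before this point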
In general, PHP is designed as a shared-nothing architecture. This has its pros and its cons, but trying to somehow sidestep it is usually not a very good idea.
Unfortunately there is no right answer for this. The same solution for the same website on the same server can provide better performance or a lot worse. It really depends on too many factors (application, software, hardware, configuration, server load, etc.).
The points to remember are:
- the slowest part of a server is the hard drive.
- object creation is WAY better than disk access.
=> Stay as far as possible from the HD and cache data in RAM if possible.
If you do not have a performance issue, I would advise doing... nothing.
If you have performance issue: benchmark, benchmark, benchmark. (The only real way to find a better solution).
Interesting video on that topic: YouTube Scalability
I think you would be wasting time, unless the data is static and complex to generate.
Say you had an object representing an ACL (Access Control List) stating which user levels have permissions for certain resources.
Populating this ACL might take considerable time, especially if data comes from a database. The cached ACL could be instantiated much quicker.
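A hedged sketch of that idea (the file path and buildAclFromDatabase() are invented):

$aclFile = '/tmp/acl.cache'; // illustrative path
if (is_file($aclFile)) {
    $acl = unserialize(file_get_contents($aclFile));
} else {
    $acl = buildAclFromDatabase(); // hypothetical: the slow population step
    file_put_contents($aclFile, serialize($acl), LOCK_EX);
}

Whenever permissions change, delete the file so the next request rebuilds it.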
I have cached SQL query results and time-intensive calculation results, with impressive gains. Right now I'm working on an application that, on each request, fetches more than 200 database records (which involve a lot of SQL functions and calculations) from a table with more than 200,000 records, and then calculates results from the fetched data. I use the Zend_Cache component of Zend Framework to cache the calculated results, so next time I do not need to:
connect to the database
wait for the database server to find my records, evaluate my SQL functions, and return the results
fetch at least 200 (it could even reach 1,000) records into memory
step over all this data and calculate what I want from it
I just do:
call the load() method on the Zend_Cache frontend, which does some file reading (sketched below).
That saves me at least 4-5 seconds on each request (a rough figure; I did not actually profile it, but the performance gain is quite visible).
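For reference, the usual Zend Framework 1 pattern looks roughly like this (the option values, cache ID, and fetchAndCalculate() are invented):

$cache = Zend_Cache::factory(
    'Core', 'File',
    array('lifetime' => 3600, 'automatic_serialization' => true),
    array('cache_dir' => '/tmp/cache/')
);

if (($results = $cache->load('calculated_results')) === false) {
    $results = fetchAndCalculate(); // placeholder for the 200+ record crunch
    $cache->save($results, 'calculated_results');
}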
It can be useful in certain cases, but only after careful study of the implications, and after other kinds of performance improvements (DB queries, data structures, algorithms, etc.).
The queries you cache should be constant (and limited in number), and the data pretty static. To be effective (and worth it), your hard-disk access needs to be far quicker than your DB query for that data.
I once did that, serializing cached objects to files, for relatively static content on a home page taking 200+ hits/s against a heavily loaded single-instance DB, with queries that were unavoidable (at my level). It gained about 40% performance on that home page.
The code, when developing that from scratch, is very quick and straightforward, with file_put/get_contents() and un/serialize(). You can name your file after, say, the md5 checksum of your query (outlined below).
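In outline (the query, path, and runQuery() helper are invented):

$sql  = 'SELECT id, title FROM news ORDER BY published DESC LIMIT 20'; // illustrative query
$file = '/tmp/cache/' . md5($sql);

if (is_file($file) && filemtime($file) > time() - 60) { // fresh within the last minute
    $rows = unserialize(file_get_contents($file));
} else {
    $rows = runQuery($sql); // hypothetical DB helper
    file_put_contents($file, serialize($rows), LOCK_EX);
}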
Having the objects cached in memory is usually better than on disk:
http://code.google.com/p/php-object-cache/
However, benchmark for yourself and compare the results. That's the only way you can know for sure.
Related
Let's assume you're developing a multiplayer game where the data is stored in a MySQL database: for example, the names and description texts of items, attributes, buffs, NPCs, quests, etc.
That data:
won't change often
is frequently requested
is required on server-side
and cannot be cached locally on the client (as JSON / JavaScript)
To solve this problem, I wrote a file-based caching system that creates .php files on the server and copies entire MySQL tables into them as pre-defined PHP variables.
Like this:
$item_names = Array(0 => "name", 1 => "name");
$item_descriptions = Array(0 => "text", 1 => "text");
That file contains a lot of data, will end up with a size of around 500 KB, and is then loaded on every user request (a generator sketch follows).
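Such a generator might look like this sketch, assuming $pdo is an open PDO connection and an items(id, name, description) table (both invented for illustration):

// Dump a table into a PHP file of pre-defined variables.
$rows = $pdo->query('SELECT id, name, description FROM items')->fetchAll(PDO::FETCH_ASSOC);

$names = array();
$descriptions = array();
foreach ($rows as $row) {
    $names[$row['id']]        = $row['name'];
    $descriptions[$row['id']] = $row['description'];
}

$code = '<?php' . PHP_EOL
      . '$item_names = ' . var_export($names, true) . ';' . PHP_EOL
      . '$item_descriptions = ' . var_export($descriptions, true) . ';' . PHP_EOL;
file_put_contents('cache/items.php', $code, LOCK_EX);

// Consumers just include the generated file; with an opcode cache it stays in memory.
require 'cache/items.php';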
Is that a good way to avoid unnecessary queries, considering that the query cache has been removed in MySQL 8.0? Or is it better to just get the data needed using individual queries, even if that means ending up with hundreds of them per request?
I suggest you use some kind of PSR-6 compliant cache system (it could be filesystem-based as well), and later, when your requests grow, you can easily swap in a more performant cache, like a PSR-6 Redis cache.
For example, a PSR-6 compatible file system cache:
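A hedged sketch using the symfony/cache package (one PSR-6 implementation; the key, lifetime, and loadItemNamesFromDb() are invented):

// composer require symfony/cache
use Symfony\Component\Cache\Adapter\FilesystemAdapter;

$pool = new FilesystemAdapter(); // stores items under the system temp dir by default

$item = $pool->getItem('item_names');
if (!$item->isHit()) {
    $item->set(loadItemNamesFromDb()); // placeholder for the real query
    $item->expiresAfter(3600);         // one hour
    $pool->save($item);
}
$itemNames = $item->get();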
More info about PSR-6 Caching Interface
Instead of making your own caching mechanism, you can use Redis as it will handle all your caching requirements.
It will be easy to implement.
Follow the links to get to know more about Redis
REDIS
REDIS IN PHP
REDIS PHP TUTORIALS
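A minimal sketch with the phpredis extension (the host, key, TTL, and loadProfileFromDb() are invented):

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$profile = $redis->get('user:42:profile');
if ($profile === false) {                            // cache miss
    $profile = serialize(loadProfileFromDb(42));     // hypothetical loader
    $redis->setex('user:42:profile', 300, $profile); // keep for 5 minutes
}
$profile = unserialize($profile);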
In my experience...
You should only optimize for performance when you can prove you have a problem, and when you know where that problem is.
That means in practice that you should write load tests to exercise your application under "reasonable worst-case scenario" loads, and instrument your application so you can see what its performance characteristics are.
Doing any kind of optimization without a load test framework means you're coding on instinct; you may be making things worse without knowing it.
Your solution - caching entire tables in arrays - means every PHP process is loading that data into memory, which may or may not become a performance hit in its own right (do you know which request will need which data?). It also looks like you'll be doing a lot of relational logic in PHP (in your example, gluing the item_name to the item_description). This is something MySQL is really good at; your PHP code could easily be slower than MySQL at joins.
Then you have the problem of cache invalidation - how and when do you refresh the cached data? How does your application behave when the data is being refreshed? I've seen web sites slow to a crawl when cached data was being refreshed.
In short - it's a complicated decision, there are no obvious right/wrong answers. My first recommendation is "build a test framework so you can approach performance based on evidence", my second is "don't roll your own - consider using an ORM with built-in cache support", my third is "consider using something like Redis or memcached to store your cache information".
There are many possible solutions, depending on your requirements. Possible solutions include:
A) File-based caching in JSON format. Data retrieved from the database is saved to a file for reuse before the program processes it next time.
B) A memory-based cache, such as Memcached, APC, or Redis. Similar to the solution above, with better performance but more integration code required.
C) A memory-based database, such as a NoSQL store like MongoDB.
D) Multiple database servers: one master database for writes with multiple slaves for reads, kept synchronised between servers.
To be quick and minimise code changes, I suggest using option B.
In PHP, what are the advantages and disadvantages of caching in web development, and how does it affect the database?
Caching works in many different ways, but for PHP specifically I can think of a few:
Database calls; they are slow, require computation, and can be quite intensive. If you've got repeated calls, caching the query is golden. There are two levels: the PHP side, where you control the cache, and the database side, where it does.
Running PHP code means the web server calls the PHP interpreter, which parses the code and then runs it. A PHP cacher can cache the parsing part and go straight to the running part. Then there's the next generation of compiling PHP code directly (like Facebook's HipHop, which translated PHP to C++) and running it from there.
Computations; if you're doing math or other heavy lifting as a repeated operation, you can cache the result instead of calculating it every time (see the sketch below).
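As a sketch of that last point, a simple in-process memoiser (the function and workload are invented):

function expensiveScore($n)
{
    static $memo = array();
    if (!isset($memo[$n])) {
        $total = 0;
        for ($i = 1; $i <= $n; $i++) { // stand-in for real heavy lifting
            $total += sqrt($i);
        }
        $memo[$n] = $total;
    }
    return $memo[$n]; // repeated calls with the same $n skip the loop
}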
Advantages;
speed
less resources used
reuse
being smart
Disadvantages;
stale data
overhead
complexity
I'll only deal with the disadvantages here;
First, stale data; this means that when you use cached content/data you are at risk of presenting old data that's no longer relevant to the new situation. If you've cached a query of products, but in the meantime the product manager has deleted four products, the users will get listings to products that don't exist. There's a great deal of complexity in figuring out how to deal with this, but mostly it's about creating hashes/identifiers for caches that mean something to the state of the data in the cache, or business logic that resets the cache (or updates, or appends) with the new data bits. This is a complicated field, and depends very much on your requirements.
Then there's overhead: all the business logic you use to make sure your data is somewhere between being fast and being stale. That leads to complexity, and complexity leads to more code that you need to maintain and understand. You'll easily lose oversight of where data exists in the caching complex, at what level, and how to fix the stale data if you get it. It can easily get out of hand, so instead of doing caching on complex logic you revert to simple timestamps, and just say that a query is cached for a minute or so, and hope for the best (which, admittedly, can be quite effective and not too crazy). You could give your caches life-times (say, an item will live X minutes in the cache) vs. access counts (it will live for 10 requests) vs. timed expiry (it will live until 10pm), and variations thereof. The more variation, the more complexity, of course.
However, having said that, caching can turn a bog of a system into quite a snappy little vixen without too much effort or complexity. A little can get you a long way, and writing systems that use caching as a core component is something I'd recommend.
The main advantage, and also the goal, of caching is speeding up loading and minimizing system resources needed to load a page.
The main disadvantage lies in how it's implemented by the developers, and then in maintaining a proper caching system for the website and keeping it manageable by the admin.
The above statements are meant purely in general terms.
Caching is used to reduce hefty/slow operations (heavy calculations, parsing, database operations) which will consistently produce the same result. Caching this result will reduce the server load and speed up the application (because the hefty/slow operation does not need to be executed again).
The disadvantage is that it often increases the complexity of the application, because the cache must be purged or updated when the result of the operation no longer matches what was cached.
Simple example: a website whose navigation is stored in the database could cache the navigation once it has been fetched from the database, thus reducing the total number of DB calls, because we no longer need to execute a query to retrieve the navigation.
When the navigation changes (e.g. a page has been added), the cached value for the navigation should be rebuilt, because the cached navigation does not yet reflect the latest change: the new page is not present there.
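In code, that navigation example might look like this sketch (the table, file name, and function names are invented):

// Cache the navigation; rebuild it when a page changes.
function getNavigation(PDO $pdo)
{
    $file = 'cache/navigation.ser';
    if (is_file($file)) {
        return unserialize(file_get_contents($file));
    }
    $nav = $pdo->query('SELECT id, title, url FROM pages ORDER BY position')
               ->fetchAll(PDO::FETCH_ASSOC);
    file_put_contents($file, serialize($nav), LOCK_EX);
    return $nav;
}

function invalidateNavigation()
{
    @unlink('cache/navigation.ser'); // call this whenever a page is added or edited
}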
When a page is cached, instead of regenerating the page every time, the server stores a copy of what it sends to your browser. The next time a visitor requests the same page, the script knows it has already generated one recently and simply sends that to the browser, without all the hassle of re-running database queries or searches.
Advantages of Caching:
Reduced load on web servers and the database
Pages download faster
Disadvantages:
As information is stored in the cache, it takes up extra space.
Sometimes updated information does not show, because the cache has not been refreshed.
The advantages and disadvantages of caching in web development depend entirely on your context!
The main advantage is reduced data retrieval time, whether from the database or at page load time.
The main disadvantage is the separate maintenance it requires, or the need to use third-party services or tools for it.
I don't really have any experience with caching at all, so this may seem like a stupid question, but how do you know when to cache your data? I wasn't even able to find one site that talked about this, but it may just be my searching skills, or maybe there are too many variables to consider?
I will most likely be using APC. Does anyone have examples of the least amount of data you would need in order to make caching worthwhile? For example, let's say you have an array with 100 items and you use a foreach loop on it and perform some simple array manipulation; should you cache the result? How about if it had 1,000 items, 10,000 items, etc.?
Should you be caching the results of your database queries? What kinds of queries should you be caching? I assume a simple select, and maybe a statement with a couple of joins, against a MySQL DB doesn't need caching, or does it? Assuming the MySQL query cache is turned on, does that mean you don't need to cache in the application layer, or should you still do it?
If you instantiate an object, should you cache it? How do you determine whether it should be cached or not? A general guide on what to cache would be nice; examples would also be really helpful, thanks.
When you're looking at caching data that has been read from the database in APC/memcache/WinCache/redis/etc, you should be aware that it will not be updated when the database is updated unless you explicitly code to keep the database and cache in synch. Therefore, caching is most effective when the data from the database doesn't change often, but also requires a more complex and/or expensive query to retrieve that data from the database (otherwise, you may as well read it from the database when you need it)... so expensive join queries that return the same data records whenever they're run are prime candidates.
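The usual cache-aside shape, here sketched with the Memcached extension (the server, key, TTL, and runExpensiveJoinQuery() are invented):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$rows = $mc->get('top_sellers');
if ($rows === false) {                   // cache miss
    $rows = runExpensiveJoinQuery();     // placeholder for the costly join
    $mc->set('top_sellers', $rows, 600); // expire after 10 minutes
}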
And always test to see if queries are faster read from the database than from cache. Correct database indexing can vastly improve database access times, especially as most databases maintain their own internal cache as well, so don't use APC or equivalent to cache data unless the database overheads justify it.
You also need to be aware of space usage in the cache. Most caches are a fixed size and you don't want to overfill them... so don't use them to store large volumes of data. Use the apc.php script available with APC to monitor cache usage (though make sure that it's not publicly accessible to anybody and everybody that accesses your site.... bad security).
When holding objects in the cache, the object will be serialized when it's stored and unserialized when it's retrieved, so there is an overhead. Objects with resource attributes will lose that resource, so don't store your database access objects.
It's sensible only to use cache to store information that is accessed by many/all users, rather than user-specific data. For user session information, stick with normal PHP sessions.
The simple answer is that you cache data when things get slow. Obviously for any medium to large sized application, you need to do much more planning than just a wait and see approach. But for the vast majority of websites out there, the question to ask yourself is "Are you happy with the load time". Of course if you are obsessive about load time, like myself, you are going to want to try to make it even faster regardless.
Next, you have to identify what specifically is the cause of the slowness. You assumed that your application code was the source, but it's worth examining whether there are other external factors such as large page file size, excessive requests, no gzip, etc. Use a site like http://tools.pingdom.com/ or an extension like YSlow as a start for that. (Quick tip: make sure keep-alives and gzip are working.)
Assuming the problem is the duration of execution of your application code, you are going to want to profile your code with something like Xdebug (http://www.xdebug.org/) and view the output with KCachegrind or WinCacheGrind. That will let you know what parts of your code are taking a long time to run. From there you will make decisions on what to cache and how to cache it (or make improvements in the logic of your code).
There are so many possibilities for what the problem could be and the associated solutions, that it is not worth me guessing. So, once you identify the problem you may want to post a new question related to solving that specific problem. I will say that if not used properly, the mysql query cache can be counter productive. Also, I generally avoid the APC user cache in favor of memcached.
For a very large site such as a Social Network (say Facebook), which method would you recommend for user accounts storage?
1) Single XML files for each type of feature, in the user's directory: basicinfo.xml, comments.xml, photos.xml, ...
2) MySQL, although I'm not sure how to organize this. Maybe separate tables for each feature? E.g. a table for Comments, with columns id, from, message, time?
I know XML is not designed for storage, and PHP (the language I use) must read the entire XML file and store it in memory before it can be used.
But, here are the reasons why I prefer XML (but I may be wrong, please tell me if you disagree with any):
1) If I have user accounts' paths organized in this way
User ID 2342:
/users/00/00/00/00/00/00/00/23/42/
I think it's faster to find the Comments of a user by file path than by searching a large database (a path helper is sketched below).
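For what it's worth, that path scheme can be computed by padding the ID to 18 digits and splitting it into pairs (a sketch; the function name is invented):

function userDir($id)
{
    $padded = str_pad((string) $id, 18, '0', STR_PAD_LEFT); // 2342 -> 000000000000002342
    return '/users/' . implode('/', str_split($padded, 2)) . '/';
}

echo userDir(2342); // /users/00/00/00/00/00/00/00/23/42/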
Also, if each feature is split into tables, displaying a user profile will require more than one lookup: comments, photos, basic info, etc.
2) I heard MySQL is globally locked when writing to it. Is this true? If so, I'd rather lock a single file than everything.
3) Is MySQL "shared" across the cluster? I mean, if one disk gets full, will it "continue" on another? Or do I, as the programmer, have to manage it myself and create new databases on another disk? (Note: I use Linux.)
I know it is about the same with XML files, but they are easier to split between disks, because the structure is partitioned by account ID, not by feature as it would be in a database.
4) Note that I don't store each comment in comments.xml. I just record their attributes in each XML tag, and the messages are in separate text files, commentid.txt. Since each XML file should not be very large, there should not be problems with memory/time.
As for the problem of parsing entire XML files, maybe I should think about using XMLReader/Writer instead of SimpleXML/DOM? Or will that decrease performance a lot?
Thank you!
Facebook uses MySQL.
That being said, here's the long version:
I always say that XML is a data transfer technology, not a data storage technology, but not everyone agrees. XML is not designed to be used as a relational datastore. XML was first introduced in order to provide a standard way of transmitting data from system to system without giving access to the originating systems.
Since you are talking about a large application, I would strongly urge you to use MySQL (or another RDBMS); as your dataset grows and grows, the XML will get slower and slower unless you always keep a fresh copy in memory and only read the XML files upon service reboot.
Using an XML database is reportedly more efficient in terms of conversion costs when you're constantly sending XML into and retrieving XML out of a database. The rationale is, when XML is the only transport syntax used to get things in and out of the DB, why squeeze everything through a layer of SQL abstraction and all those relational tables, foreign keys, and the like? It basically takes a parsing layer out of the application and brings it into the data engine - where it's probably going to work faster and more efficiently than the SQL alternative. Probably.
Depends heavily on the nature of your site. On the one hand the XML approach gives you a free pass on things like “SELECT * FROM $table where $table.id=$id” type queries. On the other hand...
For a very large site, in the worst-case scenario the data files end up pretty big too. If it is any kind of community site, this may easily happen for any account: go to any forum with a fair number of old-guard members in its community and you'll find a couple of posters with, say, 10K posts... This means you will wish for SQL-style result sets implemented using a memory-efficient model, rather than a speed-efficient one. To the end user, a 1s versus 1.1s response time is not that big a deal; but to you, 1K simultaneous requests versus 1.5K or better definitely is.
Then there is the aspect that if you are mostly reading data, XML may be fine, if somewhat crude, for large data sets and DOM-based implementations. But if you are writing a lot, things become much, much worse. Caching of data is still possible, but giving ACID-like guarantees on these file transactions requires you to pretty much write your own database software.
And then there are storage requirements and the like, which mean you may need a distributed approach to storing your data. These kinds of setups are relatively well understood in the database world, and they bring a lot of interesting problems with them to the table (what do you do if a single disk fails? how do you know on what disk to find the data, and how do you implement efficient caching?) that essentially amount, again, to writing your own mini-database software from scratch.
So for a very large site, I think the hard technical requirements of performance at not too great a cost in terms of memory, plus a certain reliability, plus not needing to reinvent 21 wheels at the same time, mean that your approach would not work that well. I think it is better suited to smallish read-only sites where you can afford to experiment with and pursue alternative routes, and where you can easily make changes and roll them out across the entire site.
IME: An in-house application using a single XML file for persistence didn't stand up to use by a single user...
1) What you're suggesting amounts to an XML file system with a manager application... There are XML databases, and there's been increasing support for storing XML within an RDBMS. You're looking at re-inventing the wheel...
That's besides the normalization you would get from storing the data in an RDBMS, which would enforce referential integrity in a way XML never will...
2) "Global locking" is without any contextual scope. No database I know of locks globally when writing; most support degrees of locking (table/row/etc, varies between vendors) for sake of retaining concurrency when directed to - not by default.
3) Without a database, data, or actual users, being concerned about clustering is definitely premature optimization.
4) If the system crashes without having written the referential integrity to some sort of persistence that will survive the application being turned off, the data will be useless.
Is there a difference between caching PHP objects on disk and not caching them? If cached, objects would only be created once for ALL the site visitors; if not, they will be created once for every visitor. Is there a performance difference, or would I be wasting my time doing this? Thank you :)
The performance comes down to subsequent use: whether your object is cached once and used by all, or cached once per visitor, it is the subsequent use that counts. If your cached object is used 10,000,000 times a day, then you will save.
If the cached object is used once or not at all, the gain is negligible.
It's like this:
Files are way faster than the database.
Memory is way faster than files (when not swapping to disk).
So cache to memory what you need the most, and try and think when you need to make the effort to cache to file.
Remember: premature optimisation usually isn't the best thing to do.
You'll have to take into account that dynamic content must be re-cached whenever it changes. But for static content, it's much faster to use cached content, especially for database schemas, configurations, and that kind of thing.
Assuming that they are large objects, and especially if they are being built based on data pulled from a database, caching is a good idea. For some testing I did with a particular instance, it was quicker to build an object, write it to a file, and load from the file, than to go to the DB every time.
However: there are better ways than writing to a file. You would probably be better off storing the object in memory with memcached or APC, and don't forget that you may have issues with locks on the file if several people are hitting the site at once.
Caching objects on disk will give you the additional overhead of a disk write whenever there is a cache miss, whereas on a cache hit you will be able to save the step of instantiating the object.
If construction of the object is enough of a burden that skipping it will more than outweigh the additional burden of a disk write, then go ahead and cache on disk.
I'd be curious, however, to know more about why instantiating your objects is so costly that you would want to do this.
@Aiden Bell commented: "Errm. Depends on the object. If one instance can be shared then you save storage ... but it depends on whether the object is altered by the user. You will be reading/writing the object, creating a bottleneck. Without more information, you will have to look up the theory here."
These instances don't have data that can be altered by the user; however, they would contain data that would be used by users throughout the site.