Long story short, I am looking for the best way to quickly and efficently store, mostly, boolean variables, like:
Has current user viewed this page? (Boolean)
Has current user voted for this page? (Boolean again)
How many times today this user got points for voting? (Integer)
These variables are going to be stored only for ONE day, that is at midnight each day they will be removed.
I can think of five ways to accomplish this, but I don't know how to properly speedtest them, so I could certainly use some help with this.
1. Single File - Single Variable
The first idea is to store some variable in a file like this <?php $___XYZ = true;, then include it and return $___XYZ. The problem is, most likely there are going to be hundreds of these variables and this can take potentially a lot of space (since, correct me if I am wrong, each file takes minimum ~4KB of space, depending on partition format). Big plus is ease of access, easy to work with, and easy to clear the whole thing at the beginning of a day (just delete the whole folder with contents). Any problems with speed of access?
2. Single File - Many Variables
I could store groups of variables in one file, in such fashion:
0:1
1:1
14:0
154:0
Then use fgets to find and read the variable but what about writing mid-file? Can fwrite be used effectively? I am not really confident this way is much better than 1., but what do you think?
3. APC
Use apc_store and others to store, modify and access the data. I have three concerns here - I read somewhere that enabling APC can seriously slow down your site, that there are sometimes strange problems with caching, and am curious about how to effectively remove only the "daily" cache, and leave anything else I might have cached? And how fine is it with hundreds of thousands variables stored in it?
4. MySQL Database
I could create a table with two rows (name and variable) but... I have this feeling it will be painfully slow when compared to any from the above options.
To sum it up - which of these ways to store variables in PHP is the best? Or maybe there is something even better?
For profiling, you can use Xdebug, which stores profiling informations in the defined folder, and use webgrind to view the profiling data.
My settings in php.ini for xdebug:
zend_extension=C:/WEB/PHP-ts/php_xdebug-2.1.0-5.3-vc6.dll
xdebug.collect_params=4
xdebug.show_local_vars=on
xdebug.scream=1
xdebug.collect_vars=on
xdebug.dump_globals=on
xdebug.profiler_enable=1
xdebug.profiler_output_dir=C:/WEB/_profiler/
xdebug.profiler_output_name=cachegrind.%s.out
xdebug.collect_return=1
xdebug.collect_assignments=1
xdebug.show_mem_delta=1
And I found a blog post about cache performance comparison (but it's from 2006!):
Cache Type Cache Gets/sec
Array Cache 365000
APC Cache 98000
File Cache 27000
Memcached Cache (TCP/IP) 12200
MySQL Query Cache (TCP/IP) 9900
MySQL Query Cache (Unix Socket) 13500
Selecting from table (TCP/IP) 5100
Selecting from table (Unix Socket) 7400
What about memcached? It's really fast, and when you're just storing bools it all fits in memory no problem. It is definitely the fastest option of them all. At midnight, you can easily read out all the stats gathered during the day and clear the cache.
Use memcached, you can store variables in memory and set their expiry time, so they can be rotated like everyday.
Memcached is way faster than any other method you listed, if you're new to it, try this class i did.
My favorite way of doing this is a variation of #2, where I make an array out of the number pairs. Then it is easy to serialize the array and save it to a file.
For your application this has a disadvantage if multiple visitors/processes need access to the array at the same time. Perhaps there is a way around that by using a separate file for each user.
I would opt for mySQL and/or an in memory cache like APC or memcache.
Honestly, a properly indexed database table will probably be plenty fast for most operations. And you can easily clear all the records via a DELETE statement comparing timestamps.
It will definitely be faster than home-brew solution on the filesystem. And assuming you're already using mySQL for the rest of your site, you don't need to worry about an extra storage layer.
EDIT: I'd also like to point out that memory is volatile. Should your server lose power, your data will disappear if it's not persisted somewhere (like a database).
Related
Users in my webgame are having certain player information cached in the $_SESSION of PHP.
Each time they load the game it checks if the session exists, if not they get the player information from a MySQL database and then it gets stored in the $_SESSION.
Now my problem is, what if the player information gets updated by another process or player? They can't update the $_SESSION cache of the other player.
I know memcached is most probably the solution for this, but I'm not sure if I should take the time for something like this. $_SESSION cache is doing well for me, except for this.
I was thinking about creating a MySQL table for it which get read at every request and if there's a record for the player that it recreates the cache.
One other solution would be to create a file in a directory with the id of the player in the name of the file. Every request PHP will check with file_exist if it should clear the cache or not.
What would you guys do? It gets executed every request so it's pretty important to get this optimized.
From a design standpoint alone I'd avoid the file_exists and directory approach. Sure 'file_exists' is fast, but it won't scale well... What happens if a use changes their name?
If you're using APC (and you should) you could APC's user memory cache. As long as you're on a single server it should give you similar performance benifits as memcached without the need for a separate memory caching server process. If a user entry changes frequently, you could run into fragmemntation issues with APC though. In that case, time to bite the bullet and go with memcached--you can even store your session data in memcached for a performance boost.
Also, neither APC or your file_exists solution will scale to multiple load balanced servers--you'd need a DB solution or memcached for that.
The way you exposed it, is not about how fast is one vs the other, the SESSION approach is just not valid because of your concurrency issue.
If your data can change concurrently, then your data storage needs to be able to handle that concurrency and whatever caching layer you want to use needs to behave accordingly to the nature of your problem.
If it is only about cache, and you dont want to install memcache(d), you can go with a mysql table in memory. It is not as fast as memcached, but still a fine solution. And make sure to create proper indexes on all your tables (maybe that is the better solution, no cache, just select it from your table).
CREATE TABLE t (i INT) ENGINE = MEMORY;
I have searched for a few hours already but have found nothing on the subject.
I am developing a website that depends on a query to define the elements that must be loaded on the page. But to organize the data, I must repass the result of this query 4 times.
At first try, I started using mysql_data_seek so I could repass the query, but I started losing performance. Due to this, I tried exchanging the mysql_data_seek for putting the data in an array and running a foreach loop.
The performance didn't improve in any way I could measure, so I started wondering which is, in fact, the best option. Building a rather big data array ou executing multiple times the mysql_fetch_array.
My application is currently running with PHP 5.2.17, MySQL, and everything is in a localhost. Unfortunatly, I have a busy database, but never have had any problems with the number of connections to it.
Is there some preferable way to execute this task? Is there any other option besides mysql_data_seek or the big array data? Has anyone some information regarding benchmarking testes of these options?
Thank you very much for your time.
The answer to your problem may lie in indexing appropriate fields in your database, most databases also cache frequently served queries but they do tend to discard them once the table they go over is altered. (which makes sense)
So you could trust in your database to do what it does well: query for and retrieve data and help it by making sure there's little contention on the table and/or placing appropriate indexes. This in turn can however alter the performance of writes which may not be unimportant in your case, only you really can judge that. (indexes have to be calculated and kept).
The PHP extension you use will play a part as well, if speed is of the essence: 'upgrade' to mysqli or pdo and do a ->fetch_all(), since it will cut down on communication between php process and the database server. The only reason against this would be if the amount of data you query is so enormous that it halts or bogs down your php/webserver processes or even your whole server by forcing it into swap.
The table type you use can be of importance, certain types of queries seem to run faster on MYISAM as opposed to INNODB. If you want to retool a bit then you could store this data (or a copy of it) in mysql's HEAP engine, so just in memory. You'd need to be careful to synchronize it with a disktable on writes though if you want to keep altered data for sure. (just in case of a server failure or shutdown)
Alternatively you could cache your data in something like memcache or by using apc_store, which should be very fast since it's in php process memory. The big caveat here is that APC generally has less memory available for storage though.(default being 32MB) Memcache's big adavantage is that while still fast, it's distributed, so if you have multiple servers running they could share this data.
You could try a nosql database, preferably one that's just a key-store, not even a document store, such as redis.
And finally you could hardcode your values in your php script, make sure to still use something like eaccelerator or APC and verify wether you really need to use them 4 times or wether you can't just cache the output of whatever it is you actually create with it.
So I'm sorry I can't give you a ready-made answer but performance questions, when applicable, usually require a multi-pronged approach. :-|
I don't really have any experience with caching at all, so this may seem like a stupid question, but how do you know when to cache your data? I wasn't even able to find one site that talked about this, but it may just be my searching skills or maybe too many variables to consider?
I will most likely be using APC. Does anyone have any examples of what would be the least amount of data you would need in order to cache it? For example, let's say you have an array with 100 items and you use a foreach loop on it and perform some simple array manipulation, should you cache the result? How about if it had a 1000 items, 10000 items, etc.?
Should you be caching the results of your database query? What kind of queries should you be caching? I assume a simple select and maybe a couple joins statement to a mysql db doesn't need caching, or does it? Assuming the mysql query cache is turned on, does that mean you don't need to cache in the application layer, or should you still do it?
If you instantiate an object, should you cache it? How to determine whether it should be cached or not? So a general guide on what to cache would be nice, examples would also be really helpful, thanks.
When you're looking at caching data that has been read from the database in APC/memcache/WinCache/redis/etc, you should be aware that it will not be updated when the database is updated unless you explicitly code to keep the database and cache in synch. Therefore, caching is most effective when the data from the database doesn't change often, but also requires a more complex and/or expensive query to retrieve that data from the database (otherwise, you may as well read it from the database when you need it)... so expensive join queries that return the same data records whenever they're run are prime candidates.
And always test to see if queries are faster read from the database than from cache. Correct database indexing can vastly improve database access times, especially as most databases maintain their own internal cache as well, so don't use APC or equivalent to cache data unless the database overheads justify it.
You also need to be aware of space usage in the cache. Most caches are a fixed size and you don't want to overfill them... so don't use them to store large volumes of data. Use the apc.php script available with APC to monitor cache usage (though make sure that it's not publicly accessible to anybody and everybody that accesses your site.... bad security).
When holding objects in cache, the object will be serialized() when it's stored, and unserialized() when it's retrieved, so there is an overhead. Objects with resource attributes will lose that resource; so don't store your database access objects.
It's sensible only to use cache to store information that is accessed by many/all users, rather than user-specific data. For user session information, stick with normal PHP sessions.
The simple answer is that you cache data when things get slow. Obviously for any medium to large sized application, you need to do much more planning than just a wait and see approach. But for the vast majority of websites out there, the question to ask yourself is "Are you happy with the load time". Of course if you are obsessive about load time, like myself, you are going to want to try to make it even faster regardless.
Next, you have to identify what specifically is the cause of the slowness. You assumed that your application code was the source but its worth examining if there are other external factors such as large page file size, excessive requests, no gzip, etc. Use a site like http://tools.pingdom.com/ or an extension like yslow as a start for that. (quick tip make sure keepalives and gzip are working).
Assuming the problem is the duration of execution of your application code, you are going to want to profile your code with something like xdebug (http://www.xdebug.org/) and view the output with kcachegrind or wincachegrind. That will let you know what parts of your code are taking long to run. From there you will make decisions on what to cache and how to cache it (or make improvements in the logic of your code).
There are so many possibilities for what the problem could be and the associated solutions, that it is not worth me guessing. So, once you identify the problem you may want to post a new question related to solving that specific problem. I will say that if not used properly, the mysql query cache can be counter productive. Also, I generally avoid the APC user cache in favor of memcached.
I have some small sets of data from the database (mysql) who are seldom updated.
Basically 3 or 4 small bi dimensional arrays (50-200 items).
This is the ideal case for memcached, but I'm on a shared server and can't install anything.
I only have PHP and MySQL.
I'm thinking about storing the arrays on file and regenerate the file via a cron job every 2-3 hours.
Any better idea or suggestion about this approach?
What's the best way to store those arrays?
If you're working with an overworked MySQL server then yes, cache that data into a file. Then you have two ways to update your cache: either via a cron job, unconditionally, every N minutes (I wouldn't update it less frequently than every hour) or everytime the data changes. The best approach depends on your specific situation. In general, the cron job way is the simplest but the on-change way pretty much guarantees that you won't ever use stale data.
As for the storage format, you could just serialize() the array and save the string to a file. With big arrays, unserialize() is faster than a big array(...) declaration.
As said in the comments, it would be better to check whether the root of the problem can't be fixed first. A roundtrip that long sounds like a network configuration problem.
Otherwise, if the DB simply is that slow, nothing speaks against a filesystem based cache. You could turn each query into an md5() hash, and use that as a file name. Serialize() the result set into the file and fetch it from there. Use filemtime() to determine whether the cache file is older than x hours. If it is, regenerate the query - or in fact, to avoid locking problems on the cache files, use a cron job to regenerate it.
Just note that this way, you would be dealing with whole result sets that you have to load into your script's memory all at once. You wouldn't have the advantage of being able to query a result set row by row. This can be done too in a cached way, but it's more complicated.
My english is not good, sorry.
Some times I have read about any alternative to memcache. Is complex, but I think that you can use http://www.php.net/manual/en/ref.sem.php acceding to shared memory.
A simple class example used for storing data is here:
http://apuntesytrucosdeprogramacion.blogspot.com/2007/12/php-variables-en-memoria-compartida.html
Is written in spanish, sorry, but the code is easy to understand (Eliminar=delete)
I never have test this code!! and I don't know if it's viable in a shared server.
First of all, the website I run is hosted and I don't have access to be able to install anything interesting like memcached.
I have several web pages displaying HTML tables. The data for these HTML tables are generated using expensive and complex MySQL queries. I've optimized the queries as far as I can, and put indexes in place to improve performance. The problem is if I have high traffic to my site the MySQL server gets hammered, and struggles.
Interestingly - the data within the MySQL tables doesn't change very often. In fact it changes only after a certain 'event' that takes place every few weeks.
So what I have done now is this:
Save the HTML table once generated to a file
When the URL is accessed check the saved file if it exists
If the file is older than 1hr, run the query and save a new file, if not output the file
This ensures that for the vast majority of requests the page loads very fast, and the data can at most be 1hr old. For my purpose this isn't too bad.
What I would really like is to guarantee that if any data changes in the database, the cache file is deleted. This could be done by finding all scripts that do any change queries on the table and adding code to remove the cache file, but it's flimsy as all future changes need to also take care of this mechanism.
Is there an elegant way to do this?
I don't have anything but vanilla PHP and MySQL (recent versions) - I'd like to play with memcached, but I can't.
Ok - serious answer.
If you have any sort of database abstraction layer (hopefully you will), you could maintain a field in the database for the last time anything was updated, and manage that from a single point in your abstraction layer.
e.g. (pseudocode): On any update set last_updated.value = Time.now()
Then compare this to the time of the cached file at runtime to see if you need to re-query.
If you don't have an abstraction layer, create a wrapper function to any SQL update call that does this, and always use the wrapper function for any future functionality.
There are only two hard things in
Computer Science: cache invalidation
and naming things.
—Phil Karlton
Sorry, doesn't help much, but it is sooooo true.
You have most of the ends covered, but a last_modified field and cron job might help.
There's no way of deleting files from MySQL, Postgres would give you that facility, but MySQL can't.
You can cache your output to a string using PHP's output buffering functions. Google it and you'll find a nice collection of websites explaining how this is done.
I'm wondering however, how do you know that the data expires after an hour? Or are you assuming the data wont change that dramatically in 60 minutes to warrant constant page generation?