I beleive the standard way to cache something with mamcached is to insert the object into the cache for a set period of time e.g.
memcli:set(key, rows_array, 5 * 60)
Is there not a better way to cache things where the cache will check inside the database to see if the data has changed rather than relying on a timer which could cause sync issues?
I'm going to use PHP.
The cache will not check the database, because that is contrary to the idea of caching.
Instead you could either update or remove objects from the cache when you change their value in the database.
If data is subject to be modified in the database, not controlled by your application where you can implement your cache in write-through fashion, then that data is probably not a good candidate for caching (assuming you can live with the stale data until it is evicted).
The cache certainly does not check the database. You have to do that yourself. You know how the tables in the database are related to each other and what tables you need to pull information from to render a page. Do that (or don't if you are satisfied with what you told mamcached).
Is there not a better way to cache
things where the cache will check
inside the database to see if the data
has changed rather than relying on a
timer which could cause sync issues?
That timer is not because of checking of database, but to free up memory(evicting data from the cache).
From google app engine(python):
Memcache is typically used with the
following pattern: The application
receives a query from the user or the
application. The application checks
whether the data needed to satisfy
that query is in memcache. If the data
is in memcache, the application uses
that data. If the data is not in
memcache, the application queries the
datastore and stores the results in
memcache for future requests. The
pseudocode below represents a typical
memcache request:
def get_data():
data = memcache.get("key")
if data is not None:
return data
else:
data = self.query_for_data()
memcache.add("key", data, 60)
return data
On update of key(database) you will also have to do an update to memcache key.
You may want to consider using redis. While not persistant (semi), this will offer some data storage mechanism and has a fast performance.
http://www.redis.io
Related
I'm designing an application in PHP which involves Trie data structure.
For time efficient prefix search, I'm using Trie.
I'm constructing the Trie using records from the database.
Now, the database has millions of records. So it is not feasible to everytime create the Trie and then search in it, for every new user request.
Instead can I create the Trie only once and somehow store this information, such that it does not have to be re-created for every new user request, and then searching can be immediately done. Is there somehow I can cache the created Trie (not just for one user session, but for all user requests) using PHP?
Any help would be much appreciated.
You have a couple of standard options.
Cache the database result in memory, using a simple cache like memcached
Cache using Redis, perhaps taking advantage of some of its extra features. This might involve a process where you load the data into a structure in REDIS and have your trie search code work against Redis directly rather than the database result set.
In either case, you are going to cache the result for some period of time that is acceptable, and since the database result will be in memory in some form, there is no load placed on the RDBMS.
In your related question, you indicated that he raw serialized form of the variable would be about 200mb in size. That is well within the max object size (512mb) for Redis, but could be problematic for memcached. I personally use Redis for most app server caching these days.
Users in my webgame are having certain player information cached in the $_SESSION of PHP.
Each time they load the game it checks if the session exists, if not they get the player information from a MySQL database and then it gets stored in the $_SESSION.
Now my problem is, what if the player information gets updated by another process or player? They can't update the $_SESSION cache of the other player.
I know memcached is most probably the solution for this, but I'm not sure if I should take the time for something like this. $_SESSION cache is doing well for me, except for this.
I was thinking about creating a MySQL table for it which get read at every request and if there's a record for the player that it recreates the cache.
One other solution would be to create a file in a directory with the id of the player in the name of the file. Every request PHP will check with file_exist if it should clear the cache or not.
What would you guys do? It gets executed every request so it's pretty important to get this optimized.
From a design standpoint alone I'd avoid the file_exists and directory approach. Sure 'file_exists' is fast, but it won't scale well... What happens if a use changes their name?
If you're using APC (and you should) you could APC's user memory cache. As long as you're on a single server it should give you similar performance benifits as memcached without the need for a separate memory caching server process. If a user entry changes frequently, you could run into fragmemntation issues with APC though. In that case, time to bite the bullet and go with memcached--you can even store your session data in memcached for a performance boost.
Also, neither APC or your file_exists solution will scale to multiple load balanced servers--you'd need a DB solution or memcached for that.
The way you exposed it, is not about how fast is one vs the other, the SESSION approach is just not valid because of your concurrency issue.
If your data can change concurrently, then your data storage needs to be able to handle that concurrency and whatever caching layer you want to use needs to behave accordingly to the nature of your problem.
If it is only about cache, and you dont want to install memcache(d), you can go with a mysql table in memory. It is not as fast as memcached, but still a fine solution. And make sure to create proper indexes on all your tables (maybe that is the better solution, no cache, just select it from your table).
CREATE TABLE t (i INT) ENGINE = MEMORY;
Reading this brief example of using memcached with PHP, I was wondering how memcached knows when a request for data needs to actually come from the database instead of coming from the cache.
It doesn't. It comes down to your caching strategy. That is so with all forms of cache, a tradeoff between getting the latest data and getting some data quickly. If you need to have the data up to date, invalidate (delete) the cache when updating the original. If performance is more important, let the cache expire by itself, at which point it will be renewed. Or something somewhere in-between. It depends on your restrictions and goals.
It doesn't, you code does this. In most cases you will do something like this:
key = /* build cache key somehow */
data = memcache.get(key)
if data is null:
data = /* read data from database */
cached.set(key, data)
// now you can use the data
I think you need to program that logic.
e.g. When you update the database then update the memcached value associated with that key, or make that key expire.
I'm creating a web service that often scrapes data from remote web pages. After scraping this data, I have a simple multidimensional array of information to use. The scraping process is fairly taxing on my server, and the page load takes a while. I was considering adding a simple cache system using a MySQL database, where I create one row per remote web page with a the array of information pulled from it stored as a JSON encoded string. Is this a good enough system? Or would something like a text file per web page be a better idea?
Since you're scraping multiple web pages, and you want to your data to be persistently cached, you have a few options -- the best of which would be to use memcache or a database such as MySQL. Using text files is not a good idea, because you would have to serialize / deserialize your data, and read from your filesystem. To query a database or a memcache is many times more efficient.
Since you're probably looking for your cache to be somewhat persistent, I would suggest going with MySQL. You would simply create a table that has an auto-incrementing primary key, which a column for each element in your parsed JSON object. (Note that MySQL currently does not support arrays. In order to emulate them, you will need to use relational tables, or serialize your array data and provide it to a text field. The former method is preferred).
Every time you scrape a page, you would run an UPDATE statement to update that individual page's information in the database. If you specify a unique index on whatever you use to uniquely identify your page (URL / etc), you will achieve optimal look-up performance.
If you're looking to store the cache locally on 1 server (e.g. if your mysql server and http server are on the same box), you might be better off using APC, which is a cache service that comes with PHP.
If you're looking to store the data remotely (e.g. a dedicated cache box) then I would go with Memcache instead of MySQL.
"When all you have is a hammer ..."
I don;'t tend to have particularly large APC configs, 64 - 128MB max. Memcache can go to a couple of gigabytes or maybe more (far more if you run multiple instances). Both are also transient - a restart of Apache, or Memcache (the the latter is slightly less likely, or often) will lose the data
It depends then, on how often you are willing to process the data to produce the cache, and how long that cache could otherwise be useful for. If it was good for weeks before you re-scraped the pages - Mysql is a entirely suitable backing store.
Potential pther options, depending on how many items are being cached & how big the data is, are, as you suggest, a file-based cache, SQlite, or other systems.
Is it "better" (more efficient, faster, more secure, etc) to (A) cache data that is used on every page load in the $_SESSION array (but still querying a table for a flag to reload the data fresh), or (B) to load it from the database each time?
I'm using the cache method (A), but I'm worried that with hundreds of users, memory could become an issue? It's just simple data, like firstname, lastname, birthday, etc.
With either method, there's still a query being run. Thoughts?
If your data is used on every pages, and is the same for all users, I wouldn't cache it in $_SESSION (which means having a different copy of that data for each user), but with another mecanism, like :
file
In memory, with APC for instance (if only 1 server)
In memory, with memcached, for instance (if you have several servers)
If your data requires long calculations or several DB queries to be obtained, caching it in database could be another possibility (would mean only 1 query to fetch back, and less calculations)
If your data is not the same for each user (which seems to be the case in your situation, as you are caching names, birthdates, ...) :
I would make sure I only cache what is necessary
Once you only have a few data to cache, putting it in session should be quite OK
If you really have that many users, you'll probably have some other scalability problems, and will most likely come to use something like memcached anyway ; which means you'll have some other way of caching ;-)
As a sidenote : if you are doing the same query over and over again, you DB server should cache it by itself (for MySQL, it would go into the "query cache") ; so, it would not be as bad as you think, I suppose -- even if not that much optimized ^^
It depends on what you're session handler is. Your session handler could be MySQL, and thus the question would not be which is better, but how to optimize your session handling.
The default PHP session handler is files, but it can be changed to mysql quite easily.
If you're talking about non-user specific data, then just save it to the DB. Worry about optimizing if you run into problems later. It is usually much more beneficial to use a better design pattern then thinking about optimizing before hand. Design your code so you can easily use a different handler for storage, and you won't have optimizing problems later.
If it is user specific, use the session, but use an appropriate session handler if necessary.