I'm creating a web app using PHP and PostgreSQL. The database is quite big, and some kinds of searches take a long time to process.
Users can tolerate the wait on the first search, but it's painful when they paginate. Can I store the result set in session variables?
That would make pagination much easier to deal with.
Yes, of course you can, but with a lot of different result sets, loading that amount of data on every session_start can become cumbersome. Also, the same search could be run by more than one user, resulting in duplicate (time-consuming) retrievals, large sessions (every page load takes longer), and more storage space.
If the data is safe enough from alteration to store in a session, look at caches like APC or Memcached to store the result under a meaningful key. That way, the load is lifted from the database, and searches can be shared amongst users.
If you have a fixed number of items per page, you could even consider storing the different 'pages' with a prefix or postfix in the key, so you don't have to select a subset of the result every time (e.g. store with a key like 'search:foo=bar|1' / 'search:foo=bar|2', etc.).
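A rough sketch of that idea using the Memcached extension and PDO: the cache key embeds both the (hashed) search term and the page number, so each page of a shared search is fetched or built independently. The table name, column names, TTL, and page size here are assumptions, not part of the original question.

<?php
// Per-page result caching with the Memcached extension (sketch).
// Key format: 'search:<md5 of term>|<page>' so pages can be fetched
// individually and shared between users running the same search.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

function cachedSearchPage(Memcached $memcached, PDO $pdo, string $term, int $page, int $perPage = 25): array
{
    $key = 'search:' . md5($term) . '|' . $page;

    $rows = $memcached->get($key);
    if ($rows !== false) {
        return $rows;                              // cache hit: no database work
    }

    // Cache miss: run the expensive query for just this page.
    $stmt = $pdo->prepare(
        'SELECT * FROM items WHERE name ILIKE :term ORDER BY name LIMIT :limit OFFSET :offset'
    );
    $stmt->bindValue(':term', '%' . $term . '%');
    $stmt->bindValue(':limit', $perPage, PDO::PARAM_INT);
    $stmt->bindValue(':offset', ($page - 1) * $perPage, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $memcached->set($key, $rows, 600);             // share this page for 10 minutes
    return $rows;
}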
Related
I'm designing an application in PHP which involves Trie data structure.
For time efficient prefix search, I'm using Trie.
I'm constructing the Trie using records from the database.
Now, the database has millions of records, so it is not feasible to re-create the Trie and search it for every new user request.
Instead, can I create the Trie only once and store it somehow, so that it does not have to be re-created for every new user request and searching can be done immediately? Is there some way I can cache the created Trie (not just for one user session, but across all user requests) using PHP?
Any help would be much appreciated.
You have a couple of standard options.
Cache the database result in memory, using a simple cache like memcached
Cache using Redis, perhaps taking advantage of some of its extra features. This might involve a process where you load the data into a structure in Redis and have your trie search code work against Redis directly rather than against the database result set.
In either case, you are going to cache the result for some period of time that is acceptable, and since the database result will be in memory in some form, there is no load placed on the RDBMS.
In your related question, you indicated that the raw serialized form of the variable would be about 200 MB in size. That is well within the max object size (512 MB) for Redis, but could be problematic for memcached. I personally use Redis for most app server caching these days.
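A minimal sketch of the first option applied to the Trie, assuming the phpredis extension; buildTrieFromDatabase() and searchPrefix() are hypothetical stand-ins for your own construction and lookup code:

<?php
// Build the trie once, keep the serialized blob in Redis, and let later
// requests skip the expensive rebuild (sketch; key name and TTL are arbitrary).
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$blob = $redis->get('prefix_trie');
if ($blob !== false) {
    $trie = unserialize($blob);                 // a ~200 MB blob deserializes slowly; measure this
} else {
    $trie = buildTrieFromDatabase();            // hypothetical: your existing construction code
    $redis->setex('prefix_trie', 3600, serialize($trie));   // rebuild at most once an hour
}

$matches = $trie->searchPrefix('foo');          // hypothetical trie lookup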
In PHP, is array_slice() good enough for processing a large data array that cannot be paginated in the database, since it's not stored in a table but calculated from other DB tables?
Anyway, I have an array of around 50K elements, which might grow later. On the first page load it fetches all 50K records, then slices them for AJAX-based pagination.
Will this cause heavy server load in the future, since all records are fetched on page load?
First of all, it's a bad idea to create an array containing 50K elements, especially since that number can increase; it may eat all your memory under high traffic.
Also, where do you store the sliced parts of the array for use in the AJAX requests?
I think (if you cannot set a LIMIT in the query) you can create an additional table in which you store the computed data (populated by cron, for example) and show users data from it using LIMIT for pagination. Alternatively, you can create a caching layer (or use an existing caching system: file cache, PHP memcache, ...) and write some logic for updating the cache (it depends on your program logic).
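As one illustration of the caching-layer route, here is a file-based sketch: the expensive calculation runs at most once per hour, and the AJAX endpoint only slices the cached copy. computeRowsFromOtherTables() is a placeholder for the existing calculation, and the path and TTL are assumptions.

<?php
// File-based cache for the computed array; pagination slices the cached copy.
function getPage(int $page, int $perPage = 50): array
{
    $cacheFile = sys_get_temp_dir() . '/computed_rows.cache';

    if (is_file($cacheFile) && filemtime($cacheFile) > time() - 3600) {
        $rows = unserialize(file_get_contents($cacheFile));
    } else {
        $rows = computeRowsFromOtherTables();             // placeholder for the heavy calculation
        file_put_contents($cacheFile, serialize($rows), LOCK_EX);
    }

    return array_slice($rows, ($page - 1) * $perPage, $perPage);
}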
I'm creating a web service that often scrapes data from remote web pages. After scraping this data, I have a simple multidimensional array of information to use. The scraping process is fairly taxing on my server, and the page load takes a while. I was considering adding a simple cache system using a MySQL database, where I create one row per remote web page, with the array of information pulled from it stored as a JSON-encoded string. Is this a good enough system? Or would something like a text file per web page be a better idea?
Since you're scraping multiple web pages and you want your data to be persistently cached, you have a few options -- the best of which would be to use memcache or a database such as MySQL. Using text files is not a good idea, because you would have to serialize / deserialize your data and read from your filesystem. Querying a database or memcache is many times more efficient.
Since you're probably looking for your cache to be somewhat persistent, I would suggest going with MySQL. You would simply create a table that has an auto-incrementing primary key, with a column for each element in your parsed JSON object. (Note that MySQL currently does not support arrays. To emulate them, you will need to use relational tables, or serialize your array data and store it in a text field. The former method is preferred.)
Every time you scrape a page, you would run an UPDATE statement to update that individual page's information in the database. If you specify a unique index on whatever you use to uniquely identify your page (URL / etc), you will achieve optimal look-up performance.
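As a loose illustration of that layout (table and column names are assumptions, and for brevity the JSON is kept in a single text column rather than one column per element), an INSERT ... ON DUPLICATE KEY UPDATE against a unique URL index handles both first-time scrapes and refreshes:

<?php
// Sketch of a scrape cache in MySQL: one row per remote page, keyed by URL.
//
// CREATE TABLE page_cache (
//     id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
//     url        VARCHAR(255) NOT NULL,
//     payload    TEXT NOT NULL,          -- JSON-encoded scrape result
//     scraped_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
//                  ON UPDATE CURRENT_TIMESTAMP,
//     UNIQUE KEY uniq_url (url)
// );

function storeScrape(PDO $pdo, string $url, array $data): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO page_cache (url, payload) VALUES (:url, :payload)
         ON DUPLICATE KEY UPDATE payload = VALUES(payload)'
    );
    $stmt->execute([':url' => $url, ':payload' => json_encode($data)]);
}

function fetchScrape(PDO $pdo, string $url): ?array
{
    $stmt = $pdo->prepare('SELECT payload FROM page_cache WHERE url = :url');
    $stmt->execute([':url' => $url]);
    $payload = $stmt->fetchColumn();
    return $payload === false ? null : json_decode($payload, true);
}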
If you're looking to store the cache locally on one server (e.g. if your MySQL server and HTTP server are on the same box), you might be better off using APC, which is a caching extension for PHP.
If you're looking to store the data remotely (e.g. a dedicated cache box) then I would go with Memcache instead of MySQL.
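A tiny sketch of the local (same-box) option: APC as a read-through cache in front of the scraper. scrapePage() is a placeholder for your own scraping code, and on current PHP installs the equivalent calls are apcu_fetch() / apcu_store().

<?php
// APC read-through cache: return the cached scrape if present, otherwise
// scrape, store, and return (sketch).
function getPageData(string $url): array
{
    $key = 'scrape:' . md5($url);

    $data = apc_fetch($key, $hit);
    if ($hit) {
        return $data;
    }

    $data = scrapePage($url);          // placeholder: your existing scraping code
    apc_store($key, $data, 86400);     // keep for a day; tune to how stale you can tolerate
    return $data;
}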
"When all you have is a hammer ..."
I don't tend to have particularly large APC configs, 64-128 MB max. Memcache can go to a couple of gigabytes or maybe more (far more if you run multiple instances). Both are also transient: a restart of Apache, or of Memcache (the latter is less likely, or less frequent), will lose the data.
It depends, then, on how often you are willing to process the data to produce the cache, and how long that cache could otherwise be useful for. If it is good for weeks before you re-scrape the pages, MySQL is an entirely suitable backing store.
Potential other options, depending on how many items are being cached and how big the data is, are, as you suggest, a file-based cache, SQLite, or other systems.
I have a large table and I'd like to store the results in Memcache. I have Memcache set up on my server, but there doesn't seem to be any useful documentation (that I can find) on how to efficiently transfer large amounts of data. The only way I can currently think of is to write a MySQL query that grabs the key and value of the table and then saves that in Memcache. It's not a particularly scalable solution (especially when my query generates a few hundred thousand rows). Any advice on how to do this?
EDIT: there is some confusion about what I'm attempting to do. Let's say that I have a table with two fields (key and value). I am pulling in information on the fly and have to match it to the key and return the value. I'd like to avoid having to execute ~1000 queries per page load. Memcache seems like a perfect alternative because it's set up to use key/value pairs. Let's say this table has 100K rows. The only way that I know to get that data from the DB table to Memcache is to run a query that loops through every row in the table and creates an individual Memcache entry.
Questions: Is this a good way to use memcache? If yes, is there a better way to transfer my table?
You can actually pull all the rows into an array and store the array in memcache:
memcache_set($memcache_obj, 'var_key', $your_array);
But you have to remember a few things:
PHP will serialize/unserialize the array from memcache, so if you have many rows it might be slower than actually querying the DB.
You cannot do any filtering (no SQL); if you want to filter some items, you have to implement the filter yourself, and it would probably perform worse than the DB engine.
memcache won't store more than 1 megabyte per item by default (a per-row alternative is sketched after this answer).
I don't know what you're trying to achieve, but the general uses of memcache are:
storing the result of SQL / time-consuming processing, where the number of resulting rows is small
storing pre-created (X)HTML blobs to avoid DB access
user session storage
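If those caveats (serialization cost, the 1 MB item limit) rule out caching the whole 100K-row array as one value, a per-row read-through cache is the usual alternative: look the key up in Memcache first, fall back to the table on a miss, and warm the cache as you go. This is a sketch using the same procedural Memcache API as above; the table and column names are assumptions.

<?php
// Per-row read-through cache: one Memcache entry per key, populated lazily.
$memcache_obj = memcache_connect('127.0.0.1', 11211);

function lookupValue($memcache_obj, PDO $pdo, string $key): ?string
{
    $cacheKey = 'kv:' . $key;

    $value = memcache_get($memcache_obj, $cacheKey);
    if ($value !== false) {
        return $value;                               // hit: no query issued
    }

    // Miss: fetch the single row and warm the cache for next time.
    $stmt = $pdo->prepare('SELECT `value` FROM kv_table WHERE `key` = :k');
    $stmt->execute([':k' => $key]);
    $value = $stmt->fetchColumn();
    if ($value === false) {
        return null;                                 // unknown key
    }

    memcache_set($memcache_obj, $cacheKey, $value, 0, 3600);   // cache for an hour
    return $value;
}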
Russ,
It sounds almost as if using a MySQL table with the storage engine set to MEMORY might be your way to go.
A RAM-based table gives you the flexibility of using SQL, and also prevents disk thrashing due to a large number of reads/writes (like memcached).
However, a RAM-based table is very volatile. If anything is stored in the table and not flushed to a disk-based table, and you lose power... well, you just lost your data. That being said, ensure you flush to a real disk-based table every once in a while.
Another plus of using MEMORY tables is that you can store most of the typical MySQL data types (though not BLOB or TEXT columns), so there is no 1 MB size limit.
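A rough sketch of that setup, assuming MySQL via PDO; the table names and the cron-driven flush are illustrative assumptions:

<?php
// MEMORY-table approach: keep a RAM copy for fast key/value reads and
// periodically flush it back to a durable, disk-based table.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// RAM-backed copy (note: MEMORY tables take VARCHAR/INT etc., not BLOB/TEXT).
$pdo->exec('CREATE TABLE IF NOT EXISTS kv_mem (
    `key`   VARCHAR(64)  NOT NULL PRIMARY KEY,
    `value` VARCHAR(255) NOT NULL
) ENGINE=MEMORY');

// Periodic flush (run from cron, for example) into the disk-based table kv_disk.
$pdo->exec('REPLACE INTO kv_disk (`key`, `value`) SELECT `key`, `value` FROM kv_mem');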
Within a PHP/MySQL system we have a number of configuration values (approx. 200). These are mostly booleans or ints, and store things such as the number of results per page and whether pages have 2 or 3 columns. This is all stored in a single MySQL table, and we use a single function to return these values as they are requested; on certain page loads there can be up to around 100 requests to this config table. With the number of sites using this system, that means potentially thousands of requests each second to retrieve these values. The question is whether this method makes sense, or whether it would be preferable to perform a single request per page, store all the configs in an array, and retrieve them from there each time instead.
Use a cache such as memcache, APC, or any other. Load the settings once, cache them, and share them across requests with a singleton object.
Even if the query is served from the query cache, it's a waste of time and resources to query the database over and over. Instead, on any request that modifies the values, invalidate the in-memory cache so it is reloaded the next time someone requests a value from it.
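A minimal singleton along those lines, using APC (apcu_* on current PHP); the cache key and the layout of the settings table are assumptions:

<?php
// Loads all ~200 settings with one query, keeps them in APC, and invalidates
// the cached copy whenever a value is changed (sketch).
class Config
{
    private static $instance = null;
    private $values = [];

    public static function instance(PDO $pdo)
    {
        if (self::$instance === null) {
            self::$instance = new self($pdo);
        }
        return self::$instance;
    }

    private function __construct(PDO $pdo)
    {
        $cached = apc_fetch('app_config', $hit);
        if ($hit) {
            $this->values = $cached;
            return;
        }
        // One query per cold cache instead of ~100 queries per page load.
        foreach ($pdo->query('SELECT name, value FROM config') as $row) {
            $this->values[$row['name']] = $row['value'];
        }
        apc_store('app_config', $this->values);
    }

    public function get(string $name, $default = null)
    {
        return isset($this->values[$name]) ? $this->values[$name] : $default;
    }

    // Call this after any UPDATE to the config table so the next request reloads.
    public static function invalidate()
    {
        apc_delete('app_config');
    }
}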
If you enable the MySQL query cache, the query that selects your values will be cached in memory, and MySQL will give an instant answer from memory unless the query or the data in the underlying tables changes.
This is excellent both for performance and for manageability.
The query results may be reused between sessions: that means, if you have 1000 sessions, you don't need to keep 1000 copies of your data.
You might want to consider using memcache for this. I think it would be faster than multiple DB queries (even with query caching on), and you won't need a database connection to get them.
You might want to consider just loading them from a flat file into memory; this also gives you the opportunity to version-control your config values.
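For illustration, such a flat file is often just a PHP script returning an array (cheap to include once an opcode cache is warm); the path and setting names here are made up:

<?php
// --- config/settings.php: the version-controlled flat file -----------------
// return [
//     'results_per_page' => 25,
//     'columns'          => 3,
//     'show_sidebar'     => true,
// ];

// --- application code: load the whole array once per request ---------------
$config = include __DIR__ . '/config/settings.php';
echo $config['results_per_page'];   // 25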
I would definitely recommend memcache for this. We have a similar setup, and it has noticeably brought resource usage down on that server.