I have a large table and I'd like to store the results in Memcache. I have Memcache set up on my server, but there doesn't seem to be any useful documentation (that I can find) on how to efficiently transfer large amounts of data. The only way I can currently think of is to write a MySQL query that grabs the key and value of each row and then saves that in Memcache. It's not a particularly scalable solution (especially when my query generates a few hundred thousand rows). Any advice on how to do this?
EDIT: there is some confusion about what I'm attempting to do. Let's say that I have a table with two fields (key and value). I am pulling in information on the fly and have to match it to the key and return the value. I'd like to avoid having to execute ~1000 queries per page load. Memcache seems like a perfect alternative because it is built around key/value pairs. Let's say this table has 100K rows. The only way I know to get that data from the DB table to Memcache is to run a query that loops through every row in the table and creates an individual Memcache entry for each one.
Questions: Is this a good way to use memcache? If yes, is there a better way to transfer my table?
You can actually pull all the rows into an array and store the whole array in memcache:
memcache_set($memcache_obj, 'var_key', $your_array);
but you have to remember a few things:
PHP will serialize/unserialize the array going in and out of memcache, so if you have many rows it might be slower than actually querying the DB
you cannot do any filtering (no SQL); if you want to filter some items you have to implement the filter yourself, and it will probably perform worse than the DB engine
memcache won't store items larger than 1 megabyte ...
I don't know what you're trying to achieve, but the general uses of memcache are:
store the result of SQL or other time-consuming processing, as long as the number of resulting rows is small
store some pre-created (X)HTML blobs to avoid DB access
user session storage
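Given the 1 MB item limit and the serialization cost noted above, for the original 100K-row key/value table a read-through cache is often safer than pushing the whole table in at once: cache one memcache entry per key and fall back to the database on a miss. A minimal sketch, assuming the memcache extension, a mysqli handle in $db, and a hypothetical table named lookup with columns key_col and value_col:

<?php
// Read-through cache: one memcache entry per table key, filled lazily on a miss.
// Assumes the memcache extension, a mysqli handle in $db, and a hypothetical
// table `lookup` with columns `key_col` and `value_col`.
function lookup_value(Memcache $memcache, mysqli $db, $key)
{
    $cached = $memcache->get('lookup:' . $key);
    if ($cached !== false) {
        return $cached;                              // cache hit
    }

    $stmt = $db->prepare('SELECT value_col FROM lookup WHERE key_col = ?');
    $stmt->bind_param('s', $key);
    $stmt->execute();
    $stmt->bind_result($value);
    $found = $stmt->fetch();
    $stmt->close();

    if ($found) {
        // Cache for an hour; each entry stays far below the 1 MB item limit.
        $memcache->set('lookup:' . $key, $value, 0, 3600);
        return $value;
    }
    return null;                                     // not in the DB either
}

Nothing has to be bulk-loaded up front; hot keys end up in memcache by themselves, and the ~1000 lookups per page load mostly become cache hits.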
Russ,
It sounds almost as if using a MySQL table with the storage engine set to MEMORY might be your way to go.
A RAM-based table gives you the flexibility of using SQL, and also avoids disk thrashing from a large volume of reads/writes (much like memcached does).
However, a RAM-based table is very volatile. If anything is stored in the table and not flushed to a disk-based table, and you lose power... well, you just lost your data. That being said, make sure you flush to a real disk-based table every once in a while.
Also, another plus of using MEMORY tables is that you can store most of the typical MySQL data types (BLOB/TEXT columns being the exception), so you are not bound by memcached's 1MB item limit.
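A rough sketch of what that could look like, assuming a PDO connection in $pdo and hypothetical table names; the MEMORY copy is rebuilt from the disk-based table and the same SQL keeps working against it:

<?php
// Sketch: keep a RAM copy of a disk-based lookup table using the MEMORY engine.
// Assumes a PDO connection in $pdo and hypothetical tables `lookup` (InnoDB)
// and `lookup_ram` (MEMORY).
$pdo->exec("
    CREATE TABLE IF NOT EXISTS lookup_ram (
        key_col   VARCHAR(64)  NOT NULL PRIMARY KEY,
        value_col VARCHAR(255) NOT NULL
    ) ENGINE=MEMORY
");

// Rebuild the RAM copy from the persistent table (e.g. on deploy or via cron).
$pdo->exec("TRUNCATE lookup_ram");
$pdo->exec("INSERT INTO lookup_ram SELECT key_col, value_col FROM lookup");

// Reads now hit memory but still speak plain SQL.
$stmt = $pdo->prepare("SELECT value_col FROM lookup_ram WHERE key_col = ?");
$stmt->execute(array($key));
$value = $stmt->fetchColumn();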
I have the following web site:
The user inputs some data and, based on it, the server generates a lot of results that need to be displayed back to the user. I am calculating the data with PHP, storing it in a MySQL DB, and displaying it in DataTables with server-side processing. The data needs to be saved only for a limited time: on every whole hour the whole table is DROPPED and re-created.
The maximum observed load is 7000 sessions/users per day, with a maximum of 400 users at a single time. Every hour we have over 50 million records inserted into the main table. We are using a dedicated server with an Intel i7, 24GB of RAM, and an HDD.
The problem is that when more people (>100 at a time) use the site, MySQL cannot handle the load, and MySQL plus the hard disk become the bottleneck. The user has to wait minutes even for a few thousand results. The disk is an HDD and, for now, switching to an SSD is not an option.
The QUESTION(S):
Can replacing MySQL with Redis improve the performance and how much?
How do I store the produced data in Redis so I can retrieve it for one user, sort it by any of the values, and filter it?
I have the following data in PHP:
$user_data = array(
    array("id"=>1, "session"=>"3124", "set"=>"set1", "int1"=>1, "int2"=>11, "int3"=>111, "int4"=>1111),
    array("id"=>2, "session"=>"1287", "set"=>"set2", "int1"=>2, "int2"=>22, "int3"=>222, "int4"=>2222),
    // ...
);
$user_data can be an array with anywhere from 1 to 1-2 million entries (I am calculating it and inserting it into the DB in chunks of 10000).
I need to store data in Redis for at least 400 such users and be able to retrieve the data for a particular user in chunks of 10/20 for the pagination. I also need to be able to sort by any of the fields set (string), int1, int2... (I have around 22 int fields) and to filter by any of the integer fields (similar to an SQL WHERE clause: 9000 < int4 < 100000).
Also, can Redis do something similar to SQL's WHERE set LIKE '%value%'?
Redis is probably a good fit for your problem, if you can hold all your data in memory. But you must rethink your data structure: Redis is very different from a relational database, and there is no direct migration.
As for your questions:
It can probably help with performance; how much will depend on your use case and data structure. Your bottleneck will no longer be the hard disk, but it may become something else.
Redis has no concept similar to SQL's ORDER BY or WHERE. You will be responsible for maintaining your own indexes and filters.
I would create an HSET (hash) for every "record" and then use several ZSETs (sorted sets) to index those records (if you really need to order on any field, you'll need one ZSET per field; see the sketch below).
As for filters, the ZSETs used as indexes will probably be useful for filtering ranges of int values.
Unfortunately, for the LIKE query I really don't have an answer. When I need advanced search capabilities, I usually use ElasticSearch (in combination with Redis and/or MySQL).
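A minimal sketch of that layout, assuming the phpredis extension and hypothetical key names: one hash per record plus one sorted set per field you want to sort or filter on.

<?php
// Sketch: one hash per record, one sorted set per indexed field.
// Assumes the phpredis extension; key names are hypothetical.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

foreach ($user_data as $row) {
    $recordKey = "user:{$row['session']}:rec:{$row['id']}";
    $redis->hMSet($recordKey, $row);                 // the record itself

    // One ZSET per sortable/filterable field, scored by the field value.
    foreach (array('int1', 'int2', 'int3', 'int4') as $field) {
        $redis->zAdd("user:{$row['session']}:idx:$field", $row[$field], $recordKey);
    }
}

// Filter 9000 < int4 < 100000 for session 3124, first page of 20 records:
$keys = $redis->zRangeByScore('user:3124:idx:int4', '(9000', '(100000',
                              array('limit' => array(0, 20)));
$page = array();
foreach ($keys as $k) {
    $page[] = $redis->hGetAll($k);
}

Every extra indexed field costs memory and write time, so index only the fields you actually sort or filter on.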
1. Can replacing MySQL with Redis improve the performance and how much?
Yes, Redis can improve your basic read/write performance, because it stores the information directly in memory. This post describes a performance increase by a factor of 3, but the post dates from 2009, so the numbers may have changed since then.
However, this performance gain is only relevant as long as you have enough memory. Once you exceed the allotted amount of memory, your server will start swapping to disk, drastically reducing Redis performance.
Another thing to keep in mind is that information stored in Redis is not guaranteed to be durable by default: the data set is only snapshotted to disk periodically, according to the configured save rules (after a given number of key changes within a time window). Changes made since the last snapshot will be lost on a server restart or power loss.
2. How to store the produced data in redis, so i can retrieve it for 1 user and sort it by any of the values and filter it?
Redis is a data store with a different approach from traditional relational databases. It does not offer complex sorting, but basic sorting can be done through sorted sets and the SORT command; anything more complex will have to be done by the PHP server.
Redis does not have any searching support--it will have to be implemented by your PHP server.
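For illustration, here is roughly what the SORT command mentioned above can do when the record ids live in a set and the values live in per-record hashes. This is a sketch assuming the phpredis extension; the key names and fields are hypothetical.

<?php
// Sketch: order a set of record ids by a field stored in per-record hashes,
// using Redis SORT with BY/LIMIT (phpredis assumed; names are hypothetical).
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Suppose "user:3124:records" is a SET of record ids, and each record is a
// hash named "rec:<id>" with fields set, int1..int4.
$ids = $redis->sort('user:3124:records', array(
    'by'    => 'rec:*->int2',        // sort the ids by each record's int2 field
    'sort'  => 'desc',
    'limit' => array(0, 20),         // first page of 20
));

$records = array();
foreach ($ids as $id) {
    $records[] = $redis->hGetAll('rec:' . $id);
}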
3. Conclusion
In my opinion, the best way to handle what you are asking is to use a Redis server for caching and the MySQL server for storing information that needs to be persistent (if you don't have any information that has to be persistent, you can use just the Redis server).
You said that
The data needs to be saved for a limited time - on every whole hour
the whole table with it is DROPPED and re-created.
which is perfect for Redis. Redis supports a TTL through the EXPIRE command on keys, which automatically deletes a key after a set amount of time. This way you don't need to drop and re-create any tables--Redis does it for you.
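A quick sketch of that, assuming the phpredis extension and hypothetical key names; here each chunk of results is written with a TTL that runs out at the top of the next hour:

<?php
// Sketch: let Redis expire the data instead of dropping/recreating a table.
// Assumes the phpredis extension; key names are hypothetical.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$secondsToNextHour = 3600 - (time() % 3600);

$key = 'results:session:3124:page:1';
$redis->set($key, serialize($pageRows));      // $pageRows produced by your PHP
$redis->expire($key, $secondsToNextHour);     // deleted automatically on the hour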
In PHP, is array_slice() good enough for processing a large data array that cannot be paginated in the database, since it is not stored there but calculated from other DB tables?
Anyway, I have an array of around 50k entries, which might grow later. On first page load it fetches all 50k records and then slices them for AJAX-based pagination.
Will this cause server load problems in the future, since all records are fetched on every page load?
First of all, it's a bad idea to create an array containing 50K entries, especially since it can grow; it may eat all your memory under high traffic.
Also, where do you store the sliced parts of the array for use in the AJAX requests?
I think (if you cannot set a LIMIT in the query) you can create an additional table in which you store your data (refreshed with cron, for example) and show users data from it using LIMIT for pagination. Alternatively, you can create a caching layer (or use an existing caching system: file cache, PHP memcache, ...) and write some algorithm for updating the cache (it depends on your program logic).
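A sketch of the first option, assuming PDO in $pdo and a hypothetical pre-computed table report_cache(user_id, position, payload) refreshed by cron; each AJAX request then pulls only one page:

<?php
// Sketch: page from a pre-computed cache table with LIMIT/OFFSET instead of
// slicing a 50K-element PHP array. Assumes PDO in $pdo and a hypothetical
// table `report_cache(user_id, position, payload)` refreshed by cron.
$perPage = 20;
$page    = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$offset  = ($page - 1) * $perPage;

$sql = "SELECT payload FROM report_cache
        WHERE user_id = ?
        ORDER BY position
        LIMIT $perPage OFFSET $offset";   // both values are already safe integers
$stmt = $pdo->prepare($sql);
$stmt->execute(array($userId));
$rows = $stmt->fetchAll(PDO::FETCH_COLUMN);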
I have a program that creates logs, and these logs are used to calculate balances, trends, etc. for each individual client. Currently, I store everything in separate MySQL tables and link the logs to a specific client by joining the two tables. When I access a client, it pulls all the logs from the log_table and generates a report. The report varies depending on what filters are in place, mostly date- and category-specific.
My concern is the performance of my program as we accumulate more logs and clients. My intuition tells me to store the log information in the user_table in the form of a serialized array so only one query is used for the entire session. I can then take that log array and filter it using PHP, whereas before it was filtered in a MySQL query (using multiple methods, such as BETWEEN for dates and other comparisons).
My question is: do you think performance would be improved if I used serialized arrays to store the logs, as opposed to using a MySQL table to store each individual log? We are estimating about 500-1000 logs per client, with around 50,000 clients (and growing).
It sounds like you don't understand what makes databases powerful. It's not about "storing data", it's about "storing data in a way that can be indexed, optimized, and filtered". You don't store serialized arrays, because the database can't do anything with that. All it sees is a single string without any structure that it can meaningfully work with. Using it that way voids the entire reason to even use a database.
Instead, figure out the schema for your array data, and then insert your data properly, with one field per dedicated table column so that you can actually use the database as a database, allowing it to optimize its storage, retrieval, and database algebra (selecting, joining and filtering).
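As a rough illustration of "one field per dedicated column", here is a hypothetical schema for the logs and the kind of filtered report query it enables; the table and column names are assumptions, not the poster's actual schema.

<?php
// Sketch of the "one field per column" approach (hypothetical schema, PDO in $pdo).
$pdo->exec("
    CREATE TABLE IF NOT EXISTS logs (
        id        INT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
        client_id INT UNSIGNED  NOT NULL,
        category  VARCHAR(32)   NOT NULL,
        amount    DECIMAL(12,2) NOT NULL,
        logged_at DATETIME      NOT NULL,
        INDEX idx_client_date (client_id, logged_at),
        INDEX idx_client_category (client_id, category)
    )
");

// The date/category report becomes one indexed query instead of PHP filtering.
$stmt = $pdo->prepare(
    'SELECT category, SUM(amount) AS total
     FROM logs
     WHERE client_id = ? AND logged_at BETWEEN ? AND ?
     GROUP BY category'
);
$stmt->execute(array($clientId, $dateFrom, $dateTo));
$report = $stmt->fetchAll(PDO::FETCH_ASSOC);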
Is storing serialized arrays in a DB and filtering them in native PHP faster? No, of course not. You've forced the database to act as a flat file, with the extra DBMS overhead on top.
Is using the database properly faster than filtering in native PHP? Usually, yes, by a lot.
Plus, and this part is important, it means that your database can live "anywhere", including on a faster machine next to your webserver, so that your database can return results in 0.1s rather than PHP pegging 100% CPU to filter your data and preventing users of your website from getting page results because you blocked all the threads. In fact, for that very reason it makes absolutely no sense to keep this task in PHP, even if you're bad at implementing your schema and queries, forget to cache results and do subsequent searches inside those cached results, forget to index the tables on columns for extremely fast retrieval, etc, etc.
PHP is not for doing all the heavy lifting. It should ask other things for the data it needs, and act as the glue between "a request comes in", "response base data is obtained" and "response is sent back to the client". It should start up, make the calls, generate the result, and die again as fast as it can.
It really depends on how you need to use the data. You might want to look into storing it with MongoDB if you don't need to search within that data. If you do, leave it in individual rows and create your indexes in a way that makes lookups fast.
If you have 10 billion rows, and need to look up 100 of them to do a calculation, it should still be fast if you have your indexes done right.
Now if you have 10 billion rows and you want to do a sum over 10,000 of them, it would probably be more efficient to save that total somewhere. Whenever a row that affects the total is added, removed or updated, you change the total as well. Consider a bank, where all items in the ledger are stored in a table, but the balance is stored on the user account and is not recalculated from all the transactions every time the user wants to check their balance.
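A minimal sketch of that running-total idea, assuming PDO in $pdo and hypothetical ledger/accounts tables; the insert and the balance update happen in one transaction so they can never drift apart.

<?php
// Sketch: keep a pre-computed balance in step with the ledger inside one
// transaction. Hypothetical `ledger` and `accounts` tables, PDO in $pdo.
$pdo->beginTransaction();
try {
    $pdo->prepare('INSERT INTO ledger (account_id, amount) VALUES (?, ?)')
        ->execute(array($accountId, $amount));

    $pdo->prepare('UPDATE accounts SET balance = balance + ? WHERE id = ?')
        ->execute(array($amount, $accountId));

    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}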
I am storing some history information on my website for future retrieval of the users. So when they visit certain pages it will record the page that they visited, the time, and then store it under their user id for future additions/retrievals.
So my original plan was to store all of the data in an array, serialize/unserialize it on each retrieval, and then store it back in a TEXT field in the database. The problem is: I don't know how efficient or inefficient this will get with large arrays of data, e.g. if a user builds up a history of 10k pages.
EDIT: So I want to know what the most efficient way to do this is. I was also considering just inserting a new row in the database for each and every history entry, but then this would make a large table to select things from.
The question is: what is faster/more efficient, a massive number of rows in the database or a massive serialized array? Any other, better solutions are obviously welcome. I will eventually be switching to Python, but for now this has to be done in PHP.
There is no benefit to storing the data as serialized arrays. Retrieving a big blob of data, de-serializing it, modifying it and re-serializing it to update is slow - and worse, it will get slower the larger the piece of data grows (exactly what you're worried about).
Databases are specifically designed to handle large numbers of rows, so use them. You have no extra cost per insert as the data grows, unlike your proposed method, and you're still storing the same amount of data, so let the database do what it does best, and keep your code simple.
Storing the data as an array also makes any sort of querying and aggregation near impossible. If the purpose of the system is to (for example) see how many visits a particular page got, you would have to de-serialize every record, find all the matching pages, etc. If you have the data as a series of rows with user and page, it's a trivial SQL count query.
If, one day, you find that you have so many rows (10,000 is not a lot of rows) that you're starting to see performance issues, find ways to optimize it, perhaps through aggregation and de-normalization.
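A quick sketch of the row-per-visit approach and the trivial count query it enables, assuming PDO in $pdo and a hypothetical page_history(user_id, page, visited_at) table:

<?php
// Sketch: one row per visit (hypothetical `page_history` table, PDO in $pdo).
$pdo->prepare('INSERT INTO page_history (user_id, page, visited_at) VALUES (?, ?, NOW())')
    ->execute(array($userId, $page));

// "How many visits did this page get?" is then a single indexed count:
$stmt = $pdo->prepare('SELECT COUNT(*) FROM page_history WHERE page = ?');
$stmt->execute(array($page));
$visits = $stmt->fetchColumn();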
You can check for a session variable, collect all the data for one session there, and dump it into the database in one go.
You can add indexes at the DB level to save time.
Last, and the most effective thing you can do, is to perform the operations/manipulation on the data ahead of time and store the results in a separate table, then always select data from that manipulated table. You can achieve this using a cron job or some scheduler.
I'm creating a web app using PHP and PostgreSQL. The database is too big and some kinds of search take a lot of time to process.
Users can wait for the first search, but it is painful when paginating. Can I store the result set in session vars?
It would be great for dealing with pagination.
Yes, of course you can, but with a lot of different result sets, loading that amount of data on every session_start can become cumbersome. Also, the same search could be run by more than one user, resulting in duplicate (time-consuming) retrievals, larger sessions (every pageload takes longer), and more storage space.
If the data is safe enough from alteration to store in a session, look at caches like APC or Memcached to store the result under a meaningful key. That way, the load is lifted from the database, and searches can be shared among users.
If you have a fixed number of items per page, you could even consider storing the different 'pages' with a pre- or postfix in the key, so you don't have to select a subset of the result every time (i.e. store with keys like 'search:foo=bar|1', 'search:foo=bar|2', etc.).
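A small sketch of that key scheme, assuming the Memcached extension; the search result is stored once, page by page, and later pageloads (from any user running the same search) pull only the page they need.

<?php
// Sketch: cache each result page under its own key so a pageload only pulls
// one small chunk. Assumes the Memcached extension; names are hypothetical.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$pageSize  = 25;
$searchKey = 'search:foo=bar';

// After the expensive query, store the result page by page with a 10-minute TTL:
foreach (array_chunk($resultRows, $pageSize) as $i => $chunk) {
    $mc->set($searchKey . '|' . ($i + 1), $chunk, 600);
}

// On later pageloads, for any user running the same search:
$pageRows = $mc->get($searchKey . '|' . $pageNo);
if ($pageRows === false) {
    // cache miss: re-run the search (or rebuild just this page) and store it again
}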