Redis - Best data structure to store and then fetch large data - PHP

I've recently implemented Redis in one of my Laravel projects. It's currently more of a technical exercise than a production feature, as I want to see what it's capable of.
What I've done is create a list of payment transactions. Each time a transaction is processed, I push to the list the payload I receive from a webhook. The payload is essentially an object containing all the information about that particular transaction.
I've created a VueJS frontend that displays all the data in a table with pagination, so it shows 10 rows at a time.
Initially this was working super quickly, but now that the list contains 30,000 rows, which is about 11MB worth of data, the request takes about 11 seconds.
I think the issue here is that I'm using a list and am fetching all the rows from the list using LRANGE.
The reason I used a list is that it has the LPUSH command, so the latest transactions go to the start of the list.
I did a test where I fetched all the data from the list and dumped the raw value to a blank page, and this took about the same time, so it's not an issue with Vue, Axios, etc.
Firstly, is this read speed normal? I've always heard that Redis is blazing fast.
Secondly, is there a better way to increase read performance when using Redis?
Thirdly, am I using the wrong data type?
In time I need to be able to store 1m rows of data.

As I understand it, you fetch all 30,000 rows on every request and then paginate them on the frontend. In my opinion, the right strategy is to fetch a lighter slice of data with each request.
For example, use Laravel-style pagination in the response to your request, so that each request only reads the rows for the page being displayed.
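A minimal sketch of what that could look like with Laravel's Redis facade, assuming the list key is transactions (the route and key names here are only illustrative):

<?php
// Illustrative route: return one page of transactions from the Redis list.
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Facades\Route;

Route::get('/transactions', function (Request $request) {
    $perPage = 10;
    $page    = max(1, (int) $request->query('page', 1));
    $start   = ($page - 1) * $perPage;
    $stop    = $start + $perPage - 1;

    // Read only the 10 entries for this page instead of the whole list.
    $rows  = Redis::lrange('transactions', $start, $stop);
    $total = Redis::llen('transactions');

    return response()->json([
        'data'  => array_map('json_decode', $rows),
        'total' => $total,
        'page'  => $page,
    ]);
});

Because LPUSH puts the newest payload at index 0, page 1 naturally contains the most recent transactions, and each request stays small no matter how long the list grows.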

In my opinion:
Firstly: as you know, Redis really is blazing fast, because Redis keeps its data in memory. If reading 11MB of data takes about 11 seconds, the bottleneck is more likely the network, so check your bandwidth.
Secondly: I'm sorry, I don't know how to increase read performance in this environment.
Thirdly: I think your choice of data type is fine.
So, check the bandwidth to your Redis server first.

Related

MySQL or JSON for data retrieval

So, I have a situation and I need a second opinion. I have a database and it's working great with all the foreign keys, indexes and stuff, but when I reach a certain number of visitors, around 700-800 concurrent visitors, my server hits a bottleneck and displays "Service temporarily unavailable." So I had an idea: what if I pull data from a JSON file instead of the database? I mean, I would still update the database, but on each update I would regenerate the JSON file and pull data from it to show on my homepage. That way I would not press my CPU too hard and I would be able to have some kind of cache on the user end.
What you are describing is caching.
Yes, it's a common optimization to avoid over-burdening your database with query load.
The idea is you store a copy of data you had fetched from the database, and you hold it in some form that is quick to access on the application end. You could store it in RAM, or in a JSON file. Some people operate a Memcached or Redis in-memory database as a shared resource, so your app can run many processes or threads that access the same copy of data in RAM.
It's typical that your app reads some given data many times for every single time it updates the data. The greater this ratio of reads to writes, the better the savings in terms of lightening the load on your database.
It can be tricky, however, to keep the data in cache in sync with the most recent changes in the database. In other words, how do all the cache copies know when they should re-fetch the data from the database?
There's an old joke about this:
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
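As a rough sketch of the JSON-file variant described in the question (the table name, file path, and fetch query are invented for the example), the write path regenerates the file and the read path just decodes it:

<?php
// Hypothetical cache-to-JSON-file approach.
const CACHE_FILE = __DIR__ . '/cache/homepage.json';

function readHomepageData(PDO $db): array
{
    // Fast path: serve the pre-generated JSON copy.
    if (is_readable(CACHE_FILE)) {
        return json_decode(file_get_contents(CACHE_FILE), true);
    }
    return refreshHomepageCache($db);
}

function refreshHomepageCache(PDO $db): array
{
    // Invented table name; this stands in for the original heavy queries.
    $rows = $db->query('SELECT * FROM homepage_items')->fetchAll(PDO::FETCH_ASSOC);

    // Write to a temp file and rename so concurrent readers never see a half-written file.
    $tmp = CACHE_FILE . '.tmp';
    file_put_contents($tmp, json_encode($rows));
    rename($tmp, CACHE_FILE);

    return $rows;
}

Calling refreshHomepageCache() right after every write to the table is exactly the cache invalidation the joke is about.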
So after another few days of exploring and trying to get the right answer, this is what I have done. I decided to create another table, instead of a JSON file, and put all the data that was supposed to go into the JSON file into that table.
WHY?
Number one reason is that MySQL has the ability to lock tables while they're being updated; a JSON file has not.
Number two is that I downgrade from a few dozen queries to just one, the simplest query: SELECT * FROM table.
Number three is that I have better control over the content this way.
Number four: while I was searching for an answer I found out that some people had issues with JSON file availability when a lot of concurrent connections were requesting the same JSON; this way I should never have a problem with availability.

NewsFeed with Redis At Scale Strategy

I'm running into an interesting dilemma while trying to solve a scaling problem.
Currently we have a social platform that has a pretty typical feed. We are using a graph database, and each time a feed is requested by a user we hit the DB. While this is fine for now, it will come to a grinding halt as we grow our user base. Enter Redis.
Currently we store things like comments, likes and such in individual Redis keys, as JSON-encoded strings keyed by post ID, and update them when there are updates, additions or deletes. Then in code we loop through the DB results of posts and pull in the data from the Redis store. This causes multiple calls to Redis to construct each post, which is far better than touching the DB each time. The challenge is keeping up with changing data such as commenters'/likers' avatars, screen names, closed accounts, new likes, new comments, etc. associated with each individual post.
I am trying to decide on a strategy to handle this in the most effective way. Redis will only take us so far, since we will top out at about 12 GB of RAM per machine.
One of the concepts in discussion is to use a beacon for each user that stores new post IDs. So when a user shares something new, all of their connected friends' beacons get the post ID, so that when a friend logs in their feed is seen as dirty and requires an update; the feed is then stored by ID in a Redis sorted set, ordered by timestamp. To retrieve the feed data we can do a single query by IDs rather than a full traversal, which is hundreds of times faster. That still does not solve the problem of the interacting users' ever-changing account information, likes and comments, but it does solve, in part, the problem of building the feed.
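The sorted-set part of that idea could look something like this (a sketch only; the key names and the use of a Predis client are assumptions, not our actual code):

<?php
require 'vendor/autoload.php';

$redis = new Predis\Client();

// Fan a new post out to each connected friend's feed,
// scored by timestamp so the set stays in chronological order.
function pushToFeeds(Predis\Client $redis, array $friendIds, int $postId): void
{
    $now = time();
    foreach ($friendIds as $friendId) {
        $redis->zadd("feed:{$friendId}", [$postId => $now]);
    }
}

// Reading a page of the feed is then a single range query, newest first.
function readFeedPage(Predis\Client $redis, int $userId, int $page, int $perPage = 20): array
{
    $start = ($page - 1) * $perPage;
    return $redis->zrevrange("feed:{$userId}", $start, $start + $perPage - 1);
}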
Another idea is to store a user's entire feed (JSON encoded) in a MySQL record and rebuild it on the fly when the user requests it and the beacon shows a dirty feed. Otherwise it's just a single select and a JSON decode to build the feed. Again, the dynamic components are the hurdle.
Has anyone dealt with this challenge successfully, or does anyone have working knowledge of a strategy to approach this problem?
Currently we store things like comments, likes and such in individual Redis keys in JSON encoded strings by post ID
Use a more efficient serializer, like igbinary or msgpack. I suggest igbinary (check http://kiss-web.blogspot.com/).
Then in code we loop through the DB results of posts and pull in the data from the Redis store.
Be sure to use pipelining for maximum performance.
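For instance, with a Predis client (the key pattern and variable names are only illustrative), all the per-post GETs can go out in one round trip:

<?php
// $postIds holds the IDs from the DB query; $redis is assumed to be a connected Predis\Client.
$responses = $redis->pipeline(function ($pipe) use ($postIds) {
    foreach ($postIds as $id) {
        // One GET per post, but queued and sent as a single batch.
        $pipe->get("post:{$id}");
    }
});
// $responses comes back as an array in the same order as $postIds.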
This is causing multiple calls to Redis to construct each post, which is far better than touching the DB each time.
Do not underestimate the power of DB primary keys. Try to do the same (not a join, but a select by keys) with your DB:
SELECT * FROM comments WHERE id IN (1,2,3,4,5,6);
A single Redis call is faster than a single DB call, but doing lots of Redis calls (even pipelined) compared to one SQL query on primary keys is not such a clear win, especially when you give your DB enough memory for buffers.
You can use the DB and Redis together by "caching" DB data in Redis. You do something like this:
Every time you update data, you update it in the DB and delete it from Redis.
When fetching data, you first try to get it from Redis. If the data is not found in Redis, you fetch it from the DB and insert it into Redis for future use, with some expire time.
That way you store only useful data in Redis. Unused (old) data will stay only in the DB.
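In code, that read-through/delete-on-write pattern looks roughly like this (a sketch only; the key prefix, the TTL, and the comments table columns are assumptions):

<?php
const CACHE_TTL = 3600; // one hour, tune to taste

// Read path: try Redis first, fall back to the DB and repopulate the cache.
function getComment(Predis\Client $redis, PDO $db, int $id): ?array
{
    $cached = $redis->get("comment:{$id}");
    if ($cached !== null) {
        return json_decode($cached, true);
    }

    $stmt = $db->prepare('SELECT * FROM comments WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC) ?: null;

    if ($row !== null) {
        $redis->setex("comment:{$id}", CACHE_TTL, json_encode($row));
    }
    return $row;
}

// Write path: update the DB, then drop the cached copy so the next read refreshes it.
function updateComment(Predis\Client $redis, PDO $db, int $id, string $body): void
{
    $db->prepare('UPDATE comments SET body = ? WHERE id = ?')->execute([$body, $id]);
    $redis->del("comment:{$id}");
}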
You may want to use a MongoDB as #solocommand has described, but you may want to stop expecting to "update" data on demand. Instead, push your users' changes into a "write" queue, which will then update the database as needed. Then you can load from the database (MongoDB) and work with it as needed, or update the other Redis records.
Moving to a messaging system such as Amazon SQS, IronMQ, or RabbitMQ may help you scale better. You can also use Redis queues as a basic message bus.
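A Redis list can serve as that basic message bus; here is a minimal sketch (the queue name, payload shape, and the worker's applyChangeToDatabase() helper are all hypothetical):

<?php
// Producer: enqueue the user's change instead of writing to the DB inline.
// $redis is assumed to be a connected Predis\Client.
$redis->lpush('write-queue', json_encode(['user_id' => $userId, 'change' => $change]));

// Worker (separate long-running process): block until a job arrives, then apply it.
while (true) {
    [$queue, $payload] = $redis->brpop(['write-queue'], 0); // 0 = block indefinitely
    $job = json_decode($payload, true);
    applyChangeToDatabase($job); // hypothetical: performs the real DB update
}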

Access and store large amount of data from mysql server

We are developing an iOS/Android application which downloads large amounts of data from a server.
We're using JSON to transfer data between the server and client devices.
Recently the size of our data increased a lot (about 30,000 records).
When fetching this data, the server request gets timed out and no data gets fetched.
Can anyone suggest the best method to achieve a fast transfer of data?
Is there any method to prepare data initially and download data later?
Is there any advantage to using multiple databases on the device (multiple SQLite DBs) and performing parallel inserts into them?
Currently we are downloading/uploading only changed data (using UUID and time-stamp).
Is there any best approach to achieve this efficiently?
---- Edit -----
I think it's not only a problem with the MySQL records; at peak times multiple devices connect to the server to access data, so connections also end up waiting. We are using a high-performance server. I am mainly looking for a solution to handle this sync on the device. Is there any good method to simplify the sync or make it faster using multithreading, multiple SQLite DBs, etc.? Or data compression, using views, or something else?
A good way to achieve this would probably be to download no data at all.
I guess you won't be showing these 30k lines to your client at once, so why download them in the first place?
It would probably be better to create an API on your server which helps the mobile devices communicate with the database, so the clients only download the data they actually need / want.
Then, with a cache system on the mobile side, you can make sure that clients won't download the same thing every time and that content they have already seen is available offline.
When fetching this data, the server request gets timed out and no data gets fetched.
Are you talking only about reads or writes, too?
If you are talking about write access as well: are the 30,000 records the result of a single insert/update? Are you using a transactional engine like InnoDB? If so, are your queries wrapped in a single transaction? Having autocommit mode enabled can lead to massive performance issues:
Wrap several modifications into a single transaction to reduce the number of flush operations. InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. The rotation speed of a disk is typically at most 167 revolutions/second (for a 10,000RPM disk), which constrains the number of commits to the same 167th of a second if the disk does not “fool” the operating system.
Source
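In PHP that typically means wrapping the batch in one explicit transaction, roughly like this (a sketch; the table and column names are invented):

<?php
// One commit (and therefore one log flush) for the whole batch,
// instead of one flush per row with autocommit enabled.
$db->beginTransaction();
try {
    $stmt = $db->prepare('INSERT INTO records (uuid, payload, updated_at) VALUES (?, ?, NOW())');
    foreach ($rows as $row) {
        $stmt->execute([$row['uuid'], $row['payload']]);
    }
    $db->commit();
} catch (Throwable $e) {
    $db->rollBack();
    throw $e;
}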
Can anyone suggest the best method to achieve a fast transfer of data?
How complex is your query design? Inner or outer joins, correlated or non-correlated subqueries, etc.? Use EXPLAIN to inspect the efficiency. Read about EXPLAIN.
Also, take a look at your table design: Have you made use of normalization? Are you indexing properly?
Is there any method to prepare data initially and download data later?
How do you mean that? Maybe temporary tables could do the trick.
But without knowing any details of your project, downloading 30,000 records to a mobile device at one time sounds weird to me. Probably your application/DB design needs to be reviewed.
Anyway, for any data that does not need to be updated/inserted directly in the database, use a local SQLite DB on the mobile device. This is much faster, as SQLite is a file-based DB and the data doesn't need to be transferred over the network.

High performance impression tracking

Basically, one part of some metrics that I would like to track is the amount of impressions that certain objects receive on our marketing platform.
If you imagine that we display lots of objects, we would like to track each time an object is served up.
Every object is returned to the client through a single gateway/interface. So imagine a request comes in for a page with some search criteria, and the search request is proxied to our Solr index.
We then get 10 results back.
Each of these 10 results should be regarded as an impression.
I'm struggling to find an incredibly fast and accurate implementation.
Any suggestions on how you might do this? You can throw in any number of technologies. We currently use, Gearman, PHP, Ruby, Solr, Redis, Mysql, APC and Memcache.
Ultimately all impressions should eventually be persisted to MySQL, which I could do every hour. But I'm not sure how to store the impressions in memory quickly without affecting the load time of the actual search request.
Ideas (I just added option 4 and 5)
Once the results are returned to the client, the client then requests a base64-encoded URI on our platform which contains the IDs of all of the objects that they have been served. This object is then passed to Gearman, which saves the count to Redis. Once an hour, Redis is flushed and the count is incremented for each object in MySQL.
After the results have been returned from Solr, loop over, and save directly to Redis. (Haven't benchmarked this for speed). Repeat the flushing to mysql every hour.
Once the items are returned from Solr, send all the ID's in a single job to gearman, which will then submit to Redis..
new idea Since the largest number of items returned will be around 20, I could set an X-Application-Objects header containing a base64 encoding of the IDs returned. These IDs (in the header) could then be stripped out by nginx and, using a custom Lua nginx module, written directly to Redis from nginx. This might be overkill though. The benefit is that I can tell nginx to return the response object immediately while it's writing to Redis.
new idea Use fastcgi_finish_request() to flush the response back to nginx, and then insert the results into Redis.
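Under PHP-FPM that last idea could look roughly like this (a sketch; the variable and key names are made up):

<?php
// Send the search response to the client immediately...
echo json_encode($searchResults);
fastcgi_finish_request(); // PHP-FPM only: the client stops waiting here

// ...then record the impressions after the response has been handed back.
foreach ($objectIds as $id) {
    $redis->incr("impressions:{$id}");
}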
Any other suggestions?
Edit to Answer question:
The reliability of this data is not essential, so long as it is a best guess. I wouldn't want to see a swing of, say, 30% dropped impressions, but I would allow a tolerance of +/- 10% accuracy.
I see your two best options as:
Using the INCR command in Redis to increment counters as you pull the IDs. Use the ID as the key and increment it in Redis. Redis can easily handle hundreds of thousands of increments per second, so that should be fast enough to do without any noticeable client impact. You could even pipeline each request if the PHP language binding supports it; I think it does. (There is a sketch of both options after the second one below.)
Use Redis as a plain cache. In this option you would simply use a Redis list and RPUSH a string containing the IDs separated by, e.g., a comma. You might use the hour of the day as the key. Then you can have a separate process pull it out by grabbing the previous hour and massaging it however you want into MySQL. If you put an expiry on the keys, you can have them cleaned out after a period of time, or just delete the keys in the post-processing process.
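Sketches of both options, assuming a connected Predis client and illustrative key names:

<?php
// Option 1: one counter per object ID, pipelined so the request isn't slowed down.
$redis->pipeline(function ($pipe) use ($objectIds) {
    foreach ($objectIds as $id) {
        $pipe->incr("impressions:{$id}");
    }
});

// Option 2: append the served IDs to an hourly list; a separate job drains
// the previous hour's key into MySQL and then deletes it.
$hourKey = 'impressions:' . date('YmdH');
$redis->rpush($hourKey, implode(',', $objectIds));
$redis->expire($hourKey, 172800); // keep keys 48h as a safety net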
You can also use a read slave to do the exporting to MySQL if you have very high Redis traffic, or just to offload it and get a backup of it as a bonus. If you do that, you can set the master Redis instance not to flush to disk, increasing write performance.
For some additional options regarding a more extended use of Redis' features for this sort of tracking, see this answer. You could also avoid the MySQL portion entirely and pull the data from Redis, keeping the overall system simpler.
I would do something like #2, and hand the data off to the fastest queue you can to update Redis counters. I'm not that familiar with Gearman, but I bet it's slow for this. If your Redis client supports asynchronous writes, I'd use that, or put this in a queue on a separate thread. You don't want to slow down your response waiting to update the counters.

Out of memory in PHP

I am having a problem with memory in PHP. I am required to obtain a huge report, a table with a lot of information.
This information is also obtained via some complex functions, which means it cannot be generated when the user requests it, as he/she would have to wait around 5 minutes for the information to be displayed.
I was caching the result, but as the information grew, now the browser just crashes and does not show anything. What are possible solutions for this?
I was thinking of storing the data in MySQL instead of the cache, and then executing a couple of selects when the user requests the information. What do you think about that solution? Any other, better options?
Update
Looks like the problem was not understood, so I add more detail.
A search is already being used. There are several points to keep in mind:
1) The information itself has to be calculated. I have a cron job running that builds it (it takes about 5 minutes) and stores it in the cache. The browser just renders from the cache, and the search searches this cached data. The information cannot be obtained in real time.
2) That is why I was thinking of storing the calculated results in MySQL; that way the search can query the MySQL table instead of searching the cached data (which is huge and now impossible for the browser to handle).
I hope the problem is more clear now!
I think storing the data in MySQL is a good idea, especially if you build some indexes on frequently queried columns; caching will consume a lot of memory (it often stores data in RAM).
Another thing you should consider: look at the server logs (access and error logs), because you could find your solution there.
And finally, I hope you solve your problem.
Maybe you should paginate the output if possible. I think storing the data in MySQL is not a great idea...
Check the php.ini setting for this parameter and increase it to a higher value:
memory_limit = 512M
