I'm trying to replace the SQL implementation of Drupal 8's flood control service with a Redis-based implementation.
See https://github.com/drupal/drupal/blob/8.0.x/core/lib/Drupal/Core/Flood/DatabaseBackend.php
The requirements are like this:
Each occurrence of an action/event (e.g. trying to log in) is logged with an expiration, identifier, and timestamp
I need to be able to prevent a certain action from being performed more than N times in a given timeframe
I want to be able to clean up expired events
In the case of a threshold of 3 in 10 minutes: if the user tries once, then twice more after 5 minutes, they are blocked and can try again once after 5 more minutes, not 10. While the latter would be a valid way to do this, it's not how the SQL implementation works or how the tests expect it to work.
As you can see from the API, I also don't know the threshold when registering the event; I only know the expiration of a single event.
My thoughts on how to implement this:
If the rule were "after N occurrences, lock for the given time", this would be easy with a single key per event:identifier that is incremented; once the max is reached, it stays locked until it expires again, and each INCR could also refresh the expiration (or not).
I found many posts asking about expiration of list entries, which is not possible. There are workarounds using sorted sets and delete-by-range. Most seem to use a single global set, but then I can't easily count per event + identifier, I think.
After writing all this down, I might actually have an idea how it could work, so I guess what I'm looking for is feedback on whether that makes sense or if there's an easier way.
Each event:identifier combination is a key containing a sorted set. It uses the expiration as the score and a unique value as the member, possibly the creation time in microseconds. I count the non-expired records to detect whether the threshold has been reached. I'm also updating the expiration of each event:identifier key to the provided expiration window, so the key will be auto-deleted, unless a given identifier/client never gives up and keeps trying without ever letting it expire. Is it worth cleaning up the records inside a set, e.g. when doing a new register? It seems to be fairly fast, and I could also do it only sometimes.
I would prefer to use Redis' key expiration feature, instead of reimplementing one.
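To sanity-check the counting logic before wiring it to Redis, here is a minimal pure-Python sketch of the sorted-set idea (the dict stands in for Redis; in a real backend the operations map to ZADD, ZREMRANGEBYSCORE and ZCOUNT, and the class and method names are mine):

```python
import time
from collections import defaultdict

class FloodSketch:
    """In-memory stand-in for the sorted-set flood backend."""

    def __init__(self):
        # key "event:identifier" -> list of (expiration_score, member)
        self.sets = defaultdict(list)

    def register(self, event, identifier, window):
        """ZADD key <now + window> <unique member>, then prune expired entries."""
        now = time.time()
        key = f"{event}:{identifier}"
        # the member only has to be unique; creation time is enough for a sketch
        self.sets[key].append((now + window, now))
        # ZREMRANGEBYSCORE key -inf <now>: the occasional in-set cleanup
        self.sets[key] = [(score, m) for score, m in self.sets[key] if score > now]

    def is_allowed(self, event, identifier, threshold):
        """ZCOUNT key <now> +inf: only attempts that haven't expired count."""
        now = time.time()
        key = f"{event}:{identifier}"
        return sum(1 for score, _ in self.sets[key] if score > now) < threshold
```

With a threshold of 3, the fourth call within the window is rejected, and each attempt is forgiven as soon as its own expiration passes, matching the "blocked for 5 more minutes, not 10" behaviour described above. In Redis itself you would additionally EXPIRE the whole key to the window length so idle sets vanish on their own.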
A simpler alternative would be the following one:
just SET a simple value, which is the number of attempts, using a key built on a pattern like "identifier":"event type":
SETNX <identifier>:<event type> 1
if the response is 1, this is the first attempt, so you set a timeout on this key:
EXPIRE <identifier>:<event type> <timeout in seconds>
otherwise you increment the number of attempts
INCR <identifier>:<event type>
The response of the INCR will give you the number of attempts during the window, so you know if you can allow the action or not.
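As a sketch of that fixed-window logic (an in-memory Python stand-in for SETNX/EXPIRE/INCR; the class name is mine), note that this gives fixed-window semantics, not the sliding per-event expiry the Drupal tests expect:

```python
import time

class FixedWindowSketch:
    """In-memory stand-in for the SETNX / EXPIRE / INCR counter."""

    def __init__(self):
        self.store = {}  # key -> [attempts, expires_at]

    def attempt(self, identifier, event, timeout):
        """Returns the attempt number within the current window."""
        now = time.time()
        key = f"{identifier}:{event}"
        entry = self.store.get(key)
        if entry is None or entry[1] <= now:
            # SETNX succeeded: first attempt in the window, so EXPIRE the key
            self.store[key] = [1, now + timeout]
            return 1
        # SETNX failed: INCR, and the reply is the attempt count
        entry[0] += 1
        return entry[0]
```

If attempt() returns more than your threshold, deny the action. Since the window is anchored at the first attempt, all attempts free up at once when the key expires, which is exactly the difference the question points out.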
You could also use a hash instead of a simple value, if you need to store more data, like the max number of allowed attempts in the given time window. In this case you will probably use HSETNX and HINCRBY.
Duplicate IDs from calling uniqid() multiple times in quick succession aren't really a problem for me. But it's not clear to me whether there is a risk of ID collision at a later point in time. I mean, technically even proper cryptographic hashes have a chance of collision, but am I right in thinking that uniqid() is particularly susceptible?
The result is based on the time in microseconds. As long as you call it at different microseconds, the results should be different.
But if the clock is reset back to the same time as a previous call, you would get the same result. This is the value of the $more_entropy parameter: if you pass it, a random string is appended to the end. The chance of the clock being reset to the same time and the RNG producing the same random string is minuscule. It's also unusual for server clocks to jump backwards; unless the time is very far off from correct, corrections are usually applied by changing the rate of clock increments, so the clock stays monotonic as it approaches the correct time.
Strings generated at the same microsecond on different hosts, without $more_entropy = true, will be the same. This is the benefit of the $prefix parameter; you can use something host-specific there to avoid collisions between servers. This is only needed if the ID needs to be globally unique, rather than just unique within the server.
If you use the $prefix and $more_entropy parameters, you should not really have to worry about collisions.
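To make the failure mode concrete, here is a rough Python model of a uniqid()-style time-based ID (it mimics the shape of PHP's uniqid(), not its exact output format): identical timestamps collide unless a prefix or extra entropy breaks the tie.

```python
import random
import time

def uniqid_like(prefix="", more_entropy=False, now_us=None):
    """Rough model of a time-based ID: hex of the current microsecond timestamp.

    `now_us` lets us pin the clock to demonstrate collisions; PHP's real
    uniqid() always reads the current time.
    """
    if now_us is None:
        now_us = int(time.time() * 1_000_000)
    out = prefix + format(now_us, "x")
    if more_entropy:
        # an 8-digit random suffix, standing in for $more_entropy
        out += "." + f"{random.random():.8f}"[2:]
    return out
```

Two calls pinned to the same microsecond produce the same string; a host-specific prefix or the entropy suffix makes them differ, which is exactly the role of $prefix and $more_entropy.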
No. Well, assuming your system clock isn't resetting to the same date and time every night, you shouldn't have collisions from subsequent calls, mostly because the generated value is largely based on the Unix time with microseconds.
You run a greater risk of collision in a scenario where you have a pool of front end web servers using the same code. It's conceivable that a call made at the same exact moment could result in the same value being generated on two machines.
Use the more_entropy option and optionally concatenate something that identifies the server onto the unique ID to ensure it is unique.
Since it's based on time, there shouldn't be collisions in the future, as the future entails a different time.
I'd love to create great things with number-based stuff.
I totally have no idea how or where I should start.
Let's say users have to register then log in to the site to use this feature (already done this).
I tried saving their registration date as a timestamp and calculating the value (the one I need to increment) from the time elapsed since registration. It was working, but when I set a maximum on this value and later raised that maximum, the value just jumped straight to the new maximum (since time had kept passing). By the way, this needs to work even if the user is completely logged out and not on the site (so it's server-side).
So let's say I need to increment this value by 550 an hour, but after the first hour has elapsed the increment grows to 650, after the second hour to 750, and so on... and as soon as it reaches 3272 it must stay there.
It's also important to update this value live on the page, so the user doesn't have to refresh every time they want to look at their new value. I guess the hard part is calculating each second's increment so it matches the hourly value. Okay, not that hard I guess, it must be something like 650/60/60 ≈ 0.18 per second.
Best Regards,
Henrik
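The accrual described in the question can be derived from elapsed time rather than stored, assuming the rate really does grow by 100 each hour (550 in the first hour, 650 in the second, and so on) and caps at 3272. A Python sketch of the arithmetic:

```python
def accrued(seconds_since_registration, base_rate=550, rate_step=100, cap=3272):
    """Value derived from elapsed time: 550/h in hour 1, 650/h in hour 2, ..."""
    full_hours = int(seconds_since_registration // 3600)
    remainder = seconds_since_registration % 3600
    # completed hours form an arithmetic series: 550 + 650 + 750 + ...
    completed = full_hours * base_rate + rate_step * full_hours * (full_hours - 1) // 2
    # the partial hour accrues at the current hourly rate
    current_rate = base_rate + rate_step * full_hours
    return min(completed + current_rate * remainder / 3600, cap)
```

For the live display, the per-second increment is simply the current hourly rate divided by 3600. Note also that if you later raise the cap, you'd want to record the value and timestamp at the moment of the change as a new baseline; a purely time-derived value otherwise jumps straight to the new cap, which is exactly the bug described above.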
It is not clear what you are trying to do, but you have multiple options.
You can use the timestamp itself as an increment value.
Or
You can use cron jobs. (Google if you need more information about this.)
Create a cron job that automatically increments your MySQL value at a set interval.
Hope this helps.
I would like to know how to implement a unique visitor counter using Redis with a timeout of 1 hour. I have a small online shop and I would like to show how many people are currently viewing an article. What would be the best way? I'm using Predis.
Thanks in advance
You will need to do 2 things:
Generate a unique ID for each visitor (there are many ways to do it)
Use a Redis SET or ZSET to store those IDs and add an expiry mechanism.
Method #1: SET + SCARD + EXPIRE
Solution
The SET name could be something like [article_id]:[current_hour_timestamp]; each time you SADD a visitor ID into it, do an EXPIRE. Each set will expire after a certain time.
Use SCARD [article_id]:[current_hour_timestamp] to know how many visitors are online
Notes
The key can be [article_id]:[current_hour_timestamp] or anything like articles:[article_id]:readers:[current_hour_timestamp] if you prefer...
At the start of every hour the SET will be empty, which may not be what you want.
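A Python stand-in for method #1 (the dict plays the role of Redis, with visit mapping to SADD and readers to SCARD; the class name is mine). It also demonstrates the caveat above: counts reset at each hour boundary.

```python
import time

class HourlyUniqueCounter:
    """In-memory stand-in for SADD + SCARD on hourly keys.

    In Redis you'd also EXPIRE each key so old hourly buckets get
    garbage-collected; here old buckets simply go unread.
    """

    def __init__(self):
        self.buckets = {}  # "[article_id]:[hour_timestamp]" -> set of visitor ids

    def _key(self, article_id, now):
        hour = int(now) - int(now) % 3600  # current_hour_timestamp
        return f"{article_id}:{hour}"

    def visit(self, article_id, visitor_id, now=None):
        now = time.time() if now is None else now
        # SADD [article_id]:[current_hour_timestamp] [visitor_id]
        self.buckets.setdefault(self._key(article_id, now), set()).add(visitor_id)

    def readers(self, article_id, now=None):
        now = time.time() if now is None else now
        # SCARD [article_id]:[current_hour_timestamp]
        return len(self.buckets.get(self._key(article_id, now), set()))
```

Repeat visits by the same ID don't inflate the count, but the count drops to zero at each new hour, which is the "0 reader" effect mentioned above.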
Method #2: ZSET + ZCARD + cron
Solution
Each time a visitor opens an article, send a ZADD [article_id]:readers [expiry_time] [visitor_id]; it does not matter if they reload the page, the expiry_time will just be updated.
expiry_time is the timestamp in the future at which we consider the visitor to have left the article.
Setup a cron to run each hour:
Retrieve all article IDs from the database
For each article
Send a `zremrangebyscore "article.id:readers" -inf [current_timestamp]` to remove expired visitors_id
Finally, to retrieve the current reader count, run ZCARD [article_id]:readers.
Note
This approach is much better because it uses a sliding window, so you won't get the "0 reader" effect at the start of each hour. The downside is that you need to loop over all your article IDs, which could be an issue.
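A Python stand-in for method #2 (hit maps to ZADD, purge is the cron's ZREMRANGEBYSCORE, and count is ZCARD over unexpired scores; the names and the 5-minute linger are my assumptions):

```python
import time

class SlidingReaders:
    """In-memory stand-in for ZADD + ZREMRANGEBYSCORE + ZCARD per article."""

    def __init__(self, linger=300):
        self.linger = linger  # seconds a visitor "stays" after their last hit
        self.readers = {}     # article_id -> {visitor_id: expiry_time}

    def hit(self, article_id, visitor_id, now=None):
        now = time.time() if now is None else now
        # ZADD [article_id]:readers [expiry_time] [visitor_id];
        # a reload just pushes the expiry_time forward
        self.readers.setdefault(article_id, {})[visitor_id] = now + self.linger

    def purge(self, now=None):
        """The hourly cron: ZREMRANGEBYSCORE ... -inf [current_timestamp]."""
        now = time.time() if now is None else now
        for members in self.readers.values():
            for vid in [v for v, exp in members.items() if exp <= now]:
                del members[vid]

    def count(self, article_id, now=None):
        # ZCARD, restricted to unexpired scores so stale entries aren't counted
        # even before the cron has run
        now = time.time() if now is None else now
        return sum(1 for exp in self.readers.get(article_id, {}).values() if exp > now)
```

A visitor who keeps reloading just slides their expiry forward; one who leaves stops being counted once their score passes, and the cron's purge merely reclaims the memory.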
Method #3: ZSET + ZCARD + ZSET as index + cron
Solution
Same as method #2, but this one uses another zset, articles:readers:zset, that contains all zset keys recently updated.
You then loop through this zset's members instead of all article IDs.
Note
If you go down that way don't forget to remove expired members from articles:readers:zset !
I already have a screen scraper built using PHP cURL, tied to a MySQL database. I have stored products that need to be updated weekly, rather than what I have now (a form where I input the URL/product and hit go).
My first thought would be to use standard cron every 30 minutes on a PHP file like so.
I would like to randomize two things: the delay before the PHP script actually accesses the source site (i.e. 0-20 minutes), so the process timing is random; and second, accessing my target items/pages randomly, while being sure to get all of them weekly and/or consistently before cycling through the list again.
The timer is fairly straightforward and needs no data storage, but how should I keep track of my items/URIs in this fashion? I was thinking of a second cron job to clear data while the first just increments, but I'd still have to set flags for what was already updated, and I'm just not familiar enough to choose where and how to store this data.
I am using MySQL, with HTML5 options, and the site is on CodeIgniter, so I can also hold data in SQLite, along with cookies if that makes sense. A couple of questions on this part: do I query my database (MySQL) for what I need every time, or do I store it in a JSON file once a week and run from that? This obviously depends on, and/or determines, where I flag what was already processed.
You have a list of items to scrape in your MySQL database. Ensure that there is a field that holds the last time each item was scraped.
Set a cron job to run every minute with this workflow:
1. Ensure that the previous run of the script has completed (see step #4). If not, end.
2. Check the last time you scraped any item.
3. Ensure enough time has passed (see step #9). If not, end.
4. Set a value somewhere to show that you are processing (so step #1 of subsequent runs is aware).
5. Select an item to scrape at random, from those that haven't been scraped in n time.
6. Delay a random interval of seconds to ensure all requests aren't always on the minute.
7. Scrape it.
8. Update the time last scraped for that item.
9. Set a random time to wait before the next operation (so step #3 of subsequent runs is aware).
10. Set a value to show that you are not processing (so step #1 of subsequent runs is aware).
11. End.
Once all items have been scraped, you can set a variable to hold the time the batch was completed and use it for n in step #5.
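The workflow above can be sketched end to end. This is a hypothetical Python/SQLite mock-up, not the asker's actual schema: the table names, the state table used as the lock, and the shortened delays are all my inventions.

```python
import random
import sqlite3
import time

# Hypothetical schema: one row per item, plus a tiny state table that holds
# the "processing" lock (steps #1/#4/#10) and the random wait (steps #3/#9).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (url TEXT PRIMARY KEY, last_scraped REAL DEFAULT 0);
    CREATE TABLE state (key TEXT PRIMARY KEY, value REAL);
    INSERT INTO items (url) VALUES ('http://example.com/a'), ('http://example.com/b');
    INSERT INTO state VALUES ('processing', 0), ('next_run_after', 0);
""")

WEEK = 7 * 24 * 3600

def run_once(scrape, now=None):
    """One cron tick; `scrape` is whatever fetch function you already have."""
    now = time.time() if now is None else now
    get = lambda k: conn.execute("SELECT value FROM state WHERE key=?", (k,)).fetchone()[0]
    put = lambda k, v: conn.execute("UPDATE state SET value=? WHERE key=?", (v, k))
    if get("processing"):                    # step 1: previous run still busy
        return None
    if now < get("next_run_after"):          # steps 2-3: not enough time passed
        return None
    put("processing", 1)                     # step 4: take the lock
    row = conn.execute(
        "SELECT url FROM items WHERE last_scraped < ? ORDER BY RANDOM() LIMIT 1",
        (now - WEEK,)).fetchone()            # step 5: random stale item
    if row:
        time.sleep(random.uniform(0, 0.01))  # step 6 (shortened from 0-20 min)
        scrape(row[0])                       # step 7
        conn.execute("UPDATE items SET last_scraped=? WHERE url=?",
                     (now, row[0]))          # step 8
    put("next_run_after", now + random.uniform(0, 20 * 60))  # step 9
    put("processing", 0)                     # step 10
    return row[0] if row else None           # step 11
```

Because eligibility is simply "last_scraped older than a week", there is no separate flag to clear; the timestamp on each row is both the progress marker and the reset mechanism.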
I have a PHP website using a MySQL database.
We have items that users create which are on a timer, and once this timer has counted down, the status needs to have changed without user interaction (basically, by the next time someone sees it).
I'm not sure how to implement this in a way to be accurate to the minute.
So we have an object X that expires at 10:15pm tomorrow, and the next person to see object X after that time has to see it as expired.
Is the correct way to do this to be the next time object X is loaded we check if it's expired, and if so, update the database?
What happens if 10 people load object X at the same time after it's expired, what's to prevent some sort of race condition from all 10 requests attempting to update the database?
Is there a cron job that runs every minute that I could somehow make use of, or any kind of timer in MySQL that kicks off every minute, checking for these and running a script?
I have several ideas on how it -could- be done, like those listed above, but I'm not sure what the most practical is, or what the standard way to do it is as I'm positive someone has solved this problem before.
Is the correct way to do this to be the next time object X is loaded we check if it's expired, and if so, update the database?
Why do you need to update the database? It seems like you might have some redundancy in your DB table - from what you've said, it sounds like you have (for instance) an is_expired column and then an expires_at column.
Why not just get rid of the is_expired column? It's cheap to compare 2 integers, so when you want to determine if something is expired, just fetch the expires_at column and compare with the current time. This avoids any race conditions with expiry, since nothing in the DB changes.
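A small Python/SQLite sketch of that read-time comparison (the table and column names are hypothetical): the status is computed from expires_at on every read, so concurrent requests never race on a write.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, expires_at REAL)")
conn.execute("INSERT INTO objects VALUES (1, ?), (2, ?)",
             (time.time() - 60, time.time() + 3600))  # one past, one future

def load(object_id, now=None):
    """Expiry is derived at read time; nothing is ever written back,
    so ten simultaneous readers can't race on an update."""
    now = time.time() if now is None else now
    row = conn.execute("SELECT id, expires_at FROM objects WHERE id=?",
                       (object_id,)).fetchone()
    return {"id": row[0], "expired": row[1] <= now}
```

The same comparison can live in the query itself (e.g. expires_at <= UNIX_TIMESTAMP() in MySQL) if you want the database to report the status directly.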
You can do it with cron, of course. Or with the native JavaScript function setInterval(checkFunction, 10000) to alter the DB. Or you could use a date field in the DB and check for expiration.
Make a field date_to_expire as DATETIME, enter the expiration date, and every time you query for the item check whether it has expired (this can go down to the second).