Managing old elements by timestamp in Redis sorted set - php

You have a sorted set A in Redis; every now and then you add new elements to it, and they get sorted by rank. You also have a sorted set B.
Is there a way to check if there are elements in set A that have been there for more than, say, 20 seconds, and move them to sorted set B?
Because this checking operation is done very frequently, and the set can be very big, iterating through every element in the set is not a good solution; I need the fastest one.
Thanks.
UPDATE:
Here is what I was trying to do:
Basically the idea was: imagine you have some kind of game server that matches opponents when they submit a fight request. The current design was that every request goes into the set, and the rank/score is the player's rank, so that any 2 players who are near each other in the list are a perfect match. Every 5 seconds or so a script gets called that pulls 50 rows from the top of the set and matches them 2 by 2 (and removes them). This was working fine, and I think it was a very fast solution. But then came the idea of creating bot (AI) players, so that when a player has been waiting too long in the queue, he gets matched with a bot (AI) player. And I cannot figure out a way to see "who has been waiting too long". Maybe the entire idea was wrong, so any better ideas are welcome :) Thanks a lot.

If the score in your sorted set is a Unix timestamp, you can use ZRANGE to grab the oldest N items from set A. You can then do your checks, add the qualifying entries to set B, and remove them from set A.
If your scoring in set A is not based on timestamp, then you will have to iterate over your set A entirely, or rethink your design. Redis keys do not have an inherent available timestamp of when they are added (which holds doubly true for items in a key such as a sorted set), so it has to be something you specifically create and track. Perhaps if you share more about what you are doing and why we can help with more detail.
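For illustration, here is a minimal sketch of that pattern in PHP, assuming the phpredis extension, a 20-second threshold, and placeholder key names 'setA' and 'setB':
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$cutoff = time() - 20; // anything scored before this has waited over 20 seconds
// Grab members whose score (their insertion timestamp) is older than the cutoff
$stale = $redis->zRangeByScore('setA', '-inf', $cutoff, array('withscores' => true));
foreach ($stale as $member => $score) {
    $redis->zAdd('setB', $score, $member); // keep the original timestamp
    $redis->zRem('setA', $member);
}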
Edit:
Based on the additions to your question, I would recommend trying a solution similar to what @akonsu is proposing.
Specifically:
Sorted-Set-A: players by rank just as they are now.
Sorted-Set-B: uses the timestamp at which the person entered the queue as the score, and stores their user ID. In other words, when you ZADD to Set A with their rank and ID, you also ZADD to Set B with the timestamp and ID.
When you match players you remove them from both sets. If you pull your set of users to match from Set B using a ZRANGE command to grab the X oldest entries, you will have the times they queued up, in order of entry (like a FIFO). You then use a ZRANGEBYSCORE command on Set A with their rank +/- whatever rank range you need. If you get a match, you proceed with it by removing both players from both sets and moving on.
If there is no suitable opponent in Set A and their timestamp is old enough, you match them with an AI, then remove them from both sets and move on.
Essentially it is an index queue of users->timestamp. Doing it this way means shorter queue times for all users, as you are now matching them in order of queue length. You still use Set A for matching based on players' rank, but now you are indexing and prioritizing based on time.
The specific implementation may be a bit more involved than this, but as an overall strategy I think this fits what you need.
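To make the flow concrete, here is a rough sketch, again assuming phpredis; the key names, thresholds, and the startMatch()/createBot() hooks are placeholder assumptions, not part of any real API:
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$maxWait   = 20; // seconds in the queue before a bot steps in
$rankRange = 50; // acceptable rank difference for a human match
// Oldest waiting players first (FIFO by enqueue timestamp)
$waiting = $redis->zRange('queueByTime', 0, 49, true); // member => score
foreach ($waiting as $playerId => $queuedAt) {
    $rank = $redis->zScore('queueByRank', $playerId);
    if ($rank === false) {
        continue; // already matched and removed this pass
    }
    // Look for an opponent within +/- $rankRange of this player's rank
    $candidates = $redis->zRangeByScore('queueByRank', $rank - $rankRange, $rank + $rankRange);
    $opponent = null;
    foreach ($candidates as $candidate) {
        if ($candidate !== $playerId) { $opponent = $candidate; break; }
    }
    if ($opponent !== null) {
        startMatch($playerId, $opponent); // hypothetical game-start hook
        $redis->zRem('queueByRank', $playerId, $opponent);
        $redis->zRem('queueByTime', $playerId, $opponent);
    } elseif (time() - $queuedAt >= $maxWait) {
        startMatch($playerId, createBot($rank)); // hypothetical bot factory
        $redis->zRem('queueByRank', $playerId);
        $redis->zRem('queueByTime', $playerId);
    }
}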

Related

In Sphinx, how does the search result display if the index updates in between two SetLimits calls?

I have just started working with Sphinx and PHP. I was just wondering: say I set the limit to 20 records per call,
$cl->SetLimits ( 0, 20);
and the index is recreated, say, every 5 minutes with the --rotate option.
So if in my application I have to fetch the next 20 search results, I call the command
$cl->SetLimits ( 20, 20);
Suppose the index is recreated in between the two SetLimits calls, and say a new document is inserted with the highest weight (and I am sorting results by relevance).
Wouldn't the search results shift down by one position, so that the earlier 20th record is now the 21st record, meaning I get the same result at the 21st position that I got at the 20th, and my application displays a duplicate search result? Is this true? Has anybody else had this problem?
Or how should I overcome this?
Thanks!
Edit (note: the next SetLimits call is made in response to a user event, say "See more results")
Yes, that can happen.
But it usually happens so rarely that nobody notices.
About the only way to avoid it would be to store some sort of marker with the query: as well as a page number, you include a last ID. Then, on the second page and beyond, you use that ID to exclude any new results created since the search started.
On the first-page query, you look up the biggest ID in the index; you need to run a second query for that.
(This at least copes with new additions to the index; it is harder to cope with changes to existing documents, but that can be done in a similar way.)
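A hedged sketch of that idea with the Sphinx PHP API; lookupMaxDocumentId() is a hypothetical helper (e.g. a SELECT MAX(id) against the source database), not part of the API:
$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);
$cl->SetSortMode(SPH_SORT_RELEVANCE);
if (isset($_GET['max_id'])) {
    // Follow-up page: the client passes back the ID recorded on page one
    $maxId = (int)$_GET['max_id'];
    $offset = (int)$_GET['offset'];
} else {
    // First page: pin the result set to documents that exist right now
    $maxId = lookupMaxDocumentId(); // hypothetical helper
    $offset = 0;
}
$cl->SetIDRange(0, $maxId);  // exclude documents indexed after the search started
$cl->SetLimits($offset, 20);
$result = $cl->Query($query, 'my_index'); // 'my_index' is a placeholder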
SetLimits sets the offset on the result server-side: http://php.net/manual/en/sphinxclient.setlimits.php.
So to answer your question: no. Sphinx runs the query with max_matches and saves a result set; from there you work with the result set, not with the indexed data.
One question though: why are you re-indexing every 5 minutes? It would be better to re-index only when your data changes.

Possible ways to create a turn based system using PHP/MySQL [closed]

Right, I'm trying to create a system whereby a user can do something, but then must wait until all the other users in the MySQL table have made their move, i.e.:
User 1 makes a move; users 2 and 3 must wait.
User 2 makes a move; users 1 and 3 must wait.
User 3 makes a move; users 1 and 2 must wait.
User 1 makes a move...
One way I thought of was to give each of the users a number (ranging from 1 to the total number of players, say 6), and then when a player makes a move, set their number to the maximum (6) and decrease everyone else's number by one, so the player with the lowest number is the only one who can play.
That's my only idea, is there an easier or alternative way?
My suggestion would be to just store the last move date as a DATETIME. When you need to check whether a user can move, simply select from the table all of the other players whose last move date is less than or equal to the current player's last move date. If the number of rows is not 0, then the player cannot move yet.
The benefit of this approach is its simplicity: every time you allow a player to make a move, just update the column with the current date and time.
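A minimal sketch of that check, assuming a players table with game_id, player_id, and a DATETIME last_move column (all placeholder names) and an already-connected PDO instance:
function canMove(PDO $pdo, int $gameId, int $playerId): bool {
    // The player may move only when no other player's last move is at or
    // before theirs, i.e. everyone else has moved since they last did.
    $stmt = $pdo->prepare(
        'SELECT COUNT(*) FROM players
          WHERE game_id = ? AND player_id <> ?
            AND last_move <= (SELECT last_move FROM players
                               WHERE game_id = ? AND player_id = ?)'
    );
    $stmt->execute(array($gameId, $playerId, $gameId, $playerId));
    return (int)$stmt->fetchColumn() === 0;
}
// After a successful move, just stamp the current time:
// UPDATE players SET last_move = NOW() WHERE game_id = ? AND player_id = ?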
Your proposed solution seems a little circuitous:
You're updating+reading every player every move, when the minimum information you need to maintain is whose move it is.
You're losing information about player order as you encode next turn information.
A high-level solution:
Create a games table, one row per game, with a column like INT currentTurn
Create a gameUsers table on a per-game basis, linked to its game in games
Assign each of the n users in gameUsers an INT playerOrder ranging over [1..n]
Only accept a move from playerN if playerN == "SELECT playerID FROM gameUsers WHERE playerOrder = currentTurn"
After a successful move: "UPDATE games SET currentTurn = currentTurn + 1 WHERE game = thisGame"
I believe the above table structure is a good object-oriented representation of an actual game model. You can stash other per-game things into games, like winner, length, date, etc. Pardon the pseudo-SQL.
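A hedged sketch of that check with PDO; the table and column names follow the outline above, and the playerCount column on games is an assumption added here so the turn counter can wrap around:
function tryMove(PDO $pdo, int $gameId, int $playerId): bool {
    // Whose turn is it? Wrap currentTurn so it cycles through the n players.
    $stmt = $pdo->prepare(
        'SELECT gu.playerID
           FROM games g
           JOIN gameUsers gu ON gu.gameID = g.id
          WHERE g.id = ?
            AND gu.playerOrder = ((g.currentTurn - 1) % g.playerCount) + 1'
    );
    $stmt->execute(array($gameId));
    if ((int)$stmt->fetchColumn() !== $playerId) {
        return false; // not this player's turn
    }
    // ...apply the move itself here, then advance the turn counter:
    $pdo->prepare('UPDATE games SET currentTurn = currentTurn + 1 WHERE id = ?')
        ->execute(array($gameId));
    return true;
}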
You could have a table with a column hasMoved TINYINT(1) NOT NULL DEFAULT 0 and query for rows where hasMoved = 0; if the query returns no rows, then all players have moved.
(Note: this is based on "must wait for all other users", NOT for a strict move order - i.e. 'A' must move before 'B' must move before 'C', etc.)
Additionally, queries using this method are somewhat slow and (to me) seem somewhat unnecessarily resource-intensive - perhaps think about using Ajax instead?
Have a game sequence number that starts at zero. Have a "last moved" number for each player. When a player moves, set their "last moved" number equal to the game sequence number. Once every player has moved, increment the game sequence number.
You may want to use a timeout though, otherwise a player who doesn't move can delay the other players indefinitely.
I would first determine $sequence by calculating speed, then compare speeds to determine order, then use the order to send out notices for players' moves. Use a timestamp to ensure a user doesn't take over a day, or however long; you will need a cron job just for this.
Have a variable or array hold the first n of the last sequence, so you can easily move the last-moved player to the back without mixing up the order.
Have the page check the player's order sequence and not allow action unless it's at 1 or 0. Be sure to sanitize inputs so no manipulation is possible. Then add your forms, graphics, and game equations.
You can save the date-time of each user's last move. Then, if you sort this table by that date-time column (ascending), you only have to fetch the first row of the result, which will contain the ID of the player who is allowed to make a move.

Ordering Combinations for Maximum Effectiveness

So recently I was given a problem which I have been mulling over and am still unable to solve; I was wondering if anyone here could point me in the right direction by providing the pseudocode (or at least a rough outline of it) for this problem. P.S. I'll be building it in PHP, if that makes a difference...
Specs
There are ~50 people (for this example I'll just call them a, b, c...) and the user is going to group them into groups of three (people in the groups may overlap), and in the end there will be 50-100 groups (i.e. {a,b,c}; {d,e,f}; {a,d,f}; {b,c,l}...).
So far it is easy: it is a matter of building an HTML form and processing it into a multidimensional array.
There are ~15 time slots during the day (e.g. 9:00 AM, 9:20 AM, 9:40 AM...). Each of these groups needs to meet once during the day, and during one time slot a person cannot be double-booked (i.e. 'a' cannot be in 2 different groups at 9:40 AM).
It gets tricky here, but is not impossible; my best guess at how to do this would be to brute-force it (pick out sets of groups that have no overlap (e.g. {a,b,c}; {l,f,g}; {q,n,d}...) and then just put each into a time slot).
Finally, the schedule which I output needs to be 'optimized'; by that I mean that 'a' should have minimal time between meetings (so if his first meeting is at 9:20 AM, his second meeting shouldn't be at 2:00 PM).
Here's where I am lost; my only guess would be to build many, many schedules and then rank them by the average waiting time a person has from one meeting to the next.
However, my 'solutions' (I hesitate to call them that) require too much brute force and would take too long to create. Are there simpler, more elegant solutions?
These are the tables laid out, modified for your scenario:
+----User_Details------+ //You may or may not need this
| UID | Particulars... |
+----------------------+
+----User_Timeslots---------+ //One time slot per column
| UID | SlotNumber(bool)... | //true/false: is the user available?
+---------------------------+ //SlotNumber is replaced by s1, s2, etc.
+----User_Arrangements--------+ //One time slot per column
| UID | SlotNumber(string)... | //Group session string
+-----------------------------+
Note that the string in the Arrangements table is JSON, in the following format:
'[12,15,32]' //From SMALLEST to BIGGEST!
What happens with the Arrangements table is that a script [or an Excel column formula] goes through each slot per session and randomly creates a possible session, checking all previous sessions for conflicts.
/**
 * Randomise a session in which data is not yet set
 **/
function randomizeSession( sessionID ) {
    for( var id = [lowest UID]; id < [highest UID]; id++ ) {
        if( id exists ) {
            randomizeSingleSession( id, sessionID );
        } //else skip
    }
}
/**
 * Randomizes a single user in a session, without conflicts with previous sessions
 **/
function randomizeSingleSession( id, sessionID ) {
    convert sessionID to its column name =)
    get the column names of all the previous sessions
    if( there is data: false, or JSON ) {
        do nothing (already has data)
    }
    if( ID is available in the time-slot table (for this session) ) {
        get all IDs who are available and contain no data this session
        get the UIDs from all the previous sessions
        while( first time || not yet resolved ) {
            randomly choose 2 others
            if( there was a conflict with the previous sessions' UIDs ) {
                try again (while) : not yet resolved
            } else {
                resolved
            }
        }
        register all 3 users as a group in this session
    } else {
        set the session result to false (no attendance)
    }
}
You will realize that the main part of the group assignment is done via randomization. However, as the number of sessions increases, there is more and more data to check for conflicts, resulting in much slower performance, though the result approaches an almost perfect permutation/combination formulation.
EDIT:
This setup will also help ensure that, as long as a user is available, they will be in a group. You may still end up with small pockets of users who have no group; these are usually remedied by recalculating (for small session counts), or by just manually grouping them together, even if it means a repeat (having a few here and there does not hurt). Or alternatively, in your case, join the remainders onto several groups of 3 to form groups of 4. =)
And if this can work in Excel with about 100+ people and about 10 sessions, I do not see how it would not work in SQL + PHP. Just be aware that the calculations may take considerable time either way.
Okay, for those who are just joining in on this post: please read through all the comments on the question before considering the contents of this answer, otherwise it will very likely fly over your head.
Here is some pseudo code in PHP'ish style:
/* Array with profs (this is one-dimensional here for show, but I assume
it will be multi-dimensional, filled with availability and what not;
For the sake of this example, let me say that the multi-dimensional array
contains the following keys: [id]{[avail_from],[avail_to],[last_ses],[name]} */
$profs = array_fill(0, $prof_num, "assoc_ids");
// Array with time slots, let's say UNIX stamps of begin time
$times = array_fill(0, $slot_num, "time");
// First, we need to loop through all the time slots
foreach ($times as $slot) {
    // See when the session ends
    $slot_end = $slot + $session_time;
    // Now, run through the profs to see who's available
    $avail_profs = array(); // Empty
    foreach ($profs as $prof_id => $data) {
        if (($data['avail_from'] <= $slot) && ($data['avail_to'] >= $slot_end)) {
            $avail_profs[$prof_id] = $data['last_ses'];
        }
    }
    /* Reverse sort the array so that the highest numbers (profs who have been
    waiting the longest) will be up top */
    arsort($avail_profs);
    $profs_session = array_slice($avail_profs, 0, 3, true); // true preserves the prof IDs
    $profs_session_names = array(); // Empty
    // Reset the last_ses counters on those profs
    foreach ($profs_session as $prof_id => $last_ses) {
        $profs[$prof_id]['last_ses'] = 0;
        $profs_session_names[] = $profs[$prof_id]['name'];
    }
    // Now, loop through all profs to add one to their waiting time
    foreach ($profs as $prof_id => $data) {
        $profs[$prof_id]['last_ses']++;
    }
    print(sprintf('The %s session will be held by: %s, %s, and %s<br />',
        $slot, $profs_session_names[0], $profs_session_names[1],
        $profs_session_names[2]));
    unset($profs_session, $profs_session_names, $avail_profs);
}
That should print something like:
The 9:40am session will be held by: C. Hicks, A. Hole, and B.E.N. Dover
I see an object model consisting of:
Panelists: a fixed repository of your panelists (Tom, Dick, Harry, etc.)
Panel: consists of X Panelists (X=3 in your case)
Timeslots: a fixed repository of your time slots. Assuming fixed duration and meetings only occurring on a single day, all you need to track is the start time.
Meeting: consists of a Panel and Timeslot
Schedule: consists of many Meetings
Now, as you have observed, the optimization is the key. To me the question is: "Optimized with respect to what criteria?". Optimal for Tom might mean that the Panels on which he is a member lay out without big gaps, while Harry's Panels may be all over the board. So, perhaps for a given Schedule we compute something like totalMemberDeadTime (= the sum of all members' dead-time gaps in the Schedule). An optimal Schedule is the one that is minimal with respect to this sum.
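For concreteness, here is a small sketch of that metric in PHP; the schedule representation (slot index => array of panelist IDs) is just an illustrative assumption:
function totalMemberDeadTime(array $schedule): int {
    $slotsByPanelist = array();
    foreach ($schedule as $slot => $panel) {
        foreach ($panel as $panelist) {
            $slotsByPanelist[$panelist][] = $slot;
        }
    }
    $deadTime = 0;
    foreach ($slotsByPanelist as $slots) {
        sort($slots);
        // Count the empty slots between consecutive meetings for this panelist
        for ($i = 1, $n = count($slots); $i < $n; $i++) {
            $deadTime += $slots[$i] - $slots[$i - 1] - 1;
        }
    }
    return $deadTime; // smaller is better; compare candidate Schedules on this
}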
If we are interested in computing a technically optimal schedule among the universe of all schedules, I don't really see an alternative to brute force.
Perhaps that universe of Schedules does not need to be as big as might first appear. It sounds like the panels are constituted first, and then the issue is to assign them to Meetings, which then constitute a Schedule. So we have removed the variability in panel composition; the full scope of variability is in the Meetings and the Schedule. Still, it sure seems like a lot of variability.
But perhaps optimal with respect to all possible Schedules is more than we really need.
Might we define a Schedule as acceptable if no panelist has total dead time of more than X? Or, failing that, if no more than X panelists have dead time of more than X (you can't satisfy everyone, but keep the screwing-over to a minimum)? Then the user could assign meetings for panels containing the more "important" panelists first, and the less-important guys simply have to take what they get. Then all we have to do is find a single acceptable Schedule.
Might it be sufficient for your purposes to compare any two Schedules? Combined with an interface (I'm seeing drag-and-drop, but that's beside the point) that allows the user to constitute a schedule, clone it into a second schedule, and tweak the second one, looking to reduce aggregate dead time until we find one that is acceptable.
Anyway, not a complete answer. Just thinking out loud. Hope it helps.

php - how do I display 5 results from a possible 50 randomly but ensure all results are displayed an equal amount

In PHP, how do I display 5 results from a possible 50 randomly, but ensure all results are displayed an equal amount?
For example, the table has 50 entries.
I wish to show 5 of these randomly on every page load, but I also need to ensure that all results are displayed, in rotation, an equal number of times.
I've spent hours googling for this but can't work it out - I would very much appreciate your help.
Please scroll down to "biased randomness" if you don't want to read the whole thing.
In MySQL you can just use SELECT * FROM table ORDER BY RAND() LIMIT 5.
But what you want just does not work; it is logically contradictory.
You have to understand that complete randomness, by definition, means an equal distribution only after an infinite period of time.
The longer the selection interval, the more even the distribution.
If you MUST have an even distribution of selections over, for example, every 24-hour interval, you cannot use a random algorithm; that is by definition a contradiction.
It really depends on what your goal is.
You could, for example, pick an element at random and then lower the probability of that same element being re-chosen on the next run. That way you get a heuristic that yields a more even distribution after a shorter amount of time. But it's not random - well, certain parts are.
You could also select randomly from your database, mark the elements as selected, and from then on select only from those not yet selected. When no elements are left, reset them all.
Very trivial, but it might do the job.
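A sketch of that approach, assuming an entries table with a TINYINT shown flag (placeholder names) and a connected PDO instance; with 50 entries and 5 per page the pool divides evenly, so the reset fires exactly when it empties:
$remaining = (int)$pdo->query('SELECT COUNT(*) FROM entries WHERE shown = 0')->fetchColumn();
if ($remaining < 5) {
    $pdo->exec('UPDATE entries SET shown = 0'); // start a new rotation cycle
}
$rows = $pdo->query('SELECT * FROM entries WHERE shown = 0 ORDER BY RAND() LIMIT 5')
            ->fetchAll(PDO::FETCH_ASSOC);
$mark = $pdo->prepare('UPDATE entries SET shown = 1 WHERE id = ?');
foreach ($rows as $row) {
    $mark->execute(array($row['id'])); // won't be picked again until the reset
}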
You can also do something like that with timestamps to make the distribution a bit more elegant.
It could look something like ORDER BY RAND() * ((timestamp - min(timestamp)) / (max(timestamp) - min(timestamp))) DESC, or something like that. Basically you normalize the timestamp at which an entry was last selected, using the overall time window, so it falls between 0 and 1, and then multiply it by RAND(); that way freshly-selected entries are less likely to be chosen again - roughly 50% freshness bias and 50% randomness. I am not sure about the formula above - I just typed it down and it is probably wrong - but the principle works.
I think what you want is generally referred to as "biased randomness". There are a lot of papers on it and some articles on SO, for example:
Biased random in SQL?
Copy the 50 results to some temporary place (a file, a database table, whatever you use). Then, every time you need random values, select 5 random values from the 50 and delete them from your temporary data set.
Once your temporary data set is empty, create a new one by copying the original again.

Popularity Algorithm

I'd like to populate the homepage of my user-submitted-illustrations site with the "hottest" illustrations uploaded.
Here are the measures I have available:
How many people have favourited that illustration
votes table includes date voted
When the illustration was uploaded
illustration table has date created
Number of comments (not so good, as the maximum total comments is about 10 at the moment)
comments table has comment date
I have searched around, but I don't want user authority to play a part, and most algorithms include that.
I also need to find out whether it's better to do the calculation in the MySQL query that fetches the data, or to have a PHP/cron method run every hour or so.
I only need 20 illustrations to populate the home page. I don't need any sort of paging for this data.
How do I weight age against votes? Surely a site with fewer submissions needs less weight on date added?
Many sites that use some type of popularity ranking do so by using a standard algorithm to determine a score and then decaying eternally over time. What I've found works better for sites with less traffic is a multiplier that gives a bonus to new content/activity - it's essentially the same, but the score stops changing after a period of time of your choosing.
For instance, here's a pseudo-example of something you might want to try. Of course, you'll want to adjust how much weight you're attributing to each category based on your own experience with your site. Comments are rare, but take more effort from the user than a favorite/vote, so they probably should receive more weight.
score = (votes / 10) + comments
age = UNIX_TIMESTAMP() - UNIX_TIMESTAMP(date_created)
if(age < 86400) score = score * 1.5
This type of approach would give a bonus to new content uploaded in the past day. If you wanted to approach this in a similar way only for content that had been favorited or commented on recently, you could just add some WHERE constraints on your query that grabs the score out from the DB.
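As an illustration, the scoring above could be computed in a single MySQL query run from a cron job; all table and column names here are placeholder assumptions:
$sql = "SELECT i.id,
               ((COALESCE(v.vote_count, 0) / 10) + COALESCE(c.comment_count, 0))
               * IF(i.date_created > NOW() - INTERVAL 1 DAY, 1.5, 1.0) AS score
          FROM illustrations i
          LEFT JOIN (SELECT illustration_id, COUNT(*) AS vote_count
                       FROM votes GROUP BY illustration_id) v
                 ON v.illustration_id = i.id
          LEFT JOIN (SELECT illustration_id, COUNT(*) AS comment_count
                       FROM comments GROUP BY illustration_id) c
                 ON c.illustration_id = i.id
         ORDER BY score DESC
         LIMIT 20";
$top20 = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);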
There are actually two big reasons NOT to calculate this ranking on the fly.
Requiring your DB to fetch all of that data and do a calculation on every page load just to reorder items results in an expensive query.
Probably a smaller gotcha, but if you have a relatively small amount of activity on the site, small changes in the ranking can cause content to move pretty drastically.
That leaves you with either caching the results periodically or setting up a cron job to update a new database column holding this score you're ranking by.
Obviously there is some subjectivity in this - there's no one "correct" algorithm for determining the proper balance - but I'd start out with something like votes per unit age. MySQL can do basic math so you can ask it to sort by the quotient of votes over time; however, for performance reasons, it might be a good idea to cache the result of the query. Maybe something like
SELECT images.url FROM images ORDER BY (SELECT COUNT(*) FROM votes WHERE votes.image_id = images.id) / (UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(images.date)) DESC LIMIT 20
but my SQL is rusty ;-)
Taking a simple average will, of course, bias in favor of new images showing up on the front page. If you want to remove that bias, you could, say, count only those votes that occurred within a certain time limit after the image was posted. For images more recent than that time limit, you'd have to normalize by multiplying the number of votes by the time limit and then dividing by the age of the image. Alternatively, you could give the votes a continuously varying weight, something like exp(-time(vote) + time(image)). And so on and so on... depending on how particular you are about what this algorithm does, it could take some experimentation to figure out which formula gives the best results.
I've no useful ideas as far as the actual agorithm is concerned, but in terms of implementation, I'd suggest caching the result somewhere, with a periodic update - if the resulting computation results in an expensive query, you probably don't want to slow your response times.
Something like:
(count_favourited + k) / time_since_last_activity
The higher k is, the less weight the number of favourites has.
You could also change the time to something like the time since the illustration first appeared plus the time since its last activity; this would ensure that older illustrations vanish over time.
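As a tiny sketch of how k behaves (the names here are illustrative):
function hotness(int $favourites, int $secondsSinceActivity, float $k = 10.0): float {
    return ($favourites + $k) / max($secondsSinceActivity, 1); // avoid division by zero
}
// With k = 10: 5 favourites an hour ago beats 40 favourites a day ago,
// since (5 + 10) / 3600 ≈ 0.0042 > (40 + 10) / 86400 ≈ 0.00058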
