I'm using a database for a project where I'm inserting things and I have an auto-increment. However, randomly, the auto-increment ID started counting wrong.
For example, the last one in the table has an ID of 227. When I insert another row, it should auto-assign it as 228, but instead it jumps to 232. How do I fix this?
Auto-increment doesn't mean "use one more than the highest number currently in the table".
There are a few reasons why the auto-increment number can be non-contiguous.
The auto-increment step may not be 1
The previous rows may have been deleted
Transactions inserting into the table may have been rolled back
A record may have updated the auto-increment column
The auto-increment start index may have been changed by a DDL modification.
There are probably a few other scenarios that could cause this as well.
To answer the question of "How do I fix this?": don't bother. The ID is supposed to be unique and nothing else; having contiguous IDs is usually not that useful (unless you're doing paging on the assumption that they are contiguous, which is a bad assumption to make).
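For example, the transaction-rollback case above can be reproduced directly. This is only an illustrative sketch (assuming typical MySQL/InnoDB behaviour, a hypothetical things table and PDO credentials); a rolled-back INSERT still consumes its auto-increment value, so the next successful insert skips it:
<?php
// Illustrative only: assumes a hypothetical `things` table with an
// AUTO_INCREMENT id and typical MySQL/InnoDB behaviour.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$pdo->exec("INSERT INTO things (name) VALUES ('a')");  // say this gets id 227

$pdo->beginTransaction();
$pdo->exec("INSERT INTO things (name) VALUES ('b')");  // reserves id 228
$pdo->rollBack();                                       // the row is gone, but 228 is not reused

$pdo->exec("INSERT INTO things (name) VALUES ('c')");  // gets id 229, leaving a gap at 228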
Auto-increment numbers can be reserved by the DB server ahead of time. Instead of fetching just the one value asked for, a DB will sometimes grab, say, 5 (or 10 or 100) and update the persistent store of the next number to be assigned to next + 5 (or 10 or 100). The next time a number is requested, it is already in memory, which is much faster, especially since the server does not have to write the next value to assign out to disk.
Timetable:
Asked for next number
Hasn't got one in memory so:
Go to disk to get the next (call it a)
Update disk with a + 100
Give the asker a
Store a+1 in memory as the next to give out
Asked for next number
Gives it a+1 and stores a+2 in memory
...and so on.
However, if the DB server is stopped before it has given out all the numbers, then when it is restarted it will look for the next number and find a+100, hence the gap. Auto-increment numbers are generally guaranteed to be given out as increasing values and should always be unique (watch what happens when the maximum value is reached, though). They are usually, but not always, sequential.
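Here is a toy sketch (not real database code, just an illustration of the block-reservation scheme described above) showing why a restart leaves a gap:
<?php
// Toy illustration only: the "server" reserves a block of 100 IDs at a time
// and only persists the value that comes after the block.
$disk = 1;                        // the persisted "next number to assign"

class Allocator {
    private ?int $next = null;    // cached in memory only
    private int $ceiling = 0;

    public function __construct(private int $blockSize = 100) {}

    public function nextId(int &$disk): int {
        if ($this->next === null || $this->next > $this->ceiling) {
            // "Go to disk": reserve a whole block, persist the value after it.
            $this->next    = $disk;
            $this->ceiling = $disk + $this->blockSize - 1;
            $disk += $this->blockSize;
        }
        return $this->next++;
    }
}

$server = new Allocator();
echo $server->nextId($disk), "\n"; // 1
echo $server->nextId($disk), "\n"; // 2

// Server restart: the cached block is lost, only $disk survives.
$server = new Allocator();
echo $server->nextId($disk), "\n"; // 101 - the gap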
(Oracle does the same with sequences. It's been a while since I used it, but with a sequence you can specify a cache size. The only way to guarantee sequential assignment is to use a cache size of zero, which is slower.)
You have a sorted set A in Redis, and every now and then you add new elements to it; they get sorted by rank. You also have a sorted set B.
Is there a way to check if there are elements in set A that have been there for more than, say, 20 seconds, and move them to sorted set B?
Because this checking operation is done very frequently and the set can be very big, iterating through every element in the set is not a good solution. I need the fastest one.
Thanks.
UPDATE:
Here is what I was trying to do:
Basically the idea was: imagine you have some kind of game server that matches opponents when they put in a fight request. The current design was that every request gets added to the set, and the rank/score is the player's rank, so that way every 2 players that are near each other in the list are perfect matches. Every 5 seconds or so a script gets called that pulls 50 rows from the top of the set and matches them 2 by 2 (and removes them). This was working fine, and I think it was a very fast solution. But then came the idea of creating bot (AI) players, so that when a player is waiting too long in the queue, he gets matched with a bot (AI) player. And I cannot figure out a way to see "who is waiting too long". Maybe the entire idea was wrong, so any better ideas are welcome :) Thanks a lot.
If the score in your sorted set is a unix timestamp, you can use zrange to grab the oldest N items from set A. You can then do your checks, add qualifying entries to set B, and then remove them from set A.
If your scoring in set A is not based on a timestamp, then you will have to iterate over your set A entirely, or rethink your design. Redis keys do not carry an inherent timestamp of when they were added (which holds doubly true for items inside a key such as a sorted set), so it has to be something you specifically create and track. Perhaps if you share more about what you are doing and why, we can help with more detail.
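For instance, with the phpredis extension that first case is just a range query by score. The key names and the 20-second threshold here are only illustrative, and a connected Redis instance is assumed:
<?php
// Sketch assuming the scores in "setA" are unix insertion timestamps.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$cutoff = time() - 20;

// All members whose score (insertion time) is at least 20 seconds old.
$old = $redis->zRangeByScore('setA', 0, $cutoff);

foreach ($old as $member) {
    // Move each one to set B, keeping its original timestamp as the score.
    $score = $redis->zScore('setA', $member);
    $redis->zAdd('setB', $score, $member);
    $redis->zRem('setA', $member);
}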
Edit:
Based on the additions to your question, I would recommend trying a solution similar to what #akonsu is proposing.
Specifically:
Sorted-Set-A: players by rank just as they are now.
Sorted-Set-B: uses the timestamp of when the person went into the queue as the score, and stores their user ID. In other words, when you ZADD to SetA with their rank & ID, you also ZADD to SetB with the timestamp and ID.
When you match players you remove them from both sets. If you pull your set of users to match from SetB using a zrange command to grab the X oldest entries, you will have the time they queued up, in order of their entry (like a FIFO). You then use a zrange command on SetA with their rank +/- whatever rank range you need. If you get a match you proceed with the match by removing them from both sets and moving on.
If there is no suitable opponent in SetA and their timestamp is old enough you match with an AI, then remove them from both sets and move on.
Essentially it is an index queue of users -> timestamp. Doing it this way means shorter queue times for all users, as you are now matching them in order of queue length. You still use SetA for matching based on players' rank, but now you are indexing and prioritizing based on time.
The specific implementation may be a bit more involved than this, but as an overall strategy I think this fits what you need.
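Here is a rough sketch of that two-set flow with phpredis. The key names, the rank window, the 20-second bot timeout, and the startMatch()/startMatchWithBot() helpers are all assumptions rather than anything in your existing code, and a connected Redis instance is assumed:
<?php
// When a player requests a fight, register them in both sets.
function enqueue(Redis $redis, string $playerId, int $rank): void {
    $redis->zAdd('queue:by_rank', $rank, $playerId);   // Set A: score = rank
    $redis->zAdd('queue:by_time', time(), $playerId);  // Set B: score = enqueue time
}

// Called every few seconds by the matching script.
function matchPlayers(Redis $redis, int $rankWindow = 50, int $botAfter = 20): void {
    // Oldest waiters first (FIFO), with their enqueue timestamps.
    $waiting = $redis->zRange('queue:by_time', 0, 49, true);

    foreach ($waiting as $playerId => $queuedAt) {
        $rank = $redis->zScore('queue:by_rank', $playerId);
        if ($rank === false) {
            continue; // already matched and removed this pass
        }

        // Look for an opponent with a similar rank, excluding the player themself.
        $candidates = $redis->zRangeByScore('queue:by_rank', $rank - $rankWindow, $rank + $rankWindow);
        $opponent = null;
        foreach ($candidates as $c) {
            if ($c !== $playerId) { $opponent = $c; break; }
        }

        if ($opponent !== null) {
            startMatch($playerId, $opponent);     // hypothetical game logic
            dequeue($redis, $playerId);
            dequeue($redis, $opponent);
        } elseif (time() - $queuedAt >= $botAfter) {
            startMatchWithBot($playerId);         // hypothetical game logic
            dequeue($redis, $playerId);
        }
    }
}

function dequeue(Redis $redis, string $playerId): void {
    $redis->zRem('queue:by_rank', $playerId);
    $redis->zRem('queue:by_time', $playerId);
}
Matching this way keeps SetA purely for rank lookups and SetB purely for queue order, which is what lets you answer "who is waiting too long" without scanning everything.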
I have just started working on Sphinx with PHP. I was just wondering: if I set the limit to 20 records per call,
$cl->SetLimits(0, 20);
and the index is recreated, say, every 5 minutes with the --rotate option.
So if in my application I have to fetch the next 20 search results, I call
$cl->SetLimits(20, 20);
Suppose the index is recreated in between the two SetLimits calls, and say a new document is inserted with the highest weight (and I am sorting results by relevance).
Wouldn't the search results shift down by one position, so the earlier 20th record becomes the 21st record, and I get at the 21st position the same result I already got at the 20th position, so my application displays a duplicate search result? Is this true? Has anybody else run into this problem?
Or how should I overcome this?
Thanks!
Edit (Note: the next SetLimits call is triggered by a user event, say "See more results").
Yes, that can happen.
But usually happens so rarely that nobody notices.
About the only way to avoid it would be to store some sort of marker with the query. So as well as a page number, you include a last ID. Then on the second page and beyond, use that ID to exclude any new results created since the search started.
On the first-page query, you look up the biggest ID in the index; you need to run a second query for that.
(This at least copes with new additions to the index; it's harder to cope with changes to existing documents, but that can be handled in a similar way.)
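A minimal sketch of that idea with the classic Sphinx PHP API, assuming document IDs only ever grow and that fetchCurrentMaxDocumentId() is a hypothetical helper (e.g. a SELECT MAX(id) against the source table) run on the first page:
<?php
session_start();

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);

$perPage = 20;
$page    = isset($_GET['page']) ? (int) $_GET['page'] : 0;

if ($page === 0 || !isset($_SESSION['search_max_id'])) {
    // First page: remember the highest document ID that existed when the
    // search started, so later pages ignore anything indexed after that.
    $_SESSION['search_max_id'] = fetchCurrentMaxDocumentId(); // hypothetical helper
}

// Pin the result set to documents that existed at search time.
$cl->SetIDRange(0, $_SESSION['search_max_id']);
$cl->SetLimits($page * $perPage, $perPage);

$result = $cl->Query('some search terms', 'my_index');
Pinning the ID range keeps the offsets stable across a --rotate, so "See more results" won't repeat a row that merely shifted down.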
SetLimits sets the offset on the result set server-side: http://php.net/manual/en/sphinxclient.setlimits.php.
So to answer your question: no, it will query with max_matches and save a result set, and from there you work with the result set and not the indexed data.
One question though, why are you indexing it every 5 minutes? It would be better just to re-index every time your data changes.
Right, I'm trying to create a system whereby a user can do something, but then must wait until all the other users in the MySQL table have made their move, i.e.
User1 makes move, user2 and 3 must wait
user2 makes move, user 1 and 3 must wait
user3 makes move, user 1 and 2 must wait
user1 makes move...
One way I thought of was to give each of the users a number (ranging from 1 to the total number of players, say 6), and then when a player makes a move, set their number to the max number (6) and decrease everyone else's number by one, so the one with the minimum number is the only one who can play.
That's my only idea, is there an easier or alternative way?
My suggestion would be to just store the last move date as a DATETIME. When you need to check whether a user can move, simply select out of the table all of the other players whose last move date is less than or equal to the current player's last move date. If the number of rows is not 0, then the player cannot move yet.
The benefit of this approach is its simplicity: every time you allow a player to make a move, just update the column with the current date and time.
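A minimal sketch of that check with PDO, assuming a hypothetical players table with id and last_move_at (DATETIME) columns:
<?php
function canMove(PDO $pdo, int $playerId): bool {
    // Count the other players who have NOT moved since this player's last move.
    $sql = "SELECT COUNT(*)
              FROM players AS other
              JOIN players AS me ON me.id = ?
             WHERE other.id <> ?
               AND other.last_move_at <= me.last_move_at";
    $stmt = $pdo->prepare($sql);
    $stmt->execute([$playerId, $playerId]);

    // If anyone else still hasn't moved since our last move, it isn't our turn yet.
    return (int) $stmt->fetchColumn() === 0;
}

function recordMove(PDO $pdo, int $playerId): void {
    $stmt = $pdo->prepare("UPDATE players SET last_move_at = NOW() WHERE id = ?");
    $stmt->execute([$playerId]);
}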
Your proposed solution seems a little circuitous:
You're updating+reading every player every move, when the minimum information you need to maintain is whose move it is.
You're losing information about player order as you encode next turn information.
A high-level solution:
Create a games table, one row per game, with a column like INT currentTurn
Create a gameUsers table on a per-game basis, linked to its game in games
Assign each of the n users in gameUsers an INT playerOrder ranging from 1 to n
Only accept a move from playerN if playerN == "SELECT playerID FROM gameUsers WHERE playerOrder = currentTurn"
After a successful move: "UPDATE games SET currentTurn = currentTurn + 1 WHERE game = thisGame"
I believe the above table structure is a good object-oriented representation of an actual game model. You can stash other per-game things into games like winner, length, date, etc. Pardon the pseudo-SQL.
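A rough PDO sketch of the turn check and advance under that schema. The table and column names (games.id, games.currentTurn, games.numPlayers, gameUsers.gameId/playerId/playerOrder) are assumptions, and the wrap back to player 1 after the last player is an addition the pseudo-SQL above leaves implicit:
<?php
function tryMove(PDO $pdo, int $gameId, int $playerId): bool {
    // Whose turn is it right now?
    $stmt = $pdo->prepare(
        "SELECT gu.playerId, g.currentTurn, g.numPlayers
           FROM games g
           JOIN gameUsers gu ON gu.gameId = g.id AND gu.playerOrder = g.currentTurn
          WHERE g.id = ?"
    );
    $stmt->execute([$gameId]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if (!$row || (int) $row['playerId'] !== $playerId) {
        return false; // not this player's turn
    }

    // ...apply the move itself here...

    // Advance the turn, wrapping back to player 1 after the last player.
    $next = ((int) $row['currentTurn'] % (int) $row['numPlayers']) + 1;
    $pdo->prepare("UPDATE games SET currentTurn = ? WHERE id = ?")
        ->execute([$next, $gameId]);

    return true;
}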
You could have a table with a column hasMoved TINYINT(1) NOT NULL DEFAULT 0, and query for rows where hasMoved = 0; if the query returns no rows, then all players have moved.
(Note: this is based on "must wait for all other users", NOT for a strict move order - i.e. 'A' must move before 'B' must move before 'C', etc.)
Additionally, queries using this method are somewhat slow and (to me) seem somewhat unnecessarily resource-intensive; perhaps think about using Ajax instead?
Have a game sequence number that starts at zero. Have a "last moved" number for each player. When a player moves, set their "last moved" number equal to the game sequence number. Once every player has moved, increment the game sequence number.
You may want to use a timeout though, otherwise a player who doesn't move can delay the other players indefinitely.
I would first determine $sequence by calculating speed, then compare speeds to determine order, then use the order to send out notices for their move. Use a timestamp to ensure the user doesn't take over a day (or however long); you will need a cron job just for this.
Have a variable or array hold the first n last-sequence values so you can easily move the last-moved player to the back without mixing up the order.
Have the page check the player's order sequence and not allow action unless it's at 1 or 0. Be sure to sanitize inputs so no manipulation is possible. Then insert your form, graphics, and game equations.
You can save the date-time of each user's last move. Then if you sort this table by that date-time column (oldest move first), you only have to fetch the first row of the result, which will contain the ID of the player who is allowed to move.
In PHP, how do I display 5 results out of a possible 50 at random, but ensure all results are displayed an equal number of times?
For example, the table has 50 entries.
I wish to show 5 of these at random on every page load, but I also need to ensure that, in rotation, all results are displayed an equal number of times.
I've spent hours googling for this but can't work it out - would very much like your help please.
Please scroll down to "biased randomness" if you don't want to read all of this.
In MySQL you can just use SELECT * FROM table ORDER BY RAND() LIMIT 5.
What you want just does not work; it's logically contradictory.
You have to understand that complete randomness, by definition, only means equal distribution after an infinite number of selections.
The longer the selection interval, the more even the distribution.
If you MUST have an even distribution of selections over, for example, every 24-hour interval, you cannot use a purely random algorithm; that is a contradiction by definition.
It really depends on what your goal is.
You could, for example, pick an element at random and then lower the probability of the same element being re-chosen on the next run. This way you have a heuristic that gives you a more even distribution after a shorter amount of time. But it's not random; well, certain parts are.
You could also randomly select from your database, mark the elements as selected, and now select only from those not yet selected. When no element is left, reset all.
Very trivial but might do your job.
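A simple sketch of that "mark as selected, reset when exhausted" idea, assuming a hypothetical items table with an extra TINYINT shown flag and a PDO connection in $pdo:
<?php
function pickFive(PDO $pdo): array {
    // If fewer than 5 unshown items remain, start a new rotation.
    $left = (int) $pdo->query("SELECT COUNT(*) FROM items WHERE shown = 0")->fetchColumn();
    if ($left < 5) {
        $pdo->exec("UPDATE items SET shown = 0");
    }

    // Pick 5 at random from the not-yet-shown pool.
    $rows = $pdo->query("SELECT * FROM items WHERE shown = 0 ORDER BY RAND() LIMIT 5")
                ->fetchAll(PDO::FETCH_ASSOC);

    // Mark them as shown so they cannot repeat until the rest have had a turn.
    $ids = array_column($rows, 'id');
    if ($ids) {
        $placeholders = implode(',', array_fill(0, count($ids), '?'));
        $pdo->prepare("UPDATE items SET shown = 1 WHERE id IN ($placeholders)")->execute($ids);
    }
    return $rows;
}
With 50 rows and 5 per load, every row appears exactly once per 10 page loads before the flags reset.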
You can also do something like that with timestamps to make the distribution a bit more elegant.
This could probably look like ORDER BY RAND()*((timestamp-min(timestamp))/(max(timestamp)-min(timestamp))) DESC or something like that. Basically you could normalize the timestamp of when an entry was last selected using the time-interval window, so it becomes something between 0 and 1, and then multiply it by RAND(). Then the freshly selected entries are less likely to be chosen again, mixed with plain randomness. I am not sure about the formula above, I just typed it down; it is probably wrong, but the principle works.
I think what you want is generally referred to as "biased randomness". There are a lot of papers on that and some articles on SO, for example here:
Biased random in SQL?
Copy the 50 results to some temporary place (a file, a database table, whatever you use). Then every time you need random values, select 5 random values from the 50 and delete them from your temporary data set.
Once your temporary data set is empty, recreate it by copying the original again.
I have a relatively large database (130,000+ rows) of weather data, which is accumulating very fast (a new row is added every 5 minutes). On my website I publish min/max data per day, and for the entire existence of my weather station (which is around 1 year).
Now I would like to know whether I would benefit from creating additional tables where these min/max data would be stored, rather than having PHP run a MySQL query searching for the day's min/max data and the min/max data for the entire existence of my weather station. Would a query for MAX(), MIN() or SUM() (I need SUM() to total rain accumulation for months) take that much longer than a simple query against a table that already holds those min, max and sum values?
That depends on whether your columns are indexed or not. In the case of MIN() and MAX() you can read the following in the MySQL manual:
MySQL uses indexes for these operations: To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant.
In other words, if your columns are indexed you are unlikely to gain much of a performance benefit from denormalization. If they are NOT, you will definitely gain performance.
As for SUM(), it is likely to be faster on an indexed column, but I'm not really confident about the performance gains here.
Please note that you should not simply rush off to index your columns after reading this post: if you add indexes, your INSERT and UPDATE queries will slow down!
Yes, denormalization should help performance a lot in this case.
There is nothing wrong with storing calculations for historical data that will not change in order to gain performance benefits.
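For example, a summary table for past days could be filled once per day from the raw readings. This is only a hedged sketch; the daily_summary and readings table/column names and the connection details are assumptions, and it would typically run from cron just after midnight:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=weather', 'user', 'pass'); // assumed credentials

// Summarize yesterday's raw rows into one row of daily_summary.
$day = date('Y-m-d', strtotime('yesterday'));

$sql = "INSERT INTO daily_summary (day, min_temp, max_temp, rain_total)
        SELECT DATE(recorded_at), MIN(temperature), MAX(temperature), SUM(rain)
          FROM readings
         WHERE recorded_at >= ?
           AND recorded_at <  DATE_ADD(?, INTERVAL 1 DAY)
         GROUP BY DATE(recorded_at)";

$pdo->prepare($sql)->execute([$day, $day]);

// The website then reads min/max/sum for past days from daily_summary and only
// touches the big readings table for "today".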
While I agree with RedFilter that there is nothing wrong with storing historical data, I don't agree that you will get much of a performance boost from it. Your database is not what I would consider a heavy-use database.
One of the major advantages of databases is indexes. They use advanced data structures to make data access lightning fast. Just think: every primary key you have is an index. You shouldn't be afraid of them. Of course, it would probably be counterproductive to index all your fields, but that should never really be necessary. I would suggest researching indexes more to find the right balance.
As for the work done when a change happens, it is not that bad. An index is a tree-like representation of your field data. This is done to reduce a search down to a small number of near-binary decisions.
For example, think of finding a number between 1 and 100. Normally you would randomly stab at numbers, or you would just start at 1 and count up. This is slow. Instead, it would be much faster if you set it up so that you could ask if you were over or under when you choose a number. Then you would start at 50 and ask if you are over or under. Under, then choose 75, and so on till you found the number. Instead of possibly going through 100 numbers, you would only have to go through around 6 numbers to find the correct one.
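As a toy illustration of that guessing strategy (plain binary search over a sorted range; the numbers are just the example above, nothing from the original post):
<?php
// Guess a number between $low and $high by repeatedly asking "over or under?".
function guess(int $target, int $low, int $high): int {
    $steps = 0;
    while ($low <= $high) {
        $steps++;
        $mid = intdiv($low + $high, 2);
        if ($mid === $target) {
            return $steps;       // found it
        } elseif ($mid < $target) {
            $low = $mid + 1;     // "under" - look higher
        } else {
            $high = $mid - 1;    // "over" - look lower
        }
    }
    return $steps;
}

echo guess(87, 1, 100); // about 7 steps instead of up to 100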
The problem arises when you add 50 numbers and make the range 1 to 150. If you start at 50 again, your search is less optimized, as there are 100 numbers above you; your binary search is out of balance. So what you do is rebalance your search by starting at the mid-point again, namely 75.
So the work a database does is just an adjustment to rebalance the mid-point of its index. It isn't actually a lot of work. If you are working on a database that is large and requires many changes a second, you would definitely need a strong strategy for your indexes. In a small database that gets very few changes, like yours, it's not a problem.