I am building a system that needs to store certain records with a consecutive number. AUTO_INCREMENT works fine until a ROLLBACK is needed: after the consecutive record exists, I have to perform some processing using that number, and that processing may fail, which leads to an inevitable ROLLBACK.
The next time I try to insert a record, the AUTO_INCREMENT column cannot reuse the lost number, due to certain "unbreakable" rules in MySQL.
I cannot use the MAX(id) + 1 approach, because another user in the system may be running the same process at the same time, successfully generating his/her own next consecutive id.
One idea I had: fetch all the consecutive IDs in the table so far, and loop over them in the program to find the first missing ID. But I'm not sure about this either, because another user may be doing the same thing (and if there is only one missing number, both users will try to insert with the same value, and so on).
So I need this consecutive number to stay consistent even when a ROLLBACK occurs.
Is there any alternative to AUTO_INCREMENT (in MySQL, with PHP, or anything else) that generates consecutive IDs in a consistent way, no matter whether I have to ROLLBACK one or two of the insertions?
Thanks for any help.
First of all, there's no declarative way to do what you want. You have to write procedural code.
I can not use the MAX(id) + 1 way
Yes, you can. To start with, you have to write code that either
locks the table, or
tries again when the dbms returns a duplicate key error.
There are tradeoffs in each approach, but both can maintain an unbroken sequence of numbers even when there are rollbacks.
But that's not the whole story. You also have to prevent changes from breaking the sequence. So you must either revoke update and delete privileges on that table, or you must prevent updating or deleting an existing id number in some other way.
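A minimal sketch of the retry approach, simulated in Python with SQLite standing in for MySQL (the `invoices` table, the retry limit, and the function name are illustrative assumptions, not part of the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, payload TEXT)")

def insert_with_gapless_id(conn, payload, max_retries=5):
    """Compute MAX(id)+1 and insert; on a duplicate-key error
    (another writer took the same id first), recompute and retry."""
    for _ in range(max_retries):
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(id), 0) + 1 FROM invoices").fetchone()
        try:
            conn.execute("INSERT INTO invoices (id, payload) VALUES (?, ?)",
                         (next_id, payload))
            conn.commit()
            return next_id
        except sqlite3.IntegrityError:
            conn.rollback()  # someone else grabbed next_id; try again
    raise RuntimeError("could not allocate a consecutive id")

print(insert_with_gapless_id(conn, "first"))   # 1
print(insert_with_gapless_id(conn, "second"))  # 2
```

Because a failed transaction never reserved an id (the number is only taken when the INSERT commits), a rollback leaves no gap.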
To get all consecutive IDs so far in that table, and loop them with
the program to find the first missing ID,
If you already have a broken sequence, you'd better fix that first.
This is more of a best practice question. I'm generating a random string identifier with the following parameters:
length: 7 characters
seed: A-Za-z0-9 (lowercase + uppercase alphabet and numbers)
I need to check if that string exists in the database before I'm inserting it. I can do that in two ways:
Run a do...while loop. In it, generate a random string and each time, query the database with COUNT(*) until count === 0.
First get all existing unique strings from the database with a single query, then run a do...while loop to generate a random string that's not in the fetched array.
It's obvious to me that the second method is technically less resource intensive for the database server, because there's only a single query, as opposed to querying over and over. So I'm leaning towards that method, but I see two potential caveats:
Large databases, and time passed between fetching and inserting.
Large database results: How many rows can be in a query result before I need to consider switching to the first method? In other words, when is the strain of a large result set on the database server lower than running multiple subsequent queries? 1,000 results? 5,000? 20,000?
Time between fetch & insert: If I use the second method, I see a risk when two or more users try to run the same function simultaneously. The first user's result set (of unique strings fetched from the database) may not include the other user's unique string that has just been added 2ms after the query. This could produce duplicates in the database.
Is the second method actually feasible in production, or is it just a dream?
The second option does not seem practical to me. If you have only a few rows in the table, the risk of collisions is low; if you have many rows, the risk of collision increases, but fetching all rows is not memory efficient on the PHP side.
The first solution seems better to me.
But I think a third option can be used. Add a unique index on your random value in MySQL. Generate a random value, then try to insert it, and catch the error if a collision happens. That's efficient because MySQL is quick at checking whether a value exists when it is indexed. You have no concurrency issue with this approach.
The only caveat (for all approaches) is that when the number of rows in the table is high, you will have difficulty finding a still-unused value. To lower the risk of collision, you can increase the length of the random value. You could also create another table that contains unused values, and refill it with another algorithm when it runs low.
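The third option can be sketched like this, in Python with SQLite standing in for MySQL (the `tokens` table, column name, and retry limit are made up for the example):

```python
import random
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # A-Za-z0-9, as in the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tokens (code TEXT NOT NULL UNIQUE)")

def insert_random_code(conn, length=7, max_retries=10):
    """Generate a random code and try to insert it; the UNIQUE index makes
    the database reject collisions, so we just catch the error and retry."""
    for _ in range(max_retries):
        code = "".join(random.choices(ALPHABET, k=length))
        try:
            conn.execute("INSERT INTO tokens (code) VALUES (?)", (code,))
            conn.commit()
            return code
        except sqlite3.IntegrityError:  # collision: code already exists
            conn.rollback()
    raise RuntimeError("too many collisions; consider a longer code")

code = insert_random_code(conn)
print(len(code))  # 7
```

The key point is that the uniqueness check and the insert are a single atomic operation, so two users generating the same code at the same moment cannot both succeed.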
Suppose Table1 contains a column orderid (not a key, although it is NOT NULL and unique). It contains 5-digit numbers.
What's the best way to generate a PHP variable $unique_var that is not in that column?
One thing that could be important: 10% to 30% of the 5-digit numbers are already in the table, so generating a number and re-querying in a loop until mysql_num_rows() == 0 doesn't seem like the best way to find one. Is there a better solution for performance?
Thank you.
If only 10-30% of the numbers are already taken, then only 10-30% of attempts will need a second try. That is not a big performance issue at all.
Otherwise, just create a table listing all 5-digit numbers (only 100k rows) and delete every number that already exists in Table1. When you need another random number, just pick one and delete it.
I would suggest finding the biggest number (with the MAX() function) and starting from there.
Here are a couple of suggestions. Each has its drawbacks.
Pre-populate your table and add a column to indicate that a number is unused. Select an unused number using LIMIT 1 and mark it used. This uses a lot of space.
Keep a separate table containing previously used numbers. If that table is empty, generate numbers sequentially from the last used number (or from 00001 if Table1 is empty). This requires some extra bookkeeping.
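The pool-of-unused-numbers idea above can be sketched in Python with SQLite (the `free_numbers` table, the sample set of taken numbers, and the function name are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE free_numbers (n INTEGER PRIMARY KEY)")

# Pre-populate the pool with every 5-digit number not already taken.
taken = {10000, 10001, 10002}  # stand-in for orderids already in Table1
conn.executemany("INSERT INTO free_numbers (n) VALUES (?)",
                 [(n,) for n in range(10000, 100000) if n not in taken])

def take_random_number(conn):
    """Pick a random unused number and remove it from the pool."""
    row = conn.execute(
        "SELECT n FROM free_numbers ORDER BY RANDOM() LIMIT 1").fetchone()
    if row is None:
        raise RuntimeError("pool exhausted")
    conn.execute("DELETE FROM free_numbers WHERE n = ?", (row[0],))
    conn.commit()
    return row[0]

n = take_random_number(conn)
print(10000 <= n <= 99999 and n not in taken)  # True
```

Every pick is guaranteed unique with no retry loop at all; the trade-off is the bookkeeping table and the up-front population cost.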
OK, I know the technical answer is NEVER.
BUT, there are times when it seems to make things SO much easier, with less code and seemingly few downsides, so please hear me out.
I need to build a Table called Restrictions to keep track of what type of users people want to be contacted by and that will contain the following 3 columns (for the sake of simplicity):
minAge
lookingFor
drugs
lookingFor and drugs can contain multiple values.
Database theory tells me I should use a join table to keep track of the multiple values a user might have selected for either of those columns.
But it seems that using comma-separated values makes things so much easier to implement and execute. Here's an example:
Let's say User 1 has the following Restrictions:
minAge => 18
lookingFor => 'Hang Out','Friendship'
drugs => 'Marijuana','Acid'
Now let's say User 2 wants to contact User 1. Well, first we need to see if he fits User 1's Restrictions, but that's easy enough EVEN WITH the comma-separated columns, as such:
First I'd get the Target's (User 1) Restrictions:
SELECT * FROM Restrictions WHERE UserID = 1
Now I just put those into respective variables as-is into PHP:
$targetMinAge = $row['minAge'];
$targetLookingFor = $row['lookingFor'];
$targetDrugs = $row['drugs'];
Now we just check if the SENDER (User 2) fits that simple Criteria:
SELECT COUNT(*)
FROM Users
WHERE
Users.UserID = 2 AND
Users.minAge >= $targetMinAge AND
Users.lookingFor IN ($targetLookingFor) AND
Users.drugs IN ($targetDrugs)
Finally, if COUNT == 1, User 2 can contact User 1, else they cannot.
How simple was THAT? It just seems really easy and straightforward, so what is the REAL problem with doing it this way, as long as I sanitize all inputs to the DB every time a user updates their contact restrictions? Being able to use MySQL's IN operator, with the multiple values already stored in a format it understands (comma-separated values), seems so much easier than creating join tables for every multiple-choice column. And this was a simplified example; what if there are 10 multiple-choice columns? Then things start getting messy with so many join tables, whereas the CSV method stays simple.
So, in this case, is it really THAT bad if I use comma-separated values?
****ducks****
You already know the answer.
First off, your PHP code isn't even close to working, because it only works if User 2 has a single value in lookingFor or drugs. If either of those columns contains multiple comma-separated values, then IN won't match even if those values are in exactly the same order as User 1's. What do you expect IN to do when the left-hand side itself contains one or more commas?
Therefore, it's not "easy" to do what you want in PHP. It's actually quite a pain and would involve splitting user 2's fields into single values, writing dynamic SQL with many ORs to do the comparison, and then doing an extremely inefficient query to get the results.
Furthermore, the fact that you even need to write PHP code to answer such a relatively simple question about the intersection of two sets means that your design is badly flawed. This is exactly the kind of problem (relational algebra) that SQL exists to solve. A correct design allows you to solve the problem in the database and then simply implement a presentation layer on top in PHP or some other technology.
Do it correctly and you'll have a much easier time.
Suppose User 1 is looking for 'Hang Out','Friendship' and User 2 is looking for 'Friendship','Hang Out'
Your code would not match them up, because 'Friendship','Hang Out' is not in ('Hang Out','Friendship')
That's the real problem here.
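The failure is easy to demonstrate in Python with SQLite (the single-column `Users` schema mirrors the question's CSV design; the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserID INTEGER, lookingFor TEXT)")
# User 2 stores multiple values as one comma-separated string.
conn.execute("INSERT INTO Users VALUES (2, 'Friendship,Hang Out')")

# User 1's restriction, interpolated the way the question proposes:
count = conn.execute(
    "SELECT COUNT(*) FROM Users "
    "WHERE UserID = 2 AND lookingFor IN ('Hang Out', 'Friendship')"
).fetchone()[0]
print(count)  # 0 -- no match: IN compares the whole string, not each value
```

IN tests whether the entire column value equals one of the listed items; it never splits the stored string on commas, so any user with more than one value can never match.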
A question for someone with more experience than myself: would it be better to simply count the number of items in a table (such as counting the number of topics in a category), or to keep a variable that holds that value and just increment and read it (an extra field in the category table)?
Is there a significant difference between the two, or is it only slight? And even if it is slight, would one method still be better than the other? It's not for any one particular project, so please answer generally (if that makes sense) rather than based on something like the number of users.
Thank you.
To get the number of items (rows in a table), you'd use standard SQL and do it on demand:
SELECT COUNT(*) FROM MyTable
Note, in case I've missed something: each item (row) in the table has some unique identifier, whether it's a part number, some code, or an auto-increment value, so adding a new row can trigger the "auto-increment" of a column.
That is unrelated to counting rows: because of DELETEs and ROLLBACKs, those numbers may not be contiguous.
Trying to maintain row counts separately will end in tears and/or disaster. Trying to use COUNT(*)+1 or MAX(id)+1 to generate a new row identifier is even worse.
I think there is some confusion in your question. My interpretation: do you want to do a SELECT COUNT(*), or keep a column where you track the actual count yourself?
I would not add such a column unless you have a reason to. It is premature optimization, and it complicates your software design.
Also, you want to avoid storing the same information in two places. Counting is a trivial task, so you would actually be duplicating information, which is a bad idea.
I'd go with just counting. If you notice a performance issue, you can consider other options, but as soon as you keep a value that's separate, you have to do some work to make sure it's always correct. Using COUNT() you always get the actual number "straight from the horse's mouth" so to speak.
Basically, don't start optimizing until you have to. If everything works fine and fast using COUNT(), then do that. Otherwise, store the count somewhere, but rather than adding to or subtracting from the stored value, run COUNT() when needed to get the new number of items.
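A small illustration of why the live count is the safer source of truth, in Python with SQLite (the `topics` table is made up): after a rollback, COUNT(*) is automatically correct, while a manually incremented counter has to be undone by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, title TEXT)")
manual_count = 0

conn.execute("INSERT INTO topics (title) VALUES ('first')")
manual_count += 1
conn.commit()

# A second insert that gets rolled back:
conn.execute("INSERT INTO topics (title) VALUES ('second')")
manual_count += 1  # easy to forget to undo this on failure...
conn.rollback()

live = conn.execute("SELECT COUNT(*) FROM topics").fetchone()[0]
print(live, manual_count)  # 1 2 -- the stored counter has drifted
```

Every code path that inserts, deletes, or rolls back has to keep the stored counter in sync; COUNT(*) has no such failure mode.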
In my forum I count the sub-threads in a forum like this:
SELECT COUNT(forumid) AS count FROM forumtable
As long as you filter on the same identifier to specify which forum and/or sub-section, and the column has an index, it's very fast. So there's no reason to add more columns than you need.
I have a query that lists a set of numbers, for example 1000000 through 2000000, and inside that while loop I run another query to see whether each number matches another table in another database. This part runs fine, but a little slowly.
I then need yet another query such that, if the first check returns false, it does another check on a third table. The problem I'm having is that the check on this table is not a simple equality match.
The table structure on last table is like this:
firstnum
secondnum
This is intended for use in a range of numbers. So row 1 for example might be:
1000023, 1000046
This would mean it's for all numbers between and including those values.
There are thousands of these entries in the DB, and I'm trying to figure out the best way to determine whether the particular number I'm searching for falls somewhere in that table. Since it's not a direct match, I'm not sure how to accomplish this. The last table is also in PostgreSQL, while the main queries are in MySQL.
It's a bit hard to understand what you're trying to say, but I'm afraid the solution is ridiculously simple: ... WHERE firstnum <= X AND X <= secondnum, where X is the number you are looking for.
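That range check can be sketched like this, in Python with SQLite standing in for PostgreSQL (the `ranges` table and its sample rows are invented for the example; the WHERE clause is the one from the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranges (firstnum INTEGER, secondnum INTEGER)")
conn.executemany("INSERT INTO ranges VALUES (?, ?)",
                 [(1000023, 1000046), (1000100, 1000200)])

def in_any_range(conn, x):
    """True if x falls inside (inclusive) any stored [firstnum, secondnum]."""
    row = conn.execute(
        "SELECT 1 FROM ranges WHERE firstnum <= ? AND ? <= secondnum LIMIT 1",
        (x, x)).fetchone()
    return row is not None

print(in_any_range(conn, 1000030))  # True  (inside 1000023..1000046)
print(in_any_range(conn, 1000050))  # False (between the two ranges)
```

With an index on firstnum (or, in real PostgreSQL, a range type with a GiST index), this lookup stays fast even with thousands of rows.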