This is more of a best practice question. I'm generating a random string identifier with the following parameters:
length: 7 characters
character set: A-Za-z0-9 (lowercase and uppercase letters plus digits)
I need to check whether that string already exists in the database before inserting it. I can do that in two ways:
Run a do...while loop: generate a random string and, each time, query the database with COUNT(*) until the count is 0 (a sketch of this follows the list).
First get all existing unique strings from the database with a single query, then run a do...while loop to generate a random string that's not in the fetched array.
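For illustration, a minimal sketch of the first method, assuming a PDO connection in $pdo and a hypothetical ids table with a code column (all names are illustrative, not from the question):

// Method 1 sketch: regenerate until the code is not found in the table.
function randomCode(int $length = 7): string {
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $code;
}

do {
    $code = randomCode();
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM ids WHERE code = ?');
    $stmt->execute([$code]);
} while ((int) $stmt->fetchColumn() > 0);
// $code was unseen at check time; insert it (note the race window).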
It's obvious to me that the second method is technically less resource-intensive for the database server, because there's only a single query instead of querying over and over. So I'm leaning towards that method, but I see two potential caveats:
Large databases, and time passed between fetching and inserting.
Large database results: How many rows can be in a query result before I need to consider switching to the first method? In other words, at what point does a single large result set put more strain on the database server than multiple subsequent queries? 1,000 results? 5,000? 20,000?
Time between fetch & insert: If I use the second method, I see a risk when two or more users try to run the same function simultaneously. The first user's result set (of unique strings fetched from the database) may not include the other user's unique string that has just been added 2ms after the query. This could produce duplicates in the database.
Is the second method actually feasible in production, or is it just a dream?
The second option does not seem practical to me. If you have only a few rows in the table, the risk of collision is low; if you have many rows, the risk of collision increases, but fetching all rows is not memory efficient on the PHP side.
The first solution seems better to me.
But I think a third option can be used. Add a unique index on your random value in MySQL. Generate a random value, then try to insert it. Catch the error if a collision happens. That's efficient because MySQL is quick at checking whether a value exists when it is indexed. You have no concurrency issue with this approach.
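A minimal sketch of that approach, assuming PDO in exception mode and the same hypothetical ids/code names as above; MySQL reports a duplicate key as error code 1062:

// One-time setup in MySQL: ALTER TABLE ids ADD UNIQUE INDEX uniq_code (code);
do {
    $code = randomCode(); // generator sketched earlier
    try {
        $pdo->prepare('INSERT INTO ids (code) VALUES (?)')->execute([$code]);
        break; // inserted without collision
    } catch (PDOException $e) {
        if ((int) $e->errorInfo[1] !== 1062) { // 1062 = ER_DUP_ENTRY
            throw $e; // some other error: don't swallow it
        }
        // duplicate: loop and try a new code
    }
} while (true);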
The only caveat (for all approaches) is that when the number of rows in the table is high, you will have difficulty finding a yet-unused value. To lower the risk of collision, you can increase the size of the random value. You could also create another table that contains unused values, and refill it with another algorithm when it runs low.
I have a multi-language table for texts. (Columns: hu, en, de, sr, ...)
The rows contain texts.
Every text on the website is stored in this table, so I have a lot of SELECT queries in different places.
Which is better, faster?
Lots of SELECT queries with one result each:
SELECT en FROM langtexts WHERE id='41';
SELECT en FROM langtexts WHERE id='63';
SELECT en FROM langtexts WHERE id='89';
Or one query that fetches all the texts, stored in an array, and use this array to output the texts:
SELECT en FROM langtexts
The table has a lot of rows with large texts.
You mention that:
The SELECT queries are in different places in different files. I don't know all the exact IDs before outputting the texts; I can only select all texts without IDs. So this is useless for me.
However, you can pass variables between different PHP scripts by storing them in the PHP $_SESSION variable.
I think a way to achieve what you want would be to populate a $_SESSION['GUIMessages'] entry with all the results of your query.
This variable, which is effectively an array containing the data from the respective database column, will then exist in memory for the remaining lifetime of the user's session.
In this scenario, you might save some processing time. However, if many users use the same language, this approach doesn't scale, and I would expect it to actually yield worse results: the "GUIMessages" variable is kept once per user, so overhead increases and performance drops. Since sessions in PHP are stored in files by default, you'd actually be introducing extra disk I/O operations.
If the problem is convenience for you as a programmer, why don't you define a function getGUIMessages that takes care of the SQL queries, receives a key parameter, and returns the messages? You can then invoke it wherever you need it without having the SQL queries spread all over the place.
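One possible shape of such a helper, as a sketch: it assumes PDO and the langtexts table from the question, and caches each language's texts so the query runs at most once per request. The function name and language whitelist are illustrative.

function getGUIMessages(PDO $pdo, string $lang, int $id): ?string {
    static $cache = []; // per-request cache, one query per language
    $allowed = ['hu', 'en', 'de', 'sr']; // column names cannot be bound parameters
    if (!in_array($lang, $allowed, true)) {
        throw new InvalidArgumentException("Unknown language: $lang");
    }
    if (!isset($cache[$lang])) {
        $cache[$lang] = $pdo->query("SELECT id, `$lang` FROM langtexts")
                            ->fetchAll(PDO::FETCH_KEY_PAIR);
    }
    return $cache[$lang][$id] ?? null;
}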
The second one is the faster solution: you can get all the required texts in one single query and use them through an indexed array wherever you need them.
I am building a system that requires storing certain records with a consecutive number. At this point AUTO_INCREMENT works just fine, until there is a need for rollbacks: after such a consecutive record exists, I have to perform some processes using that consecutive number, and any of those processes may fail, which leads to an inevitable ROLLBACK...
The next time I try to insert a record, the AUTO_INCREMENT column cannot reuse the consecutive number that was lost, due to certain "unbreakable" rules in MySQL.
I cannot use the MAX(id) + 1 way, because there may be another user in the system doing the same process, successfully generating his/her own next consecutive id.
I have an idea about this: get all consecutive IDs so far in that table and loop over them in the program to find the first missing ID. But I'm not sure about this either, because another user may be doing the same (and there could be only one missing consecutive number, so both users would try to insert using the same one, etc.).
So, I need this consecutive number to stay consistent even if a ROLLBACK happens.
Is there any alternative to AUTO_INCREMENT (in MySQL, with PHP, or anything else) so that I can generate consecutive IDs in a consistent way, no matter if I have to ROLLBACK one or two of those insertions?
Thanks for any help.
First of all, there's no declarative way to do what you want. You have to write procedural code.
I cannot use the MAX(id) + 1 way
Yes, you can. To start with, you have to write code that either
locks the table, or
tries again when the DBMS returns a duplicate-key error.
There are tradeoffs in each approach, but both can maintain an unbroken sequence of numbers even when there are rollbacks.
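A sketch of the retry variant, assuming PDO in exception mode, a unique key on the sequence column, and illustrative table/column names (orders, seq):

for ($attempt = 0; $attempt < 5; $attempt++) {
    try {
        // Compute and insert the next number in one statement; the unique
        // key turns a concurrent grab of the same number into error 1062.
        $pdo->exec('INSERT INTO orders (seq)
                    SELECT COALESCE(MAX(seq), 0) + 1 FROM orders');
        break; // got the next consecutive number
    } catch (PDOException $e) {
        if ((int) $e->errorInfo[1] !== 1062) { // 1062 = ER_DUP_ENTRY
            throw $e;
        }
        // another session took that number first; recompute and retry
    }
}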
But that's not the whole story. You also have to prevent changes from breaking the sequence. So you must either revoke update and delete privileges on that table, or you must prevent updating or deleting an existing id number in some other way.
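For the privilege route, something along these lines, where the database, table, and account names are placeholders; granting only SELECT and INSERT has the same effect as revoking UPDATE and DELETE:

GRANT SELECT, INSERT ON mydb.orders TO 'app'@'localhost';
-- (REVOKE UPDATE, DELETE ... also works, but only if those were granted before.)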
To get all consecutive IDs so far in that table, and loop them with the program to find the first missing ID,
If you already have a broken sequence, you'd better fix that first.
Suppose Table1 contains a column orderid (not a key, although it is NOT NULL and unique). It contains 5-digit numbers.
What's the best way to generate a PHP variable $unique_var whose value is not in that column?
What could be important: between 10% and 30% of the 5-digit numbers are already in the table, so generating a number and re-querying while mysql_num_rows() > 0 is not the best way to find one. Is there any better solution for performance?
Thank you.
If only 10-30% of the numbers are already taken, then only 10-30% of attempts will need at least a second query. That is not a big performance issue at all.
Otherwise, just create a table listing all 5-digit numbers (just 100k rows) and remove those that already exist. When you need another random number, just pick one and delete it.
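Sketched in SQL (table name free_numbers is illustrative); the SELECT ... FOR UPDATE keeps two users from reserving the same number:

CREATE TABLE free_numbers (n INT PRIMARY KEY);
-- ...fill free_numbers with all 5-digit numbers...
DELETE FROM free_numbers WHERE n IN (SELECT orderid FROM Table1);

-- To reserve a number:
START TRANSACTION;
SELECT n FROM free_numbers ORDER BY RAND() LIMIT 1 FOR UPDATE;
DELETE FROM free_numbers WHERE n = ...;  -- the value just selected
COMMIT;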
I would suggest finding the biggest number (with the MAX() function) and starting from there.
Here are a couple of suggestions. Each has its drawbacks.
Pre-populate your table and add a column to indicate that a number is unused. Select an unused number using LIMIT 1 and mark it used (sketched after this list). This uses a lot of space.
Keep a separate table containing previously used numbers. If that table is empty, generate numbers sequentially from the last used number (or from 00001 if Table1 is empty). This requires some extra bookkeeping.
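A sketch of the first suggestion, with illustrative names; the user-variable trick claims a row and reads back its value without an explicit transaction:

CREATE TABLE number_pool (
    n    INT NOT NULL PRIMARY KEY,
    used TINYINT(1) NOT NULL DEFAULT 0
);
-- Claim one unused number atomically...
UPDATE number_pool SET used = 1, n = (@picked := n)
 WHERE used = 0 LIMIT 1;
-- ...and read back which one was claimed.
SELECT @picked;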
A question for someone with more experience than myself: would it be better to simply count the number of items in a table (such as the number of topics in a category) each time, or to keep a variable that holds that value and just increment and read it (an extra field in the category table)?
Is there a significant difference between the two, or is it just very slight? And even if it is slight, would one method still be better than the other? It's not for any one particular project, so please answer generally (if that makes sense) rather than based on something like the number of users.
Thank you.
To get the number of items (rows in a table), you'd use standard SQL and do it on demand:
SELECT COUNT(*) FROM MyTable
Note, in case I've missed your point: each item (row) in the table presumably has some unique identifier, whether it's a part number, some code, or an auto-increment value. So adding a new row could trigger the "auto-increment" of a column.
This is unrelated to counting rows: because of DELETEs or ROLLBACKs, the numbers may not be contiguous.
Trying to maintain row counts separately will end in tears and/or disaster. Trying to use COUNT(*)+1 or MAX(id)+1 to generate a new row identifier is even worse.
I think there is some confusion in your question. My interpretation is that you're asking whether to do a SELECT COUNT(*) or to keep a column where you track the actual count.
I would not add such a column if you don't have a reason to do so. This is premature optimization, and it complicates your software design.
Also, you want to avoid having the same information stored in different places. Counting is a trivial task, so you would actually be duplicating information, which is a bad idea.
I'd go with just counting. If you notice a performance issue, you can consider other options, but as soon as you keep a value that's separate, you have to do some work to make sure it's always correct. Using COUNT() you always get the actual number "straight from the horse's mouth" so to speak.
Basically, don't start optimizing until you have to. If everything works fine and fast using COUNT(), then do that. Otherwise, store the count somewhere, but rather than adding/subtracting to update the stored value, run COUNT() when needed to get the new number of items.
In my forum I count the sub-threads of a forum section like this:
SELECT COUNT(forumid) AS count FROM forumtable WHERE forumid = ?
As long as you're using the same identifier to specify which forum and/or sub-section, and the column has an index, it's very fast. So there's no reason to add more columns than you need.
I need to check whether some integer value is already in my database (which is growing all the time), and this check is done several thousand times in one script. I'm considering two alternatives:
Read all those numbers from the MySQL database into a PHP array and, every time I need to check a number, use the in_array function.
Every time I need to check the number, just execute something like SELECT number FROM table WHERE number='#' LIMIT 1
On the one hand, searching an array stored in RAM should be faster than querying MySQL every time (as I have mentioned, these checks are performed about a thousand times during one script execution). On the other hand, the DB is growing, and that array may become quite big, which may slow things down.
The question is: which way is faster, or better in some other respect?
I have to agree that #2 is your best choice. With LIMIT 1, MySQL stops the query as soon as it finds the first match. Make sure the columns you intend to search by are indexed.
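For example, as a sketch assuming PDO and a hypothetical numbers table indexed on the number column; preparing the statement once and re-executing it suits thousands of checks:

$stmt = $pdo->prepare('SELECT 1 FROM numbers WHERE number = ? LIMIT 1');

function numberExists(PDOStatement $stmt, int $n): bool {
    $stmt->execute([$n]);                  // index lookup, stops at first hit
    return $stmt->fetchColumn() !== false; // false means no row found
}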
It sounds like you are duplicating a Unique Constraint in code...
CREATE TABLE MyTable (
    SomeUniqueValue INT NOT NULL,
    CONSTRAINT MyUniqueKey UNIQUE (SomeUniqueValue)
);
How does the number of checks you need to make compare with the number of values stored in the database? If it's 1:100, then you're probably better off searching the database each time; if it's (some amount) less, then preloading the list will be faster. What happened when you tested it?
However, even if the ratio is low enough that loading the full table is faster, doing so will gobble up memory and could, as a result, make everything else run more slowly.
So I would recommend not loading it all into memory. But if you can, then batch the checks up to minimise the number of round trips to the database.
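A sketch of such batching, again assuming PDO and the hypothetical numbers table: check a whole chunk of candidates in one IN (...) query and get back the ones already taken.

function existingNumbers(PDO $pdo, array $candidates): array {
    if ($candidates === []) {
        return []; // avoid an empty, invalid IN () clause
    }
    $placeholders = implode(',', array_fill(0, count($candidates), '?'));
    $stmt = $pdo->prepare(
        "SELECT number FROM numbers WHERE number IN ($placeholders)"
    );
    $stmt->execute(array_values($candidates));
    return $stmt->fetchAll(PDO::FETCH_COLUMN); // numbers already in the table
}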
C.
Querying the database is the best option. First, you said the database is growing, which means new values are being added to the table, whereas with in_array you would be checking against stale values. Second, you might exhaust the RAM allotted to PHP with a very large amount of data. Third, MySQL has its own query optimizer and other optimizations, which make it a far better choice than PHP.