I'm sorry if this question is stupid or already asked, but I couldn't find much about it.
What is the fastest / best method of storing unique values in SQL?
Option 1: Create unique index, and use a try -> catch block with PHP?
Option 2: Query to check if exists, and then act on that?
I would think option 2 is best, but with option 1 I only have one query, versus two queries when the value doesn't exist yet.
And since I need to minimize DB queries as best I can, I would go for option 1, but I'm not sure whether it's a good idea to rely on the try block for this?
Thanks
As with all optimisation-related questions, the answer is: well, it depends.
But let's get one thing straight: if you do not want duplicate values in a field or combination of fields, then use a primary key or unique index constraint to make sure that the integrity of your data is not compromised under any circumstances.
So, your question is really: shall I check, before inserting a record into a table, whether the data would violate a uniqueness constraint?
If you do not really care whether the insert is successful or not, then do not check. Moreover, use insert ignore, so you do not even get an error message if the insert violates a uniqueness constraint. An example of such a situation: logging whether a user has logged in at least once within a certain period.
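A minimal sketch of that logging idea (table and column names are invented for illustration, assuming a unique key on the user/date pair):

INSERT IGNORE INTO daily_logins (user_id, login_date)
VALUES (42, CURDATE());
-- With a UNIQUE KEY on (user_id, login_date), the first login of the day
-- inserts a row; every later login that day is silently skipped.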
Consider how costly it is to check before each and every insert and update whether the data violates any constraint, and how often you think that would occur. If the check is a costly operation, then rely on the indexes to prevent inserts with duplicate data, and find out which data violated the constraints after you know that the query has failed.
Do not only consider the cost of the select; also take into account whether the insert is part of a larger transaction which may have to be rolled back if an insert fails. Checking for constraint violations before the transaction even starts may have a positive impact on your db's performance.
In my opinion:
Always use the unique property for a field you actually want to be unique!
If for some reason you cannot do so, and you also want to know beforehand whether the desired value already exists in the table / collection, then additionally use an if-exists check.
Why not just one of them?
Maybe because an improper shard key in MongoDB allows non-unique values across shards!
I cannot offer SQL-specific knowledge, but I think the methodology is the same: use unique indexing where possible.
Cost effectiveness
Both methods cost server resources and hit the DB server at least twice.
So what's the big deal?
In the unknown universe: you sent a request to find out whether a value exists, and the response was false, telling you the value is unique! But by the time your application server processes that response and requests the insertion, someone else may already have inserted the same value, which can easily happen on a busy server!
Without the indexing, the server will never bother telling you about the discrepancy!
From this point of view, if uniqueness is that important, you should enforce a unique index at the table or collection level first!
Create a unique, auto-incremented primary key index, and then insert the data without supplying a value for that auto-incremented primary key column.
The insert will then never duplicate the data.
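A minimal sketch of that setup (hypothetical table and column names; note that the auto-increment only guarantees the id column itself is unique):

CREATE TABLE items (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (id)
);
-- Omit the id column; MySQL assigns the next value automatically.
INSERT INTO items (name) VALUES ('first item');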
Related
I was wondering if logic duplication can be reduced on this one. Let's say I have a users table and an email column, which should be unique per record. What I normally do is have a unique index on the column plus validation code that checks whether the value is already used:
SELECT EXISTS (SELECT * FROM `users` WHERE `email` = 'foo@bar.com')
Is it possible and practical to skip the explicit check and just rely on the database error when trying to put non-unique value? If we repeat the logic of uniqueness in two layers (database and application code), it's not really DRY.
I do that pretty often. In my custom database class I throw a specific exception for violated restrictions (this can be easily deduced from the numeric error code returned by MySQL) and I catch such exception upon insert.
It has the benefit of simplicity and it also prevents race conditions—MySQL takes care of data integrity in both variants, data itself and concurrent accesses.
The only drawback is that it can be tricky to figure out which index is failing when you have more than one (for instance, you may want to have a unique email and a unique username). MySQL drivers only report the violated key name in the text of the error message, there's no specific API for it.
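A minimal PHP sketch of that pattern, assuming a PDO connection in $pdo set to exception mode and a users table with a unique email index (1062 is MySQL's duplicate-entry error code):

try {
    $stmt = $pdo->prepare('INSERT INTO users (email) VALUES (?)');
    $stmt->execute(['foo@bar.com']);
} catch (PDOException $e) {
    // errorInfo holds [SQLSTATE, driver-specific code, message]
    if ($e->errorInfo[1] == 1062) {
        // The violated key name only appears in the message text,
        // e.g. "Duplicate entry 'foo@bar.com' for key 'users.email'"
        echo 'That email address is already taken.';
    } else {
        throw $e; // some other database error
    }
}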
In any case, you may want to inform the user about available names in an earlier stage, so a prior check can also make sense.
It makes sense to enforce the uniqueness of the email address in the database. Only that way you can be sure it is really unique.
If you do it only in the PHP code then any error in that code may corrupt the database.
Doing it in both places is not needed but, in my opinion, does not offend against the DRY rule. For instance, you might want to check the presence of an email address during registration of a new user, and not only rely on the database reporting an error.
I assume by "DRY" you mean Don't Repeat Yourself. Applying the same logic in more than one place is not intrinsically bad - there's the old adage "measure twice, cut once".
In a more general case, I usually follow the pattern of applying the insert and catching the constraint violation, but for users with email addresses it's a much more complicated story. If your email is the only attribute required to be unique, then we can skip over a lot of discussion about a person having more than one account, and about working out which attribute is not unique when a constraint violation is reported. That the email is the only unique attribute is implied in your question, but not stated.
Based on your comments you appear to be sending this SQL from your PHP code. Given that, there are 2 real issues with polling the record first:
1) Performance and Capacity: it's an extra round trip, parse and read operation on the database
2) Security: giving your application user direct access to tables (particularly tables controlling access) is not good for security. It is much safer to encapsulate this as a stored procedure/function running with definer privileges and returning messages more aligned to the application logic. Even if you still go down the route of poll first / insert if absent, you eliminate most of the overhead in issue 1. A rough sketch follows below.
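A rough sketch of what such a procedure might look like (names and messages are invented for illustration; MySQL stored procedures run with definer privileges by default):

DELIMITER //
CREATE PROCEDURE register_user(IN p_email VARCHAR(255))
SQL SECURITY DEFINER
BEGIN
    -- Rely on the unique index: if the INSERT hits a duplicate entry
    -- (error 1062), the handler returns an application-level message.
    DECLARE EXIT HANDLER FOR 1062
        SELECT 'EMAIL_TAKEN' AS result;
    INSERT INTO users (email) VALUES (p_email);
    SELECT 'OK' AS result;
END //
DELIMITER ;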
You might also spend a moment considering the difference between your query and...
SELECT 1 FROM `users` WHERE `email` = 'foo@bar.com'
On top of the database constraint, you should check if the email given already exists in it before trying to insert. Handling it that way is cleaner and allows for better validation and response for the client, without throwing an error.
The same goes for classic constraints such as MIN / MAX checks (note that CHECK constraints are parsed but ignored by MySQL before version 8.0.16). You should check, validate and return a validation error message to the client before committing any change to the database.
Suppose I'd like to check whether a comment is duplicated or not.
I have two options:
1) Create a query to database and check for it:
select * from comments where content=$sanitized_content and post_id=$id
2) Create a unique index on the comment and post_id and catch the MySQL error.
It's important for my complex and busy app to decrease number of queries to database as much as possible. However the first option is more usual and readable.
You can generalize this question to other situations.
MySQL is definitely faster than PHP. I would always prefer a failing INSERT or a REPLACE against a suitable key over checking in PHP.
The only exception would probably be if your key becomes very complex, which will obviously create overhead on MySQL for all queries run against that table. However, there isn't a one-size-fits-all answer to what is too complex to be worthwhile; it's largely a matter of real-life testing.
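A minimal sketch for the comments example above (the hash column is an assumption on my part, since long text columns make poor unique keys):

ALTER TABLE comments
    ADD COLUMN content_hash CHAR(32) NOT NULL,  -- e.g. MD5 of the content
    ADD UNIQUE KEY uq_post_comment (post_id, content_hash);
-- A duplicate comment now makes this INSERT fail with error 1062,
-- which the application can catch instead of running a prior SELECT.
INSERT INTO comments (post_id, content, content_hash)
VALUES (1, 'Nice post!', MD5('Nice post!'));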
I prefer number 2;
your PHP script will have less code.
Whether a comment is a duplicate or not is checked automatically:
that is exactly the function of UNIQUE.
I'm about to implement a memcached class which can be extended by our database class. However, I have looked at many different ways of doing this.
My first question: what is the point of Memcached::set(), as it seems to replace the value of the key? Does this not defeat the object of caching your results?
My second question: technically speaking, what is the fastest/best way to update the value of a key without checking the results every time the query is executed? If the retrieved results had to be checked constantly, there would be no point in caching, as the application would constantly be connecting to the MySQL database.
Lastly, what is the best way of creating a key? Most people recommend using MD5, but hashing the PDO query string alone would produce the same key regardless of the bound values.
I.e.
$key = MD5("SELECT * FROM test WHERE category=?");
the category could produce many different results, yet the key would stay the same and its value would constantly be replaced. Is there a best practice for this?
You set a cache entry when you had to read the database, so that next time, you don't have to read the database first. You'd check the cache, and if it was not there, or otherwise out of date, then you fall back to the database read, and reset the key.
As for a key name, it depends very much on the expected values of the category. If it was a simple integer or string, I'd use a key like test.category:99 or test.category:car. If it was likely to be more than that, it may be useful to encode it so there were no spaces in it (say, urlencode).
Finally, if it were any more complex than that - test:category:{MD5(category)}.
Since the key is only a reference to the data and you'll never be using it in any kind of SQL query, putting the value in there is not generally going to be a security issue.
Since you control when the cache is set, if the underlying database entry is changed, it's simple to also update the cache with the new data at the same time - you just have to use the same key.
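A minimal PHP sketch of that read-through pattern, assuming a Memcached instance in $memcached and a PDO handle in $pdo (the key scheme and TTL are illustrative):

function getTestByCategory($memcached, $pdo, $category) {
    // The key includes the value, not just the query text, so each
    // category gets its own cache entry.
    $key = 'test:category:' . md5($category);

    $rows = $memcached->get($key);
    if ($rows === false) {  // cache miss: fall back to the database
        $stmt = $pdo->prepare('SELECT * FROM test WHERE category = ?');
        $stmt->execute([$category]);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $memcached->set($key, $rows, 300);  // cache for 5 minutes
    }
    return $rows;
}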
I am currently maintaining a rather large office web application. I recently became aware that, via the various developer tools within web browsers, the values of select boxes can easily be modified by a user (among other things). On the server side I validate whether the posted data is numerical or not (for drop-downs), but I don't actually check whether the value exists in a database table. For example, I have a dropdown box for salutation ('mr', 'ms', 'mrs', 'mr/ms', etc.) whose options correspond with numerical values.
Currently I use MySQL's MyISAM tables, which don't offer foreign-key referential integrity, so I am thinking about moving to InnoDB, yet this poses the following issue:
If I want to apply referential integrity (to ensure valid IDs are inserted), it would mean I'd have to index all such columns, even ones that do not need to be indexed for performance reasons at all (e.g. a salutation dropdown). If a very large client table has, say, 10 similar dropdowns (e.g. client group, no. of employees, country/region, etc.), it would seem overkill to index every linked table.
My questions:
1) when using referential integrity, do columns really need to be indexed also?
2) are there other practical solutions I may be overlooking? (e.g. use a separate query for every dropdown-list to see if the value exists in a table?)
3) How do other web-applications deal with such issues?
Help Appreciated!
thanks
Patrick
You only have to index the fields used in the foreign key relationships, and recent versions of MySQL do this automatically for you anyway. It's not "overkill"; it's actually an optimization.
Consider that anytime you update/delete/insert a record, the foreign tables have to be checked for matching records - without the indexes, those checks could be glacially slow.
InnoDB automatically creates an index when you define a foreign key. If an index on that column already exists, InnoDB uses it instead of creating a new index.
As @MarcB mentioned in his answer, InnoDB uses these indexes to make referential integrity checks more efficient during some types of data changes. These changes include updating or deleting values in the parent table, and cascading operations.
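A small illustration of the automatic index (hypothetical tables):

CREATE TABLE salutations (
    id TINYINT UNSIGNED NOT NULL PRIMARY KEY,
    label VARCHAR(20) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE clients (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    salutation_id TINYINT UNSIGNED NOT NULL,
    FOREIGN KEY (salutation_id) REFERENCES salutations (id)
) ENGINE=InnoDB;

-- SHOW INDEX FROM clients; now lists an index on salutation_id
-- that InnoDB created for the foreign key automatically.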
You could use the ENUM data type to restrict a column to a fixed set of values. But ENUM has some disadvantages too.
Some web developers eschew foreign keys. To provide the same data integrity assurances, they have to write application code for every such case. So if you like to write and test lots of repetitive code, unnecessarily duplicating features the RDBMS already provides more efficiently, then go ahead! :-)
Most developers who don't use foreign keys don't write those extra checks either. They just don't have data integrity enforcement. I.e. they have sacrificed quality.
PS: I do recommend switching to InnoDB, and referential integrity is just one of the reasons to do so. Basically, if you want a database that supports ACID, InnoDB supports all aspects of that and MyISAM supports none.
Alright, I've got a question, not really an issue.
I've got a table in my database, fairly small, only 3 columns, but with the potential to grow.
I've got a piece of data, which might or might not already be in the database. Two ways to solve this. I've got the unique ID, so it is easy to check.
Check if the records exists in the database, and if not, INSERT INTO database
Use REPLACE INTO, because I've got the ID already.
My question now is: which one is better to use? What are the pros and cons of using either of the two options? Or is there a better alternative?
A note, the data is exactly the same, so there is no chance the record gets updated with a newer value. Thus the REPLACE INTO will insert data which is already there.
REPLACE INTO is not recommended here - you don't really need to replace anything. It does DELETE followed by INSERT, with all the consequences. For example all indexes have to be updated, which leads to unnecessary work and index fragmenting if you use it frequently.
On the other hand there is ON DUPLICATE KEY UPDATE, which is used mainly for counters, but you are not updating your row with increments or any other value changes, so you would have to use weird syntax like SET id=id or something similar, as sketched below.
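For completeness, that no-op syntax would look something like this (table and column names invented):

INSERT INTO my_table (id, col_a) VALUES (7, 'foo')
ON DUPLICATE KEY UPDATE id = id;  -- pointless update, only swallows the duplicate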
Checking whether the record exists in the database would be the best solution for you, but instead of using another query let MySQL do that check for you and use:
`INSERT IGNORE INTO ...`
This way, if you try to insert a row with a duplicated unique or primary key, it simply won't be inserted, without generating any error. Note the side effect that other error conditions may be silenced as well, but if you know exactly what you insert you should be fine.
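A minimal sketch (column names invented):

-- id is the primary key; the statement is silently skipped if id 7 exists.
INSERT IGNORE INTO my_table (id, col_a, col_b)
VALUES (7, 'foo', 'bar');
-- PDOStatement::rowCount() / mysqli affected rows returns 0 when the
-- row was skipped and 1 when it was actually inserted.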