Should I rely on MySQL errors in PHP code?

Should I rely on MySQL errors in PHP code? - php

I was wondering if logic duplication can be reduced on this one. Let's say I have a users table and email column, which should be unique per record. What I normally do, is having a unique index on the column and validation code that checks if the value is already used:
SELECT EXISTS (SELECT * FROM `users` WHERE `email` = 'foo#bar.com')
Is it possible and practical to skip the explicit check and just rely on the database error when trying to put non-unique value? If we repeat the logic of uniqueness in two layers (database and application code), it's not really DRY.

I do that pretty often. In my custom database class I throw a specific exception for violated restrictions (this can be easily deduced from the numeric error code returned by MySQL) and I catch such exception upon insert.
It has the benefit of simplicity and it also prevents race conditions—MySQL takes care of data integrity in both variants, data itself and concurrent accesses.
The only drawback is that it can be tricky to figure out which index is failing when you have more than one (for instance, you may want to have a unique email and a unique username). MySQL drivers only report the violated key name in the text of the error message, there's no specific API for it.
In any case, you may want to inform the user about available names in an earlier stage, so a prior check can also make sense.

It makes sense to enforce the uniqueness of the email address in the database. Only that way you can be sure it is really unique.
If you do it only in the PHP code then any error in that code may corrupt the database.
Doing it in both is not needed but does, in my opinion, not offend against the DRY rule. For instance, you might want to check the presence of an email address during registration of a new user, and not only rely on the database reporting an error.

I assume by "DRY" you mean Don't Repeat Yourself. Applying the same logic in more than one place is not intrinsically bad - there's the old adage "measure twice, cut once".
In a more general case, I usually follow the pattern of applying the insert and catching the constraint violation, but for users with email addresses it's a much more complicated story. If your email is the only attribute required to be unique, then we can skip over a lot of discussion about a person having more than one account and working out which attribute is not unique when a constraint violation is reported. That the email is the only unqiue attribute is implied in your question, but not stated.
Based on your comments you appear to be sending this SQL from your PHP code. Given that, there are 2 real issues with polling the record first:
1) Performance and Capacity: it's an extra round trip, parse and read operation on the database
2) Security: Giving your application user direct access (particularly to tables controlling access) is not good for security. Its is much safer to encapsulate this as a stored procedure/function running with definer privileges and returning messages more aligned to the application logic. Even if you still go down the route of implementing poll first / insert if absent, you are eliminating most of the overhead in issue 1.
You might also spend a moment considering the difference between your query and...
SELECT 1 FROM `users` WHERE `email` = 'foo#bar.com'

On top of the database constraint, you should check if the email given already exists in it before trying to insert. Handling it that way is cleaner and allows for better validation and response for the client, without throwing an error.
The same goes for classic constraints such as MIN / MAX (note that such constraints are ignored on MySQL). You should check, validate and return a validation error message to the client before committing any change to the database.

Related

Fastest vs best - unique index or check if exists

I'm sorry if this question is stupid or already asked, but I couldn't find much about it.
What is fastest / best method of unique storing in SQL
Option 1: Create unique index, and use a try -> catch block with PHP?
Option 2: Query to check if exists, and then act on that?
I would think option 2 is the best, but with option 1 I only have 1 query, vs 2 queries if not exists.
And since I need to minimize DB queries the best I can, I would go for option 1, but not sure if it a good option to bypass it with the try block?
Thanks

As with all optimisation related question, the anser is: well, it depends.
But let's get one thing straight: if you do not want duplicate value in a field or combination of fields, then use primary key or unique index constraints just to make sure that the integrity of data is not compromised under any circumstances.
So, your question is really: shall I check before inserting a record in a table if the data would violate uniqueness constraints.
If you do not really care whether the insert is successful or not, then do not check. Moreover, use insert ignore, so you do not even get an error message if the insert violates uniqueness constraints. Such situation could be, if you want to log if a user logs in within a certain period at least once or not.
Consider how costly it is to check before each and every insert and update, if the data violates any constraint and how often do you think it would occur. If it is a costly operation, then rely on the indexes to prevent the inserts with duplicate data and find out what data violates the constraints after you know that the query has failed.
Do not only consider the cost of the select, but also take into account if the insert is part of a larger transaction which may have to be rolled back in case an insert fails. Checking before the transaction even starts for constraint violations may have a positive impact on your db's performance.

In my opinion
Always use unique property for the field you want to be, actually - unique !
If by any means you can not do so, and also want to know before hand if the desired value already exists in the table / collection then additionally use if-Exists functionality.
Why not any one?
May be because improper sharding key in mongodb allows non-unique value in each shards!
Cannot offer knowledge in SQL but I think methodology goes same - use unique indexing where possible.
Cost effectiveness
Both methods cost server resources and hits the db server minimum twice.
So what's the big deal?
In the unknown universe -- you sent a request to know if a value exists and response was false to let you know the uniqueness! By the time you recognize it in the application server and requests the insertion may be someone else has already inserted the same value, by any chance in a busy server!
The server may never bother telling you the discrepancy as the indexing was unavailable!
For this point of view if - uniqueness is that important then you should enable indexing in table or collection level first!

Create unique auto incremented primary key index.
and then insert the data to SQL without entering the auto incremented primary key index value.
The inserting will never duplicate the data.

Is it redundant to check if a value is unique in both the application and database?

So let's say a user registers for an account, I would like to check if the email being used is already associated with another account...
On the database side, I put a unique constraint on the email column. Now on the application side, should I run a query to check whether that email is already in use and then if it isn't, run another query to insert the user? Or should I ignore that step and since I already have a unique constraint in the database column, I should just attempt to insert the user and if I get an error, I know the email is already in use?
Is running a query just to check for the email being redundant or is it a necessary step and why?
I am using PHP and MySQL.

Yes, it's redundant, but you might want to do it.
You have two choices, really:
Check in app, then add to database. There is a race condition there, so you need to Wrap the check-set in an application level mutex, or
Just push into the database and catch any exception raised from the database layer and handle (taking care to distinguish between constraint violation on column exceptions from all other kinds of possible run time exceptions).
Depending upon the relative ease of these two approaches, decide which works for you.

Should I check a unique constraint with php?

Maybe this question has already been asked, but I don't really know how to search for it:
I have the postgres-table "customers", and each customer has it's own unique name.
In order to achieve this, I added an unique-constraint to this column.
I access the table with php.
When the user now tries to create a new customer, with a name that has already been taken, the database says "Integrity Constraint Violation", and php throws an error.
What I want to do is to show an error in the html-input-field: "Customer-Name already taken" when this happens.
My question is how I should do this.
Should I catch the PDO-Exception, check if the error-Code is "UNIQUE VIOLATION", and than display a message according to the Exception-Message, or should I check for duplicate names with an additional statement before I even try to insert a new row?
What is better practice? Making a further sql-statement, or catching and analyzing error-codes.
EDIT:
I'm using transactions, and I'm catching any exception in order to rollback.
The question is, if I should filter out Unique-violations so they don't lead to a rollback.
EDIT2:
If I'm using the exception-method, I would have to analyse the exception-message in order to ensure that the unique-constraint really belongs to the "name"-column.
This is everything I get from the exception:
["23505",7,"FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint <customers_name_unique>\nDETAIL: Schlüssel <(name)=(test)> existiert bereits."]
The only way to get information about the column is to check if "customers_name_unique" exists (it's the name of the unique-constraint).
But as you can also see, the message is in german, so the output depends on the system / might be able to change.

You should catch the PDO exception.
It quicker to let the database fail, than to look up and see if the record already exists.
This also makes the application "less aware" of the business logic in the database. When you tell the database about the unique index that's really a business logic, and since the database is handling that particular logic it's better to skip the same check in the other layers (the application).
Also when the database layer is handling the exception you avoid race conditions. If your application is checking for consistency then you may risk that another user adds the same record after the first application has checked that it's available.

The question doesn't really belong here but I'll answer you.
Exceptions are situations when something exceptional happens. It means that you shouldn't use them to handle situation that may happen oftenly. If you do it then it's like GOTO code. The better solution is to check previosly if there is any duplicate row. However, the solution with exceptions is easier so you need to decide if you want something to work or if you want to have something that works written as it should be.

I would catch the exception, because (thanks to concurrency) that can happen anyway, even if you check with an extra query beforehand.

Errors are bad, I'd rather check if name does not exist before adding it. Well you should still check if no errors on insert, to avoid situation when concurrent scripts are trying to insert same name (there is a little time between checking for existance and insert, since its not a transaction).

On SAVE check if Exists (by a simple field, in your case: the Constraint column).
If affirmative - show notification to the user about duplication. But don't force the DB server to return you exceptions.

Should a PHP application perform error handling on incorrect database values?

Imagine this... I have a field in the database titled 'current_round'. This may only be in the range of 0,1,2,3.
Through the application logic, it is impossible to get a number above 3 or less than 0 into the database.
Should there be error checking in place to see if the value is malformed (i.e. not in the range 0-3)? Or is this just unnecessary overhead? Is it OK to assume values in a database are correctly formatted/ranged etc (assuming you sanatise/evaluate correctly all user input?)

I generally don't validate all data from the database. Instead I try to enforce constraints on the database. In your case depending on the meaning of 0, 1, 2, 3 I might use a lookup table with a foreign key constraint or if they are just numeric values I might use a check constraint (differs from DB vendor to the next).
This helps protect against changes made to the DB by someone with direct access and/or future applications that may use the same DB but not share your input validation process.

Wherever you decide to place validation prior to insertion in the database is where you should catch these things.
The process of validation should take place in one place and one place only. Depending on how your application is structured:
Is it procedural or object oriented?
If it is object oriented, then are you using an Active Record pattern, Gateway pattern or Data Mapper pattern to handle your database mapping?
Do you have domain objects that are separate from your database abstraction layer?
Then you will need to decide on where to place this logic in your application.
In my case, domain objects contain the validation logic and functions with data mappers that actually perform the insert and update functions to the database. So before I ever attempt to save information to the database, I confirm that there are valid values.

Get the database to do this for you. Most advanced DBMS (check out free DB2 Express-C at http://FreeDB2.com) allow you to define constraints. This way you are getting the database to ensure semantic integrity of your data. Getting this done in application code will work at the beginning but you will invariably find down the line that it will stop working for various reasons. You may have additional applications populate data in to the database or you may get a bug creeping in to existing app. The thing that happens most often is you get new people to work on the application and they will add code that will fail to perform the same level of checking that you have done.

In general, you should check for what you're expecting, either value or type. And act appropriately. Only after it fails all checks should maybe some code think about working out what to do with the 'wrong' value and how to fix things. This applies with a state value, like what you have, or with an input type that needs to be the correct type.

The constraints should be put on the database, just remember to catch any exceptions thrown if your application would by any chance try to insert/update an invalid value

How important are constraints like NOT NULL and FOREIGN KEY if I'll always control my database input with PHP?

I am trying to create a column in a table that's a foreign key, but in MySQL that's more difficult than it should be. It would require me to go back and make certain changes to an already-in-use table. So I wonder, how necessary is it for MySQL to be sure that a certain value is appropriate? Couldn't I just do that with a language like PHP, which I'm using to access this database anyway?
Similarly with NOT NULL. If I only access this database with PHP, couldn't I simply have PHP ensure that no null value is entered?
Why should I use MySQL to do enforce these constraints, when I could just do it with PHP?
I realize that NOT NULL is a very stupid part to neglect for the above reasons. But MySQL doesn't enforce foreign keys without a serious degree of monkeying around.
In your opinion, would it still be bad to use the "fake" foreign keys, and simply check if the values to be entered are matched in other tables, with PHP?

You are going to make mistakes with PHP, 100% guaranteed. PHP is procedural. What you want are declarative constraints. You want to tell the entire stack: "These are the constraints on the data, and these constraints cannot be violated." You don't want to much around with "Step 1 ... Step 2 ... Step 3 ... Step 432 ...", as your method of enforcing constraints on data, because
you're going to get it wrong
when you change it later, you will forget what you did now
nobody else will know all of these implicit constraints like you know them now, and that includes your future self
it takes a lot of code to enforce constraints properly and all the time - the database server has this code already, but are you prepared to write it?
The question should actually be worded, "Why should I use PHP to enforce these constraints, when I could just do it with MySQL?"

You can't "just" do it with PHP for the same reason that programmers "just" can't write bug-free code. It's harder than you think. Especially if you think it's not that hard.

If you can swear for the life of you that nothing will ever access the DB though any other means then your (of course bug-free) PHP page, then doing it with PHP alone will be fine.
Since real-world scenarios always contain some uncertainty, it is good to have the DB server watching the integrity of your data.
For simple databases, referential integrity constraints might not be an absolute requirement, but a nice-to-have. The more complex the application gets, the more benefit can you draw from them. Planning them in early makes your life easier later.
Additionally, referential integrity does it's part in forcing you to design the database in a more by-the-book manner, because not every dirty hack is possible anymore. This is also a good thing.

They are quite important. You don't want to define your model entirely through PHP. What if there is a bug in your PHP code? You could easily have null'ed columns where your business rules state you should not. By defining it at the database level, you at least get that check for free. You're going to really hate it when there are bugs in your PHP or if any other tool ever uses your database. You're just asking for problem, IMHO.
Be advised, this is the very short version of the story.

It's important to implement constraints in the database because it's impossible to predict the future! You just never know when your requirements will change.
Also consider the possibility that you may have multiple developers working on the same application. You may know what all the constraints are, but a junior developer may not. With constraints on the database, the junior developer's code will generate an error, and he'll know that something needs to be fixed. Without the constraints, the code may not fail, and the data could get corrupt.

I'm usually in favor of declaring constraints in the database. Arguments for constraints:
Declarative code is easier to make bug-free than Imperative code. Constraints are enforced even if app code contains bugs.
Supports the "Don't Repeat Yourself" principle, if you have multiple applications or code modules accessing the same database and you need business rules to be enforced uniformly. If you need to change the constraint, you can do it in one place, even if you have many apps.
Enforces data integrity even when people try to bypass the application, using ad hoc query tools to tinker with the database.
Enforces consistency which means that you can always be certain the data is in a valid state before and after any data update. If you don't use constraints, you may need to run periodic queries to check for broken references and clean them up.
You can model cascading update/delete easily with constraints. Doing the same thing in application code is complex and inefficient, cannot apply changes atomically (though using transaction isolation is recommended), and is susceptible to bugs.
Constraints help databases be more self-documenting, just as column names and SQL data types help.
Arguments against constraints:
More complex business rules cannot be modeled by declarative constraints, so you have to implement some in application space anyway. Given that, why not implement all business rules in one place (your app) and in the same language? This makes it easier to debug, test, and track code revisions.
Constraints often involve indexes, which incur some amount of overhead during inserts/updates. On the other hand, even if you don't declare a constraint, you probably need an index anyway, because the column may be used in search criteria or join conditions frequently.
Constraints can complicate your attempts to "clean up" mistakes in the data.
In your current project, the incompatibility of MyISAM vs. InnoDB with respect to referential constraints is causing some grief.

Enabling these constraints in MySQL takes almost zero time. If they save you from even a single bug due to faulty PHP or other code, isn't that worth it?
Keep in mind that the sorts of bugs you'll save yourself from can be rather nasty. Finding and fixing the bug itself may not be hard; the nasty part is that once you've fixed the bug you'll be left with a bunch of faulty data that may not even be salvageable.
I wouldn't even approach this problem from the "well, something other than PHP might access your data someday" angle. That's true, but even more important in my mind are the the headaches, time (money) and data loss that you can save yourself simply by adding a few simple constraints.

Use the database for structural data integrity, and use the BR layer for the rest. And catch errors as early as possible. They work together.
With luck, when your code as matured, you won't experience databse RI errors; and you can proudly announce yourself to be the first.

Even if your PHP code is perfectly bug-free, it may stop mid-script (out of memory error, segfault in some library, etc), leaving half-inserted stuff in the database, hence the importance of using InnoDB and transactions.
Same for constraints, of course you should have proper form validation, and database constraints behind it to catch bugs.
Database constraints are easy to specify, finding bugs in the application is hard, and even harder without constraints.
My experience has been that improperly constrained databases, and anything that uses MyISAM, WILL have inconssitent data after a few months of use, and it is very hard to find where it came from.

Having your data tier enforce data consistency through constraints helps to ensure your data remains consistent and provides cheap runtime bug checking within your application.
If you think constraints are not worthwhile you either have a small/non mission critical system or you are passing up a huge opportunity to improve the quality of your system. This cannot be understated.
Choices include: choosing a different RDBMS, reinvent your own metadata system or manually manage constraints. Manual management in queries without a metadata system quickly becomes infeasible to maintain and audit properly as schema/system complexity grows and unecessarily complicates an evolving schema.
My recommendation is to choose a different RDBMS.
Consistency checking is much harder than you may think. For example MySQL uses transactional read consistency which means the values you are checking against may not be the same values in the scope of another transaction. Consistency scemantics for concurrent access are very very difficult to get right if not bound directly to the data tier.
When all is said and done, even with a modest amount of effort put into manual checking, the likely outcome is that one would still be able to drive a truck through the corner cases you have failed to consider or committed an error in forming.
On your NOT NULL question... The obvious data field requirements are a good starting point. Here are a couple of other things to consider when defining column nullability.
It provides a guarantee that can very helpful when writing queries. Various joins may use NULL conditions to show a non-match of a table row separate from a NULL value that cannot be assumed if the condition allows nulls. (If NULLs are allowed, a match can mean either the row did not match or the row did match but the column value is null.)
The use of NOT NULL also helps define the rules for simpler queries matching values. Since you cannot say "WHEN value1 = value2" if both value1 and value2 are NULL the result of the evaluation is still false.

The most important thing about using NOT NULL to me, is more the documentation part. When I return to the project after a few months I forget which columns it is acceptable to have nulls in. If the column says NOT NULL, then I know I will never ever have to deal with potential null values from it. And if it allows null, then I know for sure I have to deal with them.
The other thing is, as others have noted: You may miss something somewhere, and cleaning up data sucks, or may be entirely impossible. It's better to know for sure that all data in your database is consistent.

I don't think you can be certain that your database will only be accessed by PHP and if so, by developers who will use it to respect those constraints for the entire lifecyle of your database.
If you include these constraints in your schema, then one can get a good idea of how the data is used and related by investigating your schema. If you only put all that in the code, then someone would have to look in both the database and the PHP code.
But shouldn't that stuff be in the design documentation, data dictionary, and logical database design?
Yes, but these documents are notorious for getting out of date and stale. I know you would never allow that to happen, but some people who have experience with projects with less discipline may assume this about your project, and want to consult the actual code and schema rather than documentation.

I highly appreciate your question, as I am deeply convinced that default-value rules should be implemented on the code-side, not on the database-side, and this for a very simple reason: when users are the one that initiate database changes (INSERTS, SELECTS and UPDATES), these changes shall integrate all business rules, and default values are basically business rules:
There is no invoice without invoice number
There is no invoice line without a quantity, and 0 or nulls are not acceptable
There is no incoming mail without date of reception
etc
We have decided a few years ago to get rid of all these "database-side" artefacts like "not null", "(do not) allow empty strings", and other "default value" tricks, and it works perfectly. Arguments in favor of the default value mainly refer to a kind of "security" principle ("do it on the database side because you will forget to to it on the code side / your language is not made for that/it's easier to do it on the database side") that does not make any sense once you have chosen not to implement any default value on the database side: just check that your business rules are properly implemented while debugging.
For the last 2 years, nobody in the team ever thought of declaring a default value in a table. I guess that our younger trainee does not even know about something that is called "default value".
EDIT: rereading some of the answers here, my final comment would be: do it on any side, either DB or code, but make your choice and do it on one side only! There is nothing more dangerous than having such controls on both sides, because eventually (1) you'll never know if both sides are really implementing the same rule, meaning that (2) checking the rules will mean checking both sides, which can really become a mess! The worst situation is of course when one part of the job is done on the database side (ie the rules that were identified when the database was created) and the other part (ie the newly identitified rules) done on the client side ... nightmare ....

Implement default values and constraints at the database level; rules that will result in acceptable data to any consuming application. This insulates you from integrity issues.
Then, implement better default values and constraints at the application level. If you are prohibited technically (access to APIs external to the database) from implementing a constraint, or generating a default value at the database level, the application is where you'll need to do it. This will result in a better default value. However a segfault, or general failure of the application will not result in unacceptable data being persisted.

I'm afraid this is a religious topic.
From a puristic point-of-view, you want the database to do the referential integrity. This is ideal when you have a multiplicity of applications accessing the database, because the constraints are in one place. Unfortunately, the real world is not ideal.
If you have to enforce some sort of referential integrity, in my experience, your application will need to know how to do this. This is regardless of whether it is the final arbiter, or the database checks it as well. And even if the database does do the referential integrity, then the application has to know what to do if the database rejects an update, because referential integrity would be violated...
As a sidenote, setting up MySQL to support foreign key constraints is a bit of a process because you need to shift to InnoDB. If you do just that, you can get a lot of performance back by setting innodb_flush_log_at_tx_commit to 1. But it probably would be better if you can instead re-engineer your site to be transaction-aware. Then you get two benefits of InnoDB.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.