Recover from a failed INSERT query without additional database queries - PHP

I have to perform some INSERTs when creating users in a web application.
If the query fails, I want to know whether it failed because of a duplicate entry on the PRIMARY key or on one of the other two indexes.
I'd like to avoid testing for each one with additional queries, because I need to create around fifty users at a time and it may take too long.
I searched the PHP manual for MySQLi's error handling, but I only found $errno (code 1062 in my case) and $sqlstate (code 25000).
This info does not tell me which of the indexes is the culprit.
Since the $error string reports the value that caused the failure (for example, it says "Duplicate entry 'someValue' for key 'indexName_UNIQUE'"), I was wondering whether I can get 'someValue' somehow and thereby identify the culprit.
Running strpos() on the message string doesn't look like good practice.
We are using MariaDB.
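
If the message text is the only source of that information, a targeted regex is at least more robust than strpos(). A minimal sketch, assuming the standard "Duplicate entry ... for key ..." message format quoted above ($insertSql stands for whatever INSERT is being run, and the pattern can misfire if the duplicate value itself contains quotes):

$mysqli->query($insertSql); // $insertSql is the INSERT in question
if ($mysqli->errno === 1062) {
    // Message format: Duplicate entry 'someValue' for key 'indexName_UNIQUE'
    if (preg_match("/Duplicate entry '(.+)' for key '([^']+)'/", $mysqli->error, $m)) {
        $duplicateValue = $m[1]; // 'someValue'
        $violatedIndex  = $m[2]; // e.g. 'PRIMARY' or 'indexName_UNIQUE'
    }
}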

Related

Should I rely on MySQL errors in PHP code?

I was wondering if logic duplication can be reduced on this one. Let's say I have a users table with an email column, which should be unique per record. What I normally do is have a unique index on the column plus validation code that checks whether the value is already used:
SELECT EXISTS (SELECT * FROM `users` WHERE `email` = 'foo@bar.com')
Is it possible and practical to skip the explicit check and just rely on the database error when trying to insert a non-unique value? If we repeat the uniqueness logic in two layers (database and application code), it's not really DRY.
I do that pretty often. In my custom database class I throw a specific exception for violated constraints (the condition can easily be deduced from the numeric error code returned by MySQL), and I catch that exception upon insert.
It has the benefit of simplicity, and it also prevents race conditions: MySQL takes care of data integrity in both respects, the data itself and concurrent access.
The only drawback is that it can be tricky to figure out which index is failing when you have more than one (for instance, you may want both a unique email and a unique username). MySQL drivers only report the violated key name in the text of the error message; there is no specific API for it.
In any case, you may want to inform the user about available names in an earlier stage, so a prior check can also make sense.
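
A minimal sketch of that pattern with mysqli in exception mode; DuplicateEntryException is a hypothetical application-level class:

mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); // make mysqli throw exceptions
try {
    $mysqli->query("INSERT INTO users (email) VALUES ('foo@bar.com')");
} catch (mysqli_sql_exception $e) {
    if ($e->getCode() === 1062) { // ER_DUP_ENTRY: a unique constraint was violated
        throw new DuplicateEntryException($e->getMessage(), 1062, $e);
    }
    throw $e; // anything else is unexpected
}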
It makes sense to enforce the uniqueness of the email address in the database. Only that way can you be sure it is really unique.
If you do it only in the PHP code, then any error in that code may corrupt the database.
Doing it in both places is not strictly needed but does not, in my opinion, offend against the DRY rule. For instance, you might want to check the presence of an email address during registration of a new user, rather than relying only on the database reporting an error.
I assume by "DRY" you mean Don't Repeat Yourself. Applying the same logic in more than one place is not intrinsically bad - there's the old adage "measure twice, cut once".
In the general case, I usually follow the pattern of attempting the insert and catching the constraint violation, but for users with email addresses it's a more complicated story. If the email is the only attribute required to be unique, then we can skip a lot of discussion about a person having more than one account and about working out which attribute is not unique when a constraint violation is reported. That the email is the only unique attribute is implied in your question, but not stated.
Based on your comments, you appear to be sending this SQL directly from your PHP code. Given that, there are two real issues with polling for the record first:
1) Performance and capacity: it's an extra round trip, parse and read operation on the database.
2) Security: giving your application user direct access to tables (particularly tables controlling access) is not good for security. It is much safer to encapsulate this as a stored procedure/function running with definer privileges and returning messages more aligned with the application logic. Even if you still go down the route of poll first / insert if absent, you eliminate most of the overhead in issue 1.
You might also spend a moment considering the difference between your query and...
SELECT 1 FROM `users` WHERE `email` = 'foo@bar.com'
On top of the database constraint, you should check whether the given email already exists before trying to insert. Handling it that way is cleaner and allows better validation and a clearer response for the client, without throwing an error.
The same goes for classic range constraints such as MIN/MAX checks (note that such CHECK constraints are parsed but ignored by MySQL). You should check, validate and return a validation error message to the client before committing any change to the database.
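
A minimal sketch of that check-first flow using PDO; the table and column follow the question, the rest is illustrative:

// Friendly validation first, so the client gets a message instead of an SQL error...
$stmt = $pdo->prepare('SELECT EXISTS (SELECT 1 FROM users WHERE email = ?)');
$stmt->execute([$email]);
if ($stmt->fetchColumn()) {
    return 'This email address is already registered.';
}
// ...but keep the unique index as the final authority against race conditions.
$pdo->prepare('INSERT INTO users (email) VALUES (?)')->execute([$email]);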

Simulate MySQL connection to analyze queries to rebuild table structure (reverse-engineering tables)

I have just been tasked with recovering/rebuilding an extremely large and complex website that had no backups and was fully lost. I have a complete (hopefully) copy of all the PHP files; however, I have absolutely no clue what the database structure looked like (other than that it was certainly at least 50 or so tables... so fairly complex). All data has been lost and the original developer was fired about a year ago in a fiery feud (so I am told). I have been a PHP developer for quite a while and am plenty comfortable trying to sort through everything and get the application/site back up and running... but the lack of a database will be a huge struggle. So... is there any way to simulate a MySQL connection to some software that will capture all incoming queries and attempt to use the requested field and table names to rebuild the structure?
It seems to me that if I start clicking through the application and it issues a query like
SELECT name, email, phone FROM contact_table WHERE contact_id='1'
...there should be a way to capture that info and assume there was a table called "contact_table" with at least four fields with those names... If I can do that repeatedly, each time adding some sample data to the discovered fields and then moving on to another page, then eventually I should have a rough copy of most of the database structure (at least all public-facing parts). This would be MUCH easier than manually reading all the code and pulling out every reference, reading all the joins and subqueries, and sorting through it all by hand.
Anyone ever tried this before? Any other ideas for reverse-engineering the database structure from PHP code?
mysql> SET GLOBAL general_log=1;
With this configuration enabled, the MySQL server writes every query to a log file (datadir/hostname.log by default), even those queries that have errors because the tables and columns don't exist yet.
http://dev.mysql.com/doc/refman/5.6/en/query-log.html says:
The general query log can be very useful when you suspect an error in a client and want to know exactly what the client sent to mysqld.
As you click around in the application, it should generate SQL queries, and you can keep a terminal window open running tail -f on the general query log. As you see queries that reference tables or columns that don't exist yet, create those tables and columns. Then repeat clicking around in the app.
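
If you want the log in a predictable place, the destination can be set before enabling the log (the path here is just an example):

mysql> SET GLOBAL general_log_file = '/tmp/capture.log';
mysql> SET GLOBAL general_log = 1;

Then, in a terminal: tail -f /tmp/capture.log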
A number of things may make this task even harder:
If the queries use SELECT *, you can't infer the names of columns or even how many columns there are. You'll have to inspect the application code to see what column names are used after the query result is returned.
If INSERT statements omit the list of column names, you can't know what columns there are or how many. On the other hand, if INSERT statements do specify a list of column names, you can't know if there are more columns that were intended to take on their default values.
Data types of columns won't be apparent from their names, nor string lengths, nor character sets, nor default values.
Constraints, indexes, primary keys, foreign keys won't be apparent from the queries.
Some tables may exist (for example, lookup tables), even though they are never mentioned by name by the queries you find in the app.
Speaking of lookup tables, many databases have sets of initial values stored in tables, such as all possible user types and so on. Without the knowledge of the data for such lookup tables, it'll be hard or impossible to get the app working.
There may have been triggers and stored procedures. Procedures may be referenced by CALL statements in the app, but you can't guess what the code inside triggers or stored procedures was intended to be.
This project is bound to be very laborious, time-consuming, and involve a lot of guesswork. The fact that the employer had a big feud with the developer might be a warning flag. Be careful to set the expectations so the employer understands it will take a lot of work to do this.
PS: I'm assuming you are using a recent version of MySQL, such as 5.1 or later. If you use MySQL 5.0 or earlier, you should just add log=1 to your /etc/my.cnf and restart mysqld.
Crazy task. Is the code such that the DB queries are at all abstracted? Could you replace the query functions with something which would log the tables, columns and keys, and/or actually create the tables or alter them as needed, before firing off the real query?
Alternatively, it might be easier to do some text processing, regex matching, grep/sort/uniq on the queries in all of the PHP files. The goal would be to get it down to a manageable list of all tables and columns in those tables.
I once had a similar task; fortunately, I was able to find an old backup.
If you could find a way to extract the queries - say, regex match all of the occurrences of mysql_query or whatever extension was used to query the database - you could then use something like php-sql-parser to parse the queries, and hopefully from that you would be able to get a list of most tables and columns. However, that is only half the battle. The other half is determining the data type of every single column, and that would be rather impossible to do automatically from PHP. It would basically require you to inspect the code line by line. There are best practices, but who's to say the old dev followed them? Whether a column called "date" should be stored as DATE, DATETIME, INT, or VARCHAR(50) with some ugly manual string format can only be determined by looking at the actual code.
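
A minimal sketch of the extraction step, assuming the legacy code calls mysql_query() with literal SQL strings; the site path and regex are illustrative:

$queries = array();
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator('/path/to/site'));
foreach ($files as $file) {
    if ($file->getExtension() !== 'php') continue;
    $src = file_get_contents($file->getPathname());
    // Grab single- or double-quoted string literals passed to mysql_query()
    if (preg_match_all('/mysql_query\s*\(\s*([\'"])(.+?)\1/s', $src, $m)) {
        foreach ($m[2] as $q) $queries[] = $q;
    }
}
// Feed $queries into something like php-sql-parser to list tables and columns.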
Good luck!
You could create triggers with the BEFORE action time, but unfortunately this will only work for INSERT, UPDATE, and DELETE statements.
http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html

How to debug AJAX (PHP) code that calls SQL statements?

I'm not sure if this is a duplicate of another question, but I have a small PHP file that runs some SQL INSERT and DELETE statements for an image tagging system. Most of the time both the insertions and the deletions work, but on some occasions the insertions don't.
Is there a way to see why the SQL statements failed to execute, something similar to when you use SQL functions in Python or Java, where a failure tells you why (for example: duplicate key insertion, unterminated quote, etc.)?
There are two things I can think of off the top of my head, and one thing that I stole from amitchhajer:
pg_last_error will tell you the last error in your session. This is awesome for obvious reasons, and you're going to want to log the error to a text file on disk in case the issue is something like the DB going down. If you try to store the error in the DB, you might have some HILARIOUS* hi-jinks in the process of figuring out why.
Log every query to this text file, even the successful ones. Find out whether the issue affects identical operations intermittently (again, an issue with your DB or connection) or certain queries every time (an issue with your app).
If you have access to the guts of your server (or your shared hosting is good,) enable and examine the database's query log. This won't help if there's a network issue between the app and server, though.
But if I had to guess, I would imagine that when the app fails it's getting weird input. Nine times out of ten the input isn't getting escaped properly or - since you're using PHP, which murders variables as a matter of routine during type conversions - it's being set to FALSE or NULL or something and the system is generating a broken query like INSERT INTO wizards (hats, cloaks, spell_count) VALUES ('Wizard Hat', 'Robes', );
*not actually hilarious
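
A minimal sketch of that log-everything approach with mysqli; the wrapper name and log path are assumptions:

function logged_query(mysqli $db, $sql) {
    $result = $db->query($sql);
    // Log every query, successful or not, so failure patterns become visible
    $line = date('c') . "\t" . $sql . "\t" . ($result === false ? $db->error : 'OK') . "\n";
    file_put_contents('/var/log/app-sql.log', $line, FILE_APPEND);
    return $result;
}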
Start monitoring your SQL queries by enabling the query log. There you can see which queries are fired and any errors they produce.
This tutorial on starting the logger will help.
Depending on which API your PHP file uses (let's hope it's PDO ;) you could check for errors in your current transaction with something like
$naughtyPdoStatement->execute();
if ($naughtyPdoStatement->errorCode() != '00000') {
    DebuggerOfChoice::log(implode(' ', $naughtyPdoStatement->errorInfo()));
}
When using the legacy APIs, there are equivalents like mysql_errno(), mysql_error(), pg_last_error(), etc., which should let you do the same. DebuggerOfChoice::log can of course be whatever log function you'd like to use.
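
Alternatively, PDO can be told to throw exceptions so you don't have to poll errorCode() after every statement; a minimal sketch (the INSERT is just an example):

$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
try {
    $pdo->exec("INSERT INTO tags (image_id, tag) VALUES (1, 'sunset')");
} catch (PDOException $e) {
    DebuggerOfChoice::log($e->getMessage()); // reports duplicate keys, syntax errors, etc.
}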

mysql suppress dupe key error

When I need to know whether something is unique before it gets inserted, I usually just attempt to insert it and then, if that fails, check whether mysql_errno() is 1062. If it is, I know the insert failed on a duplicate key and I can do whatever I need to do.
The most common place for this is in a user table. I set the email as unique, as that's the "username" for logging in. Instead of running additional queries to check uniqueness when processing registration forms, I just compile the query, execute it and check for the 1062 error number. If it fails with 1062, I tell the user nicely that the email is already registered and all is good.
However, I recently set up a very basic MITM SQL query function which gives me and the other developers on the system access to query times, a log of all the SQL queries at the bottom of the page and, most importantly, a function which establishes the MySQL connection to the correct database on demand (rather than having to connect and pass link identifiers around manually).
The SQL error log this function creates on disk is full of my duplicate entries. This obviously doesn't look good to other people seeing errors (even though they're handled and expected). Is there a way of suppressing these errors somehow while still being able to check mysql_errno()?
Whilst doing a bit of housework on my account here at SO, I thought it best to answer this with my findings so I can close it. This is basically a conclusion from my last comment above.
If you (like me) use certain MySQL error codes in your application to reduce validation queries or code (duplicate key being the most common, I find), the only way to stop an error from being raised is to catch the error inside MySQL and handle it there. I won't go into the how-to here, but a good place to get started is:
http://dev.mysql.com/doc/refman/5.0/en/declare-handler.html
Note: just for the new devs out there, also don't forget to check out "ON DUPLICATE KEY" (Google it). It was something blindly suggested to me elsewhere. It doesn't fit this example, but I've used it for years to save checking for duplicate records before insertion (it does not return a failure on duplicate entries, so it's only good if you were thinking of using a duplicate error handler to perform an update instead... hence finding your way here).
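
For reference, a minimal sketch of both statement-level options; the table and columns are illustrative:

-- Turn the duplicate into an update instead of an error
INSERT INTO users (email, visits) VALUES ('foo@bar.com', 1)
  ON DUPLICATE KEY UPDATE visits = visits + 1;

-- Or skip the duplicate silently: the error becomes a warning and 0 rows are affected
INSERT IGNORE INTO users (email) VALUES ('foo@bar.com');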

How to find similar messages in a very large database

I have a database with 2,000,000 messages. When a user receives a message, I need to find relevant messages in my database based on the occurrence of words.
I tried running a batch process to summarize my database:
1 - Store all words (except stop words such as an, a, the, of, for...) of all messages.
2 - Create associations between each message and the words contained therein (I also store how frequently each word appears in the message).
Then, when I receive a message:
1 - I parse its words (much like the first step of my batch process).
2 - Perform a query in the database to fetch messages sorted by the number of coincident words.
However, the process of updating my word base and the query for fetching similar messages are both very heavy and slow. The word-base update takes ~1.2111 seconds for a message of 3000 bytes. The query for similar messages takes ~9.8 seconds for a message of the same size.
The database tuning has already been done and the code works fine.
I need a better algorithm to do it.
Any ideas?
I would recommend setting up Apache Solr (http://lucene.apache.org/solr/). It is very easy to set up and can index millions of documents. Solr handles all the necessary optimization (although it is open source, so you can tweak it if you feel you need to).
You can then query it using the available APIs; I prefer the Java API, SolrJ (http://wiki.apache.org/solr/Solrj). I typically see results returned in under one second.
Solr typically outperforms MySQL for text indexing.
Similarity matching is still a particularly complicated field, but you might take a look at full-text matching in the MySQL Reference Manual, particularly some of the more complex examples.
It should be possible for you to run a one-off job to build a similarity matrix for all your current messages, then just run a nightly batch to add new messages to the similarity matrix.
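
A minimal sketch of the built-in full-text route; the table and column names are illustrative, and a MyISAM table (or InnoDB on MySQL 5.6+) is assumed for FULLTEXT support:

ALTER TABLE messages ADD FULLTEXT INDEX ft_body (body);

SELECT id, MATCH(body) AGAINST('words from the incoming message') AS score
FROM messages
WHERE MATCH(body) AGAINST('words from the incoming message')
ORDER BY score DESC
LIMIT 10;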