I recently designed a website that uses a lot of queries. While developing it I ran into an issue that was very time consuming and frustrating.
The problem was that at a certain point I wanted to add a feature that affected most of my queries, and I needed to change most of them to make the feature work. Let me give an example: let's say I have a users table, and I didn't originally add a column to check whether a user is banned. Now I've added the column "banned", and the problem is that I need to revise all the other queries to check whether the user is banned first. I hope that makes sense.
So my question is: is there a way I could minimize that work, so that instead of going through all the queries and revising them (to add the banned check), I could add that feature once and the queries would keep working? Basically, how can I improve?
I hope this makes sense; if not, I'll do my best to explain it further.
Any help that improves my coding knowledge would be greatly appreciated.
PS. I am using SQL and PHP. If there is something better than SQL that would fix this problem, suggest away.
Thank you
I understand your problem. You have a column-based rule that is used globally across the whole app. For example, in my case there was a 'status' column, and there was a logical notion called 'important' that held if the column had one of a certain set of values.
So, everywhere I needed to check whether a status was 'important', I had to write:
WHERE `status` IN('INCIDENT', 'ERROR')
If I needed to add, for example, 'FLAGGED' to the list of important statuses, I had to rewrite all the SQL queries:
WHERE `status` IN('INCIDENT', 'ERROR', 'FLAGGED')
Eventually I got tired of this and decided to write a MySQL function to do the work; I called it IS_STATUS_IMPORTANT(status).
But that solution failed testing because it hurt performance: wrapping the column in a function prevented MySQL from using its indexes properly.
I finally solved this problem by creating a set of app-global conditions, say:
class DbHelper {
    public static function importanceCondition($column_name) {
        return $column_name . " IN ('INCIDENT', 'ERROR') ";
    }
}
And now all over the app I write:
$sql = 'SELECT * FROM blah .... WHERE ... AND ' . DbHelper::importanceCondition('x.status');
If I need to change the logical condition, I do it in one place and it applies all over the application.
In your case you could add a function like:
class DbHelper {
    ...
    public static function validUserCondition($user_alias) {
        return " ({$user_alias}.deleted = 0 AND {$user_alias}.banned = 0) ";
    }
}
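For example, any query that should only see valid users could then be written like this (the orders table and aliases here are just placeholders, not from the question):
$sql = 'SELECT o.* FROM orders o JOIN users u ON u.id = o.user_id WHERE ' . DbHelper::validUserCondition('u');
If the banned rule ever changes, you edit validUserCondition() once and every query picks it up.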
Why don't you check this during the user's login? If they are banned, the login fails and all further queries are impossible to execute in the first place.
Generally, you should never spread the same logic across multiple code locations, because of the duplicated effort it causes whenever you want to adjust something.
Create reusable methods wherever you have reuse. This could even be a method that enhances a given SQL (prepared) statement with another WHERE condition, or a method that performs the SQL request itself.
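For illustration, a minimal sketch of such a reuse method; the class, method, and table names here are made up, not from the question:
class SqlHelper {
    // Appends the "valid user" rule to an existing WHERE clause.
    public static function withValidUser($sql, $alias) {
        return $sql . " AND {$alias}.banned = 0";
    }
}

$sql = SqlHelper::withValidUser(
    "SELECT p.* FROM posts p JOIN users u ON u.id = p.user_id WHERE p.published = 1",
    'u'
);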
I understand your problem; there is an easy solution for it: use an ORM (Object Relational Mapping) layer for your queries. ORMs support multiple databases (MySQL, Oracle DB, even NoSQL stores, ...).
PHP ORMs include Doctrine and Propel.
ORMs are well supported by most PHP frameworks, and they give you:
a better way to work with queries
less complex code, with functionality split up into classes
easy management of relations between tables
I hope this helps you modify queries in less time and with good performance.
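For illustration, a minimal sketch of how an ORM can centralize the banned check; this assumes Doctrine with a User entity that has a boolean banned field (all names here are invented):
use Doctrine\ORM\EntityRepository;

class UserRepository extends EntityRepository
{
    // One place that knows what a "visible" user is.
    public function findActive()
    {
        return $this->createQueryBuilder('u')
            ->andWhere('u.banned = :banned')
            ->setParameter('banned', false)
            ->getQuery()
            ->getResult();
    }
}
Every caller uses findActive(), so adding a new rule (say, a deleted flag) is a one-line change.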
I assume you have code included in every page to make sure that the user has logged in successfully.
If that is the case, then all you should need to do is change the login script to reject banned users. Every other page will then work as it is: it will reject any user who is not logged in, and none of the queries on those other pages would need to be changed.
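As a hedged sketch of that idea (the users table and its columns are assumptions based on the question):
// Reject banned users at login; no later page ever queries on their behalf.
$stmt = $pdo->prepare("SELECT id, password_hash, banned FROM users WHERE email = ?");
$stmt->execute([$email]);
$user = $stmt->fetch(PDO::FETCH_ASSOC);

if (!$user || $user['banned'] || !password_verify($password, $user['password_hash'])) {
    exit('Login failed.');
}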
I have a dilemma, which I hope you will have some expert opinions on.
I have a table called CARDS with a column STATUS. If a record's status changes from 'download' to 'publish', I have to insert the record reference into another table called CARD_ASSIGNMENTS. Additionally, the record needs to be added into CARD_ASSIGNMENTS as many times as there are active records in SCANNERS.
In other words, if there are two active scanners, I will end up with two records in CARD_ASSIGNMENTS as below:
ID  CARD_ID  SCANNER_ID  STATUS_ID
1   1        1           4
2   1        2           4
My dilemma is that I'm not quite sure what would be the most efficient way to execute the above. I've considered the following options:
From PHP - Do one UPDATE query and then the INSERT queries.
Create a stored procedure which will take care of updating the CARDS record and adding records into CARD_ASSIGNMENTS, then just call that stored procedure from PHP (sketched after this list).
Create an ON UPDATE trigger for the CARDS table which will take care of processing INSERTS into the CARD_ASSIGNMENTS table.
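For reference, a minimal sketch of option 2; the table and column names come from the question above, while the SCANNERS "ACTIVE" flag and the status id 4 are assumptions:
DELIMITER //
CREATE PROCEDURE publish_card(IN p_card_id INT)
BEGIN
  START TRANSACTION;
  UPDATE CARDS SET STATUS = 'publish' WHERE ID = p_card_id;
  -- one assignment per active scanner
  INSERT INTO CARD_ASSIGNMENTS (CARD_ID, SCANNER_ID, STATUS_ID)
    SELECT p_card_id, ID, 4 FROM SCANNERS WHERE ACTIVE = 1;
  COMMIT;
END//
DELIMITER ;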
PS. A simplified version of my database is available on MySQL Fiddle
Thanks,
Kate
Interesting question.
I'm going to give you clues about how to approach the problem.
So, you have to start by defining precisely three things:
the expected functionality
the access policy to the functionality
the technical upgrade policy
Here I'll detail these points.
So, the first point is that you have to define your functionality. By doing so, you will be able to tell whether adding a card always implies, in every possible state of your information system, that this card MUST exist in the other table according to the specifications you provided. This 1-1 functional link must be declared TRUE or FALSE. This is really important.
In other words, if there is even one scenario in which you would not want to copy that record to the other table, a trigger is the wrong solution, or at least it should be designed with an escape hatch (for example, a variable that prevents it from executing under certain conditions).
Then comes the second point, the access policy. You have to know whether the systems allowed access will go through your application layer, or whether they could develop their own (SaaS style). If the latter, your PHP layer can be bypassed, and the stored procedure is an excellent option, since every single technical and business layer will necessarily go through it.
The last thing to know is whether you might upgrade your PHP layer one day. In most cases the answer is yes. If so, you will have to modify the part containing this SQL logic you're talking about. Keeping everything in a stored procedure, rather than hardcoding it in PHP, will definitely save you time and improve stability.
Left brain, right brain: I'm going to give you my personal opinion after all. I really like stored procedures, but I avoid triggers. If the environment allows it, I would go for an underlying batch job calling a set of defined stored procedures, concentrating the activity outside of the online scope.
The advantages are the following:
no or fewer risks of interrupting the online workflow, since you reduce the number of online operations
a different schedule to alleviate the database load
a more secure policy, since executing the stored procedure requires only one grant, while running the same SQL from PHP would require insert/update grants
better logging quality: you can have one log per job
better emergency response: when a job fails (if well designed) you can simply restart it, and that's it.
Long post, but that was interesting and I really wanted to share these ideas.
Cheers!
I would use triggers. Some developers say that if you have too many triggers and stored procedures, the database lives a life of its own: you never know what is going to happen on insert, update, etc. But in my opinion, triggers can help you a lot in keeping the database consistent: even if someone inserts data directly from some administration tool, integrity is still preserved, because all the necessary commands are executed. If you chose stored procedures instead, you would still have to know that you need to call that procedure to insert any new data.
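For comparison, a hedged sketch of the trigger approach for the CARDS question above; again, the SCANNERS "ACTIVE" flag and status id 4 are assumptions:
DELIMITER //
CREATE TRIGGER cards_after_update
AFTER UPDATE ON CARDS
FOR EACH ROW
BEGIN
  IF OLD.STATUS = 'download' AND NEW.STATUS = 'publish' THEN
    -- fires no matter which tool performed the UPDATE
    INSERT INTO CARD_ASSIGNMENTS (CARD_ID, SCANNER_ID, STATUS_ID)
      SELECT NEW.ID, ID, 4 FROM SCANNERS WHERE ACTIVE = 1;
  END IF;
END//
DELIMITER ;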
I'm implementing a subscription in a DB. The email must be unique, so I have a UNIQUE index in the database. I have this code in my page init:
$f = $p->add('MVCForm');
$f->setModel('Account', array('name', 'surname', 'email'));
$f->elements['Save']->setLabel('Subscribe');

if ($f->isSubmitted()) {
    try {
        $f->update();
        // More useful code to execute when all is ok :)
    } catch (Exception_ValidityCheck $v) {
        // Handles validity constraint from the model
        $f->getElement($v->getField())->displayFieldError($v->getMessage());
    } catch (SQLException $se) {
        // If I'm here there is a problem with the db/query or a duplicate email
    }
}
The only information in SQLException is a formatted HTML message; is this the only way to detect whether the error comes from a duplicate entry?
Here is one way to do it:
https://github.com/atk4/atk4-web/blob/master/lib/Model/ATK/User.php#L95
Although if you want to perform a custom action on duplication, you should move getBy outside of the model, into the page's logic.
As #Col suggested, we want to use "insert ignore".
$form->update() relies on Model->update(), which in turn relies on the DSQL class for building the query. DSQL does support options, but the model would generate fresh SQL for you.
Model->dsql() builds a query for the model. It can work with several "instances", where each instance has a separate query. I don't particularly like this approach and might add a new model class, but it works for now.
Take a look here:
https://github.com/atk4/atk4-addons/blob/master/mvc/Model/MVCTable.php#L933
The insertRecord() function calls dsql('modify', false) several times to build the query. The simplest thing you could do, probably, is:
function insertRecord($data = array()) {
    $this->dsql('modify', false)->option('IGNORE');
    return parent::insertRecord($data);
}
After the record is inserted, Agile Toolkit will automatically attempt to load the newly added record, using the relevant conditions. I think that if the record is ignored, you'll get an exception raised anyway. If possible, avoid exceptions in your normal workflow: exceptions are CPU intensive since they capture a backtrace.
The only way might be for you to redefine insertRecord completely. It's not ideal, but it would allow you to do a single query like you want.
I prefer to manually check the condition with loadBy (or getBy) because it takes model conditions and joins into account. For example, you might have soft delete on your table: while the MySQL unique key would reject the row, the model would allow it, and the model-way makes more sense for business logic too.
Why don't you want to run a simple SELECT to check whether the email is already taken?
Or make it an INSERT IGNORE and then check affected_rows.
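A hedged sketch of that approach with plain PDO (the table and columns are taken from the form in the question; everything else is assumed):
$stmt = $pdo->prepare("INSERT IGNORE INTO account (name, surname, email) VALUES (?, ?, ?)");
$stmt->execute([$name, $surname, $email]);

if ($stmt->rowCount() === 0) {
    // nothing was inserted, so the email is already taken
}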
I am designing a web application using PHP and MySQL. I have a small doubt about the database design.
The application works like this:
Users get themselves registered.
Users input workload (after login, of course :) ).
Users log out.
Now there are multiple types of inputs which I accept on the same form. Say there are 3 types of inputs, and they are stored in 7 different tables (client requirement :( ).
Now my question is: what is the best way to fire the queries after the inputs are done?
For now I can think of the following ways:
Fire 7 different queries from PHP
Write a trigger to propagate the inputs into the appropriate tables?
Just guide me on which approach is more performance efficient.
Thanks :)
Generally you want to stay away from triggers because you will be penalized later if you have to load a lot of data. Stored procedures are the way to go. You can have different conditions set to propagate inputs into different tables if needed.
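A minimal sketch of what the PHP side could look like; the procedure name and parameters are invented for illustration:
// One call instead of seven scattered queries; the procedure
// decides internally which of the 7 tables each input goes to.
$stmt = $pdo->prepare("CALL save_workload(?, ?, ?, ?)");
$stmt->execute([$userId, $inputA, $inputB, $inputC]);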
I think you need to re-think your situation. You already know how much easier it would be to have fewer tables to deal with, so why not simulate that situation with a properly constructed view? Then the client (are you sure it is the client? Sometimes ops says "client" when they mean "a report we need to provide later") can have as many tables as your database can handle. And, by the way, you can still fire inserts and updates on a view.
Because it seems like your database does not have a clear relationship with your PHP data structures, my instinct would be to separate the two more, not less. This would mean actually favoring stored procedures and triggers (assuming the above is not workable), which can be harder to debug, but it also means that PHP only has to think about
"I am inserting into this thing called <thing name>"
Instead of
"OMG, so this is like, totally intense first I have to talk to <table 1>, but I can't forget <table 2>, especially since those two might have... wait, did I miss my turn?"
OK, PHP isn't a ditz (I actually like the language), but it should also be acting as dumb as possible when it comes to actually storing things -- that's not its business.
You probably want to write a stored procedure that runs the seven queries. Think hard about how many transactions you need to run those seven queries.
How often do you think you will have to change which queries to run?
Do you have access to the database server?
Do you know which circumstance should trigger your triggers?
Are there other processes/applications writing data to the database?
If your queries change very often, I would go for code in PHP to just run the queries for you.
If you don't have access to the database server you may actually have to go for that method! You need permissions to write stored procedures and triggers.
If other processes are writing to the same database, you have to discuss your requirements with the respective process owners! Otherwise unwanted data may appear or change in your database.
I personally tend to stay away from triggers unless they call very simple stored procedures and I'm 100% certain that nobody else is going to be bothered by the trigger!
Here's the problem: when a script starts modifying the database and something goes wrong, the database usually ends up corrupted. For example, let's say we have a User table and a Photos table.
A script creates a user record, and in the next lines it attempts to create a photo record. The photo has a user_id column. Now let's assume something goes wrong and PDO's lastInsertId() doesn't return the id of the user. What happens in the worst case: we get a user with no photo, and a photo with no valid user_id. A broken reference. Three weeks to debug.
Are there any good strategies to follow to prevent exactly this kind of problem? In my code below, you can see that I at least try to log the error to a file and stop script execution to prevent further damage and DB corruption.
public function lastInsertId() {
    $id = $this->dbh->lastInsertId();
    if (!is_numeric($id)) {
        $this->logError("DB::lastInsertId() did not return an id as expected!");
        die();
    }
    return $id;
}
Maybe I have to use transactions all over the place, any time a query B depends on a query A, and so forth? Is that the way to go?
Should I do a "precautionary rollback" before the die() call? I guess it would not hurt much at this point, would it? I'm not sure...
The solution is to use transactions whenever you have several queries that should be "all or none", yes -- that's the A in ACID: Atomicity.
You can do a rollback before your die if you want; it won't change much (a transaction that is not committed will automatically be rolled back by the DB engine), but it will make your code clearer and easier to understand.
As a side note: using die this way is probably not the "right" way to deal with errors: it prevents you from displaying any kind of "nice" error page, for instance.
A more common solution is to throw some kind of exception when such a problem occurs, and to deal with those exceptions in a higher layer of your application (in one single place), where you display an error page.
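A hedged sketch of that pattern with PDO, using the User/Photos example from the question (the column names are assumed):
try {
    $pdo->beginTransaction();
    $pdo->prepare("INSERT INTO users (name) VALUES (?)")->execute([$name]);
    $userId = $pdo->lastInsertId();
    $pdo->prepare("INSERT INTO photos (user_id, path) VALUES (?, ?)")->execute([$userId, $path]);
    $pdo->commit();
} catch (Exception $e) {
    // neither row is kept, so no broken reference
    $pdo->rollBack();
    // rethrow so one higher layer can render a nice error page
    throw $e;
}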
Outside of using a transactional engine (InnoDB if you're using MySQL, or just use PostgreSQL, etc.) and wrapping the relevant atomic activities in transactions, there's not a great deal you can do.
As #Seb says, you can create a transactional log and you could even use a master/slave database setup, but this won't really add much in terms of coverage.
You should keep a log of all transactions, so if the automated process goes wrong (even your rollbacks, fallback procedures, etc), you still can revert all effects back manually.
Is it generally better to run functions on the webserver, or in the database?
Example:
INSERT INTO example (hash) VALUES (MD5('hello'))
or
INSERT INTO example (hash) VALUES ('5d41402abc4b2a76b9719d911017c592')
OK, so that's a really trivial example, but for scalability, when a site grows to multiple web or database servers, where is it best to "do the work"?
I try to think of the database as the place to persist stuff only, and put all abstraction code elsewhere. Database expressions are complex enough already without adding functions to them.
Also, the query optimizer will trip over any expression that wraps a column in a function if you should ever end up wanting to do something like "SELECT .... WHERE MD5(xxx) = ..." -- an index on xxx cannot be used there.
And database functions aren't very portable in general.
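To make that concrete, a small sketch (assuming an index on the hash column): hash in PHP first, so the indexed column is compared directly and the optimizer can use the index:
$hash = md5($input);
$stmt = $pdo->prepare("SELECT id FROM example WHERE hash = ?");
$stmt->execute([$hash]);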
I try to use functions in my scripting language whenever calculations like that are required, and I keep my SQL function usage to a minimum, for a number of reasons.
The primary reason is that my one SQL database hosts multiple websites. If the SQL server got bogged down with requests from one site, it would adversely affect the rest. This is even more important to consider if you are working on a shared server, although in that case you have little control over what the other users are doing.
The secondary reason is that I like my SQL code to be as portable as possible. I don't even want to try to count the different flavors of SQL that exist, so I try to keep functions (especially non-standard extensions) out of my SQL code, except for things like SUM or MIN/MAX.
I guess what I'm saying is, SQL is designed to store and retrieve data, and it should be kept to that purpose. Use your serving language of choice to perform any calculations beforehand, and keep your SQL code portable.
Personally, I try to keep the database as simple as possible: insert, update, delete, without much logic that could live in code. The same goes for stored procedures: they should contain only tasks that are very close to the persisted data, not business logic.
I would compute the MD5 outside the database. This keeps that bit of "data manipulation" outside the storage scope of the database.
But your example is quite simple, and I don't think it's bad to have it inside either...
Use your database as a means of persisting data and maintaining data integrity, and leave business logic outside of it.
If you put business logic, any of it, in your database, you make it more complex to manage and maintain in the future.
I think most of the time you're going to want to leave the data manipulation to the webserver, but if you want to process things that concern the database itself (tables, relations, etc.), then go for the DB.
I'm personally lobbying my company to upgrade our MySQL server to 5.0 so that I can start taking advantage of stored procedures (the lack of which is killing a couple of sites we administer).
Like the other answers so far, I prefer to keep all the business logic in one place. Namely, my application language. (More specifically, in the object model, if one is present, but not all code is OO.)
However, if you look around StackOverflow at the (my)sql-tagged questions about whether to use inline SQL or stored procedures, you'll find that most of the people responding are strongly in favor of using stored procs whenever and wherever possible, even for the most trivial queries. You may want to check out some of those questions to see the arguments favoring the other approach.