How to manage SQL Statements in Data Layer

In a PHP project we have already separated business logic from database access. All database tasks are encapsulated in different database classes grouped by database and topic.
These classes look horrible: half the source is SQL strings that get filled with params and so on. We thought of putting the SQL in "other" locations like resource files or something. What is considered best practice for this, and do you know any supporting tools/libs for PHP?
Kind Regards
Stephan

You should use stored procedures wherever it is possible. That way you enhance performance, security and code maintenance. This should be your first approach.
If you still want to separate the SP queries from the DAL, why not store them in a database? It may seem odd to store SQL queries in the database for abstraction, since a query is needed to extract other queries. It is actually quite a common approach: you can select the queries matching certain criteria and, if necessary, build up the queries dynamically.
Another approach may be to create Query-classes where queries are built up dynamically:
class FruitQuery {
    private $internalSQLCriterias = array();   // WHERE fragments with named placeholders
    private $internalSQLParameters = array();  // array(placeholder, value) pairs to bind
    private $internalSQLQuery = '';

    // ...

    public function addTypeCriteria($type) {
        $this->internalSQLCriterias[] = "fruit=:type";
        $this->internalSQLParameters[] = array(':type', $type);
    }

    // ...

    public function create() {
        $this->internalSQLQuery = "SELECT ... FROM Fruits";
        if (count($this->internalSQLCriterias) > 0) {
            // Join all criteria with AND
            $this->internalSQLQuery .= " WHERE " . implode(" AND ", $this->internalSQLCriterias);
        }
    }

    // ...

    public function execute() {
        /* Bind the parameters to the internalSQLQuery, execute and return results (if any) */
    }

    // ...
}
This class is absolutely not complete in any way, and you might want to rethink the structure of it - but you probably get the point I'm trying to make. :) Of course you have to filter the input to the Query-builder to avoid security breaches!

I don't know PHP, but from my experience with other languages I can tell you this much: data access layers are a prime target of architecture astronauts. There are so many "best practices" that none of them are really best. When designing a DAL, it's very easy to fall into the trap of over-abstracting. Just go as far as you need.
I almost always use stored procedures in order to avoid spaghetti code and simplify authorization in the database, not for performance reasons; performance gains from stored procs can be hard to pin down because of the complexities of when and how database engines prepare them. On the other hand, if I need to code a very flexible database operation (like on a search screen with many inputs), I will sometimes just put the SQL right in the code. Sometimes it's going to be an unreadable mess no matter where you put it. You have to do the work somewhere.
If you're not (unnecessarily) mixing SQL and procedural code, put the SQL wherever it makes the most sense for the scope and scale of your application. Sorry I can't answer your question about tools and libs for PHP, but I hope this is helpful.

Well, you could always use PDO for a consistent API across different databases and write portable SQL statements.
Another option would be to use a database abstraction layer such as Zend_DB.
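A minimal sketch of the PDO suggestion, not a definitive setup. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database so that it runs standalone; the `fruits` table and its columns are hypothetical.

```php
<?php
// Assumption: in-memory SQLite stands in for the real database.
// Only the DSN would change for MySQL, PostgreSQL, etc.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema and data, for illustration only.
$pdo->exec('CREATE TABLE fruits (id INTEGER PRIMARY KEY, name TEXT, type TEXT)');
$pdo->exec("INSERT INTO fruits (name, type) VALUES ('Granny Smith', 'apple')");
$pdo->exec("INSERT INTO fruits (name, type) VALUES ('Cavendish', 'banana')");

// The SQL text stays fixed; the value travels separately as a bound parameter.
$statement = $pdo->prepare('SELECT name FROM fruits WHERE type = :type');
$statement->execute([':type' => 'apple']);
$names = $statement->fetchAll(PDO::FETCH_COLUMN);

print_r($names);
```

The same `prepare()` / `execute()` calls work against any driver PDO supports, which is the "consistent API" part; whether the SQL itself is portable still depends on how it is written.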

You should have almost no SQL in your database access layer, as it should merely abstract the communication with the database, regardless of what actual SQL it is communicating.
In the now famous MVC pattern, your business logic is what typically contains the SQL, and it forms the Model layer.
Putting all these "religious" definitions aside, what you have now, ending up with piles of SQL, is fairly normal. SQL has to exist somewhere. Depending on your priorities and performance requirements, here is what I would do (ordered by performance compromise):
1. If there is noticeable repetition in the SQL, I'd throw in a quick refactoring iteration to hide all the common SQL inside methods. The methods do not necessarily have to execute it, just build it. It all depends on your application and in which way the SQL is complex. If you don't have an underlying layer which does the actual communication with the database, maybe part of the refactoring could be to add it.
2. I'd consider a query builder, which is a very good balance between performance and flexibility. You will only find a query builder as part of an ORM or database access layer such as Zend_Db (and its sub-components), Propel or Doctrine. So you can either use one of those layers, or port its query builder to your project without using the whole layer (which really shouldn't be hard, as all of them are PDO-based). This shouldn't add any noticeable performance issues.
3. I'd consider the Doctrine ORM. This has a considerable effect on performance, though. You will end up with very maintainable code, however.
Finally, I'd never consider putting the SQLs into Resources or something like that.

Related

Chaining MySQL commands Vs. Raw queries

I have built a lot of websites in the past using my own CMS/framework, and I have developed a simple way of executing queries. Recently I have started playing with other frameworks such as CodeIgniter. They offer raw query inputs such as...
$this->db->query("SELECT * FROM news WHERE newsId=1;");
But they also offer chaining of MySQL commands via PHP methods.
$this->db->select("*")->from("news")->where("newsId=?");
The question is: what is the main difference between the two options, and what are the benefits of each?
I know the latter option prevents SQL injection, but to be honest you can do exactly the same using $this->db->escape().
So in the end, from what I can see, the latter option only serves to make you type more, which you would think would slow you down.

I think the implementation of activerecord in codeigniter is suitable for small and easy queries.
When you need complex queries with lots of joins, it is more clear to just write the query itself.
I don't think that an extra layer of abstraction will ever give you better performance, if you have a certain skill in SQL.

Most recent PHP framework developers use the AR (active record) / DAO (database access object) pattern, because it is really faster to work with than raw queries. Nowadays AR implementations are usually built on top of PDO (PHP Data Objects).
Why is active record really faster?
It is true that writing queries yourself is the best habit for a developer. But some problems make it tough:
1. When we write a large insert or update query, it is sometimes hard to match every value to its column. AR makes it easy: you just build an array first and then execute.
2. It doesn't matter what DB you use.
3. Sometimes a query is really hard to read or write if it has many conditions. In AR you can chain many method calls for one query.
4. AR saves you the time of repeating statements.
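Point 1 above can be illustrated with a small helper that builds a parameterised INSERT from an associative array. This is only a sketch: `insertRow()` is a hypothetical function, not part of any framework, and the in-memory SQLite database and `news` columns are assumptions made so the example runs standalone.

```php
<?php
// Hypothetical helper: builds a parameterised INSERT from an associative array.
// Table and column names are NOT escaped here, so they must come from trusted
// code, never from user input. Values are always bound, never concatenated.
function insertRow(PDO $pdo, string $table, array $row): int
{
    $columns = array_keys($row);
    $placeholders = array_map(function ($column) {
        return ':' . $column;
    }, $columns);

    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        implode(', ', $placeholders)
    );

    $statement = $pdo->prepare($sql);
    $statement->execute($row); // array keys match the named placeholders
    return (int) $pdo->lastInsertId();
}

// Assumption: in-memory SQLite and a hypothetical "news" table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE news (newsId INTEGER PRIMARY KEY, title TEXT, body TEXT)');

$id = insertRow($pdo, 'news', ['title' => 'Hello', 'body' => 'First post']);
$title = $pdo->query('SELECT title FROM news WHERE newsId = ' . $id)->fetchColumn();
```

Each value sits next to its column name in the array, so there is no long positional list to keep aligned by hand.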

I can't speak for CodeIgniter (what I've seen of it seems rather slung-together, frankly), but there are a few reasons such systems may be used:
as part of an abstraction layer which supports different DBMS back-ends, where for instance ->offset(10)->limit(10) would automatically generate the correct variant of OFFSET, LIMIT, and similar clauses for MySQL vs PostgreSQL etc
as part of an "ORM" system, where the result of the query is automatically mapped into Model objects of an appropriate class based on the tables and columns being queried
to abstract away from the exact names of tables and columns for backwards-compatibility, or installation requirements (e.g. the table "news" might actually be called "app1_news" in a particular install to avoid colliding with another application)
to handle parameterised queries, as in your example; although largely unrelated to this kind of abstraction, they provide more than just escaping, as the DBMS (MySQL or whatever is in use) knows which parts of the query are fixed and which are variable, which can be useful for performance as well as security
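A rough sketch of how a chained builder can cover two of the reasons listed above: a per-install table prefix and parameterised conditions. The `SelectQuery` class is hypothetical and is not how CodeIgniter implements its builder.

```php
<?php
// Hypothetical mini query builder, for illustration only.
class SelectQuery
{
    private $prefix;
    private $columns = '*';
    private $table = '';
    private $conditions = [];
    private $bindings = [];

    public function __construct(string $prefix = '')
    {
        $this->prefix = $prefix;
    }

    public function select(string $columns): self
    {
        $this->columns = $columns;
        return $this;
    }

    public function from(string $table): self
    {
        // The install-specific prefix is applied in one place only.
        $this->table = $this->prefix . $table;
        return $this;
    }

    public function where(string $condition, $value): self
    {
        // The condition keeps its "?" placeholder; the value is stored separately.
        $this->conditions[] = $condition;
        $this->bindings[] = $value;
        return $this;
    }

    public function toSql(): string
    {
        $sql = "SELECT {$this->columns} FROM {$this->table}";
        if (count($this->conditions) > 0) {
            $sql .= ' WHERE ' . implode(' AND ', $this->conditions);
        }
        return $sql;
    }

    public function getBindings(): array
    {
        return $this->bindings;
    }
}

$query = (new SelectQuery('app1_'))->select('*')->from('news')->where('newsId = ?', 1);
$sql = $query->toSql();
$bindings = $query->getBindings();
```

The resulting SQL string and bindings can be handed to a prepared statement, so the DBMS sees which parts are fixed and which are variable.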

SQL statements vs MVC data access layer in PHP

Modern MVC frameworks have their own implementation of data access layers that do not require SQL statements to be written. In terms of performance and scalability, are there any drawbacks, for instance, when using
$user = User::where('email', '=', $email)->first();
instead of using prepared statements in raw SQL like
$user = DB::connection()->pdo->prepare("SELECT * from users where `email` = ? " ) ;
Since MVC frameworks like Laravel and Cakephp also allow the latter approach, I am not sure which of the two method is better in terms of performance and scalability.

Rant: What you call "modern MVC frameworks" are (with few exceptions) nowhere close to implementing MVC. And those "layers that do not require SQL statements" are actually extremely harmful in large-scale projects (where MVC should actually be used).
My advice would be to avoid using any built-in ORM or query builder. The ORMs that so-called "MVC frameworks" are bundled with are usually implementations of active record, which has an extremely limited use case. Basically, AR-based implementations of domain entities are pragmatic only if you are using just the basic CRUD operations (no JOINs or other above-beginner-level SQL queries) and only simple attribute validation (no cross-checked fields or interactions with other entities). Technically you can use active record instances in more complicated cases, but then you will start to incur technical debt.
The best option would be to separate the domain logic from the storage logic, and implement domain objects and data mappers respectively for these two aspects of the model layer.
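A minimal sketch of the domain object / data mapper split described above. The class names, the `users` columns and the in-memory SQLite database are all assumptions made for illustration; a real mapper would cover updates, deletes and more complex queries.

```php
<?php
// Domain object: holds data and business rules, knows nothing about storage.
class User
{
    private $email;
    private $name;

    public function __construct(string $email, string $name)
    {
        $this->email = $email;
        $this->name = $name;
    }

    public function getEmail(): string { return $this->email; }
    public function getName(): string { return $this->name; }
}

// Data mapper: the only class that knows the table layout and the SQL.
class UserMapper
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function insert(User $user): void
    {
        $statement = $this->pdo->prepare('INSERT INTO users (email, name) VALUES (?, ?)');
        $statement->execute([$user->getEmail(), $user->getName()]);
    }

    public function findByEmail(string $email): ?User
    {
        $statement = $this->pdo->prepare('SELECT email, name FROM users WHERE email = ?');
        $statement->execute([$email]);
        $row = $statement->fetch(PDO::FETCH_ASSOC);
        return $row ? new User($row['email'], $row['name']) : null;
    }
}

// Assumption: in-memory SQLite so the sketch runs standalone.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT)');

$mapper = new UserMapper($pdo);
$mapper->insert(new User('ann@example.com', 'Ann'));
$found = $mapper->findByEmail('ann@example.com');
$missing = $mapper->findByEmail('nobody@example.com');
```

If the table layout changes, only `UserMapper` needs to change; `User` and the code that uses it stay as they are.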

Yes, there are drawbacks both in terms of performance and scalability.
All these ORMs and ARs are quite good only with basic queries.
But when it comes to complex issues, they become either unbearably complex or simply helpless.
There is no way to inject "USE INDEX", "DELAYED" or similar performance-boosting commands into these sleek operators.
The same goes for scalability.
Every time you need some non-standard operator, you are going to scratch your head.
There is also a portability issue.
SQL is a lingua franca for web developers; everyone can read and write it,
whereas a proprietary ORM can put them in a fix.
Nevertheless, your second code sample is no less ugly and unusable.
$user = DB::connection()->pdo->prepare("SELECT * from users where email=?");
DB::connection()->pdo->prepare() does not return any users. It returns a statement handle, which has to be used in the following several lines to get the actual user info,
adding tons of useless code to your scripts.
And that is the ordinary case of a select with a single scalar parameter. Try it with an INSERT or a mere IN() clause and your code will blow up to several screens in height.
Why not make it actually return the user info?
$user = DB::conn()->getRow("SELECT * from users where email=?s",$email);
Look - you keep your SQL, yet you have made it usable.
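One way such a `getRow()` helper could be built on top of PDO is sketched below. This is an assumption-laden illustration: the `Db` class is hypothetical, and it uses PDO's standard `?` placeholders rather than the typed `?s` placeholder shown in the answer, which belongs to the answerer's own library.

```php
<?php
// Hypothetical wrapper around PDO, for illustration only.
class Db
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    // Prepare, bind, execute and fetch the first row in one call.
    // Returns null when no row matches.
    public function getRow(string $sql, ...$params): ?array
    {
        $statement = $this->pdo->prepare($sql);
        $statement->execute($params);
        $row = $statement->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : $row;
    }
}

// Assumption: in-memory SQLite and a hypothetical "users" table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (email, name) VALUES ('ann@example.com', 'Ann')");

$db = new Db($pdo);
$user = $db->getRow('SELECT * FROM users WHERE email = ?', 'ann@example.com');
$nobody = $db->getRow('SELECT * FROM users WHERE email = ?', 'nobody@example.com');
```

The SQL stays visible at the call site, while the prepare/execute/fetch boilerplate lives in one place.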

Of course you will always have the overhead of running through a class and assembling the query.
Yet it helps you to prevent errors. Typos like "were id =" can't happen (or shouldn't). Apart from that, those layers already do a lot of work for you,
like escaping, parsing, validating etc. So accept the overhead, and be sure that a lot of failures and security issues won't happen.

Interacting with the database using layers of separation (PHP and WordPress)

I quite often see in PHP, WordPress plugins specifically, that people write SQL directly in their plugins... The way I learnt things, everything should be handled in layers... So that if one day the requirements of a given layer change, I only have to worry about changing the layer that everything interfaces.
Right now I'm writing a layer to interface the database so that, if something ever changes in the way I interact with databases, all I have to do is change one layer, not X number of plugins I've created.
I feel as though this is something that other people may have come across in the past, and that my approach may be inefficient.
I'm writing classes such as
Table
Column
Row
That allow me to create database tables, columns, and rows using given objects with specific methods to handle all their functions:
$column = new \Namespace\Data\Column ( /* name, type, null/non-null, etc... */ );
$myTable = new \Namespace\Data\Table( /* name, column objects, and indexes */ );
\Namespace\TableModel::create($myTable);
My questions are...
Has someone else already written something to provide some separation between different layers?
If not, is my approach going to help at all in the long run or am I wasting my time; should I break down and hard-code the sql like everyone else?
If it is going to help writing this myself, is there any approach I could take to handle it more efficiently?

You seem to be looking for an ORM.
Here is one : http://www.doctrine-project.org/docs/orm/2.0/en/tutorials/getting-started-xml-edition.html

To be honest, I'd just hard-code the SQL, because:
Everyone else does so too. Big parts of WordPress would need to be rewritten, if they would ever wish to change from MySQL to something else. It would just be a waste of time to write your perfect layer for your plugin, if the rest of the whole system still only works with hard-coded SQL.
We don't live in a perfect world. Too much abstraction will, sooner or later, end up causing performance and other issues, which I can't even think of yet. Keep it simple. Also, using SQL you can benefit from some performance "hacks", which may not work on other systems.
SQL is a widely accepted standard and can already be seen as an abstraction layer. For example, there is even the possibility to access Facebook's Graph via SQL-like syntax (see FQL). If you want to change to another data source, you'll probably find some layer which supports SQL syntax anyway! In that sense, you could even say SQL already is some kind of abstraction layer.
But: if you decide to use SQL, be sure to use WordPress' $wpdb. Using that, you're on the safe side, as WordPress takes care of connecting to the database, forming the queries, etc. If, one day, WordPress will decide to change from databases to something else, they'll need to create a $wpdb-layer to that new source - for backwards compatibility. Also, many general requests already are in $wpdb as functions (such as $wpdb->insert()), so there's no direct need to hard-code SQL.
If however, you decide to use such an abstraction layer: Wikipedia has more information.
Update: I just found out that the CMS Drupal uses a database abstraction layer - but they still use SQL to form their queries, for all the different databases! I think that shows pretty clearly, how SQL can already be used as an abstraction layer.

how to separate sql from php code

I have a class that helps me to handle users.
For example:
$user = new User("login","passw");
$name = $user->getName();
$surname = $user->getSurname();
$table = $user->showStats();
All these methods have SQL queries inside. Some actions require only one SQL query, some require more than one. If the database structure changes, it will be difficult to change all the queries (the class is long). So I thought of keeping the SQL queries away from this class. But how do I do this?
After reading this question I learned about stored procedures. Does it mean that one action now requires only one SQL query (the call to the stored procedure)? And how should I organize the separation of SQL from PHP? Should I keep the SQL queries in an array? Or maybe there should be an SQL queries class? If yes, how should I organize this class (maybe there is a pattern I should learn)?

This is a surprisingly large topic, but I have a few suggestions to help you on your way:
You should look into object-relational mapping, in which an object automatically generates SQL queries. Have a look at the Object-Relational Mapping and Active Record articles for an overview. This will keep your database code minimal and make it easier if your table structure changes.
But there is no silver bullet here. If your schema changes you will have to change your queries to match. Some people prefer to deal with this by encapsulating their query logic within database views and stored procedures. This is also a good approach if you are consistent, but keep in mind that once you start writing stored procedures, they are going to be tied heavily to the particular database you are using. There is nothing wrong with using them, but they are going to make it much more difficult for you to switch databases down the road - usually not an issue, but an important aspect to keep in mind.
Anyway, whatever method you choose, I recommend that you store your database logic within several "Model" classes. It looks like you are doing something similar to this already. The basic idea is that each model encapsulates logic for a particular area of the database. Traditionally, each model would map to a single table in the DB - this is how the Ruby on Rails active record class works. It is a good strategy as it breaks down your database logic into simple little "chunks". If you keep all of the database query logic within a single file it can quickly grow out of control and become a maintenance nightmare - trust me, I've been there!
To get a better understanding of the "big picture", I recommend you spend some time reading up on web Model-View-Controller (MVC) architecture. You will also want to look at the established PHP MVC frameworks, such as CodeIgniter, Kohana, CakePHP, etc. Even if you do not use one - although I recommend you do - it would be helpful to see how these frameworks organize your code.

I would say you should look into implementing the "repository" design pattern in your code.
A good answer to how to implement this would be too long for this space, so I'll post a couple of PHP-oriented references:
travis swicegood -- Repository Pattern in PHP
Jon Lebensold -- A Repository Pattern in PHP
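As a rough idea of what the repository pattern can look like in PHP, here is a minimal sketch. It is not taken from either reference above; the interface, class, function and column names are all hypothetical, and an in-memory SQLite database is assumed so the example runs standalone.

```php
<?php
// The rest of the application depends on this interface, not on SQL.
interface UserRepository
{
    public function findByLogin(string $login): ?array;
}

// All SQL touching the users table lives in this one class.
class PdoUserRepository implements UserRepository
{
    private $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function findByLogin(string $login): ?array
    {
        $statement = $this->pdo->prepare(
            'SELECT login, name, surname FROM users WHERE login = ?'
        );
        $statement->execute([$login]);
        $row = $statement->fetch(PDO::FETCH_ASSOC);
        return $row === false ? null : $row;
    }
}

// Business code only sees the interface, so the storage can be swapped or faked in tests.
function fullName(UserRepository $users, string $login): string
{
    $user = $users->findByLogin($login);
    return $user === null ? '' : $user['name'] . ' ' . $user['surname'];
}

// Assumption: in-memory SQLite with a hypothetical schema.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (login TEXT PRIMARY KEY, name TEXT, surname TEXT)');
$pdo->exec("INSERT INTO users (login, name, surname) VALUES ('jdoe', 'John', 'Doe')");

$repository = new PdoUserRepository($pdo);
$known = fullName($repository, 'jdoe');
$unknown = fullName($repository, 'ghost');
```

When the schema changes, the queries to update are all inside `PdoUserRepository`, which addresses the concern about hunting through one long class.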

You are on the right lines. If you use separation of concerns to separate your business logic from your data access logic, you will be in a better place.

Judging by your "there are already 2K lines of code" statement, you're either maintaining something, or midway through developing something.
Both Faust and Justin Ethier make good recommendations - "how should I separate my database access from my application code" is one of the oldest, and most-answered, questions in web development.
Personally, I like MVC - it's pretty much the default paradigm for web development, it balances maintainability with productivity, and there are a load of frameworks to support you while you're doing it.
You may, of course, decide that re-writing your app from scratch is too much effort - in which case the repository pattern is a good halfway house.
Either way, you need to read up on refactoring - getting from where you are to where you want to be is going to be tricky. I recommend the book by Fowler, as a starter.
Could you explain more about why your database schema may change? That's usually a sign of trouble ahead.....

Is it a good idea to use CodeIgniters Active Record library to manipulate MySQL databases or should I just use SQL?

I'm starting to get to grips with CodeIgniter and came across its support for the Active Record pattern.
I like the fact that it generates the SQL code for you, so essentially you can retrieve, update and insert data into a database without tying your application to a specific database engine.
It makes simple queries very simple, but my concern is that it makes complex queries more complex, if not impossible (e.g. if you need engine-specific functions).
My Questions
What is your opinion of this pattern especially regarding CodeIgniters implementation?
Are there any speed issues with wrapping the database in another layer?
Does it (logic) become messy when trying to build very complex queries?
Do the advantages out way the disadvantages?

OK, first of all, 99% of your queries will be simple select/insert/update/delete. For this, active record is great. It provides simple syntax that can be easily changed. For more complex queries you should just use the query method. That's what it's for.
Second, it provides escaping and security for those queries. Face it, your application will probably have hundreds if not thousands of places where queries take place. You're bound to screw up and forget to properly escape some of them. Active record does not forget.
Third, performance in my experience is not dramatically affected. Of course there is some cost, but it's probably around .00001 per query. I think that is perfectly acceptable for the added security and sanity checks it does for you.
Lastly, I think it's clear that I believe the advantages are far greater than the disadvantages. Having secure queries that even your most junior developer can understand and not screw up is a great thing.

What is your opinion (sic) of this pattern especially regarding CodeIgniters implementation?
Can't say much about CI's implementation. Generally I avoid AR for anything but the simplest applications. If the table does not match 1:1 to my business objects, I don't use AR, as it will make modeling the application difficult. I also don't like the idea of coupling the persistence layer to my business objects. It's a violation of separation of concerns. Why should a Product know how to save itself? Further reading: http://kore-nordmann.de/blog/why_active_record_sucks.html
EDIT after the comment of #kemp, I looked at the CI User Guide to see how they implemented AR:
As you can see in PoEAA an AR is an object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data. This is not what CI does though. It just provides an API to build queries. I understood that there is a Model class which extends AR and which can be used to build business objects, but that would be more like a Row Data Gateway then. Check out PHPActiveRecord for an alternate implementation.
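To make the PoEAA definition concrete, here is a minimal sketch of an active record in that sense: one object wraps one row, carries the database access, and adds domain logic on the row's data. The `Product` class, its columns, the static connection property and the in-memory SQLite database are all assumptions for illustration; this is neither CI's nor PHPActiveRecord's API.

```php
<?php
// Hypothetical active record: one instance wraps one row of "products".
class Product
{
    public static $pdo; // assumption: a shared connection set once at startup

    public $id;
    public $name;
    public $price;

    public function __construct(string $name, float $price)
    {
        $this->name = $name;
        $this->price = $price;
    }

    // Domain logic on the row's data.
    public function priceWithTax(float $rate): float
    {
        return round($this->price * (1 + $rate), 2);
    }

    // Database access lives on the same object: the Product saves itself.
    public function save(): void
    {
        $statement = self::$pdo->prepare('INSERT INTO products (name, price) VALUES (?, ?)');
        $statement->execute([$this->name, $this->price]);
        $this->id = (int) self::$pdo->lastInsertId();
    }

    public static function find(int $id): ?Product
    {
        $statement = self::$pdo->prepare('SELECT id, name, price FROM products WHERE id = ?');
        $statement->execute([$id]);
        $row = $statement->fetch(PDO::FETCH_ASSOC);
        if ($row === false) {
            return null;
        }
        $product = new Product($row['name'], (float) $row['price']);
        $product->id = (int) $row['id'];
        return $product;
    }
}

// Assumption: in-memory SQLite so the sketch runs standalone.
Product::$pdo = new PDO('sqlite::memory:');
Product::$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
Product::$pdo->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)');

$product = new Product('Widget', 10.0);
$product->save();
$loaded = Product::find($product->id);
```

The coupling criticised earlier is visible here: `Product` cannot be used or tested without a database connection, which is the trade-off against a pure query-building API.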
Are there any speed issues with wrapping the database in another layer?
Whenever you abstract or wrap something into something else, you can be sure this comes with a performance impact over doing it raw. The question is, is it acceptable for your application. The only way to find out is by benchmarking. Further Reading: https://stackoverflow.com/search?q=orm+slow
EDIT: In the case of CI's simple query building API, I'd assume the performance impact to be negligible. Assembling the queries will logically take some more time than just passing a raw SQL string to the db adapter, but that should be microseconds only. And as far as I have seen in the User Guide, you can also cache query strings. But when in doubt, benchmark.
Does it (logic) become messy when trying to build very complex queries?
Depends on your queries. I've seen pretty messy SQL queries. Those don't get prettier when expressed through an OO interface. Depending on the API, you might find queries you won't be able to express through it. But then again, that depends on your queries.
Do the advantages out way the disadvantages?
That only you can decide. If it makes your life as a programmer easy, sure, why not. If it fits your programming needs, yes. Ruby on Rails is built heavily on that (AR) concept, so it can't be all that bad (although we could argue about this, too :))
