I quite often see in PHP, WordPress plugins specifically, that people write SQL directly in their plugins... The way I learnt things, everything should be handled in layers... So that if one day the requirements of a given layer change, I only have to worry about changing the layer that everything interfaces.
Right now I'm writing a layer to interface the database so that, if something ever changes in the way I interact with databases, all I have to do is change one layer, not X number of plugins I've created.
I feel as though this is something that other people may have come across in the past, and that my approach may be inefficient.
I'm writing classes such as
Table
Column
Row
That allow me to create database tables, columns, and rows using given objects with specific methods to handle all their functions:
$column = new \Namespace\Data\Column ( /* name, type, null/non-null, etc... */ );
$myTable = new \Namespace\Data\Table( /* name, column objects, and indexes */ );
\Namespace\TableModel::create($myTable);
My questions are...
Has someone else already written something to provide some separation between different layers?
If not, is my approach going to help at all in the long run or am I wasting my time; should I break down and hard-code the sql like everyone else?
If it is going to help writing this myself, is there any approach I could take to handle it more efficiently?
You seem to be looking for an ORM.
Here is one : http://www.doctrine-project.org/docs/orm/2.0/en/tutorials/getting-started-xml-edition.html
To be honest, I'd just hard-code the SQL, because:
Everyone else does so too. Big parts of WordPress would need to be rewritten, if they would ever wish to change from MySQL to something else. It would just be a waste of time to write your perfect layer for your plugin, if the rest of the whole system still only works with hard-coded SQL.
We don't live in a perfect world. Too much abstraction will, sooner or later, end up causing performance and other issues that I can't even think of yet. Keep it simple. Also, using SQL you can benefit from some performance "hacks", which may not work on other systems.
SQL is a widely accepted standard and can already be seen as an abstraction layer. For example, there's even the possibility to access Facebook's Graph via SQL-like syntax (see FQL). If you want to change to another data source, you'll probably find some layer which supports SQL syntax anyway!
But: if you decide to use SQL, be sure to use WordPress' $wpdb. Using that, you're on the safe side, as WordPress takes care of connecting to the database, forming the queries, etc. If, one day, WordPress will decide to change from databases to something else, they'll need to create a $wpdb-layer to that new source - for backwards compatibility. Also, many general requests already are in $wpdb as functions (such as $wpdb->insert()), so there's no direct need to hard-code SQL.
If however, you decide to use such an abstraction layer: Wikipedia has more information.
Update: I just found out that the CMS Drupal uses a database abstraction layer - but they still use SQL to form their queries, for all the different databases! I think that shows pretty clearly, how SQL can already be used as an abstraction layer.
I have a class that helps me to handle users.
For example:
$user = new User("login","passw");
$name = $user->getName();
$surname = $user->getSurname();
$table = $user->showStats();
All these methods have SQL queries inside. Some actions require only one SQL query, some more than one. If the database structure changes, it will be difficult to change all the queries (the class is long). So I thought I would keep the SQL queries out of this class. But how do I do this?
After reading this question I learned about stored procedures. Does that mean that each action now requires only one SQL query (a call to a stored procedure)? But how do I organize the separation of SQL from PHP? Should I keep the SQL queries in an array? Or maybe there should be an SQL-queries class. If so, how should I organize this class (maybe there is a pattern I should learn)?
This is a surprisingly large topic, but I have a few suggestions to help you on your way:
You should look into object-relational mapping, in which an object automatically generates SQL queries. Have a look at the Object-Relational Mapping and Active Record articles for an overview. This will keep your database code minimal and make it easier if your table structure changes.
But there is no silver bullet here. If your schema changes you will have to change your queries to match. Some people prefer to deal with this by encapsulating their query logic within database views and stored procedures. This is also a good approach if you are consistent, but keep in mind that once you start writing stored procedures, they are going to be tied heavily to the particular database you are using. There is nothing wrong with using them, but they are going to make it much more difficult for you to switch databases down the road - usually not an issue, but an important aspect to keep in mind.
Anyway, whatever method you choose, I recommend that you store your database logic within several "Model" classes. It looks like you are doing something similar to this already. The basic idea is that each model encapsulates logic for a particular area of the database. Traditionally, each model would map to a single table in the DB - this is how the Ruby on Rails active record class works. It is a good strategy as it breaks down your database logic into simple little "chunks". If you keep all of the database query logic within a single file it can quickly grow out of control and become a maintenance nightmare - trust me, I've been there!
To get a better understanding of the "big picture", I recommend you spend some time reading up on web Model-View-Controller (MVC) architecture. You will also want to look at the established PHP MVC frameworks, such as CodeIgniter, Kohana, CakePHP, etc. Even if you do not use one, although I recommend you do, it would be helpful to see how these frameworks organize your code.
I would say you should look into implementing the "repository" design pattern in your code.
A good answer to how to implement this would be too long for this space, so I'll post a couple of PHP-oriented references:
travis swicegood -- Repository Pattern in PHP
Jon Lebensold -- A Repository Pattern in PHP
You are on the right lines. If you use separation of concerns to separate your business logic from your data access logic, you will be in a better place.
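To make the repository idea a bit more concrete, here is a minimal sketch. It is not taken from either of the linked articles; the `UserRepository` interface, the `PdoUserRepository` class, the `users` table and its columns are all names I made up for illustration, and the demo uses an in-memory SQLite database (so it assumes the pdo_sqlite extension is available).

```php
<?php
// Plain domain object: holds user data but contains no SQL.
final class User
{
    private string $name;
    private string $surname;

    public function __construct(string $name, string $surname)
    {
        $this->name = $name;
        $this->surname = $surname;
    }

    public function getName(): string { return $this->name; }
    public function getSurname(): string { return $this->surname; }
}

// The rest of the application depends only on this (hypothetical) interface.
interface UserRepository
{
    public function findByLogin(string $login): ?User;
}

// One concrete implementation; every query about users lives in this class.
final class PdoUserRepository implements UserRepository
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function findByLogin(string $login): ?User
    {
        // Assumed schema: users(login, name, surname).
        $stmt = $this->db->prepare('SELECT name, surname FROM users WHERE login = ?');
        $stmt->execute([$login]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);

        return $row === false ? null : new User($row['name'], $row['surname']);
    }
}

// Demo setup with an in-memory SQLite database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (login TEXT PRIMARY KEY, name TEXT, surname TEXT)');
$db->exec("INSERT INTO users (login, name, surname) VALUES ('jdoe', 'John', 'Doe')");

$users = new PdoUserRepository($db);
$user = $users->findByLogin('jdoe');
echo $user->getName(), ' ', $user->getSurname(), PHP_EOL;
```

If the schema changes, only `PdoUserRepository` should need to change; code that works with `User` objects stays as it is.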
Judging by your "there are already 2K lines of code" statement, you're either maintaining something, or midway through developing something.
Both Faust and Justin Ethier make good recommendations - "how should I separate my database access from my application code" is one of the oldest, and most-answered, questions in web development.
Personally, I like MVC - it's pretty much the default paradigm for web development, it balances maintainability with productivity, and there are a load of frameworks to support you while you're doing it.
You may, of course, decide that re-writing your app from scratch is too much effort - in which case the repository pattern is a good halfway house.
Either way, you need to read up on refactoring - getting from where you are to where you want to be is going to be tricky. I recommend the book by Fowler, as a starter.
Could you explain more about why your database schema may change? That's usually a sign of trouble ahead.....
OOP principles were difficult for me to grasp because for some reason I could never apply them to web development. As I developed more and more projects I started understanding how some parts of my code could use certain design patterns to make them easier to read, reuse, and maintain so I started to use it more and more.
The one thing I still can't quite comprehend is why I should abstract my data layer. Basically if I need to print a list of items stored in my DB to the browser I do something along the lines of:
$sql = 'SELECT * FROM table WHERE type = "type1"';
$result = mysql_query($sql);
while($row = mysql_fetch_assoc($result))
{
echo '<li>'.$row['name'].'</li>';
}
I'm reading all these How-Tos or articles preaching about the greatness of PDO but I don't understand why. I don't seem to be saving any LoCs and I don't see how it would be more reusable because all the functions that I call above just seem to be encapsulated in a class but do the exact same thing. The only advantage I'm seeing to PDO are prepared statements.
I'm not saying data abstraction is a bad thing, I'm asking these questions because I'm trying to design my current classes correctly and they need to connect to a DB so I figured I'd do this the right way. Maybe I'm just reading bad articles on the subject :)
I would really appreciate any advice, links, or concrete real-life examples on the subject!
Think of abstracting the data layer as a way to save time in the future.
Using your example. Let's say you changed the names of the tables. You would have to go to each file where you have a SQL using that table and edit it. In the best case, it was a matter of search and replace of N files. You could have saved a lot of time and minimized the error if you only had to edit one file, the file that had all your sql methods.
The same applies to column names.
And this is only considering the case where you rename stuff. It is also quite possible to change database systems completely. Your SQL might not be compatible between Sqlite and MySQL, for example. You would have to go and edit, once again, a lot of files.
Abstraction allows you to decouple one part from the other. In this case, you can make changes to the database part without affecting the view part.
For very small projects this might be more trouble than it is worth. And even then, you should still do it, at least to get used to it.
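As a rough sketch of what "one file that has all your SQL methods" could look like for the list example from the question: the `ItemStore` class, the `items` table and the method name below are assumptions of mine, and the demo runs against in-memory SQLite (pdo_sqlite required) rather than MySQL.

```php
<?php
// Hypothetical data-access class: the only place that knows table and column names.
class ItemStore
{
    private const TABLE = 'items'; // assumed table name; a rename is a one-line change here

    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    /** Returns the names of all items of the given type, sorted by name. */
    public function findNamesByType(string $type): array
    {
        $stmt = $this->db->prepare(
            'SELECT name FROM ' . self::TABLE . ' WHERE type = :type ORDER BY name'
        );
        $stmt->execute([':type' => $type]);

        return $stmt->fetchAll(PDO::FETCH_COLUMN);
    }
}

// Demo setup with an in-memory SQLite database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE items (name TEXT, type TEXT)');
$db->exec("INSERT INTO items (name, type) VALUES ('apple', 'type1'), ('pear', 'type1'), ('rock', 'type2')");

// The page itself no longer contains any SQL.
$items = new ItemStore($db);
foreach ($items->findNamesByType('type1') as $name) {
    echo '<li>' . htmlspecialchars($name) . '</li>', PHP_EOL;
}
```

The page code only calls `findNamesByType()`, so renaming the table or a column touches `ItemStore` and nothing else.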
I'm NOT a php person but this is a more general question so here goes.
You're probably building something small, sometimes though even something small/medium should have an abstracted data layer so it can grow better.
The point is to cope with CHANGE
Think about this, you have a small social networking website. Think about the data you'll store, profile details, pictures, friends, messages. For each of these you'll have pages like pictures.php?&uid=xxx.
You'll then have a little piece of SQL slapped in there with the mysql code. Now think of how easy/difficult it would be to change this? You would change 5-10 pages? When you'll do this, you'll probably get it wrong a few times before you test it thoroughly.
Now, think of Facebook. Think of the amount of pages there will be, do you think it'll be easier to change a line of SQL in each page!?
When you abstract the data access correctly:
It's in one place, so it's easier to change.
Therefore it's easier to test.
It's easier to replace. (Think about what you'd have to do if you had to switch to another database.)
Hope this Helps
One of the other advantage of abstracting the data layer is to be less dependent on the underlying database.
With your method, the day you want to use something other than MySQL, or your column naming changes, or the PHP API for MySQL changes, you will have to rewrite a lot of code.
If all the database access part was neatly abstracted, the needed changes will be minimal and restricted to a few files instead of the whole project.
It is also a lot easier to reuse code that guards against SQL injection, or other utility functions, if the code is centralized in one place.
Finally, it's easier to do unit testing if everything goes through a few classes than if it is spread over every page of your project.
For example, in a recent project of mine (sorry, no code sharing is possible), MySQL-related functions are only called in one class. Everything from query generation to object instantiation is done there. So it's very easy for me to change to another database or reuse this class somewhere else.
In my opinion, the data access is one of the most important aspects to separate / abstract out from the rest of your code.
Separating out various 'layers' has several advantages.
1) It neatly organises your code base. If you have to make a change, you'll know immediately where the change needs to be made and where to find the code. This might not be so much of a big deal if you're working on a project on your own but with a larger team the benefits can quickly become obvious. This point is actually pretty trivial but I added it anyway. The real reason is number 2..
2) You should try to separate things that might need to change independently of each other. In your specific example, it is conceivable that you would want to change the DB / data access logic without impacting the user interface. Or, you might want to change the user interface without impacting the data access. I'm sure you can see how this is made impossible if the two are mixed together.
When your data access layer has a tightly defined interface, you can change its inner workings however you want, and as long as it still adheres to the interface you can be pretty certain it won't have broken anything further up. Obviously this would still need verifying with testing.
3) Reuse. Writing data access code can get pretty repetitive. It's even more repetitive when you have to rewrite the data access code for each page you write. Whenever you notice something repetitive in code, alarm bells should be ringing. Repetitive code is prone to errors and causes a maintenance problem.
I'm sure you see the same queries popping up in various different pages? This can be resolved by putting those queries lower down in your data layer. Doing so helps to ease maintenance; whenever a table or column name changes, you only need to correct the one place in your data layer that references it instead of trawling through your entire user interface and potentially missing something.
4) Testing. If you want to use an automated tool to carry out unit testing, you will need everything nicely separated. How will you test your code to select all Customer records when this code is scattered throughout your interface? It is much easier when you have a specific SelectAllCustomers function on a data access object. You can test this once here and be sure that it will work for every page that uses it.
There are more reasons that I'll let other people add. The main thing to take away is that separating out layers allows one layer to change without letting the change ripple through to other layers. As the database and user interface are areas of an application / website that change the most frequently it is a very good idea to keep them separate and nicely isolated from everything else and each other.
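Here is a small sketch of points 2 and 4 together, assuming a hypothetical `CustomerData` interface with a `selectAllCustomers()` method (my names, as are the `customers` table and the `renderCustomerList()` function). The page code depends only on the interface, so it can be tested with an in-memory stand-in and no database at all.

```php
<?php
// Hypothetical data-access interface: pages depend on this, not on SQL.
interface CustomerData
{
    /** @return string[] customer names */
    public function selectAllCustomers(): array;
}

// Real implementation: the single place that queries the (assumed) customers table.
final class PdoCustomerData implements CustomerData
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function selectAllCustomers(): array
    {
        return $this->db
            ->query('SELECT name FROM customers ORDER BY name')
            ->fetchAll(PDO::FETCH_COLUMN);
    }
}

// Stand-in for tests: same interface, no database involved.
final class InMemoryCustomerData implements CustomerData
{
    private array $names;

    public function __construct(array $names)
    {
        $this->names = $names;
    }

    public function selectAllCustomers(): array
    {
        return $this->names;
    }
}

// Presentation code: works with either implementation.
function renderCustomerList(CustomerData $data): string
{
    $html = '<ul>';
    foreach ($data->selectAllCustomers() as $name) {
        $html .= '<li>' . htmlspecialchars($name) . '</li>';
    }

    return $html . '</ul>';
}

echo renderCustomerList(new InMemoryCustomerData(['Ann', 'Bob'])), PHP_EOL;
```

The data layer can then be tested once against a real database, and the pages separately against the stand-in.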
In my point of view to print just a list of items in a database table, your snippet is the more appropriate: fast, simple and clear.
I think a bit more abstraction could be helpful in other cases to avoid code repetitions with all the related advantages.
Consider a simple CMS with authors, articles, tags and a cross reference table for articles and tags.
In your homepage your simple query will become a more complex one. You will join articles and users, then you will fetch related tag for each article joining the tags table with the cross reference one and filtering by article_id.
You will repeat this query with some small changes in the author profile and in the tag search results.
Using an abstraction tool like this, you can define your relations once and use a more concise syntax like:
// Home page
$articles = $db->getTable('Article')->join('Author a')
->addSelect('a.name AS author_name');
$first_article_tags = $articles[0]->getRelated('Tag');
// Author profile
$articles = $db->getTable('Article')->join('Author a')
->addSelect('a.name AS author_name')->where('a.id = ?', $_GET['id']);
// Tag search results
$articles = $db->getTable('Article')->join('Author a')
->addSelect('a.name AS author_name')
->join('Tag')->where('Tag.slug = ?', $_GET['slug']);
You can reduce the remaining code repetition encapsulating it in Models and refactoring the code above:
// Home page
$articles = Author::getArticles();
$first_article_tags = $articles[0]->getRelated('Tag');
// Author profile
$articles = Author::getArticles()->where('a.id = ?', $_GET['id']);
// Tag search results
$articles = Author::getArticles()
->join('Tag')->where('Tag.slug = ?', $_GET['slug']);
There are other good reasons to abstract more or less, each with its pros and cons. But in my opinion, for a big part of web projects the main one is this :P
I'm looking to create a web app using PHP and MySQL. I'm planning to write it in classes.
I have created 3 classes: CBaseUser, CWaiter, CBaseAuth
So basically CBaseUser will be a parent class to CWaiter, and CBaseAuth contains things like GetPasswdLen(), CreatePasswd(), GetToken().
Right now I'm about to write the rest of the program, which requires insert, delete, update, login, etc.
I'm a little confused here because I'm not sure where I should put my SQL query functions. Should I do it in CWaiter?
Could someone enlighten me about OOP in PHP? Like the best practice for creating a PHP web program.
If you are doing everything OO, you might want to take a look at PHP's PEAR DB before going deeper into SQL transactions. PEAR::DB makes it possible to build database-agnostic systems, meaning that you can run on MySQL, PostgreSQL, etc. without changing a single line of code.
See here for a similar question. You really want to avoid repeating structurally similar queries, but too few programmers know how to do that in a 'greenfields' project.
I'm a little confused here because I'm not sure where I should put my SQL query functions. Should I do it in CWaiter?
You could, but it is generally considered good practice to keep your SQL queries in a separate place (ie separate set of source code files) to your business logic.
"MVC" is a pattern people often talk about which involves separating the code for "Model" (ie anything that needs to know your database structure, eg all your SQL) "View" (anything that draws the UI, which typically means the template engine) and "Controller" (all business logic, drawing it all together).
This would mean that you'd have a different group of classes with names such as CBaseUserDB or CWaiterDB (or whatever) for anything that needs to update/query the database. This is a simplified example just to illustrate my point.
The thinking behind separating this is that SQL, template code, and business logic all intertwined can be messy or hard to follow.
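A simplified sketch of that split, under some assumptions of mine: the `waiters` table, the method names on `CWaiterDB`, and the in-memory SQLite demo (pdo_sqlite required) are all invented for illustration, and `CWaiter` is shown without its `CBaseUser` parent to keep the example short.

```php
<?php
// Business object: knows nothing about SQL.
class CWaiter
{
    public ?int $id = null;
    public string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// Hypothetical companion class: all SQL touching the waiters table lives here.
class CWaiterDB
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function insert(CWaiter $waiter): void
    {
        $stmt = $this->db->prepare('INSERT INTO waiters (name) VALUES (?)');
        $stmt->execute([$waiter->name]);
        $waiter->id = (int) $this->db->lastInsertId();
    }

    public function rename(CWaiter $waiter, string $newName): void
    {
        $stmt = $this->db->prepare('UPDATE waiters SET name = ? WHERE id = ?');
        $stmt->execute([$newName, $waiter->id]);
        $waiter->name = $newName;
    }

    public function delete(CWaiter $waiter): void
    {
        $stmt = $this->db->prepare('DELETE FROM waiters WHERE id = ?');
        $stmt->execute([$waiter->id]);
    }

    public function count(): int
    {
        return (int) $this->db->query('SELECT COUNT(*) FROM waiters')->fetchColumn();
    }
}

// Demo setup with an in-memory SQLite database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE waiters (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)');

$waiterDb = new CWaiterDB($db);
$waiter = new CWaiter('Alice');
$waiterDb->insert($waiter);
$waiterDb->rename($waiter, 'Alicia');
echo $waiter->id, ' ', $waiter->name, ' ', $waiterDb->count(), PHP_EOL;
```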
You may also want to look at PDO which is a more modern way to access a database than for example mysql_ and mysqli_. It contains some extra features such as prepared statements, but the main benefit in my opinion is that it means your knowledge will be more future proof and you'll be able to adapt to different database APIs easily. Unlike the previous answer I don't much fancy higher level abstraction like pear::db but that's up to your personal opinion.
I'm starting to get to grips with CodeIgniter and came across its support for the Active Record pattern.
I like the fact that it generates the SQL code for you, so essentially you can retrieve, update and insert data in a database without tying your application to a specific database engine.
It makes simple queries very simple, but my concern is that it makes complex queries more complex, if not impossible (e.g. if you need engine-specific functions).
My Questions
What is your opinion of this pattern especially regarding CodeIgniters implementation?
Are there any speed issues with wrapping the database in another layer?
Does it (logic) become messy when trying to build very complex queries?
Do the advantages outweigh the disadvantages?
OK, first of all, 99% of your queries will be simple select/insert/update/delete. For this, active record is great. It provides simple syntax that can be easily changed. For more complex queries you should just use the query method. That's what it's for.
Second, it provides escaping and security for those queries. Face it, your application will probably have hundreds if not thousands of places where queries take place. You're bound to screw up and forget to properly escape some of them. Active record does not forget.
Third, performance in my experience is not dramatically affected. Of course it is affected, but it's probably around .00001 per query. I think that is perfectly acceptable for the added security and sanity checks it does for you.
Lastly, I think it's clear that I believe the advantages are far greater than the disadvantages. Having secure queries that even your most junior developer can understand and not screw up is a great thing.
What is your opinion (sic) of this pattern especially regarding CodeIgniters implementation?
Can't say much about CI's implementation. Generally I avoid AR for anything but the simplest applications. If the table does not match 1:1 to my business objects, I don't use AR, as it will make modeling the application difficult. I also don't like the idea of coupling the persistence layer to my business objects. It's a violation of separation of concerns. Why should a Product know how to save itself? Further reading: http://kore-nordmann.de/blog/why_active_record_sucks.html
EDIT after the comment of #kemp, I looked at the CI User Guide to see how they implemented AR:
As you can see in PoEAA an AR is an object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data. This is not what CI does though. It just provides an API to build queries. I understood that there is a Model class which extends AR and which can be used to build business objects, but that would be more like a Row Data Gateway then. Check out PHPActiveRecord for an alternate implementation.
Are there any speed issues with wrapping the database in another layer?
Whenever you abstract or wrap something into something else, you can be sure this comes with a performance impact over doing it raw. The question is, is it acceptable for your application. The only way to find out is by benchmarking. Further Reading: https://stackoverflow.com/search?q=orm+slow
EDIT: In the case of CI's simple query building API, I'd assume the performance impact to be negligible. Assembling the queries will logically take some more time than just passing a raw SQL string to the db adapter, but that should be microseconds only. And as far as I have seen in the User Guide, you can also cache query strings. But when in doubt, benchmark.
Does it (logic) become messy when trying to build very complex queries?
Depends on your queries. I've seen pretty messy SQL queries. Those don't get prettier when expressed through an OO interface. Depending on the API, you might find queries you won't be able to express through it. But then again, that depends on your queries.
Do the advantages outweigh the disadvantages?
That only you can decide. If it makes your life as a programmer easy, sure, why not. If it fits your programming needs, yes. Ruby on Rails is built heavily on that (AR) concept, so it can't be all that bad (although we could argue about this, too :))
Can someone please explain "re-usable structures" for me?
I was working on making some db objects in PHP, but was told I was using too much processing from the computer because I made things too complicated with the objects below:
My DB objects:
$db = new Database;
$db->db_connect();
$post_content = new DbSelect;
$post_content->select('id', 'title', 'firstName', 'created', 'catName', 'tagName');
$post_content->from('content');
$post_content->join('inner');
$post_content->on('category','cat_id','id');
$post_content->where('id','1');
$post_content->order('created');
$db->db_close();
Normal PHP:
mysql_connect();
mysql_select_db();
$query = 'SELECT id, title, s_name, created, cat_name, tag_name
FROM content
INNER JOIN category ON content.cat_id = category.id
WHERE content.id = 1
ORDER BY created';
mysql_close();
So to reiterate my questions:
1. A quick explanation of re-usable structures?
2. why is the first method using objects "wrong"?
please note:
I'll be googling this as well as hoping for feedback
I know there are "tools" like Zend and others that have plenty of db objects built into them, but I'm trying a DIY approach
Don't confuse object-oriented programming with "class-oriented" or "object-based" programming. They both, on the surface, can look like OOP but are not.
These are when you take structured code and wrap in a bunch of classes, but don't change the fundamentals of how it operates. When you program with objects in mind, but don't leverage any of the special conventions that OOP affords you (polymorphism, aggregation, encapsulation, etc). This is not OOP.
What you may have here is some of this type of code. It's a little hard to tell. Is the purpose of your DbSelect class to abstract away the raw SQL code, so that you can connect to any database without having to rewrite your queries (as many DBAL solutions are wont to do)? Or are you doing it "just because", in an effort to look like you've achieved OOP because you turned a basic SQL query into a chain of method calls? If the latter is closer to your motivation for creating this set of classes, you probably need to think about why you're making these classes and objects in the first place.
I should note that, from what I can tell by your simple snippet, you actually have not gained anything here in the way of re-usability. Now, your design may include code that gives flexibility where I cannot see it, but I kind of suspect that it isn't there. The procedural/structured snippet is really no more or less reusable than your proposed class-based one.
Whomever you were talking to has a point - developing a complex (or even simple) OOP solution can definitely have many benefits - but to do so without regard to the cost of those benefits (and there's always a cost) is foolish at best, and hazardous at worst.
I'm not sure where to start on this one. Object Oriented design is not a trivial subject, and there are many ways it can go wrong.
Essentially, you want to try to make logical, independent objects in your application such that you can swap them out for other modules with the same interface, or reuse them in future projects. In your database example, look at PEAR::MDB2. PEAR::MDB2 abstracts the database drivers away from your application so that you don't need to worry about which specific database you're using. Today, you might be using MySQL to run your site. Tomorrow, you might switch to PostgreSQL. Ideally, if you use a proper OO design, you shouldn't need to change any of your code to make it work. You only need to swap out the database layer for another. (PEAR::MDB2 makes this as simple as changing your db connect string.)
May I suggest reading Code Complete by Steve McConnell. There's a whole chapter on Classes. While the examples are primarily C++, the concepts can be applied to any programming language, including PHP.
It seems that your solution involves more typing to achieve the same thing as the "normal" way. The string SQL would be more efficient than allocating memory for your object.
You should check out the Active Record pattern if you want to create a data abstraction layer with more features than just a select.
If you are using DbSelect objects to build queries from complicated Forms, then you are doing the right thing.
Query Object pattern
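To illustrate where a query object can earn its keep, here is a minimal sketch of building a query from optional form fields. The `SelectQuery` class and its methods are hypothetical (this is not the asker's DbSelect, nor any library's API), and the `content` table and its columns are assumed. Column and table names are concatenated into the SQL as-is, so they must come from your own code and never from user input; only the values are passed as bound parameters.

```php
<?php
// Hypothetical query object: collects the parts of a SELECT and renders
// the SQL text plus the parameters to bind to it.
final class SelectQuery
{
    private string $table;
    private array $columns = ['*'];
    private array $conditions = [];
    private array $params = [];
    private ?string $orderBy = null;

    public function __construct(string $table)
    {
        $this->table = $table;
    }

    public function select(string ...$columns): self
    {
        $this->columns = $columns;
        return $this;
    }

    // Adds an equality condition only when the form field was actually filled in.
    public function whereIfSet(string $column, ?string $value): self
    {
        if ($value !== null && $value !== '') {
            $this->conditions[] = "$column = ?";
            $this->params[] = $value;
        }
        return $this;
    }

    public function orderBy(string $column): self
    {
        $this->orderBy = $column;
        return $this;
    }

    public function toSql(): string
    {
        $sql = 'SELECT ' . implode(', ', $this->columns) . ' FROM ' . $this->table;
        if ($this->conditions) {
            $sql .= ' WHERE ' . implode(' AND ', $this->conditions);
        }
        if ($this->orderBy !== null) {
            $sql .= ' ORDER BY ' . $this->orderBy;
        }
        return $sql;
    }

    /** Values to pass to PDOStatement::execute(), in placeholder order. */
    public function params(): array
    {
        return $this->params;
    }
}

// Simulated search form input; in a real page this would come from $_GET.
$form = ['category' => 'news', 'author' => ''];

$query = (new SelectQuery('content'))
    ->select('id', 'title', 'created')
    ->whereIfSet('category', $form['category'])
    ->whereIfSet('author', $form['author'])
    ->orderBy('created');

echo $query->toSql(), PHP_EOL;
// prints: SELECT id, title, created FROM content WHERE category = ? ORDER BY created
```

With a fixed query like the one in the question, the plain SQL string is simpler; the object starts to pay off when the WHERE clause depends on which fields the user filled in.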