How to add an ORM to a legacy PHP project?

We're working on a PHP project that has been in development for more than two years. The team is now ready and willing to switch development to an ORM, because it really speeds up development and lets you work with objects instead of thinking in terms of SQL code and database tables most of the time.
We have chosen Doctrine ORM because it can load data fixtures from YAML, which we need very much for our unit tests.
My main fear is that a new ORM framework could slow down the site. We can't share a connection between Doctrine and our current database abstraction layer, which uses pg_connect syntax and is not PDO-compatible. The connection mechanism can't be switched to a PDO-compatible one either, because we have lots of SQL code that is incompatible with PDO_SQLITE syntax.
So, as I understand it, if we start using Doctrine, the number of database connections will double. I'm not sure the database server will be able to handle this.
What would you recommend us to do in this circumstance?

Of what relevance is PDO_SQLITE?
Unless you actually plan to use the SQLite driver, PDO does not require SQLite compatibility.
If you aren't going to use SQLite, I would make the legacy database layer PDO-compatible and reuse its connection until you can fully migrate to Doctrine.
That said, the number of connections is not going to be your only performance concern when moving to an ORM. ORMs are inherently inefficient, so I'd expect slower queries, higher bandwidth use between the application servers and the database servers, and higher memory use at the application level, because redundant data inevitably gets selected.
Depending on your current setup, the above may or may not be issues.
You should probably take that last paragraph with a pinch of salt, though, because these are traits of ORMs in general and not of Doctrine in particular, with which I've had no experience.
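Here is a minimal sketch of the "make the legacy layer PDO-compatible and reuse the connection" suggestion. It assumes the legacy layer can be reduced to a thin wrapper around one connection object. `LegacyDb` is a hypothetical name, and the connection is any object with a `query(string)` method, such as a `PDO` instance. The idea is that the object the legacy code uses can also be handed to the new code, so a request opens one connection instead of two. How Doctrine accepts an existing connection depends on the Doctrine/DBAL version, so that part is not shown.

```php
<?php
// Hypothetical thin adapter: legacy call sites keep calling $db->query(),
// but the underlying connection object is created once and can be shared.
final class LegacyDb
{
    /** @var object Any object exposing query(string), e.g. a PDO instance */
    private $conn;

    public function __construct(object $conn)
    {
        $this->conn = $conn;
    }

    // Runs a legacy SQL string unchanged on the shared connection.
    public function query(string $sql)
    {
        return $this->conn->query($sql);
    }

    // Exposes the same connection so new (ORM-based) code can reuse it.
    public function connection(): object
    {
        return $this->conn;
    }
}

// Usage (assumed DSN and credentials):
// $pdo = new PDO('pgsql:host=localhost;dbname=app', 'user', 'secret');
// $legacy = new LegacyDb($pdo);
// ...hand $legacy->connection() to the new code instead of opening a second connection.
```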

The obvious thing you can do is not open a database connection until you need it. I personally use code like this:
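// mysql_* was removed in PHP 7.0, so this uses mysqli_*
private $link = null;

public function connect() {
    // Open the connection on first use only, then keep reusing it
    if ($this->link === null) {
        $this->link = mysqli_connect(...); // connection arguments elided
    }
    return $this->link;
}

public function db_query($query) {
    $link = $this->connect();
    $ret = mysqli_query($link, $query);
    if (!$ret) {
        // Log before die(): nothing after die() is executed
        error_log(mysqli_error($link) . ' - ' . $query);
        die(mysqli_error($link));
    }
    return $ret;
}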
This reduces repetitive code and opens a connection only when you actually need one.
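Here is the same open-on-first-use idea as a driver-independent sketch that can be tested without a database. The hypothetical `LazyConnection` class takes a factory closure (an assumption made for illustration) and invokes it only the first time a connection is requested. Requests that never touch the database therefore never open a connection.

```php
<?php
// Hypothetical lazy wrapper: the factory runs at most once, on first use.
final class LazyConnection
{
    /** @var callable Creates the real connection, e.g. a closure around new PDO(...) */
    private $factory;
    /** @var mixed The connection once it has been opened */
    private $conn = null;
    /** @var bool Whether the factory has already been called */
    private $opened = false;

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    public function get()
    {
        if (!$this->opened) {
            // First request for a connection: open it now.
            $this->conn = ($this->factory)();
            $this->opened = true;
        }
        return $this->conn;
    }

    public function isOpen(): bool
    {
        return $this->opened;
    }
}
```

Keeping the driver call inside a closure means the wrapper does not care whether the connection is mysqli, PDO or something else.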
In your case, you then need to break off the smallest chunk you can to begin with. Ideally it should be a vertical slice, meaning the slice does almost all of its database work with the new code and very little with the old. This keeps the doubling up of database connections to a minimum and lets you build up some skills and experience too.
Beware, though: an ORM is by no means a panacea. You may hate SQL and find it fiddly and error-prone, but you are, for the most part, simply trading one set of problems for another. I personally think that while ORMs can be useful, they have been overhyped and are more of a false economy than many people either realize or are willing to admit. I wrote more on this in Using an ORM or plain SQL?
I'm not saying you shouldn't do it. Just don't go in thinking it'll solve all your problems. Also, since this rewrite won't actually change the functionality at all (from what you've described), I'm not sure the cost of doing it compares favourably with fixing what's already there. There are too many unknowns to say which way your situation will go.

Well, yes and no – your DB connections will only be doubled as long as you have both a non-PDO and a PDO connection.
I'm not sure what you mean by the PDO_SQLITE reference, since SQLite is a wholly different database from the PostgreSQL you seem to be using now.
You should be able to run your current queries through PDO::query just as you do today unless you are doing something very wrong :)
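If the legacy layer builds a `pg_connect()`-style connection string, switching it to PDO mostly means turning that string into the arguments of `new PDO($dsn, $user, $password)`. After that, existing SQL can go through `PDO::query()`. The helper below is a hypothetical sketch. It assumes simple whitespace-separated `key=value` pairs with no quoted values or escapes (libpq connection strings do allow those). It moves `user` and `password` out of the DSN and passes every other key through to the `pgsql:` DSN unchanged.

```php
<?php
/**
 * Hypothetical helper: turns a simple pg_connect() string such as
 * "host=localhost port=5432 dbname=app user=web password=secret"
 * into the three arguments of new PDO($dsn, $user, $password).
 * Assumes whitespace-separated key=value pairs with no quoting or escapes.
 *
 * @return array{0: string, 1: ?string, 2: ?string} [dsn, user, password]
 */
function pgConnStringToPdoArgs(string $connString): array
{
    $user = null;
    $password = null;
    $dsnParts = [];
    $pairs = preg_split('/\s+/', trim($connString), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($pairs as $pair) {
        if (strpos($pair, '=') === false) {
            throw new InvalidArgumentException("Not a key=value pair: $pair");
        }
        // Split on the first '=' only; the value may itself contain '='.
        [$key, $value] = explode('=', $pair, 2);
        if ($key === 'user') {
            $user = $value;
        } elseif ($key === 'password') {
            $password = $value;
        } else {
            $dsnParts[] = $key . '=' . $value;
        }
    }
    return ['pgsql:' . implode(';', $dsnParts), $user, $password];
}

// Usage (requires the pdo_pgsql extension and a reachable server):
// [$dsn, $user, $pass] = pgConnStringToPdoArgs($legacyConnString);
// $pdo = new PDO($dsn, $user, $pass);
// $result = $pdo->query($existingSql); // the same SQL the legacy layer ran
```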

Related

Mirror MySQL Database Schema with HTML5 local storage for querying

I've done some research on HTML5 local storage, and it seems plausible that I could mirror a MySQL database's structure for use in an application that needs a lot of data for only one person.
Why would I do this? In my spare time, I'm a web game developer: PHP, MySQL, and all the technologies to dress it up. So far I've built databases that support many players, but my games are intended to be "single-player with multi capabilities". And for games that are only intended to be played single-player, there's no point in even having a DB connection unless they're saving to a web server!
I want to achieve a single-player mode that will never touch my database, and will be available offline. However, the code behind all of this is still going to be making SQL queries. Ideally I imagine that I could set up a sort of abstraction layer of local storage that would respond to queries.
In short, I'm wondering what's out there. Searching for local storage and HTML5 gives you endless posts about the technologies, but I'm not certain that my idea will work well, or should even be attempted. Likewise, there could be frameworks out there that already handle this with ease; I've found nothing yet.
Update: The deprecation of Web SQL Database worries me. It looked very appealing for my situation; since it uses SQL, modifying my queries shouldn't be so difficult. Now, with the push toward IndexedDB, I'm not certain it will be as easy.
I've read your question a few times now and continue to wonder 'Why would you do this' ;-) The question brings up more questions for me, so...
What do you mean by 'a lot of data'? Ultimately, is the SQL metaphor appropriate in this case? An abstraction layer that can respond to SQL-like queries is interesting in and of itself, but sounds extremely complex. Could a simpler solution, like a JSON object, do the job? What about persisting the data when users clear the cache or history, reinstall the browser, etc.? Even a complex JavaScript literal or JSON object could provide a very straightforward means of occasional persistence/backup and recovery. (Interestingly, the just-released PostgreSQL 9.2 includes JSON as a data type. Could it be that SQL and NoSQL will gravitate toward some common ground?) Sorry if this comes off as more of a commentary than an answer; your question comes off as being on a higher plane.
EDIT
Googling 'javascript sql interpreter' turns up some interesting stuff:
http://www.terminally-incoherent.com/blog/2009/05/19/sql-emulation-tool-in-javascript-part-2/#comments
https://github.com/forward/sql-parser#readme
Generating a JavaScript SQL parser for SQLite3 (with Lemon? ANTLR3?)
http://jsdb.sourceforge.net/demo.html

Is the PDO Library faster than the native MySQL Functions?

I have read several questions regarding this but I fear they may be out of date as newer versions of the PDO libraries have been released since these questions were answered.
I have written a MySQL class that builds queries and escapes parameters, and then returns results based on the query. Currently this class is using the built-in mysql functions.
I am well aware of the advantages of using the PDO library, e.g. it is compatible with other databases, stored procedures are easier to execute, etc. However, what I would like to know is simply: is using the PDO library faster than using the built-in mysql functions?
I have just written the equivalent class for MSSQL, so rewriting it to work with all databases would not take me long at all. Is it worth it, or is the PDO library slower?
In many situations/projects I have found PDO to be even faster than the more native modules.
This is mainly because many patterns/building blocks in a "PDO application" require less PHP-script-driven code: more of the work is executed in the compiled extension, and there is a speed penalty for doing things in the script. Simple synthetic tests without data and error handling often do not cover this part. That is why (among other problems, e.g. measurement inaccuracies) I think conclusions like "10000x SELECT x FROM foo took 10ms longer" miss the point more often than not.
I can't provide you with solid benchmarks, and the outcome depends on how the surrounding application handles the data. But even synthetic tests usually show differences so negligible that you are better off spending your time optimizing your queries, the MySQL server, the network, etc., instead of worrying about PDO's raw performance. Not to mention security and error handling.
My observation is that PDO seems to be less tolerant of many consecutive connections, that is, connections being created in a loop. I know this is bad practice in the first place. When I was using mysql_*, my looped queries seemed reasonably fast. However, when I switched to PDO I noticed much longer response times for these types of queries.
TL;DR: If you switch to PDO and you call queries in a PHP loop, you may need to rewrite the application to issue one single query rather than many consecutive queries.
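Here is a minimal sketch of that rewrite, assuming the loop was fetching rows by ID. The hypothetical `buildInQuery()` helper builds one `SELECT ... WHERE id IN (?, ?, ...)` statement with one placeholder per value. The whole batch then becomes a single prepared statement instead of N round trips. Table and column names are illustrative. They must come from trusted code, because identifiers cannot be bound as placeholders.

```php
<?php
/**
 * Hypothetical helper: builds one "SELECT ... WHERE col IN (?, ?, ...)" query
 * plus its parameter list, replacing N single-row queries issued in a loop.
 * $table and $column must be trusted identifiers (they are not escaped here).
 *
 * @param list<int|string> $ids
 * @return array{0: string, 1: list<int|string>} [sql, params]
 */
function buildInQuery(string $table, string $column, array $ids): array
{
    if ($ids === []) {
        throw new InvalidArgumentException('At least one id is required');
    }
    // Drop duplicates so each value is sent only once.
    $ids = array_values(array_unique($ids));
    // One positional placeholder per distinct value.
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "SELECT * FROM {$table} WHERE {$column} IN ({$placeholders})";
    return [$sql, $ids];
}

// Usage with PDO (assumes an open $pdo connection):
// [$sql, $params] = buildInQuery('users', 'id', $userIds);
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```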

How expensive an operation is connecting to a Mysql database?

In certain functions of the code, PHP executes hundreds or, in some cases, thousands of queries on the same tables using a loop. Currently, it creates a new database connection for each query. How expensive is that operation? Would I see a significant speed increase by reusing the same connection? It could take quite a bit of refactoring to change this behavior and use a single connection.
The php uses mysql_connect to connect to the database.
Just based on what I've said here, are there other obvious optimizations that you would recommend (I've read about locking tables for example...)?
EDIT:
My question is more about the benefit of using a single connection, not how to avoid using more than one.
The documentation for mysql_connect states:
If a second call is made to mysql_connect() with the same arguments, no new link will be established, but instead, the link identifier of the already opened link will be returned.
So, unless you're connecting with different credentials, changing that part of your code will not affect performance.
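The documented `mysql_connect()` behaviour amounts to a cache keyed by the connection arguments. The `mysql_*` extension was removed in PHP 7.0, so after porting to mysqli or PDO it is safest to reuse the connection explicitly rather than rely on the driver. The hypothetical registry below does this: it returns the already-open connection when the same arguments are requested again. The factory closure stands in for the real driver call and is an assumption.

```php
<?php
// Hypothetical registry mimicking the documented mysql_connect() behaviour:
// identical arguments return the already-open connection instead of a new one.
final class ConnectionRegistry
{
    /** @var callable Opens a real connection from (host, user, password) */
    private $factory;
    /** @var array<string, object> Open connections keyed by their arguments */
    private $links = [];

    public function __construct(callable $factory)
    {
        $this->factory = $factory;
    }

    public function connect(string $host, string $user, string $password): object
    {
        // Key on all arguments so different credentials get separate links.
        $key = serialize([$host, $user, $password]);
        if (!isset($this->links[$key])) {
            $this->links[$key] = ($this->factory)($host, $user, $password);
        }
        return $this->links[$key];
    }

    // Number of distinct connections actually opened.
    public function count(): int
    {
        return count($this->links);
    }
}
```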
I use Zend Framework, and my database profiling shows that the connection itself takes nearly 10x longer than most of my queries. I connect to two different databases, and only once to each per request.
I'd say reconnecting for every query is poor design, but the question of refactoring is more complex than that. Questions that need to be asked:
Are there current performance problems?
Have you done code profiling to narrow down where the performance issues are occurring?
How much time will be required for this refactoring? Take into account the testing involved, not just coding time.
The answer to the original question should be obvious. If it's not obvious to you, it should still be obvious how to find out for yourself how much impact it has.
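Here is one way to find out, assuming you can wrap the connect call and a typical query in closures: time each with `hrtime()` (PHP 7.3+, a monotonic nanosecond clock) and compare the two. The `timeMs()` helper is hypothetical. Because single measurements are noisy, it averages over several runs.

```php
<?php
/**
 * Hypothetical timing helper: runs $fn $runs times and returns the
 * average wall-clock duration in milliseconds, using the monotonic hrtime().
 */
function timeMs(callable $fn, int $runs = 10): float
{
    if ($runs < 1) {
        throw new InvalidArgumentException('runs must be >= 1');
    }
    $start = hrtime(true); // nanoseconds as an int
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    $elapsedNs = hrtime(true) - $start;
    return $elapsedNs / $runs / 1e6;
}

// Usage (assumed credentials; compare the two numbers):
// $connectMs = timeMs(fn() => mysqli_connect('localhost', 'user', 'secret', 'app'));
// $link = mysqli_connect('localhost', 'user', 'secret', 'app');
// $queryMs = timeMs(fn() => mysqli_query($link, 'SELECT 1'));
```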
are there other obvious optimizations
No, because you haven't provided any details of the table structure or the queries you are running.

PHP and Databases: Views, Functions and Stored Procedures performance

I'm working on a PHP web application with PostgreSQL. All of the SQL queries are called from the PHP code; I've seen no views, functions or stored procedures. From what I've learned, it's always better to use these database subroutines, since they are stored in the database and have these advantages:
Encapsulation
Abstraction
Access rights (limited to DB Admin) and responsibility
Avoid compilation
I think I read something about performance improvements too. I really don't see why the team hasn't used these yet. In this particular case, I would like to know, from experience, is there any good reason to NOT use them?
Especially when there are so many SELECT queries throughout the code, why not use views?
I am planning on refactoring the code and start coding the subroutines on the DB Server. I would like to know opinions in favor or against this. The project is rather big (many tables), and expects lots of data to be stored. The amount of data you would have in a social network with some more stuff in it, so yeah, pretty big.
In my opinion, views and stored procedures are usually just extra trouble with little benefit.
I have written and worked with a bunch of different web apps, though none with bazillions of users. The ones with stored procedures are awkward. The ones with ad-hoc SQL queries are plenty fast (use placeholders and other best practices to avoid SQL injection). My favorites use database abstraction (an ORM), so your code deals with PHP classes and objects rather than directly with the database. I have increasingly been turning to the symfony framework for that.
Also, in general you should not optimize for performance prematurely. Optimize for good, fast development now (no stored procedures). Once it's working, benchmark your app, find the bottlenecks, and optimize them. Trying to optimize from the start just wastes time and adds complexity.
I. Views offer encapsulation, but if not carefully designed they can slow down the application. Use with caution.
II. Use functions if needed, no reason to put them in if they are unneeded.
III. Stored Procedures are a godsend, use them everywhere there is a static query!!
In response to the views vs. queries question: try to use views with stored procedures. The stored procedures will mitigate some of the performance hit taken with most views.
The advantage of stored procedures is that, because all the processing is done on the database, you do not incur network overhead shunting intermediate result sets back and forth.
The disadvantage is that each RDBMS system out there has its own peculiar syntax for stored procedures. By implementing your business logic in stored procedures, you're pretty much restricting your app to a single database product, something you need to keep in mind if you intend your application to be database independent. Also, as gahooa pointed out, because stored procedures live in the database, your access to them as a developer may be restricted by local policy; some organisations will only let DBAs touch the database.
@WolfmanDragon: I don't know if views inherently make things slower; your mileage may vary, I guess, depending on the complexity of the view and the RDBMS you're using. Plus, some RDBMSs allow you to materialise commonly used views so that accessing them is as fast as accessing a base table.
We try to use the features you mentioned only where there is a significant benefit.
Being part of the "Database", they fall under "schema changes", rather than "source code changes", and are naturally harder to version control.
Whatever you do, just make sure you retain full visibility of who-changed-what-when, so that you can diff, rollback, and recover in the case of problems.
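One common way to keep that visibility is to store each view/function/procedure change as a numbered SQL migration file in version control. A small runner then applies only the files not yet recorded as applied. This is an assumption, not necessarily what this answer's team does. The hypothetical `pendingMigrations()` below shows the bookkeeping. File names and the list of applied migrations (normally kept in a table in the database) are illustrative.

```php
<?php
/**
 * Hypothetical migration bookkeeping: given all migration names found in the
 * repository and the names already recorded as applied in the database,
 * return the ones still to run, in order.
 * Assumes zero-padded numeric prefixes, e.g. "0007_create_view_active_users.sql",
 * so that plain string sorting gives the correct order.
 *
 * @param list<string> $available
 * @param list<string> $applied
 * @return list<string>
 */
function pendingMigrations(array $available, array $applied): array
{
    $pending = array_diff($available, $applied);
    // sort() also reindexes the array from 0.
    sort($pending, SORT_STRING);
    return $pending;
}

// Usage (assumed layout: migrations/*.sql, applied names kept in a
// schema_migrations table):
// $available = array_map('basename', glob(__DIR__ . '/migrations/*.sql'));
// foreach (pendingMigrations($available, $appliedNames) as $file) {
//     $pdo->exec(file_get_contents(__DIR__ . "/migrations/$file"));
//     // ...then record $file in schema_migrations, ideally in the same transaction
// }
```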

Business Logic in PHP or MySQL?

On a site with a reasonable amount of traffic, would it matter if the application/business logic is written as stored procedures, triggers and views instead of inside the PHP code itself?
What would be the best way to go, keeping scalability in mind?
I can't provide you with statistics, but unless you plan to replace PHP with another language in the future, I can say that keeping the business logic in PHP is more "scalability friendly".
It's always easier and cheaper to solve web server load problems than database load problems. Your database will always need to be lightning quick, and just throwing mirrors at it won't solve the problem: the more database slaves you have, the more writes you have to do.
In my experience, you should put business logic in PHP code rather than move it onto the database. Assuming your database is on a separate server, you don't want your database to be busy calculating formulas when requests come in.
Keep your database lightning fast to handle selects, inserts and updates.
I think you will have far better scalability keeping database code in the database, where it can be performance-tuned as the number of records grows. You will also have better data integrity, which is critical to the data even being useful. You don't see a lot of terabyte-sized relational DBs with all their code in the application.
Read some books on database performance tuning and then decide if you want to risk your company's data on application code.
There are several things to consider when trying to decide whether to place the business logic in the database or in the application code.
Will the same database be accessed from different websites / web applications? Will the sites / applications be written in the same language or in a different language?
If the database will be used from a single site, and the site is written in a single language then this becomes a non-issue. Otherwise, you'll need to consider the added complexity of stored procedures, triggers, etc vs trying to maintain database access logic etc in multiple code bases.
What are relational databases in general good for, and what is MySQL good for specifically? What is PHP best at?
This consideration is fairly straightforward. Relational databases across the board, and specifically any variant of SQL, are going to do a great job at inserting, updating, and deleting data. Generally they also handle atomic transactions well. However, most variants of SQL (including MySQL) are not good at complex calculations, on-the-fly date handling, file system access, etc.
PHP on the other hand is very fast at handling calculations, dates, file system accesses. By taking a little time you can even design your PHP code to work in such a way that records are only retrieved once and then stored when necessary.
What are you most familiar / comfortable with using?
Obviously it tends to make more sense to use the tool with which you are most familiar.
As a last point consider that just because a drill can be used to cut sheet rock or because a hammer can be used to drive a screw doesn't mean that they should be used for these things. Sometimes I think that programmers do more potential damage by trying to make more powerful tools that do everything rather than making simpler tools that do one thing really, really well.
A well-done PHP application should be enough, but keep in mind that it also requires you to make as few calls to the database as you can. Store values you'll need later in PHP, shorten queries, cache, etc.
MySQL optimization is always a must, as it will also decrease the number of database calls made by PHP and thus improve performance. Therefore, you can't ignore stored procedures, etc., if your aim is to increase performance. But MySQL by itself won't be enough if your PHP code isn't well written (lots of unnecessary database calls). That's why I think the PHP must be well coded, with the whole process kept in mind while developing it, so that unnecessary stuff doesn't get in the way. Caching, for instance, in "duet" with a properly tuned MySQL, is a great performance boost.
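Storing values you'll need later can be as simple as a per-request memo. The first lookup for a key hits the database, and later lookups reuse the stored result. The `RequestCache` class and its loader closure are hypothetical. The cache lives only for the current request, so caching across requests would still need something like APCu or memcached.

```php
<?php
// Hypothetical per-request cache: each key is loaded at most once per request.
final class RequestCache
{
    /** @var array<string, mixed> */
    private $values = [];

    /**
     * Returns the cached value for $key, calling $loader (e.g. a DB query)
     * only the first time the key is requested.
     */
    public function remember(string $key, callable $loader)
    {
        // array_key_exists (not isset) so cached nulls are not reloaded.
        if (!array_key_exists($key, $this->values)) {
            $this->values[$key] = $loader();
        }
        return $this->values[$key];
    }
}

// Usage (assumed $pdo and query):
// $cache = new RequestCache();
// $user = $cache->remember("user:$id", function () use ($pdo, $id) {
//     $stmt = $pdo->prepare('SELECT * FROM users WHERE id = ?');
//     $stmt->execute([$id]);
//     return $stmt->fetch(PDO::FETCH_ASSOC);
// });
```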
My POV, even though I don't have much experience developing large applications, is to write business logic in the DB, for several reasons:
1 - Maintainability: languages deprecate functions and change many other things within a short period, so if PHP changes version, you'll need to adapt your code to the new version.
2 - DBs tend to be more stable as languages. When a new version of an RDBMS comes out, it usually changes little or nothing in the way you write your queries or SPs. Writing your logic in the DB will reduce the code adaptation needed because of a new DB version.
3 - An RDBMS is more likely than a programming language to stay alive for a long period. Also, since your data is critical, RDBMS developers take great care to migrate all of your data automatically to the new RDBMS version, including your SPs. When Clipper died, there was no way to migrate systems to a new programming language; they had to be completely rewritten.
4 - If you someday decide to change the application's language completely for some reason (language death, for example), the only things to be rewritten will be the presentation and the SP calls, not the business logic.
I'd like to know from other people here whether what I pointed out makes sense, and if not, why. I'm in the same situation as Sabeen Malik: I'm about to begin my first huge project, and I'm leaning towards SPs because of what I wrote. So it's time to correct my POV if it's not correct.
MySQL sucks at advanced DB techniques; it's simple and fast. PHP, being a dynamic language, makes processing data very easy. Therefore, it usually makes sense to use PHP.
