At the moment my code (PHP) has too many SQL queries in it. eg...
// not a real example, but you get the idea...
$results = $db->GetResults("SELECT * FROM sometable WHERE iUser=$userid");
if ($results) {
// Do something
}
I am looking into using stored procedures to reduce this and make things a little more robust, but I have some concerns:
I have hundreds of different queries in use around the web site, and many of them are quite similar. How should I manage all these queries when they are removed from their context (the code that uses the results) and placed in a stored procedure on the database?
The best course of action for you will depend on how you are approaching your data access. There are three approaches you can take:
Use stored procedures
Keep the queries in the code (but put all your queries into functions and fix everything to use PDO for parameters, as discussed below)
Use an ORM tool
If all you want to do is get the raw SQL out of your PHP code while keeping it relatively unchanged, stored procedures would be the way to go. The stored procedures vs raw SQL debate is a bit of a holy war, but K. Scott Allen makes an excellent point - albeit a throwaway one - in an article about versioning databases:
Secondly, stored procedures have fallen out of favor in my eyes. I came from the WinDNA school of indoctrination that said stored procedures should be used all the time. Today, I see stored procedures as an API layer for the database. This is good if you need an API layer at the database level, but I see lots of applications incurring the overhead of creating and maintaining an extra API layer they don't need. In those applications stored procedures are more of a burden than a benefit.
I tend to lean towards not using stored procedures. I've worked on projects where the DB has an API exposed through stored procedures, but stored procedures can impose some limitations of their own, and those projects have all, to varying degrees, used dynamically generated raw SQL in code to access the DB.
Having an API layer on the DB gives better delineation of responsibilities between the DB team and the Dev team, at the expense of some of the flexibility you'd have if the query were kept in the code. However, PHP projects are less likely to have teams sizable enough to benefit from this delineation.
Conceptually, you should probably have your database versioned. Practically speaking, however, you're far more likely to have just your code versioned. You are likely to change your queries when you change your code, but if those queries live in stored procedures stored against the database, you probably won't check them in when you check the code in, and you lose many of the benefits of versioning for a significant area of your application.
Regardless of whether you elect to use stored procedures, you should at the very least ensure that each database operation lives in an independent function rather than being embedded into each of your page's scripts - essentially an API layer for your DB which is maintained and versioned with your code. If you're using stored procedures, this effectively means you have two API layers for your DB, one with the code and one with the DB, which you may feel unnecessarily complicates things if your project does not have separate teams. I certainly do.
If the issue is one of code neatness, there are ways to make code with SQL jammed into it more presentable, and the UserManager class shown below is a good way to start. The class only contains queries which relate to the 'user' table, each query has its own method in the class, and the queries are indented within the prepare statements and formatted as you would format them in a stored procedure.
// UserManager.php:
class UserManager
{
    function getUsers()
    {
        $pdo = new PDO(...);
        $stmt = $pdo->prepare('
            SELECT u.userId as id,
                   u.userName,
                   g.groupId,
                   g.groupName
            FROM `user` u
            INNER JOIN `group` g
                ON u.groupId = g.groupId
            ORDER BY u.userName, g.groupName
        ');
        $stmt->execute();
        // return the result rows as associative arrays
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    function getUser($id) {
        // db code here
    }
}
// index.php:
require_once("UserManager.php");
$um = new UserManager;
$users = $um->getUsers();
foreach ($users as $user) echo $user['userName'];
However, if your queries are quite similar but you have huge numbers of permutations in your query conditions like complicated paging, sorting, filtering, etc, an Object/Relational mapper tool is probably the way to go, although the process of overhauling your existing code to make use of the tool could be quite complicated.
If you decide to investigate ORM tools, you should look at Propel, the ActiveRecord component of Yii, or the king-daddy PHP ORM, Doctrine. Each of these gives you the ability to programmatically build queries to your database with all manner of complicated logic. Doctrine is the most fully featured, allowing you to template your database with things like the Nested Set tree pattern out of the box.
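For example, building a query programmatically with Doctrine 1's fluent interface might look like this - a hypothetical sketch, assuming a 'User' model with a 'Group' relation has already been defined:

// hypothetical sketch: the User model and its Group relation are assumptions
$users = Doctrine_Query::create()
    ->from('User u')
    ->leftJoin('u.Group g')
    ->orderBy('u.userName')
    ->execute();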
In terms of performance, stored procedures are the fastest, but generally not by much over raw SQL. ORM tools can have a significant performance impact in a number of ways: inefficient or redundant querying, huge file IO while loading the ORM libraries on each request, dynamic SQL generation on each query... all of these things can have an impact, but the use of an ORM tool can drastically increase the power available to you with a much smaller amount of code than creating your own DB layer with manual queries.
Gary Richardson is absolutely right though, if you're going to continue to use SQL in your code you should always be using PDO's prepared statements to handle the parameters regardless of whether you're using a query or a stored procedure. The sanitisation of input is performed for you by PDO.
// optional
$attrs = array(PDO::ATTR_PERSISTENT => true);
// create the PDO object
$pdo = new PDO("mysql:host=localhost;dbname=test", "user", "pass", $attrs);
// also optional, but it makes PDO raise exceptions instead of
// PHP errors which are far more useful for debugging
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $pdo->prepare('INSERT INTO venue(venueName, regionId) VALUES(:venueName, :regionId)');
$stmt->bindValue(":venueName", "test");
$stmt->bindValue(":regionId", 1);
$stmt->execute();
$lastInsertId = $pdo->lastInsertId();
var_dump($lastInsertId);
Caveat: assuming that the ID is 1, the above script will output string(1) "1". PDO->lastInsertId() returns the ID as a string regardless of whether the actual column is an integer or not. This will probably never be a problem for you as PHP performs casting of strings to integers automatically.
The following will output bool(true):
// regular equality test
var_dump($lastInsertId == 1);
but if you have code that is expecting the value to be an integer, like is_int or PHP's "is really, truly, 100% equal to" operator:
var_dump(is_int($lastInsertId));
var_dump($lastInsertId === 1);
you could run into some issues.
Edit: Some good discussion on stored procedures here
First up, you should use placeholders in your query instead of interpolating the variables directly. PDO/MySQLi allow you to write your queries like:
SELECT * FROM sometable WHERE iUser = ?
The API will safely substitute the values into the query.
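A minimal sketch of the placeholder version of the original query, assuming a PDO connection in $pdo:

$stmt = $pdo->prepare('SELECT * FROM sometable WHERE iUser = ?');
$stmt->execute(array($userid));
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);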
I also prefer to have my queries in the code instead of the database. It's a lot easier to work with an RCS when the queries are with your code.
I have a rule of thumb when working with ORM's: if I'm working with one entity at a time, I'll use the interface. If I'm reporting/working with records in aggregate, I typically write SQL queries to do it. This means there's very few queries in my code.
I had to clean up a project with many (duplicate/similar) queries riddled with injection vulnerabilities.
The first steps I took were using placeholders and labelling every query with the object/method and source line where the query was created.
(Insert the PHP constants __METHOD__ and __LINE__ into a SQL comment line.)
It looked something like this:
-- #Line:151 UserClass::getuser():
SELECT * FROM USERS;
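A minimal sketch of the labelling step, assuming a $db wrapper with a query() method:

// prepend the origin of the query as a SQL comment before sending it
$sql = sprintf("-- #Line:%d %s:\n", __LINE__, __METHOD__) . $sql;
$db->query($sql);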
Logging all queries for a short time supplied me with some starting points on which queries to merge. (And where!)
I'd move all the SQL to a separate Perl module (.pm). Many queries could reuse the same functions, with slightly different parameters.
A common mistake for developers is to dive into ORM libraries, parametrized queries and stored procedures. We then work for months in a row to make the code "better", but it's only "better" in a development kind of way. You're not making any new features!
Use complexity in your code only to address customer needs.
Use an ORM package; any half-decent package will allow you to:
Get simple result sets
Keep your complex SQL close to the data model
If you have very complex SQL, then views are also nice for making it more presentable to different layers of your application.
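For example, a complex join can be hidden behind a view once, leaving the application-side query trivial - a hypothetical sketch, where the table and view names are assumptions:

$pdo->exec('
    CREATE VIEW user_overview AS
    SELECT u.userId, u.userName, g.groupName
    FROM `user` u
    INNER JOIN `group` g ON g.groupId = u.groupId
');
$rows = $pdo->query('SELECT * FROM user_overview')->fetchAll(PDO::FETCH_ASSOC);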
We were in a similar predicament at one time. We queried a specific table in a variety of ways, over 50+.
What we ended up doing was creating a single Fetch stored procedure that takes a parameter value for the WhereClause. The WhereClause was constructed in a Provider object; we employed the Facade design pattern, which gave us a single place to scrub it for any SQL injection attacks.
So as far as maintenance goes, it is easy to modify. SQL Server is also quite the chum and caches the execution plans of dynamic queries, so the overall performance is pretty good.
You'll have to determine the performance drawbacks based on your own system and needs, but all in all, this works very well for us.
There are some libraries, such as MDB2 in PEAR that make querying a bit easier and safer.
Unfortunately, they can be a bit wordy to set up, and you sometimes have to pass them the same info twice. I've used MDB2 in a couple of projects, and I tended to write a thin veneer around it, especially for specifying the types of fields. I generally make an object that knows about a particular table and its columns, and then a helper function in that fills in field types for me when I call an MDB2 query function.
For instance:
function MakeTableTypes($TableName, $FieldNames)
{
    $Types = array();
    foreach ($FieldNames as $FieldName => $FieldValue)
    {
        $Types[] = $this->Tables[$TableName]['schema'][$FieldName]['type'];
    }
    return $Types;
}
Obviously this object has a map of table names -> schemas that it knows about, and just extracts the types of the fields you specify, returning a matching type array suitable for use with an MDB2 query.
MDB2 (and similar libraries) then handle the parameter substitution for you, so for update/insert queries, you just build a hash/map from column name to value, and use the 'autoExecute' functions to build and execute the relevant query.
For example:
function UpdateArticle($Article)
{
    $Types = $this->MakeTableTypes($table_name, $Article);
    $res = $this->MDB2->extended->autoExecute($table_name,
                                              $Article,
                                              MDB2_AUTOQUERY_UPDATE,
                                              'id = '.$this->MDB2->quote($Article['id'], 'integer'),
                                              $Types);
}
and MDB2 will build the query, escaping everything properly, etc.
I'd recommend measuring performance with MDB2 though, as it pulls in a fair bit of code that might cause you problems if you're not running a PHP accelerator.
As I say, the setup overhead seems daunting at first, but once it's done the queries can be simpler/more symbolic to write and (especially) modify. I think MDB2 should know a bit more about your schema, which would simplify some of the commonly used API calls, but you can reduce the annoyance of this by encapsulating the schema yourself, as I mentioned above, and providing simple accessor functions that generate the arrays MDB2 needs to perform these queries.
Of course you can just do flat SQL queries as a string using the query() function if you want, so you're not forced to switch over to the full 'MDB2 way' - you can try it out piecemeal, and see if you hate it or not.
This other question also has some useful links in it...
Use an ORM framework like QCodo - you can easily map your existing database.
I try to use fairly generic functions and just pass the differences in as parameters. This way you only have one function to handle most of your database SELECTs. Obviously you can create another function to handle all your INSERTs.
eg.
function getFromDB($table, $wherefield=null, $whereval=null, $orderby=null) {
    if ($wherefield != null) {
        $q = "SELECT * FROM $table WHERE $wherefield = '$whereval'";
    } else {
        $q = "SELECT * FROM $table";
    }
    if ($orderby != null) {
        $q .= " ORDER BY ".$orderby;
    }
    $result = mysql_query($q) or die("ERROR: ".mysql_error());
    $records = array();
    while ($row = mysql_fetch_assoc($result)) {
        $records[] = $row;
    }
    return $records;
}
This is just off the top of my head, but you get the idea. To use it just pass the function the necessary parameters:
eg.
$blogposts = getFromDB('myblog', 'author', 'Lewis', 'date DESC');
In this case $blogposts will be an array of arrays which represent each row of the table. Then you can just use a foreach or refer to the array directly:
echo $blogposts[0]['title'];
Related
There is a long-lived, large PHP project that a number of different developers have worked on at different times. There may well be places where duplicate database queries were introduced during bug fixing: a developer, solving a local problem, could have written and executed a query returning data that had already been fetched. Now the project has a DB performance problem, and in light of it I have a question:
Are there any tools (besides General Log) that allow you to see what database queries were made as part of a single PHP script execution?
The question is related to the fact that there are a lot of API endpoints in the project and it will take a long time to check them all just by reading the code (which is sometimes very ornate).
Most PHP frameworks use a single connection interface - some kind of wrapper - that makes it possible to log all queries through that interface.
If no such wrapper exists, another approach would be to enable logging at the MySQL level. But for some, probably legitimate, reason you may not want to do that either. Note that if you want to prevent downtime, you can enable query logging without restarting the MySQL server.
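A minimal sketch of turning the general query log on at runtime (MySQL 5.1+, requires the SUPER privilege; the connection object and log file path are assumptions):

$pdo->exec("SET GLOBAL general_log_file = '/var/log/mysql/general.log'");
$pdo->exec("SET GLOBAL general_log = 'ON'");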
If none of the above solutions is possible you have to get your hands dirty :-)
One way could be to add your own logging in the PHP code.
Writing new lines to a file is not a performance issue if you write to the end of the file.
I'm not sure how you query the database, but if you..
A) ..use the procedural mysqli functions
If you use the procedural mysqli functions like mysqli_query() to call the database, you could create your own global custom function which writes to the log file and then calls the real function.
An example:
Add the new global function:
function _mysqli_query($link, $query, $resultmode = MYSQLI_STORE_RESULT)
{
    // write new line to end of log-file
    file_put_contents('queries.log', $query . "\n", FILE_APPEND);
    // call real mysqli_query
    return mysqli_query($link, $query, $resultmode);
}
Then do a project-wide search and replace for mysqli_query( with _mysqli_query(.
Then a line like this:
$result = mysqli_query($conn, 'select * from users');
Would look like this after the search and replace:
$result = _mysqli_query($conn, 'select * from users');
B) ..use the mysqli class (object oriented style)
If you use the object oriented style by instantiating the mysqli class and then calling the query methods, e.g. $mysqli->query("select * from users"), an approach could be to create your own database class which extends the mysqli class, and inside the specific methods (e.g. the query function) add the logging like in the above example.
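A minimal sketch of that approach - the class name and log file are assumptions:

class LoggingMysqli extends mysqli
{
    public function query($query, $resultmode = MYSQLI_STORE_RESULT)
    {
        // write the query to the end of the log-file, then delegate to the parent
        file_put_contents('queries.log', $query . "\n", FILE_APPEND);
        return parent::query($query, $resultmode);
    }
}

$mysqli = new LoggingMysqli('localhost', 'user', 'pass', 'test');
$result = $mysqli->query('select * from users');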
In the highly recommended book "High Performance MySQL" from O'Reilly they walk through code examples of how to do this and add extra debug information (timing etc.). Some of the pages are accessible at Google Books.
In general
I would consider using this as an opportunity to refactor some of the codebase to the better.
If you have written automated tests that is a good way to ensure that the refactoring did not break anything.
If you do not practice automated testing you have to test the whole application manually.
You said that you are afraid that your performance issues comes from duplicate database queries, which might very well be the case.
In most cases, though, I find the cause to be missing indexes or individual queries that need rewriting.
The performance schema in MySQL can help you debug these things if you find it realistic that it could be other causes to the performance issues.
I've created the following code and I can't seem to find out what's wrong with it; any help would be greatly appreciated.
<?php
$con = mysql_connect("host", "user", "password");
mysql_select_db("Stores", $con);

$street = mysql_query("SELECT Street FROM Sports WHERE Name = 'Nike'");

if (mysql_query("CREATE DATABASE $street", $con)) {
    echo "Success";
} else {
    echo "Fail";
}

mysql_close($con);
I keep getting "Fail" - can anyone point me in the right direction?
There are many problems with this code:
this may not be very relevant at first, but the php mysql extension is marked deprecated, please switch to mysqli or better still, PDO as is also indicated in the official documentation. Deprecated means that it's only there to support legacy code (and that implies that it should not be used in new code) and that it will disappear at some point in the future. I suggest PDO as it's a much more mature, flexible and universal database interface, if you are going to invest time in learning something, PDO is definitely the way to go. Pay special attention to prepared statements, they will help you avoid all kinds of problems without having to fumble with the various escape functions.
mysql_query returns a query result (as a resource) that you should read, e.g. with some kind of fetch command - see the doc for PDO's query() method here
I haven't got the slightest idea why you would want to create a new database on your server named after the street you found via a query. You should probably explain what your intention is; I'm quite sure this isn't going to do what you want it to do.
EDIT based on the comments:
For me the main advantages of switching to PDO are
future-proofing: with the mysql_ extension you'll upgrade PHP one day and the extension will be gone, forever. That won't happen with PDO anytime soon.
support for multiple databases: traditionally each database brand was served by its own proprietary extension in PHP. PDO unifies all these extensions, which means that with one API you can work with most common databases in existence. Note that it doesn't iron out dialect differences between these databases, but having one API for them all is definitely a big plus.
PDO actively encourages you to use prepared statements, which are generally recognized as a bulletproof yet simple protection against all kinds of security issues like SQL injection - no need for all that escaping nonsense that you'll need to do to make regular SQL statements somewhat safe.
Regarding the remark about creating a database for each user, I really think you need to go through some introductory material on relational databases. Most commonly an application is backed by one database containing, in most cases, a fixed set of tables (e.g. store, customer, order, orderitems) with relations between them. E.g. a store has many customers, each of which has one or more orders, each of which contains one or more items. All data is fetched by utilizing these relations via queries, using joins to extract for example all items belonging to one order, or to list all orders associated with one customer. The important thing here is that in all but the most exceptional cases there's just one table per data type. That is, all customers are stored in just one table, all orders are stored in just one table, and so on.
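A minimal sketch of that structure (table and column names are assumptions):

$pdo->exec('CREATE TABLE customer (
    customerId INT PRIMARY KEY,
    name VARCHAR(100)
)');
$pdo->exec('CREATE TABLE orders (
    orderId INT PRIMARY KEY,
    customerId INT,
    FOREIGN KEY (customerId) REFERENCES customer(customerId)
)');

// all orders for one customer come from a join/filter,
// not from a separate database per customer
$stmt = $pdo->prepare('SELECT * FROM orders WHERE customerId = ?');
$stmt->execute(array(42));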
I have no time to read all of it, but databaseprimer.com might be helpful to get you started with these concepts.
mysql_query will return a resource. You have to fetch the data first using mysql_fetch_assoc().
Add this line after your query:
$fetch = mysql_fetch_assoc($street);
Then in your second query change $street to $fetch['Street'].
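Putting both fixes together, the relevant part of the script would look something like this sketch (still using the deprecated mysql_ functions from the original code):

$street = mysql_query("SELECT Street FROM Sports WHERE Name = 'Nike'");
$fetch = mysql_fetch_assoc($street);

if (mysql_query("CREATE DATABASE `" . $fetch['Street'] . "`", $con)) {
    echo "Success";
} else {
    echo "Fail";
}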
I want to access the DB only once and store the results in local memory/cache for future actions.
E.g. I want to execute this query first:
$query = "select * from songs";
and store the results locally so that I can use them later to execute queries like this:
$query1 = "select * from songs order by song_id asc";
$query2 = "select song_name from songs where song_id='5'";
I am using PHP to access the DB.
Here are some ideas:
Use a "local" in-process SQL-database such as SQLite. SQLite is very fast (in many [trivial] cases it performs faster than MySQL, PostgreSQL or Firebird) and supports a good subset of SQL-92 syntax, but there is extra overhead to populate the "data-set" (SQLite DB) and building any indices, etc. This approach allows for great flexibility and allows persistence.
Use memcached or similar. However, memcached won't fit the model envisioned. In the models where memcached does work, an approach like this can work really well.
One could "just" store the objects/rows and then perform the query manually over the in-memory array (without the SQL syntax). See the sketch after this list.
There may be some transparent caching systems such as the MTCache research project. I have no experience with these and do not know if there is a viable solution [for PHP].
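A minimal sketch of the third idea, assuming the rows were fetched once into $songs via PDO:

$songs = $pdo->query('SELECT * FROM songs')->fetchAll(PDO::FETCH_ASSOC);

// equivalent of "select * from songs order by song_id asc"
usort($songs, function ($a, $b) { return $a['song_id'] - $b['song_id']; });

// equivalent of "select song_name from songs where song_id='5'"
$names = array();
foreach ($songs as $row) {
    if ($row['song_id'] == 5) {
        $names[] = $row['song_name'];
    }
}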
All of the above suggestions depend heavily upon the exact context/situation and may actually interfere with performance and/or 'correct results'. Remember that MySQL already performs a good bit of internal work to ensure the data pages/cache are 'hot' and may perform other caching as it sees fit; coupled with good indices, I could not actually recommend any of the above except for "offline" operations or cases where a performance analysis indicated it was required to meet functional requirements.
Happy coding.
I'm working on what is turning out to be a fairly complex SELECT query. I have several hierarchical queries being nested in a single SELECT and it is getting to be quite difficult to manage.
I'm running into a few places where my inline views need to be executed in more than one place, so it seems like a reasonable idea to execute those once at the beginning of a stored procedure and then do some iteration over the results as needed.
I'm wondering if there are any reasons why I should not, or could not, execute an Oracle Stored Procedure, called via my PHP code, and return as an OUT parameter the resultset. I've tended to use SPs only to do updates/deletes/inserts but the sheer size and complexity of this query seems like it needs to be broken down.
If there aren't any technical problems with this, any comments on whether it is good or bad practice?
I'm working on what is turning out to be a fairly complex SELECT query. I have several hierarchical queries being nested in a single SELECT and it is getting to be quite difficult to manage.
Ok, but why a stored procedure? Why not create a view instead?
I'm running into a few places where my inline views need to be executed in more than one place, so it seems like a reasonable idea to execute those once at the beginning of a stored procedure and then do some iteration over the results as needed.
Again - excellent use case for a view.
I'm wondering if there are any reasons why I should not, or could not, execute an Oracle Stored Procedure, called via my PHP code, and return as an OUT parameter the resultset.
If there aren't any technical problems with this, any comments on whether it is good or bad practice?
Well, I don't want to start a religious war, and I do not want to suggest the arguments against apply to your case. But here goes:
one reason why I tend to avoid stored procedures is portability - by that I mean mostly database portability. Stored procedure languages are notoriously unportable across dbs, and built-in libs like Oracle packages make things worse in that respect.
stored procedures take some additional processing power from your database server. This makes it harder to scale the application as a whole: if the capacity of your db server is exhausted due to stored procedures, and you need to upgrade hardware or even buy an extra Oracle software license because of that, I would not be a happy camper, especially if I could have bought cheap webserver/PHP boxes instead to do the computing.
Reasons where I would go for stored procedures:
language portability. If database portability is not so much an issue, but you do want to reuse logic across multiple applications, or have the ability to code in different languages, then stored procedures may save you from writing language-specific database invocation code.
complex permission scenarios. Stored procedures give you an extra level of permissions, since you can execute the procedure with the privileges of the definer or owner of the stored procedure. Sometimes this solves problems where a user needs to work with some tables, but cannot be allowed direct access to them.
saving roundtrips: if you have to deal with complex, multi-statement transactions, putting them in a stored procedure saves roundtrips between the app and the db, because only one roundtrip is needed to execute the stored procedure. Sometimes this can get you better performance.
I want to stress again that in all these scenarios, I would still advise to not put all your procedural logic in stored procedures. databases are best at storing and retrieving data, languages like php/java/perl/pick your poison are better at processing it.
If you are using the same inline view many times, it's a good candidate for a WITH clause, as in the sketch below.
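A hypothetical sketch of factoring a repeated inline view out with a WITH clause (the table and column names are assumptions):

$sql = '
    WITH dept_totals AS (
        SELECT deptId, SUM(salary) AS total_salary
        FROM employees
        GROUP BY deptId
    )
    SELECT e.empName, t.total_salary
    FROM employees e
    JOIN dept_totals t ON t.deptId = e.deptId
';
$stmt = oci_parse($conn, $sql);
oci_execute($stmt);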
PHP can handle resultsets returned from stored procedures, by using Ref Cursors. The Oracle+PHP Cookbook has an example.
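A minimal sketch of that approach with OCI8 - the package and procedure names are assumptions:

$stmt = oci_parse($conn, 'BEGIN my_pkg.get_users(:rc); END;');
$cursor = oci_new_cursor($conn);
oci_bind_by_name($stmt, ':rc', $cursor, -1, OCI_B_CURSOR);
oci_execute($stmt);
oci_execute($cursor); // the returned cursor must itself be executed before fetching
while ($row = oci_fetch_assoc($cursor)) {
    print_r($row);
}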
So there are no technical impediments but as you can see from the various answers there are some philosophical aspects to your question. I think we can agree that if you are already wrapping some SQL statements in stored procedures - which you are - then you are not drastically compromising the portability of your system by extending "updates/deletes/inserts" to include selects.
The pertinent question then becomes "should you use a stored procedure for this particular query?" The answer hinges on precisely what you mean by:
the sheer size and complexity of this query seems like it needs to be broken down.
Deconstructing a big query into several smaller queries and then stitching results together in PL/SQL is seductive, but should be approached with caution. This can degrade the performance of your application, because PL/SQL has more overheads than SQL. Making your query more readable is not a good enough reason: you need to be certain that the complexity has a real and adverse effect on the running of your code.
A good reason for using a stored procedure rather than a view might be if you want to extend the applicability of the query by using bind variables or dynamic SQL in the body of the query.
A definitive answer to your question requires more details regarding the nature of your query and the techniques you are thinking of using to simplify it.
You could look at subquery factoring which may improve the readability of the query.
One risk of breaking up a single SQL query into a more procedural solution is that you lose read consistency. As such, you want to be pretty sure that someone changing data while your procedure runs won't break it. You may want to lock a table for the duration of the procedure call. It seems drastic, but if you are pretty sure that the data is static, and if there would be ugly side-effects if it wasn't, then it is a solution.
Generally if an SQL statement is complex enough, it probably isn't portable between databases anyway, so I wouldn't worry about that aspect.
Views can be a good option to hide complexity, but the downside to hiding complexity is that people start doing things that seem 'simple' but are really complex and don't work as desired. You also get another object to consider for grants etc. [Edit: As Roland commented, this applies equally to stored procedures, views, object types etc.]
If you expect to return a large resultset, you should consider a pipelined table function. That way you can avoid having the entire resultset in the Oracle session at the same time.
Is it generally better to run functions on the webserver, or in the database?
Example:
INSERT INTO example (hash) VALUE (MD5('hello'))
or
INSERT INTO example (hash) VALUE ('5d41402abc4b2a76b9719d911017c592')
Ok so that's a really trivial example, but for scalability when a site grows to multiple websites or database servers, where is it best to "do the work"?
I try to think of the database as the place to persist stuff only, and put all abstraction code elsewhere. Database expressions are complex enough already without adding functions to them.
Also, the query optimizer will trip over any expressions with functions if you should ever end up wanting to do something like "SELECT .... WHERE MD5(xxx) = ... "
And database functions aren't very portable in general.
I try to use functions in my scripting language whenever calculations like that are required. I keep my SQL function usage down to a minimum, for a number of reasons.
The primary reason is that my one SQL database is responsible for hosting multiple websites. If the SQL server were to get bogged down with requests from one site, it would adversely affect the rest. This is even more important to consider if you are working on a shared server for example, although in this case you have little control over what the other users are doing.
The secondary reason is that I like my SQL code to be as portable as possible. I don't even want to try to count the different flavors of SQL that exist, so I try to keep functions (especially non-standard extensions) out of my SQL code, except for things like SUM or MIN/MAX.
I guess what I'm saying is, SQL is designed to store and retrieve data, and it should be kept to that purpose. Use your serving language of choice to perform any calculations beforehand, and keep your SQL code portable.
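For the MD5 example from the question, that means something like this minimal sketch, assuming a PDO connection in $pdo:

// compute the hash in PHP and bind it as a plain value,
// keeping the SQL itself function-free and portable
$stmt = $pdo->prepare('INSERT INTO example (hash) VALUES (?)');
$stmt->execute(array(md5('hello')));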
Personally, I try to keep the database as simple as possible: inserts, updates and deletes, without much logic that could live in code instead. Stored procedures are the same: they should contain only tasks that are very close to persisting data, not business logic.
I would put the MD5 outside. This keeps that kind of data manipulation outside the storage scope of the database.
But your example is quite "easy", and I do not think it's bad to have it inside...
Use your database as a means of persisting and maintaining data integrity, and leave business logic outside of it.
If you put business logic, any of it, in your database, you are making it more complex to manage and maintain in the future.
I think most of the time, you're going to want to leave the data manipulation to the webserver but, if you want to process databases with regards to tables, relations, etc., then go for the DB.
I'm personally lobbying my company to upgrade our MySQL server to 5.0 so that I can start taking advantage of procedures (the lack of which is killing a couple of sites we administer).
Like the other answers so far, I prefer to keep all the business logic in one place. Namely, my application language. (More specifically, in the object model, if one is present, but not all code is OO.)
However, if you look around StackOverflow for (my)sql-tagged questions about whether to use inline SQL or stored procedures, you'll find that most of the people responding are strongly in favor of using stored procs whenever and wherever possible, even for the most trivial queries. You may want to check out some of those questions to see some of the arguments favoring the other approach.