Is it generally better to run functions on the webserver, or in the database?
Example:
INSERT INTO example (hash) VALUE (MD5('hello'))
or
INSERT INTO example (hash) VALUE ('5d41402abc4b2a76b9719d911017c592')
Ok so that's a really trivial example, but for scalability when a site grows to multiple websites or database servers, where is it best to "do the work"?
I try to think of the database as the place to persist stuff only, and put all abstraction code elsewhere. Database expressions are complex enough already without adding functions to them.
Also, the query optimizer will trip over any expressions with functions if you should ever end up wanting to do something like "SELECT .... WHERE MD5(xxx) = ... "
And database functions aren't very portable in general.
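To illustrate the optimizer point, here is a minimal sketch. It assumes PHP with the PDO SQLite driver, uses an in-memory SQLite database as a stand-in for the real server, and uses lower() in place of MD5() because SQLite has no MD5(). The index name and the query_plan() helper are made up for the example. Other databases report their plans differently, so take it as a demonstration of the idea, not of MySQL's exact behaviour.

```php
<?php
// Sketch only: SQLite (via PDO) stands in for the real database, and
// lower() stands in for MD5(), which SQLite does not provide.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE example (id INTEGER PRIMARY KEY, hash TEXT)');
$pdo->exec('CREATE INDEX idx_example_hash ON example (hash)');

// Hypothetical helper: return the optimizer's plan for a query as one string.
function query_plan(PDO $pdo, string $sql): string
{
    $rows = $pdo->query('EXPLAIN QUERY PLAN ' . $sql)->fetchAll(PDO::FETCH_ASSOC);
    return implode('; ', array_column($rows, 'detail'));
}

// Comparing the bare column lets the optimizer seek into the index.
$plain = query_plan($pdo, "SELECT id FROM example WHERE hash = 'abc'");

// Wrapping the column in a function means every row has to be examined.
$wrapped = query_plan($pdo, "SELECT id FROM example WHERE lower(hash) = 'abc'");

echo $plain, PHP_EOL, $wrapped, PHP_EOL;
```

The first plan should be an index search and the second a full scan, which is the cost you pay for putting the function on the column side of the comparison.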
I try to use functions in my scripting language whenever calculations like that are required. I keep my SQL function usage down to a minimum, for a number of reasons.
The primary reason is that my one SQL database is responsible for hosting multiple websites. If the SQL server were to get bogged down with requests from one site, it would adversely affect the rest. This is even more important to consider if you are working on a shared server for example, although in this case you have little control over what the other users are doing.
The secondary reason is that I like my SQL code to be as portable as possible. I don't even want to try to count the different flavors of SQL that exist, so I try to keep functions (especially non-standard extensions) out of my SQL code, except for things like SUM or MIN/MAX.
I guess what I'm saying is, SQL is designed to store and retrieve data, and it should be kept to that purpose. Use your serving language of choice to perform any calculations beforehand, and keep your SQL code portable.
Personally, I try to keep the database as simple as possible: INSERT, UPDATE and DELETE, without many functions that could live in code instead. The same goes for stored procedures: they should contain only tasks that are very close to data persistence, not business logic.
I would put the MD5 outside. This lets me keep this "data manipulation" outside the storage scope of the database.
But, your example is quite "easy" and I do not think it's bad to have it inside...
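A minimal sketch of "MD5 outside", assuming PHP with the PDO SQLite driver and an in-memory SQLite database standing in for MySQL. The table mirrors the one in the question. The digest is computed by PHP's md5() and the database only stores the finished value.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE example (hash TEXT)');

// Compute the digest on the web server...
$hash = md5('hello');

// ...so the database only has to store a ready-made value.
$stmt = $pdo->prepare('INSERT INTO example (hash) VALUES (?)');
$stmt->execute([$hash]);

$stored = $pdo->query('SELECT hash FROM example')->fetchColumn();
echo $stored, PHP_EOL;
```

The stored value is the same string as in the second INSERT of the question, and the SQL itself contains no database-specific function.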
Use your database as a means of persisting data and maintaining data integrity, and leave business logic outside of it.
If you put business logic, any of it, in your database, you are making it more complex to manage and maintain in the future.
I think most of the time, you're going to want to leave the data manipulation to the webserver but, if you want to process databases with regards to tables, relations, etc., then go for the DB.
I'm personally lobbying my company to upgrade our MySQL server to 5.0 so that I can start taking advantage of procedures (the lack of which is killing a couple of sites we administer).
Like the other answers so far, I prefer to keep all the business logic in one place. Namely, my application language. (More specifically, in the object model, if one is present, but not all code is OO.)
However, if you look around StackOverflow for (my)sql-tagged questions about whether to use inline SQL or stored procedures, you'll find that most of the people responding to those are strongly in favor of using stored procs whenever and wherever possible, even for the most trivial queries. You may want to check out some of those questions to see some of the arguments favoring the other approach.
I don't know if this is a dumb question, but I have two doubts that maybe you can help me clear up:
If my database and web server are on the same host, is there any relevant benefit on putting my procedures for conditionally selecting (using more than one SQL query) elements from a table in a SQL database procedure instead of just implementing them in a webserver-side script (in my case PHP) method with the rest of the web application code?
Secondly, and maybe even more important: Am I breaking any design rules doing / not doing this?
More specifically, I made a PHP script to select a random row from a table according to a probability density function determined by the number of previous selections of each row, which goes like this:
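// get_col(), get_nelements() and get_row() are the asker's own helpers (not shown).
function acceptation_rejection_method($link, $tablename, $column, $condition = "")
{
    // Range of the counter column, restricted by the optional condition.
    $max = get_col("max(" . $column . ")", $tablename, $link, $condition);
    $min = get_col("min(" . $column . ")", $tablename, $link, $condition);
    // Draw a random threshold ("bar") between the smallest and largest count.
    $bar_value = mt_rand($min, $max);
    // Rows at or below the threshold are the accepted candidates.
    $where = "where " . $column . "<=" . $bar_value;
    $count = get_nelements($tablename, $link, $where);
    // Pick one of the accepted rows uniformly at random.
    $selected_row = get_row(mt_rand(0, $count - 1), $tablename, $link, $where);
    return $selected_row;
}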
My function implements the acceptance-rejection method (http://en.wikipedia.org/wiki/Acceptance-rejection_method), and my question is: taking into account that my database and my web server are on the same host, would it be any improvement to rewrite that script as SQL code returning the row? (Assuming that all users of my app are using it constantly, like almost once in every request.)
If I'm interpreting your question correctly, you want to know whether you should encode your acceptance/rejection algorithm into a pure database function, or whether what you're doing here is "right", from both an architectural and a performance point of view.
From a performance point of view, if there were a way to represent the query as a single SQL statement, it would likely be faster than your current implementation, but (assuming the column is indexed), probably not all that much faster.
You could, of course, create a stored procedure - but it looks like you're running this on multiple tables and columns, so you'd end up with lots of stored procedures.
Stored procedures have benefits and drawbacks, but in this case I'd say they make the application more fragile. Again, I doubt whether you'd see a huge performance impact.
Architecturally, I think what you're doing is likely the cleanest solution - you're abstracting the algorithm behind a single method.
I doubt that the way you request the same data from the same source matters at all.
Assuming that all users of my app are using it constantly, like almost once in every request
Then you may want to think about changing the approach.
Sorry, I am contradicting myself.
First of all, you have to profile your code and see whether it causes any trouble.
Only if it does should you think about changing the approach.
Say, you could request all the numbers at once, randomize them and store them in a memory cache, then hand them out one by one, deleting each after use and refreshing the list when it is exhausted.
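A rough sketch of that caching idea, under several assumptions. The class name ShuffledIdPool is made up, a plain PHP array stands in for both the database query and the memory cache, and the pool hands ids out uniformly, so it does not reproduce the weighting of the original function. In a real application the pool would have to live in something shared between requests (memcached or similar), since a PHP object only lasts for one request.

```php
<?php
// Hypothetical helper: hands out pre-shuffled row ids one at a time and
// reloads the full list only when it runs dry.
class ShuffledIdPool
{
    private $loader;      // callable returning an array of ids
    private $pool = [];   // ids not yet handed out
    public $loads = 0;    // number of times the loader was called

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function next()
    {
        if (!$this->pool) {
            // Refresh on exhaustion: one query, then shuffle in memory.
            $this->pool = ($this->loader)();
            shuffle($this->pool);
            $this->loads++;
        }
        // Delete after use.
        return array_pop($this->pool);
    }
}

// An array stands in for the database query here.
$pool = new ShuffledIdPool(function () {
    return [1, 2, 3, 4];
});

$drawn = [];
for ($i = 0; $i < 4; $i++) {
    $drawn[] = $pool->next();
}
sort($drawn);
```

Four draws cost one load instead of several queries per draw; whether that matters is exactly what the profiling step should tell you first.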
In a simple MVC architecture design where database and web server are on the same host
Eh? MVC is a design pattern not a system/service architecture.
is there any relevant benefit on putting my procedures for conditionally selecting (using more than one SQL query) elements from a table in a SQL database procedure instead of just implementing them in a webserver-side script
Firstly, for the same population, you shouldn't need "more than one SQL query" regardless if you are looking at the entire sample or just a subset. i.e. your algorithm is flawed regardless of how you implement it.
Secondly, using the script you are hauling large amounts of data between the database and the PHP script, which is an overhead. You are processing large amounts of data in PHP. PHP is not explicitly designed for manipulating large data sets; SQL and PL/SQL are. If you do as much processing as is practical on the database, then your application should run faster with less code.
I am designing a web application using PHP and MySQL. I have a small question about the database side.
The application is like
Users get themselves registered.
Users input workload (after login of course :) ).
User logs out.
Now there are multiple types of input which I accept on the same form. Say there are 3 types of input and they are stored in 7 different tables (client requirement :( )
Now my question is: what is the best way to fire the queries once the inputs are done?
For now I can think of the following ways.
Fire 7 different queries from PHP
Write a trigger to propagate the inputs into the appropriate tables
Please tell me which approach performs better.
Thanks :)
Generally you want to stay away from triggers because you will be penalized later if you have to load a lot of data. Stored procedures are the way to go. You can have different conditions set to propagate inputs into different tables if needed.
I think you need to re-think your situation. You already know how awesome it would be to have fewer tables to deal with? Well, why not simulate that situation with a properly constructed view. Then, the client (are you sure it is the client? Sometimes ops says "client", when they mean, "report which we need to provide later") can have as many tables as your database can handle. And, by the way, you can still fire inserts and updates on a view.
Because it seems like your database does not have a clear relationship with PHP data structures, my instinct will be to separate the two more, not less. This would mean actually favoring stored procedures and triggers (assuming the above is not workable), which can be harder to debug, but it also means that PHP only has to think about
"I am inserting into this thing called <thing name>"
Instead of
"OMG, so this is like, totally intense first I have to talk to <table 1>, but I can't forget <table 2>, especially since those two might have... wait, did I miss my turn?"
OK, PHP isn't a ditz (I actually like the language), but it should also be acting as dumb as possible when it comes to actually storing things -- that's not its business.
You probably want to write a stored procedure that runs the seven queries. Think hard about how many transactions you need to run those seven queries.
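Whichever side ends up running the queries, the transaction question is the same. As a minimal sketch of the "one submission, one transaction" idea done from PHP, assuming PDO with the SQLite driver, an in-memory database, and three hypothetical tables (workload_a, workload_b, workload_c) instead of the real seven. The function name save_workload() is also made up.

```php
<?php
// Sketch only: SQLite in memory, and three hypothetical tables instead of seven.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
foreach (['workload_a', 'workload_b', 'workload_c'] as $table) {
    $pdo->exec("CREATE TABLE $table (user_id INTEGER NOT NULL, value TEXT NOT NULL)");
}

// Insert one form submission into all of its tables, all-or-nothing.
function save_workload(PDO $pdo, int $userId, array $valuesByTable): bool
{
    $pdo->beginTransaction();
    try {
        foreach ($valuesByTable as $table => $value) {
            // Table names come from our own code, never from user input.
            $stmt = $pdo->prepare("INSERT INTO $table (user_id, value) VALUES (?, ?)");
            $stmt->execute([$userId, $value]);
        }
        $pdo->commit();
        return true;
    } catch (PDOException $e) {
        // Any failure undoes the inserts already made for this submission.
        $pdo->rollBack();
        return false;
    }
}

$ok = save_workload($pdo, 1, ['workload_a' => 'x', 'workload_b' => 'y', 'workload_c' => 'z']);

// A NULL value violates NOT NULL, so the whole second submission is rolled back.
$failed = save_workload($pdo, 2, ['workload_a' => 'x', 'workload_b' => null]);

$rowsInA = (int) $pdo->query('SELECT COUNT(*) FROM workload_a')->fetchColumn();
```

After both calls only the first submission is stored, even though the second one had already inserted into workload_a before it failed. A stored procedure would give the same guarantee with a single round trip.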
How often do you think you will have to change which queries to run?
Do you have access to the database server?
Do you know which circumstance should trigger your triggers?
Are there other processes/applications writing data to the database?
If your queries change very often, I would go for code in PHP to just run the queries for you.
If you don't have access to the database server you may actually have to go for that method! You need permissions to write stored procedures and triggers.
If other processes are writing to the same database you have to discuss your requirements with the respective process owners! Otherwise data may appear/change in your database that was unwanted.
I personally tend to stay away from triggers unless they call very simple stored procedures and I'm 100% certain that nobody else is going to be bothered by the trigger!
I'm working on what is turning out to be a fairly complex SELECT query. I have several hierarchical queries being nested in a single SELECT and it is getting to be quite difficult to manage.
I'm running into a few places where my inline views need to be executed in more than one place, so it seems like a reasonable idea to execute those once at the beginning of a stored procedure and then do some iteration over the results as needed.
I'm wondering if there are any reasons why I should not, or could not, execute an Oracle Stored Procedure, called via my PHP code, and return as an OUT parameter the resultset. I've tended to use SPs only to do updates/deletes/inserts but the sheer size and complexity of this query seems like it needs to be broken down.
If there aren't any technical problems with this, any comments on whether it is good or bad practice?
I'm working on what is turning out to be a fairly complex SELECT query. I have several hierarchical queries being nested in a single SELECT and it is getting to be quite difficult to manage.
Ok, but why a stored procedure? Why not create a view instead?
I'm running into a few places where my inline views need to be executed in more than one place, so it seems like a reasonable idea to execute those once at the beginning of a stored procedure and then do some iteration over the results as needed.
Again - excellent use case for a view.
I'm wondering if there are any reasons why I should not, or could not, execute an Oracle Stored Procedure, called via my PHP code, and return as an OUT parameter the resultset.
If there aren't any technical problems with this, any comments on whether it is good or bad practice?
Well, I don't want to start a religious war, and I do not want to suggest that the arguments against apply to your case. But here goes:
One reason why I tend to avoid stored procedures is portability. By that I mean mostly database portability. Stored procedure languages are notoriously unportable across databases, and built-in libraries like Oracle packages make things worse in that respect.
Stored procedures take some additional processing power from your database server. This makes it harder to scale the application as a whole: if the capacity of your db server is exhausted due to stored procedures, and you need to upgrade hardware or even buy an extra Oracle software license because of that, I would not be a happy camper, especially if I could have bought cheap webserver/PHP boxes instead to do the computing.
Reasons where I would go for stored procedures:
Language portability. If database portability is not so much an issue, but you do want to reuse logic across multiple applications, or have the ability to code in different languages, then stored procedures may save you writing language-specific database invocation code.
Complex permission scenarios. Stored procedures give you an extra level of permissions, since you can execute the procedure with the privileges of the definer or owner of the stored procedure. Sometimes this solves problems where a user needs to work with some tables, but cannot be allowed direct access to them.
Saving round trips. If you have to deal with complex, multi-statement transactions, putting them in a stored procedure saves round trips between the app and the db, because only one round trip is needed to execute the stored procedure. Sometimes this can get you more performance.
I want to stress again that in all these scenarios, I would still advise against putting all your procedural logic in stored procedures. Databases are best at storing and retrieving data; languages like PHP/Java/Perl/pick your poison are better at processing it.
If you are using the same inline view many times, it's a good candidate for a WITH clause.
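A small sketch of what that looks like, assuming PHP with the PDO SQLite driver. The orders table and its data are invented for the example, and SQLite stands in for Oracle. The WITH syntax shown here is the common form, but an Oracle hierarchical query would obviously be more involved.

```php
<?php
// Sketch only: SQLite in memory stands in for Oracle; the table is made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE orders (customer TEXT, amount INTEGER)');
$pdo->exec("INSERT INTO orders VALUES ('ann', 10), ('ann', 30), ('bob', 5), ('cat', 15)");

// The inline view "totals" is written once and referenced twice.
$sql = "
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer
    FROM totals
    WHERE total > (SELECT AVG(total) FROM totals)
    ORDER BY customer";

// Customers whose total is above the average total per customer.
$above = $pdo->query($sql)->fetchAll(PDO::FETCH_COLUMN);
print_r($above);
```

Without the WITH clause the GROUP BY subquery would have to be repeated in both places, which is the maintenance problem the question describes.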
PHP can handle resultsets returned from stored procedures by using ref cursors. The Oracle+PHP Cookbook has an example.
So there are no technical impediments but as you can see from the various answers there are some philosophical aspects to your question. I think we can agree that if you are already wrapping some SQL statements in stored procedures - which you are - then you are not drastically compromising the portability of your system by extending "updates/deletes/inserts" to include selects.
The pertinent question then becomes "should you use a stored procedure for this particular query?" The answer to which hinges on precisely what you mean by:
the sheer size and complexity of this query seems like it needs to be broken down.
Deconstructing a big query into several smaller queries and then stitching results together in PL/SQL is seductive, but should be approached with caution. This can degrade the performance of your application, because PL/SQL has more overheads than SQL. Making your query more readable is not a good enough reason: you need to be certain that the complexity has a real and adverse effect on the running of your code.
A good reason for using a stored procedure rather than a view might be if you want to extend the applicability of the query by using bind variables or dynamic SQL in the body of the query.
A definitive answer to your question requires more details regarding the nature of your query and the techniques you are thinking of using to simplify it.
You could look at subquery factoring which may improve the readability of the query.
One risk of breaking up a single SQL query into a more procedural solution is that you lose read consistency. As such you want to be pretty sure that someone changing data while your procedure runs won't break it. You may want to lock a table for the duration of the procedure call. It seems drastic, but if you are pretty sure that the data is static and if there would be ugly side-effects if it wasn't, then it is a solution.
Generally if an SQL statement is complex enough, it probably isn't portable between databases anyway, so I wouldn't worry about that aspect.
Views can be a good option to hide complexity, but the downside to hiding complexity is that people start doing things that seem 'simple' but are really complex and don't work as desired. You also get another object to consider for grants etc. [Edit: As Roland commented, this applies equally to stored procedures, views, object types etc.]
If you expect to return a large resultset, you should consider a pipelined table function. That way you can avoid having the entire resultset in the Oracle session at the same time.
I'm working on a PHP web application with PostgreSQL. All of the SQL queries are being called from the PHP code. I've seen no Views, Functions or Stored Procedures. From what I've learned, it's always better to use these database subroutines since they are stored in the database with the advantages:
Encapsulation
Abstraction
Access rights (limited to DB Admin) and responsibility
Avoid compilation
I think I read something about performance improvements too. I really don't see why the team hasn't used these yet. In this particular case, I would like to know, from experience, is there any good reason to NOT use them?
Especially when there are so many SELECT queries throughout the code, why not use views?
I am planning on refactoring the code and start coding the subroutines on the DB Server. I would like to know opinions in favor or against this. The project is rather big (many tables), and expects lots of data to be stored. The amount of data you would have in a social network with some more stuff in it, so yeah, pretty big.
In my opinion, views and stored procedures are usually just extra trouble with little benefit.
I have written and worked with a bunch of different web apps, though none with bazillions of users. The ones with stored procedures are awkward. The ones with ad-hoc SQL queries are plenty fast (use placeholders and other best practices to avoid SQL injection). My favorites use a database abstraction layer (ORM), so your code deals with PHP classes and objects rather than directly with the database. I have increasingly been turning to the symfony framework for that.
Also: in general you should not optimize for performance prematurely. Optimize for good fast development now (no stored procedures). After it's working, benchmark your app, find the bottlenecks, and optimize them. You just waste time and make complexity when you try to optimize from the start.
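For the placeholders mentioned above, a minimal sketch, assuming PHP with the PDO SQLite driver. The question is about PostgreSQL, but the PDO calls are the same; only the connection string differs. The users table and the find_user_ids() function are invented for the example.

```php
<?php
// Sketch only: SQLite in memory stands in for PostgreSQL; the table is made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('alice'), ('bob')");

// Hypothetical lookup: the value is bound to a placeholder, so it is
// treated as data and never becomes part of the SQL text.
function find_user_ids(PDO $pdo, string $name): array
{
    $stmt = $pdo->prepare('SELECT id FROM users WHERE name = :name');
    $stmt->execute([':name' => $name]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$found = find_user_ids($pdo, 'alice');

// A classic injection attempt is just an odd name that matches nobody.
$attack = find_user_ids($pdo, "x' OR '1'='1");
```

This is plain ad-hoc SQL in application code, with no stored procedure involved, and the injection string returns no rows.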
I. Views offer encapsulation, but if not carefully designed they can slow down the application. Use with caution.
II. Use functions if needed, no reason to put them in if they are unneeded.
III. Stored Procedures are a godsend, use them everywhere there is a static query!!
In response to the views vs. queries question: try to use views with stored procedures; the stored procedures will mitigate some of the performance hit taken with most views.
The advantage of stored procedures is that, because all the processing is done on the database, you do not incur network overhead shunting intermediate result sets back and forth.
The disadvantage is that each RDBMS system out there has its own peculiar syntax for stored procedures. By implementing your business logic in stored procedures, you're pretty much restricting your app to a single database product, something you need to keep in mind if you intend your application to be database independent. Also, as gahooa pointed out, because stored procedures live in the database, your access to them as a developer may be restricted by local policy; some organisations will only let DBAs touch the database.
@WolfmanDragon: I don't know if views inherently make things slower; your mileage may vary, I guess, depending on the complexity of the view and the RDBMS you're using. Plus, some RDBMSs allow you to materialise commonly-used views so access to them is as fast as a base table.
We try to use the features you mentioned only where there is a significant benefit.
Being part of the "database", they fall under "schema changes" rather than "source code changes", and are naturally harder to version control.
Whatever you do, just make sure you retain full visibility of who-changed-what-when, so that you can diff, rollback, and recover in the case of problems.
On a site with a reasonable amount of traffic, would it matter if the application/business logic is written as stored procedures, triggers and views, instead of inside the PHP code itself?
What would be the best way to go, keeping scalability in mind?
I can't provide you statistics, but unless you plan to change PHP for another language in the future, I can say keeping the business logic in PHP is more "scalability friendly".
It's always easier and cheaper to solve load problems on the web server than in the database. Your database will always need to be lightning quick, and just throwing mirrors at it won't solve the problem. The more database slaves you have, the more writes you have to do.
In my experience, you should put business logic in PHP code rather than move it onto the database. Assuming your database is on a separate server, you don't want your database to be busy calculating formulas when requests come in.
Keep your database lightning fast to handle selects, inserts and updates.
I think you will have far better scalability keeping database code in the database, where it can be performance tuned as the number of records gets larger. You will also have better data integrity, which is critical to the data even being useful. You don't see a lot of terabyte-sized relational dbs with all their code in the application.
Read some books on database performance tuning and then decide if you want to risk your company's data on application code.
There are several things to consider when trying to decide whether to place the business logic in the database or in the application code.
Will the same database be accessed from different websites / web applications? Will the sites / applications be written in the same language or in a different language?
If the database will be used from a single site, and the site is written in a single language then this becomes a non-issue. Otherwise, you'll need to consider the added complexity of stored procedures, triggers, etc vs trying to maintain database access logic etc in multiple code bases.
What are relational databases in general good for and what is MySQL good for specifically? What is PHP best at?
This consideration is fairly straight-forward. Relational databases across the board and specifically in any variant of SQL are going to do a great job at inserting, updating, and deleting data. Generally they also handle ATOMIC transactions well. However, most variants of SQL (including MySQL) are not good at complex calculations, on-the-fly date handling, file system access etc.
PHP on the other hand is very fast at handling calculations, dates, file system accesses. By taking a little time you can even design your PHP code to work in such a way that records are only retrieved once and then stored when necessary.
What are you most familiar / comfortable with using?
Obviously it tends to make more sense to use the tool with which you are most familiar.
As a last point consider that just because a drill can be used to cut sheet rock or because a hammer can be used to drive a screw doesn't mean that they should be used for these things. Sometimes I think that programmers do more potential damage by trying to make more powerful tools that do everything rather than making simpler tools that do one thing really, really well.
A well-written PHP application should be enough, but keep in mind that it also requires you to make as few calls to the database as you can. Store values you'll need later in PHP, shorten queries, cache, etc.
MySQL optimization is always a must, as it also reduces the number of database calls PHP has to make and so improves performance. So you can't avoid thinking about stored procedures and the like if your aim is to increase performance. But MySQL by itself won't be enough if your PHP code is poorly written (lots of unnecessary database calls). That's why I think the PHP must be well coded, keeping the whole process in mind while developing it, so that unnecessary work doesn't get in the way. Caching, for instance, paired with a well-tuned MySQL, is a great performance boost.
My POV, even though I don't have much experience developing large applications, is to write business logic in the DB, for these reasons:
1 - Maintainability. Languages deprecate functions and change many other things over short periods, so when PHP changes version you'll need to adapt your code to the new version.
2 - DBs tend to be more stable as languages, so when a new version of an RDBMS comes out it usually changes little or nothing in the way you write your queries or SPs. Writing your logic in the DB reduces the code you have to adapt for a new DB version.
3 - An RDBMS is likely to stay alive longer than a programming language. Also, because your data is critical, RDBMS developers take great care over automatic migration of all your data to the new RDBMS version, including your SPs. When Clipper died, there was no way to migrate systems to a new programming language; they had to be completely rewritten.
4 - If one day you have to change the application's language completely (because the language dies, for example), the only things to rewrite will be the presentation and the SP calls, not the business logic.
I'd like to know from other people here whether what I pointed out makes sense, and if not, why. I'm in the same situation as Sabeen Malik: I'm about to begin my first huge project and I'm leaning towards SPs because of what I wrote. So now is the time to correct my POV if it's wrong.
MySQL sucks at advanced DB techniques; what it offers is simplicity and speed. PHP, being a dynamic language, makes processing data very easy. Therefore, it usually makes sense to use PHP.