I am designing a web application using PHP and MySQL. I have a small doubt about the database.
The application flow is like this:
Users get themselves registered.
Users input workload (after login, of course :) ).
Users log out.
Now there are multiple types of inputs which I accept on the same form. Say there are 3 types of inputs, and they are stored in 7 different tables (client requirement :( ).
Now my question is: what is the best way to fire the queries after the inputs are submitted?
For now I can think of the following ways:
Fire 7 different queries from PHP
Write a trigger to propagate the inputs into the appropriate tables
Just guide me: which approach is more performance-efficient?
Thanks :)
Generally you want to stay away from triggers because you will be penalized later if you have to load a lot of data. Stored procedures are the way to go. You can have different conditions set to propagate inputs into different tables if needed.
I think you need to re-think your situation. You already know how awesome it would be to have fewer tables to deal with? Well, why not simulate that situation with a properly constructed view. Then, the client (are you sure it is the client? Sometimes ops says "client", when they mean, "report which we need to provide later") can have as many tables as your database can handle. And, by the way, you can still fire inserts and updates on a view.
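For example, a hedged sketch with hypothetical table and column names (note that MySQL only accepts an INSERT through a view when the statement maps unambiguously to a single base table):

CREATE VIEW workload AS
SELECT id, user_id, input_value
FROM workload_type_a;

INSERT INTO workload (user_id, input_value) VALUES (42, 'some input');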
Because it seems like your database does not have a clear relationship with your PHP data structures, my instinct is to separate the two more, not less. This would mean actually favoring stored procedures and triggers (assuming the above is not workable), which can be harder to debug, but it also means that PHP only has to think about
"I am inserting into this thing called <thing name>"
Instead of
"OMG, so this is like, totally intense first I have to talk to <table 1>, but I can't forget <table 2>, especially since those two might have... wait, did I miss my turn?"
OK, PHP isn't a ditz (I actually like the language), but it should also be acting as dumb as possible when it comes to actually storing things -- that's not its business.
You probably want to write a stored procedure that runs the seven queries. Think hard about how many transactions you need to run those seven queries.
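For illustration, a minimal sketch of the PHP side, assuming a PDO connection $pdo and hypothetical table and column names; the point is one transaction around all seven inserts, so they succeed or fail together.

$pdo->beginTransaction();
try {
    // one INSERT per target table; table and column names are made up
    foreach (['table_one', 'table_two' /* ... up to table_seven */] as $table) {
        $stmt = $pdo->prepare("INSERT INTO $table (user_id, input) VALUES (?, ?)");
        $stmt->execute([$userId, $inputs[$table]]);
    }
    $pdo->commit();
} catch (PDOException $e) {
    $pdo->rollBack();
    throw $e;
}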
How often do you think you will have to change which queries to run?
Do you have access to the database server?
Do you know which circumstance should trigger your triggers?
Are there other processes/applications writing data to the database?
If your queries change very often, I would go for code in PHP to just run the queries for you.
If you don't have access to the database server, you may actually have to go for that method! You need permissions to write stored procedures and triggers.
If other processes are writing to the same database, you have to discuss your requirements with the respective process owners! Otherwise unwanted data may appear or change in your database.
I personally tend to stay away from triggers unless they call very simple stored procedures and I'm 100% certain that nobody else is going to be bothered by the trigger!
I have a dilemma, which I hope you will have some expert opinions on.
I have a table called CARDS with a column STATUS. If a record's status changes from 'download' to 'publish', I have to insert the record reference into another table called CARD_ASSIGNMENTS. Additionally, the record needs to be added into CARD_ASSIGNMENTS as many times as there are active records in SCANNERS.
In other words, if there are two active scanners, I will end up with two records in CARD_ASSIGNMENTS as below:
ID  CARD_ID  SCANNER_ID  STATUS_ID
1   1        1           4
2   1        2           4
My dilemma is that I'm not quite sure what would be the most efficient way to execute the above. I've considered the following options:
From PHP - Do one UPDATE query and then the INSERT queries.
Create a stored procedure, which will take care of updating the CARDS record and adding records into the CARD_ASSIGNMENTS. Then, just call that stored procedure from PHP.
Create an ON UPDATE trigger for the CARDS table which will take care of processing INSERTS into the CARD_ASSIGNMENTS table.
PS. A simplified version of my database is available on MySQL Fiddle
Thanks,
Kate
Interesting question.
I'm going to give you clues about how to approach the problem.
So, you have to start by defining precisely three things:
the expected functionality
the access policy to the functionality
the technical upgrade policy
Here I'll detail these points.
So, the first point is that you have to define your functionality. By doing so, you will be able to tell whether adding a card always implies, in every possible paradigm (sorry for the pedantic word, I can't find a more proper one) of your information system, that this card MUST exist in the other table according to the specifications you provided. This 1-1 functional link must be declared TRUE or FALSE. This is really important.
In other words, if there is at least one possibility that one day you won't want to copy that record to the other table, it means the trigger is the wrong solution, or at least it should be designed with an emergency mode (for example, a variable inside that allows it to skip execution under some conditions).
Then comes the second point, about the access policy. You have to know whether the systems allowed to access your data will do so through your application layer, or whether they could develop their own (SaaS style). In the latter case your PHP layer will be bypassed, and the stored procedure is an excellent option, since every single technical and business layer will go through it, no matter what.
The last thing to know is whether you will possibly upgrade your PHP layer one day. In most cases the answer is yes. If so, you might have to modify the part containing this SQL logic you're talking about. Then, having everything in a stored procedure, versus hardcoded in the PHP, will definitely save you time and improve stability.
Left brain, right brain; I'm going to tell you my personal opinion after all. I really love going with stored procedures but not using any triggers. If the environment allows it, I would go for an underlying batch, calling a set of defined stored procedures, concentrating the activity outside of the online scope.
The advantages are the following:
no risk, or less risk, of interrupting the online workflow, since you reduce the number of operations
a different schedule to alleviate the database load
a more secure policy, since executing the stored procedure requires only one grant, while using the same SQL from PHP would require insert/update grants (see the sketch after this list)
better logging quality: you can have a log per job
better emergency response: when a job fails (if well designed) you can restart it, and that's it.
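To illustrate the single-grant point (database, procedure, and account names are hypothetical):

GRANT EXECUTE ON PROCEDURE mydb.process_inputs TO 'app_user'@'localhost';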
Long post, but that was interesting and I really wanted to share these ideas.
Cheers!
I would use triggers. Some developers say that if you have too many triggers and stored procedures, the database takes on a life of its own; you never know what is going to happen on insert, update, etc. But in my opinion, triggers can help you a lot in keeping the database consistent: even if someone inserts data directly from some administration tool, the integrity is still kept, because all the necessary commands are executed. If you choose stored procedures instead, you would still have to remember that you need to call that procedure to insert any new data.
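For Kate's concrete case, a hedged sketch of such a trigger in MySQL; the ACTIVE column and the literal STATUS_ID of 4 are assumptions taken from the example in the question:

DELIMITER //
CREATE TRIGGER cards_after_update
AFTER UPDATE ON CARDS
FOR EACH ROW
BEGIN
    IF OLD.STATUS = 'download' AND NEW.STATUS = 'publish' THEN
        INSERT INTO CARD_ASSIGNMENTS (CARD_ID, SCANNER_ID, STATUS_ID)
        SELECT NEW.ID, s.ID, 4
        FROM SCANNERS s
        WHERE s.ACTIVE = 1;
    END IF;
END//
DELIMITER ;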
I don't know if this is a dumb question, but I have these two doubts; maybe you can help me clear them up:
If my database and web server are on the same host, is there any relevant benefit to putting my procedures for conditionally selecting elements from a table (using more than one SQL query) into a SQL stored procedure, instead of just implementing them in a server-side script (in my case PHP) alongside the rest of the web application code?
Secondly, and maybe even more importantly: am I breaking any design rules by doing / not doing this?
More specifically, I made a PHP script to select a random row from a table according to a probability density function determined by the number of previous selections of each row, which goes like this:
function acceptation_rejection_method($link, $tablename, $column, $condition = "")
{
    // Draw a uniform random "bar" between the column's min and max.
    $max = get_col("max(" . $column . ")", $tablename, $link, $condition);
    $min = get_col("min(" . $column . ")", $tablename, $link, $condition);
    $bar_value = mt_rand($min, $max);

    // Count the rows at or below the bar, then pick one of them at random.
    $count = get_nelements($tablename, $link, "where " . $column . "<=" . $bar_value);
    $selected_row = get_row(mt_rand(0, $count - 1), $tablename, $link,
        "where " . $column . "<=" . $bar_value);
    return $selected_row;
}
My function implements the acceptance-rejection method (http://en.wikipedia.org/wiki/Acceptance-rejection_method), and my question is: taking into account that my database and my web server are on the same host, would it be an improvement to rewrite that script as SQL code returning the row? (Assuming that all users of my app are using it constantly, like almost once in every request)
If I'm interpreting your question correctly, you want to know whether you should encode your acceptance/rejection algorithm into a pure database function, or whether what you're doing here is "right", from both an architectural and a performance point of view.
From a performance point of view, if there were a way to represent the query as a single SQL statement, it would likely be faster than your current implementation, but (assuming the column is indexed), probably not all that much faster.
You could, of course, create a stored procedure - but it looks like you're running this on multiple tables and columns, so you'd end up with lots of stored procedures.
Stored procedures have benefits and drawbacks, but in this case I'd say they make the application more fragile. Again, I doubt whether you'd see a huge performance impact.
Architecturally, I think what you're doing is likely the cleanest solution - you're abstracting the algorithm behind a single method.
I doubt that the form in which you request the same data from the same source matters at all.
Assuming that all users of my app are using it constantly, like almost once in every request
Then you may want to think about changing the approach.
Sorry, I am contradicting myself.
First of all, you have to profile your code and see if it causes any trouble.
Only then should you think about changing the approach.
Say, you can request all the numbers at once, randomize them, and store them in a memory cache; then just take them one by one, deleting after use, and refresh the cache when it's exhausted.
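Something like this minimal sketch, assuming the APCu extension is available and a hypothetical fetch_all_ids() helper that returns every candidate ID:

function next_random_id()
{
    $ids = apcu_fetch('shuffled_ids', $hit);
    if (!$hit || empty($ids)) {
        $ids = fetch_all_ids(); // hypothetical: SELECT id FROM tablename
        shuffle($ids);          // randomize once, up front
    }
    $id = array_pop($ids);      // take one, delete after use
    apcu_store('shuffled_ids', $ids);
    return $id;
}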
In a simple MVC architecture design where database and web server are on the same host
Eh? MVC is a design pattern, not a system/service architecture.
is there any relevant benefit on putting my procedures for conditionally selecting (using more than one SQL query) elements from a table in a SQL database procedure instead of just implementing them in a webserver-side script
Firstly, for the same population, you shouldn't need "more than one SQL query", regardless of whether you are looking at the entire sample or just a subset; i.e. your algorithm is flawed regardless of how you implement it.
Secondly, with the script you are hauling large amounts of data between the database and PHP, which is an overhead. You are processing large amounts of data in PHP, and PHP is not explicitly designed for manipulating large data sets -- SQL and PL/SQL are. If you do as much processing as is practical in the database, your application should run faster, with less code.
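For what it's worth, the whole thing can be collapsed into one statement. A hedged sketch, assuming a table tablename with a numeric column col (the derived table is evaluated once, so the random "bar" is picked a single time; ORDER BY RAND() is only for illustration, not a performance recommendation):

SELECT t.*
FROM tablename AS t
JOIN (SELECT MIN(col) + RAND() * (MAX(col) - MIN(col)) AS bar
      FROM tablename) AS b
WHERE t.col <= b.bar
ORDER BY RAND()
LIMIT 1;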
Let's say there are two users trying to obtain the same information from a database. I read somewhere that in some languages this can be handled with something called threads, the result being time efficiency.
What is the best practice for solving this problem in PHP?
Thanks!
Reading (or obtaining) the information doesn't create a problem anyway. A problem arises when two users try to edit the same information at the same time.
In your case, you don't need to do anything - the webserver will take care of everything.
Moreover, there are no threads in PHP. If you want to "use" threads, what you can do is create a job queue, which is, again, unnecessary in this case.
Okay, so I'm sure plenty of you have built crazy database intensive pages...
I am building a page that I'd like to pull all sorts of unrelated database information from. Here are some sample different queries for this one page:
article content and info
IF the author is a registered user, their info
UPDATE the article's view counter
retrieve comments on the article
retrieve information for the authors of the comments
if the reader of the article is signed in, query for info on them
etc...
I know these are basically going to be lightning quick, and that I could combine some; but I wanted to make sure that this isn't abnormal.
How many fairly normal and un-heavy queries would you limit yourself to on a page?
As many as needed, but not more.
Really: don't worry about optimization (right now). Build it first, measure performance second, and IFF there is a performance problem somewhere, then start with optimization.
Otherwise, you risk spending a lot of time on optimizing something that doesn't need optimization.
I've had pages with 50 queries on them without a problem. A fast query to a non-large (i.e., fits in main memory) table can happen in 1 millisecond or less, so you can do quite a few of those.
If a page loads in less than 200 ms, you will have a snappy site. A big chunk of that is being used by latency between your server and the browser, so I like to aim for < 100ms of time spent on the server. Do as many queries as you want in that time period.
The big bottleneck is probably going to be the amount of time you have to spend on the project, so optimize for that first :) Optimize the code later, if you have to. That being said, if you are going to write any code related to this problem, write something that makes it obvious how long your queries are taking. That way you can at least find out you have a problem.
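For example, a hedged sketch of such instrumentation, assuming mysqli (the wrapper name is made up):

function timed_query(mysqli $db, $sql)
{
    $start = microtime(true);
    $result = $db->query($sql);
    $elapsed = (microtime(true) - $start) * 1000; // milliseconds
    error_log(sprintf('[%.2f ms] %s', $elapsed, $sql));
    return $result;
}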
I don't think there is any one correct answer to this. I'd say as long as the queries are fast, and the page follows a logical flow, there shouldn't be any arbitrary cap imposed on them. I've seen pages fly with a dozen queries, and I've seen them crawl with one.
Every query requires a round-trip to your database server, so the cost of many queries grows larger with the latency to it.
If it runs on the same host there will still be a slight speed penalty, not only because a socket sits between your application and the database, but also because the server has to parse your query, build the response, check access, and whatever other overhead comes with SQL servers.
So in general it's better to have fewer queries.
You should try to do as much as possible in SQL, though: don't get stuff as input for some algorithm in your client language when the same algorithm could be implemented without hassle in SQL itself. This will not only reduce the number of your queries but also help a great deal in selecting only the rows you need.
Piskvor's answer still applies in any case.
WordPress, for instance, can pull up to 30 queries a page. There are several things you can use to reduce the load on MySQL -- one of them being memcache -- but right now and, as you say, if it will be straightforward, just make sure all the data you pull is properly indexed in MySQL and don't worry much about the number of queries.
If you're using a framework (CodeIgniter for example) you can generally pull up the page creation times and check what's slowing your site down.
As others have said, there is no single number. Whenever possible, please use SQL for what it was built for and retrieve sets of data together.
Generally, an indication that you may be doing something wrong is when you have SQL inside a loop.
When possible, use joins to retrieve data that belongs together rather than sending several statements.
Always try to make sure your statements retrieve exactly what you need with no extra fields/rows.
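For example, a hedged sketch with a hypothetical comments/users schema: one join instead of a per-comment query inside a loop.

SELECT c.body, u.username
FROM comments AS c
JOIN users AS u ON u.id = c.author_id
WHERE c.article_id = 42;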
If you need the queries, you should just use them.
What I always try to do is have them all executed at once in the same place, so that there is no need for different parts of the page (if they're separated...) to make their own database connections. I figure it's more efficient to store everything in variables than to have every part of a page connect to the database.
In my experience, it is better to make two queries and post-process the results than to make one query that takes ten times longer to run but needs no post-processing. That said, it is also better not to repeat queries if you already have the result, and there are many different ways this can be achieved.
But all of that is oriented around performance optimization. So unless you really know what you're doing (hint: most people in this situation don't), just make the queries you need for the data you need and refactor it later.
I think that you should limit yourself to as few queries as possible. Try to combine queries to multitask and save time.
Premature optimisation is a problem, as people have mentioned before, but that refers to mangling your code to make it run 'fast'. People take this 'maxim' too far, though.
If you want to design with scalability in mind, just make sure whatever you do to load data is sufficiently abstracted and calls are centralized, this will make it easier when you need to implement a shared memory cache, as you'll only have to change a few things in a few places.
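A minimal sketch of that abstraction, in plain PHP: all reads funnel through one function, so a shared memory cache (APCu, memcached, ...) can later be slotted in at this single choke point.

function load_data($key, callable $fetch)
{
    static $cache = array(); // stand-in for the future shared cache
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $fetch(); // e.g. a closure that runs the query
    }
    return $cache[$key];
}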
Is it generally better to run functions on the webserver, or in the database?
Example:
INSERT INTO example (hash) VALUE (MD5('hello'))
or
INSERT INTO example (hash) VALUE ('5d41402abc4b2a76b9719d911017c592')
Ok so that's a really trivial example, but for scalability when a site grows to multiple websites or database servers, where is it best to "do the work"?
I try to think of the database as the place to persist stuff only, and put all abstraction code elsewhere. Database expressions are complex enough already without adding functions to them.
Also, the query optimizer will trip over any expressions with functions if you should ever end up wanting to do something like "SELECT .... WHERE MD5(xxx) = ... "
And database functions aren't very portable in general.
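For instance, a minimal sketch of doing the work in PHP, assuming a PDO connection $pdo and the example table from the question: the hash is computed in the application and bound as a plain value, so the database never sees a function call.

$stmt = $pdo->prepare('INSERT INTO example (hash) VALUES (?)');
$stmt->execute([md5('hello')]);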
I try to use functions in my scripting language whenever calculations like that are required. I keep my SQL function usage down to a minimum, for a number of reasons.
The primary reason is that my one SQL database is responsible for hosting multiple websites. If the SQL server were to get bogged down with requests from one site, it would adversely affect the rest. This is even more important to consider if you are working on a shared server for example, although in this case you have little control over what the other users are doing.
The secondary reason is that I like my SQL code to be as portable as possible. I don't even want to try to count the different flavors of SQL that exist, so I try to keep functions (especially non-standard extensions) out of my SQL code, except for things like SUM or MIN/MAX.
I guess what I'm saying is, SQL is designed to store and retrieve data, and it should be kept to that purpose. Use your serving language of choice to perform any calculations beforehand, and keep your SQL code portable.
Personally, I try to keep the database as simple as possible: Insert, Update, Delete, without too many functions that could live in code instead. The same goes for stored procs: they should contain only tasks that are very close to the persisted data, not business logic.
I would put the MD5 outside. This lets me keep this "data manipulation" outside the storage scope of the database.
But your example is quite "easy", and I don't think it's bad to have it inside...
Use your database as a means of persisting data and maintaining data integrity. And leave business logic outside of it.
If you put business logic, any of it, in your database, you make it more complex to manage and maintain in the future.
I think most of the time you're going to want to leave the data manipulation to the webserver, but if you want to process data in terms of tables, relations, etc., then go for the DB.
I'm personally lobbying my company to upgrade our MySQL server to 5.0 so that I can start taking advantage of procedures (the lack of which is killing a couple of sites we administer).
Like the other answers so far, I prefer to keep all the business logic in one place. Namely, my application language. (More specifically, in the object model, if one is present, but not all code is OO.)
However, if you look around StackOverflow for (my)sql-tagged questions about whether to use inline SQL or stored procedures, you'll find that most of the people responding are strongly in favor of using stored procs whenever and wherever possible, even for the most trivial queries. You may want to check out some of those questions to see the arguments favoring the other approach.