I will try to make this question as clear as I can, but bear in mind that English is not my first language. I have a web application written in PHP using a MySQL database. A table might have thousands of entries, and in each entry I am storing this data:
$hourly_rate,
$minutes
When I process my table in a loop, I calculate the net value using the following formula:
$net_value = $minutes*($hourly_rate/60);
Now the question: should I instead add a $net_value field to my table, calculate the net value on the client side using jQuery, and then upload the result of the calculation into the $net_value field? Which do you think is the best approach, considering I might have 1000 users accessing the system at the same time?
Thank you for your help,
Donato
It depends on how important the value is. If it's accessed all the time and by a lot of people it may be worth storing in the database.
But I don't suggest using jQuery to do the calculation, do it server-side for better security.
Generally I wouldn't store such simple calculated values in the database. Doing that calculation in PHP takes so little time that it isn't even worth thinking about.
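A minimal sketch of doing this calculation server-side in PHP, using the formula from the question. The function name netValue, the sample rows, and the rounding to two decimals are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper: net value for one time entry, computed on the server.
function netValue(int $minutes, float $hourlyRate): float
{
    // Same formula as in the question; rounding to cents is an assumption.
    return round($minutes * ($hourlyRate / 60), 2);
}

// Made-up rows standing in for what a query might return.
$rows = [
    ['minutes' => 90, 'hourly_rate' => 40.0],
    ['minutes' => 30, 'hourly_rate' => 25.0],
];

foreach ($rows as $row) {
    echo netValue($row['minutes'], $row['hourly_rate']), PHP_EOL;
}
```

Keeping the formula in a single function means every page that needs the net value uses the same definition, without trusting a number sent by the browser.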
There is a good reason to store the calculation in the database. The good reason is that this calculation may be used in more than one place in the application.
I would recommend that you create a view, something like:
create view vw_rates as
select t.*, minutes*(hourly_rate/60) as net_value
from t
By putting the calculation in a view, everyone will be using the same definition. So a report that summarizes by region or time, for example, would use the same definition. In other databases, you can do the same thing using computed columns; MySQL supports these as generated columns only from version 5.7 onwards.
From a performance perspective, such a simple calculation on such a small amount of data probably does not make a difference. Do remember, though, that the database can do these types of calculations in parallel if you have multiple threads/processes.
Should I store number of comments/likes/dislikes in my image table by updating them after every new like/dislike/comment record
or
should I just query something like Vote::where('image_id', $this->id)->where('vote', true)->count(); every time I need to know the count?
This really depends on the actual case (e.g. how much data you are storing and expecting). There are also two conflicting philosophies, and both are seen in practice.
Argument one: Do not save calculable data. If you save the chain of events (e.g. all votes), you can always derive the totals from it. The drawback is that this can become slow in applications with a lot of data.
Argument two: Do save the data that you need. Depending on what kind of persistence layer you are using, it might be a good idea to save the total whenever it changes. MongoDB, for example, favours a "just store what you need" approach. Advantage: even in applications with a lot of data this performs well, because you do not have to calculate anything on the fly and can just output the number.
TL;DR: It really depends on which factors your application likely struggles more with.
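The two arguments can be sketched in plain PHP. This is only an illustration with made-up in-memory data: the $votes array stands in for a votes table, and countLikes and $likeCounters are hypothetical names, not part of any framework.

```php
<?php
// Made-up vote events: one entry per vote, as in a votes table.
$votes = [
    ['image_id' => 1, 'vote' => true],
    ['image_id' => 1, 'vote' => false],
    ['image_id' => 1, 'vote' => true],
    ['image_id' => 2, 'vote' => true],
];

// Argument one: derive the total from the events every time it is needed.
function countLikes(array $votes, int $imageId): int
{
    $matching = array_filter(
        $votes,
        fn (array $v): bool => $v['image_id'] === $imageId && $v['vote'] === true
    );
    return count($matching);
}

// Argument two: keep a stored counter and update it on every new vote.
$likeCounters = [1 => 0, 2 => 0];
foreach ($votes as $v) {
    if ($v['vote']) {
        $likeCounters[$v['image_id']]++;
    }
}
```

The first approach can never disagree with the events but scans them on every read; the second reads in constant time but must be updated on every write.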
I am creating a very simple web site, where you can go and register your workouts, so other people can see it and comment.
The problem is that if, for example, you train 6 days a week and do 10 exercises, I will have to store 60 exercises in MySQL.
I was thinking about creating a table with 60 rows and storing all that info in there, but for some reason this does not seem to be the best way.
So what should I do? I was looking here in Stack Overflow, and I saw something about storing this using an array and serializing it using PHP, but I'm not really sure about that.
You should consider what a database is before worrying about this kind of problem. It is common for MySQL to handle millions of entries. It is not likely you will exceed this in a normal scenario. If you did get a lot of traffic and your database grew to a point where expansion and upgrading made sense, you could seek advice then. For now, MySQL is going to be your champion.
Serialization has its uses, but when you want to individually examine results, you will want to go with a relational design. That way you can store every bit of info on a specific workout session and give better stats to the user. After all, stats are addictive. Many users on this site keep coming back to build stats.
The way I read your question was that you thought storing 60 workouts individually was a lot of data, that you have a form with many fields, and that you want to know how to store those fields in a database in a way that makes sense, not just for you but for anyone who comes in and registers. Nevertheless, this is still a relatively small task for any database. A relational design is definitely the way to go either way.
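A rough sketch of why one row per exercise pays off. The column names (user_id, day, name, sets, reps) and the data are assumptions for illustration only; in a real application these rows would come from a MySQL table and the aggregation could be done with GROUP BY.

```php
<?php
// Made-up rows: one row per exercise, as they might sit in a relational table.
$exercises = [
    ['user_id' => 7, 'day' => 'Mon', 'name' => 'Squat', 'sets' => 5, 'reps' => 5],
    ['user_id' => 7, 'day' => 'Mon', 'name' => 'Bench press', 'sets' => 3, 'reps' => 8],
    ['user_id' => 7, 'day' => 'Wed', 'name' => 'Squat', 'sets' => 5, 'reps' => 5],
];

// Because each exercise is its own row, per-exercise stats are a simple aggregation.
$totalReps = [];
foreach ($exercises as $e) {
    $totalReps[$e['name']] = ($totalReps[$e['name']] ?? 0) + $e['sets'] * $e['reps'];
}
```

With a serialized array stored in a single field, the same statistic would require loading and unserializing every user's blob first.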
You should first decide logically what you want to store, and not how you store it.
Once you know what to store, you should normalize the data to remove redundancy. This is usually done by bringing the data into third normal form (3NF). See http://en.m.wikipedia.org/wiki/Database_normalization
That way you can be sure that all the required data is captured.
I suggest using the database in any case, simply because you are "storing" data. Array serialization is not really storing data; it is more like storing a piece of a process for later use.
With the database you can do many more things. You might not be able to imagine them right now, but if one day you have a new plan for using the data you collected, you will have more room for expansion.
QUESTION: Is there any advantage to calculating the duration in MySQL as opposed to calculating duration in PHP (and then storing in MySQL)?
I had intended on calculating duration for each time an activity is done. Duration would be calculated in PHP then inserted into a MySQL DB (along with other data such as start time, end time, user, activity, etc).
But, based on the question "Database normalization: How can I tabulate the data?", I got the impression that rather than record duration at the time of insert, I should calculate it based on the start and end values saved in the MySQL DB.
Does this sound right? If yes, can someone explain why? Are there any recommended ways of calculating duration for values stored in MySQL?
EDIT:
After a user completes an activity online, the start and finish times for that activity are inserted into the DB. I was going to use these values to calculate duration, either in MySQL or prior to insertion using PHP. Duration would later be used for other calculations.
I assume you have a start_time and an end_time as basis for your duration, both of which will be stored in the database anyway? Then yes, there's hardly an advantage to storing the duration in the database as well. It's only duplicated data that you are storing already anyway (duration = end - start, which is a really quick calculation), so why store it again? Furthermore, that only allows for the data to go out of sync. Say some bug causes the end_time to be updated, but not the duration. Now you have inconsistent data with no real way to know which is correct.
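As a sketch of how cheap the calculation is, here is duration = end - start in PHP. The function name durationSeconds is hypothetical, and treating the stored values as UTC is an assumption; on the MySQL side, TIMESTAMPDIFF(SECOND, start_time, end_time) expresses the same idea.

```php
<?php
// Hypothetical helper: duration in seconds derived from the stored start and end times.
function durationSeconds(string $startTime, string $endTime): int
{
    // Assumption: both values are stored as UTC date-time strings.
    $utc = new DateTimeZone('UTC');
    $start = new DateTimeImmutable($startTime, $utc);
    $end = new DateTimeImmutable($endTime, $utc);

    return $end->getTimestamp() - $start->getTimestamp();
}

echo durationSeconds('2024-01-15 09:00:00', '2024-01-15 09:45:30'), PHP_EOL;
```

Because the duration is always derived from the two stored values, it cannot drift out of sync with them.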
I think that it depends on the size of the database, server load, etc. I have had instances where processing in PHP was faster, and others where processing in MySQL was faster. There are lots of factors that could affect performance.
However, the thing to keep in mind is that you want to avoid multiple database calls. If you are going to try this in PHP, looping through each record and doing an update per record, I think that the number of MySQL calls could hinder performance. However, if you calculate the duration in PHP prior to the insert, then it makes sense. If the data is already in the database, then perhaps a single update statement would be the best option.
Just my 2c
In my opinion this depends mostly on the situation, so maybe add a little more detail to your post so we can better understand what you're aiming at.
If your program performs a lot of database-related actions, the database server is slower than your PHP server, and you are dealing with thousands and thousands of calculations, it may be better to calculate this in your PHP code.
If your program leaves the database mostly alone and your code is already doing a lot of work, then it may be slightly better to let the database do the job.
If you have already stored the start and end times in your table, storing the duration as well is usually unnecessary overhead (though it could still be done to improve performance if database space isn't an issue).
But, taking all of this into consideration, I don't think this decision is critical for most applications. It is most likely more a question of personal taste and preference.
I think it would be better to create 2 separate fields in MySQL rather than calculate the duration in PHP.
And the reasons:
While it may be true that MySQL will have to calculate it upon every retrieval, it is also true that MySQL is very good at this. With a well-made index, this should have no negative performance side effects.
It gives you more data to work with. Let's say you want to find out when users finished a particular action. If you kept only the duration, you would have to calculate this time again, which is prone to errors. Keeping another date may come in handy.
The same is true if you want to calculate some difference between the activities of multiple users. In this case, a pre-calculated value would be a pain, since it would force you to do more reverse calculations.
So in my opinion - add the separate fields. It is not a normalization problem, since you are not duplicating any data. Storing the duration, however, would.
I've got a database (MySQL) table with three fields : id, score, and percent.
Long story short, I need to do a calculation on each record that looks like this:
(Score * 10) / (1 - percent) = Value
And then I need to use that value both in my code and as the ORDER BY field. Writing the SQL isn't my issue - I'm just worried about the efficiency of this statement. Is doing that calculation in my SQL statement the most efficient use of resources, or would I be better off grabbing the data and then doing math via PHP?
If SQL is the best way to do it, are there any tips I can keep in mind for keeping my SQL pulls as speedy as possible?
Update 1: Just to clear some things up, because it seems like many of the answers are assuming differently : Both the Score and the Percent will be changing constantly. Actually, just about every time a user interacts with the app, those fields will change (those fields are actually linked to a user, btw).
As far as # of records, right now it's very small, but I would like to be scaling for a target set of about 2 million records (users). At any given time I will only need 20ish records, but I need them to be the top 20 records sorted by this calculated value.
It sounds like this calculated value is of inherent meaning in your business domain; if this is the case, I would calculate it once (e.g. at the time the record is created), and use it just like any normal field. This is by far the most efficient way to achieve what you want - the extra calculation on insert or update has minimal performance impact, and from then on you don't have to worry about who does the calculation where.
The drawback is that you do have to update your "insert" and "update" logic to perform this calculation. I don't usually like triggers - they can be the source of impenetrable bugs - but this is a case where I'd consider them (http://dev.mysql.com/doc/refman/5.0/en/triggers.html).
If for some reason you can't do that, I'd suggest doing it on the database server. This should be pretty snappy, unless you are dealing with very large numbers of records; in that case the "order by" will be a real performance problem. It will be a far bigger performance problem if you execute the same logic on the PHP side, of course - but your database tends to be the bottleneck from a performance point of view, so the impact is larger.
If you're dealing with large numbers of records, you may just have to bite the bullet and go with my first suggestion.
If it weren't for the need to sort by the calculation, you could also do this on the PHP side; however, sorting an array in PHP is not something I'd want to do for large result sets, and it seems wasteful not to do sorting in the database (which is good at that kinda thing).
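For a small result set, the PHP-side variant might look like the sketch below. The function name computedValue, the sample rows, and the guard against percent = 1 are assumptions for illustration; the formula itself is the one from the question.

```php
<?php
// Hypothetical helper implementing (score * 10) / (1 - percent) from the question.
function computedValue(float $score, float $percent): float
{
    if ($percent >= 1.0) {
        // Assumption: treat percent >= 1 as invalid to avoid division by zero.
        throw new InvalidArgumentException('percent must be below 1');
    }
    return ($score * 10) / (1 - $percent);
}

// Made-up rows standing in for a small result set.
$rows = [
    ['id' => 1, 'score' => 50.0, 'percent' => 0.5],
    ['id' => 2, 'score' => 80.0, 'percent' => 0.75],
    ['id' => 3, 'score' => 10.0, 'percent' => 0.0],
];

foreach ($rows as &$row) {
    $row['value'] = computedValue($row['score'], $row['percent']);
}
unset($row); // break the reference left behind by the foreach

// Sort descending by the computed value and keep the top 2.
usort($rows, fn (array $a, array $b): int => $b['value'] <=> $a['value']);
$top = array_slice($rows, 0, 2);
```

This only works if every candidate row is fetched first, which is why it stops being practical once the table grows large.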
So, after all that, my actual advice boils down to:
do the simplest thing that could work
test whether it's fast enough within the constraints of your project
if not, iteratively refactor to a faster solution, re-test
once you reach "good enough", move on.
Based on edit 1:
You've answered your own question, I think - returning (eventually) 2 million rows to PHP, only to find the top 20 records (after calculating their "value" one by one) will be incredibly slow. So calculating in PHP is really not an option.
So, you're going to be calculating it on the server. My recommendation would be to create a view (http://dev.mysql.com/doc/refman/5.0/en/create-view.html) which has the SQL to perform the calculation; benchmark the performance of the view with 200, 200K and 2M records, and see if it's quick enough.
If it isn't quick enough at 2M users/records, you can always create a regular table, with an index on your "value" column, and relatively little needs to change in your client code; you could populate the new table through triggers, and the client code might never know what happened.
Doing the math in the database will be more efficient, because sending the data back and forth between the database and the client will be slower than that simple expression, no matter how fast the client is and how slow the database is.
Test it out and let us know the performance results. I think it is going to depend on the volume of data in your result set. For the SQL bit, just make sure your where clause is served by a covering index.
Where you do the math shouldn't be too important. It's the same fundamental operation either way. Now, if MySQL is running on a different server than your PHP code, then you may care which CPU does the calculation. You may wish that the SQL server does more of the "hard work", or you may wish to leave the SQL server doing "only SQL", and move the math logic to PHP.
Another consideration might be bandwidth usage (if MySQL isn't running on the same machine as PHP)--you may wish to have MySQL return whichever form is shorter, to use less network bandwidth.
If they're both on the same physical hardware, though, it probably makes no noticeable difference, from a sheer CPU usage standpoint.
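One tip I would offer is to do the ORDER BY on a raw column (percent) rather than on the calculated value, where that gives an acceptable order--this way MySQL can use an index on the percent column. It can't use an index on an expression computed in the query, although MySQL 5.7 and later can index a generated column that holds the expression.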
If you have a growing number of records, your script (and its memory) will reach its limits faster than MySQL would. Are you planning to fetch all records anyway?
MySQL would be quicker in general.
I don't get how you would use the value calculated in PHP in an ORDER BY afterwards. If you are planning to sort in PHP, it would become even slower, but it all depends on the number of records you're dealing with.
I am confronted with a new kind of problem which I haven't encountered yet in my very young programming "career" and would like to know your opinion about how to tackle it best.
The situation
A research application (PHP/MySQL) gathers stress-related health data from users. A user gets an analysis after filling in the questionnaire. The value for each parameter is transformed into a percentile value using a benchmark (mean and standard deviation of the existing data set).
The task
Since more and more people are filling in the questionnaire, there is the potential to make the benchmark values (mean/SD) more accurate by recalculating them using the new user data. I would like the database to regularly run a script that updates the benchmark values.
The question
I've never used stored procedures so far and I only have a slight notion of what they are, but somehow I have a feeling they could maybe help me with this? Or should I write the script in PHP and then set up a cron job?
[edit]After the first couple of answers it looks like cron is clearly the way to go.[/edit]
What you're considering could be done in a number of ways.
1. You could set up a trigger in your DB to recalculate the values whenever a record is inserted or updated. You could store the code needed to update the values in a sproc if necessary.
2. You could write a PHP script and run it regularly via cron.
#1 will slow down inserts to your database but will make sure your data is always up to date. #2 may lock the tables while it updates the new values, and your data will only be accurate until the next update. #2 is much easier to back up, as the script can easily be stored in your versioning system, whereas you'd need to store the trigger and sproc creation scripts in whatever backup you'd make.
Obviously you'll have to weigh up your requirements before you pick a method.
PHP set up as a cron job lets you keep it in your source code management system, and if you're using a database abstraction layer it'll be portable to other databases if you ever decide to switch. For those reasons, I tend to go with scripts over stored procedures.
The easiest way to make this work is probably to write a script in the same language your website is using (sounds like PHP) and call it from cron.
No need to make it more complicated than it needs to be by putting the logic in two places (your existing calculations and a stored procedure).
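The core of such a cron script could be as small as the sketch below. The function name benchmark and the sample scores are made up, and using the sample standard deviation (dividing by n - 1) rather than the population one is an assumption; loading the values from MySQL and writing the result back are left out.

```php
<?php
// Hypothetical helper: mean and sample standard deviation for one parameter.
function benchmark(array $values): array
{
    $n = count($values);
    if ($n < 2) {
        throw new InvalidArgumentException('need at least two values');
    }

    $mean = array_sum($values) / $n;

    // Sum of squared deviations from the mean.
    $sumSq = 0.0;
    foreach ($values as $v) {
        $sumSq += ($v - $mean) ** 2;
    }

    // Assumption: sample SD (n - 1 in the denominator).
    return ['mean' => $mean, 'sd' => sqrt($sumSq / ($n - 1))];
}

// Made-up questionnaire scores for a single parameter.
$scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
$b = benchmark($scores);
```

Run from cron, a script like this would recompute the benchmark for each parameter and store the two numbers where the percentile calculation reads them.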
If the volume of data is big enough that calculating it on the fly is too much, then either:
Cron job with php script to denormalise the totals
Trigger on inserts that increments totals
Go with the cron job. Simple, solid, works. In the PHP/MySQL world I would say stored procedures are a no-go.