What's faster: PHP calculation or MySQL query? - php

I have a SQL database with about 20 columns containing percentage values stored as decimals, such as 0.096303533707682.
On my website I need to get these values, multiply them by 100, and round them up, so that 0.096303533707682 is shown as 10% when the page is opened by the user.
Now my question is: is it faster/cheaper to calculate the 10% in advance and save that value to the database, so there is nothing to calculate after the query, or does it not make much difference?
Thanks for your help!

For the individual operation the way to know is: test it, and be aware that performance on both sides can vary between versions and configurations.
At the larger, system level, mind the following:
If you transfer data from the database to PHP just to do a calculation, you probably pay extra networking cost, so using SQL and calculating in the database has benefits.
Logic can be put into the database, using virtual (generated) columns, views or stored procedures/functions, so multiple applications can share the logic.
However, for performance under scale it is simpler to add a new PHP host in front of a database than to add an extra database host.
For this specific question you also have to mind:
If you have to do the calculation every time, maybe you can do it while storing the data in the first place, taking more disk space but saving calculation time (see the sketch below).
Depending on the amount of data those costs could be quite negligible, and you should rather put it where it makes logical sense. (Did you measure and see any problem at all, or are you doing premature optimization?) Is the calculation more like "data retrieval" or "business logic"? That is a subjective choice.
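To make the trade-off concrete, here is a minimal sketch of the three options. The `stats` table, its `ratio` column, and the connection details are all assumptions for the example, not anything from the question:

```php
<?php
// Hypothetical connection and schema; adjust names to your own.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Option 1: calculate at read time in SQL.
$rows = $pdo->query('SELECT id, ROUND(ratio * 100) AS pct FROM stats')->fetchAll();

// Option 2: calculate at read time in PHP.
foreach ($pdo->query('SELECT id, ratio FROM stats') as $row) {
    echo round($row['ratio'] * 100) . "%\n"; // 0.096303533707682 -> 10%
}

// Option 3 (MySQL 5.7+): precompute once per write with a stored
// generated column, so nothing is calculated after the query.
$pdo->exec('ALTER TABLE stats
            ADD COLUMN pct TINYINT AS (ROUND(ratio * 100)) STORED');
```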

Related

Which database for dealing with very large result-sets?

I am currently working on a PHP application (pre-release).
Background
We have a table in our MySQL database which is expected to grow extremely large - it would not be unusual for a single user to own 250,000 rows in this table. Each row in the table has an amount and a date, among other things.
Furthermore, this particular table is read from (and written to) very frequently - on the majority of pages. Given that each row has a date, I'm using GROUP BY date to minimise the size of the result-set returned by MySQL - rows contained in the same year can then be seen as just one total.
However, a typical page will still have a result-set of between 1,000 and 3,000 results. There are also places where many SUM()s are performed, totalling many tens - if not hundreds - of thousands of rows.
Trying MySQL
On a typical page, MySQL was usually taking around 600-900ms. Using LIMIT and offsets didn't help performance, and the data has been heavily normalised, so it doesn't seem like further normalisation would help.
To make matters worse, there are parts of the application which require the retrieval of 10,000-15,000 rows from the database. The results are then used in a calculation by PHP and formatted accordingly. Given this, the performance of MySQL wasn't acceptable.
Trying MongoDB
I have converted the table to MongoDB, and its speed is faster - it usually takes around 250ms to retrieve 2,000 documents. However, the $group command in the aggregation pipeline - needed to aggregate fields depending on the year they fall in - slows things down. Unfortunately, keeping a total and updating it whenever a document is removed/updated/inserted is also out of the question: although we can use a yearly total for some parts of the app, in other parts the calculations require that each amount falls on a specific date.
I've also considered Redis, although I think the complexity of the data is beyond what Redis was designed for.
The Final Straw
On top of all of this, speed is important, so performance is high among the priorities.
Questions:
What is the best way to store data which is frequently read/written and rapidly growing, with the knowledge that most queries will retrieve a very large result-set?
Is there another solution to the problem? I'm totally open to suggestions.
I'm a little stuck at the moment; I haven't been able to retrieve such a large result-set in an acceptable amount of time. Most datastores seem great at small retrievals - even from large amounts of data - but I haven't been able to find anything on retrieving large amounts of data from an even larger table/collection.
I only read the first two lines, but you are using aggregation (GROUP BY) and then expecting it to run in realtime?
I will say you seem new to the internals of databases - not to undermine you, but to try and help you.
The group operator in both MySQL and MongoDB works in memory. In other words, it takes whatever data structure you provide, whether an index or a document (row), and goes through each row/document, taking the field and grouping it up.
This means that you can speed it up in both MySQL and MongoDB by making sure the grouping uses an index, but still this only goes so far, even with the index housed in your direct working set (memory) in MongoDB.
In fact, using LIMIT with an OFFSET is probably just slowing things down even further, since after writing out the set MySQL then needs to query it again to get your answer.
Once done it will write out the result: MySQL will write it out to a result set (memory and IO being used here), and MongoDB will reply inline if you have not set $out, the maximum size of the inline output being 16MB (the maximum size of a document).
The final point to take away here is: aggregation is horrible.
There is no silver bullet that will save you here; some databases will attempt to boast about their speed etc., but the fact is most big aggregators use something called "pre-aggregated reports". You can find a quick introduction in the MongoDB documentation: http://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/
This means that you put the effort of aggregating and grouping onto some other process which can do it easily enough, allowing your reading thread - the one that needs to be realtime - to do its thang in realtime.
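As a rough illustration of the pre-aggregation idea (the `transactions` and `yearly_totals` table names here are invented, not from the question): keep a running yearly total in step with each raw write, so reads never have to GROUP BY at request time.

```php
<?php
// Sketch only: assumes `transactions` holds the raw rows and
// `yearly_totals` has a UNIQUE key on (user_id, year).
// $userId, $amount and $date come from the request (not shown).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->beginTransaction();

$pdo->prepare('INSERT INTO transactions (user_id, amount, created_at)
               VALUES (?, ?, ?)')
    ->execute([$userId, $amount, $date]);

// Maintain the aggregate inside the same transaction as the raw row,
// so the two can never drift apart.
$pdo->prepare('INSERT INTO yearly_totals (user_id, year, total)
               VALUES (?, YEAR(?), ?)
               ON DUPLICATE KEY UPDATE total = total + VALUES(total)')
    ->execute([$userId, $date, $amount]);

$pdo->commit();
```

The realtime read then becomes a cheap primary-key lookup on yearly_totals instead of a GROUP BY over thousands of rows.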

Setting up and configuring MySQL Database to store information about a billion unique URLs

I am creating an app that stores new information every week, consisting of ten 12-digit integers for each of millions of unique URLs. I need to extract the information for a particular week, or for a particular week range, for a given URL. I am going to use MySQL as the database.
Tip: To simplify, grouping the URLs by domain will reduce the amount of data to be processed while querying.
I need advice about structuring the database for fast querying that takes optimal processing power and disk space.
Since no-one else has had a go, here's my advice.
To make a start, ignore 'fast querying that takes optimal processing power and disk space'. Looking for that at the start won't get you anywhere. Design and create a sensible database to meet your functional requirements. Bung in random data until you've got approximately the volume you expect. Run queries against it and time them.
If your database is normalised properly, the disk space it takes will also be approximately minimised. Queries may be slow: use execution plans to see why they're slow, and add indexes to help their performance. Once you get acceptable performance, you're there.
The main point is a standard saying: don't optimise until you know you have a problem and you've measured it.
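In the spirit of that advice, here is one possible first-cut schema to measure against. Every name in it is invented: one row per URL per week, the ten 12-digit values as BIGINTs, and a primary key matching the "given URL, week range" access pattern.

```php
<?php
// Sketch only; benchmark it at realistic volumes before trusting it.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->exec('CREATE TABLE url_week_stats (
    url_id INT UNSIGNED NOT NULL,       -- points into a urls lookup table
    week   SMALLINT UNSIGNED NOT NULL,  -- e.g. weeks since some epoch
    v1 BIGINT NOT NULL,                 -- 12-digit values fit in BIGINT
    v2 BIGINT NOT NULL,                 -- ... v3 through v10 likewise
    PRIMARY KEY (url_id, week)
)');

// Time the real queries and read their execution plans:
$plan = $pdo->query('EXPLAIN SELECT * FROM url_week_stats
                     WHERE url_id = 42 AND week BETWEEN 2700 AND 2710')
            ->fetchAll();
print_r($plan);
```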

Is there any advantage to calculating the duration in MySQL as opposed to calculating duration in PHP (and then storing in MySQL)?

QUESTION: Is there any advantage to calculating the duration in MySQL as opposed to calculating duration in PHP (and then storing in MySQL)?
I had intended to calculate the duration each time an activity is done. The duration would be calculated in PHP and then inserted into a MySQL DB (along with other data such as start time, end time, user, activity, etc.).
But, based on this question, Database normalization: How can I tabulate the data?, I got the impression that rather than recording the duration at insert time, I should calculate it from the start and end values saved in the MySQL DB.
Does this sound right? If yes, can someone explain why? Are there any recommended ways of calculating duration for values stored in MySQL?
EDIT:
After a user completes an activity online, the start and finish times for that activity are inserted into the DB. I was going to use these values to calculate the duration, either in MySQL or prior to insertion using PHP. The duration would later be used for other calculations.
I assume you have a start_time and an end_time as the basis for your duration, both of which will be stored in the database anyway? Then yes, there's hardly an advantage to storing the duration in the database as well. It's only a duplicate of data you are already storing (duration = end - start, which is a really quick calculation), so why store it again? Furthermore, it only allows the data to go out of sync. Say some bug causes the end_time to be updated, but not the duration: now you have inconsistent data with no real way to know which is correct.
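For illustration, a computed-on-read query along those lines, assuming DATETIME columns start_time/end_time in a hypothetical activities table:

```php
<?php
// The duration is derived from the timestamps on every read, so it
// can never drift out of sync with them. (All names are assumptions.)
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->query(
    'SELECT user_id, start_time, end_time,
            TIMESTAMPDIFF(SECOND, start_time, end_time) AS duration_s
     FROM activities');

foreach ($stmt as $row) {
    echo $row['user_id'] . ': ' . $row['duration_s'] . " seconds\n";
}
```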
I think it depends on the size of the database, server load, etc. I have had instances where processing in PHP was faster, and other times where processing in MySQL was. There are lots of factors that can affect performance.
However, the thing to keep in mind is that you want to avoid multiple database calls. If you try this in PHP and loop through each record, doing an update per record, the number of MySQL calls could hinder performance. If you calculate the duration in PHP prior to the insert, then it makes sense; if the data is already in the database, then a single update statement would probably be the best option.
Just my 2c
In my opinion this depends mostly on the situation, so maybe add a little more detail to your post so we can better understand what you're aiming at.
- If your program does a lot of database-related actions, the database server is slower than your PHP server, and it's a matter of thousands and thousands of calculations, it may be better to calculate this in your PHP code.
- If your program doesn't leave the database alone very much, and your code is already doing a lot of work, then it may be slightly better to let the database do the job.
- If you've already stored start and end time in your table, storing the duration as well is usually unnecessary overhead (though it could be done anyway to improve performance, if database space isn't an issue).
But, taking all of this into consideration, I don't think this decision is critical for most applications; it is most likely more a question of personal flavour and preference.
I think it would be better to keep the two separate fields in MySQL rather than calculating the duration in PHP.
And the reasons:
While it may be true that MySQL will have to calculate it upon every retrieval, it is also true that MySQL is very good at this. With a well-made index, this should have no negative performance side-effects.
It gives you more data to work with. Let's say you want to find out when users finished a particular action. If you kept only the duration, you would have to calculate the time again, making it prone to errors. Keeping the other date may come in handy.
The same is true if you want to calculate some difference between the activities of multiple users. In that case a precalculated value would be a pain in the a*s, since it would force you into more reverse calculations.
So in my opinion: keep the separate fields. It is not a normalization problem, since you are not duplicating any data; a stored duration, however, would be.

Database Size Vs PHP Processing Speed

Currently within my application I have a setup where, upon submission of data, PHP processes the data using several functions and then places it in the database. For example:
The quick brown fox jumps over the lazy dog
Becomes an SEO Title of
the-quick-brown-fox-jumps-over-the-lazy-dog
This string is stored in my database under title_seo, but my question is as follows.
What is more important:
The size of the database for storing these extra parsed strings
Or the resources used converting them
Now when I say "the resources used converting them", I mean that if I were to remove the column from the database, I would then have to parse the general title every time I output the contents.
Obviously, when parsing the content every time it gets called, PHP usage increases while database size decreases.
What should I be more worried about?
Neither of them.
In this case the computational cost is minimal. But storing the seo_title in your table allows you to change the URL of your article title to whatever you want.
So I would keep the title_seo in the DB.
Relatively speaking, hard drive space is considered cheaper than processing time. Therefore, spending the processing time to convert the title to the SEO title only once, and storing both of them in the database, is the best option.
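For reference, a minimal sketch of that "convert once, store both" approach. The slugify helper and the articles table are made up for the example:

```php
<?php
// Naive slug helper: lowercase, then collapse everything that isn't
// a-z or 0-9 into single dashes.
function slugify(string $title): string
{
    $slug = strtolower(trim($title));
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$title = 'The quick brown fox jumps over the lazy dog';

// Pay the (tiny) parsing cost exactly once, at write time.
$pdo->prepare('INSERT INTO articles (title, title_seo) VALUES (?, ?)')
    ->execute([$title, slugify($title)]);
// title_seo now holds "the-quick-brown-fox-jumps-over-the-lazy-dog"
```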
I don't have much to add to #yes123's answer, but in the end the whole idea is that you should see whether you can store more data in the database to avoid unwanted calculations. Don't take it as a rule, but I mostly favour storing more data in the DB over making more calculations.
In your case, the calculation to convert a string into an SEO string looks quite simple, so it wouldn't matter much; but sometimes you have a table with things like prices, unit quantities, discounts and so on, and then it's better to calculate the price when adding the rows than to calculate it every time you want to display it.
Hope this helps!
This is a question that is often answered with "it depends on your own needs". Any software must strike a balance between computing power and memory used.
However, as many people say these days, "disk space is cheap".
So, going back to square one, ask yourself if you're going to have lots of data and where you are going to store it (your own server, Amazon S3, etc.), but I'll go with the "store only once" option.
If you have millions of pages, keeping seo_title in the DB is a bad idea; better to have it as a cached text-only value.
If you have fewer than 10 million pages, but more than 1 million, the DB will maybe need a separate table (or tables) for the SEO title.
If you have fewer records, there is no difference at all.
PS. I'm thinking in terms of site size corresponding to visitor numbers.

Speed of calculations in SQL statement

I've got a database (MySQL) table with three fields: id, score, and percent.
Long story short, I need to do a calculation on each record that looks like this:
(Score * 10) / (1 - percent) = Value
And then I need to use that value both in my code and as the ORDER BY field. Writing the SQL isn't my issue - I'm just worried about the efficiency of this statement. Is doing that calculation in my SQL statement the most efficient use of resources, or would I be better off grabbing the data and then doing math via PHP?
If SQL is the best way to do it, are there any tips I can keep in mind for keeping my SQL pulls as speedy as possible?
Update 1: Just to clear some things up, because it seems like many of the answers are assuming differently: both the score and the percent will be changing constantly. Actually, those fields change just about every time a user interacts with the app (the fields are actually linked to a user, btw).
As far as the number of records: right now it's very small, but I would like to scale to a target set of about 2 million records (users). At any given time I will only need 20 or so records, but I need them to be the top 20 records sorted by this calculated value.
It sounds like this calculated value has inherent meaning in your business domain; if this is the case, I would calculate it once (e.g. at the time the record is created or updated) and use it just like any normal field. This is by far the most efficient way to achieve what you want - the extra calculation on insert or update has minimal performance impact, and from then on you don't have to worry about who does the calculation where.
The drawback is that you do have to update your insert and update logic to perform this calculation. I don't usually like triggers - they can be the source of impenetrable bugs - but this is a case where I'd consider them (http://dev.mysql.com/doc/refman/5.0/en/triggers.html).
If for some reason you can't do that, I'd suggest doing it on the database server. This should be pretty snappy, unless you are dealing with very large numbers of records; in that case the "order by" will be a real performance problem. It will be a far bigger performance problem if you execute the same logic on the PHP side, of course - but your database tends to be the bottleneck from a performance point of view, so the impact is larger.
If you're dealing with large numbers of records, you may just have to bite the bullet and go with my first suggestion.
If it weren't for the need to sort by the calculation, you could also do this on the PHP side; however, sorting an array in PHP is not something I'd want to do for large result sets, and it seems wasteful not to do sorting in the database (which is good at that kinda thing).
So, after all that, my actual advice boils down to:
- do the simplest thing that could work
- test whether it's fast enough within the constraints of your project
- if not, iteratively refactor to a faster solution and re-test
- once you reach "good enough", move on
Based on edit 1:
You've answered your own question, I think - returning (eventually) 2 million rows to PHP, only to find the top 20 records (after calculating their "value" one by one) will be incredibly slow. So calculating in PHP is really not an option.
So, you're going to be calculating it on the server. My recommendation would be to create a view (http://dev.mysql.com/doc/refman/5.0/en/create-view.html) which has the SQL to perform the calculation; benchmark the performance of the view with 200, 200K and 2M records, and see if it's quick enough.
If it isn't quick enough at 2M users/records, you can always create a regular table, with an index on your "value" column, and relatively little needs to change in your client code; you could populate the new table through triggers, and the client code might never know what happened.
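To make that concrete, here is a sketch of the view approach. The column names follow the question; the table name (scores) and view name are invented:

```php
<?php
// A view that owns the calculation, so the PHP code and the ORDER BY
// both see a plain `value` column.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->exec('CREATE VIEW user_scores AS
            SELECT id, score, percent,
                   (score * 10) / (1 - percent) AS value
            FROM scores');

// The top-20 query from the question. Note MySQL cannot use an index
// for this ORDER BY, because the sort key is computed per row -
// benchmark at the 2M-row target before settling on this design.
$top = $pdo->query('SELECT id, value FROM user_scores
                    ORDER BY value DESC LIMIT 20')
           ->fetchAll();
```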
Doing the math in the database will be more efficient, because sending the data back and forth between the database and the client will be slower than that simple expression, no matter how fast the client is and how slow the database is.
Test it out and let us know the performance results. I think it is going to depend on the volume of data in your result set. For the SQL bit, just make sure your WHERE clause has a covering index.
Where you do the math shouldn't be too important. It's the same fundamental operation either way. Now, if MySQL is running on a different server than your PHP code, then you may care which CPU does the calculation. You may wish the SQL server to do more of the "hard work", or you may wish to leave the SQL server doing "only SQL" and move the math logic to PHP.
Another consideration might be bandwidth usage (if MySQL isn't running on the same machine as PHP) - you may wish to have MySQL return whichever form is shorter, to use less network bandwidth.
If they're both on the same physical hardware, though, it probably makes no noticeable difference from a sheer CPU usage standpoint.
One tip I would offer is to do the ORDER BY on the raw value (percent) rather than on the calculated value - this way MySQL can use an index on the percent column; it can't use indexes on calculated values.
If you have a growing number of records, your script (and its memory) will reach its limits faster than MySQL would. Are you planning to fetch all records anyway?
MySQL would be quicker in general.
I don't get how you would use a value calculated in PHP in an ORDER BY afterwards. If you are planning to sort in PHP, it would be even slower, but it all depends on the number of records you're dealing with.
