Are date calculations faster in PHP or MySQL?

A while back a database administrator mentioned to me that some server-side programmers don't use SQL as often as they should. For instance, when it comes to making time-based calculations, he claims that SQL is better suited.
I didn't give that much consideration since it didn't really affect what I was doing. However, now I am making considerable time-based calculations. Typically, I have used PHP for this in the past. For the sake of performance, I am curious as to whether SQL would be more efficient.
For example, these are some of the tasks I have been doing:
$todaysDate = date("d-m-Y");
$todayStamp = strtotime($todaysDate); // Convert date to Unix timestamp for comparison
$verifyStamp = strtotime($verifyDate); // Convert submitted date to Unix timestamp for comparison
// The date comparison
if ((strtotime($lbp) <= $verifyStamp) && ($verifyStamp <= $todayStamp)) {
    return true;
} else {
    $invalid = "$verifyDate is outside the valid date range: $lbp - $todaysDate.";
    return $invalid;
}
The variables aren't that important - it's just to illustrate that I am making comparisons, adding time to current dates, etc.
Would it be beneficial if I were to translate some or all of these tasks to SQL? Note that my connection to my database is via PDO and that I usually have to create a new connection. Also, the results of my date calculations will typically be inserted into a database. So when I say that I'm making comparisons or adding time to a current date, I mean that I'm making these calculations before adding whatever results from them to a query:
i.e. $result = something...INSERT INTO table VALUE = $result
The calculations could just as easily be INSERT INTO table VALUE = DATE_ADD(...
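The two options can be sketched side by side. This is only an illustration under assumptions: the table `events` and its DATETIME column `expires_at` are hypothetical names that do not appear in the question, and the day count is assumed to be non-negative.

```php
<?php
// Option 1: compute the date in PHP and bind the finished value.
function expiryComputedInPhp(DateTimeImmutable $now, int $days): array
{
    // Assumes $days >= 0; "P7D" is an ISO 8601 period of seven days.
    $expires = $now->add(new DateInterval("P{$days}D"));
    return [
        'sql'    => 'INSERT INTO events (expires_at) VALUES (:expires_at)',
        'params' => [':expires_at' => $expires->format('Y-m-d H:i:s')],
    ];
}

// Option 2: let MySQL do the arithmetic with DATE_ADD(); only the day count is bound.
function expiryComputedInSql(int $days): array
{
    return [
        'sql'    => 'INSERT INTO events (expires_at) VALUES (DATE_ADD(NOW(), INTERVAL :days DAY))',
        'params' => [':days' => $days],
    ];
}
```

Either array could then be run with `$pdo->prepare($q['sql'])->execute($q['params'])`. Both variants cost exactly one round trip to the database. The practical difference is which clock and time zone supply "now": PHP's in the first case, the database server's in the second.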
Any input is appreciated.

The overhead of talking to the database would negate any and all advantages it may or may not have. It's simple: if you're in PHP anyway, do the calculations in PHP. If the data you want to do calculations on is in the database, do it in the database. Don't switch between systems just for the sake of it unless you can really prove that it saves you a ton of time to do so (most likely it doesn't). What you're showing is child's play in either system; it will hardly get any faster than it already is.

When you compare SQL with any programming language, doing the calculations in SQL is generally preferable.
For PHP and SQL specifically, here is what I have found from my analysis.
PHP works in a client-server architecture: the client sends an HTTP request to the server, and the server responds to the client with an HTTP response.
On the server side, the dynamic PHP code generates a plain, static HTML page.
Now the total time is:
HTTP-Request + SQL-Query + Fetching data from SQL Query + Data Manipulation of SQL Data + Php-to-HTMLGeneration + HTTP-Response
But if the calculations are done within the SQL query itself, the time for manipulating the SQL data in PHP is saved, because PHP no longer has to process the data explicitly.
So the total time would be:
HTTP-Request + SQL-Query + Fetching data from SQL Query + Php-to-HTMLGeneration + HTTP-Response
The two may look almost equal if you are dealing with a small amount of data. But if, for instance, you are dealing with 1000 rows in one query, a PHP loop that runs 1000 times would take more time than a single query that calculates all 1000 rows in one command.

One thing to consider is how many date calculations you are performing and where in the query the conversion takes place. If you are searching a DB of 10 million records and converting a DateTime field into a Unix timestamp inside a WHERE clause for every single record, only to end up with 100 records in the query result, it would be less efficient to have SQL perform that conversion on 10 million records than to have PHP convert the DateTime value into a timestamp on only the resulting 100 records.
Granted, if you put the conversion in the SELECT list instead, only the 100 resulting records would be converted anyway, so it would be pretty much the same.
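One way to avoid converting every row is to convert the search bounds instead of the column. The following is a minimal sketch, assuming a hypothetical table `orders` with a DATETIME column `created_at` that stores UTC values; none of these names come from the original thread.

```php
<?php
// Convert the Unix-timestamp bounds to DATETIME strings once, in PHP, so the
// WHERE clause compares the bare column instead of wrapping it in a function
// call that would have to run for every row.
function dateRangeQuery(int $fromTs, int $toTs): array
{
    $fmt = 'Y-m-d H:i:s';
    return [
        'sql'    => 'SELECT id, created_at FROM orders '
                  . 'WHERE created_at >= :from AND created_at < :to',
        'params' => [
            ':from' => gmdate($fmt, $fromTs), // assumption: column holds UTC
            ':to'   => gmdate($fmt, $toTs),
        ],
    ];
}
```

With the column left bare, the database is generally able to use an index on it, which is usually not the case when the column is wrapped in a conversion function inside the WHERE clause.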

Related

php : speed up levenshtein comparing, 10k + records

In my MySQL table I have the field name, which is unique. However, the contents of the field are gathered from different places. So, because of spelling errors, it is possible that I have 2 records with very similar names instead of the second one being discarded.
Now I want to find those entries that are very similar to another one. For that I loop through all my records, and compare the name to other entries by looping through all the records again. Problem is that there are over 15k records which takes way too much time. Is there a way to do this faster?
this is my code:
for($x=0;$x<count($serie1);$x++)
{
    for($y=0;$y<count($serie2);$y++)
    {
        $sim=levenshtein($serie1[$x]['naam'],$serie2[$y]['naam']);
        if($sim==1)
            print("{$serie1[$x]['naam']} --> {$serie2[$y]['naam']} = {$sim}<br>");
    }
}
A preamble: such a task will always be time consuming, and there will always be some pairs that slip through.
Nevertheless, a few ideas :
1. actually, the algorithm can be (a bit) improved
Assuming that $serie1 and $serie2 have the same values in the same order, you don't need to loop over the whole second array in the inner loop every time. In this use case you only need to evaluate each value pair once: levenshtein('a', 'b') is sufficient, you don't need levenshtein('b', 'a') as well (and neither do you need levenshtein('a', 'a')).
under these assumptions, you can write your function like this:
for($x=0;$x<count($serie1);$x++)
{
    for($y=$x+1;$y<count($serie2);$y++) // <-- $y doesn't need to start at 0
    {
        $sim=levenshtein($serie1[$x]['naam'],$serie2[$y]['naam']);
        if($sim==1)
            print("{$serie1[$x]['naam']} --> {$serie2[$y]['naam']} = {$sim}<br>");
    }
}
2. maybe MySQL is faster
There are examples on the net of levenshtein() implementations as a MySQL function. An example on SO is here: How to add levenshtein function in mysql?
If you are comfortable with complex(ish) SQL, you could delegate the heavy lifting to MySQL and at least gain a bit of performance because you aren't fetching the whole 16k rows into the PHP runtime.
3. don't do everything at once / save your results
Of course you have to run the function once for every record, but after the initial run you only have to check the entries added since the last run. Schedule a cron job that checks all new records once every day/week/month. You would need an inserted_at column in your table and would still need to compare the new names with every other name entry.
3.5 do some of the work onInsert
a) If the wait is acceptable, do the check whenever a new record is about to be inserted, so that you can either write the result to a log or give direct feedback to the user. (A tangent: this could be a good use case for an asynchronous task queue like http://gearman.org/ -> start a new process for the check in the background, return with the success message for the insert immediately.)
b) PHP has two other functions to help with searching for nearly identical strings: metaphone() and soundex(). These functions generate abstract hashes that represent how a string sounds when spoken. You could generate one or both of these hashes on each insert, store them as separate fields in your table, and use simple SQL functions to find records with similar hashes.
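The phonetic-hash idea in b) can be sketched in plain PHP as follows. This is a rough illustration rather than a tuned solution: the function name `groupByPhoneticKey` is made up for this example, and it groups in memory instead of using a stored column.

```php
<?php
// Group names by a phonetic key so that only names sharing a key need a
// pairwise levenshtein() check. soundex() is used here; metaphone() would
// slot in the same way.
function groupByPhoneticKey(array $names): array
{
    $buckets = [];
    foreach ($names as $name) {
        $buckets[soundex($name)][] = $name;
    }
    // Keep only keys shared by at least two names: these are the candidates.
    return array_filter($buckets, fn(array $group) => count($group) > 1);
}
```

Comparing only within each bucket replaces one large all-pairs pass with many small ones. The trade-off is that a typo which changes the phonetic key (for example in the first letter) will not be caught this way.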
The trouble with levenshtein is it only compares string a to string b. I built a spelling corrector once that puts all the strings a into a big trie, and that functioned as a dictionary. Then it would look up any string b in that dictionary, finding all nearest-matching words. I did it first in Fortran (!), then in Pascal. It would be easiest in a more modern language, but I suspect php would not make it easy. Look here.

Why to use TIMESTAMP instead of INT value? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What are the pros and cons of the various date/time field types in MySQL?
In many databases I have seen the TIMESTAMP type used to store time values. My question is: why not use an INT column to store the $date->getTimestamp() value, so that we could get the time value more easily?
Because when you treat a date as a date, and not as a number, you can do neat stuff like adding durations (DATE+1MONTH-1HOUR). TIMESTAMP, DATETIME etc are also optimized for dates, and will do native validation for you.
There can be many reasons. I'd say the most obvious (and straightforward) one is that the database knows that the value is a TIMESTAMP (so date/time related), which is not the case for an INT.
This has several consequences. For example, MySQL is aware of time zones and automatically converts TIMESTAMP values to UTC for storage. That makes the data much more concrete, because it is clear what the data means. With an INT column you would have to take care of that on your own, and the meaning of the value would no longer be known to the database.
The next big difference is automatic initialization and updating. That means a TIMESTAMP column can get "stamped" with the current time automatically whenever the row is inserted or changed.
There are several other differences between these types as well; most of them are related to date/time functions. I suggest you dig into:
11.3.1. The DATE, DATETIME, and TIMESTAMP Types
You can do many more things in the database when using a timestamp instead of a plain number. Query by day of week, group by month, determine intervals, etc...
An int is only 4 bytes, while a datetime is 8 bytes, so you'd have fewer possible values. In particular, from PHP you are getting Unix timestamps, which as a signed 32-bit value have a minimum date of 1901-12-13 and a maximum of 2038-01-19. This is essentially going back and making the same sort of decisions that led to the Y2K problem. Assuming you can live with that you should be okay, but what about non-Unix-based hosts?
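The two boundary dates can be checked directly. This is a small sketch, assuming a 64-bit PHP build so that both limits fit in a native integer; the function name `int32DateBounds` is made up for the example.

```php
<?php
// Render the smallest and largest signed 32-bit integers as UTC dates.
function int32DateBounds(): array
{
    $min32 = -2147483648; // -(2^31)
    $max32 = 2147483647;  // 2^31 - 1
    return [
        'min' => gmdate('Y-m-d H:i:s', $min32),
        'max' => gmdate('Y-m-d H:i:s', $max32),
    ];
}

print_r(int32DateBounds());
// min => 1901-12-13 20:45:52
// max => 2038-01-19 03:14:07
```

Note that MySQL documents the same upper bound for its own TIMESTAMP type, so that ceiling applies to TIMESTAMP columns as well as to signed 4-byte integers; DATETIME has a much wider range.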
Because actual representation of data does not have to be exposed. Why?
flexibility (internal representation of data may change any time and user won't depend on it).
reliability (database may check data for consistency if it knows what the data is)
readability (there's no reason to treat a timestamp as integer, it shows the meaning of a record)
The reason there are different number types (such as timestamp), is to provide data integrity.
Data Integrity makes sure that we don't accidentally put in the number of waffles we had for breakfast :)
If we try to put in an invalid timestamp, MySQL will throw an error and prevent us from putting in bad data.

Which is most suitable datatype for storing time and date

I have two choices of storing date and time in my database.
Generate the time & date with the time function in PHP and store it in the database in an int column, which takes 4 bytes.
Generate the time & date during insertion into the database, in a datetime column, which takes 8 bytes.
My question is which type will make my SQL queries faster if I use date&time column for sorting.
I always hate building queries on a DB that contains human-unreadable date and time values in int format.
Maybe the query will be a nano second faster if you use int but is it really worth it? I say no!
Use a TIMESTAMP datatype. It's stored as a number, but returned formatted. So it's faster for sorting, and more human-readable.
You are better off using the native format of the database to store date times.
I can see almost no occasion where you would want to use another binary format for this purpose. That would make that component of the database essentially inoperable for other access methods.
As for human readability, you can solve that issue by having views access the tables (in other databases you can define a table with computed columns) that provide the dates in human readable format. Also, tools that know the database should produce the output in human readable format.
Unless you have a very large database or are in a very tightly constrained environment, don't worry about storage. You are probably not thinking about the extra bits that are stored when you allow NULLs for a column, or the extra padding that might go between fields when they are not aligned on hardware word boundaries, or the empty space on data pages because the records don't align on page boundaries.
If space is such an important consideration, then you might want to develop your own date/time format to see if you can get it down to 2 or 3 bytes.

Basic PHP/MySQL math example

I have a PHP form that grabs user-entered data and then posts it to a MySQL database. I'd like to know, how I can take the mathematical difference between two fields and post it to a third field in the database?
For example, I'd like to subtract "travel_costs" from "show_1_price" and write the difference to the "total_cost" variable. What's the best way to do this? Thanks so much.
You can later run a select query such as SELECT show_1_price - travel_costs AS pricediff FROM my_table; then grab the value in PHP and run an insert query with it...
This should be simple to do on the PHP side of things. How about:
$query = sprintf("INSERT INTO table VALUES(%d, %d, %d)", $travel_costs,
    $show_1_price, $show_1_price - $travel_costs);
Generally, though, it is bad form to store a value in a database that can be calculated from other values. The reason is that you may never access this value again, yet you are using storage for it. CPU cycles are much more abundant today, so calculate the value when needed. This is not a golden rule though: there are times when it is more efficient to store the calculated value, although this is not usually the case.
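Calculating the value when needed could look like the sketch below. It assumes each fetched row is an associative array containing the two columns from the question and that the amounts are whole numbers (as the %d placeholders above suggest); the function name `totalCost` is made up for illustration.

```php
<?php
// Derive the total from the two stored values whenever it is needed,
// instead of persisting it in a third column.
function totalCost(array $row): int
{
    return (int) $row['show_1_price'] - (int) $row['travel_costs'];
}

// Example with a row as PDO::FETCH_ASSOC might return it (values arrive as strings).
$row = ['show_1_price' => '500', 'travel_costs' => '120'];
echo totalCost($row), "\n"; // 380
```

Because the difference is derived on every read, it cannot drift out of sync when one of the two source columns is updated later.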

Multiple Queries to a Large MySQL Table

I have a table with columns ID(int), Number(decimal), and Date(int only timestamp). There are millions of rows. There are indexes on ID and Date.
On many of my pages I am querying this four or five times for a list of Numbers in a specified date range (the range being different each query).
Like:
select number,date where date > 111111111 and date < 111111100000
I'm querying these sets of data to be placed on several different charts. "Today vs Yesterday", "This Month vs Last Month", "This Year vs Last Year".
Would querying the largest possible result set with the sql statement and then using my programming language to filter down the query via a sorted and spliced array be better than waiting for each of these 0.3 second queries to finish?
Is there something else that can be done to speed this up?
It depends on the result set and the executing speed of your queries. There is no ultimate answer to this question.
You should benchmark and calculate the results if you really need to speed up things.
But keep in mind that premature optimization should be avoided. Besides that, you would be re-implementing logic that the database already implements, and your version can contain bugs.
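A benchmark along those lines could start from a small timing helper like this one. It is only a sketch: the name `averageMillis` is made up, and the callables you pass in (one running the separate queries, one filtering a larger result set in PHP) are left to you.

```php
<?php
// Time a callable over several runs and return the average duration in
// milliseconds. hrtime(true) reads a monotonic clock in nanoseconds.
function averageMillis(callable $task, int $runs = 5): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $task();
    }
    return (hrtime(true) - $start) / $runs / 1e6;
}
```

Run both candidates against realistic data volumes and repeat the measurement a few times, since the first run is often slower while caches are still cold.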
While it may make the query itself finish more quickly, you have to ask yourself about the potential impact on memory if you load the entire range of records and then aggregate it programmatically.
Chances are that MySQL's index-based optimizations will perform better than anything you could come up with anyway, so it sounds like a bad idea.
