MySQL DATETIME functions beauty vs performance (speed) - php

How much faster (in %) will SQL be if I avoid using the built-in MySQL date and time functions?
What do I mean? For example: SELECT id FROM table WHERE WEEKOFYEAR(inserted)=WEEKOFYEAR(CURDATE())
MySQL has a lot of built-in functions to work with date and time, and they are convenient. But what about performance?
The SQL above can be rewritten without built-in functions, like: SELECT id FROM table WHERE inserted BETWEEN 'date for 1st day of particular week 00:00:00' AND 'last day of particular week 23:59:59'. The server-side code becomes uglier :( but on the DB side we can use indexes.
I see two problems with using built-in functions:
1. Indexes
I did a small test:
mysql> explain extended select id from table where inserted between '2013-07-01 00:00:00' and '2013-07-01 23:59:59';
+----+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | table | range | ins           | ins  | 4       | NULL |    7 |   100.00 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+----------+--------------------------+
mysql> explain extended select id from table where date(inserted)=curdate();
+----+-------------+-------+-------+---------------+------+---------+------+--------+----------+--------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------+-------+---------------+------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | table | index | NULL          | ins  | 4       | NULL | 284108 |   100.00 | Using where; Using index |
+----+-------------+-------+-------+---------------+------+---------+------+--------+----------+--------------------------+
The first one took 0.00 sec; the second one, run right after the first, took 0.15 sec. Everything was done with a small amount of data.
And the second problem is:
2. Time to call those functions
If the table has 1 billion records, it means that WEEKOFYEAR, DATE, whatever... would be called as many times as there are records, right?
So the question is: will it bring a real benefit if I stop using the MySQL built-in date and time functions?

Applying a function to a column in a WHERE clause or in a JOIN condition will prevent the use of indexes on the column(s), if such indexes exist. This is because the raw value of the column is indexed, as opposed to the computed value.
Notice the above does not apply for a query like this:
SELECT id FROM atable WHERE inserted = CURDATE(); -- the raw value of "inserted" is used in the comparison
And yes, on top of that, the function will be executed for each and every row scanned.
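For illustration, here is a sketch of how the asker's WEEKOFYEAR example could be rewritten as a range on the raw column, so an index on inserted can be used (this assumes Monday-first weeks, which matches WEEKDAY() and WEEKOFYEAR()):
SELECT id
FROM `table`
WHERE inserted >= CURDATE() - INTERVAL WEEKDAY(CURDATE()) DAY                   -- Monday of this week
  AND inserted <  CURDATE() - INTERVAL WEEKDAY(CURDATE()) DAY + INTERVAL 7 DAY; -- next Monday
Both boundaries are constant for the duration of the query, so MySQL can do a range scan instead of calling a function once per row.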

The second query runs the date function on every row in the table, while the first query can just use the index to find the rows it needs. That's where the biggest slowdown would be. Look at the rows column in the EXPLAIN output.

Related

Splitting the result in a while loop

Here I use a query to get the last two collision dates for a licence number, as follows:
<?php
$licence = $_POST['licence'];
// Note: interpolating POST data into SQL is unsafe; use a prepared statement in real code
$sel = mysqli_query($con, "SELECT cdate FROM tblcollision WHERE licence_number='$licence'");
while ($s = mysqli_fetch_row($sel)) {
    echo $s[0]; // cdate is the only selected column, so it is at index 0
}
?>
I wish to split the result into separate fields: date1 in one text field and date2 in another. I have no idea how to do this. Please help me.
My db design is
+-----+----+----------------+------------+
| CID | ID | LICENCE_NUMBER | CDATE      |
+-----+----+----------------+------------+
| 1   | 1  | 3/4858/2012    | 2018-02-06 |
| 2   | 1  | 3/4858/2012    | 2018-03-20 |
+-----+----+----------------+------------+
As stated by mickmackusa, if the query is filtered to a single licence number, there is no point in grouping the result in the SQL statement and then exploding it in PHP. Just read the rows and do the calculation on the resulting dates.
However, in a more general case, when dealing with several licences, the grouping can be done within the SQL statement. Despite the added cost of exploding the result, it simplifies the grouping process.
You need to use GROUP_CONCAT in your query:
SELECT licence_number, GROUP_CONCAT(cdate) AS Events_dates
FROM tblcollision
WHERE licence_number='$licence'
GROUP BY licence_number;
This will return:
3/4858/2012 | 2018-02-06, 2018-03-20
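If you really do want the two dates as separate columns straight from SQL, here is a sketch using SUBSTRING_INDEX on the concatenated string (this assumes you want the two most recent dates and that cdate values never contain a comma):
SELECT licence_number,
       SUBSTRING_INDEX(GROUP_CONCAT(cdate ORDER BY cdate DESC), ',', 1) AS date1,
       SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(cdate ORDER BY cdate DESC), ',', 2), ',', -1) AS date2
FROM tblcollision
WHERE licence_number = '3/4858/2012'
GROUP BY licence_number;
Otherwise, fetch the GROUP_CONCAT result in PHP and split it with explode(',', ...).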

MySQL compare dates in a column to today

I have a table of around 6000 records with a date column (among other columns) which represents the deadline for a query. I need to compare the date in the column to today's date, which I understand is done something like:
SELECT DATEDIFF(DATE_TO_COMPARE, CURDATE());
However, I then have another column that I want to set to that date difference. So for each date, I need to compute the difference, insert it in the column difference_in_days, move to the next date and repeat.
I am also invoking this whenever a certain page on my site is loaded, using AJAX and PHP/PDO.
My SQL knowledge isn't that extensive. How can I achieve this?
Table is kind of like:
+---------+---------+---------+-----------------+---------+---------+--------------------+
| field 1 | field 2 | field 3 | date_to_compare | field 4 | field 5 | difference_in_days |
+---------+---------+---------+-----------------+---------+---------+--------------------+
|         |         |         | 2016-04-20      |         |         |                    |
|         |         |         | 2016-04-25      |         |         |                    |
|         |         |         | 2016-04-22      |         |         |                    |
|         |         |         | 2016-04-27      |         |         |                    |
|         |         |         | 2016-04-29      |         |         |                    |
+---------+---------+---------+-----------------+---------+---------+--------------------+
Sounds like you want to do an update?
UPDATE table_name
SET difference_in_days = DATEDIFF(date_to_compare, CURDATE());
This will update every record in the table to the diff of the current date.
However, this will require running the update every day if you want that column to stay relevant.
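As a sketch, that daily refresh could be automated with a MySQL scheduled event (this assumes the event scheduler is enabled, and reuses the placeholder table and column names from the question):
-- Requires: SET GLOBAL event_scheduler = ON;
CREATE EVENT refresh_difference_in_days
ON SCHEDULE EVERY 1 DAY
DO
  UPDATE table_name
  SET difference_in_days = DATEDIFF(date_to_compare, CURDATE());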
Alternative Approach:
If you're not querying this a lot, you may be better off using a view, which will update real-time every time you query it.
CREATE VIEW diff_view_name AS
SELECT *, DATEDIFF(date_to_compare, CURDATE()) AS difference_in_days
FROM table_name;
Then you could query it using:
SELECT * FROM diff_view_name;

MySQL: Get next 'n' results

Right now I have a PHP script that is fetching the first three results from a MySQL database using:
SELECT * FROM table Order by DATE DESC LIMIT 3;
After that command I want PHP to fetch the next three results; initially I was going to use:
SELECT * FROM table Order by DATE DESC LIMIT 3,3;
However, there will be a delay between the two commands, which means it is very possible that a new row will be inserted into the table during the delay. My first thought was to store the DATE value of the last result and then include WHERE DATE < $stored_date, but if entries 3 and 4 have the same date, entry 4 will be skipped and results returned from 5 onward. This could be avoided by using the primary key field, which is an auto-incrementing integer.
I am not sure which approach is best, but I feel there should be a more elegant and robust solution to this problem; however, I am struggling to think of it.
Example table:
-------------------------------------------
| PrimaryKey | Data | Date                |
-------------------------------------------
| 0          | abc  | 2014-06-17 11:43:00 |
| 1          | def  | 2014-06-17 12:43:00 |
| 2          | ghi  | 2014-06-17 13:43:00 |
| 3          | jkl  | 2014-06-17 13:56:00 |
| 4          | mno  | 2014-06-17 14:23:00 |
| 5          | pqr  | 2014-06-17 14:43:00 |
| 6          | stu  | 2014-06-17 15:43:00 |
-------------------------------------------
Where Data is the column that I want.
Best would be to use the primary key and select like:
SELECT * FROM table WHERE pk < $stored_pk Order by DATE DESC LIMIT 3;
And if you have an automatically generated PK, you should ORDER BY pk; it will be faster.
Two options I can think of depending on what your script does:
You could either use transactions: performing these queries inside a transaction will give you a consistent view of the data.
Alternatively you could just use:
SELECT * FROM table Order by DATE DESC;
And only fetch the results as you need them.
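Going back to the first option, a minimal sketch of the transactional approach (this assumes an InnoDB table and the default REPEATABLE READ isolation level; `table` is the question's placeholder name):
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT * FROM `table` ORDER BY `DATE` DESC LIMIT 3;    -- first batch
SELECT * FROM `table` ORDER BY `DATE` DESC LIMIT 3, 3; -- next batch, same snapshot
COMMIT;
Rows inserted by other sessions after the snapshot is taken will not appear in either batch, so the offset stays consistent.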

MySQL Occasionally Returns Wrong Value

This is a general question, one that I've been scratching my head on for a while now. My company's database handles about 2k rows a day. 99.9% of the time, we have no problem with the values that are returned in the different SELECT statements that are set up. However, on a very rare occasion, our database will "glitch" and return the value for a completely different row than what was requested.
This is a very basic example:
+--------+-------------+
| row_id | columnvalue |
+--------+-------------+
| 1      | 10          |
| 2      | 20          |
| 3      | 30          |
| 4      | 40          |
+--------+-------------+
SELECT columnvalue FROM table_name WHERE row_id = 1 LIMIT 1
Returns: 10
But on the very rare occasion, it may return: 20, or 30, etc.
I am completely baffled as to why it does this sometimes and would appreciate some insight on what appears to be a programming phenomenon.
More specific information:
SELECT
USERID, CONCAT( LAST, ', ', FIRST ) AS NAME, COMPANYID
FROM users, companies
WHERE users.COMPANYCODE = companies.COMPANYCODE
AND USERID = 9739 LIMIT 1
mysql> DESCRIBE users;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| USERID      | int(10)      | NO   | PRI | NULL    | auto_increment |
| COMPANYCODE | varchar(255) | NO   | MUL |         |                |
| FIRST       | varchar(255) | NO   | MUL |         |                |
| LAST        | varchar(255) | NO   | MUL |         |                |
+-------------+--------------+------+-----+---------+----------------+
mysql> DESCRIBE companies;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| COMPANYID   | int(10)      | NO   | PRI | NULL    | auto_increment |
| COMPANYCODE | varchar(255) | NO   | MUL |         |                |
| COMPANYNAME | varchar(255) | NO   |     |         |                |
+-------------+--------------+------+-----+---------+----------------+
What the results were supposed to be: 9739, "L----, E----", 2197
What the results were instead: 9739, "L----, E----", 3288
Basically, it returned the wrong company id based on the join with companycode. Given the nature of our company, I can't share any more information than that.
I have run this query 5k times and made every modification to the code imaginable trying to reproduce the second set of results, and I have not been able to duplicate it. I'm not quick to blame MySQL -- this has been happening (though rarely) for over 8 years, and I have exhausted all other possible causes. I suspected the results were manually changed after the query was run, but the timestamps state otherwise.
I'm just scratching my head as to why this can run perfectly 499k out of 500k times.
Now that we have a more realistic query, I notice right away that you are joining the tables not on the primary key but on the company code. Are we certain that the company code is enforced as unique by an index on companies? The LIMIT 1 would hide a second row if one were found.
From a design perspective, I would make the join on the primary key, to avoid even the possibility of duplicate keys, and keep company code as a unique indexed field for display and lookup only.
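A quick sketch to verify that suspicion, and then to enforce uniqueness going forward (the index name here is made up):
-- Find company codes that appear more than once:
SELECT COMPANYCODE, COUNT(*) AS cnt
FROM companies
GROUP BY COMPANYCODE
HAVING COUNT(*) > 1;
-- Once any duplicates are cleaned up, prevent them from coming back:
ALTER TABLE companies ADD UNIQUE INDEX ux_companycode (COMPANYCODE);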
This behavior is due either to an incredibly unlikely SERIOUS bug in MySQL, -or- MySQL is returning a result that is valid at the time the statement is run, and some other software is garbling the displayed result.
One possibility to consider is that the row had been modified (by some other statement) at the time your SQL statement executed, and then the row was changed again later. (That's the most likely explanation we'd have for MySQL returning an unexpected result.)
The use of the LIMIT 1 clause is curious, because if the predicate uniquely identifies a row, there should be no need for the LIMIT 1, since the query is guaranteed to return no more than one row.
This leads me to suspect that row_id is not unique, and that the query actually returns more than one row. With the LIMIT clause, there is no guarantee as to which of the rows will get returned (absent an ORDER BY clause.)
Otherwise, the most likely culprit is outdated cache contents, or other problems in the code.
UPDATE
The previous answer was based on the example query given; I purposefully omitted the possibility that EMP was a view that was doing a JOIN, since the question originally said it was a table, and the example query showed just the one table.
Based on the new information in the question, I suggest that you OMIT the LIMIT 1 clause from the query. That will identify that the query is returning more than one row.
From the table definitions, we see that the database isn't enforcing a UNIQUE constraint on the COMPANYCODE column in the COMPANY table.
We also know there isn't a foreign key defined referencing the companies primary key.
Normally, the foreign key would be defined referencing the PRIMARY KEY of the target table.
We'd expect the users table to have a COMPANYID column, which references the COMPANYID (primary key) column in the companies table.
(Instead, we note that the join condition matches on the varchar COMPANYCODE column, which the table definition shows is not constrained to be unique, which is very odd.)
There are several reasons this could happen. I suggest you look at the assumptions you're making. For example:
If you're using GROUP BY and one of the columns isn't an aggregate or the grouping expression, you're going to get an unpredictable value in that column. Make sure you use an appropriate aggregation (such as MAX or MIN) to get a predictable result on each column.
If you're assuming a row order without making it explicit, and using LIMIT to get only the first row, the actual order of returned rows depends on the query's execution plan, which can differ on large resultsets based on the statistics available to the optimiser. Make sure you use ORDER BY in such situations; see the sketch below.
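As a sketch applied to the query from the question, dropping the LIMIT 1 (to expose any duplicate matches) and ordering explicitly:
SELECT USERID, CONCAT(LAST, ', ', FIRST) AS NAME, COMPANYID
FROM users
JOIN companies ON users.COMPANYCODE = companies.COMPANYCODE
WHERE USERID = 9739
ORDER BY COMPANYID;
If this ever returns two rows with different COMPANYID values, duplicate company codes are the culprit.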

WHERE vs HAVING in generated queries

I know that this title is overused, but it seems that my kind of question has not been answered yet.
So, the problem is like this:
I have a table structure made of four tables (tables, rows, cols, values) that I use to recreate the behavior of the information_schema (in a way).
In PHP, I generate queries to retrieve the data, and the result still looks like a normal table:
SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')
HAVING (col2 LIKE "%4%")
OR
SELECT * FROM
(SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')) d
WHERE col2 LIKE "%4%"
Note that the part where I define the columns of the result is generated by a PHP script. Why I am doing this is less important; I want to extend this query-generating algorithm for broader use.
And we get to the core problem: I have to decide whether to generate a WHERE or a HAVING part for the query. I know when to use each, but my algorithm doesn't, and I would have to make a few extra checks for this. The two queries above are equivalent: I can always put any query in a sub-query, give it an alias, and use WHERE on the new derived table. But I wonder if I will have problems with performance, or if this will come back to bite me in an unexpected way.
I know how they both work, and that WHERE is supposed to be faster, but that is why I came here to ask. Hopefully I made myself understood; please excuse my English and the long, useless turns of phrase.
EDIT 1
I already know the difference between the two and all that it implies; my only dilemma is this: using custom columns from other tables, with variable numbers and sizes, while trying to achieve the same result as a normally created table, means I must use HAVING to filter on the derived columns. At the same time I have the option to wrap it all in a subquery and use WHERE normally, which will probably create a temporary table that is filtered afterwards. Will this affect performance on a large database? Unfortunately I cannot test this right now, as I cannot afford to fill the database with millions of entries (that would be something like: 1 million in the rows table, 5 million in the values table since every row has 5 columns, 5 rows in the cols table and 1 row in the tables table = 6,000,006 entries in total).
right now my database looks like this:
tables:
+----+--------+-----------+------+
| id | name   | title     | dets |
+----+--------+-----------+------+
| 1  | table1 | Table One |      |
+----+--------+-----------+------+
cols:
+----+-------+------+
| id | table | name |
+----+-------+------+
| 3  | 1     | col1 |
| 4  | 1     | col2 |
+----+-------+------+
where `table` is a foreign key from table `tables`
rows:
+----+-------+-------+
| id | table | extra |
+----+-------+-------+
| 1  | 1     |       |
| 2  | 1     |       |
+----+-------+-------+
where `table` is a foreign key from table `tables`
values:
+----+-----+-----+----------+
| id | row | col | value    |
+----+-----+-----+----------+
| 1  | 1   | 3   | 13       |
| 2  | 1   | 4   | 14       |
| 6  | 2   | 4   | 24       |
| 9  | 2   | 3   | asdfghjk |
+----+-----+-----+----------+
where `row` is a foreign key from table `rows`
where `col` is a foreign key from table `cols`
EDIT 2
The conditions are there just for demonstration purposes!
EDIT 3
For only two rows, there seems to be a difference between the two: the one using HAVING takes 0.0008 s, and the one using WHERE takes 0.0014-0.0019 s. I wonder if this will affect performance for large numbers of rows and columns.
EDIT 4
The result of the two queries is identical, and that is:
+----------+------+
| col1     | col2 |
+----------+------+
| 13       | 14   |
| asdfghjk | 24   |
+----------+------+
HAVING is specifically for use with GROUP BY; WHERE is for conditions on the rows themselves. See also WHERE vs HAVING.
I believe the having clause would be faster in this case, as you're defining specific values, as opposed to reading through the values and looking for a match.
See: http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html
Basically, WHERE filters out rows before passing them to any aggregate function, but HAVING filters the aggregate results after grouping.
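A small sketch of that difference, using a hypothetical orders table:
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE status = 'paid'        -- row filter: applied before grouping, can use indexes
GROUP BY customer_id
HAVING SUM(amount) > 100;    -- group filter: applied to the aggregated result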
You could do it like this:
WHERE col2 IN (14, 24)
Your code WHERE col2 LIKE "%4%" is a bad idea: what about col2 = 34? It would also be selected.
