SQL query optimization - multiple queries or DAYOFYEAR()? - php

I need to run queries with several conditions that will produce a large dataset. While the conditions themselves are straightforward, I need advice on two issues in terms of speed optimization:
1) I need to run those queries over 1st April to 20th June of each year for the last 10 years. As far as I know, I have two options:
a. Run the query 10 times
$year = 2015;
$start_month_date = "-04-01";
$end_month_date = "-06-20";
for ($i = 0; $i < 10; $i++) {
    $start = $year . $start_month_date;
    $end = $year . $end_month_date;
    $result = mysql_query("....... WHERE .... AND `event_date` BETWEEN '$start' AND '$end'");
    // PUSH THE RESULT TO AN ARRAY
    $year = $year - 1;
}
b. Run the query a single time, with the query comparing by day of year (so each date has to be converted to a day-of-year value by the query)
$start = Date("z", strtotime("2015-04-01")) + 1;
$end = Date("z", strtotime("2015-06-20")) + 1;
$result = mysql_query("....... WHERE .... AND DAYOFYEAR(`event_date`) BETWEEN $start AND $end");
I am aware of the one-day difference in day counts between leap years and other years, but I can live with that. I sense that 1.b is more optimized; I just want to verify.
2) I have a large query with two sub queries. When I want to limit the result by date, should I put the conditions inside or outside the sub queries?
a. Inside the sub queries means the condition has to be validated twice
SELECT X.a,X.b,Y.c FROM
(SELECT * FROM mytable WHERE `event_date` BETWEEN '$startdate' AND '$enddate' AND `case` = 'AAA' AND .......) X,
(SELECT * FROM mytable WHERE `event_date` BETWEEN '$startdate' AND '$enddate' AND `case` = 'BBB' AND .......) Y
WHERE X.`event_date` = Y.`event_date` AND ........... ORDER BY `event_date`
b. Outside the sub queries means it is validated once, but the join runs over a larger dataset (for which I need to set SQL_BIG_SELECTS = 1)
SELECT X.a,X.b,Y.c FROM
(SELECT * FROM mytable WHERE `case` = 'AAA' AND .......) X,
(SELECT * FROM mytable WHERE `case` = 'BBB' AND .......) Y
WHERE X.`event_date` = Y.`event_date` AND X.`event_date` BETWEEN '$startdate' AND '$enddate' AND ........... ORDER BY `event_date`
Again, in my opinion 2.a is more optimized, but I would appreciate your advice.
Thanks

(1) Running the queries 10 times with event_date BETWEEN $start AND $end will be faster when the SQL engine can take advantage of an index on event_date. This could be significant, but it depends on the rest of the query.
Also, because you are ordering the entire data set, running 10 queries is likely to be a bit faster. That's because sorting is O(n log(n)), meaning that it takes longer to sort larger data sets. As an example, sorting 100 rows might take X time units. Sorting 1000 rows might take X * 10 * log(10) time units. But, sorting 100 rows 10 times takes just X * 10 (this is for explanatory purposes).
(2) Don't use subqueries if you can avoid them in MySQL. The subqueries are materialized, which adds additional overhead. Plus, they then prevent the use of indexes. If you need to use subqueries, filter the data as much as possible in the subquery. This reduces the data that needs to be stored.

I assume you have lots of rows over the 10 years, otherwise this wouldn't be much of an issue.
Now the best bet is to run a couple of EXPLAINs on the different queries you plan to use; that will probably tell you which indexes they can use, as currently we don't know them (you didn't post the structure of the table).
1.b uses a function in the WHERE clause, so it will be terrible: the query won't be able to use an index on the date column (assuming there is one) and will read the entire table.
One thing you could do is ask the database to join the result sets of the 10 queries together using UNION, so MySQL merges the results instead of PHP (see https://dev.mysql.com/doc/refman/5.0/en/union.html)
2 - As Gordon said, filter the data as much as possible. However, instead of trying options blindly, you can use EXPLAIN and the database will help you decide which one makes the most sense.
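The UNION idea can be sketched in PHP like this (the `events` table name and the bare SELECT are placeholders for the real query from the question):

```php
<?php
// Sketch: build one UNION ALL statement covering the same
// April-June window for each of the last 10 years, so MySQL
// merges the result sets instead of PHP looping over 10 queries.
$year = 2015;
$parts = [];
for ($i = 0; $i < 10; $i++) {
    $y = $year - $i;
    $parts[] = "SELECT * FROM `events` WHERE `event_date` BETWEEN '{$y}-04-01' AND '{$y}-06-20'";
}
$sql = implode(" UNION ALL ", $parts);
```

UNION ALL avoids the duplicate-elimination pass of plain UNION; since the yearly date ranges cannot overlap, the results are distinct anyway.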

Related

MySQL PHP Order by Random for a certain time?

For example, say you wanted a random result every 10 minutes. Is there a way to achieve this with ORDER BY RAND()?
$fetch = mysqli_query($conn, "
SELECT *
FROM food
JOIN food_images ON food.size = food_images.size
ORDER BY RAND()
");
I am also using a JOIN and am worried this might affect the answers. Thank you!
I don't have a MySQL server in front of me so most of this is a guess, but you might try as follows:
You can generate a number that changes only once every ten minutes by taking the system time in seconds, dividing by the number of seconds in ten minutes, and then casting to an integer:
$seed = (int) (time() / 600);
Then pass this value to MySQL's RAND() function as a parameter to seed the RNG, and you should get a repeatable sequence that changes every ten minutes:
$stmt = mysqli_prepare($conn, 'SELECT ... ORDER BY RAND(?)');
mysqli_stmt_bind_param($stmt, 'i', $seed);
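A quick sanity check of the bucketing, in plain PHP with no database needed (the timestamp value below is arbitrary):

```php
<?php
// A seed derived from the Unix time changes only when the
// 600-second (10-minute) bucket changes.
function tenMinuteSeed(int $unixTime): int {
    return intdiv($unixTime, 600);
}

$t = 1700000400; // an arbitrary timestamp on a 10-minute boundary
$sameBucket = tenMinuteSeed($t) === tenMinuteSeed($t + 540); // 9 minutes later: same seed
$nextBucket = tenMinuteSeed($t) !== tenMinuteSeed($t + 600); // 10 minutes later: new seed
```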
You can do it as:
SELECT *, RAND(FLOOR(TIME_TO_SEC(CURRENT_TIME()) / 600)) AS ord
FROM food
JOIN food_images ON food.size = food_images.size
ORDER BY ord
The parameter of the RAND() function is the seed; the FLOOR() expression inside it changes only every 10 minutes.
You can use the MySQL Event Scheduler; as described in the documentation:
you are creating a named database object containing one or more SQL statements to be executed at one or more regular intervals, beginning and ending at a specific date and time
And since I guess you are using PHP, you can use PHP cron jobs too: Managing Cron Jobs With PHP

Using MYSQL, SELECT every nth row from a subquery SELECT

I have a table of over 300,000 rows and I would like to render this data on a graph, but 300,000 rows isn't really necessary all at once. For example, even though there may be 100 rows of data for a given day, I don't need to display all that data if I'm showing a whole year worth of data. So I would like to "granularize" the data.
I was thinking of getting everything and then using a script to remove what I don't need, but that seems like it would be much slower and harder on the database.
So here's what I have so far.
SET @row_number := 0;
SELECT @row_number := @row_number + 1 AS row_number,
price, region, `timestamp` FROM pricehistory;
This gives me all the rows and numbers them. I was planning on adding a WHERE clause to get every 1000th row (i.e. every nth row), like this:
SET @row_number := 0;
SELECT @row_number := @row_number + 1 AS row_number,
price, region, `timestamp` FROM pricehistory
WHERE row_number % 1000 = 0;
But MySQL doesn't see row_number as a column for some reason. Any ideas? I've looked at other solutions online, but they don't seem to work for MySQL in particular.
As Racil's comment suggested, you can just go by an auto-incremented id field if you have one; but you've stated the amount of data for different dates can differ, and this could make for a very distorted graph. If you select every 1000th record for a year and half the rows are from the last 3 months ("holiday shopping", for a commerce example), the latter half of a year graph will actually reflect the latter quarter of the year. For more useful results you're most likely better off with something like this:
SELECT region, DATE(timestamp) AS theDate
, AVG(price), MIN(price), MAX(price)
FROM pricehistory
GROUP BY region, theDate
;
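As to why the WHERE clause fails: WHERE is evaluated before the select-list aliases exist, so row_number is not visible there. The usual workaround is to compute the numbering in a derived table first; a sketch against the pricehistory table from the question:

```sql
SET @row_number := 0;

SELECT row_number, price, region, `timestamp`
FROM (
    SELECT @row_number := @row_number + 1 AS row_number,
           price, region, `timestamp`
    FROM pricehistory
    ORDER BY `timestamp`
) AS numbered
WHERE row_number % 1000 = 0;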
It doesn't look like I'm going to get another answer so I'll go ahead and write the solution I came up with.
My data is pretty evenly distributed as it grabs prices at regular intervals so there's no reason to worry about that.
Here's my solution.
Let's say I have 500,000 rows and I want to display a subset of them, say 5,000 rows. 500,000 / 5,000 is 100, so I take 100 and use it in my SELECT statement like this: SELECT * FROM pricehistory WHERE id % 100 = 0;
Here is the actual code
public function getScaleFactor($startDate, $endDate) {
    $numPricePoints = $this->getNumPricePointsBetweenDates($startDate, $endDate);
    $scaleFactor = 1;
    if ($numPricePoints > $this->desiredNumPricePoints) {
        $scaleFactor = floor($numPricePoints / $this->desiredNumPricePoints);
    }
    return $scaleFactor;
}
I then use $scaleFactor in the SQL like this SELECT * FROM pricehistory WHERE id % {$scaleFactor} = 0;
This isn't a perfect solution because you don't always end up with 5000 rows exactly, but I don't NEED exactly 5000 rows. I'm just trying to reduce the resolution of the data while still getting a graph that looks close to what it would be had I used all 500,000 rows.
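The arithmetic behind getScaleFactor reduces to a small pure function (a sketch; the name and numbers follow the example above):

```php
<?php
// Pick a modulus so that roughly $desired rows survive the
// "id % $scaleFactor = 0" filter out of $numRows total rows.
function scaleFactor(int $numRows, int $desired): int {
    return ($numRows > $desired) ? (int) floor($numRows / $desired) : 1;
}

$factor = scaleFactor(500000, 5000); // 500,000 rows down to roughly 5,000
```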

Finding Interval of a data present on latest 2 dates

I'm developing a web-based tool that can help analyze the intervals at which numbers occur in a 6-digit lottery.
Let us focus on a certain number first. Say 7
The sql query I've done so far:
SELECT * FROM `l642` WHERE `1d`=7 OR `2d`=7 OR `3d`=7 OR `4d`=7 OR `5d`=7
OR `6d`=7 ORDER BY `draw_date` DESC LIMIT 2
This will pull the two latest dates where number 7 is present.
I'm thinking of using DATEDIFF, but I'm confused about how to get the previous value so I can subtract it from the latest draw_date.
My goal is to list the intervals of numbers 1-42 and I'll plan to accomplish it using PHP.
Looking forward to your help
A few ideas spring to mind.
(1) First, since your result set is already perfectly ordered, use a PHP loop over the two rows: get $date1 = $row['draw_date'], then fetch the next/last row and set $date2 = $row['draw_date']. With these two you have
$diff = date_diff(date_create($date1), date_create($date2));
and $diff->days is the difference in days.
(2)
A second way is to have MySQL return the difference by including a row number in the result set and doing a self-join with aliases, say alias a for row 1 and alias b for row 2:
DATEDIFF(a.draw_date, b.draw_date)
How one goes about getting the row number could be either:
(2a) the row-number technique found here: With MySQL, how can I generate a column containing the record index in a table?
(2b) a worktable with an id int auto_increment primary key column, populated from your LIMIT 2 query shown above (with a TRUNCATE TABLE on the worktable between iterations 1 to 42 to reset auto_increment to 0).
The entire thing could be wrapped with an outer table of 1 to 42, so that 42 rows come back with 2 columns (num, number_of_days), but that wasn't your question.
So, considering how infrequently you are probably doing this, I would recommend not over-engineering it and would shoot for #1.
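Option (1) in runnable form; the two draw dates below are hypothetical stand-ins for the values fetched by the LIMIT 2 query:

```php
<?php
// Difference in days between the two most recent draws
// containing the number. The dates here are made up.
$date1 = date_create('2016-03-15'); // latest draw containing the number
$date2 = date_create('2016-03-01'); // previous draw containing it
$diff  = date_diff($date1, $date2);
$intervalDays = $diff->days; // absolute number of days between the draws
```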

MySQL query execution takes forever from PHP but finishes quickly when run from phpMyAdmin

I have a SQL query against a MySQL table with millions of records. It executes in phpMyAdmin in around 2 seconds, but when run from a PHP script it never finishes.
select
concat(p1.`Date`," ",p1.`Time`) as har_date_from,
concat(p2.`Date`," ",p2.`Time`) as har_date_to,
(select concat(p3.`Date`," ",p3.`Time`) from
power_logger p3
where p3.slno between 1851219 and 2042099
and p3.meter_id="logger1"
and str_to_date(concat(p3.`Date`," ",p3.`Time`),"%d/%m/%Y %H:%i:%s") >=
str_to_date(concat(p1.`Date`," ",p1.`Time`),"%d/%m/%Y %H:%i:%s")
order by p3.slno limit 1) as cur_date_from,
(select concat(p4.`Date`," ",p4.`Time`) from
power_logger p4
where p4.slno between 1851219
and 2042099
and p4.meter_id="logger1"
and str_to_date(concat(p4.`Date`," ",p4.`Time`),"%d/%m/%Y %H:%i:%s") >=
str_to_date(concat(p2.`Date`," ",p2.`Time`),"%d/%m/%Y %H:%i:%s")
order by p4.slno
limit 1
)
as cur_date_to,
p1.THD_A_N_Avg-p2.THD_A_N_Avg as thd_diff
from power_logger p1
join
power_logger p2
on p2.slno=p1.slno+1
and p1.meter_id="fluke1"
and p2.meter_id=p1.meter_id
and p1.slno between 2058609 and 2062310
and p1.THD_A_N_Avg-p2.THD_A_N_Avg>=2.0000
php script:
$query=/*The query above passed as string*/
$mysql=mysql_connect('localhost','username','pwd') or die(mysql_error());
mysql_select_db('dbname',$mysql);
$rows=mysql_query($query,$mysql) or die(mysql_error());
There are no issues with MySQL connectivity and related matters, as I run a lot of other queries successfully. I have indexes on meter_id and on (Date, Time) together; slno is the auto-increment column.
I know similar questions have been asked, and I found a lot in my research, but none of them really helped me. Thanks in advance if anybody can help me find a solution.
Query description: this queries the power_logger table, which contains millions of records; THD_A_N_Avg, meter_id, slno, Date and Time are among its columns. It selects the date and time from pairs of consecutive rows, within a range of slnos, where the difference between their THD_A_N_Avg values is at least 2. For each pair found, it then fetches, from a different range of slnos, the date and time closest to the ones fetched earlier, thus forming har_date_from, har_date_to, cur_date_from and cur_date_to.
What messes things up here is the nested select.
Usually phpMyAdmin automatically adds "LIMIT 0, 30" to the end of the query, so you only load 30 rows at once. In your code you are trying to load everything at once; that's why it takes so long.
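A minimal sketch of applying the same kind of limit from PHP; the helper name is made up, and the query string is a placeholder for the big query from the question:

```php
<?php
// Append phpMyAdmin-style paging to a query so only one page
// of rows is fetched at a time instead of the full result set.
function pageQuery(string $sql, int $offset, int $pageSize): string {
    return sprintf('%s LIMIT %d, %d', $sql, $offset, $pageSize);
}

$paged = pageQuery('SELECT * FROM power_logger', 0, 30);
// "SELECT * FROM power_logger LIMIT 0, 30"
```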

(PHP) MySQL random rows big table with order by and certain range

I have this table:
person_id int(10) pk
fid bigint(20) unique
points int(6) index
birthday date index
4 FK columns int(6)
ENGINE = MyISAM
Important info: the table contains over 8 million rows and is growing fast (1.5M a day at the moment)
What I want: to select 4 random rows in a certain range when I order the table on points
How I do it now: in PHP I pick a random range, say 20% as the low bound and 30% as the high bound. Next I COUNT(*) the number of rows in the table. Then I determine the lowest row number as table count / 100 * low bound, and the same for the high bound. After that I calculate a random row number using rand(lowest_row, highest_row), which gives me a row number within the range. And at last I select the random row by doing:
SELECT * FROM `persons` WHERE points > 0 ORDER BY points desc LIMIT $random_offset, 1;
The points > 0 is in the query since I only want rows with at least 1 point.
The above query takes about 1.5 seconds to run, and since I need 4 rows it takes over 6 seconds, which is too slow for me. I figure the ORDER BY points takes the most time, so I was thinking about making a VIEW of the table, but I have really no experience with views. Do you think a view is a good option, or are there better solutions?
ADDED:
I forgot to say that it is important that all rows have the same chance of being selected.
Thanks, I appreciate all the help! :)
Kevin
Your query is slow, and will become progressively slower as the table grows, because using LIMIT with a large offset forces a full sort on points and then a scan past all the skipped rows to reach the result. Instead you should do this on the PHP end of things as well (this kind of 'abuse' of LIMIT is part of why it's non-standard SQL; MSSQL and Oracle, for example, do not support it).
First ensure there's an index on points. This will make SELECT MAX(points), MIN(points) FROM persons a query that returns instantly. From those two results you can determine the points range, and use rand() to pick 4 point values in the requested range. Then repeat for each value:
SELECT * FROM persons WHERE points < $myValue ORDER BY points DESC LIMIT 1
Since it only has to retrieve one row, and can determine which one via the index, this'll be in the milliseconds execution time as well.
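The random-value step can be sketched in plain PHP (the bounds below are illustrative; each threshold then drives one indexed LIMIT 1 lookup as shown above):

```php
<?php
// Given MIN(points)/MAX(points) from the table, pick $n random
// point thresholds; each one later drives a single cheap
// "WHERE points <= ? ORDER BY points DESC LIMIT 1" query.
function randomThresholds(int $min, int $max, int $n): array {
    $out = [];
    for ($i = 0; $i < $n; $i++) {
        $out[] = rand($min, $max);
    }
    return $out;
}

$thresholds = randomThresholds(1, 5000, 4); // bounds are illustrative
```

Note that because points values are not evenly distributed, this does not give every row an exactly equal chance of selection; it trades a little uniformity for speed.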
Views aren't going to do anything to help your performance here. My suggestion would be to simply run:
SELECT * FROM `persons` WHERE points BETWEEN ? AND ?
Make sure you have an index on points. Also, you should replace * with only the fields you need, if applicable. Here, of course, ? represents the upper and lower bounds for your search.
You can then determine the number of rows returned in the result set using mysqli_num_rows() (or similar based on your DB library of choice).
You now have the total number of rows that meet your criteria. You can easily then calculate 4 random numbers within the range of results and use mysqli_data_seek() or similar to go directly to the record at the random offset and get the values you want from it.
Putting it all together:
$result = mysqli_query($db_conn, $sql); // here $sql is your SQL query
$num_records = 4; // your number of records to return
$num_rows = mysqli_num_rows($result);
$rows = array();
for ($i = 0; $i < $num_records; $i++) {
$random_offset = rand(0, $num_rows - 1);
mysqli_data_seek($result, $random_offset);
$rows[] = mysqli_fetch_object($result);
}
mysqli_free_result($result);
