PHP/MYSQL query execution time optimisation - php

I am having a great deal of difficulty with this set of queries. I cannot find a way to speed up the second query at all. I've tried joining the first query as a subquery on the second one, running through the first query's results one by one, and the current setup, all of which have proven extremely slow.
I would like to simply add a limit to the second query instead of this hacky stuff, but our MySQL version is too old to support it. For some reason it is also treating EXPLAIN as a syntax error, which is unhelpful.
How can I reduce the execution time of this?
$limitQuery = $pdo->prepare("
SELECT r.supplier_option_code FROM third_party_raw_stock_price AS r ORDER BY r.id LIMIT 100
");
$limitQuery->execute();
$limitIds = $limitQuery->fetchAll();
$limitIds = implode("', '",array_column($limitIds, 'supplier_option_code'));
$limitQuery = null;
$linkColumn = 'supplier_code';
$thirdPartyId = 'FS';
$migrateQuery = $pdo->prepare("
UPDATE third_party_raw_stock_price AS r
JOIN options_new AS o
ON o.".$linkColumn." = r.supplier_option_code AND r.supplier_prefix = '".$thirdPartyId."'
JOIN third_party_config AS t
ON t.code = :config
SET o.price = '989.99', o.cost_price_variation = '3.33', o.stock = '7'
WHERE r.supplier_option_code IN ('$limitIds')
");
$migrateQuery->execute([':config' => $thirdPartyId]);

Related

PHP query returning only one row out of four

This query was supposed to return me four rows: which are four people with status 50 (which, in the application means "maternity leave"). But it returns only one.
On HeidiSQL the query doesn't even run because it displays a
syntax error on line 13:
(...)
corresponds to your MariaDB server version for the right syntax to use near 'a.id_regiao = '$id_regiao'
AND a.cod_status = 50
AND a.status' at line 13 */"
Here is the query. I'm slowly becoming familiar with SQL statements, and I did search a lot on SO before asking:
//SELECTING PROJECT DATA
$query = "SELECT b.id_clt,b.nome AS nome_clt,
a.id_evento AS a_id_evento,a.data AS a_data,a.data_retorno AS a_data_retorno,
c.id_evento AS c_id_evento,c.data AS c_data,c.data_retorno AS c_data_retorno,
(SELECT nome FROM projeto WHERE id_projeto = a.id_projeto) AS nome_projeto,
(SELECT nome FROM curso WHERE id_curso = b.id_curso) AS nome_curso,
DATE_FORMAT(a.data,'%d/%m/%Y') AS a_data_br,
DATE_FORMAT(a.data_retorno,'%d/%m/%Y') AS a_data_retorno_br,
DATE_FORMAT(c.data,'%d/%m/%Y') AS c_data_br,
DATE_FORMAT(c.data_retorno,'%d/%m/%Y') AS c_data_retorno_br
FROM rh_eventos AS a
INNER JOIN rh_clt AS b ON (a.id_clt = b.id_clt AND a.cod_status = 50)
LEFT JOIN rh_eventos AS c ON (b.id_clt = c.id_clt AND c.cod_status = 54)
WHERE $cond_projeto a.id_regiao = '$id_regiao'
AND a.cod_status = 50
AND a.status = 1
AND NOW() BETWEEN a.data AND a.data_retorno
ORDER BY nome_projeto,b.nome;";
The problem is here in the query:
WHERE $cond_projeto a.id_regiao = '$id_regiao'
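This inserts the contents of a variable (a bare value, or perhaps a complete condition?) without the syntax needed to connect it to the rest of the clause. If it holds only a value, add the column name and comparison it belongs to. If it holds a complete condition, follow it with AND, like so: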
WHERE $cond_projeto AND a.id_regiao = '$id_regiao'
Beware, though: your code currently appears to be vulnerable to SQL injection attacks (and those are not to be trifled with). Use prepared statements.
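One way to avoid a dangling condition like this is to collect the optional conditions in an array and join them with AND, binding the values as parameters. The following is only a minimal sketch: buildWhere() and the shape of its input are hypothetical, the column names are taken from the query above, and the column names are assumed to come from trusted code, never from user input.

```php
<?php
// Hypothetical helper: builds a WHERE clause from optional filters.
// Keys are column names (trusted, hard-coded); a null value means "filter not set".
function buildWhere(array $filters): array
{
    $conditions = [];
    $params = [];
    foreach ($filters as $column => $value) {
        if ($value === null) {
            continue; // skip filters that were not supplied
        }
        $placeholder = ':p' . count($params);
        $conditions[] = "$column = $placeholder";
        $params[$placeholder] = $value;
    }
    // implode() puts AND only between conditions, never at either end
    $sql = $conditions ? 'WHERE ' . implode(' AND ', $conditions) : '';
    return [$sql, $params];
}

[$where, $params] = buildWhere([
    'a.id_projeto' => null, // optional project filter, not set in this example
    'a.id_regiao'  => 1,
    'a.cod_status' => 50,
]);
echo $where, "\n"; // WHERE a.id_regiao = :p0 AND a.cod_status = :p1
```

The returned $params array can then be handed to a prepared statement's execute() call, so no value is ever pasted into the SQL text.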
Here is the query (as seen by using an echo before it). I can see the output in HeidiSQL now, so it is easier for us to check:
SELECT b.id_clt,b.nome AS nome_clt,
a.id_evento AS a_id_evento,a.data AS a_data,a.data_retorno AS a_data_retorno,
c.id_evento AS c_id_evento,c.data AS c_data,c.data_retorno AS c_data_retorno,
(SELECT nome FROM projeto WHERE id_projeto = a.id_projeto) AS nome_projeto,
(SELECT nome FROM curso WHERE id_curso = b.id_curso) AS nome_curso,
DATE_FORMAT(a.data,'%d/%m/%Y') AS a_data_br,
DATE_FORMAT(a.data_retorno,'%d/%m/%Y') AS a_data_retorno_br,
DATE_FORMAT(c.data,'%d/%m/%Y') AS c_data_br,
DATE_FORMAT(c.data_retorno,'%d/%m/%Y') AS c_data_retorno_br
FROM rh_eventos AS a
INNER JOIN rh_clt AS b ON (a.id_clt = b.id_clt AND a.cod_status = 50)
LEFT JOIN rh_eventos AS c ON (b.id_clt = c.id_clt AND c.cod_status = 54)
WHERE a.id_regiao = '1' AND a.cod_status = 50
AND a.status = 1
AND NOW() BETWEEN a.data AND a.data_retorno ORDER BY nome_projeto,b.nome;
I can now see the output in HeidiSQL, though I still can't figure out why it doesn't bring back the other three rows.

Have PHP count rows with a query that is already using count(*)?

I have a subquery and I want to count the rows in PHP for MySQL. I am trying to fix old code and know that PDO is better and more secure and we will eventually rewrite all this code, but for now I need to just make it work. My problem is figuring out the command for the $total_employees to count the rows. This number will be used in a formula later. Is there a way to do it as 2 subqueries or rewriting it in the SQL statement other than just using php and mysql_fetch_row? I am trying to avoid multiple while loops. This is condensed from a bigger query for easier viewing.
while($rows=mysql_fetch_array($sqls)){
$cycle_id = $rows[cycle_id];
$sqls=("select subb.sqlcal AS sqlcalemp from
(select count(*) as sqlcal from dialogue_employees d_e,
dialogue_leaders d_l where
d_l.leader_group_id = d_e.leader_group_id and
d_l.cycle_id = $cycle_id) as subb");
$total_employees += $rows[sqlcalsemp];
This was the older code that worked before trying to update it:
while($rows=mysql_fetch_array($sqls)){
$cycle_id = $rows[cycle_id];
$sqlcalcemp=mysql_query("select count(*) from dialogue_employees d_e,
dialogue_leaders d_l where
d_l.leader_group_id = d_e.leader_group_id and
d_l.cycle_id = $cycle_id") or die(mysql_error());
$rowtotal = mysql_fetch_row($sqlcalcemp);
$total_employees += $rowtotal[0];
You're looping through and looking at each cycle_id one at a time.
Maybe try something like this to grab the counts for every cycle_id at once:
select SUM(d_e.leader_group_id IS NOT NULL) as sqlcalemp,d_l.cycle_id
from dialogue_leaders d_l
left join dialogue_employees d_e
on d_l.leader_group_id = d_e.leader_group_id
group by d_l.cycle_id
http://sqlfiddle.com/#!9/4995b/4
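With one row per cycle_id coming back from the grouped query above, the running total no longer needs a query inside the loop. A rough sketch of the PHP side, using made-up rows in place of a real result set (the values arrive as strings here, as they typically do from the old mysql driver):

```php
<?php
// Made-up rows shaped like the output of the grouped query (sqlcalemp, cycle_id)
$rows = [
    ['sqlcalemp' => '3', 'cycle_id' => '1'],
    ['sqlcalemp' => '0', 'cycle_id' => '2'],
    ['sqlcalemp' => '5', 'cycle_id' => '3'],
];

// Keep the per-cycle counts in case the later formula needs them individually
$countsByCycle = [];
foreach ($rows as $row) {
    $countsByCycle[(int) $row['cycle_id']] = (int) $row['sqlcalemp'];
}

$total_employees = array_sum($countsByCycle);
echo $total_employees, "\n"; // 8
```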

Speeding up my queries in PHP

I'm working on trying to speed up a webpage I have created. I know the issue is that I have a query within a query. I feel like there has to be a quicker way to accomplish the same results, but I'm running out of ideas. (My first attempt at this took 45 seconds for the page to load, now I'm down to about 6)
What I'm trying to do is pull run rate information from tables. I need to pull the correct startup and end of run rates from the runrate table, but all I have to go off of initially is the workcenter ID.
I feel like if the tables were set up a little bit better then it probably wouldn't have been so difficult, but it's what I inherited and as a result I'm a bit stuck. I need to pull a month's worth of data from each workcenter (about 15), where there can be as many as 4-5 runs each day... Quite a bit of data to process.
Here's the PHP code:
$qtotalStartup = mysql_query("
SELECT startup.recordID, startup.date, startup.time, runrate.rate AS temRate, runrate.formID
FROM jos_a_inproc_startup startup JOIN jos_a_runrate runrate ON startup.recordID = runrate.recordID
WHERE startup.workcenterId = $id AND runrate.rate > 0 AND runrate.formID = 1 AND startup.date > DATE_SUB(NOW(), INTERVAL 1 MONTH)") or die(mysql_error());
$totalStartCtr = mysql_num_rows($qtotalStartup);
if ($totalStartCtr > 0) {
while($rtotalStartup = mysql_fetch_assoc($qtotalStartup)) {
$hours = 0;
$goalRate = 0;
$sumHrRR = 0;
$startDate = 0;
$startTime = 0;
$startupNum = $rtotalStartup['recordID'];
$goalRate = $rtotalStartup['temRate'];
$startDate = $rtotalStartup['date'];
$startTime = $rtotalStartup['time'];
$startTime = strtotime($startDate . ' ' . $startTime);
//now that we have all of the startup form info, we can move to the end of run information
//this query will retrieve the correct date, time, and ending run rate for us to use with our calculations.
$qtotalEOR = mysql_query("
SELECT eor.recordID AS eorRec, eor.date, eor.time, eor.startupid, runrate1.rate AS tempRate, runrate1.formID
FROM jos_a_inproc_eor eor JOIN jos_a_runrate runrate1 ON eor.recordID = runrate1.recordID
WHERE eor.startupid = $startupNum AND runrate1.rate > 0 AND runrate1.formID = 3") or die(mysql_error());
$totalEORCtr = mysql_num_rows($qtotalEOR);
if ($totalEORCtr > 0) {
while($rtotalEOR = mysql_fetch_assoc($qtotalEOR)) {
//reset the accumulator to 0 so we don't get extra 'bad' data.
$sumHrRR = 0;
$newGoalRate = 0;
$latestDate = 0;
$latestTime = 0;
$eorNum = $rtotalEOR['eorRec'];
$latestDate = $rtotalEOR['date'];
$latestTime = $rtotalEOR['time'];
$latestTime = strtotime($latestDate . ' ' . $latestTime);
$sumHrRR= $rtotalEOR['tempRate'];
Any ideas would be greatly appreciated. I know it may be difficult to understand what I'm trying to get at without much more information, so let me know if you need to know anything else. Thanks.
Maybe try using multiple JOINs, like this:
SELECT startup.recordID, startup.date, startup.time,
runrate.rate AS temRate, runrate.formID,
-- stuff from second query
eor.recordID AS eorRec, eor.date AS eor_date,
eor.time AS eor_time, eor.startupid AS eor_startupid,
runrate1.rate AS eor_tempRate,
runrate1.formID AS runrate1_formID
FROM jos_a_inproc_startup startup
JOIN jos_a_runrate runrate ON startup.recordID = runrate.recordID
-- second query LEFT JOIN
LEFT JOIN jos_a_inproc_eor eor
ON eor.startupid = startup.recordID
LEFT JOIN jos_a_runrate runrate1
ON eor.recordID = runrate1.recordID
AND runrate1.rate > 0
AND runrate1.formID = 3
WHERE startup.workcenterId = $id
AND runrate.rate > 0
AND runrate.formID = 1
AND startup.date > DATE_SUB(NOW(), INTERVAL 1 MONTH)
I may be wrong, but it looks like you are also aggregating the results inside PHP. You could do that inside the database using functions like SUM() or AVG() together with GROUP BY. That saves the time spent transferring a larger result set from the database to the web server, plus the time spent looping and aggregating in PHP. Also, a JOIN is usually much faster than running queries in a loop, or even than subqueries inside a query.
You should also check that indexes are set on the columns you search on, and use EXPLAIN to check how the query is executed.
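Once everything comes back from a single joined query, the nested while loops can be replaced by one pass that groups rows by startup record. The sketch below uses made-up rows; the keys follow the aliases in the joined query above (recordID, temRate, eorRec, eor_tempRate), and the $runs structure is only an assumption about what the later calculations need.

```php
<?php
// Made-up rows shaped like the joined result. Because of the LEFT JOIN,
// the eor columns are null when a startup has no end-of-run record.
$rows = [
    ['recordID' => 10, 'temRate' => 120.0, 'eorRec' => 501,  'eor_tempRate' => 110.0],
    ['recordID' => 10, 'temRate' => 120.0, 'eorRec' => 502,  'eor_tempRate' => 115.0],
    ['recordID' => 11, 'temRate' => 90.0,  'eorRec' => null, 'eor_tempRate' => null],
];

$runs = [];
foreach ($rows as $row) {
    $id = $row['recordID'];
    if (!isset($runs[$id])) {
        // first time this startup is seen: store its goal rate once
        $runs[$id] = ['goalRate' => $row['temRate'], 'endRates' => []];
    }
    if ($row['eorRec'] !== null) {
        $runs[$id]['endRates'][] = $row['eor_tempRate'];
    }
}

echo count($runs), " startups\n"; // 2 startups
```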
You can use memcache techniques to make it much faster, and try to keep your queries as simple as you can. Don't retrieve values that you don't use in your scripts.
How many records are you typically dealing with as output? How big are the tables? Have you reviewed the indexes? Have you analyzed them recently (rebuilt them)?
Also, are you sending the data back to the browser using deflate? See:
http://httpd.apache.org/docs/2.2/mod/mod_deflate.html
Well, you could try using multiple INNER JOINs and have only one query instead of a query inside a query, which has a big impact on performance. You could try something like this and tweak it a little:
SELECT
startup.recordID AS startupRecordID,
startup.date AS startupDate,
startup.time AS startupTime,
runrate.rate,
runrate.formID,
eor.recordID AS eorRecordID,
eor.date AS eorDate,
eor.time AS eorTime,
eor.startupid AS eorStartupID
FROM jos_a_inproc_startup startup
INNER JOIN jos_a_runrate runrate
ON startup.recordID = runrate.recordID
INNER JOIN jos_a_inproc_eor eor
ON startup.recordID = eor.startupid
WHERE
startup.workcenterId = $id
AND runrate.rate > 0
AND runrate.formID = 1
AND startup.date > DATE_SUB(NOW(), INTERVAL 1 MONTH)

How To Optimize PostgreSQL generate_series function

I have a query that uses the PostgreSQL generate_series function, but when it comes to large amounts of data the query can be slow. An example of the code that generates the query is below:
$yesterday = date('Y-m-d',(strtotime ( '-1 day' ) ));
$query = "
WITH interval_step AS (
SELECT gs::date AS interval_dt, random() AS r
FROM generate_series('$yesterday'::timestamp, '2015-01-01', '1 day') AS gs)
SELECT articles.article_id, article_title, article_excerpt, article_author, article_link, article_default_image, article_date_published, article_bias_avg, article_rating_avg
FROM development.articles JOIN interval_step ON articles.article_date_added::date=interval_step.interval_dt ";
if (isset($this -> registry -> get['category'])) {
$query .= "
JOIN development.feed_articles ON articles.article_id = feed_articles.article_id
JOIN development.rss_feeds ON feed_articles.rss_feed_id = rss_feeds.rss_feed_id
JOIN development.news_categories ON rss_feeds.news_category_id = news_categories.news_category_id
WHERE news_category_name = $1";
$params = array($category_name);
$query_name = 'browse_category';
}
$query .= " ORDER BY interval_step.interval_dt DESC, RANDOM() LIMIT 20;";
This series looks only for content that goes one day back and sorts the results in random order. My question is: in what ways can generate_series be optimized to improve performance?
You don't need that generate_series at all. And do not concatenate query strings. Avoid it by making the parameter an empty string (or null) if it is not set:
if (!isset($this -> registry -> get['category']))
$category_name = '';
$query = "
select articles.article_id, article_title, article_excerpt, article_author, article_link, article_default_image, article_date_published, article_bias_avg, article_rating_avg
from
development.articles
inner join
development.feed_articles using (article_id)
inner join
development.rss_feeds using (rss_feed_id)
inner join
development.news_categories using (news_category_id)
where
(news_category_name = $1 or $1 = '')
and articles.article_date_added >= current_date - 1
order by
date_trunc('day', articles.article_date_added) desc,
random()
limit 20;
";
$params = array($category_name);
Passing $yesterday to the query is also not necessary as it can be done entirely in SQL.
If $category_name is empty it will return all categories:
(news_category_name = $1 or $1 = '')
Imho, try removing that random() in your order by statement. It probably has a much larger performance impact than you think. As things are it's probably ordering the entire set by interval_dt desc, random(), and then picking the top 20. Not advisable...
Try fetching e.g. 100 rows ordered by interval_dt desc instead, then shuffle them per the same logic, and pick 20 in your app. Or wrap the entire thing in a subquery limit 100, and re-order accordingly along the same lines.
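The "fetch more, shuffle in the app" idea could look roughly like this. It is a sketch under assumptions: pickRandomPerDay() is a hypothetical helper, the rows are assumed to arrive ordered newest day first, and the day key stands in for the date part of article_date_added.

```php
<?php
// Hypothetical helper: keeps the day order of the input, randomises within each day,
// and stops as soon as $limit rows have been collected.
function pickRandomPerDay(array $rows, int $limit): array
{
    $byDay = [];
    foreach ($rows as $row) {
        $byDay[$row['day']][] = $row; // days stay in first-seen (newest first) order
    }

    $picked = [];
    foreach ($byDay as $dayRows) {
        shuffle($dayRows); // random order within one day only
        foreach ($dayRows as $row) {
            $picked[] = $row;
            if (count($picked) === $limit) {
                return $picked;
            }
        }
    }
    return $picked;
}

// Made-up sample: the 100 newest rows, 60 from the latest day and 40 from the day before
$rows = [];
for ($i = 1; $i <= 100; $i++) {
    $rows[] = ['article_id' => $i, 'day' => $i <= 60 ? '2015-06-02' : '2015-06-01'];
}

$page = pickRandomPerDay($rows, 20);
echo count($page), "\n"; // 20
```

The database then only sorts by date and applies a plain LIMIT, while the random ordering happens on at most 100 rows in PHP.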

More efficient way to do SQL queries

I've been using the PHP and SQL below to load schedule information and real-time information for passenger trains in the UK. Essentially you have to find the relevant schedules, and then load the real-time information for each schedule, which is in a different table relating to today's trains.
The query is taking a little longer than is really ideal and using a lot of CPU, which again isn't ideal. I'm pretty weak when it comes to SQL programming, so any pointers as to what is inefficient would be great.
This is for an Android app, so I've tried to do it all with one call over HTTP. The printed * and > are for splitting the string at the other end.
Here is the code:
<?
//Connect to the database
mysql_connect("localhost","XXXX","XXXX")
or die ("No connection could be made to the OpenRail Database");
mysql_select_db("autotrain");
//Set todays date from system and get HTTP parameters for the station,time to find trains and todays locations table.
$date = date('Y-m-d');
$test = $_GET['station'];
$time = $_GET['time'];
$table = $_GET['table'];
//Find the tiploc associated with the station being searched.
$tiplocQuery = "SELECT tiploc_code FROM allstations WHERE c LIKE '$test';";
$tiplocResult =mysql_query($tiplocQuery);
$tiplocRow = mysql_fetch_assoc($tiplocResult);
$tiploc=$tiplocRow['tiploc_code'];
//Now find the timetabled trains for the station where there exists no departure information. Goes back two hours to account for any late running.
$timeTableQuery = "SELECT tiplocs.tps_description AS 'C', locations$table.public_departure, locations$table.id,schedules.stp_indicator
,schedules.train_uid
FROM locations$table, tiplocs, schedules_cache, schedules,activations
WHERE locations$table.id = schedules_cache.id
AND schedules_cache.id = schedules.id
AND schedules.id =activations.id
AND '$date'
BETWEEN schedules.date_from
AND schedules.date_to
AND locations$table.tiploc_code = '$tiploc'
AND locations$table.real_departure LIKE '0'
AND locations$table.public_departure NOT LIKE '0'
AND locations$table.public_departure >='$time'-300
AND locations$table.public_departure <='$time'+300
AND schedules.runs_th LIKE '1'
AND schedules_cache.destination = tiplocs.tiploc
ORDER BY locations$table.public_departure ASC
LIMIT 0,30;";
$timeTableResult=mysql_query($timeTableQuery);
while($timeTablerow = mysql_fetch_assoc($timeTableResult)){
$output[] = $timeTablerow;
}
//Now for each id returned in the timetable, get the locations and departure times so the app may calculate expected arrival times.
foreach ($output as $value) {
$id = $value['id'];
$realTimeQuery ="SELECT locations$table.id,locations$table.location_order,locations$table.arrival,locations$table.public_arrival,
locations$table.real_arrival,locations$table.pass,locations$table.departure,locations$table.public_departure,locations$table.real_departure,locations$table.location_cancelled,
tiplocs.tps_description FROM locations$table,tiplocs WHERE id =$id AND locations$table.tiploc_code=tiplocs.tiploc;";
$realTimeResult =mysql_query($realTimeQuery);
while($row3 = mysql_fetch_assoc($realTimeResult)){
$output3[] = $row3;
}
print json_encode($output3);
print("*");
unset($output3);
unset($id);
}
print('>');
print json_encode($output);
?>
Many Thanks
Matt
The biggest issue with your setup is this foreach loop because it is unnecessary and results in n number of round trips to the database to execute a query, fetch and analyze the results.
foreach ($output as $value) {
Rewrite the initial query to include all of the fields you will need to do your later calculations.
Something like this would work.
SELECT tl.tps_description AS 'C', lc.public_departure, lc.id, s.stp_indicator, s.train_uid,
lc.id, lc.location_order, lc.arrival, lc.public_arrival, lc.real_arrival, lc.pass, lc.departure, lc.real_departure, lc.location_cancelled
FROM locations$table lc INNER JOIN schedules_cache sc ON lc.id = sc.id
INNER JOIN schedules s ON s.id = sc.id
INNER JOIN activations a ON s.id = a.id
INNER JOIN tiplocs tl ON sc.destination = tl.tiploc
WHERE '$date' BETWEEN s.date_from AND s.date_to
AND lc.tiploc_code = '$tiploc'
AND lc.real_departure LIKE '0'
AND lc.public_departure NOT LIKE '0'
AND lc.public_departure >='$time'-300
AND lc.public_departure <='$time'+300
AND s.runs_th LIKE '1'
ORDER BY lc.public_departure ASC
LIMIT 0,30;
Eliminating n query executions from your page load should dramatically improve response time.
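If the per-stop rows are still needed for every train, a different way to drop the loop (not the rewrite shown above, just an alternative sketch) is one extra query that fetches the rows for all ids at once with IN. The helper name inPlaceholders() and the fixed table name locations_today are assumptions made for illustration; the column names come from the question.

```php
<?php
// Hypothetical helper: one "?" placeholder per id, for use inside IN (...)
function inPlaceholders(array $ids): string
{
    return implode(', ', array_fill(0, count($ids), '?'));
}

// Made-up schedule ids, standing in for the ids returned by the timetable query
$ids = [101, 205, 309];

$sql = '';
if ($ids) { // IN () with an empty list is not valid SQL, so skip the query entirely
    // locations_today is an assumed fixed table name, not taken from the request
    $sql = 'SELECT l.id, l.location_order, l.public_arrival, l.public_departure, t.tps_description
            FROM locations_today l
            INNER JOIN tiplocs t ON l.tiploc_code = t.tiploc
            WHERE l.id IN (' . inPlaceholders($ids) . ')
            ORDER BY l.id, l.location_order';
    echo $sql, "\n";
    // With PDO this would run as: $stmt = $pdo->prepare($sql); $stmt->execute($ids);
}
```

The rows can then be grouped by id in PHP in a single pass, which keeps the total at two queries no matter how many trains are returned.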
Ignoring the problems with the code, in order to speed up your query, use the EXPLAIN command to evaluate where you need to add indexes to your query.
At a guess, you probably will want to create an index on whatever locations$table.public_departure evaluates to.
http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
A few things I noticed.
First, you are joining tables in the where clause, like this:
from table1, table2
where table1.something = table2.something
Joining in the from clause is faster:
from table1 join table2 on table1.something = table2.something
Next, I'm not a php programmer, but it looks like you are running similar queries inside a loop. If that's true, look for a way to run just one query.
Edit starts here
This is in response to gazarsgo's comment asking me to back up my claim about joins in the from clause being faster. He is right, I was wrong. This is what I did. The programming language is ColdFusion:
<cfsetting showdebugoutput="no">
<cfscript>
fromtimes = ArrayNew(1);
wheretimes = ArrayNew(1);
</cfscript>
<cfloop from="1" to="1000" index="idx">
<cfquery datasource="burns" name="fromclause" result="fromresult">
select count(distinct hscnumber)
from burns_patient p join burns_case c on p.patientid = c.patientid
</cfquery>
<cfset ArrayAppend(fromtimes, fromresult.executiontime)>
<cfquery datasource="burns" name="whereclause" result="whereresult">
select count(distinct hscnumber)
from burns_patient p, burns_case c
where p.patientid = c.patientid
</cfquery>
<cfset ArrayAppend(wheretimes, whereresult.executiontime)>
</cfloop>
<cfdump var="#ArrayAvg(fromtimes)#" metainfo="no" label="from">
<cfdump var="#ArrayAvg(wheretimes)#" metainfo="no" label="where">
I ran it 5 times. The results, in milliseconds, follow.
9.563 9.611
9.498 9.584
9.625 9.548
9.831 9.769
9.792 9.813
The first number represents joining in the from clause, the second joining in the where clause. The first number is lower only 60% of the time. Had it been lower 100% of the time, it would have shown that joining in the from clause is faster, but that's not the case.
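The arithmetic behind that conclusion can be checked with a few lines of PHP, using the five pairs of timings listed above. The variable names are only for illustration.

```php
<?php
// Average execution times in milliseconds, copied from the five runs above
$fromTimes  = [9.563, 9.498, 9.625, 9.831, 9.792]; // join in the from clause
$whereTimes = [9.611, 9.584, 9.548, 9.769, 9.813]; // join in the where clause

// Count the runs in which the from-clause join was the faster one
$fromWins = 0;
foreach ($fromTimes as $i => $time) {
    if ($time < $whereTimes[$i]) {
        $fromWins++;
    }
}

$meanFrom  = array_sum($fromTimes) / count($fromTimes);
$meanWhere = array_sum($whereTimes) / count($whereTimes);

printf("from clause faster in %d of %d runs\n", $fromWins, count($fromTimes));
printf("mean from: %.4f ms, mean where: %.4f ms\n", $meanFrom, $meanWhere);
```

The from-clause join wins 3 of the 5 runs, and the two means (about 9.6618 ms and 9.6650 ms) differ by roughly 0.003 ms, far less than the spread between individual runs.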
