How to optimise the handling of big data in Laravel? - php

My task is:
"Take the transactions table, group the rows by transaction date and count the statuses. These figures form the statistics that will be rendered on the page."
This is my method for generating the statistics:
public static function getStatistics(Website $website = null)
{
if($website == null) return [];
$query = \DB::table('transactions')->where("website_id", $website->id)->orderBy("dt", "desc")->get();
$transitions = collect(static::convertDate($query))->groupBy("dt");
$statistics = collect();
// dd($transitions); // debugging dump, disabled: dd() stops the script before the loop below runs
foreach ($transitions as $date => $trans) {
$subscriptions = $trans->where("status", 'subscribe')->count();
$unsubscriptions = $trans->where("status", 'unsubscribe')->count();
$prolongations = $trans->where("status", 'rebilling')->count();
$redirections = $trans->where("status", 'redirect_to_lp')->count();
$conversion = $redirections == 0 ? 0 : ((float) ($subscriptions / $redirections));
$earnings = $trans->sum("pay");
$statistics->push((object)[
"date" => $date,
"subscriptions" => $subscriptions,
'unsubscriptions' => $unsubscriptions,
'prolongations' => $prolongations,
'redirections' => $redirections,
'conversion' => round($conversion, 2),
'earnings' => $earnings,
]);
}
return $statistics;
}
If the number of transaction rows is below 100,000, everything works fine. But if it is above 150-200k, nginx throws a 502 Bad Gateway. What can you advise? I don't have any experience in handling big data. Maybe my implementation has a fundamental error?

Big data is never easy, but I would suggest using Laravel's chunk instead of get.
https://laravel.com/docs/5.1/eloquent (ctrl+f "::chunk")
What ::chunk does is select n rows at a time and let you process them bit by bit. This is convenient in that it allows you to stream updates to the browser, but at the ~150k result range I would suggest looking into pushing this work into a background process instead of handling it during the request.
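A minimal sketch of the chunking idea in plain PHP, without Laravel: rows arrive in fixed-size batches and only the running per-day counters stay in memory. The function name aggregateChunk and the sample rows are hypothetical; in Laravel, the callback passed to chunk() would play the role of the loop body. The column names dt, status and pay come from the question.

```php
<?php
// Fold one batch of rows into the running per-day totals.
// $stats is keyed by day (YYYY-MM-DD); each row is an array with dt, status, pay.
function aggregateChunk(array $stats, array $rows): array
{
    // Statuses counted per day (taken from the question).
    $statuses = ['subscribe', 'unsubscribe', 'rebilling', 'redirect_to_lp'];

    foreach ($rows as $row) {
        // Assumption: dt is a string that starts with "YYYY-MM-DD".
        $day = substr($row['dt'], 0, 10);
        if (!isset($stats[$day])) {
            $stats[$day] = array_fill_keys($statuses, 0) + ['earnings' => 0.0];
        }
        if (in_array($row['status'], $statuses, true)) {
            $stats[$day][$row['status']]++;
        }
        $stats[$day]['earnings'] += (float) $row['pay'];
    }
    return $stats;
}

// Hypothetical sample data standing in for the transactions table.
$rows = [
    ['dt' => '2016-04-01 10:00:00', 'status' => 'redirect_to_lp', 'pay' => 0],
    ['dt' => '2016-04-01 10:05:00', 'status' => 'redirect_to_lp', 'pay' => 0],
    ['dt' => '2016-04-01 10:06:00', 'status' => 'subscribe',      'pay' => 1.5],
    ['dt' => '2016-04-02 09:00:00', 'status' => 'rebilling',      'pay' => 2.0],
    ['dt' => '2016-04-02 09:30:00', 'status' => 'unsubscribe',    'pay' => 0],
];

$stats = [];
foreach (array_chunk($rows, 2) as $chunk) { // 2 rows per batch, only to keep the demo small
    $stats = aggregateChunk($stats, $chunk);
}
```

Memory use then depends on the batch size and the number of distinct days, not on the total number of transactions.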

After several days of researching this question, I found the right answer:
Do NOT use PHP for handling the raw data. It's better to use SQL!
In my case, we are using PostgreSQL.
Below is the SQL query that worked for me; maybe it will help someone else.
WITH
cte_range(dt) AS
(
SELECT
generate_series('2016-04-01 00:00:00'::timestamp with time zone, '{$date} 00:00:00'::timestamp with time zone, INTERVAL '1 day')
),
cte_data AS
(
SELECT
date_trunc('day', dt) AS dt,
COUNT(*) FILTER (WHERE status = 'subscribe') AS count_subscribes,
COUNT(*) FILTER (WHERE status = 'unsubscribe') AS count_unsubscribes,
COUNT(*) FILTER (WHERE status = 'rebilling') AS count_rebillings,
COUNT(*) FILTER (WHERE status = 'redirect_to_lp') AS count_redirects_to_lp,
SUM(pay) AS earnings,
CASE
WHEN COUNT(*) FILTER (WHERE status = 'redirect_to_lp') > 0 THEN 100.0 * COUNT(*) FILTER (WHERE status = 'subscribe')::float / COUNT(*) FILTER (WHERE status = 'redirect_to_lp')::float
ELSE 0
END
AS conversion_percent
FROM
transactions
WHERE
website_id = {$website->id}
GROUP BY
date_trunc('day', dt)
)
SELECT
to_char(cte_range.dt, 'YYYY-MM-DD') AS day,
COALESCE(cte_data.count_subscribes, 0) AS count_subscribe,
COALESCE(cte_data.count_unsubscribes, 0) AS count_unsubscribes,
COALESCE(cte_data.count_rebillings, 0) AS count_rebillings,
COALESCE(cte_data.count_redirects_to_lp, 0) AS count_redirects_to_lp,
COALESCE(cte_data.conversion_percent, 0) AS conversion_percent,
COALESCE(cte_data.earnings, 0) AS earnings
FROM
cte_range
LEFT JOIN
cte_data
ON cte_data.dt = cte_range.dt
ORDER BY
cte_range.dt DESC
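To illustrate what cte_range, the LEFT JOIN and COALESCE do together in the query above, here is a rough PHP equivalent: build every day in the range, then give days without data a row of zeros. The function name fillMissingDays and the sample figures are hypothetical, and only two of the columns are shown; in the query itself the database does this work.

```php
<?php
// Return one entry per day between $from and $to (inclusive), newest first.
// Days missing from $byDay get zeros, like COALESCE(..., 0) after the LEFT JOIN.
function fillMissingDays(array $byDay, string $from, string $to): array
{
    $period = new DatePeriod(
        new DateTimeImmutable($from),
        new DateInterval('P1D'),
        (new DateTimeImmutable($to))->modify('+1 day') // DatePeriod excludes its end date
    );

    $result = [];
    foreach ($period as $day) {
        $key = $day->format('Y-m-d');
        $result[$key] = $byDay[$key] ?? ['count_subscribes' => 0, 'earnings' => 0];
    }

    krsort($result); // same effect as ORDER BY cte_range.dt DESC
    return $result;
}

// Hypothetical aggregated data: only one of the three days has transactions.
$filled = fillMissingDays(
    ['2016-04-01' => ['count_subscribes' => 3, 'earnings' => 4.5]],
    '2016-04-01',
    '2016-04-03'
);
```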

Related

Returning Data From a Sub-query With fetchColumn()

I am trying to simplify my code. I know this works and will return a value for $waiver_cash:
$waiver_query = $pdo->prepare("SELECT cap_number + waiver_number - :salary -
(SELECT sum(current_salary)
FROM salaries
WHERE gm = :gm AND franchise IS NULL AND waiver_bid IS NULL) -
(SELECT COALESCE(sum(waiver_bid), 0) FROM salaries
WHERE gm = :gm) AS waiver_cash
FROM base_numbers;");
$waiver_query->execute(['salary' => $salary, 'gm' => $gm]);
foreach ($waiver_query as $row) {
$waiver_cash = $row['waiver_cash'];
}
However, what I want to do, is this:
$waiver_query = $pdo->prepare("SELECT cap_number + waiver_number - :salary_retained -
(SELECT sum(current_salary)
FROM salaries
WHERE gm = :gm AND franchise IS NULL AND waiver_bid IS NULL) -
(SELECT COALESCE(sum(waiver_bid), 0)
FROM salaries
WHERE gm = :gm) AS waiver_cash
FROM base_numbers;");
$waiver_query->execute(['salary_retained' => $salary_retained, 'gm' => $gm]);
$waiver_cash = $waiver_query->fetchColumn();
When I do it with "fetchColumn()", nothing gets returned. Other than changing $pdo->prepare to $pdo->query and putting the variables directly in the SELECT statements, is what I want to do possible?
I found the issue. I have another query which fetches the data for $salary_retained.
$salary_retained_query = $pdo->prepare("SELECT COALESCE(sum(current_salary), 0) FROM salaries WHERE salary_retained= :gm;");
$salary_retained_query->execute(['gm' => $gm]);
$salary_retained = $salary_retained_query->fetchColumn();
Some of the GMs didn't have any rows matching the criteria, so the value returned was null. Then, in the query I was having issues with, the sum evaluated to null, so my webpage showed a blank. Changing my code from sum(current_salary) to COALESCE(sum(current_salary), 0) resolved the issue.
When testing in pgAdmin, for those GMs who I knew didn't have any value in the "salary_retained" column, I would put in a 0 for them in the calculations, so my query worked. Thanks to islemdev for making me rethink how the data I was using was being returned.
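The same guard can also be applied on the PHP side. A small sketch, assuming a hypothetical helper called columnOrZero: PDOStatement::fetchColumn() gives back null when the selected value is SQL NULL (for example SUM() over zero rows) and false when there is no row at all, so both are mapped to 0 before being bound into the next query.

```php
<?php
// Map "no usable number" results to 0.0 so later arithmetic is not fed a null.
// null  -> the column was SQL NULL (e.g. SUM() over zero rows)
// false -> fetchColumn() found no row at all
function columnOrZero($value): float
{
    if ($value === null || $value === false) {
        return 0.0;
    }
    return (float) $value;
}
```

With the code from the question this would be used as $salary_retained = columnOrZero($salary_retained_query->fetchColumn());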

slow table comparison

I have written this code to compare two tables and find the differences, but it is very slow. Normally I have to compare about 4k rows, and it takes 3 minutes to complete.
$query = $pdo->query("select * from tab1 order by date_time ASC");
$calls = array();
foreach($query as $row){
//check the differences
$from = substr($row['from'],4,15); //remove prefix
$date_time = date('Y-m-d H:i:s'
, strtotime('-2 minute',strtotime($row['date_time'])));
//decrease of 2 min the time to match all time differences
$duration = $pdo->query(
"select duration
, abs(duration - ".$row['duration'].") as duration_diff
, price from tab2
where date_time between '".$date_time."' and '".$row['date_time']."'
and from like '%".$row['from']."'
and duration >0
order by duration_diff"
)->fetch();
//highlight the differences
if ($row['duration'] > $duration['duration'] ):
$color = "#ff0000";
elseif ($row['duration'] < $duration['duration'] ):
$color = "#ff9900";
else:
$color = "#fff";
endif;
$calls[] = array(
"date_time" => $row['date_time'],
"from" => $row['from'],
"to" => $row['to'],
"duration_tab1" => $row['duration'],
"duration_tab2" => $duration['duration'],
"price_tab1" => $row['price'],
"price_tab2" => substr($duration['price'],0,6),
"color" => $color);
}
All the fields in table structure are varchar, there are no indexes.
Which indexes on which fields have to be added to increase the performance?
Your code has an N+1 query problem: one MySQL query against tab2 for every row of tab1.
Since 4k rows is not much, I suggest fetching the whole of tab1 and tab2 and doing the comparison in PHP code. It should be faster.
$query = $pdo->query("select * from tab1 order by date_time ASC");
$query1 = $pdo->query("select * from tab2 order by date_time ASC");
....
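The in-memory comparison could look roughly like this. It is only a sketch under assumptions: the function name findClosestCall and the sample rows are hypothetical, and the matching rules (number suffix, 2-minute window, duration > 0, smallest duration difference) are copied from the query in the question.

```php
<?php
// Find the tab2 row that best matches one tab1 row:
// `from` ends with the tab1 number, date_time lies within the 2 minutes
// before the tab1 row, duration > 0, and the duration difference is smallest.
function findClosestCall(array $row, array $tab2Rows): ?array
{
    $end   = strtotime($row['date_time']);
    $start = $end - 120; // 2 minutes earlier, as in the question

    $best = null;
    $bestDiff = null;
    foreach ($tab2Rows as $candidate) {
        $time = strtotime($candidate['date_time']);
        if ($time < $start || $time > $end) {
            continue;
        }
        if ((float) $candidate['duration'] <= 0) {
            continue;
        }
        // Equivalent of: `from` LIKE '%<number>'
        if (substr($candidate['from'], -strlen($row['from'])) !== $row['from']) {
            continue;
        }
        $diff = abs((float) $candidate['duration'] - (float) $row['duration']);
        if ($bestDiff === null || $diff < $bestDiff) {
            $best = $candidate;
            $bestDiff = $diff;
        }
    }
    return $best; // null when nothing in tab2 matches
}

// Hypothetical sample rows (all values are strings, as in the varchar-only tables).
$tab1Row = ['date_time' => '2020-01-01 10:02:00', 'from' => '5551234', 'duration' => '60'];
$tab2 = [
    ['date_time' => '2020-01-01 10:01:00', 'from' => '00395551234', 'duration' => '58', 'price' => '0.1200'],
    ['date_time' => '2020-01-01 10:01:30', 'from' => '00395551234', 'duration' => '61', 'price' => '0.1300'],
    ['date_time' => '2020-01-01 09:00:00', 'from' => '00395551234', 'duration' => '60', 'price' => '0.1250'],
];
$match = findClosestCall($tab1Row, $tab2);
```

This still scans tab2 once per tab1 row, but it does so in memory instead of issuing a database round trip for every row.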
Pulling a value into php land from a query in a loop then injecting that value into another query executed in a loop is an anti-pattern.
Your database is not normalized - which is causing a lot of complications for you.
Your database is very loosely coupled - which is compounding the complexity.
You've not provided any details of the table structure nor the indexes.
Your problem statement does not address the relative cardinality of the datasets - e.g. what happens if there are no matching rows in tab2.
If you fixed your schema, then it would be trivial to do the join in the database. As it stands, I am extremely dubious as to whether your code will produce reproducible results. However the same results could be obtained by using a function to pull the relevant records out of table 2, something like....
CREATE FUNCTION tab2data(pfrom VARCHAR(50) -- length is an assumption; match the column definition
, pdate_time DATETIME
, pduration FLOAT)
RETURNS VARCHAR(200)
BEGIN
DECLARE result VARCHAR(200);
SELECT CONCAT(tab2.duration, '#',
ABS(tab2.duration-pduration), '#',
tab2.date_time, '#',
tab2.price, '#',
tab2.`from`, '#',
tab2.`to`, '#')
INTO result
FROM tab2
WHERE `from` LIKE CONCAT('%',pfrom)
AND date_time BETWEEN pdate_time AND pdate_time + INTERVAL 2 MINUTE
AND duration>0
ORDER BY ABS(tab2.duration-pduration)
LIMIT 0,1;
return result;
END;

Speeding up my queries in PHP

I'm working on trying to speed up a webpage I have created. I know the issue is that I have a query within a query. I feel like there has to be a quicker way to accomplish the same results, but I'm running out of ideas. (My first attempt at this took 45 seconds for the page to load, now I'm down to about 6)
What I'm trying to do is pull run rate information from tables. I need to pull the correct startup and end of run rates from the runrate table, but all I have to go off of initially is the workcenter ID.
I feel like if the tables were set up a little bit better then it probably wouldn't have been so difficult, but it's what I inherited and as a result I'm a bit stuck. I need to pull a month's worth of data from each workcenter (about 15), where there can be as many as 4-5 runs each day... Quite a bit of data to process.
Here's the PHP code:
$qtotalStartup = mysql_query("
SELECT startup.recordID, startup.date, startup.time, runrate.rate AS temRate, runrate.formID
FROM jos_a_inproc_startup startup JOIN jos_a_runrate runrate ON startup.recordID = runrate.recordID
WHERE startup.workcenterId = $id AND runrate.rate > 0 AND runrate.formID = 1 AND startup.date > DATE_SUB(NOW(), INTERVAL 1 MONTH)") or die(mysql_error());
$totalStartCtr = mysql_num_rows($qtotalStartup);
if ($totalStartCtr > 0) {
while($rtotalStartup = mysql_fetch_assoc($qtotalStartup)) {
$hours = 0;
$goalRate = 0;
$sumHrRR = 0;
$startDate = 0;
$startTime = 0;
$startupNum = $rtotalStartup['recordID'];
$goalRate = $rtotalStartup['temRate'];
$startDate = $rtotalStartup['date'];
$startTime = $rtotalStartup['time'];
$startTime = strtotime($startDate . ' ' . $startTime);
//now that we have all of the startup form info, we can move to the end of run information
//this query will retrieve the correct date, time, and ending run rate for us to use with our calculations.
$qtotalEOR = mysql_query("
SELECT eor.recordID AS eorRec, eor.date, eor.time, eor.startupid, runrate1.rate AS tempRate, runrate1.formID
FROM jos_a_inproc_eor eor JOIN jos_a_runrate runrate1 ON eor.recordID = runrate1.recordID
WHERE eor.startupid = $startupNum AND runrate1.rate > 0 AND runrate1.formID = 3") or die(mysql_error());
$totalEORCtr = mysql_num_rows($qtotalEOR);
if ($totalEORCtr > 0) {
while($rtotalEOR = mysql_fetch_assoc($qtotalEOR)) {
//reset the accumulator to 0 so we don't get extra 'bad' data.
$sumHrRR = 0;
$newGoalRate = 0;
$latestDate = 0;
$latestTime = 0;
$eorNum = $rtotalEOR['eorRec'];
$latestDate = $rtotalEOR['date'];
$latestTime = $rtotalEOR['time'];
$latestTime = strtotime($latestDate . ' ' . $latestTime);
$sumHrRR= $rtotalEOR['tempRate'];
Any ideas would be greatly appreciated. I know it may be difficult to understand what I'm trying to get at without much more information, so let me know if you need to know anything else. Thanks.
Maybe try using multiple JOINs, like this:
SELECT startup.recordID, startup.date, startup.time,
runrate.rate AS temRate, runrate.formID,
-- stuff from second query
eor.recordID AS eorRec, eor.date AS eor_date,
eor.time AS eor_time, eor.startupid AS eor_startupid,
runrate1.rate AS eor_tempRate,
runrate1.formID AS runrate1_formID
FROM jos_a_inproc_startup startup
JOIN jos_a_runrate runrate ON startup.recordID = runrate.recordID
-- second query LEFT JOIN
LEFT JOIN jos_a_inproc_eor eor
ON eor.startupid = startup.recordID
LEFT JOIN jos_a_runrate runrate1
ON eor.recordID = runrate1.recordID
AND runrate1.rate > 0
AND runrate1.formID = 3
WHERE startup.workcenterId = $id
AND runrate.rate > 0
AND runrate.formID = 1
AND startup.date > DATE_SUB(NOW(), INTERVAL 1 MONTH)
I don't know if I'm right, but I think you are also doing some aggregation work on the results inside PHP. You could do it inside the database using functions like sum() or avg() together with GROUP BY. You will save time by transferring a smaller result set from the database to the server, and by not looping and aggregating inside PHP. Also, most of the time using a JOIN is much faster than running queries in a loop or even subqueries inside a query.
You should also check whether indexes are set on the columns you search on. Also use EXPLAIN to check how the query is executed.
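Once a single joined query replaces the nested ones, the PHP side only needs one pass over the flat result. A sketch under assumptions: the column aliases follow the multi-JOIN query above, the function name summariseRuns and the sample row are hypothetical, and the hours calculation is a guess at what the original code goes on to compute.

```php
<?php
// Turn flat joined rows into a list of runs per startup record.
function summariseRuns(array $rows): array
{
    $runs = [];
    foreach ($rows as $row) {
        if ($row['eor_date'] === null) {
            continue; // the LEFT JOIN found no end-of-run record for this startup
        }
        $start = strtotime($row['date'] . ' ' . $row['time']);
        $end   = strtotime($row['eor_date'] . ' ' . $row['eor_time']);

        $runs[$row['recordID']][] = [
            'hours'    => ($end - $start) / 3600,
            'goalRate' => (float) $row['temRate'],
            'endRate'  => (float) $row['eor_tempRate'],
        ];
    }
    return $runs;
}

// Hypothetical rows as mysqli/PDO would return them for the joined query.
$rows = [
    ['recordID' => 7, 'date' => '2013-05-01', 'time' => '06:00:00', 'temRate' => '120',
     'eor_date' => '2013-05-01', 'eor_time' => '14:30:00', 'eor_tempRate' => '110'],
    ['recordID' => 8, 'date' => '2013-05-02', 'time' => '06:00:00', 'temRate' => '120',
     'eor_date' => null, 'eor_time' => null, 'eor_tempRate' => null],
];
$runs = summariseRuns($rows);
```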
You can use caching (e.g. Memcache) to make it much faster, and try to make your queries as simple as you can. Don't retrieve values that you don't use in your scripts.
How many records are you typically dealing with as output? How big are the tables? Have you reviewed the indexes? Have you analyzed them recently (rebuilt them)?
Also, are you sending the data back to the browser using deflate? See:
http://httpd.apache.org/docs/2.2/mod/mod_deflate.html
Well, you could try using multiple INNER JOINs and have only one query instead of a query inside a query, which greatly hurts performance. You could try something like this and tweak it a little:
SELECT
startup.recordID AS startupRecordID,
startup.date AS startupDate,
startup.time AS startupTime,
runrate.rate,
runrate.formID,
eor.recordID AS eorRecordID,
eor.date AS eorDate,
eor.time AS eorTime,
eor.startupid AS eorStartupID
FROM jos_a_inproc_startup startup
INNER JOIN jos_a_runrate runrate
ON startup.recordID = runrate.recordID
INNER JOIN jos_a_inproc_eor eor
ON startup.recordID = eor.startupid
WHERE
startup.workcenterId = $id
AND runrate.rate > 0
AND runrate.formID = 1
AND startup.date > DATE_SUB(NOW(), INTERVAL 1 MONTH)

while (mysql_fetch_array) in a while loop

I have this code:
while ($sum<16 || $sum>18){
$totala = 0;
$totalb = 0;
$totalc = 0;
$ranka = mysql_query("SELECT duration FROM table WHERE rank=1 ORDER BY rand() LIMIT 1");
$rankb = mysql_query("SELECT duration FROM table WHERE rank=2 ORDER BY rand() LIMIT 1");
$rankc = mysql_query("SELECT duration FROM table WHERE rank=3 ORDER BY rand() LIMIT 1");
while ($rowa = mysql_fetch_array($ranka)) {
echo $rowa['duration'] . "<br/>";
$totala = $totala + $rowa['duration'];
}
while ($rowb = mysql_fetch_array($rankb)) {
$totalb = $totalb + $rowb['duration'];
}
while ($rowc = mysql_fetch_array($rankc)) {
$totalc = $totalc + $rowc['duration'];
}
$sum=$totala+$totalb+$totalc;
}
echo $sum;
It works fine, but the problem is that "echo $rowa['duration']" executes on every pass until $sum reaches 16. Is there a way to echo only the value from the last pass of the "while ($rowa = mysql_fetch_array($ranka))" loop?
At the moment it usually prints all the numbers drawn until $sum reaches 16.
You are explicitly echoing $rowa['duration'] in the first inner while loop. If you only want to print the last duration from the $ranka set, simply change the echo to $rowa_duration = $rowa['duration'] and then echo it outside the loop.
while ($rowa = mysql_fetch_array($ranka)) {
$rowa_duration = $rowa['duration'];
$totala = $totala + $rowa['duration'];
}
echo $rowa_duration . '<br/>';
What you are doing there is bad on multiple levels. And your English is horrid. Well, practice makes perfect. You could try joining the ##php chat room on the FreeNode server. That would improve both your English and your PHP skills; it sure helped me a lot. Anyway...
The SQL
First of all, using ORDER BY RAND() is extremely ignorant (at best). As your tables begin to get larger, this operation will make your queries slower. It has n * log2(n) complexity, which means that querying a table with 1000 entries will take ~300 times longer than querying a table with 10 entries.
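A quick check of the n * log2(n) estimate: plugging the two table sizes into the formula gives a ratio of roughly 300. The function name sortCost is hypothetical, and the formula only models relative cost, not actual query time.

```php
<?php
// Relative cost of an n * log2(n) operation for a table with $n rows.
function sortCost(int $n): float
{
    return $n * log($n, 2);
}

// How much more expensive is 1000 rows compared with 10 rows?
$ratio = sortCost(1000) / sortCost(10);
```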
To learn more about it, you should read this blog post, but as for your current queries, the solution would look like:
SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
AND rank = 1
LIMIT 1
This would select random duration from the table.
But since you are actually selecting data with 3 different ranks (1, 2 and 3), it would make sense to create a UNION of three queries:
(SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
AND rank = 1
LIMIT 1)
UNION ALL
(SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
AND rank = 2
LIMIT 1)
UNION ALL
(SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
AND rank = 3
LIMIT 1)
It looks scary, but it will actually be faster than what you are currently using, and the result will be three entries from the duration column.
PHP with SQL
You are still using the old mysql_* functions to access the database. This API is more than 10 years old and should not be used when writing new code. The old functions are no longer maintained (fixed and/or improved), and the community has even begun the process of deprecating them.
Instead you should be using either PDO or MySQLi. Which one to use depends on your personal preferences and what is actually available to you. I prefer PDO (because of named parameters and support for other RDBMS), but that's somewhat subjective choice.
Another issue with your PHP/MySQL code is that you seem to loop through items pointlessly. Your queries have LIMIT 1, which means that there will be only one row. There is no point in making a loop.
There is potential for an endless loop if the maximum value for duration is 1. At the start of the loop you will have $sum === 15, which fits the first while condition. And at the end of that loop you can have $sum === 18, which satisfies the second loop condition ... and then it is off to infinity and your SQL server chokes.
And if you are using fractions for duration, then the total value of the 3 new results needs to be even smaller: just over 2. It starts with 15.99 and ends with 18.01 (that's an additional 2.02 in duration, or less than 0.7 each). Again, an endless loop.
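One way to guard against the endless loop is to cap the number of attempts. This is a minimal sketch, not the code from the question: the function name drawUntilInRange is hypothetical, and the durations are drawn from in-memory arrays (one per rank) instead of from the database.

```php
<?php
// Draw one duration per rank until the total lands in [$min, $max],
// giving up after $maxAttempts so an unreachable target cannot loop forever.
function drawUntilInRange(array $byRank, float $min, float $max, int $maxAttempts = 1000): ?array
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $picked = [];
        foreach ($byRank as $durations) {
            $picked[] = $durations[array_rand($durations)]; // one random entry per rank
        }
        $sum = array_sum($picked);
        if ($sum >= $min && $sum <= $max) {
            return $picked;
        }
    }
    return null; // no valid combination found within the attempt budget
}

// Hypothetical data: every duration is 1, so a total of 16 to 18 is unreachable.
$impossible = drawUntilInRange([1 => [1], 2 => [1], 3 => [1]], 16, 18);

// Hypothetical data where every combination sums to 16, 17 or 18.
$possible = drawUntilInRange([1 => [5, 6], 2 => [5, 6], 3 => [6]], 16, 18);
```

The caller can then tell "no combination exists" (null) apart from a successful draw, instead of hanging the request.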
Suggestion
Here is how I would do it:
$pdo = new PDO('mysql:dbname=my_db;host=localhost', 'username', 'password');
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$sum = 0;
while ( $sum < 16 )
{
$query = 'that LARGE query above';
$statement = $pdo->prepare( $query );
if ( $statement->execute() )
{
$data = $statement->fetchAll( PDO::FETCH_ASSOC );
$sum += $data[0]['duration']+$data[1]['duration']+$data[2]['duration'];
}
}
echo $data[0]['duration'];
This should do what your code did, or at least what I assume your intention was.

Getting sales values by day in CakePHP and MySQL

I have a sales model with a related salesitems model; the sales model has some modifiers, e.g. discount.
To get sales totals, I have done this:
var $virtualFields = array(
'total' => '@vad:=(SELECT COALESCE(SUM(price*quantity), 0) FROM saleitems WHERE saleitems.sale_id = Sale.id)',
'paid' => '@pad:=(SELECT COALESCE(SUM(amount), 0) FROM payments WHERE payments.sale_id = Sale.id)',
'discountamount' => '@dis:=(SELECT COALESCE(SUM(price*quantity), 0) FROM saleitems WHERE saleitems.sale_id = Sale.id)*(0.01 * Sale.discount)',
'saleamount' => '@vad - @dis',
);
Which all seems to be working well. However, when I come to do some reporting, and try to get total sales amount per day, I have run up against the limit of brain power. Should I just tot them up in PHP, or run a query? Or is there a way to do this with Cake's ORM?
I tried the query method:
SELECT
created,
(@vad:=(SELECT COALESCE(SUM(price*quantity), 0) FROM saleitems WHERE `saleitems`.`sale_id` = `Sale`.`id`)) AS `Sale__total`,
(@pad:=(SELECT COALESCE(SUM(amount), 0) FROM payments WHERE `payments`.`sale_id` = `Sale`.`id`)) AS `Sale__paid`,
(@dis:=(SELECT COALESCE(SUM(price*quantity), 0) FROM saleitems WHERE `saleitems`.`sale_id` = `Sale`.`id`)*(0.01 * `Sale`.`discount`)) AS `Sale__discountamount`,
sum(@vad - @dis) AS `Sale__saleamount`
FROM `sales` AS `Sale` WHERE `Sale`.`account_id` = 37 GROUP BY DAY(`Sale`.`created`) order by created
But this is giving me completely incorrect answers.
You can run this query:
SELECT DATE(s.created) AS SaleDate, SUM((si.price * si.quantity) * (1 - (0.01 * s.discount))) AS SalesByDay
FROM sales s JOIN saleitems si ON s.id = si.sale_id
WHERE s.account_id = 37
GROUP BY DATE(s.created)
Notes:
The DAY function returns the day of the month, not the date.
I did not join the payments table since I do not see where you use the @pad variable.
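The difference between grouping by DAY() and by DATE() can be reproduced in a few lines of PHP. This is only an illustration with hypothetical sample sales and a hypothetical function name (totalsBy): the format 'j' yields the day of the month, like DAY(), while 'Y-m-d' yields the full date, like DATE().

```php
<?php
// Sum sale amounts, grouped by a key derived from the created timestamp.
function totalsBy(array $sales, string $format): array
{
    $totals = [];
    foreach ($sales as $sale) {
        $key = date($format, strtotime($sale['created']));
        $totals[$key] = ($totals[$key] ?? 0) + $sale['amount'];
    }
    return $totals;
}

// Hypothetical sales made on the 15th of two different months.
$sales = [
    ['created' => '2012-01-15 09:00:00', 'amount' => 10],
    ['created' => '2012-02-15 09:00:00', 'amount' => 20],
];

$byDayOfMonth = totalsBy($sales, 'j');     // like GROUP BY DAY(created): both sales collapse into "15"
$byDate       = totalsBy($sales, 'Y-m-d'); // like GROUP BY DATE(created): one total per calendar day
```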
