I am using Doctrine in my Symfony2 project.
I have a table Event and a table Photo. One event can have one or more photos, and each photo belongs to exactly one event.
Here is one of my DQL queries:
$dql = "
SELECT e, (e.views * 0.1) + (e.likes * 0.9) as ratingEvent
FROM WevseenMainBundle:Event e
INNER JOIN e.photos p
INNER JOIN e.firstPhoto fp
WHERE fp.date BETWEEN :dateA AND :dateB
AND p.lat BETWEEN :latA AND :latB
AND ( p.lng > :lngA AND p.lng < :lngB )
AND e.status = 'open'
GROUP BY e
HAVING COUNT(p.id) >= :minCountPhotos
ORDER BY ratingEvent DESC
";
$query = $em->createQuery($dql)
->setParameters($parameters)
->setFirstResult($firstEntry) // 0
->setMaxResults($numberOf); // 10
$paginatorEvents = new Paginator($query, true);
With something like 11500 events and 160 000 photos, the query is very slow (more than 10 seconds). The slowness comes from this part:
GROUP BY e
HAVING COUNT(p.id) >= :minCountPhotos
Without this, it's fast.
I checked the sf2 profiler and it says that :
SELECT count(DISTINCT e0_.id) AS sclr0 FROM Event e0_ INNER JOIN Photo p1_ ON e0_.id = p1_.event_id INNER JOIN Photo p2_ ON e0_.firstPhoto_id = p2_.id WHERE p2_.date BETWEEN ? AND ? AND p1_.lat BETWEEN ? AND ? AND (p1_.lng > ? AND p1_.lng < ?) AND e0_.status = 'open' GROUP BY e0_.id, e0_.name, e0_.description, e0_.nb_photos, e0_.views, e0_.viewsEventPhotos, e0_.votes, e0_.rating, e0_.likes, e0_.up, e0_.down, e0_.status, e0_.end, e0_.time, e0_.averageTimeEvent, e0_.averageTimePhotos, e0_.averageTimeEventAndPhotos, e0_.needInstagramUpdate, e0_.origin, e0_.featured, e0_.firstPhoto_id HAVING COUNT(p1_.id) >= ?
Parameters: [Object(DateTime), Object(DateTime), '-42.93442389074508', '73.48078267112892', '-180', '180', '2']
Time: 2029.33 ms
SELECT DISTINCT e0_.id AS id0, e0_.views * 1 + e0_.likes * 0 AS sclr1 FROM Event e0_
INNER JOIN Photo p1_ ON e0_.id = p1_.event_id INNER JOIN Photo p2_ ON e0_.firstPhoto_id = p2_.id WHERE p2_.date BETWEEN ? AND ? AND p1_.lat BETWEEN ? AND ? AND (p1_.lng > ? AND p1_.lng < ?) AND e0_.status = 'open' GROUP BY e0_.id, e0_.name, e0_.description, e0_.nb_photos, e0_.views, e0_.viewsEventPhotos, e0_.votes, e0_.rating, e0_.likes, e0_.up, e0_.down, e0_.status, e0_.end, e0_.time, e0_.averageTimeEvent, e0_.averageTimePhotos, e0_.averageTimeEventAndPhotos, e0_.needInstagramUpdate, e0_.origin, e0_.featured, e0_.firstPhoto_id HAVING COUNT(p1_.id) >= ? ORDER BY sclr1 DESC LIMIT 10 OFFSET 0
Time: 6179.01 ms
These are the two queries that take the time.
How can I improve this?
UPDATE, SOLUTION:
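I replaced
GROUP BY e
with
GROUP BY e.id
Grouping by the whole entity made Doctrine group by every column of Event, as the generated SQL above shows; grouping by the primary key alone avoids that.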
Your query is executed only when you iterate over the paginator.
You should copy the translated SQL and profile it outside of Doctrine. Chances are it will take the same 8 seconds or so to execute there, and chances are it will be much faster once you add some wisely chosen indexes to your tables.
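To experiment with this outside Doctrine, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL purely so that it is self-contained, and the tables, columns and index name are simplified, hypothetical stand-ins for Event and Photo. It groups by the primary key only and puts an index on the join column; which indexes actually help depends on your data and on what EXPLAIN reports on your own server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, views INTEGER, likes INTEGER, status TEXT);
    CREATE TABLE photo (id INTEGER PRIMARY KEY, event_id INTEGER, lat REAL);
    -- Hypothetical index on the join column (an assumption, not taken from the question).
    CREATE INDEX idx_photo_event ON photo (event_id);
    INSERT INTO event VALUES (1, 100, 10, 'open'), (2, 50, 40, 'open'), (3, 10, 1, 'closed');
    INSERT INTO photo VALUES (1, 1, 10.0), (2, 1, 11.0),
                             (3, 2, 12.0), (4, 2, 13.0), (5, 2, 14.0),
                             (6, 3, 15.0);
""")

def top_events(min_photos):
    """Open events with at least min_photos photos, best rating first."""
    sql = """
        SELECT e.id, (e.views * 0.1) + (e.likes * 0.9) AS rating
        FROM event e
        INNER JOIN photo p ON p.event_id = e.id
        WHERE e.status = 'open'
        GROUP BY e.id                 -- primary key only, not every column
        HAVING COUNT(p.id) >= ?
        ORDER BY rating DESC
    """
    return conn.execute(sql, (min_photos,)).fetchall()

print(top_events(2))
```

Timing the same statement with and without the index on a copy of the real data is the quickest way to see whether the index or the GROUP BY is the bottleneck.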
I would like to optimize my code. I'd like to have a single query that lets one aliased result have its own limit and also returns a result with no limit.
Currently I'm using two queries like this:
// ALL TIME //
$mikep = mysqli_query($link, "SELECT tasks.EID, reports.how_did_gig_go FROM tasks INNER JOIN reports ON tasks.EID=reports.eid WHERE `priority` IS NOT NULL AND `partners_name` IS NOT NULL AND mike IS NOT NULL GROUP BY EID ORDER BY tasks.show_date DESC;");
$num_rows_mikep = mysqli_num_rows($mikep);
$rating_sum_mikep = 0;
while ($row = mysqli_fetch_assoc($mikep)) {
$rating_mikep = $row['how_did_gig_go'];
$rating_sum_mikep += $rating_mikep;
}
$average_mikep = $rating_sum_mikep/$num_rows_mikep;
// AND NOW WITH A LIMIT 10 //
$mikep_limit = mysqli_query($link, "SELECT tasks.EID, reports.how_did_gig_go FROM tasks INNER JOIN reports ON tasks.EID=reports.eid WHERE `priority` IS NOT NULL AND `partners_name` IS NOT NULL AND mike IS NOT NULL GROUP BY EID ORDER BY tasks.show_date DESC LIMIT 10;");
$num_rows_mikep_limit = mysqli_num_rows($mikep_limit);
$rating_sum_mikep_limit = 0;
while ($row = mysqli_fetch_assoc($mikep_limit)) {
$rating_mikep_limit = $row['how_did_gig_go'];
$rating_sum_mikep_limit += $rating_mikep_limit;
}
$average_mikep_limit = $rating_sum_mikep_limit/$num_rows_mikep_limit;
This allows me to show an all-time average and also an average over the last 10 reviews. Is it really necessary for me to set up two queries?
Also, I understand I could get the sum in the query, but not all the values are numbers, so I've actually converted them in PHP, but left out that code in order to try and simplify what is displayed in the code.
All-time average and average over the last 10 reviews
In the best case, where your column how_did_gig_go is 100% numeric, a single query like this could work:
SELECT
AVG(how_did_gig_go) AS avg_how_did_gig_go
, SUM(CASE
WHEN rn <= 10 THEN how_did_gig_go
ELSE 0
END) / 10 AS latest10_avg
FROM (
SELECT
@num := @num + 1 AS rn
, tasks.show_date
, reports.how_did_gig_go
FROM tasks
INNER JOIN reports ON tasks.EID = reports.eid
CROSS JOIN ( SELECT @num := 0 AS n ) AS v
WHERE priority IS NOT NULL
AND partners_name IS NOT NULL
AND mike IS NOT NULL
ORDER BY tasks.show_date DESC
) AS d
However, unless all the "numbers" are in fact numeric, you are doomed to send every row back from the server for PHP to process, unless you can clean up the data in MySQL somehow.
You might avoid sending all that data twice if you establish a way for your PHP code to use only the top 10 from the whole list. There are probably ways of doing that in PHP.
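The idea of fetching the list once and reusing its first ten rows can be sketched as follows. The sketch is in Python rather than PHP, and the function name and its arguments are hypothetical; it assumes the ratings have already been converted to numbers and arrive newest first, as the ORDER BY tasks.show_date DESC in the question would deliver them.

```python
def averages(ratings_newest_first, latest=10):
    """Return (all-time average, average of the `latest` newest ratings).

    Both values come from one list, so the query only has to run once.
    """
    if not ratings_newest_first:
        return None, None
    recent = ratings_newest_first[:latest]  # the newest `latest` rows
    overall = sum(ratings_newest_first) / len(ratings_newest_first)
    return overall, sum(recent) / len(recent)

print(averages([5, 3, 1], latest=2))  # (3.0, 4.0)
```

Dividing by len(recent) rather than a fixed 10 keeps the second average correct when fewer than ten reviews exist.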
If you wanted assistance in SQL to do that, then maybe having 2 columns would help, it would reduce the number of table scans.
SELECT
EID
, how_did_gig_go
, CASE
WHEN rn <= 10 THEN how_did_gig_go
ELSE 0
END AS latest10_how_did_gig_go
FROM (
SELECT
@num := @num + 1 AS rn
, tasks.EID
, reports.how_did_gig_go
FROM tasks
INNER JOIN reports ON tasks.EID = reports.eid
CROSS JOIN ( SELECT @num := 0 AS n ) AS v
WHERE priority IS NOT NULL
AND partners_name IS NOT NULL
AND mike IS NOT NULL
ORDER BY tasks.show_date DESC
) AS d
In future (MySQL 8.x), ROW_NUMBER() OVER (ORDER BY tasks.show_date DESC) would be a better method than the "roll your own" row numbering (incrementing @num) shown above.
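A minimal sketch of the ROW_NUMBER() variant follows. It runs on SQLite through Python's sqlite3 module (an assumption made only for a self-contained example; window functions need SQLite 3.25 or later), and the single reviews table with made-up data stands in for the tasks/reports join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (show_date TEXT, rating INTEGER)")
# Twelve made-up reviews dated 2020-01-01 .. 2020-01-12; rating equals the day number.
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(f"2020-01-{day:02d}", day) for day in range(1, 13)],
)

sql = """
    SELECT AVG(rating) AS overall_avg,
           -- rows outside the newest 10 become NULL, which AVG ignores
           AVG(CASE WHEN rn <= 10 THEN rating END) AS latest10_avg
    FROM (
        SELECT rating,
               ROW_NUMBER() OVER (ORDER BY show_date DESC) AS rn
        FROM reviews
    ) AS d
"""
overall_avg, latest10_avg = conn.execute(sql).fetchone()
print(overall_avg, latest10_avg)
```

Using AVG over a CASE without ELSE, instead of SUM(...) / 10, avoids a wrong result when there are fewer than ten rows.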
Can anyone tell me how to make this query faster?
$session_id = '000000000015';
$start = 0;
$finish = 30;
try {
$stmt = $conn->prepare("SELECT TOPUSERS.ID, TOPUSERS.USERNAME, TOPUSERS.NAME, TOPUSERS.NAME2, TOPUSERS.PHOTO, TOPUSERS.FB_USERID, TOPUSERS.IMAGE_TYPE, TOPUSERS.TW_USERID, TOPUSERS.TW_PHOTO,
COALESCE((SELECT COUNT(USERS_BUCKETS.ID) FROM USERS_BUCKETS WHERE USERS_BUCKETS.USERID=TOPUSERS.ID),0) AS NUM_ALL,
COALESCE((SELECT SUM(CASE WHEN USERS_BUCKETS.STATUS='Completed' THEN 1 ELSE 0 END) FROM USERS_BUCKETS WHERE USERS_BUCKETS.USERID=TOPUSERS.ID),0) AS NUM_DONE,
COALESCE((SELECT COUNT(USERS_LIKES.ID) FROM USERS_LIKES WHERE USERS_LIKES.USERID=TOPUSERS.ID),0) AS NUM_LIKES,
(SELECT USERS_BUCKETS.BUCKETID FROM USERS_BUCKETS WHERE USERS_BUCKETS.USERID=TOPUSERS.ID ORDER BY USERS_BUCKETS.DATE_MODIFIED DESC LIMIT 1) AS RECENT_BUCKET,
(SELECT BUCKETS_NEW.BUCKET_NAME FROM BUCKETS_NEW WHERE BUCKETS_NEW.ID=RECENT_BUCKET) AS REC,
COALESCE((SELECT COUNT(ID) FROM FOLLOW WHERE FOLLOW.USER_ID=TOPUSERS.ID),0) AS FOLLOWING,
COALESCE((SELECT COUNT(ID) FROM FOLLOW WHERE FOLLOW.FOLLOW_ID=TOPUSERS.ID),0) AS FOLLOWERS,
(SELECT IF(TOPUSERS.NAME = '',0,1) + IF(TOPUSERS.BIO = '',0,1) + IF(TOPUSERS.LOCATION = '',0,1) + IF(TOPUSERS.BIRTHDAY = '0000-00-00',0,1) + IF(TOPUSERS.GENDER = '',0,1)) as COMPLETENESS,
CASE WHEN ? IN (SELECT USER_ID FROM FOLLOW WHERE FOLLOW_ID = TOPUSERS.ID) THEN 'Yes' ELSE 'No' END AS DO_I_FOLLOW_HIM
FROM TOPUSERS
LEFT JOIN FOLLOW ON TOPUSERS.ID = FOLLOW.FOLLOW_ID
LEFT JOIN USERS_BUCKETS ON USERS_BUCKETS.USERID=TOPUSERS.ID
LEFT JOIN BUCKETS_NEW ON BUCKETS_NEW.ID=USERS_BUCKETS.BUCKETID
WHERE NOT TOPUSERS.ID = ?
GROUP BY TOPUSERS.ID ORDER BY TOPUSERS.RANDOM, TOPUSERS.USERNAME LIMIT $start, $finish");
When I run this in a browser it takes about 7 seconds to load. Without a few lines (the COALESCE in the middle, the two SELECTS above and the line below them) the time is reduced to 3-4 seconds.
The result of the query is a list of people with names, profile picture and some data.
TL;DR: you need to rewrite the query.
You need to rewrite your query to make it more efficient. I had to rewrite a similar query at work last week, and here is what I did.
To be efficient, the structure of your query should look like this:
select ...
...
from ...
join ...
where ...
What you have now is something like:
select ...
inner select
inner select
from ...
join ...
where ...
It's the inner selects that kill your query. You need to find a way to move them into the FROM section, especially since you already join those tables there.
What you need to understand is that the inner selects run once for every record returned. With 10 records that would be all right, speed-wise, but with hundreds or thousands of records it becomes very slow.
If you want more information on how your query is executed, run it with the EXPLAIN keyword in front of it.
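As an illustration of moving the inner selects into the FROM section, the sketch below compares a correlated-subquery version with a version that joins one pre-aggregated derived table. It uses SQLite through Python's sqlite3 module and a cut-down, hypothetical schema (users, users_buckets), not the tables from the question, and it makes no claim about how much faster the second form would be on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE users_buckets (id INTEGER PRIMARY KEY, userid INTEGER, status TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO users_buckets (userid, status) VALUES
        (1, 'Completed'), (1, 'Open'), (1, 'Completed'), (2, 'Open');
""")

# Inner selects in the SELECT list: each one is evaluated per user row.
correlated = """
    SELECT u.id,
           (SELECT COUNT(*) FROM users_buckets b WHERE b.userid = u.id) AS num_all,
           COALESCE((SELECT SUM(CASE WHEN b.status = 'Completed' THEN 1 ELSE 0 END)
                     FROM users_buckets b WHERE b.userid = u.id), 0) AS num_done
    FROM users u
    ORDER BY u.id
"""

# Same numbers, but users_buckets is aggregated once and then joined.
joined = """
    SELECT u.id,
           COALESCE(s.num_all, 0)  AS num_all,
           COALESCE(s.num_done, 0) AS num_done
    FROM users u
    LEFT JOIN (
        SELECT userid,
               COUNT(*) AS num_all,
               SUM(CASE WHEN status = 'Completed' THEN 1 ELSE 0 END) AS num_done
        FROM users_buckets
        GROUP BY userid
    ) AS s ON s.userid = u.id
    ORDER BY u.id
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_joined = conn.execute(joined).fetchall()
print(rows_joined)
```

Aggregating in a derived table also sidesteps the row multiplication that joining several one-to-many tables directly and then grouping would cause.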
Given I have an instance of Event ($event) that has many AttendancePerson, I need to get all of the AttendancePerson objects belonging to $event where the AttendancePerson.person attended more than one event that has a calendar_id matching $event->calendar_id and where the AttendancePerson.event.dateTo ends in the previous year.
The schema minus irrelevant column names:
event_attendance_person
- id
- event_id
- person_id
event
- id
- calendar_id
- dateTo
person
- id
event_calendar
- id
The purpose is to find old members of any given event. Any event attendance person who attended an event sharing the same calendar more than once in the previous year is an "old member" of the event.
I read through many relevant questions. None of them helped. Thank you to anyone who can help on this.
Your requirement is to find the persons in event_attendance_person who attended more than one event in the past year on the same calendar as the given event. In plain MySQL you can join the tables and compute two counts per person id. The first is the number of distinct events, COUNT(DISTINCT e.id). The second is a conditional count for the given event id. Say I want the persons who attended the event with id 2228: using CASE inside COUNT, COUNT(CASE WHEN e.id = 2228 THEN 1 END) gives 1 for a person who attended that event and 0 for a person who missed it.
I use this conditional count because I am not filtering on the event id in the WHERE clause (that would discard the person's other events); the HAVING clause does that filtering instead. For the past year, a simple condition is enough: WHERE e.dateTo < DATE_FORMAT(NOW() ,'%Y-01-01 00:00:00')
SELECT p.*,COUNT(DISTINCT e.id) total_events,
COUNT(CASE WHEN e.id = 2228 THEN 1 END) count_event
FROM `event_attendance_person` p
JOIN `event_event` e ON(p.`eventId` = e.id )
JOIN `event_calendar` c ON(e.`calendar` =c.`id`)
WHERE e.`dateTo` < DATE_FORMAT(NOW() ,'%Y-01-01 00:00:00')
GROUP BY p.`personId`
HAVING count_event = 1 AND total_events > 1
ORDER BY total_events DESC
You can test this query on your MySQL server.
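To see the two counts at work on a small data set, here is a hedged sketch using SQLite through Python's sqlite3 module. The table and column names (event, attendance, calendar_id, date_to) and the sample rows are made up, and the calendar filter in the WHERE clause is an addition of this sketch that the query above does not contain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, calendar_id INTEGER, date_to TEXT);
    CREATE TABLE attendance (event_id INTEGER, person_id INTEGER);
    INSERT INTO event VALUES (1, 7, '2013-03-01'), (2, 7, '2013-06-01'),
                             (3, 7, '2013-09-01'), (4, 8, '2013-05-01');
    INSERT INTO attendance VALUES
        (1, 10), (2, 10), (3, 10),   -- person 10: three events on calendar 7
        (2, 11),                     -- person 11: only the target event
        (1, 12), (3, 12),            -- person 12: missed the target event
        (2, 13), (4, 13);            -- person 13: second event is on another calendar
""")

sql = """
    SELECT a.person_id,
           COUNT(DISTINCT e.id) AS total_events,
           SUM(CASE WHEN e.id = :target THEN 1 ELSE 0 END) AS count_event
    FROM attendance a
    JOIN event e ON e.id = a.event_id
    WHERE e.date_to < :cutoff
      -- assumption of this sketch: only count events on the target's calendar
      AND e.calendar_id = (SELECT calendar_id FROM event WHERE id = :target)
    GROUP BY a.person_id
    HAVING SUM(CASE WHEN e.id = :target THEN 1 ELSE 0 END) = 1
       AND COUNT(DISTINCT e.id) > 1
    ORDER BY total_events DESC
"""
old_members = conn.execute(sql, {"target": 2, "cutoff": "2014-01-01"}).fetchall()
print(old_members)
```

Only person 10 both attended the target event and has more than one qualifying event, so only that person is returned.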
Now for the Doctrine part: you can replicate the above query in DQL as
$DQL="SELECT p,COUNT(DISTINCT e.id) AS total_events,
COUNT(CASE WHEN e.id = 2228 THEN 1 END) AS count_event
FROM NamespaceYourBundle:EventAttendencePerson p
JOIN p.events e
JOIN e.calandar c
WHERE e.dateTo < :dateTo
GROUP BY p.personId
HAVING count_event = 1 AND total_events > 1
ORDER BY total_events DESC
";
For the above DQL I assume you have already mapped the relations among your entities. These are the mandatory relations that must exist for the query to work:
JOIN p.events e: p is the alias for the entity NamespaceYourBundle:EventAttendencePerson, so the EventAttendencePerson entity must point to your Event entity for the ON(p.eventId = e.id ) part to be achieved.
JOIN e.calandar c: the Event entity must point to your Calendar entity in order to achieve ON(e.calendar =c.id).
You can then run your DQL as below, using Doctrine's Paginator class:
use Doctrine\ORM\Tools\Pagination\Paginator;
$query = $DM->createQuery($DQL)
->setParameter('dateTo', date("Y-01-01 00:00:00"))
->setFirstResult(0)->setMaxResults(100);
$Persons = new Paginator($query, $fetchJoinCollection = true);
Assuming that (person_id,event_id) is unique in event_attendance_person.
1. Get all persons belonging to the given event.
2. For each person, get all their other events with the same calendar id and an end date in the wanted range.
3. Group by person id.
4. Keep only the persons with more than one other event.
In SQL (column names updated to match the example data):
select p.id
from event_event e
join event_attendance_person eap on eap.eventId = e.id
join person p on eap.personId = p.id
join event_attendance_person eap2 on eap2.personId = p.id
join event_event e2 on e2.id = eap2.eventId
where e.id = 2230
and e2.id <> 2230
and e2.calendar = e.calendar
and e2.dateTo between '2013-01-01' and '2014-12-31'
group by p.id
having count(e2.id) > 1
Using QueryBuilder
$qb->select('p')
->from('MyBundleNameSpace\Entity\Event', 'e')
->innerJoin('e.person','p')
->innerJoin('p.event','e2')
->where('IDENTITY(e) = :event_id')
->andWhere('IDENTITY(e2) != :event_id')
->andWhere('IDENTITY(e2.calendar) = IDENTITY(e.calendar)')
->andWhere('e2.dateTo BETWEEN :start AND :end')
->groupBy('p')
->having('count(e2.id) > 1')
->setParameter('event_id',$event->getId())
->setParameter('start','2012-01-1')
->setParameter('end','2013-12-31');
Why does event_attendance_person need to be its own object? Wouldn't a many-to-many relationship with a join table like event_person suffice?
In any case, assuming your Doctrine entities are set up correctly, you'd probably want to split this into two separate DQL queries. The first query gets the list of people who attended the event. You then pass the ids of those people into the second query, which filters on person_id IN (person_ids) AND event_id != event.id AND calendar.id = event.calendar.id AND event.dateTo > calculated_date.
The DQL for those two queries should be easy enough to write.
I am making a stats page about golf for the people I play with. I am trying to pull out of the database the number of times, across all our scorecards, that we scored birdies (one stroke under par). It does pull out the -1s per hole, but I noticed that if you had 2 birdies on a scorecard, it still only counts as 1 birdie instead of 2. I want it to keep counting, so if someone gets 9 birdies, those 9 are added to the total.
$query_p321 = "SELECT t1.*,COUNT(t1.player_id),t2.* FROM scorecards t1 LEFT JOIN courses t2 ON t1.course_id=t2.course_id
WHERE t1.hole1<t2.hole1_par AND t1.hole1>t2.hole1_par-2
OR t1.hole2<t2.hole2_par AND t1.hole2>t2.hole2_par-2
OR t1.hole3<t2.hole3_par AND t1.hole3>t2.hole3_par-2
OR t1.hole4<t2.hole4_par AND t1.hole4>t2.hole4_par-2
OR t1.hole5<t2.hole5_par AND t1.hole5>t2.hole5_par-2
OR t1.hole6<t2.hole6_par AND t1.hole6>t2.hole6_par-2
OR t1.hole7<t2.hole7_par AND t1.hole7>t2.hole7_par-2
OR t1.hole8<t2.hole8_par AND t1.hole8>t2.hole8_par-2
OR t1.hole9<t2.hole9_par AND t1.hole9>t2.hole9_par-2
OR t1.hole10<t2.hole10_par AND t1.hole10>t2.hole10_par-2
OR t1.hole11<t2.hole11_par AND t1.hole11>t2.hole11_par-2
OR t1.hole12<t2.hole12_par AND t1.hole12>t2.hole12_par-2
OR t1.hole13<t2.hole13_par AND t1.hole13>t2.hole13_par-2
OR t1.hole14<t2.hole14_par AND t1.hole14>t2.hole14_par-2
OR t1.hole15<t2.hole15_par AND t1.hole15>t2.hole15_par-2
OR t1.hole16<t2.hole16_par AND t1.hole16>t2.hole16_par-2
OR t1.hole17<t2.hole17_par AND t1.hole17>t2.hole17_par-2
OR t1.hole18<t2.hole18_par AND t1.hole18>t2.hole18_par-2
GROUP BY t1.player_id ORDER BY count(t1.player_id) DESC";
$result_p321 = mysql_query($query_p321);
$number = 1;
while ($row_p321 = mysql_fetch_array($result_p321)) {
$player_id2 = $row_p321["player_id"];
}
and so on..
You'll notice the "-2" in there. That is taking the par minus 2, as I don't want to record if the person is 2 strokes under. Just one stroke under. Any help is appreciated. Thank you.
Oh, also, GROUP BY needs to be used because I don't want to list a player name more than once. I just want it to count all the birdies. I guess my big problem is that it's not counting more than 1 per row. Thanks.
The problem is the where clause. You need to do the comparisons in the select clause in order to count them:
SELECT t1.*,
sum((t1.hole1 = t2.hole1_par - 1) +
(t1.hole2 = t2.hole2_par - 1) +
. . .
(t1.hole18 = t2.hole18_par - 1)
) as birdies
FROM scorecards t1 LEFT JOIN
courses t2 ON t1.course_id=t2.course_id
GROUP BY t1.player_id
ORDER BY birdies DESC
This uses the MySQL convention that true is 1 and false 0 to add the numbers up. An alternative formulation using standard SQL is:
sum((case when t1.hole1 = t2.hole1_par - 1 then 1 else 0 end) +
Try something like this:
SELECT t1.*, SUM( IF(t1.hole1 = t2.hole1_par-1,1,0) +
IF(t1.hole2 = t2.hole2_par-1,1,0) +
IF(t1.hole3 = t2.hole3_par-1,1,0) +
IF(t1.hole4 = t2.hole4_par-1,1,0) +
-- etc.
IF(t1.hole18 = t2.hole18_par-1,1,0) ) AS birdies
FROM scorecards t1
LEFT JOIN courses t2 ON t1.course_id=t2.course_id
GROUP BY t1.player_id
ORDER BY birdies DESC
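Both answers count per hole inside the SELECT list instead of filtering rows in the WHERE clause. The sketch below shows that on a hypothetical three-hole course, using SQLite through Python's sqlite3 module; the per-hole CASE expressions are generated in a loop only to avoid typing them out, and the sample scores are made up.

```python
import sqlite3

HOLES = range(1, 4)  # three holes keep the example short; a real card has 18

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY,
                          hole1_par INTEGER, hole2_par INTEGER, hole3_par INTEGER);
    CREATE TABLE scorecards (player_id INTEGER, course_id INTEGER,
                             hole1 INTEGER, hole2 INTEGER, hole3 INTEGER);
    INSERT INTO courses VALUES (1, 4, 3, 5);
    INSERT INTO scorecards VALUES
        (1, 1, 3, 2, 5),   -- two birdies on one card
        (1, 1, 4, 3, 4),   -- one birdie
        (2, 1, 2, 3, 5),   -- two under par on hole 1: not a birdie
        (2, 1, 3, 3, 5);   -- one birdie
""")

# One 0/1 term per hole; the terms are added up per row, then summed per player.
per_hole = " + ".join(
    f"(CASE WHEN s.hole{n} = c.hole{n}_par - 1 THEN 1 ELSE 0 END)" for n in HOLES
)
sql = f"""
    SELECT s.player_id, SUM({per_hole}) AS birdies
    FROM scorecards s
    JOIN courses c ON c.course_id = s.course_id
    GROUP BY s.player_id
    ORDER BY birdies DESC
"""
birdies = conn.execute(sql).fetchall()
print(birdies)
```

Player 1's first card contributes 2 to the total, which is exactly the case the WHERE-based query counted as 1.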
I have the script:
SELECT *, (pbct_hits + (COUNT(likes.rvw_usr_like) * 5) - (COUNT(unlikes.rvw_usr_like)) * 5) AS score
FROM tb_publications
LEFT JOIN tb_reviews_users likes ON likes.rvw_usr_fk_publication = pbct_id AND likes.rvw_usr_like IS TRUE
LEFT JOIN tb_reviews_users unlikes ON unlikes.rvw_usr_fk_publication = pbct_id AND unlikes.rvw_usr_like IS FALSE
GROUP BY pbct_id
ORDER BY score DESC;
I would rather not join the same table twice.
I believe the script above can be optimized, but I can't work out how.
Edit
The question is solved:
-- Final Script:
SELECT pbct.*
FROM tb_publications pbct
LEFT JOIN tb_reviews_users ON rvw_usr_fk_publication = pbct_id
GROUP BY pbct_id
ORDER BY
(
(pbct_hits * 1) +
((SUM(CASE WHEN rvw_usr_like IS TRUE THEN 1 ELSE 0 END)) * 5) -
((SUM(CASE WHEN rvw_usr_like IS FALSE THEN 1 ELSE 0 END)) * 5)
) DESC, pbct_record ASC;
It is based on the answer by @MikeSmithDev.
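A small runnable sketch of the single-join form follows, again using SQLite through Python's sqlite3 module with made-up rows. The like flag is stored as 1/0 and compared with = 1 and = 0 instead of IS TRUE / IS FALSE, which is an adaptation for SQLite and not part of the original script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_publications (pbct_id INTEGER PRIMARY KEY, pbct_hits INTEGER);
    CREATE TABLE tb_reviews_users (rvw_usr_fk_publication INTEGER, rvw_usr_like INTEGER);
    INSERT INTO tb_publications VALUES (1, 10), (2, 3), (3, 0);
    INSERT INTO tb_reviews_users VALUES
        (1, 1), (1, 0), (1, 0),   -- publication 1: one like, two dislikes
        (2, 1), (2, 1);           -- publication 2: two likes; publication 3 has no reviews
""")

sql = """
    SELECT p.pbct_id,
           p.pbct_hits
           + 5 * SUM(CASE WHEN r.rvw_usr_like = 1 THEN 1 ELSE 0 END)
           - 5 * SUM(CASE WHEN r.rvw_usr_like = 0 THEN 1 ELSE 0 END) AS score
    FROM tb_publications p
    LEFT JOIN tb_reviews_users r ON r.rvw_usr_fk_publication = p.pbct_id
    GROUP BY p.pbct_id
    ORDER BY score DESC
"""
scores = conn.execute(sql).fetchall()
print(scores)
```

Besides saving a join, this form counts each review once. With two joins to the same table, every like row is paired with every dislike row of the same publication, so both COUNT values are inflated whenever a publication has likes and dislikes.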
What about
SELECT pbct_id,
(pbct_hits +
((SUM(CASE WHEN rvw_usr_like IS TRUE THEN 1 ELSE 0 END)) * 5) -
((SUM(CASE WHEN rvw_usr_like IS FALSE THEN 1 ELSE 0 END)) * 5)) AS score
FROM tb_publications
LEFT JOIN tb_reviews_users likes ON likes.rvw_usr_fk_publication = pbct_id
GROUP BY pbct_id
That should work. Alternatively, do something simpler in SQL and do the math on the PHP side.
I would not do the math like that in the query. I would do:
SELECT *
FROM tb_publications
LEFT JOIN tb_reviews_users review_users ON review_users.rvw_usr_fk_publication = pbct_id
GROUP BY pbct_id
Then I would do the math manually in PHP:
$score = 0;
if($row['rvw_usr_like'])
$score += 5;
Also, depending on whether you insert likes or display score more often, you may want to consider storing an aggregate score in the publications table.