This is more of a theoretical query than anything else. I have a complex join (returning upwards of 1900 records in the main table, combined with all the sub-result tables in the join; query shown below), and the resulting web page takes 5-10 minutes on my local machine to build. I realize this could be down to many factors, but I am hoping to get some hints. Basically I load an array of names from two tables (one holds cross-references, so the array is used to sort the data on the names, with links and a field noting whether an entry is a cross-reference); then, if a name is not a cross-reference, I issue this join:
select
n.NameCode, n.AL_NameCode, n.Name, n.Name_HTML, n.Region, n.Local, n.Deceased,
n.ArmsLink, n.RollOfArms, n.Blazon, n.PreferredTitle, n.ShortBio,
n.HeadShotPhoto, n.HeadShotPhotographer, n.HeadShotContributor,
x.NameCode, x.NameAKA, x.AlternateName,
g.NameLink, g.`Group Name`,
p.NameLink, p.`Relationship Type`, p.`Related To Link`,
p2.Position_ID, p2.NameLink, p2.`Position Held`, p2.`Times Held`,
p2.`Date Started`, p2.`Date Ended`, p2.Hyperlink as pos_Hyperlink,
p2.`Screentip Text`,
a.`Name Link`, a.Description, a.EventDate, a.Hyperlink, a.`Screentip Text`,
a.ExternalLink
from who_names as n
left outer join who_crossref as x on n.NameCode=x.NameCode
left outer join who_groups as g on n.NameCode=g.NameLink
left outer join who_personal as p on n.NameCode=p.NameLink
left outer join who_positions as p2 on n.NameCode=p2.NameLink
left outer join who_arts as a on n.NameCode=a.`Name Link`
where n.NameCode = ?
order by n.Name desc, g.`Group Name`, p2.`Date Started`, a.EventDate;
In order to output the various parts of the data, I:
1) Start a table,
2) Output the name and some other info in the first row,
3) Then in order to process, say, the groups (sub-groups someone associates themselves with within the organization), I issue:
mysqli_data_seek( $result, 0 ); // to rewind to top of data so we're at first row
and see if there's anything to process for subgroups (not everyone has anything ...),
4) I repeat for personal relationships, and other sections, going back to the top of the data and looping back through if there's anything to process.
When done with that individual, I close off the table, and loop back in the array to the next name, and repeat ...
While this works, 5-10 minutes is way too long to load a web page.
I am pondering ideas to resolve this, but I am not sure which specific aspect of my code is at fault. Is it the seeks back to the top of the returned rowset? Is it the tables in the browser? Is it a combination of both (very possibly)? The program is too big to post here in its entirety. I am feeling rather flummoxed about how to resolve this, and I hope the details I've given are enough to give something to work with.
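For what it's worth, one common alternative to rewinding the result set once per section is to read it once and bucket each row by section. A minimal sketch (Python for brevity, with dict rows standing in for mysqli rows; the column names come from the query above, the sets are an assumed de-duplication strategy):

```python
# Read the joined result once and collect each section's values in one pass,
# instead of calling mysqli_data_seek($result, 0) before each section.
# Sets de-duplicate the repeats the join produces.

def bucket_rows(rows):
    groups, personal, positions, arts = set(), set(), set(), set()
    for row in rows:
        if row.get("Group Name"):
            groups.add(row["Group Name"])
        if row.get("Relationship Type"):
            personal.add((row["Relationship Type"], row["Related To Link"]))
        if row.get("Position Held"):
            positions.add((row["Position_ID"], row["Position Held"]))
        if row.get("Description"):
            arts.add((row["Description"], row["EventDate"]))
    return groups, personal, positions, arts
```

Each section is then rendered from its own bucket, and the result set is only traversed once per name.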
Based on comments and feedback below, in PHP Admin, I did the following:
explain select n.NameCode, n.AL_NameCode, n.Name, n.Name_HTML, n.Region, n.Local, n.Deceased,
n.ArmsLink, n.RollOfArms, n.Blazon, n.PreferredTitle, n.ShortBio, n.HeadShotPhoto,
n.HeadShotPhotographer, n.HeadShotContributor,
x.NameCode, x.NameAKA, x.AlternateName,
g.NameLink, g.`Group Name`,
p.NameLink, p.`Relationship Type`, p.`Related To Link`,
p2.Position_ID, p2.NameLink, p2.`Position Held`, p2.`Times Held`, p2.`Date Started`,
p2.`Date Ended`, p2.Hyperlink as pos_Hyperlink, p2.`Screentip Text`,
a.`Name Link`, a.Description, a.EventDate, a.Hyperlink, a.`Screentip Text`,
a.ExternalLink
from who_names as n
left outer join who_crossref as x on n.NameCode=x.NameCode
left outer join who_groups as g on n.NameCode=g.NameLink
left outer join who_personal as p on n.NameCode=p.NameLink
left outer join who_positions as p2 on n.NameCode=p2.NameLink
left outer join who_arts as a on n.NameCode=a.`Name Link`
where n.NameCode=638
order by n.Name desc, g.`Group Name`, p2.`Date Started`, a.EventDate
This returned:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE n const PRIMARY,ix1_names PRIMARY 4 const 1 Using temporary; Using filesort
1 SIMPLE x ref ix2_crossref ix2_crossref 4 const 1 NULL
1 SIMPLE g ref ix3_groups ix3_groups 4 const 3 NULL
1 SIMPLE p ref ix4_personal ix4_personal 4 const 1 NULL
1 SIMPLE p2 ref ix5_positions ix5_positions 4 const 13 NULL
1 SIMPLE a ref ix6_arts ix6_arts 4 const 28 NULL
Which appears to just be a list of the indexes, so it doesn't seem to be helping me.
Since you are using a SINGLE main table and the rest of the joins are all OUTER JOINs, there is one index above all that can make your query faster:
create index ix1_names on who_names (NameCode, Name);
Also, the Nested Loop Joins (NLJ) against the related tables will benefit from the following indexes. You may already have several of these, so check first; if you don't, create them:
create index ix2_crossref on who_crossref (NameCode);
create index ix3_groups on who_groups (NameLink);
create index ix4_personal on who_personal (NameLink);
create index ix5_positions on who_positions (NameLink);
create index ix6_arts on who_arts (`Name Link`);
But again, the first one is the one I consider most important.
You'll need to test for real to see if the performance improves with it/them.
If the query is still slow, please retrieve the execution plan, as @memo suggested, by using:
explain select ...
First, try removing the "order by" clause and see if that improves anything. Sometimes it can happen that the query itself is fast, but the re-ordering is slow, requiring temporary files.
Second, feed the query to an EXPLAIN statement (e.g. EXPLAIN SELECT whathaveyou FROM table...). Check out the output for bottlenecks, missing indexes etc. (https://dev.mysql.com/doc/refman/8.0/en/using-explain.html)
After a lot of work, I found a few issues I was able to resolve: I was opening some tables (thinking it made sense at the time) when they weren't necessary, just to get row counts; I dropped the big join and instead opened the sub-tables as needed; I cleaned up a few other places in the code; and I added a few more indexes on another set of tables that weren't in the original join. This reduced the time from 4 minutes to 45 seconds. While 45 seconds is still a long time to load a page, given that this page handles up to 1500 (sometimes more) primary records, pulls data from up to 10 different tables, and does heavy formatting (tables inside tables, etc.), 45 seconds is probably doable, with a note at the top of the page and a progress bar displayed while it loads. Thanks, all. The indexes did help, and the other explanations also helped a lot.
Related
Currently I'm developing a background cms for an online shop.
I split the tables as follow in my database:
-products
-productdetails (description...)
-productimages
-product variants (colors..)
-product cross selling
Now on the product edit page I need to fetch all data for a single product.
So my question is: how can I get those details more efficiently than by making 3-5 database calls?
Or would processing with PHP be less efficient than making those 3-5 calls?
At the moment the query looks like that:
SELECT
pr.id, pr.categorieid, pr.itemnumber, pr.barcode, pr.price, pr.weight, pr.gender, pr.manufracture, pr.fsk18, pr.condition, pc.id AS pcid, pc.productcrossid, pc.sort, pd.productname,
pd.productdesc, pd.additional, pd.linktitle, pd.metatitle, pd.metadesc, pd.urlkeywords, pi.id AS piid, pi.wichimage, pi.variantid, pi.image, pi.imagealt, pv.id AS pvid, pv.variant,
pv.variantvalue, pv.sku, pv.price AS pvprice, pv.weight AS pvweight, pv.stock, pv.special
FROM
products pr
LEFT JOIN
productcross as pc
ON pr.id = pc.productid
LEFT JOIN
productdetails as pd
ON pr.id = pd.productid
LEFT JOIN
productimage as pi
ON pr.id = pi.productid AND pd.lang = pi.lang
LEFT JOIN
productvariants as pv
ON pr.id = pv.productid
WHERE
pr.id = :id
ORDER BY pd.lang ASC
As a result I receive many rows, because with the left joins each value gets combined with the rows joined before it.
The problem is that cross selling, variants, and images each have a variable number of rows, so whether there are more variants or more images is unpredictable (otherwise I could at least group them, since each variant can get its own image, but there can also be more images than variants).
Products contributes 1 row; productdetails contributes one row per language used, most likely 3.
Edit: According to EXPLAIN and the indexes I set, the performance of this single query is very good.
Edit:
Following Paul Spiegel's suggestion, I tried using GROUP_CONCAT:
SELECT
pr.id, pr.categorieid, pr.itemnumber, pr.barcode, pr.price, pr.weight, pr.gender, pr.manufracture, pr.fsk18, pr.condition, pc.id AS pcid, pc.productcrossid, pc.sort, pd.productname,
pd.productdesc, pd.additional, pd.linktitle, pd.metatitle, pd.metadesc, pd.urlkeywords
FROM
products pr
LEFT JOIN
productdetails as pd
ON pr.id = pd.productid
LEFT JOIN (
SELECT
GROUP_CONCAT(productcrossid) AS pcproductcrossid, GROUP_CONCAT(sort) AS pcsort, GROUP_CONCAT(id) AS pcid, productid
FROM productcross
WHERE productid = :id
) pc
ON pr.id = pc.productid
WHERE
pr.id = :id
ORDER BY pd.lang ASC
As a result I receive many rows, because with the left joins each value gets combined with the rows joined before it.
That's not what LEFT means.
X JOIN Y ON ... delivers rows that show up in both X and Y.
X LEFT JOIN Y ON ... delivers all the rows of X even if there is no matching row (or rows) in Y.
You might get "many rows" because the relationship is 1:many. Think of Classes JOIN Students: with JOIN you get multiple rows per Class (one per student), except for any classes without students. With LEFT JOIN, you additionally get a row for any Class with no students.
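The Classes/Students point can be demonstrated in a few lines (sqlite3 here as a stand-in for MySQL; the tables and names are made up):

```python
# INNER JOIN drops classes with no students; LEFT JOIN keeps them with NULLs.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE students (id INTEGER PRIMARY KEY, class_id INTEGER, name TEXT);
    INSERT INTO classes VALUES (1, 'Math'), (2, 'Art');
    INSERT INTO students VALUES (1, 1, 'Ann'), (2, 1, 'Bob');  -- nobody in Art
""")

inner = con.execute(
    "SELECT c.name, s.name FROM classes c JOIN students s ON s.class_id = c.id"
).fetchall()
left = con.execute(
    "SELECT c.name, s.name FROM classes c LEFT JOIN students s ON s.class_id = c.id"
).fetchall()
# inner: 2 rows (Math twice); left: 3 rows (Art appears with a NULL student)
```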
Your query with products will be a huge explosion of rows. All products, expanded by multiple details by multiple images, etc. It will be a mess.
In the EXPLAIN, multiply the numbers in the "Rows" column -- that will be a crude metric of how big the result set will be.
Use one query to get the images; another to get the colors; etc. Use JOIN (or LEFT JOIN) only when needed.
GROUP_CONCAT() is handy sometimes. It might be useful to list the "colors". But for "images", you would then have to split it up so you can build multiple <img..> tags. That's rather easy to do, but it is extra work.
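For example, a sketch of that splitting step (the filenames are invented; GROUP_CONCAT's default separator is a comma):

```python
# A GROUP_CONCAT'd image column comes back as one comma-separated string per
# row; splitting it back into individual <img> tags is a one-liner.

def img_tags(concatenated, separator=","):
    if not concatenated:           # NULL from the LEFT JOIN -> no images
        return []
    return ['<img src="%s">' % f for f in concatenated.split(separator)]
```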
It is usually 'wrong' to have 1:1 mapping between tables. In such cases, why not have a single table?
Do not fear 3-5 queries. We are talking milliseconds. The rendering of the page is likely to take several times as long as the SELECTs. I often have several dozen queries to build a web page, yet I am satisfied with the performance. And, yes, I ascribe to the notion of putting all the info about one 'product' on the page at once (when practical). It's much better than having to click here to get the colors and click there to see the images, etc.
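A sketch of that several-simple-queries approach (sqlite3 standing in for MySQL/PDO; table and column names simplified from the question):

```python
# One tiny indexed lookup per child table, keyed by the product id, instead of
# one join that multiplies images x variants x details into a mess of rows.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, itemnumber TEXT);
    CREATE TABLE productimages (productid INTEGER, image TEXT);
    CREATE TABLE productvariants (productid INTEGER, variant TEXT);
    INSERT INTO products VALUES (1, 'A-100');
    INSERT INTO productimages VALUES (1, 'front.jpg'), (1, 'back.jpg');
    INSERT INTO productvariants VALUES (1, 'red');
""")

pid = 1
product = con.execute("SELECT * FROM products WHERE id = ?", (pid,)).fetchone()
images = con.execute(
    "SELECT image FROM productimages WHERE productid = ?", (pid,)).fetchall()
variants = con.execute(
    "SELECT variant FROM productvariants WHERE productid = ?", (pid,)).fetchall()
# Each result list has exactly the rows of its own table: no cross-product.
```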
Rather than running so many queries, you can use the concept known in Magento as flat tables.
The idea is that whatever data is needed on the front end is stored in a single flat table, in addition to being stored in its respective normalized tables.
When querying, you then pick the data from that flat table rather than querying multiple tables and increasing execution time.
For reference, please check out the link below; hope this helps.
http://excellencemagentoblog.com/blog/2015/06/10/magento-flat-tables/
I know the question is not about Magento, but you can build your own logic to achieve the same mechanism.
SELECT a.ts, b.barcodenumber, a.remarks, c.department
FROM documentlog a
INNER JOIN (select docid, max(logid) as logid from documentlog GROUP BY docid) d ON d.docid=a.docid AND d.logid=a.logid
INNER JOIN user c ON c.uid=a.user
INNER JOIN document b ON b.id=a.docid
WHERE c.department = 'PTO' AND b.end = 0
My problem is that when I execute this query it takes 2+ seconds, even though it returns only 9 rows. How can I speed up its execution?
Old screenshot of the EXPLAIN result
Updated screenshot of the EXPLAIN result (after adding an index on logid, docid)
Check out your EXPLAIN result. Notice that MySQL does not use any key when querying the documentlog table, i.e. the documentlog table has no usable index for this query. More than 2 million records are processed at that point, which is the most likely source of the slowness.
Add an index on the docid and logid fields of your documentlog table and check whether it improves the query's execution time.
Update!!
The updated EXPLAIN output shows that MySQL is doing a full table scan (type=ALL) to produce the output of the main outer query. Why? Because there are no indices defined on the attributes used in the WHERE clause (department and end).
In general, if you want to speed up queries, then one has to make sure that appropriate indices are defined for the attributes used in the queries' WHERE condition.
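The effect is easy to see even outside MySQL. For instance, with sqlite3 (whose EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, but the point is the same: the index turns a full scan into an index lookup):

```python
# Before an index on the filtered column the plan is a full table SCAN;
# after CREATE INDEX it becomes a SEARCH using the index.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE documentlog (logid INTEGER, docid INTEGER, remarks TEXT)")

before = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM documentlog WHERE docid = 5"
).fetchall()

con.execute("CREATE INDEX ix_doc ON documentlog (docid, logid)")

after = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM documentlog WHERE docid = 5"
).fetchall()
# 'before' reports a SCAN of documentlog; 'after' reports a SEARCH using ix_doc
```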
By the way, you can learn more about the meaning of MySQL's EXPLAIN result by reading its documentation.
I have a database where the results from a shooter game are stored. I put them to 3NF to allow extensions of the system. So it looks like this:
Player
-------------------
GameId integer
PlayerId integer
TeamId integer
Hits
-------------------
GameId integer
FromId integer
ToId integer
Hits integer
So basically for every game there is an ID, and every player and team has its own ID (with the names stored in other tables).
Now I want to calculate points for each player. I need the points for each game, but more importantly the total per player. The points are basically: 3 points for each hit on an opponent, -2 points for each hit on a team member, and -2 points for each hit taken.
The calculation of the number of team hits alone requires a JOIN across 3 tables, and I fear for performance in a production environment. (Each game has ~8 players, so for n games the player table holds 8n rows and the hits table up to (8-1)^2*n rows.)
And at the end I need to calculate the points per player for each game and sum those up per player, because the minimum points per game should be zero. Finally I need a rank for each player (player x has the 2nd-most total points, etc.).
I feel like I'm getting lost in overly complicated queries that will kill the database's performance at some point.
Could anyone judge the design and maybe give me some pointers where to look further? I thought about storing the TeamHits and Points per game in the player table (Points for summing over them, TeamHits for statistical purposes), but that would of course break normalization.
PS: I'm working with PHP 5 and MySQL. I also thought about fetching each game from the database, calculating the points in PHP (which I'm already doing when I display a game) and writing them back (ideally when the game is inserted into the DB, but also whenever the scoring parameters change).
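As a sanity check on the scoring rule, here is a minimal sketch of the per-game calculation in application code (Python for brevity; the data shapes are invented for illustration):

```python
# Scoring rule from the question: +3 per hit on an opponent, -2 per hit on a
# team mate, -2 per hit taken, floored at 0 per game.

def game_points(player, team_of, hits):
    """hits: list of (from_id, to_id, count) tuples for one game."""
    points = 0
    for frm, to, n in hits:
        if frm == player:
            # opponent hit earns +3 each, team hit costs -2 each
            points += (3 if team_of[to] != team_of[frm] else -2) * n
        elif to == player:
            points -= 2 * n  # every hit taken costs -2
    return max(points, 0)  # minimum points per game is zero
```

Per-player totals are then just a sum of `game_points` over that player's games, and the rank follows from sorting the totals.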
Edit: Idea to avoid subselects would be:
SELECT p.*, SUM(h.Hits) AS TeamHits, SUM(h2.Hits) as Hits
FROM player p
LEFT JOIN
(hits h
INNER JOIN player p2
ON h.GameId=p2.GameId AND h.ToId=p2.PlayerId
)
ON p.GameId=p2.GameId AND h.FromId=p.PlayerId AND p.TeamId=p2.TeamId
GROUP BY p.PlayerId, p.GameId
LEFT JOIN hits h2
ON h2.GameId=p.GameId AND h2.FromId=p.PlayerId
But of course this does not work. Is it even possible to combine grouping with joins like this, or will I have to use subqueries?
Best I have is:
SELECT p.PlayerId, SUM((-2-3)*IFNULL(th.TeamHits, 0) + (3)*IFNULL(h.Hits, 0) + (-2)*IFNULL(ht.HitsTaken, 0)) AS Points
FROM player p
LEFT JOIN
(SELECT p.GameId, p.PlayerId, SUM(h.Hits) AS TeamHits
FROM player p
INNER JOIN hits h
ON h.GameId=p.GameId AND p.PlayerId=h.FromId
INNER JOIN player p2
ON p.GameId=p2.GameId AND p2.PlayerId=h.ToId AND p.TeamId=p2.TeamId
GROUP BY p.PlayerId, p.GameId) th
ON p.GameId=th.GameId AND p.PlayerId=th.PlayerId
LEFT JOIN
(SELECT p.GameId, p.PlayerId, SUM(h.Hits) AS Hits
FROM player p
INNER JOIN hits h
ON h.GameId=p.GameId AND p.PlayerId=h.FromId
GROUP BY p.PlayerId, p.GameId) h
ON p.GameId=h.GameId AND p.PlayerId=h.PlayerId
LEFT JOIN
(SELECT p.GameId, p.PlayerId, SUM(h.Hits) AS HitsTaken
FROM player p
INNER JOIN hits h
ON h.GameId=p.GameId AND p.PlayerId=h.ToId
INNER JOIN player p2
ON p.GameId=p2.GameId AND p2.PlayerId=h.FromId AND p.TeamId!=p2.TeamId
GROUP BY p.PlayerId, p.GameId) ht
ON p.GameId=ht.GameId AND p.PlayerId=ht.PlayerId
GROUP BY p.PlayerId
Fiddle: http://sqlfiddle.com/#!9/dc0cb/4
Current problem: For a database with about 10,000 games calculating the points for all players takes about 18s. This is unusable, so I need to improve this...
Joins are not that expensive; subqueries are. As long as you can avoid subqueries, you're not hitting the database too hard.
Remember, a database is built for this stuff these days.
Just make sure you have the proper indexes on the right fields so it's optimised; for example TeamId, GameId, and PlayerId should be indexed.
Just run it in phpMyAdmin and see how many milliseconds it takes to execute. If it takes more than 50, it's a heavy query, but it's usually pretty hard to get there... I once wrote a very heavy query that joined 100,000+ rows out of different tables and views and still ran in 5 ms.
What number of requests per hour are we talking about? 200 players a day? 200,000 players a day? How often do the requests happen? 10 per second per player? Once a minute? How loaded is your database?
I suspect all of these numbers are low, so you shouldn't worry about this optimisation yet.
Get your game up and running, clean up the PHP code where real gains can be had, and stay clear of complex subqueries and views.
As long as your query only does joins and unions it's pretty darn fast. And if you must do a subquery, see if there is an alternative using a linking table that relates the results, so you can do a join instead.
I am using MySQL tables that have the following data:
users(ID, name, email, create_added) (about 10000 rows)
points(user_id, point) (about 15000 rows)
And my query:
SELECT u.*, SUM(p.point) point
FROM users u
LEFT JOIN points p ON p.user_id = u.ID
WHERE u.id > 0
GROUP BY u.id
ORDER BY point DESC
LIMIT 0, 10
I only need the top 10 users with the best point totals, but the query dies. How can I improve its performance?
Like @Grim said, you can use INNER JOIN instead of LEFT JOIN. However, if you truly want optimization, I would suggest adding an extra field to the users table with a precalculated point total. That solution would beat any query optimization with your current database design.
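A minimal sketch of the precalculated-total idea (sqlite3 standing in for MySQL; the `total_point` column name is invented for illustration):

```python
# Keep a running total on the users row, updated whenever a points row is
# inserted, so the top-10 query becomes a plain indexed ORDER BY with no
# join and no GROUP BY.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (ID INTEGER PRIMARY KEY, name TEXT,
                         total_point INTEGER DEFAULT 0);
    CREATE TABLE points (user_id INTEGER, point INTEGER);
    INSERT INTO users (ID, name) VALUES (1, 'a'), (2, 'b');
""")

def add_points(user_id, point):
    # one transaction: append the detail row, bump the denormalized total
    with con:
        con.execute("INSERT INTO points VALUES (?, ?)", (user_id, point))
        con.execute("UPDATE users SET total_point = total_point + ? WHERE ID = ?",
                    (point, user_id))

add_points(1, 5)
add_points(2, 9)
top = con.execute(
    "SELECT ID, total_point FROM users ORDER BY total_point DESC LIMIT 10"
).fetchall()
```

The trade-off is the usual denormalization one: writes do a little more work (and must stay transactional) so that the hot read path does almost none.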
Swapping the LEFT JOIN for an INNER JOIN would help a lot. Make sure points.point and points.user_id are indexed. I assume you can get rid of the WHERE clause, as u.id will always be more than 0 (although MySQL probably does this for you at the query optimisation stage).
It doesn't really matter that you are getting only 10 rows. MySQL has to sum up the points for every user before it can sort them (the "Using filesort" operation). The LIMIT is applied last.
A covering index ON points(user_id,point) is going to be the best bet for optimum performance. (I'm really just guessing, without any EXPLAIN output or table definitions.)
The id column in users is likely the primary key, or at least a unique index, so you likely already have an index with id as the leading column (the primary key clustered index if it's InnoDB).
I'd be tempted to test a query like this:
SELECT u.*
, s.total_points
FROM ( SELECT p.user_id
, SUM(p.point) AS total_points
FROM points p
WHERE p.user_id > 0
GROUP BY p.user_id
ORDER BY total_points DESC
LIMIT 10
) s
JOIN users u
ON u.id = s.user_id
ORDER BY s.total_points DESC
That does have the overhead of creating a derived table, but with a suitable index on points, with a leading column of user_id, and including the point column, it's likely that MySQL can optimize the group by using the index, and avoiding one "Using filesort" operation (for the GROUP BY).
There will likely be a "Using filesort" operation on that resultset, to get the rows ordered by total_points. Then get the first 10 rows from that.
With those 10 rows, we can join to the user table to get the corresponding rows.
BUT... there is one slight difference with this result: if any of the user_id values in the top 10 aren't in the users table, this query will return fewer than 10 rows. (I'd expect a foreign key to be defined so that can't happen, but I'm really just guessing without table definitions.)
An EXPLAIN would show the access plan being used by MySQL.
Ever thought about partitioning?
I'm currently working with a large database, and partitioning successfully improved query performance.
For example:
PARTITION BY RANGE (`ID`) (
PARTITION p1 VALUES LESS THAN (100) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (200) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (300) ENGINE = InnoDB,
... and so on..
)
This gives better speed when scanning the table: MySQL will scan only partition p1, which contains IDs 1 to 99, even if there are millions of rows in the table.
Check out this http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
I read rows from an MSSQL table via PHP's PDO.
Some rows are fetched twice: exactly the same rows, with exactly the same id values.
This happens to specific rows. Each time I run my import script, the issue occurs on the very same rows. For example, after fetching some 16,000 rows correctly, one row, the same one each time, is fetched twice.
The duplication occurs back to back: the row is fetched, and the next fetch() call returns the very same row.
When I run:
select * from MY_TABLE where id='the problematic id'
only one row is returned, not two
Any ideas what (the hell) can go on here?
Thank you very much guys
edit:
The query that is being run:
select o.accountid, c.contactid, o.opportunityid, o.createdate, o.modifydate, o.createuser, o.modifyuser, o.description, o.projclosedate, o.notes, o.accountmanagerid
from sysdba.opportunity o
left join sysdba.opportunity_contact oc on o.opportunityid = oc.opportunityid and oc.salesrole = 'speaker'
left join sysdba.contact c on c.contactid = oc.contactid
where o.status <> 'Inactive'
order by o.opportunityid asc;
I think you need to join your contact table to your opportunity table. It seems you might not have a 1:1 mapping between those tables the way the query is set up. See below:
--This should reference the "o" table but it doesn't.
left join sysdba.contact c on c.contactid = oc.contactid
If that's not the case, then you should really be joining through the opportunity_contact table instead (make it your FROM table).