Select additional rows depending on each row's value - php

I'm trying to build a comments and replies system for my website. I already have this working, but the way I'm doing it is probably not the best for performance.
I want to select 10 rows from a table that contains the comments, then I want to select 2 additional rows from another table that contains the replies for each of these comments. I do that with a loop in PHP that selects 2 replies from the other table for each comment. Something more or less like this:
$comments = $MySQL->fetchRows("SELECT id, text FROM comments LIMIT 10");
foreach ($comments as $i => $c) {
    // Array elements with quoted keys need {} inside a double-quoted string
    $comments[$i]["replies"] = $MySQL->fetchRows("SELECT id, text FROM replies WHERE comment_id = {$c['id']} LIMIT 2");
}
Like I said, I'm sure this isn't the most optimal way of doing it, since it requires multiple calls to the database. Is there a better way of doing this in a single query using MySQL?

I often have 40 well-tuned queries on a single web page. It is not bad.
On the other hand, JOINs, UNIONs, stored procedures, etc. can cut down the number of roundtrips to the server.
Notes:
A LIMIT without an ORDER BY does not make much sense.
The two queries you have can be combined using a JOIN and a "derived table".
SELECT c.text, r.text
FROM ( SELECT id, text FROM comments
       WHERE ...
       ORDER BY ...
       LIMIT 10 ) AS c
JOIN replies AS r
  ON r.comment_id = c.id -- comment_id, as in the loop in your question
That will find all the replies for some 10 "comments".
To have limits on both gets trickier.
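One way to put a limit on both sides is a window function. The following is only a sketch, not a tested MySQL solution: it assumes MySQL 8.0 or later (ROW_NUMBER() is not available before that), it assumes replies.comment_id is the join key as in the question's loop, and it runs the SQL through Python's sqlite3 module purely so the example is self-contained. The sample rows and the helper name fetch_comments_with_replies are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); ROW_NUMBER() needs
# MySQL 8.0+ or SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE replies  (id INTEGER PRIMARY KEY, comment_id INTEGER, text TEXT);
""")
# Hypothetical sample data: 12 comments; comment 1 has three replies,
# comment 2 has one, the others have none.
conn.executemany("INSERT INTO comments VALUES (?, ?)",
                 [(i, f"comment {i}") for i in range(1, 13)])
conn.executemany("INSERT INTO replies VALUES (?, ?, ?)",
                 [(1, 1, "a"), (2, 1, "b"), (3, 1, "c"), (4, 2, "d")])

QUERY = """
SELECT c.id, c.text, r.id, r.text
FROM ( SELECT id, text FROM comments ORDER BY id LIMIT 10 ) AS c
LEFT JOIN (
    -- Number the replies of each comment 1, 2, 3, ... in id order
    SELECT id, comment_id, text,
           ROW_NUMBER() OVER (PARTITION BY comment_id ORDER BY id) AS rn
    FROM replies
) AS r
  ON r.comment_id = c.id AND r.rn <= 2
ORDER BY c.id, r.id
"""

def fetch_comments_with_replies(conn):
    """Run the single query and regroup the flat rows per comment."""
    comments = {}
    for cid, ctext, rid, rtext in conn.execute(QUERY):
        entry = comments.setdefault(cid, {"text": ctext, "replies": []})
        if rid is not None:  # LEFT JOIN yields NULLs for comments without replies
            entry["replies"].append({"id": rid, "text": rtext})
    return comments

result = fetch_comments_with_replies(conn)
```

The LEFT JOIN keeps comments that have no replies at all; with a plain JOIN they would disappear from the result.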

Related

Displaying a large amount of data in paging table without heavily impacting DB

The current implementation is a single complex query with multiple joins and temporary tables, but it is putting too much stress on my MySQL server and is taking 30+ seconds to load the table. The data is retrieved by PHP via a JavaScript Ajax call and displayed on a webpage. Here are the tables involved:
Table: table_companies
Columns: company_id, ...
Table: table_manufacture_line
Columns: line_id, line_name, ...
Table: table_product_stereo
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, ...
Table: table_product_television
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, warranty_expiry, ...
A single company can have 100k+ items split between the two product tables. The product tables are unioned and filtered by the line_name, then ordered by assembly_datetime and limited depending on the paging. The datetime value is also reliant on timezone and this is applied as part of the query (another JOIN + temp table). line_name is also one of the returned columns.
I was thinking of splitting the line_name filter out from the product union query. Essentially I'd determine the ids of the lines that correspond to the filter, then do a UNION query with a WHERE condition WHERE line_id IN (<results from previous query>). This would cut out the need for joins and temp tables, and I can apply the line_name to line_id and timezone modification in PHP, but I'm not sure this is the best way to go about things.
I have also looked at potentially using Redis, but the large number of individual products is leading to a similarly long wait time when pushing all of the data to Redis via PHP (20-30 seconds), even if it is just pulled in directly from the product tables.
Is it possible to tweak the existing queries to increase the efficiency?
Can I push some of the handling to PHP to decrease the load on the SQL server? What about Redis?
Is there a way to architect the tables better?
What other solution(s) would you suggest?
I appreciate any input you can provide.
Edit:
Existing query:
SELECT line_name,CONVERT_TZ(datetime,'UTC',timezone) datetime,... FROM (SELECT line_name,datetime,... FROM ((SELECT line_id,assembly_datetime datetime,... FROM table_product_stereos WHERE company_id=# ) UNION (SELECT line_id,assembly_datetime datetime,... FROM table_product_televisions WHERE company_id=# )) AS union_products INNER JOIN table_manufacture_line USING (line_id)) AS products INNER JOIN (SELECT timezone FROM table_companies WHERE company_id=# ) AS tz ORDER BY datetime DESC LIMIT 0,100
Here it is formatted for some readability.
SELECT line_name,CONVERT_TZ(datetime,'UTC',tz.timezone) datetime,...
FROM (SELECT line_name,datetime,...
      FROM (SELECT line_id,assembly_datetime datetime,...
            FROM table_product_stereos WHERE company_id=#
            UNION
            SELECT line_id,assembly_datetime datetime,...
            FROM table_product_televisions
            WHERE company_id=#
           ) AS union_products
      INNER JOIN table_manufacture_line USING (line_id)
     ) AS products
INNER JOIN (SELECT timezone
            FROM table_companies
            WHERE company_id=#
           ) AS tz
ORDER BY datetime DESC LIMIT 0,100
IDs are indexed; Primary keys are the first key for each column.
Let's build this query up from its component parts to see what we can optimize.
Observation: you're fetching the 100 most recent rows from the union of two large product tables.
So, let's start by trying to optimize the subqueries fetching stuff from the product tables. Here is one of them.
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
But look, you only need the 100 newest entries here. So, let's add
ORDER BY assembly_datetime DESC
LIMIT 100
to this query. Also, you should put a compound index on this table as follows. This will allow both the WHERE and ORDER BY lookups to be satisfied by the index.
CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime)
All the same considerations apply to the query from table_product_televisions. Order it by the time, limit it to 100, and index it.
If you need to apply other selection criteria, you can put them in these inner queries. For example, in a comment you mentioned a selection based on a substring search. You could do this as follows
SELECT t.line_id,t.assembly_datetime datetime,...
FROM table_product_stereos AS t
JOIN table_manufacture_line AS m ON m.line_id = t.line_id
AND m.line_name LIKE '%test'
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
Next, you are using UNION to combine those two query result sets into one. UNION has the function of eliminating duplicates, which is time-consuming. (You know you don't have duplicates, but MySQL doesn't.) Use UNION ALL instead.
Putting this all together, the innermost subquery becomes this. We have to wrap up the subqueries because SQL is confused by UNION and ORDER BY clauses at the same query level.
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS st
UNION ALL
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS tv
That gets you 200 rows. It should get those rows fairly quickly.
200 rows are guaranteed to be enough to give you the 100 most recent items later on after you do your outer ORDER BY ... LIMIT operation. But that operation only has to crunch 200 rows, not 100K+, so it will be far faster.
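To check the "200 rows are enough" claim on toy data, here is a small sketch that runs both shapes of the query and compares the results. It is an illustration under assumptions: SQLite (via Python's sqlite3) stands in for MySQL, the tables are cut down to the columns that matter, and the generated rows and the stamp() helper are invented for the test.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_product_stereos (
        product_id INTEGER PRIMARY KEY, company_id INTEGER, assembly_datetime TEXT);
    CREATE TABLE table_product_televisions (
        product_id INTEGER PRIMARY KEY, company_id INTEGER, assembly_datetime TEXT);
    -- Compound indexes so WHERE + ORDER BY can both use the index
    CREATE INDEX st_id_date ON table_product_stereos (company_id, assembly_datetime);
    CREATE INDEX tv_id_date ON table_product_televisions (company_id, assembly_datetime);
""")

def stamp(minutes):
    """Hypothetical helper: a distinct, sortable timestamp string."""
    return (datetime(2020, 1, 1) + timedelta(minutes=minutes)).strftime("%Y-%m-%d %H:%M:%S")

# Company 1 owns most rows; company 2 owns newer rows that must be filtered out.
conn.executemany(
    "INSERT INTO table_product_stereos (company_id, assembly_datetime) VALUES (?, ?)",
    [(1, stamp(m)) for m in range(0, 1000, 2)] + [(2, stamp(5000))])
conn.executemany(
    "INSERT INTO table_product_televisions (company_id, assembly_datetime) VALUES (?, ?)",
    [(1, stamp(m)) for m in range(1, 1000, 6)] + [(2, stamp(5001))])

# Shape 1: union everything, then sort and limit.
NAIVE = """
SELECT assembly_datetime FROM (
    SELECT assembly_datetime FROM table_product_stereos WHERE company_id = ?
    UNION ALL
    SELECT assembly_datetime FROM table_product_televisions WHERE company_id = ?
) AS u
ORDER BY assembly_datetime DESC LIMIT 100
"""

# Shape 2: take the newest 100 from each table first, then sort 200 rows.
LIMITED = """
SELECT assembly_datetime FROM (
    SELECT * FROM (
        SELECT assembly_datetime FROM table_product_stereos
        WHERE company_id = ? ORDER BY assembly_datetime DESC LIMIT 100
    ) AS st
    UNION ALL
    SELECT * FROM (
        SELECT assembly_datetime FROM table_product_televisions
        WHERE company_id = ? ORDER BY assembly_datetime DESC LIMIT 100
    ) AS tv
) AS u
ORDER BY assembly_datetime DESC LIMIT 100
"""

naive = [row[0] for row in conn.execute(NAIVE, (1, 1))]
limited = [row[0] for row in conn.execute(LIMITED, (1, 1))]
```

Note that this only holds for the first page. For page k of size 100 (LIMIT offset,100) each inner query would need LIMIT offset+100, since all of the first offset+100 rows could come from a single table.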
Finally wrap up this query in your outer query material. Join the table_manufacture_line information, and fix up the timezone.
If you do the indexing and the ORDER BY ... LIMIT operation earlier, this query should become very fast.
The comment dialog in your question indicates to me that you may have multiple product types, not just two, and that you have complex selection criteria for your paged display. Using UNION ALL on large numbers of rows slams performance: it converts multiple indexed tables into an internal list of rows that simply can't be searched efficiently.
You really should consider putting your two kinds of product data in a single table instead of having to UNION ALL multiple product tables. The setup you have now is inflexible and won't scale up easily. If you structure your schema with a master product table and perhaps some attribute tables for product-specific information, you will find yourself much happier two years from now. Seriously. Please consider making the change.
Remember: index fast, data slow. Prefer joins over nested queries. Nested queries return all of the data fields, whereas joins just consider the filters (which should all be indexed: make sure there's an index on table_product_*.line_id; it cannot be a unique one, since many products share a line). It's been a while, but I'm pretty sure you can join "ON company_id=#", which should cut down the results early on.
In this case, all of the results refer to the same company (or a much smaller subset) so it makes sense to run that query separately (and it makes the query more maintainable).
So your data source would be:
(table_product_stereos as prod
INNER JOIN table_manufacture_line AS ml ON prod.line_id = ml.line_id and prod.company_id=#
UNION
table_product_televisions as prod
INNER JOIN table_manufacture_line as ml on prod.line_id = ml.line_id and prod.company_id=#)
From which you can select prod.* or ml.* fields as required.
PHP is not a solution at all...
Redis can be a solution.
But the main thing I would change is the index creation for the tables (add the missing indexes)... If you're running into temp tables, you didn't create the indexes well for the tables. And 100k rows is not much at all.
But I can't help you without the table creation statements as well as the queries you run.
Make sure your "where part" is part of your btree index from left to right.

How to optimize a SQL query using multiple tables

I have this SQL query here that grabs the 5 latest news posts. I want to make it so it also grabs the total likes and total news comments in the same query. But the query I made seems to be a little slow when working with large amounts of data so I am trying to see if I can find a better solution. Here it is below:
SELECT *,
`id` as `newscode`,
(SELECT COUNT(*) FROM `likes` WHERE `type`="newspost" AND `code`=`newscode`) as `total_likes`,
(SELECT COUNT(*) FROM `news_comments` WHERE `post_id`=`newscode`) as `total_comments`
FROM `news` ORDER BY `id` DESC LIMIT 5
Here is a SQLFiddle as well: http://sqlfiddle.com/#!2/d3ecbf/1
I would recommend adding total_likes and total_comments fields to the news table, which get incremented/decremented whenever a like and/or comment is added or removed.
Your likes and news_comments tables should be used for historical purposes only.
This strenuous counting should not be performed every time a page is loaded because that is a complete waste of resources.
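A rough sketch of how the counter columns could be kept in step with the likes table. Everything here is an assumption made for illustration: the add_like/remove_like helper names, the cut-down schema, and the use of SQLite through Python's sqlite3 in place of MySQL and PHP. The point is only that the insert/delete and the counter update happen in the same transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news (
        id INTEGER PRIMARY KEY,
        title TEXT,
        total_likes    INTEGER NOT NULL DEFAULT 0,
        total_comments INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, type TEXT, code INTEGER);
""")
conn.executemany("INSERT INTO news (id, title) VALUES (?, ?)",
                 [(1, "first"), (2, "second"), (3, "third")])
conn.commit()

def add_like(conn, news_id):
    """Hypothetical helper: record the like and bump the counter atomically."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO likes (type, code) VALUES ('newspost', ?)", (news_id,))
        conn.execute("UPDATE news SET total_likes = total_likes + 1 WHERE id = ?", (news_id,))

def remove_like(conn, like_id):
    """Hypothetical helper: delete the like and decrement the counter atomically."""
    with conn:
        row = conn.execute(
            "SELECT code FROM likes WHERE id = ? AND type = 'newspost'", (like_id,)).fetchone()
        if row is None:
            return  # nothing to remove, leave the counter alone
        conn.execute("DELETE FROM likes WHERE id = ?", (like_id,))
        conn.execute("UPDATE news SET total_likes = total_likes - 1 WHERE id = ?", (row[0],))

add_like(conn, 3)   # like id 1
add_like(conn, 3)   # like id 2
add_like(conn, 1)   # like id 3
remove_like(conn, 2)

# The page query no longer counts anything; it just reads the columns.
latest = conn.execute(
    "SELECT id, total_likes, total_comments FROM news ORDER BY id DESC LIMIT 5").fetchall()
```

The trade-off is that every write path has to go through helpers like these (or through triggers), otherwise the counters drift from the real row counts.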
You could rewrite this using joins. MySQL has known issues with subqueries, especially when dealing with large data sets:
SELECT n.*,
`id` as `newscode`,
COALESCE(l.TotalLikes, 0) AS `total_likes`,
COALESCE(c.TotalComments, 0) AS `total_comments`
FROM `news` n
LEFT JOIN
( SELECT Code, COUNT(*) AS TotalLikes
FROM `likes`
WHERE `type` = "newspost"
GROUP BY Code
) AS l
ON l.`code` = n.`id`
LEFT JOIN
( SELECT post_id, COUNT(*) AS TotalComments
FROM `news_comments`
GROUP BY post_id
) AS c
ON c.`post_id` = n.`id`
ORDER BY n.`id` DESC LIMIT 5;
The reason is that when you use a join as above, MySQL will materialise the results of the subquery when it is first needed, e.g. at the start of this query, MySQL will put the results of:
SELECT post_id, COUNT(*) AS TotalComments
FROM `news_comments`
GROUP BY post_id
into an in-memory table and hash post_id for faster lookups. Then, for each row in news, it only has to look up TotalComments from this hashed table. When you use a correlated subquery, it will execute the query once for each row in news, which results in a large number of executions when news is large. If the initial result set is small you may not see a performance benefit, and it may even be worse.
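If you want to convince yourself that the two forms return the same thing before comparing their speed, a small harness like this can help. It is a sketch under assumptions: SQLite through Python's sqlite3 stands in for MySQL (so it says nothing about MySQL's actual execution plan), the schema is reduced to the columns used, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, type TEXT, code INTEGER);
    CREATE TABLE news_comments (id INTEGER PRIMARY KEY, post_id INTEGER);
""")
conn.executemany("INSERT INTO news (id, title) VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 9)])
conn.executemany("INSERT INTO likes (type, code) VALUES (?, ?)",
                 [("newspost", 8), ("newspost", 8), ("comment", 8),
                  ("newspost", 6), ("newspost", 1)])
conn.executemany("INSERT INTO news_comments (post_id) VALUES (?)",
                 [(8,), (7,), (7,), (7,), (2,)])

# Form 1: correlated subqueries, evaluated per row of news.
CORRELATED = """
SELECT n.id,
       (SELECT COUNT(*) FROM likes l
         WHERE l.type = 'newspost' AND l.code = n.id) AS total_likes,
       (SELECT COUNT(*) FROM news_comments c
         WHERE c.post_id = n.id) AS total_comments
FROM news n
ORDER BY n.id DESC LIMIT 5
"""

# Form 2: aggregate once per table, then LEFT JOIN the totals.
JOINED = """
SELECT n.id,
       COALESCE(l.TotalLikes, 0)    AS total_likes,
       COALESCE(c.TotalComments, 0) AS total_comments
FROM news n
LEFT JOIN ( SELECT code, COUNT(*) AS TotalLikes
            FROM likes WHERE type = 'newspost' GROUP BY code ) AS l
       ON l.code = n.id
LEFT JOIN ( SELECT post_id, COUNT(*) AS TotalComments
            FROM news_comments GROUP BY post_id ) AS c
       ON c.post_id = n.id
ORDER BY n.id DESC LIMIT 5
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
joined_rows = conn.execute(JOINED).fetchall()
```

With both forms returning identical rows, you can run EXPLAIN on each against your real data to see which one your server handles better.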
Examples on SQL Fiddle
Finally, you may want to index the relevant fields in news_comments and likes. For this particular query I think the following indexes will help:
CREATE INDEX IX_Likes_Code_Type ON Likes (Code, Type);
CREATE INDEX IX_newcomments_post_id ON news_comments (post_id);
Although you may need to split the first index into two:
CREATE INDEX IX_Likes_Code ON Likes (Code);
CREATE INDEX IX_Likes_Type ON Likes (Type);
First check for helping indexes on columns id, post_id and type,code.
I assume this is T-SQL, as that is what I am most familiar with.
First I would check indexes. If those look good, then I'd check the statement. Take a look at your query plan to see how it's populating your result.
SQL works backward, so it starts with your last AND statement and goes from there. It'll group them all by code, and then type, and finally give you a count.
Right now, you're grabbing everything with certain codes, regardless of date. When you stated that you want the latest, I assume there is a date column somewhere.
In order to speed things up, add another AND to your WHERE and account for the date. Either last 24 hours, last week, whatever.

mysql join not working

I have two tables: "users" and "posts." The posts table has a 'post' column and a 'poster_id' column. I'm working on a PHP page that shows the latest posts by everyone, like this:
SELECT * FROM posts WHERE id < '$whatever' LIMIT 10
This way, I can print each result like this:
id: 43, poster_id:'4', post: hello, world
id: 44, poster_id:'4', post: hello, ward
id: 45, poster_id:'5', post: oh hi!
etc...
Instead of the id, I would like to display the NAME of the poster (there's a column for it in the 'users' table)
I've tried the following:
SELECT *
FROM posts
WHERE id < '$whatever'
INNER JOIN users
ON posts.poster_id = users.id LIMIT 10
Is this the correct type of join for this task? Before learning about joins, I would query the users table for each post result. The result should end up looking similar to this:
id: 43, poster_id:'4', name:'foo', post: hello, world
id: 44, poster_id:'4', name:'foo', post: hello, ward
id: 45, poster_id:'5', name:'fee', post: oh hi!
etc...
Thanks for helping in advance.
The WHERE clause must come after the FROM clause and all of its JOINs.
SELECT posts.*, users.* -- select your desired columns
FROM posts
INNER JOIN users ON posts.poster_id = users.id
WHERE posts.id < '$whatever' -- qualified, because both tables have an id column
LIMIT 10
The SQL order of operations is as follows:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
UPDATE 1
For column names that exist in both tables, add an ALIAS to each so it can be uniquely identified. For example,
SELECT posts.colName as PostCol,
       users.colName as UserCol, ....
FROM ....
In the example above, both tables have a column named colName. In order to get them both, you need to alias them, so that in your front end you can use PostCol and UserCol to get their values.
Try:
SELECT *
FROM posts
INNER JOIN users ON posts.poster_id = users.id
WHERE posts.id < '$whatever'
LIMIT 10
You got the syntax a little incorrect.
It should be
SELECT * FROM posts
INNER JOIN users ON posts.poster_id = users.id
WHERE posts.id < '$whatever' LIMIT 10
The answers already given tell you the main reason for your query not working at all (ie the WHERE clause should come after the JOIN clauses), however, I'd like to make a couple of additional points:
I would suggest using an OUTER JOIN for this. It probably won't make much difference, but in the event of a post record having an invalid poster_id, an INNER JOIN will mean the record is dropped from the results, whereas an OUTER JOIN will mean that the record is included, but the values from the users table will be null. I imagine you don't want to ever have an invalid poster_id on the posts table, but broken data does happen even in the best regulated system, and it is helpful in these cases to still get the data from the query.
I would strongly suggest not doing SELECT *, and instead itemising the fields you want to get back from the query. SELECT * has a number of problems, but it's particularly bad when you have multiple tables in the query, because if you have fields with the same name on both tables, (eg id), then it becomes very hard to distinguish which one you're working with, as your PHP recordset won't include the table reference. Itemising the fields may make your query string longer, but it won't make it any slower - if anything it'll be quicker - and it will be easier to work with in the long run.
Neither of these points are essential; the query will work without them (as long as you switch the WHERE clause to after the JOIN), but they may improve your query and hopefully also improve your understanding of SQL.
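Both suggestions together might look like the sketch below. Treat it as an illustration rather than a drop-in: it runs against SQLite via Python's sqlite3 instead of MySQL/PHP, the orphaned post 46 and the helper name latest_posts are invented, and it binds the id as a parameter rather than interpolating '$whatever' into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be read by column name
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, poster_id INTEGER, post TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(4, "foo"), (5, "fee")])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(43, 4, "hello, world"), (44, 4, "hello, ward"),
                  (45, 5, "oh hi!"),
                  (46, 99, "orphaned post")])  # hypothetical row with an invalid poster_id

QUERY = """
SELECT posts.id        AS post_id,
       posts.poster_id AS poster_id,
       users.name      AS poster_name,   -- NULL when the poster does not exist
       posts.post      AS post
FROM posts
LEFT OUTER JOIN users ON posts.poster_id = users.id
WHERE posts.id < ?
ORDER BY posts.id
LIMIT 10
"""

def latest_posts(conn, before_id):
    """Hypothetical helper: posts with an id below before_id, with poster names."""
    return [dict(row) for row in conn.execute(QUERY, (before_id,))]

rows = latest_posts(conn, 100)
```

Because every selected column has its own alias, there is no ambiguity between posts.id and users.id in the result, and the orphaned post still shows up with poster_name set to NULL (None in Python).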

Mysql query : joined or separate

I have two tables:
One is called data and there is only one (unique) row per ID.
Second is called images and there are 3 rows per ID.
Every time the page loads I would like to fetch data and one image for exactly 5 different IDs.
My question now is: Two separate SELECT queries or one query where both are joined.
Queries:
...
$all = $row["iD"]; // includes the last 5 iDs, fetched from DB
$all = implode(',', $all);
SELECT Name, Address FROM data WHERE iD IN($all);
SELECT url FROM images WHERE iD IN ($all) LIMIT 1;
I already have 3 other select queries on the page, so I would like to know what is best regarding performance: one bigger joined query, or two small, faster queries.
If join, how would these two be joined?
You have three images per ID and want one image per ID for the most recently inserted IDs (aka "recent content")?
Then you could use one simple join combined with GROUP BY like this:
SELECT d.Name, d.Address, MAX(i.url)
FROM data d
JOIN images i ON i.iD = d.iD
GROUP BY d.iD, d.Name, d.Address
ORDER BY d.iD DESC
LIMIT 5
Most of the time it is better to combine selects to skip the programmatic overhead (calling mysql_query() in a loop, for example).
But sometimes it depends on the underlying data.
Since your queries go to completely separate tables, I recommend you stay with 2 separate queries: this keeps the result sets smaller and makes it more likely that at least one stays in the query cache.
Concerning your 2nd query: do you understand that this is not guaranteed to fetch a specific URL, but just any one? Mostly it will be the first one by key, but that is not guaranteed.
For an answer on performance issues, see JOIN queries vs multiple queries. What I can understand from there is that performance issues vary depending on the specific situation, so you should test both.
For the join, you could do:
SELECT User.iD, User.Name, User.Address, Image.url
FROM images as Image
JOIN data as User
ON Image.iD = User.iD
WHERE Image.iD IN ($all)
LIMIT 1;
It is not tested yet, so you should take it with a grain of salt. It is at least a starting point.
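For comparison, a joined version that returns one image for each of the 5 IDs could be sketched like this. The assumptions are worth spelling out: "one image" is taken to mean the alphabetically first url (MIN), SQLite via Python's sqlite3 stands in for MySQL, the sample rows and the helper name fetch_with_one_image are made up, and the ID list is passed as bound parameters instead of being imploded into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (iD INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
    CREATE TABLE images (iD INTEGER, url TEXT);
    CREATE INDEX images_id ON images (iD);
""")
# Hypothetical sample data: 6 IDs, 3 images each.
conn.executemany("INSERT INTO data VALUES (?, ?, ?)",
                 [(i, f"Name {i}", f"Address {i}") for i in range(1, 7)])
conn.executemany("INSERT INTO images VALUES (?, ?)",
                 [(i, f"img-{i}-{suffix}") for i in range(1, 7) for suffix in "abc"])

def fetch_with_one_image(conn, ids):
    """Hypothetical helper: one row per iD with a single, deterministic url.

    Assumes ids is a non-empty list of integers.
    """
    placeholders = ",".join("?" for _ in ids)
    sql = f"""
        SELECT d.iD, d.Name, d.Address, MIN(i.url) AS url
        FROM data AS d
        JOIN images AS i ON i.iD = d.iD
        WHERE d.iD IN ({placeholders})
        GROUP BY d.iD, d.Name, d.Address
        ORDER BY d.iD DESC
    """
    return conn.execute(sql, ids).fetchall()

rows = fetch_with_one_image(conn, [2, 3, 4, 5, 6])
```

Unlike a bare LIMIT 1, the aggregate picks one url per iD rather than one url for the whole result, and it always picks the same one.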

Querying one table and ordering by occurrences from another

I need to select all my comments from the 'comments' table. The thing is, each comment receives multiple 'likes', which are stored in the 'likes' table.
I would like to select all the items from the 'comments' table, but order them by the number of likes, where each 'like' is represented as a row in the 'likes' table...
Does anyone know a good way to do this?
thanks so much,
Yanipan
** edit **
Hi,
You are correct, sorry.
I use PHP on the server side, and MySQL as my database...
I forgot how wide the range of questions asked on this forum is...
Fetching 10 most liked comments, including those without any likes at all (left join) (assuming a simple structure with two tables: comments and likes):
select c.*,count(l.id) as likes
from comments c
left join likes l on c.id = l.comment
group by c.id -- not l.comment, which is NULL for every comment without likes
order by count(l.id) desc
limit 10;
Note that the performance of this query will be pretty nasty though. Hence you'll most likely have to find another strategy to sort your comments by number of likes.
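A small sketch of the left-join-and-count approach, grouping on the comments key so that comments with no likes each keep their own row. As before this is an illustration with assumptions: SQLite through Python's sqlite3 replaces MySQL, the two-table structure (likes.comment pointing at comments.id) follows the answer's guess, and the sample rows are invented. The index on likes(comment) is my suggestion for the join, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, comment INTEGER);
    -- Assumed helper index so the join does not scan all of likes per comment
    CREATE INDEX likes_comment ON likes (comment);
""")
conn.executemany("INSERT INTO comments VALUES (?, ?)",
                 [(1, "no likes"), (2, "popular"), (3, "one like"), (4, "also no likes")])
conn.executemany("INSERT INTO likes (comment) VALUES (?)", [(2,), (2,), (2,), (3,)])

QUERY = """
SELECT c.id, c.body, COUNT(l.id) AS likes
FROM comments c
LEFT JOIN likes l ON c.id = l.comment
GROUP BY c.id, c.body          -- one group per comment, liked or not
ORDER BY likes DESC, c.id      -- c.id breaks ties so the order is stable
LIMIT 10
"""

most_liked = conn.execute(QUERY).fetchall()
```

COUNT(l.id) counts only non-NULL values, so the unmatched rows produced by the LEFT JOIN come out as 0 likes instead of 1.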
