Dear PHP and MySQL experts,
I have two tables: one large table for post articles with 200,000 records (indexed column: sid), and one small table for topics with 20 records (indexed column: topicid). The stories table stores the same topicid value in its topic column.
Currently I'm using this approach (it takes around 0.4s):
+ get the last 50 records from the table:
SELECT sid, aid, title, time, topic, informant, ihome, alanguage, counter, type, images, chainid FROM veryzoo_stories ORDER BY sid DESC LIMIT 0,50
+ then run a while loop over the records to find the matching topic name for each post:
while ( .. ) {
SELECT topicname FROM veryzoo_topics WHERE topicid='$topic'
....
}
+ Now
I'm going to use an INNER JOIN to speed up the process, but in my tests it took much longer: from 1.5s up to 3.5s.
SELECT a.sid, a.aid, a.title, a.time, a.topic, a.informant, a.ihome, a.alanguage, a.counter, a.type, a.images, a.chainid, t.topicname FROM veryzoo_stories a INNER JOIN veryzoo_topics t ON a.topic = t.topicid ORDER BY sid DESC LIMIT 0,50
It looks like the inner join joins all 200k records from the two tables first and only then limits the result to 50, which takes a long time.
Please point me to the right way of doing this,
e.g. take the last 50 records from table one, then join them to table two, etc.
Do not use an INNER JOIN unless the join column is unique in the joined table, or you'll get duplicate rows (and of course a slower query).
Please try this:
SELECT *
FROM (
    SELECT a.sid, a.aid, a.title, a.time, a.topic, a.informant, a.ihome, a.alanguage, a.counter, a.type, a.images, a.chainid
    FROM veryzoo_stories a
    ORDER BY sid DESC
    LIMIT 0, 50
) b
INNER JOIN veryzoo_topics t ON b.topic = t.topicid
I made a small test and it seems to be faster. It uses a subquery (nested query) to first select the 50 records and then join.
Also make sure that veryzoo_stories.sid, veryzoo_stories.topic and veryzoo_topics.topicid are indexed (and that the relation exists if you use InnoDB). That should improve the performance.
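For example, something like this (a sketch only; skip any index that already exists, and adjust names to your schema):

ALTER TABLE veryzoo_stories ADD PRIMARY KEY (sid);        -- drives ORDER BY sid DESC LIMIT 50
ALTER TABLE veryzoo_stories ADD INDEX idx_topic (topic);  -- join column to the topics table
ALTER TABLE veryzoo_topics ADD PRIMARY KEY (topicid);     -- lookup side of the join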
That leaves the problem of the ORDER BY ... LIMIT. It is heavy because it orders the 200,000 records before selecting; I guess that's necessary. Indexes are very important when using ORDER BY.
Here is an article on the problem: ORDER BY … LIMIT Performance Optimization
I just gave the nested query + inner join a test and was surprised how much performance increased: it now takes only 0.22s. Here is my query:
SELECT a.*, t.topicname
FROM (SELECT sid, aid, title, time, topic, informant, ihome, alanguage, counter, type, images, chainid
      FROM veryzoo_stories
      ORDER BY sid DESC
      LIMIT 0, 50) a
INNER JOIN veryzoo_topics t ON a.topic = t.topicid
If no better solution comes up, I may use this one. Thanks to anyone who looked at this post.
Related
I'm currently working on a project that needs a "Similar Series" panel. I have a table for the series and one for their genres. I need to select series with at least 4 genres in common with the selected one.
I came up with this query:
SELECT s_id,stitle,sdesc FROM s_genre
INNER JOIN series ON s_genre.s_id=series.id
WHERE s_id IN (
SELECT s_id FROM s_genre
INNER JOIN (SELECT g_id FROM s_genre WHERE s_id = $mid) a USING (g_id)
GROUP BY s_id HAVING COUNT(s_id)>3
) AND s_id != $mid
ORDER BY RAND()
LIMIT 5
It works, but it takes almost 20 seconds to load :(
Any ideas how I can reduce the loading time? Any suggestions are appreciated :)
Thanks in advance.
Looks like the problem is caused by ORDER BY RAND(); removing it reduces the loading time to 0.5s.
While ORDER BY RAND() seems to be the major problem, it might be worth recoding the query to do a join against the subquery:
SELECT s_genre.s_id,
series.stitle,
series.sdesc
FROM s_genre
INNER JOIN series ON s_genre.s_id = series.id
INNER JOIN
(
SELECT a.s_id
FROM s_genre a
INNER JOIN s_genre b
ON a.g_id = b.g_id
WHERE b.s_id = $mid
GROUP BY a.s_id
HAVING COUNT(a.s_id) > 3
) sub0
ON s_genre.s_id = sub0.s_id
WHERE s_genre.s_id != $mid
ORDER BY RAND()
LIMIT 5
There are techniques to select random records, but applying them to this query could be a bit messy. Most rely on the rows having a fairly evenly distributed unique identifier, and (while I assume s_genre.s_id is a unique id) the filtering above will doubtless leave holes in that range.
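For reference, the usual shape of such a technique on a table with reasonably dense ids looks like this (a sketch only, not adapted to the genre query above; it assumes series.id is the evenly distributed unique identifier):

SELECT s.*
FROM series s
JOIN (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM series)) AS rid) r
  ON s.id >= r.rid
ORDER BY s.id
LIMIT 1;  -- picks one random row; gaps in the id range bias which rows come back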
Well, this is a very old question that never got a real solution. We want 3 random rows from a table with about 30k records. The table is not so big from MySQL's point of view, but if it represents the products of a store, it's representative. Random selection is useful when presenting, say, 3 random products on a webpage. We would like a single-SQL-string solution that meets these conditions:
In PHP, the recordset obtained via PDO or MySQLi must have exactly 3 rows.
They have to be obtained by a single MySQL query, with no stored procedure.
The solution must be quick: on a busy Apache server the MySQL query is in many situations the bottleneck, so it has to avoid temporary table creation, etc.
The 3 records must not be contiguous, i.e. they must not be in the vicinity of one another.
The table has the following fields:
CREATE TABLE Products (
ID INT(8) NOT NULL AUTO_INCREMENT,
Name VARCHAR(255) default NULL,
HasImages INT default 0,
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The WHERE constraint is Products.HasImages=1, permitting us to fetch only records that have images available to show on the webpage. About one-third of the records meet the HasImages=1 condition.
Searching for perfection, we first set aside the existing solutions, which have drawbacks:
I. This basic solution using ORDER BY RAND() is too slow, but guarantees 3 truly random records on each query:
SELECT ID, Name FROM Products WHERE HasImages=1 ORDER BY RAND() LIMIT 3;
*CPU about 0.10s, scanning 9690 rows because of the WHERE clause; Using where; Using temporary; Using filesort (on a dual-core Debian Squeeze Linux box). Not so bad, but not scalable to a bigger table, since a temporary table and filesort are used, and it takes 8.52s for the first query on the test Windows7::MySQL system. With such poor performance, shouldn't it be avoided for a webpage?
II. The bright solution of riedsio using JOIN ... RAND(), from MySQL select 10 random rows from 600K rows fast, adapted here, is only valid for a single random record, as the following query almost always returns contiguous records. In effect it gets only a random set of 3 records with consecutive IDs:
SELECT Products.ID, Products.Name
FROM Products
INNER JOIN (SELECT (RAND() * (SELECT MAX(ID) FROM Products)) AS ID)
AS t ON Products.ID >= t.ID
WHERE (Products.HasImages=1)
ORDER BY Products.ID ASC
LIMIT 3;
*CPU about 0.01 - 0.19s, scanning 3200, 9690, 12000 or so rows at random, but mostly 9690 records; Using where.
III. The best solution seems to be the following one, using WHERE ... RAND(), seen on MySQL select 10 random rows from 600K rows fast, proposed by bernardo-siu:
SELECT Products.ID, Products.Name FROM Products
WHERE ((Products.HasImages=1) AND RAND() < 16 * 3/30000) LIMIT 3;
*CPU about 0.01 - 0.03s, scanning 9690 rows, Using where.
Here 3 is the number of wished-for rows, 30000 is the record count of the Products table, and 16 is an experimental coefficient enlarging the selection in order to warrant selecting the 3 records. I don't know on what basis the factor 16 is an acceptable approximation.
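A back-of-envelope reading (my own, not from the original answer): each of the roughly 9690 rows matching HasImages=1 passes the RAND() filter independently with probability p = 16 * 3 / 30000 = 0.0016, so the expected number of surviving rows is about 9690 * 0.0016 ≈ 15.5. Treating the survivor count as roughly Poisson with mean 15.5, the chance of ending up with fewer than the 3 rows the LIMIT asks for is far below 1%, which is why the factor 16 almost always works yet can never guarantee 3 rows.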
So in the majority of cases we get 3 random records and it's very quick, but it's not guaranteed: sometimes the query returns only 2 rows, sometimes even no records at all.
The three methods above scan all records of the table meeting the WHERE clause, here 9690 rows.
A better SQL String?
Ugly, but quick and random. It can become very ugly very fast, especially with the tuning described below, so make sure you really want it this way.
(SELECT Products.ID, Products.Name
FROM Products
INNER JOIN (SELECT RAND()*(SELECT MAX(ID) FROM Products) AS ID) AS t ON Products.ID >= t.ID
WHERE Products.HasImages=1
ORDER BY Products.ID
LIMIT 1)
UNION ALL
(SELECT Products.ID, Products.Name
FROM Products
INNER JOIN (SELECT RAND()*(SELECT MAX(ID) FROM Products) AS ID) AS t ON Products.ID >= t.ID
WHERE Products.HasImages=1
ORDER BY Products.ID
LIMIT 1)
UNION ALL
(SELECT Products.ID, Products.Name
FROM Products
INNER JOIN (SELECT RAND()*(SELECT MAX(ID) FROM Products) AS ID) AS t ON Products.ID >= t.ID
WHERE Products.HasImages=1
ORDER BY Products.ID
LIMIT 1)
First row appears more often than it should
If you have big gaps between IDs in your table, rows right after such gaps will have a bigger chance to be fetched by this query. In some cases, they will appear significantly more often than they should. This cannot be solved in general, but there's a fix for a common particular case: when there's a gap between 0 and the first existing ID in the table.
Instead of the subquery (SELECT RAND()*<max_id> AS ID), use something like (SELECT <min_id> + RAND()*(<max_id> - <min_id>) AS ID).
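Concretely, each branch of the UNION above would then use something like this (a sketch; the minimum is computed inline rather than hard-coded):

INNER JOIN (SELECT (SELECT MIN(ID) FROM Products)
                 + RAND() * ((SELECT MAX(ID) FROM Products) - (SELECT MIN(ID) FROM Products)) AS ID) AS t
        ON Products.ID >= t.ID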
Remove duplicates
The query, if used as is, may return duplicate rows. It is possible to avoid that by using UNION instead of UNION ALL. This way duplicates will be merged, but the query no longer guarantees to return exactly 3 rows. You can work around that too, by fetching more rows than you need and limiting the outer result like this:
(SELECT ... LIMIT 1)
UNION (SELECT ... LIMIT 1)
UNION (SELECT ... LIMIT 1)
...
UNION (SELECT ... LIMIT 1)
LIMIT 3
There's still no guarantee that 3 rows will be fetched, though. It just makes it more likely.
SELECT Products.ID, Products.Name
FROM Products
INNER JOIN (SELECT (RAND() * (SELECT MAX(ID) FROM Products)) AS ID) AS t ON Products.ID >= t.ID
WHERE (Products.HasImages=1)
ORDER BY Products.ID ASC
LIMIT 3;
Of course the above gives "near"-contiguous records: you are feeding it a single random starting ID each time, without much regard to the spread of the RAND() function, so the rows returned after it are consecutive.
This should give more "randomness":
SELECT Products.ID, Products.Name
FROM Products
INNER JOIN (SELECT (ROUND((RAND() * (max-min))+min)) AS ID) AS t ON Products.ID >= t.ID
WHERE (Products.HasImages=1)
ORDER BY Products.ID ASC
LIMIT 3;
where max and min are two values you choose; let's say, for example's sake:
max = select max(id)
min = 225
This statement executes really fast (19 ms on a 30k-record table):
$db = new PDO('mysql:host=localhost;dbname=database;charset=utf8', 'username', 'password');
$stmt = $db->query("SELECT p.ID, p.Name, p.HasImages
    FROM (SELECT @count := COUNT(*) + 1, @limit := 3 FROM Products WHERE HasImages = 1) vars
    STRAIGHT_JOIN (SELECT t.*, @limit := @limit - 1 FROM Products t WHERE t.HasImages = 1 AND (@count := @count - 1) AND RAND() < @limit / @count) p");
$products = $stmt->fetchAll(PDO::FETCH_ASSOC);
The idea is to walk the matching rows once and decide on the fly whether to keep each row, with probability @limit / @count, i.e. (picks still needed) / (rows still remaining). For example, with 10 matching rows and 3 wanted: the first row is kept with probability 3/10; if it was kept, the next with 2/9, otherwise 3/9; and so on, so exactly 3 rows come out of a single pass. This is way faster than the "ORDER BY RAND()" approach.
There "might" be one caveat: you have to include the WHERE condition twice.
What about creating another table containing only the items with images? This table will be much lighter, as it will contain only one-third of the items the original table has!
--------------------------------------------
| ID      | Item ID (on the original table) |
--------------------------------------------
| 0       | 0                               |
| 1       | 123                             |
| ...     | ...                             |
| 10 000  | 30 000                          |
--------------------------------------------
You can then generate three random IDs in the PHP part of the code and just fetch them from the database.
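A sketch of how that could look (table and column names are made up; the three ids in the final IN list stand in for values generated in PHP):

-- One-time (or periodically refreshed) lookup table with dense ids:
CREATE TABLE ProductsWithImages (
  ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB
SELECT ID AS ItemID FROM Products WHERE HasImages = 1;

-- PHP picks 3 random integers between 1 and the row count and plugs them in:
SELECT p.*
FROM ProductsWithImages w
JOIN Products p ON p.ID = w.ItemID
WHERE w.ID IN (17, 4242, 9001);  -- stand-in values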
I've been testing the following bunch of SQL statements on a 10M-record, poorly designed database.
SELECT COUNT(ID)
INTO @count
FROM Products
WHERE HasImages = 1;
PREPARE random_records FROM
'(
SELECT * FROM Products WHERE HasImages = 1 LIMIT ?, 1
) UNION (
SELECT * FROM Products WHERE HasImages = 1 LIMIT ?, 1
) UNION (
SELECT * FROM Products WHERE HasImages = 1 LIMIT ?, 1
)';
SET @l1 = FLOOR(RAND() * @count);  -- FLOOR, not ROUND: valid offsets run from 0 to @count - 1
SET @l2 = FLOOR(RAND() * @count);
SET @l3 = FLOOR(RAND() * @count);
EXECUTE random_records USING @l1, @l2, @l3;
DEALLOCATE PREPARE random_records;
It took almost 7 minutes to get the three results. But I'm sure its performance will be much better in your case. Still, if you are looking for better performance, I suggest the following, which took less than 30 seconds to get the job done (on the same database).
SELECT COUNT(ID)
INTO @count
FROM Products
WHERE HasImages = 1;
PREPARE random_records FROM
'SELECT * FROM Products WHERE HasImages = 1 LIMIT ?, 1';
SET @l1 = FLOOR(RAND() * @count);
SET @l2 = FLOOR(RAND() * @count);
SET @l3 = FLOOR(RAND() * @count);
EXECUTE random_records USING @l1;
EXECUTE random_records USING @l2;
EXECUTE random_records USING @l3;
DEALLOCATE PREPARE random_records;
Bear in mind that both of these batches require the MySQLi driver in PHP if you want to execute them in one go. Their only difference is that the latter requires calling MySQLi's next_result method to retrieve all three result sets.
My personal belief is that this is the fastest way to do this.
On the off-chance that you're willing to accept an 'outside the box' type of answer, I'm going to repeat what I said in some of the comments.
The best way to approach your problem is to cache your data in advance (be that in an external JSON or XML file, or in a separate database table, possibly even an in-memory table).
This way you can schedule the performance hit on the products table for times when you know the server will be quiet, and stop worrying about creating a performance hit at "random" times when a visitor arrives at your site.
I'm not going to suggest an explicit solution, because there are far too many possibilities for how to build one. However, the answer suggested by @ahmed is not silly. If you don't want to add a join to your query, then simply load more of the data you require into the new table instead.
I'm fairly new to MySQL!
I need to make an SQL query that checks how many likes a row has (across two tables).
I found another question that looked like mine, but I can't get it to return anything (even though it doesn't produce an error).
query:
SELECT *
FROM likes
INNER JOIN (SELECT likes.like_id,
COUNT(*) AS likes
FROM likes
INNER JOIN uploads ON likes.upload_id=uploads.upload_id
WHERE uploads.upload_date >= DATE_SUB(CURDATE(), INTERVAL 8 DAY)
GROUP BY uploads.upload_id) x ON x.like_id = likes.like_id
ORDER BY x.likes DESC
Link to the original question:
MySQL, Need to select rows that has the most frequent values in another table
Help is much appreciated
Kind regards,
Mathias
Since you didn't post your table structure, I'll have to guess:
select someid, count(*) cnt from
(
select * from table1 t1 join table2 t2 on t1.someid = t2.someid
) as q0 group by someid order by cnt desc;
It will need tweaking to fit your schema.
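Applied to the likes/uploads tables from the question, a guess at the final query might be (column names taken from the posted query, so treat them as assumptions):

SELECT u.upload_id, COUNT(*) AS likes
FROM uploads u
JOIN likes l ON l.upload_id = u.upload_id
WHERE u.upload_date >= DATE_SUB(CURDATE(), INTERVAL 8 DAY)
GROUP BY u.upload_id
ORDER BY likes DESC;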
This is my query:
SELECT U.id AS user_id,C.name AS country,
CASE
WHEN U.facebook_id > 0 THEN CONCAT(F.first_name,' ',F.last_name)
WHEN U.twitter_id > 0 THEN T.name
WHEN U.regular_id > 0 THEN CONCAT(R.first,' ',R.last)
END AS name
FROM user U LEFT OUTER JOIN regular R
ON U.regular_id = R.id
LEFT OUTER JOIN twitter T
ON U.twitter_id = T.id
LEFT OUTER JOIN facebook F
ON U.facebook_id = F.id
LEFT OUTER JOIN country C
ON U.country_id = C.id
WHERE (CONCAT(F.first_name,' ',F.last_name) LIKE '%' OR T.name LIKE '%' OR CONCAT(R.first,' ',R.last) LIKE '%') AND U.active = 1
LIMIT 100
It's really fast, but EXPLAIN doesn't show me that it uses indexes (there are indexes).
But when I add ORDER BY name before the LIMIT, it takes a long time. Why? Is there a way to solve it?
Tables: user 150000, regular 50000, facebook 50000, twitter 50000, country 250, and growing!
It takes a long time because name is a computed column, not a table column. The name column is the result of a CASE expression, and unlike simple selects with multiple joins, MySQL cannot use an index to sort this kind of data.
I'm talking from ignorance here, but you could store the data in a temporary table and then sort it. It may go faster since you can create indexes for it but it won't be as fast, because of the different storage type.
UPDATE 2011-01-26
CREATE TEMPORARY TABLE `short_select`
SELECT U.id AS user_id,C.name AS country,
CASE
WHEN U.facebook_id > 0 THEN CONCAT(F.first_name,' ',F.last_name)
WHEN U.twitter_id > 0 THEN T.name
WHEN U.regular_id > 0 THEN CONCAT(R.first,' ',R.last)
END AS name
FROM user U LEFT OUTER JOIN regular R
ON U.regular_id = R.id
LEFT OUTER JOIN twitter T
ON U.twitter_id = T.id
LEFT OUTER JOIN facebook F
ON U.facebook_id = F.id
LEFT OUTER JOIN country C
ON U.country_id = C.id
WHERE (CONCAT(F.first_name,' ',F.last_name) LIKE '%' OR T.name LIKE '%' OR CONCAT(R.first,' ',R.last) LIKE '%') AND U.active = 1
LIMIT 100;
ALTER TABLE `short_select` ADD INDEX(`name`); -- add further columns if you are going to order by them as well.
SELECT * FROM `short_select`
ORDER BY `name`; -- same as above
Remember temporary tables are dropped upon connection termination, so you don't have to clean them, but you should anyway.
Without actually knowing your DB structure, and assuming you have all of the proper indexes on everything: an ORDER BY takes a variable amount of time to sort the elements returned by a query (index or not). If it is only 10 rows, it will seem almost instant; if you get 2000 rows, it will be a little slower; and if you are sorting 15k rows joined across multiple tables, it is going to take some time to sort the returned result. Also make sure you're adding indexes to the fields you're sorting by. If you query this sorted result set often, you may also want to store everything in a presorted stub table for faster querying later.
You need to select the first 100 records from each name table separately, then union the results, join them with user and country, and order and limit the output:
SELECT u.id AS user_id, c.name AS country, n.name
FROM (
        (
        SELECT facebook_id AS id, CONCAT(first_name, ' ', last_name) AS name
        FROM facebook
        ORDER BY
                first_name, last_name
        LIMIT 100
        )
        UNION ALL
        (
        SELECT twitter_id, name
        FROM twitter
        WHERE twitter_id NOT IN
                (
                SELECT facebook_id
                FROM facebook
                )
        ORDER BY
                name
        LIMIT 100
        )
        UNION ALL
        (
        SELECT regular_id, CONCAT(first, ' ', last)
        FROM regular
        WHERE regular_id NOT IN
                (
                SELECT facebook_id
                FROM facebook
                )
        AND regular_id NOT IN
                (
                SELECT twitter_id
                FROM twitter
                )
        ORDER BY
                first, last
        LIMIT 100
        )
     ) n
JOIN user u
ON u.id = n.id
JOIN country c
ON c.id = u.country_id
ORDER BY n.name
LIMIT 100
Create the following indexes:
facebook (first_name, last_name)
twitter (name)
regular (first, last)
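In MySQL DDL, those three indexes would be something like this (index names are arbitrary):

ALTER TABLE facebook ADD INDEX idx_fb_name (first_name, last_name);
ALTER TABLE twitter ADD INDEX idx_tw_name (name);
ALTER TABLE regular ADD INDEX idx_rg_name (first, last);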
Note that this query orders slightly differently from your original one: in this query, 'Ronnie James Dio' would be sorted after 'Ronnie Scott'.
The use of functions on the columns prevents indexes from being used.
CONCAT(F.first_name,' ',F.last_name)
The result of the function is not indexed, even though the individual columns may be. Either you have to rewrite the conditions to query the name columns individually, or you have to store and index the result of that function (such as a "full name" column).
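For the "store and index" route, one option on MySQL 5.7+ is a generated column (a sketch; on older versions you would maintain such a column with triggers or application code):

ALTER TABLE facebook
  ADD COLUMN full_name VARCHAR(511)
      GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) STORED,
  ADD INDEX idx_full_name (full_name);
-- A condition such as WHERE full_name LIKE 'Ron%' can then use the index.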
The index on [user.active] is unlikely to help you if most of the users are active.
I don't know what your application is all about, but I wonder if it wouldn't have been easier to ditch the foreign keys in the user table and instead put the user id as a foreign key in the other tables.
I have a table that stores scores from users of my game. What I want to be able to do, if possible, is find their rank using MySQL alone (because if the number of players grows dramatically, the PHP loop time needed to parse the entire result set will increase with it).
So far I have been able to get this statement
select @rownum := @rownum + 1 'rank', s.* from top100 s, (select @rownum := 0) r order by score desc
to return a result set with rankings applied. What I then need to be able to do is find a single row within that, using a subquery to locate the player's row via the insert id from a previous INSERT.
Any help would be greatly appreciated.
SELECT t.*,
       (SELECT COUNT(*) + 1
        FROM top100 t2
        WHERE t2.score > t.score) AS rank  -- +1 so the top score gets rank 1
FROM top100 t
WHERE id = LAST_INSERT_ID()
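One caveat worth noting: LAST_INSERT_ID() is per connection, so this query has to run on the same connection that performed the player's INSERT; otherwise, pass the saved insert id in explicitly.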