query order by rand() too slow - php

i have a large table in a database called offers(over 300.000 rows).
when i execute the below query it takes over 3 secs.
$sql = "SELECT * FROM `offers` WHERE (`start_price` / `price` >= 2) ORDER BY RAND() LIMIT 1";
Table offers
`id` int(11) NOT NULL,
`title` text NOT NULL,
`description` text NOT NULL,
`image` text NOT NULL,
`price` float NOT NULL,
`start_price` float NOT NULL,
`brand` text NOT NULL
is there any way to make it faster? i want to select one random row (start_price / price >= 2)

I think your problem is that your query requires a full table scan for the WHERE clause. The order by does make things worse -- depending on the volume that pass the filter.
You might consider storing this number in the table and adding an index to it:
alter table offers add column start_to_price float;
update offers
set start_to_price = start_price / price;
create index idx_offers_s2p on offers(start_to_price);
Then, your query might be fast:
SELECT o.*
FROM `offers` o
WHERE start_to_price >= 2
ORDER BY RAND()
LIMIT 1;
If performance is still a problem, then I would be likely to use a where clause first:
SELECT o.*
FROM `offers` o CROSS JOIN
(select COUNT(*) as cnt from offers where start_to_price >= 2) oo
WHERE rand() <= 10 / cnt
ORDER BY RAND()
LIMIT 1;
This pulls about 10 rows at random and then chooses one of them.
If these don't work, then there are other solutions that get progressively more complicated.

One option to make this faster is to ensure that you leverage indexing:
How does database indexing work?
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
So in this case ensure that you have an index for start_price together with price and in that exact order.
Another way is to optimise the coalition that is in use for the database and tables, so choose utf8mb4 over utf8 and if sorting/localisation is not being an issue for you and you want to be completely anal then general_ci over unicode_ci:
What's the difference between utf8_general_ci and utf8_unicode_ci
Despite the MyISAM storage engine delivering faster read speeds (http://www.rackspace.com/knowledge_center/article/mysql-engines-myisam-vs-innodb) I have found that there are various tweaks available to the InnoDB storage engine that can speed things up more so than I was able to achieve using MyISAM:
https://dba.stackexchange.com/questions/5666/possible-to-make-mysql-use-more-than-one-core?lq=1
So something like the following would be another option:
[mysqld] // Don't play here unless you have read and understand what is going on
innodb_read_io_threads=64
innodb_write_io_threads=64
innodb_buffer_pool_size=2G
Yet another option is to take a look at alternate storage engines: https://www.percona.com/software/mysql-database/percona-server/benchmarks
You could also see the other answers for refactoring of your query :)

There are alternatives. The one I have used is described here:-
http://jan.kneschke.de/projects/mysql/order-by-rand/
Essentially you generate a random number that is between your min and max id, and then join that against your result set (using >=), with a limit of 1. So you get a result set starting from a random point in your full results and then just grab the first record.
Down side is that if you id fields are not equally distributed then it isn't truly random
Quick example code, assuming your offers table has a unique key called id:-
SELECT offers.*
FROM offers
INNER JOIN
(
SELECT RAND( ) * ( MAX( Id ) - MIN( Id ) ) + MIN( Id ) AS Id
FROM offers
WHERE (`start_price` / `price` >= 2)
) AS r2
ON offers.Id >= r2.Id
WHERE (`start_price` / `price` >= 2)
ORDER BY offers.Id LIMIT 1

Related

Mysql where between query optimization

Below is the format of the database of Autonomous System Numbers ( download and parsed from this site! ).
range_start range_end number cc provider
----------- --------- ------ -- -------------------------------------
16778240 16778495 56203 AU AS56203 - BIGRED-NET-AU Big Red Group
16793600 16809983 18144 AS18144
745465 total rows
A Normal query looks like this:
select * from table where 3232235520 BETWEEN range_start AND range_end
Works properly but I query a huge number of IPs to check for their AS information which ends up taking too many calls and time.
Profiler Snapshot:
Blackfire profiler snapshot
I've two indexes:
id column
a combine index on the range_start and range_end column as both the make unique row.
Questions:
Is there a way to query a huge number of IPs in a single query?
multiple where (IP between range_start and range_end) OR where (IP between range_start and range_end) OR ... works but I can't get the IP -> row mapping or which rows are retrieved for which IP.
Any suggestions to change the database structure to optimize the query speed and decrease the time?
Any help will be appreciated! Thanks!
It is possible to query more than one IP address. Several approaches we could take. Assuming range_start and range_end are defined as integer types.
For a reasonable number of ip addresses, we could use an inline view:
SELECT i.ip, a.*
FROM ( SELECT 3232235520 AS ip
UNION ALL SELECT 3232235521
UNION ALL SELECT 3232235522
UNION ALL SELECT 3232235523
UNION ALL SELECT 3232235524
UNION ALL SELECT 3232235525
) i
LEFT
JOIN ip_to_asn a
ON a.range_start <= i.ip
AND a.range_end >= i.ip
ORDER BY i.ip
This approach will work for a reasonable number of IP addresses. The inline view could be extended with more UNION ALL SELECT to add additional IP addresses. But that's not necessarily going to work for a "huge" number.
When we get "huge", we're going to run into limitations in MySQL... maximum size of a SQL statement limited by max_allowed_packet, there may be a limit on the number of SELECT that can appear.
The inline view could be replaced with a temporary table, built first.
DROP TEMPORARY TABLE IF EXISTS _ip_list_;
CREATE TEMPORARY TABLE _ip_list_ (ip BIGINT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO _ip_list_ (ip) VALUES (3232235520),(3232235521),(3232235522),...;
...
INSERT INTO _ip_list_ (ip) VALUES (3232237989),(3232237990);
Then reference the temporary table in place of the inline view:
SELECT i.ip, a.*
FROM _ip_list_ i
LEFT
JOIN ip_to_asn a
ON a.range_start <= i.ip
AND a.range_end >= i.ip
ORDER BY i.ip ;
And then drop the temporary table:
DROP TEMPORARY TABLE IF EXISTS _ip_list_ ;
Some other notes:
Churning database connections is going to degrade performance. There's a significant amount overhead in establishing and tearing down a connection. That overhead get noticeable if the application is repeatedly connecting and disconnecting, if its doing that for every SQL statement being issued.
And running an individual SQL statement also has overhead... the statement has to be sent to the server, the statement parsed for syntax, evaluated from semantics, choose an execution plan, execute the plan, prepare a resultset, return the resultset to the client. And this is why it's more efficient to process set wise rather than row wise. Processing RBAR (row by agonizing row) can be very slow, compared to sending a statement to the database and letting it process a set in one fell swoop.
But there's a tradeoff there. With ginormous sets, things can start to get slow again.
Even if you can process two IP addresses in each statement, that halves the number of statements that need to be executed. If you do 20 IP addresses in each statement, that cuts down the number of statements to 5% of the number that would be required a row at a time.
And the composite index already defined on (range_start,range_end) is appropriate for this query.
FOLLOWUP
As Rick James points out in a comment, the index I earlier said was "appropriate" is less than ideal.
We could write the query a little differently, that might make more effective use of that index.
If (range_start,range_end) is UNIQUE (or PRIMARY) KEY, then this will return one row per IP address, even when there are "overlapping" ranges. (The previous query would return all of the rows that had a range_start and range_end that overlapped with the IP address.)
SELECT t.ip, a.*
FROM ( SELECT s.ip
, s.range_start
, MIN(e.range_end) AS range_end
FROM ( SELECT i.ip
, MAX(r.range_start) AS range_start
FROM _ip_list_ i
LEFT
JOIN ip_to_asn r
ON r.range_start <= i.ip
GROUP BY i.ip
) s
LEFT
JOIN ip_to_asn e
ON e.range_start = s.range_start
AND e.range_end >= s.ip
GROUP BY s.ip, s.range_start
) t
LEFT
JOIN ip_to_asn a
ON a.range_start = t.range_start
AND a.range_end = t.range_end
ORDER BY t.ip ;
With this query, for the innermost inline view query s, the optimizer might be able to make effective use of an index with a leading column of range_start, to quickly identify the "highest" value of range_start (that is less than or equal to the IP address). But with that outer join, and with the GROUP BY on i.ip, I'd really need to look at the EXPLAIN output; it's only conjecture what the optimizer might do; what is important is what the optimizer actually does.)
Then, for inline view query e, MySQL might be able to make more effective use of the composite index on (range_start,range_end), because of the equality predicate on the first column, and the inequality condition on MIN aggregate on the second column.
For the outermost query, MySQL will surely be able to make effective use of the composite index, due to the equality predicates on both columns.
A query of this form might show improved performance, or performance might go to hell in a handbasket. The output of EXPLAIN should give a good indication of what's going on. We'd like to see "Using index for group-by" in the Extra column, and we only want to see a "Using filesort" for the ORDER BY on the outermost query. (If we remove the ORDER BY clause, we want to not see "Using filesort" in the Extra column.)
Another approach is to make use of correlated subqueries in the SELECT list. The execution of correlated subqueries can get expensive when the resultset contains a large number of rows. But this approach can give satisfactory performance for some use cases.
This query depends on no overlapping ranges in the ip_to_asn table, and this query will not produce the expected results when overlapping ranges exist.
SELECT t.ip, a.*
FROM ( SELECT i.ip
, ( SELECT MAX(s.range_start)
FROM ip_to_asn s
WHERE s.range_start <= i.ip
) AS range_start
, ( SELECT MIN(e.range_end)
FROM ip_to_asn e
WHERE e.range_end >= i.ip
) AS range_end
FROM _ip_list_ i
) r
LEFT
JOIN ip_to_asn a
ON a.range_start = r.range_start
AND a.range_end = r.range_end
As a demonstration of why overlapping ranges will be a problem for this query, given a totally goofy, made up example
range_start range_end
----------- ---------
.101 .160
.128 .244
Given an IP address of .140, the MAX(range_start) subquery will find .128, the MIN(range_end) subquery will find .160, and then the outer query will attempt to find a matching row range_start=.128 AND range_end=.160. And that row just doesn't exist.
This is a duplicate of the question here however I'm not voting to close it, as the accepted answer in that question is not very helpful; the answer by Quassnoi is much better (but it only links to the solution).
A linear index is not going to help resolve a database of ranges. The solution is to use geospatial indexing (available in MySQL and other DBMS). An added complication is that MySQL geospatial indexing only works in 2 dimensions (while you have a 1-D dataset) so you need to map this to 2-dimensions.
Hence:
CREATE TABLE IF NOT EXISTS `inetnum` (
`from_ip` int(11) unsigned NOT NULL,
`to_ip` int(11) unsigned NOT NULL,
`netname` varchar(40) default NULL,
`ip_txt` varchar(60) default NULL,
`descr` varchar(60) default NULL,
`country` varchar(2) default NULL,
`rir` enum('APNIC','AFRINIC','ARIN','RIPE','LACNIC') NOT NULL default 'RIPE',
`netrange` linestring NOT NULL,
PRIMARY KEY (`from_ip`,`to_ip`),
SPATIAL KEY `rangelookup` (`netrange`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii;
Which might be populated with....
INSERT INTO inetnum
(from_ip, to_ip
, netname, ip_txt, descr, country
, netrange)
VALUES
(INET_ATON('127.0.0.0'), INET_ATON('127.0.0.2')
, 'localhost','127.0.0.0-127.0.0.2', 'Local Machine', '.',
GEOMFROMWKB(POLYGON(LINESTRING(
POINT(INET_ATON('127.0.0.0'), -1),
POINT(INET_ATON('127.0.0.2'), -1),
POINT(INET_ATON('127.0.0.2'), 1),
POINT(INET_ATON('127.0.0.0'), 1),
POINT(INET_ATON('127.0.0.0'), -1))))
);
Then you might want to create a function to wrap the rather verbose SQL....
DROP FUNCTION `netname2`//
CREATE DEFINER=`root`#`localhost` FUNCTION `netname2`(p_ip VARCHAR(20) CHARACTER SET ascii) RETURNS varchar(80) CHARSET ascii
READS SQL DATA
DETERMINISTIC
BEGIN
DECLARE l_netname varchar(80);
SELECT CONCAT(country, '/',netname)
INTO l_netname
FROM inetnum
WHERE MBRCONTAINS(netrange, GEOMFROMTEXT(CONCAT('POINT(', INET_ATON(p_ip), ' 0)')))
ORDER BY (to_ip-from_ip)
LIMIT 0,1;
RETURN l_netname;
END
And therefore:
SELECT netname2('127.0.0.1');
./localhost
Which uses the index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE inetnum range rangelookup rangelookup 34 NULL 1 Using where; Using filesort
(and takes around 10msec to find a record from the combined APNIC,AFRINIC,ARIN,RIPE and LACNIC datasets on the very low spec VM I'm using here)
You can compare IP ranges using MySQL. This question might contain an answer you're looking for: MySQL check if an IP-address is in range?
SELECT * FROM TABLE_NAME WHERE (INET_ATON("193.235.19.255") BETWEEN INET_ATON(ipStart) AND INET_ATON(ipEnd));
You will likely want to index your database. This optimizes the time it takes to search your database, similar to the index you will find in the back of a textbook, but for databases:
ALTER TABLE `table` ADD INDEX `name` (`column_id`)
EDIT: Apparently INET_ATON cannot be used on indexed databases, so you would have to pick one of these!

Caching big data, alternative query or other indexes?

I'm with a problem, I am working on highscores, and for those highscores you need to make a ranking based on skill experience and latest update time (to see who got the highest score first incase skill experience is the same).
The problem is that with the query I wrote, it takes 28 (skills) x 0,7 seconds to create a personal highscore page to see what their rank is on the list. Requesting this in the browser is just not doable, it takes way too long for the page to load and I need a solution for my issue.
MySQL version: 5.5.47
The query I wrote:
SELECT rank FROM
(
SELECT hs.playerID, (#rowID := #rowID + 1) AS rank
FROM
(
SELECT hs.playerID
FROM highscores AS hs
INNER JOIN overall AS o ON hs.playerID = o.playerID
WHERE hs.skillID = ?
AND o.game_mode = ?
ORDER BY hs.skillExperience DESC,
hs.updateTime ASC
) highscore,
(SELECT #rowID := 0) r
) data
WHERE data.playerID = ?
As you can see I first have to create a whole resultset that gives me a full ranking for that game mode and skill, and then I have to select the rank based on the playerID after that, the problem is that I cannot let the query run untill it finds the result, because mysql doesn't offer such function, if I'd specifiy where data.playerID = ? in the query above, it would give back 1 result, meaning the ranking will be 1 as well.
The highscores table has 550k rows
What I have tried was storing the resultset for each skillid/gamemode combination in a temp table json_encoded, tried storing on files, but it ended up being quite slow as well, because the files are really huge and it takes time to process.
Highscores table:
CREATE TABLE `highscores` (
`playerID` INT(11) NOT NULL,
`skillID` INT(10) NOT NULL,
`skillLevel` INT(10) NOT NULL,
`skillExperience` INT(10) NOT NULL,
`updateTime` BIGINT(20) NOT NULL,
PRIMARY KEY (`playerID`, `skillID`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Overall table has got 351k rows
Overall table:
CREATE TABLE `overall` (
`playerID` INT(11) NOT NULL,
`playerName` VARCHAR(50) NOT NULL,
`totalLevel` INT(10) NOT NULL,
`totalExperience` BIGINT(20) NOT NULL,
`updateTime` BIGINT(20) NOT NULL,
`game_mode` ENUM('REGULAR','IRON_MAN','IRON_MAN_HARDCORE') NOT NULL DEFAULT 'REGULAR',
PRIMARY KEY (`playerID`, `playerName`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Explain Select result from the query:
Does anybody have a solution for me?
No useful index for WHERE
The last 2 lines of the EXPLAIN (#3 DERIVED):
WHERE hs.skillID = ?
AND o.game_mode = ?
Since neither table has a suitable index to use for the WHERE clause, to optimizer decided to do a table scan of one of them (overall), then reach into the other (highscores). Having one of these indexes would help, at least some:
highscores: INDEX(skillID)
overall: INDEX(game_mode, ...) -- note that an index only on a low-cardinality ENUM is rarely useful.
(More in a minute.)
No useful index for ORDER BY
The optimizer sometimes decides to use an index for the ORDER BY instead of for the WHERE. But
ORDER BY hs.skillExperience DESC,
hs.updateTime ASC
cannot use an index, even though both are in the same table. This is because DESC and ASC are different. Changing ASC to DESC would have an impact on the resultset, but would allow
INDEX(skillExperience, updateTime)
to be used. Still, this may not be optimal. (More in a minute.)
Covering index
Another form of optimization is to build a "covering index". That is an index that has all the columns that the SELECT needs. Then the query can be performed entirely in the index, without reaching over to the data. The SELECT in question is the innermost:
( SELECT hs.playerID
FROM highscores AS hs
INNER JOIN overall AS o ON hs.playerID = o.playerID
WHERE hs.skillID = ?
AND o.game_mode = ?
ORDER BY hs.skillExperience DESC, hs.updateTime ASC
) highscore,
For hs: INDEX(skillID, skillExperience, updateTime, playerID) is "covering" and has the most important item (skillID, from the WHERE) first.
For o: INDEX(game_mode, playerID) is "covering". Again, game_mode must be first.
If you change the ORDER BY to be DESC and DESC, then add another index for hs: INDEX(skillExperience, updateTime, skillID, playerID). Now the first 2 columns must be in that order.
Conclusion
It is not obvious which of those indexes the optimizer would prefer. I suggest you add both and let it choose.
I believe that (1) the innermost query is consuming the bulk of time, and (2) there is nothing to optimize in the outer SELECTs. So, I leave that as my recommendation.
Much of this is covered in my Indexing Cookbook.
Important subanswer: How frequently change rank of all players? Hmm.. Need explain.. You want realtime statistics? No, you dont want realtime )) You must select time interval for update statistics, e.g. 10 minutes. For this case you can run cronjob for insert new rank statistics into separated table like this:
/* lock */
TRUNCATE TABLE rank_stat; /* maybe update as unused/old for history) instead truncate */
INSERT INTO rank_stat (a, b, c, d) <your query here>;
/* unlock */
and users (browsers) will select readonly statistics from this table (can be split to pages).
But if rank stat not frequently change, e.g. you can recalculate it for all wanted game events and/or acts/achievs of players.
This is recommedations only. Because you not explain full environment. But I think you can found right solution with this recommendations.
It doesn't look like you really need to rank everyone, you just want to find out how many people are ahead of the current player. You should be able to get a simple count of how many players have better scores & dates than the current player which represents the current player's ranking.
SELECT count(highscores.id) as rank FROM highscores
join highscores playerscore
on playerscore.skillID = highscores.skillID
and playerscore.gamemode = highscores.gamemode
where highscores.skillID = ?
AND highscores.gamemode = ?
and playerscore.playerID = ?
and (highscores.skillExperience > playerscore.skillExperience
or (highscores.skillExperience = playerscore.skillExperience
and highscores.updateTime > playerscore.updateTime));
(I joined the table to itself and aliased the second instance as playerscore so it was slightly less confusing)
You could probably even simplify it to one query by grouping and parsing the results within your language of choice.
SELECT
highscores.gamemode as gamemode,
highscores.skillID as skillID,
count(highscores.id) as rank
FROM highscores
join highscores playerscore
on playerscore.skillID = highscores.skillID
and playerscore.gamemode = highscores.gamemode
where playerscore.playerID = ?
and (highscores.skillExperience > playerscore.skillExperience
or (highscores.skillExperience = playerscore.skillExperience
and highscores.updateTime > playerscore.updateTime));
group by highscores.gamemode, highscores.skillID;
Not quite sure about the grouping bit though.

How to improve Mysql database performance without changing the db structure

I have a database that is already in use and I have to improve the performance of the system that's using this database.
There are 2 major queries running about 1000 times in a loop and this queries have inner joins to 3 other tables each. This in turn is making the system very slow.
I tried actually to remove the query from the loop and fetch all the data only once and process it in PHP. But this is putting to much load on the memory (RAM) and the system is hanging if 2 or more clients try to use the system.
There is a lot of data in the tables even after removing the expired data .
I have attached the query below.
Can anyone help me with this issue ?
select * from inventory
where (region_id = 38 or region_id = -1)
and (tour_opp_id = 410 or tour_opp_id = -1)
and room_plan_id = 141 and meal_plan_id = 1 and bed_type_id = 1 and hotel_id = 1059
and FIND_IN_SET(supplier_code, 'QOA,QTE,QM,TEST,TEST1,MQE1,MQE3,PERR,QKT')
and ( ('2014-11-14' between from_date and to_date) )
order by hotel_id desc ,supplier_code desc, region_id desc,tour_opp_id desc,inventory.inventory_id desc
SELECT * ,pinfo.fri as pi_day_fri,pinfoadd.fri as pa_day_fri,pinfochld.fri as pc_day_fri
FROM `profit_markup`
inner join profit_markup_info as pinfo on pinfo.profit_id = profit_markup.profit_markup_id
inner join profit_markup_add_info as pinfoadd on pinfoadd.profit_id = profit_markup.profit_markup_id
inner join profit_markup_child_info as pinfochld on pinfochld.profit_id = profit_markup.profit_markup_id
where profit_markup.hotel_id = 1059 and (`booking_channel` = 1 or `booking_channel` = 2)
and (`rate_region` = -1 or `rate_region` = 128)
and ( ( period_from <= '2014-11-14' and period_to >= '2014-11-14' ) )
ORDER BY profit_markup.hotel_id DESC,supplier_code desc, rate_region desc,operators_list desc, profit_markup_id DESC
Since we have not seen your SHOW CREATE TABLES; and EXPLAIN EXTENDED plan it is hard to give you 1 answer
But generally speaking in regard to your query "BTW I re-wrote below"
SELECT
hotel_id, supplier_code, region_id, tour_opp_id, inventory_id
FROM
inventory
WHERE
region_id IN (38, -1)
AND tour_opp_id IN (410, -1)
AND room_plan_id IN (141, 1)
AND bed_type_id IN (1, 1059)
AND supplier_code IN ('QOA', 'QTE', 'QM', 'TEST', 'TEST1', 'MQE1', 'MQE3', 'PERR', 'QKT')
AND ('2014-11-14' BETWEEN from_date AND to_date )
ORDER BY
hotel_id DESC, supplier_code DESC, region_id DESC, tour_opp_id DESC, inventory_id DESC
Do not use * to get all the columns. You should list the column that you really need. Using * is just a lazy way of writing a query. limiting the columns will limit the data size that is being selected.
How often is the records in the inventory are being updates/inserted/delete? If not too often then you can use consider using SQL_CACHE. However, caching a query will cause you problems if you use it and the inventory table is updated very often. In addition, to use query cache you must check the value of query_cache_type on your server. SHOW GLOBAL VARIABLES LIKE 'query_cache_type';. If this is set to "0" then the cache feature is disabled and SQL_CACHE will be ignored. If it is set to 1 then the server will cache all queries unless you tell it not too using NO_SQL_CACHE. If the option is set to 2 then MySQL will cache the query only where SQL_CACHE clause is used. here is documentation about query_cache_type
If you have an index on those following column in this order it will help you (hotel_id, supplier_code, region_id, tour_opp_id, inventory_id)
ALTER TABLE inventory
ADD INDEX (hotel_id, supplier_code, region_id, tour_opp_id, inventory_id);
If possible increase sort_buffer_size on your server as most likely you issue here is that your are doing too much sorting.
As for the second query "BTW I re-wrote below"
SELECT
*, pinfo.fri as pi_day_fri,
pinfoadd.fri as pa_day_fri,
pinfochld.fri as pc_day_fri
FROM
profit_markup
INNER JOIN
profit_markup_info AS pinfo ON pinfo.profit_id = profit_markup.profit_markup_id
INNER JOIN
profit_markup_add_info AS pinfoadd ON pinfoadd.profit_id = profit_markup.profit_markup_id
INNER JOIN
profit_markup_child_info AS pinfochld ON pinfochld.profit_id = profit_markup.profit_markup_id
WHERE
profit_markup.hotel_id = 1059
AND booking_channel IN (1, 2)
AND rate_region IN (-1, 128)
AND period_from <= '2014-11-14'
AND period_to >= '2014-11-14'
ORDER BY
profit_markup.hotel_id DESC, supplier_code DESC, rate_region DESC,
operators_list DESC, profit_markup_id DESC
Again eliminate the use of * from your query
Make sure that the following columns have the same type/collation and same size. pinfo.profit_id, profit_markup.profit_markup_id, pinfoadd.profit_id, pinfochld.profit_id and each one have to have an index on every table. If the columns have different types then MySQL will have to convert the data every time to join the records. Even if you have index it will be slower. Also, if those column are characters type (ie. VARCHAR()) make sure they are of the CHAR() with a collation of latin1_general_ci as this will be faster for finding ID, but if you are using INT() even better.
Use the 3rd and 4th trick I listed for the previous query
Try using STRAIGHT_JOIN "you must know what your doing here or it will bite you!" Here is a good thread about this When to use STRAIGHT_JOIN with MySQL
I hope this helps.
For the first query, I am not sure if you can do much (assuming you have already indexed the fields you are ordering by) apart from replacing the * with column names (Don't expect this to increase the performance drastically).
For the second query, before you go through the loop and put in selection arguments, you could create a view with all the tables joined and ordered then make a prepared statement to select from the view and bind arguments in the loop.
Also, if your php server and the database server are in two different places, it is better if you did the selection through a stored procedure in the database.
(If nothing works out, then memcache is the way to go... Although I have personally never done this)
Here you have increase query performance not an database performance.
For both queries first check index is available on WHERE and ON(Join) clause columns, if index is missing then you have to add index to improve query performance.
Check explain plane before create index.
If possible show me the explain plane of both query that will help us.

MySQL 5.0 slow for order by index

I have this query:
SELECT * FROM projects WHERE current = 1 ORDER BY id DESC LIMIT 0,20
id is the primary key, and there is an index created for current and status field
CREATE INDEX currentstatus ON projects (current, status)
The table has 30,000+ rows, and this query runs in about 4.0s
I would like to use force index for group by (id) but MySQL 5.0 does not support it.
Explain shows the currentstatus is used for the query, but I would have liked to use currentstatus for the WHERE clause, and PRIMARY for the ORDER BY clause
We cannot upgrade the database as this is heavily used and we cannot have any downtime. I am wondering if there is a way to optimize this query in MySQL 5.0 so it will use the primary key index when sorting?
EDIT: The ideal query, as an example, in MySQL 5.1+ would be
SELECT * FROM projects FORCE INDEX FOR ORDER BY (PRIMARY) WHERE current = 1 ORDER BY id DESC LIMIT 0,20;
In MySQL 5.0 as you have mentioned it is not possible to force / use a specific index for only the ORDER BY clause.
http://dev.mysql.com/doc/refman/5.0/en/index-hints.html
And MySQL 5.1+ does indeed allow this.
It is for this reason they probably added this feature in for MySQL 5.1
I have found a solution that is more of a workaround for the issue. This will at least allow for much faster queries:
SELECT * FROM projects WHERE current = 1 AND id > ((SELECT MAX(id) FROM projects) - 1000) ORDER BY id DESC LIMIT 20;
This will default to using the PRIMARY key id for the order by / where clauses and it is a small enough subset to scan through that this query runs in 0.00s vs 4.35s on my current data set. 1000 rows will guarantee at least 20 records with current = 1 for this case anyway.

Returning random rows from mysql database without using rand()

I would like to be able to pull back 15 or so records from a database. I've seen that using WHERE id = rand() can cause performance issues as my database gets larger. All solutions I've seen are geared towards selecting a single random record. I would like to get multiples.
Does anyone know of an efficient way to do this for large databases?
edit:
Further Edit and Testing:
I made a fairly simple table, on a new database using MyISAM. I gave this 3 fields: autokey (unsigned auto number key) bigdata (a large blob) and somemore (a medium int). I then applied random data to the table and ran a series of queries using Navicat. Here are the results:
Query 1: select * from test order by rand() limit 15
Query 2: select *
from
test
join
(select round(rand()*(select max(autokey) from test)) as val from test limit 15) as rnd
on
rnd.val=test.autokey;`
(I tried both select and select distinct and it made no discernible difference)
and:
Query 3 (I only ran this on the second test):
SELECT *
FROM (
SELECT #cnt := COUNT(*) + 1,
#lim := 10
FROM test
) vars
STRAIGHT_JOIN
(
SELECT r.*,
#lim := #lim - 1
FROM test r
WHERE (#cnt := #cnt - 1)
AND RAND(20090301) < #lim / #cnt
) i
ROWS: QUERY 1: QUERY 2: QUERY 3:
2,060,922 2.977s 0.002s N/A
3,043,406 5.334s 0.001s 1.260
I would like to do more rows so I can see how query 3 scales, but at the moment, it seems as though the clear winner is query 2.
Before I wrap up this testing and declare an answer, and while I have all this data and the test environment set up, can anyone recommend any further testing?
Try:
select * from table order by rand() limit 15
Another (and possibly more efficient way) would be to join against a set of random values. This should work, if there's some contiguous integer key in the table. Here is how I would do it in postgres (My MySQL is a bit rusty)
select * from table join
(select (random()*maxid)::integer as val from generate_series(1,15)) as rnd
on rand.val=table.id;
where maxid is the highest id in table. If id has an index, then this would mean only 15 index lookup, so its very fast.
UPDATE:
Looks like there no such thing as generate_series in MySQL. My fault. We don't need it actually:
select *
from
table
join
-- this just returns 15 random numbers.
-- I need `table` here only to produce rows for rand()
(select round(rand()*(select max(id) from table)) as val from table limit 15) as rnd
on
rnd.val=table.id;
P.S. If I don't want duplicates returned, I can use (select distinct [...]) in the random generator expression.
Update: Check out the accepted answer in this question. It's pure mySQL and even deals with even distribution.
The problem with id = rand() or anything comparable in PHP is that you can't be sure whether that particular ID still exists. Therefore, you need to work with LIMIT, and that can become slow for large amounts of data.
As an alternative to that, you could try using a loop in PHP.
What the loop does is
Create a random integer number using rand(), with a scope between 0 and the number of records in the database
Query the database whether a record with that ID exists
If it exists, add the number to an array
If it doesn't, go back to step 1
End the loop when the array of random numbers contains the desired number of elements
this method could cause a lot of queries in a fragmented table, but they should be pretty fast to execute. It may be faster than LIMIT rand() in certain situations.
The LIMIT method, as outlined by #Luther, is certainly the simplest code-wise.
You could do a query with all the results or however many limited, then use mysqli_fetch_all followed by:
shuffle($a);
$a = array_slice($a, 0, 15);
For a large dataset doing
select * from table order by rand() limit 15
can be quite time and memory consuming.
If your data records happen to be numbered you can put and index on the numbering colum and do a
select * from table where no >= rand() limit 15
Or even better do the random number generation in your application and do
select * from table where no >= $rand and no <= $rand+15
If your data doesn't change too often, it might be worth to add such a numbering a column to make the selection efficient.
Assuming MySQL supports nested queries and that operations on the primary key are fast, I'd try something like
select * from table where id in (select id from table order by rand() limit 15)

Categories