Get a random row from the table using Kohana3 framework - fast

Get a random row from the table using Kohana3 framework - fast - php

How to make this code worked in Kohana3 framework? I have a syntax problem with it.
ORM::factory('table1')
->where ( 'id', '=', ceil( DB::expr('rand()') * [SELECT max(id) from table1] ) )
->find();
That's how I want to select a random row from the table.
This works fine for small tables (that contains than 1000 rows), but not for big tables:
ORM::factory('table1')
->order_by(DB::expr('RAND()'))
->find();
The standard mysql equivalent of what I want is something like this:
SELECT name
FROM table1 JOIN
(SELECT CEIL(RAND() *
(SELECT MAX(id)
FROM table1)) AS id
) AS r2
USING (id);
So, how to convert that into a working code for Kohana3 framework?
P.S. This method works fine if there are no holes (no deleted rows) in the table, and that's fine in my case.

What you are trying to do is really a problem with MySQL. You need the right query to have it perform for any size table. A more scalable implementation is to use PHP to generate a random number less than the max ID and then select on that.
Get the highest ID:
SELECT MAX(id) max_id FROM table1
In PHP get the "id" to select:
$rand_id = mt_rand(0, $max_id-1);
Then create your query to select a random record, using a LIMIT 1 so it works even if there are holes.
SELECT * FROM table1 WHERE id>=$rand_id ORDER BY id LIMIT 1
Doing it all in MySQL can be done, but it's not going to be easy to read or implement with a DB abstraction layer.

Related

How to order the ORDER BY using the IN() mysql? [duplicate]

I am wondering if there is away (possibly a better way) to order by the order of the values in an IN() clause.
The problem is that I have 2 queries, one that gets all of the IDs and the second that retrieves all the information. The first creates the order of the IDs which I want the second to order by. The IDs are put in an IN() clause in the correct order.
So it'd be something like (extremely simplified):
SELECT id FROM table1 WHERE ... ORDER BY display_order, name
SELECT name, description, ... WHERE id IN ([id's from first])
The issue is that the second query does not return the results in the same order that the IDs are put into the IN() clause.
One solution I have found is to put all of the IDs into a temp table with an auto incrementing field which is then joined into the second query.
Is there a better option?
Note: As the first query is run "by the user" and the second is run in a background process, there is no way to combine the 2 into 1 query using sub queries.
I am using MySQL, but I'm thinking it might be useful to have it noted what options there are for other DBs as well.

Use MySQL's FIELD() function:
SELECT name, description, ...
FROM ...
WHERE id IN([ids, any order])
ORDER BY FIELD(id, [ids in order])
FIELD() will return the index of the first parameter that is equal to the first parameter (other than the first parameter itself).
FIELD('a', 'a', 'b', 'c')
will return 1
FIELD('a', 'c', 'b', 'a')
will return 3
This will do exactly what you want if you paste the ids into the IN() clause and the FIELD() function in the same order.

See following how to get sorted data.
SELECT ...
FROM ...
WHERE zip IN (91709,92886,92807,...,91356)
AND user.status=1
ORDER
BY provider.package_id DESC
, FIELD(zip,91709,92886,92807,...,91356)
LIMIT 10

Two solutions that spring to mind:
order by case id when 123 then 1 when 456 then 2 else null end asc
order by instr(','||id||',',',123,456,') asc
(instr() is from Oracle; maybe you have locate() or charindex() or something like that)

If you want to do arbitrary sorting on a query using values inputted by the query in MS SQL Server 2008+, it can be done by creating a table on the fly and doing a join like so (using nomenclature from OP).
SELECT table1.name, table1.description ...
FROM (VALUES (id1,1), (id2,2), (id3,3) ...) AS orderTbl(orderKey, orderIdx)
LEFT JOIN table1 ON orderTbl.orderKey=table1.id
ORDER BY orderTbl.orderIdx
If you replace the VALUES statement with something else that does the same thing, but in ANSI SQL, then this should work on any SQL database.
Note:
The second column in the created table (orderTbl.orderIdx) is necessary when querying record sets larger than 100 or so. I originally didn't have an orderIdx column, but found that with result sets larger than 100 I had to explicitly sort by that column; in SQL Server Express 2014 anyways.

SELECT ORDER_NO, DELIVERY_ADDRESS
from IFSAPP.PURCHASE_ORDER_TAB
where ORDER_NO in ('52000077','52000079','52000167','52000297','52000204','52000409','52000126')
ORDER BY instr('52000077,52000079,52000167,52000297,52000204,52000409,52000126',ORDER_NO)
worked really great

Ans to get sorted data.
SELECT ...
FROM ...
ORDER BY FIELD(user_id,5,3,2,...,50) LIMIT 10

The IN clause describes a set of values, and sets do not have order.
Your solution with a join and then ordering on the display_order column is the most nearly correct solution; anything else is probably a DBMS-specific hack (or is doing some stuff with the OLAP functions in standard SQL). Certainly, the join is the most nearly portable solution (though generating the data with the display_order values may be problematic). Note that you may need to select the ordering columns; that used to be a requirement in standard SQL, though I believe it was relaxed as a rule a while ago (maybe as long ago as SQL-92).

Use MySQL FIND_IN_SET function:
SELECT *
FROM table_name
WHERE id IN (..,..,..,..)
ORDER BY FIND_IN_SET (coloumn_name, .., .., ..);

For Oracle, John's solution using instr() function works. Here's slightly different solution that worked -
SELECT id
FROM table1
WHERE id IN (1, 20, 45, 60)
ORDER BY instr('1, 20, 45, 60', id)

I just tried to do this is MS SQL Server where we do not have FIELD():
SELECT table1.id
...
INNER JOIN
(VALUES (10,1),(3,2),(4,3),(5,4),(7,5),(8,6),(9,7),(2,8),(6,9),(5,10)
) AS X(id,sortorder)
ON X.id = table1.id
ORDER BY X.sortorder
Note that I am allowing duplication too.

Give this a shot:
SELECT name, description, ...
WHERE id IN
(SELECT id FROM table1 WHERE...)
ORDER BY
(SELECT display_order FROM table1 WHERE...),
(SELECT name FROM table1 WHERE...)
The WHEREs will probably take a little tweaking to get the correlated subqueries working properly, but the basic principle should be sound.

My first thought was to write a single query, but you said that was not possible because one is run by the user and the other is run in the background. How are you storing the list of ids to pass from the user to the background process? Why not put them in a temporary table with a column to signify the order.
So how about this:
The user interface bit runs and inserts values into a new table you create. It would insert the id, position and some sort of job number identifier)
The job number is passed to the background process (instead of all the ids)
The background process does a select from the table in step 1 and you join in to get the other information that you require. It uses the job number in the WHERE clause and orders by the position column.
The background process, when finished, deletes from the table based on the job identifier.

I think you should manage to store your data in a way that you will simply do a join and it will be perfect, so no hacks and complicated things going on.
I have for instance a "Recently played" list of track ids, on SQLite i simply do:
SELECT * FROM recently NATURAL JOIN tracks;

How to improve Mysql database performance without changing the db structure

I have a database that is already in use and I have to improve the performance of the system that's using this database.
There are 2 major queries running about 1000 times in a loop and this queries have inner joins to 3 other tables each. This in turn is making the system very slow.
I tried actually to remove the query from the loop and fetch all the data only once and process it in PHP. But this is putting to much load on the memory (RAM) and the system is hanging if 2 or more clients try to use the system.
There is a lot of data in the tables even after removing the expired data .
I have attached the query below.
Can anyone help me with this issue ?
select * from inventory
where (region_id = 38 or region_id = -1)
and (tour_opp_id = 410 or tour_opp_id = -1)
and room_plan_id = 141 and meal_plan_id = 1 and bed_type_id = 1 and hotel_id = 1059
and FIND_IN_SET(supplier_code, 'QOA,QTE,QM,TEST,TEST1,MQE1,MQE3,PERR,QKT')
and ( ('2014-11-14' between from_date and to_date) )
order by hotel_id desc ,supplier_code desc, region_id desc,tour_opp_id desc,inventory.inventory_id desc
SELECT * ,pinfo.fri as pi_day_fri,pinfoadd.fri as pa_day_fri,pinfochld.fri as pc_day_fri
FROM `profit_markup`
inner join profit_markup_info as pinfo on pinfo.profit_id = profit_markup.profit_markup_id
inner join profit_markup_add_info as pinfoadd on pinfoadd.profit_id = profit_markup.profit_markup_id
inner join profit_markup_child_info as pinfochld on pinfochld.profit_id = profit_markup.profit_markup_id
where profit_markup.hotel_id = 1059 and (`booking_channel` = 1 or `booking_channel` = 2)
and (`rate_region` = -1 or `rate_region` = 128)
and ( ( period_from <= '2014-11-14' and period_to >= '2014-11-14' ) )
ORDER BY profit_markup.hotel_id DESC,supplier_code desc, rate_region desc,operators_list desc, profit_markup_id DESC

Since we have not seen your SHOW CREATE TABLES; and EXPLAIN EXTENDED plan it is hard to give you 1 answer
But generally speaking in regard to your query "BTW I re-wrote below"
SELECT
hotel_id, supplier_code, region_id, tour_opp_id, inventory_id
FROM
inventory
WHERE
region_id IN (38, -1)
AND tour_opp_id IN (410, -1)
AND room_plan_id IN (141, 1)
AND bed_type_id IN (1, 1059)
AND supplier_code IN ('QOA', 'QTE', 'QM', 'TEST', 'TEST1', 'MQE1', 'MQE3', 'PERR', 'QKT')
AND ('2014-11-14' BETWEEN from_date AND to_date )
ORDER BY
hotel_id DESC, supplier_code DESC, region_id DESC, tour_opp_id DESC, inventory_id DESC
Do not use * to get all the columns. You should list the column that you really need. Using * is just a lazy way of writing a query. limiting the columns will limit the data size that is being selected.
How often is the records in the inventory are being updates/inserted/delete? If not too often then you can use consider using SQL_CACHE. However, caching a query will cause you problems if you use it and the inventory table is updated very often. In addition, to use query cache you must check the value of query_cache_type on your server. SHOW GLOBAL VARIABLES LIKE 'query_cache_type';. If this is set to "0" then the cache feature is disabled and SQL_CACHE will be ignored. If it is set to 1 then the server will cache all queries unless you tell it not too using NO_SQL_CACHE. If the option is set to 2 then MySQL will cache the query only where SQL_CACHE clause is used. here is documentation about query_cache_type
If you have an index on those following column in this order it will help you (hotel_id, supplier_code, region_id, tour_opp_id, inventory_id)
ALTER TABLE inventory
ADD INDEX (hotel_id, supplier_code, region_id, tour_opp_id, inventory_id);
If possible increase sort_buffer_size on your server as most likely you issue here is that your are doing too much sorting.
As for the second query "BTW I re-wrote below"
SELECT
*, pinfo.fri as pi_day_fri,
pinfoadd.fri as pa_day_fri,
pinfochld.fri as pc_day_fri
FROM
profit_markup
INNER JOIN
profit_markup_info AS pinfo ON pinfo.profit_id = profit_markup.profit_markup_id
INNER JOIN
profit_markup_add_info AS pinfoadd ON pinfoadd.profit_id = profit_markup.profit_markup_id
INNER JOIN
profit_markup_child_info AS pinfochld ON pinfochld.profit_id = profit_markup.profit_markup_id
WHERE
profit_markup.hotel_id = 1059
AND booking_channel IN (1, 2)
AND rate_region IN (-1, 128)
AND period_from <= '2014-11-14'
AND period_to >= '2014-11-14'
ORDER BY
profit_markup.hotel_id DESC, supplier_code DESC, rate_region DESC,
operators_list DESC, profit_markup_id DESC
Again eliminate the use of * from your query
Make sure that the following columns have the same type/collation and same size. pinfo.profit_id, profit_markup.profit_markup_id, pinfoadd.profit_id, pinfochld.profit_id and each one have to have an index on every table. If the columns have different types then MySQL will have to convert the data every time to join the records. Even if you have index it will be slower. Also, if those column are characters type (ie. VARCHAR()) make sure they are of the CHAR() with a collation of latin1_general_ci as this will be faster for finding ID, but if you are using INT() even better.
Use the 3rd and 4th trick I listed for the previous query
Try using STRAIGHT_JOIN "you must know what your doing here or it will bite you!" Here is a good thread about this When to use STRAIGHT_JOIN with MySQL
I hope this helps.

For the first query, I am not sure if you can do much (assuming you have already indexed the fields you are ordering by) apart from replacing the * with column names (Don't expect this to increase the performance drastically).
For the second query, before you go through the loop and put in selection arguments, you could create a view with all the tables joined and ordered then make a prepared statement to select from the view and bind arguments in the loop.
Also, if your php server and the database server are in two different places, it is better if you did the selection through a stored procedure in the database.
(If nothing works out, then memcache is the way to go... Although I have personally never done this)

Here you have increase query performance not an database performance.
For both queries first check index is available on WHERE and ON(Join) clause columns, if index is missing then you have to add index to improve query performance.
Check explain plane before create index.
If possible show me the explain plane of both query that will help us.

Multiple 'Select column where ...' vs Select all the column

I have to select 4 rows randomly from a column.
Is is better to generate randomly 4 id and to perform 4 requests 'select column from database where id = ... '
Or to select all the rows in one request and to choose after?

If you are capable of generating random existing id's, I think the best approach is to use a clause like where id in (id1, id2, id3, id4). This will result in getting 4 records in one query, so no unnecessary query's or records are fetched.

As told before, where id in (id1, id2, id3, id4) is the fastest way from the MySQL perspective. How ever, you will need some logic in the application generating those IDs : All 4 IDs shall exist, be randomly distributed, and you want to avoid duplicates. In worst case you will be retrieving a list of all existend IDs with a huge query, extracting 4 random values, and querying again.
With all that logic to be done, it can be wise to move selection into MySQL:
SELECT * FROM foobar
ORDER BY RAND()
LIMIT 4;
You must understand that this is slow in mysql, but you have a speed gain in the application logic and can be sure to get random values equally seed all over your table.
EDIT:
The comment asks if PHP is fasten in this task then MySQL. Answer is no.
It is not done by "using rand". You need to have an array containing all those IDs in PHP. That is a huge query, lots of TCP traffic, huge array to be buildt in php, huge btree to be buildt by zend engine. Then, with the IDs, you must fire a second query to get the rows for those IDs.

Although the RAND() function may be slow, so far I have not had significant problems with speed. MY strategy is actually to join the database back to a query of itself returning a list of random IDs with a limit.
SELECT *
FROM table AS t1
JOIN (
SELECT rowID
FROM table
ORDER BY RAND()
LIMIT 4
) AS t2
WHERE t1.rowID = t2.rowID
There is also a more robust solution that exist - try checking out this question (asked in 2010).

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 - 7,000,000 or so and then do an IN(...) with the query to only grab those rows - and yes, I know that this method has a caveat in that you may get less than 4 if a record with that id no longer exists.
However, the above method obviously will not work with the category filtering as PHP doesn't know which record numbers belong to which category and hence cannot select the record numbers to select from.
Are there any better ways I can do this? Only way I can think of would be to store the record id's for each category in another table and then select random results from that and then select only those record ID's from the main table in a secondary query; but I'm sure there is a better way!?

You could of course use the RAND() function on a query using a LIMIT and WHERE (for the category). That however as you pointed out, entails a scan of the database which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, to store id/category_id in another table might prove a bit faster but again there has to be a LIMIT and WHERE on that table which will also contain the same amount of records as the master table.
A different approach (if applicable) would be to have a table per category and store in that the IDs. If your categories are fixed or do not change that often, then you should be able to use that approach. In that case you will effectively remove the WHERE from the clause and getting a RAND() with a LIMIT on each category table would be faster since each category table will contain a subset of records from your main table.
Some other alternatives would be to use a key/value pair database just for that operation. MongoDb or Google AppEngine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.

Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
Or you could dynamically generate sql with a union to do the query in one step.
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
Note: my sql may not be valid, as I'm not a mySql guy, but the theory should be sound

First you need to get number of rows ... something like this
select count(1) from tbl where category = ?
then select a random number
$offset = rand(1,$rowsNum);
and select a row with offset
select * FROM tbl LIMIT $offset, 1
in this way you avoid missing ids. The only problem is you need to run second query several times. Union may help in this case.

For MySQl you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4

Returning random rows from mysql database without using rand()

I would like to be able to pull back 15 or so records from a database. I've seen that using WHERE id = rand() can cause performance issues as my database gets larger. All solutions I've seen are geared towards selecting a single random record. I would like to get multiples.
Does anyone know of an efficient way to do this for large databases?
edit:
Further Edit and Testing:
I made a fairly simple table, on a new database using MyISAM. I gave this 3 fields: autokey (unsigned auto number key) bigdata (a large blob) and somemore (a medium int). I then applied random data to the table and ran a series of queries using Navicat. Here are the results:
Query 1: select * from test order by rand() limit 15
Query 2: select *
from
test
join
(select round(rand()*(select max(autokey) from test)) as val from test limit 15) as rnd
on
rnd.val=test.autokey;`
(I tried both select and select distinct and it made no discernible difference)
and:
Query 3 (I only ran this on the second test):
SELECT *
FROM (
SELECT #cnt := COUNT(*) + 1,
#lim := 10
FROM test
) vars
STRAIGHT_JOIN
(
SELECT r.*,
#lim := #lim - 1
FROM test r
WHERE (#cnt := #cnt - 1)
AND RAND(20090301) < #lim / #cnt
) i
ROWS: QUERY 1: QUERY 2: QUERY 3:
2,060,922 2.977s 0.002s N/A
3,043,406 5.334s 0.001s 1.260
I would like to do more rows so I can see how query 3 scales, but at the moment, it seems as though the clear winner is query 2.
Before I wrap up this testing and declare an answer, and while I have all this data and the test environment set up, can anyone recommend any further testing?

Try:
select * from table order by rand() limit 15
Another (and possibly more efficient way) would be to join against a set of random values. This should work, if there's some contiguous integer key in the table. Here is how I would do it in postgres (My MySQL is a bit rusty)
select * from table join
(select (random()*maxid)::integer as val from generate_series(1,15)) as rnd
on rand.val=table.id;
where maxid is the highest id in table. If id has an index, then this would mean only 15 index lookup, so its very fast.
UPDATE:
Looks like there no such thing as generate_series in MySQL. My fault. We don't need it actually:
select *
from
table
join
-- this just returns 15 random numbers.
-- I need `table` here only to produce rows for rand()
(select round(rand()*(select max(id) from table)) as val from table limit 15) as rnd
on
rnd.val=table.id;
P.S. If I don't want duplicates returned, I can use (select distinct [...]) in the random generator expression.

Update: Check out the accepted answer in this question. It's pure mySQL and even deals with even distribution.
The problem with id = rand() or anything comparable in PHP is that you can't be sure whether that particular ID still exists. Therefore, you need to work with LIMIT, and that can become slow for large amounts of data.
As an alternative to that, you could try using a loop in PHP.
What the loop does is
Create a random integer number using rand(), with a scope between 0 and the number of records in the database
Query the database whether a record with that ID exists
If it exists, add the number to an array
If it doesn't, go back to step 1
End the loop when the array of random numbers contains the desired number of elements
this method could cause a lot of queries in a fragmented table, but they should be pretty fast to execute. It may be faster than LIMIT rand() in certain situations.
The LIMIT method, as outlined by #Luther, is certainly the simplest code-wise.

You could do a query with all the results or however many limited, then use mysqli_fetch_all followed by:
shuffle($a);
$a = array_slice($a, 0, 15);

For a large dataset doing
select * from table order by rand() limit 15
can be quite time and memory consuming.
If your data records happen to be numbered you can put and index on the numbering colum and do a
select * from table where no >= rand() limit 15
Or even better do the random number generation in your application and do
select * from table where no >= $rand and no <= $rand+15
If your data doesn't change too often, it might be worth to add such a numbering a column to make the selection efficient.

Assuming MySQL supports nested queries and that operations on the primary key are fast, I'd try something like
select * from table where id in (select id from table order by rand() limit 15)

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.