I'm currently displaying a random row from all the entries and that works fine.
SELECT * FROM $db_table WHERE live = 1 ORDER BY RAND() LIMIT 1
Now I'd like to limit it to the last 100 entries in the DB.
Every row in the DB has an ID and a timestamp.
It's a small database, so overhead minimization is not a priority.
Thanks!
EDIT:
Still can't get it running. I get a mysql_fetch_array error:
"Warning: mysql_fetch_array(): supplied argument is not a valid MySQL result resource"
Here's all of my code:
<?php $sql = "SELECT * FROM
(SELECT * FROM $db_table ORDER BY $datetime DESC LIMIT 100)
ORDER BY rand() LIMIT 1";
$query = mysql_query($sql);
while($row = mysql_fetch_array($query)) {
echo "".$row['familyname']."";
} ?>
Thanks again!
This is what I came up with off the top of my head. I've tested it and it works in SQLite, so you shouldn't have much trouble with MySQL. Two changes to note: SQLite's random function is random(), not rand(), and MySQL (unlike SQLite) requires an alias on the derived table. The missing alias is also what produces the mysql_fetch_array warning above: the query fails, mysql_query() returns false, and false is not a valid result resource:
SELECT * FROM
(SELECT * FROM $db_table ORDER BY $timestamp DESC LIMIT 100) AS latest
ORDER BY rand() LIMIT 1
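For completeness, here is a minimal sketch of the same query wired up with mysqli instead of the deprecated mysql_* functions; the table and column names (a hypothetical entries table with a created_at column, plus the familyname column from the question) are placeholders:
<?php
// A sketch, assuming a hypothetical `entries` table with a `created_at`
// datetime column and the `familyname` column from the question.
$mysqli = new mysqli('localhost', 'user', 'pass', 'database');

$sql = "SELECT * FROM
        (SELECT * FROM entries ORDER BY created_at DESC LIMIT 100) AS latest
        ORDER BY RAND() LIMIT 1";

$result = $mysqli->query($sql);
if ($result === false) {
    die($mysqli->error); // surfaces the SQL error instead of a fetch warning
}
while ($row = $result->fetch_assoc()) {
    echo $row['familyname'];
}
?>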
This page has a pretty detailed writeup on how to optimize an ORDER BY RAND()-type query. It's actually too involved for me to explain adequately on SO (also, I don't fully understand some of the SQL commands used, though the general concept makes sense), but the final optimized query makes use of several optimizations (a condensed sketch follows the list):
First, ORDER BY RAND(), which uses a filesort algorithm on the entire table, is dropped. Instead, a query is constructed to simply generate a single random id.
At this stage, an index scan is being used, which is even less efficient than a filesort in many cases, so this is optimized away with a subquery.
The WHERE clause is replaced with a JOIN to reduce the number of rows fetched by the outer SELECT, and the number of times the subquery is executed, to just 1.
In order to account for holes in the ids (from deletions) and to ensure an equal distribution, a mapping table is created to map row numbers to ids.
Triggers are used to automatically update & maintain the mapping table.
Lastly, stored procedures are created to allow multiple rows to be selected at once. (Here, ORDER BY is reintroduced, but used only on the result rows.)
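As a condensed sketch of the first three optimizations (a hypothetical table t with an integer primary key id; this skips the mapping table and triggers, so it assumes ids without large gaps):

SELECT t.*
FROM t
JOIN (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM t)) AS rid) AS r
  ON t.id >= r.rid
ORDER BY t.id
LIMIT 1;

The derived table produces a single random id, the JOIN guarantees it is computed only once rather than per row, and id >= r.rid with LIMIT 1 tolerates small gaps left by deletions.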
Here are the performance figures:
Q1: ORDER BY RAND()
Q2: RAND() * MAX(ID)
Q3: RAND() * MAX(ID) + ORDER BY ID

Rows  100        1,000      10,000     100,000    1,000,000
Q1    0:00.718s  0:02.092s  0:18.684s  2:59.081s  58:20.000s
Q2    0:00.519s  0:00.607s  0:00.614s  0:00.628s  0:00.637s
Q3    0:00.570s  0:00.607s  0:00.614s  0:00.628s  0:00.637s
Let's start by saying that I can't use INDEXING, as I need the INSERT, DELETE and UPDATE operations on this table to be super fast, which they are.
I have a page that displays a summary of order units collected in a database table. To populate the table, an order number is created and then individual units associated with that order are scanned into the table to record which units are associated with each order.
For the purposes of this example the table has the following columns.
id, UID, order, originator, receiver, datetime
The individual unit quantities can be in the thousands per order, and the entire table is growing to hundreds of thousands of units.
The summary page displays the number of units per order and the first and last unit number for each order. I limit the number of orders to be displayed to the last 30 order numbers.
For example:
Order 10 has 200 units; first UID 1510, last UID 1756
Order 11 has 300 units; first UID 1922, last UID 2831
..........
..........
Currently the response time for the query is about 3 seconds as the code performs the following:
Look up the last 30 orders by id and sort by order number
While looking at each order number in the array
-- Count the number of database rows that have that order number
-- Select the first UID from all the rows as first
-- Select the last UID from all the rows as last
Display the result
I've determined that the majority of the time is taken by the count of the number of units in each order (~1.8 seconds) and by determining the first and last numbers in each order (~1 second).
I am really interested in whether there is a way to speed up these queries without INDEXING. Here is the code with the queries.
The first request selects the last 30 orders processed, selected by id and grouped by order number. This gives the last 30 unique order numbers.
$result = mysqli_query($con, "SELECT `order`, ANY_VALUE(receiver) AS receiver, ANY_VALUE(originator) AS originator, ANY_VALUE(id) AS id
                              FROM scandb
                              GROUP BY `order`
                              ORDER BY id DESC
                              LIMIT 30");
While fetching the last 30 order numbers, count the number of units and find the first and last UID for each order.
while ($row = mysqli_fetch_array($result)) {
    $count = mysqli_fetch_array(mysqli_query($con, "SELECT COUNT(*) AS count FROM scandb WHERE `order` = '".$row['order']."'"));
    $firstLast = mysqli_fetch_array(mysqli_query($con, "SELECT (SELECT UID FROM scandb WHERE `order` = '".$row['order']."' ORDER BY UID LIMIT 1) AS first, (SELECT UID FROM scandb WHERE `order` = '".$row['order']."' ORDER BY UID DESC LIMIT 1) AS last"));
    echo "<td align=center>".$count['count']."</td>";
    echo "<td align=center>".$firstLast['first']."</td>";
    echo "<td align=center>".$firstLast['last']."</td>";
}
With 100K rows in the database this whole process takes about 3 seconds. The majority of the time is in the $count and $firstLast queries. I'd like to know if there is a more efficient way to get this same data in a faster time without indexing the table. Any special tricks that anyone has would be greatly appreciated.
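One commonly suggested rewrite (a sketch, assuming "first" and "last" mean the smallest and largest UID per order) collapses the 61 queries above, one outer plus two per order, into a single aggregate pass:

SELECT `order`,
       COUNT(*) AS unit_count,
       MIN(UID) AS first_uid,
       MAX(UID) AS last_uid
FROM scandb
GROUP BY `order`
ORDER BY MAX(id) DESC
LIMIT 30;

Even without an index this reads the table once instead of 61 times.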
Design your database with caution
This first tip may seem obvious, but the fact is that most database problems come from badly designed table structures.
For example, I have seen people storing information such as client info and payment info in the same database column. For both the database system and the developers who will have to work on it, this is not a good thing.
When creating a database, always spread information across multiple tables, use clear naming standards and make use of primary keys.
Know what you should optimize
If you want to optimize a specific query, it is extremely useful to be able to get an in-depth look at how it is executed. Using the EXPLAIN statement, you will get lots of useful info about the execution plan MySQL chooses for a specific query, as shown in the example below:
EXPLAIN SELECT * FROM ref_table, other_table WHERE ref_table.key_column = other_table.column;
Don’t select what you don’t need
A very common way to get the desired data is to use the * symbol, which will get all fields from the desired table:
SELECT * FROM wp_posts;
Instead, you should definitely select only the desired fields as shown in the example below. On a very small site with, let’s say, one visitor per minute, that wouldn’t make a difference. But on a site such as Cats Who Code, it saves a lot of work for the database.
SELECT title, excerpt, author FROM wp_posts;
Avoid queries in loops
When using SQL along with a programming language such as PHP, it can be tempting to use SQL queries inside a loop. But doing so is like hammering your database with queries.
This example illustrates the whole “queries in loops” problem:
foreach ($display_order as $id => $ordinal) {
    $sql = "UPDATE categories SET display_order = $ordinal WHERE id = $id";
    mysql_query($sql);
}
Here is what you should do instead:
UPDATE categories
SET display_order = CASE id
WHEN 1 THEN 3
WHEN 2 THEN 4
WHEN 3 THEN 5
END
WHERE id IN (1,2,3)
Use join instead of subqueries
As a programmer, subqueries are something you can be tempted to use and abuse. Subqueries, as shown below, can be very useful:
SELECT a.id,
       (SELECT MAX(created)
        FROM posts
        WHERE author_id = a.id) AS latest_post
FROM authors a
Although subqueries are useful, they can often be replaced by a join, which is usually faster to execute.
SELECT a.id, MAX(p.created) AS latest_post
FROM authors a
INNER JOIN posts p
ON (a.id = p.author_id)
GROUP BY a.id
Source: http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
I'm trying to make a quiz using MySQL and PHP.
I want to fetch questions randomly from the database.
I'm having trouble with the RAND function:
sometimes it returns no value, and it also returns duplicates.
I tried to find a solution on the net but couldn't make it work.
This is the part of the code that generates the problem:
$link=mysqli_connect("localhost","root","","database");
$req="SELECT DISTINCT *
FROM `qst_s`
WHERE `id_qst` = ROUND( RAND()*49 ) + 1 AND `level` = '1' LIMIT 40";
$result=mysqli_query($link,$req);
$question=$result->fetch_assoc();
By the way, I have 50 level-1 questions in my database.
The simplest way of selecting random rows from a MySQL database is to use the ORDER BY RAND() clause in the query.
SELECT * FROM `table` ORDER BY RAND() LIMIT 0,1;
The problem with this method is that it is very slow. The reason for it being so slow is that MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.
Your query would look like this:
$req="SELECT DISTINCT *
FROM `qst_s`
WHERE `level` = '1'
ORDER BY RAND()
LIMIT 40";
I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to have PHP pick some random numbers between 1 and 7,000,000 or so and then do an IN(...) in the query to grab only those rows; and yes, I know this method has a caveat in that you may get fewer than 4 rows if a record with one of those ids no longer exists.
However, the above method obviously will not work with the category filtering, as PHP doesn't know which record ids belong to which category and hence cannot choose which ids to select.
Are there any better ways I can do this? The only way I can think of would be to store the record IDs for each category in another table, select random results from that, and then select only those record IDs from the main table in a secondary query; but I'm sure there is a better way!?
You could of course use the RAND() function in a query using a LIMIT and a WHERE (for the category). That, however, as you pointed out, entails a scan of the table, which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, is to store id/category_id pairs in another table. That might prove a bit faster, but again there has to be a LIMIT and a WHERE on that table, which will also contain the same number of records as the master table.
A different approach (if applicable) would be to have a table per category and store the IDs in it. If your categories are fixed or do not change that often, you should be able to use that approach. In that case you effectively remove the WHERE clause, and a RAND() with a LIMIT on each category table will be faster since each category table contains only a subset of the records from your main table.
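A sketch of that per-category layout (hypothetical names; the ID table would need to be kept in sync on insert/delete, for example with triggers):

-- Hypothetical table holding only the ids of one category.
CREATE TABLE category_12_ids (
    id INT UNSIGNED NOT NULL PRIMARY KEY
);

-- RAND()/LIMIT now sorts a much smaller table, and the WHERE is gone.
SELECT m.*
FROM category_12_ids c
JOIN myTable m ON m.id = c.id
ORDER BY RAND()
LIMIT 4;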
Some other alternatives would be to use a key/value store just for that operation; MongoDB or Google App Engine can help with that and are really fast.
You could also go towards a master/slave setup in MySQL. The slave replicates content in real time, but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally, you could go with Sphinx, which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offload this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.
Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times, doing the following:
Generate a random number between 1 and maxId
Get the first record with a record ID greater than the random number and insert it into your temp table
Your temp table now contains your random results.
Or you could dynamically generate SQL with a UNION to do the query in one step.
(SELECT m.* FROM myTable m JOIN (SELECT FLOOR(RAND() * (SELECT MAX(ID) FROM myTable)) AS rid) r ON m.ID >= r.rid WHERE m.Category = 'zzz' ORDER BY m.ID LIMIT 1)
UNION
(SELECT m.* FROM myTable m JOIN (SELECT FLOOR(RAND() * (SELECT MAX(ID) FROM myTable)) AS rid) r ON m.ID >= r.rid WHERE m.Category = 'zzz' ORDER BY m.ID LIMIT 1)
UNION
(SELECT m.* FROM myTable m JOIN (SELECT FLOOR(RAND() * (SELECT MAX(ID) FROM myTable)) AS rid) r ON m.ID >= r.rid WHERE m.Category = 'zzz' ORDER BY m.ID LIMIT 1)
UNION
(SELECT m.* FROM myTable m JOIN (SELECT FLOOR(RAND() * (SELECT MAX(ID) FROM myTable)) AS rid) r ON m.ID >= r.rid WHERE m.Category = 'zzz' ORDER BY m.ID LIMIT 1)
Note: my SQL may not be valid, as I'm not a MySQL guy, but the theory should be sound. Also note that UNION removes duplicate rows, so if two branches happen to land on the same row you may occasionally get fewer than 4 results.
First you need to get the number of rows, something like this:
select count(1) from tbl where category = ?
then select a random number
$offset = rand(0, $rowsNum - 1);
and select a row with offset
select * FROM tbl WHERE category = ? LIMIT $offset, 1
In this way you avoid missing IDs. The only problem is that you need to run the second query several times. A UNION may help in this case.
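Put together in PHP, the count-then-offset approach might look like this (a sketch assuming mysqli prepared statements and the tbl/category names above; get_result() requires the mysqlnd driver):

// Count the rows in the category.
$stmt = $mysqli->prepare("SELECT COUNT(*) FROM tbl WHERE category = ?");
$stmt->bind_param('s', $category);
$stmt->execute();
$rowsNum = $stmt->get_result()->fetch_row()[0];

// LIMIT offsets are zero-based, hence 0 .. $rowsNum - 1.
$offset = rand(0, $rowsNum - 1);

// Fetch the single row at that random offset.
$stmt = $mysqli->prepare("SELECT * FROM tbl WHERE category = ? LIMIT ?, 1");
$stmt->bind_param('si', $category, $offset);
$stmt->execute();
$row = $stmt->get_result()->fetch_assoc();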
For MySQL you can use RAND():
SELECT column FROM table
ORDER BY RAND()
LIMIT 4
I would like to be able to pull back 15 or so records from a database. I've seen that using WHERE id = rand() can cause performance issues as my database gets larger. All solutions I've seen are geared towards selecting a single random record. I would like to get multiples.
Does anyone know of an efficient way to do this for large databases?
EDIT:
Further Edit and Testing:
I made a fairly simple table on a new database using MyISAM. I gave it 3 fields: autokey (an unsigned auto-increment key), bigdata (a large blob) and somemore (a medium int). I then filled the table with random data and ran a series of queries using Navicat. Here are the results:
Query 1: select * from test order by rand() limit 15
Query 2:
select * from test
join (select round(rand()*(select max(autokey) from test)) as val from test limit 15) as rnd
on rnd.val = test.autokey;
(I tried both select and select distinct and it made no discernible difference)
and:
Query 3 (I only ran this on the second test):
SELECT *
FROM (
    SELECT @cnt := COUNT(*) + 1,
           @lim := 10
    FROM test
) vars
STRAIGHT_JOIN
(
    SELECT r.*,
           @lim := @lim - 1
    FROM test r
    WHERE (@cnt := @cnt - 1)
      AND RAND(20090301) < @lim / @cnt
) i
Rows        Query 1   Query 2   Query 3
2,060,922   2.977s    0.002s    N/A
3,043,406   5.334s    0.001s    1.260s
I would like to do more rows so I can see how query 3 scales, but at the moment, it seems as though the clear winner is query 2.
Before I wrap up this testing and declare an answer, and while I have all this data and the test environment set up, can anyone recommend any further testing?
Try:
select * from table order by rand() limit 15
Another (and possibly more efficient) way would be to join against a set of random values. This should work if there's some contiguous integer key in the table. Here is how I would do it in Postgres (my MySQL is a bit rusty):
select * from table join
(select (random()*maxid)::integer as val from generate_series(1,15)) as rnd
on rnd.val = table.id;
where maxid is the highest id in the table. If id has an index, then this would mean only 15 index lookups, so it's very fast.
UPDATE:
Looks like there's no such thing as generate_series in MySQL. My fault. We don't need it, actually:
select *
from
table
join
-- this just returns 15 random numbers.
-- I need `table` here only to produce rows for rand()
(select round(rand()*(select max(id) from table)) as val from table limit 15) as rnd
on
rnd.val=table.id;
P.S. If I don't want duplicates returned, I can use (select distinct [...]) in the random generator expression.
Update: Check out the accepted answer in this question. It's pure MySQL and even deals with even distribution.
The problem with id = rand() or anything comparable in PHP is that you can't be sure whether that particular ID still exists. Therefore, you need to work with LIMIT, and that can become slow for large amounts of data.
As an alternative to that, you could try using a loop in PHP.
What the loop does is
Create a random integer using PHP's rand(), between 1 and the maximum ID in the table (using the record count instead would skip IDs above it wherever there are gaps)
Query the database whether a record with that ID exists
If it exists, add the number to an array
If it doesn't, go back to step 1
End the loop when the array of random numbers contains the desired number of elements
This method could cause a lot of queries on a fragmented table, but they should be pretty fast to execute. It may be faster than the LIMIT rand() approach in certain situations.
The LIMIT method, as outlined by @Luther, is certainly the simplest code-wise.
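A minimal sketch of that loop (assuming mysqli, a hypothetical `users` table with an integer primary key `id`, and a random range of 1 to MAX(id) so that gaps are handled by retrying):

$maxId = mysqli_fetch_row(mysqli_query($con, "SELECT MAX(id) FROM users"))[0];

$ids = array();
$desired = 15;
while (count($ids) < $desired) {
    $candidate = rand(1, $maxId);
    if (isset($ids[$candidate])) {
        continue; // already picked, retry
    }
    // Step 2: does a record with this ID exist?
    $check = mysqli_query($con, "SELECT id FROM users WHERE id = $candidate");
    if (mysqli_num_rows($check) === 1) {
        $ids[$candidate] = true; // keyed by id to avoid duplicates
    }
}
$idList = implode(',', array_keys($ids));
$result = mysqli_query($con, "SELECT * FROM users WHERE id IN ($idList)");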
You could do a query for all the results (or however many, limited), then use mysqli_fetch_all followed by:
shuffle($a);
$a = array_slice($a, 0, 15);
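Spelled out, that could look like this (a sketch assuming mysqli with the mysqlnd driver and a hypothetical quotes table; the whole result set is pulled into PHP, so this only suits smaller tables):

// Fetch everything, shuffle in PHP, keep 15 rows.
$result = mysqli_query($con, "SELECT * FROM quotes");
$a = mysqli_fetch_all($result, MYSQLI_ASSOC);
shuffle($a);
$a = array_slice($a, 0, 15);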
For a large dataset, doing
select * from table order by rand() limit 15
can be quite time- and memory-consuming.
If your data records happen to be numbered, you can put an index on the numbering column and do a
select t.* from table t join (select floor(rand() * max(no)) as r from table) rnd on t.no >= rnd.r limit 15
Or, even better, do the random number generation in your application and do
select * from table where no >= $rand and no < $rand + 15
If your data doesn't change too often, it might be worth adding such a numbering column to make the selection efficient. Note that both variants return a contiguous block of 15 rows starting at a random position, not 15 independently random rows.
Assuming MySQL supports nested queries and that operations on the primary key are fast, I'd try something like
select * from table where id in (select id from table order by rand() limit 15)
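Note that many MySQL versions reject a LIMIT inside an IN subquery; a common workaround is to wrap it in a derived table:

SELECT *
FROM table
WHERE id IN (
    -- the extra derived-table layer sidesteps the LIMIT-in-IN restriction
    SELECT id FROM (
        SELECT id FROM table ORDER BY rand() LIMIT 15
    ) AS random_ids
);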
To randomly select records from one table, do I always have to set a temporary variable in PHP? I need some help with selecting random rows within a CodeIgniter model, and then displaying three different ones in a view every time my homepage is viewed. Does anyone have any thoughts on how to solve this issue? Thanks in advance!
If you don't have a ton of rows, you can simply:
SELECT * FROM myTable ORDER BY RAND() LIMIT 3;
If you have many rows, this will get slow, but for smaller data sets it will work fine.
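In a CodeIgniter model, that could look something like this (a sketch using the framework's generic $this->db->query() method; `quotes` is a hypothetical table name):

class Quote_model extends CI_Model {

    // Return $limit random rows as associative arrays.
    public function get_random($limit = 3)
    {
        $query = $this->db->query(
            "SELECT * FROM quotes ORDER BY RAND() LIMIT " . (int) $limit
        );
        return $query->result_array();
    }
}

The view can then loop over the three returned rows.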
As Steve Michel mentions in his answer, this method can get very ugly for large tables. His suggestion is a good place to jump off from. If you know the approximate maximum integer PK on the table, you can generate a random number between one and your max PK value, then grab random rows one at a time, like:
$q="SELECT * FROM table WHERE id >= {$myRandomValue}";
$row = $db->fetchOne($q); // or whatever CI's interface for grabbing a single row looks like
Of course, if you need 3 random rows, you'll have three queries here, but as they're entirely on the PK, they'll be fast(er than randomizing the whole table).
I would do something like:
SELECT * FROM table ORDER BY RAND() LIMIT 1;
This will put the data in a random order and then return only the first row from that random order.
I have this piece of code in production to get a random quote. Using MySQL's RAND function was super slow. Even with 100 quotes in the database, I was noticing a lag time on the website. With this, there was no lag at all.
$result = mysql_query('SELECT COUNT(*) FROM quotes');
$count = mysql_fetch_row($result);
$id = rand(1, $count[0]);
$result = mysql_query("SELECT author, quote FROM quotes WHERE id=$id");
You need a query like this:
SELECT *
FROM tablename
WHERE somefield='something'
ORDER BY RAND() LIMIT 3
It is taken from the second result of
http://www.google.com/search?q=mysql+random
and it should work ;)
Ordering a big table by rand() can be very expensive, since MySQL needs to build a temporary table and sort the whole thing. If you have a primary key and you know how many rows are in the table, use LIMIT x,1 to grab a random row, where x is a randomly chosen row number.
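A minimal sketch of that LIMIT x,1 approach (assuming mysqli and a hypothetical quotes table):

// Count the rows, pick a random zero-based offset, fetch one row.
$count = mysqli_fetch_row(mysqli_query($con, "SELECT COUNT(*) FROM quotes"))[0];
$x = rand(0, $count - 1);
$row = mysqli_fetch_assoc(mysqli_query($con, "SELECT * FROM quotes LIMIT $x, 1"));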