Currently what I'm trying to do is get a chunk of 30 numbers from a set of 200. For example, since this will be used with MySQL, I want to select 30 random images from a database of 200 images. I'd like to be able to pick a random starting point to use in the LIMIT clause so the query returns 30 rows, like this: SELECT * FROM `images` LIMIT 20, 30 or SELECT * FROM `images` LIMIT 10, 30. I know this probably sounds like a stupid question, but my brain is just kind of stumped right now. All help is greatly appreciated! Thanks :)
Simply add ORDER BY RAND() to your query. It is "sufficiently" random.
SELECT FOO FROM BAR ORDER BY RAND() LIMIT 30
Using ORDER BY RAND() is considered an antipattern because you're forcing the database to perform a full table scan and expensive sort operation.
To benefit from query caching, you could make a hybrid solution like:
// take all image ids
$keys = $db->query('SELECT image_id FROM images')->fetchAll(PDO::FETCH_COLUMN, 0);
// pick 30
$random_keys = join(',', array_rand(array_flip($keys), 30));
// query those images only
$values = $db->query("SELECT * FROM images WHERE image_id IN ($random_keys)")->fetchAll(PDO::FETCH_ASSOC);
The above query for the keys can be cached in PHP, so it can be used for more frequent requests. However, when your table grows to the range of 100k rows or more, I would suggest creating a separate table with the image ids in a randomized order that you can join against the images table. You can populate it once or a few times per day using ORDER BY RAND().
I would suggest using PHP's array_rand().
http://php.net/manual/en/function.array-rand.php
Put whatever you want to choose from in an array, and let it pick 20 entries for you. Then, you can use whatever file names you want without having to rely on them being in numerical order.
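A minimal sketch of that idea; the filenames below are stand-ins for whatever your application actually uses:

```php
<?php
// Stand-in list of image filenames -- substitute your own.
$images = [];
for ($i = 1; $i <= 200; $i++) {
    $images[] = "img_{$i}.jpg";
}

// array_rand() returns 20 distinct random keys; map them back to filenames.
$keys = array_rand($images, 20);
$picked = [];
foreach ($keys as $k) {
    $picked[] = $images[$k];
}
```

Note that with a count greater than 1, array_rand() returns the chosen keys in ascending order, so shuffle $picked afterwards if the display order should be random too.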
Related
So I have a table with 45 records (but can be dynamic) and I use mysql_fetch_array() to get the data from the database. What is the best way to output 5 records at a time? So I need to do record 1-5, then have a link for records 6-10, 11-15, and so on. I thought about doing something with array_chunk but not sure how to keep track of the record number. Thanks for hints.
To get the first 5 results from a table:
SELECT * FROM table ORDER BY table.column_name ASC LIMIT 0, 5
Selects from `table`
Ordered by the column name Ascending
Limit 0,5 selects the first 5 results, starting at 0.
Change LIMIT 0,5 to 5,5 to list results 6-10 (start at record 5, and continue for 5 records.)
Ordering is just good practice to ensure consistency. Under most circumstances, set this to 'id' if you have an auto-increment 'id' column. If you want results sorted by date, order by a timestamp column. If you want the data reversed, add DESC to the ORDER BY.
You can keep track of where your queries are though PHP Sessions, Passing GET parameters, temporary database tables, and probably a few more I missed.
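As a sketch of the GET-parameter approach (the parameter name `page` and the table/column names are just examples), the offset works out to:

```php
<?php
// Hypothetical ?page=N parameter; page 1 shows rows 1-5, page 2 rows 6-10, etc.
$perPage = 5;
$page = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$offset = ($page - 1) * $perPage;

// Feed the computed offset into the LIMIT clause.
$sql = "SELECT * FROM `table` ORDER BY `id` ASC LIMIT $offset, $perPage";
```

The "next" link then simply points at the current URL with page incremented by one.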
Other solution:
Use the array returned from mysql_fetch_array() and utilize PHP's array functions: http://php.net/manual/en/language.types.array.php
The obvious disadvantage to this approach is the fact that it fetches all rows in the table. This is okay if you'll NEVER have more than a manageable number of rows. In your case, 45 should be fine, assuming they're not gigantic rows. This approach may also prove useful if you want the data pre-loaded.
I'd suggest using limits and incremental offsets in your query. Your first query would then be:
select * from TABLE limit 0,5;
Your link has a parameter referencing the next offset so the next query would be:
select * from TABLE limit 5,5;
And so on.
You need LIMIT 0,5 in your query. Search the web for a PHP paginator.
Currently, I am using this query in my PHP script:
SELECT * FROM `ebooks` WHERE `id`!=$ebook[id] ORDER BY RAND() LIMIT 125;
The database will be about 2500 rows big at max, but I've read that ORDER BY RAND() eventually will slow down the processing time as the data in the database grows.
So I am looking for an alternate method for my query to make things still run smoothly.
Also, I noticed that ORDER BY RAND() is not truly randomizing the rows, because often I see that it follows some kind of pattern that sometimes repeats over and over again.
Is there any method to truly randomize the rows?
The RAND() function is a pseudo-random number generator; if you do not initialize it with different values, it will give you the same sequence of numbers. So what you should do is:
SELECT * FROM `ebooks` WHERE `id`!=$ebook[id] ORDER BY RAND(UNIX_TIMESTAMP()) LIMIT 125;
which will seed the random number generator from the current time and will give you a different sequence of numbers.
RAND() will slow down the SELECT because the ORDER BY clause has to generate a random number for every row and then sort by it. I would suggest you have the data returned to the calling program and randomize it there using something like array_rand().
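A sketch of that suggestion; the `range()` call below stands in for the ids you would actually fetch with something like SELECT id FROM `ebooks` WHERE `id` != ...:

```php
<?php
// Stand-in for the id list fetched from the `ebooks` table.
$ids = range(1, 2500);

// Randomize in PHP instead of in MySQL, then keep the first 125.
shuffle($ids);
$picked = array_slice($ids, 0, 125);
```

You would then fetch the full rows with a single WHERE id IN (...) query against the picked ids.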
This question has already been answered:
quick selection of a random row from a large table in mysql
Here too:
http://snippetsofcode.wordpress.com/2011/08/01/fast-php-mysql-random-rows/
Please note I am a beginner at this.
I have two questions:
1) How can I order the results of a query randomly?
example query:
$get_questions = mysql_query("SELECT * FROM item_bank_tb WHERE item_type=1 OR item_type=3 OR item_type=4");
2) What is the best method to select random rows from a table? So let's say I want to grab 10 rows at random from a table.
Many thanks,
If you don't mind sacrificing complexity in the insert/update/delete operations for speed on the select, you can always add a sequence number and make sure it's maintained on insert/update/delete. Then, whenever you do a select, simply select on one or more random numbers from within this range. If the "sequence" column is indexed, I think that's about as fast as you'll get.
An alternative is "shuffling". Add a sequence column, insert random values into it, and whenever you select records, order by the sequence column and update the selected records' sequences to new random values. The update should only affect the records you've retrieved, so it shouldn't be too costly... but it may be worth running some tests against your dataset.
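A rough sketch of the shuffling idea; the `seq` column, `image_id` column, and table name are all hypothetical, and the statements are shown as strings rather than executed against a live connection:

```php
<?php
// Serve the 30 rows with the smallest shuffle values.
$select = "SELECT * FROM images ORDER BY seq ASC LIMIT 30";

// After fetching, re-randomize only the rows just served.
$served = [1, 5, 9]; // stand-in for the image_ids actually returned
$in = implode(',', array_map('intval', $served));
$update = "UPDATE images SET seq = FLOOR(RAND() * 1000000) WHERE image_id IN ($in)";
```

With an index on `seq`, the SELECT avoids the filesort that ORDER BY RAND() forces.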
This may be a fairly evil thing to say, but I'll say it anyway: is there ever a need to display 'random' data? If you're trying to display random records, you may be doing something wrong.
Think about Amazon ... do they display random products, or do they display popular ones, and 'things other people bought when they looked at this'. Does SO give you a list of random questions to the right of here, or a list of related ones? Just some food for thought.
SELECT * FROM item_bank_tb WHERE item_type IN (1,3,4) ORDER BY RAND() LIMIT 10
Beware that ORDER BY RAND() is very slow on large record sets.
EDIT. Take a look at this very interesting article that presents a different approach.
http://explainextended.com/2009/03/01/selecting-random-rows/
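One approach in that style (a sketch, not taken verbatim from the article) is to join against a random id instead of sorting the whole table; it assumes an auto-increment `id` column with few gaps:

```sql
-- Pick one random row without sorting the whole table.
SELECT t.*
FROM item_bank_tb AS t
JOIN (SELECT FLOOR(RAND() * (SELECT MAX(id) FROM item_bank_tb)) AS rid) AS r
  ON t.id >= r.rid
ORDER BY t.id ASC
LIMIT 1;
```

Note that gaps in the id sequence skew the distribution slightly toward the rows that follow them.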
Which is more efficient (when managing over 100K records)?
A. Mysql
SELECT * FROM user ORDER BY RAND();
Of course, after that I would already have all the fields from that record.
B. PHP
Use memcached to have $cache_array hold all the data from "SELECT id_user FROM user ORDER BY id_user" for an hour or so, and then:
$id = array_rand($cache_array);
Of course, after that I have to make a MySQL call with:
SELECT * FROM user WHERE id_user = $id;
So... which is more efficient, A or B?
The proper way to answer this kind of question is to do a benchmark. Do a quick and dirty implementation each way and then run benchmark tests to determine which one performs better.
Having said that, ORDER BY RAND() is known to be slow because it's impossible for MySQL to use an index. MySQL will basically run the RAND() function once for each row in the table and then sort the rows based on what came back from RAND().
Your other idea of storing all user_ids in memcached and then selecting a random element from the array might perform better if the overhead of memcached proves to be less than the cost of a full table scan. If your dataset is large or staleness is a problem, you may run into issues, though. Also, you're adding some complexity to your application. I would try to look for another way.
I'll give you a third option which might outperform both your suggestions: select a COUNT(user_id) of the rows in your user table, then have PHP generate a random number between 0 and that count minus 1, inclusive. Then do a SELECT * FROM user LIMIT 1 OFFSET random-number-generated-by-php;.
Again, the proper way to answer these types of questions is to benchmark. Anything else is speculation.
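A minimal sketch of that third option; the hard-coded count stands in for the result of SELECT COUNT(user_id) FROM user:

```php
<?php
// Stand-in for the row count returned by MySQL.
$count = 200;

// mt_rand() is inclusive on both ends, so subtract 1 for a zero-based offset.
$offset = mt_rand(0, $count - 1);

$sql = "SELECT * FROM user LIMIT 1 OFFSET $offset";
```

If the table shrinks between the COUNT and the SELECT, the offset can point past the last row, so check for an empty result.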
The first one is incredibly slow because MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.
It's elaborated more on this blog post.
$random_no = mt_rand(0, $total_record_count - 1); // LIMIT offsets are zero-based
$query = "SELECT * FROM user ORDER BY __KEY__ LIMIT {$random_no}, 1";
I've been reading a lot about the disadvantages of using "order by rand" so I don't need update on that.
I was thinking, since I only need a limited amount of rows retrieved from the db to be randomized, maybe I should do:
$r = $db->query("select * from table limit 500");
for($i=0;$i<500;$i++)
$arr[$i]=mysqli_fetch_assoc($r);
shuffle($arr);
(I know this only randomizes the first 500 rows; so be it.)
would that be faster than
$r = $db->query("select * from table order by rand() limit 500");
Let me just mention that the db tables could be packed with more than... 10,000 rows.
Why don't you do it yourself?!? Well, I have, but I'm looking for your experienced opinion.
thanks!
500 or 10K, the sample size is too small to be able to draw tangible conclusions. At 100K, you're still looking at the 1/2 second region on this graph. If you're still concerned with performance, look at the two options for a randomized number I provided in this answer.
We don't have your data or setup, so it's left to you to actually test the situation. There are numerous pages for how to calculate elapsed time in PHP - create two pages, one using shuffle and the other using the RAND() query. Run at least 10 of each, & take a look.
I am looking at this from experience with MySQL.
Let's talk about the first piece of code:
$r = $db->query("select * from table");
for($i=0;$i<500;$i++){
$arr[$i] = mysqli_fetch_assoc($r);
}
shuffle($arr);
Clearly it would be more efficient to LIMIT the number of rows in the SQL statement instead of doing it on PHP.
Thus:
$r = $db->query("SELECT * FROM table LIMIT 500");
while($row = mysqli_fetch_assoc($r)){ $arr[] = $row; }
shuffle($arr);
The SQL operation would be faster than doing it in PHP, especially when you have such a large number of rows. One good way to find out is to benchmark both and see which of the two is faster. My bet is that the SQL would be faster than shuffling in PHP.
So my vote goes for:
$r = $db->query("SELECT * FROM table ORDER BY RAND() LIMIT 500");
while($row = mysqli_fetch_assoc($r)){ $arr[] = $row; }
I'm pretty sure the shuffle takes longer in your case, but you may want to see this link for examples of fetching fast random sets from the database. It requires a bit of extra SQL, but if speed is important to you, then do this.
http://devzone.zend.com/article/4571-Fetching-multiple-random-rows-from-a-database