SELECT 'random' results with set parameters - php

I am developing a system which selects questions from a database to generate a test/exam.
Each question has a set 'question type' ('q_type'). I have a database of questions and need to select 4 questions, each with a different 'q_type'.
The basic query to select 4 random questions at the moment is:
SELECT * FROM questions ORDER BY RAND() LIMIT 0,4
This obviously does not take into account the fact that each question should have a different 'q_type'.
I would like to be able to do something that follows this logic (I need something to fill in the square brackets):
SELECT * FROM questions WHERE ['q_type' is DISTINCT] ORDER BY RAND() LIMIT 0,4
I have tried using GROUP BY 'q_type', but that simply gives the first question for each 'q_type', not a different question of that type each time.
Any help would be great, as I am completely stumped at the moment (I am working with an overblown PHP loop which simply queries the DB 4 times, with an updated WHERE 'q_type'!=XX each time).
Thanks for any help!

I don't think there's an easy way to do this with simple queries. You could probably do it with a MySQL stored procedure or, to save some development time, do it in PHP.
If the pool of questions isn't extremely large, and this doesn't happen too frequently, you should be OK with the 4 separate queries (one for each q_type), or even with loading the entire question set into PHP and then picking 4 random questions from it, one for each q_type.
The following (untested) code fetches all questions in random order and loops through them, keeping the first question it sees for each q_type and stopping once it has four. It's not the most elegant solution but, again, for small enough data sets and low enough frequency it should be OK.
$questions = array();
$res = mysql_query('SELECT * FROM questions ORDER BY RAND()');
while ($row = mysql_fetch_assoc($res)) {
    // if we have all 4, stop
    if (count($questions) == 4) {
        break;
    }
    $currType = $row['q_type'];
    $currQuestion = $row['questionTitle'];
    if (isset($questions[$currType])) {
        // already have a question of this type, continue to the next row
        continue;
    }
    $questions[$currType] = $currQuestion;
}
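The same idea can be sketched without touching the database at all, assuming the full question set has already been loaded into a PHP array. The helper name pickOnePerType and the sample rows below are made up for illustration; only the q_type and questionTitle keys come from the code above.

```php
<?php
// Hypothetical helper: pick one random row per distinct q_type,
// stopping once $limit different types have been collected.
// $rows stands in for the result of "SELECT * FROM questions".
function pickOnePerType(array $rows, int $limit = 4): array
{
    shuffle($rows); // randomize in PHP instead of ORDER BY RAND()
    $picked = [];
    foreach ($rows as $row) {
        $type = $row['q_type'];
        if (!isset($picked[$type])) {
            $picked[$type] = $row; // first row seen for this type wins
        }
        if (count($picked) === $limit) {
            break; // we have one question for each of $limit types
        }
    }
    return $picked; // keyed by q_type
}

// Sample data, invented for this sketch.
$rows = [
    ['id' => 1, 'q_type' => 'A', 'questionTitle' => 'A1'],
    ['id' => 2, 'q_type' => 'A', 'questionTitle' => 'A2'],
    ['id' => 3, 'q_type' => 'B', 'questionTitle' => 'B1'],
    ['id' => 4, 'q_type' => 'B', 'questionTitle' => 'B2'],
    ['id' => 5, 'q_type' => 'C', 'questionTitle' => 'C1'],
    ['id' => 6, 'q_type' => 'D', 'questionTitle' => 'D1'],
    ['id' => 7, 'q_type' => 'D', 'questionTitle' => 'D2'],
];
$questions = pickOnePerType($rows);
```

Because the shuffle happens before the loop, each call may return a different question for a given type, which is the behaviour GROUP BY did not provide.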

I'm by no means an SQL expert, but does this work?
SELECT DISTINCT `q_type`, * FROM questions ORDER BY RAND() LIMIT 0, 4


Having trouble spitting out mysqli random numbers

I need to populate 2x user names from a MySQL db randomly.
Say I have clinton, sanders and trump as users in my db (sorry couldn't help myself).
I want just two users randomly entered into variables. I have read too many scripts and got myself horribly confused.
$result = mysqli_query($mdb, "SELECT user FROM `table`");
$row = mysqli_fetch_row(array_rand($result,2));
echo $row[0] . "\n";
echo $row[1] . "\n";
I'm actually going to hit the sack as my teddy bear is lonely, so will review and reply to answers tomorrow morning. Thanks.
I think you should leave your teddy bear alone once more and use this query directly instead
$result = mysqli_query($mdb, "SELECT user FROM `table` ORDER BY RAND() LIMIT 0, 2");
This would automatically give you two random entries from the DB, and you won't need to use array_rand() anymore.
Regarding the question in the comments about the implications on a large table: after some research, the impact would be significant. I found a comment on http://www.petefreitag.com/item/466.cfm that offers a better option:
If you have a table with many rows, you can make this query much
faster. A simple SELECT * FROM table ORDER BY RAND() has to do a sort
in memory- expensive. If you change the query to SELECT * FROM table
WHERE RAND()>0.9 ORDER BY RAND() then you've reduced the overhead of
the sort by 90%. This will have the overhead of many RAND()
statements, but on DB2 mainframes, it is a much faster technique.

PHP/MYSQL: Slowly iterate through 6k rows and for every row create new records - Algorithm

I'm sorry for the stupid question, but I'm having one of those days where I feel like the dumbest programmer, and I need your help. I'm currently developing with PHP and MySQL, where I'm pretty low-skilled, and I'm working on an inherited project.
I have a database table with almost 6k records in it, let's say TABLE_A, and I need to iterate through the records in TABLE_A and, for every record, create two new records in TABLE_B, where the PK from TABLE_A (Id) is the FK in TABLE_B. Nothing special, right? There is one more thing: this is happening (please don't blame me) in the production DB. So I got a request to run the insertion into TABLE_B for only 10 records every second. Furthermore, the list of Ids has gaps and looks like this: 1,2,4,6,7,8,9,11,12,15,16,.. up to 6k. So I basically cannot do:
for ($i = 1; $i <= MAX(id); $i++) {
//create two new records in TABLE B
}
I have spent some time with the research and I need to talk about it with you guys, to come up with some ideas. I don't want from you the exact solution, but I want to learn how to think about that and how to come up with the solution. I was thinking about it on my way home. So I just created the algorithm in my head. Here is step-by-step process in my head about what I know and what I will probably use:
I know that I can run just 10 inserts per 1 second - so I need to limit the select from TABLE A for just 5 rows in one batch.
So I can probably use MySQL syntax: LIMIT and OFFSET, for example: select * from t LIMIT 5 OFFSET 0
This means that I have to store the id of the last record from the previous batch.
After finishing the current batch, I need to wait for 1 second (I'm thinking about using PHP's sleep() function) before starting a new batch.
I need loop
The exact number of rows in TABLE_A is for now unusable
The insertion of new records is simple. Focus on the iteration.
So here is what I have on paper. I'm not quite sure whether it is going to work, and I really want to learn something from this problem. I will skip the surrounding things, like connecting to the DB, focus just on the algorithm, and write it in some hybrid PHP/MySQL/pseudocode.
$limit = 5;
$offset = 0;
function insert($limit, $offset){
    $stm = $db->prepare("SELECT id FROM tableA LIMIT :limit OFFSET :offset");
    $stm->execute(array('limit' => $limit, 'offset' => $offset));
    while($stm->rowCount() > 0){
        $data = $stm->fetchAll();
        foreach($data as $row){
            // insert into TABLE_B
        }
        sleep(1);
        $offset += 5;
        $this->insert($limit, $offset);
    }
}
I'm not totally sure if this recursion will work. On paper it looks feasible. But what about performance? Is it a problem in this case?
Maybe the main question is: am I overthinking this? Do you know of a better way to do it?
Thank you for any comments, thoughts, suggestions, ideas and detail descriptions of your procedure how to come up with feasible solution. Probably I should dig more into some algorithm analysis and design. Do you know any good resources?
(Sorry for grammar mistakes, I'm not a native speaker)
I don't know why you have to insert into table B for 10 records per 1 second, but let's assume that this condition can not be changed.
Your source code is right, but recursion is not necessary here; a plain loop like this should do:
limit=5
offset=0
while (itemsA = fetch_from_a(limit, offset)) {
# you should do a batch insertion here, see MySQL's documentation.
insert_into_B(itemsA);
sleep(1);
offset += 5;
}
# prototype
# fetch some records from table A, return array of found items
# or an empty array if nothing was found.
function fetch_from_a(limit, offset);
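The loop above can be sketched in runnable PHP as follows. This is only an illustration under stated assumptions: an in-memory array stands in for TABLE_A, and the names fetchFromA and processInBatches, the injected $insert and $pause callables, and the 'kind' field are all hypothetical. In a real script, fetchFromA would run the LIMIT/OFFSET query (with an ORDER BY id, since row order is otherwise not guaranteed between batches) and $pause would call sleep(1).

```php
<?php
// Stand-in for "SELECT id FROM tableA ORDER BY id LIMIT :limit OFFSET :offset".
// Returns an empty array once the offset runs past the end of the data.
function fetchFromA(array $tableA, int $limit, int $offset): array
{
    return array_slice($tableA, $offset, $limit);
}

// Walk through $tableA in batches of $limit rows.
// $insert receives each batch; $pause is called after each batch.
// Both are injected so the sketch can run without a database or real delays.
function processInBatches(array $tableA, int $limit, callable $insert, callable $pause): int
{
    $offset = 0;
    $batches = 0;
    // An empty array is falsy in PHP, so the loop ends when nothing is left.
    while ($items = fetchFromA($tableA, $limit, $offset)) {
        $insert($items);
        $batches++;
        $offset += $limit;
        $pause();
    }
    return $batches;
}

$tableA = [1, 2, 4, 6, 7, 8, 9, 11, 12, 15, 16]; // ids with gaps, as in the question
$tableB = [];
$batches = processInBatches(
    $tableA,
    5,
    function (array $ids) use (&$tableB): void {
        foreach ($ids as $id) {
            // two new records in table B per record in table A
            $tableB[] = ['a_id' => $id, 'kind' => 'first'];
            $tableB[] = ['a_id' => $id, 'kind' => 'second'];
        }
    },
    function (): void {
        // sleep(1) would go here in the real script
    }
);
```

Gaps in the ids do not matter here, because LIMIT/OFFSET counts rows rather than id values.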

Using count() to count results from PDO in PHP?

I'm in the process of converting an old mysql_X site to work with PDO, and so far, so good (I think). I have a question regarding the reliability of counting the results of queries, however.
Currently, I've got everything going like this (I've removed the try / catch error code to make this easier to read):
$stm = $db->prepare("SELECT COUNT(*) FROM table WHERE somevar = '1'");
$stm->execute();
$count = $stm->fetchColumn();
if ($count > 0){
$stm = $db->prepare("SELECT * FROM table WHERE somevar = '1'");
$stm->execute();
$result = $stm->fetchAll();
}
There might be stupid problems with doing it this way, and I invite you to tell me if there are, but my question is really about cutting down on database queries. I've noticed that if I cut the first statement out, run the second by itself, and then use PHP's count() to count the results, I still seem to get a reliable row count, with only one query, like this:
$stm = $db->prepare("SELECT * FROM table WHERE somevar = '1'");
$stm->execute();
$result = $stm->fetchAll();
$count = count($result);
if ($count > 0){
//do whatever
}
Are there any pitfalls to doing it this way instead? Is it reliable? And are there any glaring, stupid mistakes in my PDO here? Thanks for the help!
Doing the count in MySQL is preferable, especially if the count value is the only result you're interested in. Compare your versions to the equivalent question "how many chocolate bars does the grocery store have in stock?"
1) count in the db: SELECT count(*) .... Drive to the store, count the chocolate bars, write down the number, drive home, read the number off your slip of paper
2) count in PHP: SELECT * .... Drive to the store. Buy all the chocolate bars. Truck them home. Count them on your living room floor. Write the results on a piece of paper. Throw away the chocolate bars. Read number off the paper.
Which one is more efficient/less costly? Not a big deal if your db/table only has a few records. When you start reaching thousands or millions of records, version 2) is absolutely ludicrous and likely to burn through your bandwidth, blow up your PHP memory limit, and drive your CPU usage into the stratosphere.
That being said, there's no point in running two queries, one of them just to count how many records you MAY get. Such a system is vulnerable to race conditions: e.g. you do your count and get (say) 1 record, but by the time you run the second query to fetch that record, some OTHER parallel process has inserted another record or deleted the one you wanted.
In the first case you are counting using MySQL, and in the second case you are counting using PHP. Both give essentially the same result.
Your usage of the queries is correct. The only problem will appear when you use LIMIT, because the COUNT(*) and the count($result) will be different.
COUNT(*) will count all the rows that the query would have returned (given that the counting query is the same and not using LIMIT)
count($result) will count just the returned rows, so if you use LIMIT, you will just get the results up to the given limit.
Yes it's reliable in this use case!

Sorting by ratings in a database - Where to put this SQL? (PHP/MySQL)

OK - I'll get straight to the point - here's the PHP code in question:
<h2>Highest Rated:</h2>
<?php
// Our query base
$query = $this->db->query("SELECT * FROM code ORDER BY rating DESC");
foreach($query->result() as $row) {
?>
<h3><?php echo $row->title." ID: ";echo $row->id; ?></h3>
<p class="author"><?php $query2 = $this->db->query("SELECT email FROM users WHERE id = ".$row->author);
echo $query2->row('email');?></p>
<?php echo ($this->bbcode->Parse($row->code)); ?>
<?php } ?>
Sorry it's a bit messy, it's still a draft. Anyway, I researched ways to implement a ratings system. Previously I had a single 'rating' field, as you can see from SELECT * FROM code ORDER BY rating DESC. However, I quickly realised that calculating averages like that wasn't feasible, so I created five new columns: rating1, rating2, rating3, rating4, rating5. So when 5 users rate something 4 stars, rating4 says 5... does that make sense? Each ratingx column counts the number of times that rating was given.
So anyway: I have this SQL statement:
SELECT id, (ifnull(rating1,0) + ifnull(rating2,0) + ifnull(rating3,0) + ifnull(rating4,0) + ifnull(rating5,0)) /
((rating1 IS NOT NULL) + (rating2 IS NOT NULL) + (rating3 IS NOT NULL) + (rating4 IS NOT NULL) + (rating5 IS NOT NULL)) AS average FROM code
Again messy, but hey. Now what I need to know is how I can incorporate that SQL statement into my script. Ideally you'd think the overall query would be 'SELECT * FROM code ORDER BY (that really long query I just stated) DESC', but I can't quite see that working... how do I do it? Query, store the result in a variable, something like that?
If that makes no sense sorry! But I really appreciate the help :)
Jack
You should go back to the drawing board completely.
<?php
$query = $this->db->query("SELECT * FROM code ORDER BY rating DESC");
foreach($query->result() as $row) {
$this->db->query("SELECT email FROM users WHERE id = ".$row->author);
}
Anytime you see this in your code, stop what you're doing immediately. This is what JOINs are for. You almost never want to loop over the results of a query and issue multiple queries from within that loop.
SELECT code.*, users.email
FROM code
JOIN users ON users.id = code.author
ORDER BY rating DESC
This query will grab all that data in a single resultset, removing the N+1 query problem.
I'm not addressing the rest of your question until you clean up your question some and clarify what you're trying to do.
If you would like to change your tables again, here is my suggestion:
Why don't you store two columns, RatingTotal and RatingCount? Each user who rates the item increments RatingCount by one, and whatever they vote (5, 4, 4.2, etc.) is added to RatingTotal. You could then just ORDER BY RatingTotal/RatingCount.
Also, I hope you store which users rated each item, so they can't vote multiple times and swing the average their way.
First, I'd decide whether your application is write-heavy or read-heavy. If there are a lot more reads than writes, then you want to minimize the amount of work you do on reads (like this script, for example). On the assumption that it's read-heavy, since most webapps are, I'd suggest maintaining the combined average in a separate column and recalculating it whenever a user adds a new rating.
Other options are:
Try ordering by the calculated column name 'average'. SQL Server supports this, and MySQL also accepts a column alias in ORDER BY.
Use a view. You can create a view on your base table that does the average calculation for you and you can query against that.
Also, unrelated to your question, don't do a separate query for each user in your loop. Join the users table to the code table in the original query.
You should include it in the SELECT part:
SELECT *, (if ....) AS average FROM ... ORDER BY average
Edit: assuming that your ifnull statement actually works...
You might also want to look into joins to avoid querying the database again for every user; you can do everything in 1 select statement.
Apart from that I would also say that you only need one average and the number of total votes, that should give you all the information you need.
Some excellent ideas, but I think the best way (as sidereal said, it's more read-heavy than write-heavy) would be to have columns rating and times_rated, and just do something like this:
new_rating = ((times_rated * rating) + current_rating) / (times_rated + 1)
current_rating being the rating being applied when the person clicks the little stars. This simply weights the current user's rating in an average with the current rating.
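As a small worked sketch of that formula in PHP: the function name updatedRating is made up here, and its parameters mirror the rating, times_rated and current_rating names used above.

```php
<?php
// Hypothetical helper implementing
// new_rating = ((times_rated * rating) + current_rating) / (times_rated + 1)
function updatedRating(float $rating, int $timesRated, float $currentRating): float
{
    return (($timesRated * $rating) + $currentRating) / ($timesRated + 1);
}

// An item rated 3 times with an average of 4.0 receives a new 2-star vote:
// (3 * 4.0 + 2.0) / 4 = 3.5
$new = updatedRating(4.0, 3, 2.0);

// The very first vote simply becomes the average: (0 * 0.0 + 5.0) / 1 = 5.0
$first = updatedRating(0.0, 0, 5.0);
```

After computing the new value, times_rated would be incremented by one and both columns written back in the same UPDATE.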

PHP, MySQL - would results-array shuffle be quicker than "select... order by rand()"?

I've been reading a lot about the disadvantages of using "order by rand", so I don't need an update on that.
I was thinking, since I only need a limited amount of rows retrieved from the db to be randomized, maybe I should do:
$r = $db->query("select * from table limit 500");
for($i=0;$i<500;$i++)
$arr[$i]=mysqli_fetch_assoc($r);
shuffle($arr);
(I know this only randomizes the first 500 rows; so be it.)
Would that be faster than
$r = $db->query("select * from table order by rand() limit 500");
Let me just mention: say the DB tables were packed with more than... 10,000 rows.
"Why don't you test it yourself?!?" Well, I have, but I'm looking for your experienced opinion.
Thanks!
500 or 10K, the sample size is too small to be able to draw tangible conclusions. At 100K, you're still looking at the 1/2 second region on this graph. If you're still concerned with performance, look at the two options for a randomized number I provided in this answer.
We don't have your data or setup, so it's left to you to actually test the situation. There are numerous pages for how to calculate elapsed time in PHP - create two pages, one using shuffle and the other using the RAND() query. Run at least 10 of each, & take a look.
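A minimal timing harness for that kind of comparison might look like the sketch below. The helper name averageSeconds is hypothetical, and the callable here only shuffles an in-memory array; to compare against the RAND() query you would pass in a second callable that runs the query on your own connection. hrtime() requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: run $fn $runs times and return the average
// wall-clock time per run, in seconds. hrtime(true) is a monotonic
// clock that returns nanoseconds.
function averageSeconds(callable $fn, int $runs = 10): float
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);
        $fn();
        $total += (hrtime(true) - $start) / 1e9;
    }
    return $total / $runs;
}

// Stand-in for 500 fetched rows.
$rows = range(1, 500);

$avg = averageSeconds(function () use ($rows): void {
    $copy = $rows; // copy so every run shuffles the same starting array
    shuffle($copy);
});
```

Printing $avg for each variant over the same number of runs gives a rough comparison on your own data and hardware.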
I am looking at this from experience with MySQL.
Let's talk about the first piece of code:
$r = $db->query("select * from table");
for($i=0;$i<500;$i++){
$arr[$i] = mysqli_fetch_assoc($r);
}
shuffle($arr);
Clearly it would be more efficient to LIMIT the number of rows in the SQL statement instead of doing it in PHP.
Thus:
$r = $db->query("SELECT * FROM table LIMIT 500");
$arr = array();
// assign inside the loop body so the final NULL from the fetch is not appended
while ($row = mysqli_fetch_assoc($r)) { $arr[] = $row; }
shuffle($arr);
An SQL operation would likely be faster than doing it in PHP, especially when you have such a large number of rows. A good way to find out is to benchmark both and see which is faster. My bet is that the SQL would be faster than shuffling in PHP.
So my vote goes for:
$r = $db->query("SELECT * FROM table ORDER BY RAND() LIMIT 500");
$arr = array();
// assign inside the loop body so the final NULL from the fetch is not appended
while ($row = mysqli_fetch_assoc($r)) { $arr[] = $row; }
I'm pretty sure the shuffle takes longer in your case, but you may want to see this link for examples of fetching random sets from the database quickly. It requires a bit of extra SQL, but if speed is important to you, then do this.
http://devzone.zend.com/article/4571-Fetching-multiple-random-rows-from-a-database
