I need to pull two user names at random from a MySQL db.
Say I have clinton, sanders and trump as users in my db (sorry, couldn't help myself).
I want just two random users placed into variables. I have read too many scripts and have got myself horribly confused.
$result = mysqli_query($mdb, "SELECT user FROM `table`");
$row = mysqli_fetch_row(array_rand($result,2));
echo $row[0] . "\n";
echo $row[1] . "\n";
I'm actually going to hit the sack as my teddy bear is lonely, so will review and reply to answers tomorrow morning. Thanks.
I think you should leave your teddy bear alone once more and use this query directly instead:
$result = mysqli_query($mdb, "SELECT user FROM `table` ORDER BY RAND() LIMIT 0, 2");
This will give you two random entries straight from the db, and you won't need array_rand() at all.
Regarding the question in the comments about the implications on a large database: yes, on research it does have a big impact. I found a comment on http://www.petefreitag.com/item/466.cfm that offers a better option:
If you have a table with many rows, you can make this query much faster. A simple SELECT * FROM table ORDER BY RAND() has to do a sort in memory, which is expensive. If you change the query to SELECT * FROM table WHERE RAND() > 0.9 ORDER BY RAND(), then you've reduced the overhead of the sort by 90%. This has the overhead of many RAND() calls, but on DB2 mainframes it is a much faster technique.
I'm in the process of converting an old mysql_X site to work with PDO, and so far, so good (I think). I have a question regarding the reliability of counting the results of queries, however.
Currently, I've got everything going like this (I've removed the try / catch error code to make this easier to read):
$stm = $db->prepare("SELECT COUNT(*) FROM table WHERE somevar = '1'");
$stm->execute();
$count = $stm->fetchColumn();
if ($count > 0){
$stm = $db->prepare("SELECT * FROM table WHERE somevar = '1'");
$stm->execute();
$result = $stm->fetchAll();
}
There might be stupid problems with doing it this way, and I invite you to tell me if there are, but my question is really about cutting down on database queries. I've noticed that if I cut the first statement out, run the second by itself, and then use PHP's count() to count the results, I still seem to get a reliable row count, with only one query, like this:
$stm = $db->prepare("SELECT * FROM table WHERE somevar = '1'");
$stm->execute();
$result = $stm->fetchAll();
$count = count($result);
if ($count > 0){
//do whatever
}
Are there any pitfalls to doing it this way instead? Is it reliable? And are there any glaring, stupid mistakes in my PDO here? Thanks for the help!
Doing the count in MySQL is preferable, especially if the count is the only result you're interested in. Compare your two versions to the equivalent question "how many chocolate bars does the grocery store have in stock?"
1) count in the db: SELECT count(*) .... Drive to the store, count the chocolate bars, write down the number, drive home, read the number off your slip of paper
2) count in PHP: SELECT * .... Drive to the store. Buy all the chocolate bars. Truck them home. Count them on your living room floor. Write the results on a piece of paper. Throw away the chocolate bars. Read number off the paper.
Which one is more efficient/less costly? Not a big deal if your db/table only has a few records. When you start reaching thousands/millions of records, version 2) is absolutely ludicrous and likely to burn through your bandwidth, blow up your PHP memory limit, and drive your CPU usage into the stratosphere.
That being said, there's no point in running two queries, one just to count how many records you MAY get. Such a system is vulnerable to race conditions: e.g. you do your count and get (say) 1 record, but by the time you run the second query and fetch that record, some OTHER parallel process has inserted another record, or deleted the one you wanted.
In the first case you are counting using MySQL, and in the second case you are counting using PHP. Both give essentially the same result.
Your usage of the queries is correct. The only problem appears when you use LIMIT, because COUNT(*) and count($result) will then differ.
COUNT(*) will count all the rows that the query would have returned (given that the counting query is the same and does not use LIMIT).
count($result) will count just the returned rows, so if you use LIMIT, you will only get the results up to the given limit.
Yes, it's reliable in this use case!
I have a user with a unique username in a MySQL table, but I have to test and run many queries to find it. I wonder if there's a better way to avoid all those queries to the db.
I have multiple rows in the table with columns user1, user2, user3, user4, and so on up to user30.
for ($x = 0; $x < 30; $x++) {
    $user = "user";
    $user .= $x; // generate user1, user2, user3, etc.
    $result = mysql_fetch_object(mysql_query("SELECT * FROM table WHERE " . $user . "='" . $_SESSION['username'] . "'"));
    if ($result) {
        // found the column that holds this username
    }
}
Now if the $_SESSION['username'] is user30 in the table, I do 29 queries before $result is true and I can work with the results. Is there a better way to do this? How important is this anyway? Is there a big difference in CPU demand between 1 query and 30 queries?
While you should really have just one user column, you could use a loop to build one big query rather than doing 30 small ones.
<?php
$query = "SELECT * FROM table WHERE ";
foreach (range(1, 30) as $num) {
    $query .= " user$num = '{$_SESSION['username']}'";
    if ($num < 30) $query .= " OR "; // no trailing OR after the last column
}
print $query; // for debugging
$result = mysql_fetch_object(mysql_query($query));
?>
As Johan said, you need to normalise your database. That means removing repeated columns by creating additional tables. There are many examples of how to do this; perhaps the canonical one is the wikipedia version... http://en.wikipedia.org/wiki/First_normal_form
Because you're not wildcarding the search, you could perform this search in a single query:
SELECT t.*
FROM table t
WHERE '".$_SESSION['username']."' IN (t.user1, t.use2, t.user3, t.user4, t.user5,
-- rest of the columns in the same pattern
t.user26, t.user27, t.user28, t.user29, t.user30)
Is There A Big Difference Between using 1 vs 30 queries?
Yes - database queries should be considered expensive. First, there's overhead involved in the application code sending the request to the server - SQL is transmitted over TCP, so the smaller the query, the shorter the transmission. That might be milliseconds or less when testing a single query, but in a multi-user setup it can really add up. Then there's the overhead of the query itself - fetch only what you need, which means not using "SELECT *". And why would you hit the database 30 separate times when you could do it once?
These things are hard to gauge when you're dealing with a system that supports just yourself (or maybe a handful of people), and the application and database are on the same host (a VM would be a different matter). But the costs do add up.
You need to normalize that DB. After I made this same kind of error (user_1, ..., user_n in a single row) a few times, I learned about normalization and everything DB-related suddenly became a thousand times easier. Intro here: http://en.wikipedia.org/wiki/Database_normalization
You should have one column named 'user' in which you store the username. Then you only need one SELECT to get that row:
"SELECT * FROM table WHERE user='".$_SESSION['username']."'"
That's if I understand you correctly here... You have one column for each user, which is really bad design in about 100% of the cases. You should have only one user column with the username as the value.
I have no idea why you need to have user1...user30 in your table, but anyway, to reduce the query time you might try the following:
<?php
$selects = array();
for ($i = 1; $i <= 30; $i++) {
    $selects[] = "SELECT * FROM `table` WHERE user$i = '" . $_SESSION['username'] . "'";
}
$query = join(" UNION ", $selects);
print $query; // for debugging
?>
The UNION method is preferable because the indexes (on user1...user30) can be used.
Which is more efficient (when managing over 100K records)?
A. MySQL
SELECT * FROM user ORDER BY RAND();
Of course, after that I would already have all the fields from that record.
B. PHP
Use memcached to have $cache_array hold all the data from "SELECT id_user FROM user ORDER BY id_user" for 1 hour or so... and then:
$id = array_rand($cache_array);
Of course, after that I have to make a MySQL call with:
SELECT * FROM user WHERE id_user = $id;
So... which is more efficient, A or B?
The proper way to answer this kind of question is to do a benchmark. Do a quick and dirty implementation each way and then run benchmark tests to determine which one performs better.
Having said that, ORDER BY RAND() is known to be slow because it's impossible for MySQL to use an index. MySQL will basically run the RAND() function once for each row in the table and then sort the rows based on what came back from RAND().
Your other idea of storing all user_ids in memcached and then selecting a random element from the array might perform better if the overhead of memcached proves to be less than the cost of a full table scan. If your dataset is large or staleness is a problem, you may run into issues, though. Also, you're adding some complexity to your application. I would try to look for another way.
I'll give you a third option which might outperform both your suggestions: select a COUNT(user_id) of the rows in your user table and then have PHP generate a random number between 0 and that count minus 1, inclusive. Then do a SELECT * FROM user LIMIT 1 OFFSET random-number-generated-by-php;.
Again, the proper way to answer these types of questions is to benchmark. Anything else is speculation.
The first one is incredibly slow because MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.
It's elaborated on more in this blog post.
// Upper bound is count - 1, otherwise the offset could point one past the last row.
$random_no = mt_rand(0, $total_record_count - 1);
$query = "SELECT * FROM user ORDER BY __KEY__ LIMIT {$random_no}, 1";
OK - I'll get straight to the point - here's the PHP code in question:
<h2>Highest Rated:</h2>
<?php
// Our query base
$query = $this->db->query("SELECT * FROM code ORDER BY rating DESC");
foreach ($query->result() as $row) {
?>
    <h3><?php echo $row->title . " ID: "; echo $row->id; ?></h3>
    <p class="author"><?php
        $query2 = $this->db->query("SELECT email FROM users WHERE id = " . $row->author);
        echo $query2->row('email');
    ?></p>
    <?php echo $this->bbcode->Parse($row->code); ?>
<?php } ?>
Sorry it's a bit messy, it's still a draft. Anyway, I researched ways to build a ratings system - previously I had a single 'rating' field, as you can see from SELECT * FROM code ORDER BY rating DESC. However, I quickly realised calculating averages like that wasn't feasible, so I created five new columns - rating1, rating2, rating3, rating4, rating5. So when 5 users rate something 4 stars, rating4 says 5... does that make sense? Each ratingx column counts the number of times that rating was given.
So anyway: I have this SQL statement:
SELECT id,
       (ifnull(rating1,0) + ifnull(rating2,0) + ifnull(rating3,0) + ifnull(rating4,0) + ifnull(rating5,0)) /
       ((rating1 IS NOT NULL) + (rating2 IS NOT NULL) + (rating3 IS NOT NULL) + (rating4 IS NOT NULL) + (rating5 IS NOT NULL)) AS average
FROM code
Again messy, but hey. Now what I need to know is: how can I incorporate that SQL statement into my script? Ideally you'd think the overall query would be 'SELECT * FROM code ORDER BY (that really long query I just stated) DESC', but I can't quite see that working... how do I do it? Query, store the result in a variable, something like that?
If that makes no sense sorry! But I really appreciate the help :)
Jack
You should go back to the drawing board completely.
<?php
$query = $this->db->query("SELECT * FROM code ORDER BY rating DESC");
foreach ($query->result() as $row) {
    $this->db->query("SELECT email FROM users WHERE id = " . $row->author);
}
Anytime you see this in your code, stop what you're doing immediately. This is what JOINs are for. You almost never want to loop over the results of a query and issue multiple queries from within that loop.
SELECT code.*, users.email
FROM code
JOIN users ON users.id = code.author
ORDER BY rating DESC
This query will grab all that data in a single resultset, removing the N+1 query problem.
I'm not addressing the rest of your question until you clean up your question some and clarify what you're trying to do.
If you would like to change your tables again, here is my suggestion:
Why don't you store two columns, RatingTotal and RatingCount? Each user that rates it will increment RatingCount by one, and whatever they vote (5, 4, 4.2, etc.) is added to RatingTotal. You could then just ORDER BY RatingTotal/RatingCount.
Also, I hope you store which users rated each item, so they can't vote multiple times and swing the average their way!
First, I'd decide whether your application is write-heavy or read-heavy. If there are a lot more reads than writes, then you want to minimize the amount of work you do on reads (like this script, for example). On the assumption that it's read-heavy, since most webapps are, I'd suggest maintaining the combined average in a separate column and recalculating it whenever a user adds a new rating.
Other options are:
Try ordering by the calculated column alias 'average'. SQL Server supports this; I'm not sure about MySQL.
Use a view. You can create a view on your base table that does the average calculation for you, and you can query against that (see the sketch below).
Also, unrelated to your question, don't do a separate query for each user in your loop. Join the users table to the code table in the original query.
You should include it in the SELECT part:
SELECT *, (if ....) AS average FROM ... ORDER BY average
Edit: assuming that your ifnull statement actually works...
You might also want to look into joins to avoid querying the database again for every user; you can do everything in one SELECT statement.
Apart from that, I would also say that you only need one average and the total number of votes; that should give you all the information you need.
Some excellent ideas, but I think the best way (as sidereal said, it's more read-heavy than write-heavy) would be to have columns rating and times_rated, and just do something like this:
new_rating = ((times_rated * rating) + current_rating) / (times_rated + 1)
current_rating being the rating applied when the person clicks the little stars. This simply weights the current user's rating into an average with the existing rating.
I've been reading a lot about the disadvantages of using "order by rand", so I don't need an update on that.
I was thinking, since I only need a limited amount of rows retrieved from the db to be randomized, maybe I should do:
$r = $db->query("select * from table limit 500");
for($i;$i<500;$i++)
$arr[$i]=mysqli_fetch_assoc($r);
shuffle($arr);
(I know this only randomizes the first 500 rows; so be it.)
Would that be faster than
$r = $db->query("select * from table order by rand() limit 500");
Let me just mention, say the db tables were packed with more than... 10,000 rows.
Why don't you do it yourself?!? Well, I have, but I'm looking for your experienced opinion.
Thanks!
500 or 10K, the sample size is too small to be able to draw tangible conclusions. At 100K, you're still looking at the 1/2 second region on this graph. If you're still concerned with performance, look at the two options for a randomized number I provided in this answer.
We don't have your data or setup, so it's left to you to actually test the situation. There are numerous pages on how to calculate elapsed time in PHP - create two pages, one using shuffle() and the other using the RAND() query. Run at least 10 of each and take a look.
I am looking at this from experience with MySQL.
Let's talk about the first piece of code:
$r = $db->query("select * from table");
for($i=0;$i<500;$i++){
$arr[$i] = mysqli_fetch_assoc($r);
}
shuffle($arr);
Clearly it would be more efficient to LIMIT the number of rows in the SQL statement instead of doing it on PHP.
Thus:
$r = $db->query("SELECT * FROM table LIMIT 500");
while($arr[] = mysqli_fetch_assoc($r)){}
shuffle($arr);
The SQL operation would be faster than doing it in PHP, especially when you have such a large number of rows. One good way to find out is to benchmark and see which of the two is faster. My bet is that the SQL would be faster than shuffling in PHP.
So my vote goes for:
$r = $db->query("SELECT * FROM table ORDER BY RAND() LIMIT 500");
while($arr[] = mysqli_fetch_assoc($r)){}
I'm pretty sure the shuffle takes longer in your case, but you may want to see this link for examples of fetching fast random sets from a database. It requires a bit of extra SQL, but if speed is important to you, then do this.
http://devzone.zend.com/article/4571-Fetching-multiple-random-rows-from-a-database