I would like to SELECT certain data out of my mysql DB. I am working with a php loop and a sql statement with a LIMIT and UNION.
Problem: The speed of my query is terrible. One UNION statement takes 2-4 seconds. Because of the loop, the overall run takes 3 minutes.
Is there a chance to optimize my query?
I tried to separate the "three" statements and merge the results. But this is not really faster. So I think that the UNION is not my problem.
PHP/SQL:
My code runs through two nested foreach loops. It works properly, but the performance is the problem.
$sql_country = "SELECT country FROM country_list";
foreach ($db->query($sql_country) as $row_country) { //first loop (150 entries)
$sql_color = "SELECT color FROM color_list";
foreach ($db->query($sql_color) as $row_color) { //second loop (10 entries)
$sql_all = "(SELECT ID, price FROM company
WHERE country = '".$row_country['country']."'
AND color = '".$row_color['color']."'
AND price BETWEEN 2.5 AND 4.5
order by price DESC LIMIT 2)
UNION
(SELECT ID, price FROM company
WHERE country = '".$row_country['country']."'
AND color = '".$row_color['color']."'
AND price BETWEEN 5.5 AND 8.2
order by price DESC LIMIT 2)
UNION
(SELECT ID, price FROM company
WHERE country = '".$row_country['country']."'
AND color = '".$row_color['color']."'
AND price BETWEEN 8.5 AND 10.8
order by price DESC LIMIT 2)";
foreach ($db->query($sql_all) as $row_all) {
$shopID[] = $row_all['ID']; //I just need these IDs
}
}
}
Do you have any idea or hints to get this faster?
An index on (country, color, price, ID) should improve the performance of the individual queries from seconds to a couple of milliseconds or even less. But you still have the problem of executing 1500 queries. Depending on your system, a single query execution can add an overhead of about 10 ms, which would add up to 15 seconds in your case. You need to find a way to minimize the number of queries, ideally down to a single query.
For low limits (like 2 in your case), you can combine multiple LIMIT 1 subqueries with different offsets. I would generate such a query dynamically.
$priceRanges = [
['2.5', '4.5'],
['5.5', '8.2'],
['8.5', '10.8'],
];
$limit = 2;
$offsets = range(0, $limit - 1);
$queryParts = [];
foreach ($priceRanges as $range) {
$rangeFrom = $range[0];
$rangeTo = $range[1];
foreach ($offsets as $offset) {
$queryParts[] = "
select (
select ID
from company cmp
where cmp.country = cnt.country
and cmp.color = clr.color
and cmp.price between {$rangeFrom} AND {$rangeTo}
order by cmp.price desc
limit 1
offset {$offset}
) as ID
from country_list cnt
cross join color_list clr
having ID is not null
";
}
}
$query = implode(' UNION ALL ', $queryParts);
This will generate quite a long UNION query. You can see a PHP demo on rextester.com and an SQL demo on db-fiddle.com.
I can't guarantee it will be any faster. But it's worth a try.
I'm trying to give my users a rank based on the number of posts they have made. I created a database table with a rankName column ("beginner, novice, intermediate, ... to master") and a minimum column with some numbers. I tried to compare the number of posts ($qtyPosts) with the minimum values.
For example: When a user has 9 posts, he gets the rank Novice (which has a minimum of 5 posts).
This is the code I wrote for that.
PHP code
// calculate number of posts from user
$rowsPosts = $user->getQuantityOfPosts($userID);
$qtyPosts = 0;
foreach ($rowsPosts as $q) {
$qtyPosts++;
}
//status
$conn = db::getInstance();
$rank = "";
$statementRank = $conn->prepare("SELECT * FROM rank WHERE rank.minimum >= $qtyPosts");
$statementRank->execute();
while($row = $statementRank->fetch(PDO::FETCH_ASSOC) ){
$rank = $row['rankName'];
}
HTML code
<h3>Status: <?php echo $rank; ?></h3>
However, it doesn't show the right rank; instead it shows the last one, "master". Does anyone have an idea?
Consider your WHERE clause:
WHERE rank.minimum >= $qtyPosts
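That matches every rank whose minimum is at or above the user's post count, and the while loop overwrites $rank with each row, so you always end up with the last one, "master".
A rank applies once the user has reached its minimum, so flip the comparison and add an order and limit. Something like this:
WHERE rank.minimum <= $qtyPosts ORDER BY rank.minimum DESC LIMIT 1
This keeps only the ranks the user has reached, sorts them from highest to lowest minimum, and selects the first one.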
I guess you should have a maximum column as well in the table and change the query to
SELECT * FROM rank WHERE rank.minimum <= $qtyPosts AND rank.maximum >= $qtyPosts
With only a minimum, your current query returns every rank whose minimum is at or above $qtyPosts rather than the single rank the user falls into.
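A minimal sketch of the lookup the question describes (the highest rank whose minimum the user has reached). It is written in Python against an in-memory SQLite database rather than PHP/MySQL so that it runs on its own. The table name ranks, the helper rank_for, and all thresholds except Novice = 5 are assumptions made up for illustration.

```python
import sqlite3

# In-memory stand-in for the rank table; thresholds other than Novice = 5 are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranks (rankName TEXT, minimum INTEGER)")
conn.executemany(
    "INSERT INTO ranks VALUES (?, ?)",
    [("Beginner", 0), ("Novice", 5), ("Intermediate", 10), ("Master", 50)],
)


def rank_for(qty_posts):
    """Return the highest rank whose minimum is <= qty_posts, or None."""
    row = conn.execute(
        "SELECT rankName FROM ranks WHERE minimum <= ? "
        "ORDER BY minimum DESC LIMIT 1",
        (qty_posts,),  # bound parameter instead of string interpolation
    ).fetchone()
    return row[0] if row else None


print(rank_for(9))
```

Binding the post count as a parameter also avoids interpolating a PHP variable into the SQL string, which PDO's prepare/execute supports in the same way.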
I'm trying to display and sort an array by an average created using data from a database. I'm retrieving three variables from the database and creating an average from these values. This value is then placed inside a new array to be sorted along with the rest of the database data.
Am I right in thinking that having the SQL query inside the loop isn't a great idea? (Performance issue?)
Is there any alternative that's available? I've attached the code below:
^ database connection/query string to retrieve all data...
$result = $stmt_business_list->fetchAll(PDO::FETCH_ASSOC);
$items = array();
foreach($result as $row){
$single_business_id = $row['id'];
$name = $row['name'];
//Query to get ALL the service, value and quality ratings for certain business
$test_query = "SELECT * FROM rating WHERE business_id = $single_business_id";
$test_query_stmt = $dbh->prepare($test_query);
$test_query_stmt->execute();
$test_row = $test_query_stmt->fetchAll(PDO::FETCH_ASSOC);
$total_value = $total_quality = $total_service = 0;
foreach($test_row as $review)
{
$total_value += $review['value'];
$total_quality += $review['quality'];
$total_service += $review['service'];
}
$bayesian_value = (($set_site_average_review_count * $set_site_average_review_score) + $total_value) / ($set_site_average_review_count + $business_review_count);
$bayesian_quality = (($set_site_average_review_count * $set_site_average_review_score) + $total_quality) / ($set_site_average_review_count + $business_review_count);
$bayesian_service = (($set_site_average_review_count * $set_site_average_review_score) + $total_service) / ($set_site_average_review_count + $business_review_count);
$average_bayesian_rating = ($bayesian_value + $bayesian_quality + $bayesian_service) / 3;
$average_bayesian_rating = $average_bayesian_rating;
array_push($items, array(
"id"=>"$single_business_id",
"name"=>"$name",
"value"=>"$total_value",
"quality"=>"$total_quality",
"service"=>"$total_service",
"average"=>"$average_bayesian_rating"));
echo
'Name: '.$name.'<br>
Value: '.$total_value.'<br>
Quality: '.$total_quality.'<br>
Service: '.$total_service.'<br>
Average: '.$average_bayesian_rating.'<br><br>';
}
}
The page will be split up by a separate pagination script and will only display 6 objects at a time, but over time this may change so I do have an eye on performance as much as I can.
SQL aggregate queries are made for this kind of thing.
Use this query to summarize the results
SELECT b.name, b.id,
SUM(r.value) total_value,
SUM(r.quality) total_quality,
SUM(r.service) total_service,
COUNT(*) review_count,
a.avg_reviews_per_biz
FROM business b
JOIN rating r ON b.id = r.business_id
JOIN (
SELECT COUNT(*) / COUNT(DISTINCT business_id) avg_reviews_per_biz
FROM rating
) a ON 1=1
GROUP BY b.name, b.id, a.avg_reviews_per_biz
This will give you one row per business showing the summed ratings and the number of ratings. This result set will have the following columns
name: business name
id: business id
total_value: sum of value ratings for that business
total_quality: sum of quality ratings for that business
total_service: sum of service ratings for that business
review_count: number of reviews for business "id"
avg_reviews_per_biz: average number of reviews per business
The last column has the same value for all rows of your query.
You can then loop over these rows, one business at a time, doing your statistical computations.
I can't tell from your question where you're getting variables like $set_site_average_review_count, so I can't help with those computations.
You'll find that SQL aggregate querying is very powerful indeed.
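As a rough illustration of the aggregate approach, here is a self-contained sketch in Python using an in-memory SQLite database (not the asker's PHP/MySQL setup). The table name business, the sample rows, and the two site-wide constants are assumptions; the Bayesian expression is the one from the question's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rating (business_id INTEGER, value INTEGER,
                     quality INTEGER, service INTEGER);
INSERT INTO business VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO rating VALUES (1, 4, 5, 3), (1, 2, 3, 5), (1, 3, 4, 4), (2, 5, 5, 5);
""")

# One aggregate query replaces the per-business query inside the loop.
rows = conn.execute("""
SELECT b.id, b.name,
       SUM(r.value), SUM(r.quality), SUM(r.service),
       COUNT(*) AS review_count
FROM business b
JOIN rating r ON r.business_id = b.id
GROUP BY b.id, b.name
ORDER BY b.id
""").fetchall()

# Assumed stand-ins for $set_site_average_review_count / _score.
SITE_AVG_COUNT = 2.0
SITE_AVG_SCORE = 3.0


def bayesian(total, count):
    # Same shape as the expression in the question's loop.
    return (SITE_AVG_COUNT * SITE_AVG_SCORE + total) / (SITE_AVG_COUNT + count)


items = []
for bid, name, total_value, total_quality, total_service, n in rows:
    average = (bayesian(total_value, n) + bayesian(total_quality, n)
               + bayesian(total_service, n)) / 3
    items.append({"id": bid, "name": name, "average": average})

# Sort by the combined average, best first.
items.sort(key=lambda item: item["average"], reverse=True)
print(items)
```

Note that the inner join drops businesses without any ratings; a LEFT JOIN would keep them if that is what the page needs.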
I'm really stuck and have been searching for a while, but with no success.
Anyway, I have this formula known as the Bayesian Estimate: (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
And I have three database tables: persons, reviews, ratings.
The persons table is quite basic, but for the sake of this question, it only has one field: ID
The reviews table has id, personID, description where personID is the ID of the person.
The ratings table has id, personID, reviewID, ratingX, ratingY, ratingZ where person is the ID of the person and reviewID is the ID of the review. ratingX/Y/Z are three different ratings for the person, and in my page shows the average of the three numbers.
THE FORM THAT LISTS THEM, however, sorts them by the Bayesian Estimate formula. I do not know how to do this and it seems beyond me, since you cannot ORDER BY $bayesian_formula or anything like that. The script looks something like this:
<?php
$result = $db->query("SELECT * FROM persons");
$m =; //SQL to get average number of reviews of all persons
$c =; //SQL to get average rating of all persons
while( $row = $result->fetch_array() ){
$r =; //SQL to get average ratings of person
$v =; //SQL to get total number of reviews of person
$formula = ($v / ($v+$m)) * $r + ($m / ($v+$m)) * $c;
$result2 = $db->query("SELECT * FROM ratings ORDER BY $formula");
while( $row2 = $result2->fetch_array() ){
$result3 = $db->query("SELECT * FROM persons WHERE id='$row2[person]'");
$row3 = $result3->fetch_array();
echo $row2['description']."<br> rating: ".round( ($row['ratingX'] + $row['ratingY'] + $row['ratingZ']) / 3 );
}
}
?>
$formula loops to the database result's weighted rating each iteration.
Obviously that isn't correct. How would I make this work? Would I have to revise my entire script? The actual one is much longer and detailed.
edit:
sqls are:
$c_query=$db->query("SELECT ((ratingX + ratingY + ratingZ) / 3) as avg_rate FROM ratings");
$c_ = $c_query->fetch_array();
$c = $c_['avg_rate'];
$m_query=$db->query("SELECT COUNT(id) AS count FROM ratings");
$m_ = $m_query->fetch_array();
$m = $m_['count'];
$v_query=$db->query("SELECT COUNT(id) AS count FROM ratings WHERE person='$row[id]'");
$v_ = $v_query->fetch_array();
$v = $v_['count'];
$r_query=$db->query("SELECT ((ratingX + ratingY + ratingZ) / 3) as avg_rate FROM ratings WHERE person='$row[id]'");
$r_ = $r_query->fetch_array();
$r = $r_['avg_rate'];
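You can move the calculation into the database, for example with CREATE PROCEDURE, so that the weighted rating is computed in SQL and can be used for sorting.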
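Another way to get the sorting done in the database is a single aggregate query that computes the question's weighted rating per person and orders by it. This is a different technique from a stored procedure, and the sketch below is only an illustration: it runs in Python against an in-memory SQLite database, the sample rows are made up, and it assumes m is the average number of ratings per person and C is the mean rating over all ratings (as the comments in the question's script describe).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (id INTEGER PRIMARY KEY, person INTEGER,
                      ratingX INTEGER, ratingY INTEGER, ratingZ INTEGER);
INSERT INTO ratings (person, ratingX, ratingY, ratingZ) VALUES
  (1, 5, 5, 5),
  (2, 4, 4, 4), (2, 4, 4, 4), (2, 4, 4, 4),
  (3, 2, 2, 2), (3, 2, 2, 2);
""")

# p: per-person v (rating count) and r (mean rating)
# g: global m (mean ratings per person) and c (mean rating overall)
query = """
SELECT p.person,
       (p.v / (p.v + g.m)) * p.r + (g.m / (p.v + g.m)) * g.c AS wr
FROM (SELECT person,
             COUNT(*) * 1.0 AS v,
             AVG((ratingX + ratingY + ratingZ) / 3.0) AS r
      FROM ratings
      GROUP BY person) AS p
CROSS JOIN
     (SELECT COUNT(*) * 1.0 / COUNT(DISTINCT person) AS m,
             AVG((ratingX + ratingY + ratingZ) / 3.0) AS c
      FROM ratings) AS g
ORDER BY wr DESC
"""
ranking = conn.execute(query).fetchall()
for person, wr in ranking:
    print(person, round(wr, 3))
```

Person 1 has a single perfect rating and person 2 has three ratings of 4; the formula pulls the single rating toward the global mean, which is the point of the Bayesian estimate. The resulting list can be joined to persons to print the details in that order.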
I've got a MySQL table with a bunch of entries in it, and a column called "Multiplier." The default (and most common) value for this column is 0, but it could be any number.
What I need to do is select a single entry from that table at random. However, the rows are weighted according to the number in the "Multiplier" column. A value of 0 means that it's not weighted at all. A value of 1 means that it's weighted twice as much, as if the entry were in the table twice. A value of 2 means that it's weighted three times as much, as if the entry were in the table three times.
I'm trying to modify what my developers have already given me, so sorry if the setup doesn't make a whole lot of sense. I could probably change it but want to keep as much of the existing table setup as possible.
I've been trying to figure out how to do this with SELECT and RAND(), but don't know how to do the weighting. Is it possible?
This guy asks the same question. He says the same as Frank, but the weightings don't come out right and in the comments someone suggests using ORDER BY -LOG(1.0 - RAND()) / Multiplier, which in my testing gave pretty much perfect results.
(If any mathematicians out there want to explain why this is correct, please enlighten me! But it works.)
The disadvantage would be that you couldn't set the weighting to 0 to temporarily disable an option, as you would end up dividing by zero. But you could always filter it out with a WHERE Multiplier > 0.
For much better performance (especially on big tables), first index the weight column and use this query:
SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY -LOG(1-RAND())/weight LIMIT 10) AS t2 ON t1.id = t2.id
On a 40MB table the usual query takes 1s on my i7 machine and this one takes 0.04s.
For explanation of why this is faster see MySQL select 10 random rows from 600K rows fast
Don't use 0, 1 and 2 but 1, 2 and 3. Then you can use this value as a multiplier:
SELECT * FROM tablename ORDER BY (RAND() * Multiplier);
Well, I would put the logic of weights in PHP:
<?php
$weight_array = array(0, 1, 1, 2, 2, 2);
$multiplier = $weight_array[array_rand($weight_array)];
?>
and the query:
SELECT *
FROM `table`
WHERE Multiplier = $multiplier
ORDER BY RAND()
LIMIT 1
I think it will work :)
While I realise this is a question about MySQL, the following may be useful for someone using SQLite3, which has subtly different implementations of RANDOM and LOG.
SELECT * FROM table ORDER BY (-LOG((abs(RANDOM() % 10000) + 1) / 10000.0) / weight) LIMIT 1;
weight is a column in table containing integers (I've used 1-100 as the range in my table).
RANDOM() in SQLite produces integers between -9.2E18 and +9.2E18 (see SQLite docs for more info). The modulo operator brings that down to -9999..9999; adding 1 after abs() and dividing by 10000.0 scales it into the range (0, 1], which is what the -LOG(...) / weight formula expects.
abs() removes the negatives and the + 1 avoids zero, since LOG only handles positive numbers.
LOG() is not actually present in a default install of SQLite3. I used the php SQLite3 CreateFunction call to use the php function in SQL. See the PHP docs for info on this.
For others Googling this subject, I believe you can also do something like this:
SELECT strategy_id
FROM weighted_strategies AS t1
WHERE (
SELECT SUM(weight)
FROM weighted_strategies AS t2
WHERE t2.strategy_id<=t1.strategy_id
)>@RAND AND
weight>0
ORDER BY strategy_id
LIMIT 1
If the total sum of weights for all records is n, @RAND should be a random integer between 0 and n-1 inclusive.
@RAND could be set in SQL or inserted as an integer value from the calling code.
The subselect sums up the weights of the current record and all preceding records, and checks whether that sum exceeds the random value supplied.
<?php
/**
* Demonstration of weighted random selection of MySQL database.
*/
$conn = mysql_connect('localhost', 'root', '');
// prepare table and data.
mysql_select_db('test', $conn);
mysql_query("drop table if exists temp_wrs", $conn);
mysql_query("create table temp_wrs (
id int not null auto_increment,
val varchar(16),
weight tinyint,
upto smallint,
primary key (id)
)", $conn);
$base_data = array( // value-weight pair array.
'A' => 5,
'B' => 3,
'C' => 2,
'D' => 7,
'E' => 6,
'F' => 3,
'G' => 5,
'H' => 4
);
foreach($base_data as $val => $weight) {
mysql_query("insert into temp_wrs (val, weight) values ('".$val."', ".$weight.")", $conn);
}
// calculate the sum of weight.
$rs = mysql_query('select sum(weight) as s from temp_wrs', $conn);
$row = mysql_fetch_assoc($rs);
$sum = $row['s'];
mysql_free_result($rs);
// update range based on their weight.
// each "upto" columns will set by sub-sum of weight.
mysql_query("update temp_wrs a, (
select id, (select sum(weight) from temp_wrs where id <= i.id) as subsum from temp_wrs i
) b
set a.upto = b.subsum
where a.id = b.id", $conn);
$result = array();
foreach($base_data as $val => $weight) {
$result[$val] = 0;
}
// do weighted random select ($sum * $times) times.
$times = 100;
$loop_count = $sum * $times;
for($i = 0; $i < $loop_count; $i++) {
$rand = rand(0, $sum-1);
// select the row which $rand pointing.
$rs = mysql_query('select * from temp_wrs where upto > '.$rand.' order by id limit 1', $conn);
$row = mysql_fetch_assoc($rs);
$result[$row['val']] += 1;
mysql_free_result($rs);
}
// clean up.
mysql_query("drop table if exists temp_wrs");
mysql_close($conn);
?>
<table>
<thead>
<tr>
<th>DATA</th>
<th>WEIGHT</th>
<th>ACTUALLY SELECTED<br />BY <?php echo $loop_count; ?> TIMES</th>
</tr>
</thead>
<tbody>
<?php foreach($base_data as $val => $weight) : ?>
<tr>
<th><?php echo $val; ?></th>
<td><?php echo $weight; ?></td>
<td><?php echo $result[$val]; ?></td>
</tr>
<?php endforeach; ?>
</tbody>
</table>
If you want to select N rows, repeat these steps for each selection:
* Re-calculate the sum.
* Reset the ranges (the "upto" column).
* Select the row that $rand points to.
* Exclude previously selected rows on each pass, e.g. where ... id not in (3, 5);
SELECT * FROM tablename ORDER BY -LOG(RAND()) / Multiplier;
Is the one which gives you the correct distribution.
SELECT * FROM tablename ORDER BY (RAND() * Multiplier);
Gives you the wrong distribution.
For example, there are two entries A and B in the table. A is with weight 100 while B is with weight 200.
For the first one (exponential random variable), it gives you Pr(A winning) = 1/3 while the second one gives you 1/4, which is not correct.
I wish I could show you the math. However, I do not have enough rep to post the relevant link.
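The two probabilities can be checked with a quick simulation. The sketch below is in Python rather than SQL so it runs without a database; the function name simulate and its parameters are made up for illustration. It draws the two sort keys for A (weight 100) and B (weight 200) many times and counts how often A would be the selected row.

```python
import math
import random


def simulate(trials=200_000, w_a=100, w_b=200, seed=1):
    """Estimate Pr(A is picked) under both ORDER BY expressions."""
    rng = random.Random(seed)
    exp_wins = 0   # A sorts first under -LOG(1 - RAND()) / weight (ascending)
    mult_wins = 0  # A has the larger RAND() * weight
    for _ in range(trials):
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        key_a = -math.log(1.0 - rng.random()) / w_a
        key_b = -math.log(1.0 - rng.random()) / w_b
        if key_a < key_b:
            exp_wins += 1
        if rng.random() * w_a > rng.random() * w_b:
            mult_wins += 1
    return exp_wins / trials, mult_wins / trials


p_exp, p_mult = simulate()
print(p_exp, p_mult)  # close to 1/3 and 1/4 respectively
```

The reason the first expression works: -LOG(U) / w for uniform U is an exponential random variable with rate w, and among independent exponentials the smallest one belongs to item i with probability w_i divided by the sum of all weights. For A and B that is 100 / 300 = 1/3.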
Whatever you do, it is going to be terrible because it will involve:
* Getting the total "weights" for all rows as ONE number (including applying the multiplier).
* Getting a random number between 0 and that total.
* Getting all entries and running along them, deducting each weight from the random number and choosing the entry at which the random number runs out.
On average you will run along half the table. Performance will be SLOW unless the table is small, in which case do it outside MySQL, in memory.
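The three steps above, done in memory, could look like the following Python sketch. It assumes the rows have already been fetched as (entry, multiplier) pairs and uses the question's convention that a Multiplier of 0 counts once, 1 counts twice, and so on; the function name pick_weighted is made up for illustration.

```python
import random


def pick_weighted(rows, rng=random):
    """Pick one entry; each row counts (multiplier + 1) times."""
    if not rows:
        raise ValueError("rows must not be empty")
    # Step 1: total weight as one number.
    total = sum(multiplier + 1 for _, multiplier in rows)
    # Step 2: random integer in [0, total).
    r = rng.randrange(total)
    # Step 3: walk the rows, deducting weights until r runs out.
    for entry, multiplier in rows:
        r -= multiplier + 1
        if r < 0:
            return entry
    raise AssertionError("unreachable: weights sum to total")


rows = [("a", 0), ("b", 1), ("c", 2)]  # effective weights 1, 2, 3
print(pick_weighted(rows))
```

With the sample rows, "c" should come up about half of the time, "b" a third, and "a" a sixth.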
The result of the pseudo-code (rand(1, num) % rand(1, num)) will get more toward 0 and less toward num. Subtract the result from num to get the opposite.
So if my application language is PHP, it should look something like this:
$arr = mysql_fetch_array(mysql_query(
'SELECT MAX(`Multiplier`) AS `max_mul` FROM tbl'
));
$MaxMul = $arr['max_mul']; // Holds the maximum value of the Multiplier column
$mul = $MaxMul - ( rand(1, $MaxMul) % rand(1, $MaxMul) );
mysql_query("SELECT * FROM tbl WHERE Multiplier=$mul ORDER BY RAND() LIMIT 1");
Explanation of the code above:
Fetch the highest value in the Multiplier column
calculate a random Multiplier value (weighted toward the maximum value in the Multiplier column)
Fetch a random row which has that Multiplier value
It's also achievable merely by using MySQL.
Proving that the pseudo-code (rand(1, num) % rand(1, num)) will weight toward 0:
Execute the following PHP code to see why (in this example, 16 is the highest number):
$v = array();
for($i=1; $i<=16; ++$i)
for($k=1; $k<=16; ++$k)
isset($v[$i % $k]) ? ++$v[$i % $k] : ($v[$i % $k] = 1);
foreach($v as $num => $times)
echo '<div style="margin-left:', $times ,'px">
times: ',$times,' # num = ', $num ,'</div>';
@ali's answer works great, but you cannot control how much the result skews toward higher or lower weights. You can change the multiplier, but that is not a very dynamic approach.
I extended the code by using POWER(weight,skewIndex) instead of weight. Values of skewIndex greater than 1 make higher weights appear more often, and values between 0 and 1 make them appear less often.
SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY -LOG(1-RAND())/POWER(weight,skewIndex) LIMIT 10) AS t2 ON t1.id = t2.id
You can analyze the query results with
SELECT AVG(weight) FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY -LOG(1-RAND())/POWER(weight,skewIndex) LIMIT 10) AS t2 ON t1.id = t2.id
For example, setting skewIndex to 3 gives me an average of 78%, while a skewIndex of 1 gives an average of 65%.
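The effect of the exponent can be explored without a database. The Python sketch below mimics ORDER BY -LOG(1-RAND())/POWER(weight,skewIndex) LIMIT 10 on an assumed table with one row per weight from 1 to 100 and reports the average weight of the selected rows; the function name and the table contents are assumptions, so the numbers are not meant to reproduce the figures above.

```python
import math
import random


def average_selected_weight(skew, trials=300, limit=10, seed=7):
    """Average weight of the first `limit` rows under the skewed sort key."""
    rng = random.Random(seed)
    weights = range(1, 101)  # assumed table: one row per weight 1..100
    total = 0.0
    for _ in range(trials):
        # Same key as the SQL: -LOG(1 - RAND()) / POWER(weight, skew).
        ordered = sorted(
            weights,
            key=lambda w: -math.log(1.0 - rng.random()) / w ** skew,
        )
        total += sum(ordered[:limit]) / limit
    return total / trials


for skew in (0, 1, 3):
    print(skew, round(average_selected_weight(skew), 1))
```

A skew of 0 ignores the weights entirely, so the average sits near the middle of the range, and larger exponents push it toward the heaviest rows.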