I would like to SELECT certain data out of my mysql DB. I am working with a php loop and a sql statement with a LIMIT and UNION.
Problem: The speed of my query is terrible. One UNION statement tooks 2-4 seconds. Due to the loop the Overall-Query takes 3 Minutes.
Is there a chance to optimize my query?
I tried to separate the "three" statements and merge the results. But this is not really faster. So I think that the UNION is not my problem.
PHP/SQL:
My code is running through two-foreach-loops. The code is working properly. But the performance is the problem.
$sql_country = "SELECT country FROM country_list";
foreach ($db->query($sql_country) as $row_country) { //first loop (150 entries)
$sql_color = "SELECT color FROM color_list";
foreach ($db->query($sql_color) as $row_color) { //second loop (10 entries)
$sql_all = "(SELECT ID, price FROM company
WHERE country = '".$row_country['country']."'
AND color = '".$row_color['color']."'
AND price BETWEEN 2.5 AND 4.5
order by price DESC LIMIT 2)
UNION
(SELECT ID, price FROM company
WHERE country = '".$row_country['country']."'
AND color = '".$row_color['color']."'
AND price BETWEEN 5.5 AND 8.2
order by price DESC LIMIT 2)
UNION
(SELECT ID, price FROM company
WHERE country = '".$row_country['country']."'
AND color = '".$row_color['color']."'
AND price BETWEEN 8.5 AND 10.8
order by price DESC LIMIT 2)";
foreach ($db->query($sql_all) as $row_all) {
$shopID[] = $row_all['ID']; //I just need these IDs
}
}
}
Do you have any idea or hints to get this faster?
An index on (country, color, price, ID) should improve the performance of single queries from seconds to a couple of milliseconds or even less. But you still have the problem of executing 1500 queries. Depending on your system, a single query execution can add an overhead of about 10 ms, which would add up to 15 seconds in your case. You need to find a way to minimize the number of queries - In best case to a single query.
For low limits (like 2 in your case), you can combine multiple LIMIT 1 subqueries with different offsets. I would generate such a query dynamically.
$priceRanges = [
['2.5', '4.5'],
['5.5', '8.2'],
['8.5', '10.8'],
];
$limit = 2;
$offsets = range(0, $limit - 1);
$queryParts = [];
foreach ($priceRanges as $range) {
$rangeFrom = $range[0];
$rangeTo = $range[1];
foreach ($offsets as $offset) {
$queryParts[] = "
select (
select ID
from company cmp
where cmp.country = cnt.country
and cmp.color = clr.color
and cmp.price between {$rangeFrom} AND {$rangeTo}
order by cmp.price desc
limit 1
offset {$offset}
) as ID
from country_list cnt
cross join color_list clr
having ID is not null
";
}
}
$query = implode(' UNION ALL ', $queryParts);
This will generate a quite long UNION query. You can see a PHP demo on rexester.com and SQL demo on db-fiddle.com.
I can't guarantee it will be any faster. But it's worth a try.
Related
I could not get rid of a problem and want your help.
This is what I want to do briefly;
There are job postings in Table A, and these job posts have ratings in table B. When I list the job postings, I want to list the job postings by the highest percentage percentage.
$query = $db->query("SELECT job_listing.id,
FROM job_listing
LEFT JOIN job_order_ratings ON job_order_ratings.job_id = job_listing.id
WHERE job_listing.job_status = 2
ORDER BY *** ASC")->fetchAll(PDO::FETCH_ASSOC);
score table
I normally calculate the score rating of a job this way. I want to list the points according to the total rate while listing
public function rateTotal($rate1, $rate2, $rate3, $count= '')
{
$rate_total = $rate1 + $rate2 + $rate3;
$total = $rate_total / $count;
return number_format(round(($total / 3), 1), 1, '.', '');
}
Since there are three ratings, I divide it into the last three
When I list job postings in sql query, I want it to list by percentage. It has to calculate just like I did in the rateTotal function.
I'm sorry for my bad english
$sql = "SELECT
job_listing.id,
FORMAT((SUM(column1 + column2 + column3)/ COUNT(column)) / 3, 2) AS 'rating
FROM job_listing
LEFT JOIN job_order_ratings ON job_order_ratings.job_id =
job_listing.id
WHERE job_listing.job_status = 2 ORDER BY job_order_ratings.field DESC"
Did you try this way?
When u use ASC (ascending), you will order numbers from smaller to bigger, if you use DESC (descending) is from bigger to smaller number 10,9,8,7,6,5...
I'm trying to display and sort an array by an average created using data from a database. I'm retrieving three variables from the database and creating an average from these values. This value is then placed inside a new array to be sorted along with the rest of the database data.
Am I right in thinking that having the SQL query inside the loop isn't a great idea? (Performance issue?)
Is there any alternative that's available? I've attached the code below:
^ database connection/query string to retrieve all data...
$result = $stmt_business_list->fetchAll(PDO::FETCH_ASSOC);
$items = array();
foreach($result as $row){
$single_business_id = $row['id'];
$name = $row['name'];
//Query to get ALL the service, value and quality ratings for certain business
$test_query = "SELECT * FROM rating WHERE business_id = $single_business_id";
$test_query_stmt = $dbh->prepare($test_query);
$test_query_stmt->execute();
$test_row = $test_query_stmt->fetchAll(PDO::FETCH_ASSOC);
$total_value = $total_quality = $total_service = 0;
foreach($test_row as $review)
{
$total_value += $review['value'];
$total_quality += $review['quality'];
$total_service += $review['service'];
}
$bayesian_value = (($set_site_average_review_count * $set_site_average_review_score) + $total_value) / ($set_site_average_review_count + $business_review_count);
$bayesian_quality = (($set_site_average_review_count * $set_site_average_review_score) + $total_quality) / ($set_site_average_review_count + $business_review_count);
$bayesian_service = (($set_site_average_review_count * $set_site_average_review_score) + $total_service) / ($set_site_average_review_count + $business_review_count);
$average_bayesian_rating = ($bayesian_value + $bayesian_quality + $bayesian_service) / 3;
$average_bayesian_rating = $average_bayesian_rating;
array_push($items, array(
"id"=>"$single_business_id",
"name"=>"$name",
"value"=>"$total_value",
"quality"=>"$total_quality",
"service"=>"$total_service",
"average"=>"$average_bayesian_rating"));
echo
'Name: '.$name.'<br>
Value: '.$total_value.'<br>
Quality: '.$total_quality.'<br>
Service: '.$total_service.'<br>
Average: '.$average_bayesian_rating.'<br><br>';
}
}
The page will be split up by a separate pagination script and will only display 6 objects at a time, but over time this may change so I do have an eye on performance as much as I can.
SQL aggregate queries are made for this kind of thing.
Use this query to summarize the results
SELECT b.name, b.id,
SUM(value) total_value,
SUM(quality) total_quality,
SUM(service) total_service,
COUNT(*) review_count,
avg_reviews_per_biz
FROM business b
JOIN ratings r ON b.id = r.business_id
JOIN (
SELECT COUNT(DISTINCT business_id) / COUNT(*) avg_reviews_per_biz
FROM ratings
) a ON 1=1
GROUP BY b.name, b.id, avg_review_per_biz
This will give you one row per business showing the summed ratings and the number of ratings. This result set will have the following columns
name business name
id business id
total_value sum of value ratings for that business
total_quality sum of quality ditto
total_service sum of service ditto
review_count number of reviews for business "id"
avg_reviews_per_biz avg number of reviews per business
The last column has the same value for all rows of your query.
You can then loop over these row one business at a time doing your statistical computations.
I can't tell from your question where you're getting variables like $set_site_average_review_count, so I can't help with those computations.
You'll find that SQL aggregate querying is very powerful indeed.
I would like get number of records in a table then divide them by 4, after dividing them by 4 i want to create sql statements with limit ranges based on my result. For example I have a table with 8 records I divide by 4, I will create 2 sql statements with a limit range like limit 0,4 and limit 4,8
Final results will look like
Select * from prop where id=123 LIMIT 0,4
Select * from prop where id=123 LIMIT 4,8
My approach was to have for loop which will count the number of sql statements to be made.
Then in the loop: first circle 0-4 and second will be 4-8
Am struggling on the limit 0-4 and limit 4-8
PHP script
include('connect.php');
$query_1 = "Select COUNT(*) as Total from prop where ref = 'SB2004'";
$results_query_1 = mysql_query($query_1);
while($row_query_1 = mysql_fetch_array($results_query_1))
{
$cnt = $row_query_1['Total'];
}
echo $cnt;
echo "<br>";
$num_grps = 0;
if ($cnt % 4 == 0 )
{
echo $num_grps = $cnt / 4 ;
}
$count_chk= $num_grps * 4;
for ($i=1;$i<=$num_grps;$i++)
{
//for loop for range
for()
{
$range = '0,4';
echo "SELECT prop_ref from prop limit".$range;
}
}
Either you've not understood the problem or haven't explained it very well.
The most immediate problem here is that you have misunderstood the syntax for the LIMIT clause. The first argument specifies the offset to start at and the second defines the number of rows to return, hence LIMIT 4,8 will return 8 rows (assuming there are 12 or more rows in the dataset).
The next issue is that you've not said if the results need to be reproducible - e.g. if you have rows with primary keys 1 and 2, should these always be returned in the same query. In the absence of an explicit ORDER BY clause, the rows will be returned based on the order in which they are found by the query.
The next issue is that you've not explained how you want to deal with the last case when the total number of rows is not an even multiple of 4.
The code you've provided counts the number of rows where ref = 'SB2004' but then creates queries which are not filtered - why?
The code you've provided does not change the limit in the queries - why?
The next issue is that there is never a good reason for running SELECT queries inside a loop like this. You didn't exlpain what you intend doing with the select queries. But based on the subsequent update....
include('connect.php');
$query_1 = "Select COUNT(*) as Total from prop where ref = 'SB2004'";
$cnt = mysql_fetch_assoc(mysql_query($query_1));
$blocks=$cnt['Total']/4 + (0 == $cnt['Total'] % 4 ? 0 : 1);
$qry2="SELECT * FROM prop where ref='SB2004' ORDER BY primary_key";
$res=mysql_fetch_assoc($qry2);
for ($x=0; $x<$blocks; $x++) {
print "<div>\n$block<br />\n";
for ($y=0; $y<4; $y++) {
print implode(",", #mysql_fetch_assoc($res)). "\n";
}
print "</div>\n";
}
It's trivial to refine this further to only issue a single query to the database.
If you really must generate individual SELECTs....
include('connect.php');
$query_1 = "Select COUNT(*) as Total from prop where ref = 'SB2004'";
$cnt = mysql_fetch_assoc(mysql_query($query_1));
$blocks=$cnt['Total']/4 + (0 == $cnt['Total'] % 4 ? 0 : 1);
for ($x=0; $x<$blocks; $x++) {
$y=$x*4;
print "SELECT * FROM prop where ref='SB2004'
ORDER BY primary_key LIMIT $y,4<br />\n"
}
i have this code:
while ($sum<16 || $sum>18){
$totala = 0;
$totalb = 0;
$totalc = 0;
$ranka = mysql_query("SELECT duration FROM table WHERE rank=1 ORDER BY rand() LIMIT 1");
$rankb = mysql_query("SELECT duration FROM table WHERE rank=2 ORDER BY rand() LIMIT 1");
$rankc = mysql_query("SELECT duration FROM table WHERE rank=3 ORDER BY rand() LIMIT 1");
while ($rowa = mysql_fetch_array($ranka)) {
echo $rowa['duration'] . "<br/>";
$totala = $totala + $rowa['duration'];
}
while ($rowb = mysql_fetch_array($rankb)) {
$totalb = $totalb + $rowb['duration'];
}
while ($rowc = mysql_fetch_array($rankc)) {
$totalc = $totalc + $rowc['duration'];
}
$sum=$totala+$totalb+$totalc;
}
echo $sum;
It works fine, But the problem is until "$sum=16" the "echo $rowa['duration']" executes, the question is, is there a away to "echo" only the latest executed code in the "while ($rowa = mysql_fetch_array($ranka))" i this while loop?
Because most of the times returns all the numbers until the "$sum=16"
You are explicitly echoing the $rowa['duration'] in the first inner while loop. If you only want to print the last duration from the $ranka set, simple change the echo to $rowa_duration = $rowa['duration'] then echo it outside the loop.
while ($rowa = mysql_fetch_array($ranka)) {
$rowa_duration = $rowa['duration'];
$totala = $totala + $rowa['duration'];
}
echo $rowa_duration . '<br/>';
What you are doing there is bad on multiple levels. And your english horrid. Well .. practice makes perfect. You could try joining ##php chat room on FreeNode server. That would improve both your english and php skills .. it sure helped me a lot. Anyway ..
The SQL
First of all, to use ORDER BY RAND() is extremely ignorant (at best). As your tables begin the get larger, this operation will make your queries slower. It has n * log2(n) complexity, which means that selecting querying table with 1000 entries will take ~3000 times longer then querying table with 10 entries.
To learn more about it , you should read this blog post, but as for your current queries , the solution would look like:
SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
rank = 1
LIMIT 1
This would select random duration from the table.
But since you you are actually selecting data with 3 different ranks ( 1, 2 and 3 ), it would make sense to create a UNION of three queries :
SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
rank = 1
LIMIT 1
UNION ALL
SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
rank = 2
LIMIT 1
UNION ALL
SELECT duration
FROM table
JOIN (SELECT CEIL(RAND()*(SELECT MAX(id) FROM table)) AS id) as choice
WHERE
table.id >= choice.id
rank = 3
LIMIT 1
Look scary, but it actually will be faster then what you are currently using, and the result will be three entries from duration column.
PHP with SQL
You are still using the old mysql_* functions to access database. This form of API is more then 10 years old and should not be used, when writing new code. The old functions are not maintained (fixed and/or improved ) anymore and even community has begun the process of deprecating said functions.
Instead you should be using either PDO or MySQLi. Which one to use depends on your personal preferences and what is actually available to you. I prefer PDO (because of named parameters and support for other RDBMS), but that's somewhat subjective choice.
Other issue with you php/mysql code is that you seem to pointlessly loop thought items. Your queries have LIMIT 1, which means that there will be only one row. No point in making a loop.
There is potential for endless loop if maximum value for duration is 1. At the start of loop you will have $sum === 15 which fits the first while condition. And at the end that loop you can have $sum === 18 , which satisfies the second loop condition ... and then it is off to the infinity and your SQL server chokes.
And if you are using fractions for duration, then the total value of 3 new results needs to be even smaller. Just over 2. Start with 15.99 , ends with 18.01 (that's additional 2.02 in duration or less the 0.7 per each). Again .. endless loop.
Suggestion
Here is how i would do it:
$pdo = new PDO('mysql:dbname=my_db;host=localhost', 'username', 'password');
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$sum = 0;
while ( $sum < 16 )
{
$query = 'that LARGE query above';
$statement = $pdo->prepare( $query );
if ( $statement->execute() )
{
$data = $statement->fetchAll( PDO::FETCH_ASSOC );
$sum += $data[0]['duration']+$data[1]['duration']+$data[2]['duration'];
}
}
echo $data[0]['duration'];
This should do what your code did .. or at least, what i assume, was your intentions.
I want to show a random record from the database. I would like to be able to show X number of random records if I choose. Therefore I need to select the top X records from a randomly selected list of IDs
(There will never be more than 500 records involved to choose from, unless the earth dramatically increases in size. Currently there are 66 possibles.)
This function works, but how can I make it better?
/***************************************************/
/* RandomSite */
//****************/
// Returns an array of random site IDs or NULL
/***************************************************/
function RandomSite($intNumberofSites = 1) {
$arrOutput = NULL;
//open the database
GetDatabaseConnection('dev');
//inefficient
//$strSQL = "SELECT id FROM site_info WHERE major <> 0 ORDER BY RAND() LIMIT ".$intNumberofSites.";";
//Not wonderfully random
//$strSQL = "SELECT id FROM site_info WHERE major <> 0 AND id >= (SELECT FLOOR( COUNT(*) * RAND()) FROM site_info ) ORDER BY id LIMIT ".$intNumberofSites.";";
//Manual selection from available pool of candidates ?? Can I do this better ??
$strSQL = "SELECT id FROM site_info WHERE major <> 0;";
if (is_numeric($intNumberofSites))
{
//excute my query
$result = #mysql_query($strSQL);
$i=-1;
//create an array I can work with ?? Can I do this better ??
while ($row = mysql_fetch_array($result, MYSQL_NUM))
{
$arrResult[$i++] = $row[0];
}
//mix them up
shuffle($arrResult);
//take the first X number of results ?? Can I do this better ??
for ($i=0;$i<$intNumberofSites;$i++)
{
$arrOutput[$i] = $arrResult[$i];
}
}
return $arrOutput;
}
UPDATE QUESTION:
I know about the ORDER BY RAND(), I just don't want to use it because there are rumors it isn't the best at scaling and performance. I am being overly critical of my code. What I have works, ORDER BY RAND() works, but can I make it better?
MORE UPDATE
There are holes in the IDs. There is not a ton of churn, but any churn that happens needs to be approved by our team, and therefore could handled to dump any caching.
Thanks for the replies!
Why not use the Rand Function in an orderby in your database query? Then you don't have to get into randomizing etc in code...
Something like (I don't know if this is legal)
Select *
from site_info
Order by Rand()
LIMIT N
where N is the number of records you want...
EDIT
Have you profiled your code vs. the query solution? I think you're just pre-optimizing here.
If you dont want to select with order by rand().
Instead of shuffeling, use array_rand on the result:
$randKeys = array_rand($arrResult, $intNumberofSites);
$arrOutput = array_intersect_key(array_flip($randKeys), $arrResult);
edit: return array of keys not new array with key => value
Well, I don't think that ORDER BY RAND() would be that slow in a table with only 66 rows, but you can look into a few different solutions anyway.
Is the data really sparse and/or updated often (so there are big gaps in the ids)?
Assuming it's not very sparse, you could select the max id from the table, use PHP's built-in random function to pick N distinct numbers between 1 and the max id, and then attempt to fetch the rows with those ids from the table. If you get back less rows than you picked numbers, get more random numbers and try again, until you have the number of rows needed. This may not be particularly fast either.
If the data is sparse, I would set up a secondary "id-type" column that you make sure is sequential. So if there are 66 rows in the table, ensure that the new column contains the values 1-66. Whenever rows are added to or removed from the table, you will have to do some work to adjust the values in this column. Then use the same technique as above, picking random IDs in PHP, but you don't have to worry about the "missing ID? retry" case.
Here are the three functions I wrote and tested
My answer
/***************************************************/
/* RandomSite1 */
//****************/
// Returns an array of random rec site IDs or NULL
/***************************************************/
function RandomSite1($intNumberofSites = 1) {
$arrOutput = NULL;
GetDatabaseConnection('dev');
$strSQL = "SELECT id FROM site_info WHERE major <> 0;";
if (is_numeric($intNumberofSites))
{
$result = #mysql_query($strSQL);
$i=-1;
while ($row = mysql_fetch_array($result, MYSQL_NUM)) {
$arrResult[$i++] = $row[0]; }
//mix them up
shuffle($arrResult);
for ($i=0;$i<$intNumberofSites;$i++) {
$arrOutput[$i] = $arrResult[$i]; }
}
return $arrOutput;
}
JPunyon and many others
/***************************************************/
/* RandomSite2 */
//****************/
// Returns an array of random rec site IDs or NULL
/***************************************************/
function RandomSite2($intNumberofSites = 1) {
$arrOutput = NULL;
GetDatabaseConnection('dev');
$strSQL = "SELECT id FROM site_info WHERE major<>0 ORDER BY RAND() LIMIT ".$intNumberofSites.";";
if (is_numeric($intNumberofSites))
{
$result = #mysql_query($strSQL);
$i=0;
while ($row = mysql_fetch_array($result, MYSQL_NUM)) {
$arrOutput[$i++] = $row[0]; }
}
return $arrOutput;
}
OIS with a creative solution meeting the intend of my question.
/***************************************************/
/* RandomSite3 */
//****************/
// Returns an array of random rec site IDs or NULL
/***************************************************/
function RandomSite3($intNumberofSites = 1) {
$arrOutput = NULL;
GetDatabaseConnection('dev');
$strSQL = "SELECT id FROM site_info WHERE major<>0;";
if (is_numeric($intNumberofSites))
{
$result = #mysql_query($strSQL);
$i=-1;
while ($row = mysql_fetch_array($result, MYSQL_NUM)) {
$arrResult[$i++] = $row[0]; }
$randKeys = array_rand($arrResult, $intNumberofSites);
$arrOutput = array_intersect_key($randKeys, $arrResult);
}
return $arrOutput;
}
I did a simple loop of 10,000 iterations where I pulled 2 random sites. I closed and opened a new browser for each function, and cleared the cached between run. I ran the test 3 times to get a simple average.
NOTE - The third solution failed at pulling less than 2 sites as the array_rand function has different output if it returns a set or single result. I got lazy and didn't fully implement the conditional to handle that case.
1 averaged: 12.38003755 seconds
2 averaged: 12.47702177 seconds
3 averaged: 12.7124153 seconds
mysql_query("SELECT id FROM site_info WHERE major <> 0 ORDER BY RAND() LIMIT $intNumberofSites")
EDIT
Damn, JPunyon was a bit quicker :)
Try this:
SELECT
#nv := #min + (RAND() * (#max - #min)) / #lc,
(
SELECT
id
FROM site_info
FORCE INDEX (primary)
WHERE id > #nv
ORDER BY
id
LIMIT 1
),
#max,
#min := #nv,
#lc := #lc - 1
FROM
(
SELECT #min := MIN(id)
FROM site_info
) rmin,
(
SELECT #max := MAX(id)
FROM site_info
) rmax,
(
SELECT #lc := 5
) l,
site_info
LIMIT 5
This will select a random ID on each iteration using index, in descending order.
There is slight chance, though, that you get less results that you wanted, as it gives no second chance to the missed id's.
The more percent of rows you select, the bigger is the chance.
I would simply use the rand() function (I assume you are using MySQL)...
SELECT id, rand() as rand_idx FROM site_info WHERE major <> 0 ORDER BY rand_idx LIMIT x;
I'm with JPunyon. Use ORDER BY RAND() LIMIT $N. I think you'll get a bigger performance hit from $arrResult having and shuffling so many (unused) entries than from using the MySQL RAND() function.
function getSites ( $numSites = 5 ) {
// Sanitize $numSites if necessary
$result = mysql_query("SELECT id FROM site_info WHERE major <> 0 "
."ORDER BY RAND() LIMIT $numSites");
$arrResult = array();
while ( $row = mysql_fetch_array($result,MYSQL_NUM) ) {
$arrResult[] = $row;
}
return $arrResult;
}