I am currently working on a project that requires me to scan the Public Whip raw data and return a list of MPs' names (those who have voted for a policy that matches the keywords that have been input, e.g. "fox hunting"). The current SQL query takes about 30 seconds to finish executing, which is way too long.
This is the SQL query that looks in the "distance" table and the "policy" table. (This is what is taking too long to execute)
$sql = "SELECT DISTINCT distance.mp_id from distance WHERE distance.distance < 0.2 AND distance.dream_id IN (SELECT dream_id from policy WHERE UPPER(policy.title) LIKE UPPER('%".$keyword."%')) ORDER BY distance.distance LIMIT 5";
This is the rest of the code, which just echoes out the MP names:
$results = mysql_query($sql);
echo "<ul>";
while ($row = mysql_fetch_array($results)) {
$mpid = $row['mp_id'];
$sql = "SELECT mp.first_name,mp.last_name FROM mp WHERE mp_id = ".$mpid;
$result = mysql_query($sql);
$result = mysql_fetch_assoc($result);
echo "<li>".$result['first_name']." ".$result['last_name']."</li>\n";
}
echo "</ul>";
This is your query:
SELECT DISTINCT distance.mp_id
from distance
WHERE distance.distance < 0.2 AND
distance.dream_id IN (SELECT dream_id
from policy
WHERE UPPER(policy.title) LIKE UPPER('%".$keyword."%')
)
ORDER BY distance.distance
LIMIT 5;
In some versions of MySQL, IN with a subquery is inefficient. Let me also assume that mp_id is unique in the distance table. This query might work better:
SELECT d.mp_id
from distance d
WHERE d.distance < 0.2 AND
exists (select 1
from policy p
where UPPER(p.title) LIKE UPPER('%".$keyword."%') and
p.dream_id = d.dream_id
)
ORDER BY d.distance
LIMIT 5;
This query would be further improved by having an index on policy(dream_id) and possibly distance(distance).
Depending on how large the policy table is, one major impediment to performance is the expression UPPER(policy.title) LIKE UPPER('%".$keyword."%'). If you really mean equality, then use equality and not like with wildcards. If you are really storing multiple keywords in the title column, then consider either breaking these out into a separate table or using full text search.
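As a rough sketch of both suggestions (the index names here are illustrative, and the full-text index requires MyISAM or InnoDB on MySQL 5.6+; note that MATCH ... AGAINST has word-based semantics rather than substring semantics):
CREATE INDEX idx_policy_dream ON policy (dream_id);
CREATE INDEX idx_distance_distance ON distance (distance);
ALTER TABLE policy ADD FULLTEXT INDEX ft_policy_title (title);

SELECT d.mp_id
from distance d
WHERE d.distance < 0.2 AND
      exists (select 1
              from policy p
              where MATCH(p.title) AGAINST('fox hunting') and
                    p.dream_id = d.dream_id
             )
ORDER BY d.distance
LIMIT 5;
The per-row name lookup in the while loop could likewise be folded into this query with a join against the mp table on mp_id, returning first_name and last_name in the same result set.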
On my website, I want to use a lot of different data from my database. Currently, I'm using four queries to gather different data. But is there a way to make it more efficient by putting them into one big query? And how would I do that?
Edit: So the answer was simply to put all the queries together into one, and to do as much data manipulation as possible in the database queries rather than in PHP.
$qry = "SELECT COUNT(*) cnt,
AVG(level) avg_lvl,
SUM(IF(onlinestatus=1, 1, 0)) online_cnt,
(SELECT Max(time) FROM refreshes) refresh_time
FROM players";
foreach ($db->query($qry) as $row){
$amount_total = $row['cnt'];
$average_level = floor($row['avg_lvl']);
$online_amount = $row['online_cnt'];
$milliseconds = $row['refresh_time'] + 1800000;
$update_time = DateTime::createFromFormat('U', intval($milliseconds / 1000));
}
You could combine all queries into one, like this:
$qry = "SELECT COUNT(*) cnt,
AVG(level) avg_lvl,
SUM(IF(onlinestatus=1, 1, 0)) online_cnt,
(SELECT Max(time) FROM refreshes) refresh_time
FROM rookstayers";
foreach ($db->query($qry) as $row){
$amount_total = $row['cnt'];
$level = $row['avg_lvl'];
$online_amount = $row['online_cnt'];
$milliseconds = $row['refresh_time'] + 1800000;
$update_time = DateTime::createFromFormat('U', intval($milliseconds / 1000));
}
The last query you have seems to assume there is only one record in the result, as the loop would overwrite the previous result in each iteration. And as there is no order by in that query, it would be a bit of a gamble what the outcome would be. So I have taken the most recent time from the table in case there are multiple records there.
Note that the above loop only executes once, as the query is guaranteed to return exactly one row.
The first and third queries can be combined into one:
select count(*) as num, sum(onlinestatus = 1) as numOnline
from rookstayers;
The second should be an aggregation:
select level, count(*) as cnt
from rookstayers
group by level;
The fourth is also an aggregation; I'm not sure exactly what the data looks like, but it seems to be something like:
select sum(time + 1800000)
from refreshes;
In general, you should do as much data manipulation in the database as you can. That is what databases are designed for.
EDIT:
The first, second, and third can be combined into:
select count(*) as num, sum(onlinestatus = 1) as numOnline,
avg(level) as avgLevel
from rookstayers;
I have a project in PHP + MySQL (over 2,000,000 rows). Please look at this PHP code.
<?php
for($i=0;$i<20;$i++)
{
$start = rand(1,19980);
$select_images_url_q = "SELECT * FROM photo_gen WHERE folder='$folder' LIMIT $start,2 ";
$result_select = (mysql_query($select_images_url_q));
while($row = mysql_fetch_array($result_select))
{
echo '<li class="col-lg-2 col-md-3 col-sm-3 col-xs-4" style="height:150px">
<img class="img-responsive" src="http://static.gif.plus/'.$folder.'/'.$row['code'].'_s.gif">
</li>';
}
}
?>
This code works very slowly at the $start = rand(1,19980); position. Please help: how can I make the SELECT request use MySQL's random function instead? Thank you.
Depending on what your code is doing with $folder, you may be vulnerable to SQL injection.
For better security, consider moving to PDO or MySQLi and using prepared statements. I wrote a library called EasyDB to make it easier for developers to adopt better security practices.
The fast, sane, and efficient way to select N distinct random elements from a database is as follows:
1. Get the number of rows that match your condition (i.e. WHERE folder = ?).
2. Generate a random number between 0 and this number.
3. Select a row with a given offset like you did.
4. Store the ID of the previously generated row in an ever-growing list to exclude from the results, and decrement the number of rows.
An example that uses EasyDB is as follows:
// Connect to the database here:
$db = \ParagonIE\EasyDB\Factory::create(
'mysql:host=localhost;dbname=something',
'username',
'putastrongpasswordhere'
);
// Maintain an array of previous record IDs in $exclude
$exclude = array();
$count = $db->single('SELECT count(id) FROM photo_gen WHERE folder = ?', array($folder));
// Select _up to_ 40 values. If we have less than 40 in the folder, stop
// when we've run out of photos to load:
$max = $count < 40 ? $count : 40;
// The loop:
for ($i = 0; $i < $max; ++$i) {
// The maximum value will decrease each iteration, which makes
// sense given that we are excluding one more result each time
$r = mt_rand(0, ($count - $i - 1));
// Dynamic query
$qs = "SELECT * FROM photo_gen WHERE folder = ?";
// We add AND id NOT IN (2,6,7,19, ...) to prevent duplicates:
if ($i > 0) {
$qs .= " AND id NOT IN (" . implode(', ', $exclude) . ")";
}
$qs .= "ORDER BY id ASC LIMIT ".$r.", 1";
$row = $db->row($qs, $folder);
/**
* Now you can operate on $row here. Feel free to copy the
* contents of your while($row=...) loop in place of this comment.
*/
// Prevent duplicates
$exclude []= (int) $row['id'];
}
Gordon's answer suggests using ORDER BY RAND(), which in general is a bad idea and can make your queries very slow. Furthermore, although he says that you shouldn't need to worry about there being less than 40 rows (presumably, because of the probability involved), this will fail in edge cases.
A quick note about mt_rand(): It's a biased and predictable random number generator with only 4 billion possible seeds. If you want better results, look into random_int() (PHP 7 only, but I'm working on a compatibility layer for PHP 5 projects. See the linked answer for more information.)
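For illustration, a hedged sketch of choosing the offset with random_int() when it exists, falling back to mt_rand() on older PHP ($count and $i are the variables from the loop above):
// Prefer a CSPRNG-backed integer (PHP 7+, or PHP 5 with the compatibility layer)
if (function_exists('random_int')) {
    $r = random_int(0, $count - $i - 1);
} else {
    // Fallback: biased and predictable, but available everywhere
    $r = mt_rand(0, $count - $i - 1);
}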
Actually, even though the table has 2+ million rows, I'm guessing that a given folder has many fewer. Hence, this should be reasonable with an index on photo_gen(folder):
SELECT *
FROM photo_gen
WHERE folder = '$folder'
ORDER BY rand()
LIMIT 40;
If a folder can still have tens or hundreds of thousands of examples, I would suggest a slight variation:
SELECT pg.*
FROM photo_gen pg cross join
     (select count(*) cnt from photo_gen where folder = '$folder') as cnt
WHERE folder = '$folder' and
rand() < 500 / cnt
ORDER BY rand()
LIMIT 40;
The WHERE expression should let through about 500 rows (subject to the vagaries of sample variation). There is a very high probability that at least 40 of them will survive, so you don't need to worry about it. The final sort should be fast.
There are definitely other methods, but they are complicated by the where clause. The index is probably the key thing you need for improved performance.
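For completeness, a sketch of that index (the name idx_photo_gen_folder is just an illustrative choice):
CREATE INDEX idx_photo_gen_folder ON photo_gen (folder);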
It's better to compose your SQL query (as a string in PHP) once and then execute it just once.
Or you could use this way to select values if it fits your case: Select n random rows from SQL Server table
I have been running a foreach loop 1000 times on a PHP page. The code inside the foreach loop looks like this:
$first = mysql_query("SELECT givenname FROM first_names order by rand() LIMIT 1");
$first_n = mysql_fetch_array($first);
$first_name = $first_n['givenname'];
$last = mysql_query("SELECT surname FROM last_name order by rand() LIMIT 1");
$last_n = mysql_fetch_array($last);
$last_name = $last_n['surname'];
$first_lastname = $first_name . " " . $last_name;
$add = mysql_query("SELECT streetaddress FROM user_addresss order by rand() LIMIT 1");
$addr = mysql_fetch_array($add);
$address = $addr['streetaddress'];
$unlisted = "unlisted";
$available = "available";
$arr = array(
$first_lastname,
$address,
$unlisted,
$available
);
Then I have been using the array_rand function to get a randomized value each time the loop runs:
<td><?php echo $arr[array_rand($arr)] ?></td>
So loading the PHP page is taking a really long time. Is there a way I could optimize this code? I need a unique value each time the loop runs.
The problem is not your PHP foreach loop. If you order your MySQL table by RAND(), you are making a serious mistake. Let me explain to you what happens when you do this.
Every time you make a MySQL request, MySQL will attempt to map your search parameters (WHERE, ORDER BY) to indices to cut down on the data read. It will then load the relevant info in memory for processing. If the info is too large, it will default to writing it to disk and reading from disk to perform the comparison. You want to avoid disk reads at all costs as they are inefficient, slow, repetitive and can sometimes be flat-out wrong under specific circumstances.
When MySQL finds an index that is possible to be used, it will load the index table instead. An index table is a hash table between memory location and the value of the index. So, for instance, the index table for a primary key looks like this:
id location
1 0 bytes in
2 17 bytes in
3 34 bytes in
This is extremely efficient as even very large index tables can fit in tiny amounts of memory.
Why am I talking about indices? Because by using RAND(), you are preventing MySQL from using them. ORDER BY RAND() forces MySQL to create a new random value for each row. This requires MySQL to copy all your table data into what is called a temporary table, and to add a new field with the RAND() value. This table will be too big to store in memory, so it will be stored on disk.
When you tell MySQL to ORDER BY RAND(), and the table is created, MySQL will then compare every single row by pairs (MySQL sorting uses quicksort). Since the rows are too big, you're looking at quite a few disk reads for this operation. When it is done, it returns, and you get your data, at a huge cost.
There are plenty of ways to prevent this massive overhead SNAFU. One of them is to generate a random value between the minimum and maximum ID with RAND() and select the first row whose ID is at or above it, with LIMIT 1, as sketched below. This does not require the creation of an extra field. There are plenty of similar Stack Overflow questions.
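A minimal sketch of that technique, assuming an AUTO_INCREMENT id column on photo_gen (gaps in the id sequence will bias the selection somewhat):
SELECT p.*
FROM photo_gen p
JOIN (SELECT FLOOR(MIN(id) + RAND() * (MAX(id) - MIN(id))) AS rid
      FROM photo_gen) r
  ON p.id >= r.rid
ORDER BY p.id
LIMIT 1;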
It has already been explained why ORDER BY RAND() should be avoided, so I simply provide a way to do it with some faster queries.
First get a random number based on your table size:
SELECT FLOOR(RAND()*COUNT(*)) FROM first_names
Second, use the random number in a LIMIT clause:
SELECT * FROM first_names LIMIT $pos,1
Unfortunately I don't think there is any way to combine the two queries into one.
Also you can do a SELECT COUNT(*) FROM first_names, store the number, and generate random $pos in PHP as many times as you like.
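A rough sketch of that approach, kept in the question's legacy mysql_* style (variable names are illustrative):
// Count once, outside the loop
$result = mysql_query("SELECT COUNT(*) FROM first_names");
$row = mysql_fetch_row($result);
$total = (int) $row[0];
for ($i = 0; $i < 1000; $i++) {
    // Generate the offset in PHP; no ORDER BY RAND() needed
    $pos = mt_rand(0, $total - 1);
    $res = mysql_query("SELECT givenname FROM first_names LIMIT $pos,1");
    $name = mysql_fetch_array($res);
    // ... use $name['givenname'] ...
}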
You should switch to using either mysqli or PDO if your host supports it, but something like this should work. You will have to decide what to do if you don't have enough records in one of the tables, though (array_pad, or wrap the indexes and restart).
function getRandomNames($qty){
$qty = (int)$qty;
$fnames = array();
$lnames = array();
$address = array();
$sel =mysql_query("SELECT givenname FROM first_names order by rand() LIMIT ".$qty);
while ($rec = mysql_fetch_array($sel)){$fnames[] = $rec[0]; }
$sel =mysql_query("SELECT surname FROM last_name order by rand() LIMIT ".$qty);
while ($rec = mysql_fetch_array($sel)){ $lnames[] = $rec[0]; }
$sel =mysql_query("SELECT streetaddress FROM user_addresss order by rand() LIMIT ".$qty);
while ($rec = mysql_fetch_array($sel)){ $address[] = $rec[0]; }
// lets stitch the results together
$results = array();
for($x = 0; $x < $qty; $x++){
$results[] = array("given_name"=>$fnames[$x], "surname"=>$lnames[$x], "streetaddress"=>$address[$x]);
}
return $results;
}
Hope this helps
UPDATE
Based on Sébastien Renauld's answer, a more complete solution may be to structure the queries more like this (MySQL does not allow LIMIT directly inside an IN subquery, hence the extra derived table):
"SELECT givenname FROM first_names WHERE id IN (SELECT id FROM (SELECT id FROM first_names ORDER BY rand() LIMIT ".$qty.") AS t)";
In part of my page I have lots of small queries, probably about six altogether, grabbing data from different tables. As an example:
$sql_result = mysql_query("SELECT * FROM votes WHERE voted_on='$p_id' AND vote=1", $db);
$votes_up = mysql_num_rows($sql_result);
$sql_result = mysql_query("SELECT * FROM votes WHERE voted_on='$p_id' AND vote=0", $db);
$votes_down = mysql_num_rows($sql_result);
$sql_result = mysql_query("SELECT * FROM kids WHERE (mother_id='$p_id' OR father_id='$p_id')", $db);
$kids = mysql_num_rows($sql_result);
Would it be better if these were all grabbed in one query to save trips to the database? One query is better than six, isn't it?
Would it be some kind of JOIN or UNION?
It's not about the number of queries but the amount of useful data you transfer. If you are running the database on localhost, it is better to let the SQL engine solve the queries instead of computing the results in additional programs. The same applies if you are wondering which should be busier: Apache or MySQL :)
Of course you can use some conditions:
SELECT catName,
SUM(IF(titles.langID=1, 1, 0)) AS english,
SUM(IF(titles.langID=2, 1, 0)) AS deutsch,
SUM(IF(titles.langID=3, 1, 0)) AS svensk,
SUM(IF(titles.langID=4, 1, 0)) AS norsk,
COUNT(*)
FROM titles, categories, languages
WHERE titles.catID = categories.catID
AND titles.langID = languages.langID
Example taken from the MySQL Bible :)
If you really want to lower the number of queries, you can put the first two together like this:
$sql_result = mysql_query("SELECT * FROM votes WHERE voted_on='$p_id'", $db);
while ($row = mysql_fetch_array($sql_result))
{
extract($row);
if ($vote=='0') ++$votes_up; else ++$votes_down;
}
The idea of joining tables is that those tables are expected to have something between them (a relation, for example).
The same goes for UNION SELECTs, which are better avoided here.
If you want your solution to be clean and scalable in the future, I suggest you use mysqli instead of PHP's mysql module.
Refer to mysqli::multi_query. There is an OOP variant, where you create a mysqli object and call the function as a method.
Then, your query should look like:
// I use ; as the default separator between queries, but it might be
// different in your case (in the mysql client it can be changed with DELIMITER).
$query = "
SELECT * FROM votes WHERE voted_on='$p_id' AND vote=1;
SELECT * FROM votes WHERE voted_on='$p_id' AND vote=0;
SELECT * FROM kids WHERE (mother_id='$p_id' OR father_id='$p_id');
";
$results = mysqli_multi_query($db, $query); // Returns TRUE on success; fetch each result set with mysqli_store_result()/mysqli_next_result()
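A minimal sketch of consuming those result sets in procedural style (assuming $db is a mysqli connection):
if (mysqli_multi_query($db, $query)) {
    do {
        // Fetch the current result set, if the statement produced one
        if ($result = mysqli_store_result($db)) {
            while ($row = mysqli_fetch_assoc($result)) {
                // ... use $row ...
            }
            mysqli_free_result($result);
        }
    } while (mysqli_next_result($db)); // advance to the next result set
}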
Fewer queries are (generally, not always) better, but it's also about keeping your code clear enough that others can understand the query. For example, in the code you provided, keep the first two together, and leave the last one separate.
$sql_result = mysql_query("SELECT vote, COUNT(*) AS vote_count
FROM votes
WHERE voted_on='$p_id'
GROUP BY vote", $db);
The above will return two rows, each containing the vote value (0 or 1) and the count of votes for that value.
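For illustration, reading those two rows back into separate variables (a sketch in the question's mysql_* style):
$votes_up = $votes_down = 0;
while ($row = mysql_fetch_assoc($sql_result)) {
    if ($row['vote'] == 1) {
        $votes_up = (int) $row['vote_count'];
    } else {
        $votes_down = (int) $row['vote_count'];
    }
}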
I have the following 3 tables in the database.
Programs_Table
Program_ID (Primary Key)
Start_Date
End_Date
IsCompleted
IsGoalsMet
Program_type_ID
Programs_Type_Table(different types of programs, supports a dropdown list in the form)
Program_type_ID (Primary Key)
Program_name
Program_description
Client_Program_Table
Client_ID (primary key)
Program_ID (primary key)
What is the best way to find out how many clients are in a specific program (program type)?
Would the following SQL statement be the best way, or even plausible?
SELECT Client_ID FROM Client_Program_Table
INNER JOIN Programs_Table
ON Client_Program_Table.Program_ID = Programs_Table.Program_ID
WHERE Programs_Table.Program_type_ID = "x"
where "x" is the Program_type_ID of the specific program we're interested in.
OR is the following a better way?
$result = mysql_query("SELECT Program_ID FROM Programs_Table
WHERE Program_type_ID = 'x'");
$row = mysql_fetch_assoc($result);
$ProgramID = $row['Program_ID'];
$result = mysql_query("SELECT * FROM Client_Program_Table
WHERE Program_ID = '$ProgramID'");
mysql_num_rows($result) // returns how many rows of clients we pulled.
Thank you in advance, please excuse my inexperience and any mistakes that I've made.
Here is how you can do it:
<?php
// always initialize a variable
$number_of_clients = 0;
// escape the string which will go in an SQL query
// to protect yourself from SQL injection
$program_type_id = mysql_real_escape_string('x');
// build a query, which will count how many clients
// belong to that program type and put the value in the temporary column "num_clients"
$query = "SELECT COUNT(*) `num_clients` FROM `Client_Program_Table` `cpt`
INNER JOIN `Programs_Table` `pt`
ON `cpt`.`Program_ID` = `pt`.`Program_ID`
AND `pt`.`Program_type_ID` = '$program_type_id'";
// execute the query
$result = mysql_query($query);
// check if the query executed correctly
// and returned at least a record
if(is_resource($result) && mysql_num_rows($result) > 0){
// turn the query result into an associative array
$row = mysql_fetch_assoc($result);
// get the value of the temporarily created "num_clients" column
// and typecast it to an integer so it is always safe to use later on
$number_of_clients = (int) $row['num_clients'];
} else{
// query did not return a record, so we have no clients on that program
$number_of_clients = 0;
}
?>
If you want to know how many clients are involved in a program, you'd rather use COUNT(*). MySQL (with MyISAM) and SQL Server have a fast way to retrieve the total number of rows. Using SELECT * and then mysql_num_rows wastes memory and computing time. To me, this is the fastest, though not the "cleanest", way to write the query you want:
SELECT
COUNT(*)
FROM
Client_Program_Table
WHERE
Program_ID IN
(
SELECT
Program_ID
FROM
Programs_Table
WHERE
Program_type_ID = 'azerty'
)
Why is that?
Using JOINs makes queries more readable, but subqueries often prove to be computed faster.
This returns a count of the clients in a specific program type (x):
SELECT COUNT(cpt.Client_ID), cpt.Program_ID
FROM Client_Program_Table cpt
INNER JOIN Programs_Table pt ON cpt.Program_ID=pt.Program_ID
WHERE pt.Program_type_ID = "x"
GROUP BY cpt.Program_ID