I have the following query for a type-ahead search (as you type into the form it displays matches in a drop down). This query worked well until I switched to a database with about a million records. Now it takes 15 seconds for the match to be displayed.
Because search hits are displayed as you type, the query is inside a loop. Is there anything about this query that can be changed to speed it up?
$diagnosis = isset($_GET['diagnosis']) ? $_GET['diagnosis'] : '';
$data = array();
if ($diagnosis) {
    $query = explode(' ', $diagnosis);
    for ($i = 0, $c = count($query); $i < $c; $i++) {
        $query[$i] = '+' . mysql_real_escape_string($query[$i]) . '*';
    }
    $query = implode(' ', $query);
    $sql = "SELECT diagnosis, icd9, MATCH(diagnosis) AGAINST('$query' IN BOOLEAN MODE) AS relevance
            FROM icd10
            WHERE MATCH(diagnosis) AGAINST('$query' IN BOOLEAN MODE)
            HAVING relevance > 0
            ORDER BY relevance";
    $r = mysql_query($sql);
    while ($row = mysql_fetch_array($r)) {
        $data[] = $row;
    }
}
echo json_encode($data);
exit;
There are a few things you can try:
First, make sure you have a FULLTEXT index on diagnosis. Second, make sure you have a FULLTEXT index on diagnosis! A million rows isn't that much (depending on how many words diagnosis contains, of course), so a missing index alone might already be the problem.
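For reference, such an index can be added like this (a sketch using the table and column names from your query; the index name is arbitrary, and InnoDB needs MySQL 5.6+ for fulltext indexes):
ALTER TABLE icd10 ADD FULLTEXT INDEX ft_diagnosis (diagnosis);
-- verify that the index is there
SHOW INDEX FROM icd10;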
Then try the following code:
SELECT diagnosis, icd9, MATCH(diagnosis) AGAINST('$query' IN BOOLEAN MODE) AS relevance
FROM icd10 ORDER BY relevance DESC LIMIT 30
(It might not be obvious that this is faster, and it might not be, so just try it).
If you need to support short words, e.g. if 3-digit icd9 codes are entered often, you should check your ft_min_word_len / innodb_ft_min_token_size values (depending on your storage engine) to make sure such words are included in the index - but be aware this will increase your index size. Maybe check the stopword list as well.
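You can check what your server currently uses with statements like these (a sketch, reusing the index name assumed above; after lowering the value in the server configuration and restarting, the fulltext index has to be rebuilt for the new minimum length to take effect, e.g. by dropping and re-adding it):
SHOW VARIABLES LIKE 'ft_min_word_len';           -- MyISAM fulltext
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';  -- InnoDB fulltext
ALTER TABLE icd10 DROP INDEX ft_diagnosis, ADD FULLTEXT INDEX ft_diagnosis (diagnosis);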
You didn't specify your setup; you can often improve general database performance by tuning settings or upgrading the disks or RAM. Especially the RAM.
Some general ideas: You might want to call the function asynchronously (the user should be able to keep typing while the query runs). As soon as a query returns fewer than 30 results (or whatever limit you set), you can filter the remaining results on the fly in PHP as the input grows (as long as the query only gets longer and no words are removed) - it's the closest you get to a cache. Or set the limit to 1000 and filter manually afterwards; PHP regex matching is fast too, you just need a scoring function.
Depending on your data, you might not want to run the query at all when the user has only added a single letter (every text contains a word beginning with "a", so the result set will hardly narrow down - that might not be the case for "q", though). That won't reduce the runtime of the query itself, but it saves you one execution.
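A simpler variant of that idea is to skip the query entirely for very short inputs; a minimal sketch that could sit in front of the code from the question (the two-character threshold is just an example):
// don't hit the database for very short inputs; an empty result set
// tells the type-ahead widget to keep its previous suggestions
if (strlen(trim($diagnosis)) < 2) {
    echo json_encode(array());
    exit;
}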
How do I use get_compiled_select or count_all_results before running the query without getting the table name added twice? When I use $this->db->get('tblName') after either of those, I get the error:
Not unique table/alias: 'tblProgram'
SELECT * FROM (`tblProgram`, `tblProgram`) JOIN `tblPlots` ON `tblPlots`.`programID`=`tblProgram`.`pkProgramID` JOIN `tblTrees` ON `tblTrees`.`treePlotID`=`tblPlots`.`id` ORDER BY `tblTrees`.`id` ASC LIMIT 2000
If I don't use a table name in count_all_results or $this->db->get(), then I get an error that no table is used. How can I get it to set the table name just once?
public function get_download_tree_data($options = array(), $rand = "")
{
    //join tables and order by tree id
    $this->db->reset_query();
    $this->db->join('tblPlots', 'tblPlots.programID=tblProgram.pkProgramID');
    $this->db->join('tblTrees', 'tblTrees.treePlotID=tblPlots.id');
    $this->db->order_by('tblTrees.id', 'ASC');

    //get number of results to return
    $allResults = $this->db->count_all_results('tblProgram', false);

    //chunk data and write to CSV to avoid reaching memory limit
    $offset = 0;
    $chunk = 2000;
    $treePath = $this->config->item('temp_path') . "$rand/trees.csv";
    $tree_handle = fopen($treePath, 'a');
    while ($offset < $allResults) {
        $this->db->limit($chunk, $offset);
        $result = $this->db->get('tblProgram')->result_array();
        foreach ($result as $row) {
            fputcsv($tree_handle, $row);
        }
        $offset = $offset + $chunk;
    }
    fclose($tree_handle);
    return array('resultCount' => $allResults);
}
To count how many rows would be returned by a query, essentially all the work must be performed. That is, it is impractical to get the count, then perform the query; you may as well just do the query.
If your goal is to "paginate" by getting some of the rows, plus the total count, that is essentially two separate actions (that may be combined to look like one.)
If the goal is to estimate the number of rows, then SHOW TABLE STATUS or SELECT Rows FROM information_schema.TABLES WHERE ... gives you an estimate.
If you want to see if there are, say "at least 100 rows", then this may be practical:
SELECT 1 FROM ... WHERE ... ORDER BY ... LIMIT 99,1
and see if you get a row back. However, this may or may not be efficient, depending on the indexes and the WHERE and the ORDER BY. (Show us the query and I can elaborate.)
Using OFFSET for chunking is grossly inefficient. If there is no usable index, it performs essentially the entire query for each chunk. If there is a usable index, the chunks still get slower and slower. Here is a discussion of why OFFSET is not good for "pagination", plus an efficient workaround: Pagination. It explains how to "remember where you left off" as an efficient technique for chunking. Fetch between 100 and 1000 rows per chunk.
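A sketch of the "remember where you left off" idea, assuming an indexed id column such as tblTrees.id from your query; instead of an OFFSET, each chunk starts after the last id of the previous one (:last_seen_id stands for the highest id already fetched):
-- first chunk
SELECT ... FROM tblTrees ORDER BY id ASC LIMIT 1000;
-- every following chunk
SELECT ... FROM tblTrees WHERE id > :last_seen_id ORDER BY id ASC LIMIT 1000;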
The flaw in your code is that it tries to select a subset of the records and their total count in the same query. This is not possible in MySQL, so you cannot generate such a query; hence the error you mentioned. The problem is that if you do a
select ... from t where ... limit 0, 2000
then you get at most 2000 records, so if the total number of records matching the criteria is greater than that limit, the query above will not give you an accurate count; in that case you need a
select count(1) from t where ...
This means that you need to build your actual query (the code below your count_all_results call) and check whether the number of results reaches the limit. If it does not reach the limit, you do not need a separate query for the count, because you can compute it as $offset + $recordCount. However, if you get as many records as the limit allows, then you will need to build another query, without the order_by call (the count is independent of your sort order), and get the count from that.
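Put together as a hedged sketch with the Query Builder calls and variable names from the question (the joins are repeated for the count because the builder state is consumed by the first query):
// run the real, limited page query first
$this->db->join('tblPlots', 'tblPlots.programID=tblProgram.pkProgramID');
$this->db->join('tblTrees', 'tblTrees.treePlotID=tblPlots.id');
$this->db->order_by('tblTrees.id', 'ASC');
$this->db->limit($chunk, $offset);
$result = $this->db->get('tblProgram')->result_array();

if (count($result) < $chunk) {
    // the page is not full, so the total can be derived without a second query
    $allResults = $offset + count($result);
} else {
    // the page is full: count separately, without the ORDER BY (it does not affect the count)
    $this->db->join('tblPlots', 'tblPlots.programID=tblProgram.pkProgramID');
    $this->db->join('tblTrees', 'tblTrees.treePlotID=tblPlots.id');
    $allResults = $this->db->count_all_results('tblProgram');
}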
$this->db->count_all_results()
Counting the number of returned results with count_all_results()
It's useful to count the number of results returned—often bugs can arise if a section of code which expects to have at least one row is passed zero rows. Without handling the eventuality of a zero result, an application may become unpredictably unstable and may give away hints to a malicious user about the architecture of the app. Ensuring correct handling of zero results is what we're going to focus on here.
Permits you to determine the number of rows in a particular Active Record query. Queries will accept Query Builder restrictors such as where(), or_where(), like(), or_like(), etc. Example:
echo $this->db->count_all_results('my_table'); // Produces an integer, like 25
$this->db->like('title', 'match');
$this->db->from('my_table');
echo $this->db->count_all_results(); // Produces an integer, like 17
However, this method also resets any field values that you may have passed to select(). If you need to keep them, you can pass FALSE as the second parameter:
echo $this->db->count_all_results('my_table', FALSE);
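Tying this back to the zero-results point above, a small sketch that reuses the my_table / title names from the example ($term stands for whatever the user searched for); bail out early instead of letting later code assume at least one row exists:
$this->db->like('title', $term);
$found = $this->db->count_all_results('my_table', FALSE); // FALSE keeps the builder for the real query
if ($found === 0) {
    return array(); // or render a friendly "no results" view instead
}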
get_compiled_select()
The method $this->db->get_compiled_select() was introduced in CodeIgniter v3.0 and compiles the Active Record query without actually executing it. It is not a completely new method: older versions of CI had $this->db->_compile_select(), but that method was later made protected, making it impossible to call directly.
// Note that the second parameter of the get_compiled_select method is FALSE
$sql = $this->db->select(array('field1','field2'))
->where('field3',5)
->get_compiled_select('mytable', FALSE);
// ...
// Do something crazy with the SQL code... like add it to a cron script for
// later execution or something...
// ...
$data = $this->db->get()->result_array();
// Would execute and return an array of results of the following query:
// SELECT field1, field2 from mytable where field3 = 5;
NOTE: Calling get_compiled_select() twice while you are using the Query Builder Caching functionality and NOT resetting your queries will result in the cache being merged twice. That means, for example, that if you are caching a select(), the same field will be selected twice.
Rick James got me on the right track. I ended up having to chunk the results using pagination AND a nested query. Using LIMIT on even 1 chunk of 2000 records was timing out. This is the code I ended up with, which uses get_compiled_select('tblProgram') and then get('tblTrees O1'). Since I didn't use FALSE as the second argument to get_compiled_select, the query was cleared before the get() was run.
//grab the data in chunks, write it to CSV chunk by chunk
$offset = 0;
$chunk = 2000;
$i = 10; //counter for the progress bar
$this->db->limit($chunk);
$this->db->select('tblTrees.id');
//nesting the limited query and then joining the other fields later improved performance significantly
$query1 = ' (' . $this->db->get_compiled_select('tblProgram') . ') AS O2';
$this->db->join($query1, 'O1.id=O2.id');
$result = $this->db->get('tblTrees O1')->result_array();
$allResults = count($result);
$putHeaders = 0;
$treePath = $this->config->item('temp_path') . "$rand/trees.csv";
$tree_handle = fopen($treePath, 'a');
//while the limited select returns a full chunk, there may be more rows to fetch
while (count($result) === $chunk) {
    $highestID = max(array_column($result, 'id'));
    //update progress bar with estimate
    if ($i < 90) {
        $this->set_runStatus($qcRunId, $status = "processing", $progress = $i);
        $i = $i + 1;
    }
    //only write the headers the first time
    foreach ($result as $row) {
        if ($offset === 0 && $putHeaders === 0) {
            fputcsv($tree_handle, array_keys($row));
            $putHeaders = 1;
        }
        fputcsv($tree_handle, $row);
    }
    //get the next chunk, starting after the highest id seen so far
    $offset = $offset + $chunk;
    $this->db->reset_query();
    $this->make_query($options);
    $this->db->order_by('tblTrees.id', 'ASC');
    $this->db->where('tblTrees.id >', $highestID);
    $this->db->limit($chunk);
    $this->db->select('tblTrees.id');
    $query1 = ' (' . $this->db->get_compiled_select('tblProgram') . ') AS O2';
    $this->db->join($query1, 'O1.id=O2.id');
    $result = $this->db->get('tblTrees O1')->result_array();
    $allResults = $allResults + count($result);
}
//write out the last (partial) chunk
foreach ($result as $row) {
    fputcsv($tree_handle, $row);
}
fclose($tree_handle);
return array('resultCount' => $allResults);
After running a few tests I realized that my search method does not perform very well if some words of the query are short (2-3 letters).
The way I built the search is by making a MySQL query for every word in the string the visitor entered and then intersecting the results from each word to see if every word returned that result. Once a result has been returned for all words, it's a match and I show that result to the visitor.
But I was wondering if that's an effective way to do it. Is there a better way while keeping the same functionality?
Currently the code takes about 0.7 s on the MySQL queries, and the rest of the work is under 0.1 s.
Normally I would not care much about my search taking 0.7 s, but I'd like to create a "LiveSearch" and it is critical that it loads faster than that.
Here is my code
public static function Search($Query){
    $Matchings = array();
    $Querys = explode(' ', $Query);
    foreach ($Querys as $Query) {
        $MatchingRow = \Database\dbCon::$dbCon->prepare("
            SELECT
                `Id`
            FROM
                `product_products` as pp
            WHERE
                CONCAT(
                    `Id`,
                    ' ',
                    (SELECT `Name` FROM `product_brands` WHERE `Id` = pp.BrandId),
                    ' ',
                    `ModelNumber`,
                    ' ',
                    `Title`,
                    IF(`isFreeShipping` = 1 OR `isFreeShippingOnOrder` = 1, ' FreeShipping', '')
                )
                LIKE :Title;
        ");
        $MatchingRow->bindValue(':Title', '%' . $Query . '%');
        try {
            $MatchingRow->execute();
            foreach ($MatchingRow->fetchAll(\PDO::FETCH_ASSOC) as $QueryInfo) {
                $Matchings[$Query][$QueryInfo['Id']] = $QueryInfo['Id'];
            }
        } catch (\PDOException $e) {
            echo 'Error MySQL: ' . $e->getMessage();
        }
    }
    // keep only the ids that were returned for every word
    $TmpMatch = $Matchings;
    $Matches = array_shift($TmpMatch);
    if (!is_array($Matches)) {
        $Matches = array();
    }
    foreach ($TmpMatch as $Query) {
        $Matches = array_intersect($Matches, $Query);
    }
    $Products = array();
    foreach ($Matches as $Match) {
        $Products[] = new Product($Match);
    }
    return $Products;
}
As others have already suggested, fulltext search is your friend.
The logic should go more or less like this.
Add a text column to your "product_products" table called, say, "FtSearch"
Write a small script that will run only once, in which you write, for each existing product, the text that has to be searched into the "FtSearch" column. This is, of course, the text you compose in your query (id + brand name + title and so forth, including the FreeShipping part). You can probably do this with a single SQL statement (see the sketch after these steps).
Create a fulltext index on the "FtSearch" column (doing this after having populated the FtSearch column saves you a little execution time).
Now you have to add the code necessary to ensure that every time any of the fields involved in the search string is inserted/updated, you insert/update the search string as well. Pay attention here, since this includes not only the "Title", "ModelNumber" and "FreeShipping" of the "product_product", but as well the "Name" of the "product_brand". This means that if the name of a product_brand is updated, you will have to regenerate all search strings of all products having that brand. This might be a little slow, depending on how many products are bound to that brand. However I assume it does not happen too often that a brand changes its name, and if it does, it certainly happens in some sort of administration interface where such things are usually acceptable.
At this point you can query the table using the MATCH construct, which is far faster than anything you could get with your current approach.
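A hedged sketch of both the one-off population statement and the final search, using the table and column names from the question and the FtSearch column suggested above (the index name is arbitrary, and the boolean-mode search string is just an example):
-- one-off: fill the search column for all existing products
UPDATE product_products AS pp
JOIN product_brands AS pb ON pb.Id = pp.BrandId
SET pp.FtSearch = CONCAT(
    pp.Id, ' ', pb.Name, ' ', pp.ModelNumber, ' ', pp.Title,
    IF(pp.isFreeShipping = 1 OR pp.isFreeShippingOnOrder = 1, ' FreeShipping', '')
);

-- index the search column
ALTER TABLE product_products ADD FULLTEXT INDEX ft_search (FtSearch);

-- the per-keystroke search then becomes a single indexed lookup
SELECT Id
FROM product_products
WHERE MATCH(FtSearch) AGAINST('+word1* +word2*' IN BOOLEAN MODE);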
I wrote a PHP/MySQLi frontend, in which the user can enter SQL queries, and the server then returns the results in a table (or prints OK on INSERTs and UPDATEs)
As printing the results can take a very long time (e.g. SELECT * FROM movies) in an IMDb extract with about 1.6M movies, 1.9M actors and 3.2M keywords, I limited the output to 50 rows by cancelling the printing for-loop after 50 iterations.
However, the queries themselves also take quite some time, so I hoped it might be possible to set a global maximum on the number of returned rows, regardless of whether the LIMIT keyword is used or not. I only intended to use the server for my own practice, but as some people in my class are struggling with the frontend provided by the teacher (a Windows EXE, while half of the class uses Mac/Linux), I decided to make it accessible to them, too. But I want to keep my Debian VM from crashing, because - well, basically it would be a DDoS.
For clarification (examples with a global limit of 50):
SELECT * FROM movies;
> First 50 rows
SELECT * FROM movies LIMIT 10;
> First 10 rows
SELECT * FROM movies LIMIT 50,100;
> 50 rows (from 50 to 99)
Is there any way to limit the number of returned rows using either PHP/MySQLi or the MySQL server itself? Or would I have to append/replace LIMIT in the queries?
You can take their queries and add "LIMIT 50" to them.
And if they added a LIMIT themselves, just filter it out with a regex and still add your own LIMIT.
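A hedged sketch of that regex idea in PHP, assuming the user's statement is in $sql (a simple pattern like this can be fooled by subqueries or by LIMIT appearing inside string literals, so treat it as a starting point rather than a hardened solution):
$maxRows = 50;
// strip a trailing user-supplied LIMIT clause, if any
$sql = preg_replace('/\s+LIMIT\s+\d+(\s*(,|OFFSET)\s*\d+)?\s*;?\s*$/i', '', trim($sql));
// ...and enforce our own cap
$sql = rtrim($sql, "; \t\n\r") . ' LIMIT ' . $maxRows;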
I believe you have to build yourself a paginator anyway; avoiding the LIMIT clause is not really possible.
Here is what I would suggest for a Paginator:
if (empty($_REQUEST['page']) || !is_numeric($_REQUEST['page'])) {
    $page = 1;
} else {
    $page = (int) $_REQUEST['page'];
}
$perpage = 50;
$start = ($page - 1) * $perpage;
$limit_string = " LIMIT " . $start . "," . $perpage;

$query = "SELECT * FROM movies";
$query .= $limit_string;
Hope that helps
You can create a function.
https://dev.mysql.com/doc/refman/5.0/en/create-function.html
Let us know if this helps.
I'm trying to compare a row in my DB with a given string, character by character, and return the id that best fits the given data. For example, I have in my DB the user David with an AAA sequence, and I want to compare it with the one I pass in, which is ABA, so I'd like to receive a match percentage (66.6% in this case).
I have gotten this far but don't know how to go on:
$uname = $_POST['sequence'];
$query = "SELECT name FROM dna WHERE sequence = '$uname'";
$result = mysql_query($query);
while ($row = mysql_fetch_array($result))
{
    echo $row['name'];
}
In order to get the similarity in percent, you might use the PHP function similar_text().
The two strings are compared and, if the third parameter is passed to the function, the similarity in percent is written to it.
$string_1 = 'AAA';
$string_2 = 'ABA';
similar_text($string_1, $string_2, $percent);
echo $percent;
// 66.666666666667
The database part is a bit more work. A very basic implementation could look like this.
Keep in mind that the real problem is that you compare a string against 1 million rows.
In general one wouldn't do it this way: instead of chars one would store bits, and to compare bits you would simply use bit operations. Anyway...
Here, when working with chars/strings, rolling row requests or a limited query could help, too.
That would mean that you ask the db for chunks of, let's say, 500 rows and do the calculation work per chunk (see the sketch after the code below).
Whether that helps depends on the number of rows and the memory footprint of the dataset.
// incoming via user input
$string_1 = $_POST['sequence'];

// temporary var to store the highest similarity percentage and its row_id
$bestValue = array('row_id' => 0, 'similarity' => 0);

// iterate over the rows fetched from the database ($rows is assumed to hold them)
foreach ($rows as $id => $row)
{
    // get a new string_2 from the db row (the sequence column from the question)
    $string_2 = $row['sequence'];

    // calculate similarity
    similar_text($string_1, $string_2, $percent);

    // if the calculated similarity is higher, update the "best" value
    if ($percent > $bestValue['similarity']) {
        $bestValue = array('row_id' => $id, 'similarity' => $percent);
    }
}

var_dump($bestValue);
After all db rows are processed, $bestValue will contain the highest percentage and its row id.
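As a sketch of the "rolling row requests" mentioned before the code, the same comparison can be run in chunks of 500 rows so the whole table never has to sit in memory at once. It assumes a PDO connection in $pdo and that the dna table from the question has id and sequence columns:
$string_1  = $_POST['sequence'];
$bestValue = array('row_id' => 0, 'similarity' => 0);
$chunkSize = 500;

for ($offset = 0; ; $offset += $chunkSize) {
    // fetch the next chunk of sequences
    $stmt = $pdo->query('SELECT id, sequence FROM dna LIMIT ' . (int) $offset . ', ' . (int) $chunkSize);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    if (!$rows) {
        break; // no rows left
    }
    foreach ($rows as $row) {
        similar_text($string_1, $row['sequence'], $percent);
        if ($percent > $bestValue['similarity']) {
            $bestValue = array('row_id' => $row['id'], 'similarity' => $percent);
        }
    }
}

var_dump($bestValue);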
You can do all kinds of things here, for instance:
switch from first-match update (>) to last-match update (>=)
stop iteration on first match
store row_id's, which have the same similarity (multi row match)
if you don't need multi row match, you might drop the array and use two vars for row and percent
proper error handling, escaping, mysqli usage
Be warned: this isn't the most efficient approach, especially not when working with large datasets. If you need this at a level beyond hobby or homework, simply use a tool that is optimized for this job, like EMBOSS (http://emboss.sourceforge.net/).
I made a simple search box on a page, where a user can type in keywords to look for photos of certain items, using PHP. I'm using a MySQL database. I trim the result and show only 10 to make the loading quicker, but a certain set of keywords causes the browser to hang on both IE and Firefox. When this happens on IE, I can see outlines of photos (just the silhouette) beyond the 10 results with an "X" mark at the top right corner, similar to when a photo doesn't exist on a webpage, even though I wrote the code to show only 10 results. The database has over 10,000 entries, and I'm thinking maybe it's trying to display the entire set of photos in the database. Here is some of the code I'm using.
I'm using the function below to create the query. $keywords is an array of the keywords that the user has typed in.
function create_multiword_query($keywords) {
    // Creates multi-word text search query
    $q = 'SELECT * FROM catalog WHERE ';
    $num = 0;
    foreach ($keywords as $val) { // Multi-word search
        $num++;
        if ($num == 1) {
            $q = $q . "name LIKE '%$val%'";
        } else {
            $q = $q . " AND name LIKE '%$val%'";
        }
    }
    $q = $q . ' ORDER BY name';
    return $q;
    //$q = "SELECT * FROM catalog WHERE name LIKE \"%$trimmed%\" ORDER BY name";
}
And display the result. MAX_DISPLAY_NUM is 10.
$num = 0;
while (($row = mysqli_fetch_assoc($r)) && ($num < MAX_DISPLAY_NUM)) { // add max search result!
$num++;
print_images($row['img_url'], '/_', '.jpg'); // just prints photos
}
I'm very much a novice with PHP, but I can't seem to find anything wrong with my code. Maybe the way I wrote these algorithms is not quite right for PHP or MySQL? Can you guys help me out with this? I can post more code as necessary. TIA!!
Don't limit your search results in PHP, limit them in the SQL query with the LIMIT keyword.
As in:
select * from yourtable where ... order by ... limit 10;
BTW, those LIKE '%something%' can be expensive. Maybe you should look at Full text indexing and searching.
If you want to show a More... link or something like that, one way of doing it would be to limit your query to 11 and only show the first ten.
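A small sketch of that idea, assuming a mysqli connection in $db and the catalog table / name column from the question:
$perPage = 10;
// ask for one row more than we intend to display
$result = $db->query("SELECT * FROM catalog ORDER BY name LIMIT " . ($perPage + 1));
$rows   = $result->fetch_all(MYSQLI_ASSOC);
$hasMore = count($rows) > $perPage;          // an 11th row means a "More..." link is needed
$rows    = array_slice($rows, 0, $perPage);  // display only the first ten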
Apart from the LIMIT in your query, I would check out mysql full text search (if your tables have the MyISAM format).
Why don't you use MySQL to limit the number of search results returned?
http://dev.mysql.com/doc/refman/5.0/en/select.html
Add LIMIT to your query.
You are retrieving all rows from the DB (a lot of bytes traveling from the DB to the server) and only then filtering down to the first 10 rows.
try
$q = $q . ' ORDER BY name LIMIT 10';
LIKE is also slow, according to Flickr (slides 24-26). You should first try to use FULLTEXT indexes instead. If your site still seems slow, there are also some other really fast(er), popular alternatives available:
sphinx
elasticsearch
solr
The only slightly annoying thing is that you need to learn/install these technologies, but they are well worth the investment when needed.