Optimization of search function MySQL or PHP wise

After running a few tests, I realized that my search method does not perform very well if some words of the query are short (2~3 letters).
The search works by running a MySQL query for every word in the string the visitor entered, then filtering the results of each word to see whether every word returned that result. Once a result has been returned for all words, it's a match and I show that result to the visitor.
But I was wondering if that's an effective way to do it. Is there a better way that keeps the same functionality?
Currently my code spends about 0.7 s on the MySQL queries, and everything else takes under 0.1 s.
Normally I would not care much about my search taking 0.7 s, but I'd like to create a "LiveSearch", and it is critical that it loads faster than that.
Here is my code:
public static function Search($Query){
    // Split on any run of whitespace and drop empty tokens
    $Words = preg_split('/\s+/', trim($Query), -1, PREG_SPLIT_NO_EMPTY);
    $Matchings = array();
    // Prepare once, then execute once per word
    $MatchingRow = \Database\dbCon::$dbCon -> prepare("
        SELECT
            `Id`
        FROM
            `product_products` as pp
        WHERE
            CONCAT(
                `Id`,
                ' ',
                (SELECT `Name` FROM `product_brands` WHERE `Id` = pp.BrandId),
                ' ',
                `ModelNumber`,
                ' ',
                `Title`,
                IF(`isFreeShipping` = 1 OR `isFreeShippingOnOrder` = 1, ' FreeShipping', '')
            )
            LIKE :Title
    ");
    foreach($Words as $Word)
    {
        // Start with an empty set, so a word with no hits empties the final intersection
        $Matchings[$Word] = array();
        $MatchingRow -> bindValue(':Title', '%'.$Word.'%');
        try{
            $MatchingRow -> execute();
            foreach($MatchingRow -> fetchAll(\PDO::FETCH_ASSOC) as $QueryInfo)
                $Matchings[$Word][$QueryInfo['Id']] = $QueryInfo['Id'];
        }catch(\PDOException $e){
            echo 'Error MySQL: '.$e->getMessage();
        }
    }
    $Products = array();
    if (empty($Matchings)) return $Products;
    // Keep only the ids that matched every word
    $Matches = array_shift($Matchings);
    foreach($Matchings as $Ids)
    {
        $Matches = array_intersect($Matches, $Ids);
    }
    foreach($Matches as $Match){
        $Products[] = new Product($Match);
    }
    return $Products;
}

As others have already suggested, fulltext search is your friend.
The logic should go more or less like this:
1. Add a text column called, say, "FtSearch" to your "product_products" table.
2. Write a small script that runs only once and writes, for each existing product, the text to be searched into the "FtSearch" column. This is, of course, the text you compose in your query (id + brand name + title and so forth, including the FreeShipping part). You can probably do this with a single SQL statement, but I have no MySQL at hand and can't provide the code for that; you might be able to work it out yourself.
3. Create a fulltext index on the "FtSearch" column (doing this after populating the column saves you a little execution time).
4. Now add the code needed to ensure that every time any of the fields involved in the search string is inserted or updated, you insert/update the search string as well. Pay attention here, since this includes not only the "Title", "ModelNumber" and "FreeShipping" of "product_products", but also the "Name" of "product_brands". This means that if the name of a brand is updated, you will have to regenerate the search strings of all products with that brand. This might be a little slow, depending on how many products are bound to that brand. However, I assume a brand does not change its name very often, and when it does, it certainly happens in some sort of administration interface where such delays are usually acceptable.
5. At this point you can query the table using the MATCH construct, which is much faster than anything you could get with your current approach.
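A minimal PHP sketch of the full-text approach described above, assuming MySQL 5.6+ with InnoDB (which supports FULLTEXT indexes) and the table and column names from the question. FtSearch is the column the answer proposes; the constant and function names are hypothetical. It shows the one-off population UPDATE, the index, a helper that rebuilds the search text in PHP whenever a product is saved (mirroring the question's CONCAT), and a helper that turns visitor input into a BOOLEAN MODE query in which every word is required and may match as a prefix.

```php
<?php
// One-off population: compose the same text the question builds with CONCAT.
// CONCAT_WS skips NULLs, so products without a brand still get a value.
const FT_POPULATE_SQL = "UPDATE product_products AS pp
    LEFT JOIN product_brands AS pb ON pb.Id = pp.BrandId
    SET pp.FtSearch = CONCAT_WS(' ', pp.Id, pb.Name, pp.ModelNumber, pp.Title,
        IF(pp.isFreeShipping = 1 OR pp.isFreeShippingOnOrder = 1, 'FreeShipping', NULL))";

// Create the index after populating the column.
const FT_INDEX_SQL = "ALTER TABLE product_products ADD FULLTEXT INDEX ft_search (FtSearch)";

// Search query; the placeholder receives the output of buildBooleanQuery().
const FT_SEARCH_SQL = "SELECT Id FROM product_products
    WHERE MATCH(FtSearch) AGAINST (? IN BOOLEAN MODE)";

// Hypothetical helper: rebuild the search text when a product is inserted/updated.
function buildFtSearchText(array $product, ?string $brandName): string
{
    $parts = [$product['Id'], $brandName, $product['ModelNumber'], $product['Title']];
    if (!empty($product['isFreeShipping']) || !empty($product['isFreeShippingOnOrder'])) {
        $parts[] = 'FreeShipping';
    }
    // Drop NULL/empty parts, like CONCAT_WS does with NULLs
    $parts = array_filter($parts, function ($p) {
        return $p !== null && $p !== '';
    });
    return implode(' ', $parts);
}

// Hypothetical helper: "samsung smart" -> "+samsung* +smart*"
// (+ = word is required, trailing * = prefix match, handy for a live search).
function buildBooleanQuery(string $input): string
{
    // Strip characters that have a meaning in BOOLEAN MODE
    $clean = preg_replace('/[+\-<>()~*"@]+/', ' ', $input);
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_map(function ($w) {
        return '+' . $w . '*';
    }, $words));
}
```

Run FT_POPULATE_SQL and FT_INDEX_SQL once, then execute FT_SEARCH_SQL with buildBooleanQuery($input) bound to the placeholder. Note that words shorter than the minimum indexed length (innodb_ft_min_token_size, default 3, for InnoDB; ft_min_word_len, default 4, for MyISAM) are not indexed, which matters for the 2~3 letter words mentioned in the question; changing that setting requires a server restart and rebuilding the index.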

Related

Function render makes website 500% slow! Can anyone fix that, please?

Someone told me:
because it sends a database request on each iteration of the loop (it's not the only problem with this chunk of code, but it's the most taxing one)
Yes, I understand what that means. His suggestion is:
you need to get all of the data before you start building the menu, then you just insert the data instead of requesting more data on each iteration
But I don't know how to do it!
<?php
$menu_html='';
function render_menu($parent_id,$actmenuid)
{
    $obj = new Database();
    $con = $obj->dbconnectt();
    global $menu_html;
    $result=mysqli_query($con, "select * from tbl_menu where parent_id='$parent_id'");
    if(mysqli_num_rows($result)==0) return;
    if($parent_id==0){
        $menu_html.='<ul class="topnav">';
    }else{
        $menu_html.='<ul>';
    }
    while($row=mysqli_fetch_array($result)) {
        $childnum = $obj->recordcount("SELECT * FROM tbl_menu WHERE parent_id='".$row['id']."'");
        if($childnum == 0){
            $linkvalue='/category/'.$row['id'].'.html';
        } else{
            $linkvalue='#';
        }
        if($row['id']==$actmenuid && $actmenuid !=NULL){
            $actv='class="active"';
        }else{
            $actv='';
        }
        $menu_html.='<li '.$actv.'><a href="'.$linkvalue.'">'.$row['title'].'</a>';
        render_menu($row['id'],$actmenuid);
        $menu_html.='</li>';
    }
    $menu_html.='</ul>';
    return $menu_html;
}
if($isDsh==false){
    echo render_menu(0,$actmenuid);
}
?>

Depending on how many records you have, try removing this query from inside the loop, since it runs once for every record returned by the first query:
$childnum = $obj->recordcount("SELECT * FROM tbl_menu WHERE parent_id='".$row['id']."'");
Replace it with a single query like this, which returns the count for each parent id, and run it once outside the loop:
$parentcount = mysqli_query($con, "SELECT parent_id, COUNT(*) AS cnt FROM tbl_menu GROUP BY parent_id");
There may be other issues, so please also post the database structure and the number of records you're working with.
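A minimal sketch of how that grouped query could replace the per-row recordcount() call, assuming its result has been fetched into an array of associative rows (for example with mysqli_fetch_all($parentcount, MYSQLI_ASSOC)) and that the count column is aliased as cnt. The function name childCountMap is hypothetical.

```php
<?php
// Hypothetical helper: build a parent_id => number-of-children lookup from
// the rows of "SELECT parent_id, COUNT(*) AS cnt FROM tbl_menu GROUP BY parent_id".
function childCountMap(array $countRows): array
{
    $map = [];
    foreach ($countRows as $row) {
        // mysqli returns strings, so cast both key and value to int
        $map[(int)$row['parent_id']] = (int)$row['cnt'];
    }
    return $map;
}

// Inside the while loop, the extra query then becomes an array lookup:
// $childnum = $childCounts[(int)$row['id']] ?? 0;
```

Ids that never appear as a parent_id have no entry in the map, hence the `?? 0` default in the lookup.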

Don't make recursive queries.
Having "more than 1000" rows is not too big. You can simply fetch everything from the table into PHP, then perform the recursive HTML build in PHP. This has a memory overhead, but far less processing overhead, because you only ever make one trip to the database.
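A minimal sketch of that single-trip approach, assuming the rows come from one query such as `SELECT id, parent_id, title FROM tbl_menu` (column names taken from the question's code). The rows are grouped by parent_id once, and the recursion then works on that in-memory array instead of querying the database. The function name, the `<a href>` markup and the use of htmlspecialchars() are assumptions.

```php
<?php
// Hypothetical replacement for render_menu(): $rows holds ALL menu rows,
// fetched with a single query before rendering starts.
function renderMenuFromRows(array $rows, $actMenuId = null): string
{
    // Group rows by parent_id once, so each level is a cheap array lookup
    $children = [];
    foreach ($rows as $row) {
        $children[(int)$row['parent_id']][] = $row;
    }

    $render = function (int $parentId) use (&$render, $children, $actMenuId): string {
        if (empty($children[$parentId])) {
            return '';
        }
        $html = $parentId === 0 ? '<ul class="topnav">' : '<ul>';
        foreach ($children[$parentId] as $row) {
            $id = (int)$row['id'];
            // Leaf items link to their category page; items with children get '#'
            $link = empty($children[$id]) ? '/category/' . $id . '.html' : '#';
            $active = ($actMenuId !== null && $id === (int)$actMenuId) ? ' class="active"' : '';
            $html .= '<li' . $active . '><a href="' . htmlspecialchars($link) . '">'
                . htmlspecialchars($row['title']) . '</a>'
                . $render($id) . '</li>';
        }
        return $html . '</ul>';
    };

    return $render(0);
}
```

This assumes the parent_id values form a tree (no cycles); a row whose parent_id points back to one of its descendants would recurse forever.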
Alternatively (when your table is prohibitively large), you can avoid gathering rows unnecessarily by adding a new column. The new column stores all "descendants" of the respective row; it is filled when the row is INSERTed and updated when it is UPDATEd. Then you only need to reference this column when you need to fetch specific rows. In other words, do the recursive processing only once (when writing to the database) and not when displaying the data. This, again, produces a finite result set in one query, which can then be recursively traversed to build the desired output.
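A hypothetical sketch of the write-time computation behind that extra column: given an id => parent_id map (loaded once when a row is written), it collects every descendant of one row with an explicit stack. The function name and the idea of storing the result as a comma-separated list are assumptions, not part of the answer.

```php
<?php
// Hypothetical helper: return the sorted ids of all descendants of $id,
// given $parentOf as [childId => parentId]. Assumes the tree has no cycles.
function descendantsOf(array $parentOf, int $id): array
{
    // Invert the map once: parentId => list of child ids
    $childrenOf = [];
    foreach ($parentOf as $childId => $parentId) {
        $childrenOf[$parentId][] = $childId;
    }

    $result = [];
    $stack = $childrenOf[$id] ?? [];
    while ($stack) {
        $current = array_pop($stack);
        $result[] = $current;
        // Push this node's children so they are visited too
        foreach ($childrenOf[$current] ?? [] as $grandchild) {
            $stack[] = $grandchild;
        }
    }
    sort($result);
    return $result;
}
```

The value to store could then be, for example, `implode(',', descendantsOf($parentOf, $rowId))`; note that inserting or moving a row also changes the descendant lists of all its ancestors.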

Basically, you need to do what @spudly has suggested.
But there is a small catch in his solution: depending on the number of rows in your tbl_menu table, fetching all the records may use a big chunk of memory.
You can optimise it further by using his solution but changing the query to:
select
parent_tbl_menu.id,
count(child_tbl_menu.id) as cnt
from
tbl_menu as parent_tbl_menu
left join
tbl_menu as child_tbl_menu
on parent_tbl_menu.id = child_tbl_menu.parent_id
where
parent_tbl_menu.parent_id = ?
group by
parent_tbl_menu.id
This way you will only fetch the child records of a specific parent.
And please consider using prepared statements, as your code has an SQL injection vulnerability.

Connect (from PHP to MySQL) only once for the entire web page.
Don't put a SELECT inside a loop if you can do all the work in a single SELECT, such as with a JOIN. (Exception: A "hierarchical" table needs the nested SELECT. Exception to the exception: MySQL 8.0 and MariaDB 10.2 can do it with a "recursive CTE".)
Don't fetch all the columns (SELECT *) when all you want is a record count. Instead, run SELECT COUNT(*) ... and use the number it returns.
1000 of anything is probably excessive for a web page. Re-think the UI.
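As a sketch of the recursive CTE mentioned above (MySQL 8.0+ or MariaDB 10.2+), the query below fetches a whole menu subtree in one round trip, using the tbl_menu columns from the question; the function name is hypothetical. With mysqli it could be run via prepare(), bind_param('i', $rootId), execute() and get_result().

```php
<?php
// Hypothetical helper returning a recursive CTE that walks tbl_menu from the
// children of one parent (bound to ?) down to the leaves, in a single query.
function menuSubtreeSql(): string
{
    return <<<SQL
WITH RECURSIVE menu_tree AS (
    SELECT id, parent_id, title, 0 AS depth
    FROM tbl_menu
    WHERE parent_id = ?
    UNION ALL
    SELECT m.id, m.parent_id, m.title, t.depth + 1
    FROM tbl_menu AS m
    JOIN menu_tree AS t ON m.parent_id = t.id
)
SELECT id, parent_id, title, depth FROM menu_tree ORDER BY depth, id
SQL;
}
```

The flat result (one row per menu item, with its depth) can then be turned into nested HTML in PHP without any further queries.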

Is it worth saving keyword <-> link relations in a "hashtable"-like structure in MySQL?

I'm working on a PHP + MySQL application that will crawl an HDD/shared drive and index all files and directories into a database, to provide "fulltext" search over them. So far I'm doing well, but I'm stuck on the question of whether I chose a good way to store the data in the database.
In the picture below, you can see part of my database schema. The idea is that I save a domain (which represents the part of the disk I want to index), then there are some links (which represent files and folders, with content, file path, etc.), and then I have a table to store the unique keywords I find in file/folder names or content.
Finally, I have 16 linkkeyword tables to store relations between links and keywords. I have 16 of them because I thought it might be good to make something like a hashtable, since I'm expecting a high number of link <-> keyword relations (so far, for 15k links and 400k keywords, I have about 2.5 million linkkeyword records). So, to avoid storing so much data in one table (and later searching over it), I thought this hashtable could be faster. It works like this: when I want to search for a word, I compute its md5 and look at the first character of the hash, and then I know which linkkeyword table to use. So there are only about 150~200k records in each linkkeyword table (instead of 2.5 million).
So I'm curious whether this approach is of any use, or whether it would be better to store all linkkeyword information in a single table and let MySQL take care of it (and up to how many link <-> keyword relations that can work).
So far this was a great solution for me, but I hit a wall when I tried to implement regular-expression search, where the user can enter e.g. "tem*", which can match temp, temporary, temple, etc. When searching for a normal word, I compute its md5 hash and then I know which linkkeyword table to look in. But for a regular expression I need to get all keywords from the keywords table that match the expression and then process them one by one.
I'm also attaching part of the code for the normal keyword search:
private function searchKeywords($selectedDomains) {
    $searchValues = $this->searchValue;
    $this->resultData = array();
    foreach (explode(" ", $searchValues) as $keywordName) {
        $keywordName = strtolower($keywordName);
        $keywordMd5 = md5($keywordName);
        $selection = $this->database->table('link');
        $results = $selection->where('domain.id', $selectedDomains)
            ->where('domain.searchable = ?', '1')
            ->where(':linkkeyword' . $keywordMd5[0] . '.keyword.keyword LIKE ?', $keywordName)
            ->select('link.*,:linkkeyword' . $keywordMd5[0] . '.weight,:linkkeyword' . $keywordMd5[0] . '.keyword.keyword');
        foreach ($results as $result) {
            $keyExists = array_key_exists($result->linkId, $this->resultData);
            if ($keyExists) {
                $this->resultData[$result->linkId]->updateWeight($result->weight);
                $this->resultData[$result->linkId]->addKeyword($result->keyword);
            } else {
                $domain = $result->ref('domain');
                $linkClass = new search\linkClass($result, $domain);
                $linkClass->updateWeight($result->weight);
                $linkClass->addKeyword($result->keyword);
                $this->resultData[$result->linkId] = $linkClass;
            }
        }
    }
}
and the regular expression search function:
private function searchRegexp($selectedDomains) {
    //get stored search value
    $searchValues = $this->searchValue;
    //replace asterisk and exclamation mark (the wildcard characters of the search syntax) with their MySQL LIKE equivalents
    $searchValues = str_replace("*", "%", $searchValues);
    $searchValues = str_replace("!", "_", $searchValues);
    //empty the result array so previous results don't interfere
    $this->resultData = array();
    //the searched phrase can contain multiple keywords, so split it by space and get results for each keyword
    foreach (explode(" ", $searchValues) as $keywordName) {
        //set default link result weight to -1
        $weight = -1;
        //select all keywords that match the searched keyword (or its pattern)
        $keywords = $this->database->table('keyword')->where('keyword LIKE ?', $keywordName);
        foreach ($keywords as $keyword) {
            //compute the keyword's md5 sum to determine which table holds its links
            $md5 = md5($keyword->keyword);
            //get all link ids from the linkkeyword relation table
            $keywordJoinLink = $keyword->related('linkkeyword' . $md5[0])->where('link.domain.searchable','1');
            //loop over the found links
            foreach ($keywordJoinLink as $link) {
                //store link weight, for sorting the results later
                $weight = $link->weight;
                //get link ID
                $linkId = $link->linkId;
                //check whether the link is already in the results, to prevent duplicates
                $keyExists = array_key_exists($linkId, $this->resultData);
                //if the link is already in the result set, just update its weight and add the matching keyword (used later for keyword tags)
                if ($keyExists) {
                    $this->resultData[$linkId]->updateWeight($weight);
                    $this->resultData[$linkId]->addKeyword($keyword->keyword);
                //if the link isn't in the results yet, insert it
                } else {
                    //get link reference
                    $linkData = $link->ref('link', 'linkId');
                    //get information about the domain the link belongs to (location, flagPath, ...)
                    $domainData = $linkData->ref('domain', 'domainId');
                    //if the domain is searchable and was selected before the search, add the link to the result set; otherwise ignore it
                    if ($domainData->searchable == 1 && in_array($domainData->id, $selectedDomains)) {
                        //create new link instance
                        $linkClass = new search\linkClass($linkData, $domainData);
                        //add the matching keyword to the link's keyword set
                        $linkClass->addKeyword($keyword->keyword);
                        //set the link's weight
                        $linkClass->updateWeight($weight);
                        //insert the link into the result set
                        $this->resultData[$linkId] = $linkClass;
                    }
                }
            }
        }
    }
}

Your question is mostly one of opinion, so you may want to include the criteria that would let us answer "worth it" more objectively.
It appears you've re-invented the concept of database sharding (though without distributing your data across multiple servers).
I assume you are trying to optimize search time; if that's the case, I'd suggest that 2.5 million records on modern hardware is not a particularly big performance challenge, as long as your queries can use an index. If you can't use an index (e.g. because you're doing a regular expression search), sharding will probably not help at all.
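To illustrate the index point with the "tem*" example from the question: a pattern with only a trailing wildcard, like `LIKE 'tem%'`, can use a regular B-tree index on the keyword column, while a pattern that starts with a wildcard cannot. The sketch below (hypothetical function names) converts the question's `*` and `!` wildcards into a LIKE pattern, first escaping any literal `%`, `_` or backslash the user typed (which the str_replace() calls in the question do not do), and reports whether the pattern starts with a literal prefix.

```php
<?php
// Hypothetical helper: turn the search syntax ("*" = any run, "!" = one char)
// into a MySQL LIKE pattern. Backslash is MySQL's default LIKE escape character.
function wildcardToLike(string $term): string
{
    // Escape characters that LIKE would otherwise treat specially
    $escaped = strtr($term, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);
    // Then map the search syntax's wildcards onto LIKE's
    return strtr($escaped, ['*' => '%', '!' => '_']);
}

// A pattern that begins with a literal character gives the optimizer a
// prefix to range-scan an index with; a leading % or _ does not.
function canUsePrefixIndex(string $likePattern): bool
{
    return $likePattern !== '' && $likePattern[0] !== '%' && $likePattern[0] !== '_';
}
```

With a single linkkeyword table indexed on the keyword id, a prefix search becomes one JOIN instead of a loop over md5-selected tables.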
My general recommendation with database performance tuning is to start with the simplest possible relational solution, keep tuning that until it breaks your performance goals, then add more hardware, and only once you've done that should you go for "exotic" solutions like sharding.
This doesn't mean using prayer as a strategy. For performance-critical applications, I typically build a test database where I can experiment with solutions. In your case, I'd build a database with your schema without the "sharding" tables, and then populate it with test data (either write your own population routines, or use a tool like DBMonster). Typically, I'd go for at least double the size I expect in production. You can then run and tune queries to prove, one way or another, whether your schema is good enough. It sounds like a lot of work, but it's much less work than your sharding solution is likely to bring along.
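A hypothetical sketch of such a population routine in PHP: it writes random linkId, keywordId, weight rows as CSV, which could then be bulk-loaded into a single, un-sharded link-keyword table with MySQL's LOAD DATA INFILE. The column layout and the 1 to 100 weight range are assumptions; random pairs can repeat, so if the table has a unique key on (linkId, keywordId) you would need LOAD DATA ... IGNORE or de-duplication.

```php
<?php
// Hypothetical test-data generator: writes $rows lines of
// "linkId,keywordId,weight" to an open stream.
function writeLinkKeywordCsv($handle, int $rows, int $maxLinkId, int $maxKeywordId): void
{
    for ($i = 0; $i < $rows; $i++) {
        // Plain integers need no CSV quoting
        $fields = [mt_rand(1, $maxLinkId), mt_rand(1, $maxKeywordId), mt_rand(1, 100)];
        fwrite($handle, implode(',', $fields) . "\n");
    }
}

// Usage: twice the current 2.5 million relations, e.g.
// $h = fopen('linkkeyword.csv', 'w');
// writeLinkKeywordCsv($h, 5000000, 30000, 800000);
// fclose($h);
```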
There are (as @danFromGermany comments) solutions that are optimized for text search, and you could use MySQL fulltext search features rather than regular expressions.

PHP/MySQL Search Engine Using Levenshtein Distance

I'm trying to create a simple search engine where users can query a database and get back results that both match and are close to their query. At first I was just using wildcards (%) to find results that were relevant to a user's search. The PHP for that looked something like this:
// The user's search term is saved in $_POST['q']
$q = $_POST['q'];
// Prepare statement
$search = $db->prepare("SELECT `id`, `name` FROM `users` WHERE `name` LIKE ?");
// Execute with wildcards
$search->execute(array("%$q%"));
// Echo results
foreach($search as $s) {
    echo $s['name'];
}
The above code works fine; however, it's rather limited. While it can fetch results that are close to but don't exactly match the user's query (because of the wildcards), it still doesn't return all relevant results: the user's query still has to appear verbatim in something in the database. For example, if I had a database with the name "Tim" as a row, searching for "Timothy" wouldn't work. So my new approach looks something like this:
// The user's search term is saved in $_POST['q']
$q = $_POST['q'];
// Create array for the names that are close to or match the search term
$results = array();
foreach($db->query('SELECT `id`, `name` FROM `users`') as $name) {
    // Keep only relevant results
    if (levenshtein($q, $name['name']) < 4) {
        array_push($results,$name['name']);
    }
}
// Echo out results
foreach ($results as $result) {
    echo $result."\n";
}
This code technically works; however, it's pretty inefficient, and I'm wondering how it can be improved. The biggest problem is that all rows have to be retrieved from the database and then filtered in PHP, which produces an unnecessarily large result set; that is especially problematic because I have a big database. Furthermore, I wanted to know whether simply using the levenshtein function is sufficient for getting relevant results, or whether there is a better way to filter out the irrelevant ones. Some other ways of filtering the results I came up with:
if (levenshtein(metaphone($q), metaphone($name['name'])) < 4) {
    array_push($results,$name['name']);
}
or
if (similar_text(metaphone($q), metaphone($name['name'])) > 2) {
    array_push($results,$name['name']);
}
or
if (similar_text($q, $name['name']) > 2) {
    array_push($results,$name['name']);
}
I think using levenshtein with metaphone may actually work the best as it would better take into account simple spelling errors. But I'm not sure which would be the best to use, especially considering that the way I'm doing it now is already very expensive (the large SQL query + the expensive functions that take place in a loop don't bode well for performance).
Thanks in advance
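A minimal sketch of one way to make the PHP side cheaper, under the assumption that the names have already been fetched: it keeps the question's threshold (distance below 4, i.e. at most 3) but skips the levenshtein() call whenever the length difference alone already exceeds the limit, since the Levenshtein distance between two strings is never smaller than the difference in their lengths. It can optionally also compare metaphone() codes (that comparison cannot use the length shortcut) and returns the names sorted from closest to farthest. The function name fuzzyRank is hypothetical. It does not remove the full fetch; for the plain comparison and ASCII names, a window such as `WHERE CHAR_LENGTH(name) BETWEEN ? AND ?` could be added to the SQL to transfer fewer rows, although MySQL would still scan the table.

```php
<?php
// Hypothetical helper: return the candidates within $maxDistance edits of
// $query, closest first. Comparison is case-insensitive.
function fuzzyRank(string $query, array $candidates, int $maxDistance = 3, bool $usePhonetic = false): array
{
    $q = strtolower($query);
    $qPhone = $usePhonetic ? metaphone($q) : '';
    $scored = [];
    foreach ($candidates as $candidate) {
        $c = strtolower($candidate);
        $best = PHP_INT_MAX;
        // Levenshtein distance >= length difference, so skip hopeless pairs
        if (abs(strlen($q) - strlen($c)) <= $maxDistance) {
            $best = levenshtein($q, $c);
        }
        if ($usePhonetic) {
            // Sound-alike comparison catches some spelling variants
            $best = min($best, levenshtein($qPhone, metaphone($c)));
        }
        if ($best <= $maxDistance) {
            $scored[] = [$candidate, $best];
        }
    }
    // Sort by distance, closest first
    usort($scored, function ($a, $b) {
        return $a[1] <=> $b[1];
    });
    return array_column($scored, 0);
}
```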

MySQL/PHP: check a LIKE statement on more than one column at once

I have a query I came up with for a search bar. I know I can use OR to split the statement and check different columns:
$query = mysql_query('SELECT * FROM PRODUCTS WHERE cats LIKE "%'.$s.'%" OR sub_cats LIKE "%'.$s.'%"');
But if I want to check more than two or three columns, is there a way of doing something like this to speed things up a bit:
$query_string = explode('&&=',$_GET['q']);
$db_str = 'SELECT * FROM products WHERE name, cats, sub_cats, desc, brand ';
for($x = 0; $x < count($query_string); $x++){
    $enc = mysql_real_escape_string($query_string[$x]);
    if($x == (count($query_string) -1)){
        $db_str .= 'LIKE "%'.$enc.'%"';
    }else{
        $db_str .= 'LIKE "%'.$enc.'%" ';
    }
}
$query = mysql_query($db_str);
I normally use POST for search bars, but I fancy giving GET a go; it looks more user-friendly to me.

I don't know if this would speed things up (you would have to test on your data). However, the following is an alternative with only one comparison:
WHERE concat_ws('|', name, cats, sub_cats, `desc`, brand) LIKE '%$enc%'
Because your LIKE has a wildcard at the beginning, the query cannot use regular indexes anyway, so this version should not slow things down much, if at all.
The answer to your performance problem, though, may be a full text index.

Your code will not work like that, even ignoring the deprecated API. You would need to do something like this:
$searchFields = ['name', 'cats', 'sub_cats', 'desc', 'brand'];
$searchTerm = mysql_real_escape_string($query_string[$x]);
$filters = implode(' OR ', array_map(function($field) use ($searchTerm) {
    // Backticks are needed because desc is a reserved word in MySQL
    return "`$field` LIKE '%{$searchTerm}%'";
}, $searchFields));
$query = mysql_query('SELECT * FROM products WHERE '.$filters);
It's far from optimal, though, and will quickly become slow as the database and user base grow. It is, however, exactly how phpMyAdmin, for example, implements its database-wide searches. If your database holds more than a few records, or is likely to attract more than a handful of simultaneous searches, you should implement a smarter solution like full-text search, or use an external search engine like Apache Solr, Amazon CloudSearch and the like; relational databases like MySQL just aren't good at messy searching on their own, nor were they ever meant to be.
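For comparison, a minimal sketch of the same OR-of-LIKEs query using placeholders instead of the mysql_* functions (that extension was removed in PHP 7.0). The column names come from the question; the function name is hypothetical. The returned SQL and parameters would go to PDO::prepare() and PDOStatement::execute(). Column names cannot be bound as parameters, so they are validated and backtick-quoted, which also takes care of the reserved word desc.

```php
<?php
// Hypothetical helper: build "col1 LIKE ? OR col2 LIKE ? ..." plus its parameters.
function buildMultiColumnLike(array $fields, string $term): array
{
    // Escape LIKE wildcards in the user's term (backslash is MySQL's default escape)
    $pattern = '%' . strtr($term, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']) . '%';
    $conditions = [];
    $params = [];
    foreach ($fields as $field) {
        // Only accept plain identifiers, since they are pasted into the SQL
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $field)) {
            throw new InvalidArgumentException("Invalid column name: $field");
        }
        $conditions[] = "`$field` LIKE ?";
        $params[] = $pattern;
    }
    return ['SELECT * FROM products WHERE ' . implode(' OR ', $conditions), $params];
}

// Usage with PDO:
// [$sql, $params] = buildMultiColumnLike(['name', 'cats', 'sub_cats', 'desc', 'brand'], $_GET['q']);
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```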
As for this remark:
I normally used POST for search bars but I fancy giving GET a go it looks more user friendly to me.
That isn't really a good argument for choosing between them. As a rule:
GET is for repeatable requests, that the user can F5 on as much as they want.
POST is for non-repeatable requests, that are usually considered harmful if (overly) repeated.
If your name is Google and you have a gazillion servers all over the world, use GET. If you already know your search has bad performance, stick with POST and don't tempt the lions.

How to filter by multiple fields in MySQL/PHP

I'm writing a filter/sorting feature for an application right now that will have text fields above each column. As the user types in each field, requests will be sent to the back-end for sorting. Since there are going to be around 6 text fields, I was wondering if there's a better way to sort instead of using if statements to check for each variable, and writing specific queries if say all fields were entered, just one, or just two fields, etc.
Seems like there would be a lot of if statements. Is there a more intuitive way of accomplishing this?
Thanks!

Any initial data manipulation, such as sorting, is usually done by the database engine.
Put an ORDER BY clause in there, unless you have a specific reason the sorting needs to be done in the application itself.
Edit: You now say that you want to filter the data instead. I would still do this at the database level. There is no sense in sending a huge dataset to PHP, just for PHP to have to wade through it and filter out data there. In most cases, doing this within MySQL will be far more efficient than what you can build in PHP.

Since there are going to be around 6 text fields, I was wondering if there's a better way to sort instead of using if statements to check for each variable
Definitely not.
First, there is nothing wrong with using several ifs in a row.
Trust me: I'm a huge fan of reducing code repetition myself, but I consider these manually written blocks the best solution.
Next, although there may be a way to wrap these conditions in a loop, most of the time different conditions require different treatment.
However, in your next statements you are wrong:
and writing specific queries
You need only one query.
Seems like there would be a lot of if statements.
Why? No more than the number of fields you have.
Here is a complete example of code that builds a custom search query:
$w = array();
$where = '';
if (!empty($_GET['rooms'])) $w[]="rooms='".mesc($_GET['rooms'])."'";
if (!empty($_GET['space'])) $w[]="space='".mesc($_GET['space'])."'";
if (!empty($_GET['max_price'])) $w[]="price < '".mesc($_GET['max_price'])."'";
if (count($w)) $where="WHERE ".implode(' AND ',$w);
$query="select * from table $where";
Only the fields filled in by the user go into the query.
Ordering can be handled in pretty much the same way.
mesc is shorthand for mysql_real_escape_string or any other applicable database-specific string escaping function.
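The same build-a-list-then-implode pattern works with placeholders instead of mesc(). A minimal sketch, assuming PDO; the rules array is a hypothetical whitelist mapping request keys to SQL fragments, so user input only ever travels as bound parameters. Unlike !empty(), the isset/!== '' check lets a value of "0" through, and the table name is backtick-quoted because table is a reserved word in MySQL.

```php
<?php
// Hypothetical helper: build the WHERE clause from whichever filters were filled in.
function buildFilterQuery(array $input): array
{
    // Whitelist: request key => SQL condition with a placeholder
    $rules = [
        'rooms'     => 'rooms = ?',
        'space'     => 'space = ?',
        'max_price' => 'price < ?',
    ];
    $where = [];
    $params = [];
    foreach ($rules as $key => $condition) {
        if (isset($input[$key]) && $input[$key] !== '') {
            $where[] = $condition;
            $params[] = $input[$key];
        }
    }
    $sql = 'SELECT * FROM `table`';
    if ($where) {
        $sql .= ' WHERE ' . implode(' AND ', $where);
    }
    return [$sql, $params];
}

// Usage with PDO:
// [$sql, $params] = buildFilterQuery($_GET);
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
```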

select * from Users
order by Created desc, Name asc, LastName desc, Status asc
Your records will then be sorted in the order given in the query:
first by Created desc, then by Name asc, and so on.
But from your question I can see that you are actually looking to filter results.
To filter by multiple fields, just append conditions to your WHERE clause, or, if you are using an ORM, do it through its object methods.
But if it's simple, you can do it this way:
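$query = "";
foreach($_POST['grid_fields'] as $key => $value)
{
    // Skip filters the user left empty
    if($value === '')
        continue;
    if(strlen($query) > 0)
        $query .= ' and ';
    // Note: escaping does not make $key a safe column name; check it against a list of allowed columns
    $query .= sprintf(" %s LIKE '%s' ", mysql_real_escape_string($key), '%' .mysql_real_escape_string($value) .'%');
}
if(strlen($query) > 0)
    $original_query .= ' where ' . $query;
This should help you achieve the result you want.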

No. You cannot avoid the testing operations when sorting the set, as you have to compare the elements of the set in the same way. The vehicle for this is an if statement.

Could you take a look at this?
WHERE (@filter1 IS NULL OR columnFilter1 = @filter1)
and (@filter2 IS NULL OR columnFilter2 = @filter2)
and (@filter3 IS NULL OR columnFilter3 = @filter3)
and (@filter4 IS NULL OR columnFilter4 = @filter4)
and (@filter5 IS NULL OR columnFilter5 = @filter5)
and (@filter6 IS NULL OR columnFilter6 = @filter6)
Please let me know if I'm misunderstanding your question. It isn't a batch of IF statements, though it is pretty lengthy. What do you think?
