Optimize MySQL FULL TEXT search - php

I have a search with only one field that allows me to search in several columns of a MySQL table.
This is the SQL query:
SELECT td__user.*
FROM td__user LEFT JOIN td__user_oauth ON td__user.id = td__user_oauth.user_id
WHERE ( td__user.id LIKE "contact@mywebsite.com" OR (MATCH (email, firstname, lastname) AGAINST ('+"contact@mywebsite.com"' IN BOOLEAN MODE)) )
ORDER BY date_accountcreated DESC LIMIT 20 OFFSET 0
The exact SQL query with PHP pre-processing of the search field to separate each word:
if($_POST['search'] == '') {
$searchId = '%';
} else {
$searchId = $_POST['search'];
}
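// Split the search field into words (assumed; the original snippet does not show how $arrayWords is built)
$arrayWords = preg_split('/\s+/', trim($_POST['search']), -1, PREG_SPLIT_NO_EMPTY);
$searchMatch = '';
foreach($arrayWords as $word) {
$searchMatch .= '+"'.$word.'" ';
}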
$sqlSearch = $dataBase->prepare('SELECT td__user.*,
td__user_oauth.facebook_id, td__user_oauth.google_id
FROM td__user
LEFT JOIN td__user_oauth ON td__user.id = td__user_oauth.user_id
WHERE (
td__user.id LIKE :id OR
(MATCH (email, firstname, lastname) AGAINST (:match IN BOOLEAN MODE)) )
ORDER BY date_accountcreated DESC LIMIT 20 OFFSET 0');
$sqlSearch->execute(['id' => $searchId,
'match' => $searchMatch]);
$searchResult = $sqlSearch->fetchAll();
$sqlSearch->closeCursor();
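The word-splitting step described above can be sketched as a small helper. This is only an illustration: the function name buildBooleanSearch is hypothetical, and it assumes that removing double quotes from each word is enough, since a stray quote would close the quoted phrase early.

```php
<?php
/**
 * Build a BOOLEAN MODE search string in which every word is required.
 * Hypothetical helper, not part of the original code.
 */
function buildBooleanSearch(string $input): string
{
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);

    $terms = [];
    foreach ($words as $word) {
        // A double quote inside the word would end the quoted phrase early.
        $clean = str_replace('"', '', $word);
        if ($clean !== '') {
            $terms[] = '+"' . $clean . '"';
        }
    }

    return implode(' ', $terms);
}

// Usage: pass the result as the :match parameter.
$searchMatch = buildBooleanSearch('  John   Doe ');
// $searchMatch is now: +"John" +"Doe"
```

An empty search field yields an empty string, so the caller can decide whether to skip the MATCH clause entirely in that case.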
And these are my indexes: (the index listing was an image and is not reproduced here)
The SQL query works well: when I put an ID, a first name, a last name, or even an incomplete email in the search field, I get results. I can also enter a first name and a last name together, and I only get people with that name.
On the other hand, on a table containing 500,000 contacts, the query takes more than 5 seconds. Are there any improvements I could make to the query or the indexes to make it faster?

Have you tried combining two result sets with UNION instead of using the OR operator in the WHERE clause? I suspect the indexes cannot be used with the OR condition.
The query would look something like this:
SELECT td__user.*,
td__user_oauth.facebook_id,
td__user_oauth.google_id
FROM td__user
LEFT JOIN td__user_oauth ON td__user.id = td__user_oauth.user_id
WHERE td__user.id LIKE :id
UNION
SELECT td__user.*,
td__user_oauth.facebook_id,
td__user_oauth.google_id
FROM td__user
LEFT JOIN td__user_oauth ON td__user.id = td__user_oauth.user_id
WHERE MATCH (email, firstname, lastname) AGAINST (:match IN BOOLEAN MODE)
ORDER BY date_accountcreated DESC LIMIT 20 OFFSET 0
Hope this can help!

FULLTEXT is fast; LIKE with a leading % is slow; OR is slow.
Consider this approach:
Run the MATCH part of the query.
If no results, then run the LIKE query but only allow trailing wildcards.
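The two steps above could be wired together roughly as follows. This is a sketch of the control flow only: searchWithFallback and its two callables are hypothetical stand-ins for whatever prepared statements run the MATCH query and the LIKE query.

```php
<?php
/**
 * Try the FULLTEXT search first; fall back to a prefix-only LIKE
 * when it returns nothing.
 *
 * $fulltextSearch and $prefixSearch are hypothetical wrappers around
 * prepared statements; each takes a string and returns an array of rows.
 */
function searchWithFallback(string $term, callable $fulltextSearch, callable $prefixSearch): array
{
    $rows = $fulltextSearch($term);
    if ($rows !== []) {
        return $rows;
    }

    // Escape LIKE metacharacters so user input cannot add a leading
    // wildcard, then append the single trailing wildcard.
    $prefix = addcslashes($term, '%_\\') . '%';

    return $prefixSearch($prefix);
}
```

The LIKE query is only paid for when the FULLTEXT search comes back empty, and the escaped prefix keeps the pattern anchored at the start of the column so an ordinary index can still be used.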

Related

Perform query on existing SQL result? Find result from subset of SQL result

I have a script that goes through all order history. It takes several minutes to print the results, but I noticed I perform several SQL statements that are similar enough I wonder if you could do another query on an existing SQL result.
For example:
-- first SQL request
SELECT * FROM orders
WHERE status = 'shipped'
Then, in a foreach loop, I want to find information from this result. My naive approach is to perform these three queries. Note the similarity to the query above.
-- grabs customer's LTD sales
SELECT SUM(total) FROM orders
WHERE user = :user
AND status = 'shipped'
-- grabs number of orders customer has made
SELECT COUNT(*) FROM orders
WHERE user = :user
AND status = 'shipped'
AND total != 0
-- grabs number of giveaways user has won
SELECT COUNT(*) FROM orders
WHERE user = :user
AND status = 'shipped'
AND total = 0
I end up querying the same table several times when the results I seek are subsets of the first query. I'd like to get information from the first query without performing more SQL calls. Some pseudocode:
$stmt1 = $db->prepare("
SELECT * FROM orders
WHERE status = 'shipped'
");
$stmt1->execute();
foreach($stmt1 as $var) {
$username = $var['username'];
$stmt2 = $stmt1->workOn("
SELECT SUM(total) FROM this
WHERE user = :user
");
$stmt2->execute(array(
':user' => $username
));
$lifesales = $stmt2->fetchColumn();
$stmt3 = $stmt1->workOn("
SELECT COUNT(*) FROM this
WHERE user = :user
AND total != 0
");
$stmt3->execute(array(
':user' => $username
));
$totalorders = $stmt3->fetchColumn();
$stmt4 = $stmt1->workOn("
SELECT COUNT(*) FROM this
WHERE user = :user
AND total = 0
");
$stmt4->execute(array(
':user' => $username
));
$totalgaws = $stmt4->fetchColumn();
echo "Username: ".$username;
echo "<br/>Lifetime Sales: ".$lifesales;
echo "<br/>Total Orders: ".$totalorders;
echo "<br/>Total Giveaways: ".$totalgaws;
echo "<br/><br/>";
}
Is something like this possible? Is it faster? My existing method is slow and ugly, I'd like a quicker way to do this.
We could do one pass through the table to get all three aggregates for all users:
SELECT s.user
, SUM(s.total) AS `ltd_sales`
, SUM(s.total <> 0) AS `cnt_prior_sales`
, SUM(s.total = 0) AS `cnt_giveaways`
FROM orders s
WHERE s.status = 'shipped'
GROUP
BY s.user
That's going to be expensive on large sets. But if we are needing that for all orders, for all users, that's likely going to be faster than doing separate correlated subqueries.
An index with a leading column of user is going to allow MySQL to use the index for the GROUP BY operation. Including the status and total columns in the index will allow the query to be satisfied entirely from the index. (With the equality predicate on the status column, we could also try an index with status as the leading column, followed by the user column, then followed by total.)
If we only need this result for a small subset of users, e.g. we are fetching only the first 10 rows from the first query, then running a separate query per user is likely going to be faster. We'd just incorporate the condition WHERE s.user = :user into the query, as in the original code, but run just the one query rather than three separate queries.
We can combine that with the first query by making it into an inline view, wrapping it in parens and putting it into the FROM clause as a row source:
SELECT o.*
, t.ltd_sales
, t.cnt_prior_sales
, t.cnt_giveaways
FROM orders o
JOIN (
SELECT s.user
, SUM(s.total) AS `ltd_sales`
, SUM(s.total <> 0) AS `cnt_prior_sales`
, SUM(s.total = 0) AS `cnt_giveaways`
FROM orders s
WHERE s.status = 'shipped'
GROUP
BY s.user
) t
ON t.user = o.user
WHERE o.status = 'shipped'
I'm not sure about that column named "prior" sales... this is returning all shipped orders, without regard to comparing any dates (order date, fulfillment date, shipment date), which we would typically associate with a concept of what "prior" means.
FOLLOWUP
Noticing that the question has been modified, removing the condition "status = 'shipped'" from the count of all orders by the user...
I will note that we can move conditions from the WHERE clause into the conditional aggregates.
Not that all these results are needed by OP, but as a demonstration...
SELECT s.user
, SUM(IF(s.status='shipped',s.total,0)) AS `ltd_sales_shipped`
, SUM(IF(s.status<>'shipped',s.total,0)) AS `ltd_sales_not_shipped`
, SUM(s.status='shipped' AND s.total <> 0) AS `cnt_shipped_orders`
, SUM(s.status='canceled') AS `cnt_canceled`
, SUM(s.status='shipped' AND s.total = 0) AS `cnt_shipped_giveaways`
FROM orders s
GROUP
BY s.user
Once the results are returned from the database, you cannot run SQL on top of them. However, you can store them in a temporary table to reuse them.
https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html
https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
https://dev.mysql.com/doc/refman/8.0/en/insert-select.html
You need to create a temporary table, and insert all the data from the select statement, and then you can run queries on that table. Not sure if it would help much in your case.
For your particular case you can do something like:
select user, (total = 0) as is_total_zero, count(*), sum(total)
from orders
where status = 'shipped'
group by user, total = 0
However you would have to do some additional summing to get the results of the second query which gives you the sums per user, as they would be divided into two different groups with a different is_total_zero value.
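The additional summing mentioned above could be done in PHP once the grouped rows have been fetched. The sketch below assumes the two aggregate columns were given the aliases cnt and total_sum (the query above leaves them unaliased), and the sample rows are made up for illustration.

```php
<?php
/**
 * Fold the (user, is_total_zero) groups into one summary per user.
 * Assumes each row has the keys user, is_total_zero, cnt and total_sum.
 */
function summarizeByUser(array $rows): array
{
    $summary = [];
    foreach ($rows as $row) {
        $user = $row['user'];
        if (!isset($summary[$user])) {
            $summary[$user] = ['ltd_sales' => 0.0, 'orders' => 0, 'giveaways' => 0];
        }

        // Lifetime sales is the sum over both groups.
        $summary[$user]['ltd_sales'] += (float) $row['total_sum'];

        // The zero-total group holds giveaways, the other group paid orders.
        if ((int) $row['is_total_zero'] === 1) {
            $summary[$user]['giveaways'] += (int) $row['cnt'];
        } else {
            $summary[$user]['orders'] += (int) $row['cnt'];
        }
    }

    return $summary;
}

// Made-up sample rows in the shape the grouped query would return.
$rows = [
    ['user' => 'alice', 'is_total_zero' => 0, 'cnt' => 3, 'total_sum' => '120.50'],
    ['user' => 'alice', 'is_total_zero' => 1, 'cnt' => 2, 'total_sum' => '0.00'],
    ['user' => 'bob',   'is_total_zero' => 0, 'cnt' => 1, 'total_sum' => '15.00'],
];
$summary = summarizeByUser($rows);
```

This replaces the three per-user queries in the loop with one query plus a single pass over its result.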

Check if specific value exists in mysql column

I have mysql column called categories. It can contain single or multiple values like this: 1 or 2 or 1,2,3 or 2,12...
I try to get all rows containing value 2.
$query = "SELECT * FROM my_table WHERE categories LIKE '2'";
$rows = mysql_query($query);
This returns row if column only has value 2 but not 1,2,3 or 2,12. How I can get all rows including value 2?
You can use either of the following:
% is a wildcard so it will match 2 or 1,2, etc. Anything on either side of a 2. The problem is it could match 21, 22, etc.
$query = "SELECT * FROM my_table WHERE categories LIKE '%2%'";
Instead you should consider the find_in_set mysql function which expects a comma separated list for the value.
$query = "SELECT * FROM my_table WHERE find_in_set('2', `categories`)";
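To see why the two predicates behave differently, here is a rough client-side analogue in plain PHP. It is not how MySQL evaluates either expression; the function names are hypothetical, and the point is only to contrast substring matching with list-member matching.

```php
<?php
// Roughly what categories LIKE '%<id>%' tests: the id appears anywhere.
function matchesLikeWildcard(string $categories, string $id): bool
{
    return strpos($categories, $id) !== false;
}

// Roughly what FIND_IN_SET('<id>', categories) tests: the id is one
// whole element of the comma-separated list.
function matchesListMember(string $categories, string $id): bool
{
    return in_array($id, explode(',', $categories), true);
}

foreach (['2', '1,2,3', '2,12', '12,21'] as $value) {
    printf(
        "%-6s wildcard=%s list=%s\n",
        $value,
        var_export(matchesLikeWildcard($value, '2'), true),
        var_export(matchesListMember($value, '2'), true)
    );
}
```

The last value, 12,21, matches the wildcard test but not the list test, which is exactly the false positive described above.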
As @jitendrapurohut said, you can do it using
$query = "SELECT * FROM my_table WHERE categories LIKE '%2%'";
$rows = mysql_query($query);
But it is really bad to store collections like this. A better approach is as follows:
categories(id_c, name) => A table with each category
my_table(id_m [, ...])
categories_my_table(id_c, id_m)
Then use this query:
SELECT *
FROM my_table m
INNER JOIN categories_my_table cm ON m.id_m = cm.id_m
INNER JOIN categories c ON cm.id_c = c.id_c
WHERE
c.id_c = 2;
EDIT:
@e4c5's link explains why it is bad to store collections like this...
SELECT * FROM my_table WHERE categories LIKE '%2%' AND categories!='1,2,3' AND categories!='2,12';

Retrieving total count of matching rows along with the rows themselves using PDO

I have a PDO snippet that retrieves a bunch of rows from a MySQL table and assigns the field values (2 fields are returned per row) to two arrays like so:
$connect = dbconn(PROJHOST,'dbcontext', PROJDBUSER, PROJDBPWD);
$sql= "SELECT contextleft, contextright FROM tblcontext WHERE contextleft REGEXP :word LIMIT 0, 25";
$xleft = array();
$xright = array();
$countrows = 0;
$word = "[[:<:]]".$term."[[:>:]]";
$query = $connect->prepare($sql);
$query->bindParam(":word", $word, PDO::PARAM_STR);
if($query->execute()) {
$rows = $query->fetchAll(PDO::FETCH_ASSOC);
foreach($rows as $row){
$pattern = '/\b'. $term .'\b/ui';
$replacer = function($matches) { return '<span class="highlight">' . $matches[0] . '</span>'; };
$xleft[$countrows] = preg_replace_callback($pattern, $replacer, $row['contextleft']);
$xright[$countrows] = $row['contextright'];
$countrows++;
}
$notfound = null;
}
$connect = null;
This works perfectly. As you can see, I use the LIMIT clause to ensure only a maximum of 25 rows are extracted. However, there can actually be many more matching records in the table and I need to also retrieve the total count of all matching records along with the returned rows. The end goal is pagination, something like: 25 of 100 entries returned...
I understand I have 2 options here, both involving 2 queries instead of a single one:
$sql= "SELECT COUNT(*) FROM tblcontext WHERE contextleft REGEXP :word;
SELECT contextleft, contextright FROM tblcontext WHERE contextleft REGEXP :word LIMIT 0, 25";
or...
$sql= "SELECT SQL_CALC_FOUND_ROWS contextleft, contextright FROM tblcontext WHERE contextleft REGEXP :word LIMIT 0, 25;
SELECT FOUND_ROWS();";
But I am still confused about how to retrieve the returned count value in PHP. I can access the fields by running the fetchAll() method on the query and referencing the field name. How can I access the count value returned by either FOUND_ROWS() or COUNT()? Also, is there any way to use a single SQL statement to get both the count and the rows?
If you have a separate query with the count, then you can retrieve its value exactly the same way as you read the values from any queries, there is no difference.
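As a minimal sketch of the separate-count approach, the snippet below reads the count with fetchColumn() and the page with fetchAll(). To keep it runnable without a MySQL server it assumes the pdo_sqlite extension and uses an in-memory SQLite table with made-up data, with LIKE standing in for REGEXP; against MySQL only the connection and the predicate would differ.

```php
<?php
// In-memory SQLite stands in for the real MySQL connection (assumption).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Made-up sample data: 40 matching rows.
$db->exec('CREATE TABLE tblcontext (contextleft TEXT, contextright TEXT)');
$insert = $db->prepare('INSERT INTO tblcontext VALUES (?, ?)');
for ($i = 1; $i <= 40; $i++) {
    $insert->execute(["left word $i", "right $i"]);
}

$pattern = '%word%';

// Query 1: total number of matching rows, read as a single scalar.
$countStmt = $db->prepare('SELECT COUNT(*) FROM tblcontext WHERE contextleft LIKE :word');
$countStmt->execute([':word' => $pattern]);
$total = (int) $countStmt->fetchColumn();

// Query 2: the current page of rows.
$pageStmt = $db->prepare(
    'SELECT contextleft, contextright FROM tblcontext WHERE contextleft LIKE :word LIMIT 25 OFFSET 0'
);
$pageStmt->execute([':word' => $pattern]);
$rows = $pageStmt->fetchAll(PDO::FETCH_ASSOC);

printf("%d of %d entries returned\n", count($rows), $total);
// prints "25 of 40 entries returned"
```

fetchColumn() returns the first column of the next row, which is all a COUNT(*) query produces, so no field name is needed.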
If you use SQL_CALC_FOUND_ROWS, then you need a separate query to call the FOUND_ROWS() function (note that both are deprecated as of MySQL 8.0.17):
SELECT FOUND_ROWS();
Since this is again a normal query, you can read its output the same way as you do now.
You can technically retrieve the record count within the query itself by adding a count(*) subquery to the query. The subquery has to repeat the filter, otherwise it counts every row in the table rather than only the matching ones:
SELECT contextleft, contextright, (SELECT COUNT(*) FROM tblcontext WHERE contextleft REGEXP :word2) AS totalrecords FROM tblcontext WHERE contextleft REGEXP :word LIMIT 0, 25
or
SELECT contextleft, contextright, totalrecords
FROM tblcontext, (SELECT COUNT(*) AS totalrecords FROM tblcontext WHERE contextleft REGEXP :word2) t WHERE contextleft REGEXP :word LIMIT 0, 25
Bind the same pattern to both :word and :word2; PDO does not allow a named placeholder to be reused unless emulated prepares are enabled. LIMIT affects the number of records returned, but does not affect the number of rows counted by the count() function. The only drawback is that the count value is repeated in every row, but with 25 rows that may be an acceptable burden.
You need to test which method works the best for you.

Complex Query with repeated parameter not working in PHP PDO

I have a somewhat complex query (using a subquery) for an election database where I'm trying to get the number of votes per particular candidate for a position.
The headers for the votes table are: id (PRIMARY KEY), timestamp, pos1, pos2, ..., pos6 where pos1-pos6 are the position names. A cast ballot becomes a new row in this table, with the member id number of the selected candidate (candidates are linked to profiles in a "membership" table in the database) stored as the value for each position. So for instance, one row in the database might look like the following (except with the actual 6 position names):
id timestamp pos1 pos2 pos3 (and so on)
=================================================
6 1386009129 345 162 207
I want to get the results for each position using PHP PDO, listing for each position the candidate's name and the number of votes they have received for this position. So the raw database results should appear as (for "pos1", as an example):
name votecount
======================
Joe Smith 27
Jane Doe 45
I have a raw SQL query which I can successfully use to get these results, the query is (making pos1 the actual column/position name President):
SELECT (SELECT fullname FROM membership memb WHERE member_id=`President`) name, count(`President`) votecount FROM `election_votes` votes GROUP BY `President`
So as you can see, the position name (President, here) is repeated 3 times in the query. This seems to cause a problem in the PHP PDO code. My code is as follows:
$position = "President"; // set in earlier code as part of a loop
$query = "SELECT (SELECT fullname FROM membership memb WHERE member_id=:pos1) name, count(:pos2) votecount FROM `election_votes` votes GROUP BY :pos3";
$query2 = "SELECT (SELECT fullname FROM membership memb WHERE member_id=$position) name, count($position) votecount FROM `election_votes` votes GROUP BY $position"; // NOT SAFE!
$STH = $DBH->prepare($query);
$STH->bindParam(':pos1', $position);
$STH->bindParam(':pos2', $position);
$STH->bindParam(':pos3', $position);
$STH->execute();
while($row = $STH->fetch(PDO::FETCH_ASSOC)) {
print_r($row);
// I'd like to do other things with these results
}
When I run this, using the query $query, I don't get results per-person as desired. My output is:
Array
(
[name] =>
[votecount] => 47
)
where 47 is the total number of ballots cast instead of an array for each candidate containing their name and number of votes (out of the total). However, if I use the obviously insecure $query2, which just inserts the value of $position into the query string three times, I get the results I want.
Is there something wrong with my PHP code above? Am I doing something that's impossible in PDO (I hope not!)? My underlying database is MySQL 5.5.32. I've even tried replacing the three named parameters with the ? unnamed ones and passing an array array($position, $position, $position) into the $STH->execute() method with no greater success.
Your query isn't complex. I think part of the confusion is that you aren't constructing the basic SQL properly. You are attempting to treat "President" as both a value and column. The final SQL should look something like this:
SELECT
`fullname` AS `name`,
COUNT(`id`) AS `votecount`
FROM
`election_votes` AS `votes`
LEFT JOIN
`membership` AS `memb` ON `member_id` = `pos1`
GROUP BY
`pos1`
You join the election_votes table to the membership table where the value in column pos1 equals the value in column member_id.
NOTE: you cannot use parameters with table and column names in PDO (Can PHP PDO Statements accept the table or column name as parameter?), so you have to escape those manually.
/**
* Map positions/columns manually. Never use user-submitted data for table/column names
*/
$positions = array('President' => 'pos1', 'Vice-president' => 'pos2');
$column = $positions['President'];
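The manual mapping can be wrapped in a small lookup that refuses anything outside the whitelist. This is a sketch: the function name columnForPosition is hypothetical, and only the two positions shown above are assumed to exist.

```php
<?php
/**
 * Translate a position name into its column name using a fixed whitelist.
 * Throws instead of ever passing unknown input into the SQL string.
 */
function columnForPosition(string $position): string
{
    // Only these two mappings come from the example above.
    $positions = ['President' => 'pos1', 'Vice-president' => 'pos2'];

    if (!array_key_exists($position, $positions)) {
        throw new InvalidArgumentException("Unknown position: $position");
    }

    return $positions[$position];
}

$column = columnForPosition('President');
// $column is now 'pos1' and is safe to interpolate between backticks.
```

Because the returned value always comes from the hard-coded array, user input never reaches the query text, even though the column name itself cannot be bound as a parameter.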
You should be able to re-write your query in PDO as:
/**
* Get election results for all candidates
*/
$sql = "SELECT
`fullname` AS `name`,
COUNT(`id`) AS `votecount`
FROM
`election_votes` AS `votes`
LEFT JOIN
`membership` AS `memb` ON `member_id` = `".$column."`
GROUP BY
`".$column."`";
As you can see, there's nothing to bind in PDO with this particular query. You use bindParam (or bindValue) to bind values that appear in the WHERE clause (again: not table/column names). For example:
/**
* Get election results for person named 'Homer Simpson'
*/
$sql = "SELECT
`fullname` AS `name`,
COUNT(`id`) AS `votecount`
FROM
`election_votes` AS `votes`
LEFT JOIN
`membership` AS `memb` ON `member_id` = `".$column."`
WHERE
`fullname` = :fullname
GROUP BY
`".$column."`";
$STH = $DBH->prepare($sql);
// bindParam needs a variable passed by reference; use bindValue for a literal
$STH->bindValue(':fullname', 'Homer Simpson');

Speeding up MySQL query searching multiple tables using MATCH AGAINST

This is my first time trying to build more complex search functionality than just using the LIKE function. The results returned from this search are pretty much perfect, but it's running really slowly. Is there anything I can improve code-wise to speed things up, or anything I should look at on the database? Or would I need to be looking at more server power?
Thanks a lot for any and all help. It's much appreciated!
function new_live_search($q){
$title_score = 5;
$tags_score = 10;
$upvote_score = 1;
$subdomain = $this->config->item('subdomain_name');
$query = "SELECT DISTINCT results.*,
(
".$title_score."*(MATCH(title) AGAINST('$q' IN BOOLEAN MODE)) +
".$tags_score."*(MATCH(tags.name) AGAINST('$q' IN BOOLEAN MODE)) +
".$upvote_score."*usefulness
) AS score
FROM results
LEFT JOIN tags ON results.id = tags.result_id
WHERE (scope = 'all' OR scope = '$subdomain')
AND (published = 1)";
$query .= "
HAVING score - usefulness > 0
ORDER BY score DESC, title";
$query = $this->db->query($query);
$results = array();
foreach ($query->result() as $row){
$results[] = $row;
}
return $results;
}
From the MySQL documentation:
Unfortunately it is not possible to combine a fulltext field and a normal (i.e. integer) field into one index. Since only one index per table can be used in a query, that seems to be a problem.
Table layout:
id(integer primary key)|content(text fulltext indexed)|status(integer key)
Note that when executing the following query, MySQL will use only one index: either the fulltext index or the status index (depending on internal statistics).
Query 1:
SELECT * FROM table WHERE MATCH(content) AGAINST('searchQuery') AND status = 1
However, it is still possible to use both indexes in one query. You will need a new index on the (id, status) pair and a self-join. That way MySQL is able to use one index for each table reference.
Query 2:
SELECT t1.* FROM table t1
LEFT JOIN table t2 ON(t1.id=t2.id)
WHERE MATCH(t1.content) AGAINST('searchQuery') AND t2.status=1
Query 2 will run significantly faster than Query 1, at least in my case :)
Note the overhead: you will need an id for each row and a key that spans the needed fields, starting with id.
Refer to the fulltext search section of the MySQL documentation.
Hope it helps.
If you look at your query, your fulltext part of the query does not actually limit the search. Using something like the following should increase performance a bit.
SELECT DISTINCT results.*, (
$title_score * (MATCH(title) AGAINST('$q' IN BOOLEAN MODE)) +
$tags_score * (MATCH(tags.name) AGAINST('$q' IN BOOLEAN MODE)) +
$upvote_score * usefulness
) AS score
FROM results
LEFT JOIN tags ON results.id = tags.result_id
WHERE (scope = 'all' OR scope = '$subdomain')
AND (published = 1)
AND (
(MATCH(title) AGAINST('$q' IN BOOLEAN MODE)) OR
(MATCH(tags.name) AGAINST('$q' IN BOOLEAN MODE)))
HAVING score - usefulness > 0
ORDER BY score DESC, title
