Speeding up MySQL query searching multiple tables using MATCH AGAINST

This is my first time trying to build search functionality more complex than a plain LIKE. The results returned are pretty much perfect, but the query runs really slowly. Is there anything I can improve in the code to speed things up, anything I should look at in the database, or do I simply need more server power?
Thanks a lot for any and all help. It's much appreciated!
function new_live_search($q){
    $title_score = 5;
    $tags_score = 10;
    $upvote_score = 1;
    $subdomain = $this->config->item('subdomain_name');
    $query = "SELECT DISTINCT results.*,
        (
            ".$title_score."*(MATCH(title) AGAINST('$q' IN BOOLEAN MODE)) +
            ".$tags_score."*(MATCH(tags.name) AGAINST('$q' IN BOOLEAN MODE)) +
            ".$upvote_score."*usefulness
        ) AS score
        FROM results
        LEFT JOIN tags ON results.id = tags.result_id
        WHERE (scope = 'all' OR scope = '$subdomain')
        AND (published = 1)";
    $query .= "
        HAVING score - usefulness > 0
        ORDER BY score DESC, title";
    $query = $this->db->query($query);
    $results = array();
    foreach ($query->result() as $row){
        $results[] = $row;
    }
    return $results;
}

From the MySQL documentation:
Unfortunately, it is not possible to combine a FULLTEXT field and a normal (e.g. integer) field into one index. Since MySQL generally uses only one index per table reference in a query, that seems to be a problem.
Table layout:
id(integer primary key)|content(text fulltext indexed)|status(integer key)
Note that when executing the following query, MySQL will use only one index: either the fulltext index or the status index (depending on internal statistics).
Query 1:
SELECT * FROM table WHERE MATCH(content) AGAINST('searchQuery') AND status = 1
However, it is still possible to use both indexes in one query. You will need a new index on the (id, status) pair and a self-join, so that MySQL can use one index for each table reference.
Query 2:
SELECT t1.* FROM table t1
LEFT JOIN table t2 ON(t1.id=t2.id)
WHERE MATCH(t1.content) AGAINST('searchQuery') AND t2.status=1
Query 2 will run significantly faster than Query 1, at least in my case :)
Note the overhead: you will need an id for each row and a key that spans the needed fields, starting with id.
See the full-text search section of the MySQL documentation.
Hope it helps.

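If you look at your query, the fulltext part appears only in the SELECT list, so it does not actually limit the search: every published row in scope is scored and only filtered afterwards by HAVING. Adding the MATCH conditions to the WHERE clause, as in the following, should increase performance a bit.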
SELECT DISTINCT results.*, (
$title_score * (MATCH(title) AGAINST('$q' IN BOOLEAN MODE)) +
$tags_score * (MATCH(tags.name) AGAINST('$q' IN BOOLEAN MODE)) +
$upvote_score * usefulness
) AS score
FROM results
LEFT JOIN tags ON results.id = tags.result_id
WHERE (scope = 'all' OR scope = '$subdomain')
AND (published = 1)
AND (
(MATCH(title) AGAINST('$q' IN BOOLEAN MODE)) OR
(MATCH(tags.name) AGAINST('$q' IN BOOLEAN MODE)))
HAVING score - usefulness > 0
ORDER BY score DESC, title
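
A minimal PHP sketch of the same query with the search term bound as a placeholder rather than interpolated into the SQL string. The helper name `build_search_query` and its signature are assumptions made for illustration, not part of the original code; it only builds the SQL and the parameter list, and assumes a database layer that accepts `?` placeholders with an array of bound values.

```php
<?php
// Hypothetical helper: returns [sql, params] for the weighted search.
// The weights are typed as int, so only numbers are ever concatenated
// into the SQL; user input travels separately as bound values.
function build_search_query(
    string $q,
    string $subdomain,
    int $titleScore = 5,
    int $tagsScore = 10,
    int $upvoteScore = 1
): array {
    $sql = "SELECT DISTINCT results.*, (
            {$titleScore} * (MATCH(title) AGAINST(? IN BOOLEAN MODE)) +
            {$tagsScore} * (MATCH(tags.name) AGAINST(? IN BOOLEAN MODE)) +
            {$upvoteScore} * usefulness
        ) AS score
        FROM results
        LEFT JOIN tags ON results.id = tags.result_id
        WHERE (scope = 'all' OR scope = ?)
          AND published = 1
          AND (MATCH(title) AGAINST(? IN BOOLEAN MODE)
               OR MATCH(tags.name) AGAINST(? IN BOOLEAN MODE))
        HAVING score - usefulness > 0
        ORDER BY score DESC, title";

    // One bound value per "?" in the order the placeholders appear.
    return [$sql, [$q, $q, $subdomain, $q, $q]];
}
```

The returned pair can then be handed to whatever query method the application uses, for example a prepared statement's execute call.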

Related

Optimize MySQL FULL TEXT search

I have a search with only one field that allows me to search in several columns of a MySQL table.
This is the SQL query:
SELECT td__user.*
FROM td__user LEFT JOIN td__user_oauth ON td__user.id = td__user_oauth.user_id
WHERE ( td__user.id LIKE "contact@mywebsite.com" OR (MATCH (email, firstname, lastname) AGAINST (+"contact@mywebsite.com" IN BOOLEAN MODE)) )
ORDER BY date_accountcreated DESC LIMIT 20 OFFSET 0
The exact SQL query with PHP pre-processing of the search field to separate each word:
if($_POST['search'] == '') {
    $searchId = '%';
} else {
    $searchId = $_POST['search'];
}
// $arrayWords holds the words of the search field (built elsewhere)
$searchMatch = '';
foreach($arrayWords as $word) {
    $searchMatch .= '+"'.$word.'" ';
}
$sqlSearch = $dataBase->prepare('SELECT td__user.*,
    td__user_oauth.facebook_id, td__user_oauth.google_id
    FROM td__user
    LEFT JOIN td__user_oauth ON td__user.id = td__user_oauth.user_id
    WHERE (
        td__user.id LIKE :id OR
        (MATCH (email, firstname, lastname) AGAINST (:match IN BOOLEAN MODE)) )
    ORDER BY date_accountcreated DESC LIMIT 20 OFFSET 0');
$sqlSearch->execute(['id' => $searchId,
    'match' => $searchMatch]);
$searchResult = $sqlSearch->fetchAll();
$sqlSearch->closeCursor();
And these are my indexes:
The SQL query works well: when I put an ID, a first name, a last name, or an email (even an incomplete one) in the search field, I get results. I can also enter a first name and a last name together, and I get only the people with that name.
On the other hand, on a table containing 500,000 contacts, the query takes more than 5 seconds. Is there anything I can improve in the query or the indexes to make it faster?

Did you try a UNION of two result sets instead of the OR operator in the WHERE clause? I suspect the indexes cannot be used with the OR operator.
The query will be something like this:
SELECT td__user.*,
td__user_oauth.facebook_id,
td__user_oauth.google_id
FROM td__user
LEFT JOIN td__user_oauth ON td__user.id = td__user_oauth.user_id
WHERE td__user.id LIKE :id
UNION
SELECT td__user.*,
td__user_oauth.facebook_id,
td__user_oauth.google_id
FROM td__user
LEFT JOIN td__user_oauth ON td__user.id = td__user_oauth.user_id
WHERE MATCH (email, firstname, lastname) AGAINST (:match IN BOOLEAN MODE)
ORDER BY date_accountcreated DESC LIMIT 20 OFFSET 0
Hope this can help!

FULLTEXT is fast; LIKE with a leading % is slow; OR is slow.
Consider this approach:
Run the MATCH part of the query.
If no results, then run the LIKE query but only allow trailing wildcards.
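
The two steps above can be sketched in PHP as follows. The function names and the two callables `$runMatch` and `$runLike` are hypothetical stand-ins for whatever code executes the FULLTEXT query and the LIKE query; the sketch only shows the control flow and how to build a pattern with a trailing wildcard, assuming MySQL's default backslash escape character for LIKE.

```php
<?php
// Escape LIKE metacharacters in the user's term and append one trailing
// wildcard. Backslash is replaced first so the escapes added for % and _
// are not doubled afterwards.
function trailing_wildcard_pattern(string $term): string
{
    return str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $term) . '%';
}

// $runMatch and $runLike are hypothetical callables: each executes its
// query with the given value and returns an array of rows.
function search_with_fallback(string $term, callable $runMatch, callable $runLike): array
{
    // Step 1: try the FULLTEXT (MATCH ... AGAINST) query.
    $rows = $runMatch($term);
    if (count($rows) > 0) {
        return $rows;
    }
    // Step 2: fall back to LIKE, allowing a trailing wildcard only.
    return $runLike(trailing_wildcard_pattern($term));
}
```

Because the pattern never starts with a wildcard, the fallback query can still use an ordinary index on the searched column.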

SQL query taking too long to execute

I am currently working on a project that requires me to scan the Public Whip raw data and return a list of MPs' names (those who have voted for a policy that matches the keywords that have been input, e.g. "fox hunting"). The current SQL query takes about 30 seconds to finish executing, which is way too long.
This is the SQL query that looks in the "distance" table and the "policy" table. (This is what is taking too long to execute.)
$sql = "SELECT DISTINCT distance.mp_id from distance WHERE distance.distance < 0.2 AND distance.dream_id IN (SELECT dream_id from policy WHERE UPPER(policy.title) LIKE UPPER('%".$keyword."%')) ORDER BY distance.distance LIMIT 5";
This is the rest of the code, which just echoes out the MP names:
$results = mysql_query($sql);
echo "<ul>";
while ($row = mysql_fetch_array($results)) {
    $mpid = $row['mp_id'];
    $sql = "SELECT mp.first_name,mp.last_name FROM mp WHERE mp_id = ".$mpid;
    $result = mysql_query($sql);
    $result = mysql_fetch_assoc($result);
    echo "<li>".$result['first_name']." ".$result['last_name']."</li>\n";
}
echo "</ul>";

This is your query:
SELECT DISTINCT distance.mp_id
from distance
WHERE distance.distance < 0.2 AND
distance.dream_id IN (SELECT dream_id
from policy
WHERE UPPER(policy.title) LIKE UPPER('%".$keyword."%')
)
ORDER BY distance.distance
LIMIT 5;
In some versions of MySQL, IN with a subquery is inefficient. Let me also assume that mp_id is unique in the distance table, which is why the DISTINCT is dropped below. This query might work better:
SELECT d.mp_id
from distance d
WHERE d.distance < 0.2 AND
exists (select 1
from policy p
where UPPER(p.title) LIKE UPPER('%".$keyword."%') and
p.dream_id = d.dream_id
)
ORDER BY d.distance
LIMIT 5;
This query would be further improved by having an index on policy(dream_id) and possibly distance(distance).
Depending on how large the policy table is, one major impediment to performance is the expression UPPER(policy.title) LIKE UPPER('%".$keyword."%'), because a pattern with a leading wildcard cannot use an index on title. If you really mean equality, then use equality and not LIKE with wildcards. If you are really storing multiple keywords in the title column, then consider either breaking these out into a separate table or using full text search.

What is the query statement to write in order to solve the following database problem?

I have the following 3 tables in the database.
Programs_Table
    Program_ID (Primary Key)
    Start_Date
    End_Date
    IsCompleted
    IsGoalsMet
    Program_type_ID
Programs_Type_Table (different types of programs, supports a dropdown list in the form)
    Program_type_ID (Primary Key)
    Program_name
    Program_description
Client_Program_Table
    Client_ID (primary key)
    Program_ID (primary key)
What is the best way to find out how many clients are in a specific program (program type)?
Would the following SQL statement be the best way, or even plausible?
SELECT Client_ID FROM Client_Program_Table
INNER JOIN Programs_Table
ON Client_Program_Table.Program_ID = Programs_Table.Program_ID
WHERE Programs_Table.Program_type_ID = "x"
where "x" is the Program_type_ID of the specific program we're interested in.
OR is the following a better way?
$result = mysql_query("SELECT Program_ID FROM Programs_Table
    WHERE Program_type_ID = 'x'");
$row = mysql_fetch_assoc($result);
$ProgramID = $row['Program_ID'];
$result = mysql_query("SELECT * FROM Client_Program_Table
    WHERE Program_ID = '$ProgramID'");
mysql_num_rows($result); // returns how many rows of clients we pulled.
Thank you in advance, please excuse my inexperience and any mistakes that I've made.

Here is how you can do it:
<?php
// always initialize a variable
$number_of_clients = 0;
// escape the string which will go in an SQL query
// to protect yourself from SQL injection
$program_type_id = mysql_real_escape_string('x');
// build a query, which will count how many clients
// belong to that program and put the value in the temporary column "num_clients"
$query = "SELECT COUNT(*) `num_clients` FROM `Client_Program_Table` `cpt`
    INNER JOIN `Programs_Table` `pt`
    ON `cpt`.`Program_ID` = `pt`.`Program_ID`
    AND `pt`.`Program_type_ID` = '$program_type_id'";
// execute the query
$result = mysql_query($query);
// check if the query executed correctly
// and returned at least one record
if(is_resource($result) && mysql_num_rows($result) > 0){
    // turn the query result into an associative array
    $row = mysql_fetch_assoc($result);
    // get the value of the temporary "num_clients" column
    // and typecast it to an integer so it is always safe to use later on
    $number_of_clients = (int) $row['num_clients'];
} else{
    // query did not return a record, so we have no clients on that program
    $number_of_clients = 0;
}
?>

If you want to know how many clients are involved in a program, you should use COUNT(*). MySQL (with MyISAM) and SQL Server have a fast way to retrieve the total number of rows. Running SELECT * and then calling mysql_num_rows wastes memory and computing time, because every row is transferred just to be counted. To me, this is the fastest, though not the "cleanest", way to write the query you want:
SELECT
COUNT(*)
FROM
Client_Program_Table
WHERE
Program_ID IN
(
SELECT
Program_ID
FROM
Programs_Table
WHERE
Program_type_ID = 'azerty'
)
Why is that?
Using JOIN makes queries more readable, but in my experience subqueries often prove to be computed faster.

This returns a count of the clients in a specific program type (x):
SELECT COUNT(cpt.Client_ID), cpt.Program_ID
FROM Client_Program_Table cpt
INNER JOIN Programs_Table pt ON cpt.Program_ID=pt.Program_ID
WHERE pt.Program_type_ID = "x"
GROUP BY cpt.Program_ID

Summing a field from all tables in a database

I have a MySQL database called "bookfeather." It contains 56 tables. Each table has the following structure:
id site votes_up votes_down
The value for "site" is a book title. The value for "votes_up" is an integer. Sometimes a unique value for "site" appears in more than one table.
For each unique value "site" in the entire database, I would like to sum "votes_up" from all 56 tables. Then I would like to print the top 25 values for "site" ranked by total "votes_up".
How can I do this in PHP?
Thanks in advance,
John

You can do something like this (warning: Extremely poor SQL ahead)
select site, sum(votes_up) votes_up
from (
select site, votes_up from table_1
UNION
select site, votes_up from table_2
UNION
...
UNION
select site, votes_up from table_56
) all_votes group by site order by sum(votes_up) desc limit 25
But, as Dav asked, does your data have to be like this? There are much more efficient ways of storing this kind of data.
Edit: You just mentioned in a comment that you expect there to be more than 56 tables in the future -- I would look into MySQL limits on how many tables you can UNION before going forward with this kind of SQL.

Here's a PHP code snippet that should get it done.
I have not tested it, so it might have some typos; make sure you replace DB_NAME.
$result = mysql_query("SHOW TABLES");
$tables = array();
while ($row = mysql_fetch_assoc($result)) {
    $tables[] = '`'.$row["Tables_in_DB_NAME"].'`';
}
$subQuery = "SELECT site, votes_up FROM ".implode(" UNION ALL SELECT site, votes_up FROM ",$tables);
// Create one query that gets the data you need
$sqlStr = "SELECT site, sum(votes_up) sumVotesUp
    FROM (
    ".$subQuery." ) subQuery
    GROUP BY site ORDER BY sum(votes_up) DESC LIMIT 25";
$result = mysql_query($sqlStr);
$arr = array();
while ($row = mysql_fetch_assoc($result)) {
    $arr[] = $row["site"]." - ".$row["sumVotesUp"];
}
print_r($arr);

The UNION part of Ian Clelland's answer can be generated using a statement like the following. The table INFORMATION_SCHEMA.COLUMNS has a TABLE_NAME column, which lists every table that contains the column.
select * from information_schema.columns
where table_schema not like 'informat%'
and column_name like 'VOTES_UP'
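Join all the inner SELECTs with UNION ALL instead of UNION. UNION performs an implicit DISTINCT (in MySQL as well as Oracle), so identical (site, votes_up) rows from different tables would be merged and the totals would come out too low.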

The basic idea would be to iterate over all your tables (using a SQL SHOW TABLES statement or similar) in PHP, then for every table, iterate over the rows (SELECT site,votes_up FROM $table). Then, for every row, check the site against an array that you're building with sites as keys and votes up as values. If the site is already in the array, increment its votes appropriately; otherwise, add it.
Vaguely PHP-like pseudocode:
// Build an empty array for use later
$votes_array = empty_array();
// Get all the tables and iterate over them
$tables = query("SHOW TABLES");
for($table in $tables) {
    $rows = query("SELECT site,votes_up FROM $table");
    // Iterate over the rows in each table
    for($row in $rows) {
        $site = $row['site'];
        $votes = $row['votes_up'];
        // If the site is already in the array, increment votes; otherwise, add it
        if(exists_in_array($site, $votes_array)) {
            $votes_array[$site] += $votes;
        } else {
            insert_into_array($site => $votes);
        }
    }
}
// Sort by votes, highest first, keeping each site attached to its total
sort_descending_by_value($votes_array);
// Get the sites and votes as lists, and print out the top 25
$sorted_sites = array_keys($votes_array);
$sorted_votes = array_values($votes_array);
for($i = 0; $i < 25; $i++) {
    print "Site " . $sorted_sites[$i] . " has " . $sorted_votes[$i] . " votes";
}
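
A runnable PHP sketch of the aggregation step described above. The function name `top_sites` and the shape of its input (an array mapping each table name to its already-fetched rows) are assumptions for illustration; in real code the rows would come from one query per table.

```php
<?php
// $tables maps a table name to its rows; each row has 'site' and 'votes_up'.
// Returns the $limit sites with the highest total votes as site => total.
function top_sites(array $tables, int $limit = 25): array
{
    $totals = [];
    foreach ($tables as $rows) {
        foreach ($rows as $row) {
            $site = $row['site'];
            // Start at 0 for a site seen for the first time, then add its votes.
            $totals[$site] = ($totals[$site] ?? 0) + (int) $row['votes_up'];
        }
    }
    // Sort by total, highest first, keeping each site attached to its total.
    arsort($totals);
    return array_slice($totals, 0, $limit, true);
}
```

Note that this pulls every row into PHP, so for large tables the UNION ALL query with GROUP BY is likely to be the cheaper option.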

"I allow users to add tables to the database." - I hope all your users are benevolent and trustworthy and capable. Do you worry about people dropping or truncating tables, creating incorrect new tables that break your code, or other things like that? What kind of security do you have when users can log right into your database and change the schema?
Here's a tutorial on relational database normalization. Maybe it'll help.
Just in case someone else that comes after you wants to find what this could have looked like, here's a single table that could do what you want:
create database bookfeather;
create user bookfeather identified by 'bookfeather';
grant all on bookfeather.* to 'bookfeather'@'%';
use bookfeather;
create table if not exists book
(
id int not null auto_increment,
title varchar(255) not null default '',
upvotes integer not null default 0,
downvotes integer not null default 0,
primary key(id),
unique(title)
);
You'd vote a title up or down with an UPDATE:
update book set upvotes = upvotes + 1 where id = ?
Adding a new book is as easy as adding another row:
insert into book(title) values('grails in action')
I'd strongly urge that you reconsider.

MySql variables and php

I am getting an error with this in php. What is the correct way to format this string to pass to mysql_query() in php?
SELECT count(*) FROM agents INTO @AgentCount;
SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount,
COUNT( * ) / ( @AgentCount) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50;
In php, here is how I set up the $query
$query = "
SELECT count(*) FROM agents INTO @AgentCount;
SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount,
COUNT( * ) / ( @AgentCount) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50";
That exact query will work fine if I put it directly into MySql via a command line session. Do I need to issue two separate php calls to mysql_query() and store the first result?
I am getting the below error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount' at line 3
The reason for not using a sub select and instead choosing a MySql variable is to avoid a count() happening on every percentage calculation. Though it may be possible the engine is optimizing for that. So far, I have not been able to confirm that. I have also heard sub selects are almost always non optimal.
EXPLAIN tells me this:
id  select_type  table   type   possible_keys  key                key_len  ref   rows   Extra
1   PRIMARY      agents  index  NULL           user_agent_parsed  28       NULL  82900  Using temporary; Using filesort
2   SUBQUERY     NULL    NULL   NULL           NULL               NULL     NULL  NULL   Select tables optimized away

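mysql_query() can only send one statement per call, so split the string into two queries. The user variable belongs to the connection, so the second query can still read it as long as both calls use the same connection.
$query1 = "SELECT count(*) FROM agents INTO @AgentCount";
$query2="
SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount,
COUNT( * ) / ( @AgentCount) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50";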
UPDATE
I have a DAL that contains all my queries. A typical function in my DAL looks like this:
// These functions are reusable
public function getAllRows($table)
{
    $sql = "SELECT * FROM $table";
    $this->query($sql);
    return $this->query_result;
}
Then in my BLL (Business Logic Layer) I have the following:
public function getUserAgents()
{
    $result = parent::getAllRows('agents');
    $row = mysql_fetch_array($result);
    return $row[0]; // First column of the first row
    // Then you take this value and do a second request. Then return the answer / rows.
}

If you are using mysql_query, then yes, you need to send each query separately. From the description at the top of mysql_query's entry in the PHP manual: "mysql_query() sends a unique query (multiple queries are not supported) to the currently active database..."
As for subqueries, you'd be surprised; the query optimizer generally handles them very well.
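
A small PHP sketch of the two-call approach, under the assumption that both statements go over the same connection (MySQL user variables are scoped to the session). The function name `top_user_agents` and the `$runQuery` callable are hypothetical; `$runQuery` stands in for whatever sends a single statement and returns its rows.

```php
<?php
// $runQuery is a hypothetical callable that sends ONE statement over a
// single shared connection and returns its rows (an empty array for
// statements that produce no result set).
function top_user_agents(callable $runQuery): array
{
    // Statement 1: store the total in a session-scoped user variable.
    $runQuery('SELECT COUNT(*) FROM agents INTO @AgentCount');

    // Statement 2: the variable is still set because the connection is the same.
    return $runQuery(
        'SELECT user_agent_parsed, user_agent_original,
                COUNT(user_agent_parsed) AS thecount,
                COUNT(*) / @AgentCount AS percentage
         FROM agents
         GROUP BY user_agent_parsed
         ORDER BY thecount DESC LIMIT 50'
    );
}
```

Neither string contains a semicolon-separated second statement, which is what triggered the syntax error in the question.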
