I have a query I need to run which is currently running extremely slow. The table I am working with has nearly 1 million records.
What I need to do is search for items with a LIKE name : Example "SELECT name, other, stuff FROM my_table WHERE name LIKE '%some_name%'"
In addition it has to be able to exclude certain terms with wildcards from the search results so it turns into something like this :
SELECT name, other, stuff
FROM my_table
WHERE name LIKE '%some_name%'
AND name NOT IN (SELECT name FROM my_table WHERE name LIKE '%term1%' AND name LIKE '%term2%' AND name LIKE '%term3%')
To top things off, I have two INNER JOINs and I have to execute it in PHP. I have indexes on the table for relevant columns. It is on a server that is fast for pretty much every other query I can throw at it.
The question is, is there a different method I could be using to do this type of query thats is quick?
UPDATES
This is my actual query after adding in FULLTEXT index to the chem table and trying the match function instead. NOTE : I don't have control over table/column names. I know they dont look nice.
CREATE FULLTEXT INDEX name_index ON chems (name)
SELECT f.name fname, f.state, f.city, f.id fid,
cl.alt_facilid_ida, cl.alt_facili_idb,
c.ave, c.name, c.id chem_id,
cc.first_name, cc.last_name, cc.id cid, cc.phone_work
FROM facilities f
INNER JOIN facilitlt_chemicals_c cl ON (f.id = cl.alt_facilid485ilities_ida)
INNER JOIN chems c ON (cl.alt_facili3998emicals_idb = c.id)
LEFT OUTER JOIN facilities_contacts_c con ON (f.id = con.alt_facili_ida)
LEFT OUTER JOIN contacts cc ON (con.alt_facili_idb = cc.id)
WHERE match(c.name) against ('+lead -gas' IN BOOLEAN MODE)
That is just an example of a simple query. The actual terms could be numerous.
You want to implement MySQL's full text capabilities on the columns you're searching. Read more about it here.
Moreover, you probably want to do a boolean search:
select
t1.name,
t1.stuff
from
my_table t1
where
match(t1.name) against ('+some_name -term1 -term2 -term3' in boolean mode)
I would suggest to use external Full-text engine like Lucene or Sphinx. Especially if you plan to fire search queries against relatively big datasets.
In Sphinx case you can use SELECT-like syntax called SphinxQL which will do exactly as required:
SELECT * FROM <sphinx_index_name> WHERE MATCH('some_name -term1 -term2 -term3');
as described in http://sphinxsearch.com/docs/current.html#extended-syntax (NOT operator in your case). Plus you could enable morphology support which might be helpful as well.
You can combine MySQL and Sphinx queries with http://sphinxsearch.com/docs/current.html#sphinxse
Related
I have two tables, one that looks like this:
ID, Datetime, User_ID, Location and Status // the rest is not relevant
And the other looks like this:
ID, Lastname // the rest is not relevant
Now I only want to get the entry of the first table with the highest Datetime per User_ID and ask the other table the lastname of the User_ID. Simple...
I tried it this way (whick looks like the most promising but is false nontheless):
SELECT w.Datetime, w.User_ID, w.Status, e.Lastname
FROM worktimes AS w
INNER JOIN employees AS e
ON w.User_ID=e.ID
RIGHT JOIN (SELECT max(Datetime) AS Datetime, User_ID
FROM worktimes
WHERE Datetime>1467583200 AND Location='16'
GROUP BY User_ID
ORDER BY Datetime DESC
) AS v
ON v.User_ID=w.User_ID
GROUP BY w.User_ID
ORDER BY e.Nachname;
Could someone give me a hint please? I'm really stuck at this for a while now and now i begin to get some knots in my brain... :(
You are very close, actually:
SELECT w.Datetime, w.User_ID, w.Status, e.Lastname
FROM worktimes w INNER JOIN
employees e
ON w.User_ID = e.ID LEFT JOIN
(SELECT max(Datetime) AS Datetime, User_ID
FROM worktimes
WHERE Datetime > 1467583200 AND Location = '16'
GROUP BY User_ID
) ww
ON ww.User_ID = w.User_ID AND w.DateTime = ww.DateTime
ORDER BY e.Nachname;
Notes:
You do need to join on the DateTime value.
The RIGHT JOIN is unnecessary. I replaced it with a LEFT JOIN, but I'm not sure that is what you want either. You might start with an INNER JOIN to see if that produces what you want.
Do not use ORDER BY in subqueries in most circumstances.
You do not need the GROUP BY in the outer query
What you are asking about is known as correlated subqueries. In standard SQL it can be implemented using APPLY and LATERAL constructs. Unfortunatelly, not all RDBMS support these elegant solutions. For example, MSSQL, recent versions of Oracle and Postgresql have these constructs, but MySQL does not. IMHO, it is a real pain for MySQL users, because in recent years MySQL started to lean towards standard, but in some strange manner - by default it switches off its non-standard hacks, but does not implement standard counterparts. For example, your own query presented in your question will not work by default in recent versions of MySQL, because sorting in subqueries is not supported any more and to make it work you have to use some nasty hack - add LIMIT some_really_big_number to the subquery.
I'm trying to write a simple interface for a list of companies using MySQL and PHP. So, I want to fetch some information from my database.
Here are my tables:
companies_data - only for system information.
corporate_data - here I want to keep information about big companies.
individual_data - and here I want to keep information about little companies.
So, here is the tables
And here is the query that I've written:
SELECT
a.id,
a.user_id,
a.added,
a.`status`,
a.company_id,
a.company_type,
a.deposit,
a.individual_operations_cache,
a.corporate_operations_cache,
a.physical_operations_cache,
b.full_name,
b.tax_number,
b.address,
b.statement_date,
b.psrn,
c.full_name,
c.tax_number,
c.address,
c.statement_date,
c.psrn
FROM
companies_data a
LEFT OUTER JOIN corporate_data b
ON (a.company_id = b.id) AND a.company_type = 0
LEFT OUTER JOIN individual_data c
ON (a.company_id = c.id) AND a.company_type = 1
WHERE
a.user_id = 3
This is just the code for a test, I'll expand it soon.
As you see, I've got result with extra fields like %field_name%1, %another_field_name%1 and so on. Of course it is not the mysql error - what I've asked that I've got - but I want to remove this fields? It's possible or I must convert this output on the application side?
thos %field_name%1, %another_field_name%1 , are visible since you are selecting them in your query:
b.full_name,
b.tax_number,
b.address,
b.statement_date,
b.psrn,
c.full_name,
c.tax_number,
c.address,
c.statement_date,
c.psrn
When you use fields with the same name in distinct tables, then the result column name come with this identifier field1, field2, fieldn... in order to distinguish from which table does the field come from.
If you want to avoid this names, you can use aliases as follows:
[...]
b.full_name as corporate_full_name,
[...]
Probably, if every common fields are coincident, you won´t need to show them all, so just remove them from the select.
Hope being usefull for you.
Br.
Can you let me know if my interpretation is correct (the last AND part)?
$q = "SELECT title,name,company,address1,address2
FROM registrations
WHERE title != 0 AND id IN (
SELECT registrar_id
FROM registrations_industry
WHERE industry_id = '$industryid'
)";
Below was really where I am not sure:
... AND id IN (select registrar_id from registrations_industry where industry_id='$industryid')
Interpretation: Get any match on id(registrations id field) equals registrar_id(field) from the join table registrations_industry where industry_id equals the set $industryid
Is this select statement considered a sub routine since it's a query within the main query?
So an example would be with the register table id search to 23 would look like:
registrations(table)
id=23,title=owner,name=mike,company=nono,address1=1234 s walker lane,address2
registrations_industry(table)
id=256, registrar_id=23, industry_id=400<br>
id=159, registrar_id=23, industry_id=284<br>
id=227, registrar_id=23, industry_id=357
I assume this would return 3 records with the same registration table data And of course varying registrations_industry returns.
For a given test data set your query will return one record. This one:
id=23,title=owner,name=mike,company=nono,address1=1234 s walker lane,address2
To get three records with the same registration table data and varying registrations_industry you need to use JOIN.
Something like this:
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations AS r
LEFT OUTER JOIN registrations_industry AS ri
ON ri.registrar_id=r.id
WHERE r.title!=0 AND ri.industry_id={$industry_id}
Sorry for the essay, I didn't realize it was as long as it is until looking at it now. And although you've checked an answer, I hope you read this gain some insight into why this solution is preferred and how it evolved out of your original query.
First things first
Your query
$q = "SELECT title,name,company,address1,address2
FROM registrations
WHERE title != 0 AND id IN (
SELECT registrar_id
FROM registrations_industry
WHERE industry_id = '$industryid'
)";
seems fine. The IN syntax is equivalent to a number of OR matches. For example
WHERE field_id IN (101,102,103,105)
is functionally equivalent to
WHERE (field_id = 101
OR field_id = 102
OR field_id = 103
OR field_id = 105)
You complicate it a bit by introducing a subquery, no problem. As long as your subquery returns one column (and yours does), passing it to IN will be fine.
In your case, you're comparing registrations.id to registrations_industry.registrar_id. (Note: This is just <table>.<field> syntax, nothing special, but helpful to disambiguate what tables your fields are in.)
This seems fine.
What happens
SQL would first run the subquery, generating a result set of registrar_ids where the industry_id was set as specified.
SQL would then run the outer query, replacing the subquery with its results and you would get rows from registrations where registrations.id matched one of the registrar_ids returned from the subquery.
Subqueries are helpful to debug your code, because you can pull out the subquery and run it separately, ensuring its output is as you expect.
Optimization
While subqueries are good for debugging, they're slow, at least slower than using optmized JOIN statements.
And in this case, you can convert your query to a single-level query (without subqueries) by using a JOIN.
First, you'd start with basically the exact same outer query:
SELECT title,name,company,address1,address2
FROM registrations
WHERE title != 0 AND ...
But you're also interested in data from the registrations_industry table, so you need to include that. Giving us
SELECT title,name,company,address1,address2
FROM registrations, registrations_industry
WHERE title != 0 AND ...
We need to fix the ... and now that we have the registrations_industry table we can:
SELECT title,name,company,address1,address2
FROM registrations, registrations_industry
WHERE title != 0
AND id = registrar_id
AND industry_id = '$industryid'
Now a problem might arise if both tables have an id column -- since just saying id is ambiguous. We can disambiguate this by using the <table>.<field> syntax. As in
SELECT registrations.title, registrations.name,
registrations.company, registrations.address1, registrations.address2
FROM registrations, registrations_industry
WHERE registrations.title != 0
AND registrations_industry.industry_id = '$industryid'
We didn't have to use this syntax for all the field references, but we chose to for clarity. The query now is unnecessarily complex because of all the table names. We can shorten them while still providing disambiguation and clarity. We do this by creating table aliases.
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r, registrations_industry ri
WHERE r.title != 0
AND ri.industry_id = '$industryid'
By placing r and ri after the two tables in the FROM clause, we're able to refer to them using these shortcuts. This cleans up the query but still gives us the ability to clearly specify which tables the fields are coming from.
Sidenote: We could be more explicit about the table aliases by including the optional AS e.g. FROM registrationsASr rather than just FROM registrations r, but I typically reserve AS for field aliases.
If you run the query now you will get what is called a "Cartesian product" or in SQL lingo, a CROSS JOIN. This is because we didn't define any relationship between the two tables when, in fact, there is one. To fix this we need to reintroduce part of the original query that was lost: the relationship between the two tables
r.id = ri.registrar_id
so that our query now looks like
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r, registrations_industry ri
WHERE r.title != 0
AND r.id = ri.registrar_id
AND ri.industry_id = '$industryid'
And this should work perfectly.
Nitpicking -- implicit vs. explicit joins
But the nitpicker in me needs to point out that this is called an "implicit join". Basically you're joining tables but not using the JOIN syntax.
A simpler example of an implicit join is
SELECT *
FROM foo f, bar b
WHERE f.id = b.foo_id
The corresponding explicit syntax is
SELECT *
FROM foo f
JOIN bar b ON f.id = b.foo_id
The result will be identical but it is using proper (and clearer) syntax. (Its clearer because it explicitly stats that there is a relationship between the foo and bar tables and it is defined by f.id = b.foo_id.)
We could similarly express your implicit query
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r, registrations_industry ri
WHERE r.title != 0
AND r.id = ri.registrar_id
AND ri.industry_id = '$industryid'
explicitly as follows
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r
JOIN registrations_industry ri ON r.id = ri.registrar_id
WHERE r.title != 0
AND ri.industry_id = '$industryid'
As you can see, the relationship between the tables is now in the JOIN clause, so that the WHERE and subsequent AND and OR clauses are free to express any restrictions. Another way to look at this is if you took out the WHERE + AND/OR clauses, the relationship between tables would still hold and the results would still "make sense" whereas if you used the implicit method and removed the WHERE + AND/OR clauses, your result set would contain rows that were misleading.
Lastly, the JOIN syntax by itself will cause rows that are in registrations, but do not have any corresponding rows in registrations_industry to not be returned.
Depending on your use case, you may want rows from registrations to appear in the results even if there are no corresponding entries in registrations_industry. To do this you would use what's called an OUTER JOIN. In this case, we want what is called a LEFT OUTER JOIN because we want all of the rows of the table on the left (registrations). We could have alternatively used RIGHT OUTER JOIN for the right table or simply OUTER JOIN for the outer join of both tables.
Therefore our query becomes
SELECT r.title, r.name, r.company, r.address1, r.address2
FROM registrations r
LEFT OUTER JOIN registrations_industry ri ON r.id = ri.registrar_id
WHERE r.title != 0
AND ri.industry_id = '$industryid'
And we're done.
The end result is we have a query that is
faster in terms of runtime
more compact / concise
more explicit about what tables the fields are coming from
more explicit about the relationship between the tables
A simpler version of this query would be:
SELECT title, name, company, address1, address2
FROM registrations, registrations_industry
WHERE title != 0
AND id = registrar_id
AND industry_id = '$industryid'
Your version was a subquery, this version is a simple join. Your assumptions about your query are generally correct, but it is harder for SQL to optimize and a little harder to unravel to anyone trying to read the code. Also, you won't be able to extract the data from the registrations_industry table in that parent SELECT statement because it's not technically joining and the subtable is not a part of the parent query.
I hate to submit a new question, but everyone else has some slight thing that is different enough to make this one seem necessary to ask.
Users are to type in a vendor name, and then see all the "kinds" of things they have bought from that company, in a list, sorted by the lowest-inventory-on-hand.
Summary:
I have three tables.
There are more fields than these, but these are the relevant ones (as far as I can tell).
stuff_table
stuff_vendor_name *(search this field with $user_input, but only one result per lookup_type)*
lookup_type
lookup_table
lookup_type
lookup_quantity (order by this)
category_type
category_table
category_type
category_location (check if this field == $this_location, which is already assigned)
Wordier Explanation:
The users are searching for a value that is contained only in the stuff_table -- distinct stuff_vendor_name values for each lookup_type. Each item can be bought from multiple sources, the idea is to see if any vendor has ever sold even one of any type of item before.
But the results need to be ORDER BY the lookup_quantity, in the lookup_table.
And importantly, I have to check to see if they are searching the correct location for these categories, located in the category_table in the category_location field.
How do I efficiently make this query?
Above, I mentioned the variables that I have:
$user_input (the value we are searching for distinct matches in the stuff_vendor_name field) and $current_location.
To understand the relationship of these tables, I will use an example.
The stuff_table would have dozens of entries with dozens of vendors, but have a lookup_type of, say, "watermelon," "apple," or "cherry."
The lookup_table would give the category_type of "Jellybean." One category type can have multiple lookup_types. But each lookup_type has exactly one category_type.
You are not sharing much about the relationships, but try this:
SELECT *
FROM stuff_table st
LEFT JOIN lookup_table lt
ON st.lookup_type = lt.lookup_type
LEFT JOIN category_table ct
ON lt.category_type = ct.category_type
AND ct.category_location = $this_location
GROUP BY st.lookup_type
ORDER BY lt.lookup_quantity
WHERE st.stuff_vendor_name = $user_input
From a first glance at it you could use foreign keys in your tables to make link between them or using the LEFT JOIN mysql command to make abstraction of another linked table.
The only example I can think of is on a Doctrine pattern, but I think you'll get what I'm saying:
$q = Doctrine_Query::create()
->from('Default_Model_DbTable_StuffTable s')
->leftJoin('s.LookupTable l')
->leftJoin('s.CategoryTable c')
->orderBy('l.lookup_quantity DESC');
$stuff= $q->execute(array(), Doctrine_Core::HYDRATE_ARRAY);
I made a nested query instead.
The final code looks like this:
$query_row=mysql_query(
"SELECT DISTINCT * FROM table_a WHERE
field_1 IN (SELECT field_1 FROM table_b WHERE field_2 = $field_2)
AND field_3 IN (SELECT field_3 FROM table_c WHERE field_4 = $field_4)
ORDER BY field_5 DESC
");
This was incredibly simple. I just didn't know you could do a nested query like that.
I read it was "bad form" because it makes some kind of search optimization not as good as it could be, so be careful using nested select statements.
However for me, it seemed to actually be significantly faster.
I'm building a searchfunctionallity where a user can select the different libraries to search in with the provided keywords. Also the AND operator is supported. My problem is that when I try to search for information it gets somehow multiplied or it gets increased exponential??
I'm using a MySQL database and PHP to process the results.
A sample query looks like this, sample tablenames taken for readability:
SELECT table1.id AS table1_id,
table2.id AS table2_id,
table1.*,
table2.*
FROM table1, table2
WHERE (table1.column1 LIKE '%key%' OR table1.column2 LIKE '%key%') OR
(table2.column1 LIKE '%key%' OR table2.column2 LIKE '%key%')
Or when a user provides keywords like 'key AND yek' the queries will look like:
SELECT table1.id AS table1_id,
table2.id AS table2_id,
table1.*,
table2.*
FROM table1, table2
WHERE ( (table1.column1 LIKE '%key%' AND table1.column2 LIKE '%key%') OR
(table1.colum1 LIKE '%yek%' AND table1.colum2 LIKE '%yek%') )
OR
(( table2.column1 LIKE '%key%' AND table2.column2 LIKE '%key%') OR
(table2.colum1 LIKE '%yek%' AND table2.colum2 LIKE '%yek%') )
I've searched arround and came accross UNION, but that won't do the trick. Firstly because of the tables I'm querying are not equal and it's a pretty expensive query. So also because the searchfunctionallity will be used often it's not an option.
In fact I'm querying 5 tables and per table between 2 or 4 columns. I set up a testrow in one of the tables with a unique value so I should get 1 row as result but in fact I'm getting a lot more. When I select all 5 tables I'm getting over 10K resultrows. So my guess is the result gets multiplied some way or another. I've played with the operators AND and OR but so far without any luck.
Is there someone who can help me further in my 'quest' to build a userfriendly searchengine in a webapp?
you should take a look at a real full text search engine, there is many around:
lucene
sphinx
Mysql: MySQL has a full text search engine but it's kind of slow so I don't advise you to use it apart if you want a quick solution
couchDB: that's not really a full text search engine but in your case it might helps but you might be able to create a DB from your current MySQL database and use that only for search purpose.
depending how your table are done, you might want to create a table with all the information you need.
A join will solve the problem of your dataset appearing to 'multiply'.
http://en.wikipedia.org/wiki/Join_%28SQL%29
This requires a field from Table1 to also be present in Table2 and used as a foreign key.
For eg, Table1 and Table2 both have ProductID:
Select * From Table1 Left Outer Join Table2 on Table1.ProductID = Table2.ProductID
Some suggestions:
I would suggest using (INNER?) JOIN instead of SELECT FROM table1, table2. That will narrow the number of tables. This should limit the output considerably.
Look into the DISTINCT keyword (or parallels like GROUP BY), it should further limit the output
DON'T rule out UNION. I don't know about MySQL, but in Oracle's SQL it is often two to three times faster than using an "OR". If you have
-- Admittedly, this query is a bit defective, but I'm trying to demonstrate.
SELECT
table1.data
FROM
table1, table2
WHERE
table1.data = 7
UNION
SELECT
table1.data
FROM
table1, table2
WHERE
table1.data = 8;
that will be far faster than using:
SELECT
table1.data, table2.*
FROM
table1, table2
WHERE
table1.data = 7 OR table1.data = 8;
Overall, I think that you'll see better results using something like this (this is Oracle syntax):
SELECT
ti.id,
t1.data,
t2.*
FROM
table1 t1
JOIN
table2 t2
ON(
t1.id = t2.id
)
WHERE
table1.data LIKE '%foo%'
UNION
SELECT
ti.id,
t1.data,
t2.*
FROM
table1 t1
JOIN
table2 t2
ON(
t1.id = t2.id
)
WHERE
table1.data LIKE '%bar%';
------- EDIT -------
It has been protested that "JOIN will not work as the tables are completely different". So long as there exists some key which exists that can relate one of the tables to another, JOIN will be able to create (effectively) an "uber-table" which has access to all of the data of both tables. If this index does not exist, then you'll probably need some system to concatenate the information outside of the SQL query.
I found the fulltext solutions to 'heavy' for my situation. I've created a query-array where queries per table are stored and later on I've used them for displaying the data I wanted. Everybody thanks for their effort!
I think what you want is to use UNION, which can combine the result of different SELECT queries.
(SELECT id FROM table1 WHERE (table1.column1 LIKE '%key%' OR table1.column2 LIKE '%key%'))
UNION
(SELECT id FROM table2 WHERE (table2.column1 LIKE '%key%' OR table2.column2 LIKE '%key%'))