Using IN clause vs. multiple SELECTs - php

I was wondering which of these would be faster (performance-wise) to query (on MySQL 5.x CentOS 5.x if this matters):
SELECT * FROM table_name WHERE id=1;
SELECT * FROM table_name WHERE id=2;
.
.
.
SELECT * FROM table_name WHERE id=50;
or...
SELECT * FROM table_name WHERE id IN (1,2,...,50);
I have around 50 ids to query for. I know usually DB connections are expensive, but I've seen the IN clause isn't so fast either [sometimes].

I'm pretty sure the second option gives you the best performance; one query, one result. You have to start looking for > 100 items before it may become an issue.
See also the accepted answer from here: MySQL "IN" operator performance on (large?) number of values

IMHO you should try it and measure response time: IN should give you better performances...
Anyway if your ids are sequential you could try
SELECT * FROM table_name WHERE id BETWEEN 1 AND 50

Here is another post where the discuss the performance of using OR vs IN. IN vs OR in the SQL WHERE Clause
You suggested using multiple queries, but using OR would also work.

2nd will be faster because resources are consumed when query gets interpreted and during php communication with mysql for sending query and waiting for result , if your data is sequential you can also do just
SELECT * FROM table_name WHERE id <= 50;

I was researching this after experimenting with 3000+ values in an IN clause. It turned out to be multitudes faster than individual SELECTs since the column referenced in the IN was not keyed. My guess is that in my case it only needed to build a temporary index for that column once instead of 3000 separate times.

Related

PHP & MySQL web app - Selecting a single field (vs) select * from table

I am working on converting a prototype web application into something that can be deployed. There are some locations where the prototype has queries that select all the fields from a table although only one field is needed or the query is just being used for checking the existence of the record. Most of the cases are single row queries.
I'm considering changing these queries to queries that only get what is really relevant, i.e.:
select * from users_table where <some condition>
vs
select name from users_table where <some condition>
I have a few questions:
Is this a worthy optimization in general?
In which kind of queries might this change be particularly good? For example, would this improve queries where joins are involved?
Besides the SQL impact, would this change be good at the PHP level? For example, the returned array will be smaller (a single column vs multiple columns with data).
Thanks for your comments.
If I were to answer all of your three questions in a single word, I would definitely say YES.
You probably wanted more than just "Yes"...
SELECT * is "bad practice": If you read the results into a PHP non-associative array; then add a column; now the array subscripts are possibly changed.
If the WHERE is complex enough, or you have GROUP BY or ORDER BY, and the optimizer decides to build a tmp table, then * may lead to several inefficiencies: having to use MyISAM instead of MEMORY; the tmp table will be bulkier; etc.
EXISTS SELECT * FROM ... comes back with 0 or 1 -- even simpler.
You may be able to combine EXISTS (or a suitable equivalent JOIN) to other queries, thereby avoiding an extra roundtrip to the server.

Joining a count query mysql for performance

Have searched but can't find an answer which suits the exact needs for this mysql query.
I have the following quires on multiple tables to generate "stats" for an application:
SELECT COUNT(id) as count FROM `mod_**` WHERE `published`='1';
SELECT COUNT(id) as count FROM `mod_***` WHERE `published`='1';
SELECT COUNT(id) as count FROM `mod_****`;
SELECT COUNT(id) as count FROM `mod_*****`;
pretty simple just counts the rows sometimes based on a status.
however in the pursuit of performance i would love to get this into 1 query to save resources.
I'm using php to fetch this data with simple mysql_fetch_assoc and retrieving $res[count] if it makes a difference (pro isn't guaranteed, so plain old mysql here).
The overhead of sending a query and getting a single-row response is very small.
There is nothing to gain here by combining the queries.
If you don't have indexes yet an INDEX on the published column will greatly speed up the first two queries.
You can use something like
SELECT SUM(published=1)
for some of that. MySQL will take the boolean result of published=1 and translate it to an integer 0 or 1, which can be summed up.
But it looks like you're dealing with MULTIPLE tables (if that's what the **, *** etc... are), in which case you can't really. You could use a UNION query, e.g.:
SELECT ...
UNION ALL
SELECT ...
UNION ALL
SELECT ...
etc...
That can be fired off as one single query to the DB, but it'll still execute each sub-query as its own query, and simply aggregate the individual result sets into one larger set.
Disagreeing with #Halcyon I think there is an appreciable difference, especially if the MySQL server is on a different machine, as every single query uses at least one network packet.
I recommend you UNION the queries with a marker field to protect against the unexpected.
As #Halcyon said there is not much to gain here. You can anyway do several UNIONS to get all the result in one query

When should I consider saving the total count in a field?

For example, if I have to count the comments belonging to an article, it's obvious I don't need to cache the comments total.
But what if I want to paginate a gallery (WHERE status = 1) containing 1 million photos. Should I save that in a table called counts or SELECT count(id) as total every time is fine?
Are there other solutions?
Please advise. Thanks.
For MySQL, you don't need to store the counts, you can use SQL_CALC_FOUND_ROWS to avoid two queries.
E.g.,
SELECT SQL_CALC_FOUND_ROWS *
FROM Gallery
WHERE status = 1
LIMIT 10;
SELECT FOUND_ROWS();
From the manual:
In some cases, it is desirable to know how many rows the statement
would have returned without the LIMIT, but without running the
statement again. To obtain this row count, include a
SQL_CALC_FOUND_ROWS option in the SELECT statement, and then invoke
FOUND_ROWS() afterward.
Sample usage here.
It depends a bit on the amount of queries that are done on that table with 1 million records. Consider just taking care of good indexes, especially also multi-column indexes (because they are easily forgotton: here. That will do a lot. And, be sure the queries become cached also well on your server.
If you use this column very regular, consider saving it (if it can't be cached by MySQL), as things could become slow. But most of the times good indexing will take care of it.
Best try: setup some tests to find out if a query can still be fast and performance is not dropping when you execute it a lot of times in a row.
EXPLAIN [QUERY]
Use that command (in MySQL) to get information about the way the query is performed and if it can be improved.
Doing the count every time would be OK.
During paging, you can use SQL_CALC_FOUND_ROWS anyway
Note:
A denormalied count will become stale
No-one will page so many items

How to Requery a query?

consider "Query1", which is quite time consuming. "Query1" is not static, it depends on $language_id parameter, thats why I can not save it on the server.
I would like to query this "Query1" with another query statement. I expect, that this should be fast. I see perhaps 2 ways
$result = mysql_query('SELECT * FROM raw_data_tbl WHERE ((ID=$language_id) AND (age>13))');
then what? here I want to take result and requery it with something like:
$result2 = mysql_query('SELECT * FROM $result WHERE (Salary>1000)');
Is it possible to create something like "on variable based" MYSQL query directly on the server side and pass somehow variable $language_id to it? The second query would query that query :-)
Thanks...
No, there is no such thing as your second idea.
For the first idea, though, I would go with a single query :
select *
from raw_data
where id = $language_id
and age > 13
and Salary > 1000
Provided you have set the right indexes on your table, this query should be pretty fast.
Here, considering the where clause of that query, I would at least go with an index on these three columns :
id
age
Salary
This should speed things up quite a bit.
For more informations on indexes, and optimization of queries, take a look at :
Chapter 7. Optimization
7.3.1. How MySQL Uses Indexes
12.1.11. CREATE INDEX Syntax
With the use of sub queries you can take advantage of MySQL's caching facilities.
SELECT * FROM raw_data_tbl WHERE (ID='eng') AND (age>13);
... and after this:
SELECT * FROM (SELECT * FROM raw_data_tbl WHERE (ID='eng') AND (age>13)) WHERE salary > 1000;
But this is only beneficial in some very rare circumstances.
With the right indexes your query will run fast enough without the need of trickery. In your case:
CREATE INDEX filter1 ON raw_data_tbl (ID, age, salary);
Although the best solution would be to just add conditions from your second query to the first one, you can use temporary tables to store temporary results. But it would still be better if you put that in a single query.
You could also use subqueries, like SELECT * FROM (SELECT * FROM table WHERE ...) WHERE ....

Optimizing a PHP page: MySQL bottleneck

I have a page that is taking 37 seconds to load. While it is loading it pegs MySQL's CPU usage through the roof. I did not write the code for this page and it is rather convoluted so the reason for the bottleneck is not readily apparent to me.
I profiled it (using kcachegrind) and find that the bulk of the time on the page is spent doing MySQL queries (90% of the time is spent in 25 different mysql_query calls).
The queries take the form of the following with the tag_id changing on each of the 25 different calls:
SELECT * FROM tbl_news WHERE news_id
IN (select news_id from
tbl_tag_relations WHERE tag_id = 20)
Each query is taking around 0.8 seconds to complete with a few longer delays thrown in for good measure... thus the 37 seconds to completely load the page.
My question is, is it the way the query is formatted with that nested select that is causing the problem? Or could it be any one of a million other things? Any advice on how to approach tackling this slowness is appreciated.
Running EXPLAIN on the query gives me this (but I'm not clear on the impact of these results... the NULL on primary key looks like it would be bad, yes? The number of results returned seems high to me as well as only a handful of results are returned in the end):
1 PRIMARY tbl_news ALL NULL NULL NULL NULL 1318 Using where
2 DEPENDENT SUBQUERY tbl_tag_relations ref FK_tbl_tag_tags_1 FK_tbl_tag_tags_1 4 const 179 Using where
I'e addressed this point in Database Development Mistakes Made by AppDevelopers. Basically, favour joins to aggregation. IN isn't aggregation as such but the same principle applies. A good optimize will make these two queries equivalent in performance:
SELECT * FROM tbl_news WHERE news_id
IN (select news_id from
tbl_tag_relations WHERE tag_id = 20)
and
SELECT tn.*
FROM tbl_news tn
JOIN tbl_tag_relations ttr ON ttr.news_id = tn.news_id
WHERE ttr.tag_id = 20
as I believe Oracle and SQL Server both do but MySQL doesn't. The second version is basically instantaneous. With hundreds of thousands of rows I did a test on my machine and got the first version to sub-second performance by adding appropriate indexes. The join version with indexes is basically instantaneous but even without indexes performs OK.
By the way, the above syntax I use is the one you should prefer for doing joins. It's clearer than putting them in the WHERE clause (as others have suggested) and the above can do certain things in an ANSI SQL way with left outer joins that WHERE conditions can't.
So I would add indexes on the following:
tbl_news (news_id)
tbl_tag_relations (news_id)
tbl_tag_relations (tag_id)
and the query will execute almost instantaneously.
Lastly, don't use * to select all the columns you want. Name them explicitly. You'll get into less trouble as you add columns later.
The SQL Query itself is definitely your bottleneck. The query has a sub-query in it, which is the IN(...) portion of the code. This is essentially running two queries at once. You can likely halve (or more!) your SQL times with a JOIN (similar to what d03boy mentions above) or a more targeted SQL query. An example might be:
SELECT *
FROM tbl_news, tbl_tag_relations
WHERE tbl_tag_relations.tag_id = 20 AND
tbl_news.news_id = tbl_tag_relations.news_id
To help SQL run faster you also want to try to avoid using SELECT *, and only select the information you need; also put a limiting statement at the end. eg:
SELECT news_title, news_body
...
LIMIT 5;
You also will want to look into the database schema itself. Make sure you are indexing all of the commonly referred to columns so that the queries will run faster. In this case, you probably want to check your news_id and tag_id fields.
Finally, you will want to take a look at the PHP code and see if you can make one single all-encompassing SQL query instead of iterating through several seperate queries. If you post more code we can help with that, and it will probably be the single greatest time savings for your posted problem. :)
If I understand correctly, this is just listing the news stories for a specific set of tags.
First of all, you really shouldn't
ever SELECT *
Second, this can probably be
accomplished within a single query,
thus reducing the overhead cost of
multiple queries. It seems like it
is getting fairly trivial data so
it could be retrieved within a
single call instead of 20.
A better approach to using IN might be to use a JOIN with a WHERE condition instead. When using an IN it will basically be a lot of OR statements.
Your tbl_tag_relations should definitely have an index on tag_id
select *
from tbl_news, tbl_tag_relations
where
tbl_tag_relations.tag_id = 20 and
tbl_news.news_id = tbl_tag_relations.news_id
limit 20
I think this gives the same results, but I'm not 100% sure. Sometimes simply limiting the results helps.
Unfortunately MySQL doesn't do very well with uncorrelated subqueries like your case shows. The plan is basically saying that for every row on the outer query, the inner query will be performed. This will get out of hand quickly. Rewriting as a plain old join as others have mentioned will work around the problem but may then cause the undesired affect of duplicate rows.
For instance the original query would return 1 row for each qualifying row in the tbl_news table but this query:
SELECT news_id, name, blah
FROM tbl_news n
JOIN tbl_tag_relations r ON r.news_id = n.news_id
WHERE r.tag_id IN (20,21,22)
would return 1 row for each matching tag. You could stick DISTINCT on there which should only have a minimal performance impact depending on the size of the dataset.
Not to troll too badly, but most other databases (PostgreSQL, Firebird, Microsoft, Oracle, DB2, etc) would handle the original query as an efficient semi-join. Personally I find the subquery syntax to be much more readable and easier to write, especially for larger queries.

Categories