Joining a count query mysql for performance - php

Have searched but can't find an answer which suits the exact needs for this mysql query.
I have the following quires on multiple tables to generate "stats" for an application:
SELECT COUNT(id) as count FROM `mod_**` WHERE `published`='1';
SELECT COUNT(id) as count FROM `mod_***` WHERE `published`='1';
SELECT COUNT(id) as count FROM `mod_****`;
SELECT COUNT(id) as count FROM `mod_*****`;
pretty simple just counts the rows sometimes based on a status.
however in the pursuit of performance i would love to get this into 1 query to save resources.
I'm using php to fetch this data with simple mysql_fetch_assoc and retrieving $res[count] if it makes a difference (pro isn't guaranteed, so plain old mysql here).

The overhead of sending a query and getting a single-row response is very small.
There is nothing to gain here by combining the queries.
If you don't have indexes yet an INDEX on the published column will greatly speed up the first two queries.

You can use something like
SELECT SUM(published=1)
for some of that. MySQL will take the boolean result of published=1 and translate it to an integer 0 or 1, which can be summed up.
But it looks like you're dealing with MULTIPLE tables (if that's what the **, *** etc... are), in which case you can't really. You could use a UNION query, e.g.:
SELECT ...
UNION ALL
SELECT ...
UNION ALL
SELECT ...
etc...
That can be fired off as one single query to the DB, but it'll still execute each sub-query as its own query, and simply aggregate the individual result sets into one larger set.

Disagreeing with #Halcyon I think there is an appreciable difference, especially if the MySQL server is on a different machine, as every single query uses at least one network packet.
I recommend you UNION the queries with a marker field to protect against the unexpected.

As #Halcyon said there is not much to gain here. You can anyway do several UNIONS to get all the result in one query

Related

the maximum limit of items a 'In' clause can handle

what are the limits of data i can pass to a database in a programing language(like php).
suppose i have 1 million records in my database and I have 1 million data in my hand which i want to do a exist checking. if i used a query like
select id from table where id in (array of 1 million data)
what will happen? will this request even reach database?
if it reaches, what are the posibilities ,will it returns a data a better speed than a million querys to db searching id's or a full select data call with millions of for loops.
just for curiosity!.
There isn't a specific number, however, the documentation specifies you'll likely to have problems once you have "thousands" of values. IN (Transact-SQL) - Remarks:
Explicitly including an extremely large number of values (many
thousands of values separated by commas) within the parentheses, in an
IN clause can consume resources and return errors 8623 or 8632. To
work around this problem, store the items in the IN list in a table,
and use a SELECT subquery within an IN clause.
Error 8623:
The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for
extremely complex queries or queries that reference a very large
number of tables or partitions. Please simplify the query. If you
believe you have received this message in error, contact Customer
Support Services for more information.
Error 8632:
Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try
to simplify them.
To quote my comment I made:
If you need to pass a large number of values to a query, I suggest a Table-Type parameter. But if you really need to pass 1M+ values then it sounds like something is wrong with your design. You may even be better off listing the values you don't want.
Edit: To add to my comment, many (including myself) prefer to use EXISTS instead of IN. So instead of a query like:
FROM YourTable YT
WHERE YT.YourColumn IN (SELECT OT.YourColumn
FROM OtherTable OT)
You would have the query:
FROM YourTable YT
WHERE EXISTS (SELECT 1
FROM OtherTable OT
WHERE OT.YourColumn = YT.YourColumn)

When should I consider saving the total count in a field?

For example, if I have to count the comments belonging to an article, it's obvious I don't need to cache the comments total.
But what if I want to paginate a gallery (WHERE status = 1) containing 1 million photos. Should I save that in a table called counts or SELECT count(id) as total every time is fine?
Are there other solutions?
Please advise. Thanks.
For MySQL, you don't need to store the counts, you can use SQL_CALC_FOUND_ROWS to avoid two queries.
E.g.,
SELECT SQL_CALC_FOUND_ROWS *
FROM Gallery
WHERE status = 1
LIMIT 10;
SELECT FOUND_ROWS();
From the manual:
In some cases, it is desirable to know how many rows the statement
would have returned without the LIMIT, but without running the
statement again. To obtain this row count, include a
SQL_CALC_FOUND_ROWS option in the SELECT statement, and then invoke
FOUND_ROWS() afterward.
Sample usage here.
It depends a bit on the amount of queries that are done on that table with 1 million records. Consider just taking care of good indexes, especially also multi-column indexes (because they are easily forgotton: here. That will do a lot. And, be sure the queries become cached also well on your server.
If you use this column very regular, consider saving it (if it can't be cached by MySQL), as things could become slow. But most of the times good indexing will take care of it.
Best try: setup some tests to find out if a query can still be fast and performance is not dropping when you execute it a lot of times in a row.
EXPLAIN [QUERY]
Use that command (in MySQL) to get information about the way the query is performed and if it can be improved.
Doing the count every time would be OK.
During paging, you can use SQL_CALC_FOUND_ROWS anyway
Note:
A denormalied count will become stale
No-one will page so many items

Can I "cache" an embedded MySQL select used several times?

Working in Drupal 6, PHP 5.3, and MySQL, I'm building a query that looks roughly like this:
SELECT val from table [and some other tables joined in below]
where [a bunch of clauses, including getting all the tables joined up]
and ('foo' not in (select ...))
and (('bar' in (select...) and x = y)
or ('baz' in (select ...) and p = q))
That's not a great representation of what I'm trying to do, but hopefully it will be enough. The point is that, in the middle of the query there is an embedded SELECT that is used a number of times. It's always the same. It's not completely self-contained -- it relies on a value pulled from one of the tables at the top level of the query.
I'm feeling a little guilty/unclean for just repeating the query every time it's needed, but I don't see any other way to compute the value once and reuse it as needed. Since it refers to the value from a top level table, I can't compute it once outside the query and just insert the value into the query, either through a MySQL variable or by monkeying around with the query string. Or, so I think, anyway.
Is there anything I can do about this? Or, maybe it's a non-issue from a performance perspective: the code might be nasty, but parhaps MySQL is smart enough to cache the value itself and avoid executing the query over and over again? Any advice? Thanks!
You should be able to alias the result by doing SELECT ... AS alias, and then using in alias in the other queries, since the SELECT is really just a table.

php and MySQL: 2 requests or 1 request?

I'm building a wepage in php using MySQL as my database.
Which way is faster?
2 requests to MySQL with the folling query.
SELECT points FROM data;
SELECT sum(points) FROM data;
1 request to MySQL. Hold the result in a temporary array and calcuale the sum in php.
$data = SELECT points FROM data;
EDIT -- the data is about 200-500 rows
It's really going to depend on a lot of different factors. I would recommend trying both methods and seeing which one is faster.
Since Phill and Kibbee have answered this pretty effectively, I'd like to point out that premature optimization is a Bad Thing (TM). Write what's simplest for you and profile, profile, profile.
How much data are we talking about? I'd say MySQL is probably faster at doing those kind of operations in the majority of cases.
Edit: with the kind of data that you're talking about, it probably won't make masses of difference. But databases tend to be optimised for those kind of queries, whereas PHP isn't. I think the second DB query is probably worth it.
If you want to do it in one line, use a running total like this:
SET #total=0;
SELECT points, #total:=#total+points AS RunningTotal FROM data;
I wouldn't worry about it until I had an issue with performance.
If you go with two separate queries, you need to watch out for the possibility of the data changing between getting the rows & getting their sum. Until there's an observable performance problem, I'd stick to doing my own summation to keep the page consistent.
The general rule of thumb for efficiency with mySQL is to try to minimize the number of SQL requests. Every call to the database adds overhead and is "expensive" in terms of time required.
The optimization done by mySQL is quite good. It can take very complex requests with many joins, nestings and computations, and make it run efficiently.
But it can only optimize individual requests. It cannot check the relationship between two different SQL statements and optimize between them.
In your example 1, the two statements will make two requests to the database and the table will be scanned twice.
Your example 2 where you save the result and compute the sum yourself would be faster than 1. This would only be one database call, and looping through the data in PHP to get the sum is faster than a second call to the database.
Just for the fun of it.
SELECT COUNT(points) FROM `data`
UNION
SELECT points FROM `data`
The first row will be the total, the next rows will be the data.
NOTE: Union can be slow, but its an option.
Could also do more fun and this supports you sorting the rows.
SELECT 'total' AS name, COUNT(points) FROM `data`
UNION
SELECT 'points' AS name, points FROM `data`
Then selecting through PHP
while($row = mysql_fetch_assoc($query))
{
if($row["data"] == "points")
{
echo $row["points"];
}
if($row["data"] == "total")
{
echo "Total is: ".$row["points"];
}
}
You can use union like this:
(select points, null as total from data) union (select null, sum(points) from data group by points);
The result will look something like this:
point total
2 null
5 null
...
null 7
you can figure out how to handle it.
do it the mySQL way. let the database manager do its work.
mySQL is optimized for such tasks

Optimizing a PHP page: MySQL bottleneck

I have a page that is taking 37 seconds to load. While it is loading it pegs MySQL's CPU usage through the roof. I did not write the code for this page and it is rather convoluted so the reason for the bottleneck is not readily apparent to me.
I profiled it (using kcachegrind) and find that the bulk of the time on the page is spent doing MySQL queries (90% of the time is spent in 25 different mysql_query calls).
The queries take the form of the following with the tag_id changing on each of the 25 different calls:
SELECT * FROM tbl_news WHERE news_id
IN (select news_id from
tbl_tag_relations WHERE tag_id = 20)
Each query is taking around 0.8 seconds to complete with a few longer delays thrown in for good measure... thus the 37 seconds to completely load the page.
My question is, is it the way the query is formatted with that nested select that is causing the problem? Or could it be any one of a million other things? Any advice on how to approach tackling this slowness is appreciated.
Running EXPLAIN on the query gives me this (but I'm not clear on the impact of these results... the NULL on primary key looks like it would be bad, yes? The number of results returned seems high to me as well as only a handful of results are returned in the end):
1 PRIMARY tbl_news ALL NULL NULL NULL NULL 1318 Using where
2 DEPENDENT SUBQUERY tbl_tag_relations ref FK_tbl_tag_tags_1 FK_tbl_tag_tags_1 4 const 179 Using where
I'e addressed this point in Database Development Mistakes Made by AppDevelopers. Basically, favour joins to aggregation. IN isn't aggregation as such but the same principle applies. A good optimize will make these two queries equivalent in performance:
SELECT * FROM tbl_news WHERE news_id
IN (select news_id from
tbl_tag_relations WHERE tag_id = 20)
and
SELECT tn.*
FROM tbl_news tn
JOIN tbl_tag_relations ttr ON ttr.news_id = tn.news_id
WHERE ttr.tag_id = 20
as I believe Oracle and SQL Server both do but MySQL doesn't. The second version is basically instantaneous. With hundreds of thousands of rows I did a test on my machine and got the first version to sub-second performance by adding appropriate indexes. The join version with indexes is basically instantaneous but even without indexes performs OK.
By the way, the above syntax I use is the one you should prefer for doing joins. It's clearer than putting them in the WHERE clause (as others have suggested) and the above can do certain things in an ANSI SQL way with left outer joins that WHERE conditions can't.
So I would add indexes on the following:
tbl_news (news_id)
tbl_tag_relations (news_id)
tbl_tag_relations (tag_id)
and the query will execute almost instantaneously.
Lastly, don't use * to select all the columns you want. Name them explicitly. You'll get into less trouble as you add columns later.
The SQL Query itself is definitely your bottleneck. The query has a sub-query in it, which is the IN(...) portion of the code. This is essentially running two queries at once. You can likely halve (or more!) your SQL times with a JOIN (similar to what d03boy mentions above) or a more targeted SQL query. An example might be:
SELECT *
FROM tbl_news, tbl_tag_relations
WHERE tbl_tag_relations.tag_id = 20 AND
tbl_news.news_id = tbl_tag_relations.news_id
To help SQL run faster you also want to try to avoid using SELECT *, and only select the information you need; also put a limiting statement at the end. eg:
SELECT news_title, news_body
...
LIMIT 5;
You also will want to look into the database schema itself. Make sure you are indexing all of the commonly referred to columns so that the queries will run faster. In this case, you probably want to check your news_id and tag_id fields.
Finally, you will want to take a look at the PHP code and see if you can make one single all-encompassing SQL query instead of iterating through several seperate queries. If you post more code we can help with that, and it will probably be the single greatest time savings for your posted problem. :)
If I understand correctly, this is just listing the news stories for a specific set of tags.
First of all, you really shouldn't
ever SELECT *
Second, this can probably be
accomplished within a single query,
thus reducing the overhead cost of
multiple queries. It seems like it
is getting fairly trivial data so
it could be retrieved within a
single call instead of 20.
A better approach to using IN might be to use a JOIN with a WHERE condition instead. When using an IN it will basically be a lot of OR statements.
Your tbl_tag_relations should definitely have an index on tag_id
select *
from tbl_news, tbl_tag_relations
where
tbl_tag_relations.tag_id = 20 and
tbl_news.news_id = tbl_tag_relations.news_id
limit 20
I think this gives the same results, but I'm not 100% sure. Sometimes simply limiting the results helps.
Unfortunately MySQL doesn't do very well with uncorrelated subqueries like your case shows. The plan is basically saying that for every row on the outer query, the inner query will be performed. This will get out of hand quickly. Rewriting as a plain old join as others have mentioned will work around the problem but may then cause the undesired affect of duplicate rows.
For instance the original query would return 1 row for each qualifying row in the tbl_news table but this query:
SELECT news_id, name, blah
FROM tbl_news n
JOIN tbl_tag_relations r ON r.news_id = n.news_id
WHERE r.tag_id IN (20,21,22)
would return 1 row for each matching tag. You could stick DISTINCT on there which should only have a minimal performance impact depending on the size of the dataset.
Not to troll too badly, but most other databases (PostgreSQL, Firebird, Microsoft, Oracle, DB2, etc) would handle the original query as an efficient semi-join. Personally I find the subquery syntax to be much more readable and easier to write, especially for larger queries.

Categories