This is a really broad question, but I have come across it a couple of times in the last few weeks and I was wondering what the general consensus is regarding good practice and efficiency.
1)
SELECT COUNT(*) FROM table WHERE id='$id' AND name='$name' AND owner='$owner_id'
and then treating the record as a match if the count is 1.
2)
SELECT * FROM table WHERE id='$id'
and then a series of if statements to check that the results match.
Now obviously there are advantages to the second solution, as it allows for accurate error reports as to which field does not match. But if that is not required, which is more efficient and considered better practice, and is there a difference in the load on the MySQL server between the two?
Option 1 by a long shot. Let SQL do what it is designed to do best, and better than procedural code. That is, filtering and sorting data.
Also, it is a much more efficient use of resources (bandwidth, DB utilization, etc) to pull down only the data you need from the server.
Use 1). MySQL is very efficient at selecting data based on certain conditions.
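To make the two options concrete, here is a minimal sketch of both. It is not the asker's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the items table, its columns and both function names are made up for illustration. It also binds the values as parameters instead of interpolating them into the query string.

```python
import sqlite3

# SQLite stands in for MySQL; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, owner INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 'widget', 42)")


def matches_in_sql(conn, item_id, name, owner_id):
    """Option 1: the database does the filtering and returns a single integer."""
    row = conn.execute(
        "SELECT COUNT(*) FROM items WHERE id = ? AND name = ? AND owner = ?",
        (item_id, name, owner_id),
    ).fetchone()
    return row[0] == 1


def matches_in_code(conn, item_id, name, owner_id):
    """Option 2: fetch the whole row by id, then compare field by field."""
    row = conn.execute("SELECT * FROM items WHERE id = ?", (item_id,)).fetchone()
    if row is None:
        return False
    _, row_name, row_owner = row
    return row_name == name and row_owner == owner_id


print(matches_in_sql(conn, 1, "widget", 42))
print(matches_in_code(conn, 1, "widget", 7))
```

Both functions answer the same yes/no question; the second one just moves more data to get there, which only pays off if you need to report which field failed.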
A large query can take anywhere from 0.1 to 5.1 seconds or more, so you need to run it and measure it. Usually multiple if statements are way better, as PHP is very fast. I did that with a query that had 5 joins on a table with 5 billion products: I removed one join and used an if statement in PHP to make up for it. The query was taking 4.2 seconds; with one join fewer it took 3.8s, and as you know PHP is way faster.
I can think of a couple ways to count the number of rows in a table with Laravel (version 3).
DB::table('threads')->count();
Threads::count();
Threads::max('id');
DB::table('threads')->max('id');
DB::query('SELECT COUNT(*) FROM threads;');
Are any of these notably faster than the others? Is there any one fastest way to run this query? Later on it's going to be part of an expression: ceil(DB::table('threads')->count() / $threads_per_page); and it's executed on every page load, so it should be optimized.
Database/table is MySQL and the InnoDB engine.
MAX(ID) is not the same as counting rows, so that rules out two of five alternatives.
And then it is your task to actually do a performance comparison between the remaining three methods to get the count. I'd think that actually executing an SQL statement directly might remove plenty of unnecessary ORM-layer overhead and be actually faster, but this would be premature optimization unless proven by facts.
DB::table('threads')->count();
Threads::count();
DB::query('SELECT COUNT(*) FROM threads;');
I was looking for the same thing.
These 3 all run exactly the same query. I tested it (you can check this with Laravel Debugbar).
Laravel performs "SELECT COUNT(*) as aggregate FROM threads";
It's already optimised with Eloquent, but if you do ->get()->count() it's not optimised, since that loads every row and counts them in PHP!
No performance difference with Threads::count();
Max('id') is totally different, as it outputs the max id; it will never count the number of rows.
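A quick way to see why MAX(id) is not a row count: delete a couple of rows and compare. This is only a sketch using Python's sqlite3 module in place of MySQL/InnoDB; the threads table is reduced to two illustrative columns.

```python
import sqlite3

# SQLite stands in for MySQL; only the columns needed for the demo exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO threads (id, title) VALUES (?, ?)",
    [(i, f"thread {i}") for i in range(1, 6)],
)

# Deleting rows leaves gaps in the id sequence.
conn.execute("DELETE FROM threads WHERE id IN (2, 3)")

row_count = conn.execute("SELECT COUNT(*) FROM threads").fetchone()[0]
max_id = conn.execute("SELECT MAX(id) FROM threads").fetchone()[0]

print(row_count, max_id)  # 3 rows remain, but the highest id is still 5
```

Using max_id in the page-count expression would produce empty trailing pages as soon as any thread is deleted.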
I don't think it really matters that much. Just be consistent in your code.
Anyway, there is no need to run that query on every page load. Use some caching to cache that number.
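As a rough illustration of the caching idea, here is a small time-based cache around whatever function fetches the count. Everything in it (the CachedCount class, the 60-second TTL, the fake_count stand-in for the real COUNT(*) query) is an assumption made up for this sketch; in a Laravel app you would more likely reach for the framework's own cache.

```python
import math
import time


class CachedCount:
    """Remembers the result of an expensive count for ttl_seconds."""

    def __init__(self, fetch_count, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch_count = fetch_count
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = 0.0

    def get(self):
        now = self._clock()
        # Refresh on first use or once the cached value is older than the TTL.
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch_count()
            self._fetched_at = now
        return self._value


def page_count(total_rows, per_page):
    """Same arithmetic as ceil(count / $threads_per_page) in the question."""
    return math.ceil(total_rows / per_page)


# Demo with a fake counter and a fake clock so the behaviour is deterministic.
calls = []

def fake_count():
    calls.append(1)  # record each simulated database hit
    return 95

fake_now = [0.0]
cache = CachedCount(fake_count, ttl_seconds=60.0, clock=lambda: fake_now[0])

pages_first = page_count(cache.get(), 10)   # hits the "database"
pages_second = page_count(cache.get(), 10)  # served from the cache
fake_now[0] = 61.0
cache.get()                                 # TTL expired, so it is fetched again

print(pages_first, pages_second, len(calls))
```

The trade-off is that the page count can be up to one TTL out of date, which is usually acceptable for pagination.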
Are there any advantages to having nested queries instead of separating them?
I'm using PHP to frequently query from MySQL and would like to separate them for better organization. For example:
Is:
$query = "SELECT words.unique_attribute
FROM words
LEFT JOIN adjectives ON adjectives.word_id = words.id
WHERE adjectives = 'confused'";
return $con->query($query);
Faster/Better than saying:
$query = "SELECT word_id
FROM adjectives
WHERE adjectives = 'confused';";
$id = getID($con->query($query));
$query = "SELECT unique_attribute
FROM words
WHERE id = $id;";
return $con->query($query);
The second option would give me a way to make a select function, where I wouldn't have to repeat so much query string code, but if making so many additional calls (these can get very deeply nested) would be very bad for performance, I might keep the first one. Or at least look out for it.
Like most questions containing 'faster' or 'better', it's a trade-off and it depends on which part you want to speed up and what your definition of 'better' is.
Compared with the two separate queries, the combined query has the advantages of:
speed: you only need to send one query to the database system, the database only needs to parse one query string, only needs to compose one query plan, only needs to push one result back up and through the connection to PHP. The difference (when not executing these queries thousands of times) is very minimal, however.
atomicity: the query in two parts may deliver a different result from the combined query if the words table changes between the first and second query (although in this specific example this is probably not a constantly-changing table...)
At the same time the combined query also has the disadvantage of (as you already imply):
re-usability: the split queries might come in handy when you can re-use the first one and replace the second one with something that selects a different column from the words table or something from another table entirely. This disadvantage can be mitigated by using something like a query builder (not to be confused with an ORM!) to dynamically compose your queries, adding where clauses and joins as needed. For an example of a query builder, check out Zend\Db\Sql.
locking: depending on the storage engine and storage engine version you are using, tables might get locked. Most select statements do not lock tables however, and the InnoDB engine definitely doesn't. Nevertheless, if you are working with an old version of MySQL on the MyISAM storage engine and your tables are under heavy load, this may be a factor. Note that even if the combined statement locks the table, the combined query will offer faster average completion time because it is faster in total while the split queries will offer faster initial response (to the first query) while still needing a higher total time (due to the extra round trips et cetera).
It would depend on the size of those tables and where you want to place the load. If those tables are large and seeing a lot of activity, then the second version with two separate queries would minimise the lock time you might see as a result of the join. However if you've got a beefy db server with fast SSD storage, you'd be best off avoiding the overhead of dipping into the database twice.
All things being equal I'd probably go with the former - it's a database problem so it should be resolved there. I imagine those tables wouldn't be written to particularly often so I'd ensure there's plenty of MySQL cache available and keep an eye on the slow query log.
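For comparison, here are the combined and the split approach side by side. This is a sketch under assumptions, not the asker's code: SQLite (via Python's sqlite3) stands in for MySQL, the sample rows are invented, and the function names are hypothetical. The table and column names follow the question, including the adjectives column inside the adjectives table.

```python
import sqlite3

# SQLite stands in for MySQL; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, unique_attribute TEXT);
    CREATE TABLE adjectives (word_id INTEGER, adjectives TEXT);
    INSERT INTO words VALUES (1, 'attr-one'), (2, 'attr-two');
    INSERT INTO adjectives VALUES (1, 'confused'), (2, 'calm');
""")


def combined_query(conn, adjective):
    """One round trip: the database joins and filters."""
    rows = conn.execute(
        "SELECT words.unique_attribute "
        "FROM words "
        "JOIN adjectives ON adjectives.word_id = words.id "
        "WHERE adjectives.adjectives = ?",
        (adjective,),
    ).fetchall()
    return [row[0] for row in rows]


def split_queries(conn, adjective):
    """One round trip for the ids, then one more per id found."""
    id_rows = conn.execute(
        "SELECT word_id FROM adjectives WHERE adjectives.adjectives = ?",
        (adjective,),
    ).fetchall()
    results = []
    for (word_id,) in id_rows:
        rows = conn.execute(
            "SELECT unique_attribute FROM words WHERE id = ?", (word_id,)
        ).fetchall()
        results.extend(row[0] for row in rows)
    return results


print(combined_query(conn, "confused"))
print(split_queries(conn, "confused"))
```

Both return the same rows here. The difference is in the number of round trips, which grows with the number of matching ids in the split version, and in the window between the two queries during which the data can change.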
Excuse me if this has been asked before, but I tried looking for something similar but couldn't find anything.
I have three tables: users, hobbies and user_hobbies (linking the first two). I want to calculate the similarity between two users based on their hobbies. For this, I need, first of all, two sets: user A's hobbies and user B's hobbies, which I can acquire with two simple queries. I have to calculate these two sets for other reasons too, in a PHP file, so they are available to me, in two arrays, for the next step:
I have to calculate their common hobbies (i.e. the intersection of the sets).
Idea #1: Having two arrays, I can calculate through some method the common elements.
Idea #2: I can make a third query (e.g. SELECT hobby FROM user_hobbies WHERE user_id IN ('uid_A', 'uid_B') GROUP BY hobby HAVING COUNT(*) = 2) and not bother myself.
I suppose my question is about performance. Is it quicker to calculate manually or are MySQL queries much faster?
You already have a normalized table to hold the user-hobbies table, so why not go with that?
Generally speaking, SQL will be much faster, at least for the first 100k records or so. Then you'll see a performance drop on queries that vet through columns that aren't indexed, or from queries that use the 'filesort' to order large datasets brought on by the ORDER BY keyword.
For scalability, I recommend using an inner join to narrow down the possibilities for starters.
Think critically about this. Are there any other columns, not mentioned here, that could indicate that the user could have more than one hobby? These are the things you consider when looking to scale your application.
Otherwise, you should be fine for starters; anything more would be optimizing prematurely.
I would go with Option #2.
In short: if your operation is NOT a set-based operation, it is better to shift it out of MySQL or any RDBMS, because you cannot scale MySQL easily.
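Here are both ideas from the question in runnable form, as a sketch only: SQLite (Python's sqlite3) replaces MySQL, the user ids and hobbies are invented, and the function names are hypothetical. Note that the HAVING COUNT(*) = 2 trick is only correct if a (user_id, hobby) pair cannot appear twice, which the primary key below enforces.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_hobbies (
        user_id TEXT,
        hobby   TEXT,
        PRIMARY KEY (user_id, hobby)  -- no duplicate pairs, so COUNT(*) = 2 means "both users"
    );
    INSERT INTO user_hobbies VALUES
        ('uid_A', 'chess'), ('uid_A', 'hiking'), ('uid_A', 'cooking'),
        ('uid_B', 'hiking'), ('uid_B', 'chess'), ('uid_B', 'sailing');
""")


def common_hobbies_sql(conn, user_a, user_b):
    """Idea #2: let the database compute the intersection."""
    rows = conn.execute(
        "SELECT hobby FROM user_hobbies "
        "WHERE user_id IN (?, ?) "
        "GROUP BY hobby HAVING COUNT(*) = 2 "
        "ORDER BY hobby",
        (user_a, user_b),
    ).fetchall()
    return [row[0] for row in rows]


def common_hobbies_in_code(hobbies_a, hobbies_b):
    """Idea #1: intersect two lists that are already in memory."""
    return sorted(set(hobbies_a) & set(hobbies_b))


in_sql = common_hobbies_sql(conn, "uid_A", "uid_B")
in_code = common_hobbies_in_code(
    ["chess", "hiking", "cooking"], ["hiking", "chess", "sailing"]
)
print(in_sql, in_code)
```

If the two arrays are already loaded for other reasons, as the question says, the in-memory intersection saves a third round trip; if they are not, the single SQL query avoids pulling both lists out of the database.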
I have a search engine on a shared host that uses MySQL. This search engine potentially has millions/trillions etc of records.
Each time a search is performed I return a count of the records that can then be used for pagination purposes.
The count tells you how many results there are in regard to the search performed. MySQL count is I believe considered quite slow.
Order of search queries:
Search executed and results returned
Count query executed
I don't perform a PHP count as this will be far slower in larger data sets.
Question is, do I need to worry about MySQL "count" and at what stage should I worry about it. How do the big search engines perform this task?
In almost all cases the answer is indexing. The larger your database gets the more important it is to have a well designed and optimized indexing strategy.
The importance of indexing on a large database can not be overstated.
You are absolutely right about not looping in code to count DB records. Your RDBMS is optimized for operations like that; your programming language is not. Wherever possible you want to do any sorting, grouping, counting and filtering operations within the SQL language provided by your RDBMS.
As for efficiently getting the count on a "paginated" query that uses a LIMIT clause, check out SQL_CALC_FOUND_ROWS.
SQL_CALC_FOUND_ROWS tells MySQL to calculate how many rows there would be in the result set, disregarding any LIMIT clause. The number of rows can then be retrieved with SELECT FOUND_ROWS(). See Section 11.13, “Information Functions”.
If a MySQL database reaches several million records, that's a sign you'll be forced to stop using a monolithic data store, meaning you'll have to split reads and writes and most likely use a different storage engine than the default one.
Once that happens, you'll stop using the actual count of the rows and you'll start using the estimate, cache the search results and so on in order to alleviate the work on the database. Even Google uses caching and displays an estimate of number of records.
Anyway, for now, you've got 2 options:
1 - Run 2 queries, one to retrieve the data and the other one where you use COUNT() to get the number of rows.
2 - Use SQL_CALC_FOUND_ROWS like @JohnFX suggested.
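Option 1 could look roughly like the sketch below. It uses Python's sqlite3 as a stand-in for MySQL, and the documents table, the LIKE-based search and the search_page function are all assumptions for illustration. Option 2 is not shown because SQL_CALC_FOUND_ROWS is MySQL-specific; it is also worth knowing that MySQL has deprecated it as of 8.0.17, so the two-query form is the one to build on.

```python
import math
import sqlite3

# SQLite stands in for MySQL; the table and its contents are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents (id, title) VALUES (?, ?)",
    [(i, f"note {i} sql" if i % 2 == 0 else f"note {i}") for i in range(1, 26)],
)


def search_page(conn, term, page, per_page):
    """Run the COUNT query and the paginated query with the same WHERE clause."""
    pattern = f"%{term}%"
    total = conn.execute(
        "SELECT COUNT(*) FROM documents WHERE title LIKE ?", (pattern,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT id, title FROM documents WHERE title LIKE ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (pattern, per_page, (page - 1) * per_page),
    ).fetchall()
    return rows, total


rows, total = search_page(conn, "sql", page=2, per_page=5)
total_pages = math.ceil(total / 5)
print([row[0] for row in rows], total, total_pages)
```

Keeping the WHERE clause identical in both queries is the part that is easy to get wrong once the search conditions are built dynamically.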
Percona has an article about what's faster, tho it might be outdated now.
The biggest problem you're facing is the way MySQL uses LIMIT OFFSET, which means you probably won't like your users using large offset numbers.
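One common way around large offsets, which the answer above does not spell out, is to page by "last id seen" instead of by offset (often called keyset or seek pagination). The sketch below assumes results are ordered by a unique id; it uses Python's sqlite3 in place of MySQL, and the records table and both function names are made up.

```python
import sqlite3

# SQLite stands in for MySQL; the table and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO records (id, title) VALUES (?, ?)",
    [(i, f"record {i}") for i in range(1, 101)],
)


def page_by_offset(conn, page, per_page):
    """Classic LIMIT/OFFSET: the server still has to walk past the skipped rows."""
    return conn.execute(
        "SELECT id, title FROM records ORDER BY id LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page),
    ).fetchall()


def page_after(conn, last_seen_id, per_page):
    """Keyset variant: seek straight to the first id after the previous page."""
    return conn.execute(
        "SELECT id, title FROM records WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, per_page),
    ).fetchall()


offset_page = page_by_offset(conn, page=10, per_page=10)
keyset_page = page_after(conn, last_seen_id=90, per_page=10)
print(offset_page == keyset_page)
```

The catch is that keyset paging only supports "next page" style navigation; jumping directly to page 500 still needs an offset or a precomputed boundary.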
In case you indeed get millions of records, I don't foresee a bright future for your MySQL monolithic storage on a shared server. However, good luck to you and your project.
If I understand what you are trying to do properly, you can execute the one query, and perform the mysql_num_rows() function on the result in PHP... that should be pretty zippy.
http://php.net/manual/en/function.mysql-num-rows.php
Since you're using PHP, you could use the mysql_num_rows method to tell you the count after the query is done. See here: http://www.php.net/manual/en/function.mysql-num-rows.php
I am currently porting an application written in MySQL3 and PHP4 to MySQL5 and PHP5.
On analysis I found several SQL queries which use "select * from tablename" even if only one column (field) is processed in PHP. The table has almost 60 columns and it has a primary key. In most cases, the only column used is id, which is the primary key.
Will there be any performance boost if I use queries in which the column names are explicitly mentioned instead of * ? (In this application there is only one method which we need all the columns and all other methods return only a subset of the columns)
It is generally considered good practise to only fetch what is needed. Especially if the database server is not on the same machine, fetching an entire row will result in slower queries, because there is more data to transport over the network to the consuming machine. So if a full row is like 100k of data and you only need the ID which is much less, you will get faster results of course.
As a general tip for optimizing queries, use the EXPLAIN statement to see how costly a query will be.
"Premature optimization is the root of all evil." Donald Knuth.
Never ask a question like "Will there be any performance boost?" Ask only a question like "I have a certain bottleneck. How can I eliminate it?"
In 99% of our applications, this "improvement" would be irrelevant, like many other improvements based on dreams rather than on profiling and real needs.
"Will there be any performance boost if I use queries in which the column names are explicitly mentioned instead of *?" - YES
If and how much you benefit depends on the case, but at least for the cases when you only need the id column, you should fix the SQL.
In addition to the reduced network traffic (of sending useless data), the database may be able to get to the few columns you do need just using indexes, without accessing the table at all. That would speed things up a lot.
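The "just using indexes" point can be observed directly in a query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3, so the plan wording is SQLite's, not MySQL's (MySQL's EXPLAIN reports the same situation as "Using index" in the Extra column). The wide_table schema, the index name and the plan helper are assumptions for the demo.

```python
import sqlite3

# SQLite stands in for MySQL; a 3-column table stands in for the 60-column one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wide_table (id INTEGER PRIMARY KEY, name TEXT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_wide_name ON wide_table (name)")


def plan(conn, sql, params=()):
    """Return SQLite's query plan as one string (the detail text is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Only id is needed: the index on name already contains it, so the table is never read.
narrow_plan = plan(conn, "SELECT id FROM wide_table WHERE name = ?", ("x",))

# SELECT * needs payload too, so every match costs an extra lookup in the table.
wide_plan = plan(conn, "SELECT * FROM wide_table WHERE name = ?", ("x",))

print(narrow_plan)
print(wide_plan)
```

The narrow query should be reported as using a covering index and the SELECT * one as a plain index search, which is exactly the saving described above.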
The only possible downside is the increased number of distinct SQL statements that the server has to process (and more complex code on your end).
No - there will be an impact on performance, but as long as there aren't BLOBs/CLOBs in the schema it will be negligible (unless you access your database over a 300 baud modem). Most of the work done by the database is in identifying the rows matching the WHERE clause. However, it's (IMHO) bad programming practice to use SELECT *
C.
Yes. Fetch only the columns you require. Not only can this improve performance, but it will prevent your code from inadvertently breaking. Consider this query:
SELECT *
FROM tabA JOIN tabB on ...
ORDER BY colX
The query works today when only tabA has colX, but if you change the schema and add colX to tabB, the query will abend.
Of course using table aliases for all fields will also help prevent breakage.
-Krip
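The breakage described above can be reproduced with an unqualified column name. This is a sketch with Python's sqlite3 standing in for MySQL: the a_id join column is invented, and the select list is narrowed to tabA.id so that the only ambiguous reference is the ORDER BY. The exact error text differs between database engines.

```python
import sqlite3

# SQLite stands in for MySQL; the join column a_id is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabA (id INTEGER PRIMARY KEY, colX TEXT);
    CREATE TABLE tabB (id INTEGER PRIMARY KEY, a_id INTEGER);
    INSERT INTO tabA VALUES (1, 'first');
    INSERT INTO tabB VALUES (10, 1);
""")

query = (
    "SELECT tabA.id FROM tabA "
    "JOIN tabB ON tabB.a_id = tabA.id "
    "ORDER BY colX"  # unqualified: fine while only tabA has colX
)

before = conn.execute(query).fetchall()

# Schema change: tabB gains a column with the same name.
conn.execute("ALTER TABLE tabB ADD COLUMN colX TEXT")

error = None
try:
    conn.execute(query).fetchall()
except sqlite3.Error as exc:
    error = str(exc)

# Qualifying the column keeps the query valid across the schema change.
after = conn.execute(query.replace("ORDER BY colX", "ORDER BY tabA.colX")).fetchall()

print(before, error, after)
```

The same query text goes from working to failing without a single line of application code changing, which is the argument for qualifying columns in joins.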
Yes. If you're fetching more data than you need, that has to be read from disk, transferred between MySQL and PHP, etc. which is probably going to take longer.