Task: display 10 objects, excluding one specific object
Solutions:
get 11 objects from the DB and do something like this:
foreach ($products as $product) {
    if ($product->getId() != $specificProduct->getId()) {
        // display the product
    }
}
just add a condition to the SQL query: WHERE p.id != :specific_product_id
Some additional information: we use Doctrine 2 with MySQL, so we have to expect some extra time for hydration. I made some tests and timed both solutions, but I still have no idea which way is better.
So, I got some strange results from my test (running 100 queries with different parameters):
php = 0.19614
dql = 0.16745
php = 0.13542
dql = 0.15531
Maybe someone has advice about how I could make my test better.
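For reference, since the question mentions Doctrine 2, here is a minimal sketch of the second approach (filtering in SQL) with the QueryBuilder; the entity name Product and the EntityManager variable $em are assumptions, not taken from the question:

$qb = $em->createQueryBuilder();
$qb->select('p')
   ->from('Product', 'p')                       // assumed entity name
   ->where('p.id != :specific_product_id')
   ->setParameter('specific_product_id', $specificProduct->getId())
   ->setMaxResults(10);                          // only 10 rows get hydrated
$products = $qb->getQuery()->getResult();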
If you're concerned about the overhead of hydration, keep in mind that doing the != condition in PHP code means you have to fetch thousands of irrelevant, non-matching rows from the database, hydrate them, and then discard them.
Just the wasted network bandwidth for fetching all those irrelevant rows is costly -- even more so if you do that query hundreds of times per second as many applications do.
It is generally far better to eliminate the unwanted rows using SQL expressions, supported by indexes on the referenced tables.
Use SQL as much as possible. This is the basic query you could use. It is much more efficient than discarding the rows in PHP.
$query = "SELECT * FROM table_name WHERE p.id <> '$specific_id'"
I think queries like this (id-based, or on well-indexed columns) should be done on the SQL side, because the database can use its indexes and returns less data to your application. Processing less data makes your application run faster.
Related
In a site I maintain I have a need to query the same table (articles) twice, once for each category of article. AFAICT there are basically two ways of doing this (maybe someone can suggest a better, third way?):
Perform the db query twice, meaning the db server has to sort through the entire table twice. After each query, I iterate over the cursor to generate html for a list entry on the page.
Perform the query just once and pull out all the records, then sort them into two separate arrays. After this, I have to iterate over each array separately in order to generate the HTML.
So it's this:
$newsQuery = $mysqli->query("SELECT * FROM articles WHERE type='news'");
while ($newRow = $newsQuery->fetch_assoc()) {
    // generate article summary in HTML
}
// repeat for informational articles
vs this:
$query = $mysqli->query("SELECT * FROM articles");
$news = array();
$info = array();
while ($row = $query->fetch_assoc()) {
    if ($row['type'] == "news") {
        $news[] = $row;
    } else {
        $info[] = $row;
    }
}
// iterate over each array separately to generate article summaries
The recordset is not very large, currently <200 rows, and will probably grow to 1000-2000. Is there a significant difference in the times between the two approaches, and if so, which one is faster?
(I know this whole thing seems awfully inefficient, but it's a poorly coded site I inherited and have to take care of without a budget for refactoring the whole thing...)
I'm writing in PHP, no framework :( , on a MySql db.
Edit
I just realized I left out one major detail. On a given page in the site, we will display (and thus retrieve from the db) no more than 30 records at once - but here's the catch: 15 info articles, and 15 news articles. On each page we pull the next 15 of each kind.
You know you can sort in the DB right?
SELECT * FROM articles ORDER BY type
EDIT
Due to the change made to the question, I'm updating my answer to address the newly revealed requirement: 15 rows for 'news' and 15 rows for not-'news'.
The gist of the question is the same: "which is faster... one query or two separate queries?" The gist of the answer remains the same: each database roundtrip incurs overhead (extra time, especially over a network connection to a separate database server), so with all else being equal, reducing the number of database roundtrips can improve performance.
The new requirement really doesn't impact that. What the newly revealed requirement really impacts is the actual query to return the specified resultset.
For example:
( SELECT n.*
FROM articles n
WHERE n.type='news'
LIMIT 15
)
UNION ALL
( SELECT o.*
FROM articles o
WHERE NOT (o.type<=>'news')
LIMIT 15
)
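On the PHP side, consuming that combined resultset could look roughly like this; a sketch assuming the mysqli connection from the question and the UNION ALL statement above stored in $sql:

$result = $mysqli->query($sql);
$news = array();
$info = array();
while ($row = $result->fetch_assoc()) {
    if ($row['type'] == 'news') {
        $news[] = $row;     // first list: the 15 'news' rows
    } else {
        $info[] = $row;     // second list: the 15 non-'news' rows
    }
}
// render the two 15-item lists separately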
Running that statement as a single query is going to require fewer database resources, and be faster than running two separate statements, and retrieving two disparate resultsets.
We weren't provided any indication of what the other values for type can be, so the statement offered here simply addresses two general categories of rows: rows that have type='news', and all other rows that have some other value for type.
That query assumes that type allows for NULL values, and we want to return rows that have a NULL for type. If that's not the case, we can adjust the predicate to be just
WHERE o.type <> 'news'
Or, if there are specific values for type we're interested in, we can specify that in the predicate instead
WHERE o.type IN ('alert','info','weather')
If "paging" is a requirement... "next 15", the typical pattern we see applied, LIMIT 30,15 can be inefficient. But this question isn't asking about improving efficiency of "paging" queries, it's asking whether running a single statement or running two separate statements is faster.
And the answer to that question is still the same.
ORIGINAL ANSWER below
There's overhead for every database roundtrip. In terms of database performance, for small sets (like you describe) you're better off with a single database query.
The downside is that you're fetching all of those rows and materializing an array. (But, that looks like that's the approach you're using in either case.)
Given the choice between the two options you've shown, go with the single query. That's going to be faster.
As far as a different approach, it really depends on what you are doing with those arrays.
You could actually have the database return the rows in a specified sequence, using an ORDER BY clause.
To get all of the 'news' rows first, followed by everything that isn't 'news', you could
ORDER BY type<=>'news' DESC
That's MySQL shorthand for the more ANSI standards compliant:
ORDER BY CASE WHEN type = 'news' THEN 1 ELSE 0 END DESC
Rather than fetch every single row and store it in an array, you could just fetch from the cursor as you output each row, e.g.
while ($row = $query->fetch_assoc()) {
    echo "<br>Title: " . htmlspecialchars($row['title']);
    echo "<br>byline: " . htmlspecialchars($row['byline']);
    echo "<hr>";
}
The best way of dealing with a situation like this is to test it for yourself. It doesn't matter how many records you have at the moment; you can simulate whatever amount you'd like, that's never a problem. Also, 1000-2000 is really a small set of data.
I somewhat don't understand why you'd have to iterate over all the records twice. You should never retrieve all the records in a query anyway, only the small subset you need to work with. In a typical site where you manage articles it's usually about 10 records per page at most. No user will ever go through 2000 articles in a way that would require you to pull all the records at once. Use paging and smart querying, as sketched below.
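A minimal sketch of such paging with a prepared statement, assuming a mysqli connection in $mysqli, a zero-based $page variable, and 15 rows per page as in the edited question:

$perPage = 15;
$offset = $page * $perPage;
$stmt = $mysqli->prepare("SELECT * FROM articles WHERE type = ? ORDER BY id LIMIT ? OFFSET ?");
$stmt->bind_param('sii', $type, $perPage, $offset);   // e.g. $type = 'news'
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
    // generate the article summary for this row
}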
// iterate over each array separately to generate article summaries
Not really sure what you mean by this, but something tells me this data should be stored in the database as well. I really hope you're not generating article excerpts on the fly for every page hit.
It all sounds to me more like a bad architecture design than anything else...
PS: I believe sorting/ordering/filtering of database data should be done on the database server, not in the application itself. You may save some traffic by doing a single query, but it won't help much if you transfer too much data at once that you won't be using anyway.
Are there any advantages to having nested queries instead of separating them?
I'm using PHP to frequently query from MySQL and would like to separate them for better organization. For example:
Is:
$query = "SELECT words.unique_attribute
FROM words
LEFT JOIN adjectives ON adjectives.word_id = words.id
WHERE adjectives = 'confused'";
return $con->query($query);
Faster/Better than saying:
$query = "SELECT word_id
FROM adjectives
WHERE adjectives = 'confused';";
$id = getID($con->query($query));
$query = "SELECT unique_attribute
FROM words
WHERE id = $id;";
return $con->query($query);
The second option would give me a way to make a select function, where I wouldn't have to repeat so much query string code, but if making so many additional calls (these can get very deeply nested) will be very bad for performance, I might keep it. Or at least look out for it.
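For reference, a minimal sketch of the combined query as a prepared statement; it assumes $con is a mysqli connection, and the table and column names are taken verbatim from the question:

$stmt = $con->prepare(
    "SELECT words.unique_attribute
     FROM words
     LEFT JOIN adjectives ON adjectives.word_id = words.id
     WHERE adjectives = ?"
);
$word = 'confused';
$stmt->bind_param('s', $word);   // bind the search term instead of embedding it
$stmt->execute();
return $stmt->get_result();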
Like most questions containing 'faster' or 'better', it's a trade-off and it depends on which part you want to speed up and what your definition of 'better' is.
Compared with the two separate queries, the combined query has the advantages of:
speed: you only need to send one query to the database system, the database only needs to parse one query string, only needs to compose one query plan, only needs to push one result back up and through the connection to PHP. The difference (when not executing these queries thousands of times) is very minimal, however.
atomicity: the query in two parts may deliver a different result from the combined query if the words table changes between the first and second query (although in this specific example this is probably not a constantly-changing table...)
At the same time the combined query also has the disadvantage of (as you already imply):
re-usability: the split queries might come in handy when you can re-use the first one and replace the second one with something that selects a different column from the words table or something from another table entirely. This disadvantage can be mitigated by using something like a query builder (not to be confused with an ORM!) to dynamically compose your queries, adding where clauses and joins as needed. For an example of a query builder, check out Zend\Db\Sql.
locking: depending on the storage engine and storage engine version you are using, tables might get locked. Most select statements do not lock tables however, and the InnoDB engine definitely doesn't. Nevertheless, if you are working with an old version of MySQL on the MyISAM storage engine and your tables are under heavy load, this may be a factor. Note that even if the combined statement locks the table, the combined query will offer faster average completion time because it is faster in total while the split queries will offer faster initial response (to the first query) while still needing a higher total time (due to the extra round trips et cetera).
It would depend on the size of those tables and where you want to place the load. If those tables are large and seeing a lot of activity, then the second version with two separate queries would minimise the lock time you might see as a result of the join. However if you've got a beefy db server with fast SSD storage, you'd be best off avoiding the overhead of dipping into the database twice.
All things being equal I'd probably go with the former - it's a database problem so it should be resolved there. I imagine those tables wouldn't be written to particularly often so I'd ensure there's plenty of MySQL cache available and keep an eye on the slow query log.
I'm having an inner debate at my company about looping queries in this manner:
$sql = "
SELECT foreign_key
FROM t1";
foreach(fetchAll($sql) as $row)
{
$sub_sql = "
SELECT *
FROM t2
WHERE t2.id = " . $row['foreign_key'];
foreach(fetchAll($sub_sql) as $sub_row)
{
// ...
}
}
Instead of using an sql join like this:
$sql = "
SELECT t2.*
FROM t2
JOIN t1
ON t1.foreign_key = t2.id";
foreach(fetchAll($sql) as $row)
{
// ...
}
Additional information about this, the database is huge, millions of rows.
I have of course searched for an answer to this question, but nobody has answered it in a good way, with enough upvotes to make me certain that one way is better than the other.
Question
Can somebody explain to me why one of these methods is better than the other one?
The join method is generally considered better, if only because it reduces the overhead of sending queries back and forth to the database.
If you have appropriate indexes on the tables, then the underlying performance of the two methods will be similar. That is, both methods will use appropriate indexes to fetch the results.
From a database perspective, the join method is far superior. It consolidates the data logic in one place, making the code more transparent. It also allows the database to make optimizations that might not be apparent in application code.
Because of driver overhead, a loop is far less efficient.
This is similar to another question I answered, but different enough not to close-vote as a duplicate. My full answer is here, but I'll summarize the main points:
Whenever you make a connection to a database, there are three steps taken:
A connection to the database is established.
A query, or multiple queries, to the database is executed.
Data is returned for processing.
Using a loop structure, you will end up generating additional overhead with driver requests, where you will have a request and a return per loop cycle rather than a single request and single return. Even if the looped queries do not take any longer than the single large query (this is very unlikely as MySQL internals have a lot of shortcuts built in to prevent using a full repetitive loop), you will still find that the single query is faster on driver overhead.
Using a loop without TRANSACTIONS, you will also find that you run into relational data integrity issues where other operations affect the data you're iterating between loop cycles. Using transactions, again, increases overhead because the database has to maintain two persistent states.
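As a rough illustration of that transaction point, here is a sketch assuming a PDO connection in $pdo and the table names from the question; the consistency it buys comes at the cost of the extra overhead described above:

$pdo->beginTransaction();                                     // keep a consistent view across the loop
try {
    $outer = $pdo->query("SELECT foreign_key FROM t1");
    $inner = $pdo->prepare("SELECT * FROM t2 WHERE t2.id = ?");   // prepare once, execute per row
    foreach ($outer as $row) {
        $inner->execute(array($row['foreign_key']));
        foreach ($inner->fetchAll() as $sub_row) {
            // ...
        }
    }
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}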
Is there an appreciable performance difference between having one SELECT foo, bar FROM users query that returns 500 rows, and 500 SELECT foo, bar FROM users WHERE id = x queries coming all at once?
In a PHP application I'm writing, I'm trying to choose between writing a clear, readable section of code that would produce about 500 SELECT statements, or writing it in an obscure, complex way that would use only one SELECT that returns 500 rows.
I would prefer the way that uses clear, maintainable code, but I'm concerned that the connection overhead for each of the SELECTs will cause performance problems.
Background info, in case it's relevant:
1) This is a Drupal module, coded in PHP
2) The tables in question get very few INSERTs and UPDATEs, and are rarely locked
3) SQL JOINs aren't possible for reasons not relevant to the question
Thanks!
It's almost always faster to do one big batch SELECT and parse the results in your application code than doing a massive amount of SELECTs for one row. I would recommend that you implement both and profile them, though. Always strive to minimize the number of assumptions you have to make.
I would not worry about the connection overhead of mysql queries too much, especially if you are not closing the connection between every query. Consider that if your query creates a temporary table, you've already spent more time in the query than the overhead of the query took.
I love doing a complex SQL query, personally, but I have found that the size of the tables, mysql query cache and query performance of queries that need to do range checking (even against an index) all make a difference.
I suggest this:
1) Establish the simple, correct baseline. I suspect this is the zillion-query approach. This is not wrong, and very likely helpfully correct. Run it a few times and watch your query cache and application performance. The ability to keep your app maintainable is very important, especially if you work with other code maintainers. Also, if you're querying really large tables, small queries will maintain scalability.
2) Code the complex query. Compare the results for accuracy, and then the time. Then use EXPLAIN on the query to see which rows are scanned. I have often found that if I have a JOIN, or a WHERE x != y, or a condition that creates a temporary table, the query performance can get pretty bad, especially if I'm in a table that's always getting updated. However, I've also found that a complex query might not be correct, and that a complex query can more easily break as an application grows. Complex queries typically scan larger sets of rows, often creating temporary tables and "Using where" scans. The larger the table, the more expensive these get. Also, you might have team considerations where complex queries don't suit your team's strengths.
3) Share the results with your team.
Complex queries are less likely to hit the MySQL query cache, and if they are large enough, don't cache them. (You want to save the MySQL query cache for frequently hit queries.) Also, queries whose WHERE predicates have to scan the index will not do as well (x != y, x > y, x < y). Queries like SELECT foo, bar FROM users WHERE foo != 'g' AND mumble < '360' end up doing scans. (The cost of query overhead could be negligible in that case.)
Small queries can often complete without creating temporary tables just by getting all values from the index, so long as the fields you're selecting and filtering on are indexed. So the query performance of SELECT foo, bar FROM users WHERE id = x is really great (especially if columns foo and bar are covered by an index, e.g. ALTER TABLE users ADD INDEX ix_a (foo, bar);).
Other good ways to increase performance in your application would be to cache those small query results in the application (if appropriate), or doing batch jobs of a materialized view query. Also, consider memcached or some features found in XCache.
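For instance, a minimal sketch of caching a small query result with the Memcached extension; the server address, cache key, and TTL are arbitrary examples, and a mysqli connection in $mysqli is assumed:

$memcached = new Memcached();
$memcached->addServer('localhost', 11211);
$cacheKey = 'users_foo_bar';                        // hypothetical cache key
$rows = $memcached->get($cacheKey);
if ($rows === false) {
    // cache miss: run the query once and store the rows
    $result = $mysqli->query("SELECT foo, bar FROM users");
    $rows = $result->fetch_all(MYSQLI_ASSOC);
    $memcached->set($cacheKey, $rows, 300);         // cache for 5 minutes
}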
It seems like you know what the 500 id values are, so why not do something like this:
// Assuming you have already validated that this array contains only integers,
// so there is no risk of SQL injection
$ids = join(',', $arrayOfIds);
$sql = "SELECT `foo`, `bar` FROM `users` WHERE `id` IN ($ids)";
I need to pull several rows from a table and process them in two ways:
aggregated on a key
row-by-row, sorted by the same key
The table looks roughly like this:
table (
key,
string_data,
numeric_data
)
So I'm looking at two approaches to the function I'm writing.
The first would pull the aggregate data with one query, and then query again inside a loop for each set of row-by-row data (the following is PHP-like pseudocode):
$rows = query(
"SELECT key,SUM(numeric_data)
FROM table
GROUP BY key"
);
foreach ($rows as $row) {
<process aggregate data in $row>
$key = $row['key'];
$row_by_row_data = handle_individual_rows($key);
}
function handle_individual_rows($key)
{
$rows = query(
"SELECT string_data
FROM table WHERE key=?",
$key
);
<process $rows one row at a time>
return $processed_data;
}
Or, I could do one big query and let the code do all the work:
$rows = query(
"SELECT key, string_data, numeric_data
FROM table"
);
foreach ($rows as $row) {
<process rows individually and calculate aggregates as I go>
}
Performance is not a practical concern in this application; I'm just looking to write sensible and maintainable code.
I like the first option because it's more modular -- and I like the second option because it seems structurally simple. Is one option better than the other or is it really just a matter of style?
One SQL query, for sure.
This will
Save you lots of roundtrips to the database
Allow the use of more efficient GROUP BY methods
Since your aggregates may be performed equally well by the database, it will also be better for maintainability: you have all your resultset logic in one place.
Here is an example of a query that returns every row and calculates a SUM:
SELECT string_data, numeric_data, SUM(numeric_data) OVER (PARTITION BY key)
FROM table
Note that this will most probably use parallel access to calculate the SUMs for different keys, which is hard to implement in PHP.
The same query in MySQL (which, before version 8.0, does not support window functions):
SELECT `key`, string_data, numeric_data,
    (
        SELECT SUM(numeric_data)
        FROM `table` ti
        WHERE ti.`key` = t_outer.`key`
    ) AS key_sum
FROM `table` t_outer
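Consuming that single resultset in PHP is then one loop; a minimal sketch assuming a PDO connection in $pdo and the MySQL query above stored in $sql:

foreach ($pdo->query($sql) as $row) {
    // $row['string_data'] and $row['numeric_data'] hold the row-by-row values;
    // $row['key_sum'] is the per-key aggregate, already computed by the database
}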
If performance isn't a concern, I'd go with the second. Seems the tiniest bit friendlier.
If performance were a concern, my answer would be "don't think, profile". :)
The second answer is by far more clear, sensible and maintainable. You're saying the same thing with less code, which is usually better.
And I know you said performance is not a concern, but why fetch data more than you have to?
I can't be certain from the example here, but I'd like to know if there's a chance to do the aggregation and other processing right in the SQL query itself. In this case, you'd have to evaluate "more maintainable" with respect to your relative comfort level expressing that processing in SQL code vs. PHP code.
Is there something about the additional processing you need to do on each row that would prevent you from expressing everything in the SQL query itself?
I don't think you'll find many situations at all where doing a query-per-iteration of a loop is the better choice. In fact, I'd say it's probably a good rule of thumb to never do that.
In other words, the fewer round trips to the database, the better.
Depending on your data and actual tables, you might be able to let SQL do the aggregation work and select all the rows you need with one query.
One SQL query is probably a better idea.
It avoids you having to re-write relational operations in your own code.
I think you've somehow answered your own question, because you say you have two different kinds of processing: one aggregation and one row by row.
If you want to keep everything readable and maintainable, mixing both in a single query doesn't sound right; the query would answer two different needs, so it won't be very readable.
Even if perf is not an issue, it's faster to do the aggregation on the DB server instead of doing it in code.
With only one query, the code handling the result will mix the two kinds of processing, handling rows and computing aggregates at the same time, so over time this code will tend to get confusing and buggy.
The same code might evolve over time; for instance, the row-by-row part can get complex and could introduce bugs in the aggregation part, or the other way around.
If in the future you need to split these two treatments, it will be harder to disentangle code that, by then, somebody else wrote ages ago...
Performance considerations aside, in terms of maintainability and readability I'd recommend to use two queries.
But keep in mind that even if the performance factor is not an issue at the moment, it can become one once the db volume grows or whatever; it's never a negligible factor in the long term...
Even if perf is not an issue, your mind is. When a musician practices, every movement is intended to improve the musician's skill. As a developer, you should write every procedure to improve your skill. Iterative loops through data are sloppy and ugly; SQL queries are elegant. Do you want to develop more elegant code or sloppier code?