Better performance, JOIN or multiple queries? - php

What's the best way to achieve the best performance when getting data from multiple tables?
I have the following tables:
Applicants ( 50,101 rows )
-id
-first_name
-email
Phones ( 50,151 rows )
-id
-number
-model_id
-model_type
Address (100,263 rows)
-id
-state
-model_id
-model_type
Business (26 rows)
-id
-company
-model_id
-model_type
My desired result
id | first_name | email | number | company | state
----+------------+-------+--------+---------+------
1 | test | - | - | - | -
I'm using SQLyog to run the query below and it's very slow; these tables hold thousands of rows.
SELECT `app`.`id`,`app`.`first_name`, `app`.`email`, `p`.`number`, `b`.`company`, `add`.`state`
FROM `applicants` AS `app`
LEFT JOIN phones AS `p` ON `app`.`id` = `p`.`model_id`
AND `p`.`model_type` = 'App\\Models\\Applicant'
LEFT JOIN `businesses` AS `b` ON `app`.`id` = `b`.`model_id`
AND `b`.`model_type` = 'App\\Models\\Applicant'
LEFT JOIN `addresses` AS `add` ON `app`.`id` = `add`.`model_id`
AND `add`.`model_type` = 'App\\Models\\Applicant'
LIMIT 10
In summary, it takes 25.794 seconds to finish:
Execution Time : 25.792 sec
Transfer Time : 0.001 sec
Total Time : 25.794 sec
What would be the best way to achieve my goal? Should I perform a separate query for each of phones, businesses, and addresses? I'm not sure how to get my desired result with multiple queries.

Which approach will be faster really depends on the specific situation. You can probably also optimize your situation by creating the proper indexes. For example, if you query a lot by model_id and model_type, you could create an index on either field, or a composite index on both.
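As a minimal sketch of that suggestion, using the tables from the question (the index names are made up, and putting model_type first is just one reasonable choice since it is compared against a constant):
ALTER TABLE phones ADD INDEX idx_phones_model (model_type, model_id);
ALTER TABLE businesses ADD INDEX idx_businesses_model (model_type, model_id);
ALTER TABLE addresses ADD INDEX idx_addresses_model (model_type, model_id);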
I would suggest running the query with joins with EXPLAIN in front, i.e. EXPLAIN SELECT {your query}. That will give you some insight into how MySQL executes your query. Then try the same with separate queries. Then add indexes and check whether they are used. Finally, choose the best-performing solution.
About indexes:
Introduction: https://www.mysqltutorial.org/mysql-index/mysql-create-index/
More in-depth: https://use-the-index-luke.com/

From my understanding, there is no general answer to this question.
One possible solution is denormalisation:
creating an extra table with all the data you need, rebuilt periodically.
It helps a lot in some cases but, unfortunately, is just not possible in other cases.
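As a hedged sketch of that idea against the question's tables (the table name applicant_report is made up, and the rebuild would typically run from a cron job):
-- Hypothetical denormalised reporting table, rebuilt periodically.
DROP TABLE IF EXISTS applicant_report;
CREATE TABLE applicant_report AS
SELECT app.id, app.first_name, app.email, p.number, b.company, a.state
FROM applicants AS app
LEFT JOIN phones AS p ON app.id = p.model_id AND p.model_type = 'App\\Models\\Applicant'
LEFT JOIN businesses AS b ON app.id = b.model_id AND b.model_type = 'App\\Models\\Applicant'
LEFT JOIN addresses AS a ON app.id = a.model_id AND a.model_type = 'App\\Models\\Applicant';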

Here are the things to think about in this case. Arguments for a single query:
Databases are designed to handle complicated queries, so the JOINs are probably faster in the database.
Each query incurs overhead for moving the query into the database and data out of the database.
In favor of multiple queries:
Query optimizers are not perfect, so they might come up with the wrong plan.
Returning data as a single result set often requires a "wide" format for the data with many repeated columns. Returning more data is slower.
In general, the balance is on the single query, but that is not always true. For instance, if the database is fast but the bandwidth to the application is slow, the last bullet may consistently be the dominating factor.
I am surprised that your query takes so much time. You have no ORDER BY or GROUP BY, so the time to the first result should be pretty fast. You might be able to make it run faster by simply doing a subselect on app:
FROM (SELECT app.*
      FROM `applicants` `app`
      LIMIT 10
     ) app . . .
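Spelled out against the question's full query, that rewrite might look like the sketch below (it limits applicants before joining, which is the point; note that without an ORDER BY, which 10 applicants you get is arbitrary, exactly as in the original):
SELECT app.id, app.first_name, app.email, p.number, b.company, a.state
FROM (SELECT * FROM applicants LIMIT 10) AS app
LEFT JOIN phones AS p ON app.id = p.model_id AND p.model_type = 'App\\Models\\Applicant'
LEFT JOIN businesses AS b ON app.id = b.model_id AND b.model_type = 'App\\Models\\Applicant'
LEFT JOIN addresses AS a ON app.id = a.model_id AND a.model_type = 'App\\Models\\Applicant';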

Joining tables combines the data from each table into one result set, so a slower load time is expected. If you only need specific data from each table, then running separate queries might be the better idea. It depends on how your application is laid out and how you are using this information.

Related

MYSQL Query optimization, comparing 3 tables w/ thousands of records

I have this query:
SELECT L.sku, L.desc1, M.map, T.retail
FROM listing L
INNER JOIN moto M ON L.sku = M.sku
INNER JOIN truck T ON L.sku = T.sku
LIMIT 5;
Each table (listing, moto, truck) has ~300,000 rows, and just for testing purposes I've set a LIMIT of 5 results; in the end I will need hundreds, but let's see...
That query takes about 3:26 minutes in the console... I don't want to imagine how long it will take through PHP... I need to handle it there.
Any advice/solution to optimize the query? Thanks!
Two things to recommend here:
Indexes
Denormalization
One thing people tend to do when databases get massive is invoke Denormalization. This is when you store the data from multiple tables in one table to prevent the need to do a join. This is useful if your application relies on specific reads to power it. It is a commonly used tactic when scaling.
If Denormalization is out of the question, another, simpler way to optimize this query is to make sure you have indexes on the columns you are joining on. So the columns listing.sku, moto.sku, and truck.sku would need to be indexed; you will immediately notice an increase in performance.
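For example, a minimal sketch of those indexes (the index names are illustrative):
ALTER TABLE listing ADD INDEX idx_listing_sku (sku);
ALTER TABLE moto ADD INDEX idx_moto_sku (sku);
ALTER TABLE truck ADD INDEX idx_truck_sku (sku);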
For any other optimizations I would need some more information about the data. Hope it helps!

How to approach multi-million data selection

I have a table that stores specific updates for all customers.
Some sample table:
record_id | customer_id | unit_id | time_stamp | data1 | data2 | data3 | data4 | more
When I created the application, I did not realize how much this table would grow; I currently have over 10 million records within 1 month. I am facing issues where PHP stops executing due to the amount of time the queries take. Some queries produce top-1 results based on time_stamp + customer_id + unit_id.
How would you suggest handling this type of issue? For example, I could create a new table for each customer, although I don't think that is a good solution.
I am stuck with no good solution in mind.
If you're on the cloud (where you're charged for moving data between the server and the db), ignore this.
Move all logic to the server.
The fastest query is a SELECT with a WHERE on the PRIMARY KEY. It won't matter how large your database is; it will come back just as fast as with a table of 1 row (as long as your hardware isn't unbalanced).
I can't tell exactly what you're doing with your query, but first download all of the sorting and limiting data into PHP. Once you've got what you need, SELECT the data directly, WHEREing on record_id (I assume that's your PRIMARY KEY).
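As a sketch of that last step (the table name updates is hypothetical, since the question doesn't name the table):
-- After deciding in PHP which records you need, fetch them by primary key:
SELECT * FROM updates WHERE record_id IN (101, 202, 303);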
It looks like your on demand data is pretty computationally intensive and huge, so I recommend using a faster language. http://blog.famzah.net/2010/07/01/cpp-vs-python-vs-perl-vs-php-performance-benchmark/
Also, when you start sorting and limiting on the server rather than the db, you can start identifying shortcuts to speed it up even further.
This is what the server's for.
I suggest you use partitioning of your data following some criteria.
You can make a horizontal or vertical partition of your data.
For example, group your customer_id into 10 partitions, using its id modulo 10.
So, customer_id values ending in 0 go to partition 0, those ending in 1 go to partition 1, and so on.
MySQL can do this for you easily.
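A hedged sketch of native hash partitioning (the table name customer_updates is made up; the columns follow the question's sample table, and in MySQL the partitioning column must be part of every unique key, hence the composite primary key):
CREATE TABLE customer_updates (
    record_id BIGINT NOT NULL,
    customer_id INT NOT NULL,
    unit_id INT NOT NULL,
    time_stamp DATETIME NOT NULL,
    PRIMARY KEY (record_id, customer_id)
)
PARTITION BY HASH (customer_id)
PARTITIONS 10;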
What is the count of records within the tables? Often, with relational databases, it's not how much data you have (millions of rows are nothing to a relational database), it's how you're retrieving it.
From the look of your select, you probably just need to optimize the statement itself and avoid the multiple subselects, which are probably the main cause of the slowdown. Try running EXPLAIN on that statement, or just get the ids and run the interior select individually on the ids of the records that you've actually found and retrieved in the first run.
Just the fact that you have those subselects within your overall statement means that you haven't optimized very far into the process anyway. For example, you could run a nightly or hourly cron job that aggregates into a new table sets like the one created by SELECT gps_unit.idgps_unit, and then run your selects against that previously generated table instead of creating blocks of data equivalent to a table on the fly.
If you find yourself unable to effectively optimize that select statement, you have "final" options like:
Categorize via some criteria and split into different tables.
Keep a deep archive, such that anything past the first year or so is migrated to a less used table and requires special retrieval.
Finally, if you have a lot of small, low-value data, you may be able to completely archive certain tables, keep them around in file form only, and truncate everything past a certain date. With web tracking data that isn't that important and is kind of spammy, I often end up doing this after a few years, when the data is really not going to do anyone any good any more.

Which of the following SQL queries would be faster? A join on two tables or successive queries?

I have two tables here:
ITEMS
ID| DETAILS| .....| OWNER
USERS:
ID| NAME|....
Where ITEMS.OWNER = USERS.ID
I'm listing the items out with their respective owners' names. For this I could use a join on both tables, or I could select all the ITEMS and loop through them, making a SQL query to retrieve the tuple of each item's owner. That's:
1 SQL query with a JOIN
versus
20 single-table SQL queries (one per item)
Which would be the better approach to take in terms of speed?
Thanks
Of course a JOIN will be faster.
Making 20 queries will imply:
Parsing them 20 times
Making 20 index seeks to find the start of the index range on items
Returning 20 recordsets (each with its own metadata).
Every query has overhead. If you can do something with one query, it's (almost) always better to do it with one query. And most database engines are smarter than you: even if it's better to split a query in some way, the database will usually figure that out itself.
An example of overhead: if you perform 100 queries, there will be a lot more traffic between your application and your database server.
In general, if you really want to know something about performance, benchmark the various approaches, measure the parameters you're interested in, and make a decision based on the results of the benchmark.
Good luck!
Executing a join will be much quicker, as well as better practice.
A join would be a lot quicker than performing another query on the child table for each record in the parent table; see the sketch below.
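A minimal sketch of that single join, using the column names from the question:
SELECT I.ID, I.DETAILS, U.NAME
FROM ITEMS I
INNER JOIN USERS U ON U.ID = I.OWNER;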
You can also enable performance data in SQL to see the results for yourself..
http://wraithnath.blogspot.com/2011/01/getting-performance-data-from-sql.html

php and MySQL: 2 requests or 1 request?

I'm building a webpage in PHP using MySQL as my database.
Which way is faster?
2 requests to MySQL with the following queries:
SELECT points FROM data;
SELECT sum(points) FROM data;
1 request to MySQL; hold the result in a temporary array and calculate the sum in PHP.
$data = SELECT points FROM data;
EDIT -- the data is about 200-500 rows
It's really going to depend on a lot of different factors. I would recommend trying both methods and seeing which one is faster.
Since Phill and Kibbee have answered this pretty effectively, I'd like to point out that premature optimization is a Bad Thing (TM). Write what's simplest for you and profile, profile, profile.
How much data are we talking about? I'd say MySQL is probably faster at doing those kind of operations in the majority of cases.
Edit: with the kind of data that you're talking about, it probably won't make masses of difference. But databases tend to be optimised for those kind of queries, whereas PHP isn't. I think the second DB query is probably worth it.
If you want to do it in one query, use a running total like this:
SET @total = 0;
SELECT points, @total := @total + points AS RunningTotal FROM data;
I wouldn't worry about it until I had an issue with performance.
If you go with two separate queries, you need to watch out for the possibility of the data changing between getting the rows & getting their sum. Until there's an observable performance problem, I'd stick to doing my own summation to keep the page consistent.
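Alternatively, a minimal sketch of keeping the two reads consistent with a transaction (this assumes an InnoDB table, where the default REPEATABLE READ isolation gives both SELECTs one consistent snapshot):
START TRANSACTION;
SELECT points FROM data;
SELECT SUM(points) FROM data;
COMMIT;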
The general rule of thumb for efficiency with MySQL is to try to minimize the number of SQL requests. Every call to the database adds overhead and is "expensive" in terms of time required.
The optimization done by MySQL is quite good. It can take very complex requests with many joins, nestings and computations, and make them run efficiently.
But it can only optimize individual requests. It cannot check the relationship between two different SQL statements and optimize across them.
In your example 1, the two statements will make two requests to the database, and the table will be scanned twice.
Your example 2, where you save the result and compute the sum yourself, would be faster than 1. It is only one database call, and looping through the data in PHP to get the sum is faster than a second call to the database.
Just for the fun of it.
SELECT SUM(points) FROM `data`
UNION ALL
SELECT points FROM `data`
The first row will be the total, the next rows will be the data.
NOTE: UNION can be slow, but it's an option. UNION ALL keeps duplicate point values that a plain UNION would silently merge away.
For more fun, and so you can sort the rows, you can also label them:
SELECT 'total' AS name, SUM(points) AS points FROM `data`
UNION ALL
SELECT 'points' AS name, points FROM `data`
Then reading it in PHP:
while ($row = mysql_fetch_assoc($query))
{
    if ($row["name"] == "points")
    {
        echo $row["points"];
    }
    if ($row["name"] == "total")
    {
        echo "Total is: " . $row["points"];
    }
}
You can use a union like this:
(select points, null as total from data) union all (select null, sum(points) from data);
The result will look something like this:
points | total
-------+------
2      | null
5      | null
...    |
null   | 7
You can figure out how to handle it from there.
Do it the MySQL way: let the database manager do its work.
MySQL is optimized for such tasks.

Optimizing a PHP page: MySQL bottleneck

I have a page that is taking 37 seconds to load. While it is loading it pegs MySQL's CPU usage through the roof. I did not write the code for this page and it is rather convoluted so the reason for the bottleneck is not readily apparent to me.
I profiled it (using kcachegrind) and find that the bulk of the time on the page is spent doing MySQL queries (90% of the time is spent in 25 different mysql_query calls).
The queries take the following form, with the tag_id changing on each of the 25 different calls:
SELECT * FROM tbl_news
WHERE news_id IN (SELECT news_id FROM tbl_tag_relations WHERE tag_id = 20)
Each query is taking around 0.8 seconds to complete with a few longer delays thrown in for good measure... thus the 37 seconds to completely load the page.
My question is, is it the way the query is formatted with that nested select that is causing the problem? Or could it be any one of a million other things? Any advice on how to approach tackling this slowness is appreciated.
Running EXPLAIN on the query gives me this (but I'm not clear on the impact of these results... the NULL on the primary key looks like it would be bad, yes? The number of rows examined seems high to me as well, since only a handful of results are returned in the end):
id | select_type        | table             | type | possible_keys     | key               | key_len | ref   | rows | Extra
---+--------------------+-------------------+------+-------------------+-------------------+---------+-------+------+------------
1  | PRIMARY            | tbl_news          | ALL  | NULL              | NULL              | NULL    | NULL  | 1318 | Using where
2  | DEPENDENT SUBQUERY | tbl_tag_relations | ref  | FK_tbl_tag_tags_1 | FK_tbl_tag_tags_1 | 4       | const | 179  | Using where
I've addressed this point in Database Development Mistakes Made by App Developers. Basically, favour joins over aggregation. IN isn't aggregation as such, but the same principle applies. A good optimizer will make these two queries equivalent in performance:
SELECT * FROM tbl_news
WHERE news_id IN (SELECT news_id FROM tbl_tag_relations WHERE tag_id = 20)
and
SELECT tn.*
FROM tbl_news tn
JOIN tbl_tag_relations ttr ON ttr.news_id = tn.news_id
WHERE ttr.tag_id = 20
as I believe Oracle and SQL Server both do, but MySQL doesn't. The second version is basically instantaneous. With hundreds of thousands of rows, I did a test on my machine and got the first version to sub-second performance by adding appropriate indexes. The join version with indexes is basically instantaneous, but even without indexes it performs OK.
By the way, the above syntax is the one you should prefer for doing joins. It's clearer than putting the join conditions in the WHERE clause (as others have suggested), and it can do certain things in an ANSI SQL way with left outer joins that WHERE conditions can't.
So I would add indexes on the following:
tbl_news (news_id)
tbl_tag_relations (news_id)
tbl_tag_relations (tag_id)
and the query will execute almost instantaneously.
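In DDL form, a sketch of those indexes (the index names are illustrative):
CREATE INDEX idx_tbl_news_news_id ON tbl_news (news_id);
CREATE INDEX idx_ttr_news_id ON tbl_tag_relations (news_id);
CREATE INDEX idx_ttr_tag_id ON tbl_tag_relations (tag_id);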
Lastly, don't use * to select all the columns you want. Name them explicitly. You'll get into less trouble as you add columns later.
The SQL query itself is definitely your bottleneck. The query has a sub-query in it, the IN (...) portion, which essentially runs two queries at once. You can likely halve (or better!) your SQL times with a JOIN (similar to what d03boy mentions above) or a more targeted SQL query. An example might be:
SELECT *
FROM tbl_news, tbl_tag_relations
WHERE tbl_tag_relations.tag_id = 20 AND
tbl_news.news_id = tbl_tag_relations.news_id
To help SQL run faster you also want to avoid using SELECT * and only select the information you need; also put a limiting statement at the end, e.g.:
SELECT news_title, news_body
...
LIMIT 5;
You also will want to look into the database schema itself. Make sure you are indexing all of the commonly referred to columns so that the queries will run faster. In this case, you probably want to check your news_id and tag_id fields.
Finally, take a look at the PHP code and see if you can make one single all-encompassing SQL query instead of iterating through several separate queries. If you post more code we can help with that, and it will probably be the single greatest time savings for your posted problem. :)
If I understand correctly, this is just listing the news stories for a specific set of tags.
First of all, you really shouldn't ever SELECT *.
Second, this can probably be accomplished within a single query, thus reducing the overhead cost of multiple queries. It seems like it is getting fairly trivial data, so it could be retrieved within a single call instead of 20.
A better approach than using IN might be a JOIN with a WHERE condition instead. An IN basically turns into a lot of OR statements.
Your tbl_tag_relations should definitely have an index on tag_id
select *
from tbl_news, tbl_tag_relations
where
tbl_tag_relations.tag_id = 20 and
tbl_news.news_id = tbl_tag_relations.news_id
limit 20
I think this gives the same results, but I'm not 100% sure. Sometimes simply limiting the results helps.
Unfortunately, MySQL doesn't do very well with uncorrelated subqueries like your case shows. The plan is basically saying that for every row of the outer query, the inner query will be performed. This gets out of hand quickly. Rewriting as a plain old join, as others have mentioned, will work around the problem but may then cause the undesired effect of duplicate rows.
For instance the original query would return 1 row for each qualifying row in the tbl_news table but this query:
SELECT news_id, name, blah
FROM tbl_news n
JOIN tbl_tag_relations r ON r.news_id = n.news_id
WHERE r.tag_id IN (20,21,22)
would return 1 row for each matching tag. You could stick DISTINCT on there, which should have only a minimal performance impact depending on the size of the dataset.
Not to troll too badly, but most other databases (PostgreSQL, Firebird, Microsoft, Oracle, DB2, etc) would handle the original query as an efficient semi-join. Personally I find the subquery syntax to be much more readable and easier to write, especially for larger queries.
