Most efficient JOIN query - MySQL

Most efficient JOIN query - MySQL - php

Below is a gross over simplification of 2 very large tables I'm working worth.
campaign table
| id | uid | name | contact | pin | icon |
| 1 | 7 | bob | ted | y6w | yuy |
| 2 | 7 | ned | joe | y6e | ygy |
| 3 | 6 | sam | jon | y6t | ouy |
records table
| id | uid | cid | fname | lname | address | city | phone |
| 1 | 7 | 1 | lars | jack | 13 main | lkjh | 55555 |
| 2 | 7 | 1 | rars | jock | 10 maun | oyjh | 55595 |
| 2 | 7 | 1 | ssrs | frck | 10 eaun | oyrh | 88595 |
The page loops thru the records table and prints the results to an HTML table. The existing code, for some reason, does a separate query for each record "select name from campaign where id = $res['cid']" I'd like to get rid of the second query and do a some kind of join but what is the most effective way to do it?
I need to
SELECT * FROM records
and also
SELECT name FROM campaigns WHERE campaigns.id = records.cid
in a single query.
How can I do this efficiently?

Simply join the two tables. You already have the required WHERE condition. Select all columns from one but only one column from the other. Like this:
SELECT records.*, campaigns.name
FROM records, campaigns
WHERE campaigns.id = records.cid
Note that a record row without matching campaign will get lost. To avoid that, rephrase your query like this:
SELECT records.*, campaigns.name
FROM records LEFT JOIN campaigns
ON campaigns.id = records.cid
Now you'll get NULL names instead of missing rows.

The "most efficient" part is where the answer becomes very tricky. Generally a great way to do this would be to simply write a query with a join on the two tables and happily skip away singing songs about kittens. However, it really depends on a lot more factors. how big are the tables, are they indexed nicely on the right columns for the query? When the query runs, how many records are generated? Are the results being ordered in the query?
This is where is starts being a little bit of an art over science. Have a look at the explain plan, understand what is happening, look for ways to make it more efficient or simpler. Sometimes running two subqueries in the from clause that will generate only a subset of data each is much more efficient than trying to join the entire tables and select data you need from there.
To answer this question in more detail, while hoping to be accurate for your particular case will need a LOT more information.
If I was to guess at some of these things in your database, I would suggest the following using a simple join if your tables are less than a few million rows and your database performance is decent. If you are re-running the EXACT query multiple times, even a slow query can be cached by MySQL VERY nicely, so look at that as well. I have an application running on a terribly specc'ed machine, where I wrote a cron job that simply runs a few queries with new data that is loaded overnight and all my users think the queries are instant as I make sure that they are cached. Sometimes it is the little tricks that really pay off.
Lastly, if you are actually just starting out with SQL or aren't as familiar as you think you might eventually get - you might want to read this Q&A that I wrote which covers off a lot of basic to intermediate topcs on queries, such as joins, subqueries, aggregate queries and basically a lot more stuff that is worth knowing.

You can use this query
SELECT records.*, campaigns.name
FROM records, campaigns
WHERE campaigns.id = records.cid
But, it's much better to use INNER JOIN (the new ANSI standard, ANSI-92) because it's more readable and you can easily replace INNER with LEFT or other types of join.
SELECT records.*, campaigns.name
FROM records INNER JOIN campaigns
ON campaigns.id = records.cid
More explanation here:
SQL Inner Join. ON condition vs WHERE clause
INNER JOIN ON vs WHERE clause

SELECT *
FROM records
LEFT JOIN campaigns
on records.cid = campaigns.id;
Using a left join instead of inner join guarantees that you will still list every records entry.

Related

How to save, handle the order total amount in an orders, ordersDetails schema?

When I started designing my application database schema few months ago I have been told not to store the same data/calculated data in more than one place in the database(normalization). If I do, I will make a scope of bugs when I update the data in one place and left the other without updating. So I did an orders table and ordersDetails table. Something like this..
-- orders table
+-----+---------+----------+
| ID | clintID | date |
+-----+---------+----------+
| 1 | 1 |2018-02-22|
| 2 | 1 |2018-02-23|
| 3 | 2 |2018-02-24|
+-----+---------+----------+
-- orderDetail table
+-----+---------+------------+----------+----------+
| ID | orderID | itemNumber | quantity | unitPrice|
+-----+---------+------------+----------+----------+
| 1 | 1 | 12345 | 3 | 100.75 |
| 2 | 1 | 12346 | 3 | 100.75 |
| 3 | 2 | 12347 | 3 | 100.75 |
| 4 | 2 | 12345 | 3 | 100.75 |
| 5 | 3 | 12347 | 3 | 100.75 |
| 6 | 3 | 12345 | 3 | 100.75 |
+-----+---------+------------+----------+----------+
And to make the the queries easier for me I made a view "allOrdersSummary" like
-- allOrdersSummary
SELECT
orders.*, SUM(orderDetail.quantity * orderDetail.unitPrice) totalAmount
FROM orders INNER JOIN orderDetail ON orders.ID = orderDetail.orderID
GROUP BY orders.ID;
and I used this view later for my queries, but now I started to get the MAX_JOIN_SIZE error.
So I thought of saving the calculated total order amount along with the orders table ID, clintID, date, totalAmount and whenever I change something in the orderDeatils table I update the calculated totalAmount column in the orders table, I don't know if this is good or bad!
This problem -I don't know if this is considered a problem or not- is encountered many times, for example to know the unread messages of the client making the request I have to do sum(messages) unread from messages where to = ? and isRead = 0
A) should I make another column for calculated totalAmount in the orders table or it is a normal thing in databases to calculate the totalAmount from the orderDetails table every time I need it ?
B) If you recommend making another column in the orders table, what is the best way to update it every time a change happens in the orderDetails table ? should I update it at the PHP layer whenever I update the orderDetails table, or this is something that needs a stored procedure ?

Yes, it is normal to store pre-calculated values, based on other data in the database, in a database. But not necessarily for the reason you mention. I never had a problem with MAX_JOIN_SIZE.
The main, and probably only, reason for storing calculated values is speed. So you do it for values that don't change that often and that may be used in queries that use a lot of data and may therefore be too slow if you didn't use them.
For instance: If you want to know the average value of all the orders in your database the query would be a lot faster if you already have the order totals.
Why, and how, you update the values is completely up to you. However you have got to be consistent about it. If you use the MVC pattern it would make sense to integrate it in the controller. Or in simple terms: Whenever a form is submitted that could change one of the values, out of which the pre-calculated value is computed, you need to recompute it.
This is a clear demonstration where 'normalization' is not entirely maintained. It's not really pretty, but sometimes worth it. You could, of course, argue, that the calculated value represents 'new' information, and therefore does not offend against 'normalization'.

You have an "inflate-deflate" problem.
JOIN the two tables to make a much larger temporary table.
GROUP BY to shrink back to one row per row of the original (orders) table.
This avoids the problem:
SELECT *,
( SELECT SUM(quantity * unitPrice
FROM orderDetail WHERE orderID = orders.ID
) AS totalAmount
FROM orders;
Please let me know how your experience is with this one. It is one of the simplest examples of the inflate-deflate problem.

1 table query vs join multiple tables query performance

I'm wondering which method below is faster?
Suppose:
Maximum 10,000 products, each product has 1 user id, 1 cat id, 3 extra fields, and 5 images.
90-99% users come to the website just for the information, not posting.
Method 1: get all data from a table from a query without "JOIN":
SELECT * FROM products WHERE ...
Table: products
id | name | poster_name | cat_name | code_1 | code_2 | content |
dimensions | contact | message | images |
Method 2: get all data from multiple tables with "JOIN":
SELECT ... FROM products
LEFT JOIN cats ON products.cat_id = casts.id
LEFT JOIN users ON ....
table: products
id | name | code_1 | code_2 | content | cat_id | poster_id |
table: cats
id | cat_name |
table: users
id | poster_name |
table: extra
id | product_id | extra_info | extra_data |
table: images
id | product_id | img_src |

The first method will usually be faster for reads, and the second one will help you maintain data integrity and usually will be faster for writes.
The transition from the later form to the former is called denormalization and is usually used in data warehouses, while operational ("live") databases usually prefer the later form (second method).

You have not finished asking the question. Method 2 has no WHERE, so it will deliver 10K rows, plus have to do 20K lookups into the other tables. That makes it the loser.
Since your real question is about performance, then let's discuss the WHERE clause. With that, we can optimize it so that the desired data tends to be in RAM.
Back to your question... JOIN is probably the 'right' way to do it. And it is not that much of a performance hit assuming you have the proper indexes. So provide SHOW CREATE TABLE (even if tentative) and complete WHERE clauses.
Don't over-normalize. For example, do not normalize datetime or any other 'continuous' values.
Normalization can save space, especially in huge tables (eg, millions or billions of rows, and large, frequently repeated, strings being normalized.) This is especially helpful when the table is too big to stay cached in RAM.

mySQL JOIN query beat down

Alright, I'm not the best with JOIN's and after all these years of working with mySQL you think I would be by now at the least minimally decent. Guess I've never really worked on anything superbly complex til now worth needing to join a table. So here I am, confused ever so slightly in need of a helpful example to get me on a roll, something that's pertinent to my actual data that I can make heads or tales of cause all the reading I'm doing online else where just gives me headaches for the moment. I think I might be stuck on the mythology of JOIN's being a hard thing to do, they don't seem like it but when ever I've tried I fail. So anyway I am working with PHP as my server side coding, and I believe MySQL 5.
So heres the construct to an extent.
I have table information and table connections.
Connections has: member_id, connection_id, active
Information has: firstname, lastname, gender, member_id
I should say the tables contain more data per table, but as I understand it I need write a query that I can use the member_id as the connector/foreign key. Where I can use both sides of the information. I need to know if active is 1, and then I need to know all of the columns above mentioned for information.
I tried
SELECT member_id,
connection_id,
active,
firstname,
lastname,
gender,
member_id
FROM connections, information
WHERE connection.member_id = information.member_id AND
connection.active = 1
and I've tried
SELECT * FROM connections, information
WHERE connection.member_id = information.member_id AND
connection.active = 1
With the first one I get member_id is ambitious which is understandable to a point i think cause of the matching columns between the two tables. Then the second query doesn't server me well as it only results with one person.
My Ultimate goal is to find all the connections for a specific member_id in the connections table, while gathering all the information about those connections from the information table using the connection_id as its the same thing as the member_id in the in the information table.
So in laymans terms if I am not making sense lets say I wanted to list out all my friends in from the DB. My connection table lets me know which people I am connected to where member_id is my id and connection_id is my friends member_id on another table. Hopefully that makes more sense. And this is where I am having trouble with my query and trying to write it correctly. If I could get a working sane sample of that I think I might be able to make better sense of JOIN's doesnt help that I also can't figure out what type of JOIN is best suited for my needs either I suppose.
EDIT....
Ok as per request from comment below. Sample data expected output from tables that look similar to this:
Connections Table
member_id, connection_id, active
1 | 2 | 1
1 | 3 | 1
1 | 4 | 1
1 | 5 | 1
2 | 1 | 1
2 | 5 | 1
3 | 1 | 1
Information Table
member_id, firstname, lastname, gender, ...other unimportant rows for this query
1 | Chris | Something | m | ....
2 | Tony | Something | m | ....
3 | Brandon | Something | m | ....
4 | Cassie | Something | f | ....
5 | Jeff | Something | m | ....
6 | John | Something | m | ....
now from the connections table I need to gather all the connection_id's associated with my member_id where active is 1 then from the information table gather everyone firstname, lastname, gender.. Now I know I can do this 2 step where I pool all the connection_id's from connections then run them through a loop of some sort and one by one get the resulting id's from the first query but to me that seems a bit obscure and process intensive which I want to avoid, which brings me here.. Currently from my original query posted to the many shared thus far trying to help my results are to the effect of
1 | Chris | Something | m | 2
1 | Chris | Something | m | 3
1 | Chris | Something | m | 4
1 | Chris | Something | m | 5
what I'd like to see returned is something like
2 | Tony | Something | m
3 | Brandon | Something | m
4 | Cassie | Something | f
5 | Jeff | Something | m
After looking at this I suppose I would also need to know the friendID column that matches my member_id as well but thats something to figure out after this initial hurdle

The error is in your WHERE Clause because instead of Connections you wrote it Connection which then the server generates the error.
...
WHERE connection.member_id = information.member_id AND
connection.active = 1
try this:
SELECT a.Member_ID,
a.FirstName,
a.LastName, a.gender,
b.Connection_ID
FROM Information a INNER JOIN Connections b
on a.Member_ID = b.Member_ID
WHERE b.`Active` = 1
UPDATE
if that's the case then you will most likely have a SELF JOIN
SELECT DISTINCT b.Connection_ID,
c.*
FROM Information a INNER JOIN Connections b ON
a.Member_ID = b.Member_ID
INNER JOIN Information c ON
b.Connection_ID = c.Member_ID
WHERE b.`Active` = 1
AND a.Member_ID = 'ID HERE' -- If you want to get for specific member

The error in the first is "ambiguous" not ambitious. Ambiguous means that the SQL engine doesn't know which column you are talking about, because you have two columns with the same name (different tables but still).
You can fix that by specifying the table name along with the column name where you list the columns for select.
For example
SELECT connections.member_id, connection_id, active, firstname, lastname, gender, information.member_id FROM connections, information WHERE connections.member_id = information.member_id AND connections.active = 1
The problem with returning only one suggests that there is only one record that matches your query but it's hard to guess.

Try changing to:
SELECT * FROM connections c JOIN information i ON i.member_id = c.member_id WHERE c.active = 1

Try this:
SELECT *
FROM information
LEFT OUTER JOIN connections
ON information.member_id = connections.member_id
WHERE connections.active = 1;

You're joining on the wrong tables. You said:
where member_id is my id and connection_id is my friends member_id on another table
Then, what you have to match is connections.connection_id with information.member_id. The simplest solution is:
select c.member_id, c.connection_id, c.active from connections c
join information i on c.connection_id = i.member_id
where c.active = 1 and c.member_id = #yourId
That's all :)

Nested or Joins query for MySQL using PHP

I have many a times tried using nested query for MySQL in PHP, but it does not work. Is it not possible to do nested/Joins queries?
Just a Scenario:
I have two tables one table with user id and the other with data. User logins and with sessions I have to cross check two different tables with user id (user and data). Is it not possible to nest/join these two tables to write a single query statement.
In short is nesting or joining two or more tables permitted in PHP coding?

YES, it is possible to join two or more tables in MySQL (and therefore, also when using PHP).
You need to post your table schema, if you want us to show a relevant join query. You could, however, try something like:
SELECT * FROM user AS t1
CROSS JOIN data AS t2
ON t1.userid=t2.userid
WHERE t1.userid='154'
(This query presumes that there always will be one row with the userid in both tables. You should use LEFT JOIN instead of CROSS JOIN to return a row even if there is no row in data for the userid. 154 is just an example userid.)
Have a look at http://dev.mysql.com/doc/refman/5.5/en/join.html for information on the JOIN syntax.

users
| user_id | username | password | enabled |
|---------|----------|----------|---------|
| 1 | john | sgsd2gg | 1 |
| 2 | jane | sdshdhd | 0 |
users_data
|udata_id| user_id | some_column |
|--------|---------|-------------------|
| 1 | 1 | Some title |
| 2 | 2 | another title |
Since you haven't posted your table schema, I can't give you an exact solution. But supposing you have a users table and a users_data table, where users_data are owned by a user. You can do a join on the table to retrieve all the data.
SELECT * -- Don't select all fields unless you need it
FROM users U LEFT JOIN users_data UD ON U.user_id = UD.user_id
WHERE U.user_id = 1
This would pull all the records for user with an ID of 1. This is a very simplistic join, but it should give you an idea.
Here's an example that visually describes the different options you can use : SQL Join Differences

mysql select query problem

i have a form that has a multiple select drop down. a user can select more than one options in the select. the name of the select is array[]; using php I call implode(",",$array)
in mysql db, it stores the field as a text in this format "places"= "new york, toronto, london" when i want to display these fields i explode the commas.
I am trying to run a report to display the places. here is my select:
"select * from mytable where db.places .. userSelectedPlaces"
how can i check toronto in lists of "places" that user selected? note "places" in the db might be either just "toronto" or it might be comma separated lists of places like "ny, toronto, london, paris, etc".

If it is possible, you would be much better off using another table to hold the places that the user has selected. Call it SelectedPlaces with columns:
mytable_id - To join back to the table in your query
place - EG: "Toronto"
Then you can run a simple query to figure out if Toronto has been selected:
SELECT *
FROM mytable m
INNER JOIN SelectedPlaces sp ON sp.mytable_id = m.id
WHERE sp.place = 'Toronto'

If I understand you correctly, your database design is just wrong. Try reading about it more. Generally, in good design you should not have lists of values as one field in database and you should introduce new table for it.
But if you want to do it this way, you can use strcmp function.

If i understood correctly, this should work:
WHERE DB.PLACES LIKE '%TORONTO%'
but as other users said, its not a nice thing to have denormalized tables.

To directly answer your question, your query needs to look something like this
SELECT *
FROM mytable
WHERE places LIKE( '%toronto%' )
But, be aware, that LIKE() is slow.
To indirectly answer your question, your database schema is all wrong. That is not the right way to do a M:N (many-to-many) relationship.
Imagine instead you had this
mytable place mytable_place
+------------+ +----------+----------+ +------------+----------+
| mytable_id | | place_id | name | | mytable_id | place_id |
+------------+ +----------+----------+ +------------+----------+
| 1 | | 1 | new york | | 1 | 1 |
| 2 | | 2 | toronto | | 1 | 2 |
| 3 | | 3 | london | | 1 | 3 |
+------------+ +----------+----------+ | 2 | 2 |
| 3 | 1 |
| 3 | 3 |
+------------+----------+
The table mytable_places is what's called a lookup table (or, xref/cross-reference table, or correlation table). Its only job is to keep track of which mytable records have which place records, and vice versa.
From this example we can see that The 1st mytable record has all 3 places, the 2nd has only toronto, and the 3rd has new york and london.
This opens you up too all sorts of queries that would be difficult, expensive, or impossible with your current design.
Want to know how many mytable records have toronto? No problem
SELECT COUNT(*)
FROM mytable_place x
LEFT JOIN place p
ON p.place_id = x.place_id
WHERE p.name = 'toronto';
How about the number of mytable records per place, sorted?
SELECT p.name
, COUNT(*) as `count`
FROM mytable_place x
LEFT JOIN place p
ON p.place_id = x.place_id
GROUP BY p.place_id
ORDER BY `count` DESC, p.name ASC
And these are going to be much faster than any query using LIKE since they can use indexes on columns such as place.name.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.