I am trying to get the following query to return the right information, but it is returning a different number of values for fields from the same table.
My data schema has 3 tables:
airlines
id|descript|userid
travelers
id|airid|ftrav|ltrav|active
travair
id|travid|airid
Here is the query:
SELECT a.id as `aid`,a.descript,userid,
group_concat(t.id) as `tids`,
group_concat(t.ftrav) as `ftravs`,
group_concat(IFNULL(t.ltrav,'')) as `ltravs`,
group_concat(t.active) as `tactives`,
ta.airid
FROM airlines `a`
LEFT JOIN travair `ta`
ON a.id = ta.airid
LEFT JOIN travelers `t`
ON ta.travid = t.id
WHERE a.userid='$userid'
GROUP BY a.id
Basically, I am trying to query the airlines table to get airlines but also pull the travelers for each of the airlines by way of the ta table which joins the two.
However, the group_concat fields all have different numbers of values in them. In the actual table, I have largely eliminated missing values, so that would not account for the differences in the number of elements. There seems to be something wrong with the query.
Can anyone spot my error? I have been struggling with this for a couple of days.
In general, the problem with aggregation and joins is that the joins produce Cartesian products for matching keys. In English, this means that you are joining along multiple dimensions and getting all combinations of different items for the same user.
What can you do? The quick-and-dirty solution is to use the distinct keyword:
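SELECT a.id as `aid`,a.descript,userid,
group_concat(distinct t.id) as `tids`,
group_concat(distinct t.ftrav) as `ftravs`,
group_concat(distinct coalesce(t.ltrav, '')) as `ltravs`,
group_concat(distinct t.active) as `tactives`
FROM airlines `a`
LEFT JOIN travair `ta` ON a.id = ta.airid
LEFT JOIN travelers `t` ON ta.travid = t.id
WHERE a.userid='$userid'
GROUP BY a.id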
A more scalable solution is to pre-aggregate each dimension separately, so that each list is built before the joins can multiply the rows.
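A minimal sketch of pre-aggregation, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also has GROUP_CONCAT, but it does not guarantee the order of the concatenated values). Table and column names follow the question; the sample rows are invented. The traveler lists are built in a derived table with exactly one row per airline, and only that row is joined to airlines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airlines  (id INTEGER PRIMARY KEY, descript TEXT, userid TEXT);
    CREATE TABLE travelers (id INTEGER PRIMARY KEY, airid INTEGER, ftrav TEXT,
                            ltrav TEXT, active INTEGER);
    CREATE TABLE travair   (id INTEGER PRIMARY KEY, travid INTEGER, airid INTEGER);
    INSERT INTO airlines  VALUES (1, 'Air One', 'u1'), (2, 'Air Two', 'u1');
    INSERT INTO travelers VALUES (10, 1, 'Ann', 'Lee', 1), (11, 1, 'Bob', NULL, 0);
    INSERT INTO travair   VALUES (100, 10, 1), (101, 11, 1);
""")

# Aggregate the traveler side first (one row per airid), so every list is
# built from the same set of rows; then join that single row to airlines.
PRE_AGGREGATED = """
    SELECT a.id AS aid, a.descript, a.userid, tl.tids, tl.ftravs, tl.ltravs
    FROM airlines a
    LEFT JOIN (
        SELECT ta.airid,
               GROUP_CONCAT(t.id)                AS tids,
               GROUP_CONCAT(IFNULL(t.ftrav, '')) AS ftravs,
               GROUP_CONCAT(IFNULL(t.ltrav, '')) AS ltravs
        FROM travair ta
        JOIN travelers t ON t.id = ta.travid
        GROUP BY ta.airid
    ) tl ON tl.airid = a.id
    WHERE a.userid = ?
    ORDER BY a.id
"""
rows = conn.execute(PRE_AGGREGATED, ("u1",)).fetchall()
for row in rows:
    print(row)
```

With more than one child dimension, each would get its own derived table, joined to airlines the same way.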
Note: distinct may not work in your case if you need all lists to be the same length, because each list is de-duplicated independently.
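A small sketch of that caveat, again using sqlite3 as a stand-in for MySQL and only the travelers table (a simplification of the question's schema, with invented rows). Two different travelers share a first name and an active flag, so the DISTINCT lists collapse to different lengths.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE travelers (id INTEGER PRIMARY KEY, airid INTEGER, ftrav TEXT, active INTEGER);
    -- Two different travelers on airline 1 share a first name and an active flag.
    INSERT INTO travelers VALUES (10, 1, 'Ann', 1), (11, 1, 'Ann', 1), (12, 1, 'Bob', 0);
""")

tids, ftravs, tactives = conn.execute("""
    SELECT GROUP_CONCAT(DISTINCT id),
           GROUP_CONCAT(DISTINCT ftrav),
           GROUP_CONCAT(DISTINCT active)
    FROM travelers
    WHERE airid = 1
""").fetchone()

# DISTINCT de-duplicates each list on its own, so the lists no longer line up.
print(len(tids.split(",")), len(ftravs.split(",")), len(tactives.split(",")))  # 3 2 2
```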
GROUP_CONCAT, like most aggregation functions, ignores NULL values. Since you are using left joins, any GROUP_CONCATs on fields from tables on the right side of those joins may have null values for some of the pre-aggregation result rows.
Edit: If you want to synthesize "results" for the lacking data, you can aggregate calculated values instead; you've actually already done so once with this bit...
group_concat(IFNULL(t.ltrav,'')) as `ltravs`
...you can take it a bit further (making the lack of data more obvious) with something like this:
GROUP_CONCAT(IFNULL(theField, '[Not Recorded]')) AS theList
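A minimal sketch of both points, using sqlite3 as a stand-in for MySQL (both skip NULLs in GROUP_CONCAT) and a cut-down travelers table with invented rows. The plain list drops the NULL, while the IFNULL version keeps one slot per row and marks the gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE travelers (id INTEGER PRIMARY KEY, airid INTEGER, ltrav TEXT);
    INSERT INTO travelers VALUES (10, 1, 'Lee'), (11, 1, NULL), (12, 1, 'Kim');
""")

plain, marked = conn.execute("""
    SELECT GROUP_CONCAT(ltrav),
           GROUP_CONCAT(IFNULL(ltrav, '[Not Recorded]'))
    FROM travelers
    WHERE airid = 1
""").fetchone()

# The plain list silently drops the NULL; the IFNULL list keeps one slot per row.
print(len(plain.split(",")), len(marked.split(",")))  # 2 3
```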
Related
I have a SQL SELECT statement in which I'm using 3 tables.
I'm using INNER JOINs to join the tables, but I've run into an issue because the two columns I'd like the join condition to be based on are different data types:
One is an integer - the id of the products table and can be seen below as p.id.
The other is a comma-delimited string of these IDs in the orders table. Customers can order more than one product at a time, so the product IDs are stored as a comma-delimited list.
Here's how far I've gotten with the SQL:
"SELECT o.transaction_id, o.payment_status, o.payment_amount, o.product_id, o.currency, o.payment_method, o.payment_time, u.first_name, u.last_name, u.email, p.title, p.description, p.price
FROM orders AS o
INNER JOIN products AS p ON ( NEED HELP HERE--> p.id IN o.product_id comma delimited list)
INNER JOIN users AS u ON ( o.user_id = u.id )
WHERE user_id = '39'
ORDER BY payment_time DESC
LIMIT 1";
Perhaps I could use REGEX? Currently the comma-delimited list reads as '2,1,3', but the number of characters isn't limited, so I need a condition that checks whether my product id (p.id) is in the o.product_id list.
What you have is a perfect example of a many-to-many relationship: one order has several items attached to it, and the same product can appear in many orders. You should have a link table like
order_product, which connects an orderid to a productid and is also where you can put data specific to that relationship (like when the item was added, quantity, etc.).
Then you make the join using this table, and you have the same field types everywhere.
Simple example:
select
p.* /* the products in the order */
from
`order` o, -- `order` is a reserved word in MySQL and must be quoted
order_product op,
product p
where
o.id = 20
and o.id = op.orderid
and op.productid = p.id
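A runnable sketch of the link-table design, using sqlite3 as a stand-in for MySQL. It uses the question's table names (orders, products) rather than the answer's order/product; the quantity column and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, price REAL);
    -- One row per (order, product) pair instead of a '2,1,3' string.
    CREATE TABLE order_product (
        orderid   INTEGER REFERENCES orders(id),
        productid INTEGER REFERENCES products(id),
        quantity  INTEGER,
        PRIMARY KEY (orderid, productid)
    );
    INSERT INTO orders   VALUES (20, 39);
    INSERT INTO products VALUES (1, 'Pen', 1.5), (2, 'Ink', 4.0), (3, 'Pad', 2.25);
    INSERT INTO order_product VALUES (20, 2, 1), (20, 1, 3), (20, 3, 1);
""")

# Both join conditions now compare integers with integers.
rows = conn.execute("""
    SELECT p.id, p.title, op.quantity
    FROM orders o
    JOIN order_product op ON op.orderid = o.id
    JOIN products p       ON p.id = op.productid
    WHERE o.id = ?
    ORDER BY p.id
""", (20,)).fetchall()
print(rows)  # [(1, 'Pen', 3), (2, 'Ink', 1), (3, 'Pad', 1)]
```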
This is one of those very common nightmares when working with a legacy database.
The rule is simple: never, ever store multiple values in a single column. This rule is known as first normal form.
But how do you deal with that in an existing DB?
The good thing™
If you have the opportunity to refactor your DB, extract the "comma separated values" into their own table. See http://sqlfiddle.com/#!2/0f547/1 for a basic example of how to do that.
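One possible one-off migration, sketched with sqlite3 as a stand-in for MySQL (not necessarily what the linked fiddle does): read each legacy CSV value, split it in application code, and insert one link row per ID. The table layout and the helper name migrate_csv_column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id TEXT);  -- legacy CSV column
    CREATE TABLE order_product (orderid INTEGER, productid INTEGER,
                                PRIMARY KEY (orderid, productid));
    INSERT INTO orders VALUES (1, '2,1,3'), (2, '5'), (3, '');
""")

def migrate_csv_column(conn):
    """Hypothetical helper: copy every id in orders.product_id into order_product."""
    pairs = []
    for order_id, csv_value in conn.execute("SELECT id, product_id FROM orders"):
        for part in (csv_value or "").split(","):
            part = part.strip()
            if part:  # skip empty strings left by '' or trailing commas
                pairs.append((order_id, int(part)))
    conn.executemany("INSERT INTO order_product VALUES (?, ?)", pairs)
    conn.commit()

migrate_csv_column(conn)
print(conn.execute("SELECT * FROM order_product ORDER BY orderid, productid").fetchall())
# [(1, 1), (1, 2), (1, 3), (2, 5)]
```

After the migration, the legacy column can be dropped once nothing reads it.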
Then to query the tables you will have to use a JOIN as explained in elanoism's answer.
The bad thing™
If you can't or don't want to do that, you will probably have to rely on the FIND_IN_SET function.
SELECT * FROM bad WHERE FIND_IN_SET(target_value, comma_separated_values) > 0;
See http://sqlfiddle.com/#!2/29eba/2
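SQLite has no FIND_IN_SET, so this sketch registers a Python approximation of MySQL's semantics (1-based position, 0 if absent or the list is empty, NULL if either argument is NULL) and then uses it as the join condition from the question. The function body is an emulation, not MySQL's implementation; rows are invented.

```python
import sqlite3

def find_in_set(needle, haystack):
    """Approximation of MySQL FIND_IN_SET: 1-based position, 0 if absent, NULL on NULL."""
    if needle is None or haystack is None:
        return None
    items = str(haystack).split(",") if str(haystack) else []
    needle = str(needle)
    return items.index(needle) + 1 if needle in items else 0

conn = sqlite3.connect(":memory:")
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
    CREATE TABLE orders   (transaction_id TEXT, user_id INTEGER, product_id TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    INSERT INTO orders   VALUES ('T1', 39, '2,1,3');
    INSERT INTO products VALUES (1, 'Pen'), (2, 'Ink'), (3, 'Pad'), (4, 'Cap');
""")

# The join works, but the condition is evaluated row by row: no index can help.
rows = conn.execute("""
    SELECT o.transaction_id, p.id, p.title
    FROM orders o
    JOIN products p ON FIND_IN_SET(p.id, o.product_id) > 0
    WHERE o.user_id = 39
    ORDER BY p.id
""").fetchall()
print(rows)  # [('T1', 1, 'Pen'), ('T1', 2, 'Ink'), ('T1', 3, 'Pad')]
```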
BTW, why is this the bad thing™? Because, as you can see, it is not easy to write queries against multi-valued columns. Probably more important, you cannot use an index on such columns, nor, as a consequence, easily perform join operations or enforce referential integrity.
The so-so thing™
As a final note, if the set of possible values is small (at most 64 members), an alternative approach would be to change the column type to SET().
My database has a challenges table where there are these columns: Challenge_Name, Challenge_Description. I have a 2nd table called completed_challenges_junction and it has these columns: Member_Name, Challenge_Name.
I need a way to display all of the challenge names from the challenges table along with the member names within the completed_challenges_junction. If there is no match then I would like it to display NULL.
I think I'm pretty close to having my SQL code working, here is what I have now.
SELECT challenges.Challenge_Name, challenges.Challenge_Description, completed_challenges_junction.Member_Names
FROM challenges
LEFT JOIN completed_challenges_junction ON challenges.Challenge_Name=completed_challenges_junction.Challenge_Name
This works, but it also brings back duplicate entries for other members. If I use WHERE Member_Name='testmember', it only returns that member's entries, but I need it to still display all Challenge_Names.
SELECT
A.Challenge_Name,A.Challenge_Description,
-- no IFNULL, so challenges that nobody has completed show NULL
GROUP_CONCAT(B.Member_Names) Member_Names
FROM
challenges A
LEFT JOIN completed_challenges_junction B ON
A.Challenge_Name=B.Challenge_Name
GROUP BY
A.Challenge_Name,A.Challenge_Description
;
or
SELECT
A.Challenge_Name,A.Challenge_Description,
GROUP_CONCAT(B.Member_Names) Member_Names
FROM
challenges A
LEFT JOIN completed_challenges_junction B
USING (Challenge_Name)
GROUP BY
A.Challenge_Name,A.Challenge_Description
;
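A runnable sketch of the grouped LEFT JOIN, using sqlite3 as a stand-in for MySQL. Column names follow the question's description of the tables (Member_Name); the rows are invented. Without IFNULL, a challenge nobody has completed comes back as NULL (None in Python), which is what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE challenges (Challenge_Name TEXT PRIMARY KEY, Challenge_Description TEXT);
    CREATE TABLE completed_challenges_junction (Member_Name TEXT, Challenge_Name TEXT);
    INSERT INTO challenges VALUES ('Run', '5 km'), ('Swim', '1 km'), ('Climb', 'Wall');
    INSERT INTO completed_challenges_junction VALUES
        ('testmember', 'Run'), ('other', 'Run'), ('other', 'Swim');
""")

# One row per challenge; the members who completed it are folded into one string.
rows = conn.execute("""
    SELECT A.Challenge_Name, GROUP_CONCAT(B.Member_Name) AS Member_Names
    FROM challenges A
    LEFT JOIN completed_challenges_junction B ON A.Challenge_Name = B.Challenge_Name
    GROUP BY A.Challenge_Name
    ORDER BY A.Challenge_Name
""").fetchall()
for row in rows:
    print(row)
```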
You could move the WHERE condition into the ON clause, like this:
SELECT challenges.Challenge_Name, challenges.Challenge_Description, completed_challenges_junction.Member_Names
FROM challenges
LEFT JOIN completed_challenges_junction ON (
challenges.Challenge_Name=completed_challenges_junction.Challenge_Name
AND Member_Name='testmember'
)
This way, only testmember's entries will come from completed_challenges_junction, but all challenges will be displayed.
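A side-by-side sketch of why the placement matters, using sqlite3 as a stand-in for MySQL and invented rows. Filtering in ON keeps every challenge (unmatched ones get NULL), while the same filter in WHERE discards the NULL rows produced by the LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE challenges (Challenge_Name TEXT PRIMARY KEY, Challenge_Description TEXT);
    CREATE TABLE completed_challenges_junction (Member_Name TEXT, Challenge_Name TEXT);
    INSERT INTO challenges VALUES ('Run', '5 km'), ('Swim', '1 km'), ('Climb', 'Wall');
    INSERT INTO completed_challenges_junction VALUES
        ('testmember', 'Run'), ('other', 'Run'), ('other', 'Swim');
""")

ON_FILTER = """
    SELECT c.Challenge_Name, j.Member_Name
    FROM challenges c
    LEFT JOIN completed_challenges_junction j
        ON c.Challenge_Name = j.Challenge_Name AND j.Member_Name = 'testmember'
    ORDER BY c.Challenge_Name
"""
WHERE_FILTER = """
    SELECT c.Challenge_Name, j.Member_Name
    FROM challenges c
    LEFT JOIN completed_challenges_junction j
        ON c.Challenge_Name = j.Challenge_Name
    WHERE j.Member_Name = 'testmember'
    ORDER BY c.Challenge_Name
"""
on_rows = conn.execute(ON_FILTER).fetchall()
where_rows = conn.execute(WHERE_FILTER).fetchall()
print(on_rows)     # [('Climb', None), ('Run', 'testmember'), ('Swim', None)]
print(where_rows)  # [('Run', 'testmember')]
```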
I'm not sure exactly what you want, but maybe you are looking for the GROUP BY clause and the GROUP_CONCAT function:
SELECT challenges.Challenge_Name, challenges.Challenge_Description,
GROUP_CONCAT(completed_challenges_junction.Member_Names SEPARATOR ', ')
FROM challenges
LEFT JOIN completed_challenges_junction
ON challenges.Challenge_Name=completed_challenges_junction.Challenge_Name
GROUP BY challenges.Challenge_Name
This query will return result where each Challenge_Name occurs only once.
I've got a large MySQL query with 5 joins, which may not seem efficient, but I'm struggling to find a different solution that would work.
The views table is the main table here, because both the clicks and conversions tables rely on it via the token column (which is indexed and set as a foreign key in all tables).
The query:
SELECT
var.id,
var.disabled,
var.name,
var.updated,
var.cid,
var.outdated,
IF(var.type <> 0,'DL','LP') AS `type`,
COUNT(DISTINCT v.id) AS `views`,
COUNT(DISTINCT c.id) AS `clicks`,
COUNT(DISTINCT co.id) AS `conversions`,
SUM(tc.cost) AS `cost`,
SUM(cp.value) AS `revenue`
FROM variants AS var
LEFT JOIN views AS v ON v.vid = var.id
LEFT JOIN traffic_cost AS tc ON tc.id = v.source
LEFT JOIN clicks AS c ON c.token = v.token
LEFT JOIN conversions AS co ON co.token = v.token
LEFT JOIN c_profiles AS cp ON cp.id = co.profile
WHERE var.cid = 28
GROUP BY var.id
The results I'm getting are:
The problem is that the revenue and cost results are too high: for views, clicks and impressions only the distinct rows are counted, but for revenue and cost, for some reason (I would really appreciate an explanation here), all rows in all tables are taken into the result set.
I know this is a large query, but both the clicks and conversions tables rely on the views table, which is used for filtering the results, e.g. views.country = 'uk'. I've tried doing 3 queries and merging them, but that didn't work (it gave me wrong results).
One more thing that I find weird is that if I remove the joins with clicks, conversions, c_profiles the costs column shows correct results.
Any help would be appreciated.
In the end I had to use 3 different queries and do a merge on them. Seemed like an overhead, but worked for me.
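A sketch of the explanation asked for, plus a single-query alternative to merging three result sets, using sqlite3 as a stand-in for MySQL. Assumptions, all labeled as simplifications: cost sits directly on views (instead of via traffic_cost), revenue on conversions (instead of via c_profiles), each token belongs to one view, and the rows are invented. One view with 3 clicks and 2 conversions yields 6 joined rows; COUNT(DISTINCT ...) hides that, but SUM adds every copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants    (id INTEGER PRIMARY KEY);
    CREATE TABLE views       (id INTEGER PRIMARY KEY, vid INTEGER, token TEXT, cost REAL);
    CREATE TABLE clicks      (id INTEGER PRIMARY KEY, token TEXT);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, token TEXT, value REAL);
    INSERT INTO variants    VALUES (1);
    INSERT INTO views       VALUES (1, 1, 'a', 2.0);  -- one view costing 2.0
    INSERT INTO clicks      VALUES (1, 'a'), (2, 'a'), (3, 'a');
    INSERT INTO conversions VALUES (1, 'a', 5.0), (2, 'a', 5.0);
""")

# Original shape: 1 view x 3 clicks x 2 conversions = 6 joined rows.
naive = conn.execute("""
    SELECT var.id,
           COUNT(DISTINCT v.id), COUNT(DISTINCT c.id), COUNT(DISTINCT co.id),
           SUM(v.cost), SUM(co.value)
    FROM variants AS var
    LEFT JOIN views AS v        ON v.vid = var.id
    LEFT JOIN clicks AS c       ON c.token = v.token
    LEFT JOIN conversions AS co ON co.token = v.token
    WHERE var.id = 1
    GROUP BY var.id
""").fetchone()

# Pre-aggregated shape: collapse clicks and conversions to one row per token
# first, so each view joins to at most one row from each side.
fixed = conn.execute("""
    SELECT var.id,
           COUNT(v.id),
           COALESCE(SUM(ck.n), 0),
           COALESCE(SUM(cv.n), 0),
           SUM(v.cost),
           COALESCE(SUM(cv.revenue), 0)
    FROM variants AS var
    LEFT JOIN views AS v ON v.vid = var.id
    LEFT JOIN (SELECT token, COUNT(*) AS n FROM clicks GROUP BY token) AS ck
           ON ck.token = v.token
    LEFT JOIN (SELECT token, COUNT(*) AS n, SUM(value) AS revenue
               FROM conversions GROUP BY token) AS cv
           ON cv.token = v.token
    WHERE var.id = 1
    GROUP BY var.id
""").fetchone()

print(naive)  # (1, 1, 3, 2, 12.0, 30.0)
print(fixed)  # (1, 1, 3, 2, 2.0, 10.0)
```

A filter on views, such as a country column, would still go in the outer WHERE clause.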
I hate to submit a new question, but everyone else has some slight thing that is different enough to make this one seem necessary to ask.
Users are to type in a vendor name, and then see all the "kinds" of things they have bought from that company, in a list, sorted by the lowest-inventory-on-hand.
Summary:
I have three tables.
There are more fields than these, but these are the relevant ones (as far as I can tell).
stuff_table
stuff_vendor_name *(search this field with $user_input, but only one result per lookup_type)*
lookup_type
lookup_table
lookup_type
lookup_quantity (order by this)
category_type
category_table
category_type
category_location (check if this field == $this_location, which is already assigned)
Wordier Explanation:
The users are searching for a value that is contained only in the stuff_table: distinct stuff_vendor_name values for each lookup_type. Each item can be bought from multiple sources; the idea is to see whether a vendor has ever sold even one item of any type before.
But the results need to be ordered by lookup_quantity, which is in the lookup_table.
And importantly, I have to check to see if they are searching the correct location for these categories, located in the category_table in the category_location field.
How do I efficiently make this query?
Above, I mentioned the variables that I have:
$user_input (the value we are searching for distinct matches in the stuff_vendor_name field) and $current_location.
To understand the relationship of these tables, I will use an example.
The stuff_table would have dozens of entries with dozens of vendors, but have a lookup_type of, say, "watermelon," "apple," or "cherry."
The lookup_table would give the category_type of "Jellybean." One category type can have multiple lookup_types. But each lookup_type has exactly one category_type.
You are not sharing much about the relationships, but try this:
SELECT *
FROM stuff_table st
LEFT JOIN lookup_table lt
ON st.lookup_type = lt.lookup_type
LEFT JOIN category_table ct
ON lt.category_type = ct.category_type
AND ct.category_location = '$this_location'
WHERE st.stuff_vendor_name = '$user_input'
GROUP BY st.lookup_type
ORDER BY lt.lookup_quantity
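A runnable variation, using sqlite3 as a stand-in for MySQL, with the relationships taken from the question's example and invented rows. One deliberate change: the joins are INNER JOINs and the location test is in WHERE, because with a LEFT JOIN and the test in ON, rows from other locations are still returned (just with NULL category columns). The helper name vendor_types is hypothetical; values are bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stuff_table    (stuff_vendor_name TEXT, lookup_type TEXT);
    CREATE TABLE lookup_table   (lookup_type TEXT PRIMARY KEY, lookup_quantity INTEGER,
                                 category_type TEXT);
    CREATE TABLE category_table (category_type TEXT, category_location TEXT);
    INSERT INTO stuff_table VALUES ('Acme', 'watermelon'), ('Acme', 'watermelon'),
                                   ('Acme', 'apple'), ('Acme', 'cherry'), ('Other', 'apple');
    INSERT INTO lookup_table VALUES ('watermelon', 5, 'Jellybean'), ('apple', 2, 'Jellybean'),
                                    ('cherry', 1, 'Gum');
    INSERT INTO category_table VALUES ('Jellybean', 'Store A'), ('Gum', 'Store B');
""")

def vendor_types(conn, user_input, this_location):
    """One row per lookup_type the vendor has sold here, lowest quantity first."""
    return conn.execute("""
        SELECT lt.lookup_type, lt.lookup_quantity
        FROM stuff_table st
        JOIN lookup_table lt   ON lt.lookup_type = st.lookup_type
        JOIN category_table ct ON ct.category_type = lt.category_type
        WHERE st.stuff_vendor_name = ?
          AND ct.category_location = ?
        GROUP BY lt.lookup_type, lt.lookup_quantity
        ORDER BY lt.lookup_quantity
    """, (user_input, this_location)).fetchall()

print(vendor_types(conn, "Acme", "Store A"))  # [('apple', 2), ('watermelon', 5)]
```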
At first glance, you could use foreign keys in your tables to link them, or use a LEFT JOIN in MySQL to pull in the columns of each linked table.
The only example I can think of uses Doctrine, but I think you'll get what I'm saying:
$q = Doctrine_Query::create()
->from('Default_Model_DbTable_StuffTable s')
->leftJoin('s.LookupTable l')
->leftJoin('l.CategoryTable c')
->orderBy('l.lookup_quantity ASC');
$stuff= $q->execute(array(), Doctrine_Core::HYDRATE_ARRAY);
I made a nested query instead.
The final code looks like this:
$query_row=mysql_query(
"SELECT DISTINCT * FROM table_a WHERE
field_1 IN (SELECT field_1 FROM table_b WHERE field_2 = $field_2)
AND field_3 IN (SELECT field_3 FROM table_c WHERE field_4 = $field_4)
ORDER BY field_5 DESC
");
This was incredibly simple. I just didn't know you could do a nested query like that.
I read that it was "bad form" because it can keep MySQL from optimizing the search as well as it could, so be careful using nested SELECT statements.
However, for me it actually seemed significantly faster.
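The same nested IN (SELECT ...) shape, sketched with sqlite3 as a stand-in for MySQL and with the two values bound as parameters rather than pasted into the SQL string. The generic table_a/table_b/table_c names come from the post; their columns beyond the ones it mentions, the rows, and the helper name fetch_matches are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, field_1 INTEGER, field_3 INTEGER,
                          field_5 INTEGER);
    CREATE TABLE table_b (field_1 INTEGER, field_2 TEXT);
    CREATE TABLE table_c (field_3 INTEGER, field_4 TEXT);
    INSERT INTO table_a VALUES (1, 10, 30, 7), (2, 10, 31, 9), (3, 11, 30, 8), (4, 10, 30, 12);
    INSERT INTO table_b VALUES (10, 'x'), (11, 'y');
    INSERT INTO table_c VALUES (30, 'p'), (31, 'q');
""")

def fetch_matches(conn, field_2, field_4):
    # Same shape as the asker's query; each subquery yields the allowed keys.
    return conn.execute("""
        SELECT DISTINCT * FROM table_a
        WHERE field_1 IN (SELECT field_1 FROM table_b WHERE field_2 = ?)
          AND field_3 IN (SELECT field_3 FROM table_c WHERE field_4 = ?)
        ORDER BY field_5 DESC
    """, (field_2, field_4)).fetchall()

print(fetch_matches(conn, "x", "p"))  # [(4, 10, 30, 12), (1, 10, 30, 7)]
```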
This is the query. I'm mostly interested in whether there is a better way to grab the data I use GROUP_CONCAT for, or if that's a fairly good way of grabbing it. I then explode it, put the ids/names into an array, and use a for loop to echo them out.
SELECT
mov_id,
mov_title,
GROUP_CONCAT(DISTINCT categories.cat_name) as all_genres,
GROUP_CONCAT(DISTINCT cat_id) as all_genres_ids,
GROUP_CONCAT(DISTINCT case when gen_dominant = 1 then gen_catid else 0 end) as dominant_genre_ids,
GROUP_CONCAT(DISTINCT actors.act_name) as all_actors,
GROUP_CONCAT(DISTINCT actors.act_id) as all_actor_ids,
mov_desc,
mov_added,
mov_thumb,
mov_hits,
mov_numvotes,
mov_totalvote,
mov_imdb,
mov_release,
mov_html,
mov_type,
mov_buytickets,
ep_summary,
ep_airdate,
ep_id,
ep_hits,
ep_totalNs,
ep_totalRs,
mov_rating,
mov_rating_reason,
mrate_name,
dir_id,
dir_name
FROM movies
LEFT JOIN _genres
ON movies.mov_id = _genres.gen_movieid
LEFT JOIN categories
ON _genres.gen_catid = categories.cat_id
LEFT JOIN _actors
ON (movies.mov_id = _actors.ac_movid)
LEFT JOIN actors
ON (_actors.ac_actorid = actors.act_id AND act_famous = 1)
LEFT JOIN directors
ON movies.mov_director = directors.dir_id
LEFT JOIN movie_ratings
ON movies.mov_rating = movie_ratings.mrate_id
LEFT JOIN episodes
ON mov_id = ep_showid AND ep_season = 0 AND ep_num = 0
WHERE mov_id = *MOVIE_ID* AND mov_status = 1
GROUP BY mov_id
The EXPLAIN output for the query is here (image): http://www.krayvee.com/o2/explain.gif
Personally, I would try to break the query up into multiple queries. Mostly I would recommend removing the Actor and Genre Joins so that you can get rid of all those group_concat functions. Then do separate queries to pull this data out. Not sure if it would speed things up, but it's probably worth a shot.
You've basically done a Cartesian product between genres, actors, directors, movie_ratings and episodes. That's why you have to use DISTINCT inside your GROUP_CONCAT(), because the pre-grouped result set has a number of rows equal to the product of the number of matching rows in each related table.
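A small demonstration of that row multiplication, using sqlite3 as a stand-in for MySQL. Only the two link tables (_genres and _actors) are modeled, with invented rows: 3 genres and 2 actors give 6 pre-grouped rows, so a plain GROUP_CONCAT repeats each genre once per actor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies  (mov_id INTEGER PRIMARY KEY, mov_title TEXT);
    CREATE TABLE _genres (gen_movieid INTEGER, gen_catid INTEGER);
    CREATE TABLE _actors (ac_movid INTEGER, ac_actorid INTEGER);
    INSERT INTO movies  VALUES (1, 'Example');
    INSERT INTO _genres VALUES (1, 100), (1, 101), (1, 102);  -- 3 genres
    INSERT INTO _actors VALUES (1, 200), (1, 201);            -- 2 actors
""")

JOINED = """
    FROM movies m
    LEFT JOIN _genres g ON g.gen_movieid = m.mov_id
    LEFT JOIN _actors a ON a.ac_movid = m.mov_id
    WHERE m.mov_id = 1
"""
# Before grouping, every genre row is paired with every actor row: 3 x 2 = 6.
pre_group_rows = conn.execute("SELECT COUNT(*) " + JOINED).fetchone()[0]

# Without DISTINCT each genre id is repeated once per actor.
genres_plain, genres_distinct = conn.execute(
    "SELECT GROUP_CONCAT(g.gen_catid), GROUP_CONCAT(DISTINCT g.gen_catid) " + JOINED
).fetchone()

print(pre_group_rows)                   # 6
print(len(genres_plain.split(",")))     # 6
print(len(genres_distinct.split(",")))  # 3
```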
Note that this query wouldn't work at all in standard SQL; it only works because MySQL is permissive about the single-value rule.
Like @Kibbee, I usually recommend running separate queries in cases like this. A single query is not always better. Try breaking up the query and doing some profiling to be sure.
PS: What? No _directors table? So you can't represent a movie with more than one director? :-)
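A sketch of the separate-queries approach, using sqlite3 as a stand-in for MySQL and a cut-down version of the question's schema (only the movie title, genres and famous actors; mov_status and the other joins are omitted) with invented rows. Each one-to-many list comes back as its own result set, so there is no Cartesian product and nothing to explode. The helper name load_movie is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies     (mov_id INTEGER PRIMARY KEY, mov_title TEXT);
    CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT);
    CREATE TABLE _genres    (gen_movieid INTEGER, gen_catid INTEGER, gen_dominant INTEGER);
    CREATE TABLE actors     (act_id INTEGER PRIMARY KEY, act_name TEXT, act_famous INTEGER);
    CREATE TABLE _actors    (ac_movid INTEGER, ac_actorid INTEGER);
    INSERT INTO movies     VALUES (1, 'Example');
    INSERT INTO categories VALUES (100, 'Drama'), (101, 'Comedy');
    INSERT INTO _genres    VALUES (1, 100, 1), (1, 101, 0);
    INSERT INTO actors     VALUES (200, 'A. Actor', 1), (201, 'B. Actor', 0);
    INSERT INTO _actors    VALUES (1, 200), (1, 201);
""")

def load_movie(conn, movie_id):
    """Fetch the movie row, then each one-to-many list with its own query."""
    movie = conn.execute(
        "SELECT mov_id, mov_title FROM movies WHERE mov_id = ?", (movie_id,)
    ).fetchone()
    if movie is None:
        return None
    genres = conn.execute("""
        SELECT c.cat_id, c.cat_name, g.gen_dominant
        FROM _genres g JOIN categories c ON c.cat_id = g.gen_catid
        WHERE g.gen_movieid = ? ORDER BY c.cat_id
    """, (movie_id,)).fetchall()
    famous_actors = conn.execute("""
        SELECT a.act_id, a.act_name
        FROM _actors x JOIN actors a ON a.act_id = x.ac_actorid
        WHERE x.ac_movid = ? AND a.act_famous = 1 ORDER BY a.act_id
    """, (movie_id,)).fetchall()
    # Each list is already a list of tuples, so there is nothing to explode.
    return {"id": movie[0], "title": movie[1],
            "genres": genres, "actors": famous_actors}

movie = load_movie(conn, 1)
print(movie)
```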