Does ORDER BY apply before or after DISTINCT?

In a MySQL query, when using the DISTINCT option, does ORDER BY apply after the duplicates are removed? If not, is there any way to make it do so? I think it's causing some issues with my code.
EDIT:
Here's some more information about what's causing my problem. I understand that, at first glance, this order would not be important, since I am dealing with duplicate rows. However, this is not entirely the case, since I am using an INNER JOIN to sort the rows.
Say I have a table of forum threads, containing this data:
+----+--------+-------------+
| id | userid | title |
+----+--------+-------------+
| 1 | 1 | Information |
| 2 | 1 | FAQ |
| 3 | 2 | Support |
+----+--------+-------------+
I also have a set of posts in another table like this:
+----+----------+--------+---------+
| id | threadid | userid | content |
+----+----------+--------+---------+
| 1 | 1 | 1 | Lorem |
| 2 | 1 | 2 | Ipsum |
| 3 | 2 | 2 | Test |
| 4 | 3 | 1 | Foo |
| 5 | 2 | 3 | Bar |
| 6 | 3 | 5 | Bob |
| 7 | 1 | 2 | Joe |
+----+----------+--------+---------+
I am using the following MySQL query to get all threads, then sort them based on the latest post (assuming that posts with higher ids are more recent):
SELECT t.*
FROM Threads t
INNER JOIN Posts p ON t.id = p.threadid
ORDER BY p.id DESC
This works, and generates something like this:
+----+--------+-------------+
| id | userid | title |
+----+--------+-------------+
| 1 | 1 | Information |
| 3 | 2 | Support |
| 2 | 1 | FAQ |
| 3 | 2 | Support |
| 2 | 1 | FAQ |
| 1 | 1 | Information |
| 1 | 1 | Information |
+----+--------+-------------+
However, as you can see, the information is correct, but there are duplicate rows. I'd like to remove such duplicates, so I used SELECT DISTINCT instead. However, this yielded the following:
+----+--------+-------------+
| id | userid | title |
+----+--------+-------------+
| 3 | 2 | Support |
| 2 | 1 | FAQ |
| 1 | 1 | Information |
+----+--------+-------------+
This is obviously wrong, since the "Information" thread should be on top. It would seem that using DISTINCT causes the duplicates to be removed from the top to the bottom, so only the final rows are left. This causes some issues in the sorting.
Is this the case, or am I analyzing things incorrectly?

Two things to understand:
1. Generally speaking, resultsets are unordered unless you specify an ORDER BY clause; to the extent that you specify a non-strict order (i.e. ORDER BY over non-unique columns), the order in which records that are equal under that ordering appear within the resultset is undefined.
   I suspect you may be specifying such a non-strict order, which is the root of your problems: ensure that your ordering is strict by specifying ORDER BY over a set of columns that is sufficient to uniquely identify each record for which you care about its final position in the resultset.
2. DISTINCT may use GROUP BY, which causes the results to be ordered by the grouped columns; that is, SELECT DISTINCT a, b, c FROM t will produce a resultset that appears as though ORDER BY a, b, c has been applied. Again, specifying a sufficiently strict order to meet your needs will override this effect.
Following your update, bearing in mind my point #2 above, it is clear that the effect of grouping the results to achieve DISTINCT makes it impossible to then order by the non-grouped column p.id; instead, you want:
SELECT t.*
FROM Threads t INNER JOIN Posts p ON t.id = p.threadid
GROUP BY t.id
ORDER BY MAX(p.id) DESC
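To see the GROUP BY / MAX approach in action, here is a minimal sketch that loads the sample data from the question into an in-memory SQLite database (an assumption made only so the example is runnable; the question is about MySQL) and runs the same query, with the columns listed explicitly instead of t.*:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Threads (id INTEGER PRIMARY KEY, userid INTEGER, title TEXT);
CREATE TABLE Posts (id INTEGER PRIMARY KEY, threadid INTEGER, userid INTEGER, content TEXT);
INSERT INTO Threads VALUES (1, 1, 'Information'), (2, 1, 'FAQ'), (3, 2, 'Support');
INSERT INTO Posts VALUES
    (1, 1, 1, 'Lorem'), (2, 1, 2, 'Ipsum'), (3, 2, 2, 'Test'), (4, 3, 1, 'Foo'),
    (5, 2, 3, 'Bar'), (6, 3, 5, 'Bob'), (7, 1, 2, 'Joe');
""")

# One row per thread, sorted by the id of that thread's most recent post.
rows = conn.execute("""
    SELECT t.id, t.userid, t.title
    FROM Threads t INNER JOIN Posts p ON t.id = p.threadid
    GROUP BY t.id
    ORDER BY MAX(p.id) DESC
""").fetchall()

titles = [row[2] for row in rows]
print(titles)
```

Each thread appears once, and the sort key is the newest post per thread rather than an arbitrary post picked during de-duplication.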

DISTINCT tells MySQL how to build the rowset for you; ORDER BY gives a hint about how this rowset should be presented. So the answer is: DISTINCT first, ORDER BY last.

The order in which DISTINCT and ORDER BY are applied, in most cases, will not affect the final output.
However, if you also use GROUP BY, this will affect the final output. In this case, the ORDER BY is performed after the GROUP BY, which will return unexpected results (assuming you expect the sort to be performed before the grouping).

Related

MYSQL UPDATE from SELECT INNER JOIN statement

Hello :) I am fairly new to using INNER JOIN and still trying to comprehend its logic, which I think I am beginning to understand. After reading a few different articles on the topic, I wrote a query for finding duplicates in my table of phone numbers.
My table structure is as such:
+---------+-------+
| PhoneID | Phone |
+---------+-------+
Very simple. I created this query:
SELECT A.PhoneID, B.PhoneID FROM T_Phone A
INNER JOIN T_Phone B
ON A.Phone = B.Phone AND A.PhoneID < B.PhoneID
Which returns the ID of a phone that matches another one. I don't know how to word that properly so here is an example output:
+---------+---------+
| PhoneID | PhoneID |
+---------+---------+
| 17919 | 17969 |
| 17919 | 22206 |
| 17919 | 23837 |
| 17920 | 17970 |
| 17920 | 22203 |
| 17920 | 23834 |
| 17921 | 17971 |
| 17921 | 22225 |
| 17921 | 22465 |
| 17921 | 24011 |
| 17921 | 24047 |
| 17922 | 17972 |
| 17922 | 22198 |
| 17922 | 23879 |
| 17923 | 17973 |
| 17923 | 22199 |
| 17923 | 23880 |
+---------+---------+
Note that the IDs on the left repeat; the matching phone number is on the right (these are just the IDs of said numbers). What I am trying to accomplish is to change a join table based on the ID on the right. The join table structure is as such:
+----------+-----------+
| T_JoinID | T_PhoneID |
+----------+-----------+
Where T_JoinID is a larger object with a collection of those T_PhoneIDs, hence the join table. What I want to do is take a row from the original match query, and find the right side PhoneID in the join table, then update that item in the Join to be equal to the left side PhoneID. Repeating this for each row.
It's sort of a way to save space and get rid of matching numbers, I can just point the matching ones to the original and use that as a reference when I need to retrieve it.
After that I need to actually delete the original numbers that I reset the reference for but... This seems like a job for 2 or 3 different queries.
EDIT:
Sorry I know I didn't include enough detail. Here is some additional info:
My exact table structure is not the same as shown here, but I am only using the columns that I listed, so I didn't think the others would matter. Most of the tables have a unique auto-incremented ID. The phone table has carrier, type, etc. columns. I felt the additional columns were irrelevant to include, but if there is a solution that uses the auto-incremented ID of each table, let me know :) Anyway, I sort of found a solution using multiple queries, though I am still interested to learn and apply knowledge based on this question. So I have that join table that I mentioned. For the expected results it might look something like this. There is a before and an after table in one; sorry for the poor formatting.
Join table, before and after:
+----------+---------+----------+---------+
| Before | | After | |
| PersonID | PhoneID | PersonID | PhoneID |
+----------+---------+----------+---------+
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 2 |
| 1 | 3 | 1 | 3 |
| 2 | 4 | 2 | 1 |
| 2 | 5 | 2 | 5 |
| 2 | 6 | 2 | 6 |
| 3 | 7 | 3 | 5 |
| 3 | 8 | 3 | 5 |
| 3 | 9 | 3 | 5 |
| 3 | 10 | 3 | 8 |
| 3 | 11 | 3 | 9 |
+----------+---------+----------+---------+
So you can see that in the before columns, 7, 8, and 9 would all be duplicate phone numbers in the PhoneID - PhoneID relationship table I posted originally. After the query I wanted to retrieve the duplicates using the PhoneID - PhoneID comparison and take the ones that match, to change the join table in a way that I have shown directly above. So 7, 8, 9 all turn to 5. Because 5 is the original number, and 7, 8, 9 coincidentally were duplicates of 5. So I am basically pointing all of them to 5, and then deleting what would have been 7, 8, 9 in my Phone table since they all have a new relationship to 5. Is this making sense? xD It sounds outrageous typing it out.
End Edit
How can I improve my query to accomplish this task? Is it possible using an UPDATE statement? I was also considering just looping through this output and updating each row individually but I had a hope to just use a single query to save time and code. Typing it out makes me feel a tad obnoxious but I had hope there was a solution out there!
Thank you to anyone in advance for taking your time to help me out :) I really appreciate it. If it sounds outlandish, let me know I will just use multiple queries.
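One possible way to do this in two statements is sketched below: repoint every join row at the lowest PhoneID that shares the same number, then delete the phone rows that are no longer the lowest ID for their number. The sketch runs against an in-memory SQLite database so it is self-contained; the join table name T_Join and the sample rows are hypothetical, and in MySQL the same idea would more typically be written as multi-table UPDATE ... JOIN and DELETE ... JOIN statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_Phone (PhoneID INTEGER PRIMARY KEY, Phone TEXT);
-- T_Join is a hypothetical name for the join table described in the question.
CREATE TABLE T_Join (T_JoinID INTEGER, T_PhoneID INTEGER);
-- Made-up sample data: phones 3 and 4 duplicate phone 1.
INSERT INTO T_Phone VALUES (1, '555-0100'), (2, '555-0101'), (3, '555-0100'), (4, '555-0100');
INSERT INTO T_Join VALUES (10, 1), (10, 2), (11, 3), (12, 4);
""")

# Step 1: point each join row at the lowest PhoneID carrying the same number.
conn.execute("""
    UPDATE T_Join
    SET T_PhoneID = (
        SELECT MIN(keep.PhoneID)
        FROM T_Phone dup
        JOIN T_Phone keep ON keep.Phone = dup.Phone
        WHERE dup.PhoneID = T_Join.T_PhoneID
    )
    WHERE T_PhoneID IN (SELECT PhoneID FROM T_Phone)
""")

# Step 2: remove phone rows that are not the lowest ID for their number.
conn.execute("""
    DELETE FROM T_Phone
    WHERE PhoneID NOT IN (SELECT MIN(PhoneID) FROM T_Phone GROUP BY Phone)
""")

join_rows = conn.execute("SELECT T_JoinID, T_PhoneID FROM T_Join ORDER BY T_JoinID, T_PhoneID").fetchall()
phone_rows = conn.execute("SELECT PhoneID, Phone FROM T_Phone ORDER BY PhoneID").fetchall()
print(join_rows)
print(phone_rows)
```

Running the update before the delete matters: once the duplicates are gone, there is nothing left to map the old IDs back to the surviving one.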

Can SELECT, SELECT COUNT and cross reference tables be handled by just one query?

I have a page that displays a list of projects. With each project is displayed the following data retrieved from a mysqli database:
1. Title
2. Subtitle
3. Description
4. Part number (1 of x)
5. The total number of photos associated with that project
6. A randomly selected photo from the project
7. A list of tags
Projects are displayed 6 per page using a pagination system.
As this is based on an old project of mine, it was originally done with sloppy code (I was just learning and did not know any better) using many queries. Three, in fact, just for items 5-7, and those were contained within a while loop that worked with the pagination system. I'm now quite aware that this is not even close to being the right way to do business.
I am familiar with INNER JOIN and the use of subqueries, but I'm concerned that I may not be able to get all of this data using just one select query for the following reasons:
Items 1-4 are easy enough with a basic SELECT query, BUT...
Item 5 needs a SELECT COUNT AND...
Item 6 needs a basic SELECT query with an ORDER BY RAND() LIMIT 1 to select one random photo out of all those associated with each project (using FilesystemIterator is out of the question, because the photos table has a column indicating 0 if a photo is inactive and 1 if it is active)
Item 7 is selected from a cross reference table for the tags and projects and a table containing the tag ID and names
Given that, I'm not certain if all this can (or even should, for that matter) be done with just one query or if it will need more than one query. I have read repeatedly how it is worth a swat on the nose with a newspaper to nest one or more queries inside a while loop. I've even read that multiple queries are, in general, a bad idea.
So I'm stuck. I realize this is likely to sound too general, but I don't have any code that works, just the old code that uses 4 queries to do the job, 3 of which are nested in a while loop.
Database structure below.
Projects table:
+-------------+---------+----------+---------------+------+
| project_id | title | subtitle | description | part |
|---------------------------------------------------------|
| 1 | Chevy | Engine | Modify | 1 |
| 2 | Ford | Trans | Rebuild | 1 |
| 3 | Mopar | Diff | Swap | 1 |
+-------------+---------+----------+---------------+------+
Photos table:
+----------+------------+--------+
| photo_id | project_id | active |
|--------------------------------|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 2 | 1 |
| 5 | 2 | 1 |
| 6 | 2 | 1 |
| 7 | 3 | 1 |
| 8 | 3 | 1 |
| 9 | 3 | 1 |
+----------+------------+--------+
Tags table:
+--------+------------------+
| tag_id | tag |
|---------------------------|
| 1 | classic |
| 2 | new car |
| 3 | truck |
| 4 | performance |
| 5 | easy |
| 6 | difficult |
| 7 | hard |
| 8 | oem |
| 9 | aftermarket |
+--------+------------------+
Tag/Project cross-reference table:
+------------+-----------+
| project_id | tag_id |
|------------------------|
| 1 | 1 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 2 | 5 |
| 3 | 6 |
| 3 | 9 |
+------------+-----------+
I'm not asking for the code to be written for me, but if what I'm asking makes sense, I'd sincerely appreciate a shove in the right direction. Often times I struggle with both the PHP and MySQLi manuals online, so if there's any way to break this down, then fantastic.
Thank you all so much.
You're able to do subqueries inside your SELECT clause, like this:
SELECT
p.title, p.subtitle, p.description, p.part,
(SELECT COUNT(photo_id) FROM Photos where project_id = p.project_id) as total_photos,
(SELECT photo_id FROM Photos where project_id = p.project_id ORDER BY RAND() LIMIT 1) as random_photo
FROM projects as p
Now, for the list of tags: since it returns more than one row, you can't use a plain subquery, and you would run one query for every project. In fact you can do it in a subquery if you return all the tags as some kind of concatenation, like a comma-separated list: tag1,tag2,tag3... but I don't recommend this, since you will then need to explode the column value. Do it only if you have a great many projects and retrieving the list of tags for each individual project performs poorly. If you really want, you can:
SELECT
p.title, p.subtitle, p.description, p.part,
(SELECT COUNT(photo_id) FROM Photos where project_id = p.project_id) as total_photos,
(SELECT photo_id FROM Photos where project_id = p.project_id ORDER BY RAND() LIMIT 1) as random_photo,
(SELECT GROUP_CONCAT(tag SEPARATOR ', ') FROM tags WHERE tag_id in (SELECT tag_id FROM tagproject WHERE project_id = p.project_id)) as tags
FROM projects as p
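As a rough check of the single-query approach, the sketch below rebuilds the four tables from the question in an in-memory SQLite database and runs an equivalent query. Several things here are assumptions: SQLite is used only to make the example runnable, so RAND() becomes RANDOM() and GROUP_CONCAT(tag SEPARATOR ', ') becomes GROUP_CONCAT(tag, ', '); the active = 1 filter and the LIMIT 6 OFFSET 0 pagination are added based on the question's description rather than taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (project_id INTEGER PRIMARY KEY, title TEXT, subtitle TEXT, description TEXT, part INTEGER);
CREATE TABLE photos (photo_id INTEGER PRIMARY KEY, project_id INTEGER, active INTEGER);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE tagproject (project_id INTEGER, tag_id INTEGER);
INSERT INTO projects VALUES (1, 'Chevy', 'Engine', 'Modify', 1), (2, 'Ford', 'Trans', 'Rebuild', 1), (3, 'Mopar', 'Diff', 'Swap', 1);
INSERT INTO photos VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 2, 1), (5, 2, 1), (6, 2, 1), (7, 3, 1), (8, 3, 1), (9, 3, 1);
INSERT INTO tags VALUES (1, 'classic'), (2, 'new car'), (3, 'truck'), (4, 'performance'), (5, 'easy'),
                        (6, 'difficult'), (7, 'hard'), (8, 'oem'), (9, 'aftermarket');
INSERT INTO tagproject VALUES (1, 1), (1, 3), (1, 4), (2, 2), (2, 5), (3, 6), (3, 9);
""")

# One row per project: photo count, one random active photo, and a tag list.
rows = conn.execute("""
    SELECT p.project_id, p.title,
           (SELECT COUNT(*) FROM photos ph
             WHERE ph.project_id = p.project_id AND ph.active = 1) AS total_photos,
           (SELECT ph.photo_id FROM photos ph
             WHERE ph.project_id = p.project_id AND ph.active = 1
             ORDER BY RANDOM() LIMIT 1) AS random_photo,
           (SELECT GROUP_CONCAT(t.tag, ', ') FROM tags t
              JOIN tagproject tp ON tp.tag_id = t.tag_id
             WHERE tp.project_id = p.project_id) AS tags
    FROM projects p
    ORDER BY p.project_id
    LIMIT 6 OFFSET 0
""").fetchall()

for row in rows:
    print(row)
```

The order of tags inside the concatenated string is not guaranteed here, so code that depends on it should sort after splitting.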
As you said, you already have the solution for items 1 to 4.
For item 5, add SQL_CALC_FOUND_ROWS to the same query instead of a SELECT COUNT.
For item 6 you can use a subquery, or maybe a LEFT JOIN limited to one result.
For the last item you can also use a subquery that joins all the tags into a single result (separated by commas, for instance).

How to filter in Sphinx

Sphinx data:
+----------+-------------+-------------+
| id | car_id | filter_id |
+----------+-------------+-------------+
| 37280991 | 4261 | 46 |
| 37280992 | 4261 | 18 |
| 37281000 | 4261 | 1 |
| 37281002 | 4261 | 28 |
| 51056314 | 4277 | 18 |
| 51056320 | 4277 | 1 |
| 51056322 | 4277 | 28 |
+----------+-------------+-------------+
I have a page that shows cars, and you can apply filters. I'm trying to make Sphinx return the cars that have both filter 1 and filter 46. If you look at the table above, you will see that only one car (4261) has both filters. The problem is that I don't know how to express this in Sphinx.
$this->cs->SetFilter('filter_id', array(1, 46)); // this doesn't work: it returns both cars (4261 and 4277) because it behaves like an "IN"
$this->cs->SetGroupBy('car_id', SPH_GROUPBY_ATTR);
$this->cs->SetFilter('filter_id', array(1));
$this->cs->SetFilter('filter_id', array(46));
Both filters apply, and both need to match. In effect they are ANDed.
It seems I misread the question and missed the fact that you are using group by; I thought you were using an MVA.
... so you have to be a bit more creative. Alas, you will probably need to use SphinxQL rather than SphinxAPI, as SphinxQL has HAVING:
SELECT id,car_id FROM index WHERE filter_id IN (1,46) GROUP BY car_id HAVING COUNT(*)>1
This only includes cars where multiple documents match (i.e. one match for each value in the IN clause). If there can be duplicates (like two rows with filter_id=1), then perhaps use COUNT(DISTINCT filter_id) instead?
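The GROUP BY / HAVING idea can be tried outside Sphinx. The sketch below is not SphinxQL: it loads the sample rows into an in-memory SQLite table (named idx here purely for illustration) and uses the COUNT(DISTINCT filter_id) variant, comparing the count against the number of requested filters so that duplicate rows cannot produce a false match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "idx" is an illustrative table name standing in for the Sphinx index.
conn.execute("CREATE TABLE idx (id INTEGER PRIMARY KEY, car_id INTEGER, filter_id INTEGER)")
conn.executemany(
    "INSERT INTO idx VALUES (?, ?, ?)",
    [
        (37280991, 4261, 46), (37280992, 4261, 18), (37281000, 4261, 1),
        (37281002, 4261, 28), (51056314, 4277, 18), (51056320, 4277, 1),
        (51056322, 4277, 28),
    ],
)

wanted = [1, 46]  # every one of these filters must be present on the car
placeholders = ",".join("?" * len(wanted))
sql = (
    f"SELECT car_id FROM idx WHERE filter_id IN ({placeholders}) "
    "GROUP BY car_id HAVING COUNT(DISTINCT filter_id) = ?"
)
cars = [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]
print(cars)
```

Car 4277 matches the IN clause through filter 1 alone, so its distinct count is 1 and the HAVING clause drops it.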

Omit / ignore any records that have been purchased by a user

I'm currently in the process of developing a site that amongst other things allows a user to filter a marketplace by showing or hiding items they have already purchased. This works on a basic AJAX call that passes through the current conditions of those filters available, and then using CodeIgniter's active record, it builds the appropriate query.
My issue is wrapping my head around the query so that if a user selects to hide purchased items the query omits / ignores any relevant records (i.e. if user_id = 5 and hide purchased is true, any scenes that user_id = 5 owns are not returned in the query).
Tbl: scenes
-------------------------------------------------------------------------
| design_id | scene_id | scene_name | ... [irrelevant columns to the Q] |
|-----------|----------|------------|-----------------------------------|
| 1 | 1 | welcome | |
| 1 | 2 | hello | |
| 2 | 3 | asd | |
-------------------------------------------------------------------------
The designs table is very similar to this and includes references to the game, game type, design name and so forth.
Tbl: user_scenes
----------------------------------------------------------------------
| design_id | scene_id | user_id | ... [irrelevant columns to the Q] |
|-----------|----------|---------|-----------------------------------|
| 1 | 1 | 5 | |
| 1 | 2 | 5 | |
| 1 | 1 | 9 | |
----------------------------------------------------------------------
Query
SELECT `designs`.`design_id`, `designs`.`design_name`, `scenes`.`scene_id`, `scenes`.`scene_name`, `scenes`.`scene_description`, `scenes`.`scene_unique_code`, `scenes`.`date_created`, `scenes`.`scene_cost`, `scenes`.`type`, `games`.`game_title`, `games`.`game_title_short`, `games_genres`.`genre`
FROM (`scenes`)
JOIN `designs` ON `designs`.`design_id` = `scenes`.`design_id`
JOIN `games` ON `designs`.`game_id` = `games`.`game_id`
JOIN `games_genres` ON `games`.`genre_id` = `games_genres`.`genre_id`
WHERE `scenes`.`private` = 0
ORDER BY `designs`.`design_name` asc, `scenes`.`scene_name` asc
LIMIT 6
The query uses CodeIgniter's active record ($this->db->select() / $this->db->where()) but that is somewhat irrelevant.
--
I've tried things like an INNER JOIN with user_scenes and then grouping by scene_id, but that presents an issue with only returning scenes that are present in user_scenes. I then made an attempt at a subquery but then questioned whether that was the correct route.
I understand there are other ways - looping through the returned data and querying whether that record exists for a specific user, but that I suspect would be highly inefficient. As such, I'm at a loss as to what to try and would appreciate any help.
I don't know if your setup permits it, but I would do a subselect:
Either via a NOT IN:
SELECT * FROM `scenes`
WHERE `scenes`.`scene_id` NOT IN (SELECT `scene_id` FROM `user_scenes` WHERE `user_id` = 5)
Or maybe via a LEFT JOIN:
SELECT * FROM `scenes`
LEFT JOIN (SELECT `scene_id`, `user_id` FROM `user_scenes` WHERE `user_id` = 5) AS `user_scenes`
ON `scenes`.`scene_id` = `user_scenes`.`scene_id`
WHERE `user_scenes`.`user_id` IS NULL
But I guess the first way is faster.
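Both variants can be compared on the sample rows from the question. The sketch below uses an in-memory SQLite database (an assumption for runnability; the question uses MySQL through CodeIgniter) and a hypothetical helper unpurchased() that runs the NOT IN form and the LEFT JOIN form, with the user filter moved into the ON clause, and checks that they agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scenes (design_id INTEGER, scene_id INTEGER PRIMARY KEY, scene_name TEXT);
CREATE TABLE user_scenes (design_id INTEGER, scene_id INTEGER, user_id INTEGER);
INSERT INTO scenes VALUES (1, 1, 'welcome'), (1, 2, 'hello'), (2, 3, 'asd');
INSERT INTO user_scenes VALUES (1, 1, 5), (1, 2, 5), (1, 1, 9);
""")

NOT_IN_SQL = """
    SELECT s.scene_id FROM scenes s
    WHERE s.scene_id NOT IN (SELECT us.scene_id FROM user_scenes us WHERE us.user_id = ?)
    ORDER BY s.scene_id
"""

LEFT_JOIN_SQL = """
    SELECT s.scene_id FROM scenes s
    LEFT JOIN user_scenes us ON us.scene_id = s.scene_id AND us.user_id = ?
    WHERE us.user_id IS NULL
    ORDER BY s.scene_id
"""

def unpurchased(user_id):
    """Hypothetical helper: scene ids the user does not own, via both query forms."""
    via_not_in = [row[0] for row in conn.execute(NOT_IN_SQL, (user_id,))]
    via_left_join = [row[0] for row in conn.execute(LEFT_JOIN_SQL, (user_id,))]
    assert via_not_in == via_left_join  # both forms should agree on this data
    return via_not_in

print(unpurchased(5))
print(unpurchased(9))
```

One caveat with NOT IN: if the subquery can return a NULL scene_id, the outer query returns no rows at all, so the LEFT JOIN form is the safer choice when that column is nullable.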

mysql select data when one field may or may not contain values

I have a MySQL lookup table that returns the points received for a certain place (P) among a number of finishers (N), with a variety of formats (points_id). Different point structures are used for different events. Sometimes the points awarded depend on the number of finishers (N); sometimes they don't.
Here is a short version of the table, with two sample structures.
points_id -1 the points depends on N Point_id -2 the points don't.
points
points_id | P | N | points |
1 | 1 | 3 | 90 |
1 | 1 | 2 | 85 |
1 | 1 | 1 | 80 |
1 | 2 | 3 | 60 |
1 | 2 | 2 | 50 |
1 | 3 | 3 | 30 |
3 | 1 | | 100 |
3 | 2 | | 90 |
3 | 3 | | 80 |
3 | 3 | | 70 |
So my question:
1) is there a way to put the wildcard in the table data.
eg if the N column that shows blank had a % in it
and I did this query.
SELECT points from t1 WHERE points_id=3 and P=3 and N=2
It would return 96??
PS I know this doesn't work but is shows my idea.
2) I want it to be fast, and I may put it in a procedure to use in larger queries. I am guessing that unless there is a very simple way to do what I show above, the fastest method will be to have rows for all of the different N's in the points_id=3 case. Is that true?
You might consider UNION ALL:
SELECT points from t1 WHERE points_id=3 AND P=3
UNION ALL
SELECT points from t1 WHERE points_id=3 AND N=2
This will get the results regardless if P=3 or N=2. I copied your database schema and tried this, and it produced:
points
------
80
70
If you do want this to be fast with a large amount of data--you'll really want to have an index and/or primary key.
Try this:
-- if N is a VARCHAR column
SELECT points from t1 WHERE points_id=3 and P=3 and (N=2 OR (IFNULL(N,'')=''))
-- if N is a numeric column
SELECT points from t1 WHERE points_id=3 and P=3 and (N=2 OR (IFNULL(N,0)=0))
Let me know if anything needs changing or if I have misunderstood you.
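The "blank N acts as a wildcard" idea can be sketched as follows, assuming the blank N cells are stored as NULL (the question does not say how they are stored) and using an in-memory SQLite database in place of MySQL. The helper lookup() is hypothetical; it uses N IS NULL directly, which is equivalent to the IFNULL test above when blanks are NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (points_id INTEGER, P INTEGER, N INTEGER, points INTEGER)")
# Rows from the question; blank N values are assumed to be NULL.
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, 3, 90), (1, 1, 2, 85), (1, 1, 1, 80),
        (1, 2, 3, 60), (1, 2, 2, 50), (1, 3, 3, 30),
        (3, 1, None, 100), (3, 2, None, 90), (3, 3, None, 80), (3, 3, None, 70),
    ],
)

def lookup(points_id, place, finishers):
    """Hypothetical helper: match N exactly, or treat a NULL N as 'any N'."""
    rows = conn.execute(
        "SELECT points FROM t1 WHERE points_id = ? AND P = ? AND (N = ? OR N IS NULL)",
        (points_id, place, finishers),
    ).fetchall()
    return sorted(row[0] for row in rows)

print(lookup(1, 1, 2))  # structure that depends on N
print(lookup(3, 1, 2))  # structure that ignores N
```

With a composite index on (points_id, P, N) this stays a narrow lookup, so duplicating rows for every possible N is probably unnecessary. Note that the sample data contains two rows for points_id=3, P=3, so that particular lookup returns two values.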
