Transposing Rows to Columns - php

I am trying to come up with a single result set from two tables in a way that I have never done, and I am having a little bit of trouble figuring out how to do it or even what to search for in these forums. Consider the following hypothetical table data:
Table1
----------------------------------
ID | Name
----------------------------------
1 | aa
2 | bb
3 | cc
4 | dd
5 | ee
Table2
----------------------------------
ID | Table1_ID | Value
----------------------------------
1 | 1 | good
2 | 2 | Dumb
3 | 3 | Fat
4 | 4 | Wet
5 | 5 | High
6 | 1 | Thin
7 | 2 | Tall
8 | 3 | Goofy
9 | 4 | Rich
10 | 5 | Funny
I am looking for a query or method that allows me to end up with the following result set:
Code:
aa | bb | cc | dd | ee
---------------------------------------------------------------
good | Dumb | Fat | Wet | High
Thin | Tall | Goofy | Rich | Funny
Essentially, I want the ability to take the list of names from Table1, transpose them into column headers, then put all of Table2's values into their respective columns with the ability to sort on any column. Is this possible?

Of course this can be done in SQL. But it is tricky. As the data is written in the question, you can group by t2.id in groups of 5. After that, the query is just conditional aggregation.
select max(case when t2.table1_Id = 1 then value end) as aa,
max(case when t2.table1_Id = 2 then value end) as bb,
max(case when t2.table1_Id = 3 then value end) as cc,
max(case when t2.table1_Id = 4 then value end) as dd,
max(case when t2.table1_Id = 5 then value end) as ee
from table2 t2
group by floor((t2.id - 1) / 5);
Having values be implicitly related by their ids seems like a really, really bad database design. There should be some sort of entity id that combines them.

You've got two problems here:
1) Using values as column names can't be done in a clean way
2) You want to split table2.value across 2 rows: which value should go on which row? Gordon Linoff uses the table2.id field for this, but if it's an auto-increment column and rows are added or deleted later on, that rhythm will be broken.
There have been similar questions before. This one has an answer that gets pretty close:
mysql select dynamic row values as column names, another column as value
Here they generate the string for the query and make a prepared statement out of it.
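A minimal sketch of that "generate the query string" idea, written in Python against an in-memory SQLite database rather than PHP and MySQL so that it runs as-is. The helper names quote_ident and build_pivot_sql are made up for illustration, and the grouping assumes, as the first answer does, that the ids in Table2 arrive in complete, gap-free blocks of one value per name.

```python
import sqlite3

def quote_ident(name):
    # Double-quote an identifier so a name read from table1 is safe as a column alias.
    return '"' + name.replace('"', '""') + '"'

def build_pivot_sql(names):
    # names: list of (table1 id, name) pairs; emits one MAX(CASE ...) column per pair.
    cols = ",\n  ".join(
        f"MAX(CASE WHEN table1_id = {int(i)} THEN value END) AS {quote_ident(n)}"
        for i, n in names
    )
    # Integer division puts ids 1..N in group 0, N+1..2N in group 1, and so on.
    return (
        f"SELECT\n  {cols}\nFROM table2\n"
        f"GROUP BY (id - 1) / {len(names)}\nORDER BY MIN(id)"
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE table2 (id INTEGER PRIMARY KEY, table1_id INTEGER, value TEXT);
INSERT INTO table1 VALUES (1,'aa'),(2,'bb'),(3,'cc'),(4,'dd'),(5,'ee');
INSERT INTO table2 VALUES
  (1,1,'good'),(2,2,'Dumb'),(3,3,'Fat'),(4,4,'Wet'),(5,5,'High'),
  (6,1,'Thin'),(7,2,'Tall'),(8,3,'Goofy'),(9,4,'Rich'),(10,5,'Funny');
""")

# Read the column headers from table1, then build and run the pivot query.
names = conn.execute("SELECT id, name FROM table1 ORDER BY id").fetchall()
cursor = conn.execute(build_pivot_sql(names))
headers = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(headers)
for row in rows:
    print(row)
```

Sorting on any column would then be a matter of wrapping the generated query in an outer SELECT with an ORDER BY on the chosen alias.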

Related

Can SELECT, SELECT COUNT and cross reference tables be handled by just one query?

I have a page that displays a list of projects. With each project is displayed the following data retrieved from a mysqli database:
Title
Subtitle
Description
Part number (1 of x)
The total number of photos associated with that project
A randomly selected photo from the project
A list of tags
Projects are displayed 6 per page using a pagination system
As this is based on an old project of mine, it was originally done with sloppy code (I was just learning and did not know any better) using many queries. Three, in fact, just for items 5-7, and those were contained within a while loop that worked with the pagination system. I'm now quite aware that this is not even close to being the right way to do business.
I am familiar with INNER JOIN and the use of subqueries, but I'm concerned that I may not be able to get all of this data using just one select query for the following reasons:
Items 1-4 are easy enough with a basic SELECT query, BUT...
Item 5 needs a SELECT COUNT AND...
Item 6 needs a basic SELECT query with an ORDER by RAND LIMIT 1 to
select one random photo out of all those associated with each project
(using FilesystemIterator is out of the question, because the photos
table has a column indicating 0 if a photo is inactive and 1 if it is
active)
Item 7 is selected from a cross reference table for the tags and
projects and a table containing the tag ID and names
Given that, I'm not certain if all this can (or even should, for that matter) be done with just one query or if it will need more than one query. I have read repeatedly how it is worth a swat on the nose with a newspaper to nest one or more queries inside a while loop. I've even read that multiple queries are, in general, a bad idea.
So I'm stuck. I realize this is likely to sound too general, but I don't have any code that works, just the old code that uses 4 queries to do the job, 3 of which are nested in a while loop.
Database structure below.
Projects table:
+-------------+---------+----------+---------------+------+
| project_id | title | subtitle | description | part |
|---------------------------------------------------------|
| 1 | Chevy | Engine | Modify | 1 |
| 2 | Ford | Trans | Rebuild | 1 |
| 3 | Mopar | Diff | Swap | 1 |
+-------------+---------+----------+---------------+------+
Photos table:
+----------+------------+--------+
| photo_id | project_id | active |
|--------------------------------|
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 2 | 1 |
| 5 | 2 | 1 |
| 6 | 2 | 1 |
| 7 | 3 | 1 |
| 8 | 3 | 1 |
| 9 | 3 | 1 |
+----------+------------+--------+
Tags table:
+--------+------------------+
| tag_id | tag |
|---------------------------|
| 1 | classic |
| 2 | new car |
| 3 | truck |
| 4 | performance |
| 5 | easy |
| 6 | difficult |
| 7 | hard |
| 8 | oem |
| 9 | aftermarket |
+--------+------------------+
Tag/Project cross-reference table:
+------------+-----------+
| project_id | tag_id |
|------------------------|
| 1 | 1 |
| 1 | 3 |
| 1 | 4 |
| 2 | 2 |
| 2 | 5 |
| 3 | 6 |
| 3 | 9 |
+------------+-----------+
I'm not asking for the code to be written for me, but if what I'm asking makes sense, I'd sincerely appreciate a shove in the right direction. Often times I struggle with both the PHP and MySQLi manuals online, so if there's any way to break this down, then fantastic.
Thank you all so much.
You're able to do subqueries inside your SELECT clause, like this:
SELECT
p.title, p.subtitle, p.description, p.part,
(SELECT COUNT(photo_id) FROM Photos where project_id = p.project_id) as total_photos,
(SELECT photo_id FROM Photos where project_id = p.project_id ORDER BY RAND() LIMIT 1) as random_photo
FROM projects as p
The list of tags returns more than one row per project, so it cannot be a plain scalar subquery; the straightforward option is one extra query per project. You can fold it into the main query by concatenating the tags into a single value, such as a comma-separated list (tag1, tag2, tag3...), but I don't recommend it, because you will then need to explode the column value in PHP. Do it only if you have a great many projects and fetching the tags for each project individually performs poorly. If you really want to, you can:
SELECT
p.title, p.subtitle, p.description, p.part,
(SELECT COUNT(photo_id) FROM Photos where project_id = p.project_id) as total_photos,
(SELECT photo_id FROM Photos where project_id = p.project_id ORDER BY RAND() LIMIT 1) as random_photo,
(SELECT GROUP_CONCAT(tag SEPARATOR ', ') FROM tags WHERE tag_id in (SELECT tag_id FROM tagproject WHERE project_id = p.project_id)) as tags
FROM projects as p
As you said, you already have the solution for items 1 to 4.
For item 5, add SQL_CALC_FOUND_ROWS to the same query instead of running a separate SELECT COUNT.
For item 6 you can use a subquery, or maybe a LEFT JOIN limited to one result.
For the last item you can also use a subquery that joins all the tags into a single result (separated by commas, for instance).
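Putting the suggestions above together, here is a hedged sketch of a single paginated query. It uses Python's built-in sqlite3 module purely so it can be run without a MySQL server; SQLite spells the random function RANDOM() and takes the separator as the second argument of GROUP_CONCAT, so the MySQL version would differ in those two places. The function name fetch_page and the table name tagproject are assumptions, and the active = 1 filter comes from the question rather than from either answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (project_id INTEGER PRIMARY KEY, title TEXT, subtitle TEXT,
                       description TEXT, part INTEGER);
CREATE TABLE photos (photo_id INTEGER PRIMARY KEY, project_id INTEGER, active INTEGER);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
CREATE TABLE tagproject (project_id INTEGER, tag_id INTEGER);
INSERT INTO projects VALUES (1,'Chevy','Engine','Modify',1),
                            (2,'Ford','Trans','Rebuild',1),
                            (3,'Mopar','Diff','Swap',1);
INSERT INTO photos VALUES (1,1,1),(2,1,1),(3,1,1),(4,2,1),(5,2,1),(6,2,1),
                          (7,3,1),(8,3,1),(9,3,1);
INSERT INTO tags VALUES (1,'classic'),(2,'new car'),(3,'truck'),(4,'performance'),
                        (5,'easy'),(6,'difficult'),(7,'hard'),(8,'oem'),(9,'aftermarket');
INSERT INTO tagproject VALUES (1,1),(1,3),(1,4),(2,2),(2,5),(3,6),(3,9);
""")

# One correlated subquery each for the photo count, a random photo, and the tag list.
PAGE_SQL = """
SELECT p.project_id, p.title,
  (SELECT COUNT(*) FROM photos ph
    WHERE ph.project_id = p.project_id AND ph.active = 1) AS total_photos,
  (SELECT ph.photo_id FROM photos ph
    WHERE ph.project_id = p.project_id AND ph.active = 1
    ORDER BY RANDOM() LIMIT 1) AS random_photo,
  (SELECT GROUP_CONCAT(t.tag, ', ') FROM tags t
    JOIN tagproject tp ON tp.tag_id = t.tag_id
    WHERE tp.project_id = p.project_id) AS tags
FROM projects p
ORDER BY p.project_id
LIMIT ? OFFSET ?
"""

def fetch_page(page, per_page=6):
    # Pages are 1-based; each page costs a single round trip to the database.
    return conn.execute(PAGE_SQL, (per_page, (page - 1) * per_page)).fetchall()

page_rows = fetch_page(1)
for row in page_rows:
    print(row)
```

The subqueries only run for the six projects on the requested page, which is the main difference from running three extra queries inside a while loop.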

MySQL Join and newest lot information

I have four tables. The first describing a mix of items. The second is a linking table between the mix, and the items. The third is the item table, and the fourth holds lot information - lot number, and when that lot starts being used.
mix
mixID | mixName
----------------
1 | Foxtrot
2 | Romeo
mixLink
mixID | itemID
----------------
1 | 1
1 | 2
1 | 3
item
itemID| itemName
----------------
1 | square
2 | triangle
3 | hexagon
itemLots
itemID| lotNo | startDate
-------------------------
1 | 22/5/3| 22/07/16
2 | 03/5 | 25/07/16
2 | 04/19 | 12/08/16
3 | 15/0 | 05/08/16
Now, I need to be able to fetch the information from the database, which details all the items from a mix, as well as the most recently used lot number, something like this:
itemName | lotNo
----------------
square | 22/5/3
triangle | 04/19
hexagon | 15/0
I've tried a dozen different mixes of joins, group by's, maxes, subqueries, and havings; all to no avail. Any help would be much appreciated, I've been pulling my hair out for hours, and I feel like my fingernails are just scraping at the solution!
This will give you the result you're after and will perform pretty well if you have your indexes done properly. I'm not sure how you're meaning to reference mix as it's not apparent in your sample output but I've included it in the WHERE clause so hopefully you can understand where you would use it.
SELECT i.itemName
, (SELECT il.lotNo FROM itemLots il
WHERE il.itemID=i.itemID
ORDER BY il.startDate desc
LIMIT 1) as lotNo
FROM item i
JOIN mixLink ml ON ml.itemID=i.itemID
JOIN mix m ON m.mixID=ml.mixID
WHERE m.mixName="Foxtrot";

What is the proper way to first create a bunch of arrays and second, loop through them?

I'm building a simple website to let people at my work easily match employees names to their baby pictures, using Jquery draggable script.
I have two tables (USERS and ENTRIES). There is a 3rd table called PEOPLE but it's not important for this question.
In entries, the userid of "0" has the CORRECT ordering (i.e. personid 1 should be 4th. personid 2 should be 3rd, etc. And again, personid is from another table called PEOPLE that shouldn't matter for this question).
+----------+---------+--------------+
| userid   |firstname| lastname     |
+----------+---------+--------------+
| 1 | Bob | Wilson |
| 2 | Charlie | Jackson |
| 3 | Jim | Smith |
| 4 | Doug | Jones |
+----------+---------+--------------+
+----------+---------+--------------+
| userid   | personid| ordering     |
+----------+---------+--------------+
| 0 | 1 | 4 |
| 0 | 2 | 3 |
| 0 | 3 | 1 |
| 0 | 4 | 2 |
| 1 | 1 | 2 |
| 1 | 2 | 4 |
| 1 | 3 | 1 |
| 1 | 4 | 3 |
| 2 | 1 | 1 |
| 2 | 2 | 3 |
| 2 | 3 | 4 |
| 2 | 4 | 2 |
+----------+---------+--------------+
I will actually have probably 100 users with entries in the entries table. And each user will have 100 personids with an ordering. What I want to do is, in the most efficient, logical way, loop through all of the entries and compare each one to the CORRECT answer (i.e. userid 0).
So my thinking is probably to get all of the entries in arrays and then compare array for userid 1 to the array for userid 0. Then compare the array for userid2 to the array for userid 0. And so on.
I just want to compare how many right answers each subsequent user has. So in my example tables, userid 1 has ONE correct answer (Personid 3 matching with ordering 1) and userid 2 has TWO correct answers (personid 2 matching with ordering 3 and personid 4 matching with ordering 2).
I first did this...
$sql = "SELECT * FROM entries";
$getpeople = mysqli_query($connection, $sql);
if (!$getpeople) {
die("Database query failed: " . mysqli_error($connection));
} else {
while ($row = mysqli_fetch_array($getpeople)) {
$entriesarray[$row['userid']][$row['personid']]=$row['ordering'];
}
}
That would give me a bunch of arrays for all users with their entries.
Then I did this as a test...
$result_array = array_intersect_assoc($entriesarray[1], $entriesarray[0]);
print_r($result_array);
echo "COUNTRIGHT=".count($result_array);
And that essentially does what I want by giving me COUNTRIGHT of "1". It sees how many from the array for userid 1 match value AND key from the array for userid 0 (again, the correct answer array).
But now I'm stumped as to how to do this efficiently in a nice loop, rather than having to do it one by one. Again, I'd probably have 100 users to loop through. And I'm questioning whether my initial mySQL query above is correct or should be done differently.
And ultimately, I'd want to list out all users firstname, lastname and the number they got right. And order them DESC by the number they got right. In essence, it'd be a leaderboard that would look like...
Jim Smith 2
Charlie Wilson 1
and so on (but on a much greater scale where the person in first place will probably have around 80 or 90 correct).
Because I want to show names too on the "leaderboard", I know I need a JOIN somewhere in here to get that info from the USERS table, so it gets even more convoluted for my tiny brain :)
I hope this makes sense to someone. If anyone can point me in the right direction, that would be fantastic. I'm losing my mind and it's probably fairly simple at the end of the day.
Thanks!
The query below will give you a count of correct entries per user by left joining to user 0 and counting the ordering matches for each person:
select t1.userid, count(t2.personid)
from entries t1
left join entries t2 on t2.userid = 0
and t2.personid = t1.personid
and t2.ordering = t1.ordering
group by t1.userid
If you need names from the user table you can join it
select u.*, count(t2.personid)
from entries t1
join users u on u.userid = t1.userid
left join entries t2 on t2.userid = 0
and t2.personid = t1.personid
and t2.ordering = t1.ordering
group by u.userid
You're on the right track. Comparing all the users in your $entriesarray in a loop is not going to be much more complicated than what you've already done.
First, shift user 0 from the $entriesarray to get the correct set to match against. Then you can just iterate over the rest of the entries array calculating the correct matches like so...
$correctSet = $entriesarray[0];
unset($entriesarray[0]); // we don't want to check this against itself
$leaderBoard = []; // initialize an empty leaderboard array
foreach($entriesarray as $userId => $entries) {
$correctMatches = count(array_intersect_assoc($entries, $correctSet));
$leaderBoard[$userId] = $correctMatches;
}
As far as doing this in SQL, it's also possible by just doing a JOIN against userid 0 as already answered above (so I won't bother repeating). The user information can also be obtained separately and looked up by userid, since you already have that information in your $leaderBoard array in this approach.
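To make the last step concrete, here is a small sketch of the whole leaderboard (match counting, name lookup, descending sort) in Python rather than PHP. The function names count_correct and build_leaderboard are illustrative only, and the dictionaries are filled with the sample rows from the question; in practice they would come from the USERS and ENTRIES tables.

```python
def count_correct(entries, correct):
    # Same idea as array_intersect_assoc: both the key and the value must match.
    return sum(1 for person_id, ordering in entries.items()
               if correct.get(person_id) == ordering)

def build_leaderboard(entries_by_user, users, answer_key_id=0):
    correct = entries_by_user[answer_key_id]
    board = [
        (users[user_id], count_correct(entries, correct))
        for user_id, entries in entries_by_user.items()
        if user_id != answer_key_id  # do not score the answer key against itself
    ]
    # Highest score first; ties are broken by name so the order is stable.
    return sorted(board, key=lambda item: (-item[1], item[0]))

# userid -> "firstname lastname", from the sample USERS table
users = {1: "Bob Wilson", 2: "Charlie Jackson", 3: "Jim Smith", 4: "Doug Jones"}
# userid -> {personid: ordering}, from the sample ENTRIES table
entries_by_user = {
    0: {1: 4, 2: 3, 3: 1, 4: 2},
    1: {1: 2, 2: 4, 3: 1, 4: 3},
    2: {1: 1, 2: 3, 3: 4, 4: 2},
}

leaderboard = build_leaderboard(entries_by_user, users)
for name, score in leaderboard:
    print(name, score)
```

With the sample data, Charlie Jackson (userid 2) has two matches and Bob Wilson (userid 1) has one, which agrees with the counts worked out by hand in the question.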

MySQL: GROUP BY within ranges

I have a table with scores like this:
score | user
-------------------
2 | Mark
4 | Alex
3 | John
2 | Elliot
10 | Joe
5 | Dude
The table is gigantic in reality and the real scores go from 1 to 25.
I need this:
range | counts
-------------------
1-2 | 2
3-4 | 2
5-6 | 1
7-8 | 0
9-10 | 1
I've found some MySQL solutions, but they seemed pretty complex. Some of them even suggested UNION, but performance is very important. As mentioned, the table is huge.
So I thought why don't you simply have a query like this:
SELECT COUNT(*) as counts FROM score_table GROUP BY score
I get this:
score | counts
-------------------
1 | 0
2 | 2
3 | 1
4 | 1
5 | 1
6 | 0
7 | 0
8 | 0
9 | 0
10 | 1
And then with PHP, sum the count of scores of the specific ranges?
Is this even worse for performance or is there a simple solution that I am missing?
Or you could probably even make a JavaScript solution...
Your solution:
SELECT score, COUNT(*) as counts
FROM score_table
GROUP BY score
ORDER BY score;
However, this will not return rows with a count of 0. Assuming you have examples for all scores, then the full list of scores is not an issue. You just won't get counts of zero.
You can do what you want with something like:
select (case when score between 1 and 2 then '1-2'
when score between 3 and 4 then '3-4'
. . .
end) as scorerange, count(*) as count
from score_table
group by scorerange
order by min(score);
There is no reason to do additional processing in php. This type of query is quite typical for SQL.
EDIT:
According to the MySQL documentation, you can use a column alias in the group by. Here is the exact quote:
An alias can be used in a query select list to give a column a
different name. You can use the alias in GROUP BY, ORDER BY, or HAVING
clauses to refer to the column:
SELECT
SUM(
CASE
WHEN score between 1 and 2
THEN ...
Honestly, I can't tell you if this is faster than passing "SELECT COUNT(*) as counts FROM score_table GROUP BY score" into PHP and letting PHP handle it, but it adds a level of flexibility to your setup. Create a three-column table with 'group_ID', 'score', and 'range'. Insert values into it to get your groupings right:
1,1,1-2
1,2,1-2
1,3,3-4
1,4,3-4
etc...
Join to it on score and group by range. The addition of 'group_ID' lets you define several sets of groupings: maybe group_ID = 1 breaks the scores into ranges of two, and group_ID = 2 uses ranges of five (or whatever you might want).
I find the table use like this is decently fast, requires little code changing, and can readily be added to if you require additional groupings or if the groupings change (if you do the groupings in code, the entire case section needs to be redone to change the groupings slightly).
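A runnable sketch of the lookup-table approach, using Python's sqlite3 module as a stand-in for MySQL. The table name score_ranges and the column name range_label are assumptions (the label column is not called range here because RANGE is a reserved word in MySQL). Joining from the lookup table with a LEFT JOIN also yields the zero-count ranges that a plain GROUP BY cannot produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE score_table (score INTEGER, user TEXT);
CREATE TABLE score_ranges (group_id INTEGER, score INTEGER, range_label TEXT);
INSERT INTO score_table VALUES
  (2,'Mark'),(4,'Alex'),(3,'John'),(2,'Elliot'),(10,'Joe'),(5,'Dude');
""")

# group_id 1: scores 1..10 in ranges of two (1-2, 3-4, ...).
pairs = [(1, s, f"{s + s % 2 - 1}-{s + s % 2}") for s in range(1, 11)]
conn.executemany("INSERT INTO score_ranges VALUES (?, ?, ?)", pairs)

# Start from the lookup table so ranges with no scores still appear, with a count of 0.
RANGE_SQL = """
SELECT r.range_label, COUNT(s.score) AS counts
FROM score_ranges r
LEFT JOIN score_table s ON s.score = r.score
WHERE r.group_id = ?
GROUP BY r.range_label
ORDER BY MIN(r.score)
"""
range_counts = conn.execute(RANGE_SQL, (1,)).fetchall()
for label, count in range_counts:
    print(label, count)
```

Switching to a different grouping is then a matter of inserting another set of rows under a new group_id and passing that id as the parameter.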
How about this:
select concat((score + (1 * (score mod 2)))-1,'-',(score + (1 * (score mod 2)))) as score, count(*) from TBL1 group by (score + (1 * (score mod 2)))
You can see it working in this fiddle: http://sqlfiddle.com/#!2/215839/6
For the input
score | user
-------------------
2 | Mark
4 | Alex
3 | John
2 | Elliot
10 | Joe
5 | Dude
It generates this:
range | counts
-------------------
1-2 | 2
3-4 | 2
5-6 | 1
9-10 | 1
If you want a simple solution that is also very powerful, add an extra field to your table that stores the range bucket for each score, so that scores 1 and 2 get the value 1, scores 3 and 4 get the value 2, and so on. You can then group by that value. The only extra work is filling in this field whenever you insert a score. Your table then looks like this:
score | user | range
--------------------------
2 | Mark | 1
4 | Alex | 2
3 | John | 2
2 | Elliot | 1
10 | Joe | 5
5 | Dude | 3
Now you can do:
select count(score), `range` from score_table group by `range`;
This is always faster in an application where reads take priority over writes.
When inserting, compute the range like this:
$scoreRange = 2;
$range = ceil($score/$scoreRange);
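As a quick check of that arithmetic, here is the same bucketing sketched in Python. The function names score_bucket and bucket_label are hypothetical, and the integer formula is equivalent to ceil($score / $scoreRange) for positive integer scores.

```python
def score_bucket(score, width=2):
    # Ceiling division without floats: 1-2 -> 1, 3-4 -> 2, 5-6 -> 3, ...
    return (score + width - 1) // width

def bucket_label(bucket, width=2):
    # Turn a bucket number back into a readable range such as "3-4".
    high = bucket * width
    return f"{high - width + 1}-{high}"

scores = [2, 4, 3, 2, 10, 5]  # the sample scores from the question
buckets = [score_bucket(s) for s in scores]
labels = [bucket_label(b) for b in buckets]
for s, b, label in zip(scores, buckets, labels):
    print(s, b, label)
```

The bucket numbers reproduce the range column in the table above (1, 2, 2, 1, 5, 3).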

WHERE vs HAVING in generated queries

I know that this title is overused, but it seems that my kind of question is not answered yet.
So, the problem is like this:
I have a table structure made of four tables (tables, rows, cols, values) that I use to recreate the behavior of the information_schema (in a way).
In php I am generating queries to retrieve the data, and the result would still look like a normal table:
SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')
HAVING (col2 LIKE "%4%")
OR
SELECT * FROM
(SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')) d
WHERE col2 LIKE "%4%"
note that the part where I define the columns of the result is generated by a php script. It is less important why I am doing this, but I want to extend this algorithm that generates the queries for a broader use.
This brings us to the core problem: I have to decide whether to generate a WHERE or a HAVING part for the query. I know when to use each, but my algorithm doesn't, and I would have to add a few extra checks for this. The two queries above are equivalent, though: I can always put any query in a subquery, give it an alias, and use WHERE on the new derived table. But I wonder whether I will have performance problems, or whether this will come back to bite me in an unexpected way.
I know how they both work, and how where is supposed to be faster, but this is why I came here to ask. Hopefully I made myself understood, please excuse my english and the long useless turns of phrases, and all.
EDIT 1
I already know the difference between the two and all that it implies. My only dilemma is this: because I build the result from custom columns taken from other tables, with variable number and size, and want it to behave like a normally created table, I must use HAVING to filter on the derived columns. Alternatively, I can wrap the query in a subquery and use WHERE normally, which will probably create a temporary table that is filtered afterwards. Will this affect performance for a large database? Unfortunately I cannot test this right now, as I cannot afford to fill the database with over 1 billion entries (something like this: 1 billion in the rows table, 5 billion in the values table, as every row has 5 columns, plus 5 rows in the cols table and 1 row in the tables table = 6,000,000,006 entries in total).
right now my database looks like this:
+----+--------+-----------+------+
| id | name | title | dets |
+----+--------+-----------+------+
| 1 | table1 | Table One | |
+----+--------+-----------+------+
+----+-------+------+
| id | table | name |
+----+-------+------+
| 3 | 1 | col1 |
| 4 | 1 | col2 |
+----+-------+------+
where `table` is a foreign key from table `tables`
+----+-------+-------+
| id | table | extra |
+----+-------+-------+
| 1 | 1 | |
| 2 | 1 | |
+----+-------+-------+
where `table` is a foreign key from table `tables`
+----+-----+-----+----------+
| id | row | col | value |
+----+-----+-----+----------+
| 1 | 1 | 3 | 13 |
| 2 | 1 | 4 | 14 |
| 6 | 2 | 4 | 24 |
| 9 | 2 | 3 | asdfghjk |
+----+-----+-----+----------+
where `row` is a foreign key from table `rows`
where `col` is a foreign key from table `cols`
EDIT 2
The conditions are there just for demonstration purposes!
EDIT 3
For only two rows, there seems to be a difference between the two: the one using HAVING takes 0.0008 and the one using WHERE takes 0.0014-0.0019. I wonder if this will affect performance for large numbers of rows and columns.
EDIT 4
The result of the two queries is identical, and that is:
+----------+------+
| col1 | col2 |
+----------+------+
| 13 | 14 |
| asdfghjk | 24 |
+----------+------+
HAVING is specifically for GROUP BY, WHERE is to provide conditional parameters. See also WHERE vs HAVING
I believe the having clause would be faster in this case, as you're defining specific values, as opposed to reading through the values and looking for a match.
See: http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html
Basically, WHERE filters out rows before passing them to an aggregate function, but HAVING filters the aggregate function's results.
You could do it like this:
WHERE col2 IN (14, 24)
Your condition WHERE col2 LIKE "%4%" is a bad idea: a row with col2 = 34 would also be selected.
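For the generator itself, here is a hedged sketch of the "always wrap in a derived table and use WHERE" variant from the question, so the generator never has to choose between WHERE and HAVING. It is written in Python against SQLite only so it can be run directly; the function names, the simplified table definitions and the extra row_id column are assumptions, and nothing here says how MySQL would optimize the same query at scale.

```python
import sqlite3

def quote_ident(name):
    # Quote an identifier so generated column names cannot break the SQL.
    return '"' + name.replace('"', '""') + '"'

def build_query(conn, table_name):
    # One correlated subquery per entry in cols, wrapped in a derived table d.
    cols = conn.execute(
        'SELECT c.id, c.name FROM "cols" c JOIN "tables" t ON t.id = c."table" '
        'WHERE t.name = ? ORDER BY c.id', (table_name,)).fetchall()
    select_list = ",\n    ".join(
        f'(SELECT v.value FROM "values" v WHERE v."col" = {int(cid)} '
        f'AND v."row" = r.id) AS {quote_ident(name)}'
        for cid, name in cols
    )
    return (
        f'SELECT * FROM (\n  SELECT r.id AS row_id,\n    {select_list}\n'
        '  FROM "rows" r\n'
        '  WHERE r."table" = (SELECT id FROM "tables" WHERE name = ?)\n) d'
    )

def fetch_filtered(conn, table_name, column, pattern):
    # The filter is an ordinary WHERE on the derived table, whatever the column is.
    sql = (build_query(conn, table_name)
           + f' WHERE d.{quote_ident(column)} LIKE ? ORDER BY d.row_id')
    return conn.execute(sql, (table_name, pattern)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE "tables" (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "cols" (id INTEGER PRIMARY KEY, "table" INTEGER, name TEXT);
CREATE TABLE "rows" (id INTEGER PRIMARY KEY, "table" INTEGER);
CREATE TABLE "values" (id INTEGER PRIMARY KEY, "row" INTEGER, "col" INTEGER, value TEXT);
INSERT INTO "tables" VALUES (1, 'table1');
INSERT INTO "cols" VALUES (3, 1, 'col1'), (4, 1, 'col2');
INSERT INTO "rows" VALUES (1, 1), (2, 1);
INSERT INTO "values" VALUES (1, 1, 3, '13'), (2, 1, 4, '14'),
                            (6, 2, 4, '24'), (9, 2, 3, 'asdfghjk');
''')

result = fetch_filtered(conn, "table1", "col2", "%4%")
for row in result:
    print(row)
```

Whether this form is slower than HAVING on a large data set is exactly the open question above; timing both generated forms on a realistically sized copy of the data, and comparing their EXPLAIN output, is the only reliable way to find out.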
