How do I JOIN these tables? - php

I have two tables:
**WEEK**
-id
-week #
**ITEM**
-id
-name
-is_marked
-week #
My final desired result is to have a table that has a rowspan="3" table cell with week # in it, followed by the three results for each week from SELECT * FROM item WHERE week = week_number AND is_marked = 1
I don't know if I need to JOIN anything because I don't really need any data from the WEEK table, but I can't quite get how I'd loop through the results to get the desired output. Thoughts?

Ok, some thoughts.
1. Do you really need a week table in the database? What purpose does it serve? The week number looks like a plain value on the item table. It's not really an entity, so it may not need a separate table, IMO.
2. You are right that there is nothing to join between the two tables. If the item table held a week_id instead of the week number, a join would make sense, except for my comment in point 1.
3. Why only 3 items from each week? If you really want that, I would recommend ordering by week number in the item table (use the ORDER BY clause) and discarding the extra rows in code.
4. Or, you can loop through all the values in the week table and, for each, select from the item table using the week number. You can use the LIMIT clause to return only 3 items.
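A minimal sketch of the "order by week, then trim in code" idea, written in Python with the standard sqlite3 module as a stand-in for PHP and MySQL. The sample rows, the render_rows helper, and the per_week parameter are assumptions made for illustration; only the item columns come from the question.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, is_marked INTEGER, week INTEGER)"
)
# Hypothetical sample data: week 1 has four marked items, week 2 has two.
conn.executemany(
    "INSERT INTO item (name, is_marked, week) VALUES (?, ?, ?)",
    [("a", 1, 1), ("b", 1, 1), ("c", 0, 1), ("d", 1, 1), ("e", 1, 1),
     ("f", 1, 2), ("g", 1, 2)],
)

# One query, ordered by week so that rows for the same week are adjacent.
rows = conn.execute(
    "SELECT week, name FROM item WHERE is_marked = 1 ORDER BY week, id"
).fetchall()


def render_rows(rows, per_week=3):
    """Build <tr> strings; the first row of each week carries the rowspan cell."""
    html = []
    for week, group in groupby(rows, key=lambda r: r[0]):
        names = [name for _, name in group][:per_week]  # discard extras in code
        for i, name in enumerate(names):
            week_cell = f'<td rowspan="{len(names)}">{week}</td>' if i == 0 else ""
            html.append(f"<tr>{week_cell}<td>{name}</td></tr>")
    return html


table_rows = render_rows(rows)
print("\n".join(table_rows))
```

The rowspan is computed from the number of rows actually kept, so a week with fewer than three marked items still renders a well-formed table.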

Change your item table to hold the week id and not the value.
Then you can do a simple join:
select i.name
from item i
inner join week w on w.id = i.week_id
where w.week = '2' and i.is_marked = '1'

Related

mysql select across 2 tables - only include rows from 1 if match value in another

I've built a maintenance database for a client with multiple tables that works fine, but now they want to be able to get reports and I'm having trouble creating a select statement across 2 tables.
A user can search repair type, start/end date and location...no issue at all returning results of repair type between 2 dates (all held in the same table), but the tables for different types of repair don't store the location info, that is held against info in the vehicle info table.
So on 1 table I can query something like:
SELECT fid from cm_repair where start_date >= '$date1' AND end_date <= '$date2'
and on the other table I can have:
SELECT id from cm_fleet where location='$loc'
Is there any way I can combine these so that I only get rows where id and fid match?
You can use an INNER JOIN:
SELECT fid
from cm_repair as t1
join cm_fleet as t2 on t1.fid = t2.id and location='$loc'
where start_date >= '$date1' AND end_date <= '$date2'
Check this link.
Inner, left, and right joins are the common options for combining tables, and one of them is probably what you need here.
An inner join returns only the rows that have a match in BOTH tables. If the ID you are joining on is present in one table but not the other, that row is not returned, so the join itself produces no null values.
Left and right joins are similar to each other: they return every row from the left (or right) table, plus the matching rows from the other table. If the ID you are joining on is not in the other side of the join, the row is still returned, but the columns from the other table come back as null.
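To make the join types concrete, here is a small sketch in Python with the standard sqlite3 module, standing in for PHP and MySQL. The table and column names come from the question; the sample rows and the ISO-formatted text dates are assumptions. Values are bound with ? placeholders rather than interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cm_fleet  (id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE cm_repair (fid INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO cm_fleet VALUES (1, 'north'), (2, 'south');
INSERT INTO cm_repair VALUES
    (1, '2020-01-05', '2020-01-10'),
    (2, '2020-01-06', '2020-01-08'),
    (9, '2020-01-07', '2020-01-09');
""")
# Repair fid 9 deliberately has no matching vehicle in cm_fleet.

dates = ("2020-01-01", "2020-01-31")

# INNER JOIN: only repairs whose fid matches a vehicle id.
inner = conn.execute(
    "SELECT r.fid, f.location FROM cm_repair r "
    "JOIN cm_fleet f ON r.fid = f.id "
    "WHERE r.start_date >= ? AND r.end_date <= ? ORDER BY r.fid",
    dates,
).fetchall()

# LEFT JOIN: every repair, with NULL (None) where no vehicle matches.
left = conn.execute(
    "SELECT r.fid, f.location FROM cm_repair r "
    "LEFT JOIN cm_fleet f ON r.fid = f.id "
    "WHERE r.start_date >= ? AND r.end_date <= ? ORDER BY r.fid",
    dates,
).fetchall()

# The report query from the question: repairs in a date range at one location.
north = conn.execute(
    "SELECT r.fid FROM cm_repair r "
    "JOIN cm_fleet f ON r.fid = f.id AND f.location = ? "
    "WHERE r.start_date >= ? AND r.end_date <= ?",
    ("north", *dates),
).fetchall()

print(inner, left, north)
```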
Thanks, will try all this - however we've discovered a flaw in the way the tables are setup anyway and need to edit to include a location column in the repair table, so that will make it all much easier to search as well.

MySQL Join - Sorting data, grouping data

I have two tables:
twitterusers table
twittergrowth Table
I am trying to JOIN these 2 tables, get all fields from twitterusers and selected fields from twittergrowth, then fetch only the last 3 rows from this data.
Expected Output:
Current Output:
I.e. the rows are repeating. I want rows unique by ID or username, each with its latest timestamp. So it would be the last 3 rows, which have the most recent timestamps.
The code I could scribble out is:
SELECT
t1.*,
t2.new_followers_count,
t2.new_friends_count,
t2.new_timestamp
FROM twitterusers t1
JOIN twittergrowth t2 on (t1.username=t2.username)
I searched quite a few pages/sites, but can't really figure out how to do it. I would appreciate any help. :)
Additionally, I would like to get a LIMIT parameter added to the final result, so that I can paginate the full result.
First you need to find a maximum new_timestamp (latest) within groups of the same user_id and username in twittergrowth table. This is a classic group-wise maximum problem and the subquery tgmax does that. Then you need to join back the same table (tg this time) to get other columns that aren't in the group by clause of subquery and are not used in aggregate functions (like max()). These columns are new_followers_count and new_friends_count.
If you tried to put them in the select list of the subquery, MySQL would return values from an unspecified row of the same group, not necessarily the one with the latest timestamp. This is explained here.
Once you get desired output for twittergrowth table the only thing left is to join twitterusers table to get all other columns.
SELECT tu.*, tg.new_followers_count, tg.new_friends_count, tg.new_timestamp
FROM twitterusers tu
JOIN twittergrowth tg
ON tu.user_id = tg.user_id AND tu.username = tg.username
JOIN
( SELECT tgg.user_id, tgg.username, max(tgg.new_timestamp) as latest_timestamp
FROM twittergrowth tgg
GROUP BY tgg.user_id, tgg.username ) tgmax
ON tg.user_id = tgmax.user_id AND tg.username = tgmax.username
AND tg.new_timestamp = tgmax.latest_timestamp
Note that this query would benefit from a composite index on (user_id,username,new_timestamp) in the twittergrowth table.
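The group-wise maximum query, combined with the LIMIT the question asks for, can be sketched as follows in Python with sqlite3 (a stand-in for MySQL). The sample rows, the fetch_page helper, and the page size are assumptions; for brevity the join uses user_id only, where the query above also matches on username.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE twitterusers  (user_id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE twittergrowth (user_id INTEGER, username TEXT,
                            new_followers_count INTEGER, new_timestamp TEXT);
INSERT INTO twitterusers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO twittergrowth VALUES
    (1, 'alice', 10, '2013-01-01'), (1, 'alice', 12, '2013-01-02'),
    (2, 'bob',   20, '2013-01-01'), (2, 'bob',   25, '2013-01-03'),
    (3, 'carol', 30, '2013-01-02');
""")

# The subquery finds each user's latest timestamp; joining it back to
# twittergrowth picks the full row that carries that timestamp.
LATEST_PER_USER = """
SELECT tu.user_id, tu.username, tg.new_followers_count, tg.new_timestamp
FROM twitterusers tu
JOIN twittergrowth tg ON tu.user_id = tg.user_id
JOIN (SELECT user_id, MAX(new_timestamp) AS latest_timestamp
      FROM twittergrowth
      GROUP BY user_id) tgmax
  ON tg.user_id = tgmax.user_id AND tg.new_timestamp = tgmax.latest_timestamp
ORDER BY tu.user_id
LIMIT ? OFFSET ?
"""


def fetch_page(page, page_size=2):
    """Return one page of results; page numbers start at 0."""
    return conn.execute(LATEST_PER_USER, (page_size, page * page_size)).fetchall()


page0 = fetch_page(0)
page1 = fetch_page(1)
print(page0)
print(page1)
```

An explicit ORDER BY matters for pagination: without it, the database is free to return rows in a different order on each page request.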
You need to group by to achieve your expected output.
GROUP BY id
To limit, or split results into pages, you can simply add LIMIT X,Y where X is the offset of the first row to return (starting at 0) and Y is the maximum number of rows to return.
So a query to pull the expected results you want, but only the first 10 would be like so:
SELECT
t1.*,
t2.new_followers_count,
t2.new_friends_count,
t2.new_timestamp
FROM twitterusers t1
JOIN twittergrowth t2 on t1.username=t2.username
GROUP BY t1.id
LIMIT 0,10

Multiple table select grouped query

We need to grab the last and newest 20 entries from different tables. However, the GROUP BY statement skips records because we are working with LEFT JOIN on tables.
All these records are linked to unique persons in another table. We store these persons' IDs in an array for more queries later.
We have a few tables (in which all those person IDs are stored) and we want to get them sorted and grouped.
The tables are like this:
SELECT lastRecord+personID FROM t1
SELECT lastRecord+personID FROM t2
SELECT lastRecord+personID FROM t3
SELECT lastRecord+personID FROM t4
WHERE t5.Essential_Column_Name = '1'
GROUP BY personID
ORDER BY 'all the latest entries'
LIMIT 20
With that, the relevance of all the latest entries should be equal.
We do have a timestamp column as well. Perhaps that might work better.
Any input is highly appreciated!
For people looking for an answer on this; this is the right post, answer and update to this Q:
UNION mysql gives weird numbered results
With thanks to all for the ideas and providing the paths to the right solution.

How to find duplicates in database?

There are many questions on how to find duplicates in a database, but not with the specific problem that I have.
I have a table with approx. 120000 entries. I need to find duplicates. To find them, I use a php script that is structured like the following:
//get all entries from database
//loop through them
//get entries with greater id
//compare all of them with the original one
//update database (delete duplicate, update information in linked tables, etc.)
It is not possible to sort out all duplicates already in the initial query, because I have to loop through all entries since my duplicate search is sensitive not only to entries that are 100% alike, but also entries that are 90% alike. I use similar_text() for that.
I think the first loop is okay, but looping through all other entries within the loop is just too much. With 120000 entries this would be close to (120000^2)/2 iterations.
So instead of using a loop within the loop, there must be a better way to do it. Do you have any ideas? I thought about using in_array(), but it is not sensitive to something like 90% string similarity, and also doesn't give me the array's fields it found the duplicates in - I would need those to get the entries' ids to update the database correctly.
Any ideas?
Thank you very much!
Charles
UPDATE 1
The query I am using right now is the following:
SELECT a.host_id
FROM host_webs a
JOIN host_webs b ON a.host_id != b.host_id AND a.web = b.web
GROUP BY a.host_id
It shows originals and duplicates perfectly, but I need to get rid of the originals, i.e. the first ones found with the associated data. How can I accomplish that?
You can JOIN the table onto itself and do it all in SQL (I know you say you don't think you can, but I would be surprised if this is the case). All you need to do is put all the columns you use to test for duplicates into the ON clause of the JOIN.
SELECT a.id
FROM tablename a
JOIN tablename b ON a.id != b.id AND a.col1 = b.col1 AND a.col2 = b.col2
GROUP BY a.id
This will return just the ids of the rows where col1 and col2 are duplicated. You can incorporate whatever string comparisons you need into this, the ON clause can be as complicated as you need it to be. For example:
SELECT a.id
FROM tablename a
JOIN tablename b ON a.id != b.id AND (
(a.col1 = b.col1 AND (a.col2 = b.col2 OR a.col3 = b.col3))
OR ((a.col1 = b.col1 OR a.col2 = b.col2) AND a.col3 = b.col3)
OR (SOUNDEX(a.col1) = SOUNDEX(b.col1) AND SOUNDEX(a.col2) = SOUNDEX(b.col2) AND SOUNDEX(a.col3) = SOUNDEX(b.col3))
)
GROUP BY a.id
EDIT
Since all you are actually doing with your query is looking for rows where the web column is identical, this would do the job of finding only the duplicates and not the original "good" records - assuming host_id is numeric and that the "good" record would be the one with the lowest host_id:
SELECT b.host_id
FROM host_webs a
INNER JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
GROUP BY b.host_id
I imagine the end game here would be to remove the duplicates, so if you are feeling brave you could actually delete them in one go:
DELETE b.*
FROM host_webs a
INNER JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
The GROUP BY is not necessary in the DELETE statement because it doesn't matter if you try and delete the same row more than once in a single statement.
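The self-join that keeps the lowest host_id as the "good" record can be tried out with a short Python and sqlite3 sketch. The sample rows are assumptions. SQLite has no multi-table DELETE, so this sketch deletes by the collected ids instead of using the MySQL DELETE ... FROM ... JOIN form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE host_webs (host_id INTEGER PRIMARY KEY, web TEXT);
INSERT INTO host_webs VALUES
    (1, 'example.com'), (2, 'example.org'), (3, 'example.com'),
    (4, 'example.com'), (5, 'example.net'), (6, 'example.org');
""")

# b is a duplicate whenever an earlier row a (lower host_id) has the same web.
DUPLICATES = """
SELECT b.host_id
FROM host_webs a
JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
GROUP BY b.host_id
ORDER BY b.host_id
"""
duplicate_ids = [row[0] for row in conn.execute(DUPLICATES)]

# Remove the duplicates one id at a time (stand-in for the MySQL join delete).
conn.executemany(
    "DELETE FROM host_webs WHERE host_id = ?",
    [(host_id,) for host_id in duplicate_ids],
)

remaining = conn.execute(
    "SELECT host_id, web FROM host_webs ORDER BY host_id"
).fetchall()
print(duplicate_ids, remaining)
```

Running the SELECT first and inspecting duplicate_ids before deleting is a cheap safety check on real data.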
If you're doing a one-time removal of duplicate items, I wouldn't bother writing a PHP script - it's cleaner to do it in SQL.
The general algorithm for removing duplicates that I find works the best is:
1. duplicate the table
2. truncate the original table
3. set a unique index on whichever columns need to be unique
4. reinsert the rows using either INSERT IGNORE INTO original_table SELECT * FROM duplicate_table or REPLACE INTO original_table SELECT * FROM duplicate_table
5. fix linked tables - remove orphaned rows (DELETE x FROM x LEFT JOIN original_table ON (...) WHERE original_table.id IS NULL)
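The five steps above can be sketched in Python with sqlite3. Several things here are assumptions: the sample data, the web column chosen as the unique key, and the linked table. SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE, has no TRUNCATE, and has no join-based DELETE, so the closest equivalents are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE original_table (id INTEGER PRIMARY KEY, web TEXT);
CREATE TABLE linked (original_id INTEGER, note TEXT);
INSERT INTO original_table VALUES (1, 'a.com'), (2, 'b.com'), (3, 'a.com'), (4, 'b.com');
INSERT INTO linked VALUES (1, 'keep'), (3, 'orphan');
""")

# 1. Duplicate the table.
conn.execute("CREATE TABLE duplicate_table AS SELECT * FROM original_table")

# 2. Empty the original (DELETE without WHERE stands in for TRUNCATE).
conn.execute("DELETE FROM original_table")

# 3. Put a unique index on the column that must be unique.
conn.execute("CREATE UNIQUE INDEX uniq_web ON original_table (web)")

# 4. Reinsert; rows that would violate the index are silently skipped.
#    Ordering by id means the lowest id of each group is the one kept.
conn.execute(
    "INSERT OR IGNORE INTO original_table "
    "SELECT * FROM duplicate_table ORDER BY id"
)

# 5. Remove rows in linked tables that now point at deleted rows.
conn.execute(
    "DELETE FROM linked WHERE original_id NOT IN (SELECT id FROM original_table)"
)

remaining = conn.execute("SELECT id, web FROM original_table ORDER BY id").fetchall()
linked_rows = conn.execute("SELECT original_id, note FROM linked").fetchall()
print(remaining, linked_rows)
```

Step 5 here simply drops orphaned rows; if the linked data should instead be re-pointed at the surviving record, that has to happen before the duplicates are discarded.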

Compare database value to number of rows in another table

For each item in the first table, there is a 'numberOf' field. The related table must contain exactly that many rows for the item. These are like reservation rows, so multiple users can book the item at the same time. This synchronisation sometimes drifts, and there are more rows than the 'numberOf' value, or fewer.
So I want to display a table that outputs the 'numberOf' from the first table, and the number of rows that correspond to it in the other table. They are linked by the item ID. Hope this isn't too confusing. The query is output with a do while loop. Here is the query I have so far anyway:
$querySync = sprintf("SELECT
COUNT(reserve_id), item_id, details, numberOf
FROM
reservations
JOIN
items ON item_id = itemID_reserved
WHERE
itemID_reserved = 1 ");
So at the moment it counts the number of rows in the reservations table. It then joins the items table so I can display the description and numberOf etc. Of course, at the moment it only outputs the item with ID 1. But I can't seem to get it to go through each item, check its numberOf, and compare it to the number of rows in the reservations table.
The idea is to have it all on one column and at the end of the row print if it is out of sync etc. I then need to rebuild the rows in the reservations table to match the numberOf.
Sorry, that's a long one!
SELECT COUNT(reserve_id), item_id, details, numberOf,
COUNT(reserve_id) > numberOf AS overbook
FROM items
LEFT JOIN
reservations
ON itemID_reserved = item_id
GROUP BY
item_id
It might be easier to just directly calculate which items are "out of sync":
select i.item_id
from items i LEFT JOIN reservations r on (i.item_id = r.itemID_reserved)
group by i.item_id, i.numberOf
having count(r.itemID_reserved) <> i.numberOf
I'm making some assumptions there about which tables have which fields, but it should be sufficiently illustrative.
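A runnable sketch of the per-item comparison, in Python with sqlite3 as a stand-in for PHP and MySQL. The column names follow the question; the sample rows and the sync_status helper are assumptions. Starting from items with a LEFT JOIN means an item with no reservation rows at all still appears, with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, details TEXT, numberOf INTEGER);
CREATE TABLE reservations (reserve_id INTEGER PRIMARY KEY, itemID_reserved INTEGER);
INSERT INTO items VALUES (1, 'projector', 2), (2, 'laptop', 1), (3, 'camera', 2);
INSERT INTO reservations (itemID_reserved) VALUES (1), (1), (2), (2), (2);
""")

# One row per item: the expected count next to the actual number of rows.
rows = conn.execute("""
SELECT i.item_id, i.details, i.numberOf, COUNT(r.reserve_id) AS reserved
FROM items i
LEFT JOIN reservations r ON r.itemID_reserved = i.item_id
GROUP BY i.item_id, i.details, i.numberOf
ORDER BY i.item_id
""").fetchall()


def sync_status(number_of, reserved):
    """Describe how the reservation row count compares with numberOf."""
    if reserved > number_of:
        return "too many rows"
    if reserved < number_of:
        return "too few rows"
    return "in sync"


report = [(item_id, sync_status(number_of, reserved))
          for item_id, _details, number_of, reserved in rows]
print(report)
```

The difference between numberOf and the counted rows also tells the rebuild step how many reservation rows to add or remove for each item.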
