Combining duplicate rows with different values in columns - php

I have a script written to import a CSV of my client's product inventory. The problem is there's a bug in the software they use to track their inventory that will duplicate a product with different values for their inventory.
So when I import the CSV they send me there are duplicate rows of the same product with different inventories. Example:
id | product | cases | unit
--------------------------------
1 | MF003 | 3 | 7
2 | MF004 | 5 | 6
3 | MF005 | 1 | 9
4 | MF005 | 7 | 2
5 | MF006 | 2 | 1
The MF005 product has two rows. What I need is this:
id | product | cases | unit
--------------------------------
1 | MF003 | 3 | 7
2 | MF004 | 5 | 6
3 | MF005 | 8 | 11
5 | MF006 | 2 | 1
You'll notice that MF005 is now one row with both cases and units added up correctly.
I suppose the better approach here would be to do this using a SELECT query instead of dealing with it beforehand via INSERT, but if there's a smarter way to do this by INSERTing, I'm definitely open to it.

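You can insert and update in one statement. With product as the primary key, a second row for the same product hits the duplicate key and runs the UPDATE part instead of inserting a new row: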
CREATE TABLE importdata(
id INT,
product VARCHAR(200) DEFAULT "" PRIMARY KEY,
cases INT,
unit INT
);
INSERT INTO importdata(id,product,cases,unit) VALUES (3,"MF005",1,9)
ON DUPLICATE KEY UPDATE cases=cases+VALUES(cases), unit=unit+VALUES(unit);
Results in:
3 | MF005 | 1 | 9
Executing the second insert:
INSERT INTO importdata(id,product,cases,unit) VALUES (4,"MF005",7,2)
ON DUPLICATE KEY UPDATE cases=cases+VALUES(cases), unit=unit+VALUES(unit);
Results in:
3 | MF005 | 8 | 11
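The SELECT approach mentioned in the question can be sketched as well. This is only an illustration, not the asker's import script: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name raw_import and the alias first_id are made up for the example. The GROUP BY query itself is plain SQL and should carry over to MySQL unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL import table
# (raw_import is a hypothetical name).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_import (id INTEGER, product TEXT, cases INTEGER, unit INTEGER)"
)
rows = [
    (1, "MF003", 3, 7),
    (2, "MF004", 5, 6),
    (3, "MF005", 1, 9),
    (4, "MF005", 7, 2),
    (5, "MF006", 2, 1),
]
conn.executemany("INSERT INTO raw_import VALUES (?, ?, ?, ?)", rows)

# One row per product: keep the lowest id and add up the inventory columns.
combined = conn.execute(
    """
    SELECT MIN(id) AS first_id, product, SUM(cases) AS cases, SUM(unit) AS unit
    FROM raw_import
    GROUP BY product
    ORDER BY first_id
    """
).fetchall()

for row in combined:
    print(row)
```

The duplicated MF005 rows come back as a single row with cases 8 and unit 11, while the raw import stays untouched.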


Make a (composite) primary key for an 'item' table where the incremental 'item_id' resets per 'order_id' foreign key [duplicate]

Teacher wants us to discuss how to implement the 'line_item' table of a receipt database for a global business.
He wants us to discuss the primary (composite?) key of such a 'line item' table. If our global business is printing thousands of receipts (with hundreds of items) every minute, how much do we need to consider the possibility of hitting the upper limits of an INT for the line_item_id? Is there a more effective way to handle the line_item id?
"Note: Gaps in numbering (generally caused by deletions) are okay. Also look for possible race conditions."
Consider this: You add 3 receipts with 3 items each. You delete the 2nd item of the 3rd receipt. Then add a new item to the 3rd receipt.
So I'm picturing something like this to start with: (Where receipt #1 and #3 were originally identical before the add/delete exercise. Note the numbering on #3.)
| receipt_id (FK) | line_id (INT) | line_text (TINYTEXT) |
---------------------------------------------------------------------
| 1 | 1 | 'Apple' |
| 1 | 2 | 'Banana' |
| 1 | 3 | 'Coconut' |
---------------------------------------------------------------------
| 2 | 1 | 'Apple' |
| 2 | 2 | 'Apple' |
| 2 | 3 | 'Apple' |
---------------------------------------------------------------------
| 3 | 1 | 'Apple' |
| 3 | 3 | 'Coconut' |
| 3 | 4 | 'Dates' |
---------------------------------------------------------------------
We've been learning SQL and PHP (mySQLi in innoDB if that matters much.)
However I feel this may be accomplished in SQL alone, right?
It appears I should use the MAX() function in the second column to make a COMPOSITE PRIMARY KEY and to avoid worrying about hitting the upper INT threshold.
I'm trying to concoct a MySQL INSERT that would do something dynamic with its numbering of the receipt line, like:
INSERT INTO line_item (3, VALUES MAX(receipt_line)+1, 'Dates')
At the end he has other thoughts like: "What if you now delete and add an item at the end of the second receipt?"
I imagine these are the operations he's looking for:
-- Delete the last line of the second receipt
DELETE FROM line_item WHERE receipt_id=2 AND line_id=MAX(line_id)
-- Add a new line to the second receipt.
INSERT INTO line_item VALUES ( 2, MAX(line_id)+1, 'Watermelon')
And I expect results like:
| receipt_id (FK) | line_id (INT) | line_text (TINYTEXT) |
---------------------------------------------------------------------
| 1 | 1 | 'Apple' |
| 1 | 2 | 'Banana' |
| 1 | 3 | 'Coconut' |
---------------------------------------------------------------------
| 2 | 1 | 'Apple' |
| 2 | 2 | 'Apple' |
| 2 | 3 | 'Watermelon' |
---------------------------------------------------------------------
| 3 | 1 | 'Apple' |
| 3 | 3 | 'Coconut' |
| 3 | 4 | 'Dates' |
---------------------------------------------------------------------
But instead I get a bunch of errors and weird results.
Do I need to use PHP to calculate the line_id instead? What am I missing?
You can't use just MAX(line_id), because that gets the maximum value of line_id across all receipts. What you want is the maximum line_id within one specific receipt, something like SELECT MAX(line_id) FROM line_item WHERE receipt_id = 3.
You could build that into a single SQL query, using a subquery to get the new line_id value. But I think it would be better done in PHP, in two separate queries: determine the new line_id first, then insert your new line item.
Also think about how your receipt application will delete a line item. If I want to delete the second Apple line item in receipt 2, I need to know its line_id. So your application will need to know the line_id of each new item it adds.
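Here is a rough sketch of that two-query idea. It is written in Python with the built-in sqlite3 module rather than PHP/mysqli, purely so it can be run on its own; the helper names add_line and delete_line are invented for the example. The composite primary key (receipt_id, line_id) is an assumption taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE line_item (
        receipt_id INTEGER NOT NULL,
        line_id    INTEGER NOT NULL,
        line_text  TEXT NOT NULL,
        PRIMARY KEY (receipt_id, line_id)
    )
    """
)


def add_line(conn, receipt_id, text):
    """Insert a line on one receipt and return the line_id it was given."""
    with conn:  # commits on success, rolls back on error
        # Highest line_id for this receipt only; 0 when the receipt is empty.
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(line_id), 0) + 1 FROM line_item WHERE receipt_id = ?",
            (receipt_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO line_item (receipt_id, line_id, line_text) VALUES (?, ?, ?)",
            (receipt_id, next_id, text),
        )
    return next_id


def delete_line(conn, receipt_id, line_id):
    """Delete one line; the caller has to know which line_id it wants gone."""
    with conn:
        conn.execute(
            "DELETE FROM line_item WHERE receipt_id = ? AND line_id = ?",
            (receipt_id, line_id),
        )


# Replay the exercise from the question.
for text in ("Apple", "Banana", "Coconut"):
    add_line(conn, 1, text)
for _ in range(3):
    add_line(conn, 2, "Apple")
for text in ("Apple", "Banana", "Coconut"):
    add_line(conn, 3, text)

delete_line(conn, 3, 2)  # drop 'Banana' from receipt 3
dates_id = add_line(conn, 3, "Dates")
delete_line(conn, 2, 3)  # drop the last line of receipt 2
melon_id = add_line(conn, 2, "Watermelon")

receipt_3 = conn.execute(
    "SELECT line_id, line_text FROM line_item WHERE receipt_id = 3 ORDER BY line_id"
).fetchall()
```

On the race condition: if two clients compute the same next line_id at the same moment, the composite primary key rejects the second insert, so the application can catch that error and retry. On MySQL you would probably also want both queries inside one transaction.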

Updating table with unique column

A table contains the following data, is using INNODB, has a UNIQUE constraint on position/fk, and doesn't allow NULL for position.
+----+----------+-----+
| id | position | fk |
+----+----------+-----+
| 1 | 1 | 123 |
| 2 | 2 | 123 |
| 3 | 3 | 123 |
| 4 | 4 | 123 |
| 5 | 5 | 123 |
| 6 | 6 | 123 |
| 7 | 7 | 123 |
| 8 | 8 | 123 |
| 9 | 9 | 123 |
| 10 | 10 | 123 |
+----+----------+-----+
PHP receives a request to update the table to the following. The request can be in whatever format is most convenient, such as [2,1,4,3,6,5,8,7,10,9] or [{"id":1, "position":2}, ... ], etc.
+----+----------+-----+
| id | position | fk |
+----+----------+-----+
| 1 | 2 | 123 |
| 2 | 1 | 123 |
| 3 | 4 | 123 |
| 4 | 3 | 123 |
| 5 | 6 | 123 |
| 6 | 5 | 123 |
| 7 | 8 | 123 |
| 8 | 7 | 123 |
| 9 | 10 | 123 |
| 10 | 9 | 123 |
+----+----------+-----+
I've confirmed that SET unique_checks=0; will not allow unique checks to be temporarily disabled, and don't wish to actually remove the unique index, update the table, and reapply the unique index.
How can this table be updated?
If there is no simple means to do so, I thought of a couple of options, but don't like them:
Allowing NULL in position. Is there a way to temporarily allow NULL similar to how SET FOREIGN_KEY_CHECKS=0; can disable foreign keys?
First delete all the records and then reinsert them. This might result in performance issues as there are indexes on the table which will need to be recreated.
All I can think is that you need to first change all the positions to some other values that aren't in the range of new position values you ultimately need to set, but are still unique within the rows.
An easy way to do this, assuming your position column is a signed integer, is to set all the positions to their opposite (negative) value. They'll remain unique, but they won't be in the set of the new values.
You can do this in a transaction along with your subsequent updates, so no other concurrent transaction will ever see the negative values.
BEGIN;
UPDATE MyTable SET position = -position;
UPDATE MyTable SET position = 2 WHERE id = 1;
...etc...
COMMIT;
This is a hack. The sign bit of the integer is being used for a purpose other than showing negative numbers.
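For what it's worth, here is the negate-then-update idea as a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for InnoDB (SQLite also checks the unique constraint row by row), and the table name my_table and the new_positions mapping are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE my_table (
        id       INTEGER PRIMARY KEY,
        position INTEGER NOT NULL,
        fk       INTEGER NOT NULL,
        UNIQUE (position, fk)
    )
    """
)
conn.executemany(
    "INSERT INTO my_table (id, position, fk) VALUES (?, ?, 123)",
    [(i, i) for i in range(1, 11)],
)
conn.commit()

# Requested layout as {id: new_position}: swap each neighbouring pair.
new_positions = {i: (i + 1 if i % 2 else i - 1) for i in range(1, 11)}

# Updating in place collides with the row that still holds the target position.
try:
    with conn:
        conn.execute("UPDATE my_table SET position = 2 WHERE id = 1")
    direct_update_failed = False
except sqlite3.IntegrityError:
    direct_update_failed = True

# One transaction, so the intermediate negative values are never committed alone.
with conn:
    conn.execute("UPDATE my_table SET position = -position WHERE fk = 123")
    conn.executemany(
        "UPDATE my_table SET position = ? WHERE id = ?",
        [(pos, row_id) for row_id, pos in new_positions.items()],
    )

result = conn.execute("SELECT id, position FROM my_table ORDER BY id").fetchall()
```

The first attempt fails on the unique constraint, while the negated version goes through because no negative position can clash with a new positive one.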

tracking mysql changes in the long term -- for formation of line graphs etc

I have a table I manage for a competition I run, currently it has 4 columns.
ID
Name
Tag
Rating
Effectively, I want to track the changes of the tag (and the dates that it changes) for as long as possible to provide my users a line graph of their play in the competition.
I've tried looking around for a way of doing it, but with my limited knowledge all I could find were ways to save only the last change, or ways to save the whole table whenever a change is made. In my case the changes are to individual users (rows), so they would need to be tracked individually.
Any help would be greatly appreciated.
edit:
sample
id | name | tag | rating
-------------------------
1 | Khar | 5 | 800
2 | SantaCruz | 3 | 850
3 | Sion | 2 | 900
4 | VT | 1 | 758
5 | newFort | 4 | 535
6 | Bandra | 6 | 483
7 | Worli | 10 | 888
8 | Sanpada | 11 | 999
9 | Joe | 9 | 779
10 | Sally | 15 | 888
11 | Elphiston | 17 | 525
12 | Currey Road | 31 | 879
the tag is effectively the ranking that I want to track in a longer term.
When it comes to the desired outcome, I am not sure what is possible. Effectively I would want to be able to create a line graph for every individual, (y axis: tag, x axis: dates) so another table which tracks all the changes and their dates I guess would be ideal.
You could store the calculated tag values in a separate table ttbl (you never change any values but instead always insert them there!):
tid | uid | tag | changed
------------------------------------------
1 | 1 | 5 | '2017-09-24T15:34:23'
2 | 1 | 8 | '2017-09-24T15:36:23'
3 | 2 | 3 | '2017-09-24T15:38:23'
4 | 3 | 2 | '2017-09-24T15:40:23'
5 | 4 | 1 | '2017-09-24T16:34:23'
6 | 5 | 4 | '2017-09-24T16:44:23'
7 | 1 | 5 | '2017-09-24T18:14:23'
8 | 6 | 6 | '2017-09-24T18:24:23'
9 | 1 | 9 | '2017-09-24T18:34:23'
10 | 7 | 10 | '2017-09-24T18:44:23'
11 | 1 | 5 | '2017-09-25T15:31:23'
12 | 8 | 11 | '2017-09-25T15:32:23'
13 | 9 | 9 | '2017-09-25T15:33:23'
14 | 10 | 15 | '2017-09-25T15:34:23'
15 | 11 | 17 | '2017-09-25T15:35:23'
16 | 12 | 31 | '2017-09-25T15:36:23'
Note that the tag value for uid=1 changes from 5 to 8, then to 5, 9 and back to 5 again over time.
Now, with a special select you fish out the latest values for a specific cut-off point in time like this:
SELECT t.* FROM ttbl t
INNER JOIN (SELECT uid i, max(changed) dt
FROM ttbl WHERE changed<'20170925'
GROUP BY uid) u ON i=uid AND dt=changed;
The date value '20170925' here represents the cut-off time. Feel free to join this to your table with the current rating values or other stuff.
You can find a little demo here: http://rextester.com/GZNL52987
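In case the demo link goes away, the same idea can be tried locally. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module instead of MySQL, loads only a few of the sample rows, and the helper names latest_tags and history are invented. Timestamps are stored as ISO text, so the cut-off is written as '2017-09-25' here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ttbl (tid INTEGER PRIMARY KEY, uid INTEGER, tag INTEGER, changed TEXT)"
)
conn.executemany(
    "INSERT INTO ttbl (tid, uid, tag, changed) VALUES (?, ?, ?, ?)",
    [
        (1, 1, 5, "2017-09-24T15:34:23"),
        (2, 1, 8, "2017-09-24T15:36:23"),
        (3, 2, 3, "2017-09-24T15:38:23"),
        (7, 1, 5, "2017-09-24T18:14:23"),
        (9, 1, 9, "2017-09-24T18:34:23"),
        (11, 1, 5, "2017-09-25T15:31:23"),
    ],
)


def latest_tags(conn, cutoff):
    """Return {uid: tag} using each uid's newest row strictly before cutoff."""
    rows = conn.execute(
        """
        SELECT t.uid, t.tag
        FROM ttbl t
        INNER JOIN (SELECT uid AS i, MAX(changed) AS dt
                    FROM ttbl WHERE changed < ?
                    GROUP BY uid) u ON u.i = t.uid AND u.dt = t.changed
        """,
        (cutoff,),
    ).fetchall()
    return dict(rows)


def history(conn, uid):
    """Return one user's (changed, tag) points in time order, ready to plot."""
    return conn.execute(
        "SELECT changed, tag FROM ttbl WHERE uid = ? ORDER BY changed", (uid,)
    ).fetchall()


print(latest_tags(conn, "2017-09-25"))
print(history(conn, 1))
```

history() gives the points for one user's line graph (x axis: changed, y axis: tag), and latest_tags() is the cut-off query from above.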
You can use a trigger to store ranking values inside another table.
I would define the new table like this :
CREATE TABLE `tag_history` (
`id` int(11) NOT NULL,
`tag` int(8) NOT NULL, -- INT is ample for a ranking; the (8) is only a display width
`player` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);
ALTER TABLE `tag_history` ADD PRIMARY KEY (`id`);
ALTER TABLE `tag_history` MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
Using this trigger :
DELIMITER $$
CREATE TRIGGER tag_history_after_update -- trigger name is a placeholder
AFTER UPDATE ON `my_table_name`
FOR EACH ROW
BEGIN
IF NEW.tag <> OLD.tag THEN
INSERT INTO `tag_history` (tag, player) VALUES (NEW.tag, NEW.name);
END IF;
END$$
DELIMITER ;
Having CURRENT_TIMESTAMP as default value for timestamp will automatically store the date and time of your UPDATE inside the row.
Or, because you tagged this question PHP, you can simply INSERT INTO tag_history when updating your table within your PHP script, without having to use any trigger.
Either way, you can only graph changes recorded from now on, because your existing rows carry no timestamps.
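The no-trigger variant (writing to tag_history from the script that does the update) could look roughly like this. It is a sketch only: Python with the built-in sqlite3 module stands in for PHP and MySQL, and the players table name and the set_tag helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (
        id INTEGER PRIMARY KEY, name TEXT, tag INTEGER, rating INTEGER
    );
    CREATE TABLE tag_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        tag INTEGER NOT NULL,
        player TEXT NOT NULL,
        timestamp TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO players VALUES (1, 'Khar', 5, 800), (2, 'SantaCruz', 3, 850);
    """
)


def set_tag(conn, player_id, new_tag):
    """Update a player's tag and log it, but only when the value changes."""
    with conn:  # both statements commit together or not at all
        name, old_tag = conn.execute(
            "SELECT name, tag FROM players WHERE id = ?", (player_id,)
        ).fetchone()
        if new_tag == old_tag:
            return False
        conn.execute("UPDATE players SET tag = ? WHERE id = ?", (new_tag, player_id))
        conn.execute(
            "INSERT INTO tag_history (tag, player) VALUES (?, ?)", (new_tag, name)
        )
    return True


changed = [set_tag(conn, 1, 8), set_tag(conn, 1, 8), set_tag(conn, 1, 5)]
logged = conn.execute(
    "SELECT player, tag FROM tag_history ORDER BY id"
).fetchall()
```

Like the trigger, this skips updates that don't actually change the tag, so the history table only grows when there is a new point to plot.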

check if comma separated values exist in table

I have a table: users_group
id | group_id | user_ids
---|----------|---------
1 | 1 | 3
2 | 1 | 2
3 | 3 | 2
4 | 2 | 3
5 | 2 | 4
6 | 2 | 2
The condition is that each user_ids value can be inserted only once. But in the case above, the same value is inserted for more than one group_id.
I am using this query inside a foreach loop to insert the user_ids field:
INSERT INTO users_group (group_id, user_ids) VALUES(2,3)
How can I prevent duplicate user_ids from being inserted?
Is there a better query?
Yes, there is a better way to solve this problem, but the solution isn't in the query.
Instead of the user_ids column, you should create a new column called user_id that holds a single user per row, and add data like this:
id | group_id | user_id
1 | 1 | 3
2 | 1 | 4
3 | 2 | 3
4 | 2 | 4
5 | 2 | 2
6 | 3 | 2
7 | 3 | 3
8 | 3 | 4
This is called a many-to-many relation and makes everything easier. After that you only need to JOIN the tables.
I think you want to start by creating a unique index on the column that should hold only unique values. In this case that would be the user_ids column.
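To illustrate the unique-index suggestion, here is a minimal sketch. It uses Python's built-in sqlite3 module in place of MySQL so it runs standalone; the index name uq_users_group_user and the add_user helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users_group (id INTEGER PRIMARY KEY, group_id INTEGER, user_ids INTEGER)"
)
# The unique index is what enforces "a user can be inserted only once".
conn.execute("CREATE UNIQUE INDEX uq_users_group_user ON users_group (user_ids)")


def add_user(conn, group_id, user_id):
    """Try to put a user in a group; return False if the user is already placed."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO users_group (group_id, user_ids) VALUES (?, ?)",
                (group_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Same inserts as the first five rows of the question's table.
results = [add_user(conn, g, u) for g, u in [(1, 3), (1, 2), (3, 2), (2, 3), (2, 4)]]
stored = conn.execute(
    "SELECT group_id, user_ids FROM users_group ORDER BY id"
).fetchall()
```

The database rejects the repeated users 2 and 3, so the loop doesn't need to check for duplicates itself. In MySQL the equivalent would be a UNIQUE index on user_ids, with the duplicate-key error handled in the PHP loop.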

MySQL query how to get list of all distinct values from columns that contain multiple string values?

I am trying to get a list of distinct values from the columns out of a table.
Each column can contain multiple comma delimited values. I just want to eliminate duplicate values and come up with a list of unique values.
I know how to do this with PHP by grabbing the entire table and then looping the rows and placing the unique values into a unique array.
But can the same thing be done with a MySQL query?
My table looks something like this:
| ID | VALUES |
---------------------------------------------------
| 1 | Acadian,Dart,Monarch |
| 2 | Cadillac,Dart,Lincoln,Uplander |
| 3 | Acadian,Freestar,Saturn |
| 4 | Cadillac,Uplander |
| 5 | Dart |
| 6 | Dart,Cadillac,Freestar,Lincoln,Uplander |
So my list of unique VALUES would then contain:
Acadian
Cadillac
Dart
Freestar
Lincoln
Monarch
Saturn
Uplander
Can this be done with a MySQL call alone, or is there a need for some PHP sorting as well?
Thanks
Why would you store your data like this in a database? You deliberately nullify all the extensive querying features you would want to use a database for in the first place. Instead, have a table like this:
| valueID | groupID | name |
----------------------------------
| 1 | 1 | Acadian |
| 2 | 1 | Dart |
| 3 | 1 | Monarch |
| 4 | 2 | Cadillac |
| 2 | 2 | Dart |
Notice the different valueID for Dart compared to Matthew's suggestion. That way identical values share the same valueID (you may want to refer to these later on, and you don't want to make the same mistake of not thinking ahead again, do you?). Then make the primary key contain both the valueID and the groupID.
Then, to answer your actual question, you can retrieve all distinct values through this query:
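SELECT name FROM mytable GROUP BY valueID, name
(name is listed in the GROUP BY too, so the query also runs when ONLY_FULL_GROUP_BY is enabled. GROUP BY may perform better here than a DISTINCT, though MySQL often optimizes the two the same way, so check with EXPLAIN.)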
I would suggest selecting (and splitting) into a temp table and then making a call against that.
First, there is apparently no split function in MySQL http://blog.fedecarg.com/2009/02/22/mysql-split-string-function/ (this is three years old so someone can comment if this has changed?)
Push all of it into a temp table and select from there.
Better would be if it is possible to break these out into a table with this structure:
| ID | VALUES |AttachedRecordID |
---------------------------------------------------------------------
| 1 | Acadian | 1 |
| 2 | Dart | 1 |
| 3 | Monarch | 1 |
| 4 | Cadillac | 2 |
| 5 | Dart | 2 |
etc.
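As a rough sketch of that restructuring: split each comma-delimited string once, store one row per value, and let the database do the de-duplication. The example uses Python's built-in sqlite3 module as a stand-in for PHP and MySQL, and the names cars, car_values and attached_record_id are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER PRIMARY KEY, vals TEXT)")
conn.executemany(
    "INSERT INTO cars (id, vals) VALUES (?, ?)",
    [
        (1, "Acadian,Dart,Monarch"),
        (2, "Cadillac,Dart,Lincoln,Uplander"),
        (3, "Acadian,Freestar,Saturn"),
        (4, "Cadillac,Uplander"),
        (5, "Dart"),
        (6, "Dart,Cadillac,Freestar,Lincoln,Uplander"),
    ],
)

# One row per value, pointing back at the record it came from.
conn.execute(
    """
    CREATE TABLE car_values (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        attached_record_id INTEGER NOT NULL
    )
    """
)
split_rows = [
    (name.strip(), record_id)
    for record_id, vals in conn.execute("SELECT id, vals FROM cars ORDER BY id")
    for name in vals.split(",")
]
conn.executemany(
    "INSERT INTO car_values (name, attached_record_id) VALUES (?, ?)", split_rows
)

unique_names = [
    name
    for (name,) in conn.execute("SELECT DISTINCT name FROM car_values ORDER BY name")
]
print(unique_names)
```

Once the values live in their own rows, the list of unique values is a single SELECT DISTINCT, with no sorting needed in application code.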
