I am writing a converter to transfer data from old systems to new systems. I am using PHP and MySQL.
I have one table that contains millions of records, many of them duplicates. I want to transfer that data into a new table and remove all the duplicate entries. I am using the following queries and pseudocode to perform this task:
select *
from table1
insert into table2
ON DUPLICATE KEY UPDATE customer_information = concat('$firstName',',','$lastName')
It takes ages to process one table :(
I am wondering whether it is possible to use GROUP BY and get all the grouped records automatically,
rather than going through each record and checking for duplicates?
For example
select *
from table1
group by firstName, lastName
insert into table 2 only one record and add all users'
first last name into column ALL_NAMES with comma
EDIT
Each customer has several records with different information. A row counts as a duplicate if the user's first and last names are the same. In the new table, we will add just one row per customer and store the products they bought in separate columns (we have only 4 products).
I don't know what you are trying to do with customer_information, but if you just want to transfer the non-duplicated set of data from one table to another, this will work:
INSERT IGNORE INTO table2(field1, field2, ... fieldx)
SELECT DISTINCT field1, field2, ... fieldx
FROM table1;
DISTINCT will take care of rows that are exact duplicates. But if you have rows that are only partial duplicates (like the same last and first names but a different email) then IGNORE can help. If you put a unique index on table2(lastname,firstname) then IGNORE will make sure that only the first record with lastnameX, firstnameY from table1 is inserted. Of course, you might not like which record of a pair of partial duplicates is chosen.
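A minimal sketch of the unique-index-plus-IGNORE idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it can be run anywhere. The sample rows and the email column are made up for illustration, and SQLite spells the statement INSERT OR IGNORE where MySQL uses INSERT IGNORE.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (lastname TEXT, firstname TEXT, email TEXT);
    CREATE TABLE table2 (lastname TEXT, firstname TEXT, email TEXT,
                         UNIQUE (lastname, firstname));
""")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("Smith", "Ann", "ann@example.com"),
        ("Smith", "Ann", "ann@example.com"),      # exact duplicate
        ("Smith", "Ann", "a.smith@example.com"),  # partial duplicate
        ("Jones", "Bob", "bob@example.com"),
    ],
)

# DISTINCT drops the exact duplicate; the unique index plus IGNORE
# silently skips the partial duplicate instead of raising an error.
conn.execute("""
    INSERT OR IGNORE INTO table2 (lastname, firstname, email)
    SELECT DISTINCT lastname, firstname, email FROM table1
""")

copied = conn.execute(
    "SELECT lastname, firstname FROM table2 ORDER BY lastname"
).fetchall()
print(copied)
```

Which of Ann Smith's two emails survives is not something the sketch controls, which is exactly the caveat mentioned above.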
ETA
Now that you've updated your question, it appears that you want to put the values of multiple rows into one field. This is, generally speaking, a bad idea, because denormalizing your data this way makes it much less accessible. Also, if you group by (lastname, firstname), every row in a group has the same name, so an allnames column would hold nothing new. Because of this, my example uses allemails instead. In any event, if you really need to do this, here's how:
INSERT INTO table2(lastname, firstname, allemails)
SELECT lastname, firstname, GROUP_CONCAT(email) as allemails
FROM table1
GROUP BY lastname, firstname;
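Here is a small runnable sketch of the same GROUP_CONCAT query, again with sqlite3 standing in for MySQL (both provide GROUP_CONCAT with a comma as the default separator). The sample rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL here; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (lastname TEXT, firstname TEXT, email TEXT);
    CREATE TABLE table2 (lastname TEXT, firstname TEXT, allemails TEXT);
""")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("Smith", "Ann", "ann@example.com"),
        ("Smith", "Ann", "a.smith@example.com"),
        ("Jones", "Bob", "bob@example.com"),
    ],
)

# One output row per (lastname, firstname), emails joined with commas.
conn.execute("""
    INSERT INTO table2 (lastname, firstname, allemails)
    SELECT lastname, firstname, GROUP_CONCAT(email)
    FROM table1
    GROUP BY lastname, firstname
""")

rows = conn.execute(
    "SELECT lastname, firstname, allemails FROM table2 ORDER BY lastname"
).fetchall()
for row in rows:
    print(row)
```

One thing to watch in MySQL: the result of GROUP_CONCAT is truncated at group_concat_max_len (1024 by default), so very large groups may need that setting raised.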
If they really are duplicate rows (every field is the same), then you can use:
select DISTINCT * from table1
instead of :
select * from table1
Related
I have this query in php. It's an insert select copying from table2, but I need to get the IDs of the newly created rows and store them into an array. Here is my code:
$sql = "INSERT INTO table1 SELECT distinct * from table2";
$db->query($sql);
I could revert the flow starting with a select on table2 and making all single inserts but it would slow down the script on a big table. Ideas?
You could lock the table, insert the rows, and get the ID of the last item inserted, and then unlock; that way you know that the IDs will be contiguous as no other concurrent user could have changed them. Locking and unlocking is something you want to use with caution though.
An alternative approach could be to use one of the columns in the table - either an 'updated' datetime column, or an insert-id column (for which you put in a value that will be the same across all of your rows.)
That way you can do a subsequent SELECT of the IDs back out of the database matching either the updated time or your chosen insert ID.
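A rough sketch of the insert-ID column approach, with sqlite3 standing in for MySQL. The batch_id column, the name column and the use of a UUID as the shared value are all assumptions made for the example, not part of the original schema.

```python
import sqlite3
import uuid

# SQLite stands in for MySQL; batch_id is a hypothetical extra column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (name TEXT);
    CREATE TABLE table1 (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         name TEXT, batch_id TEXT);
    INSERT INTO table2 VALUES ('a'), ('a'), ('b');
    INSERT INTO table1 (name, batch_id) VALUES ('older row', 'previous-batch');
""")

# One value shared by every row written by this INSERT ... SELECT.
batch_id = uuid.uuid4().hex
conn.execute(
    "INSERT INTO table1 (name, batch_id) SELECT DISTINCT name, ? FROM table2",
    (batch_id,),
)

# Read the new IDs back by matching on the shared value.
new_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM table1 WHERE batch_id = ? ORDER BY id", (batch_id,)
    )
]
print(new_ids)
```

This keeps the single bulk insert and needs no table lock, at the cost of one extra column.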
I have a voting script which pulls out the number of votes per user.
Everything is working, except I need to now display the number of votes per user in order of number of votes. Please see my database structure:
Entries:
UserID, FirstName, LastName, EmailAddress, TelephoneNumber, Image, Status
Voting:
item, vote, nvotes
The item field contains vt_img and then the UserID, so for example: vt_img4 and both vote & nvotes display the number of votes.
Any ideas how I can relate those together and display the users in order of the most voted at the top?
Thanks
You really need to change the structure of the voting table so that you can do a normal join. I would strongly suggest adding either a pure userID column, or at the very least not making it a concat of two other columns. Based on an ID you could then easily do something like this:
select
a.userID,
a.firstName,
b.votes
from
entries a
join voting b
on a.userID=b.userID
order by
b.votes desc
The other option is to consider (if it is a one to one relationship) simply merging the data into one table which would make it even easier again.
At the moment, this really is an XY problem, you are looking for a way to join two tables that aren't meant to be joined. While there are (horrible, ghastly, terrible) ways of doing it, I think the best solution is to do a little extra work and alter your database (we can certainly help with that so you don't lose any data) and then you will be able to both do what you want right now (easily) and all those other things you will want to do in the future (that you don't know about right now) will be oh so much easier.
Edit: It seems like this is a great opportunity to use a Trigger to insert the new row for you. A MySQL trigger is an action that the database will make when a certain predefined action takes place. In this case, you want to insert a new row into a table when you insert a row into your main table. The beauty is that you can use a reference to the data in the original table to do it:
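-- The body is a single statement, so no BEGIN ... END block
-- (and no DELIMITER change in the mysql client) is needed.
CREATE TRIGGER Entries_Trigger AFTER INSERT ON Entries
FOR EACH ROW
INSERT INTO Voting VALUES (NEW.UserID, 0, 0);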
This will work in the following manner - When a row is inserted into your Entries table, the database will insert the row (creating the auto_increment ID and the like) then instantly call this trigger, which will then use that newly created UserID to insert into the second table (along with some zeroes for votes and nvotes).
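To see the trigger behaviour end to end, here is a small sketch that runs against SQLite through Python's sqlite3 module instead of MySQL. SQLite's trigger syntax always uses BEGIN ... END, and the trimmed-down table layouts (a Voting table keyed by UserID) are assumptions for the example.

```python
import sqlite3

# SQLite stands in for MySQL; the table layouts are simplified guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entries (UserID INTEGER PRIMARY KEY AUTOINCREMENT,
                          FirstName TEXT);
    CREATE TABLE Voting (UserID INTEGER, vote INTEGER, nvotes INTEGER);

    CREATE TRIGGER Entries_Trigger AFTER INSERT ON Entries
    FOR EACH ROW BEGIN
        INSERT INTO Voting VALUES (NEW.UserID, 0, 0);
    END;
""")

# Only Entries is written to directly; the trigger fills Voting.
conn.execute("INSERT INTO Entries (FirstName) VALUES (?)", ("Ann",))
conn.execute("INSERT INTO Entries (FirstName) VALUES (?)", ("Bob",))

voting_rows = conn.execute(
    "SELECT UserID, vote, nvotes FROM Voting ORDER BY UserID"
).fetchall()
print(voting_rows)
```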
Your database is badly designed. It should be:
Voting:
item, user_id, vote, nvotes
Placing the item id and the user id into the same column as a concatenated string with a delimiter is just asking for trouble. This isn't scalable at all. Look up the basics on Normalization.
You could try this:
SELECT *
FROM Entries e
JOIN Voting v ON (CONCAT('vt_img', e.UserID) = v.item)
ORDER BY nvotes DESC
but please notice that this query might be quite slow due to the fact that the join field for Entries table is built at query time.
You should consider changing your database structure so that Voting contains a UserID field in order to do a direct join.
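For what it's worth, here is a runnable sketch of the join on the concatenated key, using sqlite3 in place of MySQL. SQLite concatenates with || where MySQL uses CONCAT(), and the users and vote counts are invented.

```python
import sqlite3

# SQLite stands in for MySQL; the users and vote counts are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entries (UserID INTEGER PRIMARY KEY, FirstName TEXT);
    CREATE TABLE Voting (item TEXT, vote INTEGER, nvotes INTEGER);
    INSERT INTO Entries VALUES (4, 'Ann'), (7, 'Bob'), (9, 'Cy');
    INSERT INTO Voting VALUES ('vt_img4', 3, 3),
                              ('vt_img7', 10, 10),
                              ('vt_img9', 1, 1);
""")

# Rebuild the item key from UserID at query time, then sort by votes.
rows = conn.execute("""
    SELECT e.UserID, e.FirstName, v.nvotes
    FROM Entries e
    JOIN Voting v ON v.item = 'vt_img' || e.UserID
    ORDER BY v.nvotes DESC
""").fetchall()
print(rows)
```

Because the key is computed for every row, the database cannot use a plain index on UserID for this join, which is the slowness mentioned above.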
I'm figuring the Entries table is where votes are cast (your database schema doesn't make much sense to me; it seems like you could rework it). If the votes are actually in the Votes table and that's connected to a user, then you should have a UserID field in that table too. Either way, the example will help.
Let's say you add UserID to the Votes table and this is where a user's votes are stored; then this would be your query:
SELECT Users.id,
SUM(Votes.nvotes) AS user_votes
FROM Users
JOIN Votes ON Users.id = Votes.UserID
GROUP BY Users.id
-- DESC puts the users with the most votes first
ORDER BY user_votes DESC
USE ORDER BY in your query --
SELECT column_name(s)
FROM table_name
ORDER BY column_name(s) ASC|DESC
Using PHP and MySQL, I have a query that will look something like this:
UPDATE mytable
SET status='$newstatus'
WHERE (col1='$col1[0]' AND col2='$col2[0]')
OR (col1='$col1[1]' AND col2='$col2[1]')
OR (...);
I actually need to record the current 'status' of each of these rows before the update. Do I need to do a separate SELECT before this, or can (should / how would) I combine the two queries?
You cannot get that from this query (you can only get the number of affected rows, but that's it). If you need the old values, you should first run a SELECT with the same conditions, like:
SELECT `id`, `status` FROM `mytable`
WHERE (`col1`='$col1[0]' AND `col2`='$col2[0]')
OR (`col1`='$col1[1]' AND `col2`='$col2[1]')
OR (...)
and then run the UPDATE with a WHERE clause on the fetched ids. I do not recommend running the UPDATE with your current WHERE clause, because the table contents could change between your SELECT and your UPDATE, so you could end up updating different rows from the ones you selected. Alternatively, use table locking (but I do not think it makes sense here).
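The select-then-update-by-id flow could look roughly like this. It is only a sketch: sqlite3 stands in for MySQL, the rows are invented, and the SELECT also fetches status since that is the value to be recorded.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY,
                          col1 TEXT, col2 TEXT, status TEXT);
    INSERT INTO mytable VALUES (1, 'a', 'x', 'new'),
                               (2, 'b', 'y', 'pending'),
                               (3, 'c', 'z', 'new');
""")

pairs = [("a", "x"), ("b", "y")]  # the (col1, col2) combinations to match
new_status = "done"

# Build "(col1 = ? AND col2 = ?) OR ..." with placeholders only.
where = " OR ".join("(col1 = ? AND col2 = ?)" for _ in pairs)
params = [value for pair in pairs for value in pair]

with conn:  # one transaction: commits on success, rolls back on error
    old = conn.execute(
        f"SELECT id, status FROM mytable WHERE {where} ORDER BY id", params
    ).fetchall()
    ids = [row_id for row_id, _ in old]
    if ids:
        marks = ", ".join("?" for _ in ids)
        conn.execute(
            f"UPDATE mytable SET status = ? WHERE id IN ({marks})",
            [new_status, *ids],
        )

print(old)
```

In MySQL with InnoDB, adding FOR UPDATE to the SELECT inside the transaction would lock the selected rows until the commit, which narrows the gap between the two statements.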
MySQL has no OUTPUT clause. You need to either read the status prior to the update or create a trigger that stores the value of OLD.status in another table.
You can't have a single query to update the row and record the current status before updating.
You'd be better off with a "log table" that has the same schema as your table plus a timestamp. It would store only historical data, the state of a row at a single point in time, like a versioning system.
Example:
Table User: Id, Username, Email, Telephone
Table UserLog: Id, Username, Email, Telephone, Timestamp
So, before updating a row on table User, you'd first do a SELECT and an INSERT, like this:
insert into UserLog
select Id, Username, Email, Telephone, Now() from User where Id=$Id
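As a sketch of how the log-then-update step might be wrapped up, here it is with sqlite3 standing in for MySQL. The update_email helper is hypothetical, CURRENT_TIMESTAMP replaces MySQL's Now(), and the sample user is invented.

```python
import sqlite3

# SQLite stands in for MySQL; update_email is a hypothetical helper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (Id INTEGER PRIMARY KEY, Username TEXT,
                       Email TEXT, Telephone TEXT);
    CREATE TABLE UserLog (Id INTEGER, Username TEXT, Email TEXT,
                          Telephone TEXT, Timestamp TEXT);
    INSERT INTO User VALUES (1, 'ann', 'ann@example.com', '555-0100');
""")


def update_email(conn, user_id, new_email):
    """Copy the current row into UserLog, then apply the update."""
    with conn:  # both statements succeed or fail together
        conn.execute(
            "INSERT INTO UserLog "
            "SELECT Id, Username, Email, Telephone, CURRENT_TIMESTAMP "
            "FROM User WHERE Id = ?",
            (user_id,),
        )
        conn.execute(
            "UPDATE User SET Email = ? WHERE Id = ?", (new_email, user_id)
        )


update_email(conn, 1, "ann@new.example.com")
logged = conn.execute("SELECT Id, Email FROM UserLog").fetchall()
current = conn.execute("SELECT Email FROM User WHERE Id = 1").fetchone()[0]
print(logged, current)
```

Passing the id as a bound parameter, rather than interpolating $Id into the SQL string, also avoids SQL injection.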
I am currently working on a school system where we have a parent course and a child course (meta_courses in Moodle).
So, we have a table mdl_course_meta and it has 3 fields. Id, parent_course and child_course.
My problem is that a parent course can have many child courses so that means, for example, a parent_course = 50 can appear two times in the table which means it has 2 child courses. I just want to be able to find all the parent courses without it returning the same value twice or more times. I'm currently using this query right now which obviously doesn't do what I want:
$q = "SELECT * FROM mdl_course_meta";
I am working with PHP as well by the way.
Thanks a lot.
SELECT DISTINCT parent_course from mdl_course_meta
That should do it if you just want the parent course IDs. One thing to keep in mind: if you want other fields, this is not going to work the way you want it to (how would it know which record to choose if there are multiple records with the same parent_course and you only want one?).
This approach can only be used if you only want to return the parent_courses without duplicates.
DISTINCT helps to eliminate duplicates. If a query returns a result that contains duplicate rows, you can remove duplicates to produce a result set in which every row is unique. To do this, include the keyword DISTINCT after SELECT and before the output column list.
$q = "SELECT DISTINCT parent_course FROM mdl_course_meta";
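If you don't want duplicate values in a single column, use GROUP BY parent_course.
That way you can also select other columns, although for this table MySQL will reject non-aggregated columns when the ONLY_FULL_GROUP_BY SQL mode is enabled.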
If you only want distinct values for a particular column, then you can use GROUP BY:
SELECT *
FROM mdl_course_meta
GROUP BY parent_course
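The values in the other columns will be arbitrary. This relies on a MySQL extension to GROUP BY and works only when the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default as of MySQL 5.7.5).
With ONLY_FULL_GROUP_BY enabled, MySQL won't let you be arbitrary, so you can't mix aggregate and non-aggregate columns. Instead, you'd have to do something like this, which gets a bit complicated: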
SELECT MAX(col1), MAX(col2), parent_course, MAX(col4), ...
FROM mdl_course_meta
GROUP BY parent_course
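This way, the values aren't arbitrary: each column gets the value you asked for. Note that each MAX() is computed independently, so a result row can combine values from different source rows.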
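To compare the two approaches side by side, here is a short sketch with sqlite3 standing in for MySQL. The course IDs are invented, and the COUNT(*) column is an extra added for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; the course IDs are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mdl_course_meta (id INTEGER PRIMARY KEY,
                                  parent_course INTEGER,
                                  child_course INTEGER);
    INSERT INTO mdl_course_meta VALUES (1, 50, 101), (2, 50, 102), (3, 60, 103);
""")

# DISTINCT: just the parent courses, each listed once.
parents = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT parent_course FROM mdl_course_meta "
        "ORDER BY parent_course"
    )
]

# GROUP BY with explicit aggregates: one row per parent, and every
# other column is a value that was asked for rather than an arbitrary one.
per_parent = conn.execute("""
    SELECT parent_course, COUNT(*) AS children, MAX(child_course) AS max_child
    FROM mdl_course_meta
    GROUP BY parent_course
    ORDER BY parent_course
""").fetchall()

print(parents)
print(per_parent)
```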
I have two tables. One table is meant to serve as a transaction history and the other is a log of member details. When a report is run, I want to move pieces of the member details into the transaction history but ALSO update some field records which would not otherwise exist.
Is it possible to select all records which meet specific criteria, insert only pieces of the matching row into another table AND update other fields in a single query?
For example:
In table 2, I have member name, date registered, and memberid. I want to move the above records into table 1 but also set the status field to 'processed'.
Note: I am also using php and pdo to connect to a mysql database.
Is this possible within a single query?
You didn't specify whether the rows you want to update are the same as the ones you are inserting. I am assuming they are:
insert into table1
(member_name, date_registered, memberid, status)
select member_name, date_registered, memberid, 'processed'
from table2
where SomeField = MyCriteria
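A runnable sketch of that INSERT ... SELECT with a constant status column, using sqlite3 in place of MySQL and PDO. The registration-date cutoff stands in for the unspecified criteria, and the member rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL; the cutoff date is a made-up criterion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (member_name TEXT, date_registered TEXT,
                         memberid INTEGER);
    CREATE TABLE table1 (member_name TEXT, date_registered TEXT,
                         memberid INTEGER, status TEXT);
    INSERT INTO table2 VALUES ('Ann', '2020-01-05', 1),
                              ('Bob', '2021-03-10', 2),
                              ('Cy',  '2019-07-22', 3);
""")

cutoff = "2020-01-01"
with conn:  # commit the copy as one transaction
    # The literal 'processed' fills the status column for every copied row.
    conn.execute(
        """
        INSERT INTO table1 (member_name, date_registered, memberid, status)
        SELECT member_name, date_registered, memberid, 'processed'
        FROM table2
        WHERE date_registered >= ?
        """,
        (cutoff,),
    )

history = conn.execute(
    "SELECT memberid, status FROM table1 ORDER BY memberid"
).fetchall()
print(history)
```

With PDO the same statement can be prepared once and executed with the criteria bound as a parameter.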
After much consideration, I decided to use ircmaxell's advice and simply run multiple queries. It not only makes things easier but also lets me customize my sorting much more easily.
As he said above, "Don't get caught in the trap of less is always better"
Yes:
SELECT *, "processed" INTO table2 FROM table1
You will have to adapt based on the table structures, perhaps even write out all the fields:
SELECT field1, field2, field3, "processed" INTO table2 FROM table1
Of note, this assumes you want to write into table 2 including the processed value (might I suggest a boolean?). If you want "processed" in the other table, it will get more complicated.
Edit: Apparently MySQL doesn't support SELECT ... INTO a table, so...
INSERT INTO table2 SELECT field1, field2, field3, "processed" FROM table1
Redfilter's code works.