I had a heavy SQL dump of a table and used the BigDump library to import it into a MySQL database on my server.
Although the import worked, I now have duplicated entries in that table.
The same table has 8 × 10^5 records on my local server, but on the server it has 15 × 10^5 records.
Can you suggest a query to delete the duplicate entries from this table?
Here is my table structure.
The table name is techdata_products.
P.S. This table does not have any primary key.
SQL is not my strong point but I think you can export the result of this query:
SELECT DISTINCT * FROM table;
And then, create a new table and import your results.
For starters, why do you have no primary key? You could simply have made that auto-incrementing id field a primary key to prevent duplicates. My suggestion would be to create a new table that has a primary key and do a
SELECT DISTINCT * FROM table, putting the results into that new table.
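A minimal sketch of that approach, assuming the table is techdata_products, it really does have the id column mentioned above, and the duplicate rows are exact copies (id included); the _clean and _old names are just placeholders, and you should back up the table first:

CREATE TABLE techdata_products_clean LIKE techdata_products;
ALTER TABLE techdata_products_clean ADD PRIMARY KEY (id);

INSERT INTO techdata_products_clean
SELECT DISTINCT * FROM techdata_products;

-- once the row count looks right, swap the tables
RENAME TABLE techdata_products TO techdata_products_old,
             techdata_products_clean TO techdata_products;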
Related
How would I reset the primary key counter on a sql table and update each row with a new primary key?
I would add another column to the table first, populate that with the new PK.
Then I'd use update statements to update the new fk fields in all related tables.
Then you can drop the old PK and old fk fields.
EDIT: Yes, as Ian says you will have to drop and then recreate all foreign key constraints.
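A rough sketch of that sequence, in SQL Server syntax (every table, column, and offset here is hypothetical):

-- 1) add the new key column and populate it
ALTER TABLE Parent ADD new_id INT NULL;
UPDATE Parent SET new_id = old_id + 10000;  -- or whatever renumbering you need

-- 2) repoint each related table at the new key
ALTER TABLE Child ADD new_fk INT NULL;
UPDATE c
SET    c.new_fk = p.new_id
FROM   Child c
JOIN   Parent p ON p.old_id = c.old_fk;

-- 3) drop the old FK constraints and columns, drop the old PK,
--    then make new_id the primary key and new_fk the foreign key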
Not sure which DBMS you're using but if it happens to be SQL Server:
SET IDENTITY_INSERT [MyTable] ON
allows you to insert explicit values into the identity (primary key) column, so you can re-insert rows under the key values you want (you could use a CURSOR for this if the logic is complicated). Then, when you are done:
SET IDENTITY_INSERT [MyTable] OFF
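A minimal sketch of the whole sequence (MyTable, its columns, and the key values 42 and 7001 are only placeholders):

SET IDENTITY_INSERT [MyTable] ON;

-- re-insert the row under the new, explicit key value...
INSERT INTO [MyTable] (Id, Name)
SELECT 7001, Name FROM [MyTable] WHERE Id = 42;

-- ...and remove the old row
DELETE FROM [MyTable] WHERE Id = 42;

SET IDENTITY_INSERT [MyTable] OFF;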
Hope that helps!
This may or may not be MS SQL specific, but:
TRUNCATE TABLE resets the identity counter, so one way to do this quick and dirty would be to
1) Do a Backup
2) Copy table contents to temp table:
3) Copy temp table contents back to table (which has the identity column):
SELECT Field1, Field2 INTO #MyTable FROM MyTable
TRUNCATE TABLE MyTable
INSERT INTO MyTable
(Field1, Field2)
SELECT Field1, Field2 FROM #MyTable
SELECT * FROM MyTable
-----------------------------------
ID Field1 Field2
1 Value1 Value2
Why would you even bother? The whole point of counter-based "identity" primary keys is that the numbers are arbitrary and meaningless.
you could do it in the following steps:
create copy of yourTable with extra column new_key
populate copyOfYourTable with the affected rows from yourTable along with desired values of new_key
temporarily disable constraints
update all related tables to point to the value of new_key instead of the old_key
delete affected rows from yourTable
SET IDENTITY_INSERT [yourTable] ON
insert affected rows again with the new proper value of the key (from copy table)
SET IDENTITY_INSERT [yourTable] OFF
reseed identity
re-enable constraints
delete the copyOfYourtable
But as others said all that work is not needed.
I tend to look at identity-type primary keys as if they were the equivalent of pointers in C: I use them to reference other objects, but never modify or access them explicitly.
If this is Microsoft's SQL Server, one thing you could do is use the [dbcc checkident](http://msdn.microsoft.com/en-us/library/ms176057(SQL.90).aspx) command.
Assume you have a single table whose data you want to move around while renumbering the primary keys. For this example, the table is named ErrorCode. It has two fields: ErrorCodeID (the primary key) and a Description.
Example Code Using dbcc checkident
-- Reset the primary key counter
dbcc checkident(ErrorCode, reseed, 7000)
-- Move all rows greater than 8000 to the 7000 range
insert into ErrorCode (Description)
select Description from ErrorCode where ErrorCodeID >= 8000
-- Delete the old rows
delete ErrorCode where ErrorCodeID >= 8000
-- Reset the primary key counter
dbcc checkident(ErrorCode, reseed, 8000)
With this example, you effectively move all of those rows to different primary keys and then reset the counter so the next insert gets an ID in the 8000 range.
Hope this helps a bit!
I am developing a social chat application. My app has 5,000 users. I want to fetch the usernames of users who were online within the last hour.
I have two tables, users and messages. My database is very heavy: the users table has 4,983 records and the messages table has approximately 15 million records. I want to show 20 users who sent a message within the last hour.
My Query -
SELECT a.username,a.id FROM users a JOIN messages b
WHERE a.id != ".$getUser['id']." AND
a.is_active=1 AND
a.is_online=1 AND
a.id=b.user_id AND
b.created > DATE_SUB(NOW(), INTERVAL 1 HOUR)
GROUP BY b.user_id
ORDER BY b.id DESC LIMIT 20
Users Table -
Messages Table -
The above query works, but it has become very slow, and sometimes the page hangs. I want to fetch the records faster.
Note: $getUser['id'] is the logged-in user's id.
Any idea?
You can use indexes
A database index is a data structure that improves the speed of operations on a table. Indexes can be created using one or more columns, providing the basis both for rapid random lookups and for efficient ordering of access to records.
When creating an index, consider which columns will be used in your SQL queries and create one or more indexes on those columns.
Practically, indexes are also a type of table: they keep the primary key or index field and a pointer to each record in the actual table. Users cannot see the indexes; they are just used to speed up queries and are used by the database search engine to locate records very quickly.
INSERT and UPDATE statements take more time on tables that have indexes, whereas SELECT statements become faster on those tables. The reason is that while doing an insert or update, the database needs to insert or update the index values as well.
Simple and Unique Index:
You can create a unique index on a table. A unique index means that two rows cannot have the same index value. Here is the syntax to create an index on a table:
CREATE UNIQUE INDEX index_name
ON table_name ( column1, column2,...);
You can use one or more columns to create an index. For example, we can create an index on tutorials_tbl using tutorial_author.
CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author)
You can create a simple index on a table; just omit the UNIQUE keyword from the query. A simple index allows duplicate values in a table.
If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name.
mysql> CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author DESC)
ALTER command to add and drop INDEX:
There are four types of statements for adding indexes to a table:
ALTER TABLE tbl_name ADD PRIMARY KEY (column_list):
This statement adds a PRIMARY KEY, which means that indexed values must be unique and cannot be NULL.
ALTER TABLE tbl_name ADD UNIQUE index_name (column_list):
This statement creates an index for which values must be unique (with the exception of NULL values, which may appear multiple times).
ALTER TABLE tbl_name ADD INDEX index_name (column_list):
This adds an ordinary index in which any value may appear more than once.
ALTER TABLE tbl_name ADD FULLTEXT index_name (column_list):
This creates a special FULLTEXT index that is used for text-searching purposes.
Here is the example to add index in an existing table.
mysql> ALTER TABLE testalter_tbl ADD INDEX (c);
You can drop any INDEX by using DROP clause along with ALTER command. Try out the following example to drop above-created index.
mysql> ALTER TABLE testalter_tbl DROP INDEX c;
ALTER Command to add and drop PRIMARY KEY:
You can add a primary key in the same way, but make sure the primary key is on columns that are NOT NULL.
Here is an example of adding a primary key to an existing table. It makes the column NOT NULL first and then adds it as the primary key.
mysql> ALTER TABLE testalter_tbl MODIFY i INT NOT NULL;
mysql> ALTER TABLE testalter_tbl ADD PRIMARY KEY (i);
You can use ALTER command to drop a primary key as follows:
mysql> ALTER TABLE testalter_tbl DROP PRIMARY KEY;
To drop an index that is not a PRIMARY KEY, you must specify the index name.
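For example, to drop the AUTHOR_INDEX created above:
mysql> ALTER TABLE tutorials_tbl DROP INDEX AUTHOR_INDEX;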
Displaying INDEX Information:
You can use the SHOW INDEX command to list all the indexes associated with a table. Vertical-format output (specified by \G) is often useful with this statement, to avoid long line wraparound:
Try out the following example:
mysql> SHOW INDEX FROM table_name\G
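Applied to the query in the question, indexes along these lines should help (a sketch based only on the columns visible in that query; the index names are illustrative, so verify the result with EXPLAIN against your actual schema):

-- lets MySQL find a user's recent messages without scanning 15 million rows
ALTER TABLE messages ADD INDEX idx_messages_user_created (user_id, created);

-- covers the is_active / is_online filters on users
ALTER TABLE users ADD INDEX idx_users_active_online (is_active, is_online);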
You can also optimize your query by removing MySQL functions like DATE_SUB from it; compute the value in PHP instead and pass it in:
DATE_SUB PHP VERSION
I have two identical tables (same columns and primary key) in two different databases. I want to add the second table's data to the first table where it does not already exist in the first table (according to the primary key).
What is the best method to do that?
I can export the second table's data as a CSV file, a PHP array, or an SQL file.
Thanks
There are lots of ways to do this.
The simplest is probably this one:
INSERT IGNORE
INTO table_1
SELECT *
FROM table_2
;
which allows those rows in table_1 to supersede the rows in table_2 that have a matching primary key, while still inserting rows with new primary keys.
Alternatively, you can use a subquery to find the rows that are not shared by both tables and insert them. If you have a lot of records, you may want to consider using a temporary table to speed up the process.
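For example, something along these lines (a sketch; db1, db2, and the id column stand in for your actual database names and primary key):

INSERT INTO db1.table_1
SELECT t2.*
FROM   db2.table_2 AS t2
LEFT JOIN db1.table_1 AS t1 ON t1.id = t2.id
WHERE  t1.id IS NULL;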
I have these tables:
create table person (
person_id int unsigned auto_increment,
person_key varchar(40) not null,
primary key (person_id),
constraint uc_person_key unique (person_key)
)
-- person_key is a varchar(40) that identifies an individual, unique
-- person in the initial data that is imported from a CSV file to this table
create table marathon (
marathon_id int unsigned auto_increment,
marathon_name varchar(60) not null,
primary key (marathon_id)
)
create table person_marathon (
person_marathon_id int unsigned auto_increment,
person_id int unsigned,
marathon_id int unsigned,
primary key (person_marathon_id),
foreign key (person_id) references person (person_id),
foreign key (marathon_id) references marathon (marathon_id),
constraint uc_marathon_person unique (person_id, marathon_id)
)
Person table is populated by a CSV that contains about 130,000 rows. This CSV contains a unique varchar(40) for each person and some other person data. There is no ID in the CSV.
For each marathon, I get a CSV that contains a list of 1k - 30k persons. The CSV contains essentially just a list of person_key values that show which people participated in that specific marathon.
What is the best way to import the data into the person_marathon table to maintain the FK relationship?
These are the ideas I can currently think of:
Pull the person_id + person_key information out of MySQL and merge the person_marathon data in PHP to get the person_id in there before inserting into the person_marathon table
Use a temporary table for insert... but this is for work and I have been asked to never use temporary tables in this specific database
Don't use a person_id at all and just use the person_key field but then I would have to join on a varchar(40) and that's usually not a good thing
Or, for the insert, make it look something like this:
insert into person_marathon (person_id, marathon_id)
select p.person_id, m.marathon_id
from ( select 'person_a' as p_name, 'marathon_a' as m_name union
select 'person_b' as p_name, 'marathon_a' as m_name )
as imported_marathon_person_list
join person p
on p.person_key = imported_marathon_person_list.p_name
join marathon m
on m.marathon_name = imported_marathon_person_list.m_name
The problem with that insert is that to build it in PHP, the imported_marathon_person_list would be huge, since it could easily be 30,000 SELECT ... UNION items. I'm not sure how else to do it, though.
I've dealt with similar data conversion problems, though at a smaller scale. If I'm understanding your problem correctly (which I'm not sure of), it sounds like the detail that makes your situation challenging is this: you're trying to do two things in the same step:
import a large number of rows from CSV into mysql, and
do a transformation such that the person-marathon associations work through person_id and marathon_id, rather than the (unwieldy and undesirable) varchar personkey column.
In a nutshell, I would do everything possible to avoid doing both of these things in the same step. Break it into those two steps - import all the data first, in tolerable form, and optimize it later. Mysql is a good environment to do this sort of transformation, because as you import the data into the persons and marathons tables, the IDs are set up for you.
Step 1: Importing the data
I find data conversions easier to perform in a mysql environment than outside of it. So get the data into mysql, in a form that preserves the person-marathon associations even if it isn't optimal, and worry about changing the association approach afterwards.
You mention temp tables, but I don't think you need any. Set up a temporary column "personkey", on the persons_marathons table. When you import all the associations, you'll leave person_id blank for now, and just import personkey. Importantly, ensure that personkey is an indexed column both on the associations table and on the persons table. Then you can go through later and fill in the correct person_id for each personkey, without worrying about mysql being inefficient.
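Concretely, that might look something like this (a sketch using the question's table names; the index name is illustrative, and person.person_key is already indexed by its unique constraint):

-- temporary landing column for the key that comes from the CSV
ALTER TABLE person_marathon
    ADD COLUMN person_key VARCHAR(40) NULL,
    ADD INDEX idx_pm_person_key (person_key);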
I'm not clear on the nature of the marathons table data. Do you have thousands of marathons to enter? If so, I don't envy you the work of handling 1 spreadsheet per marathon. But if it's fewer, then you can perhaps set up the marathons table by hand. Let mysql generate marathon IDs for you. Then as you import the person_marathon CSV for each marathon, be sure to specify that marathon ID in each association relevant to that marathon.
Once you're done importing the data, you have three tables:
* persons - you have the ugly personkey, as well as a newly generated person_id, plus any other fields
* marathons - you should have a marathon_id at this point, right? either newly generated, or a number you've carried over from some older system.
* persons_marathons - this table should have marathon_id filled in & pointing to the correct row in the marathons table, right? You also have personkey (ugly but present) and person_id (which is still null).
Step 2: Use personkey to fill in person_id for each row in the association table
Then you either use straight Mysql, or write a simple PHP script, to fill in person_id for each row in the persons_marathons table. If I'm having trouble getting mysql to do this directly, I'll often write a php script to deal with a single row at a time. The steps in this would be simple:
look up any 1 row where person_id is null but personkey is not null
look up that personkey's person_id
write that person_id in the associations table for that row
You can tell PHP to repeat this 100 times and then end the script, or 1,000 times, if you keep getting timeout problems or anything like that.
This transformation involves a huge number of lookups, but each lookup only needs to be for a single row. That's appealing because at no point do you need to ask mysql (or PHP) to "hold the whole dataset in its head".
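If you go the straight-MySQL route instead of the row-at-a-time PHP loop, a single set-based update along these lines should do the same job (again using the question's table and column names; back up before running it):

UPDATE person_marathon pm
JOIN   person p ON p.person_key = pm.person_key
SET    pm.person_id = p.person_id
WHERE  pm.person_id IS NULL;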
At this point, your associations table should have person_id filled in for every row. It's now safe to delete the personkey column, and voila, you have your efficient foreign keys.
I have an extremely large database table - nearly 20 million records.
The records do not have a unique ID number, so I've added a new field for one.
Now I would like to populate it with ID numbers, increasing by 1, starting with the first ID being 10,000,001.
FYI - I am using WAMP on a local machine, and I've dialed all my max times up to 5,000 seconds and turned up several other variables in php.ini and mysql.ini in order to do the upload in the first place (which took more than 10 hours!).
In the past, or with other DBs, I might have exported the data into Excel and then whipped up some text to paste back into phpMyAdmin to UPDATE the records. This is fine when working with 5K records, or even 100K records, but it seems unmanageable with 20 million records.
Thanks in advance!!
Just run these two queries one after the other in the SQL tab:
ALTER TABLE mytable AUTO_INCREMENT=10000001;
ALTER TABLE mytable ADD `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MySQL will then create the id field and fill it in sequentially starting at 10000001.
This works in SQL Server, maybe you can adapt it to MySQL:
declare @value int
set @value = 10000000
update your_table
set @value = @value + 1, id = @value
It will update the id of every row, starting at 10000001 and increasing by 1.
I hope it at least gives you some ideas.
All you need to do is set the column to be AUTO_INCREMENT and mysql will number the rows for you. Let's say you want your new column to be named 'id'.
alter table yourtable auto_increment = 10000001;
alter table yourtable add id int unsigned primary key auto_increment;
You can issue these commands in the sql panel of phpMyAdmin -- just leave off the semicolon at the end.