Make MySQL table unique - php

Hey, I created a spider that crawls through a PDF document and logs every word in the document into a table in a MySQL database.
Obviously words like 'the', 'and', 'or' etc. appear in a book many, many times.
I'm just wondering what the quickest method is to remove duplicate values from a table?

Create a table without an index on the words and load all the words from the book using mass inserts (you could also use LOAD DATA). When you're done with the insertions, add an index on the word field.
Then create a second table using:
CREATE TABLE newTable SELECT DISTINCT word FROM oldTable
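The load-then-deduplicate flow above can be sketched end to end. This is only an illustration: it assumes Python's built-in sqlite3 module as a stand-in for MySQL (SQLite needs the AS keyword in CREATE TABLE ... AS SELECT), and the sample word list and the index name idx_word are made up.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE oldTable (word TEXT)")

# Mass insert while the table has no index yet.
words = ["the", "and", "the", "or", "and", "the"]
conn.executemany("INSERT INTO oldTable (word) VALUES (?)", [(w,) for w in words])

# Add the index only after the bulk load is finished.
conn.execute("CREATE INDEX idx_word ON oldTable (word)")

# Copy the distinct words into a second table.
conn.execute("CREATE TABLE newTable AS SELECT DISTINCT word FROM oldTable")

unique_words = sorted(row[0] for row in conn.execute("SELECT word FROM newTable"))
print(unique_words)
```

The second table ends up with one row per distinct word, after which the first table can be dropped.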

Instead of removing duplicates, you could make sure that no duplicates ever make it into the table.
Presuming your table has only two fields, id and word:
INSERT INTO `table` (id, word) SELECT NULL, 'word' FROM DUAL WHERE NOT EXISTS (SELECT 1 FROM `table` WHERE word = 'word');
This inserts the word only if it is not already in the table. Selecting FROM DUAL rather than from the table itself means the statement also works while the table is still empty.

If you can rerun the script that populates the database, you could add a unique key on the "word" field and use REPLACE INTO instead of INSERT INTO. When a row with the same unique key already exists, REPLACE deletes that row before inserting the new one, so the table never holds a duplicate. This may not be the most efficient way to do it, but it's rather simple. See here for more details:
http://dev.mysql.com/doc/refman/5.0/en/replace.html

select distinct on word field, and then delete all rows that have a different id? I'm not a master in subqueries so no example atm :)

delete from words where idcolumn not in
(select min_id from
(select min(idcolumn) as min_id from words group by plain) as keep)
This works if you added (idcolumn, plain) for every word you found.
If you do not have an id column (pk) then you can use Anax's solution.
In addition to not inserting duplicates (codeburger comment), you can just set a unique index on your plain column.
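A minimal sketch of the "keep the lowest id per word" idea, followed by the unique index that stops new duplicates. It assumes Python's sqlite3 module in place of MySQL and invents the sample rows and the index name idx_plain. One MySQL-specific caveat: MySQL rejects a DELETE whose subquery reads directly from the table being deleted from (error 1093), so there the subquery has to be wrapped in a derived table; SQLite accepts the direct form used here.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (idcolumn INTEGER PRIMARY KEY, plain TEXT)")
conn.executemany(
    "INSERT INTO words (plain) VALUES (?)",
    [("the",), ("and",), ("the",), ("or",), ("and",)],
)

# Keep the row with the smallest id for each word, delete every other row.
conn.execute(
    "DELETE FROM words WHERE idcolumn NOT IN "
    "(SELECT MIN(idcolumn) FROM words GROUP BY plain)"
)
remaining = conn.execute(
    "SELECT idcolumn, plain FROM words ORDER BY idcolumn"
).fetchall()

# With the duplicates gone, a unique index prevents new ones.
conn.execute("CREATE UNIQUE INDEX idx_plain ON words (plain)")
print(remaining)
```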


Mysql Insert .... select, obtain last insert ID

I have this query in php. It's an insert select copying from table2, but I need to get the IDs of the newly created rows and store them into an array. Here is my code:
$sql = "INSERT INTO table1 SELECT distinct * from table2";
$db->query($sql);
I could reverse the flow, starting with a select on table2 and then doing single inserts, but that would slow down the script on a big table. Ideas?
You could lock the table, insert the rows, get the insert ID, and then unlock; that way you know that the IDs will be contiguous, as no other concurrent user could have inserted rows in between. Note that in MySQL, LAST_INSERT_ID() after a multi-row insert returns the ID generated for the first inserted row, so the new IDs count upward from that value. Locking and unlocking is something you want to use with caution though.
An alternative approach could be to use one of the columns in the table: either an 'updated' datetime column, or an insert-id column (into which you put a value that is the same across all the rows of one insert).
That way you can do a subsequent SELECT of the IDs back out of the database matching either the updated time or your chosen insert ID.
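The insert-id column variant might look like the sketch below. Everything concrete in it is an assumption made for illustration: the batch_id column, the name column, the sample rows, and the use of Python's sqlite3 module instead of PHP and MySQL.

```python
import sqlite3
import uuid

# Assumption: SQLite stands in for MySQL; batch_id is a hypothetical extra column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (name TEXT)")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT, batch_id TEXT)")
conn.executemany("INSERT INTO table2 (name) VALUES (?)", [("a",), ("b",), ("a",), ("c",)])

# One value shared by every row of this particular INSERT ... SELECT.
batch_id = uuid.uuid4().hex
conn.execute(
    "INSERT INTO table1 (name, batch_id) SELECT DISTINCT name, ? FROM table2",
    (batch_id,),
)

# Read back the IDs that belong to this batch only.
new_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM table1 WHERE batch_id = ? ORDER BY id", (batch_id,)
    )
]
print(new_ids)
```

Because the lookup is keyed on the batch value rather than on an ID range, it does not depend on the IDs being contiguous and needs no table lock.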

Uniquely identify same column name of two different table after join - mysql

I have 2 tables. Suppose they are a and b.
a has id, name, roll. b has id, group, name.
The data in the two name columns is not the same. How can I select both and tell them apart?
I know about
SELECT a.id,a.name,a.group FROM a,b ............
But this is only an example. I am working with a huge amount of data, with 20-30 columns in each table. So I don't want to write out the column names I need to select; I would rather write the names that I want to exclude.
Like
SELECT * Except b.name............
OR is there any way to uniquely identify after join. Like
.......... a,b WHERE a.name as name1
Please don't ask why those column names are same. I admit it was a mistake. But it's already implemented and heavily used. So finding another way. Is there any simple way to exclude a column while merging them?
Well, you can't write the names you wish to exclude. That is not how SQL works.
However, if writing out 20-30 column names is that much of a burden, you can use information_schema.columns. I write it that way, because 20-30 column names is not particularly large and writing them out is probably less effort than writing the question.
But, back to the solution. It looks something like this:
select concat('a.', c.column_name, ' as a_', c.column_name, ', ')
from information_schema.columns c
where c.table_name = 'a' ;
You might want to include the table schema as well.
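The same idea, generating the aliased column list from the catalog instead of typing it, can be sketched in a script. This is an illustration under assumptions: it uses Python's sqlite3 module, where PRAGMA table_info plays the role that information_schema.columns plays in MySQL, and the helper name aliased_columns and the sample rows are hypothetical.

```python
import sqlite3

def aliased_columns(conn, table, prefix):
    """Build 'table."col" AS "prefix_col"' for every column of the table."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return ", ".join(f'{table}."{col}" AS "{prefix}{col}"' for col in cols)

# Assumption: SQLite stands in for MySQL; tables follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, name TEXT, roll INTEGER)")
conn.execute('CREATE TABLE b (id INTEGER, "group" TEXT, name TEXT)')
conn.execute("INSERT INTO a VALUES (1, 'Ann', 7)")
conn.execute("INSERT INTO b VALUES (1, 'x', 'Bob')")

select_list = aliased_columns(conn, "a", "a_") + ", " + aliased_columns(conn, "b", "b_")
cur = conn.execute(f"SELECT {select_list} FROM a JOIN b ON a.id = b.id")
column_names = [d[0] for d in cur.description]
row = cur.fetchone()
print(column_names)
```

Every column comes back under a table-specific alias, so the two name columns no longer collide; unwanted columns can be filtered out of the generated list before the query is built.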
As an idea: if you want to leave out columns of one specific table and your statement involves multiple tables, you can try the following.
Suppose you have 20 columns in table a and 5 columns in table b, and you want to leave out col2 and col3 of table b. The standard method is to write the names of all the columns of table a plus the required columns of table b. But you can avoid writing the long list of 20 columns by writing a.* and then listing only the required columns of table b. See the statement below.
Select a.*,b.col1,b.col4,b.col5 from a,b
But if you need to exclude some columns from both tables, then I think there is no other way than writing out all the required column names from both tables.
There is no way to exclude a column in a SQL SELECT statement; you can only select columns. You can give the columns alias names while selecting them, as below, so that you can identify them by those aliases.
SELECT a.id AS column1, a.name AS column2, b.`group` AS column3 FROM a,b ............
There is no way to exclude a specific column, but you can avoid typing all the column names and ease your job with the steps below.
Step1: Execute the query below:
SELECT a.*,b.* FROM a,b ............limit 1;
Step2: Export the result to CSV format with headings.
Step3: Copy the first (heading) row from the CSV.
Step4: Delete the columns that are not required and use the remaining columns in your query.
There's only one way I can see.
First create a temporary table as a copy of your table:
CREATE TEMPORARY TABLE mytable AS
SELECT * FROM YourTable;
Then drop the columns that are not needed:
ALTER TABLE mytable DROP COLUMN ColumnToDrop;
Finally, get the results and drop the temporary table:
SELECT * FROM mytable;
DROP TEMPORARY TABLE mytable;

Mysql with regular expression

I have a question about regular expressions. I have designed a table with three columns, one of which contains member IDs separated by commas. Here is my table structure:
send_id   member_id
1         1211,23,34
2         1,23
I want to select only the row with send_id 2, whose member_id contains 1.
This is the query I am using:
SELECT * FROM table WHERE column REGEXP '^[1]+$';
But this query gives me both rows. Please help me.
With Regards
Rahul
Never store separate values in one column.
Normalize your structure like this:
send_id   member_id
1         1211
1         23
1         34
2         1
2         23
If you still want your regex, then it will be
SELECT * FROM t WHERE member_id REGEXP '(^|[^0-9])1([^0-9]|$)'
First, you should be normalizing your data so you're not in this horrible mess in the first place. Here's a good resource explaining normalization.
Second, I believe your problem lies with your regular expression. Try this instead:
SELECT * FROM table WHERE column REGEXP '^[1]$';
The regular expression you're using contains [1]+, where the + means "match [1] one or more times". Because the pattern is anchored with ^ and $, it matches only values made up entirely of 1s. Removing the + restricts the match to a value that is exactly 1.
However, that still won't fix your problem, because the 1 you are looking for sits inside a comma-separated list such as 1,23, which a fully anchored pattern cannot match. This is why normalization is so important.
Having multiple values inside a column isn't a good practice for designing a DB.
You should normalize your data, i.e., put just one piece of atomic information inside each element of your table.
You can find more information about this on Wikipedia:
http://en.wikipedia.org/wiki/Database_normalization
As the others have told you, the perfect solution would be to normalize your data; I think Alma Do Mundo's answer explains it quite well.
If you want to use REGEXP anyway, you have to take four cases into account: the id is the only value, the id is first, the id is in the middle, and the id is last. I have used id=74 for the example:
SELECT * FROM table WHERE member_id REGEXP '(^74$|^74,|,74,|,74$)';
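To see how the four cases behave on the rows from the question, here is a small sketch. It assumes Python's re module, which treats this simple pattern the same way MySQL's REGEXP does; the helper name member_pattern and the rows dictionary are hypothetical.

```python
import re

def member_pattern(member_id):
    """Compile the four-case pattern: only value, first, middle, or last."""
    mid = re.escape(str(member_id))
    return re.compile(rf"(^{mid}$|^{mid},|,{mid},|,{mid}$)")

# send_id -> member_id, as in the question's table.
rows = {1: "1211,23,34", 2: "1,23"}

def matching_send_ids(member_id):
    pattern = member_pattern(member_id)
    return [send_id for send_id, ids in rows.items() if pattern.search(ids)]

ids_for_1 = matching_send_ids(1)
ids_for_23 = matching_send_ids(23)
print(ids_for_1, ids_for_23)
```

Searching for 1 matches only the second row, because 1211 never appears as a stand-alone list item, while 23 is found in both rows.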
Depending on your requirements, you should normalize your data, i.e. make 3 tables (one with the send ID, one with the member ID, and one that combines the two), and then link them up with INNER JOINs.
However, if you are going to keep the comma-separated column, you can use "WHERE member_id LIKE '%1%'" to pull in all the candidate rows. That pattern also matches values such as 1211 or 21, so you'll have to use the application to filter out the relevant records.
In any case, if you're not going to normalize the data, you will have to use the front end to filter the results.
An example of the inner join syntax would look like this
SELECT * FROM SendTable
JOIN Send_Member ON SendTable.send_id = Send_Member.send_id
JOIN Member ON Member.member_id = Send_Member.member_id
WHERE Member.member_id = 1;
where the schema looks like:
Sendtable:
send_Id (primary key)
...other fields
Send_Member:
send_id (primary key and foreign key to SendTable)
member_id (primary key and foreign key to member)
...any fields you might want that are relevant to the particular send table and member table link
Member:
member_id (primarykey)
...other fields

how to check whether value exists or not in mysql database column while inserting

I have a contactnumber column in a MySQL database. The contactnumber column has more than 20,000 entries. Now, when I upload new numbers through a .csv file, I don't want duplicate numbers in the database.
How can I avoid duplicate numbers while inserting into the database?
I initially implemented logic that checks each number in the .csv file against each of the numbers in the database.
This works, but it takes a lot of time to upload a .csv file containing 1000 numbers.
Please suggest how to minimize the time required to upload the .csv file while not uploading duplicate values.
Simply add a UNIQUE constraint to the contactnumber column:
ALTER TABLE `mytable` ADD UNIQUE (`contactnumber`);
From there you can use the IGNORE option to ignore the error you'd usually be shown when inserting a duplicate:
INSERT IGNORE INTO `mytable` VALUES ('0123456789');
Alternatively, you could use the ON DUPLICATE KEY UPDATE to do something with the dupe, as detailed in this question: MySQL - ignore insert error: duplicate entry
If your contactnumber should not be repeated then make it PRIMARY or at least a UNIQUE key. That way when a value is being inserted as a duplicate, insert will fail automatically and you won't have to check beforehand.
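A rough sketch of the bulk upload with the database doing the duplicate check. It assumes Python's sqlite3 and csv modules in place of PHP and MySQL, and the sample numbers are invented; SQLite spells the statement INSERT OR IGNORE, which plays the role of MySQL's INSERT IGNORE here.

```python
import csv
import io
import sqlite3

# Assumption: SQLite stands in for MySQL; the UNIQUE constraint does the checking.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (contactnumber TEXT UNIQUE)")
conn.executemany("INSERT INTO mytable VALUES (?)", [("0123456789",), ("0987654321",)])

# Stand-in for the uploaded .csv file: one number per line, with duplicates.
csv_text = "0123456789\n0555000111\n0555000111\n0777000222\n"
numbers = [(row[0],) for row in csv.reader(io.StringIO(csv_text)) if row]

before = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
# Duplicates (against the table or within the file) are skipped silently.
conn.executemany("INSERT OR IGNORE INTO mytable VALUES (?)", numbers)
after = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]

inserted = after - before
print(inserted)
```

No per-number lookup is issued from the application, which is where the original approach spent its time.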
The way I would do it is to create a temporary table.
create table my_dateasyyyymmddhhiiss as select * from mytable where 1=0;
Do your inserts into that table.
Then query out the orphans between mytable and the temporary table based on contactnumber (the numbers that exist only in the temporary table).
Then run an inner join query between the two tables to fetch the duplicates for your telecaller tracking.
Finally, drop the temporary table.
What this does not address is duplicates within the supplied file (I don't know if that would be an issue in this problem).
Hope this helps.
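If you don't want to insert duplicate values into the table, but would rather keep them in a different table, you can create a trigger on the table.
Note that the trigger below only records the duplicate in table2; on its own it does not stop the row from also being inserted into table1.
Like this: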
DELIMITER $$
CREATE TRIGGER unique_key BEFORE INSERT ON table1
FOR EACH ROW BEGIN
DECLARE c INT;
SELECT COUNT(*) INTO c FROM table1 WHERE itemid = NEW.itemid;
IF (c > 0) THEN
insert into table2 (column_name) values (NEW.itemid);
END IF;
END$$
DELIMITER ;
I would recommend doing it this way:
Alter the contactnumber column to be a UNIQUE KEY.
Using phpMyAdmin, import the .csv file and check the option 'Do not abort on INSERT error' under Format-Specific Options before submitting.

How do I delete a row from one table, when no more rows exist in another

I would like to know the best way to check one table of data and, when no more rows exist matching the WHERE clause, delete a row from another table.
I have tried it myself, but it has become too cumbersome with 6 queries and nested if/else, and to top it off it doesn't work.
I have never used SQL joins before, so examples will help me understand the responses.
I have a table of devices: there is a master table with a device and a password.
There is a second table containing multiple rows for each device in the above table, each with a serial number.
When the second table no longer contains any serial numbers for a device listed in the master table, I want to delete the row containing that device and password from the master table.
Do you mean something like having a table customer and a table order, and deleting the customers that have no orders? Then you can use a subselect:
delete from customer where customerid not in (select customerid from `order`)
You could write a DELETE statement like
DELETE FROM masterTable WHERE ID NOT IN (SELECT masterTableID FROM secondaryTable)
This deletes all the rows from the master table that have no references in the second table. That also means it does not delete only one row, but every row that matches. The only requirement is that every row in the second table references the master table.
DELETE table_devices
FROM table_devices
left JOIN serial ON table_devices.id= serial.device_id
WHERE serial.device_id is null
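The orphan cleanup can be tried out with a sketch like the following. It assumes Python's sqlite3 module instead of MySQL, invented sample rows, and a hypothetical helper delete_orphans. SQLite has no DELETE ... JOIN form, so the sketch swaps in NOT EXISTS, which expresses the same condition; unlike NOT IN, it also behaves as expected when the referencing column contains NULL.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table names follow the last answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_devices (id INTEGER PRIMARY KEY, device TEXT, password TEXT)")
conn.execute("CREATE TABLE serial (device_id INTEGER, serial_number TEXT)")
conn.executemany(
    "INSERT INTO table_devices VALUES (?, ?, ?)",
    [(1, "router", "pw1"), (2, "switch", "pw2")],
)
conn.executemany(
    "INSERT INTO serial VALUES (?, ?)",
    [(1, "SN-001"), (1, "SN-002"), (2, "SN-003")],
)

def delete_orphans(conn):
    """Delete master rows that have no serial rows left; return how many."""
    cur = conn.execute(
        "DELETE FROM table_devices WHERE NOT EXISTS "
        "(SELECT 1 FROM serial WHERE serial.device_id = table_devices.id)"
    )
    return cur.rowcount

deleted_first = delete_orphans(conn)      # every device still has serials
conn.execute("DELETE FROM serial WHERE device_id = 2")
deleted_second = delete_orphans(conn)     # device 2 is now an orphan
remaining = [row[0] for row in conn.execute("SELECT id FROM table_devices")]
print(deleted_first, deleted_second, remaining)
```

Running the cleanup after each removal of serial numbers replaces the chain of queries and nested if/else checks with a single statement.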
