I want to import records from Gmail into a table, and I must not end up with duplicates for any account.
Description:
I have a table named list with definition below:
id int(11),
account_id int(11),
email varchar(255),
phone varchar(30),
primary key(id),
FOREIGN KEY (account_id) REFERENCES accounts (id)
This table holds records for different accounts, and an email can be valid for two or more accounts. This means that an email can repeat in the table, but can only appear once per account_id.
I imported my contacts from Gmail (more than 700 contacts, and other users may have even more).
The challenge:
One option is to run two queries for each record (one to check whether the email or phone already exists, and a second to insert the record), which in my case means 1,400 SQL queries to insert all the imported records while ensuring there are no duplicates per account_id in the list table.
I have looked at MySQL's INSERT IGNORE and related features like ON DUPLICATE KEY UPDATE, but they do not seem to work in this scenario, as I cannot make the email and phone columns unique on their own: they can legitimately contain duplicate values.
What is the best way to insert these 700 records, ensuring that the email and phone are not repeated for any account_id, without having to run 1,400 queries?
QUESTION UPDATE:
I do not think INSERT IGNORE can work here, for the following reasons:
I cannot make email and phone unique columns.
The phone number may be empty while the email is present, which may break a unique constraint.
QUESTION ILLUSTRATION
I have two offices using the table to store their customer records. Someone can be a customer of both offices, so their record can appear twice in the table, but only once for each account_id.
The challenge is to insert several records into the table while ensuring that a record does not repeat for any given account_id.
What you are trying to achieve is not entirely clear to me, but it looks very much like you just need to add a couple of two-column unique constraints.
an email must be unique for one given account_id:
ALTER TABLE your_table ADD UNIQUE (account_id, email);
a phone number must be unique for one given account_id:
ALTER TABLE your_table ADD UNIQUE (account_id, phone);
Both indexes can exist on your table at the same time. Either one can raise a duplicate-key error, and that is what triggers the IGNORE or ON DUPLICATE KEY UPDATE clauses of your INSERT statements.
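For example, with those two indexes in place, a statement like this will update instead of failing when either pair already exists (a minimal sketch; table and column names are from the question, the values are invented):
INSERT INTO list (account_id, email, phone)
VALUES (1, 'someone@gmail.com', '555-0100')
ON DUPLICATE KEY UPDATE phone = VALUES(phone);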
That being said, there is an issue with your structure: you are about to duplicate your customers' details for every account_id they do business with.
You should have a customers table that contains all your customers' contact details (and only that), another accounts table -- your "offices", if I understand right -- and finally one relation table to model the n-to-n relationship between customers and accounts:
CREATE TABLE customers_accounts (
    customer_id INT NOT NULL,
    account_id INT NOT NULL,
    PRIMARY KEY (customer_id, account_id),
    FOREIGN KEY (customer_id) REFERENCES customers(id),
    FOREIGN KEY (account_id) REFERENCES accounts(id)
);
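With that layout, importing a contact for an account is just a matter of inserting a link row; a minimal sketch (ids invented for illustration):
-- the composite primary key guarantees the pair can only exist once
INSERT IGNORE INTO customers_accounts (customer_id, account_id)
VALUES (42, 7);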
You had the answer: use INSERT IGNORE. What you probably didn't do is add a composite unique index (mentioned by RamdomSeed above) and/or set blank fields to NULL.
1) Create a composite unique index that includes the account id. This means the email must be unique for each account:
ALTER TABLE list ADD UNIQUE (account_id, email);
2) Regarding the phone that "may be blank": set it to NULL when blank. Unique indexes ignore NULLs (a small gotcha, but it probably plays in your favour here, and it is why they behave that way). You can then also add:
ALTER TABLE list ADD UNIQUE (account_id, phone);
(Aside: the general advice is that you don't usually want multiple unique indexes on a table, as it can get confusing and messy, but it might be exactly what you need here, and it's fine - so long as you can handle the logic.)
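Putting both points together, a minimal sketch of the import insert, using the table and column names from the question (NULLIF turns a blank phone into NULL so it doesn't trip the unique index):
INSERT IGNORE INTO list (account_id, email, phone)
VALUES
    (1, 'a@gmail.com', NULLIF('0123456789', '')),  -- non-blank phone kept as-is
    (1, 'b@gmail.com', NULLIF('', ''));            -- blank phone becomes NULL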
Seems like you could use INSERT IGNORE, assuming AccountId is your unique identifier:
INSERT IGNORE INTO your_table
SET field = someValue,
    anotherfield = someothervalue;
If however you can have the same accounts with multiple emails, then this may not be what you're looking for.
So it sounds like you're using a scripting language (PHP seems to be popular with MySQL) to store an array of contacts from Gmail?
If so, this INSERT statement will insert the record only if the account id doesn't already exist in the table. It uses an outer join with a NULL check, but you could also use NOT IN or NOT EXISTS:
Insert Into YourTable (Id, AccountId, Email, Phone)
Select t.Id, t.AccountId, t.Email, t.Phone
From (Select 1 Id, 1 AccountId, 'someemail' Email, 'somephone' Phone) t
Left Join YourTable t2 On t.AccountId = t2.AccountId
Where t2.AccountId Is Null
EDIT:
Assuming I'm understanding the comments correctly, just add the extra conditions to the outer join:
Insert Into YourTable (Id, AccountId, Email, Phone)
Select t.Id, t.AccountId, t.Email, t.Phone
From (Select 1 Id, 1 AccountId, 'someemail' Email, 'somephone' Phone) t
Left Join YourTable t2 On t.AccountId = t2.AccountId
And (t.email = t2.email Or t.phone = t2.phone)
Where t2.AccountId Is Null
This should ensure no accounts get reinserted if they have a matching phone or email.
Insert Into YourTable (Id, Account_Id, Email, Phone)
Select a.Id, a.Account_Id, a.Email, a.Phone
From (Select Min(t.Id) As Id, t.Account_Id, t.Email, t.Phone
      From t
      Group By t.Account_Id, t.Email, t.Phone) a;
I suggest importing the records into a temporary table (t) first, then filtering them into the destination table (YourTable), i.e. removing the duplicates however you like. The Min(t.Id) picks one id per (account_id, email, phone) group, so the grouped select is deterministic.
I have three tables: users, products, and temp_table. I have imported an xlsx file with 100,000 records into temp_table. Now I have to insert these records into the products table, and I also have to save the user_id from the users table into the products table.
Note: user_id is dynamic (i.e. the xlsx file has a column called email, and I have created a new user for each email), so in the products table the user_id will be inserted dynamically.
I have used the query below, but it takes too much time, and sometimes MySQL gets locked.
INSERT INTO products
       (user_id,
        brand_id,
        points_discount,
        amount,
        sub_total,
        added_on)
SELECT users.user_id,
       temp_table.brand_id,
       temp_table.discount,
       temp_table.amount,
       temp_table.sub_total,
       temp_table.added_on
FROM   temp_table
       INNER JOIN users
               ON temp_table.email = users.email;
Please help me to solve this problem.
Create an index on the email column of both the temp_table and users tables.
Also ensure both columns have the same datatype and collation:
ALTER TABLE temp_table ADD KEY `idx_email` (`email`);
ALTER TABLE users ADD KEY `idx_email` (`email`);
I'd create an artisan command for this, which queries a given number of records from temp_table (where I'd add a boolean processed flag column), and either schedule it to run once a minute or run it from the terminal yourself. Process maybe 10,000-20,000 entries per run (possibly more, depending on the server and environment).
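A rough SQL sketch of one such batch, assuming temp_table gets a processed column (0 = pending, 2 = claimed by the current run, 1 = done); the two-step claim keeps the INSERT and the bookkeeping aimed at the same rows:
ALTER TABLE temp_table ADD COLUMN processed TINYINT NOT NULL DEFAULT 0;

START TRANSACTION;
-- claim a batch of pending rows
UPDATE temp_table SET processed = 2 WHERE processed = 0 LIMIT 10000;
-- insert the claimed batch (rows with no matching user are simply skipped)
INSERT INTO products (user_id, brand_id, points_discount, amount, sub_total, added_on)
SELECT u.user_id, t.brand_id, t.discount, t.amount, t.sub_total, t.added_on
FROM temp_table t
INNER JOIN users u ON t.email = u.email
WHERE t.processed = 2;
-- mark the claimed batch as done
UPDATE temp_table SET processed = 1 WHERE processed = 2;
COMMIT;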
Let's say I have two tables in MySQL:
1. person (id, name, lastname)
2. someothertable (id, name, lastname, action, quantity)
I wanted to ask: is it really bad practice to update both tables at once? For example, if someone updates the last name of Robert Jackson to "Smith", then run two queries:
mysql_query("UPDATE person SET lastname = '$lastname' WHERE id = '$id'");
mysql_query("UPDATE someothertable SET lastname = '$lastname' WHERE name = '$name' AND lastname = '$oldlastname'");
Assume for now that you won't run into two identical name/surname pairs (it's just an example).
Is it strongly recommended to join those two tables when displaying the data, and to change the last name only in the person table?
I haven't needed joins before (my databases were never big enough), and I just started to wonder whether there is another way to do this (other than two queries). Using a join will require some code changes, but I am ready to do that if it's the right thing to do.
Using a join is not a function of how big your databases are; it's about normalization and data integrity. The reason you would have lastname in only one table is so that there's no need to worry about keeping the values in sync. In your example, if those calls are in a single transaction, then they should stay in sync, unless one of them is changed somewhere else or manually in the database.
So an option for you would be to have these tables:
person (id, name, lastname)
someothertable (id, person_id, action, quantity)
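With that layout, displaying the combined data is a single join; a minimal sketch:
SELECT p.name, p.lastname, s.action, s.quantity
FROM someothertable s
JOIN person p ON p.id = s.person_id;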
Instead of using two UPDATE statements, you can use a trigger.
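For illustration, a hedged sketch of such a trigger on the original two-table layout; it assumes, as the question does, that the name/old-lastname pair is enough to identify the matching rows:
DELIMITER //
CREATE TRIGGER person_lastname_sync
AFTER UPDATE ON person
FOR EACH ROW
BEGIN
    -- mirror a last-name change into the denormalized table
    IF NEW.lastname <> OLD.lastname THEN
        UPDATE someothertable
        SET lastname = NEW.lastname
        WHERE name = NEW.name
          AND lastname = OLD.lastname;
    END IF;
END//
DELIMITER ;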
One option would be to give someothertable a foreign key constraint on the lastname field referencing Person (which requires an index on Person.lastname), declared with ON UPDATE CASCADE so that changes propagate automatically.
Here is an example below:
Alter table someothertable add constraint foreign key (lastname) references Person (lastname) on delete cascade on update cascade;
A generic version of that can be seen below:
Alter table [table-name] add constraint foreign key (field-in-current-table) references [other-table-name] (field-in-other-table) on delete cascade on update cascade;
This can be applied to any field in any table. You can then set the cascade rules to whatever is appropriate for you.
Have you considered normalization?
Another option would be to assign each person in the Person table a unique ID (i.e. PersonID). Then, in every other table where you reference a person, reference them by that unique id. This adds many advantages:
1) It keeps the data normalized
2) It maintains data integrity
3) No need for updates, triggers, or cascades
4) A change would only be required in one place
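A minimal sketch of that design (column types are assumed):
CREATE TABLE person (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(64),
    lastname VARCHAR(64)
);
CREATE TABLE someothertable (
    id INT PRIMARY KEY AUTO_INCREMENT,
    person_id INT NOT NULL,  -- references the person, not their name
    action VARCHAR(64),
    quantity INT,
    FOREIGN KEY (person_id) REFERENCES person (id)
);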
Hope this helps. Best of luck!
Been trying to wrap my head around this one, but have been unsuccessful.
Basically I am trying to create a database containing only two tables.
The first table is the logins table, and contains columns:
User ID
Username
Password
First Name
Last Name
The second table is the vote table, and contains columns:
User ID
Vote 1
Vote 2
Vote 3
etc. etc.
Is there any way I can relate these tables with the User ID as a primary key, with cascading update/delete, so that when I add an entry to the logins table it automatically creates an entry in the vote table with the same User ID and default values for all the vote columns?
I could add the votes to the main table, but I wanted to separate them out as much as possible, as not everyone will be creating votes.
Many thanks
Eds
If you want to cascade updates, you'll need to use a transaction:
START TRANSACTION;
INSERT INTO logins (username, passhash, salt, `first name`, `last name`)
VALUES ('$username', SHA2(CONCAT('$salt','$password'),512), '$salt',
'$firstname', '$lastname');
SELECT @userid := LAST_INSERT_ID();
INSERT INTO votes (userid) VALUES (@userid);
COMMIT;
Make sure you have default values set in the table definition of the votes table.
Note the use of salted hashed passwords.
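For those defaults, something along these lines would work; the column names and INT type are assumptions based on the question:
CREATE TABLE votes (
    userid INT PRIMARY KEY,
    vote1 INT NOT NULL DEFAULT 0,
    vote2 INT NOT NULL DEFAULT 0,
    vote3 INT NOT NULL DEFAULT 0,
    FOREIGN KEY (userid) REFERENCES logins (userid)
        ON DELETE CASCADE
        ON UPDATE CASCADE
);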
You could specify an insert trigger on the logins table.
It will execute each time a row is inserted into the logins table, and you can specify what you'd like it to do. In this case, you can create the trigger to insert a single row into your vote table, using the id just created in the logins table along with your default values.
So something like:
CREATE TRIGGER logins_default_vote AFTER INSERT ON logins
FOR EACH ROW
INSERT INTO vote (UserID,Vote1,Vote2,Vote3)
VALUES (NEW.UserId,'vote1 default','vote2 default','vote3 default');
I have a table with 8 columns in, but over time I have picked up numerous duplicates. I have looked at the other question with a similar topic, but it does not solve the issue I am currently having.
+---------------------------------------------------------------------------------------+
| id | market | agent | report_name | producer_code | report_date | entered_date | sync |
+---------------------------------------------------------------------------------------+
What defines a unique entry is the combination of the market, agent, report_name, producer_code, and report_date fields. What I am looking for is a way to list all the duplicate entries and delete them, or to just delete the duplicate entries outright.
I have thought about doing it with a script, but the table contains 2.5 million entries, and the time that would take would be infeasible.
Could anybody suggest any alternatives? I have seen people get a list of duplicates using the following query, but not sure on how to adapt it to my situation:
SELECT id, count(*) AS n
FROM table_name
GROUP BY id
HAVING n > 1
Here are two strategies you might think about. You will have to adjust the columns used to select duplicates based upon what you actually consider a duplicate. I just included all of your listed columns other than the id column.
The first simply creates a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used min(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table. Of course, if you have any foreign key constraints you'll have to deal with those as well.
create table table_copy like table_name;
insert into table_copy
(id, market, agent, report_name, producer_code, report_date, entered_date, sync)
select min(id), market, agent, report_name, producer_code, report_date,
entered_date, sync
from table_name
group by market, agent, report_name, producer_code, report_date,
entered_date, sync;
RENAME TABLE table_name TO table_old, table_copy TO table_name;
drop table table_old;
The second strategy, which just deletes the duplicates, uses a temporary table to hold the information about what rows have duplicates since MySQL won't allow you to select from the same table you are deleting from in a subquery. Simply create a temporary table with the columns that identify the duplicates plus an id column that will actually hold the id to keep and then you can do a multi-table delete where you join the two tables to select just the duplicates.
create temporary table dups
select min(id) as id, market, agent, report_name, producer_code, report_date,
    entered_date, sync
from table_name
group by market, agent, report_name, producer_code, report_date,
    entered_date, sync
having count(*) > 1;
delete t
from table_name t, dups d
where t.id != d.id
and t.market = d.market
and t.agent = d.agent
and t.report_name = d.report_name
and t.producer_code = d.producer_code
and t.report_date = d.report_date
and t.entered_date = d.entered_date
and t.sync = d.sync;
You can find the dupes, based on your "key" fields, by doing:
select min(id) as keep_id, count(*) as row_count
from table_name
group by market, agent, report_name, producer_code, report_date
having row_count > 1;
which you could then use in a delete script. Of course, you'd have to be very careful doing this: the grouping identifies every set of duplicate rows, and you'd want to keep at least ONE row from each grouping (here, the one whose id is keep_id).
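For example, a hedged sketch of such a delete, joining against a derived table (which MySQL materializes, sidestepping the same-table restriction mentioned earlier) and keeping the lowest id in each group:
DELETE t
FROM table_name t
JOIN (
    SELECT MIN(id) AS keep_id,
           market, agent, report_name, producer_code, report_date
    FROM table_name
    GROUP BY market, agent, report_name, producer_code, report_date
    HAVING COUNT(*) > 1
) d ON t.market = d.market
   AND t.agent = d.agent
   AND t.report_name = d.report_name
   AND t.producer_code = d.producer_code
   AND t.report_date = d.report_date
WHERE t.id <> d.keep_id;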
Another easy way would be to
create a new table
put a UNIQUE index on the fields you need to be unique (a primary key is a special kind of unique index)
use INSERT IGNORE INTO newtable SELECT * FROM oldtable (ORDER BY if you want the last/first records to remain - should there be a difference in the other columns)
DROP the old table and RENAME the new table to the old table
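A minimal sketch of that approach, using the column set from the question (the ORDER BY id keeps the oldest row from each duplicate group):
CREATE TABLE table_new LIKE table_name;
ALTER TABLE table_new
    ADD UNIQUE (market, agent, report_name, producer_code, report_date);
INSERT IGNORE INTO table_new
    SELECT * FROM table_name ORDER BY id;
DROP TABLE table_name;
RENAME TABLE table_new TO table_name;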
You could also put a primary key (or unique index) on the columns that define a unique entry; this will prevent new records with duplicate details from being added in the first place.
I'm trying to figure out how to get a SELECT statement to be populated by an ever-changing number of WHERE conditions. This is for an order-status tracking application.
Basically, the idea is that a user (a customer of our company) logs in and can see his/her orders, check status, etc. No problem. The problem arises when that user needs to be associated with multiple companies. Say they work at or own two different companies, or they work for a company that owns multiple sub-companies, each of which orders individually, but the big shot needs to see everything ordered by all of the companies. This is where I'm running into a problem: I can't seem to figure out a good way of making this happen. The only thing I have come up with is this:
client='Client Name One' OR client='Client name two' AND hidden='0' OR client='Client name three' AND hidden='0' OR client='Client name four' AND hidden='0'
(note that client in the previous code refers to the user's company, thus our client)
placed inside of a column called company in my users table of the database. This then gets called like this:
$clientnamequery = "SELECT company FROM mtc_users WHERE username='testing'";
$clientnameresult = mysql_query($clientnamequery);
list($clientname) = mysql_fetch_row($clientnameresult);
$query = "SELECT -redacted lots of column names- FROM info WHERE hidden='0' AND $clientname ORDER BY $col $dir";
$result = mysql_query($query);
Thing is, while this works, I can't seem to make PHP build the client='...' and ' AND hidden='0'' parts correctly. Plus, it's kind of kludgy.
Any ideas? Thanks in advance!
Expanding on Tim's answer, you can use the IN operator and subqueries:
SELECT *columns* FROM info
WHERE hidden='0' AND client IN
( SELECT company FROM co_members
WHERE username=?
)
ORDER BY ...
Or you can try a join:
SELECT info.* FROM info
JOIN co_members ON info.client = co_members.company
WHERE co_members.username=?
AND hidden='0'
ORDER BY ...
A join is the preferred approach. Among other reasons, it will probably be the most efficient (though you should test this with EXPLAIN SELECT ...). You probably shouldn't grab all the table columns (the info.*) in case you later change the table definition; I only put that in because I didn't know which columns you wanted.
On an unrelated note, look into using prepared queries with either the mysqli or PDO drivers. Prepared queries are more efficient when you execute a query multiple times and also obviate the need to sanitize user input.
The relational approach involves tables like:
CREATE TABLE mtc_users (
    username VARCHAR(64) PRIMARY KEY,  -- type/length assumed
    -- ... other user info
) ENGINE=InnoDB;
CREATE TABLE companies (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,  -- length assumed
    -- ... other company info
) ENGINE=InnoDB;
CREATE TABLE co_members (
    username VARCHAR(64) NOT NULL,  -- must match mtc_users.username
    company INT NOT NULL,           -- must match companies.id
    FOREIGN KEY (`username`) REFERENCES mtc_users (`username`)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    FOREIGN KEY (`company`) REFERENCES companies (`id`)
        ON DELETE CASCADE
        ON UPDATE CASCADE,
    INDEX (`username`, `company`)
) ENGINE=InnoDB;
If company names are to be unique, you could use those as a primary key rather than an id field. "co_members" is a poor name, but "employees" and "shareholders" didn't quite seem the correct terms. As you are more familiar with the system, you'll be able to come up with a more appropriate name.
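To make the flow concrete, here is a hypothetical pair of inserts linking one user to two companies (all values invented; it assumes the two companies receive auto-increment ids 1 and 2):
INSERT INTO companies (name) VALUES ('Client Name One'), ('Client name two');
INSERT INTO co_members (username, company) VALUES ('testing', 1), ('testing', 2);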
You can use the IN keyword
client IN('client1','client2',...)