Speed up MySQL inner join with LIKE clause - php

I have the following two tables, api_analytics_data and telecordia.
CREATE TABLE `api_analytics_data` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`upload_file_id` bigint(20) NOT NULL,
`partNumber` varchar(100) DEFAULT NULL,
`clei` varchar(45) DEFAULT NULL,
`description` varchar(150) DEFAULT NULL,
`processed` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_aad_clei` (`clei`),
KEY `idx_aad_pn` (`partNumber`),
KEY `id_aad_processed` (`processed`),
KEY `idx_combo1` (`partNumber`,`clei`,`upload_file_id`)
) ENGINE=InnoDB CHARSET=latin1;
CREATE TABLE `telecordia` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`ProdID` varchar(50) DEFAULT NULL,
`Mfg` varchar(20) DEFAULT NULL,
`Pn` varchar(50) DEFAULT NULL,
`Clei` varchar(50) DEFAULT NULL,
`Series` varchar(50) DEFAULT NULL,
`Dsc` varchar(50) DEFAULT NULL,
`Eci` varchar(50) DEFAULT NULL,
`AddDate` date DEFAULT NULL,
`ChangeDate` date DEFAULT NULL,
`Cost` float DEFAULT NULL,
PRIMARY KEY (`tid`),
KEY `telecordia.ProdID` (`ProdID`) USING BTREE,
KEY `telecordia.clei` (`Clei`),
KEY `telecordia.pn` (`Pn`),
KEY `telcordia.eci` (`Eci`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Users upload data via a web interface using Excel/CSV files into api_analytics_data. The data contains EITHER the partNumbers or CLEIs. I then update the api_analytics_data table by joining the telecordia table. The telecordia table is the master list of partNumber and Cleis.
So if a user uploads a file of CLEIs, the update/join I use is:
update api_analytics_data aad
inner join telecordia t on aad.clei = t.Clei
set aad.partNumber = t.Pn
where aad.partNumber is null
and aad.upload_file_id = 5;
It works quickly, but not very thoroughly. The problem I have is that the CLEI uploaded may only be a substring of the CLEI in the telecordia table.
For example, the uploaded CLEI may be "5SC1DX0". In the telecordia table, the correct matching row is:
tid: 184324
ProdID: 472467
Mfg: PLSE
Pn: AUA58-2-REV-E
Clei: 5SC1DX04AA
Series: null
Dsc: DL SGL-PTY POTS CU RT
Eci: 205756
AddDate: 1994-03-18
ChangeDate: 1998-04-13
Cost: null
So obviously my update doesn't work in this case, even though 5SC1DX0 and 5SC1DX04AA are the same part.
What I need is a wildcard search. However, when I try this, it is crazy slow. With about 4500 rows uploaded into the api_analytics_data table, it runs for about 10 minutes, and then loses the connection with the server.
update api_analytics_data aad
inner join telecordia t on aad.clei like concat(t.Clei,'%')
set aad.partNumber = t.Pn
where aad.partNumber is null
and aad.upload_file_id = 5;
Is there a way to optimize this so that it runs quickly?

The correct answer is "no". The better course of action is to create a new column in telecordia with the correct Clei value in it, one that can be used for joining the tables. In the most recent versions of MySQL, this can even be a computed column and be indexed.
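For instance, on MySQL 5.7+ this could be a stored generated column with an index on it. A minimal sketch, assuming (per the question's example, where the uploaded CLEI is a prefix of the full one) that the matching portion is always the first 7 characters; the column and index names are made up:
ALTER TABLE telecordia
  ADD COLUMN CleiBase varchar(7) AS (LEFT(Clei, 7)) STORED,
  ADD INDEX idx_t_clei_base (CleiBase);

-- the join then becomes a plain indexed equality
UPDATE api_analytics_data aad
INNER JOIN telecordia t ON aad.clei = t.CleiBase
SET aad.partNumber = t.Pn
WHERE aad.partNumber IS NULL
  AND aad.upload_file_id = 5;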
That said, you might be able to do something if the matching portion is always the same length. If so, try this:
update api_analytics_data aad inner join
telecordia t
on t.Clei = left(aad.clei, 7)
set aad.partNumber = t.Pn
where aad.partNumber is null and aad.upload_file_id = 5;
For this query, you want an index on api_analytics_data(upload_file_id, partNumber, clei) and telecordia(Clei, Pn).
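A sketch of those index definitions (the names are arbitrary; the existing idx_combo1 is close, but the column order differs):
ALTER TABLE api_analytics_data
  ADD INDEX idx_aad_combo2 (upload_file_id, partNumber, clei);
ALTER TABLE telecordia
  ADD INDEX idx_t_clei_pn (Clei, Pn);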

Related

MySQL query performs very slowly

I have developed a user bulk upload module. There are two situations: when I do a bulk upload of 20,000 records into a database with zero records, it takes about 5 hours; but when the database already has about 30,000 records, the upload becomes very, very slow, taking about 11 hours for 20,000 records. I am just reading a CSV file via the fgetcsv method.
if (($handle = fopen($filePath, "r")) !== FALSE) {
    while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
        if (count($peopleData) == $fieldsCount) {
            // inside I check whether the user already exists (firstName & lastName & DOB);
            // if not, I check whether the email exists: if it does, update the record,
            // otherwise insert a new record
        }
    }
}
Below are the queries that run. (I am using Yii framework)
SELECT *
FROM `AdvanceBulkInsert` `t`
WHERE renameSource='24851_bulk_people_2016-02-25_LE CARVALHO 1.zip.csv'
LIMIT 1
SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
SELECT *
FROM `User` `t`
WHERE `t`.`firstName`='Franck'
AND `t`.`lastName`='ALLEGAERT '
AND `t`.`dateOfBirth`='1971-07-29'
AND (userType NOT IN ("1"))
LIMIT 1
If the user exists, update the record:
UPDATE `User` SET `id`='51394', `address1`='49 GRANDE RUE',
`mobile`='', `name`=NULL, `firstName`='Franck',
`lastName`='ALLEGAERT ', `username`=NULL,
`password`=NULL, `email`=NULL, `gender`=0,
`zip`='60310', `countryCode`='DZ',
`joinedDate`='2016-02-23 10:44:18',
`signUpDate`='0000-00-00 00:00:00',
`supporterDate`='2016-02-25 13:26:37', `userType`=3,
`signup`=0, `isSysUser`=0, `dateOfBirth`='1971-07-29',
`reqruiteCount`=0, `keywords`='70,71,72,73,74,75',
`delStatus`=0, `city`='AMY', `isUnsubEmail`=0,
`isManual`=1, `isSignupConfirmed`=0, `profImage`=NULL,
`totalDonations`=NULL, `isMcContact`=NULL,
`emailStatus`=NULL, `notes`=NULL,
`addressInvalidatedAt`=NULL,
`createdAt`='2016-02-23 10:44:18',
`updatedAt`='2016-02-25 13:26:37', `longLat`=NULL
WHERE `User`.`id`='51394'
If the user doesn't exist, insert a new record.
The table engine is MyISAM. Only the email column has an index.
How can I optimize this to reduce the processing time?
Query 2 took 0.4701 seconds, which means that for 30,000 records it will take about 14,103 seconds, roughly 235 minutes, or approximately 4 hours.
Update
CREATE TABLE IF NOT EXISTS `User` (
`id` bigint(20) NOT NULL,
`address1` text COLLATE utf8_unicode_ci,
`mobile` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
`name` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`firstName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`lastName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`username` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`password` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`email` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`gender` tinyint(2) NOT NULL DEFAULT '0' COMMENT '1 - female, 2-male, 0 - unknown',
`zip` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
`countryCode` varchar(3) COLLATE utf8_unicode_ci DEFAULT NULL,
`joinedDate` datetime DEFAULT NULL,
`signUpDate` datetime NOT NULL COMMENT 'User signed up date',
`supporterDate` datetime NOT NULL COMMENT 'Date which user get supporter',
`userType` tinyint(2) NOT NULL,
`signup` tinyint(2) NOT NULL DEFAULT '0' COMMENT 'whether user followed signup process 1 - signup, 0 - not signup',
`isSysUser` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 - system user, 0 - not a system user',
`dateOfBirth` date DEFAULT NULL COMMENT 'User date of birth',
`reqruiteCount` int(11) DEFAULT '0' COMMENT 'User count that he has reqruited',
`keywords` text COLLATE utf8_unicode_ci COMMENT 'Kewords',
`delStatus` tinyint(2) NOT NULL DEFAULT '0' COMMENT '0 - active, 1 - deleted',
`city` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`isUnsubEmail` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Unsubscribed form email',
`isManual` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Manualy add',
`longLat` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Longitude and Latitude',
`isSignupConfirmed` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Whether user has confirmed signup ',
`profImage` tinytext COLLATE utf8_unicode_ci COMMENT 'Profile image name or URL',
`totalDonations` float DEFAULT NULL COMMENT 'Total donations made by the user',
`isMcContact` tinyint(1) DEFAULT NULL COMMENT '1 - Mailchimp contact',
`emailStatus` tinyint(2) DEFAULT NULL COMMENT '1-bounced, 2-blocked',
`notes` text COLLATE utf8_unicode_ci,
`addressInvalidatedAt` datetime DEFAULT NULL,
`createdAt` datetime NOT NULL,
`updatedAt` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `AdvanceBulkInsert` (
`id` int(11) NOT NULL,
`source` varchar(256) NOT NULL,
`renameSource` varchar(256) DEFAULT NULL,
`countryCode` varchar(3) NOT NULL,
`userType` tinyint(2) NOT NULL,
`size` varchar(128) NOT NULL,
`errors` varchar(512) NOT NULL,
`status` char(1) NOT NULL COMMENT '1:Queued, 2:In Progress, 3:Error, 4:Finished, 5:Cancel',
`createdAt` datetime NOT NULL,
`createdBy` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `CustomField` (
`id` int(11) NOT NULL,
`customTypeId` int(11) NOT NULL,
`fieldName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`relatedTable` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`defaultValue` text COLLATE utf8_unicode_ci,
`sortOrder` int(11) NOT NULL DEFAULT '0',
`enabled` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`listItemTag` char(1) COLLATE utf8_unicode_ci DEFAULT NULL,
`required` char(1) COLLATE utf8_unicode_ci DEFAULT '0',
`onCreate` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`onEdit` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`onView` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`listValues` text COLLATE utf8_unicode_ci,
`label` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`htmlOptions` text COLLATE utf8_unicode_ci
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `CustomFieldSubArea` (
`id` int(11) NOT NULL,
`customFieldId` int(11) NOT NULL,
`subarea` varchar(256) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=43 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `CustomValue` (
`id` int(11) NOT NULL,
`customFieldId` int(11) NOT NULL,
`relatedId` int(11) NOT NULL,
`fieldValue` text COLLATE utf8_unicode_ci,
`createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM AUTO_INCREMENT=86866 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The entire PHP code is here: http://pastie.org/10737962
Update 2
EXPLAIN output of the query:
Indexes are your friend.
UPDATE User ... WHERE id = ... -- Desperately needs an index on ID, probably PRIMARY KEY.
Similarly for renameSource.
SELECT *
FROM `User` `t`
WHERE `t`.`firstName`='Franck'
AND `t`.`lastName`='ALLEGAERT '
AND `t`.`dateOfBirth`='1971-07-29'
AND (userType NOT IN ("1"))
LIMIT 1;
Needs INDEX(firstName, lastName, dateOfBirth); the fields can be in any order (in this case).
Look at each query to see what it needs, then add that INDEX to the table. Read my Cookbook on building indexes.
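A sketch of those additions, assuming id should become the primary key (the posted CREATE TABLE shows none) and that index names are arbitrary:
ALTER TABLE User
  ADD PRIMARY KEY (id),
  ADD INDEX idx_name_dob (firstName, lastName, dateOfBirth);
ALTER TABLE AdvanceBulkInsert
  ADD INDEX idx_renameSource (renameSource);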
Try these things to increase your query performance:
Define indexes in your database structure, and fetch only the columns you need.
Do not use * in a SELECT query.
Do not put ids in quotes, i.e. write User.id = 51394 instead of User.id = '51394'.
Quoting numeric ids forces a type conversion that can keep your indexes from being used, so removing the quotes can noticeably improve query performance.
If you are using ENGINE=MyISAM, you cannot define relational constraints between your tables. Change the engine to ENGINE=InnoDB, and create appropriate indexes such as foreign keys and full-text indexes.
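If you do switch engines, the conversion itself is a single statement per table (note it rebuilds the table, which can take a while on large tables):
ALTER TABLE User ENGINE=InnoDB;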
If I understand correctly, for every result of SELECT * FROM AdvanceBulkInsert ... you run a SELECT cf.*, and for every SELECT cf.* you run a SELECT * FROM User.
I think the issue is that you send far too many queries to the database.
I think you should merge all your SELECTs into one big query.
For that:
replace the SELECT * FROM AdvanceBulkInsert with an EXISTS (SELECT * FROM AdvanceBulkInsert WHERE ...) or a JOIN
replace the SELECT * FROM User with a NOT EXISTS (SELECT * FROM User WHERE ...)
Then call the update on all the results of the merged select.
You should also time your queries one by one to find which of them takes the most time, and use EXPLAIN to find which part of a query is slow.
Edit:
Now that I have seen your code, some leads:
Do you have indexes on cf.customTypeId, cfv.customFieldId, cfsa.customFieldId, user.dateOfBirth, user.firstName and user.lastName?
You don't need a LEFT JOIN CustomFieldSubArea when your WHERE clause filters on CustomFieldSubArea; a simple JOIN CustomFieldSubArea is enough.
You will run query 2 many times with relatedId = 0; maybe you can save the result in a variable?
If you don't need sorted data, remove the "ORDER BY cf.sortOrder, cf.label". Otherwise, add an index on cf.sortOrder, cf.label.
When you need to find out why a query takes long, you need to inspect its individual parts. As shown in the question, the EXPLAIN statement can help you very much. Usually the most important columns are:
select_type - this should always be a simple query/subquery. Correlated subqueries cause a lot of trouble; luckily you don't use any.
possible_keys - the keys this select can search by
rows - how many candidate rows are determined by the keys/cache and other techniques. A smaller number is better.
Extra - "Using ..." tells you how exactly the rows are found; this is the most useful information.
Query analysis
I would have posted the analysis for the 1st and 3rd queries, but they are both quite simple. Here is the breakdown for the query that gives you trouble:
EXPLAIN SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
Solution
The join columns absolutely must have an index. Joining tables is an expensive operation that otherwise has to go through all rows of both tables. If you put an index on the joinable columns, the DB engine can find a much faster and better way to do it. This should be common practice for any database.
The columns used only for filtering are not mandatory to index, but if you have a large number of rows (20,000 is a large amount) you should also index the columns you search by; it might not have as much impact on the processing speed, but it is worth the extra bit of time.
So you need to add indexes to these columns (a sketch of the statements follows the list):
CustomType - id
CustomField - customTypeId, id, relatedTable, enabled, onCreate, sortOrder, label
CustomValue - customFieldId
CustomFieldSubArea - customFieldId, subarea
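A sketch of those additions (index names are arbitrary; the CustomType line is an assumption, since its CREATE TABLE was not posted and id may already be a primary key):
ALTER TABLE CustomType ADD INDEX idx_ctyp_id (id);
ALTER TABLE CustomField
  ADD INDEX idx_cf_type (customTypeId),
  ADD INDEX idx_cf_id (id),
  ADD INDEX idx_cf_filter (relatedTable, enabled, onCreate),
  ADD INDEX idx_cf_sort (sortOrder, label);
ALTER TABLE CustomValue ADD INDEX idx_cfv_field (customFieldId);
ALTER TABLE CustomFieldSubArea ADD INDEX idx_cfsa_field (customFieldId, subarea);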
To verify the results, try running the EXPLAIN statement again after adding the indexes (and possibly a few other select/insert/update queries). The Extra column should say something like "Using index", and the possible_keys column should list the used keys (even two or more per joined table).
Side note: you have some typos in your code; you should fix them in case someone else needs to work on it too: "reqruiteCount" as a table column and "fileUplaod" as an array index in your referenced code.
For my work, I have to load one CSV with 524 columns and 10k records daily. When I tried to parse it and insert the records with PHP, it was horribly slow.
So I suggest you look at the documentation for LOAD DATA LOCAL INFILE.
I copy/paste my own code below as an example; adapt it to your needs.
$dataload = 'LOAD DATA LOCAL INFILE "'.$filename.'"
REPLACE
INTO TABLE '.$this->csvTable.' CHARACTER SET "utf8"
FIELDS TERMINATED BY "\t"
IGNORE 1 LINES
';
$result = (bool)$this->db->query($dataload);
Where $filename is a local path to your CSV (you can use dirname(__FILE__) to build it).
This SQL command is very quick (just 1 or 2 seconds to add/update the entire CSV).
EDIT: read the docs; of course you need a unique index on your user table for REPLACE to work. That way you don't need to check whether the user exists, and you don't need to parse the CSV file with PHP.
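For REPLACE (and INSERT ... ON DUPLICATE KEY UPDATE) to detect an existing user, the table needs a unique key. A sketch, assuming email is what identifies a user; it will fail if duplicate emails already exist:
ALTER TABLE User ADD UNIQUE KEY uq_user_email (email);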
You appear to have the possibility (probability?) of 3 queries for every single record. Those 3 queries are going to require 3 trips to the database (and if you are using Yii and storing the records in Yii objects, that might slow things down even more).
Can you add a unique key on first name / last name / DOB and one on email address?
If so, you can just do INSERT ... ON DUPLICATE KEY UPDATE. This would reduce it to a single query for each record, greatly speeding things up.
But the big advantage of this syntax is that you can insert / update many records at once (I normally stick to about 250), so even less trips to the database.
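A sketch of the batched form, assuming the unique key on email suggested above (the column list is shortened for readability; in practice include every NOT NULL column without a default). VALUES(col) refers to the value offered for that row:
INSERT INTO User (email, firstName, lastName, dateOfBirth)
VALUES
  ('a@example.com', 'Franck', 'ALLEGAERT', '1971-07-29'),
  ('b@example.com', 'Jane', 'DOE', '1980-01-01')
ON DUPLICATE KEY UPDATE
  firstName = VALUES(firstName),
  lastName = VALUES(lastName),
  dateOfBirth = VALUES(dateOfBirth);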
You can knock up a class that you just pass records to and which does the insert when the number of records hits your choice. Also add in a call to insert the records in the destructor to insert any final records.
Another option is to read everything into a temp table and then use that as a source to join to your user table for the updates/inserts. This requires a bit of effort with the indexes, but a bulk load into a temp table is quick, and updates from it with useful indexes will be fast. Using it as a source for the inserts should also be fast (if you exclude the records already updated).
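A sketch of that staging approach (the table name UserStage is illustrative, and the column list is again shortened):
-- stage the CSV in a scratch copy of the table
CREATE TEMPORARY TABLE UserStage LIKE User;
-- bulk load into UserStage here (LOAD DATA LOCAL INFILE or batched INSERTs)

-- then upsert from the staging table in one statement
INSERT INTO User (email, firstName, lastName, dateOfBirth)
SELECT s.email, s.firstName, s.lastName, s.dateOfBirth
FROM UserStage s
ON DUPLICATE KEY UPDATE
  firstName = VALUES(firstName),
  lastName = VALUES(lastName),
  dateOfBirth = VALUES(dateOfBirth);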
The other issue appears to be the following query, though I am not sure where you execute it. It appears to only need to be executed once, in which case it might not matter too much. You haven't given the structure of the CustomType table, but it is joined to CustomField and the field customTypeId has no index, so that join will be slow. The same goes for the CustomValue and CustomFieldSubArea joins, which join on customFieldId; neither has an index on this field (hopefully a unique index, because if those fields are not unique you will get a LOT of records returned - one row for every possible combination).
SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
You can also try to reduce the query and check its timing with an online SQL compiler before including it in the project.
Always do bulk importing within a transaction:
$transaction = Yii::app()->db->beginTransaction();
$curRow = 0;
try
{
    while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
        $curRow++;
        //process $peopleData
        //insert row; best to use INSERT ... ON DUPLICATE KEY UPDATE
        //commit in batches to keep each transaction a manageable size
        if ($curRow % 5000 == 0) {
            $transaction->commit();
            $transaction = Yii::app()->db->beginTransaction();
        }
    }
    //don't forget the remainder
    $transaction->commit();
}
catch (Exception $ex)
{
    $transaction->rollBack();
    $result = $ex->getMessage();
}
I have seen import routines sped up 500% by simply using this technique. I have also seen an import process that did 600 queries (mixture of select, insert, update and show table structure) for each row. This technique sped up the process 30%.

Optimization Needed For Dual Left Join Query

I've always struggled with MySQL joins. I have started incorporating more of them, but I'm still struggling to understand them despite reading dozens of tutorials and the MySQL manual.
My situation is I have 3 tables:
/* BASICALLY A TABLE THAT HOLDS FAN RECORDS */
CREATE TABLE `fans` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) DEFAULT NULL,
`middle_name` varchar(255) DEFAULT NULL,
`last_name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
`join_date` datetime DEFAULT NULL,
`twitter` varchar(255) DEFAULT NULL,
`twitterCrawled` datetime DEFAULT NULL,
`twitterImage` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=20413 DEFAULT CHARSET=latin1;
/* A TABLE OF OUR TWITTER FOLLOWERS */
CREATE TABLE `twitterFollowers` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`screenName` varchar(25) DEFAULT NULL,
`twitterId` varchar(25) DEFAULT NULL,
`customerId` int(11) DEFAULT NULL,
`uniqueStr` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique` (`uniqueStr`)
) ENGINE=InnoDB AUTO_INCREMENT=13426 DEFAULT CHARSET=utf8;
/* TABLE THAT SUGGESTS A LIKELY MATCH OF A TWITTER FOLLOWER BASED ON THE EMAIL / SCREEN NAME COMPARISON OF THE FAN vs OUR FOLLOWERS
IF SOMEONE (ie. a moderator) CONFIRMS OR DENIES THAT IT'S A GOOD MATCH THEY PUT A DATESTAMP IN `dismissed` */
CREATE TABLE `contentSuggestion` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userId` int(11) DEFAULT NULL,
`fanId` int(11) DEFAULT NULL,
`twitterAccountId` int(11) DEFAULT NULL,
`contentType` varchar(50) DEFAULT NULL,
`contentString` varchar(255) DEFAULT NULL,
`added` datetime DEFAULT NULL,
`dismissed` datetime DEFAULT NULL,
`uniqueStr` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unstr` (`uniqueStr`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
What I'm trying to get is:
SELECT [fan columns]
WHERE fan screen name IS IN twitterfollowers
AND WHERE fan screen name IS NOT IN contentSuggestion (with a datestamp in dismissed)
My attempts so far:
~33 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans
LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername
LEFT JOIN contentSuggestion cs ON cs.contentString = tf.screenName WHERE dismissed IS NULL
GROUP BY(fans.id) HAVING col1 != ''
~14 seconds
SELECT id, emailUsername FROM fans
WHERE emailUsername IN (SELECT DISTINCT(screenName) FROM twitterFollowers)
AND emailUsername NOT IN (SELECT DISTINCT(contentString) FROM contentSuggestion WHERE dismissed IS NULL)
GROUP BY (fans.id);
9.53 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans
LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername WHERE tf.uniqueStr NOT IN(SELECT uniqueStr FROM contentSuggestion WHERE dismissed IS NULL)
I hope there is a better way. I've been struggling to really use JOINS outside of a single LEFT JOIN which has already helped me speed up other queries by a significant amount.
Thanks for any help you can give me.
I would go with a variation of the second method. Instead of IN, use EXISTS. Then add the correct indexes and remove the aggregation:
SELECT f.id, f.emailUsername
FROM fans f
WHERE EXISTS (SELECT 1
FROM twitterFollowers tf
WHERE f.emailUsername = tf.screenName
) AND
NOT EXISTS (SELECT 1
FROM contentSuggestion cs
WHERE f.emailUsername = cs.contentString AND
cs.dismissed IS NULL
) ;
Then be sure you have the following indexes: twitterFollowers(screenName) and contentSuggestion(contentString, dismissed).
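For instance (a sketch; the index names are arbitrary):
CREATE INDEX idx_tf_screenname ON twitterFollowers (screenName);
CREATE INDEX idx_cs_content ON contentSuggestion (contentString, dismissed);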
Some notes:
When using IN, don't use SELECT DISTINCT. I'm not 100% sure that MySQL is always smart enough to ignore the DISTINCT in the subquery (it is redundant).
Historically, EXISTS was faster than IN in MySQL. The optimizer has improved in recent versions.
For performance, you need the correct indexes.
Assuming that fan.id is unique (a very reasonable assumption), you don't need the final group by.

deleting inactive products from oscommerce

I am using the latest osCommerce.
I have a huge number of inactive products and I want to remove them. Going through the admin one at a time is really slow.
I thought that if I create a new temp category and move all inactive products into it, I can then easily delete them through the osCommerce back end. Doing this will also remove the associated images.
Products are identified by product id, and category association is done by the product-to-category table. Inactive products are marked with products_status = 0.
CREATE TABLE IF NOT EXISTS `products` (
`products_id` int(11) NOT NULL AUTO_INCREMENT,
`products_quantity` int(4) NOT NULL,
`products_model` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`products_ean` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
`google_product_category` varchar(300) COLLATE utf8_unicode_ci DEFAULT NULL,
`products_image` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`products_price` decimal(15,4) NOT NULL,
`products_date_added` datetime NOT NULL,
`products_last_modified` datetime DEFAULT NULL,
`products_date_available` datetime DEFAULT NULL,
`products_weight` decimal(5,2) NOT NULL,
`products_status` tinyint(1) NOT NULL,
`products_tax_class_id` int(11) NOT NULL,
`manufacturers_id` int(11) DEFAULT NULL,
`products_ordered` int(11) NOT NULL DEFAULT '0',
`products_last_import` datetime DEFAULT NULL,
`products_submit_google` smallint(6) NOT NULL DEFAULT '1',
`icecat_prodid` int(10) unsigned NOT NULL,
`vendors_id` int(11) DEFAULT '1',
`products_availability` smallint(6) NOT NULL DEFAULT '0',
PRIMARY KEY (`products_id`),
KEY `idx_products_model` (`products_model`),
KEY `idx_products_date_added` (`products_date_added`),
KEY `idx_icecat_prodid` (`icecat_prodid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=292067 ;
CREATE TABLE IF NOT EXISTS `products_to_categories` (
`products_id` int(11) NOT NULL,
`categories_id` int(11) NOT NULL,
PRIMARY KEY (`products_id`,`categories_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I have tried the following query, but I get an error: #1062 - Duplicate entry '276917-29240' for key 'PRIMARY'
Update products p ,products_to_categories pc
set pc.categories_id = 29598
where p.products_id = pc.products_id
and p.products_status = 0
You most likely have a product that is linked to more than one category.
Example: product ABC with products_id = 123 can exist twice in the products_to_categories table if it is in two categories (say categories_id 222 and 333), so you have two entries in your table: 123-222 and 123-333.
When your update first encounters product/category 123-222, it changes it to 123-29598. When it then encounters 123-333, it also tries to update that row to 123-29598, which violates your primary key constraint and causes the error you see.
Perhaps in your script you can check whether the product (123) already exists in the category and, if so, remove the second entry (123-333) rather than change its category to 123-29598; see the sketch below. There is plenty of information elsewhere on deleting entries with the same id from a table.
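A sketch of that idea in plain SQL, assuming 29598 is the temp category: UPDATE IGNORE skips the rows whose move would collide with the primary key, and a second statement deletes the leftover duplicate links.
-- move one category link per inactive product, skipping collisions
UPDATE IGNORE products_to_categories pc
JOIN products p ON p.products_id = pc.products_id
SET pc.categories_id = 29598
WHERE p.products_status = 0;

-- drop the remaining links of inactive products that could not be moved
DELETE pc FROM products_to_categories pc
JOIN products p ON p.products_id = pc.products_id
WHERE p.products_status = 0
  AND pc.categories_id <> 29598;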

MySql Properly Join Complex Data/Tables

Abstract:
Every client is given a specific XML ad feed (publisher_feed table). Every time there is a query or a click on that feed, it gets recorded (publisher_stats_raw table). (Each query/click can produce multiple rows depending on the subid passed by the client; we can sum the clicks together.) The next day, we pull stats from an API to grab the previous day's revenue numbers (rev_stats table). (Each revenue stat might have multiple rows depending on the country of the click; we can sum the revenue together.) I've been having a hard time trying to link these three tables together to find the average RPC for each client for the previous day.
Table Structure:
CREATE TABLE `publisher_feed` (
`publisher_feed_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`alias` varchar(45) DEFAULT NULL,
`user_id` int(10) unsigned DEFAULT NULL,
`remote_feed_id` int(10) unsigned DEFAULT NULL,
`subid` varchar(255) DEFAULT '',
`requirement` enum('tq','tier2','ron','cpv','tos1','tos2','tos3','pv1','pv2','pv3','ar','ht') DEFAULT NULL,
`status` enum('enabled','disabled') DEFAULT 'enabled',
`tq` decimal(4,2) DEFAULT '0.00',
`clicklimit` int(11) DEFAULT '0',
`prev_rpc` decimal(20,10) DEFAULT '0.0000000000',
PRIMARY KEY (`publisher_feed_id`),
UNIQUE KEY `alias_UNIQUE` (`alias`),
KEY `publisher_feed_idx` (`remote_feed_id`),
KEY `publisher_feed_user` (`user_id`),
CONSTRAINT `publisher_feed_feed` FOREIGN KEY (`remote_feed_id`) REFERENCES `remote_feed` (`remote_feed_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `publisher_feed_user` FOREIGN KEY (`user_id`) REFERENCES `user` (`user_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=124 DEFAULT CHARSET=latin1$$
CREATE TABLE `publisher_stats_raw` (
`publisher_stats_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`unique_data` varchar(350) NOT NULL,
`publisher_feed_id` int(10) unsigned DEFAULT NULL,
`date` date DEFAULT NULL,
`subid` varchar(255) DEFAULT NULL,
`queries` int(10) unsigned DEFAULT '0',
`impressions` int(10) unsigned DEFAULT '0',
`clicks` int(10) unsigned DEFAULT '0',
`filtered` int(10) unsigned DEFAULT '0',
`revenue` decimal(20,10) unsigned DEFAULT '0.0000000000',
PRIMARY KEY (`publisher_stats_id`),
UNIQUE KEY `unique_data_UNIQUE` (`unique_data`),
KEY `publisher_stats_raw_remote_feed_idx` (`publisher_feed_id`)
) ENGINE=InnoDB AUTO_INCREMENT=472 DEFAULT CHARSET=latin1$$
CREATE TABLE `rev_stats` (
`rev_stats_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`date` date DEFAULT NULL,
`remote_feed_id` int(10) unsigned DEFAULT NULL,
`typetag` varchar(255) DEFAULT NULL,
`subid` varchar(255) DEFAULT NULL,
`country` varchar(2) DEFAULT NULL,
`revenue` decimal(20,10) DEFAULT NULL,
`tq` decimal(4,2) DEFAULT NULL,
`finalized` int(11) DEFAULT '0',
PRIMARY KEY (`rev_stats_id`),
KEY `rev_stats_remote_feed_idx` (`remote_feed_id`),
CONSTRAINT `rev_stats_remote_feed` FOREIGN KEY (`remote_feed_id`) REFERENCES `remote_feed` (`remote_feed_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=58 DEFAULT CHARSET=latin1$$
Context:
Each remote_feed has a specific subid/typetag given to it. So we need to match up both the remote_feed_id and subid columns from the publisher_feed table with the remote_feed_id and typetag columns in the rev_stats table.
My current, non working, implementation:
SELECT
pf.publisher_feed_id, psr.date, sum(clicks), sum(rs.revenue)
FROM
xml_network.publisher_feed pf
JOIN
xml_network.publisher_stats_raw psr
ON
psr.publisher_feed_id = pf.publisher_feed_id
JOIN
xml_network.rev_stats rs
ON
rs.remote_feed_id = pf.remote_feed_id
WHERE
pf.requirement = 'tq'
AND
pf.subid = rs.typetag
AND
psr.date <> date(curdate())
GROUP BY
psr.date
ORDER BY
psr.date DESC
LIMIT 1;
The above keeps pulling the wrong data out of the rev_stats table (it pulls the sum of the correct stats, but repeats it because of the join). Any help with how to properly pull the correct data would be greatly appreciated. (I could use multiple queries and PHP to get the correct results, but what's the fun in that!)
Figured out a way to get this accomplished. It's definitely not a fast method by any means, needing four SELECTs to get it done, but it works flawlessly =)
SELECT
pf.publisher_feed_id,
round(
(
SELECT
SUM(rs.revenue)
FROM
xml_network.rev_stats rs
WHERE
rs.remote_feed_id = pf.remote_feed_id
AND
rs.typetag = pf.subid
AND
rs.date = subdate(current_date, 1)
),10)as revenue,
(
SELECT
MAX(rs.tq)
FROM
xml_network.rev_stats rs
WHERE
rs.remote_feed_id = pf.remote_feed_id
AND
rs.typetag = pf.subid
AND
rs.date = subdate(current_date, 1)
) as tq,
(
SELECT
SUM(psr.clicks)-SUM(psr.filtered)
FROM
xml_network.publisher_stats_raw psr
WHERE
psr.publisher_feed_id = pf.publisher_feed_id
AND
psr.date = subdate(current_date, 1)
) as clicks
FROM
xml_network.publisher_feed pf
WHERE
pf.requirement = 'tq';
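An alternative sketch that produces the same numbers in one pass: pre-aggregate each stats table in a derived table and join the aggregates, which sidesteps the row-multiplication that broke the first attempt.
SELECT
  pf.publisher_feed_id, rs.revenue, rs.tq, psr.clicks
FROM xml_network.publisher_feed pf
LEFT JOIN (
  SELECT remote_feed_id, typetag,
         SUM(revenue) AS revenue, MAX(tq) AS tq
  FROM xml_network.rev_stats
  WHERE date = SUBDATE(CURRENT_DATE, 1)
  GROUP BY remote_feed_id, typetag
) rs ON rs.remote_feed_id = pf.remote_feed_id AND rs.typetag = pf.subid
LEFT JOIN (
  SELECT publisher_feed_id, SUM(clicks) - SUM(filtered) AS clicks
  FROM xml_network.publisher_stats_raw
  WHERE date = SUBDATE(CURRENT_DATE, 1)
  GROUP BY publisher_feed_id
) psr ON psr.publisher_feed_id = pf.publisher_feed_id
WHERE pf.requirement = 'tq';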

perform calculations on multiple users-own records in mysql

I need help extending a working update query that performs calculations on one record so that it performs them not just on one record in the database, but on all records associated with a particular user number.
Functionally, I need to extend "edit this record" to "re-evaluate all records of user #?".
The current calculations use three tables: they sum one column, divide by the sum of another, and create a variable from the result (two columns are summed separately to create two variables). Then a simple UPDATE query updates the record with the values of those variables. Each record has different values, and the sums will be different for every id number.
@lastid is the unique record id (allinfsds.id1).
I need the calculations done on all records that belong to a particular owner id (allinfsds.own_id), i.e. WHERE allinfsds.own_id = usernum.
Any ideas???
Thanks ahead of time,
Nat
CREATE TABLE `allingred` (
`id6` int(8) NOT NULL auto_increment,
`usernum` varchar(255) default NULL,
`fsdsnum` int(8) unsigned zerofill NOT NULL,
`mfdfsds` varchar(255) default NULL,
`maybe` decimal(2,1) NOT NULL default '1.0',
`amount` float(10,2) default NULL,
`unit` int(6) default NULL,
`name` varchar(255) default NULL,
`wgt` int(9) NOT NULL,
PRIMARY KEY (`id6`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=38 ;
CREATE TABLE `weight` (
`NDB_No2` int(8) unsigned zerofill NOT NULL,
`Seq` smallint(6) NOT NULL,
`amt2` decimal(5,3) NOT NULL,
`Msre_Desc` varchar(80) NOT NULL,
`Gm_Wgt` decimal(7,1) NOT NULL,
`Num_Data_Pts` tinyint(4) default NULL,
`Std_Dev` decimal(7,1) default NULL,
`uni` int(7) NOT NULL auto_increment,
PRIMARY KEY (`uni`),
KEY `fb_join_NDB_No2_INDEX` (`NDB_No2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=21731 ;
CREATE TABLE `allinnot2` (
`NDB_No` int(8) unsigned zerofill NOT NULL,
`Water` decimal(10,2) default NULL,
`Energ_Kcal` decimal(10,0) default NULL
CREATE TABLE `allinfsds` (
`id1` int(8) unsigned zerofill NOT NULL,
`own_id` int(11) NOT NULL,
UNIQUE KEY `id` (`id1`),
KEY `fb_groupbyorder_item_number_INDEX` (`item_number`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
SET @cal = (SELECT SUM(Energ_Kcal * allingred.amount * Gm_Wgt) / SUM(allingred.amount * Gm_Wgt) AS nut100
FROM `allingred`
LEFT JOIN weight ON allingred.unit = weight.uni
LEFT JOIN allinnot2 ON allingred.mfdfsds = allinnot2.NDB_No
LEFT JOIN allinfsds ON allingred.fsdsnum = allinfsds.own_id
WHERE fsdsnum = @lastid);
SET @prot = (SELECT SUM(Protein * allingred.amount * Gm_Wgt) / SUM(allingred.amount * Gm_Wgt) AS nut100
FROM `allingred`
LEFT JOIN weight ON allingred.unit = weight.uni
LEFT JOIN allinnot2 ON allingred.mfdfsds = allinnot2.NDB_No
WHERE fsdsnum = @lastid);
UPDATE `allinnot2` SET
`Energ_Kcal` = @cal,
`Protein` = @prot
WHERE `NDB_No` = @lastid;
How to update multiple tuples at once
If you have a list of Ids you want to update, use
UPDATE `myTable` SET `myColumn` = 'newValue'
WHERE `userId` IN (
/*list of relevant Ids for instance: */ 15, 20, 63, 987
)
or, if you don't have this list but can query the database for it, use
UPDATE `myTable` SET `myColumn` = 'newValue'
WHERE `userId` IN (
SELECT `userId` FROM `myOtherTable` WHERE `relevantColumn` = 'value'
)
Beware that you are not allowed to use the same table as both the update target and source of ids in the subselect, so myTable != myOtherTable.
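Applied to the schema above, the whole re-evaluation could even collapse into one multi-table UPDATE. A sketch, assuming allingred.fsdsnum references allinfsds.id1 (per the @lastid note in the question, whereas the original query joined on own_id) and that @usernum holds the owner id; MySQL materializes the derived table, which sidesteps the same-table restriction just mentioned:
UPDATE allinnot2 n
JOIN (
  SELECT ing.fsdsnum,
         SUM(n2.Energ_Kcal * ing.amount * w.Gm_Wgt) / SUM(ing.amount * w.Gm_Wgt) AS cal,
         SUM(n2.Protein * ing.amount * w.Gm_Wgt) / SUM(ing.amount * w.Gm_Wgt) AS prot
  FROM allingred ing
  LEFT JOIN weight w ON ing.unit = w.uni
  LEFT JOIN allinnot2 n2 ON ing.mfdfsds = n2.NDB_No
  JOIN allinfsds f ON ing.fsdsnum = f.id1
  WHERE f.own_id = @usernum
  GROUP BY ing.fsdsnum
) calc ON n.NDB_No = calc.fsdsnum
SET n.Energ_Kcal = calc.cal,
    n.Protein = calc.prot;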
