MySQL query performs very slowly - PHP

I have developed a user bulk upload module. There are two scenarios. When I bulk upload 20,000 records into a database with zero records, it takes about 5 hours. But when the database already has about 30,000 records, the upload is very, very slow: it takes about 11 hours to upload 20,000 records. I am just reading the CSV file via the fgetcsv method.
if (($handle = fopen($filePath, "r")) !== FALSE) {
    while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
        if (count($peopleData) == $fieldsCount) {
            // Inside, I check whether the user already exists (firstName & lastName & DOB).
            // If not, I check whether the email exists; if it does, I update the record.
            // Otherwise I insert a new record.
        }
    }
}
Below are the queries that run. (I am using the Yii framework.)
SELECT *
FROM `AdvanceBulkInsert` `t`
WHERE renameSource='24851_bulk_people_2016-02-25_LE CARVALHO 1.zip.csv'
LIMIT 1
SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
SELECT *
FROM `User` `t`
WHERE `t`.`firstName`='Franck'
AND `t`.`lastName`='ALLEGAERT '
AND `t`.`dateOfBirth`='1971-07-29'
AND (userType NOT IN ("1"))
LIMIT 1
If the user exists, update the record:
UPDATE `User` SET `id`='51394', `address1`='49 GRANDE RUE',
`mobile`='', `name`=NULL, `firstName`='Franck',
`lastName`='ALLEGAERT ', `username`=NULL,
`password`=NULL, `email`=NULL, `gender`=0,
`zip`='60310', `countryCode`='DZ',
`joinedDate`='2016-02-23 10:44:18',
`signUpDate`='0000-00-00 00:00:00',
`supporterDate`='2016-02-25 13:26:37', `userType`=3,
`signup`=0, `isSysUser`=0, `dateOfBirth`='1971-07-29',
`reqruiteCount`=0, `keywords`='70,71,72,73,74,75',
`delStatus`=0, `city`='AMY', `isUnsubEmail`=0,
`isManual`=1, `isSignupConfirmed`=0, `profImage`=NULL,
`totalDonations`=NULL, `isMcContact`=NULL,
`emailStatus`=NULL, `notes`=NULL,
`addressInvalidatedAt`=NULL,
`createdAt`='2016-02-23 10:44:18',
`updatedAt`='2016-02-25 13:26:37', `longLat`=NULL
WHERE `User`.`id`='51394'
If the user doesn't exist, insert a new record.
The table engine type is MyISAM. Only the email column has an index.
How can I optimize this to reduce the processing time?
Query 2 took 0.4701 seconds, which means that for 30,000 records it will take 14,103 seconds, which is about 235 minutes, approx. 4 hours.
Update
CREATE TABLE IF NOT EXISTS `User` (
`id` bigint(20) NOT NULL,
`address1` text COLLATE utf8_unicode_ci,
`mobile` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
`name` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`firstName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`lastName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`username` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`password` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`email` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`gender` tinyint(2) NOT NULL DEFAULT '0' COMMENT '1 - female, 2-male, 0 - unknown',
`zip` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
`countryCode` varchar(3) COLLATE utf8_unicode_ci DEFAULT NULL,
`joinedDate` datetime DEFAULT NULL,
`signUpDate` datetime NOT NULL COMMENT 'User signed up date',
`supporterDate` datetime NOT NULL COMMENT 'Date which user get supporter',
`userType` tinyint(2) NOT NULL,
`signup` tinyint(2) NOT NULL DEFAULT '0' COMMENT 'whether user followed signup process 1 - signup, 0 - not signup',
`isSysUser` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 - system user, 0 - not a system user',
`dateOfBirth` date DEFAULT NULL COMMENT 'User date of birth',
`reqruiteCount` int(11) DEFAULT '0' COMMENT 'User count that he has reqruited',
`keywords` text COLLATE utf8_unicode_ci COMMENT 'Kewords',
`delStatus` tinyint(2) NOT NULL DEFAULT '0' COMMENT '0 - active, 1 - deleted',
`city` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`isUnsubEmail` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Unsubscribed form email',
`isManual` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Manualy add',
`longLat` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Longitude and Latitude',
`isSignupConfirmed` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Whether user has confirmed signup ',
`profImage` tinytext COLLATE utf8_unicode_ci COMMENT 'Profile image name or URL',
`totalDonations` float DEFAULT NULL COMMENT 'Total donations made by the user',
`isMcContact` tinyint(1) DEFAULT NULL COMMENT '1 - Mailchimp contact',
`emailStatus` tinyint(2) DEFAULT NULL COMMENT '1-bounced, 2-blocked',
`notes` text COLLATE utf8_unicode_ci,
`addressInvalidatedAt` datetime DEFAULT NULL,
`createdAt` datetime NOT NULL,
`updatedAt` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `AdvanceBulkInsert` (
`id` int(11) NOT NULL,
`source` varchar(256) NOT NULL,
`renameSource` varchar(256) DEFAULT NULL,
`countryCode` varchar(3) NOT NULL,
`userType` tinyint(2) NOT NULL,
`size` varchar(128) NOT NULL,
`errors` varchar(512) NOT NULL,
`status` char(1) NOT NULL COMMENT '1:Queued, 2:In Progress, 3:Error, 4:Finished, 5:Cancel',
`createdAt` datetime NOT NULL,
`createdBy` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `CustomField` (
`id` int(11) NOT NULL,
`customTypeId` int(11) NOT NULL,
`fieldName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`relatedTable` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`defaultValue` text COLLATE utf8_unicode_ci,
`sortOrder` int(11) NOT NULL DEFAULT '0',
`enabled` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`listItemTag` char(1) COLLATE utf8_unicode_ci DEFAULT NULL,
`required` char(1) COLLATE utf8_unicode_ci DEFAULT '0',
`onCreate` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`onEdit` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`onView` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`listValues` text COLLATE utf8_unicode_ci,
`label` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`htmlOptions` text COLLATE utf8_unicode_ci
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `CustomFieldSubArea` (
`id` int(11) NOT NULL,
`customFieldId` int(11) NOT NULL,
`subarea` varchar(256) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=43 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `CustomValue` (
`id` int(11) NOT NULL,
`customFieldId` int(11) NOT NULL,
`relatedId` int(11) NOT NULL,
`fieldValue` text COLLATE utf8_unicode_ci,
`createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM AUTO_INCREMENT=86866 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The entire PHP code is here: http://pastie.org/10737962
Update 2
EXPLAIN output of the query:

Indexes are your friend.
UPDATE User ... WHERE id = ... -- Desperately needs an index on ID, probably PRIMARY KEY.
Similarly for renameSource.
SELECT *
FROM `User` `t`
WHERE `t`.`firstName`='Franck'
AND `t`.`lastName`='ALLEGAERT '
AND `t`.`dateOfBirth`='1971-07-29'
AND (userType NOT IN ("1"))
LIMIT 1;
Needs INDEX(firstName, lastName, dateOfBirth); the fields can be in any order (in this case).
Look at each query to see what it needs, then add that INDEX to the table. Read my Cookbook on building indexes.
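For example, a sketch of the corresponding DDL (the index names are illustrative; the composite index matches the three-column lookup above):
ALTER TABLE `User`
  ADD PRIMARY KEY (`id`),
  ADD INDEX `idx_user_name_dob` (`firstName`, `lastName`, `dateOfBirth`);
ALTER TABLE `AdvanceBulkInsert`
  ADD INDEX `idx_abi_renamesource` (`renameSource`);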

Try these things to increase your query performance:
Define indexes in your database structure, and fetch only the columns that you want.
Do not use * in SELECT queries.
And do not put IDs in quotes, like User.id='51394'; write User.id=51394 instead. If you quote IDs, the index may not be used, and in my experience removing the quotes can make queries around 20% faster.
If you are using ENGINE=MyISAM, you are not able to define foreign key constraints between your database tables; change the engine to ENGINE=InnoDB, and then create the appropriate indexes: foreign keys, and full-text indexes where needed.
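A sketch of the engine conversion (this rebuilds the table, so on large tables run it during a maintenance window):
ALTER TABLE `User` ENGINE=InnoDB;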

If I understand correctly: for every result of SELECT * FROM AdvanceBulkInsert ... you run a SELECT cf.* request, and for every SELECT cf.* you run a SELECT * FROM User.
I think the issue is that you send far too many requests to the database.
I think you should merge all your SELECT requests into one big request.
For that:
replace the SELECT * FROM AdvanceBulkInsert with an EXISTS (SELECT * FROM AdvanceBulkInsert WHERE ...) or a JOIN
replace the SELECT * FROM User with a NOT EXISTS (SELECT * FROM User WHERE ...)
Then you call the update on all the results of the merged select.
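As a sketch, the per-record existence check could then stop fetching whole rows and just probe with EXISTS (values taken from the sample query in the question):
SELECT EXISTS(
    SELECT 1
    FROM `User`
    WHERE `firstName` = 'Franck'
      AND `lastName` = 'ALLEGAERT '
      AND `dateOfBirth` = '1971-07-29'
      AND `userType` NOT IN ('1')
) AS userExists;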
You should also time your requests one by one to find which of them takes the most time, and you should also use EXPLAIN to find which part of a request takes the time.
Edit:
Now that I have seen your code, some leads:
do you have indexes for cf.customTypeId, cfv.customFieldId, cfsa.customFieldId, user.dateOfBirth, user.firstName and user.lastName?
you don't need a LEFT JOIN CustomFieldSubArea if you have a WHERE clause that uses CustomFieldSubArea; a simple JOIN CustomFieldSubArea is enough.
you will launch query 2 many times with relatedId = 0; maybe you can save the result in a variable?
if you don't need sorted data, remove the "ORDER BY cf.sortOrder, cf.label". Otherwise, add an index on cf.sortOrder, cf.label.

When you need to find out why a query takes long, you need to inspect its individual parts. As you have shown in the question, the EXPLAIN statement can help you very much. Usually the most important columns are:
select_type - this should always be a simple query/subquery; correlated subqueries cause a lot of trouble (luckily you don't use any)
possible_keys - which keys this select can search by
rows - how many candidate rows are determined by the keys/cache and other techniques; a smaller number is better
Extra - the "Using ..." notes tell you how exactly the rows are found; this is the most useful information
Query analysis
I would have posted the analysis for the 1st and 3rd queries, but they are both quite simple. Here is the breakdown of the query that gives you trouble:
EXPLAIN SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
Solution
Let me explain the above query. The columns used in the JOIN conditions absolutely must have an index. Joining tables is an expensive operation that otherwise needs to go through all rows of both tables; if you create an index on the joined columns, the DB engine can find the matching rows much faster. This should be common practice for any database.
The columns used only for filtering and ordering are not mandatory to index, but if you have a large number of rows (20,000 is a large amount) you should also have indexes on the columns you use for searching; it might not have as much impact on the processing speed, but it is worth the extra bit of time.
So you need to add indexes to these columns (a SQL sketch follows the list):
CustomType - id
CustomField - customTypeId, id, relatedTable, enabled, onCreate, sortOrder, label
CustomValue - customFieldId
CustomFieldSubArea - customFieldId, subarea
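A sketch of the corresponding statements (index names are illustrative; the prefix on subarea keeps the key short enough if you later move to InnoDB):
ALTER TABLE `CustomType` ADD PRIMARY KEY (`id`);
ALTER TABLE `CustomField`
  ADD PRIMARY KEY (`id`),
  ADD INDEX `idx_cf_type` (`customTypeId`),
  ADD INDEX `idx_cf_filter` (`relatedTable`, `enabled`, `onCreate`),
  ADD INDEX `idx_cf_sort` (`sortOrder`, `label`);
ALTER TABLE `CustomValue` ADD INDEX `idx_cv_field` (`customFieldId`);
ALTER TABLE `CustomFieldSubArea` ADD INDEX `idx_cfsa_field` (`customFieldId`, `subarea`(64));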
To verify the results, try running the EXPLAIN statement again after adding the indexes (and possibly a few other select/insert/update queries). The Extra column should say something like "Using index", and the possible_keys column should list the used keys (even two or more per join query).
Side note: you have some typos in your code; you should fix them in case someone else needs to work on your code too: "reqruiteCount" as a table column and "fileUplaod" as an array index in your referenced code.

For my work, I have to load a CSV with 524 columns and 10k records daily. When I tried to parse it and insert the records with PHP, it was horrible.
So I suggest you look at the documentation for LOAD DATA LOCAL INFILE.
I copy/paste my own code as an example, but adapt it to your needs:
$dataload = 'LOAD DATA LOCAL INFILE "'.$filename.'"
REPLACE
INTO TABLE '.$this->csvTable.' CHARACTER SET "utf8"
FIELDS TERMINATED BY "\t"
IGNORE 1 LINES
';
$result = (bool)$this->db->query($dataload);
Where $filename is a local path to your CSV (you can use dirname(__FILE__) to build it).
This SQL command is very quick (just 1 or 2 seconds to add/update the whole CSV).
EDIT: read the docs; of course you need a unique index on your user table for REPLACE to work. Then you don't need to check whether the user exists or not, and you don't need to parse the CSV file with PHP.
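For example, assuming first name + last name + date of birth is what identifies a person in your data (adjust the columns to whatever really is unique):
ALTER TABLE `User`
  ADD UNIQUE KEY `uq_user_identity` (`firstName`, `lastName`, `dateOfBirth`);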

You appear to have the possibility (probability?) of 3 queries for every single record. Those 3 queries require 3 round trips to the database (and if you are using Yii and storing the records in Yii objects, that might slow things down even more).
Can you add a unique key on first name / last name / DOB, and one on email address?
If so, then you can just use INSERT ... ON DUPLICATE KEY UPDATE. This reduces it to a single query for each record, greatly speeding things up.
But the big advantage of this syntax is that you can insert/update many records at once (I normally stick to about 250), which means even fewer trips to the database.
You can knock up a class that you just pass records to, and which does the insert when the number of records hits your chosen batch size. Also add a call in the destructor to insert any final records; a minimal sketch follows.
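Here is a minimal sketch of such a class, assuming PDO, an existing unique key for ON DUPLICATE KEY UPDATE to fire on, and a fixed column list (all names here are illustrative, not your actual code):
class BatchUpserter
{
    private $pdo;
    private $batchSize;
    private $rows = array();

    public function __construct(PDO $pdo, $batchSize = 250)
    {
        $this->pdo = $pdo;
        $this->batchSize = $batchSize;
    }

    // Queue one record; flush automatically when the batch is full.
    public function add(array $row) // array(firstName, lastName, dateOfBirth, email)
    {
        $this->rows[] = $row;
        if (count($this->rows) >= $this->batchSize) {
            $this->flush();
        }
    }

    // Send all queued records in one multi-row INSERT ... ON DUPLICATE KEY UPDATE.
    public function flush()
    {
        if (empty($this->rows)) {
            return;
        }
        $placeholders = rtrim(str_repeat('(?,?,?,?),', count($this->rows)), ',');
        $sql = "INSERT INTO `User` (firstName, lastName, dateOfBirth, email)
                VALUES $placeholders
                ON DUPLICATE KEY UPDATE email = VALUES(email)";
        $params = array();
        foreach ($this->rows as $row) {
            $params = array_merge($params, array_values($row));
        }
        $this->pdo->prepare($sql)->execute($params);
        $this->rows = array();
    }

    // Destructor flushes any final records, as suggested above.
    public function __destruct()
    {
        $this->flush();
    }
}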
Another option is to read everything into a temp table and then use it as a source to join to your user table for the updates/inserts. This requires a bit of effort with the indexes, but a bulk load into a temp table is quick, and updates from it with useful indexes will be fast. Using it as a source for the inserts should also be fast (if you exclude the records already updated).
The other issue appears to be the following query, though I'm not sure where you execute it. It appears to only need to be executed once, in which case it might not matter too much. You haven't given the structure of the CustomType table, but it is joined to CustomField and the field customTypeId has no index, so that join will be slow. The same applies to the CustomValue and CustomFieldSubArea joins, which join on customFieldId; neither table has an index on this field (hopefully a unique index, because if those fields are not unique you will get a LOT of records returned: one row for every possible combination).
SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label

See if you can reduce the query, and check the execution time with an online SQL compiler before including it in the project.

Always do bulk importing within a transaction:
$transaction = Yii::app()->db->beginTransaction();
$curRow = 0;
try
{
    while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
        $curRow++;
        //process $peopleData
        //insert row - best to use INSERT ... ON DUPLICATE KEY UPDATE a = 1, b = 2;
        if ($curRow % 5000 == 0) {
            $transaction->commit();
            $transaction = Yii::app()->db->beginTransaction();
        }
    }
    //don't forget the remainder.
    $transaction->commit();
}
catch (Exception $ex)
{
    $transaction->rollBack();
    $result = $ex->getMessage();
}
I have seen import routines sped up 500% by simply using this technique. I have also seen an import process that did 600 queries (mixture of select, insert, update and show table structure) for each row. This technique sped up the process 30%.

Related

Speed up MySQL inner join with LIKE clause

I have the following 2 tables: api_analytics_data and telecordia.
CREATE TABLE `api_analytics_data` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`upload_file_id` bigint(20) NOT NULL,
`partNumber` varchar(100) DEFAULT NULL,
`clei` varchar(45) DEFAULT NULL,
`description` varchar(150) DEFAULT NULL,
`processed` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_aad_clei` (`clei`),
KEY `idx_aad_pn` (`partNumber`),
KEY `id_aad_processed` (`processed`),
KEY `idx_combo1` (`partNumber`,`clei`,`upload_file_id`)
) ENGINE=InnoDB CHARSET=latin1;
CREATE TABLE `telecordia` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`ProdID` varchar(50) DEFAULT NULL,
`Mfg` varchar(20) DEFAULT NULL,
`Pn` varchar(50) DEFAULT NULL,
`Clei` varchar(50) DEFAULT NULL,
`Series` varchar(50) DEFAULT NULL,
`Dsc` varchar(50) DEFAULT NULL,
`Eci` varchar(50) DEFAULT NULL,
`AddDate` date DEFAULT NULL,
`ChangeDate` date DEFAULT NULL,
`Cost` float DEFAULT NULL,
PRIMARY KEY (`tid`),
KEY `telecordia.ProdID` (`ProdID`) USING BTREE,
KEY `telecordia.clei` (`Clei`),
KEY `telecordia.pn` (`Pn`),
KEY `telcordia.eci` (`Eci`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Users upload data via a web interface using Excel/CSV files into api_analytics_data. The data contains EITHER the partNumbers or the CLEIs. I then update the api_analytics_data table by joining the telecordia table. The telecordia table is the master list of partNumbers and CLEIs.
So if a user uploads a file of CLEIs, the update/join I use is:
update api_analytics_data aad
inner join telecordia t on aad.clei = t.Clei
set aad.partNumber = t.Pn
where aad.partNumber is null
and aad.upload_file_id = 5;
It works quickly, but not very thoroughly. The problem I have is that the CLEI uploaded may only be a substring of the CLEI in the telecordia table.
For example, the uploaded CLEI may be "5SC1DX0". In the telecordia table, the correct matching row is:
tid: 184324
ProdID: 472467
Mfg: PLSE
Pn: AUA58-2-REV-E
Clei: 5SC1DX04AA
Series: null
Dsc: DL SGL-PTY POTS CU RT
Eci: 205756
AddDate: 1994-03-18
ChangeDate: 1998-04-13
Cost: null
So obviously my update doesn't work in this case, even though 5SC1DX0 and 5SC1DX04AA are the same part.
What I need is a wildcard search. However, when I try this, it is crazy slow. With about 4500 rows uploaded into the api_analytics_data table, it runs for about 10 minutes, and then loses the connection with the server.
update api_analytics_data aad
inner join telecordia t on aad.clei like concat(t.Clei,'%')
set aad.partNumber = t.Pn
where aad.partNumber is null
and aad.upload_file_id = 5;
Is there a way to optimize this so that it runs quickly?
The correct answer is "no". The better course of action is to create a new column in telecordia with the correct Clei value in it, one that can be used for joining the tables. In the most recent versions of MySQL, this can even be a generated (computed) column and be indexed.
That said, you might be able to do something if the matching portion is always the same length. If so, try this:
update api_analytics_data aad inner join
telecordia t
on t.Clei = left(aad.clei, 7)
set aad.partNumber = t.Pn
where aad.partNumber is null and aad.upload_file_id = 5;
For this query, you want an index on api_analytics_data(upload_file_id, partNumber, clei) and telecordia(Clei, Pn).
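And a sketch of the generated-column idea mentioned above (requires MySQL 5.7+; the 7-character prefix is taken from the example and is an assumption):
ALTER TABLE telecordia
  ADD COLUMN CleiPrefix varchar(7) GENERATED ALWAYS AS (left(Clei, 7)) STORED,
  ADD INDEX idx_telecordia_clei_prefix (CleiPrefix);

update api_analytics_data aad inner join
       telecordia t
       on t.CleiPrefix = aad.clei
set aad.partNumber = t.Pn
where aad.partNumber is null and aad.upload_file_id = 5;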

PHP SQL query slow?

Here is the function I am using to unfollow users. It first DELETEs the relationship between the users and all the notifications that are related to this relationship. Then it INSERTs a new notification for the user we are going to unfollow, and then UPDATEs his follower count (as one follower has left). I am using multi_query, and this query seems to be a bit slow on a large database. I want to know whether this is good practice or not, or whether there is a more compact form of query to get the job done.
PHP Function
// '$by' is the array that holds the logged-in user and '$followed' is the user id we are going to unfollow
function unFollowUser($followed, $by) {
    $following = $this->getUserByID($followed); // returns fetch_assoc of the user row
    if (!empty($following['idu'])) { // if the user exists
        // current follower count (number of rows) minus the one who just left
        $followers = $this->db->real_escape_string($this->numberFollowers($following['idu'])) - 1;
        $followed_esc = $this->db->real_escape_string($following['idu']);
        $by_user_esc = $this->db->real_escape_string($by['idu']);
        // delete the relationship
        $query = "DELETE FROM `relationships` WHERE `relationships`.`user2` = '$followed_esc' AND `relationships`.`user1` = '$by_user_esc' ;";
        // delete the notification (user started following you)
        $query .= "DELETE FROM `notifications` WHERE `notifications`.`not_from` = '$by_user_esc' AND `notifications`.`not_to` = '$followed_esc' ;";
        // insert a new notification (user has unfollowed you)
        $query .= "INSERT INTO `notifications`(`id`, `not_from`, `not_to`, `not_content_id`,`not_content`,`not_type`,`not_read`, `not_time`) VALUES (NULL, '$by_user_esc', '$followed_esc', '0','0','5','0', CURRENT_TIMESTAMP) ;";
        // update the user's follower count (-1)
        $query .= "UPDATE `users` SET `followers` = '$followers' WHERE `users`.`idu` = '$followed_esc' ;";
        if ($this->db->multi_query($query) === TRUE) {
            return 1;
        } else {
            return 0;
        }
    } else {
        return 0;
    }
}
Table structures
--
-- Table structure for table `notifications`
--
CREATE TABLE IF NOT EXISTS `notifications` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`not_from` int(11) NOT NULL,
`not_to` int(11) NOT NULL,
`not_content_id` int(11) NOT NULL,
`not_content` int(11) NOT NULL,
`not_type` int(11) NOT NULL,
`not_read` int(11) NOT NULL,
`not_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
--
-- Table structure for table `relationships`
--
CREATE TABLE IF NOT EXISTS `relationships` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user1` int(11) NOT NULL,
`user2` int(11) NOT NULL,
`status` int(11) NOT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
--
-- Table structure for table `users`
--
CREATE TABLE IF NOT EXISTS `users` (
`idu` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(32) NOT NULL,
`password` varchar(256) NOT NULL,
`email` varchar(256) NOT NULL,
`first_name` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`last_name` varchar(32) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`verified` int(11) NOT NULL,
`posts` text CHARACTER SET utf32 NOT NULL,
`photos` text CHARACTER SET utf32 NOT NULL,
`followers` text CHARACTER SET utf32 NOT NULL,
UNIQUE KEY `id` (`idu`),
UNIQUE KEY `idu` (`idu`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
In my testing, multi_query has been the fastest way to execute multiple different queries. Why do you feel it's running slow? Compared to what?
Anyway, improvements could come from adding indexes to some of the columns you search frequently (a SQL sketch follows this list):
relationships.user2
relationships.user1
notifications.not_from
notifications.not_to
users.idu
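A sketch of those indexes (users.idu already has a unique key, so only the missing ones are shown; the composites match the WHERE clauses in the function above):
ALTER TABLE `relationships` ADD INDEX `idx_rel_pair` (`user2`, `user1`);
ALTER TABLE `notifications` ADD INDEX `idx_not_pair` (`not_from`, `not_to`);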
Adding indexes makes searching faster, but it has at least two downsides:
Makes the DB a lot more resource hungry, which could affect your server performance
Makes writing operations take longer
I don't see any problem with your current queries. Really consider whether the slow performance you're seeing comes from the DB queries themselves, or from the rest of your PHP process. Try measuring the script time with the queries, then skipping the queries and taking another measurement (you could hardcode query results). It will give you an idea of whether the slowness is attributable to something else.
Either way, benchmark.
Try creating an index on the user columns where the deletes are running; this may speed up the queries.

Optimization Needed For Dual Left Join Query

I've always struggled with MySQL joins. I have started incorporating more of them, but I'm still struggling to understand them, despite reading dozens of tutorials and the MySQL manual.
My situation is I have 3 tables:
/* BASICALLY A TABLE THAT HOLDS FAN RECORDS */
CREATE TABLE `fans` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) DEFAULT NULL,
`middle_name` varchar(255) DEFAULT NULL,
`last_name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
`join_date` datetime DEFAULT NULL,
`twitter` varchar(255) DEFAULT NULL,
`twitterCrawled` datetime DEFAULT NULL,
`twitterImage` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=20413 DEFAULT CHARSET=latin1;
/* A TABLE OF OUR TWITTER FOLLOWERS */
CREATE TABLE `twitterFollowers` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`screenName` varchar(25) DEFAULT NULL,
`twitterId` varchar(25) DEFAULT NULL,
`customerId` int(11) DEFAULT NULL,
`uniqueStr` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique` (`uniqueStr`)
) ENGINE=InnoDB AUTO_INCREMENT=13426 DEFAULT CHARSET=utf8;
/* TABLE THAT SUGGESTS A LIKELY MATCH OF A TWITTER FOLLOWER BASED ON THE EMAIL / SCREEN NAME COMPARISON OF THE FAN vs OUR FOLLOWERS
IF SOMEONE (ie. a moderator) CONFIRMS OR DENIES THAT IT'S A GOOD MATCH THEY PUT A DATESTAMP IN `dismissed` */
CREATE TABLE `contentSuggestion` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userId` int(11) DEFAULT NULL,
`fanId` int(11) DEFAULT NULL,
`twitterAccountId` int(11) DEFAULT NULL,
`contentType` varchar(50) DEFAULT NULL,
`contentString` varchar(255) DEFAULT NULL,
`added` datetime DEFAULT NULL,
`dismissed` datetime DEFAULT NULL,
`uniqueStr` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unstr` (`uniqueStr`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
What I'm trying to get is:
SELECT [fan columns]
WHERE fan screen name IS IN twitterfollowers
AND WHERE fan screen name IS NOT IN contentSuggestion (with a datestamp in dismissed)
My attempts so far:
~33 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans
LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername
LEFT JOIN contentSuggestion cs ON cs.contentString = tf.screenName WHERE dismissed IS NULL
GROUP BY(fans.id) HAVING col1 != ''
~14 seconds
SELECT id, emailUsername FROM fans WHERE emailUsername IN(SELECT DISTINCT(screenName) FROM twitterFollowers) AND emailUsername NOT IN(SELECT DISTINCT(contentString) FROM contentSuggestion WHERE dismissed IS NULL) GROUP BY (fans.id);
9.53 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans
LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername WHERE tf.uniqueStr NOT IN(SELECT uniqueStr FROM contentSuggestion WHERE dismissed IS NULL)
I hope there is a better way. I've been struggling to really use JOINs beyond a single LEFT JOIN, which has already helped me speed up other queries by a significant amount.
Thanks for any help you can give me.
I would go with a variation of the second method. Instead of IN, use EXISTS. Then add the correct indexes and remove the aggregation:
SELECT f.id, f.emailUsername
FROM fans f
WHERE EXISTS (SELECT 1
FROM twitterFollowers tf
WHERE f.emailUsername = tf.screenName
) AND
NOT EXISTS (SELECT 1
FROM contentSuggestion cs
WHERE f.emailUsername = cs.contentString AND
cs.dismissed IS NULL
) ;
Then be sure you have the following indexes: twitterFollowers(screenName) and contentSuggestion(contentString, dismissed).
Some notes:
When using IN, don't use SELECT DISTINCT. I'm not 100% sure that MySQL is always smart enough to ignore the DISTINCT in the subquery (it is redundant).
Historically, EXISTS was faster than IN in MySQL. The optimizer has improved in recent versions.
For performance, you need the correct indexes.
Assuming that fan.id is unique (a very reasonable assumption), you don't need the final group by.
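A sketch of those two indexes (if your MySQL version complains about the key length on the utf8 varchar(255), use a prefix such as contentString(191)):
CREATE INDEX idx_tf_screenname ON twitterFollowers (screenName);
CREATE INDEX idx_cs_content ON contentSuggestion (contentString, dismissed);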

How to determine the ON clause for a dynamic query using PHP?

I am trying to write a script that will allow the user to select a list of fields to be displayed from different columns/tables in a database. This script will need to be able to generate the full query and execute it.
I am able to select the fields and add the proper WHERE clause. However, I am challenged on how to generate the ON clause that is part of the JOIN statement.
Here is what I have done so far.
First, I defined 3 tables like so
-- list of all tables available in the database
CREATE TABLE `entity_objects` (
`object_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`object_name` varchar(60) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL,
`object_description` varchar(255) DEFAULT NULL,
PRIMARY KEY (`object_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
-- list of all fields available in the database
CREATE TABLE `entity_definitions` (
`entity_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`display_name` varchar(255) NOT NULL,
`entity_key` varchar(60) NOT NULL,
`entity_type` enum('lookup','Integer','text','datetime','date') CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL,
`object_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`entity_id`),
KEY `object_id` (`object_id`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8;
-- a list of the fields that are related to each other. For example entity 12 is a foreign key to entity 11.
CREATE TABLE `entity_relations` (
`relation_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`entity_a` int(11) unsigned NOT NULL,
`entity_b` int(11) unsigned NOT NULL,
`relation_type` enum('1:1','1:N') NOT NULL DEFAULT '1:1',
PRIMARY KEY (`relation_id`),
UNIQUE KEY `entity_a` (`entity_a`,`entity_b`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
To get a list of the relations that are available, I run this query
SELECT
CONCAT(oa.object_name, '.', ta.entity_key) AS entityA
, CONCAT(ob.object_name, '.', tb.entity_key) AS entityB
FROM entity_relations as r
INNER JOIN entity_definitions AS ta ON ta.entity_id = r.entity_a
INNER JOIN entity_definitions AS tb ON tb.entity_id = r.entity_b
INNER JOIN entity_objects AS oa ON oa.object_id = ta.object_id
INNER JOIN entity_objects AS ob ON ob.object_id = tb.object_id
I am having a hard time figuring out how to generate the JOIN part of the query. I am able to generate the SELECT ... and the WHERE ..., but need help generating the ON ... part of the query.
My final query should look something like this
SELECT
accounts.account_name
, accounts.industry_id
, accounts.primary_number_id
, accounts.person_id
, industries.industry_id
, industries.name
, contact_personal.first_name
, contact_personal.person_id
, account_phone_number.number_id
FROM accounts
LEFT JOIN industries ON industries.industry_id = accounts.industry_id
LEFT JOIN contact_personal ON contact_personal.person_id = accounts.person_id
LEFT JOIN account_phone_number ON account_phone_number.number_id = accounts.primary_number_id
WHERE industries.name = 'Marketing'
I created a SQL Fiddle with my MySQL code.
How can I define the ON clause of the join statement correctly?
It is completely unnecessary to create these tables; MySQL can handle all of this for you, as long as you are using the InnoDB storage engine, by using foreign keys.
list all tables on current database
SHOW TABLES;
get list of columns on a given table
SELECT
*
FROM
information_schema.COLUMNS
WHERE
TABLE_SCHEMA = :schema
AND TABLE_NAME = :table;
get list of relationships between tables
SELECT
*
FROM
information_schema.TABLE_CONSTRAINTS tc
INNER JOIN
information_schema.INNODB_SYS_FOREIGN isf ON
isf.ID = concat(tc.CONSTRAINT_SCHEMA, '/', tc.CONSTRAINT_NAME)
INNER JOIN
information_schema.INNODB_SYS_FOREIGN_COLS isfc ON
isfc.ID = isf.ID
WHERE
tc.CONSTRAINT_SCHEMA = :schema
AND tc.TABLE_NAME = :table;
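Alternatively, the column pairs needed to assemble an ON clause can be read from information_schema.KEY_COLUMN_USAGE, which avoids the InnoDB-specific tables (a sketch):
SELECT
    CONCAT(kcu.TABLE_NAME, '.', kcu.COLUMN_NAME,
           ' = ',
           kcu.REFERENCED_TABLE_NAME, '.', kcu.REFERENCED_COLUMN_NAME) AS onClause
FROM
    information_schema.KEY_COLUMN_USAGE kcu
WHERE
    kcu.TABLE_SCHEMA = :schema
    AND kcu.TABLE_NAME = :table
    AND kcu.REFERENCED_TABLE_NAME IS NOT NULL;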

Optimizing queries on larger MySQL database

I'm coding a website which will store some offers (ex. job offers). In the end, it could contain more than 1M offers. Now I have problems with some inefficient SQL queries.
Scenario:
Each offer can be assigned into category (ex. IT jobs)
Each category has custom fields (ex. IT jobs can have a custom field of type "price", which represents a text box accepting a number; in our example, let's say a price input for the expected salary)
Each offer stores meta data with values of these category custom fields
DB fields which will be used for filtering have indexes
Table category (I'm using nested sets to store categories hierarchy):
CREATE TABLE `category` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`lft` int(11) DEFAULT NULL,
`rgt` int(11) DEFAULT NULL,
`depth` int(11) DEFAULT NULL,
`order` int(11) NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `category_parent_id_index` (`parent_id`),
KEY `category_lft_index` (`lft`),
KEY `category_rgt_index` (`rgt`)
) ENGINE=InnoDB AUTO_INCREMENT=44 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table category_field:
CREATE TABLE `category_field` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`category_id` int(10) unsigned NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`optional` tinyint(1) NOT NULL DEFAULT '0',
`type` enum('price','number','date','color') COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `category_field_category_id_index` (`category_id`),
CONSTRAINT `category_field_category_id_foreign` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table offer:
CREATE TABLE `offer` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`text` text COLLATE utf8_unicode_ci NOT NULL,
`category_id` int(10) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `offer_category_id_index` (`category_id`),
CONSTRAINT `offer_category_id_foreign` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table offer_meta:
CREATE TABLE `offer_meta` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`offer_id` int(10) unsigned NOT NULL,
`category_field_id` int(10) unsigned NOT NULL,
`price` double NOT NULL,
`number` int(11) NOT NULL,
`date` date NOT NULL,
`color` varchar(7) COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `offer_meta_offer_id_index` (`offer_id`),
KEY `offer_meta_category_field_id_index` (`category_field_id`),
KEY `offer_meta_price_index` (`price`),
KEY `offer_meta_number_index` (`number`),
KEY `offer_meta_date_index` (`date`),
KEY `offer_meta_color_index` (`color`),
CONSTRAINT `offer_meta_category_field_id_foreign` FOREIGN KEY (`category_field_id`) REFERENCES `category_field` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `offer_meta_offer_id_foreign` FOREIGN KEY (`offer_id`) REFERENCES `offer` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=107769 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
When I set up some filters on my page (for example, for our salary custom field), I have to start with a query which returns the MIN and MAX prices in the available offer_meta records (I want to show a range slider to the user on the front end, so I need the MIN/MAX values for its range):
select MIN(`price`) AS min, MAX(`price`) AS max from `offer_meta` where `category_field_id` = ? limit 1
I found out that these queries are the most inefficient of all the queries I'm making (the above query takes over 500 ms when the offer_meta table has a few thousand records).
Other inefficient queries (offer_meta has 107k records):
Obtaining MIN and MAX values for slider to filter numbers
select MIN(`number`) AS min, MAX(`number`) AS max from `offer_meta` where `category_field_id` = ? limit 1
Obtaining MIN and MAX prices for slider to filter by prices
select MIN(`price`) AS min, MAX(`price`) AS max from `offer_meta` where `category_field_id` = ? limit 1
Obtaining MIN and MAX date for date range restrictions
select MIN(`date`) AS min, MAX(`date`) AS max from `offer_meta` where `category_field_id` = ? limit 1
Obtaining colors with counts to show list of colors with numbers
select `color`, count(*) as `count` from `offer_meta` where `category_field_id` = ? group by `color`
Example of a full query to get the offer count with multiple filter criteria (0.5 sec):
select count(*) as count from `offer` where id in (select
distinct offer_id
from offer_meta om
where offer_id in (select
distinct offer_id
from offer_meta om
where offer_id in (select
distinct offer_id
from offer_meta om
where offer_id in (select
distinct om.offer_id
from offer_meta om
join category_field cf on om.category_field_id = cf.id
where
cf.category_id in (2,3,4,41,43,5,6,7,8,37) and
om.category_field_id = 1 and
om.number >= 1 and
om.number <= 50) and
om.category_field_id = 2 and
om.price >= 2 and
om.price <= 4545) and
om.category_field_id = 3 and
om.date >= '0000-00-00' and
om.date <= '2015-04-09') and
category_field_id = 4 and
om.color in ('#0000ff'))
The same query without the aggregation function (COUNT) is a few times faster (just getting the IDs).
Question:
Is it possible to tweak those queries, or do you have any suggestion on how to implement my logic (offers with categories, and custom fields dynamically added to each category in the admin) with a different table schema? I tried a few more schemas, but without success.
Question 2:
Do you think this is a problem with my MySQL server, and that it will be okay if I buy a VPS?
To help you understand even better:
I was strongly inspired by WordPress schema for custom fields, so the logic is similar.
Last notes:
Also, I'm working with the Laravel framework and using the Eloquent ORM.
Sorry for my English, I hope I made my problem clear :-)
Thank you in advance,
Patrik
It is not a MySQL problem. Your scenario involves a huge data collection, and relational databases are naturally inefficient for some queries (I faced a similar situation with Oracle).
The practice for winning in this kind of situation is to use a graph database, though that seems hard to adopt in the situation you are facing at the moment.
I have heard that Lucene has some kind of support for indexing large databases for selection purposes, but I don't know exactly how it is done.
http://en.wikipedia.org/wiki/Lucene
