(Spoiler: The Title has nothing to do with what is wrong with the code.)
I'm creating a live-search system to show the user the event types already listed on my website. I suspect I have an error in the wildcard binding that I'm unable to see.
I tried different forms of the "WHERE LIKE" clause, and most of them didn't work at all. For example, I tried a positional placeholder (question mark), and that did not work either. If I run this query manually on my database, I get the results I'm expecting.
This is how my code looks; the variable $q is obtained from $_GET.
$query = $pdo->prepare('SELECT DISTINCT EventCategory FROM Events
WHERE EventCategory LIKE CONCAT(\'%\',:q,\'%\')');
$query->bindParam(":q", $q);
$query->execute();
$row = $query->fetch(PDO::FETCH_ASSOC);
while ($row = $query->fetchObject()) {
echo "<div> $row->EventCategory </div>";
}
The expected results would be: if $q is equal to n, Meeting and Nightlife are returned. When $q is equal to ni, only Nightlife is returned.
The search is NOT case-sensitive; N and n are treated equally.
The SHOW CREATE TABLE Events query returned the following:
CREATE TABLE `Events` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Name` varchar(100) NOT NULL,
`Image` varchar(600) NOT NULL,
`Date` date NOT NULL,
`Description` varchar(1200) NOT NULL,
`SpacesAvailable` int(11) NOT NULL,
`EventCategory` varchar(50) NOT NULL,
`Trending` varchar(30) DEFAULT NULL,
`TrendingID` int(255) NOT NULL,
`Sale` int(255) NOT NULL,
PRIMARY KEY (`ID`)
)DEFAULT CHARSET=latin1
Images to show the operation of the website: https://imgur.com/a/yP0hTm3
If you are viewing the images, please view them from bottom to top. Thanks.
I suspect the default collation in your EventCategory column is case-sensitive. That's why Ni and ni don't match in Nightlife.
Try this query instead.
'SELECT DISTINCT EventCategory FROM Events WHERE EventCategory COLLATE utf8_general_ci LIKE CONCAT(\'%\',:q,\'%\')'
Or, if your column's character set is not Unicode but rather ISO 8859-1 (latin1), try this:
'SELECT DISTINCT EventCategory FROM Events WHERE EventCategory COLLATE latin1_general_ci LIKE CONCAT(\'%\',:q,\'%\')'
This explains how to look up the available character sets and collations on MySQL.
How to change collation of database, table, column? explains how to alter the default collation of a table or a column. It's generally a good idea because collations are baked into indexes.
The problem is not in LIKE, but in PHP and PDO. Stare at the 3 conflicting uses of $row in your code:
$row = $query->fetch(PDO::FETCH_ASSOC);
while ($row = $query->fetchObject()) {
echo "<div> $row->EventCategory </div>"; }
Then review the documentation and examples. (Sorry, I'm not going to feed you the answer; you need to study to understand it.)
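The effect of the extra fetch is easy to reproduce in isolation. The sketch below uses Python's sqlite3 module as a stand-in for PHP/PDO and MySQL (an assumption made only so the example runs on its own). cursor.fetchone() plays the role of the stray $query->fetch() call, the wildcard is bound as part of the parameter value instead of via CONCAT, and the table rows are invented for illustration.

```python
import sqlite3

def make_db():
    """Build a tiny in-memory Events table (sample rows are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Events (ID INTEGER PRIMARY KEY, EventCategory TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Events (EventCategory) VALUES (?)",
        [("Meeting",), ("Nightlife",), ("Nightlife",), ("Sports",)],
    )
    return conn

QUERY = (
    "SELECT DISTINCT EventCategory FROM Events "
    "WHERE EventCategory LIKE ? ORDER BY EventCategory"
)

def search_buggy(conn, q):
    cur = conn.execute(QUERY, (f"%{q}%",))
    cur.fetchone()  # mirrors the stray fetch(): consumes and discards the first row
    return [row[0] for row in cur]

def search_fixed(conn, q):
    cur = conn.execute(QUERY, (f"%{q}%",))
    return [row[0] for row in cur]  # the loop is the only consumer of the result set

conn = make_db()
buggy_n = search_buggy(conn, "n")
fixed_n = search_fixed(conn, "n")
buggy_ni = search_buggy(conn, "ni")
fixed_ni = search_fixed(conn, "ni")
```

With a single matching row the buggy version returns nothing at all, which looks exactly like a LIKE that "does not work". In the PHP code the corresponding fix is to delete the $row = $query->fetch(PDO::FETCH_ASSOC); line so the while loop sees every row.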
To complement the comprehensive answer by O.Jones, another, simpler solution would be to just perform a case-insensitive search, like:
'SELECT DISTINCT EventCategory
FROM Events
WHERE UPPER(EventCategory) LIKE CONCAT(\'%\',UPPER(:q),\'%\')'
I have the following 2 tables, api_analytics_data, and telecordia.
CREATE TABLE `api_analytics_data` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`upload_file_id` bigint(20) NOT NULL,
`partNumber` varchar(100) DEFAULT NULL,
`clei` varchar(45) DEFAULT NULL,
`description` varchar(150) DEFAULT NULL,
`processed` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_aad_clei` (`clei`),
KEY `idx_aad_pn` (`partNumber`),
KEY `id_aad_processed` (`processed`),
KEY `idx_combo1` (`partNumber`,`clei`,`upload_file_id`)
) ENGINE=InnoDB CHARSET=latin1;
CREATE TABLE `telecordia` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`ProdID` varchar(50) DEFAULT NULL,
`Mfg` varchar(20) DEFAULT NULL,
`Pn` varchar(50) DEFAULT NULL,
`Clei` varchar(50) DEFAULT NULL,
`Series` varchar(50) DEFAULT NULL,
`Dsc` varchar(50) DEFAULT NULL,
`Eci` varchar(50) DEFAULT NULL,
`AddDate` date DEFAULT NULL,
`ChangeDate` date DEFAULT NULL,
`Cost` float DEFAULT NULL,
PRIMARY KEY (`tid`),
KEY `telecordia.ProdID` (`ProdID`) USING BTREE,
KEY `telecordia.clei` (`Clei`),
KEY `telecordia.pn` (`Pn`),
KEY `telcordia.eci` (`Eci`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Users upload data via a web interface using Excel/CSV files into api_analytics_data. The data contains EITHER the partNumbers or CLEIs. I then update the api_analytics_data table by joining the telecordia table. The telecordia table is the master list of partNumber and Cleis.
So if a user uploads a file of CLEIs, the update/join I use is:
update api_analytics_data aad
inner join telecordia t on aad.clei = t.Clei
set aad.partNumber = t.Pn
where aad.partNumber is null
and aad.upload_file_id = 5;
It works quickly, but not very thoroughly. The problem I have is that the CLEI uploaded may only be a substring of the CLEI in the telecordia table.
For example, the uploaded CLEI may be "5SC1DX0". In the telecordia table, the correct matching row is:
tid: 184324
ProdID: 472467
Mfg: PLSE
Pn: AUA58-2-REV-E
Clei: 5SC1DX04AA
Series: null
Dsc: DL SGL-PTY POTS CU RT
Eci: 205756
AddDate: 1994-03-18
ChangeDate: 1998-04-13
Cost: null
So obviously my update doesn't work in this case, even though 5SC1DX0 and 5SC1DX04AA are the same part.
What I need is a wildcard search. However, when I try this, it is crazy slow. With about 4500 rows uploaded into the api_analytics_data table, it runs for about 10 minutes, and then loses the connection with the server.
update api_analytics_data aad
inner join telecordia t on aad.clei like concat(t.Clei,'%')
set aad.partNumber = t.Pn
where aad.partNumber is null
and aad.upload_file_id = 5;
Is there a way to optimize this so that it runs quickly?
The correct answer is "no". The better course of action is to create a new column in telecordia with the correct Clei value in it, one that can be used for joining the tables. In the most recent versions of MySQL, this can even be a computed column and be indexed.
That said, you might be able to do something if the matching portion is always the same length. If so, try this:
update api_analytics_data aad inner join
telecordia t
on t.Clei = left(aad.clei, 7)
set aad.partNumber = t.Pn
where aad.partNumber is null and aad.upload_file_id = 5;
For this query, you want an index on api_analytics_data(upload_file_id, partNumber, clei) and telecordia(clei, pn).
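The new-column suggestion can be sketched as follows. This uses Python's sqlite3 module as a stand-in for MySQL so that it runs on its own. The column name clei_prefix, the index name, and the 7-character prefix length are assumptions for illustration only; in a recent MySQL the column could be a generated column instead of being filled by an UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE telecordia (tid INTEGER PRIMARY KEY, Pn TEXT, Clei TEXT);
CREATE TABLE api_analytics_data (
    id INTEGER PRIMARY KEY,
    upload_file_id INTEGER NOT NULL,
    partNumber TEXT,
    clei TEXT
);
INSERT INTO telecordia (tid, Pn, Clei)
    VALUES (184324, 'AUA58-2-REV-E', '5SC1DX04AA');
INSERT INTO api_analytics_data (upload_file_id, partNumber, clei)
    VALUES (5, NULL, '5SC1DX0');

-- Hypothetical helper column holding the joinable part of the CLEI.
ALTER TABLE telecordia ADD COLUMN clei_prefix TEXT;
UPDATE telecordia SET clei_prefix = substr(Clei, 1, 7);
CREATE INDEX idx_telecordia_clei_prefix ON telecordia (clei_prefix);
""")

# Equality on the indexed prefix column replaces the slow LIKE join.
conn.execute("""
UPDATE api_analytics_data
SET partNumber = (
    SELECT t.Pn FROM telecordia t
    WHERE t.clei_prefix = api_analytics_data.clei
    LIMIT 1
)
WHERE partNumber IS NULL AND upload_file_id = 5
""")
conn.commit()

part_number = conn.execute(
    "SELECT partNumber FROM api_analytics_data WHERE clei = '5SC1DX0'"
).fetchone()[0]
```

The point of the design is that the join condition becomes a plain equality on an indexed column, so the engine can look up each uploaded CLEI directly instead of comparing it against every master row.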
This is the query I use to create the table:
CREATE TABLE `result` (
`id` int(11) NOT NULL,
`l_id` varchar(25) NOT NULL,
`lname` varchar(200) NOT NULL,
`first_prize` varchar(9) DEFAULT NULL,
`consolation_prize` varchar(9) DEFAULT NULL,
`second_prize` varchar(9) DEFAULT NULL,
`third_prize` varchar(9) DEFAULT NULL,
`fourth_prize` int(11) DEFAULT NULL,
`fifth_prize` int(11) DEFAULT NULL,
`sixth_prize` int(11) DEFAULT NULL,
`seventh_prize` int(11) DEFAULT NULL,
`eigth_prize` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
and this is the query that inserts the values:
INSERT INTO `result` (`id`, `l_id`, `lname`, `first_prize`, `consolation_prize`, `second_prize`, `third_prize`, `fourth_prize`, `fifth_prize`, `sixth_prize`, `seventh_prize`, `eigth_prize`) VALUES
(1, '1', 'Win-Win', 'WO-878475', 'WO-878474', 'WO-878477', 'WO-878455', 8474, 8477, 8412, 8473, 8689),
(2, '2', 'KARUNNYA', NULL, NULL, NULL, NULL, 6, NULL, NULL, NULL, NULL),
(3, '3', 'SOUBHAGYA', 'WE-878656', NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL),
(4, '4', 'SREE SAKTHI', 'NB-750180', 'NE-750180', 'KO-594630', 'KF-678534', 6786, 4356, 2456, 4566, 7657);
The problem is that I can't insert multiple values into any of the columns such as first_prize, second_prize, third_prize, etc.
How can I do that? Please help me; I have no idea, and I am new to all this.
INSERT INTO result (id, l_id, lname, first_prize, consolation_prize, second_prize, third_prize, fourth_prize, fifth_prize, sixth_prize, seventh_prize, eigth_prize)
VALUES (1, '1', 'Win-Win', 'WO-878475,WO-878474,WO-878477', NULL,NULL, 'WO-878455', 8474, 8477, 8412, 8473, 8689);
Increase the length of the column; varchar(9) is too short to hold several values.
I suggest you use "2", the last option in my answer.
I need to enter two different values (i.e. WO-878475, WE-878475) in the column first_prize.
To answer your question directly: you can put a delimiter between values (e.g. | or a comma), and then when you select the field data you split it on that delimiter. This is a bad design approach 99% of the time, it really is.
I sincerely suggest that you instead consider a slight re-design and do either:
1)
Add more fields, such as first_prize_1 and first_prize_2. This is likely not the best way but I don't know all of your setup - this will make maintenance problems if you need another in the future, such as first_prize_3, but a lot better than multiple values in a single field.
2)
This is likely the better way - Make a new table prizes and have the fields:
id
prize_id
prize_level (this will be 1, 2, 3 etc for first prize and so on)
competition_id
Then just have a row per prize per competition, and select everything for the competition_id, and the fields allow you to state what prize it is, etc.
So id field is auto increment for unique row ID. Then rows would look like:
id = 1; prize_id = WO-878475; prize_level = 1; competition_id = 1;
id = 2; prize_id = WE-878475; prize_level = 1; competition_id = 1;
Then you select where competition_id = 1 and you get all the possible prizes for that competition, with the prize_id and prize_level (1st, 2nd, etc.). This way, as in the example above, you can have more than one first prize, and since the number of rows is unlimited you can have as many or as few prizes as you need per competition, so you are not restricted by the fields in the table (like first_prize, second_prize, etc.).
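A minimal sketch of option 2, using Python's sqlite3 module as a stand-in for MySQL so that it is runnable as-is. The table and column names follow the list above; the column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE prizes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique row id
    prize_id TEXT NOT NULL,                -- e.g. the winning ticket number
    prize_level INTEGER NOT NULL,          -- 1 = first prize, 2 = second, ...
    competition_id INTEGER NOT NULL
)
""")

# One row per prize per competition: two first prizes and one second prize.
conn.executemany(
    "INSERT INTO prizes (prize_id, prize_level, competition_id) VALUES (?, ?, ?)",
    [("WO-878475", 1, 1), ("WE-878475", 1, 1), ("WO-878477", 2, 1)],
)
conn.commit()

# All first prizes of competition 1, without splitting any delimited string.
first_prizes = [
    row[0]
    for row in conn.execute(
        "SELECT prize_id FROM prizes "
        "WHERE competition_id = ? AND prize_level = ? ORDER BY id",
        (1, 1),
    )
]
```

Adding a third first prize later is just another INSERT; no column has to be widened or added.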
MySQL is a relational database. I suggest you use two tables.
The first one, event, will contain the event data (id, name, etc.); the second one can be structured as follows:
`event_id` varchar(25) NOT NULL,
`winner` varchar(9) NOT NULL,
`prize_type` int(11) NOT NULL,
Then you can define your keys, e.g.:
0 = consolation prize
1 = first prize
2 = second prize
etc..
I am building a color search function utilizing php and mysql. The requirement of the search is that it needs to be fast, not use joins, and allow for 1-5 hex color inputs that query the database and return the "most accurate" results. By "most accurate" I mean that results will be reflective of the search input. I have a few pieces of data to help that such as the distance between the mapped color value (mapped against an array of pre-defined colors) and the original search input hex value (eg. ff0000).
The way the color search engine works is that you input 1-5 hex values (e.g. #ff0000, #000000, #9ef855, etc.), click search, and it searches the database to find images that contain the highest percentage of those colors. See this color search for reference to how a color search engine works. Note: I built this one, but it has a completely different schema, which has scaling problems, and I can't add indexes because the number of colors is directly related to the number of table columns, which is 120. Suggesting I use what I have built is out of the question for right now.
The data in the database comes from measurements taken on images. Up to 5 colors are extracted from an image, and then each hex color value (hex) is mapped to the closest predefined hex value (map_hex). Both of these pieces of data as well as the following are stored in the database:
media_id
hex (actual true value from image measurement)
map_hex (mapped value of the previous hex value)
percentage (the amount of this color found in the image)
distance (the distance between the true hex value and the mapped hex value)
sequence (unix timestamp, for ordering)
Before a color search query gets sent to the database, it is mapped to a set of colors so we can use the mapping to do a direct lookup on map_hex. This to me seemed like a faster way than trying to do a range type of query.
As of right now I am experimenting with two database design schemas but both seem to have their own problems.
Schema 1
CREATE TABLE `media_has_colors` (
`media_id` int(9) unsigned NOT NULL,
`hex` varchar(6) NOT NULL DEFAULT '',
`map_hex` varchar(6) NOT NULL,
`percentage` double unsigned NOT NULL,
`distance` double unsigned NOT NULL,
`sequence` int(11) unsigned NOT NULL,
PRIMARY KEY (`media_id`,`hex`),
KEY `index_on_hex` (`hex`),
KEY `index_on_percentage` (`percentage`),
KEY `index_on_timestamp` (`sequence`),
KEY `index_on_media_id` (`media_id`),
KEY `index_on_mapping_distance` (`distance`),
KEY `index_on_mapping_hex` (`map_hex`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Sample query:
SELECT sql_no_cache media_id, hex, map_hex, distance,
avg(percentage) as percentage,
SUM((IF(map_hex = '61615a',1,0)) + (IF(map_hex = '34362d',1,0)) + (IF(map_hex = 'dbd5dd',1,0))) as matchCount
FROM media_has_colors
WHERE map_hex = '61615a' or map_hex = '34362d' or map_hex = 'dbd5dd'
GROUP BY media_id
ORDER BY matchCount DESC, distance, percentage DESC
LIMIT 100;
The first problem I see with schema 1 is that I am forced to use group by and sum. I'll admit I have not tested with a ton of records yet, but it seems like it could get slow. On top of that, I can't tell which map_hex values are matching (which is what I'm trying to get with matchCount).
Schema 2
CREATE TABLE `media_has_colors` (
`media_id` int(9) unsigned NOT NULL,
`color_1_hex` varchar(6) NOT NULL DEFAULT '',
`color_2_hex` varchar(6) NOT NULL DEFAULT '',
`color_3_hex` varchar(6) NOT NULL DEFAULT '',
`color_4_hex` varchar(6) NOT NULL DEFAULT '',
`color_5_hex` varchar(6) NOT NULL DEFAULT '',
`color_1_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_2_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_3_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_4_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_5_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_1_percent` double unsigned NOT NULL DEFAULT '0',
`color_2_percent` double unsigned NOT NULL DEFAULT '0',
`color_3_percent` double unsigned NOT NULL DEFAULT '0',
`color_4_percent` double unsigned NOT NULL DEFAULT '0',
`color_5_percent` double unsigned NOT NULL DEFAULT '0',
`color_1_distance` double unsigned NOT NULL DEFAULT '0',
`color_2_distance` double unsigned NOT NULL DEFAULT '0',
`color_3_distance` double unsigned NOT NULL DEFAULT '0',
`color_4_distance` double unsigned NOT NULL DEFAULT '0',
`color_5_distance` double unsigned NOT NULL DEFAULT '0',
`sequence` int(11) unsigned NOT NULL,
PRIMARY KEY (`media_id`),
KEY `index_on_timestamp` (`sequence`),
KEY `index_on_map_hex` (`color_1_map_hex`,`color_2_map_hex`,`color_3_map_hex`,`color_4_map_hex`,`color_5_map_hex`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This second schema is not as simple but it does avoid using group by only allowing 1 row per media. However, it seems to have the same problem of figuring out what map_hex values are matching. Here is a sample query:
SELECT sql_no_cache media_id,
(IF(color_1_map_hex = '61615a',color_1_percent,1)) *
(IF(color_2_map_hex = '34362d',color_2_percent,1)) *
(IF(color_3_map_hex = 'dbd5dd',color_3_percent,1)) as percentage,
(IF(color_1_map_hex = '61615a',color_1_distance,1)) +
(IF(color_2_map_hex = '34362d',color_2_distance,1)) +
(IF(color_3_map_hex = 'dbd5dd',color_3_distance,1)) as distance,
color_1_map_hex, color_2_map_hex, color_3_map_hex, color_4_map_hex, color_5_map_hex,
(IF(color_1_map_hex = '61615a',1,0)) +
(IF(color_2_map_hex = '34362d',1,0)) +
(IF(color_3_map_hex = 'dbd5dd',1,0)) as matchCount
FROM media_has_colors
WHERE color_1_map_hex IN ('61615a','34362d','dbd5dd') OR
color_2_map_hex IN ('61615a','34362d','dbd5dd') OR
color_3_map_hex IN ('61615a','34362d','dbd5dd')
ORDER BY matchCount DESC, distance, percentage DESC
LIMIT 100;
You can see that there is a problem with calculating percentage and distance because the actual map_hex value may not appear in those specific columns.
Update:
I don't need to know specifically what colors matched in the query but I do need to sort by which has the highest matches.
So my question is, How can the schema or queries be fixed? If not, is there a better solution?
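One possible direction for Schema 1, offered as a sketch and not as a tuned solution: since only the number of matching colors matters for sorting, COUNT(DISTINCT map_hex) over an IN filter can stand in for the SUM(IF(...)) expression, and no join is needed. The example uses Python's sqlite3 module in place of MySQL so that it runs on its own. The composite index name, the sample rows, and the tie-breakers (average distance, then total percentage) are assumptions about what "most accurate" should mean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE media_has_colors (
    media_id INTEGER NOT NULL,
    hex TEXT NOT NULL,
    map_hex TEXT NOT NULL,
    percentage REAL NOT NULL,
    distance REAL NOT NULL,
    PRIMARY KEY (media_id, hex)
)
""")
# Hypothetical composite index so the IN filter and the grouping share one index.
conn.execute(
    "CREATE INDEX index_on_map_hex_media ON media_has_colors (map_hex, media_id)"
)
conn.executemany(
    "INSERT INTO media_has_colors VALUES (?, ?, ?, ?, ?)",
    [
        (1, "60605b", "61615a", 40.0, 1.5),  # media 1 matches two searched colors
        (1, "33352e", "34362d", 30.0, 2.0),
        (2, "626259", "61615a", 80.0, 1.0),  # media 2 matches one
        (3, "ff0000", "ff0000", 90.0, 0.0),  # media 3 matches none
    ],
)

def search(conn, colors):
    """Rank media by how many of the searched (already mapped) colors they contain."""
    placeholders = ", ".join("?" * len(colors))
    sql = f"""
        SELECT media_id,
               COUNT(DISTINCT map_hex) AS match_count,
               AVG(distance) AS avg_distance,
               SUM(percentage) AS total_percentage
        FROM media_has_colors
        WHERE map_hex IN ({placeholders})
        GROUP BY media_id
        ORDER BY match_count DESC, avg_distance, total_percentage DESC
        LIMIT 100
    """
    return conn.execute(sql, colors).fetchall()

results = search(conn, ["61615a", "34362d", "dbd5dd"])
ranked_ids = [row[0] for row in results]
```

Whether the GROUP BY stays fast on a large table would still need to be measured on real data; the sketch only shows that the ranking itself does not require knowing which column a color landed in.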
I have developed a user bulk upload module. There are two situations. When I bulk upload 20 000 records into an empty database, it takes about 5 hours. But when the database already has about 30 000 records, the upload is very slow: it takes about 11 hours to upload 20 000 records. I am just reading a CSV file with fgetcsv.
if (($handle = fopen($filePath, "r")) !== FALSE) {
while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
if (count($peopleData) == $fieldsCount) {
// inside, I check whether the user already exists (firstName & lastName & DOB)
// if not, I check whether the email exists; if it does, update the record
// otherwise insert a new record
}}}
Below are the queries that run. (I am using Yii framework)
SELECT *
FROM `AdvanceBulkInsert` `t`
WHERE renameSource='24851_bulk_people_2016-02-25_LE CARVALHO 1.zip.csv'
LIMIT 1
SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
SELECT *
FROM `User` `t`
WHERE `t`.`firstName`='Franck'
AND `t`.`lastName`='ALLEGAERT '
AND `t`.`dateOfBirth`='1971-07-29'
AND (userType NOT IN ("1"))
LIMIT 1
If the user exists, update the record:
UPDATE `User` SET `id`='51394', `address1`='49 GRANDE RUE',
`mobile`='', `name`=NULL, `firstName`='Franck',
`lastName`='ALLEGAERT ', `username`=NULL,
`password`=NULL, `email`=NULL, `gender`=0,
`zip`='60310', `countryCode`='DZ',
`joinedDate`='2016-02-23 10:44:18',
`signUpDate`='0000-00-00 00:00:00',
`supporterDate`='2016-02-25 13:26:37', `userType`=3,
`signup`=0, `isSysUser`=0, `dateOfBirth`='1971-07-29',
`reqruiteCount`=0, `keywords`='70,71,72,73,74,75',
`delStatus`=0, `city`='AMY', `isUnsubEmail`=0,
`isManual`=1, `isSignupConfirmed`=0, `profImage`=NULL,
`totalDonations`=NULL, `isMcContact`=NULL,
`emailStatus`=NULL, `notes`=NULL,
`addressInvalidatedAt`=NULL,
`createdAt`='2016-02-23 10:44:18',
`updatedAt`='2016-02-25 13:26:37', `longLat`=NULL
WHERE `User`.`id`='51394'
If the user doesn't exist, insert a new record.
The table engine is MyISAM. Only the email column has an index.
How can I optimize this to reduce the processing time?
Query 2 took 0.4701 seconds, which means that for 30 000 records it will take 14 103 seconds, which is about 235 minutes, or approximately 4 hours.
Update
CREATE TABLE IF NOT EXISTS `User` (
`id` bigint(20) NOT NULL,
`address1` text COLLATE utf8_unicode_ci,
`mobile` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
`name` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`firstName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`lastName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`username` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`password` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`email` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL,
`gender` tinyint(2) NOT NULL DEFAULT '0' COMMENT '1 - female, 2-male, 0 - unknown',
`zip` varchar(15) COLLATE utf8_unicode_ci DEFAULT NULL,
`countryCode` varchar(3) COLLATE utf8_unicode_ci DEFAULT NULL,
`joinedDate` datetime DEFAULT NULL,
`signUpDate` datetime NOT NULL COMMENT 'User signed up date',
`supporterDate` datetime NOT NULL COMMENT 'Date which user get supporter',
`userType` tinyint(2) NOT NULL,
`signup` tinyint(2) NOT NULL DEFAULT '0' COMMENT 'whether user followed signup process 1 - signup, 0 - not signup',
`isSysUser` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 - system user, 0 - not a system user',
`dateOfBirth` date DEFAULT NULL COMMENT 'User date of birth',
`reqruiteCount` int(11) DEFAULT '0' COMMENT 'User count that he has reqruited',
`keywords` text COLLATE utf8_unicode_ci COMMENT 'Kewords',
`delStatus` tinyint(2) NOT NULL DEFAULT '0' COMMENT '0 - active, 1 - deleted',
`city` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`isUnsubEmail` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Unsubscribed form email',
`isManual` tinyint(1) NOT NULL DEFAULT '0' COMMENT '0 - ok, 1 - Manualy add',
`longLat` varchar(45) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Longitude and Latitude',
`isSignupConfirmed` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Whether user has confirmed signup ',
`profImage` tinytext COLLATE utf8_unicode_ci COMMENT 'Profile image name or URL',
`totalDonations` float DEFAULT NULL COMMENT 'Total donations made by the user',
`isMcContact` tinyint(1) DEFAULT NULL COMMENT '1 - Mailchimp contact',
`emailStatus` tinyint(2) DEFAULT NULL COMMENT '1-bounced, 2-blocked',
`notes` text COLLATE utf8_unicode_ci,
`addressInvalidatedAt` datetime DEFAULT NULL,
`createdAt` datetime NOT NULL,
`updatedAt` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `AdvanceBulkInsert` (
`id` int(11) NOT NULL,
`source` varchar(256) NOT NULL,
`renameSource` varchar(256) DEFAULT NULL,
`countryCode` varchar(3) NOT NULL,
`userType` tinyint(2) NOT NULL,
`size` varchar(128) NOT NULL,
`errors` varchar(512) NOT NULL,
`status` char(1) NOT NULL COMMENT '1:Queued, 2:In Progress, 3:Error, 4:Finished, 5:Cancel',
`createdAt` datetime NOT NULL,
`createdBy` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `CustomField` (
`id` int(11) NOT NULL,
`customTypeId` int(11) NOT NULL,
`fieldName` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`relatedTable` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`defaultValue` text COLLATE utf8_unicode_ci,
`sortOrder` int(11) NOT NULL DEFAULT '0',
`enabled` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`listItemTag` char(1) COLLATE utf8_unicode_ci DEFAULT NULL,
`required` char(1) COLLATE utf8_unicode_ci DEFAULT '0',
`onCreate` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`onEdit` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`onView` char(1) COLLATE utf8_unicode_ci DEFAULT '1',
`listValues` text COLLATE utf8_unicode_ci,
`label` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`htmlOptions` text COLLATE utf8_unicode_ci
) ENGINE=MyISAM AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `CustomFieldSubArea` (
`id` int(11) NOT NULL,
`customFieldId` int(11) NOT NULL,
`subarea` varchar(256) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM AUTO_INCREMENT=43 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE IF NOT EXISTS `CustomValue` (
`id` int(11) NOT NULL,
`customFieldId` int(11) NOT NULL,
`relatedId` int(11) NOT NULL,
`fieldValue` text COLLATE utf8_unicode_ci,
`createdAt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM AUTO_INCREMENT=86866 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Entire PHP Code is here http://pastie.org/10737962
Update 2
Explain output of the Query
Indexes are your friend.
UPDATE User ... WHERE id = ... -- Desperately needs an index on ID, probably PRIMARY KEY.
Similarly for renameSource.
SELECT *
FROM `User` `t`
WHERE `t`.`firstName`='Franck'
AND `t`.`lastName`='ALLEGAERT '
AND `t`.`dateOfBirth`='1971-07-29'
AND (userType NOT IN ("1"))
LIMIT 1;
Needs INDEX(firstName, lastName, dateOfBirth); the fields can be in any order (in this case).
Look at each query to see what it needs, then add that INDEX to the table. Read my Cookbook on building indexes.
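To see what such an index changes, here is a small sketch that compares the query plan before and after adding it. It uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as stand-ins for MySQL and its EXPLAIN (an assumption made so the example is self-contained); the index name idx_user_name_dob is hypothetical and the table is reduced to the columns the lookup touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE User (
    id INTEGER PRIMARY KEY,
    firstName TEXT,
    lastName TEXT,
    dateOfBirth TEXT,
    userType INTEGER NOT NULL
)
""")

LOOKUP = """
    SELECT * FROM User
    WHERE firstName = ? AND lastName = ? AND dateOfBirth = ?
      AND userType NOT IN (1)
    LIMIT 1
"""
ARGS = ("Franck", "ALLEGAERT ", "1971-07-29")

def plan(conn):
    """Return the human-readable plan text for the duplicate-check lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, ARGS).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

plan_before = plan(conn)  # full table scan: no usable index yet
conn.execute(
    "CREATE INDEX idx_user_name_dob ON User (firstName, lastName, dateOfBirth)"
)
plan_after = plan(conn)   # the lookup can now search via the new index
```

Printing plan_before and plan_after shows the scan turning into an index search; in MySQL the same check is done by running EXPLAIN on the SELECT before and after ALTER TABLE ... ADD INDEX.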
Try these things to improve your query performance:
Define indexes in your database structure, and select only the columns that you need.
Do not use * in a select query.
Do not put numeric ids in quotes as in User.id='51394'; write User.id = 51394 instead.
A quoted number compared against an integer column can normally still use the index, but comparing a string column against an unquoted number cannot, so keep each literal's type the same as its column's type.
ENGINE=MyISAM supports ordinary and full-text indexes but not foreign keys or transactions. Consider changing the engine to ENGINE=InnoDB, then add the indexes and foreign keys your queries need.
If I understand correctly, for every result of SELECT * FROM AdvanceBulkInsert ... you run SELECT cf.*, and for every result of SELECT cf.* you run SELECT * FROM User.
I think the issue is that you send far too many requests to the database.
I think you should merge all your SELECT requests into one big request.
For that:
replace the SELECT * FROM AdvanceBulkInsert with an EXISTS (SELECT * FROM AdvanceBulkInsert WHERE ...) or a JOIN
replace the SELECT * FROM User with a NOT EXISTS (SELECT * FROM User WHERE ...)
Then run the update on all the results of the merged select.
You should also time your requests one by one to find which of them takes the most time, and you should also use EXPLAIN to find which part of the request takes the time.
Edit:
Now that I have seen your code, some leads:
Do you have indexes on cf.customTypeId, cfv.customFieldId, cfsa.customFieldId, user.dateOfBirth, user.firstName, user.lastName?
You don't need a LEFT JOIN on CustomFieldSubArea if the WHERE clause filters on CustomFieldSubArea; a simple JOIN CustomFieldSubArea is enough.
You will run query 2 many times with relatedId = 0; maybe you can save the result in a variable?
If you don't need sorted data, remove the "ORDER BY cf.sortOrder, cf.label". Otherwise, add an index on cf.sortOrder, cf.label.
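The idea of saving the result of query 2 in a variable can be sketched like this. The example uses Python and sqlite3 in place of PHP and MySQL so that it runs on its own; the simplified CustomField table, the sample CSV rows, and the query counter are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomField (id INTEGER PRIMARY KEY, label TEXT, sortOrder INTEGER)"
)
conn.executemany(
    "INSERT INTO CustomField (label, sortOrder) VALUES (?, ?)",
    [("Nickname", 2), ("Region", 1)],
)

query_count = 0

def load_custom_fields(conn):
    """Run the (simplified) custom-field query and count how often it executes."""
    global query_count
    query_count += 1
    return conn.execute(
        "SELECT id, label FROM CustomField ORDER BY sortOrder, label"
    ).fetchall()

# Made-up stand-ins for rows read from the uploaded CSV.
csv_rows = [["Franck", "ALLEGAERT"], ["Ann", "Smith"], ["Bob", "Jones"]]

custom_fields = load_custom_fields(conn)  # executed once, before the loop
processed = 0
for row in csv_rows:
    # Every row reuses the cached definitions instead of querying again.
    labels = [label for _id, label in custom_fields]
    processed += 1
```

Because the query does not depend on the row being imported, hoisting it out of the loop removes one round trip per CSV row without changing the result.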
When you need to find out why a query takes long, you need to inspect its individual parts. As you showed in the question, the EXPLAIN statement can help you very much. Usually the most important columns are:
select_type - ideally SIMPLE or a plain subquery. Dependent (correlated) subqueries cause a lot of trouble. Luckily you don't use any.
possible_keys - which keys this select could use for the lookup
rows - how many candidate rows remain after the keys, cache and other techniques are applied. A smaller number is better.
Extra - the "Using ..." notes tell you exactly how the rows are found; this is the most useful information
Query analysis
I would have posted analytics for the 1st and 3rd query but they are both quite simple queries. Here is the breakdown for the query that gives you troubles:
EXPLAIN SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
Solution
Let me explain the query above. The columns used in the join conditions absolutely must have an index. Joining tables is an expensive operation that otherwise needs to go through all rows of both tables. If you index the join columns, the DB engine will find a much faster way to do it. This should be common practice for any database.
The columns used only for filtering and sorting do not strictly need an index, but if you have a large number of rows (20 000 is a large number) you should also index the columns you search by. It might not have as much impact on processing speed, but it is worth the extra bit of time.
So you need to add indexes to these columns:
CustomType - id
CustomField - customTypeId, id, relatedTable, enabled, onCreate, sortOrder, label
CustomValue - customFieldId
CustomFieldSubArea - customFieldId, subarea
To verify the results, try running the EXPLAIN statement again after adding the indexes (and possibly on a few other select/insert/update queries). The Extra column should say something like "Using index" and the possible_keys column should list the usable keys (even two or more per joined table).
Side note: You have some typos in your code; you should fix them in case someone else needs to work on it too: "reqruiteCount" as a table column and "fileUplaod" as an array index in the code you linked.
For my work, I have to load one CSV with 524 columns and 10k records every day. When I tried to parse it and add the records with PHP, it was horribly slow.
So I suggest you look at the documentation for LOAD DATA LOCAL INFILE.
I'm copying my own code as an example; adapt it to your needs.
$dataload = 'LOAD DATA LOCAL INFILE "'.$filename.'"
REPLACE
INTO TABLE '.$this->csvTable.' CHARACTER SET "utf8"
FIELDS TERMINATED BY "\t"
IGNORE 1 LINES
';
$result = (bool)$this->db->query($dataload);
Where $filename is the local path of your CSV (you can use dirname(__FILE__) to get it).
This SQL command is very quick (just 1 or 2 seconds to add/update the whole CSV).
EDIT: read the docs, but of course you need a unique index on your user table for REPLACE to work. Then you don't need to check whether the user exists, and you don't need to parse the CSV file with PHP.
You appear to have the possibility (probability?) of 3 queries for every single record. Those 3 queries are going to require 3 trips to the database (and if you are using yii storing the records in yii objects then that might slow things down even more).
Can you add a unique key on first name / last name / DOB and one on email address?
If so, then you can just do INSERT ... ON DUPLICATE KEY UPDATE. This would reduce it to a single query for each record, greatly speeding things up.
But the big advantage of this syntax is that you can insert / update many records at once (I normally stick to about 250), so even fewer trips to the database.
You can knock up a class that you just pass records to and which does the insert when the number of records hits your choice. Also add in a call to insert the records in the destructor to insert any final records.
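A rough sketch of such a batching class, written in Python with sqlite3 so that it runs on its own. SQLite's INSERT ... ON CONFLICT ... DO UPDATE (available from SQLite 3.24) is used here as a stand-in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE, and a context manager replaces the destructor. The class name BatchUpserter, the reduced User table, and the sample rows are all hypothetical.

```python
import sqlite3

class BatchUpserter:
    """Collect rows and write them in multi-row upsert statements."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []

    def add(self, first_name, last_name, dob, city):
        self.pending.append((first_name, last_name, dob, city))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # One statement for the whole batch: VALUES (?,?,?,?), (?,?,?,?), ...
        placeholders = ", ".join(["(?, ?, ?, ?)"] * len(self.pending))
        sql = (
            "INSERT INTO User (firstName, lastName, dateOfBirth, city) "
            f"VALUES {placeholders} "
            "ON CONFLICT (firstName, lastName, dateOfBirth) "
            "DO UPDATE SET city = excluded.city"
        )
        params = [value for row in self.pending for value in row]
        self.conn.execute(sql, params)
        self.conn.commit()
        self.pending.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.flush()  # write the final, partially filled batch
        return False

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE User (
    id INTEGER PRIMARY KEY,
    firstName TEXT, lastName TEXT, dateOfBirth TEXT, city TEXT,
    UNIQUE (firstName, lastName, dateOfBirth)
)
""")

with BatchUpserter(conn, batch_size=2) as batch:
    batch.add("Franck", "ALLEGAERT", "1971-07-29", "AMY")
    batch.add("Ann", "Smith", "1980-01-01", "Paris")
    batch.add("Franck", "ALLEGAERT", "1971-07-29", "Lille")  # same key: update

rows = conn.execute("SELECT firstName, city FROM User ORDER BY id").fetchall()
```

The unique key is what makes this work: without it the database has no way to recognise the third row as an update of the first.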
Another option is to read everything in to a temp table and then use that as a source to join to your user table to do the updates / insert to. This would require a bit of effort with the indexes, but a bulk load to a temp table is quick, and a updates from that with useful indexes would be fast. Using it as a source for the inserts should also be fast (if you exclude the records already updated).
The other issue appears to be the following query, but I'm not sure where you execute it. It appears to only need to be executed once, in which case it might not matter too much. You haven't given the structure of the CustomType table, but it is joined to CustomField, and the field customTypeId has no index. Hence that join will be slow. The same applies to the CustomValue and CustomFieldSubArea joins, which join on customFieldId, and neither table has an index on this field (hopefully a unique index, because if those fields are not unique you will get a LOT of records returned: 1 row for every possible combination).
SELECT cf.*, ctyp.typeName, cfv.id as customId, cfv.customFieldId,
cfv.relatedId, cfv.fieldValue, cfv.createdAt
FROM `CustomField` `cf`
INNER JOIN CustomType ctyp on ctyp.id = cf.customTypeId
LEFT OUTER JOIN CustomValue cfv on cf.id = cfv.customFieldId
and relatedId = 0
LEFT JOIN CustomFieldSubArea cfsa on cfsa.customFieldId = cf.id
WHERE ((relatedTable = 'people' and enabled = '1')
AND (onCreate = '1'))
AND (cfsa.subarea='peoplebulkinsert')
ORDER BY cf.sortOrder, cf.label
Try to simplify the query, check how long it takes in an online SQL tool, and only then include it in the project.
Always do bulk importing within a transaction
$db = Yii::app()->db;
$transaction = $db->beginTransaction();
$curRow = 0;
try
{
    while (($peopleData = fgetcsv($handle, 10240, ",")) !== FALSE) {
        $curRow++;
        //process $peopleData
        //insert row
        //best to use INSERT ... ON DUPLICATE KEY UPDATE
        // a = 1
        // b = 2;
        if ($curRow % 5000 == 0) {
            //commit this batch and open a new transaction for the next one
            $transaction->commit();
            $transaction = $db->beginTransaction();
        }
    }
    //don't forget the remainder.
    $transaction->commit();
}
catch (Exception $ex)
{
    //undo the batch that was in progress when the error occurred
    $transaction->rollback();
    $result = $ex->getMessage();
}
I have seen import routines sped up 500% by simply using this technique. I have also seen an import process that did 600 queries (mixture of select, insert, update and show table structure) for each row. This technique sped up the process 30%.
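The same commit-in-batches pattern, as a self-contained sketch in Python with sqlite3 standing in for Yii and MySQL. The function name import_people, the batch size, and the CSV content are assumptions for illustration. Note that the technique relies on a transactional engine such as InnoDB; MyISAM tables like the ones in the question do not support transactions.

```python
import csv
import io
import sqlite3

# Made-up CSV content standing in for the uploaded file.
CSV_TEXT = (
    "Franck,ALLEGAERT,1971-07-29\n"
    "Ann,Smith,1980-01-01\n"
    "Bob,Jones,1990-05-05\n"
)

def import_people(conn, handle, batch_size=2):
    """Insert CSV rows, committing once per batch instead of once per row."""
    imported = 0
    try:
        for row in csv.reader(handle):
            conn.execute(
                "INSERT INTO User (firstName, lastName, dateOfBirth) VALUES (?, ?, ?)",
                row,
            )
            imported += 1
            if imported % batch_size == 0:
                # End this batch; the next INSERT implicitly opens a new transaction.
                conn.commit()
        conn.commit()  # don't forget the remainder
    except Exception:
        conn.rollback()  # undo only the batch that was in progress
        raise
    return imported

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User (id INTEGER PRIMARY KEY, "
    "firstName TEXT, lastName TEXT, dateOfBirth TEXT)"
)
imported = import_people(conn, io.StringIO(CSV_TEXT))
total = conn.execute("SELECT COUNT(*) FROM User").fetchone()[0]
```

Batches that were already committed stay in the table if a later batch fails, so a failed import may need to be resumed or cleaned up; whether that is acceptable depends on the application.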