Optimizing queries on larger MySQL database - php

I'm coding a website which will store offers (e.g. job offers). Eventually it could contain more than 1M offers, and I already have problems with some inefficient SQL queries.
Scenario:
Each offer can be assigned to a category (e.g. IT jobs)
Each category has custom fields (e.g. IT jobs can have a custom field of type "price", which is rendered as a text box accepting a number; in our example, let's say it is a price input for the expected salary)
Each offer stores meta data with the values of these category custom fields
DB fields which will be used for filtering have indexes
Table category (I'm using nested sets to store the category hierarchy):
CREATE TABLE `category` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`lft` int(11) DEFAULT NULL,
`rgt` int(11) DEFAULT NULL,
`depth` int(11) DEFAULT NULL,
`order` int(11) NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `category_parent_id_index` (`parent_id`),
KEY `category_lft_index` (`lft`),
KEY `category_rgt_index` (`rgt`)
) ENGINE=InnoDB AUTO_INCREMENT=44 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table category_field:
CREATE TABLE `category_field` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`category_id` int(10) unsigned NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`optional` tinyint(1) NOT NULL DEFAULT '0',
`type` enum('price','number','date','color') COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `category_field_category_id_index` (`category_id`),
CONSTRAINT `category_field_category_id_foreign` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table offer:
CREATE TABLE `offer` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`text` text COLLATE utf8_unicode_ci NOT NULL,
`category_id` int(10) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `offer_category_id_index` (`category_id`),
CONSTRAINT `offer_category_id_foreign` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Table offer_meta:
CREATE TABLE `offer_meta` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`offer_id` int(10) unsigned NOT NULL,
`category_field_id` int(10) unsigned NOT NULL,
`price` double NOT NULL,
`number` int(11) NOT NULL,
`date` date NOT NULL,
`color` varchar(7) COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `offer_meta_offer_id_index` (`offer_id`),
KEY `offer_meta_category_field_id_index` (`category_field_id`),
KEY `offer_meta_price_index` (`price`),
KEY `offer_meta_number_index` (`number`),
KEY `offer_meta_date_index` (`date`),
KEY `offer_meta_color_index` (`color`),
CONSTRAINT `offer_meta_category_field_id_foreign` FOREIGN KEY (`category_field_id`) REFERENCES `category_field` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `offer_meta_offer_id_foreign` FOREIGN KEY (`offer_id`) REFERENCES `offer` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=107769 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
When I set up some filters on my page (for example, for our salary custom field), I have to start with a query that returns the MIN and MAX prices across the available offer_meta records (I want to show a range slider to the user on the front-end, so I need the MIN/MAX values for this range):
select MIN(`price`) AS min, MAX(`price`) AS max from `offer_meta` where `category_field_id` = ? limit 1
I found out that these queries are the most inefficient of all the queries I'm making (the query above takes over 500 ms once the offer_meta table has a few thousand records).
Other inefficient queries (offer_meta has 107k records):
Obtaining MIN and MAX values for slider to filter numbers
select MIN(`number`) AS min, MAX(`number`) AS max from `offer_meta` where `category_field_id` = ? limit 1
Obtaining MIN and MAX prices for slider to filter by prices
select MIN(`price`) AS min, MAX(`price`) AS max from `offer_meta` where `category_field_id` = ? limit 1
Obtaining MIN and MAX date for date range restrictions
select MIN(`date`) AS min, MAX(`date`) AS max from `offer_meta` where `category_field_id` = ? limit 1
Obtaining colors with counts to show list of colors with numbers
select `color`, count(*) as `count` from `offer_meta` where `category_field_id` = ? group by `color`
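Since all of these probes filter on category_field_id and then aggregate one other column, I suppose composite indexes that lead with category_field_id might help, so each MIN/MAX can be read from the two ends of a single index instead of going through the single-column indexes. A sketch I haven't measured yet (the index names are my own):
ALTER TABLE `offer_meta`
  ADD INDEX `offer_meta_cf_price` (`category_field_id`, `price`),
  ADD INDEX `offer_meta_cf_number` (`category_field_id`, `number`),
  ADD INDEX `offer_meta_cf_date` (`category_field_id`, `date`),
  ADD INDEX `offer_meta_cf_color` (`category_field_id`, `color`);
The last index should also cover the color GROUP BY above.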
Example of full query to get offers count with multiple filter criteria (0.5 sec)
select count(*) as count from `offer` where id in (select
distinct offer_id
from offer_meta om
where offer_id in (select
distinct offer_id
from offer_meta om
where offer_id in (select
distinct offer_id
from offer_meta om
where offer_id in (select
distinct om.offer_id
from offer_meta om
join category_field cf on om.category_field_id = cf.id
where
cf.category_id in (2,3,4,41,43,5,6,7,8,37) and
om.category_field_id = 1 and
om.number >= 1 and
om.number <= 50) and
om.category_field_id = 2 and
om.price >= 2 and
om.price <= 4545) and
om.category_field_id = 3 and
om.date >= '0000-00-00' and
om.date <= '2015-04-09') and
category_field_id = 4 and
om.color in ('#0000ff'))
The same query without the aggregate function (COUNT), just fetching the IDs, is a few times faster.
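One rewrite I'm considering (a sketch, untested; it assumes each offer has at most one offer_meta row per category_field_id and that all four custom fields belong to the listed categories) collapses the nested INs into a single pass over offer_meta, keeping only offers that matched all four filters:
select count(*) as count from (
  select om.offer_id
  from offer_meta om
  join category_field cf on cf.id = om.category_field_id
  where
    cf.category_id in (2,3,4,41,43,5,6,7,8,37) and
    (
      (om.category_field_id = 1 and om.number >= 1 and om.number <= 50) or
      (om.category_field_id = 2 and om.price >= 2 and om.price <= 4545) or
      (om.category_field_id = 3 and om.date >= '0000-00-00' and om.date <= '2015-04-09') or
      (om.category_field_id = 4 and om.color in ('#0000ff'))
    )
  group by om.offer_id
  -- every row matches exactly one of the four branches, so an offer that
  -- matched all four filters contributes four distinct field ids
  having count(distinct om.category_field_id) = 4
) matched;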
Question:
Is it possible to tweak those queries, or do you have any suggestions on how to implement my logic (offers with categories, and custom fields added dynamically to each category in the admin) with a different table schema? I tried a few other schemas, but with no success.
Question 2:
Do you think this is a problem with my MySQL server, and that it will be okay if I buy a VPS?
Help to understand even better:
I was strongly inspired by the WordPress schema for custom fields, so the logic is similar.
Last notes:
Also, I'm working on Laravel framework and I'm using Eloquent ORM.
Sorry for my English, I hope I made my problem clear :-)
Thank you in advance,
Patrik

It is not a MySQL problem. Your scenario involves a huge data collection, and relational databases are naturally inefficient for some queries over data like this (I faced a similar situation with Oracle).
The usual practice for winning in this kind of situation is to use a graph database, though that seems hard to adopt at the point you have reached.
I have also heard that Lucene has some kind of support for indexing large databases for search purposes, although I don't know exactly how to do it:
http://en.wikipedia.org/wiki/Lucene

Related

How to get unread notifications (2 tables with FK)

I have 2 tables in my DB: user_notification and user_notification_read. It's a user notification system: the notifications are in user_notification, and when a user reads a notification, it is stored in user_notification_read with the notification id and the user id.
CREATE TABLE `user_notification` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`related_user_id` int(11) NOT NULL,
`text` text NOT NULL,
`link` varchar(255) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `user_notification_read` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`notification_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `notification_id` (`notification_id`),
CONSTRAINT `user_notification_read_ibfk_1` FOREIGN KEY (`notification_id`) REFERENCES `user_notification` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I want to make a SELECT to get the number of unread notifications for a certain user (by the user id).
I thought about using a
JOIN/WHERE (user_notification.id = user_notification_read.notification_id and user_notification.user_id = X) to get the rows from user_notification_read, with a CASE to check whether the row exists; if it doesn't exist, that's +1 unread notification.
I don't know if that's the appropriate logic to achieve it, and I don't know the syntax either. I tried some googling, but the examples are more complex than my case, which I believe is simple.
How can I do that?
Fiddle: http://sqlfiddle.com/#!9/84a5ed/5/0
On the fiddle example, the count for unread notifications would be 2 for the user 1.
You should use a left join.
SELECT sum(r.notification_id is null)
FROM user_notification n
LEFT JOIN user_notification_read r ON r.notification_id = n.id
WHERE n.user_id = 1
r.notification_id is null means a notification wasn't read.
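Equivalently, the same anti-join can be written with NOT EXISTS, a sketch in case you find it more readable:
SELECT count(*) AS unread
FROM user_notification n
WHERE n.user_id = 1
  AND NOT EXISTS (
    -- a read receipt for this notification
    SELECT 1
    FROM user_notification_read r
    WHERE r.notification_id = n.id
  );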

improve MySQL query performance from slow query log

I turned on the slow query log in my MySQL config.
Below is the logged query and its timing:
Time: 160330 20:54:11
User@Host: user[user] @ [xx.xx.xxx.xxx]
Query_time: 8.794170 Lock_time: 0.000141 Rows_sent: 3942 Rows_examined: 4742825
SET timestamp=1459371251;
SELECT (SELECT (CASE WHEN ce_type = 'IN' then SUM(payment_amount)
END) as debit
FROM customer_payment_options cpo
WHERE wallet_id=cw.id
AND (cpo.real_account_type='HQ')
AND cpo.source_country_id='40'
GROUP BY cpo.wallet_id)
as debit,
(SELECT SUM(payment_amount)
as credit
FROM customer_payment_options cpo
WHERE wallet_id=cw.id
AND (cpo.real_account_type='HQ')
AND cpo.tran_id IS NOT NULL
AND cpo.source_country_id='40'
GROUP BY cpo.wallet_id)
as credit
FROM customer_wallet cw
WHERE cw.company_id='1'
AND cw.currency='40'
AND cw.is_approved = '1'
AND DATE(cw.date_added) < '2016-03-30';
Indexes on customer_payment_options:
company_id
tran_id
ce_id
wallet_id
What should I do to improve its performance?
EXPLAIN:
http://i.stack.imgur.com/iH8rt.png
SCHEMA
CREATE TABLE `customer_payment_options` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`company_id` int(11) NOT NULL,
`local_branch_id` int(11) NOT NULL,
`tran_id` bigint(11) DEFAULT NULL,
`ce_id` int(11) DEFAULT NULL,
`wallet_id` int(11) DEFAULT NULL,
`reward_credit_id` int(11) DEFAULT NULL,
`ce_invoice_id` varchar(32) DEFAULT NULL,
`ce_type` enum('IN','OUT') DEFAULT NULL,
`payment_type` enum('CASH','DEBIT','CREDIT','CHEQUE','DRAFT','BANK_DEPOSIT','EWIRE','WALLET','LOAN','REWARD_CREDIT') NOT NULL,
`payment_amount` varchar(20) NOT NULL,
`payment_type_number` varchar(100) DEFAULT NULL,
`source_country_id` int(11) NOT NULL,
`real_account_id` int(11) DEFAULT NULL,
`real_account_type` enum('LOCAL','HQ') DEFAULT NULL,
`date_added` datetime NOT NULL,
`event_type` enum('MONEY_TRANSFER','CURRENCY_EXCHANGE','WALLET') DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `company_id` (`company_id`),
KEY `real_account_type` (`real_account_type`),
KEY `tran_id` (`tran_id`),
KEY `ce_id` (`ce_id`),
KEY `wallet_id` (`wallet_id`),
CONSTRAINT `customer_payment_options_ibfk_4` FOREIGN KEY (`wallet_id`) REFERENCES `customer_wallet` (`id`),
CONSTRAINT `customer_payment_options_ibfk_1` FOREIGN KEY (`company_id`) REFERENCES `company` (`id`),
CONSTRAINT `customer_payment_options_ibfk_2` FOREIGN KEY (`tran_id`) REFERENCES `transaction` (`id`),
CONSTRAINT `customer_payment_options_ibfk_3` FOREIGN KEY (`ce_id`) REFERENCES `currency_exchange` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=412 DEFAULT CHARSET=utf8
CREATE TABLE `customer_wallet` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`wallet_unique_id` varchar(100) DEFAULT NULL,
`company_id` int(11) NOT NULL,
`branch_admin_id` int(11) DEFAULT NULL,
`emp_id` int(11) DEFAULT NULL,
`emp_type` enum('SUPER_ADMIN','ADMIN','AGENT_ADMIN','AGENT','OVER_AGENT_ADMIN','OVER_AGENT') DEFAULT NULL,
`cus_id` bigint(11) NOT NULL,
`tran_id` bigint(11) DEFAULT NULL,
`beehive_id` int(11) DEFAULT NULL,
`type` enum('DEPOSIT','WITHDRAW','TRANSACTION') NOT NULL,
`sub_type` enum('MONEY_TRANSFER','BEEHIVE_DEPOSIT') DEFAULT NULL,
`credit_in` varchar(20) DEFAULT NULL,
`credit_out` varchar(20) DEFAULT NULL,
`currency` varchar(20) NOT NULL,
`date_added` datetime NOT NULL,
`note` varchar(255) DEFAULT NULL,
`location` enum('DIRECT') DEFAULT NULL,
`is_approved` enum('0','1') NOT NULL DEFAULT '1',
`idebit_issconf` varchar(50) DEFAULT NULL,
`idebit_issname` varchar(50) DEFAULT NULL,
`idebit_isstrack2` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `cus_id` (`cus_id`),
KEY `company_id` (`company_id`),
KEY `branch_admin_id` (`branch_admin_id`),
KEY `emp_id` (`emp_id`),
KEY `tran_id` (`tran_id`),
KEY `beehive_id` (`beehive_id`),
CONSTRAINT `customer_wallet_ibfk_1` FOREIGN KEY (`cus_id`) REFERENCES `customers` (`id`),
CONSTRAINT `customer_wallet_ibfk_2` FOREIGN KEY (`company_id`) REFERENCES `company` (`id`),
CONSTRAINT `customer_wallet_ibfk_3` FOREIGN KEY (`tran_id`) REFERENCES `transaction` (`id`),
CONSTRAINT `customer_wallet_ibfk_4` FOREIGN KEY (`emp_id`) REFERENCES `employees` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=152 DEFAULT CHARSET=utf8
What you are doing is a correlated subquery for every wallet ID to get the corresponding debits and credits, so you get one record per wallet id. This is very busy. Instead, use a single join to the customer payments table on the criteria that are common (including the per-wallet-id join), then simplify the CASE into SUM(case/when) expressions for the respective debit and credit.
I don't know the underlying meaning of your table columns, but I would even hedge to (and did, below) use NOT ce_type = 'IN' for the credit, as 'IN' appears to be the basis of a debit and you would not want to falsely count it as part of a credit too. Again, I don't know the correlation of the fields, tran_id, and the types.
Now, as stated, having individual indexes on individual fields will not help optimize this query. I would suggest the following composite indexes:
customer_wallet ( company_id, is_approved, currency, id, date_added )
customer_payment_options ( wallet_id, real_account_type, source_country_id )
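As a sketch, creating those two indexes would look like this (the index names are my own):
ALTER TABLE customer_wallet
  ADD INDEX idx_cw_company_appr_curr (company_id, is_approved, currency, id, date_added);
ALTER TABLE customer_payment_options
  ADD INDEX idx_cpo_wallet_acct_country (wallet_id, real_account_type, source_country_id);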
SELECT
cw.id AS wallet_id,
SUM( case when cpo.ce_type = 'IN'
then cpo.payment_amount
ELSE 0 end ) as Debit,
SUM( case when NOT cpo.ce_type = 'IN'
AND cpo.tran_id IS NOT NULL
then cpo.payment_amount
ELSE 0 end ) as Credit
FROM
customer_wallet cw
JOIN customer_payment_options cpo
ON cw.id = cpo.wallet_id
AND cpo.real_account_type = 'HQ'
AND cpo.source_country_id = '40'
WHERE
cw.company_id = '1'
AND cw.currency = '40'
AND cw.is_approved = '1'
AND cw.date_added < '2016-03-30'
GROUP BY
cw.id
One additional comment: if your ID columns, currency flag, country ID, and approved flag are actually numeric in the table structure, remove the quotes and compare directly on the numeric values. Also, for your date_added: you had the condition based on DATE( date_added ). Applying a function to a column prevents full use of an index on it. Since DATE() strips off the time portion of a date/time column, and you are asking for everything added before March 30, a date_added of March 29 @ 11:59:59pm is still less than March 30 at 12:00:00am, so no date conversion is required.
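To make that last point concrete, a minimal sketch of the difference:
-- wrapping the column in DATE() keeps an index on date_added from being range-scanned
SELECT id FROM customer_wallet WHERE DATE(date_added) < '2016-03-30';
-- comparing the bare column lets the optimizer range-scan such an index
SELECT id FROM customer_wallet WHERE date_added < '2016-03-30';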
As commented by Ivan (below), if you want ALL Wallet IDs regardless of having any payments (debit or credit), then change from a join to a LEFT JOIN.
You need to add indexes, and multi-column indexes, to make it fast.
Please keep in mind that if you have a large table, extra indexes will slow down insertions, since the index updates take more time.
If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer attempts to use the Index Merge optimization (see Section 8.2.1.4, "Index Merge Optimization"), or attempts to find the most restrictive index by deciding which index excludes more rows and using that index to fetch the rows.
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).
Read more
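A small sketch of that leftmost-prefix rule, using a hypothetical table t:
CREATE INDEX idx_c1_c2_c3 ON t (col1, col2, col3);
-- these can all use the index (leftmost prefixes):
SELECT * FROM t WHERE col1 = 1;
SELECT * FROM t WHERE col1 = 1 AND col2 = 2;
SELECT * FROM t WHERE col1 = 1 AND col2 = 2 AND col3 = 3;
-- this cannot, because it skips col1:
SELECT * FROM t WHERE col2 = 2 AND col3 = 3;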

Implicit MySQL Join on Update Statement - 0 rows affected

I'm trying to get this MySQL code to work, but it's saying 0 rows affected.
UPDATE assessments, assessment_types
SET assessments.assessment_type_id = assessment_types.id
WHERE (assessment_types.description = "Skills Assessment" AND assessments.id = 2);
Basically I have assessment_types with id and description columns, and assessments.assessment_type_id just holds the id.
I need to update that id.
I searched and couldn't find quite what I need for this.
Thanks!
Table Data:
assessment_types
id description
1 Knowledge Assessment
2 Skill Assessment
3 Personal Information
4 Natural Skills
Table Structure:
--
-- Table structure for table `assessments`
--
CREATE TABLE IF NOT EXISTS `assessments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_bin NOT NULL,
`acronym` varchar(255) COLLATE utf8_bin NOT NULL,
`assessment_type_id` int(11) NOT NULL,
`language_id` int(11) NOT NULL,
`date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`date_updated` date NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`),
KEY `assessment_type_id` (`assessment_type_id`),
KEY `language_id` (`language_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=2385 ;
--
-- Table structure for table `assessment_types`
--
CREATE TABLE IF NOT EXISTS `assessment_types` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`description` varchar(255) CHARACTER SET latin1 NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=7 ;
You can try doing an explicit join of the two tables in your UPDATE statement. Note that joining on a.assessment_type_id = at.id and then setting that same column to at.id cannot change anything, which would also give you 0 rows affected; join on the description instead. Also, your table data spells the type "Skill Assessment" while your WHERE clause says "Skills Assessment", so the string has to match what is actually stored:
UPDATE assessments a
INNER JOIN assessment_types at
ON at.description = "Skill Assessment"
SET a.assessment_type_id = at.id
WHERE a.id = 2;
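Before running the UPDATE, you can preview the rows it would touch with a plain SELECT over the same join (a sketch):
SELECT a.id, a.assessment_type_id AS current_type_id, at.id AS new_type_id
FROM assessments a
INNER JOIN assessment_types at ON at.description = "Skill Assessment"
WHERE a.id = 2;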

Optimization Needed For Dual Left Join Query

I've always struggled with MySQL joins; I've started incorporating more of them, but I'm still struggling to understand them despite reading dozens of tutorials and the MySQL manual.
My situation is I have 3 tables:
/* BASICALLY A TABLE THAT HOLDS FAN RECORDS */
CREATE TABLE `fans` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) DEFAULT NULL,
`middle_name` varchar(255) DEFAULT NULL,
`last_name` varchar(255) DEFAULT NULL,
`email` varchar(255) DEFAULT NULL,
`join_date` datetime DEFAULT NULL,
`twitter` varchar(255) DEFAULT NULL,
`twitterCrawled` datetime DEFAULT NULL,
`twitterImage` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=20413 DEFAULT CHARSET=latin1;
/* A TABLE OF OUR TWITTER FOLLOWERS */
CREATE TABLE `twitterFollowers` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`screenName` varchar(25) DEFAULT NULL,
`twitterId` varchar(25) DEFAULT NULL,
`customerId` int(11) DEFAULT NULL,
`uniqueStr` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique` (`uniqueStr`)
) ENGINE=InnoDB AUTO_INCREMENT=13426 DEFAULT CHARSET=utf8;
/* TABLE THAT SUGGESTS A LIKELY MATCH OF A TWITTER FOLLOWER BASED ON THE EMAIL / SCREEN NAME COMPARISON OF THE FAN vs OUR FOLLOWERS
IF SOMEONE (ie. a moderator) CONFIRMS OR DENIES THAT IT'S A GOOD MATCH THEY PUT A DATESTAMP IN `dismissed` */
CREATE TABLE `contentSuggestion` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userId` int(11) DEFAULT NULL,
`fanId` int(11) DEFAULT NULL,
`twitterAccountId` int(11) DEFAULT NULL,
`contentType` varchar(50) DEFAULT NULL,
`contentString` varchar(255) DEFAULT NULL,
`added` datetime DEFAULT NULL,
`dismissed` datetime DEFAULT NULL,
`uniqueStr` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unstr` (`uniqueStr`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
What I'm trying to get is:
SELECT [fan columns]
WHERE fan screen name IS IN twitterfollowers
AND WHERE fan screen name IS NOT IN contentSuggestion (with a datestamp in dismissed)
My attempts so far:
~33 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans
LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername
LEFT JOIN contentSuggestion cs ON cs.contentString = tf.screenName WHERE dismissed IS NULL
GROUP BY(fans.id) HAVING col1 != ''
~14 seconds
SELECT id, emailUsername FROM fans WHERE emailUsername IN(SELECT DISTINCT(screenName) FROM twitterFollowers) AND emailUsername NOT IN(SELECT DISTINCT(contentString) FROM contentSuggestion WHERE dismissed IS NULL) GROUP BY (fans.id);
9.53 seconds
SELECT fans.id, tf.screenName as col1, tf.twitterId as col2 FROM fans
LEFT JOIN twitterFollowers tf ON tf.screenName = fans.emailUsername WHERE tf.uniqueStr NOT IN(SELECT uniqueStr FROM contentSuggestion WHERE dismissed IS NULL)
I hope there is a better way. I've been struggling to really use JOINS outside of a single LEFT JOIN which has already helped me speed up other queries by a significant amount.
Thanks for any help you can give me.
I would go with a variation of the second method. Instead of IN, use EXISTS. Then add the correct indexes and remove the aggregation:
SELECT f.id, f.emailUsername
FROM fans f
WHERE EXISTS (SELECT 1
FROM twitterFollowers tf
WHERE f.emailUsername = tf.screenName
) AND
NOT EXISTS (SELECT 1
FROM contentSuggestion cs
WHERE f.emailUsername = cs.contentString AND
cs.dismissed IS NULL
) ;
Then be sure you have the following indexes: twitterFollowers(screenName) and contentSuggestion(contentString, dismissed).
Some notes:
When using IN, don't use SELECT DISTINCT. I'm not 100% sure that MySQL is always smart enough to ignore the DISTINCT in the subquery (it is redundant).
Historically, EXISTS was faster than IN in MySQL. The optimizer has improved in recent versions.
For performance, you need the correct indexes.
Assuming that fan.id is unique (a very reasonable assumption), you don't need the final group by.
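For reference, a sketch of the DDL for those two indexes (the names are my own):
CREATE INDEX idx_tf_screenName ON twitterFollowers (screenName);
CREATE INDEX idx_cs_content_dismissed ON contentSuggestion (contentString, dismissed);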

MySql Properly Join Complex Data/Tables

Abstract:
Every client is given a specific XML ad feed (publisher_feed table). Every time there is a query or a click on that feed, it gets recorded (publisher_stats_raw table); each query/click will have multiple rows depending on the subid passed by the client (we can sum the clicks together). The next day, we pull stats from an API to grab the previous day's revenue numbers (rev_stats table); each revenue stat might have multiple rows depending on the country of the click (we can sum the revenue together). I've been having a hard time trying to link these three tables together to find the average RPC for each client for the previous day.
Table Structure:
CREATE TABLE `publisher_feed` (
`publisher_feed_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`alias` varchar(45) DEFAULT NULL,
`user_id` int(10) unsigned DEFAULT NULL,
`remote_feed_id` int(10) unsigned DEFAULT NULL,
`subid` varchar(255) DEFAULT '',
`requirement` enum('tq','tier2','ron','cpv','tos1','tos2','tos3','pv1','pv2','pv3','ar','ht') DEFAULT NULL,
`status` enum('enabled','disabled') DEFAULT 'enabled',
`tq` decimal(4,2) DEFAULT '0.00',
`clicklimit` int(11) DEFAULT '0',
`prev_rpc` decimal(20,10) DEFAULT '0.0000000000',
PRIMARY KEY (`publisher_feed_id`),
UNIQUE KEY `alias_UNIQUE` (`alias`),
KEY `publisher_feed_idx` (`remote_feed_id`),
KEY `publisher_feed_user` (`user_id`),
CONSTRAINT `publisher_feed_feed` FOREIGN KEY (`remote_feed_id`) REFERENCES `remote_feed` (`remote_feed_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `publisher_feed_user` FOREIGN KEY (`user_id`) REFERENCES `user` (`user_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=124 DEFAULT CHARSET=latin1$$
CREATE TABLE `publisher_stats_raw` (
`publisher_stats_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`unique_data` varchar(350) NOT NULL,
`publisher_feed_id` int(10) unsigned DEFAULT NULL,
`date` date DEFAULT NULL,
`subid` varchar(255) DEFAULT NULL,
`queries` int(10) unsigned DEFAULT '0',
`impressions` int(10) unsigned DEFAULT '0',
`clicks` int(10) unsigned DEFAULT '0',
`filtered` int(10) unsigned DEFAULT '0',
`revenue` decimal(20,10) unsigned DEFAULT '0.0000000000',
PRIMARY KEY (`publisher_stats_id`),
UNIQUE KEY `unique_data_UNIQUE` (`unique_data`),
KEY `publisher_stats_raw_remote_feed_idx` (`publisher_feed_id`)
) ENGINE=InnoDB AUTO_INCREMENT=472 DEFAULT CHARSET=latin1$$
CREATE TABLE `rev_stats` (
`rev_stats_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`date` date DEFAULT NULL,
`remote_feed_id` int(10) unsigned DEFAULT NULL,
`typetag` varchar(255) DEFAULT NULL,
`subid` varchar(255) DEFAULT NULL,
`country` varchar(2) DEFAULT NULL,
`revenue` decimal(20,10) DEFAULT NULL,
`tq` decimal(4,2) DEFAULT NULL,
`finalized` int(11) DEFAULT '0',
PRIMARY KEY (`rev_stats_id`),
KEY `rev_stats_remote_feed_idx` (`remote_feed_id`),
CONSTRAINT `rev_stats_remote_feed` FOREIGN KEY (`remote_feed_id`) REFERENCES `remote_feed` (`remote_feed_id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=58 DEFAULT CHARSET=latin1$$
Context:
Each remote_feed has a specific subid/typetag given to it, so we need to match up both the remote_feed_id and subid columns from the publisher_feed table to the remote_feed_id and typetag columns in the rev_stats table.
My current, non working, implementation:
SELECT
pf.publisher_feed_id, psr.date, sum(clicks), sum(rs.revenue)
FROM
xml_network.publisher_feed pf
JOIN
xml_network.publisher_stats_raw psr
ON
psr.publisher_feed_id = pf.publisher_feed_id
JOIN
xml_network.rev_stats rs
ON
rs.remote_feed_id = pf.remote_feed_id
WHERE
pf.requirement = 'tq'
AND
pf.subid = rs.typetag
AND
psr.date <> date(curdate())
GROUP BY
psr.date
ORDER BY
psr.date DESC
LIMIT 1;
The above keeps pulling the wrong data out of the rev_stats table (it pulls the sum of the correct stats, but repeats it because of the join). Any help with how to properly pull the correct data would be greatly appreciated. (I could use multiple queries and PHP to get the correct results, but what's the fun in that!)
Figured out a way to get this accomplished. It's definitely not a fast method by any means, needing 4 selects to get it done, but it works flawlessly =)
SELECT
pf.publisher_feed_id,
round(
(
SELECT
SUM(rs.revenue)
FROM
xml_network.rev_stats rs
WHERE
rs.remote_feed_id = pf.remote_feed_id
AND
rs.typetag = pf.subid
AND
rs.date = subdate(current_date, 1)
),10)as revenue,
(
SELECT
MAX(rs.tq)
FROM
xml_network.rev_stats rs
WHERE
rs.remote_feed_id = pf.remote_feed_id
AND
rs.typetag = pf.subid
AND
rs.date = subdate(current_date, 1)
) as tq,
(
SELECT
SUM(psr.clicks)-SUM(psr.filtered)
FROM
xml_network.publisher_stats_raw psr
WHERE
psr.publisher_feed_id = pf.publisher_feed_id
AND
psr.date = subdate(current_date, 1)
) as clicks
FROM
xml_network.publisher_feed pf
WHERE
pf.requirement = 'tq';
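If the four subselects ever become too slow, an alternative I'd consider (a sketch, untested against the real data) is to pre-aggregate each stats table in a derived table and join once; the per-table GROUP BY avoids the row multiplication that broke my first attempt:
SELECT
  pf.publisher_feed_id,
  rs.revenue,
  rs.tq,
  psr.clicks
FROM xml_network.publisher_feed pf
LEFT JOIN (
  -- one row per feed: yesterday's net clicks
  SELECT publisher_feed_id, SUM(clicks) - SUM(filtered) AS clicks
  FROM xml_network.publisher_stats_raw
  WHERE date = SUBDATE(CURRENT_DATE, 1)
  GROUP BY publisher_feed_id
) psr ON psr.publisher_feed_id = pf.publisher_feed_id
LEFT JOIN (
  -- one row per (remote feed, typetag): yesterday's revenue and tq
  SELECT remote_feed_id, typetag, SUM(revenue) AS revenue, MAX(tq) AS tq
  FROM xml_network.rev_stats
  WHERE date = SUBDATE(CURRENT_DATE, 1)
  GROUP BY remote_feed_id, typetag
) rs ON rs.remote_feed_id = pf.remote_feed_id AND rs.typetag = pf.subid
WHERE pf.requirement = 'tq';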
