I am currently developing an application that lets users search a database of documents using various parameters and returns a set of paged results. I am building it in PHP/MySQL, which is not my usual development platform, but it's been grand so far.
The problem I am having is that in order to return a full set of results I have to use LEFT JOIN on every table, which completely destroys my performance. The person who developed the database has said that the query I am using will return the correct results, so that's what I have to use. I am by no means an SQL guru and could use some help on this.
I have been wondering whether it might be better to split the query into sub-queries. Below is my current query:
SELECT d.title, d.deposition_id, d.folio_start, d.folio_end, pl.place_id, p.surname, p.forename, p.person_type_id, pt.person_type_desc, p.age, d.manuscript_number, dt.day, dt.month, dt.year, plc.county_id, c.county_desc
FROM deposition d
LEFT JOIN person AS p ON p.deposition_id = d.deposition_id
LEFT JOIN person_type AS pt ON p.person_type_id = pt.person_type_id
LEFT JOIN place_link AS pl ON pl.deposition_id = d.deposition_id
LEFT JOIN date AS dt ON dt.deposition_id = d.deposition_id
LEFT JOIN place AS plc ON pl.place_id = plc.place_id
LEFT JOIN county AS c ON plc.county_id = c.county_id
WHERE 1 AND d.manuscript_number = '840'
GROUP BY d.deposition_id ORDER BY d.folio_start ASC
LIMIT 0, 20
Any help or guidance would be greatly appreciated!
Deposition Table:
CREATE TABLE IF NOT EXISTS `deposition` (
`deposition_id` varchar(11) NOT NULL default '',
`manuscript_number` int(10) NOT NULL default '0',
`folio_start` varchar(4) NOT NULL default '0',
`folio_end` varchar(4) default '0',
`page` int(4) default NULL,
`deposition_type_id` int(10) NOT NULL default '0',
`comments` varchar(255) default '',
`title` varchar(255) default NULL,
PRIMARY KEY (`deposition_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Date Table
CREATE TABLE IF NOT EXISTS `date` (
`deposition_id` varchar(11) NOT NULL default '',
`day` int(2) default NULL,
`month` int(2) default NULL,
`year` int(4) default NULL,
PRIMARY KEY (`deposition_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Person_Type
CREATE TABLE IF NOT EXISTS `person_type` (
`person_type_id` int(10) NOT NULL auto_increment,
`person_type_desc` varchar(255) NOT NULL default '',
PRIMARY KEY (`person_type_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=59 ;
It seems that you want to select one person, one place, etc. per deposition.
The query you wrote will return that, but it is not guaranteed which person or place it will return, and the query is inefficient.
Try this:
SELECT d.title, d.deposition_id, d.folio_start, d.folio_end, pl.place_id, p.surname, p.forename, p.person_type_id, pt.person_type_desc, p.age, d.manuscript_number, dt.day, dt.month, dt.year, plc.county_id, c.county_desc
FROM deposition d
LEFT JOIN
person p
ON p.id =
(
SELECT id
FROM person pi
WHERE pi.deposition_id = d.deposition_id
ORDER BY
pi.deposition_id, pi.id
LIMIT 1
)
LEFT JOIN
place_link AS pl
ON pl.id =
(
SELECT id
FROM place_link AS pli
WHERE pli.deposition_id = d.deposition_id
ORDER BY
pli.deposition_id, pli.id
LIMIT 1
)
LEFT JOIN
date AS dt
ON dt.id =
(
SELECT id
FROM date AS dti
WHERE dti.deposition_id = d.deposition_id
ORDER BY
dti.deposition_id, dti.id
LIMIT 1
)
LEFT JOIN
place AS plc
ON plc.place_id = pl.place_id
LEFT JOIN
county AS c
ON c.county_id = plc.county_id
WHERE d.manuscript_number = '840'
ORDER BY
d.manuscript_number, d.folio_start
LIMIT 20
Create an index on deposition (manuscript_number, folio_start) for this to work fast.
Also create a composite index on (deposition_id, id) on person, place_link and date.
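As a rough, runnable sketch of the "one row per deposition" pattern, here is a cut-down version using Python's built-in sqlite3 module instead of MySQL. The `id` column on person, the sample rows and the index names are assumptions made for illustration; only the shape of the join is taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deposition (
        deposition_id TEXT PRIMARY KEY,
        manuscript_number INTEGER NOT NULL,
        folio_start TEXT NOT NULL
    );
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,          -- assumed surrogate key
        deposition_id TEXT NOT NULL,
        surname TEXT
    );
    -- The two kinds of index suggested above (names are made up).
    CREATE INDEX idx_dep_ms_folio ON deposition (manuscript_number, folio_start);
    CREATE INDEX idx_person_dep_id ON person (deposition_id, id);

    INSERT INTO deposition VALUES ('d1', 840, '1r'), ('d2', 840, '2r');
    INSERT INTO person VALUES (1, 'd1', 'Adams'), (2, 'd1', 'Baker');
""")

# The correlated subquery picks exactly one person id per deposition,
# so the outer query needs no GROUP BY to collapse duplicates.
rows = conn.execute("""
    SELECT d.deposition_id, p.surname
    FROM deposition d
    LEFT JOIN person p
      ON p.id = (
          SELECT pi.id
          FROM person pi
          WHERE pi.deposition_id = d.deposition_id
          ORDER BY pi.deposition_id, pi.id
          LIMIT 1
      )
    WHERE d.manuscript_number = 840
    ORDER BY d.manuscript_number, d.folio_start
    LIMIT 20
""").fetchall()

print(rows)
```

Deposition d1 has two people but appears once, with the person that has the lowest id, and d2 still appears even though it has no person at all.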
The poor performance is almost certainly due to a lack of indexes. Your deposition table has no indexes apart from its primary key, and that probably means the other tables you're referencing don't have any useful ones either. You can start by adding an index on the column you filter by in the deposition table. From the MySQL shell or phpMyAdmin, issue the following query:
ALTER TABLE deposition ADD INDEX (manuscript_number);
You know you're on the right track if the query executes faster after adding the index. From there you might want to index the columns referenced in the joins on the other tables. For instance, for this part of your query, "LEFT JOIN person AS p ON p.deposition_id = d.deposition_id", you could try adding an index to the person table using:
ALTER TABLE person ADD INDEX(deposition_id);
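One way to check whether an index is actually picked up is to compare the query plan before and after creating it. The sketch below does this with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN, purely as an illustration; in MySQL you would run EXPLAIN on the real query instead, and the output looks different. The index name and the cut-down person table are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, deposition_id TEXT, surname TEXT)"
)

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT surname FROM person WHERE deposition_id = 'd1'"

plan_before = plan(query)   # no usable index yet: a full table scan
conn.execute("CREATE INDEX idx_person_deposition ON person (deposition_id)")
plan_after = plan(query)    # the new index should now be mentioned

print(plan_before)
print(plan_after)
```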
You only need a LEFT JOIN if the joined table might not have a matching row. Is it possible in your database schema for a person to not have a matching person_type? Or for a deposition to not have a matching row in date? Or for a place to not have a matching county?
For any of those relationships that must exist for the result to make sense, you can change the LEFT JOIN to an INNER JOIN.
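To make the difference concrete, here is a minimal sketch using Python's sqlite3 module with two tiny made-up tables (the rows and the trimmed column lists are assumptions). The same query is run once with LEFT JOIN and once with INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deposition (deposition_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, deposition_id TEXT, surname TEXT);
    INSERT INTO deposition VALUES ('d1', 'With person'), ('d2', 'Without person');
    INSERT INTO person VALUES (1, 'd1', 'Smith');
""")

def join_rows(join_type):
    # join_type is either "LEFT" or "INNER"; everything else is identical.
    sql = f"""
        SELECT d.deposition_id, p.surname
        FROM deposition d
        {join_type} JOIN person p ON p.deposition_id = d.deposition_id
        ORDER BY d.deposition_id
    """
    return conn.execute(sql).fetchall()

left_rows = join_rows("LEFT")    # keeps d2, with NULL for the person columns
inner_rows = join_rows("INNER")  # drops d2, because it has no person

print(left_rows)
print(inner_rows)
```

If every deposition is guaranteed to have a person, both queries return the same rows and the INNER JOIN is the safer choice for the optimizer.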
These columns should have indexes (unique if possible):
person.deposition_id
date.deposition_id
place_link.deposition_id
place_link.place_id
The date table looks like a bad design; I can't think of a reason to have a table of dates instead of just putting a column of type date (or datetime) in the deposition table. And date is a poor name for a table because DATE is also a SQL keyword (the name of a data type and of a function), which makes queries harder to read.
Related
Ok we have inbox table where we keep messages that users send to each other. Here is the table:
CREATE TABLE IF NOT EXISTS `inbox` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`fromid` int(10) unsigned NOT NULL DEFAULT '0',
`toid` int(10) DEFAULT NULL,
`message` text CHARACTER SET utf8 NOT NULL,
`time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `toid` (`toid`),
KEY `fromid` (`fromid`),
KEY `fromid_2` (`fromid`,`toid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
fromid and toid are the IDs of the users. We have their IDs and the times when the messages were sent. What we need is a query that would return all messages that have not been replied to by 'our users' (admins).
Table accounts keeps track of users. To simplify:
CREATE TABLE IF NOT EXISTS `accounts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`our` int(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
So basically, we need a query that gives us the users WHOSE messages WERE NOT ANSWERED by admins (our users), their count and the date of the last message they sent to ADMIN, ordered from last to oldest.
So far we only have some basic queries, we didn't come up with anything reasonable that I could post.
Thanks in advance.
EDIT: From what I see we first need to find last interaction from two DISTINCT users in inbox table... then check & filter only those that were sent TO our users
How about this?
SELECT i.* FROM inbox AS i
WHERE i.toid = 1 AND (i.toid, i.fromid) NOT IN
(SELECT i2.fromid, i2.toid FROM inbox AS i2 WHERE i2.`time` >= i.`time`);
Another way using join:
SELECT DISTINCT i1.*
FROM inbox AS i1 LEFT JOIN inbox AS i2
ON i1.fromid = i2.toid AND
i1.toid = i2.fromid AND
i1.`time` <= i2.`time`
WHERE i1.toid = 1 AND i2.id IS NULL;
Two possible solutions are presented below; the LEFT JOIN solution should perform better.
LEFT JOIN solution
SELECT
i.fromid, COUNT(*) AS unread, MAX(i.time) AS lastmsg
FROM inbox AS i
INNER JOIN accounts AS a
ON i.toid = a.id
LEFT JOIN inbox AS i2
ON i.fromid = i2.toid AND i.toid = i2.fromid AND i.time <= i2.time
WHERE a.our = 1 AND i2.id IS NULL
GROUP BY i.fromid
ORDER BY lastmsg DESC;
NOT IN solution
SELECT
i.fromid, COUNT(*) AS unread, MAX(i.time) AS lastmsg
FROM inbox AS i
INNER JOIN accounts AS a ON i.toid = a.id
WHERE a.our = 1 AND
(i.toid, i.fromid)
NOT IN (SELECT i2.fromid, i2.toid FROM inbox AS i2 WHERE i2.time >= i.time)
GROUP BY i.fromid
ORDER BY lastmsg DESC;
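Here is a small check of the LEFT JOIN variant against made-up data, using Python's sqlite3 module rather than MySQL. The accounts, messages and timestamps are invented for illustration: account 1 is an admin, user 2 was answered once and then wrote again, and user 3 was never answered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, our INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE inbox (
        id INTEGER PRIMARY KEY,
        fromid INTEGER NOT NULL,
        toid INTEGER,
        message TEXT NOT NULL,
        time TEXT NOT NULL
    );
    INSERT INTO accounts VALUES (1, 1), (2, 0), (3, 0);
    INSERT INTO inbox VALUES
        (1, 2, 1, 'hi',        '2020-01-01 10:00:00'),
        (2, 1, 2, 'reply',     '2020-01-01 11:00:00'),
        (3, 2, 1, 'follow-up', '2020-01-01 12:00:00'),
        (4, 3, 1, 'hello',     '2020-01-02 09:00:00'),
        (5, 3, 1, 'anyone?',   '2020-01-02 09:30:00');
""")

# Anti-join: keep a message only if no later message exists in the
# opposite direction between the same two users.
unanswered = conn.execute("""
    SELECT i.fromid, COUNT(*) AS unread, MAX(i.time) AS lastmsg
    FROM inbox AS i
    INNER JOIN accounts AS a ON i.toid = a.id
    LEFT JOIN inbox AS i2
           ON i.fromid = i2.toid AND i.toid = i2.fromid AND i.time <= i2.time
    WHERE a.our = 1 AND i2.id IS NULL
    GROUP BY i.fromid
    ORDER BY lastmsg DESC
""").fetchall()

print(unanswered)
```

User 2's first message is excluded because the admin replied after it, while the follow-up and both of user 3's messages are counted.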
Currently my schema looks like this:
CREATE TABLE IF NOT EXISTS `hours` (
`Project_ID` varchar(10) NOT NULL,
`Project_Name` varchar(50) NOT NULL,
`Res_ID` varchar(40) NOT NULL,
`Date` date NOT NULL,
`Hours` int(10) NOT NULL
);
CREATE TABLE IF NOT EXISTS `project_resources` (
`Project_ID` varchar(10) NOT NULL,
`Res_ID` varchar(40) NOT NULL
);
-- A single project ID can be associated with many resource IDs
CREATE TABLE IF NOT EXISTS `resources` (
`Res_ID` varchar(40) NOT NULL,
`Res_Name` varchar(50) NOT NULL,
`Email` varchar(50) NOT NULL,
`Phone_Number` bigint(12) NOT NULL,
`Reporting_Manager` varchar(50) NOT NULL,
`Role` varchar(50) NOT NULL,
`Designation` varchar(50) NOT NULL,
`Password` varchar(50) NOT NULL
);
Here I am trying to generate a query such that it displays the data in the below format,
Resource Name | Sum(Hours).
I tried executing the following query
SELECT res_name,sum(hours) FROM hours h
INNER JOIN resources r ON h.res_id=r.res_id
WHERE r.res_id = (SELECT res_id FROM `project_resources` WHERE project_id='someproject')
I know this fails with a "Subquery returns more than 1 row" error, but I was wondering what I can do to get this query right.
I think this will help you
Select res_name,sum(hours)
from hours h inner join resources r on h.res_id=r.res_id
where r.res_id IN (
SELECT res_id
FROM `project_resources`
WHERE project_id='someproject'
)
You can use the IN operator in your WHERE clause if your subquery returns more than one row.
You can just use in:
Select res_name,sum(hours)
from hours h inner join
resources r
on h.res_id = r.res_id
where r.res_id in (SELECT res_id
FROM `project_resources`
WHERE project_id = 'someproject'
);
However, I might suggest just doing multiple joins:
Select res_name,sum(hours)
from hours h inner join
resources r
on h.res_id = r.res_id inner join
project_resources pr
on pr.res_id = r.res_id and pr.project_id = 'someproject'
Of course, this will not work if you have duplicates in the project_resources table.
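To see why duplicates matter, here is a small sketch using Python's sqlite3 module with invented rows, where the same (project, resource) pair has been inserted twice into project_resources. The table layouts are cut down to the columns the queries touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resources (res_id TEXT, res_name TEXT);
    CREATE TABLE hours (project_id TEXT, res_id TEXT, hours INTEGER);
    CREATE TABLE project_resources (project_id TEXT, res_id TEXT);
    INSERT INTO resources VALUES ('r1', 'Ann');
    INSERT INTO hours VALUES ('p1', 'r1', 5), ('p1', 'r1', 3);
    -- The same link stored twice by mistake.
    INSERT INTO project_resources VALUES ('p1', 'r1'), ('p1', 'r1');
""")

# JOIN version: every hours row is matched once per duplicate link.
join_total = conn.execute("""
    SELECT SUM(h.hours)
    FROM hours h
    INNER JOIN resources r ON h.res_id = r.res_id
    INNER JOIN project_resources pr
            ON pr.res_id = r.res_id AND pr.project_id = 'p1'
""").fetchone()[0]

# IN version: the subquery is only a membership test, so duplicates are harmless.
in_total = conn.execute("""
    SELECT SUM(h.hours)
    FROM hours h
    INNER JOIN resources r ON h.res_id = r.res_id
    WHERE r.res_id IN (SELECT res_id FROM project_resources WHERE project_id = 'p1')
""").fetchone()[0]

print(join_total, in_total)
```

A unique key on (project_id, res_id) in project_resources would rule out the doubled total in the JOIN version.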
So basically, you want to show 2 things:
Name of the resource
Total amount of hours the resource worked
Under the condition that your Res_ID is in project_resources table and the project id is 'someproject'.
Right?
Then let's break this problem into three small parts:
Part - 1:
To get the name of the resource you should write:
SELECT rs.Res_Name
FROM resources rs
Note that rs is the name of the alias of the table resources.
Ok?
Now Part - 2:
To get the total amount of hours the resource worked you should write:
SELECT SUM(Hours)
FROM hours h
Basically, h is the alias of the table hours. I think you got it, right?
Finally, Part - 3:
Your Project_ID should be 'someproject'.
Also, the Res_ID should be inside project_resources.
Now, let's join all the parts together. Now we get:
SELECT r.Res_Name, SUM(h.Hours)
FROM hours h
INNER JOIN resources r ON h.Res_ID = r.Res_ID
INNER JOIN project_resources pr ON r.Res_ID = pr.Res_ID
WHERE pr.Project_ID = 'someproject'
GROUP BY r.Res_ID, r.Res_Name
Basically, we've first joined hours with resources on Res_ID, then joined resources with project_resources on Res_ID, and restricted the result to rows where Project_ID of project_resources is 'someproject'. The GROUP BY gives one total per resource.
Hopefully, this will give you what you want. You got the idea, right?
However, a word of caution. I've noticed that you're using the same name for the table hours and its column Hours. This won't cause a problem, because table and column names are resolved separately, but it is not really good practice. You should think of a different, meaningful name for your column to avoid confusion. Enjoy coding!!!
You are selecting the total hours, over all projects, of the resources that are (also) used in 'someproject'. As there can be more than one resource associated with 'someproject', use IN to get them all. Then ...
either group by res_name (provided it is unique, else use res_id) to get one result record per resource
or remove res_name from your result (because when there is more than one, you would only show an arbitrary one of them)
or generate a string containing all resource names with GROUP_CONCAT
So either:
SELECT
res_name,
sum(hours)
FROM hours h
INNER JOIN resources r ON h.res_id=r.res_id
WHERE r.res_id IN
(
SELECT res_id
FROM `project_resources`
WHERE project_id='someproject'
)
GROUP BY r.res_id;
Or:
SELECT
GROUP_CONCAT(res_name) AS res_names,
sum(hours)
FROM hours h
INNER JOIN resources r ON h.res_id=r.res_id
WHERE r.res_id IN
(
SELECT res_id
FROM `project_resources`
WHERE project_id='someproject'
);
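As a quick illustration of the grouped variant, here is a sketch using Python's sqlite3 module with made-up rows. Ann and Bob are assigned to project p1, and Cy is not. The names and hours are assumptions, and the GROUP BY column is qualified with the table alias because res_id exists in both joined tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resources (res_id TEXT, res_name TEXT);
    CREATE TABLE hours (project_id TEXT, res_id TEXT, hours INTEGER);
    CREATE TABLE project_resources (project_id TEXT, res_id TEXT);
    INSERT INTO resources VALUES ('r1', 'Ann'), ('r2', 'Bob'), ('r3', 'Cy');
    INSERT INTO hours VALUES
        ('p1', 'r1', 5), ('p1', 'r1', 3), ('p1', 'r2', 4), ('p2', 'r3', 7);
    INSERT INTO project_resources VALUES ('p1', 'r1'), ('p1', 'r2');
""")

# One result row per resource that is linked to project p1.
per_resource = conn.execute("""
    SELECT r.res_name, SUM(h.hours)
    FROM hours h
    INNER JOIN resources r ON h.res_id = r.res_id
    WHERE r.res_id IN (
        SELECT res_id FROM project_resources WHERE project_id = 'p1'
    )
    GROUP BY r.res_id, r.res_name
    ORDER BY r.res_name
""").fetchall()

print(per_resource)
```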
So I have two tables, products_used (approx. 600MB) and products_language_description (about 5MB), but this query never finishes running...
I have tried REPAIR, OPTIMIZE and ANALYZE, and I'm out of ideas for how to improve this...
SELECT pu.products_id, count(pu.products_id) as products_count, p.products_name,
pu.time_used FROM products_used pu, products_language_description p
WHERE pu.merchant_id='69'
AND p.products_id=pu.products_id GROUP BY products_id ORDER BY products_count
DESC LIMIT 0, 20
CREATE TABLE `products_used` (
`products_used_id` INT(15) NOT NULL AUTO_INCREMENT,
`plans_key` VARCHAR(255) NOT NULL DEFAULT '0',
`products_id` BIGINT(20) NOT NULL DEFAULT '0',
`customers_id` INT(10) NOT NULL DEFAULT '0',
`merchant_id` INT(10) NOT NULL DEFAULT '0',
`time_used` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`products_used_id`),
INDEX `plans_key` (`plans_key`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM
AUTO_INCREMENT=24625441;
CREATE TABLE `products_language_description` (
`products_id` INT(5) NOT NULL DEFAULT '0',
`products_description` LONGTEXT NOT NULL,
`products_name` TEXT NOT NULL COLLATE 'utf8_general_ci',
`products_help_info` LONGBLOB NOT NULL,
`products_language` VARCHAR(255) NOT NULL DEFAULT '',
PRIMARY KEY (`products_id`, `products_language`)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
Try adding an index on the products_id and merchant_id columns of the products_used table.
Take a look at your WHERE clause.
WHERE
pu.merchant_id='69'
AND p.products_id=pu.products_id GROUP BY products_id
You are comparing two different data types:
products_id INT(5) NOT NULL DEFAULT '0',
products_id BIGINT(20) NOT NULL DEFAULT '0',
Also, you are using single quotes unnecessarily (merchant_id is an INT column):
pu.merchant_id='69'
Maybe you need to create some indexes on these columns. A foreign key helps too!
Maybe this helps you!
For this query:
SELECT pu.products_id, count(pu.products_id) as products_count,
p.products_name, pu.time_used
FROM products_used pu join
products_language_description p
on p.products_id = pu.products_id
WHERE pu.merchant_id = '69'
GROUP BY pu.products_id
ORDER BY products_count DESC
LIMIT 0, 20
You want an index on products_used(merchant_id, products_id). In MySQL, you can sometimes improve performance by rewriting an aggregation query to use a correlated subquery:
SELECT p.products_id,
(select count(*)
from products_used pu
where pu.products_id = p.products_id
and pu.merchant_id = 69
) as products_count,
p.products_name,
(select max(pu.time_used) -- most recent use; the original query left this value indeterminate
from products_used pu
where pu.products_id = p.products_id
and pu.merchant_id = 69
) as time_used
FROM products_language_description p
ORDER BY products_count DESC
LIMIT 0, 20;
This replaces the outer GROUP BY with an aggregation in the correlated subquery, which should be able to use just the index on products_used(merchant_id, products_id).
EDIT:
Wait. Your problem is these two definitions:
CREATE TABLE `products_used` (
. . .
`products_id` BIGINT(20) NOT NULL DEFAULT '0',
)
. . .
CREATE TABLE `products_language_description` (
`products_id` INT(5) NOT NULL DEFAULT '0',
. . .
)
The join condition compares columns of different data types. Fix the table structures so the columns have the same type, either with ALTER TABLE . . . or by rebuilding the tables.
As Gordon mentioned, I would extend the index on your products_used table to (merchant_id, products_id, time_used) so that it is a covering index and the query does not have to go to the raw data to get your count(). Now, it appears weird to me that you would have multiple instances of the SAME products_id in the products_used table for a given merchant, but that's another thing.
I would do an internal prequery of the product ID, count and time SPECIFIC TO THE MERCHANT you want. Otherwise, as in Gordon's query sample, you are prequerying EVERY product first, then outside of that getting those for the merchant.
I am proposing to prequery the products used by that specific merchant, then once THAT is returned, get the product name.
SELECT
JustByMerchant.products_id,
JustByMerchant.products_count,
p.products_name,
JustByMerchant.time_used
FROM
( select
pu.products_id,
count(*) as products_count,
MAX(pu.time_used) AS time_used
from
products_used pu
where
pu.merchant_id = 69
group by
pu.products_id
order by
COUNT(*) DESC
limit
0, 20 ) JustByMerchant
JOIN products_language_description p
ON JustByMerchant.products_id = p.products_id
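The "aggregate first, then join" idea can be tried out on a few invented rows with Python's sqlite3 module. This is a sketch under assumptions: the sample data, the index name and the `last_used` alias are made up, MAX(time_used) is used so the grouped value is well defined, and an outer ORDER BY is added because the order of rows coming out of a derived table is not guaranteed to survive the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products_used (
        products_used_id INTEGER PRIMARY KEY,
        products_id INTEGER NOT NULL,
        merchant_id INTEGER NOT NULL,
        time_used TEXT NOT NULL
    );
    CREATE TABLE products_language_description (
        products_id INTEGER NOT NULL,
        products_language TEXT NOT NULL,
        products_name TEXT NOT NULL,
        PRIMARY KEY (products_id, products_language)
    );
    CREATE INDEX idx_pu_merchant_product
        ON products_used (merchant_id, products_id, time_used);

    INSERT INTO products_language_description VALUES
        (1, 'en', 'Widget'), (2, 'en', 'Gadget');
    INSERT INTO products_used (products_id, merchant_id, time_used) VALUES
        (1, 69, '2020-01-01 10:00:00'),
        (1, 69, '2020-01-02 10:00:00'),
        (1, 69, '2020-01-03 10:00:00'),
        (2, 69, '2020-01-04 10:00:00'),
        (2, 70, '2020-01-05 10:00:00');
""")

# The derived table reduces products_used to at most 20 rows for one
# merchant before the (wide) description table is touched at all.
top_products = conn.execute("""
    SELECT jbm.products_id, jbm.products_count, p.products_name, jbm.last_used
    FROM (
        SELECT pu.products_id,
               COUNT(*) AS products_count,
               MAX(pu.time_used) AS last_used
        FROM products_used pu
        WHERE pu.merchant_id = 69
        GROUP BY pu.products_id
        ORDER BY products_count DESC
        LIMIT 20
    ) AS jbm
    JOIN products_language_description p ON p.products_id = jbm.products_id
    ORDER BY jbm.products_count DESC
""").fetchall()

print(top_products)
```

Note that products_language_description is keyed on (products_id, products_language), so with more than one language per product the join would return one row per language unless a language filter is added.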
I currently have database setup like so:
CREATE TABLE `article` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` int(11) NOT NULL,
`body` int(11) NOT NULL,
`link` int(11) NOT NULL,
`date` datetime NOT NULL,
PRIMARY KEY (`id`)
)
CREATE TABLE `translation_pivot` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` text,
PRIMARY KEY (`id`)
)
This is quite a simplified version to illustrate the question. Essentially, translation_pivot is used to further look up text strings from a series of language tables, but that's not relevant. Here, the title, body and link columns in article contain an id referencing content from translation_pivot.
The difficulty is that doing an INNER JOIN will result in a column called content, which will contain only the first match from translation_pivot, in this case title.
The other option I've looked into is using GROUP_CONCAT on translation_pivot.content. This will work but then I'm left with a comma separated list of items and lose the obvious relation to title, body and link other than as the first, second and third items (which is ok, but not great). The more serious problem is that items in the translation can be several paragraphs of text. The default value for group_concat_max_len is 1024, which I can change but does this have performance implications if set to high values?
Ideally I'd like a way of replacing the title, body and link columns with the textual result from translation_pivot, or at least getting the textual content back for each as a separate column. Is this possible in a single query?
My other alternative is to retrieve key, value pairs as an array from translation_pivot with the id as the key and then do a lookup after querying the articles. This is only an extra query and probably a lot simpler.
Which solution will scale best? Or is there something else I'm missing?
Just do multiple joins:
SELECT
article.id AS id,
tptitle.content AS title,
tpbody.content AS body,
tplink.content AS link,
article.`date` AS `date`
FROM
article
INNER JOIN translation_pivot AS tptitle ON article.title=tptitle.id
INNER JOIN translation_pivot AS tpbody ON article.body=tpbody.id
INNER JOIN translation_pivot AS tplink ON article.link=tplink.id
or:
SELECT
article.id AS id,
IFNULL(tptitle.content,'DEFAULT TITLE') AS title,
IFNULL(tpbody.content, 'DEFAULT BODY') AS body,
IFNULL(tplink.content, 'DEFAULT LINK') AS link,
article.`date` AS `date`
FROM
article
LEFT JOIN translation_pivot AS tptitle ON article.title=tptitle.id
LEFT JOIN translation_pivot AS tpbody ON article.body=tpbody.id
LEFT JOIN translation_pivot AS tplink ON article.link=tplink.id
Link to the translation_pivot table for each of title, body and link - like so:
select a.`id`,
a.`date`,
t.`content` title_content,
b.`content` body_content,
l.`content` link_content
from `article` a
left join `translation_pivot` t on a.`title` = t.`id`
left join `translation_pivot` b on a.`body` = b.`id`
left join `translation_pivot` l on a.`link` = l.`id`
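A minimal runnable sketch of joining the same table three times under different aliases, using Python's sqlite3 module. The single article and its three translation rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        title INTEGER NOT NULL,
        body INTEGER NOT NULL,
        link INTEGER NOT NULL,
        date TEXT NOT NULL
    );
    CREATE TABLE translation_pivot (id INTEGER PRIMARY KEY, content TEXT);
    INSERT INTO translation_pivot VALUES
        (1, 'Hello'), (2, 'Body text'), (3, 'http://example.com');
    INSERT INTO article VALUES (1, 1, 2, 3, '2020-01-01 00:00:00');
""")

# Each alias (t, b, l) is an independent lookup into translation_pivot,
# so every referenced string comes back in its own named column.
articles = conn.execute("""
    SELECT a.id,
           t.content AS title_content,
           b.content AS body_content,
           l.content AS link_content
    FROM article a
    LEFT JOIN translation_pivot t ON a.title = t.id
    LEFT JOIN translation_pivot b ON a.body = b.id
    LEFT JOIN translation_pivot l ON a.link = l.id
""").fetchall()

print(articles)
```

Because each join is a primary-key lookup, this avoids both GROUP_CONCAT and the group_concat_max_len limit mentioned in the question.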
I am trying to make a printable page that shows all the sales of a specified manufacturer, listing all the products, between specified dates. If there have not been any sales of a product, it should display 0.
The tables
// Manufacturer table
// mid, manufacturer
// Products table
// pid, product, ref_manufacturer_id
// Orders table
// oid, orderPrice, orderDateTime, ref_product_id
And the query that works (without date limitation)
SELECT prod.product, COALESCE(COUNT(pord.oid),0) AS orderCount,
COALESCE(SUM(pord.orderPrice),0) AS orderSum
FROM product_manufacturer AS manu
JOIN product_list AS prod ON prod.ref_manufacturer_id = manu.mid
LEFT JOIN product_orders AS pord ON pord.ref_product_id = prod.pid
WHERE manu.mid = :manu_id
GROUP BY prod.product;
But the problem starts as soon as I add this to the WHERE clause:
WHERE manu.mid = :manu_id AND DATE(pord.orderDateTime) BETWEEN :orders_start AND :orders_end
I am using PHP PDO to connect, verifying that manu_id is an int, and converting orders_start/orders_end to MySQL date format.
What I am trying to find out is why, when I add the date restriction, every product that was not ordered disappears from the output.
SQL on creating the tables
CREATE TABLE product_list (
pid bigint(20) unsigned NOT NULL AUTO_INCREMENT,
product varchar(255) NOT NULL,
ref_manufacturer_id bigint(20) unsigned NOT NULL,
PRIMARY KEY (pid),
KEY ref_manufacturer_id (ref_manufacturer_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
CREATE TABLE product_manufacturer (
mid bigint(20) unsigned NOT NULL AUTO_INCREMENT,
manufacturer varchar(255) NOT NULL,
PRIMARY KEY (mid),
UNIQUE KEY manufacturer (manufacturer)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
CREATE TABLE product_orders (
oid bigint(20) unsigned NOT NULL AUTO_INCREMENT,
orderPrice float(10,2) NOT NULL,
orderDatetime timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
ref_product_id bigint(20) unsigned NOT NULL,
PRIMARY KEY (oid),
KEY ref_product_id (ref_product_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
What you need is to move the orderDateTime criterion from the WHERE clause to the join condition, like this:
SELECT prod.product, COALESCE(COUNT(pord.oid),0) AS orderCount,
COALESCE(SUM(pord.orderPrice),0) AS orderSum
FROM product_manufacturer AS manu
JOIN product_list AS prod ON prod.ref_manufacturer_id = manu.mid
LEFT JOIN product_orders AS pord
ON pord.ref_product_id = prod.pid
AND DATE(pord.orderDateTime) BETWEEN :orders_start AND :orders_end
WHERE manu.mid = :manu_id
GROUP BY prod.product;
The reason it does not work within the WHERE clause is the NULL values returned from the outer join. When there is no row in product_orders for a product, the outer join returns NULL for orderDateTime, the BETWEEN test on that NULL evaluates to NULL rather than true, and the WHERE clause filters the row out.
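The effect can be reproduced on a few invented rows. The sketch below uses Python's sqlite3 module (not MySQL or PDO) and runs the same query twice, once with the date filter in the WHERE clause and once in the join condition. The product names, prices and dates are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_manufacturer (mid INTEGER PRIMARY KEY, manufacturer TEXT);
    CREATE TABLE product_list (
        pid INTEGER PRIMARY KEY, product TEXT, ref_manufacturer_id INTEGER
    );
    CREATE TABLE product_orders (
        oid INTEGER PRIMARY KEY, orderPrice REAL,
        orderDateTime TEXT, ref_product_id INTEGER
    );
    INSERT INTO product_manufacturer VALUES (1, 'Acme');
    INSERT INTO product_list VALUES (1, 'Anvil', 1), (2, 'Rocket', 1);
    INSERT INTO product_orders VALUES
        (1, 10.0, '2020-01-15 09:00:00', 1),
        (2,  5.0, '2020-03-01 09:00:00', 1);
""")

BASE = """
    SELECT prod.product,
           COUNT(pord.oid) AS orderCount,
           COALESCE(SUM(pord.orderPrice), 0) AS orderSum
    FROM product_manufacturer AS manu
    JOIN product_list AS prod ON prod.ref_manufacturer_id = manu.mid
    LEFT JOIN product_orders AS pord
           ON pord.ref_product_id = prod.pid {on_extra}
    WHERE manu.mid = :manu_id {where_extra}
    GROUP BY prod.product
    ORDER BY prod.product
"""
DATE_FILTER = "AND DATE(pord.orderDateTime) BETWEEN :orders_start AND :orders_end"
params = {"manu_id": 1, "orders_start": "2020-01-01", "orders_end": "2020-01-31"}

# Filter in WHERE: the NULL-extended row for the unordered product is discarded.
filtered_in_where = conn.execute(
    BASE.format(on_extra="", where_extra=DATE_FILTER), params
).fetchall()

# Filter in ON: out-of-range orders simply fail to match, and the product stays.
filtered_in_on = conn.execute(
    BASE.format(on_extra=DATE_FILTER, where_extra=""), params
).fetchall()

print(filtered_in_where)
print(filtered_in_on)
```

Only the second form keeps 'Rocket', which has no orders, in the result with a count and sum of 0.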
Try:
SELECT p.product,
COALESCE(o.orderCount, 0) as orderCount,
COALESCE(o.orderSum,0) AS orderSum
FROM product_manufacturer AS m
JOIN product_list AS p ON p.ref_manufacturer_id = m.mid
LEFT JOIN (
SELECT ref_product_id as pid, COUNT(oid) AS orderCount, SUM(orderPrice) AS orderSum
FROM product_orders
WHERE DATE(orderDateTime) BETWEEN :orders_start AND :orders_end
GROUP BY ref_product_id
) AS o ON p.pid = o.pid
WHERE m.mid = :manu_id
Edit: Corrected after ypercube comment.
try this on the where clause.
WHERE manu.mid = :manu_id AND (DATE(pord.orderDateTime) BETWEEN :orders_start AND :orders_end)
It might be reading the second AND function as another where clause that the statement should return true. Just a hunch on that. Let me know if this does the trick.
I don't know how your specific system works, but it may be that orderDateTime is not set (i.e., NULL or something else) until that product gets ordered. You may want to try:
WHERE manu.mid = :manu_id AND ((DATE(pord.orderDateTime) BETWEEN :orders_start AND :orders_end) OR pord.orderDateTime IS NULL)
If this is not the case, could you give an example of the orderDateTime value for something that is not showing up when you want it to?
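A note on testing for NULL: comparing with `=` never yields true for NULL, so a NULL check has to be written with IS NULL. A tiny sketch with Python's sqlite3 module shows the difference (MySQL follows the same standard SQL rule here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "= NULL" evaluates to NULL (unknown), while "IS NULL" evaluates to true.
equals_null, is_null = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()

print(equals_null, is_null)
```

Even with IS NULL, a WHERE-based filter of this kind still drops products whose only orders fall outside the date range, which is why putting the date test in the join condition is the more robust fix.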