After some searching and reading, I came up with the following SQL query for my application:
SELECT
ROUND(AVG(CASE WHEN gender = 'M' THEN rating END), 1) avgAllM,
COUNT(CASE WHEN gender = 'M' THEN rating END) countAllM,
ROUND(AVG(CASE WHEN gender = 'F' THEN rating END), 1) avgAllF,
COUNT(CASE WHEN gender = 'F' THEN rating END) countAllF,
ROUND(AVG(CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END), 1) avgU18M,
COUNT(CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END) countU18M,
ROUND(AVG(CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END), 1) avgU18F,
COUNT(CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END) countU18F
FROM movie_ratings mr INNER JOIN accounts a
ON mr.aid = a.aid
WHERE mid = 5;
I'm wondering how I can simplify this, if possible. The birth_date field is of type DATE, and UserAge is a function that calculates the age from that date field.
The table structures are as follows:
[ACCOUNTS]
aid(PK), birth_date, gender
[MOVIE_RATINGS]
mid(PK), aid(PK,FK), rating
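As a sanity check, here is a minimal sketch that runs the query above against an in-memory SQLite database from Python. The sample rows, the fixed reference date and `user_age` (a stand-in for the MySQL `UserAge()` function, whose real definition isn't shown) are assumptions for illustration. What it shows is that `AVG()` and `COUNT()` skip the NULLs that each `CASE ... END` produces for rows outside its bucket.

```python
import sqlite3
from datetime import date

REF_DATE = date(2024, 1, 1)  # hypothetical fixed "today" so the output is deterministic


def user_age(birth_date: str) -> int:
    """Hypothetical stand-in for MySQL UserAge(): whole years old at REF_DATE."""
    b = date.fromisoformat(birth_date)
    return REF_DATE.year - b.year - ((REF_DATE.month, REF_DATE.day) < (b.month, b.day))


conn = sqlite3.connect(":memory:")
conn.create_function("UserAge", 1, user_age)
conn.executescript("""
CREATE TABLE accounts (aid INTEGER PRIMARY KEY, birth_date TEXT, gender TEXT);
CREATE TABLE movie_ratings (mid INTEGER, aid INTEGER REFERENCES accounts (aid),
                            rating REAL, PRIMARY KEY (mid, aid));
INSERT INTO accounts VALUES (1, '2010-05-01', 'M'),  -- 13 at REF_DATE
                            (2, '1980-02-10', 'M'),  -- 43
                            (3, '2008-07-20', 'F'),  -- 15
                            (4, '1990-11-30', 'F');  -- 33
INSERT INTO movie_ratings VALUES (5, 1, 4), (5, 2, 2), (5, 3, 5), (5, 4, 3), (6, 1, 1);
""")

# The question's query, unchanged apart from a bound parameter for mid.
cur = conn.execute("""
SELECT
ROUND(AVG(CASE WHEN gender = 'M' THEN rating END), 1) avgAllM,
COUNT(CASE WHEN gender = 'M' THEN rating END) countAllM,
ROUND(AVG(CASE WHEN gender = 'F' THEN rating END), 1) avgAllF,
COUNT(CASE WHEN gender = 'F' THEN rating END) countAllF,
ROUND(AVG(CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END), 1) avgU18M,
COUNT(CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END) countU18M,
ROUND(AVG(CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END), 1) avgU18F,
COUNT(CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END) countU18F
FROM movie_ratings mr INNER JOIN accounts a
ON mr.aid = a.aid
WHERE mid = ?
""", (5,))

# Pair column names with values, like the flat PHP associative array.
result = dict(zip([col[0] for col in cur.description], cur.fetchone()))
print(result)
```

Building the dict from `cursor.description` mirrors the flat `$info['avgAllM']` layout the question starts from.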
I'm looking for two things:
1. General simplifications to the code above that more experienced users know about and I don't.
2. I'm doing this in PHP, and for each record I'll get an associative array with all those variables. I'm looking for a way to group them into a multidimensional array so the PHP code is easier to read. Of course, I don't want to do the grouping in PHP itself; that would be pointless.
For instance, something like this:
$info[0]['avgAllM']
$info[0]['countAllM']
$info[1]['avgAllF']
$info[1]['countAllF']
$info[2]['avgU18M']
$info[2]['countU18M']
$info[3]['avgU18F']
$info[3]['countU18F']
Instead of:
$info['avgAllM']
$info['countAllM']
$info['avgAllF']
$info['countAllF']
$info['avgU18M']
$info['countU18M']
$info['avgU18F']
$info['countU18F']
I don't even know whether this is possible, so I'd like to know if it is and how it can be done.
Why do I want all this? Well, the SQL query above is just a fragment of the complete query I need. I haven't written the rest yet because, before doing all that work, I want to know if there's a more compact SQL query that achieves the same result. Basically, I'll add a few more lines like the ones above but with different conditions, especially on the date.
You could create a VIEW with the following definition (mid is included as a column, so the view is not tied to a single movie):
SELECT
mr.mid,
CASE WHEN gender = 'M' THEN rating END AS AllM,
CASE WHEN gender = 'F' THEN rating END AS AllF,
CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END AS U18M,
CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END AS U18F
FROM movie_ratings mr INNER JOIN accounts a
ON mr.aid = a.aid
Then SELECT from that view:
SELECT ROUND(AVG(AllM), 1) avgAllM,
COUNT(AllM) countAllM,
ROUND(AVG(AllF), 1) avgAllF,
COUNT(AllF) countAllF,
ROUND(AVG(U18M), 1) avgU18M,
COUNT(U18M) countU18M,
ROUND(AVG(U18F), 1) avgU18F,
COUNT(U18F) countU18F
FROM yourview
WHERE mid = 5
That might simplify things slightly.
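A minimal sketch of the view approach, run from Python against an in-memory SQLite database. The sketch exposes `mid` as a view column so that the movie filter is applied in the outer query rather than fixed inside the view. The view name `rating_buckets`, the sample rows and the `user_age` stand-in for `UserAge()` are hypothetical.

```python
import sqlite3
from datetime import date

REF_DATE = date(2024, 1, 1)  # hypothetical fixed "today"


def user_age(birth_date: str) -> int:
    """Hypothetical stand-in for MySQL UserAge(): whole years old at REF_DATE."""
    b = date.fromisoformat(birth_date)
    return REF_DATE.year - b.year - ((REF_DATE.month, REF_DATE.day) < (b.month, b.day))


conn = sqlite3.connect(":memory:")
conn.create_function("UserAge", 1, user_age)
conn.executescript("""
-- Allow the app-defined UserAge() inside a view (ignored by SQLite builds without this pragma).
PRAGMA trusted_schema = ON;
CREATE TABLE accounts (aid INTEGER PRIMARY KEY, birth_date TEXT, gender TEXT);
CREATE TABLE movie_ratings (mid INTEGER, aid INTEGER, rating REAL, PRIMARY KEY (mid, aid));
INSERT INTO accounts VALUES (1, '2010-05-01', 'M'), (2, '1980-02-10', 'M'),
                            (3, '2008-07-20', 'F'), (4, '1990-11-30', 'F');
INSERT INTO movie_ratings VALUES (5, 1, 4), (5, 2, 2), (5, 3, 5), (5, 4, 3), (6, 1, 1);

-- One column per bucket: the rating if the row belongs to the bucket, otherwise NULL.
CREATE VIEW rating_buckets AS
SELECT mr.mid,
       CASE WHEN gender = 'M' THEN rating END AS AllM,
       CASE WHEN gender = 'F' THEN rating END AS AllF,
       CASE WHEN gender = 'M' AND UserAge(birth_date) <= 18 THEN rating END AS U18M,
       CASE WHEN gender = 'F' AND UserAge(birth_date) <= 18 THEN rating END AS U18F
FROM movie_ratings mr INNER JOIN accounts a ON mr.aid = a.aid;
""")

# AVG/COUNT ignore the NULLs, so each aggregate only sees its own bucket.
cur = conn.execute("""
SELECT ROUND(AVG(AllM), 1) AS avgAllM, COUNT(AllM) AS countAllM,
       ROUND(AVG(AllF), 1) AS avgAllF, COUNT(AllF) AS countAllF,
       ROUND(AVG(U18M), 1) AS avgU18M, COUNT(U18M) AS countU18M,
       ROUND(AVG(U18F), 1) AS avgU18F, COUNT(U18F) AS countU18F
FROM rating_buckets
WHERE mid = ?
""", (5,))
result = dict(zip([col[0] for col in cur.description], cur.fetchone()))
print(result)
```

The outer query stays short because the bucket logic lives in one place, which matters once more date conditions are added.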
This could just be a case of optimizing too early. The query does what you need and only really looks complicated because it is. I'm not sure that there are necessarily any tricks that would help. It probably depends on the characteristics of your data. Is the query slow? Do you think it could be quicker?
It might be worth rearranging it in the following way. Since all the conditions rely on the ACCOUNTS table, which I assume is significantly smaller than the MOVIE_RATINGS table, you might be able to do all the calculations on a smaller data set, which might be quicker. However, if you are only selecting one movie at a time (mid = 5), that probably won't be the case.
I'm not entirely sure this will work, but I think it should.
SELECT
ROUND(AVG(rating * AllM), 1) avgAllM,
COUNT(rating * AllM) countAllM,
ROUND(AVG(rating * AllF), 1) avgAllF,
COUNT(rating * AllF) countAllF,
ROUND(AVG(rating * AllM * U18), 1) avgU18M,
COUNT(rating * AllM * U18) countU18M,
ROUND(AVG(rating * AllF * U18), 1) avgU18F,
COUNT(rating * AllF * U18) countU18F
FROM
movie_ratings mr
INNER JOIN (
select
aid,
case when gender = 'M' then 1 end as AllM,
case when gender = 'F' then 1 end as AllF,
case when UserAge(birth_date) <= 18 then 1 end as U18
from accounts) a ON mr.aid = a.aid
WHERE mid = 5;
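The trick in this query is that `CASE ... THEN 1 END` yields either 1 or NULL, and multiplying a rating by NULL gives NULL, which `AVG()` and `COUNT()` then skip. Here is a minimal SQLite sketch comparing it with the plain CASE form. The `is_u18` column is a hypothetical precomputed flag standing in for `UserAge(birth_date) <= 18`, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- is_u18 is a hypothetical precomputed flag replacing UserAge(birth_date) <= 18.
CREATE TABLE accounts (aid INTEGER PRIMARY KEY, gender TEXT, is_u18 INTEGER);
CREATE TABLE movie_ratings (mid INTEGER, aid INTEGER, rating REAL);
INSERT INTO accounts VALUES (1, 'M', 1), (2, 'M', 0), (3, 'F', 1), (4, 'F', 0);
INSERT INTO movie_ratings VALUES (5, 1, 4), (5, 2, 2), (5, 3, 5), (5, 4, 3);
""")

MULTIPLY = """
SELECT ROUND(AVG(rating * AllM), 1), COUNT(rating * AllM),
       ROUND(AVG(rating * AllF * U18), 1), COUNT(rating * AllF * U18)
FROM movie_ratings mr
INNER JOIN (
    SELECT aid,
           CASE WHEN gender = 'M' THEN 1 END AS AllM,  -- 1 or NULL
           CASE WHEN gender = 'F' THEN 1 END AS AllF,
           CASE WHEN is_u18 = 1 THEN 1 END AS U18
    FROM accounts) a ON mr.aid = a.aid
WHERE mid = 5
"""

PLAIN_CASE = """
SELECT ROUND(AVG(CASE WHEN gender = 'M' THEN rating END), 1),
       COUNT(CASE WHEN gender = 'M' THEN rating END),
       ROUND(AVG(CASE WHEN gender = 'F' AND is_u18 = 1 THEN rating END), 1),
       COUNT(CASE WHEN gender = 'F' AND is_u18 = 1 THEN rating END)
FROM movie_ratings mr INNER JOIN accounts a ON mr.aid = a.aid
WHERE mid = 5
"""

multiplied = conn.execute(MULTIPLY).fetchone()
plain = conn.execute(PLAIN_CASE).fetchone()
print(multiplied, plain)
```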
On balance, though, I would probably just leave the query you have as it is. It is easy to understand and probably performs fairly well.
I've been searching the web for information about pagination.
From what I've seen, there are three kinds: (a) LIMIT/OFFSET, (b) WHERE id > :num ORDER BY id LIMIT 10, and (c) cursor pagination, like the kind used on Facebook and Twitter.
I decided to go with option "b" for my project, as it looks pretty straightforward and efficient.
I'm trying to create a Facebook-like post and comment system, but less complex.
I have a ranking system for posts and comments, and the top 2 comments of each post are fetched together with the post.
The rest of a post's comments are fetched when users click to see more comments.
This is the query for a post's comments:
SELECT
c.commentID,
c.externalPostID,
c.numOfLikes,
c.createdAt,
c.customerID,
c.numOfComments,
(CASE WHEN cl.customerID IS NULL THEN false ELSE true END) isLiked,
cc.text,
cu.reputation,
cu.firstName,
cu.lastName,
c.ranking
FROM
(SELECT *
FROM Comments
WHERE Comments.externalPostID = :externalPostID) c
LEFT JOIN CommentLikes cl ON cl.commentID = c.commentID AND cl.customerID = :customerID
INNER JOIN CommentContent cc ON cc.commentTextID = c.commentID
INNER JOIN Customers cu ON cu.customerID = c.customerID
ORDER BY c.weight DESC, c.createdAt ASC LIMIT 10 OFFSET 2
The OFFSET 2 is there because the top 2 comments were already fetched with the post.
I'm looking for a way, like option "b", to seek the next 10 comments each time without the database going through all the skipped rows, as it does with LIMIT/OFFSET.
The problem is that the results are sorted by two columns, which doesn't allow me to use this method:
SELECT * FROM Comments WHERE id > :lastId ORDER BY id LIMIT :limit;
Huge thanks to anyone who helps!
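One swapped-in technique that handles two sort columns directly is keyset ("seek") pagination on the full sort key. You remember the last row of the page and ask for rows that come strictly after it in `weight DESC, createdAt ASC` order, with `commentID` added as a tie-breaker so the key is unique. Here is a minimal SQLite sketch. The reduced Comments table, the sample rows, the index name and the `fetch_page` helper are hypothetical, and the other joins from the query above are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Comments (commentID INTEGER PRIMARY KEY,
                externalPostID INTEGER, weight INTEGER, createdAt TEXT)""")
# Hypothetical sample data: 12 comments on post 7, weights 0..2, one per day.
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?, ?)",
                 [(i, 7, i % 3, f"2024-01-{i:02d} 10:00:00") for i in range(1, 13)])
# An index in the same order as the sort key lets the database seek instead of scan.
conn.execute("CREATE INDEX ix_comments_seek "
             "ON Comments (externalPostID, weight DESC, createdAt, commentID)")

ORDER = "ORDER BY weight DESC, createdAt ASC, commentID ASC LIMIT :limit"
FIRST_PAGE = f"SELECT commentID, weight, createdAt FROM Comments WHERE externalPostID = :post {ORDER}"
NEXT_PAGE = f"""
SELECT commentID, weight, createdAt FROM Comments
WHERE externalPostID = :post
  AND (weight < :w                                      -- lower weight sorts later (DESC)
       OR (weight = :w AND (createdAt > :t              -- same weight: later time (ASC)
            OR (createdAt = :t AND commentID > :id))))  -- full tie: higher ID
{ORDER}"""


def fetch_page(post, limit, last=None):
    """Return up to `limit` comments that sort after `last` (a row of the previous page)."""
    if last is None:
        return conn.execute(FIRST_PAGE, {"post": post, "limit": limit}).fetchall()
    last_id, last_weight, last_time = last
    return conn.execute(NEXT_PAGE, {"post": post, "w": last_weight, "t": last_time,
                                    "id": last_id, "limit": limit}).fetchall()


pages, last = [], None
while True:
    page = fetch_page(7, 5, last)
    if not page:
        break
    pages.extend(page)
    last = page[-1]

everything = fetch_page(7, -1)  # in SQLite, a negative LIMIT means "no limit"
```

Walking the pages this way yields exactly the same rows, in the same order, as one unpaginated query, without an OFFSET.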
Solution So Far:
To paginate efficiently, we need a single column whose values are as unique as possible and form a sequence we can sort and paginate on.
My example sorts on two columns, which is what causes the problem.
What I did was combine time (ascending sort order) and the comment's weight (descending sort order); weight is a measure of how much users engage with the comment.
I did this by converting the DateTime to a plain integer and dividing it by the weight; let's call the result "ranking".
This way, a comment with weight will always have a lower ranking than a comment without weight.
After stripping, the DateTime is a 14-digit integer (YYYYMMDDhhmmss), so dividing it by another number shouldn't be a problem.
So now we have one column that sorts the comments so that comments with engagement come first, followed by the older comments, and so on until the most recently posted comments at the end.
Now we can use this high-performance pagination method, which scales well:
SELECT * FROM Comments WHERE ranking > :lastRanking ORDER BY ranking ASC LIMIT :limit;
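The ranking idea above can be sketched in SQLite from Python. This rests on a few assumptions. `ranking()` is a hypothetical helper that computes the value in application code, and a comment with no engagement is assumed to have weight 1, since weight 0 would divide by zero. `commentID` is added as a tie-breaker because two comments can end up with the same ranking, in which case `ranking > :lastRanking` alone would skip rows at a page boundary.

```python
import sqlite3
from datetime import datetime


def ranking(created_at: datetime, weight: int) -> float:
    """The post's idea: strip the DateTime to a 14-digit integer (YYYYMMDDhhmmss)
    and divide it by the weight. Assumes weight >= 1 (no division by zero)."""
    return int(created_at.strftime("%Y%m%d%H%M%S")) / weight


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comments (commentID INTEGER PRIMARY KEY, ranking REAL)")
conn.execute("CREATE INDEX ix_comments_ranking ON Comments (ranking, commentID)")
# Hypothetical sample comments: (id, created, weight); weight 1 = no engagement.
sample = [(1, datetime(2024, 1, 1, 9, 0), 1), (2, datetime(2024, 1, 1, 10, 0), 3),
          (3, datetime(2024, 1, 2, 8, 0), 1), (4, datetime(2024, 1, 3, 12, 0), 2),
          (5, datetime(2024, 1, 4, 7, 30), 1)]
conn.executemany("INSERT INTO Comments VALUES (?, ?)",
                 [(cid, ranking(created, w)) for cid, created, w in sample])


def fetch_after(last, limit):
    """Seek past the last (commentID, ranking) row; commentID breaks ties
    (an addition: the post's query compares ranking alone)."""
    if last is None:
        return conn.execute("SELECT commentID, ranking FROM Comments "
                            "ORDER BY ranking, commentID LIMIT ?", (limit,)).fetchall()
    last_id, last_rank = last
    return conn.execute("SELECT commentID, ranking FROM Comments "
                        "WHERE ranking > ? OR (ranking = ? AND commentID > ?) "
                        "ORDER BY ranking, commentID LIMIT ?",
                        (last_rank, last_rank, last_id, limit)).fetchall()


order, last = [], None
while page := fetch_after(last, 2):
    order.extend(cid for cid, _ in page)
    last = page[-1]
print(order)
```

With these sample rows, the weighted comments 2 and 4 come first, then the unweighted ones from oldest to newest.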
OK, I'd like to suggest another way, which in my opinion is very useful.
$rowCount = 10; // number of rows fetched each time
$page = 1; // used to calculate the offset; increase only this value for each new page
$offset = ($page - 1) * $rowCount; // number of rows to skip
SELECT
c.commentID,
c.externalPostID,
c.numOfLikes,
c.createdAt,
c.customerID,
c.numOfComments,
(CASE WHEN cl.customerID IS NULL THEN false ELSE true END) isLiked,
cc.text,
cu.reputation,
cu.firstName,
cu.lastName,
c.ranking
FROM
(SELECT *
FROM Comments
WHERE Comments.externalPostID = :externalPostID) c
LEFT JOIN CommentLikes cl ON cl.commentID = c.commentID AND cl.customerID = :customerID
INNER JOIN CommentContent cc ON cc.commentTextID = c.commentID
INNER JOIN Customers cu ON cu.customerID = c.customerID
ORDER BY c.ranking DESC, c.createdAt ASC LIMIT $rowCount OFFSET $offset
There may be an error because I didn't test it, so please bear with me.
I'm having some problems with a query that finds the next ID of an order with certain filters applied, such as the order being from a specific city.
Currently it's used in a function that returns either the previous or the next ID relative to the current order. So it can be either min(id) or max(id), where max(id) is obviously faster, since it has to go through fewer rows.
The query works just fine, but it's rather slow, since it goes through 123953 rows to find the ID. Is there any way I could optimize this?
Function example:
SELECT $minmax(orders.orders_id) AS new_id FROM orders LEFT JOIN orders_status ON orders.orders_status = orders_status.orders_status_id $where_join WHERE orders_status.language_id = '$languages_id' AND orders.orders_date_finished != '1900-01-01 00:00:00' AND orders.orders_id $largersmaller $ordersid $where;
Live example:
SELECT min(orders.orders_id)
FROM orders
LEFT JOIN orders_status ON orders.orders_status = orders_status.orders_status_id
WHERE orders_status.language_id = '4'
AND orders.orders_date_finished != '1900-01-01 00:00:00'
AND orders.orders_id < 4868771
LIMIT 1
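Because the WHERE clause filters on orders_status.language_id, the LEFT JOIN already behaves like an inner join, so a plain JOIN is enough. And instead of an aggregate, you can sort on the ID and take the first row. So, concluding: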
SELECT orders.orders_id
FROM orders
JOIN orders_status ON orders.orders_status = orders_status.orders_status_id
WHERE orders_status.language_id = '4'
AND orders.orders_date_finished != '1900-01-01 00:00:00'
AND orders.orders_id < 4868771
ORDER BY orders.orders_id ASC
LIMIT 1
Extra:
To get the MAX value, use DESC where ASC is now.
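Here is a minimal SQLite sketch of the ORDER BY ... LIMIT 1 form, with the user-supplied values passed as bound parameters, as a prepared statement would do. The tables are reduced to the columns the query uses, the sample rows are made up, and `order_id_below` is a hypothetical helper. The sort direction is whitelisted rather than bound, because placeholders can only stand for values, not keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_status (orders_status_id INTEGER, language_id TEXT);
CREATE TABLE orders (orders_id INTEGER PRIMARY KEY, orders_status INTEGER,
                     orders_date_finished TEXT);
INSERT INTO orders_status VALUES (1, '4'), (2, '4'), (1, '1');
INSERT INTO orders VALUES (100, 1, '2015-01-01 00:00:00'),
                          (200, 2, '1900-01-01 00:00:00'),  -- not finished: filtered out
                          (300, 2, '2015-02-01 00:00:00'),
                          (400, 1, '2015-03-01 00:00:00');
""")


def order_id_below(current_id, language_id, direction="ASC"):
    """ASC returns the same ID as MIN(), DESC the same as MAX()."""
    # ORDER BY direction cannot be a bound parameter, so whitelist it instead.
    direction = {"ASC": "ASC", "DESC": "DESC"}[direction]
    sql = f"""
    SELECT orders.orders_id
    FROM orders
    JOIN orders_status ON orders.orders_status = orders_status.orders_status_id
    WHERE orders_status.language_id = ?
      AND orders.orders_date_finished != '1900-01-01 00:00:00'
      AND orders.orders_id < ?
    ORDER BY orders.orders_id {direction}
    LIMIT 1"""
    # The values are bound by the driver, never interpolated into the SQL text.
    row = conn.execute(sql, (language_id, current_id)).fetchone()
    return None if row is None else row[0]


first = order_id_below(400, '4', "ASC")
previous = order_id_below(400, '4', "DESC")
print(first, previous)
```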
Also, looking at your question: be sure to escape values such as $languages_id. I suppose they could come from an HTML form?
(Or use prepared statements.)
I programmed a filter that generates a query to find specific employees.
I have an employees table and a lot of 1:1, 1:n and n:m relationships, e.g. for the employees' skills and languages, like this:
Employees
id name
1 John
2 Mike
Skills
id skill experience
1 PHP 3
2 SQL 1
Employee_Skills
eid sid
1 1
1 2
Now I want to filter employees who have at least 2 years of experience with PHP and 1 year with SQL.
My filter always generates a correct, working query for every table, relationship and field.
But my problem is that when I want to filter the same field of a related table multiple times with AND, it does not work.
For example:
John PHP 3
John SQL 1
PHP and SQL are in different rows, so AND cannot match both.
I tried using GROUP_CONCAT and FIND_IN_SET, but I cannot filter experience over 2 years with FIND_IN_SET, because FIND_IN_SET does not know that PHP is 3 and SQL is 1.
I also tried
WHERE emp.id IN (SELECT eid FROM Employee_Skills WHERE sid IN (SELECT id FROM Skills WHERE skill = 'PHP' AND experience > 1)) AND emp.id IN (SELECT eid FROM Employee_Skills WHERE sid IN (SELECT id FROM Skills WHERE skill = 'SQL' AND experience > 0))
which works for this example, but it only works for n:m relationships, and it is too complex to determine the relationship type.
The final generated query contains
ski.skill = 'PHP' AND ski.experience > 1 AND ski.skill = 'SQL' AND ski.experience > 0
and I would like to manipulate the query to make it work.
What does a query have to look like to handle relational division?
You can try the following approach:
select * from Employees
where id in (
select eid
from Employee_Skills as a
inner join
Skills as ski
on (a.sid = ski.id)
where
(ski.skill = 'PHP' AND ski.experience >= 2) OR
(ski.skill = 'SQL' AND ski.experience >= 1)
group by eid
having count(*) = 2
)
So for every filter you add another OR condition; the HAVING clause then keeps only the employees that passed all the filters. Just pass the appropriate number (the number of filters).
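Here is a minimal SQLite sketch of this GROUP BY / HAVING approach (relational division), with the OR list and the count generated from a list of filters, much as the question's filter generator might do it. The helper `employees_matching`, the extra rows for Mike and the "at least N years" (>=) thresholds are assumptions. `COUNT(DISTINCT s.skill)` replaces `COUNT(*)` so that an employee with two matching rows for the same skill cannot pass on that one skill alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Skills (id INTEGER PRIMARY KEY, skill TEXT, experience INTEGER);
CREATE TABLE Employee_Skills (eid INTEGER, sid INTEGER, PRIMARY KEY (eid, sid));
INSERT INTO Employees VALUES (1, 'John'), (2, 'Mike');
INSERT INTO Skills VALUES (1, 'PHP', 3), (2, 'SQL', 1), (3, 'PHP', 1);
INSERT INTO Employee_Skills VALUES (1, 1), (1, 2), (2, 2), (2, 3);  -- Mike: PHP 1, SQL 1
""")


def employees_matching(filters):
    """filters: list of (skill, minimum years); an employee must satisfy all of them."""
    ors = " OR ".join("(s.skill = ? AND s.experience >= ?)" for _ in filters)
    params = [value for pair in filters for value in pair]
    sql = f"""
    SELECT name FROM Employees
    WHERE id IN (
        SELECT es.eid
        FROM Employee_Skills es
        INNER JOIN Skills s ON es.sid = s.id
        WHERE {ors}
        GROUP BY es.eid
        HAVING COUNT(DISTINCT s.skill) = ?  -- one distinct skill per filter passed
    )
    ORDER BY id"""
    return [name for (name,) in conn.execute(sql, params + [len(filters)])]


print(employees_matching([("PHP", 2), ("SQL", 1)]))
```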
You could write a kind of pivot query that puts the experience for each known skill in its own column. This could be a long query, but you could build it dynamically in PHP so that it adds all skills as columns to the final query, which would look like this:
SELECT e.*, php_exp, sql_exp
FROM Employees e
INNER JOIN (
SELECT es.eid,
SUM(CASE s.skill WHEN 'PHP' THEN s.experience END) php_exp,
SUM(CASE s.skill WHEN 'SQL' THEN s.experience END) sql_exp,
SUM(CASE s.skill WHEN 'JS' THEN s.experience END) js_exp
-- do the same for other skills here --
FROM Employee_Skills es
INNER JOIN Skills s ON es.sid = s.id
GROUP BY es.eid
) pivot ON pivot.eid = e.id
WHERE php_exp > 2 AND sql_exp > 0;
The WHERE clause is then very concise and intuitive: you use the logical operators like in other circumstances.
If the set of skills is rather static, you could even create a view for the sub-query. Then the final SQL is quite concise.
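Here is a sketch of building the pivot dynamically, done in Python against SQLite rather than in PHP. `KNOWN_SKILLS`, the sample rows and the column naming scheme are hypothetical. The skill names must come from a trusted list because they are interpolated as column aliases, while the values compared in each CASE are bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Skills (id INTEGER PRIMARY KEY, skill TEXT, experience INTEGER);
CREATE TABLE Employee_Skills (eid INTEGER, sid INTEGER, PRIMARY KEY (eid, sid));
INSERT INTO Employees VALUES (1, 'John'), (2, 'Mike');
INSERT INTO Skills VALUES (1, 'PHP', 3), (2, 'SQL', 1), (3, 'PHP', 1);
INSERT INTO Employee_Skills VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

KNOWN_SKILLS = ["PHP", "SQL", "JS"]  # trusted list: names become column aliases

# One SUM(CASE ...) column per known skill, generated instead of hand-written.
pivot_columns = ",\n".join(
    f"SUM(CASE s.skill WHEN ? THEN s.experience END) AS {skill.lower()}_exp"
    for skill in KNOWN_SKILLS)

sql = f"""
SELECT e.name, php_exp, sql_exp, js_exp
FROM Employees e
INNER JOIN (
    SELECT es.eid, {pivot_columns}
    FROM Employee_Skills es
    INNER JOIN Skills s ON es.sid = s.id
    GROUP BY es.eid
) pivot ON pivot.eid = e.id
WHERE php_exp > 2 AND sql_exp > 0  -- a missing skill is NULL and fails the comparison
ORDER BY e.id"""
rows = conn.execute(sql, KNOWN_SKILLS).fetchall()
print(rows)
```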
Alternative
Using the same principle, but putting the SUM in the HAVING clause, you can avoid gathering every skill's experience:
SELECT e.*
FROM Employees e
INNER JOIN (
SELECT es.eid
FROM Employee_Skills es
INNER JOIN Skills s ON es.sid = s.id
GROUP BY es.eid
HAVING SUM(CASE s.skill WHEN 'PHP' THEN s.experience END) > 2
AND SUM(CASE s.skill WHEN 'SQL' THEN s.experience END) > 0
) pivot ON pivot.eid = e.id;
You can also replace the CASE construct with the IF function, like this:
HAVING SUM(IF(s.skill='PHP', s.experience, 0)) > 2
... etc.
But it comes down to the same thing.
The straightforward way would be to repeatedly JOIN the skills:
SELECT e.*
FROM Employees AS e
JOIN Employee_Skills AS j1 ON (e.id = j1.eid)
JOIN Skills AS s1 ON (j1.sid = s1.id AND s1.skill = 'PHP' AND s1.experience > 3)
JOIN Employee_Skills AS j2 ON (e.id = j2.eid)
JOIN Skills AS s2 ON (j2.sid = s2.id AND s2.skill = 'SQL' AND s2.experience > 1)
...
Since all the clauses are required, this translates to a straight JOIN.
You will need to add two JOINs for each clause, but they're quite fast joins.
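Generating the repeated joins is mechanical: one Employee_Skills/Skills pair per filter. Here is a sketch in Python with SQLite. `employees_with_all` and the sample rows are hypothetical, and the thresholds are read as "at least N years".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Skills (id INTEGER PRIMARY KEY, skill TEXT, experience INTEGER);
CREATE TABLE Employee_Skills (eid INTEGER, sid INTEGER, PRIMARY KEY (eid, sid));
INSERT INTO Employees VALUES (1, 'John'), (2, 'Mike');
INSERT INTO Skills VALUES (1, 'PHP', 3), (2, 'SQL', 1), (3, 'PHP', 1);
INSERT INTO Employee_Skills VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")


def employees_with_all(filters):
    """Add one Employee_Skills + Skills join pair per (skill, minimum years) filter."""
    joins, params = [], []
    for i, (skill, years) in enumerate(filters, start=1):
        joins.append(f"JOIN Employee_Skills j{i} ON e.id = j{i}.eid "
                     f"JOIN Skills s{i} ON j{i}.sid = s{i}.id "
                     f"AND s{i}.skill = ? AND s{i}.experience >= ?")
        params += [skill, years]
    # DISTINCT guards against duplicates when one filter matches several rows.
    sql = ("SELECT DISTINCT e.id, e.name FROM Employees e "
           + " ".join(joins) + " ORDER BY e.id")
    return [name for _, name in conn.execute(sql, params)]


print(employees_with_all([("PHP", 2), ("SQL", 1)]))
```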
A more hackish way would be to compress the skills into a code in a 1:1 relation with the employees. If experience never exceeds, say, 30, then you can multiply the first condition's experience by 1, the second by 30, the third by 30*30, the fourth by 30*30*30... and never get an overflow.
SELECT eid, SUM(CASE skill
WHEN 'PHP' THEN 30*experience
WHEN 'SQL' THEN 1*experience END) AS code
FROM Employee_Skills JOIN Skills ON (Skills.id = Employee_Skills.sid)
GROUP BY eid HAVING code > 0;
Actually, since you want 3 years of PHP, you can use HAVING code > 91. If you had three conditions with experiences 2, 3 and 5, you would request more than x = 2*30*30 + 3*30 + 5. This only serves to whittle down the results, since 3*30*30 + 2*30 + 4 still passes the filter but is of no use to you. But since you want a restriction on code anyway, and "> x" costs the same as "> 0" while giving better results, it is worth using (if you needed more complex filtering than a series of ANDs, > 0 is safer, though).
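You join the table above with Employees, then perform the real filtering on the result, requiring
((code DIV (30*30)) % 30) > 7 -- for instance :-)
AND
((code DIV 30) % 30) > 3 -- for PHP
AND
((code DIV 1) % 30) > 1 -- for SQL
(the *1 and DIV 1 are superfluous and only inserted for clarity; DIV is used because / in MySQL returns a decimal rather than an integer, and the parentheses around 30*30 are needed because code / 30*30 would be evaluated as (code / 30) * 30)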
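The arithmetic of the "hackish" encoding can be checked in plain Python: packing and unpacking are just base-30 digits. This is a sketch. `encode` and `decode` are hypothetical helper names, and BASE = 30 is the answer's own assumption that no experience value reaches 30.

```python
BASE = 30  # the answer's assumption: no experience value reaches 30


def encode(experiences):
    """Pack per-skill years into one integer: skill k is weighted by BASE**k."""
    return sum(years * BASE ** k for k, years in enumerate(experiences))


def decode(code, k):
    """Recover skill k: integer-divide by BASE**k, then take the remainder mod BASE.
    In MySQL, for k = 2 this is (code DIV 900) % 30; a plain / would give a decimal."""
    return code // BASE ** k % BASE


john = encode([1, 3])      # skill 0 = SQL (1 year), skill 1 = PHP (3 years)
three = encode([5, 3, 2])  # the answer's x = 2*30*30 + 3*30 + 5
print(john, three, [decode(three, k) for k in range(3)])
```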
This solution requires a full table scan on Skills, with no real possibility of automatic optimizations. So it is slower than the other solution. On the other hand, its cost grows much more slowly, so if you have complex queries, or need OR operators or conditional expressions instead of ANDs, it may be more convenient to implement the "hackish" solution.
I have this query in MySQL. It takes too long to run, and I know the problem is the subselects (COALESCE((SELECT ...), ...)), but I do not know how to speed the query up, e.g. via a JOIN.
I am hoping some of you SQL gurus will be able to help me.
SELECT
COALESCE(
(SELECT CONCAT(d.PRIJEVOZNIK, ' ', d.VOZAC_TRANSFER)
FROM dokum_zag as d
where d.SIFKNJ='NP' and
d.ID_VEZA=dokum_zag.ID and
d.korisnicko_ime=dokum_zag.korisnicko_ime
),'') as PRIJEVOZNIK,
(RELACIJA_TRANS_VOZ_TRANS) as RELACIJA_TRANS_VOZ,
(PRIJEVOZNIK_POVRATNI_TRANS) as PRIJEVOZNIK_POVRATNI,
(VAUC_KNJIZENO_TRANS) as VAUC_KNJIZENO,
ID_NALOGA,
ID_NALOGA_POV,
ID_VAUCHER,
DOLAZAK, VRIJ_TRANSFER,ODLAZAK,VRIJEME_LETA_POVRAT ,BRDOK, NOSITELJ_REZ, RELACIJA_TRANS, VOZILO_NAZIV, BROJ_NALOGA,BROJ_NAL_POV,BROJ_VAUCHER,BROJ_SOBE,VALIZN,PAX, MPIZN,ID
FROM
dokum_zag
WHERE
korisnicko_ime = '10' and
((DOLAZAK='2015-07-30') or (ODLAZAK='2015-07-30')) and
STORNO <> 'DA' and
SIFKNJ = 'TR' and
((YEAR(DOLAZAK)= '2015') or (YEAR(ODLAZAK)= '2015'))
order by
(CASE WHEN DOLAZAK < '2015-07-30' THEN ODLAZAK ELSE DOLAZAK END) ,
(CASE WHEN DOLAZAK < '2015-07-30' THEN VRIJEME_LETA_POVRAT ELSE VRIJ_TRANSFER END), ID
Without the DB structure and a description of what you want to extract, it's a bit hard to help you.
From a logical point of view, some things are redundant, for example:
((DOLAZAK='2015-07-30') or (ODLAZAK='2015-07-30')) and
...
((YEAR(DOLAZAK)= '2015') or (YEAR(ODLAZAK)= '2015'))
The year part isn't necessary, since the year is already fixed by the first two conditions.
Another thing that might be driving the server nuts is that unusual ORDER BY clause, since the sort expression changes from record to record (test this by temporarily sorting on a fixed field).
You can also check whether your indexes are properly set for all the fields in the outer WHERE clause, and that the non-numeric ones are not VARCHARs (for example, SIFKNJ and STORNO should be CHAR(2)).
The COALESCE part can be solved via an outer join, so the subquery doesn't get evaluated for each row. But that depends on what you want to extract from the database and how (since that subquery has its own fields in the WHERE section... weird).
Hope this somehow helps
INDEX(korisnicko_ime, SIFKNJ)
(in either order) may help
Turning the correlated subquery into a JOIN may help.
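Here is a sketch of the correlated-subquery-to-JOIN rewrite, on a reduced `dokum_zag` table in SQLite. It assumes at most one 'NP' row links to each document. With more, the LEFT JOIN would duplicate rows, whereas MySQL's scalar subquery would raise a "Subquery returns more than 1 row" error. The sample rows and the index name are made up, and `||` is SQLite's string concatenation (in MySQL keep CONCAT, since `||` means OR there by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dokum_zag (ID INTEGER PRIMARY KEY, SIFKNJ TEXT, ID_VEZA INTEGER,
                        korisnicko_ime TEXT, PRIJEVOZNIK TEXT, VOZAC_TRANSFER TEXT);
INSERT INTO dokum_zag VALUES
  (1, 'TR', NULL, '10', NULL, NULL),
  (2, 'TR', NULL, '10', NULL, NULL),
  (10, 'NP', 1, '10', 'Carrier A', 'Driver X');  -- linked to document 1
-- Hypothetical index so the join can look up linked 'NP' rows directly.
CREATE INDEX ix_dokum_veza ON dokum_zag (SIFKNJ, ID_VEZA, korisnicko_ime);
""")

# Original shape: a scalar subquery evaluated once per outer row.
CORRELATED = """
SELECT d.ID,
       COALESCE((SELECT n.PRIJEVOZNIK || ' ' || n.VOZAC_TRANSFER
                 FROM dokum_zag AS n
                 WHERE n.SIFKNJ = 'NP' AND n.ID_VEZA = d.ID
                   AND n.korisnicko_ime = d.korisnicko_ime), '') AS PRIJEVOZNIK
FROM dokum_zag AS d
WHERE d.korisnicko_ime = '10' AND d.SIFKNJ = 'TR'
ORDER BY d.ID"""

# Rewritten: the LEFT JOIN keeps documents without a linked row (n.* is then NULL).
JOINED = """
SELECT d.ID,
       COALESCE(n.PRIJEVOZNIK || ' ' || n.VOZAC_TRANSFER, '') AS PRIJEVOZNIK
FROM dokum_zag AS d
LEFT JOIN dokum_zag AS n
       ON n.SIFKNJ = 'NP' AND n.ID_VEZA = d.ID
      AND n.korisnicko_ime = d.korisnicko_ime
WHERE d.korisnicko_ime = '10' AND d.SIFKNJ = 'TR'
ORDER BY d.ID"""

correlated_rows = conn.execute(CORRELATED).fetchall()
joined_rows = conn.execute(JOINED).fetchall()
print(joined_rows)
```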
((DOLAZAK='2015-07-30') or (ODLAZAK='2015-07-30')) and
((YEAR(DOLAZAK)= '2015') or (YEAR(ODLAZAK)= '2015'))
is a bit weird. This might help:
( SELECT ...
AND DOLAZAK = '2015-07-30'
) UNION DISTINCT
( SELECT ...
AND ODLAZAK = '2015-07-30'
) ORDER BY ...
To help that reformulation, add 2 composite indexes:
INDEX(korisnicko_ime, SIFKNJ, DOLAZAK, ODLAZAK)
INDEX(korisnicko_ime, SIFKNJ, ODLAZAK, DOLAZAK)
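Since the YEAR() test is implied by the date equality (as the previous answer points out), each branch of the UNION only needs its own equality. Here is a SQLite sketch comparing that split with the original OR filter on a few made-up rows. `strftime('%Y', ...)` stands in for MySQL's YEAR(), and SQLite's plain UNION behaves like MySQL's UNION DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dokum_zag (ID INTEGER PRIMARY KEY, DOLAZAK TEXT, ODLAZAK TEXT);
INSERT INTO dokum_zag VALUES
  (1, '2015-07-30', '2015-08-05'),  -- arrives on the day
  (2, '2015-07-20', '2015-07-30'),  -- leaves on the day
  (3, '2015-07-30', '2016-01-03'),  -- arrives on the day, leaves next year
  (4, '2015-07-01', '2015-07-10');  -- neither
""")

# Original OR filter, including the redundant year test.
OR_FILTER = """
SELECT ID FROM dokum_zag
WHERE (DOLAZAK = :day OR ODLAZAK = :day)
  AND (strftime('%Y', DOLAZAK) = '2015' OR strftime('%Y', ODLAZAK) = '2015')
ORDER BY ID"""

# One index-friendly equality per branch.
# (SQLite does not accept parentheses around the branches; MySQL does.)
UNION_FILTER = """
SELECT ID FROM dokum_zag WHERE DOLAZAK = :day
UNION
SELECT ID FROM dokum_zag WHERE ODLAZAK = :day
ORDER BY ID"""

params = {"day": "2015-07-30"}
or_ids = [i for (i,) in conn.execute(OR_FILTER, params)]
union_ids = [i for (i,) in conn.execute(UNION_FILTER, params)]
print(or_ids, union_ids)
```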
I'm currently experimenting with creating a rough ranking/sorting query that will "score" users according to the data that they submit.
Someone with "president" exactly once in the Role/Position field will be given a score of 100, and anyone with "%vice%" (as in vice president) in the Role/Position field will be scored about half of what is given to those with just "president".
SELECT *, sum(relevance)
FROM (
SELECT a.*,
100 AS relevance
FROM application a,
document d
WHERE d.`Role/Position` LIKE 'president'
AND d.`AppID` = a.`AppId`
AND `AwardID` != 'NULL'
AND `Schoolyear` = '2013-2014'
UNION
SELECT a.*,
50 AS relevance
FROM application a,
document d
WHERE d.`Role/Position` LIKE '%vice%'
AND d.`AppID` = a.`AppId`
AND `AwardID` != 'NULL'
AND `Schoolyear` = '2013-2014'
) results
GROUP BY AppID
ORDER BY sum(relevance) DESC
My problem is that if I omit the UNION SELECT portion, I get a total of 200 for someone with two "president" fields. If the UNION SELECT portion is kept in the query, the relevance only adds up to 100.
A person with two "president" fields is supposed to get 200, and someone with "%vice%" and "president" should get 150 as their sum(relevance) value. It also never goes beyond 150 for someone with two "president" and two "%vice%" fields. Could someone point out what I am doing wrong?
I have a lot to learn about SQL and web design, which is why I am asking for help figuring out where I've gone wrong in my query. I based my query on this guide.
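UNION does an implicit DISTINCT, which eliminates duplicate rows: two "president" documents for the same application produce two identical (a.*, 100) rows, and UNION collapses them into one. Since you want multiple hits per application row to count, you should use UNION ALL instead: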
SELECT *, sum(relevance)
FROM (
SELECT a.*, 100 AS relevance
FROM application a, document d
WHERE d.`Role/Position` LIKE 'president' AND d.`AppID` = a.`AppId`
AND `AwardID` != 'NULL' AND `Schoolyear` = '2013-2014'
UNION ALL
SELECT a.*, 50 AS relevance
FROM application a, document d
WHERE d.`Role/Position` LIKE '%vice%' AND d.`AppID` = a.`AppId`
AND `AwardID` != 'NULL' AND `Schoolyear` = '2013-2014'
) results
GROUP BY AppID
ORDER BY SUM(relevance) DESC
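The difference is easy to reproduce. Here is a minimal SQLite sketch with made-up rows. The AwardID and Schoolyear filters are omitted and only AppId is selected from `application`. Application 1 has two "president" and two "vice president" documents; application 2 has one of each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (AppId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE document (AppID INTEGER, "Role/Position" TEXT);
INSERT INTO application VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO document VALUES
  (1, 'president'), (1, 'president'), (1, 'vice president'), (1, 'vice president'),
  (2, 'president'), (2, 'vice president');
""")


def scores(set_op):
    """set_op is 'UNION' or 'UNION ALL' (trusted input, interpolated into the SQL)."""
    sql = f"""
    SELECT AppId, SUM(relevance) AS score
    FROM (
        SELECT a.AppId, 100 AS relevance
        FROM application a JOIN document d ON d.AppID = a.AppId
        WHERE d."Role/Position" LIKE 'president'
        {set_op}
        SELECT a.AppId, 50 AS relevance
        FROM application a JOIN document d ON d.AppID = a.AppId
        WHERE d."Role/Position" LIKE '%vice%'
    ) AS results
    GROUP BY AppId
    ORDER BY score DESC, AppId"""
    return conn.execute(sql).fetchall()


print(scores("UNION"), scores("UNION ALL"))
```

With UNION, both duplicate (1, 100) rows and both duplicate (1, 50) rows collapse, so application 1 tops out at 150, exactly the symptom in the question; UNION ALL keeps every hit.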