Optimize query with big-table-join

Optimize query with big-table-join - php

I have one big table.
Let's call it 'unicorns'.
And second table 'farmers'.
Structure:
farmers
id
name
... some more fields
unicorns
id
farmer_id
... some more fields
And there is query:
SELECT
f.id,
f.name,
COUNT(*)
FROM
farmers f
LEFT JOIN unicorns u
ON u.farmer_id = f.id
GROUP BY f.id
Of course tables are really named not this way and have more fields etc. But I have left only the most important things in my example.
The problem is that the system grows, and there are too many unicorns. Millions.
And this query (like-this) is executed on farmers list page.
And page is loading not so fast as before, because we join a multi-million table each time we load page.
Problem is:
We really need to display each farmer's unicorns count in the list.
We need to improve page load speed.
If you would need to optimize this, how would you achieve this result?
What would you recommend?
P.S.
Personally I think I need to exclude big-table-join from the main query, calculate unicorns counts separately and store them in the cache storage, recalculate them time after time. Maybe there is the best way, so I wanted to hear someone else's opinion.

I would just have an extra column on the farmer's table for NumberUnicorns and store it there. Then, via some database triggers, when a record is added to the unicorn table, it just updates the farmers table with the respective count from the new record. Also consider updating the count if unicorn records are deleted. Then, your query is immediately from the farmers table -- done.

Related

Mysql - How to get Newsfeed from users you follow

I've been working on this for some hours now and it's getting tiring. I want to get users posts from people I follow and myself, just like twitter does.
So there's a table users_posts that has all users' posts with column user_id to determine who made the post.
Another table users_followers that contains all followers ie. user_id follows following_id
So here's my query:
User ID = 2271
SELECT users_followers.following_id AS following_id, users_posts.id
FROM users_followers, users_posts
WHERE users_posts.user_id = users_followers.following_id
AND (users_followers.user_id =2271)
This query works but the problem is, it's kinda slow. Is there a faster way to do this?
Also, as you can see, I can only get posts from those I follow, and not myself.
Any help?

If I'm understanding your tables properly, I would do this with an explicit JOIN. I'm not sure how much that would speed things up versus the implicit JOIN you're using though.
SELECT following_id, p.id
FROM users_followers f
LEFT JOIN users_posts p
ON (p.user_id = f.following_id)
WHERE f.user_id = 2271
Chances are, adding an index or two to your tables would help.
Using the MySQL EXPLAIN command (just put it in front of your SELECT query) will show the indexes, temporary tables, and other resources that are used for a query. It should give you an idea when one query is faster or more efficient than another.

Your query is fine as written, provided you have properly indexed your tables. At a minimum you need an index on users_posts.user_id and users_followers.following_id.
A query can also be slowed by large numbers of records, even when it is fully indexed. In that case, I find phpmyadmin to be an invaluable tool. From the server page (maybe localhost) select the Status tab to see a wealth of information about how your mysql server is performing and suggestions for how to improve it.

nested mysql queries with huge tables

I'm working on a management system for a small library. I proposed them to replace the Excel spreadsheet they are using now with something more robust and professional like PhpMyBibli - https://en.wikipedia.org/wiki/PhpMyBibli - but they are scared by the amount of fields to fill, and also the interfaces are not fully translated in Italian.
So I made a very trivial DB, with basically a table for the authors and a table for the books. The authors table is because I'm tired to have to explain that "Gabriele D'Annunzio" != "Gabriele d'Annunzio" != "Dannunzio G." and so on.
My test tables are now populated with ~ 100k books and ~ 3k authors, both with plausible random text, to check the scripts under pressure.
For the public consultation I want to make an interface like that of Gallica, the website of the Bibliothèque nationale de France, which I find pretty useful. A sample can be seen here: http://gallica.bnf.fr/Search?ArianeWireIndex=index&p=1&lang=EN&f_typedoc=livre&q=Computer&x=0&y=0
The concept is pretty easy: for each menu, e.g. the author one, I generate a fancy <select> field with all the names retrieved from the DB, and this works smoothly.
The issue arises when I try to add beside every author name the number of books, as made by Gallica, in this way (warning - conceptual code, not actual PHP):
SELECT id, surname, name FROM authors
foreach row {
SELECT COUNT(*) as num FROM BOOKS WHERE id_auth=id
echo "<option>$surname, $name ($num)</option>";
}
With the code above a core of the CPU jumps at 100%, and no results are shown in the browser. Not surprising, since they are 3k queries on a 100k table in a very short time.
Just to try, I added a LIMIT 100 to the first query (on the authors table). The page then required 3 seconds to be generated, and 15 seconds when I raised the LIMIT to 500 (seems a linear increment). But of course I can't show to library users a reduced list of authors.
I don't know which hardware/software is used by Gallica to achieve their results, but I bet their budget is far above that of a small village library using 2nd hand computers.
Do you think that to add a "number_of_books" field in the authors table, which will be updated every time a new book is inserted, could be a practical solution, rather than to browse the whole list at every request?
BTW, a similar procedure must be done for the publication date, the language, the theme, and some other fields, so the query time will be hit again, even if the other tables are a lot smaller than the authors one.

Your query style is very inefficient - try using a join and group structure:
SELECT
authors.id,
authors.surname,
authors.name,
COUNT(books.id) AS numbooks
FROM authors
INNER JOIN books ON books.id_auth=authors.id
GROUP BY authors.id
ORDER BY numbooks DESC
;
EDIT
Just to clear up some issues I not explicitely said:
Ofcourse you don't need a query in the PHP loop any longer, just the displaying portion
Indices on books.id_auth and authors.id (the latter primary or unique) are assumed
EDIT 2
As #GordonLinoff pointed out, the IFNULL() is redundant in an inner join, so I removed it.
To get all themes, even if there aren't any books in them, just use a left join (this time including the IFNULL(), if your provider's MySQL may be old):
SELECT
theme.id,
theme.main,
theme.sub,
IFNULL(COUNT(books.theme),0) AS num
FROM themes
LEFT JOIN books ON books.theme=theme.id
GROUP BY themes.id
;
EDIT 3
Ofcourse a stored value will give you the best performance - but this denormalization comes at a cost: Your Database now has the potential to become inconsistent in a user-visible way.
If you do go with this method. I strongly recommend you use triggers to auto-fill this field (and ofcourse those triggers must sit on the books table).
Be prepared to see slowed down inserts - this might ofcourse be okay, as I guess you will see a much higher rate of SELECTS than INSERTS

After reading a lot about how the JOIN statement works, with the help of
useful answer 1 and useful answer 2, I discovered I used it some 15 or 20 years ago, then I forgot about this since I never needed it again.
I made a test using the options I had:
reply with the JOIN query with IFNULL(): 0,5 seconds
reply with the JOIN query without IFNULL(): 0,5 seconds
reply using a stored value: 0,4 seconds
That DB will run on some single core old iron, so I think a 20% difference could be significant, and I decide to use stored values, updating the count every time a new book is inserted (i.e. not often).
Anyway thanks a lot for having refreshed my memory: JOIN queries will be useful somewhere else in my DB.
update
I used the JOIN method above to query the book themes, which are stored into a far smaller table, in this way:
SELECT theme.id, theme.main, theme.sub, COUNT(books.theme) as num FROMthemesJOIN books ON books.theme = theme.id GROUP BY themes.id ORDER by themes.main ASC, themes.sub ASC
It works fine, but for themes which are not in the books table I obviously don't get a 0 response, so I don't have lines like Contemporary Poetry - Etruscan (0) to show as disabled options for the sake of list completeness.
Is there a way to have back my theme.main and theme.sub?

Too relation or not to relation ? A MySQL, PHP database workflow

im kinda new with mysql and i'm trying to create a kind complex database and need some help.
My db structure
Tables(columns)
1.patients (Id,name,dob,etc....)
2.visits (Id,doctor,clinic,Patient_id,etc....)
3.prescription (Id,visit_id,drug_name,dose,tdi,etc....)
4.payments (id,doctor_id,clinic_id,patient_id,amount,etc...) etc..
I have about 9 tables, all of them the primary key is 'id' and its set to autoinc.
i dont use relations in my db (cuz i dont know if it would be better or not ! and i never got really deep into mysql , so i just use php to run query's to Fitch info from one table and use that to run another query to get more info/store etc..)
for example:
if i want to view all drugs i gave to one of my patients, for example his id is :100
1-click patient name (name link generated from (tbl:patients,column:id))
2-search tbl visits WHERE patient_id=='100' ; ---> that return all his visits ($x array)
3-loop prescription tbl searching for drugs with matching visit_id with $x (loop array).
4- return all rows found.
as my database expanding more and more (1k+ record in visit table) so 1 patient can have more than 40 visit that's 40 loop into prescription table to get all his previous prescription.
so i came up with small teak where i edited my db so that patient_id and visit_id is a column in nearly all tables so i can skip step 2 and 3 into one step (
search prescription tbl WHERE patient_id=100), but that left me with so many duplicates in my db,and i feel its kinda stupid way to do it !!
should i start considering using relational database ?
if so can some one explain a bit how this will ease my life ?
can i do this redesign but altering current tables or i must recreate all tables ?
thank you very much

Yes, you should exploit MySQL's relational database capabilities. They will make your life much easier as this project scales up.
Actually you're already using them well. You've discovered that patients can have zero or more visits, for example. What you need to do now is learn to use JOIN queries to MySQL.
Once you know how to use JOIN, you may want to declare some foreign keys and other database constraints. But your system will work OK without them.
You have already decided to denormalize your database by including both patient_id and visit_id in nearly all tables. Denormalization is the adding of data that's formally redundant to various tables. It's usually done for performance reasons. This may or may not be a wise decision as your system scales up. But I think you can trust your instinct about the need for the denormalization you have chosen. Read up on "database normalization" to get some background.
One little bit of advice: Don't use columns named simply "id". Name columns the same in every table. For example, use patients.patient_id, visits.patient_id, and so forth. This is because there are a bunch of automated software engineering tools that help you understand the relationships in your database. If your ID columns are named consistently these tools work better.
So, here's an example about how to do the steps numbered 2 and 3 in your question with a single JOIN query.
SELECT p.patient_id p.name, v.visit_id, rx.drug_name, rx.drug_dose
FROM patients AS p
LEFT JOIN visits AS v ON p.patient_id = v.patient_id
LEFT JOIN prescription AS rx ON v.visit_id = rx.visit_id
WHERE p.patient_id = '100'
ORDER BY p.patient_id, v.visit_id, rx.prescription_id
Like all SQL queries, this returns a virtual table of rows and columns. In this case each row of your virtual table has patient, visit, and drug data. I used LEFT JOIN in this example. That means that a patient with no visits will have a row with NULL data in it. If you specify JOIN MySQL will omit those patients from the virtual table.

Problem with getting specific data based on several factors in MySQL

I have a system where a user is part of a series of "runs", to each "run", can be added courses, teachers(users), classes and so on.
Each teacher(user) has chosen his/her classes & courses.
Here's a run-down of the tables I have that are relevant:
lam_run - The run in it self.
lam_run_course - Relational table that shows what runs has what courses
lam_teacher_course - Relational table that shows which teacher has which courses
lam_run_teacher - Relational table that shows what teachers are in what courses
What I want to do is show each teacher which runs that are relevant to them (based on which courses they have selected seen in lam_teacher_course) but in which they are not already participating.
Here's the MySQL code I have so far that does not work:
$query_relevant_runs = "
SELECT DISTINCT
lam_run_course.run_id
FROM
lam_teacher_course,
lam_run_course,
lam_run, lam_run_teacher
WHERE
lam_teacher_course.user_id = '1'
AND
lam_teacher_course.course_id = lam_run_course.course_id
AND
lam_run_teacher.user_id != '1'";
Instead this code shows all runs that are relevant, but it doesn't exclude the runs the user is already in..
What can I do to fix this?
Ps. Sorry for bad title, no idea what I should've called it :S
Here is a link to part of the databases (the relevant part): Link!

I think what you're looking for is:
LEFT JOIN `lam_run_teacher` `lam_run_teach_exclude`
ON `lam_run_teacher_exclude`.`user_id` = `lam_teacher_user`.`user_id`
...
WHERE `lam_run_teacher`.`user_id` IS NULL
The LEFT JOIN takes your current query, and appends the additional data to it. However, unlike the INNER JOIN you are using now (using the kinda-strange multiple-from syntax), the LEFT JOIN does not limit your resultset to just those where there is data for the righthand side. The righthand columns will be NULL. By filtering on that NULL, you can find all runs that are interesting, and for which there is not yet a relation to the teacher.
Does this help?
I'd recommend always using the normal join syntax (INNER JOIN target on target.id = source.id) - that way you're more aware of the idea that there are other kinds of join as well, and all your joins will look identical. It takes some getting used to, but definitely helps when your queries get more complex.
Also, in your cross-referencing tables, you can drop the primary key columns. If the only purpose of a table is to define a link between two tables, make the primary key consist of the two keys you've got. Unless you want to be able to related the same teacher to a run multiple times...
OK, took me way longer than it should have, but here's the complete thing:
SELECT
DISTINCT `lam_run_course`.run_id
FROM
`lam_run_course`
INNER JOIN
`lam_teacher_course`
ON `lam_teacher_course`.course_id = `lam_teacher_course`.course_id
LEFT JOIN
`lam_run_teacher` ON (`lam_run_teacher`.`run_id` = `lam_run_course`.`run_id` AND `lam_run_teacher`.`user_id` = 3)
WHERE
`lam_teacher_course`.user_id = 3
and `lam_run_teacher`.`run_id` IS NULL

using joins or multiple queries in php/mysql

Here i need help with joins.
I have two tables say articles and users.
while displaying articles i need to display also the user info like username, etc.
So will it be better if i just use joins to join the articles and user tables to fetch the user info while displaying articles like below.
SELECT a.*,u.username,u.id FROM articles a JOIN users u ON u.id=a.user_id
OR can this one in php.
First i get the articles with below sql
SELECT * FROM articles
Then after i get the articles array i loop though it and get the user info inside each loop like below
SELECT username, id FROM users WHERE id='".$articles->user_id."';
Which is better can i have explanation on why too.
Thank you for any reply or views

There is a third option. You could first get the articles:
SELECT * FROM articles
Then get all the relevant user names in one go:
SELECT id, username FROM users WHERE id IN (3, 7, 19, 34, ...)
This way you only have to hit the database twice instead of many times, but you don't get duplicated data. Having said that, it seems that you don't have that much duplicated data in your queries anyway so the first query would work fine too in this specific case.
I'd probably choose your first option in this specific case because of its simplicity, but if you need more information for each user then go with the third option. I'd probably not choose your second option as it is neither the fastest nor the simplest.

It depends how much data the queries are returning - if you'll be getting a lot of duplicate data (i.e. one user has written many articles) you are better off doing the queries separately.
If you don't have a lot of duplicated data, joins are always preferable as you only have to make one visit to the database server.

The first approach is better if applicable/possible:
SELECT a.*, u.username, u.id
FROM articles a
JOIN users u ON u.id = a.user_id
You have to write less code
There is no need to run multiple queries
Using joins is ideal when possible

Get the articles with one query, then get each username once and not every time you display it (cache them in an array or whatever).

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.