It took a while to come up with a title as I wasn't sure what to title it. Basically my problem deals with SQL queries and coming up with an efficient method to go about what I am trying to do.
To give it in an example, say we have two tables:
Table 1 (Articles): ID | ArticleName | AuthorID
Table 2 (Users): ID | AuthorName
What I am attempting to do is pull, say the last 5 articles. From here, with each article it pulls it has a while loop to query the second table to pull AuthorName where ID=AuthorID.
In essence, we have one query for the 5 articles and then another five queries to get the author names. This is further compounded on pages with 10-20 or more articles, where there's an extra 10-20+ queries.
Is there a more efficient method to join these statements together and have it pull the AuthorName for each article it pulls?
The reason for using AuthorID in table 1 is so that if usernames are changed, it doesn't break anything. Along with this, it (as far as I understand) cuts down a lot on the database storage.
I'm still somewhat new to SQL though so any ideas on how to resolve this would be much appreciated.
Thanks in advance, and if there are any questions please don't hesitate to ask!
SELECT * FROM `Articles` INNER JOIN `Users` ON `Articles`.`AuthorID`=`Users`.`ID`
There's two ways to do this. You can either do a one-shot query that JOINs in the additional authors table and presents a complete result set, or you can do a two pass where you fetch all the authors in a subsequent call using SELECT ... FROM Authors WHERE ID IN (...) using the distinct identifiers from the first query.
For small lists and small tables the JOIN method will almost always be more convenient. For large lists the two-pass approach seems "dumber" but often out-performs doing the join in the database. For instance, if the number of articles is very large and the number of authors is small then the JOIN adds significant amounts of work to the large query that could be eliminated by making a small secondary query after the fact.
For this case, with less than one million records and small fetch sizes, go with JOIN.
Related
I am converting an access database to a new format. Currently all data resides in MySQL.
For the purposes of this question, there are 3 tables. tbl_Bills, tbl_Documents, and tbl_Receipts.
I wrote an outer join query , as some bills have documents and receipts, other's don't. And I need a full listing of each set, given those situations, to be processed by a php script later on.
The problem is that the primary identifier, we'll call fld_CommonID, happens to exist in duplicate. For example, 3 bills have the same identifier, with different information. 3 documents and 3 receipts match those 3 bills.
So as you might have guessed, my join query results in 9 indistinct rows (6 duplicates), when there should be 3 (one join from each table). An inner join excludes data that isn't defined in the other table, and so doesn't work for my needs.
SO ... I'm thinking what I want to do, is update those 3 records in each table (across all rows that have duplicates) such that they have a unique counter id. #1, #2, and #3 respectively, so that I can perform join queries on them uniquely per row.
Is that possible without running php code to select the duplicates ordered by natural table order, followed-by updating them with a counter?
Would you advise that I go that route(scripted) instead of some magical SQL query to do such a thing, if such a query can be made?
Or is it possible to outer join based on natural table order (pretty sure that's impossible)?
writing this answer to simply close the question.
Inner joins would be perfect if there were a way to link duplicate fields in separate tables based on natural order (no primary key). The problem isn't that I lack a query, it's that the database is poorly structured. Which is a problem better solved with code not complex queries.
I'm working on an existing application that uses some JOIN statements to create "immutable" objects (i.e. the results are always JOINed to create a processable object - results from only one table will be meaningless).
For example:
SELECT r.*,u.user_username,u.user_pic FROM articles r INNER JOIN users u ON u.user_id=r.article_author WHERE ...
will yield a result of type, let's say, ArticleWithUser that is necessary to display an article with the author details (like a blog post).
Now, I need to make a table featured_items which contains the columnsitem_type (article, file, comment, etc.) and item_id (the article's, file's or comment's id), and query it to get a list of the featured items of some type.
Assuming tables other than articles contain whole objects that do not need JOINing with other tables, I can simply pull them with a dynamicially generated query like
SELECT some_table.* FROM featured_items RIGHT JOIN some_table ON some_table.id = featured_items.item_id WHERE featured_items.type = X
But what if I need to get a featured item from the aforementioned type ArticleWithUser? I cannot use the dynamically generated query because the syntax will not suit two JOINs.
So, my question is: is there a better practice to retrieve results that are always combined together? Maybe do the second JOIN on the application end?
Or do I have to write special code for each of those combined results types?
Thank you!
a view can be thot of as like a table for the faint of heart.
https://dev.mysql.com/doc/refman/5.0/en/create-view.html
views can incorporate joins. and other views. keep in mind that upon creation, they take a snapshot of the columns in existence at that time on underlying tables, so Alter Table stmts adding columns to those tables are not picked up in select *.
An old article which I consider required reading on the subject of MySQL Views:
By Peter Zaitsev
To answer your question as to whether they are widely used, they are a major part of the database developer's toolkit, and in some situations offer significant benefits, which have more to do with indexing than with the nature of views, per se.
I'm working on a management system for a small library. I proposed them to replace the Excel spreadsheet they are using now with something more robust and professional like PhpMyBibli - https://en.wikipedia.org/wiki/PhpMyBibli - but they are scared by the amount of fields to fill, and also the interfaces are not fully translated in Italian.
So I made a very trivial DB, with basically a table for the authors and a table for the books. The authors table is because I'm tired to have to explain that "Gabriele D'Annunzio" != "Gabriele d'Annunzio" != "Dannunzio G." and so on.
My test tables are now populated with ~ 100k books and ~ 3k authors, both with plausible random text, to check the scripts under pressure.
For the public consultation I want to make an interface like that of Gallica, the website of the Bibliothèque nationale de France, which I find pretty useful. A sample can be seen here: http://gallica.bnf.fr/Search?ArianeWireIndex=index&p=1&lang=EN&f_typedoc=livre&q=Computer&x=0&y=0
The concept is pretty easy: for each menu, e.g. the author one, I generate a fancy <select> field with all the names retrieved from the DB, and this works smoothly.
The issue arises when I try to add beside every author name the number of books, as made by Gallica, in this way (warning - conceptual code, not actual PHP):
SELECT id, surname, name FROM authors
foreach row {
SELECT COUNT(*) as num FROM BOOKS WHERE id_auth=id
echo "<option>$surname, $name ($num)</option>";
}
With the code above a core of the CPU jumps at 100%, and no results are shown in the browser. Not surprising, since they are 3k queries on a 100k table in a very short time.
Just to try, I added a LIMIT 100 to the first query (on the authors table). The page then required 3 seconds to be generated, and 15 seconds when I raised the LIMIT to 500 (seems a linear increment). But of course I can't show to library users a reduced list of authors.
I don't know which hardware/software is used by Gallica to achieve their results, but I bet their budget is far above that of a small village library using 2nd hand computers.
Do you think that to add a "number_of_books" field in the authors table, which will be updated every time a new book is inserted, could be a practical solution, rather than to browse the whole list at every request?
BTW, a similar procedure must be done for the publication date, the language, the theme, and some other fields, so the query time will be hit again, even if the other tables are a lot smaller than the authors one.
Your query style is very inefficient - try using a join and group structure:
SELECT
authors.id,
authors.surname,
authors.name,
COUNT(books.id) AS numbooks
FROM authors
INNER JOIN books ON books.id_auth=authors.id
GROUP BY authors.id
ORDER BY numbooks DESC
;
EDIT
Just to clear up some issues I not explicitely said:
Ofcourse you don't need a query in the PHP loop any longer, just the displaying portion
Indices on books.id_auth and authors.id (the latter primary or unique) are assumed
EDIT 2
As #GordonLinoff pointed out, the IFNULL() is redundant in an inner join, so I removed it.
To get all themes, even if there aren't any books in them, just use a left join (this time including the IFNULL(), if your provider's MySQL may be old):
SELECT
theme.id,
theme.main,
theme.sub,
IFNULL(COUNT(books.theme),0) AS num
FROM themes
LEFT JOIN books ON books.theme=theme.id
GROUP BY themes.id
;
EDIT 3
Ofcourse a stored value will give you the best performance - but this denormalization comes at a cost: Your Database now has the potential to become inconsistent in a user-visible way.
If you do go with this method. I strongly recommend you use triggers to auto-fill this field (and ofcourse those triggers must sit on the books table).
Be prepared to see slowed down inserts - this might ofcourse be okay, as I guess you will see a much higher rate of SELECTS than INSERTS
After reading a lot about how the JOIN statement works, with the help of
useful answer 1 and useful answer 2, I discovered I used it some 15 or 20 years ago, then I forgot about this since I never needed it again.
I made a test using the options I had:
reply with the JOIN query with IFNULL(): 0,5 seconds
reply with the JOIN query without IFNULL(): 0,5 seconds
reply using a stored value: 0,4 seconds
That DB will run on some single core old iron, so I think a 20% difference could be significant, and I decide to use stored values, updating the count every time a new book is inserted (i.e. not often).
Anyway thanks a lot for having refreshed my memory: JOIN queries will be useful somewhere else in my DB.
update
I used the JOIN method above to query the book themes, which are stored into a far smaller table, in this way:
SELECT theme.id, theme.main, theme.sub, COUNT(books.theme) as num FROMthemesJOIN books ON books.theme = theme.id GROUP BY themes.id ORDER by themes.main ASC, themes.sub ASC
It works fine, but for themes which are not in the books table I obviously don't get a 0 response, so I don't have lines like Contemporary Poetry - Etruscan (0) to show as disabled options for the sake of list completeness.
Is there a way to have back my theme.main and theme.sub?
im kinda new with mysql and i'm trying to create a kind complex database and need some help.
My db structure
Tables(columns)
1.patients (Id,name,dob,etc....)
2.visits (Id,doctor,clinic,Patient_id,etc....)
3.prescription (Id,visit_id,drug_name,dose,tdi,etc....)
4.payments (id,doctor_id,clinic_id,patient_id,amount,etc...) etc..
I have about 9 tables, all of them the primary key is 'id' and its set to autoinc.
i dont use relations in my db (cuz i dont know if it would be better or not ! and i never got really deep into mysql , so i just use php to run query's to Fitch info from one table and use that to run another query to get more info/store etc..)
for example:
if i want to view all drugs i gave to one of my patients, for example his id is :100
1-click patient name (name link generated from (tbl:patients,column:id))
2-search tbl visits WHERE patient_id=='100' ; ---> that return all his visits ($x array)
3-loop prescription tbl searching for drugs with matching visit_id with $x (loop array).
4- return all rows found.
as my database expanding more and more (1k+ record in visit table) so 1 patient can have more than 40 visit that's 40 loop into prescription table to get all his previous prescription.
so i came up with small teak where i edited my db so that patient_id and visit_id is a column in nearly all tables so i can skip step 2 and 3 into one step (
search prescription tbl WHERE patient_id=100), but that left me with so many duplicates in my db,and i feel its kinda stupid way to do it !!
should i start considering using relational database ?
if so can some one explain a bit how this will ease my life ?
can i do this redesign but altering current tables or i must recreate all tables ?
thank you very much
Yes, you should exploit MySQL's relational database capabilities. They will make your life much easier as this project scales up.
Actually you're already using them well. You've discovered that patients can have zero or more visits, for example. What you need to do now is learn to use JOIN queries to MySQL.
Once you know how to use JOIN, you may want to declare some foreign keys and other database constraints. But your system will work OK without them.
You have already decided to denormalize your database by including both patient_id and visit_id in nearly all tables. Denormalization is the adding of data that's formally redundant to various tables. It's usually done for performance reasons. This may or may not be a wise decision as your system scales up. But I think you can trust your instinct about the need for the denormalization you have chosen. Read up on "database normalization" to get some background.
One little bit of advice: Don't use columns named simply "id". Name columns the same in every table. For example, use patients.patient_id, visits.patient_id, and so forth. This is because there are a bunch of automated software engineering tools that help you understand the relationships in your database. If your ID columns are named consistently these tools work better.
So, here's an example about how to do the steps numbered 2 and 3 in your question with a single JOIN query.
SELECT p.patient_id p.name, v.visit_id, rx.drug_name, rx.drug_dose
FROM patients AS p
LEFT JOIN visits AS v ON p.patient_id = v.patient_id
LEFT JOIN prescription AS rx ON v.visit_id = rx.visit_id
WHERE p.patient_id = '100'
ORDER BY p.patient_id, v.visit_id, rx.prescription_id
Like all SQL queries, this returns a virtual table of rows and columns. In this case each row of your virtual table has patient, visit, and drug data. I used LEFT JOIN in this example. That means that a patient with no visits will have a row with NULL data in it. If you specify JOIN MySQL will omit those patients from the virtual table.
I want to list the recent activities of a user on my site without doing too many queries. I have a table where I list all the things the user did with the date.
page_id - reference_id - reference_table - created_at - updated_at
The reference_id is the ID I need to search for in the reference_table (example: comments). If I would do a SELECT on my activity table I would then have to query:
SELECT * FROM reference_table where id = reference_id LIMIT 1
An activity can be a comment, a page update or a subscription. Depending which one it is, I need to fetch different data from other tables in my database
For example if it is a comment, I need to fetch the author's name, the comment, if it is a reply I need to fetch the orignal comment username, etc.
I've looked into UNION keyword to union all my tables but I'm getting the error
1222 - The used SELECT statements have a different number of columns
and it seems rather complicated to make it work because the amount of columns has to match and none of my table has the same amount of tables and I'm not to fond of create column for the fun of it.
I've also looked into the CASE statement which also requires the amount of columns to match if I remember correctly (I could be wrong for this one though).
Does anyone has an idea of how I could list the recent activities of a user without doing too many queries?
I am using PHP and MySQL.
You probably want to split out the different activities into different tables. This will give you more flexiblity on how you query the data.
If you choose to use UNION, make sure that the you use the same number of columns in each select query that the UNION is comprised of.
EDIT:
I was down-voted for my response, so perhaps I can give a better explanation.
Split Table into Separate Tables and UNION
I recommended this technique, because it will allow you to be more explicit about the resources for which you are querying. Having a single table for inserting is convenient, but you will always have to do separate queries to join with other tables to get meaningful information. Also, you database schema will be obfuscated by a single column being a foreign key for different tables depending on the data stored in that row.
You could have tables for comment, update and subscription. These would have their own data which could be queried on individually. If, say, you wanted to look at ALL user activity, you could somewhat easily use a UNION as follows:
(SELECT 'comment', title, comment_id AS id, created FROM comment)
UNION
(SELECT 'update', title, update_id as id, created FROM update)
UNION
(SELECT 'subscription', title, subscription_id as id, created
FROM subscription)
ORDER BY created desc
This will provide you with a listing view. You could then link to the details of each type or load it on an ajax call.
You could accomplish this with the method that you are currently using, but this will actually eliminate the need for the 'reference_table' and will accomplish the same thing in a cleaner way (IMO).
The problem is that UNION should be used just to get similar recordsets together. If you try to unify two different queries (for example, with different columns being fetched) it's an error.
If the nature of the queries is different (having different column count, or data types) you'll need to make several different queries and treat them all separately.
Another approach (less elegant, I guess) would be LEFT JOINing your activities table with all the others, so you'll end up with a recordset with a lot of columns, and you'll need to check for each row which columns should be used depending on the activity nature.
Again, I'd rather stick with the first one, since the second procudes a rather sparse recorset.
With UNION you don't have to get all of the columns from each table, just as long as all of the columns have the same datatypes.
So you could do something like this:
SELECT name, comment as description
FROM Comments
UNION
SELECT name, reply as description
FROM Replies
And it wouldn't matter if Comments and Replies have the same number of columns.
This really depends on the amount of traffic on your site. The union approach is a straightforward and possibly the correct one, logically, but you'll suffer on the performance if your site is heavily loaded since the indexing of a UNIONed query is hard.
Joining might be good, but again, in terms of performance and code clarity, it's not the best of ways.
Another totally different approach is to create an 'activities' table, which will be updated with activity (in addition to the real activity, just for this purpose). In old terms of DB correctness, you should avoid this approach since it will create duplicate data on your system, I, however, found it very useful in terms of performance.
[Another side note about the UNION approach if you decide to take it: if you have difference in parameters length, you can SELECT bogus parameters on some of the unions, for example.. (SELECT UserId,UserName FROM users) UNION (SELECT 0,UserName from notes)