Good practice for handling naturally JOINed results across an application - php

I'm working on an existing application that uses some JOIN statements to create "immutable" objects (i.e. the results are always JOINed to create a processable object - results from only one table will be meaningless).
For example:
SELECT r.*,u.user_username,u.user_pic FROM articles r INNER JOIN users u ON u.user_id=r.article_author WHERE ...
will yield a result of type, let's say, ArticleWithUser that is necessary to display an article with the author details (like a blog post).
Now I need to create a table featured_items containing the columns item_type (article, file, comment, etc.) and item_id (the article's, file's or comment's id), and query it to get a list of the featured items of some type.
Assuming tables other than articles contain whole objects that do not need JOINing with other tables, I can simply pull them with a dynamically generated query like
SELECT some_table.* FROM featured_items RIGHT JOIN some_table ON some_table.id = featured_items.item_id WHERE featured_items.item_type = X
But what if I need to get a featured item from the aforementioned type ArticleWithUser? I cannot use the dynamically generated query because the syntax will not suit two JOINs.
So, my question is: is there a better practice to retrieve results that are always combined together? Maybe do the second JOIN on the application end?
Or do I have to write special code for each of those combined results types?
Thank you!

A view can be thought of as something like a table, for the faint of heart.
https://dev.mysql.com/doc/refman/5.0/en/create-view.html
Views can incorporate joins, and other views. Keep in mind that a view takes a snapshot of the columns that exist on the underlying tables at the time it is created, so columns added later with ALTER TABLE statements are not picked up by a SELECT * inside the view.
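As a rough sketch of how a view could help with the featured_items problem in the question: the JOIN is packaged once under a single name, and the dynamically generated query then treats the view like any other table. The example uses Python with SQLite standing in for MySQL so that it is self-contained; the view name article_with_user, the column article_title, the SOURCES whitelist and the featured() helper are all assumptions made for illustration, not part of the original schema.

```python
import sqlite3

# SQLite stands in for MySQL here; table, column and view names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_username TEXT, user_pic TEXT);
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, article_title TEXT,
                           article_author INTEGER);
    CREATE TABLE featured_items (item_type TEXT, item_id INTEGER);

    -- The view packages the always-needed JOIN under one name.
    CREATE VIEW article_with_user AS
        SELECT r.article_id AS id, r.article_title, u.user_username, u.user_pic
        FROM articles r
        INNER JOIN users u ON u.user_id = r.article_author;

    INSERT INTO users VALUES (1, 'alice', 'alice.png'), (2, 'bob', 'bob.png');
    INSERT INTO articles VALUES (10, 'Views in practice', 1), (11, 'Not featured', 2);
    INSERT INTO featured_items VALUES ('article', 10), ('file', 99);
""")

# Hypothetical whitelist mapping each item type to the table or view to read from.
SOURCES = {"article": "article_with_user"}


def featured(conn, item_type):
    """Return the featured rows of one type using a single generated query."""
    source = SOURCES[item_type]  # only whitelisted names are ever interpolated
    sql = (
        f"SELECT s.* FROM featured_items f "
        f"INNER JOIN {source} s ON s.id = f.item_id "
        f"WHERE f.item_type = ?"
    )
    return conn.execute(sql, (item_type,)).fetchall()


featured_articles = featured(conn, "article")
```

The generated query needs no special case for the joined type: it only has to know which table or view name belongs to each item_type.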

An old article which I consider required reading on the subject of MySQL Views:
By Peter Zaitsev
To answer your question as to whether they are widely used, they are a major part of the database developer's toolkit, and in some situations offer significant benefits, which have more to do with indexing than with the nature of views, per se.

Related

Joins vs multiple selects

I'm doing a fairly simple system where users can find computers by searching by option type. I want to search by brand, model, and "options".
Essentially I have 5 tables in this scenario-
brand
model
selection
options_group
options
The selection table is a multi-column lookup table containing:
brand_id
model_id
options_group_id
The options_group table is a lookup table with an ID for "groups of options" and an entry for each option_id.
Basically, the options_group table allows for lots of entries to have the same group of options without storing it more than once.
Right. So. I want to select a specific selection of parts that generates a table:
brand
model
options
where "options" is generated based off the options_group.
My question is this: Do I do this with multiple select statements, where I select just from the selection table first, and then use options_group to do a second select and get all of the options for each row, or do I do a join and get a table with lots of rows?
Before you suggest it, I'm not finding that any of the other answers on SO are answering this exact question.
Or is there some other, better way to do it? I read that joins are orders of magnitude faster than multiple selects, but parsing it at the end could take more time.
Use a single statement with SELECT DISTINCT to weed out duplicates. The relational calculus and relational algebra that underlie SQL automatically eliminate duplicates as part of the project operator. SQL, however, does not do so by default and requires you to use DISTINCT. Because the underlying relational theory encourages a single statement, and it fits neatly into the operators, I recommend it as a best practice.
With two tables parent (id) and child (id, parent_id, property), do: SELECT DISTINCT parent.id FROM parent JOIN child ON parent.id = child.parent_id WHERE child.property IN ('X', 'Z');
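To make the effect of DISTINCT concrete, here is a small sketch that runs the parent/child query with and without it. It uses Python with an in-memory SQLite database as a stand-in for MySQL, joins on child.parent_id, and the sample rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER, property TEXT);
    INSERT INTO parent VALUES (1), (2), (3);
    INSERT INTO child VALUES (1, 1, 'X'), (2, 1, 'Z'), (3, 2, 'Y'), (4, 3, 'Z');
""")

# The same query, with an optional DISTINCT keyword spliced in.
query = """
    SELECT {distinct} parent.id
    FROM parent
    JOIN child ON parent.id = child.parent_id
    WHERE child.property IN ('X', 'Z')
    ORDER BY parent.id
"""

# Parent 1 has two matching children, so it appears twice without DISTINCT.
with_duplicates = [row[0] for row in conn.execute(query.format(distinct=""))]
without_duplicates = [row[0] for row in conn.execute(query.format(distinct="DISTINCT"))]
```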
Since you asked for good practice, I'll throw in the fact that this doesn't have to be a db-only solution. It's good practice to cache static/lookup data (sounds like models and/or parts don't change very often) in the app layer or something like memcached, etc, and it will save you the joins and reduce your resultset size.

Issue Regarding Join Statements (Or Other Efficient Method) in MySQL

It took a while to come up with a title as I wasn't sure what to title it. Basically my problem deals with SQL queries and coming up with an efficient method to go about what I am trying to do.
To give it in an example, say we have two tables:
Table 1 (Articles): ID | ArticleName | AuthorID
Table 2 (Users): ID | AuthorName
What I am attempting to do is pull, say the last 5 articles. From here, with each article it pulls it has a while loop to query the second table to pull AuthorName where ID=AuthorID.
In essence, we have one query for the 5 articles and then another five queries to get the author names. This is further compounded on pages with 10-20 or more articles, where there's an extra 10-20+ queries.
Is there a more efficient method to join these statements together and have it pull the AuthorName for each article it pulls?
The reason for using AuthorID in table 1 is so that if usernames are changed, it doesn't break anything. Along with this, it (as far as I understand) cuts down a lot on the database storage.
I'm still somewhat new to SQL though so any ideas on how to resolve this would be much appreciated.
Thanks in advance, and if there are any questions please don't hesitate to ask!
SELECT * FROM `Articles` INNER JOIN `Users` ON `Articles`.`AuthorID`=`Users`.`ID`
There are two ways to do this. You can either do a one-shot query that JOINs in the additional authors table and presents a complete result set, or you can do a two-pass approach where you fetch all the authors in a subsequent call using SELECT ... FROM Authors WHERE ID IN (...) with the distinct identifiers from the first query.
For small lists and small tables the JOIN method will almost always be more convenient. For large lists the two-pass approach seems "dumber" but often out-performs doing the join in the database. For instance, if the number of articles is very large and the number of authors is small then the JOIN adds significant amounts of work to the large query that could be eliminated by making a small secondary query after the fact.
For this case, with fewer than one million records and small fetch sizes, go with JOIN.
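The two approaches can be sketched side by side. This is only an illustration under stated assumptions: Python with SQLite replaces PHP and MySQL, the table and column names follow the question (Articles, Users), and the function names latest_with_join and latest_two_pass are invented for the example.

```python
import sqlite3

# SQLite stands in for MySQL; schema follows the question, data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, AuthorName TEXT);
    CREATE TABLE Articles (ID INTEGER PRIMARY KEY, ArticleName TEXT, AuthorID INTEGER);
    INSERT INTO Users VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO Articles VALUES (1, 'First', 1), (2, 'Second', 2), (3, 'Third', 1);
""")


def latest_with_join(conn, limit):
    """One query: the database attaches the author name to each article."""
    sql = """
        SELECT a.ArticleName, u.AuthorName
        FROM Articles a
        INNER JOIN Users u ON u.ID = a.AuthorID
        ORDER BY a.ID DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


def latest_two_pass(conn, limit):
    """Two queries in total, however many articles are listed."""
    articles = conn.execute(
        "SELECT ArticleName, AuthorID FROM Articles ORDER BY ID DESC LIMIT ?",
        (limit,),
    ).fetchall()
    author_ids = sorted({author_id for _, author_id in articles})
    if not author_ids:
        return []
    placeholders = ", ".join("?" * len(author_ids))
    names = dict(conn.execute(
        f"SELECT ID, AuthorName FROM Users WHERE ID IN ({placeholders})",
        author_ids,
    ))
    # Assumes every AuthorID has a matching Users row.
    return [(title, names[author_id]) for title, author_id in articles]
```

Either function replaces the one-query-per-article loop described in the question; which one performs better depends on the data, as discussed above.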

Are database queries for everyone in a user list too much?

I am currently using MySQL and MyISAM.
I have a function which returns an array of user IDs of either friends or users in general in my application, and when displaying them a foreach seemed best.
Now my issue is that I only have the IDs, so I would need to nest a database call to get each user's other info (i.e. name, avatar, other fields) based on the user ID in the loop.
I do not expect hundreds of thousands of users (this is mainly for hobby learning), but how should I approach this? I would like the flexibility of placing code in a foreach for display, but since all I have is an array of IDs, am I out of luck when it comes to using a single query?
Are there any general structures or tips for displaying the list appropriately?
Is my number of queries (one per user in the list) inappropriate? (Although with pages 0..n of users, 10 at a time, it does not seem as bad, I just realized.)
You could use the IN() MySQL method, i.e.
SELECT username,email,etc FROM user_table WHERE userid IN (1,15,36,105)
That will return all rows where the userid matches those IDs. It gets less efficient the more IDs you add, but the 10 or so you mention should be just fine.
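A minimal sketch of how the ID array could be turned into one IN() query while keeping the display loop: the rows are indexed by userid, and the loop then walks the original ID list so its order is preserved. Python with SQLite stands in for PHP and MySQL here, and fetch_users, friend_ids and the sample rows are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; user_table follows the answer, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_table (userid INTEGER PRIMARY KEY, username TEXT, email TEXT);
    INSERT INTO user_table VALUES
        (1, 'ann', 'ann@example.com'),
        (15, 'bob', 'bob@example.com'),
        (36, 'cat', 'cat@example.com'),
        (105, 'dan', 'dan@example.com');
""")


def fetch_users(conn, user_ids):
    """Fetch all requested users in one query, indexed by userid."""
    if not user_ids:
        return {}  # "IN ()" with an empty list is not valid SQL
    placeholders = ", ".join("?" * len(user_ids))
    rows = conn.execute(
        f"SELECT userid, username, email FROM user_table "
        f"WHERE userid IN ({placeholders})",
        list(user_ids),
    )
    return {userid: (username, email) for userid, username, email in rows}


friend_ids = [36, 1, 105]  # e.g. the array returned by the friends function
users = fetch_users(conn, friend_ids)

# The display loop still iterates over the ID array, but without extra queries.
display = [users[user_id][0] for user_id in friend_ids if user_id in users]
```

Using placeholders rather than pasting the IDs into the SQL string keeps the query safe even if the IDs ever come from user input.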
Why couldn't you just use a left join to get all the data in 1 shot? It sounds like you are getting a list, but then you only need to get all of a single user's info. Is that right?
Remember databases are about result SETS and while generally you can return just a single row if you need it, you almost never have to get a single row then go back for more info.
For instance a list of friends might be held in a text column on a user's entry.
Whether you expect to have a small database or large database, I would consider using the InnoDB engine rather than MyISAM. It does have a little higher overhead for processing than MyISAM, however you get added benefits (as your hobby grows) such as transactions, foreign key constraints and row-level locking. With either engine you can use a JOIN, which will allow you to pull in specific data from multiple tables:
SELECT u.`id`, p.`name`, p.`avatar`
FROM `Users` AS u
LEFT JOIN `Profiles` AS p USING (`id`)
Would return id from Users and name and avatar from Profiles (where id of both tables match)
There are numerous resources online talking about database normalization, you might enjoy: http://www.devshed.com/c/a/MySQL/An-Introduction-to-Database-Normalization/

Fetching records from different tables in the database

My application has a facebook-like stream that displays updates of various types. So it will show regular posts (from the "posts" table), events (from the "events" table) and so on.
The problem is I have no idea how to fetch these records from different tables since they have different columns. Shall I query the database multiple times and then organize the data in PHP? If so, how? I'm not sure how I should approach this.
Your help is much appreciated :)
Unless the events and posts are related to each other, you'd probably query them separately, even if they show up on the same page.
You're not going to want to use JOIN just for the sake of it. Only if there is a foreign key relationship. If you don't know what that is, then you don't have one.
If the data tables are related to each other you can generally get the data back in a single query using some combination of JOINs and UNIONs. For a better answer, however, you'll have to post the structure of your data tables and a sample of what (combined) records you need for the website.
If you don't know the columns, you can get the table meta-data and find out what the columns represent and their corresponding data types.
If you know which columns, you can select from the multiple tables or even use nested selects or joins to get the data out.
Ideally you'd simply use a JOIN to obtain data from multiple tables in one query. However, without knowing more about your table schemas it's hard to provide any useful specifics. (That said, it most likely won't be possible unless you've factored this in from the beginning.)
As such, you might also want to create a generic "meta" table that provides information for each of the posts/events in a common format, and provides a means to link to the relevant table. (i.e.: It would contain the "parent" type and ID.) You could then use this meta table as the source for the "updates" stream and drill down to the appropriate content as required.
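One possible shape for such a meta table, sketched in Python with SQLite in place of PHP and MySQL. The stream table, its parent_type and parent_id columns, the TABLES whitelist and the load_stream function are assumptions for illustration. The stream is read once in display order, and the details are then fetched with one IN() query per type rather than one query per entry.

```python
import sqlite3
from collections import defaultdict

# SQLite stands in for MySQL; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT, starts_at TEXT);
    CREATE TABLE stream (id INTEGER PRIMARY KEY, parent_type TEXT,
                         parent_id INTEGER, created_at TEXT);
    INSERT INTO posts VALUES (1, 'Hello'), (2, 'Second post');
    INSERT INTO events VALUES (1, 'Launch party', '2024-05-01');
    INSERT INTO stream (parent_type, parent_id, created_at) VALUES
        ('post', 1, '2024-01-01'), ('event', 1, '2024-01-02'), ('post', 2, '2024-01-03');
""")

# Whitelist of parent types and the table each one lives in.
TABLES = {"post": "posts", "event": "events"}


def load_stream(conn, limit):
    """Return (type, record) pairs, newest first, using 1 + number-of-types queries."""
    entries = conn.execute(
        "SELECT parent_type, parent_id FROM stream "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (limit,),
    ).fetchall()

    # Group the IDs by type so each content table is queried only once.
    ids_by_type = defaultdict(list)
    for parent_type, parent_id in entries:
        ids_by_type[parent_type].append(parent_id)

    details = {}
    for parent_type, ids in ids_by_type.items():
        placeholders = ", ".join("?" * len(ids))
        cursor = conn.execute(
            f"SELECT * FROM {TABLES[parent_type]} WHERE id IN ({placeholders})", ids
        )
        columns = [column[0] for column in cursor.description]
        for row in cursor:
            record = dict(zip(columns, row))
            details[(parent_type, record["id"])] = record

    # Reassemble in stream order; entries whose content was deleted are skipped.
    return [(t, details[(t, i)]) for t, i in entries if (t, i) in details]


stream = load_stream(conn, 10)
```

Each record keeps its own columns, so the display code can branch on the type without the tables having to share a layout.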
Join the tables on user_id i.e.
Select * from posts p
left join status_updates su on p.user_id = su.user_id
limit 25;
or if your tables differ too much then play with a temporary table first
-- the cast leaves room for type names longer than 'post', such as 'status'
create table tmp_updates as
select p.user_id, p.id as update_id, cast('post' as char(16)) as update_type, p.text from posts p;
insert into tmp_updates
select su.user_id, su.id as update_id, 'status' as update_type, su.text from status_updates su;
Select * from tmp_updates
where user_id = '...'
limit 25;

What is the best approach to list a user's recent activities in PHP/MySQL?

I want to list the recent activities of a user on my site without doing too many queries. I have a table where I list all the things the user did with the date.
page_id - reference_id - reference_table - created_at - updated_at
The reference_id is the ID I need to search for in the reference_table (example: comments). If I would do a SELECT on my activity table I would then have to query:
SELECT * FROM reference_table where id = reference_id LIMIT 1
An activity can be a comment, a page update or a subscription. Depending on which one it is, I need to fetch different data from other tables in my database.
For example, if it is a comment, I need to fetch the author's name and the comment; if it is a reply, I need to fetch the original comment's username, etc.
I've looked into UNION keyword to union all my tables but I'm getting the error
1222 - The used SELECT statements have a different number of columns
and it seems rather complicated to make it work, because the number of columns has to match, none of my tables has the same number of columns, and I'm not too fond of creating columns just for the fun of it.
I've also looked into the CASE statement which also requires the amount of columns to match if I remember correctly (I could be wrong for this one though).
Does anyone have an idea of how I could list the recent activities of a user without doing too many queries?
I am using PHP and MySQL.
You probably want to split out the different activities into different tables. This will give you more flexibility in how you query the data.
If you choose to use UNION, make sure that you use the same number of columns in each SELECT query that the UNION is composed of.
EDIT:
I was down-voted for my response, so perhaps I can give a better explanation.
Split Table into Separate Tables and UNION
I recommended this technique because it will allow you to be more explicit about the resources for which you are querying. Having a single table for inserting is convenient, but you will always have to do separate queries to join with other tables to get meaningful information. Also, your database schema will be obfuscated by a single column being a foreign key for different tables depending on the data stored in that row.
You could have tables for comment, update and subscription. These would have their own data which could be queried on individually. If, say, you wanted to look at ALL user activity, you could somewhat easily use a UNION as follows:
(SELECT 'comment', title, comment_id AS id, created FROM comment)
UNION
(SELECT 'update', title, update_id as id, created FROM `update`)
UNION
(SELECT 'subscription', title, subscription_id as id, created
FROM subscription)
ORDER BY created desc
This will provide you with a listing view. You could then link to the details of each type or load it on an ajax call.
You could accomplish this with the method that you are currently using, but this will actually eliminate the need for the 'reference_table' and will accomplish the same thing in a cleaner way (IMO).
The problem is that UNION should be used just to get similar recordsets together. If you try to unify two different queries (for example, with different columns being fetched) it's an error.
If the nature of the queries is different (having different column count, or data types) you'll need to make several different queries and treat them all separately.
Another approach (less elegant, I guess) would be LEFT JOINing your activities table with all the others, so you'll end up with a recordset with a lot of columns, and you'll need to check for each row which columns should be used depending on the activity nature.
Again, I'd rather stick with the first one, since the second produces a rather sparse recordset.
With UNION you don't have to select all of the columns from each table, as long as each SELECT returns the same number of columns and the corresponding columns have compatible datatypes.
So you could do something like this:
SELECT name, comment as description
FROM Comments
UNION
SELECT name, reply as description
FROM Replies
And it wouldn't matter whether the Comments and Replies tables themselves have the same number of columns.
This really depends on the amount of traffic on your site. The UNION approach is straightforward and possibly the logically correct one, but performance will suffer if your site is heavily loaded, since it is hard for the database to use indexes across a UNIONed query.
Joining might be good, but again, in terms of performance and code clarity, it's not the best of ways.
Another totally different approach is to create an 'activities' table, which will be updated with each activity (in addition to the real activity record, just for this purpose). In old terms of DB correctness, you should avoid this approach since it will create duplicate data on your system. I, however, found it very useful in terms of performance.
[Another side note about the UNION approach, if you decide to take it: if the SELECTs return a different number of columns, you can SELECT placeholder values in some of them, for example (SELECT UserId,UserName FROM users) UNION (SELECT 0,UserName from notes)]
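The placeholder idea can be sketched for the activity list as follows. This is an illustration only: Python with SQLite stands in for PHP and MySQL, and the comments and subscriptions tables, their columns and the kind tag are invented. Each SELECT is padded with NULL so both sides return the same six columns, which avoids error 1222, and UNION ALL is used because rows from different tables cannot be duplicates of each other.

```python
import sqlite3

# SQLite stands in for MySQL; tables, columns and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT, body TEXT, created TEXT);
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, page_id INTEGER, created TEXT);
    INSERT INTO comments VALUES (1, 'ann', 'Nice page', '2024-01-02');
    INSERT INTO subscriptions VALUES (1, 7, '2024-01-03'), (2, 8, '2024-01-01');
""")

# Both SELECTs return six columns; columns a table lacks are filled with NULL.
ACTIVITY_SQL = """
    SELECT 'comment' AS kind, id, author, body, NULL AS page_id, created
    FROM comments
    UNION ALL
    SELECT 'subscription' AS kind, id, NULL, NULL, page_id, created
    FROM subscriptions
    ORDER BY created DESC
    LIMIT ?
"""

# One query returns the merged, newest-first activity list.
recent = conn.execute(ACTIVITY_SQL, (10,)).fetchall()
```

The kind column tells the display code which of the padded columns are meaningful for a given row.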
