PHP join help with two tables - php

I am just learning php as I go along, and I'm completely lost here. I've never really used join before, and I think I need to here, but I don't know. I'm not expecting anyone to do it for me but if you could just point me in the right direction it would be amazing, I've tried reading up on joins but there are like 20 different methods and I'm just lost.
Basically, I hand coded a forum, and it works fine but is not efficient.
I have board_posts (for posts) and board_forums (for forums, the categories as well as the sections).
The part I'm redoing is how I get the information for the last post for the index page. The way I set it up is that to avoid using joins, I have it store the info for latest post in the table for board_forums, so say there is a section called "Off Topic" there I would have a field for "forum_lastpost_username/userid/posttitle/posttime" which I would update when a user posts etc. But this is bad, I'm trying to grab it all dynamically and get rid of those fields.
Right now my query is just like:
SELECT * FROM board_forums WHERE forum_parent='$forum_id'
And then I have the stuff where I grab the info for that forum (name, description, etc) and all the data for the last post is there:
$last_thread_title = $forumrow["forum_lastpost_title"];
$last_thread_time = $forumrow["forum_lastpost_time"];
$lastpost_username = $forumrow["forum_lastpost_username"];
$lastpost_threadid = $forumrow["forum_lastpost_threadid"];
But I need to get rid of that, and get it from board_posts. The way it's set up in board_posts is that if it's a thread, post_parentpost is NULL; if it's a reply, then that field has the id of the thread (first post of the topic). So, I need to grab the latest post_date, see which user posted that, THEN see if post_parentpost is NULL. If it's NULL then the last post is a new thread, so I can get all the info of the title and user there. If it's not, then I need to get the info (title, id) of the first post in that thread, which can be found by seeing what post_parentpost is, looking up that ID and getting the title from it.
Does that make any sense? If so please help me out :(
Any help is greatly appreciated!!!!

Updating board_forums whenever a post or a reply is inserted is, performance-wise, not the worst idea. For displaying the index page you only have to select data from one table, board_forums. This is definitely much faster than reading a second table to get the last post's information, even when using a clever join.

You are better off just updating the stats on each action, New Post, Delete Post etc.
The other instances would not likely require any stats update (deletion of a thread would trigger a forum update, to show one less topic in the topic count).
Think about all the actions the user would do: in most cases you don't need to update any stats. Getting the counts on the fly on every page view is therefore very inefficient, and you are right to think so.

It looks like you've already done the right thing.
If you were to join, you'd do it like this:
SELECT * FROM board_forums
JOIN board_posts ON board_posts.forum_id = board_forums.id
WHERE forum_parent = '$forum_id'
The problem with that, is that it gets you every post, which is not useful (and very slow). What you would want to do is something like this
SELECT * FROM board_forums
JOIN board_posts ON board_posts.forum_id = board_forums.id ORDER BY board_posts.id desc LIMIT 1
WHERE forum_parent = '$forum_id'
except SQL doesn't work like that. You can't put ORDER BY or LIMIT inside a JOIN clause, so a plain join either returns every row, which you then have to scan in code (which sucks), or has to be rewritten around a subquery that picks the newest post per forum.
In short, don't worry. Use joins for the actual case where you do want to load all forums and all posts in one hit.
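For reference, the "latest post per forum" lookup from the question can be written with a correlated subquery. This is only a sketch under assumptions: it runs against an in-memory SQLite database from Python rather than MySQL from PHP, and the columns forum_name, post_title and post_username are made-up names (only forum_id, post_parentpost and post_date come from the thread). The second LEFT JOIN resolves a reply to its thread starter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE board_forums (id INTEGER PRIMARY KEY, forum_parent INTEGER, forum_name TEXT);
CREATE TABLE board_posts (
    id INTEGER PRIMARY KEY,
    forum_id INTEGER,
    post_parentpost INTEGER,   -- NULL for a thread starter, else the id of the thread's first post
    post_title TEXT,           -- assumed column name
    post_username TEXT,        -- assumed column name
    post_date INTEGER
);
INSERT INTO board_forums VALUES (10, 1, 'Off Topic'), (11, 1, 'Help'), (12, 1, 'Empty');
INSERT INTO board_posts VALUES
    (1, 10, NULL, 'Hello world',   'alice', 100),
    (2, 10, 1,    NULL,            'bob',   200),
    (3, 11, NULL, 'Join question', 'carol', 150);
""")

LATEST_POST_SQL = """
SELECT f.id, f.forum_name, p.post_username, p.post_date,
       COALESCE(t.id, p.id)                 AS thread_id,
       COALESCE(t.post_title, p.post_title) AS thread_title
FROM board_forums f
LEFT JOIN board_posts p
       ON p.id = (SELECT p2.id
                  FROM board_posts p2
                  WHERE p2.forum_id = f.id
                  ORDER BY p2.post_date DESC, p2.id DESC
                  LIMIT 1)
LEFT JOIN board_posts t ON t.id = p.post_parentpost
WHERE f.forum_parent = ?
ORDER BY f.id
"""

def latest_posts(parent_id):
    """Return one row per sub-forum with its newest post and that post's thread."""
    return conn.execute(LATEST_POST_SQL, (parent_id,)).fetchall()
```

Forums without any posts still come back, with NULLs in the post columns, because both joins are LEFT JOINs. Whether this beats the denormalized columns depends on indexes and table size, so it is worth measuring before switching.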

The simple solution will result in numerous queries, some optional, as you've already discovered.
The classic approach to this is to cache the results, and only retrieve it once in a while. The cache doesn't have to live long; even two or three seconds on a busy site will make a significant difference.
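A short-lived cache like that can be sketched in a few lines. This is an illustration in Python, not a drop-in for a PHP site; the class name TTLCache and the loader callback are hypothetical, and in production you would more likely use memcached, Redis or APCu.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: a value is reloaded once its entry is ttl seconds old."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so the expiry logic can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]          # still fresh: skip the expensive query
        value = loader()             # expired or missing: run the query once
        self._store[key] = (now, value)
        return value
```

Usage would look like cache.get("forum_index", load_forum_index), where load_forum_index is whatever function runs the expensive query. With a TTL of three seconds, that query runs at most once every three seconds per process, no matter how many page views arrive.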
De-normalizing the data into a table you're already reading anyway will help. This approach saves you figuring out optional queries and can be a bit of a cheap win because it's just one more update when an insert is already happening. But it shifts some data integrity to the application.
As an aside, you might be running into the recursive-query problem with your threads. Relational databases do not store hierarchical data all that well if you use a "simple" algorithm. A better way is something sometimes called 'set trees'. It's a bit hard to Google, unfortunately, so here are some links.

Related

Is this a good way to use NOT IN?

Let us Imagine the Facebook homepage. There is a list of posts, I report a post, and that post is blocked.
So, in the PHP & Mysql backend, I would do something like.
reported_posts = MySQL GROUP_CONCAT(reported_post_id) and fetch all my reported posts, store it in some cache like memcached or redis. This will give me a response with comma separated post_ids like 123, 234, 45
Fetch all homepage_posts which are NOT IN (reported_posts). This will give us all the post_ids that needs to be in the homepage other than the posts, 123, 234 and 45, as I have used NOT IN.
The issue here is that, as time goes by, reported_posts will keep growing (let's assume it grows to 1000 ids). At that point, the NOT IN (reported_posts) clause will take a huge input. Will this affect the performance of the query? What is an alternative solution to this?
I hope I could convey my doubt clearly, please let me know if it needs more clarification, I would edit as such. Thank you.
EDIT
The Reported post is not to be considered Globally, i.e. If I report the post, it should be Hidden only for me, and not for anyone else. So, it's also dependent on the account_id as well.
Assuming that reported_posts contains a list of user-specific blacklisted posts, it would be much better to do an exclusive left join and let the database handle everything:
SELECT *
FROM homepage_posts hp
LEFT JOIN
reported_posts rp
ON hp.id = rp.post_id
AND rp.account_id = 123
WHERE
rp.id IS NULL
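To try the exclusive left join out, here is a small sketch against an in-memory SQLite database in Python. The table and column names follow the query above; the sample rows and the composite index are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE homepage_posts (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE reported_posts (
    id INTEGER PRIMARY KEY,
    post_id INTEGER,
    account_id INTEGER
);
-- Assumed index: lets the join find "did this account report this post?" quickly.
CREATE INDEX idx_reported_account_post ON reported_posts (account_id, post_id);
INSERT INTO homepage_posts VALUES (45, 'a'), (123, 'b'), (234, 'c'), (500, 'd');
INSERT INTO reported_posts (post_id, account_id) VALUES (123, 123), (234, 123), (45, 999);
""")

def visible_post_ids(account_id):
    """Ids of homepage posts that this account has not reported."""
    rows = conn.execute(
        """
        SELECT hp.id
        FROM homepage_posts hp
        LEFT JOIN reported_posts rp
               ON hp.id = rp.post_id AND rp.account_id = ?
        WHERE rp.id IS NULL
        ORDER BY hp.id
        """,
        (account_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

The account filter has to stay in the ON clause. Moving it into WHERE would discard the unmatched rows and turn the query back into an inner join.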
In MySQL, the IN operator works fine if the column is indexed. If the column is not indexed, performance suffers.

Update view count, most reliable way

Hello again Stackoverflow!
I'm currently working on custom forum software and one of the things you like to see on a forum is a view counter.
All the approaches for a viewcounter that I found would just select the topic from the database, retrieve the number from a "views" column, add one and update it.
But here's my thought: if, let's say, 400 people open a topic at the exact same time, the MySQL database probably won't count all the views because it takes time for the queries to complete, and so the last person (of the 400) might overwrite the first person's view.
Of course one could argue that on a normal site this is never going to happen, but if you have ~7 people opening that topic in the same second and the server is struggling at that moment, you could have the same problem.
Is there any other good approach to count views?
EDIT
Woah, could the one who voted down specify why?
By "retrieving the number of views and adding one" I meant that I would use SELECT to retrieve the number, add one using PHP (note the tags) and write it back using UPDATE. I had no idea of the other methods specified below, that's why I asked.
If, let's say, 400 people open a topic at the exact same time, the MySQL database would count all the views, because this is exactly what databases were invented for.
All the approaches for a view counter that you have found are wrong. To update a field you don't need to retrieve it first; just update it in place:
UPDATE forum SET views = views + 1 WHERE id = ?
So something like that will work:
UPDATE tbl SET cnt = cnt+1 WHERE ...
UPDATE is guaranteed to be atomic. That means no one will be able to alter cnt between the time it is read and the time it is replaced. If you have several concurrent UPDATEs for the same row (InnoDB) or table (MyISAM), they have to wait their turn to update the data.
See Is incrementing a field in MySQL atomic?
and http://dev.mysql.com/doc/refman/5.1/en/ansi-diff-transactions.html
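As a rough illustration of the in-place increment, here is a sketch using Python's sqlite3 module instead of PHP and MySQL. The forum table and views column follow the statement above; the helper names record_view and view_count are made up. The point is that the read and the write happen inside one statement, so there is no SELECT-then-UPDATE gap for another request to slip into.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forum (id INTEGER PRIMARY KEY, views INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO forum (id) VALUES (1)")

def record_view(topic_id):
    """Increment the counter inside the database; the application never sees the old value."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE forum SET views = views + 1 WHERE id = ?", (topic_id,))

def view_count(topic_id):
    return conn.execute("SELECT views FROM forum WHERE id = ?", (topic_id,)).fetchone()[0]
```

This sketch runs the increments one after another, so it only shows the statement shape. The concurrency guarantee itself comes from the database engine's locking, as described above.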

MySQL table/query design - multiples of one item, many comments

I have a site which display images, up to 30 per page.
Users can comment on the images and these comments, or at least the first few, appear under the image if there are comments.
I have a table of image references linked to a folder on my server.
e.g.
image_id // image id
user_id // user who added
image_url // ref to image
Then a separate table for all comments
comment_id
image_id // link to images table
comm_poster_id // id of user who posted comment
Now, the question is: what is the best way to pull this information together? Ideally in one select.
I can't really make an ajax call under each image, as that would be 30 db calls per page, which would kill it. So what's the alternative/best method?
To clarify, in the select there would only ever be 1 image but there could of course be multiple comments for an image
Hope i've given enough info
EDIT To clarify, the question is what is the best way to collate all this information together for display - can I run one query which pulls all the images in on the page also somehow pulls the comments for images in if they exist.
As for how I would like the data to look... I don't know. This is the first time I've done anything like this so guidance needed if possible.
Ok, well I'm not a php expert, but I got you started on the sql side of things. I CAN help you with php, but there are others here that are more versed in it than I am.
I started this sqlFiddle for you, go have a look and you can tinker with the query to get what you want.
http://sqlfiddle.com/#!2/79ecf/1/0
From the php side, until you know how you want to display your data, it's difficult to say what your query needs to look like. I went with this for the time being:
select *
from images i
inner join comments c on i.image_id=c.image_id;
This is a VERY simple query and you will probably end up needing to add to it.
I'll assume you are using mysql as most people using php choose mysql. From my understanding there are 2 ways to connect, mysqli and pdo. PDO seems to be emerging as the preferred method, but I know nothing about it. Here are references for both. Just DO NOT USE mysql_query(), it is deprecated (and was removed entirely in PHP 7) so don't bother learning any part of it.
PDO: http://dev.mysql.com/doc/refman/5.6/en/apis-php-pdo-mysql.html
MYSQLI: http://php.net/manual/en/mysqli.query.php
Either of these should give enough of a tutorial to show you how to query your database and then loop through the results to get to your data. It is then up to you how you want to display it on your page.
Hopefully this is enough to point you in the right direction.
Good Luck!
Your easiest approach is to just join the two tables together, sorting first by image and sub-sorting by comment.
SELECT i.*, c.comment_id, c.comm_poster_id
FROM images i LEFT JOIN comments c ON i.image_id=c.image_id
WHERE {whatever where clause selects your set of 30 images}
ORDER BY i.image_id, c.comment_id
Use a LEFT JOIN or images without comments won't display.
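Once the joined rows come back, the display code has to fold them into one entry per image. Below is a sketch of that grouping step, written in Python with an in-memory SQLite database. The table and column names follow the question; the sample data and the function name images_with_comments are assumptions. The same loop translates directly to a PHP foreach over the result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
CREATE TABLE images (image_id INTEGER PRIMARY KEY, user_id INTEGER, image_url TEXT);
CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, image_id INTEGER, comm_poster_id INTEGER);
INSERT INTO images VALUES (1, 7, '/img/a.jpg'), (2, 7, '/img/b.jpg');
INSERT INTO comments VALUES (100, 1, 8), (101, 1, 9);
""")

def images_with_comments():
    """One query for the whole page, then group the flat rows by image."""
    rows = conn.execute("""
        SELECT i.image_id, i.image_url, c.comment_id, c.comm_poster_id
        FROM images i
        LEFT JOIN comments c ON i.image_id = c.image_id
        ORDER BY i.image_id, c.comment_id
    """)
    grouped = {}
    for row in rows:
        image = grouped.setdefault(
            row["image_id"], {"image_url": row["image_url"], "comments": []}
        )
        # An image without comments yields a single row with NULL comment columns.
        if row["comment_id"] is not None:
            image["comments"].append(
                {"comment_id": row["comment_id"], "poster_id": row["comm_poster_id"]}
            )
    return grouped
```

Image 2 has no comments and still shows up, with an empty comments list, which is the effect of the LEFT JOIN.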

MySQL/PHP: Using multiple sub-queries in a query selecting multiple results, is it a bad idea?

Sorry if the title is a little... Crappy. Basically I'm writing a small forum and using multiple sub-queries to select the number of threads, number of posts, and the date of the last post in a forum while grabbing the forum's information at the same time to display on the main page!
This is my query, since I suck at explaining things:
SELECT `f`.*,
    (SELECT COUNT(`id`)
     FROM `forum_threads`
     WHERE `forumId1` = `f`.`id1`
       AND `forumId2` = `f`.`id2`) AS `threadCount`,
    (SELECT COUNT(`id`)
     FROM `forum_posts`
     WHERE `forumId1` = `f`.`id1`
       AND `forumId2` = `f`.`id2`) AS `postCount`,
    (SELECT `date`
     FROM `forum_posts`
     WHERE `forumId1` = `f`.`id1`
       AND `forumId2` = `f`.`id2`
     ORDER BY `date` DESC LIMIT 1) AS `lastPostDate`
FROM `forum_forums` AS `f`
ORDER BY `f`.`position` ASC, `f`.`id1` ASC;
And am using the general foreach loop to display the results:
foreach ($forums as $forum) {
    echo $forum->name .'<br />';
    echo $forum->threadCount .'<br />';
    echo $forum->postCount .'<br />';
    echo $forum->lastPostDate .'<br />';
}
(Not exactly like that of course, but for the sake of explaining...)
Now I was wondering if that would be "bad" for performance, or if there was any better way of doing it? Assuming there are quite a few posts and threads in each forum.
I was originally storing "posts", "threads", and "lastPost" columns in the forum table itself, and was going to increment (posts = posts + 1) the values every time someone created a new thread or post. Though I had this idea as well and was wondering if it was any good. :P
I would do things a bit differently:
It seems to me that all these three fields: threadCount, postCount and lastPostDate are fields that you can maintain on a separate table, say forum_stats which will hold only 4 columns:
* forum_id
* thread_count
* post_count
* last_post_date
These columns can be updated via a trigger upon insert/update.
If you'll pay this small overhead during the update operations - you'll get a very fast query for the select (and it will remain very fast regardless the number of forums/posts/threads you have).
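A trigger-maintained stats table along those lines could look roughly like this. It is a sketch under assumptions: SQLite syntax driven from Python (MySQL trigger syntax differs slightly), a single forum_id key instead of the question's forumId1/forumId2 pair, and only the post counter and last post date are maintained. A thread counter would need an analogous trigger on forum_threads, and deletes would need their own trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, forum_id INTEGER, date INTEGER);
CREATE TABLE forum_stats (
    forum_id INTEGER PRIMARY KEY,
    thread_count INTEGER NOT NULL DEFAULT 0,   -- not maintained in this sketch
    post_count INTEGER NOT NULL DEFAULT 0,
    last_post_date INTEGER
);
CREATE TRIGGER forum_posts_after_insert AFTER INSERT ON forum_posts
BEGIN
    -- Create the stats row the first time a forum receives a post.
    INSERT INTO forum_stats (forum_id)
    SELECT NEW.forum_id
    WHERE NOT EXISTS (SELECT 1 FROM forum_stats WHERE forum_id = NEW.forum_id);
    -- Bump the counter and keep the newest date seen so far.
    UPDATE forum_stats
       SET post_count = post_count + 1,
           last_post_date = MAX(COALESCE(last_post_date, NEW.date), NEW.date)
     WHERE forum_id = NEW.forum_id;
END;
""")

def add_post(forum_id, date):
    with conn:
        conn.execute("INSERT INTO forum_posts (forum_id, date) VALUES (?, ?)", (forum_id, date))

def stats(forum_id):
    return conn.execute(
        "SELECT post_count, last_post_date FROM forum_stats WHERE forum_id = ?",
        (forum_id,),
    ).fetchone()
```

The index page then reads forum_stats with a plain single-table select, and the application code that inserts posts does not need to know the counters exist.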
Another approach (not as good, IMO):
Create the stats table and run a batch job daily (or every few hours) to update the stats. The price is that the data you display will never be fully up to date, and the job might require significant resources. You might want to run it only at night, for example, since it's heavy and you don't want it to affect the majority of your website visitors.
Usually this kind of thing is terrible from a performance perspective and you'd be better off with counter columns that you can fetch from a single row. Keeping these in sync can be annoying, but there's no retrieval cost once they're in there.
You've identified the data you're retrieving, so what you need to do next is figure out how to put that data in there in the first place. @alfasin's answer describes an example schema, and while putting it in a separate table is one idea, there's usually not too much in the way of trouble just putting them in the main one. If you're worried about locking, update in smaller batches.
One approach is to write a TRIGGER that updates the counters as records are added and removed from the various tables. This tends to hide a lot of the complexity which can be a bad thing if the logic changes often and people need to be aware of how the system works.
A simple method is to just fiddle with the columns using an additional query after you've created or removed something that would have updated them. For instance, adjusting the last-posted-date is trivial if you do it at the time a post is created.
If these counters get a bit screwy, and they will eventually, you need a method to bring them back into sync. An easy way is to write a VIEW that produces the same results your query does now, perhaps re-written to use LEFT JOIN instead, and then UPDATE against that if that's possible. This may involve using a temporary table if MySQL can't cope with updating a table with a view of itself, but that's usually not a big deal.
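One possible shape for such a resync, sketched with SQLite from Python. Note that it swaps the VIEW idea for plain correlated subqueries, which is a simplification and not the method described above. The post_count and last_post_date column names and the single forum_id key are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_forums (
    id INTEGER PRIMARY KEY,
    post_count INTEGER NOT NULL DEFAULT 0,
    last_post_date INTEGER
);
CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, forum_id INTEGER, date INTEGER);
INSERT INTO forum_forums VALUES (1, 99, 1), (2, 5, 5);   -- deliberately wrong counters
INSERT INTO forum_posts (forum_id, date) VALUES (1, 100), (1, 300);
""")

def resync_counters():
    """Recompute every forum's counters from the posts table in one statement."""
    with conn:
        conn.execute("""
            UPDATE forum_forums
               SET post_count = (SELECT COUNT(*) FROM forum_posts p
                                 WHERE p.forum_id = forum_forums.id),
                   last_post_date = (SELECT MAX(p.date) FROM forum_posts p
                                     WHERE p.forum_id = forum_forums.id)
        """)

def counters(forum_id):
    return conn.execute(
        "SELECT post_count, last_post_date FROM forum_forums WHERE id = ?", (forum_id,)
    ).fetchone()
```

Because the subqueries read forum_posts and not forum_forums itself, this form sidesteps the "table updating from a view of itself" problem. It scans every forum, so it fits a maintenance job better than a per-request call.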

hiding model data based on id's existence in another table

I've got a somewhat complicated question for you cakephp experts.
Basically, I have created a db table called "locations". Every month I will get this table sent to me in csv format from a client. Unfortunately, instead of updating this table, I will have to empty it and reimport all of the records. Unfortunately, I cannot alter this table at all.
Functionality wise, users will have the ability to look at a display of these records, and be able to choose to hide certain ones. This "hidden" attribute must be persistent and survive the month to month purging of all records.
I had all of this working yesterday. What I did was, create a separate table called location_properties (columns were: id(int), location_id(foreign key), is_hidden(boolean)). When showing these records, it would simply check to see if "is_hidden==true".
This was all well and good(AND WORKING!), but then my boss kind of gummed up the works. He told me to delete the "is_hidden" column from the table because it would be more efficient. That I should be able to simply check for the existence of the location_id to hide or show it.
It doesn't appear to be quite that simple. Anyone know how I can pull this off? I've tried everything I can think of.
Your boss is wrong.
It's more efficient to add your column than it is to delete and re-import the locations every month.
Did he just say it was less efficient, or did you do an actual benchmark to see if it harms performance too much?
At first glance I see 2 solutions:
1) add a condition array('Location.id' => 'NOT NULL')
2) change join type to right join
I hope this helps
