Database query for timeline of a social networking site [closed]


I am building a social networking site and notice board system for my college.
I want to show the 10 to 15 most recent posts from the friends of the LoggedInUser on the home page.
If I select all the rows of the timeline table from the database, it will consume more memory and also fetch posts that are not from friends of the LoggedInUser.
So what is the best way to do this?
Help me.
The table "friends" stores the UserID, FriendID and status.
The table "timeline" stores the posts of all the users with PostID and UserID.
I'm using Apache, PHP and MySQL.
Thanks...

You'll want to look into SQL joins pretty carefully before you go build a social network. I'm not gonna lie. This is a very basic question that you should be able to answer on your own (without Stack Overflow) if you're planning on building a social network of any kind.
That said, here are two queries that would work. And you might try experimenting with both and reading up on the difference between the two, as well as the performance implications of using one vs the other. I should also mention, these will give you the PostIDs you need, but you'll need to join those with a posts table (presumably) to get the actual post content. I leave that step up to you.
Also, in terms of getting the 10-15 most recent posts, you need to ask yourself a question... can you even answer that question with the tables you've listed?
Hint: You can't. Question: Why not?
Query Option 1 (gets all the post ids belonging to friends of friends.UserId = ???):
SELECT b.postid, b.userid
FROM friends AS a
INNER JOIN timeline AS b
ON a.FriendID = b.userid
WHERE a.UserID = ???
Query Option 2 (gets all the post ids belonging to friends of friends.UserId = ???):
SELECT b.postid, b.userid
FROM timeline AS b
WHERE EXISTS (SELECT * FROM friends
WHERE friends.UserID = ??? AND b.UserID = friends.FriendID)
Now, a question for you: Do both queries above return the same result? Is one preferable to the other? If so, is it always preferable or does it depend on the circumstance? Can you give me an example?
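To experiment with the two options, here is a minimal sketch that runs both against the same made-up data. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the sample rows, including the deliberately duplicated friends row, are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; both queries use plain SQL that runs on either.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends  (UserID INTEGER, FriendID INTEGER, status INTEGER);
CREATE TABLE timeline (PostID INTEGER PRIMARY KEY, UserID INTEGER);
-- Made-up sample data: user 1 is friends with 2 and 3; the (1, 2) row is duplicated on purpose.
INSERT INTO friends VALUES (1, 2, 1), (1, 2, 1), (1, 3, 1), (4, 2, 1);
INSERT INTO timeline VALUES (10, 2), (11, 3), (12, 4), (13, 2);
""")

JOIN_SQL = """
SELECT b.PostID, b.UserID
FROM friends AS a
INNER JOIN timeline AS b ON a.FriendID = b.UserID
WHERE a.UserID = ?
"""

EXISTS_SQL = """
SELECT b.PostID, b.UserID
FROM timeline AS b
WHERE EXISTS (SELECT 1 FROM friends
              WHERE friends.UserID = ? AND b.UserID = friends.FriendID)
"""

def friend_posts(sql, user_id):
    """Run one of the two queries with the user id bound as a parameter."""
    return sorted(conn.execute(sql, (user_id,)).fetchall())

join_rows = friend_posts(JOIN_SQL, 1)      # one row per matching (friend row, post) pair
exists_rows = friend_posts(EXISTS_SQL, 1)  # at most one row per post
```

With this data the JOIN returns a post once for every matching friends row, while EXISTS returns each post once, so the two only agree when (UserID, FriendID) is unique in friends.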

Add a timestamp column so you can pick out the posts that you want to show.
If you are building a social network website, NEVER select all rows (you'll face security problems, injections and so forth if you're putting it online).
If you are mixing PHP and HTML together, use an IF-ELSE statement: IF they are friends, display the posts of friends that are in your friends table with the value 1.
Hope this helps. I've also built a social network site for a school project.
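The timestamp idea can be sketched as follows. This is only an illustration: the created_at column and the index are assumptions (the question's timeline table has no timestamp), status = 1 is taken to mean an accepted friendship as suggested above, and sqlite3 stands in for MySQL so the snippet runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends  (UserID INTEGER, FriendID INTEGER, status INTEGER);
-- created_at is an assumed column; the question's timeline table has no timestamp.
CREATE TABLE timeline (PostID INTEGER PRIMARY KEY, UserID INTEGER, created_at TEXT);
CREATE INDEX idx_timeline_user_time ON timeline (UserID, created_at);
INSERT INTO friends VALUES (1, 2, 1), (1, 3, 0);   -- status 0: friendship not accepted
""")

# Made-up data: 20 posts by user 2 and one newer post by user 3.
posts = [(i, 2, f"2024-01-{i:02d} 12:00:00") for i in range(1, 21)]
posts.append((21, 3, "2024-01-25 12:00:00"))
conn.executemany("INSERT INTO timeline VALUES (?, ?, ?)", posts)

def recent_friend_posts(user_id, limit=10):
    """Newest posts written by accepted friends of user_id, limited in the database."""
    sql = """
    SELECT t.PostID, t.UserID, t.created_at
    FROM friends AS f
    INNER JOIN timeline AS t ON t.UserID = f.FriendID
    WHERE f.UserID = ? AND f.status = 1
    ORDER BY t.created_at DESC
    LIMIT ?
    """
    return conn.execute(sql, (user_id, limit)).fetchall()

home_page = recent_friend_posts(1)
```

Binding the user id and the limit as parameters, rather than concatenating them into the SQL string, is what actually guards against injection; ORDER BY with LIMIT keeps the result small without fetching the whole table.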

Related

Implementing voting or "likes" using MySQL [closed]

I am creating a system where users (who are identified by a user id number) will be allowed to vote on posts (think Reddit, StackOverflow, etc).
Users can vote a post up or not vote at all on it.
The number of votes on a given post can easily be stored within the table containing the posts.
Keeping track of who has voted, however, is a different task entirely that I'm not sure how to approach.
I was thinking I could have a table that would have two columns: user id and post id.
When they vote on a post, I add their user id and post id to that table. If they unvote, I remove that entry from the table.
EG:
User ID | Post ID
1 | 3949
1 | 4093
2 | 3949
etc...
Is this a reasonable solution?

Yes, this is a reasonably simple and easy solution to the problem. You can do the same for your comments (if you like). In your MAIN_POST table assign a post_id and use this same post_id in the other tables: comments(post_id, user_id, post_comment, comment_time) and votes(post_id, user_id, vote_status), where vote_status can be 1 for a vote up and 0 for a vote down. It will complicate the SQL queries that retrieve the data a little, but you can do it. On the Android side there are a lot of tricks to handle and present this data in the application, and you can make the votes (likes) and comments look just like Facebook (YOU for your own comments and likes and NAMES for others).

I wouldn't remove rows from the table. I understand why you would want to do that, but why lose the information? Instead, keep a +1/-1 value for each entry and then sum up the values for a post:
select sum(vote)
from uservotes
where postid = 1234;
And, I agree with Rick that you should also include the creation date/time.

Using an 'in between' or 'joining' table is a perfectly acceptable solution in this case. If relevant you could even add a timestamp to the relation and show to the user when a user has upvoted something.
Also it is important to take care of proper Indexes and Keys to have your table structure also perform properly once the dataset grows.
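As a rough sketch of the joining-table idea with keys and a timestamp, the example below follows the question's add-on-vote, delete-on-unvote approach. The table and column names (votes, voted_at) are assumptions, and sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (
    user_id  INTEGER NOT NULL,
    post_id  INTEGER NOT NULL,
    voted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, post_id)        -- one vote per user per post
);
CREATE INDEX idx_votes_post ON votes (post_id);  -- speeds up counting votes for a post
""")

def vote(user_id, post_id):
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    with conn:
        conn.execute("INSERT OR IGNORE INTO votes (user_id, post_id) VALUES (?, ?)",
                     (user_id, post_id))

def unvote(user_id, post_id):
    with conn:
        conn.execute("DELETE FROM votes WHERE user_id = ? AND post_id = ?",
                     (user_id, post_id))

def vote_count(post_id):
    return conn.execute("SELECT COUNT(*) FROM votes WHERE post_id = ?",
                        (post_id,)).fetchone()[0]

vote(1, 3949)
vote(1, 3949)   # the repeated vote is ignored thanks to the primary key
vote(2, 3949)
vote(1, 4093)
```

The composite primary key is what stops double voting; the separate index on post_id exists only to keep the per-post count cheap as the table grows.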

Counting large amounts of data [closed]

My site has a social feature. On the user profile it shows how many posts the user has, how many followers and so on.
However our database has over 100,000 rows.
It's working fine, however I'm getting very sceptical about the performance in the long run.
I was thinking of another method which I think would work best.
So basically right now it just counts the rows which the user owns in the MySQL database.
Instead of scanning through the entire MySQL table, would it be better to do the following:
Create a column in the "Users" table called "post_counts". Every time he makes a post the counter goes up. Every time the user removes a post it goes down, and so forth.
I've tried both methods, however since the DB is still small it's hard to tell if there is a performance increase.
The current method just queries SELECT * FROM table_name WHERE user = user_id; and then counts with PHP's count($fetchedRows);
Is there a better way to handle this?
[update]
Basically the feature is like Twitter followers. I'm sure Twitter doesn't count billions of rows to determine a user's follower count.

I have MySQL tables that have 70M+ rows in them. 100,000 is nothing.
But yes, I would keep the counters in a field and simply update them whenever that user posts something or deletes a post. Make sure you have good indexes.
Also, what you COUNT() makes a difference. A COUNT(*) takes less overhead than a COUNT(col) WHERE...
Use "explain" to see how different COUNT() statements are executed and how many rows they are scanning.
As in: mysql> explain select count(*) from my_table where user_id = 72 \G
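A minimal sketch of the counter-column approach, kept next to an indexed COUNT(*) for comparison. The posts table and its columns are assumptions (only post_counts is named in the question), and sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, post_counts INTEGER NOT NULL DEFAULT 0);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT);
CREATE INDEX idx_posts_user ON posts (user_id);
INSERT INTO users (id) VALUES (72);
""")

def add_post(user_id, body):
    # Insert and counter update share one transaction, so they cannot drift apart.
    with conn:
        cur = conn.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)", (user_id, body))
        conn.execute("UPDATE users SET post_counts = post_counts + 1 WHERE id = ?", (user_id,))
        return cur.lastrowid

def delete_post(post_id):
    with conn:
        row = conn.execute("SELECT user_id FROM posts WHERE id = ?", (post_id,)).fetchone()
        if row is None:
            return  # nothing to delete, leave the counter alone
        conn.execute("DELETE FROM posts WHERE id = ?", (post_id,))
        conn.execute("UPDATE users SET post_counts = post_counts - 1 WHERE id = ?", (row[0],))

def counts(user_id):
    """Return (stored counter, freshly counted rows) so the two can be compared."""
    stored = conn.execute("SELECT post_counts FROM users WHERE id = ?",
                          (user_id,)).fetchone()[0]
    counted = conn.execute("SELECT COUNT(*) FROM posts WHERE user_id = ?",
                           (user_id,)).fetchone()[0]
    return stored, counted

first = add_post(72, "hello")
add_post(72, "again")
delete_post(first)
```

Either way, the count is computed by the database; fetching every row with SELECT * and counting in PHP is the part worth dropping first.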

How to Handle a great number of rows with SQL Queries and take only small amount of data efficiently? [closed]

I'm coding a site in PHP, and the site will contain a lot of messages (like 100.000, 200.000 or more) which users will post on the site. The problem is that the messages will be stored in a table called 'site_messages' by their ID. This means the messages aren't grouped by their poster, they're ordered by their ID. If I want to fetch the messages that are posted by user 'foo', I have to query a lot of rows, and I think it will get really slow. Or if I want to fetch the messages by post subject (yes, it will contain a post subject column too, and maybe more columns), I must query the whole table again, and unfortunately that will be even less efficient. Are there any speedy solutions for that? I'm using PHP and MySQL (and phpMyAdmin).
Edit: For example, my table would look like this:
MessageID: 1
MessageContent(Varchar, this is the message that user posts): Hi I like this site. Bye!
MessagePoster(Varchar): crazyuser
MessagePostDate: 12/12/09
MessagePostedIn(Varchar, this is the post subject): How to make a pizza
MessageID: 2
MessageContent(Varchar): This site reallllly sucks.
MessagePoster(Varchar): top_lel
MessagePostDate: 12/12/09
MessagePostedIn(Varchar): Hello, I have a question!
MessageID: 3
MessageContent(Varchar): Who is the admin of this site?
MessagePoster(Varchar): creepy2000
MessagePostDate: 1/13/10
MessagePostedIn(Varchar): This site is boring.
etc...

This is what DBs (especially relational DBs) were built for! MySQL and other DBs use things like indexes to help you get to the rows you need in the most efficient way. You will be able to write queries like select * from site_messages where subject like 'News%' order by entryDateTime desc limit 10 to find the latest ten messages starting with "News", or select * from site_messages, user where user.userid='foo' and site_messages.fk_user=user.id to find all posts for a certain user, and you'll find it performs pretty well. For these, you'd probably have (amongst others) an index on the subject column and an index on the fk_user column.
Work on having a good table structure (data model). Of course if you have issues you can research DB performance and the topic of explain plans to help.
Yes, for each set of columns you want, you will query the table again. Think of a query as a set of rows. Avoid sending large numbers of rows over connections. As the other commenters have suggested, we can't help much more without more details about your tables.

Two candidates for indexing that jump right out are (Poster, PostDate) and (PostDate, Poster) to help queries in the form:
select ...
from ...
where Poster = #PID and PostDate > #Yesterday;
and
select Poster, count(*) as Postings, ...
from ...
where PostDate > #Yesterday
group by Poster;
and
select Poster, ...
from ...
where PostDate between #DayBeforeYesterday and #Yesterday;
Just keep in mind that indexing improves queries at the expense of the DML operations (insert, update, delete). If the query/DML ratio is very low, you just may want to live with the slower queries.
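To see an index on (Poster, PostDate) being picked up, here is a small sketch using the column names from the question's example table. sqlite3 stands in for MySQL, so the plan is read with SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, and the dates are rewritten in ISO form (an assumption) so that string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_messages (
    MessageID       INTEGER PRIMARY KEY,
    MessageContent  TEXT,
    MessagePoster   TEXT,
    MessagePostDate TEXT,      -- ISO dates (YYYY-MM-DD) so comparisons sort correctly
    MessagePostedIn TEXT
);
CREATE INDEX idx_poster_date ON site_messages (MessagePoster, MessagePostDate);
INSERT INTO site_messages VALUES
    (1, 'Hi I like this site. Bye!', 'crazyuser', '2009-12-12', 'How to make a pizza'),
    (2, 'This site reallllly sucks.', 'top_lel', '2009-12-12', 'Hello, I have a question!'),
    (3, 'Who is the admin of this site?', 'creepy2000', '2010-01-13', 'This site is boring.');
""")

QUERY = """
SELECT MessageID, MessageContent
FROM site_messages
WHERE MessagePoster = ? AND MessagePostDate > ?
"""
params = ("creepy2000", "2010-01-01")

rows = conn.execute(QUERY, params).fetchall()

# The last column of each plan row is a human-readable description of the step.
plan = " ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY, params))
```

If the plan text mentions idx_poster_date, the lookup goes through the index instead of scanning every message; the exact wording of the plan differs between SQLite versions and looks different again in MySQL.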

MySQL Add tag to article if the keyword is in the article title [closed]

This seems to be a bit out of my league and knowledge of MySQL (I'm not sure if it's even possible with MySQL). So if someone can help it would be much appreciated.
I would like to do the following:
Enter/define a keyword e.g. mercedes
Find all Joomla K2 articles that have that keyword in the title
Then assign my keyword mercedes as a tag to each of those articles.
Now... There are three tables with relevant columns listed:
k2_items
id, title
k2_tags
id, name, published (value 1 is assigned if tag is published)
k2_tags_xref
id, tagID, itemID
So, the query should select all items from the k2_items table that have the keyword in their title and check whether the keyword is already defined as a tag in k2_tags; if not, then create a new tag. After that, a new k2_tags_xref entry should be generated to connect the keyword tag with the K2 article item.
I haven't had a database course at my university yet, so I'm kind of out of my league with this one, and it was supposed to be just a simple touch-up for the site I'm developing.
Any help with this would be much appreciated, and I'm sure it'll help community later on.
Thanks!

The part in SQL is actually pretty easy. It is simplest if you have a unique index on k2_tags_xref(tagId, itemId). This has the database check for duplicates.
Then you need to do two things:
Find all items with the keyword in the title
Find the tagId for the keyword
This results in a query like this:
insert into k2_tags_xref(tagID, itemID)
select t.id, i.id
from k2_items i cross join
(select id from k2_tags where name = 'mercedes') t
where i.title like '%mercedes%';
You will also need to put the tag into the tags table, if it is not already there. But the above query is the basis of the SQL code.
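Both steps together might look like the sketch below. It uses the table and column names from the question, but the UNIQUE constraint on k2_tags.name is an assumption that the real K2 schema may not have, and sqlite3 stands in for MySQL (where INSERT OR IGNORE would be written INSERT IGNORE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE k2_items (id INTEGER PRIMARY KEY, title TEXT);
-- UNIQUE on name is an assumption made here so that re-adding a tag is a no-op.
CREATE TABLE k2_tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE, published INTEGER);
CREATE TABLE k2_tags_xref (id INTEGER PRIMARY KEY, tagID INTEGER, itemID INTEGER);
CREATE UNIQUE INDEX idx_xref_tag_item ON k2_tags_xref (tagID, itemID);
INSERT INTO k2_items (title) VALUES
    ('New Mercedes S-Class'), ('BMW review'), ('Used mercedes buying guide');
""")

def tag_items_by_title(keyword):
    """Create the tag if missing, then link it to every item whose title contains it."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO k2_tags (name, published) VALUES (?, 1)",
                     (keyword,))
        conn.execute("""
            INSERT OR IGNORE INTO k2_tags_xref (tagID, itemID)
            SELECT t.id, i.id
            FROM k2_items AS i
            CROSS JOIN (SELECT id FROM k2_tags WHERE name = ?) AS t
            WHERE i.title LIKE ?
        """, (keyword, "%" + keyword + "%"))

tag_items_by_title("mercedes")
tag_items_by_title("mercedes")   # second run adds nothing because of the unique indexes
tagged = conn.execute("SELECT itemID FROM k2_tags_xref ORDER BY itemID").fetchall()
tag_rows = conn.execute("SELECT COUNT(*) FROM k2_tags").fetchone()[0]
```

Without the IGNORE variant, a plain INSERT would stop with an error at the first duplicate (tagID, itemID) pair, so the unique index and the ignoring insert go together.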

Choosing data pseudo-randomly with even distribution

I'm currently working on a medium-sized web project, and I've run into a problem.
What I want to do is display a question, together with an image. I have a (global) list of questions and a (global) list of images; all questions should be asked for all images.
As far as the user can see, the question and image should be chosen at random. However, the statistics from the answers (question/image pairs) will be used for research purposes. This means that the question/image pairs must be chosen such that the answers are distributed evenly across all questions and across all images.
A user should only be able to answer a specific question/image pair one time.
I am using a MySQL database and PHP. Currently, I have three database tables:
tbl_images (image_id)
tbl_questions (question_id)
tbl_answers (answer_id, image_id, question_id, user_id)
The other columns are not related to this specific problem.
Solution 1:
Track how many times each image/question has been used (add a column in each table). Always choose the image and question that has been asked the least.
Problem:
What I'm actually interested in is distribution among questions for an image and vice versa, not that each question is even globally.
Solution 2:
Add another table, containing all question/image-pairs along with how many times it has been asked. Choose the lowest combination (first row if count column is sorted by ascending order).
Problem:
Does not enforce that the user can only answer a question once. Also does not give the appearance that the choice is random to the user.
Solution 3:
Same as #2, but store question/image/user_id in table.
Problem:
Performance issues (?), a lot of space wasted for each user. There will probably be semi-large amounts of data (thousands of questions/images and at least hundreds of users).
Solution 4:
Choose a question and image at true random from all available. With a large enough amount of answers they will be distributed evenly.
Problem:
If i add a new question or image they will not get more answers than the others and therefore never catch up. I want an even amount of statistics for all question/image-pairs.
Solution 5:
Weighted random. Choose a number of question/image pairs (say about 10-100) at true random and pick the best (as in, lowest global count) of these that the user has not answered.
Problem:
Does not guarantee that a recently added question or image gets a lot of answers quickly.
Solution #5 is probably the best one I've come up with so far.
Your input is very much appreciated, thank you for your time.

From what I understand of your problem, I would go with #1. However, you do not need a new column. I would create an SQL view instead, because it sounds like you'll need to report on things like that anyway. A view is basically a stored select that acts much like a table. Thus you would create a view for keeping the total of each question answered for each image:
DROP VIEW IF EXISTS "main"."view_image_question_count";
CREATE VIEW "view_image_question_count" AS
SELECT a.image_id, a.question_id, COUNT(*) AS "total"
FROM answer AS a
GROUP BY a.image_id, a.question_id;
Then, you need a quick and easy way to get the next best image/question combo to ask:
DROP VIEW IF EXISTS "main"."view_next_best_question";
CREATE VIEW "view_next_best_question" AS
SELECT a.*, user_id
FROM view_image_question_count a
JOIN answer USING( image_id, question_id )
JOIN question USING(question_id)
JOIN image USING(image_id)
ORDER BY total ASC;
Now, if you need to report on your image to question performance, you can do so by:
SELECT * FROM view_image_question_count
If you need the next best image+question to ask for a user, you would call:
SELECT * FROM view_next_best_question WHERE user_id != {USERID} LIMIT 1
The != {USERID} part is to prevent getting a question the user has already answered. The LIMIT optimizes to only get one.
Disclaimer: There is probably a lot that could be done to optimize this. I just wanted to post something for thought.
Also, here is the database dump I used for testing. http://pastebin.com/yutyV2GU
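One caveat with filtering on user_id != {USERID}: it only looks at rows written by other users, so a pair that both this user and someone else have answered can still come back, and pairs nobody has answered never appear. A possible alternative, sketched here with the question's table names, builds every image/question pair, counts its answers, and excludes pairs this user has answered with NOT EXISTS. The UNIQUE constraint and the sample data are assumptions, sqlite3 stands in for MySQL (where RANDOM() would be RAND()), and the cross join may need rethinking for thousands of images and questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_images    (image_id INTEGER PRIMARY KEY);
CREATE TABLE tbl_questions (question_id INTEGER PRIMARY KEY);
CREATE TABLE tbl_answers (
    answer_id INTEGER PRIMARY KEY, image_id INTEGER, question_id INTEGER, user_id INTEGER,
    UNIQUE (image_id, question_id, user_id)   -- a user answers each pair at most once
);
INSERT INTO tbl_images VALUES (1), (2);
INSERT INTO tbl_questions VALUES (1), (2);
INSERT INTO tbl_answers (image_id, question_id, user_id) VALUES
    (1, 1, 7), (1, 1, 8), (1, 2, 7), (2, 1, 8);
""")

NEXT_PAIR_SQL = """
SELECT i.image_id, q.question_id, COUNT(a.answer_id) AS total
FROM tbl_images AS i
CROSS JOIN tbl_questions AS q
LEFT JOIN tbl_answers AS a
       ON a.image_id = i.image_id AND a.question_id = q.question_id
WHERE NOT EXISTS (SELECT 1 FROM tbl_answers AS mine
                  WHERE mine.image_id = i.image_id
                    AND mine.question_id = q.question_id
                    AND mine.user_id = ?)
GROUP BY i.image_id, q.question_id
ORDER BY total ASC, RANDOM()   -- least-answered first, ties broken at random
LIMIT 1
"""

def next_pair(user_id):
    """Return the least-answered (image_id, question_id) the user has not answered, or None."""
    row = conn.execute(NEXT_PAIR_SQL, (user_id,)).fetchone()
    return None if row is None else (row[0], row[1])
```

Because pairs with zero answers sort first, a newly added question or image is served until it catches up, and the random tie-break keeps the choice from looking fixed to the user.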
