Need guidance on an efficient method to fetch all the news - PHP

Let's assume a user is following thousands of other people.
These people post news regularly, and on his/her page our user wants to see the recent news (paginated) from these people only.
What is the most efficient way to do this?
This is what I'm doing currently:
Create a table called following in the database; each follow is added here: id, user_id, following_user_id
Get a list of the user's following_user_ids
Fetch all news WHERE user_id (the news poster's id) is IN (...following_user_ids...)
For example if our user's id is 1:
SELECT `following_user_id` FROM `following` WHERE `user_id` = 1; /* This is used in the IN() below */
SELECT * FROM `news` WHERE `user_id` IN (4,11,7,...following_user_ids....) ORDER BY `id` DESC LIMIT 50 OFFSET 0
/* Of course the `user_id` is indexed in the `news` table */
But if the user is following thousands of people and the news table is huge, I'm assuming the IN (... thousands of IDs ...) will be very slow?
So, is there a more efficient way to do this?
EDIT:
In case anyone else has this issue: just stick with the IN method, it is a lot faster than the JOIN in my case.

SELECT news.*
FROM news
JOIN following ON news.user_id = following.following_user_id
WHERE following.user_id = 1

Pagination
OFFSET has a problem: as the user pages forward/backward while others are inserting new rows, he will miss stories or see the same story twice on consecutive pages.
The solution is to "remember where you left off". More: http://mysql.rjweb.org/doc.php/pagination
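A minimal sketch of "remember where you left off", assuming the feed is ordered by the news id and that the client sends back the last id it was shown (12345 here is just an illustrative value):

-- 12345 stands in for the last id shown on the previous page.
SELECT *
FROM `news`
WHERE `user_id` IN (4, 11, 7)      -- the followed users
  AND `id` < 12345                 -- continue below the last story already shown
ORDER BY `id` DESC
LIMIT 50;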
JOIN
The JOIN approach is cleaner, but not necessarily faster. In either case, the end result is a large list of stories, of which he is only interested in a page's worth. Shoveling the rest around is costly.
The fix for this is to find only the ids of the stories while finding the page's worth. Then look up (via another JOIN) the rest of the data for each story.
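A sketch of this "ids first, details second" idea, reusing the news and following tables from the question above (a 50-row page is assumed):

SELECT n.*
FROM (
    SELECT news.id
    FROM news
    JOIN following ON news.user_id = following.following_user_id
    WHERE following.user_id = 1
    ORDER BY news.id DESC
    LIMIT 50
) AS page                           -- cheap: only ids are shoveled around here
JOIN news AS n ON n.id = page.id    -- then fetch the full rows for just one page
ORDER BY n.id DESC;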
Prebuilt list
Still, if there are thousands of followed people (or millions of followers, in the case of Trump), it gets quite costly. There is a technique for making the SELECT faster at the cost of INSERTs needing to run around and store information.
Have a new 3-column table: (1) follower_id, (2) timestamp, (3) story_id. Whenever a story is posted, one row per follower is added to this table. When a follower wants the latest stories, they are sitting right in this table (or at least their ids are).
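A rough sketch of that prebuilt list; the table name follower_feed and its column names are illustrative, not from the original post:

CREATE TABLE follower_feed (
    follower_id INT NOT NULL,
    posted_at   TIMESTAMP NOT NULL,
    story_id    INT NOT NULL,
    PRIMARY KEY (follower_id, posted_at, story_id)
);

-- When user 42 posts story 1234, fan the story out to every follower:
INSERT INTO follower_feed (follower_id, posted_at, story_id)
SELECT user_id, NOW(), 1234
FROM following
WHERE following_user_id = 42;

-- Reading a follower's latest story ids is then a single range scan:
SELECT story_id
FROM follower_feed
WHERE follower_id = 1
ORDER BY posted_at DESC
LIMIT 50;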
More: http://mysql.rjweb.org/doc.php/lists

You can limit your search by using the LIMIT clause, which will need to be updated every time the user wants more information:
LIMIT [offset,] row_count;
Putting it in your example would be something like this, saving this select in a temporary table variable:
SELECT * FROM `following_user_ids` ORDER BY `id` DESC LIMIT rowcount OFFSET offset_variable;
In the social-media example, you can update the offset every time the user asks for more posts, so that the user is able to see the posts of more of the people he follows.
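A rough PHP sketch of this offset pagination, assuming an existing PDO connection in $pdo and that the followed ids were already fetched from the `following` table (the values below are illustrative):

$followedIds = [4, 11, 7];                               // illustrative followed ids
$perPage     = 20;
$page        = max(1, (int) ($_GET['page'] ?? 1));
$offset      = ($page - 1) * $perPage;

$placeholders = implode(',', array_fill(0, count($followedIds), '?'));
$sql = "SELECT * FROM `news` WHERE `user_id` IN ($placeholders)
        ORDER BY `id` DESC LIMIT $perPage OFFSET $offset"; // $perPage/$offset are ints, safe to inline
$stmt = $pdo->prepare($sql);
$stmt->execute($followedIds);
$posts = $stmt->fetchAll(PDO::FETCH_ASSOC);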

Related

Show relationship using two table JOIN, or use PHP functions?

I'm making a micro-blogging website. The users can follow each other. I have to make a stream of posts (activity stream) for the current user ($userid) based on the users the current user is following, like on Twitter. I know two ways of implementing this. Which one is better?
Tables:
Table: posts
Columns: PostID, AuthorID, TimeStamp, Content
Table: follow
Columns: poster, follower
The first way, by joining these two tables:
select `posts`.* from `posts`,`follow` where `follow`.`follower`='$userid' and
`posts`.`AuthorID`=`follow`.`poster` order by `posts`.`postid` desc
The second way is by making an array of users the $userid is following (posters), then doing a PHP implode on this array, and then doing a WHERE IN:
One thing I'd like to mention here is that I'm storing the number of users a user is following in the `following` field of the `users` table, so I'll use this number as a limit when extracting the list of posters - the 'followingList':
function followingList($userid){
    $listArray = array();

    // The stored follow count is used as the LIMIT for the next query.
    $limit = "select `following` from `users` where `userid`='$userid' limit 1";
    $limit = mysql_query($limit);
    $limit = mysql_fetch_row($limit);
    $limit = (int) $limit[0];

    // Collect the ids of everyone $userid follows.
    $sql = "select `poster` from `follow` where `follower`='$userid' limit $limit";
    $result = mysql_query($sql);
    while($data = mysql_fetch_row($result)){
        $listArray[] = $data[0];
    }

    $posters = implode("','", $listArray);
    return $posters;
}
Now I have a comma-separated list of user IDs the current $userid is following. And now, selecting the posts to make the activity stream:
$posters=followingList($userid);
$sql = "select * from `posts` where (`AuthorID` in ('$posters'))
order by `postid` desc";
Which of the two methods is better?
And can knowing the total number of following (number of users the current user is following), make things faster in the first method as it's doing in the second method?
Any other better method?
You should go all the way with the first option. Always try as much as possible to process the data on the MySQL server instead of in your PHP code. PHP will not implicitly cache the results of the operations, while MySQL will.
The most important thing is to make sure you index your data correctly. Try using "EXPLAIN" statements to make sure you have optimized your database as much as possible and use #1 to link your data together.
http://dev.mysql.com/doc/refman/5.0/en/explain.html
This will also allow you to compute statistics later, while the second method requires you to process part of the statistics yourself.
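For instance, a quick sketch of checking the join from option #1 with EXPLAIN (123 stands in for the real $userid):

EXPLAIN
SELECT posts.*
FROM posts
JOIN follow ON posts.AuthorID = follow.poster
WHERE follow.follower = 123
ORDER BY posts.PostID DESC;
-- In the output, check the `key` and `rows` columns: ideally `follow` is read via an
-- index on (follower, poster) and `posts` via an index on AuthorID or its primary key.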
The first important point is that PHP is good at building pages but very bad at managing data: everything manipulated by PHP fills memory, and there is no special behavior PHP can apply to avoid using too much memory, except crashing.
On the other side, the database's job is to analyse the relations between the tables and the real numbers involved in the query (cardinality of indexes and statistics on rows and index usage, in fact), and a lot of different mechanisms can be chosen by the engine depending on the size of the data (merge joins, temporary tables, etc.). That means you could have 256,278,242 posts and 145,268 users with 5,684 average followers, and the database's job would be to find the fastest way to give you an answer. Well, when you hit really big numbers you'll see that all databases are not equal, but that's another problem.
On the PHP side, retrieving the list of users from the first query could become very long (with a big number of followed users, let's say 15,000). Simply building the query string with 15,000 identifiers inside would take quite a big amount of memory. Transferring this big query to the SQL server would also be slow. It's definitely the wrong way.
Now be careful of the way you build your SQL request. A request is something you should be able to read from the top to the end, explaining what you really want. This will help the SQL (good) engine in choosing the right solution.
select `posts`.*
from `posts`
INNER JOIN `follow` ON `posts`.`AuthorID`=`follow`.`poster`
where `follow`.`follower`='#userid'
order by `posts`.`postid` desc
LIMIT 15
Several remarks:
I have used an INNER JOIN. I want an INNER JOIN, so let's write it; it will be easier for me to read later, and it should be the same for the query analyser.
If #userid is an int, do not use quotes. Please use ints for identifiers (this is really faster than strings). And on the PHP side cast the int: "SELECT ..." . (int) $user_id . " ORDER ...", or use a query with parameters (this is for security; a sketch follows below).
I have used a LIMIT 15; maybe an offset could be used as well if you want to show some pagination control around the posts. Let's say this query would retrieve 15,263 documents from my 5,642 followed users: you do not want, and the user does not want, to show these 15,263 documents on a web page. And knowing via $limit that the number is 15,263 is a good thing, but certainly not as a request limit. You know this number, but the database may know it as well if it has a good query analyser and some good internal statistics.
The request limit has several goals
1. Limit the size of the data transferred from the database to your PHP script
2. Limit the memory usage of your PHP script (an array with 15,263 documents containing some HTML stuff... ouch)
3. Limit the size of the final user output (and get a faster response)
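A minimal sketch of the parameterized version mentioned above, assuming a PDO connection in $pdo (the PDO object itself is an assumption, the query is the one shown earlier):

$userId = (int) $userid;                      // cast once on the PHP side, as suggested
$stmt = $pdo->prepare(
    "SELECT `posts`.*
     FROM `posts`
     INNER JOIN `follow` ON `posts`.`AuthorID` = `follow`.`poster`
     WHERE `follow`.`follower` = ?
     ORDER BY `posts`.`postid` DESC
     LIMIT 15"
);
$stmt->execute([$userId]);
$posts = $stmt->fetchAll(PDO::FETCH_ASSOC);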

get content from all dbs in a single query

I have primarily 3 major tables called news, albums and videos. I want to create a Facebook-wall kind of page wherein all the updates from all three tables would appear sorted by posted time in descending order.
Is it possible to make this kind of call in a single query to the DB?
I will briefly explain my tables:
news has id,title,content,timestamp
albums has id,title,albumdirectory,timestamp
videos has id,title,youtubelink,timestamp.
If not possible, what would be the best way to do it?
Querying all three tables at the same time for this purpose is not good practice. You can create a feed table and insert reference ids from all the other tables you want, i.e. (news, albums, videos), along with the date. Now you can query the feed table, join to the other three tables on the basis of the reference id, and display the items according to the date in the feed table. I'm using this approach and it is working well for me.
Hope this helps.
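A sketch of that feed-table idea; the table name feed and its column names are illustrative, only the news/albums/videos tables come from the question:

CREATE TABLE feed (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    item_type ENUM('news', 'albums', 'videos') NOT NULL,
    item_id   INT NOT NULL,
    posted_at TIMESTAMP NOT NULL
);

-- One row is added here each time something is posted, e.g. for news item 42:
INSERT INTO feed (item_type, item_id, posted_at) VALUES ('news', 42, NOW());

-- The wall then reads the feed and joins out only for the details it needs:
SELECT f.item_type, f.posted_at,
       COALESCE(n.title, a.title, v.title) AS title
FROM feed AS f
LEFT JOIN news   AS n ON f.item_type = 'news'   AND n.id = f.item_id
LEFT JOIN albums AS a ON f.item_type = 'albums' AND a.id = f.item_id
LEFT JOIN videos AS v ON f.item_type = 'videos' AND v.id = f.item_id
ORDER BY f.posted_at DESC
LIMIT 20;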
It depends on how that data is designed. If it is all related using some shared ID, you can make a single join query to get all the data. If that data is not related, you will need to make 3 separate calls.
If the info you want on each entity shares the same structure (i.e. id, title, timestamp) then you can do this with a UNION
SELECT * FROM (
SELECT CONCAT('news','-',id),title,`timestamp`
FROM news
UNION
SELECT CONCAT('albums','-',id),title,`timestamp`
FROM albums
UNION
SELECT CONCAT('videos','-',id),title,`timestamp`
FROM videos
) AS all_items
ORDER BY `timestamp` DESC
If the id fields are unique across the database (rather than just within each table) then you can remove the CONCATs and just return the ids.

Are database queries for everyone in a user list too much?

I am currently using MySQL and MyISAM.
I have a function which returns an array of user IDs of either friends or users in general in my application, and when displaying them a foreach seemed best.
Now my issue is that I only have the IDs, so I would need to nest a database call to get each user's other info (i.e. name, avatar, other fields) based on the user ID in the loop.
I do not expect hundreds of thousands of users (this is mainly hobby learning), but how should I handle this: keeping the flexibility of placing display code in a foreach, while not relying on ID arrays that leave me unable to use a single query?
Any general structures or tips on how I can display the list appropriately?
Is my number of queries (1 per user in the list) inappropriate? (Although with pages 0..n of users, 10 at a time, it seems not as bad, I now realize.)
You could use the IN() MySQL method, i.e.
SELECT username,email,etc FROM user_table WHERE userid IN (1,15,36,105)
That will return all rows where the userid matches those IDs. It gets less efficient the more IDs you add, but the 10 or so you mention should be just fine.
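A small sketch of building that IN() list safely from the array of IDs, assuming a PDO connection in $pdo (the table and column names are the ones from the query above):

$userIds = [1, 15, 36, 105];                    // the ids returned by your friend-list function
$placeholders = implode(',', array_fill(0, count($userIds), '?'));
$stmt = $pdo->prepare(
    "SELECT userid, username, email FROM user_table WHERE userid IN ($placeholders)"
);
$stmt->execute($userIds);
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $user) {
    // render one entry of the list here (name, avatar, etc.)
}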
Why couldn't you just use a left join to get all the data in 1 shot? It sounds like you are getting a list, but then you only need to get all of a single user's info. Is that right?
Remember databases are about result SETS and while generally you can return just a single row if you need it, you almost never have to get a single row then go back for more info.
For instance a list of friends might be held in a text column on a user's entry.
Whether you expect to have a small database or a large database, I would consider using the InnoDB engine rather than MyISAM. It does have a little higher processing overhead than MyISAM, however you get added benefits (as your hobby grows) such as transactions, foreign keys and row-level locking. Either way, a JOIN will allow you to pull in specific data from multiple tables:
SELECT u.`id`, p.`name`, p.`avatar`
FROM `Users` AS u
LEFT JOIN `Profiles` AS p USING (`id`)
Would return id from Users and name and avatar from Profiles (where id of both tables match)
There are numerous resources online talking about database normalization, you might enjoy: http://www.devshed.com/c/a/MySQL/An-Introduction-to-Database-Normalization/

using joins or multiple queries in php/mysql

Here I need help with joins.
I have two tables, say articles and users.
While displaying articles I also need to display user info like the username, etc.
So will it be better if I just use a join on the articles and users tables to fetch the user info while displaying articles, like below?
SELECT a.*,u.username,u.id FROM articles a JOIN users u ON u.id=a.user_id
Or I can do this in PHP.
First I get the articles with the SQL below:
SELECT * FROM articles
Then after I get the articles array I loop through it and get the user info inside each iteration, like below:
SELECT username, id FROM users WHERE id='".$articles->user_id."';
Which is better? I'd like an explanation of why, too.
Thank you for any reply or views
There is a third option. You could first get the articles:
SELECT * FROM articles
Then get all the relevant user names in one go:
SELECT id, username FROM users WHERE id IN (3, 7, 19, 34, ...)
This way you only have to hit the database twice instead of many times, but you don't get duplicated data. Having said that, it seems that you don't have that much duplicated data in your queries anyway so the first query would work fine too in this specific case.
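A rough sketch of this third option, assuming a PDO connection in $pdo and the articles/users tables from the question:

$articles = $pdo->query("SELECT * FROM articles")->fetchAll(PDO::FETCH_ASSOC);

// Fetch every relevant username in a single IN() query.
$authorIds    = array_values(array_unique(array_column($articles, 'user_id')));
$placeholders = implode(',', array_fill(0, count($authorIds), '?'));
$stmt = $pdo->prepare("SELECT id, username FROM users WHERE id IN ($placeholders)");
$stmt->execute($authorIds);
$usernames = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);          // id => username map

foreach ($articles as $article) {
    $author = $usernames[$article['user_id']] ?? 'unknown';
    // display $article together with $author
}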
I'd probably choose your first option in this specific case because of its simplicity, but if you need more information for each user then go with the third option. I'd probably not choose your second option as it is neither the fastest nor the simplest.
It depends how much data the queries are returning - if you'll be getting a lot of duplicate data (i.e. one user has written many articles) you are better off doing the queries separately.
If you don't have a lot of duplicated data, joins are always preferable as you only have to make one visit to the database server.
The first approach is better if applicable/possible:
SELECT a.*, u.username, u.id
FROM articles a
JOIN users u ON u.id = a.user_id
You have to write less code
There is no need to run multiple queries
Using joins is ideal when possible
Get the articles with one query, then get each username once and not every time you display it (cache them in an array or whatever).

What is the best approach to list a user's recent activities in PHP/MySQL?

I want to list the recent activities of a user on my site without doing too many queries. I have a table where I list all the things the user did with the date.
page_id - reference_id - reference_table - created_at - updated_at
The reference_id is the ID I need to search for in the reference_table (example: comments). If I did a SELECT on my activity table, I would then have to query:
SELECT * FROM reference_table where id = reference_id LIMIT 1
An activity can be a comment, a page update or a subscription. Depending which one it is, I need to fetch different data from other tables in my database
For example if it is a comment, I need to fetch the author's name and the comment, and if it is a reply I need to fetch the original comment's username, etc.
I've looked into the UNION keyword to union all my tables, but I'm getting the error
1222 - The used SELECT statements have a different number of columns
and it seems rather complicated to make it work because the number of columns has to match, none of my tables has the same number of columns, and I'm not too fond of creating columns just for the fun of it.
I've also looked into the CASE statement, which also requires the number of columns to match, if I remember correctly (I could be wrong on this one though).
Does anyone has an idea of how I could list the recent activities of a user without doing too many queries?
I am using PHP and MySQL.
You probably want to split out the different activities into different tables. This will give you more flexibility in how you query the data.
If you choose to use UNION, make sure that you use the same number of columns in each select query that the UNION is comprised of.
EDIT:
I was down-voted for my response, so perhaps I can give a better explanation.
Split Table into Separate Tables and UNION
I recommended this technique because it will allow you to be more explicit about the resources for which you are querying. Having a single table for inserting is convenient, but you will always have to do separate queries to join with other tables to get meaningful information. Also, your database schema will be obfuscated by a single column being a foreign key for different tables depending on the data stored in that row.
You could have tables for comment, update and subscription. These would have their own data which could be queried on individually. If, say, you wanted to look at ALL user activity, you could somewhat easily use a UNION as follows:
(SELECT 'comment', title, comment_id AS id, created FROM comment)
UNION
(SELECT 'update', title, update_id as id, created FROM `update`)
UNION
(SELECT 'subscription', title, subscription_id as id, created
FROM subscription)
ORDER BY created desc
This will provide you with a listing view. You could then link to the details of each type or load it on an ajax call.
You could accomplish this with the method that you are currently using, but this will actually eliminate the need for the 'reference_table' and will accomplish the same thing in a cleaner way (IMO).
The problem is that UNION should be used just to get similar recordsets together. If you try to unify two different queries (for example, with different columns being fetched) it's an error.
If the nature of the queries is different (having different column count, or data types) you'll need to make several different queries and treat them all separately.
Another approach (less elegant, I guess) would be LEFT JOINing your activities table with all the others, so you'll end up with a recordset with a lot of columns, and you'll need to check for each row which columns should be used depending on the activity nature.
Again, I'd rather stick with the first one, since the second produces a rather sparse recordset.
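To illustrate the LEFT JOIN alternative just described: a sketch, where the comments/pages/subscriptions column names are assumptions and only the activity columns come from the question.

SELECT a.reference_table, a.created_at,
       c.author_name,             -- filled only for comment rows
       p.title AS page_title,     -- filled only for page-update rows
       s.plan  AS subscription_plan
FROM activity AS a
LEFT JOIN comments      AS c ON a.reference_table = 'comments'      AND c.id = a.reference_id
LEFT JOIN pages         AS p ON a.reference_table = 'pages'         AND p.id = a.reference_id
LEFT JOIN subscriptions AS s ON a.reference_table = 'subscriptions' AND s.id = a.reference_id
ORDER BY a.created_at DESC
LIMIT 20;
-- Each row has NULLs in the columns that do not apply to its activity type,
-- which is why the result set is described as sparse.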
With UNION you don't have to select all of the columns from each table, as long as the selected columns match in number and have compatible datatypes.
So you could do something like this:
SELECT name, comment as description
FROM Comments
UNION
SELECT name, reply as description
FROM Replies
And it wouldn't matter whether Comments and Replies have the same number of columns overall.
This really depends on the amount of traffic on your site. The union approach is a straightforward and possibly the correct one, logically, but you'll suffer on the performance if your site is heavily loaded since the indexing of a UNIONed query is hard.
Joining might be good, but again, in terms of performance and code clarity, it's not the best of ways.
Another totally different approach is to create an 'activities' table, which will be updated with activity (in addition to the real activity, just for this purpose). In old terms of DB correctness, you should avoid this approach since it will create duplicate data on your system, I, however, found it very useful in terms of performance.
Another side note about the UNION approach, if you decide to take it: if the column counts differ, you can SELECT bogus values in some of the unions, for example: (SELECT UserId, UserName FROM users) UNION (SELECT 0, UserName FROM notes)
