Creating a feed for a social networking site - php

I am making a social networking site and can't decide on the best way to pull data from various tables to display in a feed. Every time something is stored, a timestamp is stored with it, so I was wondering what the best way is to retrieve data from several different tables, ordered by timestamp and limited to 20 results per page. Ideally I would like MySQL to query all of the different tables and order and limit the results for me, but because the tables are not all necessarily related, and different data needs to be returned depending on what each table is for, I don't think this is going to be possible. I can query each table individually, of course, but then how do I sort the results and split them into pages so that all of the different entities appear together in one ordered list? The server-side language I use is PHP with the CodeIgniter framework.
Anyone got any ideas?

Can you establish a common format for what's returned out of all the separate tables? So for example, you would write a query that got back FeedTitle, FeedSummary, and Timestamp:
select top 20 *
from (
select a.Title as FeedTitle,
a.A + a.B + a.C as FeedSummary,
a.Timestamp as TimeStamp
from a
union all
select b.Name + ' married ' + b.Spouse as FeedTitle,
b.AtPlace as FeedSummary,
b.TimeStamp as TimeStamp
from b
) as allFeeds
order by TimeStamp desc
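I'm not sure of the exact MySQL syntax; this will work in SQL Server and should be very similar. In MySQL you would drop top 20 and end the query with limit 20 instead, and join strings with CONCAT() rather than +, which MySQL treats as numeric addition. It's just pseudocode anyway; the idea is that you'd do some of your application logic in the database in order to hopefully gain a performance boost (so you don't have to sort through lots of data in PHP).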
Another approach would be to return the last 20 from each table and let the client side sort through them. So send them all to the UI and let jQuery code display the top 20, then let the users select the type of feed dynamically, and they'd see the top 20 stories in any one type or any combination of types.
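If the merging ends up on the server rather than in jQuery, the same "fetch the newest rows from each table, then sort them together" idea can be sketched in PHP. This is only a minimal sketch: the arrays stand in for per-table query results already mapped to a common format, and the keys type, title and timestamp as well as the function name buildFeedPage are assumptions, not anything from the question.

```php
<?php
// Hypothetical rows, shaped as if each table had already been queried
// for its newest entries and mapped to a common format.
$posts = [
    ['type' => 'post',  'title' => 'Hello world', 'timestamp' => 1700000300],
    ['type' => 'post',  'title' => 'Second post', 'timestamp' => 1700000100],
];
$photos = [
    ['type' => 'photo', 'title' => 'Beach',       'timestamp' => 1700000200],
];
$events = [
    ['type' => 'event', 'title' => 'Meetup',      'timestamp' => 1700000400],
];

/**
 * Merge per-table result sets into one feed, newest first,
 * and return a single page of it.
 */
function buildFeedPage(array $sources, int $page = 1, int $perPage = 20): array
{
    // Flatten the list of result sets into one list of rows.
    $feed = array_merge(...$sources);

    // Sort by timestamp, descending.
    usort($feed, function (array $a, array $b): int {
        return $b['timestamp'] <=> $a['timestamp'];
    });

    // Cut out the requested page.
    return array_slice($feed, ($page - 1) * $perPage, $perPage);
}

$feedPage = buildFeedPage([$posts, $photos, $events], 1, 20);
foreach ($feedPage as $item) {
    echo date('Y-m-d H:i:s', $item['timestamp']), ' [', $item['type'], '] ', $item['title'], "\n";
}
```

Note that for page p to be correct, each table has to contribute at least its newest p × 20 rows, so this approach gets more expensive the deeper the user pages.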

Related

mysql querying a database

I have a question about a better way of doing this, as I have a very large database with a lot of symbols (hence a, aa, etc.).
I would like to know whether I can query every table in one line; descending order would be nice too. Otherwise I will have to type thousands of unions, and that will be a pain later because the database changes often: a table is erased and another takes its place.
Every table has a Date column, and I would like to search based on a date.
Thank you in advance.
I.E.
SELECT * from a where Date = '2017-07-31' union
SELECT * from aa where Date = '2017-07-31' union
SELECT * from aaap where Date = '2017-07-31' union
SELECT * from aabvf where Date = '2017-07-31' union
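I mean, you COULD....
SELECT * FROM a,aa,aaap,aabvf WHERE date='2017-07-31'
Be aware, though, that listing tables like this is a cross join rather than a union, and MySQL will reject the unqualified date column as ambiguous, so as written this does not do what the UNION version does.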
Ahmed helped me out. As to why my data structure is like that: if you have a better suggestion, I'm open to it. So, why?
Basically, I have data in the form of symbols, e.g. A and AA, which are stock tickers.
Each one has dates that are unique keys to the open, high, low and various other stock measurements.
As for why I would want to grab just a single date: it's basically the top date, or "today", to display and chart, so that I can do various other things with the data.
If you have another method of storing it, I'm open.
I have written a Java program (I'm not normally a web developer) that mines the data in that form and stores it as I described. I could change that if you have a better way; I would love to hear it. Also, if you have an opinion on how to store data faster with MySQL, I would love to hear that too. Currently I have a few hundred threads that store data, and each thread handles one symbol. It creates a table named after the ticker if one doesn't exist and puts the data into separate columns: date (unique key), open, high, etc. It also performs various other operations on the incoming data and stores the results. Thank you for the answer, and thank you if you have a better method!
P.S. Sorry, I didn't mean chart. I display the top date as a table with the corresponding data attached!
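If the one-table-per-ticker layout stays, the thousands of unions do not have to be typed by hand; they can be generated from a list of table names. The following is a rough sketch only: the function name buildUnionQuery and the column list (Date, Open, High, Low) are assumptions, and the table list would have to come from somewhere such as SHOW TABLES.

```php
<?php
/**
 * Build one UNION ALL query over many per-ticker tables.
 * Returns the SQL text and the list of values to bind to its "?" marks.
 * The column list (Date, Open, High, Low) is an assumption.
 */
function buildUnionQuery(array $tables, string $date): array
{
    $parts  = [];
    $params = [];
    foreach ($tables as $table) {
        // Table names cannot be bound as parameters, so only accept
        // plain alphanumeric names before quoting them with backticks.
        if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
            throw new InvalidArgumentException("Unexpected table name: $table");
        }
        $parts[]  = "(SELECT '$table' AS Symbol, `Date`, `Open`, `High`, `Low` "
                  . "FROM `$table` WHERE `Date` = ?)";
        $params[] = $date; // one bound value per sub-select
    }
    $sql = implode("\nUNION ALL\n", $parts) . "\nORDER BY Symbol DESC";
    return [$sql, $params];
}

// Hypothetical table list; in practice it could be read from the database.
[$sql, $params] = buildUnionQuery(['a', 'aa', 'aaap', 'aabvf'], '2017-07-31');
echo $sql, "\n";
```

A single table with a symbol column and a unique key on (symbol, date) would probably make all of this unnecessary, since one plain WHERE clause on the date would then cover every ticker.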

MySQL performance issue with large tables

I've been asked to develop web software that stores readings from heat metering devices and divides the heating expenses among all the flat owners. I chose to work in PHP with MySQL and the MyISAM engine.
I was not used to working with large data sets, so I simply created a logical database where we have:
a table for buildings, with an id as indexed primary key (we now have ~1200 buildings in the db)
a table with all the flats in all the buildings, with an id as indexed primary key and the building_id to link to the building (around 32k+ flats in total)
a table with all the heaters in all the flats, with an id as indexed primary key and the flat_id to link to the flat (around 280k+ heaters)
a table with all the reading values, with the timestamp of the reading, an id as primary key and the heater_id to link to the heater (around 2.7M+ readings now)
There is also a separate table, linked to the building, which stores the start date and end date between which the expenses have to be divided.
When I need all the data for a building, my approach is to get raw data from the DB with a single query, process it in PHP, then make the next query.
So here is roughly the sequence of operations I used:
get the starting and end date from the specific table with a single query
store the dates in a php variable
get all the flats of the building: SELECT * FROM flats where building_id=my_building_id
loop over all the flats in PHP with a while loop
on each iteration of the while loop, run a query to get all the heaters of that specific flat: SELECT * FROM heaters where flat_id=my_flat_id
loop over all the heaters with an inner PHP while loop
on each iteration of this inner loop, get the last reading value of that specific heater: SELECT * FROM reading_values where heater_id=my_heater_id AND data<my_data
Now the problem is that I have serious performance issues.
Before someone points it out: I cannot fetch only the reading values and skip the first six steps of the list above, since I need to print bills, and each bill has to show all the flat information and all the heater information, so I have to get all the flat and heater data anyway.
So I'd like some suggestions on how to improve the script's performance:
all the tables are indexed, but do I need to add an index somewhere else?
would using a single query with subqueries, instead of several queries spread through the PHP code, improve performance?
any other suggestions?
I haven't included specific code as I think it would have made the question too heavy, but I can add some if asked.
Some:
Don't use 'SELECT *' if you can avoid it -> Just get the fields you really need
I didn't test it in your particular case, but usually a single query which joins all three tables should perform much better than looping through results with PHP.
If you need to loop for some reason, then at least use mysql prepared statements, which again should increase performance given the amount of queries :)
Hope it helps!
Regards
EDIT:
just to exemplify an alternative query, not sure if this suits your specific needs and without testing it (which probably means I forgot something):
SELECT
a.field1,
b.field2,
c.field3,
d.field4
FROM heaters a
JOIN reading_values b ON (b.heater_id = a.heater_id)
JOIN flats c ON (c.flat_id = a.flat_id)
JOIN buildings d ON (d.building_id = c.building_id)
WHERE
a.heater_id = my_heater_id
AND b.date < my_date
GROUP BY a.heater_id
EDIT 2
Following your comments, I modified the query so that it retrieves the information as you want it: Given a building id, it will list all the heaters and their newest reading value according to a given date:
SELECT
a.name,
b.name,
c.name,
d.reading_value,
d.created
FROM buildings a
JOIN flats b ON (b.building_id = a.building_id)
JOIN heaters c ON (c.flat_id = b.flat_id)
JOIN reading_values d ON (d.reading_value_id = (SELECT reading_value_id FROM reading_values WHERE created <= my_date AND heater_id = c.heater_id ORDER BY created DESC LIMIT 1))
WHERE
a.building_id = my_building_id
GROUP BY c.heater_id
It should be interesting to know how it performs in your environment.
Regards
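Since the bills still need flat and heater details, the flat rows returned by one joined query can be regrouped in PHP instead of being fetched flat by flat. This is only a sketch of that regrouping step: the sample rows and the column aliases (flat_id, flat_name, heater_id, heater_name, reading_value) are assumptions, and the query would need such aliases because it selects three columns that are all called name.

```php
<?php
// Hypothetical rows, shaped like the result of the joined query above
// with column aliases added (the alias names are assumptions).
$rows = [
    ['flat_id' => 1, 'flat_name' => 'Flat 1A', 'heater_id' => 10, 'heater_name' => 'Kitchen', 'reading_value' => 120.5],
    ['flat_id' => 1, 'flat_name' => 'Flat 1A', 'heater_id' => 11, 'heater_name' => 'Bedroom', 'reading_value' => 80.0],
    ['flat_id' => 2, 'flat_name' => 'Flat 2B', 'heater_id' => 20, 'heater_name' => 'Lounge',  'reading_value' => 200.25],
];

/** Regroup flat join rows into flats, each with its list of heaters. */
function groupByFlat(array $rows): array
{
    $flats = [];
    foreach ($rows as $row) {
        $flatId = $row['flat_id'];
        // First time this flat is seen: create its entry.
        if (!isset($flats[$flatId])) {
            $flats[$flatId] = ['name' => $row['flat_name'], 'heaters' => []];
        }
        // Attach the heater and its latest reading to the flat.
        $flats[$flatId]['heaters'][$row['heater_id']] = [
            'name'    => $row['heater_name'],
            'reading' => $row['reading_value'],
        ];
    }
    return $flats;
}

$flats = groupByFlat($rows);
foreach ($flats as $flat) {
    echo $flat['name'], ': ', count($flat['heaters']), " heater(s)\n";
}
```

The loop runs once over the result set, so the number of database round trips stays at one per building no matter how many flats and heaters it has.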

Show relationship using two table JOIN, or use PHP functions?

I'm making a micro-blogging website where users can follow each other. I have to build a stream of posts (an activity stream) for the current user ( $userid ) based on the users the current user is following, like on Twitter. I know two ways of implementing this. Which one is better?
Tables:
Table: posts
Columns: PostID, AuthorID, TimeStamp, Content
Table: follow
Columns: poster, follower
The first way, by joining these two tables:
select `posts`.* from `posts`,`follow` where `follow`.`follower`='$userid' and
`posts`.`AuthorID`=`follow`.`poster` order by `posts`.`postid` desc
The second way is by making an array of users the $userid is following (posters), then doing php implode on this array, and then doing where in:
One thing I'd like to mention here is that I'm storing the number of users a user is following in the `following` field of the `user` table, so I'll use this number as a limit when extracting the list of posters - the 'followingList':
function followingList($userid){
    $listArray=array();
    $limit="select `following` from `users` where `userid`='$userid' limit 1";
    $limit=mysql_query($limit);
    $limit=mysql_fetch_row($limit);
    $limit= (int) $limit[0];
    $sql="select `poster` from `follow` where `follower`='$userid' limit $limit";
    $result=mysql_query($sql);
    while($data = mysql_fetch_row($result)){
        $listArray[] = $data[0];
    }
    $posters=implode("','",$listArray);
    return $posters;
}
Now I have a comma-separated list of user IDs the current $userid is following. And now selecting the posts to make the activity stream:
$posters=followingList($userid);
$sql = "select * from `posts` where (`AuthorID` in ('$posters'))
order by `postid` desc";
Which of the two methods is better?
And can knowing the total number of following (number of users the current user is following), make things faster in the first method as it's doing in the second method?
Any other better method?
You should go all the way with the first option. Always try as much as possible to process the data on the mysql server instead of in your PHP code. PHP will not implicitly cache the results of the operations while MySQL will do it.
The most important thing is to make sure you index your data correctly. Try using "EXPLAIN" statements to make sure you have optimized your database as much as possible and use #1 to link your data together.
http://dev.mysql.com/doc/refman/5.0/en/explain.html
This will allow you later to compute statistics also, while the second method requires you to process a part of the statistics.
The first important point is that PHP is good at building pages but very bad at managing data: everything manipulated by PHP fills memory, and PHP has no special behavior to prevent using too much memory, except crashing.
On the other side, the database's job is to analyse the relations between the tables and the real numbers involved in the query (in fact, the cardinality of indexes and statistics on rows and index usage), and the engine can choose among many different mechanisms depending on the size of the data (merge joins, temporary tables, etc.). That means you could have 256.278.242 posts and 145.268 users with an average of 5.684 followers, and the database's job would be to find the fastest way to give you an answer. Well, when you hit really big numbers you'll see that not all databases are equal, but that's another problem.
On the PHP side, retrieving the list of users from the first query could become very slow with a big number of followed users, let's say 15.000. Simply building the query string with 15.000 identifiers inside would take quite a big amount of memory. Transferring this new query to the SQL server would also be slow. It's definitely the wrong way.
Now be careful of the way you build your SQL request. A request is something you should be able to read from the top to the end, explaining what you really want. This will help the SQL (good) engine in choosing the right solution.
select `posts`.*
from `posts`
INNER JOIN `follow` ON `posts`.`AuthorID`=`follow`.`poster`
where `follow`.`follower`='#userid'
order by `posts`.`postid` desc
LIMIT 15
Several remarks:
I have used an INNER JOIN. I want an INNER JOIN, so let's write it; it will be easier for me to read later, and it should be the same for the query analyser.
If #userid is an int, do not use quotes. Please use ints for identifiers (this is really faster than strings). And on the PHP side cast to int, "SELECT ..." . (int) $user_id . " ORDER ...", or use a query with parameters (this is for security).
I have used a LIMIT 15; an offset could be used as well if you want to show some pagination control around the posts. Let's say this query would retrieve 15.263 documents from my 5.642 followed users: you do not want, and the user does not want, to show these 15.263 documents on a web page. And knowing with $limit that the number is 15.263 is a good thing, but certainly not for a request limit. You know this number, but the database may know it as well if it has a good query analyser and some good internal statistics.
The request limit has several goals:
1. Limit the size of the data transferred from the database to your PHP script
2. Limit the memory usage of your PHP script (an array with 15.263 documents containing some HTML stuff... ouch)
3. Limit the size of the final user output (and get a faster response)
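The remarks about int identifiers, parameters and LIMIT with an offset can be put together in a small helper. This is a sketch under assumptions: the function name buildStreamQuery and the page-number convention (pages start at 1) are made up for illustration, and the database connection itself is left out.

```php
<?php
/**
 * Build the activity-stream query for one page of results.
 * Returns the SQL text and the parameters to bind to it.
 */
function buildStreamQuery(int $userId, int $page = 1, int $perPage = 15): array
{
    // LIMIT and OFFSET are computed from ints only, so they are safe to inline.
    $perPage = max(1, $perPage);
    $offset  = (max(1, $page) - 1) * $perPage;

    $sql = "SELECT `posts`.*
            FROM `posts`
            INNER JOIN `follow` ON `posts`.`AuthorID` = `follow`.`poster`
            WHERE `follow`.`follower` = ?
            ORDER BY `posts`.`PostID` DESC
            LIMIT $perPage OFFSET $offset";

    // The user ID travels as a bound parameter, never as part of the SQL text.
    return [$sql, [$userId]];
}

[$sql, $params] = buildStreamQuery(42, 3);
// With a PDO connection in $pdo (not created here), this would run as:
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
echo $sql, "\n";
```

The int type hints do the casting that the answer asks for, so a non-numeric user ID or page number is rejected before any SQL is built.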

Are database queries for everyone in a user list too much?

I am currently using MySQL and MyISAM.
I have a function that returns an array of user IDs, either friends or users in general in my application, and a foreach seemed the best way to display them.
Now my issue is that I only have the IDs, so I would need to nest a database call in the loop to get each user's other info (i.e. name, avatar, other fields) based on the user ID.
I do not expect hundreds of thousands of users (this is mainly for hobby learning), but how should I do this? I'd like to keep the flexibility of placing display code in a foreach, yet relying on ID arrays seems to rule out using a single query.
Any general structures or tips on how I can display the list appropriately?
Is my number of queries (one per user in the list) inappropriate? (Although with pages 0..n of users, 10 at a time, it doesn't seem as bad, I just realized.)
You could use the IN() MySQL method, i.e.
SELECT username,email,etc FROM user_table WHERE userid IN (1,15,36,105)
That will return all rows where the userid matches those ID's. It gets less efficient the more ID's you add but the 10 or so you mention should be just fine.
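Starting from an array of IDs, the IN() list can be generated with one placeholder per ID, and the rows that come back can be indexed by user ID so the display foreach stays as it is. This is a minimal sketch: the function names buildUsersQuery and indexByUserId are made up, the sample rows stand in for a real fetch, and the table and column names simply follow the query above.

```php
<?php
/**
 * Build a single SELECT for a list of user IDs using one "?" per ID.
 * Table and column names follow the query above.
 */
function buildUsersQuery(array $userIds): array
{
    // Cast every ID to int and drop duplicates.
    $ids = array_values(array_unique(array_map('intval', $userIds)));
    if ($ids === []) {
        return [null, []]; // Nothing to look up; IN () is not valid SQL.
    }
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $sql = "SELECT userid, username, email FROM user_table WHERE userid IN ($placeholders)";
    return [$sql, $ids];
}

/** Index fetched rows by user ID so a foreach over the ID list can look them up. */
function indexByUserId(array $rows): array
{
    return array_column($rows, null, 'userid');
}

[$sql, $params] = buildUsersQuery([1, 15, 36, 105]);
echo $sql, "\n";

// Hypothetical rows standing in for $stmt->fetchAll(PDO::FETCH_ASSOC).
$users = indexByUserId([
    ['userid' => 15, 'username' => 'bob',   'email' => 'bob@example.com'],
    ['userid' => 1,  'username' => 'alice', 'email' => 'alice@example.com'],
]);
foreach ([1, 15] as $id) {
    echo $id, ': ', $users[$id]['username'], "\n";
}
```

Indexing by ID matters because the database does not promise to return rows in the order the IDs were listed, while the foreach over the original array keeps the order you wanted.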
Why couldn't you just use a left join to get all the data in 1 shot? It sounds like you are getting a list, but then you only need to get all of a single user's info. Is that right?
Remember databases are about result SETS and while generally you can return just a single row if you need it, you almost never have to get a single row then go back for more info.
For instance a list of friends might be held in a text column on a user's entry.
Whether you expect to have a small database or a large one, I would consider using the InnoDB engine rather than MyISAM. It does have a little higher processing overhead than MyISAM, but you get added benefits as your hobby grows, such as transactions, foreign key constraints and row-level locking. Either engine supports JOIN, which will allow you to pull in specific data from multiple tables:
SELECT u.`id`, p.`name`, p.`avatar`
FROM `Users` AS u
LEFT JOIN `Profiles` AS p USING (`id`)
Would return id from Users and name and avatar from Profiles (where id of both tables match)
There are numerous resources online talking about database normalization, you might enjoy: http://www.devshed.com/c/a/MySQL/An-Introduction-to-Database-Normalization/

SQL query to collect entries from different tables - need an alternate to UNION

I'm running a SQL query to get basic details from a number of tables, sorted by the last-update date field. It's terribly tricky, and I'm wondering whether there is an alternative to using the UNION clause. I'm working in PHP and MySQL.
Actually, I have a few tables containing news, articles, photos, events, etc., and need to collect all of them in one query to show a simple "what's newly added on the website" kind of thing.
Maybe do it in PHP rather than MySQL - if you want the latest n items, then fetch the latest n of each of your news items, articles, photos and events, and sort in PHP (you'll need the last n of each obviously, and you'll then trim the dataset in PHP). This is probably easier than combining those with UNION given they're likely to have lots of data items which are different.
I'm not aware of an alternative to UNION that does what you want, and hopefully those fetches won't be too expensive. It would definitely be wise to profile this though.
If you use JOIN in your query, you can select data from different tables that are related by foreign keys.
You can look at this from another angle: do you absolutely need up-to-date information (so that the moment someone enters new information, it appears)?
If not, you can have a table holding the results of the query in the format you need (serving as cache), and update this table every 5 minutes or so. Then your query problem becomes trivial, as you can have the updates run as several updates in the background.
