Twitter style trends with php/mysql - php

I am coding a social network and I need a way to list the most used trends. All statuses are stored in a content field, so what I need to do is match hashtag mentions such as: #trend1 #trend2 #anothertrend
and sort by them. Is there a way I can do this with MySQL, or would I have to do this only with PHP?
Thanks in advance

The maths behind trends are somewhat complex; machine learning may be a bit over the top, but you probably need to work through some examples.
If you go with #deadtrunk's sample code, you would miss trends that have fired up in the last half hour; if you go with #eggyal's example, you miss trends that have been going strong all day, but calmed down in the last half hour.
The classic solution to this problem is to use a derivative function (http://en.wikipedia.org/wiki/Derivative); it's worth building a sample database and experimenting with this, and making your solution flexible enough to change this over time.
Whilst you want to build something simple, your users will be used to trends, and will assume it's broken if it doesn't work the way they expect.
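One way to experiment with the derivative idea is to compare how often a hashtag was mentioned in the latest window with how often it was mentioned in the window before. The following is only a rough Python sketch (the thread is about PHP/MySQL, so treat it as pseudocode you can run); the function name `trend_velocity`, the 30-minute window and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def trend_velocity(mentions, now, window=timedelta(minutes=30)):
    """Return {hashtag: change in mention count between the two latest windows}.

    `mentions` is an iterable of (hashtag, datetime) pairs. This is a crude
    finite-difference approximation of the derivative, not a tuned algorithm.
    """
    current, previous = Counter(), Counter()
    for tag, seen_at in mentions:
        age = now - seen_at
        if timedelta(0) <= age < window:
            current[tag] += 1
        elif window <= age < 2 * window:
            previous[tag] += 1
    tags = set(current) | set(previous)
    return {tag: current[tag] - previous[tag] for tag in tags}


# Hypothetical sample data: one tag heating up, one cooling down.
now = datetime(2012, 5, 10, 12, 0)
sample = (
    [("#rising", now - timedelta(minutes=m)) for m in (1, 2, 3, 5, 8)]
    + [("#rising", now - timedelta(minutes=40))]
    + [("#fading", now - timedelta(minutes=m)) for m in (35, 40, 45, 50)]
    + [("#fading", now - timedelta(minutes=10))]
)
velocity = trend_velocity(sample, now)
ranked = sorted(velocity, key=velocity.get, reverse=True)
```

A positive value means the tag is picking up and a negative value means it is calming down. Combining this with the raw count is one possible way to balance "strong all day" against "just fired up".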

You should probably extract the hashtags using PHP code, and then store them in your database separately from the content of the post. This way you'll be able to query them directly, rather than parsing the content every time you sort.
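As a rough illustration of the extraction step, here is a small Python sketch (in PHP, `preg_match_all` with a similar pattern would be the likely equivalent). The pattern and the helper name `extract_hashtags` are assumptions; in particular, what counts as a valid hashtag character is a choice you would have to make yourself.

```python
import re

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(content):
    """Return the distinct hashtags in a status, lower-cased, in first-seen order."""
    seen = []
    for tag in HASHTAG_RE.findall(content):
        tag = tag.lower()
        if tag not in seen:
            seen.append(tag)
    return seen


tags = extract_hashtags("Loving #Trend1 and #trend2, also #trend1 again #anothertrend")
```

Each returned tag would then be stored as its own row next to the id of the status it came from.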

I think it is better to store tags in a dedicated table and then run queries on it.
So if you have the following table layout
trend | date
you'll be able to get trends using the following query:
SELECT COUNT(*), trend FROM `trends` WHERE `date` = '2012-05-10' GROUP BY trend
18 test2
7 test3

Create a table that associates hashtags with statuses.
Select all status updates from some recent period - say, the last half hour - joined with the hashtag association table and group by hashtag.
The count in each group is an indication of "trend".
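A minimal sketch of that approach, using Python's sqlite3 module as a stand-in for MySQL. The table and column names (`statuses`, `status_hashtags`, `created_at`) are made up for the example, and the cutoff is passed in as a parameter so the result is reproducible; in MySQL it could be computed with something like NOW() - INTERVAL 30 MINUTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statuses (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE status_hashtags (status_id INTEGER, hashtag TEXT);
""")
conn.executemany("INSERT INTO statuses VALUES (?, ?)", [
    (1, "2012-05-10 11:40:00"),
    (2, "2012-05-10 11:50:00"),
    (3, "2012-05-10 11:55:00"),
    (4, "2012-05-10 09:00:00"),  # older than the half-hour window
])
conn.executemany("INSERT INTO status_hashtags VALUES (?, ?)", [
    (1, "trend1"), (2, "trend1"), (2, "trend2"), (3, "trend1"), (4, "trend2"),
])

# Only statuses created at or after the cutoff count towards the trend.
cutoff = "2012-05-10 11:30:00"
trending = conn.execute("""
    SELECT h.hashtag, COUNT(*) AS mentions
    FROM statuses AS s
    JOIN status_hashtags AS h ON h.status_id = s.id
    WHERE s.created_at >= ?
    GROUP BY h.hashtag
    ORDER BY mentions DESC, h.hashtag
""", (cutoff,)).fetchall()
```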

Related

MySql: saving date ranges VS saving single day

I am currently working on a simple booking system and I need to select some ranges and save them to a mysql database.
The problem I am facing is deciding if it's better to save a range, or to save each day separately.
There will be around 500 properties, and each will have from 2 to 5 months booked.
So the client will insert his property and will choose some dates that will be unavailable. The same will happen when someone books a property.
I was thinking of having a separate table for unavailable dates only, so if a property is booked from 10 May to 20 May, instead of having one record (2016-05-10 => 2016-05-20) I will have 10 records, one for each booked day.
I think this is easier to work with when searching between dates, but I am not sure.
Will the performance be noticeably worse?
Should I save the ranges or single days?
Thank you
I would advise that all "events" go into one table and they all have a start and end datetime. Use of indexes on these fields is of course recommended.
The reasons are that when you are looking for bookings and available events, you are not selecting from two different tables (or joining them). Storing a full range is also much better for the code, as you can easily perform the checks within a SQL query, and all PHP code that handles events works the same way for both. If you store one event type differently from another, you'll end up with loads of "if"s in your code and find it harder to write the SQL.
I run many booking systems at present and have made mistakes in this area before so I know this is good advice - and also a good question.
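To illustrate how a range check can live inside a single SQL query, here is a small sketch using Python's sqlite3 module in place of MySQL. The `events` table, its columns, the `kind` values and the helper `is_available` are all hypothetical; the only point is the overlap condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'kind' distinguishes bookings from owner-blocked dates (hypothetical values).
conn.execute(
    "CREATE TABLE events (property_id INTEGER, kind TEXT, start_date TEXT, end_date TEXT)"
)
conn.execute(
    "CREATE INDEX idx_events_range ON events (property_id, start_date, end_date)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, "booking", "2016-05-10", "2016-05-20"),
    (1, "unavailable", "2016-06-01", "2016-06-05"),
])


def is_available(property_id, start_date, end_date):
    """True if no stored event overlaps the requested inclusive date range."""
    # Two ranges overlap when each one starts on or before the other ends.
    clash = conn.execute(
        "SELECT 1 FROM events "
        "WHERE property_id = ? AND start_date <= ? AND end_date >= ? LIMIT 1",
        (property_id, end_date, start_date),
    ).fetchone()
    return clash is None
```

Because bookings and unavailable dates share one table and one shape, the same check covers both without special cases.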
This is too much for a comment, so I will leave it as an answer.
With one record per day, the table's primary key would be the property_id plus the date.
I don't recommend it. Think of what happens when you apply this logic to a system that runs for 5 or 10 years: the performance will get worse. You will get approximately 30*12*1 = 360 rows per year for a single property. Instead, implement logic to calculate the duration of a booking and add it to the table against the user.

Time Prediction based on existing date:time records

I have a system that logs date:time and it returns results such as:
05.28.2013 11:58pm
05.27.2013 10:20pm
05.26.2013 09:47pm
05.25.2013 07:30pm
05.24.2013 06:24pm
05.23.2013 05:36pm
What I would like to be able to do is have a list of date:time prediction for the next few days - so a person could see when the next event might occur.
Example of prediction results:
06.01.2013 04:06pm
05.31.2013 03:29pm
05.30.2013 01:14pm
Thoughts on how to go about doing time prediction of this kind with php?
The basic answer is "no". Programming tools are not designed to do prediction. Statistical tools are designed for that purpose. You should be thinking more about R, SPSS, SAS, or some other similar tool. Some databases have rudimentary data analysis tools built-in, which is another (often inferior) option.
The standard statistical technique for time-series prediction is called ARIMA analysis (auto-regressive integrated moving average). It is unlikely that you are going to be implementing that in php/SQL. The standard statistical technique for estimating time between events is Poisson regression. It is also highly unlikely that you are going to be implementing that in php/SQL.
I observe that your data points are once per day in the evening. I might guess that this is the end of some process that runs during the day. The end time is based on the start time and the duration of the process.
What can you do? Often a reasonable prediction is "what happened yesterday". You would be surprised at how hard it is to beat this prediction for weather forecasting and for estimating the stock market. Another very reasonable method is the average of historical values.
If you know something about your process, then an average by day of the week can work well. You can also get more sophisticated, and do Monte Carlo estimates, by measuring the average and standard deviation, and then pulling a random value from a statistical distribution. However, the average value would work just as well in your case.
I would suggest that you study a bit about statistics/data mining/predictive analytics before attempting to do any "predictions". At the very least, if you really have a problem in this domain, you should be looking for the right tools to use.
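For what it's worth, the two simple baselines mentioned above ("what happened yesterday" and the average of historical values) can be sketched in a few lines. This uses Python rather than PHP, and the helper names are made up; the input is the six log entries from the question.

```python
from datetime import datetime

LOGGED = [
    "05.28.2013 11:58pm", "05.27.2013 10:20pm", "05.26.2013 09:47pm",
    "05.25.2013 07:30pm", "05.24.2013 06:24pm", "05.23.2013 05:36pm",
]


def minutes_after_midnight(stamp):
    """Parse 'MM.DD.YYYY hh:mmpm' and return the time of day in minutes."""
    moment = datetime.strptime(stamp, "%m.%d.%Y %I:%M%p")
    return moment.hour * 60 + moment.minute


def as_clock(minutes):
    """Format minutes after midnight as a 24-hour HH:MM string."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


times = [minutes_after_midnight(s) for s in LOGGED]
same_as_yesterday = as_clock(times[0])  # baseline 1: repeat the most recent entry
historical_mean = as_clock(round(sum(times) / len(times)))  # baseline 2: plain average
```

Note that a plain average of clock times breaks down once events start crossing midnight, which is one more reason to reach for proper statistical tools if the prediction matters.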
As Gordon Linoff posted, the simple answer is "no", but you can write some code that will give a rough guess on what the next time will be.
I wrote a very basic example on how to do this on my site http://livinglion.com/2013/05/next-occurrence-in-datetime-sequence/
Here's a possible way that this could be done, using PHP + MySQL:
You can have a table with two fields: a DATE field and a TIME field (essentially storing the date + time portion separately). Say that the table is named "timeData" and the fields are:
eventDate: date
eventTime: time
Your primary key would be the combination of eventDate and eventTime, so that they're never repeated as a pair.
Then, you can do a query like:
SELECT eventTime, count(*) as counter FROM timeData GROUP BY eventTime ORDER BY counter DESC LIMIT 0, 10
This query returns the 10 most frequent event times, ordered by frequency. You can then re-sort these from earliest to latest.
This way, you can return fairly accurate time prediction results, which should become more accurate as you gather data each day.

Too many SQL calls on page load?

I'm constructing a website for a small collection of parents at a private daycare centre. One of the desired functions of the site is to have a calendar where you can pick what days you can be responsible for cleaning the premises. Now, I have made a working calendar. I found a simple script online that I modified a bit to fit our purpose. Technically, it works well, but I'm starting to wonder if I really should alter the way it extracts information from the database.
The calendar is presented monthly, and drawn as a table using a for-loop. That means that said for-loop is run 28-31 times each time the page is loaded depending on the month. To present who is responsible for cleaning each day, I have added a call to a MySQL database where each member's cleaning day is stored. The pseudo code looks like this, simplified:
Draw table month
for day=start_of_month to day=end_of_month
    type day
    select member from cleaning_schedule where picked_day=day
    type member
This means that each reload of the page makes at least 28 SELECT calls to the database, which seems inefficient to me and might leave the site susceptible to a DDoS attack. Is there a more efficient way of getting the same result? There are far more complex booking calendars out there; how do they handle it?
SELECT picked_day, member FROM cleaning_schedule WHERE picked_day BETWEEN '2012-05-01' AND '2012-05-31' ORDER BY picked_day ASC
You can loop through the results of that query; each row will have a date and a person from the range you picked, in order of ascending dates.
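A rough sketch of that loop, in Python rather than PHP. The `rows` list stands in for the result of the single month query, and the member names are invented; the idea is to index the rows by day once and then draw the calendar without touching the database again.

```python
import calendar
from collections import defaultdict

# Rows as they might come back from the single month query: (picked_day, member).
rows = [
    ("2012-05-01", "Anna"),
    ("2012-05-03", "Ben"),
    ("2012-05-03", "Cara"),
    ("2012-05-31", "Dan"),
]

# Index the result set once, so drawing the table needs no further queries.
members_by_day = defaultdict(list)
for picked_day, member in rows:
    members_by_day[picked_day].append(member)

year, month = 2012, 5
days_in_month = calendar.monthrange(year, month)[1]
cells = []
for day in range(1, days_in_month + 1):
    key = f"{year:04d}-{month:02d}-{day:02d}"
    # Days nobody picked simply get an empty cell.
    cells.append((day, ", ".join(members_by_day.get(key, []))))
```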
The MySQL query cache will save your bacon.
Short version: If you repeat the same SQL query often, it will end up being served without table access as long as the underlying tables have not changed. (This applies only to MySQL versions that still have the query cache; it was deprecated in 5.7.20 and removed in 8.0.) So: The first call for a month will be ca. 35 SQL queries, which is a lot but not too much. The second load of the same page will give back the results blazing fast from the cache.
In my experience, this tends to be much faster than creating fancy join queries, even if that would be possible.
Not that 28 calls is a big deal but I would use a join and call in the entire month's data in one hit. You can then iterate through the MySQL Query result as if it was an array.
You can use greater-than and less-than comparisons in SQL. So instead of doing one select per day, you can write one select for the entire month:
SELECT day, member FROM cleaning_schedule
WHERE day >= :first_day_of_month AND day <= :last_day_of_month
ORDER BY day;
Then you need to pay attention in your program to handle multiple members per day. Although the program logic will be a bit more complex, the program will be faster: The interprocess or even network based communication is a lot slower than the additional logic.
Depending on the data structure, the following statement might be possible and more convenient:
SELECT day, group_concat(member) FROM cleaning_schedule
WHERE day >= :first_day_of_month AND day <= :last_day_of_month
GROUP BY day
ORDER BY day;
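Here is a small runnable sketch of the grouped query, using Python's sqlite3 module (which also provides group_concat) in place of MySQL. The upper bound is written as `<=` so that the range actually covers the month, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cleaning_schedule (day TEXT, member TEXT)")
conn.executemany("INSERT INTO cleaning_schedule VALUES (?, ?)", [
    ("2012-05-03", "Ben"), ("2012-05-03", "Cara"),
    ("2012-05-01", "Anna"), ("2012-06-01", "Eve"),
])

rows = conn.execute(
    """
    SELECT day, group_concat(member) FROM cleaning_schedule
    WHERE day >= :first_day_of_month AND day <= :last_day_of_month
    GROUP BY day
    ORDER BY day
    """,
    {"first_day_of_month": "2012-05-01", "last_day_of_month": "2012-05-31"},
).fetchall()

# One row per day; split the concatenated names back into a list.
schedule = {day: sorted(members.split(",")) for day, members in rows}
```

Splitting on the comma only works if member names never contain one, which is the kind of data-structure caveat mentioned above.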
28 queries isn't a massive issue and is pretty common for most commercial websites, but I'd recommend grabbing the whole month's data in one hit. Then just loop through the records day by day.

More efficient way of displaying querying a db, based on input from user

I have a database (MySQL) with a schedule for a bus. I want to be able to display the schedule based on some user inputs: route, day, and time. The bus makes at least 13 runs around the city per day. The structure is set up as:
-Select Route(2 diff routes)
-Select Day(2 set of day, Sun-Wed & Thur-Sat)
-Select Time(atLeast 13 runs per day) = Show Schedule
My table structure is:
p_id  route   day  run#  stop   time
1     routeA  m-w  1     stop1  12:00PM
1     routeA  m-w  1     stop2  12:10PM
..and so on
I do have a functioning demo, however, it is very inefficient. I query the db for every possible run. I would like to avoid doing this.
Could anyone give me some tips to make this more efficient? OR show me some examples?
If you google for "bus timetable schema design" you will find lots of similar questions and many different solutions depending on the specific use case. Here is one similar question asked on here - bus timetable using SQL.
The first thing would be to normalise your data structure. There are many different approaches to this but a starting point would be something like -
routes(route_id, bus_no, route_name)
stops(stop_id, stop_name, lat/long, etc)
schedule(schedule_id, route_id, stop_id, arrive, depart)
You should do some searching and look to see the different use cases supported and how they relate to your specific scenario. The above example is only a crude example. It can be broken down further depending on the data being used. You may only want to store the time between stops in one table and then a start time for the route in another.
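As a rough sketch of how such a normalised layout lets you fetch one run with a single query, here is an example using Python's sqlite3 module instead of MySQL. The tables are trimmed versions of the ones above, and `day_group` and `run_no` are hypothetical columns added so the query can filter on the three user inputs from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE routes (route_id INTEGER PRIMARY KEY, route_name TEXT);
    CREATE TABLE stops (stop_id INTEGER PRIMARY KEY, stop_name TEXT);
    -- day_group and run_no are hypothetical columns added for this sketch.
    CREATE TABLE schedule (
        schedule_id INTEGER PRIMARY KEY,
        route_id INTEGER REFERENCES routes(route_id),
        stop_id INTEGER REFERENCES stops(stop_id),
        day_group TEXT,
        run_no INTEGER,
        arrive TEXT
    );
    INSERT INTO routes VALUES (1, 'routeA');
    INSERT INTO stops VALUES (1, 'stop1'), (2, 'stop2');
    INSERT INTO schedule (route_id, stop_id, day_group, run_no, arrive) VALUES
        (1, 1, 'sun-wed', 1, '12:00'),
        (1, 2, 'sun-wed', 1, '12:10'),
        (1, 1, 'sun-wed', 2, '13:00');
""")


def timetable(route_name, day_group, run_no):
    """Fetch every stop of one run in a single query instead of one query per run."""
    return conn.execute(
        """
        SELECT st.stop_name, sc.arrive
        FROM schedule AS sc
        JOIN routes AS r ON r.route_id = sc.route_id
        JOIN stops AS st ON st.stop_id = sc.stop_id
        WHERE r.route_name = ? AND sc.day_group = ? AND sc.run_no = ?
        ORDER BY sc.arrive
        """,
        (route_name, day_group, run_no),
    ).fetchall()
```

Storing times as zero-padded 24-hour strings (or a proper TIME column in MySQL) keeps the ORDER BY meaningful, which a value like 12:00PM would not.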

Allow access to a game every few hours using mysql and php without long code?

I have some games I have programmed for members to play. I usually allow members to play games every day, and I have a cronjob resetting the value at midnight to allow access to the games again the next day.
I constructed a new game, and I wanted it to allow members to play every 2 hours. I realize I cannot do this with a cronjob, because members will play the game at different times.
I know this probably sounds bad, but I'm not very familiar with timestamps, so I really don't know where to start. I haven't really had any reason to look into it until now. I'm guessing that the time or now function will accomplish this when compared, but again I cannot find the relevant situation in the manual about doing this with mysql and successfully submitting that data in the same format.
I've seen examples of other programmers doing this in a certain way, but it seemed they went to unnecessary lengths to make it work. The example I've seen would add a lot of lines to my code.
If this is a repeat question, I apologize, but I don't know what keywords to search for in this situation. All I have gotten is javascript countdowns.
Well the first thing that you're going to need is a way to determine when each member has played the game. So you will need to create a table with the following information:
MemberID
GameID (In case you want to support more than just 1 game in the future using this model)
DateTimePlayed
So now the first problem you have to solve is "Has player X played game Y in the last 2 hours?" That can be solved with a simple query:
SELECT * FROM MemberGameHistory WHERE MemberID = X AND GameID = Y AND DateTimePlayed > DATE_SUB(NOW(), INTERVAL 2 HOUR)
If you're happy that they haven't played it and decide to let them in, then you need to insert a row so that the next time you run the query you'll see that they've done it:
INSERT INTO MemberGameHistory (MemberID, GameID, DateTimePlayed) VALUES (X, Y, NOW())
Does that solve your problem?
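Put together, the check and the insert might look something like the sketch below. It uses Python with sqlite3 rather than PHP with MySQL, and the current time is passed in as a parameter (instead of calling NOW()) purely so the behaviour is easy to test; the helper name `try_to_play` is made up.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MemberGameHistory (MemberID INTEGER, GameID INTEGER, DateTimePlayed TEXT)"
)
COOLDOWN = timedelta(hours=2)


def try_to_play(member_id, game_id, now):
    """Record a play and return True, unless the member played within the cooldown."""
    cutoff = (now - COOLDOWN).isoformat(sep=" ")
    recent = conn.execute(
        "SELECT 1 FROM MemberGameHistory "
        "WHERE MemberID = ? AND GameID = ? AND DateTimePlayed > ? LIMIT 1",
        (member_id, game_id, cutoff),
    ).fetchone()
    if recent:
        return False
    conn.execute(
        "INSERT INTO MemberGameHistory VALUES (?, ?, ?)",
        (member_id, game_id, now.isoformat(sep=" ")),
    )
    return True


noon = datetime(2013, 5, 28, 12, 0, 0)
first = try_to_play(7, 1, noon)                                   # allowed
too_soon = try_to_play(7, 1, noon + timedelta(hours=1))           # blocked
later = try_to_play(7, 1, noon + timedelta(hours=2, minutes=1))   # allowed again
```

No cronjob is needed with this approach, because each member's two-hour window is measured from their own last play.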
