Real time activity feed - code / platform implementation?

Real time activity feed - code / platform implementation? - php

I am defining out specs for a live activity feed on my website. I have the backend of the data model done but the open area is the actual code development where my development team is lost on the best way to make the feeds work. Is this purely done by writing custom code or do we need to use existing frameworks to make the feeds work in real time? Some suggestions thrown to me were to use reverse AJAX for this. Some one mentioned having the client poll the server every x seconds but i dont like this because it is unwanted server traffic if there are no updates. I was also mentioned a push engine like light streamer to push from server to browser.
So in the end: What is the way to go? Is it code related, purely pushing SQL quires, using frameworks, using platforms, etc.
My platform is written in PHP codeignitor and DB is MySQL.
The activity stream will have lots of activities. There are 42 components on the social networking I am developing, each component has approx 30ish unique activities that can be streamed.

Check out http://www.stream-hub.com/

I have been using superfeedr.com with Rails and I can tell you it works really well. Here are a few facts about it:
Pros
Julien, the lead developer is very helpful when you encounter a problem.
Immediate push of new feed entries which support PubSubHubHub.
JSon response which is perfect for parsing whoever you'd like.
Retrieve API in case the update callback fails and you need to retrieve the latest entries for a given feed.
Cons
Documentation is not up to the standards I would like, so you'll likely end up searching the web to find obscure implementation details.
You can't control how often superfeedr fetches each feed, they user a secret algorithm to determine that.
The web interface allows you to manage your feeds but becomes difficult to use when you subscribe to a loot of them
Subscription verification mechanism works synchronous so you need to make sure the object URL is ready for the superfeedr callback to hit it (they do provide an async option which does not seem to work well).
Overall I would recommend superfeedr as a good solution for what you need.

Related

How can handle thousands of requests per second using php and mysql?

I would like to implement an API using php and mysql technologies that can handle several thousands of requests per second.
I haven't did this kind of API before. If you have an experienced to implement similar task could you please tell me what are the steps?
How can I implement such kind of API that can handle thousands of request per second?
I would be glad if you could explain with sample codes.
Thanks in advance for your help.

Based on the details described in the post, you likely want to use an asynchronous, stateless architecture. So requests don’t block resources and can scale easier (always sounds easier than actually doing it ;)).
Without knowing to what other services these servers would connect (it certainly doesn’t make things easier), I’d go for Elixir/Erlang as programming language and use Phoenix as a framework.
You get a robust functional language which comes with a lot of great built-in features/modules (e.g. mnesia, roll/unroll versions while being live) and scales well (good in utilizing all cores of your server).
If you need to queue up requests to the 2nd tier servers AMQP client/server (e.g. RabbitMQ) might be a good option (holds/stores the requests for the servers).
That works pretty okay if it’s stateless, in form of client asks one thing and the server responds once and is done with the task. If you have many requests because the clients ask for updates every single second, then you’re better switching to a stateful connection and use WebSockets so the server can push updates back to a lot of clients and cuts a lot of chatter/screaming.
All of this writing is from a ‘high up view’. In the end, it depends on what kind of services you want to provide. As that narrows down what the ‘suitable tool’ would be. My suggestion is one possibility which I think isn’t far off (Node.js mentioned before is also more than valid).

Well you need to consider several factors such as:
Authenticating the API. Your API should be called by valid users that are authorized and authenticated
Caching API results. Your API should cache the results of API call. This will allow your API to handle requests more quickly, and it will be able to handle more requests per second. Memcache can be used to cache results of API call
The API architecture. RESTFul APIs have less overhead as compared to SOAP based APIs. SOAP based APIs have better support for authentication. They are also better structured then RESTFul APIs.
API documentation. Your API should be well documented and easy for users to understand.
API scope. Your API should have a well defined scope. For example will it be used over the internet as a public API or will it be used as private API inside corporate intranet.
Device support. When designing your API you should keep in mind the devices that will consume your API. For example smart phones, desktop application, browser based application, server application etc
API output format. When designing your API you should keep in mind the format of the output. For example will the output contain user interface related data or just plain data. One popular approach is known as separation of concerns (https://en.wikipedia.org/wiki/Separation_of_concerns). For example separating the backend and frontend logic.
Rate limiting and throttling. Your API should implement rate limiting and throttling to prevent overuse and misuse of the API.
API versioning and backward compatibility. Your API should be carefully versioned. For example if you update your API, then the new version of your API should support older version of API clients. Your API should continue to support the old API clients until all the API clients have migrated to the new version of your API.
API pricing and monitoring. The usage of your API should be monitored, so you know who is using your API and how it is being used. You may also charge users for using your API.
Metric for success. You should also decide which metric to use for measuring the success of your API. For example number of API calls per second or monitory earnings from your API. Development activities such as research, publication of articles, open source code, participation in online forums etc may also be considered when determining the success of your API.
Estimation of cost involved. You should also calculate the cost of developing and deploying your API. For example how much time it will take you to produce a usable version of your API. How much of your development time the API takes etc.
Updating your API. You should also decide how often to update your API. For example how often should new features be added. You should also keep in mind the backward compatibility of your API, so updating your API should not negatively affect your clients.

Good answer, I think one thing to keep in mind is where the bottleneck is. Many times, the bottleneck isn't the API server itself but the data access patterns with the persistence layer.
Think about how you access your data. For posting new items, a lot of times the processing can be delayed and processed async to the original request. For example if resizing an image or sending an email, you can integrate RabmitMQ or SQS to queue up a job, which can be processed later by workers. Queues are great in buffering work so that if a server goes down, stuff is just queued up to be processed once back online.
On the query side, it is important to understand how indexing works and how data is stored. There are different types of indices, for example hash tables can give you constant access time, but you cannot perform range queries with hash tables. The easiest is if you have simple decentralized data objects queried by identifiers which can be stored in a index. If you're data is more complex where you need to do heavy joins or aggregations, then you can look at precomputed values stored in something like Redis or memcache.

Adopting my current system to include real time notifications

I have a PHP system, that does everything a social media platform does, i.e. add comments, upload images, add objects, logins, sessions etc. Storing all interactions in a MySQL database. So i've got a pretty good infrastructure to build on.
The next stage of my project is to develop the system so that notifications are sent to the "Networks" of "Contacts", which are associated with one and other. Such as the notifications system like Facebook. i.e. Chris has just commented on object N.
I'm looking at implementing this system for a lot of users: 10,000+, so it has to be reliable. I've researched the Facebook integration, resulting in techniques such as memcache, sockets & hashing.
Are they're any systems that can be easily adapted to this functionality, as I could do with a quick, reliable implementation.
P.s one thought I had was just querying the database every 5 seconds for e.g. "Select everything that has happened in the last 5 seconds" using jQuery, Ajax & PHP, but thats stupid, it would exhaust the server & database right?
I've seen this website & this article, can anyone reflect on this to tell me what is the best approach as I am hesitant about which path to follow.
Thanks

This is not possible with just pure vanilla PHP/MySQL. What you can do is set MySQL triggers (http://dev.mysql.com/doc/refman/5.0/en/triggers.html) on your data, and then use the UDF function sys_exec (Which you will need to install on your MySQL server) to run the notification php script. See this post: Invoking a PHP script from a MySQL trigger
If you can get this set up it should be pretty reliable and fast.

How does Facebook notify and instantly shows new comments or how does Stackoverflow do it?

I am a PHP developer and the title basically says it all. However I was hoping on some more in-depth information as I am starting to get confused about how the flow for the project I work on should go.
For an (web) application I need to implement a feature like Facebook does it with notifying users about replies/comments and instantly showing these.
I figured I could use long-polling with ajax requests but this does not seem to be a nice solution as the notifications never really are instant and it is resource heavy.
So I should use some form of sockets if I understand correctly, and Node.Js would be a good choice. So based on the last assumption I now get confused about the work flow.
I thought about two possible solutions:
1) It seems to me, that if I would use Node.Js I could skip using PHP at all and base the application on Node.js only.
2) Or I could use PHP as a base and only use Node.js for notifying users and instantly showing messages but saving the data using PHP and Mysql.
These two possibilities confuse me and I can't make up my mind about what would be the "best" and cleanest way.
I do not have much experience in Node.js, played with it for a while. But managing and saving data seems to be hard in Node.js so that is why I came up with option 2.
I know Facebook is build on PHP so I am assuming that they save the data via PHP and notify / instantly show replies and comments via Node.
Could someone help me out on this?
Thanks in advance!
EDIT:
I just noticed, Stackoverflow does something similar. I get a notification in the upper left, and below my question a box with "new answer to this question". I am really interested in the technologie(s) used.

Well you could use node.js for the notifications and PHP for your app.
By googling I found this about real-time-notifications.
You could also just use node.js with socket.io, but this means that you have to learn new technologies as you mention that you have no experience with node.
I haven't used it but you could check this project, for websockets in PHP.
When you have an update that you want to notify users you can use the publish subscriber pattern to notify the intrested in this update.
Take a look in Gearman too.
Personally, I've built a notification system using the pubsub mechanism of redis, with node.js+socket.io. Everytime that there is an update on a record then there is a publish on the appropriate channel. If the channel has listeners then they will be notified. I also store the last 20 notifications in a Redis list.
The appplication is built in PHP. The notification system is built in node.js. They are different applications that see the same data. The communication occurs via redis. For example in the Facebook context:
1) A user updates his status.
2) PHP stores this to the database and Redis
3) Redis knows that this update must publish to the status channel of the specific user and it does.
4) All the friends of the specific user are listening to his status channel (here comes node.js)
5) Node.js pushes the notification in the browser with socket.io
As for facebook, I have read in an article that is using long polling for supporting older browsers. Not sure for this though, needs citation...

AFAIK It would be via two simple methods :
First one that could be very simple is adding a Boolean column to each record that determines if it has been notified or not.
The second method is creating a table to insert all notifications.
However, I'm not sure if there are alternative methods for better performance, But first method is what I do commonly myself. But I think Facebook is using 2nd method, because it has to notify each one to a lot of users.
Your question maybe dublicate of:
Facebook like notifications tracking (DB Design)
Database design to store notifications to users

You could use Server Side Events it involves a bit of JavaScript but nothing overly complicated I think.
The main bulk of this method is PHP though, so you would just use the PHP to query your DB for notifications and SSE will push them to the user.
It does have some limitations though, most notably it's not supported by IE (huge surprise) thought i'd mention it anyway to let you know of other possibilities.
Hope this helps

Push notification to the client browser

I'd like to create an application where when a Super user clicks a link the users should get a notification or rather a content like a pdf for them to access on the screen.
Use Case: When a teacher wants to share a PDF with his students he should be able to notify his students about the pdf available for download and a link has to be provided to do the same.

There are several ways you can accomplish this. The most supported way is through a technique called Comet or long-polling. Basically, the client sends a request to the server and the server doesn't send a response until some event happens. This gives the illusion that the server is pushing to the client.
There are other methods and technologies that actually allow pushing to the client instead of just simulating it (i.e. Web Sockets), but many browsers don't support them.

As you want to implement this in CakePHP (so I assume it's a web-based application), the user will have to have an 'active' page open in order to receive the push messages.
It's worth looking at the first two answers to this, but also just think about how other sites might achieve this. Sites like Facebook, BBC, Stackoverflow all use techniques to keep pages up to date.
I suspect Facebook just uses some AJAX that runs in a loop/timer to periodically pull updates in a way that would make it look like push. If the update request is often enough (short time period), it'll almost look realtime. If it's a long time period it'll look like a pull. Finding the right balance between up-to-dateness and browser/processor/network thrashing is the key.
The actual request shouldn't thrash the system, but the reply in some applications may be much bigger. In your case, the data in each direction is tiny, so you could make the request loop quite short.
Experiment!

Standard HTTP protocol doesn't allow push from server to client. You can emulate this by using for example AJAX requests with small interval.

Have a look at php-amqplib and RabbitMQ. Together they can help you implement AMQP (Advanced Message Queuing Protocol). Essentially your web page can be made to update by pushing a message to it.
[EDIT] I recently came across Pusher which I have implemented for a project. It is a HTML5 WebSocket powered realtime messaging service. It works really well and has a free bottom tier plan. It's also extremely simple to implement.

Check out node.js in combination with socket.io and express. Great starting point here

Libraries and pseudocode for physical Dashboard/Status board

OK, so I bought a 46" screen for the office yesterday, and with the imminent risk of being accused for setting up an "elaborate World Cup procrastination scheme", I'd better show my colleagues what it's meant for ;)
Looking at my simple sketch, and at these great projects from which I was inspired, I would like to get some input on the following:
Pseudocode for the skeleton: As some methods should be called every 24 hours ("Today's date in the heading"), others at 60 second intervals ("Twitter results"), what would be a good approach using JavaScript (jQuery) and PHP?
EDIT: Alsciende: I can agree that #1 and #8 are too vague. Therefore I remove #8 and try to clarify #1: With "Pseudocode for the skeleton", I basically mean could this be done entirely using JavaScript timers and how would you set up the various timers?
Library for Google Analytics: Which libraries support the Google Analytics API and can produce neat charts. Preferably HTML5, JavaScript-based like Protovis.
Library for Twitter: Which libraries would you recommend for fetching twitter search results and latest tweets from profiles.
Libraries for Typography/CSS/HTML5: Trying to learn some HTML5 etc. in the process, please advice on any other typography/css libraries that could be of relevance.
Scraping/Parsing? I'll give you a concrete example: Trying to fetch today's menu from this restaurant's website, how would you go about? (it's in Swedish - but you get the point - sorry ;) )
Real-time stats? I'm using the WassUp-plugin for WordPress to track real-time visitors on our website. Other logging software (AWStats etc.) is probably also installed on the webserver. Any ideas on how to extract information from these and present in real-time on the dashboard?
Browser choice? Which Browser and OS would you pick? Stable, Full-screen, HTML5.
alt text http://www.freeimagehosting.net/uploads/cb7af2ef28.png

I've built a dashboard similar to what you're talking about for our office. I spent about a day working on it, the possibilities are really (pretty much) endless. Basically, all the computational stuff I handle via PHP and do interval AJAX calls to the appropriate PHP script, which returns JSON data to present.
#2:
For graphs, I use/recommend flot (http://code.google.com/p/flot/). The documentation isn't really that great, but once you figure out how things generally work - it's a great library, and it generates graphs using HTML5 Canvas tag.
I've not integrated external libraries with Google Analytics before, but I assume you could pull data from analytics and format it for flot to build appropriate graphs for. This might be the hard-way around, but I'm more familiar with flot than most other graphing libraries (and it doesn't suck, as much as a lot of others do) so for me, this would be the easiest way to get it done.
#3:
For twitter, it's pretty easy to pull data from their search API using JSON-P. Basically what this does, is dynamically adds a <script> tag to your DOM, which GET parameters that twitter interprets, then calls a predefined javascript method (which you pass via the URI) with a json-encoded hash of the results.
#5:
Scraping and parsing individual sites is going to be a painstaking process. Every site is going to have it's own "pattern" (or non-pattern) for publishing their daily menu or specials. I'd build a "menu" script which knew how to call a few functions, and write a function / class for scraping each restaurant's site which you're interested in showing the menu for in PHP (or whatever other language you're comfortable with). It can reply with json, which is (imo) the easiest way to manipulate/process data in Javascript.
#6:
Real time stats are pretty much the same as #5. I'd build a couple classes that knew how to fetch the statistics from whatever data sources I was interested in pulling from, and present the data in json to javascript, via an ajax call.
#1: Writing javascript code to load data on a timer is really simple, look into the setInterval, clearInterval, setTimeout, and clearTimeout methods. They all take a function name (or closure) and a timeout to wait before calling that function (in ms). You could easily call a master timer function every 60 seconds which would basically be a "scheduler" or a "cron" style function, which would just look for the stuff that needed to run "right now" and execute those functions from the scheduler.
Hopefully this gives you some ideas on where to go, and how to go about getting there.

For the Rails Rumble we developed Boarrd that is exactly what you would like to develop!
We were impressed by Panic too :)
In our team page on RailsRumble you'll find details about the tools used. I know that is not in PHP but maybe you'll try our tool and decide for better development environment ;)

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.