I want to design a notification component, and I want to understand what kinds of polling methods are used out there to fetch notifications effectively with minimal stress on the server.
Let's say, for example, I want to notify a user of a chat message. I imagine I would need to poll the data quite regularly, say every 500ms, for a quick response. However, doing this may overload the system. Hypothetically speaking, if I have a million users browsing the site, that's 2 million requests every second!
I'm thinking of writing an algorithm that incrementally increases the polling interval by 1 second on each poll, up to a maximum of 60 seconds. The interval resets to 500ms whenever there is new data. This way, if the user gets frequent notifications, delivery stays near-instant; but if there haven't been notifications for a longer period, there may be a delay of up to a minute.
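A minimal client-side sketch of this backoff idea (the /notifications endpoint, the response shape, and the showNotifications handler are all assumptions):

    // Poll with a growing interval: reset to 500ms when new data arrives,
    // otherwise add 1s per poll, capped at 60s.
    const MIN_INTERVAL = 500;      // ms
    const MAX_INTERVAL = 60000;    // ms
    const STEP = 1000;             // ms
    let interval = MIN_INTERVAL;

    function showNotifications(list) {   // app-specific; stub for the sketch
      console.log('new notifications:', list);
    }

    async function poll() {
      try {
        const res = await fetch('/notifications');           // assumed endpoint
        const data = await res.json();                       // assumed JSON body
        if (data.notifications && data.notifications.length > 0) {
          showNotifications(data.notifications);
          interval = MIN_INTERVAL;                           // reset: user is active
        } else {
          interval = Math.min(interval + STEP, MAX_INTERVAL);
        }
      } catch (err) {
        interval = Math.min(interval + STEP, MAX_INTERVAL);  // back off on errors too
      }
      setTimeout(poll, interval);
    }

    poll();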
In essence, I'm compromising between user experience and server load to find a middle ground for both.
Please advise on the possible drawbacks of this approach, if any. Is there a proper name for it?
Alternatively, is there a better method out there?
What you are describing is polling (or long polling); the increasing-interval variant is usually called adaptive polling, or polling with backoff. Either way, it is not great for performance.
The alternative is pushing (http://en.wikipedia.org/wiki/Push_technology): you push the data only when there is something new.
You could use WebSockets to achieve this.
You could look at the Apollo messaging middleware, which has native support for WebSockets and good performance.
http://activemq.apache.org/apollo/
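For comparison, a minimal browser-side WebSocket sketch of the push model (the URL and message shape are assumptions):

    // One persistent connection; the server sends a frame only when there
    // is a new notification - no polling at all.
    const ws = new WebSocket('wss://example.com/notifications'); // assumed URL
    ws.onmessage = (event) => {
      const note = JSON.parse(event.data);  // assumed JSON payload
      console.log('new notification:', note);
    };
    ws.onclose = () => {
      // a real client would reconnect here, ideally with backoff
    };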
The method you are using could overload your server with network traffic when many clients are connected. Suppose you have 1000 connected clients: the server will have to handle 1000 separate connections. A better approach is to use a push notification system. Check this out: https://nodejs.org/it/docs/
I am trying to build a tracking system in which an Android app sends GPS data to a web server running Laravel. I have read tutorials on building realtime apps, but from what I understand, most of the guides only cover receiving data in realtime. I haven't yet seen examples of sending data every second or so.
I guess it's not good practice to POST data to a web server every second, especially when you already have a thousand users. I hope someone can suggest how I should approach this.
Also, as much as possible, I would like to use only Laravel, without any NodeJS server.
Handle requests quickly
First you should estimate your server capacity. With PHP-FPM, if you have 32 PHP processes and the server handles every POST request within 0.01s, capacity can be roughly estimated as N = 32 / 0.01 = 3200 requests per second.
So make your handling fast. If a request takes 0.1s, that is too slow to support a lot of clients on a single server. Enable OPcache; it can decrease the time 5x. Inserting data into MySQL is a slow operation, so you will probably need to work around it to make it faster. For example, append incoming data to a fast cache (Redis/Memcached), and once the cache holds 1000 elements, or was created more than 0.5 seconds ago, move the batch to the database as a single INSERT query.
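A rough sketch of that buffering idea, shown in Node-style JavaScript purely for illustration (the ioredis and mysql2 clients, the table and column names, and the credentials are assumptions; the same pattern works from PHP with phpredis or Predis):

    // Buffer points in Redis and flush them to MySQL in batches, so the
    // hot path never waits on an INSERT. (Not race-free under heavy
    // concurrency - a real version would wrap the flush in MULTI/EXEC.)
    const Redis = require('ioredis');
    const mysql = require('mysql2/promise');
    const redis = new Redis();
    const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'tracker' }); // assumed
    const KEY = 'gps_buffer';
    const MAX_ITEMS = 1000;
    const MAX_AGE_MS = 500;
    let lastFlush = Date.now();

    async function record(point) {
      await redis.rpush(KEY, JSON.stringify(point));   // fast path: cache only
      const size = await redis.llen(KEY);
      if (size >= MAX_ITEMS || Date.now() - lastFlush >= MAX_AGE_MS) {
        const raw = await redis.lrange(KEY, 0, -1);
        await redis.del(KEY);
        const rows = raw.map(r => JSON.parse(r)).map(p => [p.lat, p.lng, p.ts]);
        // one bulk INSERT instead of one query per request
        await pool.query('INSERT INTO positions (lat, lng, recorded_at) VALUES ?', [rows]);
        lastFlush = Date.now();
      }
    }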
Randomize send times
Most smartphones have accurate clocks, so you can get a thousand simultaneous requests the moment each new second starts: the server handles 1000 requests in the first 0.01s and then sleeps for the remaining 0.99s. In the mobile code, add a random delay of 0-0.9s that is fixed per device, chosen at first install or first request. This spreads the load over the server uniformly.
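A tiny sketch of that fixed per-device jitter (shown with browser localStorage for illustration; an Android app would persist the value in SharedPreferences instead):

    // Pick a random delay once per device and reuse it forever, so the
    // devices stay spread out across each second.
    function getDeviceJitterMs() {
      let jitter = localStorage.getItem('sendJitterMs');
      if (jitter === null) {
        jitter = String(Math.floor(Math.random() * 900)); // 0-900ms, fixed at first run
        localStorage.setItem('sendJitterMs', jitter);
      }
      return Number(jitter);
    }

    function scheduleNextSend(sendFn) {
      const now = Date.now();
      const nextSecond = Math.ceil(now / 1000) * 1000;    // top of the next second
      setTimeout(sendFn, nextSecond - now + getDeviceJitterMs());
    }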
There are at least two really important things you should consider:
Client's internet consumption
Server capacity
If you have a thousand users, posting every second means a lot of requests for your server to handle.
You should consider using push techniques, like those described in #Dipin's answer.
And when it comes to the server, you should consider using a queue system to handle those jobs, like the one described in this article. There is probably a package that provides the integration to use Firebase or GCM to handle that for you.
Good luck, hope it helps o/
I currently have a PHP script that collects similar data from various sources; each data source is scraped and parsed every 120 seconds. At the moment I have 20 data sources, but I expect to integrate another 100 over the coming weeks.
Currently each data source is scraped in its own thread: one main PHP script executes other scripts to perform the scraping work. This method allows all sources to be scraped at the same time, but it also puts a strain on the server and creates a bottleneck at the database (MySQL).
I'm looking for a way to scale my current application. Could I do something like this with AWS? Perhaps each of these scraping scripts could run in its own small server instance, with each instance created automatically by a "main" instance and then dying once the script has finished. I don't have any experience with AWS, so I'm not entirely sure if this is possible, or maybe it's just a bad idea.
The main question here is: How can I scale my current scraping script to allow for many new data sources? I'm interested in any solution even if I need to buy additional services.
You need a queueing system
You're describing a sort of worker/queue pattern, with your main server performing both the enqueueing and the worker execution, which of course is going to put a huge strain on your server.
First and foremost, your workers need to be asynchronous: you shouldn't be waiting for something that may or may not come back. You really should take a look at ZeroMQ which, I might add, has some of the best documentation on the planet. If you're willing to learn, take a look at how it works and follow some tutorials; there are plenty out there. Host your queue on your main server, taking on new jobs and dispatching them elsewhere (i.e. to other boxes).
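For instance, a minimal ZeroMQ push/pull (ventilator/worker) sketch, written against the legacy zeromq npm API (v5 style; the newer v6 API is promise-based), with hostnames and the job payload assumed:

    // --- ventilator.js (main box): hands jobs to whichever worker is free ---
    const zmq = require('zeromq');
    const sender = zmq.socket('push');
    sender.bindSync('tcp://*:5557');
    setInterval(() => {
      sender.send(JSON.stringify({ source: 'feed-1' })); // assumed job payload
    }, 1000);

    // --- worker.js (any box; run as many as you like) ---
    // const zmq = require('zeromq');  // each file requires it separately
    const receiver = zmq.socket('pull');
    receiver.connect('tcp://main-server:5557'); // 'main-server' is a placeholder host
    receiver.on('message', (msg) => {
      const job = JSON.parse(msg.toString());
      // ...scrape job.source and write the results to the DB...
    });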
Horizontal Scaling
You can create some sort of instance controller to handle AWS instances. You really just need to sit down and think through your logic (when do I want this many boxes, when do I want to shut them down?). The API is pretty simple to use once you get your head around it. Here's some code I wrote a while back to wrap Amazon's SDK for PHP. I'm not sure it works 100% with the latest version (I used it around a year ago), but the concepts are there: you have simple methods like startBox() or stopBox() that you call from your queue, and your box automatically starts doing its stuff once it boots.
You could use t1.micro instances from Amazon (pricing here), which have a free tier (info here) up to a certain limit.
Get it working properly, with a loop on your main server deciding how many boxes you need working at any one time given certain circumstances (no. of jobs in your database table, for example), and you'll have theoretically infinite scaling. Here's how I did it for my code:
Tier 1: > 5 jobs, < 10 jobs = 1 box
Tier 2: > 10 jobs, < 20 jobs = 2 boxes
etc. etc.
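A sketch of that control loop; startBox()/stopBox() echo the wrapper methods mentioned above, while the queue and fleet helpers are assumptions:

    // Map queue depth to a desired number of boxes, then reconcile.
    function desiredBoxes(jobCount) {
      if (jobCount <= 5) return 0;
      if (jobCount <= 10) return 1;        // Tier 1
      if (jobCount <= 20) return 2;        // Tier 2
      return Math.ceil(jobCount / 10);     // etc. etc.
    }

    async function reconcile(queue, fleet) {
      const jobs = await queue.countPendingJobs();    // assumed helper
      const want = desiredBoxes(jobs);
      const have = await fleet.countRunningBoxes();   // assumed helper
      for (let i = have; i < want; i++) await fleet.startBox();
      for (let i = have; i > want; i--) await fleet.stopBox();
    }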
Advice
Log everything. Log every box coming up and every box going down. Calculate your costs in your code and store them, maybe in a database, or log them, so you know exactly how much you're spending; you don't want things to get out of hand.
Make sure you open up your DB ports so your instances can talk to your DB to report when a job is done, or anything else you need to pass between your "master" box and your "slave" boxes.
Also, if you're paying for web servers, AWS bills you by the hour, so record the time you start each box, and when it's time to shut one down, only actually shut it down once 55 minutes or so have passed; you might as well get the extra minutes you're paying for.
I can't really think of anything else. Do your research, figure out the best way to build a queueing system, and build it with scalability in mind, so it can react to numbers that you control.
Split your scraping up across multiple instances (say 5 per server) and have them talk to a central DB like Amazon RDS.
There's no need to kill the instances after you have finished scraping if you're doing this every 120 seconds.
I have an APNS notification server set up which, in theory, would send a processed notification to about 50,000 to 100,000 users every day (based on the number of users of our web app that ties in with the iOS app).
The notifications would go out around 2, but each must be sent to each user individually (using Urban Airship), triggered by curl on a cron job.
The script iterates through each user and has to use an HTML scraper (simple_html_dom, to be exact) which takes about 5-10s per user, and it is obviously very memory intensive. A simple GET request can't be the right way to go about this; in fact, I'm positive it will fail. What is the best way to handle this long, memory-intensive task on a cron job?
If you reuse the same variables, or set the ones you are not going to use any more to null, you won't run out of memory.
Just don't load all the data at once; free it (set it to null) or replace it with new data after you process it.
And make sure you can't improve the speed of your task; 5-10s per user sounds really long.
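A sketch of that chunked approach (the helpers are assumptions); the point is that only one user's scraped document is held in memory at a time:

    // Iterate users one by one; drop each parsed document before the next.
    async function sendAll(users) {
      for (const user of users) {
        let dom = await scrapeUserPage(user);        // assumed helper (the 5-10s step)
        await buildAndSendNotification(user, dom);   // assumed helper (Urban Airship call)
        dom = null;  // release the reference so it can be garbage-collected
      }
    }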
I'm writing an auction script, and time syncing between visitors and the server is necessary (to show when the auction will end). Every time a user bids, the auction end time is extended by a few seconds. Several users are complaining about their timers skipping a few seconds, and I've figured out that it is because of high-latency connections.
My current implementation is a JavaScript function that runs every second, getting the time left in the auction through AJAX requests. Is there a better way to approach this, especially for high-latency users, to prevent the timer-skipping problem?
Adaptive intervals
First of all, I would suggest that you decrease the amount of polling. I don't know about your server implementation, but the current setup will create a lot of requests once you have a couple of users.
I would suggest that you adjust the polling interval depending on how much time is left. If there are two hours left until the end of an auction, we might not really care if the additional seconds are only fetched from the server every minute, right? You could do it like this
pollingInterval = secondsLeft / 100
The interval is shorter and the result is more accurate towards the end of the auction.
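A small sketch of that adaptive loop (the /time-left endpoint, the response shape, and renderCountdown are assumptions):

    // Poll rarely while the end is far away, often near the end,
    // with a floor so the interval never collapses to zero.
    function renderCountdown(secondsLeft) {   // app-specific; stub for the sketch
      console.log('seconds left:', secondsLeft);
    }

    async function syncTimeLeft() {
      const res = await fetch('/time-left');              // assumed endpoint
      const { secondsLeft } = await res.json();           // assumed JSON body
      renderCountdown(secondsLeft);
      if (secondsLeft <= 0) return;
      const pollingInterval = Math.max(secondsLeft / 100, 1); // seconds
      setTimeout(syncTimeLeft, pollingInterval * 1000);
    }

    syncTimeLeft();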
Server-Sent Events
For the last minute or so, when you want high accuracy, regular polling at short intervals is not the best solution, as discussed in the comments. Long polling is an option, but you should also look into HTML5 Server-Sent Events, which are like a native browser implementation of long polling. There's a good introduction and comparison to WebSockets. Browser support is already pretty good, and there's a polyfill for unsupported browsers which falls back to... polling.
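A client-side sketch using the standard EventSource API (the /auction-stream endpoint and the payload shape are assumptions):

    // One open connection; the server pushes a message whenever a bid
    // extends the auction. EventSource reconnects automatically.
    const source = new EventSource('/auction-stream');   // assumed endpoint
    source.onmessage = (event) => {
      const { secondsLeft } = JSON.parse(event.data);    // assumed payload
      console.log('seconds left:', secondsLeft);         // update the countdown here
    };
    source.onerror = () => {
      console.warn('SSE connection lost; the browser will retry');
    };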
Have you looked into long polling? You could use a jQuery/JavaScript countdown clock and then just change the countdown time whenever a new bid is placed. That should cut your AJAX calls drastically.
javascript function that runs every second
This is the old way to do what you want.
I think you need to use WebSockets to ensure real-time delivery for all users.
If you want to save time, you can use one of the WebSocket servers already available instead of building one yourself.
I prefer the real-time service Pusher.
It's easy, and you can use it for free with a limited number of users; you can also upgrade for more users.
www.pusher.com
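A minimal client subscription with the pusher-js library (the key, cluster, channel, and event names are assumptions; the library is loaded from Pusher's script tag or npm):

    // Pusher maintains the WebSocket; you just subscribe and bind.
    const pusher = new Pusher('YOUR_APP_KEY', { cluster: 'eu' }); // assumed credentials
    const channel = pusher.subscribe('auction-42');               // assumed channel
    channel.bind('new-bid', (data) => {
      console.log('auction extended, seconds left:', data.secondsLeft); // assumed payload
    });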
It also has good API documentation to help you implement what you want quickly and easily.
For any help with Pusher or WebSockets, feel free to ask.
I have a service where, for each user request, I need to query 40 external services (APIs) to get information from them. For example, a user searches for some information, my service asks 40 external partners, aggregates the results in one DB (MySQL), and displays the result to the user.
At the moment I have a multi-curl solution with 10 partner requests in flight at a time; when one partner finishes, the software adds another partner from the remaining 30 to the multi-curl queue, until all 40 requests are done and the results are in the DB.
The problem with this solution is that it cannot scale across many servers. I want a solution where I can fire all 40 requests at once, divided across 2-3 servers for example, and wait only as long as the slowest partner takes to deliver its results ;-) meaning that if the slowest partner takes 10 seconds, I will have the results of all 40 partners in 10 seconds. With multi-curl I run into trouble when there are more than 10-12 requests at a time.
What kind of solution can you offer that uses as few resources as possible, can run many processes on one server, and is scalable? My software is written in PHP, which means I need a good way to connect to the solution through a framework or API.
I hope you understand my problem and my needs. Please ask if something is not clear.
One possible solution would be to use a message queue system like beanstalkd, Apache ActiveMQ, memcacheQ etc.
A high level example would be:
User makes request to your service for information
Your service adds the requests to the queue (presumably one for each of the 40 services you want to query)
One or more job servers continuously poll the queue for work
A job server gets a message from the queue to do some work, adds the data to the DB and deletes the item from the queue.
In this model, since the single task of performing 40 requests is now distributed and no longer part of one "process", the next part of the puzzle is figuring out how to mark a set of work as completed. This part may not be that difficult, or it may introduce a new challenge (it depends on the data and your application). For example, you could use another cache/DB row as a counter, set to the number of jobs a particular request needs in order to complete; as each queue worker finishes a request, it decrements the counter by 1. Once the counter reaches 0, you know the request has been completed. If you do that, though, you need to make sure the counter actually gets to 0 and doesn't get stuck for some reason; a sketch of this counter follows below.
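A sketch of that completion counter, using Redis via the ioredis client purely for illustration (key names and the queue API are assumptions; the same idea works with any shared cache or DB row):

    // Fan out: set the counter to the number of jobs, enqueue them all.
    // Each worker decrements; whoever reaches 0 completes the request.
    const Redis = require('ioredis');
    const redis = new Redis();

    async function fanOut(requestId, partners, queue) {
      // EX 60: let the key expire so a crashed worker can't leave it stuck forever
      await redis.set(`pending:${requestId}`, partners.length, 'EX', 60);
      for (const p of partners) {
        await queue.enqueue({ requestId, partner: p }); // assumed queue API
      }
    }

    async function onJobFinished(requestId) {
      const left = await redis.decr(`pending:${requestId}`);
      if (left === 0) {
        // all partner results are in the DB; release the aggregated response
      }
    }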
That's one way at least, hope that helps you a little or opens the door for more ideas.