I am trying to build a tracking system in which an Android app sends GPS data to a web server running Laravel. I have read tutorials on how to do realtime apps, but as far as I understand, most of the guides only cover receiving data in realtime. I haven't yet seen examples of sending data every second or so.
I guess it's not good practice to POST data to a web server every second, especially when you already have a thousand users. I hope someone can suggest how I should approach this.
Also, as much as possible I would like to use only Laravel, without any NodeJS server.
Handle requests quickly
First you should estimate server capacity. With PHP-FPM, if you have 32 PHP processes and every POST request is handled by the server within 0.01 s, capacity can be roughly estimated as N = 32 / 0.01 = 3200 requests per second.
So make the handling fast. If a request takes 0.1 s to handle, that is too slow to support a lot of clients on a single server. Enable OPcache; it can cut the time by 5x. Inserting data into MySQL is a slow operation, so you will probably need to work on making it faster. For example, add incoming points to a fast cache (Redis/Memcached), and when the cache already contains 1000 elements, or was created more than 0.5 seconds ago, move them to the database as a single INSERT query.
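A rough sketch of that batching idea in PHP, assuming the phpredis extension and PDO (key names, thresholds, and the table layout are mine, not from the question):

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// On each POST: push the GPS point into a Redis list instead of MySQL.
$point = json_encode([
    'device_id' => $_POST['device_id'] ?? 0,
    'lat'       => $_POST['lat'] ?? 0,
    'lng'       => $_POST['lng'] ?? 0,
    'ts'        => time(),
]);
$redis->rPush('gps_buffer', $point);

// Flush when the buffer is big enough. (A cron/worker should also flush
// buffers older than ~0.5 s, and a real version needs a lock or an atomic
// rename of the list key to avoid racing with other PHP processes.)
if ($redis->lLen('gps_buffer') >= 1000) {
    $rows = $redis->lRange('gps_buffer', 0, -1);
    $redis->del('gps_buffer');

    $pdo = new PDO('mysql:host=127.0.0.1;dbname=tracking', 'user', 'pass');
    $placeholders = [];
    $params = [];
    foreach ($rows as $row) {
        $r = json_decode($row, true);
        $placeholders[] = '(?, ?, ?, ?)';
        array_push($params, $r['device_id'], $r['lat'], $r['lng'], $r['ts']);
    }
    // One multi-row INSERT instead of 1000 single-row inserts.
    $pdo->prepare('INSERT INTO positions (device_id, lat, lng, ts) VALUES '
        . implode(',', $placeholders))->execute($params);
}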
Randomize the sending
Most smartphones have the correct time, so the start of each second can bring a thousand simultaneous requests: for the first 0.01 s the server handles 1000 requests, and for the next 0.99 s it sleeps. Add a random delay of 0-0.9 s in the mobile code, fixed for each device and defined at first install or first request. This loads the server uniformly.
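A tiny sketch of assigning that fixed offset on the server at first request (the field name is illustrative):

// Picked once per device, stored with its record, and returned to the app;
// the app then reports at (whole second + offset) instead of on the second.
$offsetMs = random_int(0, 900);  // 0-0.9 s, uniform across devices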
There are at least two really important things you should consider:
Client's internet consumption
Server capacity
If you have a thousand users, every second would mean a lot of requests for your server to handle.
You should consider using a push technique, like the ones described in Dipin's answer.
And when it comes to the server, you should consider using a queue system to handle those jobs, as described in this article. There is probably some package providing integration with Firebase or GCM to handle that for you.
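Since the question is about Laravel, here is a minimal sketch of a queued job (my illustration of the queue idea, not from the linked article; table and field names are made up):

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\DB;

class StorePosition implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(private array $point)
    {
    }

    public function handle(): void
    {
        // The heavy work (DB insert, notifications) runs in a queue
        // worker, not in the HTTP request cycle.
        DB::table('positions')->insert($this->point);
    }
}

// In the controller:
// StorePosition::dispatch($request->only(['lat', 'lng', 'device_id']));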
Good luck, hope it helps o/
Related
I am creating a project management system and need to do push notifications when an activity takes place.
Question: if I use jQuery to refresh and fetch notifications from the MySQL database, say every 30 seconds, will there be a huge impact on the server? What are the minimum requirements?
So basically, I'm looking at 10 notifications/day for 20 employees.
Assuming you're talking about an AJAX request to the server in order to update DOM elements, most basic web servers can easily handle a few requests every 30 seconds or so. More important is how well-optimized the server-side code that finds and returns the notifications is. With a few clients polling every 30 seconds, I would make sure the code takes at most a few seconds to process the request and respond with the updated data.
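For example, the notification lookup stays fast if it is a single indexed query. A sketch with PDO (the table, columns, and an index on (user_id, created_at) are assumptions):

$userId   = (int) ($_GET['user_id'] ?? 0);
$lastSeen = $_GET['since'] ?? '1970-01-01 00:00:00';

$pdo  = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare(
    'SELECT id, message, created_at
       FROM notifications
      WHERE user_id = ? AND created_at > ?
      ORDER BY created_at
      LIMIT 50'
);
$stmt->execute([$userId, $lastSeen]);

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));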
Which one is best to choose: server-side or client-side?
I have a PHP function something like:
function insert($argument)
{
    // do some heavy MySQL work, such as a stored-procedure call,
    // that takes about 1.5 seconds
}
I have to call this function about 500 times.
for ($i = 1; $i <= 500; $i++)
{
    insert($argument);
}
I have two options:
a) call it in a loop in PHP (server-side) --> the server may time out
b) call it in a loop via JavaScript (AJAX) --> takes a long time.
Please suggest the best one, or a third option if there is any.
If I understand correctly, your server still needs to do all the work, so you can't use the client's computer to lessen the load on your server. That leaves you with a choice between the following:
Let the client ask the server 500 times. This easily lets you show progress to the client, giving him the satisfying knowledge that something is happening, or
Let the server do everything, skipping the 500 extra round-trip times and the extra overhead needed to process the 500 requests.
I would probably go with 1 if it's important that the client doesn't give up early, or 2 if it's important that the job runs all the way through, as the client might stop the requests after 300.
EDIT: With regard to your comment, I would then suggest having a "start work" button on the client that tells the server to start the job. Your server then tells a background service (which can be written in PHP) to do the work, and that service can write its progress to a file or a database. The client and the PHP server are then free to time out and log out without problems, and you can update the page to check whether the background work has completed, reading the progress from the database or file. This minimizes both time and dependencies.
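A sketch of that setup (file names and the progress file are illustrative):

// start_work.php - launches the worker in the background and returns at once.
exec('php ' . __DIR__ . '/worker.php > /dev/null 2>&1 &');
echo json_encode(['status' => 'started']);

// worker.php - does the 500 calls and records progress for the page to poll.
set_time_limit(0);
for ($i = 1; $i <= 500; $i++) {
    insert($argument);                                // the heavy function above
    file_put_contents('/tmp/progress.txt', "$i/500"); // progress for the client
}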
You have not given any context for what you are trying to achieve - of key importance here are performance and whether a set of values should be treated as a single transaction.
The further the loop is from the physical storage (not just the DBMS), the bigger the performance impact. For most web applications the biggest performance bottleneck is the network latency between the client and the webserver - even if you are relatively close, say 50 milliseconds away, and have keepalives working properly, it will take a minimum of 25 seconds to carry out this operation for 500 data items.
For optimal performance you should send the data to the DBMS in the smallest number of DML statements. You mention MySQL, which supports multi-row inserts, and if you're using MySQLi you can also submit multiple DML statements in the same database call (although the latter only eliminates the chatter between PHP and the DBMS, while a single DML statement inserting multiple rows also reduces the chatter between the DBMS and the storage). Depending on the data structure and optimization, this should take on the order of tens of milliseconds to insert hundreds of rows - both methods are much, MUCH faster than having the loop run in the client, even if the latency were 0.
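A sketch of the multi-row approach with MySQLi (assuming the 500 values sit in an array $arguments; table and column names are made up):

$db = new mysqli('127.0.0.1', 'user', 'pass', 'mydb');

$values = [];
foreach ($arguments as $arg) {
    $values[] = "('" . $db->real_escape_string($arg) . "')";
}

// One statement, one round trip to the DBMS, instead of 500.
$db->query('INSERT INTO my_table (my_column) VALUES ' . implode(',', $values));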
The length of time the transaction is in progress determines the likelihood of it failing - the faster method will therefore be thousands of times more reliable than the AJAX method.
As Krycke suggests, using the client to do some of the work will not save resources on your system - there is the additional overhead of the webserver, PHP instances, and DBMS connections. Although each of these is relatively small, they add up quickly. If you test both approaches you will find that having the loop in PHP or in the database results in significantly less effort, and therefore greater capacity, on your server.
Once I had a script that ran for tens of minutes. My solution was kicking off the long request through AJAX with a 1-second timeout and checking for the result in separate AJAX requests. The experience for the user is better than waiting a long time for a response from PHP without AJAX.
$.ajax({
    // ... other options ...
    timeout: 1000
})
So finally I got this:
a) Use AJAX if you want to be sure it completes. It is also user-friendly, as the user gets regular feedback between AJAX calls.
b) Use a server-side script if you are fairly sure the server will not go down in between and you want less load on the client.
For now I am using a server-side script with a waiting-message window: the user waits for the successful-submission message, otherwise he has to try again. The probability that it succeeds on the first attempt is 90-95%.
I want to design a notification component, and I want to understand what types of polling methods are used out there to fetch notifications with minimal stress on the server.
Let's say, for example, I want to notify a user of a chat message. I imagine I would need to poll quite regularly, like every 500 ms, for a quick response. However, doing this may overload the system: hypothetically speaking, if I have a million users browsing the site, that's 2 million requests every second!
I'm thinking of writing an algorithm that incrementally increases the poll interval by 1 second on each empty poll, up to a maximum of 60 seconds. The interval resets to 500 ms whenever there is new data. This way, if the user has frequent notifications they arrive almost instantly, but if there have been no notifications for a longer period, there may be a delay of up to a minute.
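In code, the rule I have in mind would look something like this (just a sketch; the function name is illustrative):

// Returns the delay before the next poll, given the current delay and
// whether the last poll returned anything new.
function nextPollMs(bool $hadNewData, int $currentMs): int
{
    if ($hadNewData) {
        return 500;                       // reset: poll frequently again
    }
    return min($currentMs + 1000, 60000); // back off, capped at 60 s
}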
In essence I'm compromising between user experience and server load to find a middle ground for both.
Please advise on the possible drawbacks of this approach, if any. Is there a proper name for it?
Alternatively, is there a better method out there?
What you are doing is polling (or long polling). It is not good for performance.
The alternative is pushing (http://en.wikipedia.org/wiki/Push_technology): you push the data only when there is something new.
You could use WebSockets to achieve this.
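For instance, a minimal WebSocket push server in PHP could look like this using the Ratchet (cboden/ratchet) library - my own illustration, as Ratchet is not named in the answer:

require 'vendor/autoload.php';

use Ratchet\ConnectionInterface;
use Ratchet\Http\HttpServer;
use Ratchet\MessageComponentInterface;
use Ratchet\Server\IoServer;
use Ratchet\WebSocket\WsServer;

class Notifier implements MessageComponentInterface
{
    private \SplObjectStorage $clients;

    public function __construct()
    {
        $this->clients = new \SplObjectStorage();
    }

    public function onOpen(ConnectionInterface $conn)  { $this->clients->attach($conn); }
    public function onClose(ConnectionInterface $conn) { $this->clients->detach($conn); }
    public function onError(ConnectionInterface $conn, \Exception $e) { $conn->close(); }

    // Push whatever arrives out to every connected client.
    public function onMessage(ConnectionInterface $from, $msg)
    {
        foreach ($this->clients as $client) {
            $client->send($msg);
        }
    }
}

IoServer::factory(new HttpServer(new WsServer(new Notifier())), 8080)->run();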
You could look at Apollo, messaging middleware with native support for WebSockets and good performance.
http://activemq.apache.org/apollo/
The method you are using could lead to a network traffic overload on your server if there are many clients connected. Suppose you have 1000 clients connected: the server has to handle 1000 different connections. A better approach is a push notification system. Check this out: https://nodejs.org/it/docs/
I currently have a PHP script that collects similar data from various sources; each data source is scraped and parsed every 120 seconds. At the moment I have 20 data sources, but I expect to integrate another 100 over the coming weeks.
Currently each data source is scraped in its own thread: one main PHP script executes other scripts to perform the scraping work. This method allows all sources to be scraped at the same time, but it also puts a strain on the server and creates a bottleneck at the database (MySQL).
I'm looking for a way to scale my current application. Could I do something like this with AWS? Perhaps each of these scraping scripts could run in its own small server instance, with each instance automatically created by a "main" instance and then dying once the script has finished. I don't have any experience with AWS, so I'm not entirely sure if this is possible, or maybe it's just a bad idea.
The main question here is: How can I scale my current scraping script to allow for many new data sources? I'm interested in any solution even if I need to buy additional services.
You need a queueing system
You're describing a worker/queue pattern, with your main server performing both the enqueueing and the worker execution, which of course puts a huge strain on your server.
First and foremost, your workers need to be asynchronous: you shouldn't be waiting for something that may or may not come back. You really should take a look at ZeroMQ which, I might add, has some of the best documentation on the planet. If you're willing to learn, take a look at how it works and follow some tutorials; there are plenty out there. Host the queue on your main server, taking on new jobs and dispatching them elsewhere (i.e. to other boxes).
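A sketch of the push/pull pattern with the php-zmq extension (endpoints, the message format, and the scrape() routine are placeholders):

// Dispatcher, on the main server: hand scrape jobs to whoever is listening.
$ctx  = new ZMQContext();
$push = $ctx->getSocket(ZMQ::SOCKET_PUSH);
$push->bind('tcp://*:5557');
foreach ($dataSources as $source) {
    $push->send(json_encode(['url' => $source]));  // fire and forget
}

// Worker, on any box: pull jobs and run them.
$ctx  = new ZMQContext();
$pull = $ctx->getSocket(ZMQ::SOCKET_PULL);
$pull->connect('tcp://main-server:5557');
while (true) {
    $job = json_decode($pull->recv(), true);
    scrape($job['url']);                           // your existing scraping code
}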
Horizontal Scaling
You can create some sort of Instance Controller to handle AWS instances. You really just need to sit down and think about your logic (when do I want this many boxes, when do I want to shut them down). The API is pretty simple to use once you get your head around it. Here's some code I wrote a while back to wrap Amazon's SDK for PHP. I'm not sure if it works 100% with the latest version (I used it around a year ago), but the concepts are there - you have simple methods like startBox() or stopBox() that you call from your queue, and your box automatically starts doing its stuff once it boots.
You could use the t1.micro instances from Amazon (pricing here), which have a free tier (info here) up to a certain limit.
Get it working properly, with a loop on your main server deciding how many boxes you need working at any one time given certain circumstances (number of jobs in your database table, for example), and you'll have theoretically infinite scaling. Here's how I did it for my code, sketched out after the tiers:
Tier 1: > 5 jobs, < 10 jobs = 1 box
Tier 2: > 10 jobs, < 20 jobs = 2 boxes
etc. etc.
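A sketch of that decision loop (thresholds as above; countPendingJobs() and countRunningBoxes() are stand-ins for your own queries, while startBox()/stopBox() are the SDK wrappers mentioned earlier):

function boxesNeeded(int $pendingJobs): int
{
    if ($pendingJobs <= 5)  return 0;
    if ($pendingJobs <= 10) return 1;  // Tier 1
    if ($pendingJobs <= 20) return 2;  // Tier 2
    return (int) ceil($pendingJobs / 10);
}

while (true) {
    $target  = boxesNeeded(countPendingJobs());
    $running = countRunningBoxes();

    if ($target > $running) startBox();
    if ($target < $running) stopBox();

    sleep(60);  // re-evaluate once a minute
}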
Advice
Log everything. Log every box coming up and every box going down. Calculate your costs in your code and store them, maybe in a database, or log them, so you know exactly how much you're spending - you don't want things to get out of hand.
Make sure you open up your DB ports so your instances can talk to your DB to say when a job is done or anything else you need to pass between your "master" box and your "slave" boxes.
Also, if you're paying for web servers, AWS bills you by the hour, so record the time you start each box, and when it's time to shut one down, only actually do so once 55 minutes or so have passed - you might as well get the extra minutes you're paying for.
I can't really think of anything else. Do your research, figure out the best way to build a queueing system, and build it with scalability in mind (it can react and change to numbers that you control).
Split your scraping up across multiple instances (say 5 sources per server) and have them talk to a central DB like Amazon RDS.
There is no need to kill the instances after scraping finishes if you're doing this every 120 seconds.
I have a DB with over 5 million rows, and for each row I have to do an HTTP POST to a server with some parameters, at a maximum rate of 500 concurrent connections. Each POST request takes 12 seconds to process, so as old connections complete I have to open new ones and maintain ~500 connections. I then have to update the DB with the values returned from these web calls.
How do I make the web calls described above?
My app is in PHP. Can I use PHP for this, or should I switch to something else?
Actually, you can definitely do this with PHP using a technique called long polling. Basically, the client machine pings the server and says "Do you have anything for me?". If the server has nothing, instead of responding it holds onto the request and answers when it has something to send.
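A minimal sketch of such an endpoint (checkForNewData() is a placeholder for your own lookup):

set_time_limit(40);
$deadline = time() + 30;          // hold the request for up to 30 s

while (time() < $deadline) {
    $data = checkForNewData();    // e.g. a cheap indexed SELECT
    if ($data !== null) {
        echo json_encode($data);  // answer as soon as something arrives
        exit;
    }
    usleep(250000);               // 0.25 s between checks
}
echo json_encode([]);             // timed out: the client reconnects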
Long polling is a method that is used by both DrupalChat and the APE project (AJAX Push Engine).
http://drupal.org/project/drupalchat
http://www.ape-project.org/
Here is some more info on push tech: http://en.wikipedia.org/wiki/Push_technology and http://en.wikipedia.org/wiki/Comet_%28programming%29
And here is a stackoverflow post about it: How do I implement basic "Long Polling"?
Now, I have to say that 12 seconds is really dang long for a DB query to run. It sounds like either the query needs to be optimized or the DB does (or both). Have you normalized the database and set up good table and inter-table indexing?
Now, as for preventing DB update collisions, you need to use transactions (which both PostgreSQL and newer versions of MySQL offer, along with most enterprise DB systems). Transactions allow you to roll back DB changes, reserve table IDs, and things like that (sketched below).
http://en.wikipedia.org/wiki/Database_transaction
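A minimal PDO transaction sketch (connection details and table names assumed):

$pdo = new PDO('mysql:host=127.0.0.1;dbname=mydb', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    $pdo->prepare('UPDATE rows SET result = ? WHERE id = ?')
        ->execute([$result, $id]);
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();  // nothing is half-applied on failure
    throw $e;
}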
PHP isn't the right tool for long-running scripts, since by default it has a fairly short maximum execution time. You might look into using Python for this task. Also note that you can call external scripts from PHP (such as Python scripts) using the system() function, if the only reason you're using PHP is easy integration with a web front-end.
However, you can do this in PHP with a cron job by simply having your PHP script handle only a single row at a time and having the cron job call the script repeatedly (note that standard cron only fires once per minute, so per-second scheduling needs a small wrapper loop). Just maintain the index into the table elsewhere (either elsewhere in the DB, or just write the number to a file).
If you want to saturate your 500-connection limit, have your script do 40 rows at a time: 40 rows per second is roughly 500 rows per 12 seconds.
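A sketch of keeping ~500 POSTs in flight with curl_multi (PHP 8, where curl handles are objects, hence spl_object_id(); fetchRows(), updateRow(), and the URL are placeholders for your own DB access):

$mh = curl_multi_init();
$inFlight = [];  // spl_object_id => row id

function addRequest($mh, array &$inFlight, array $row)
{
    $ch = curl_init('https://api.example.com/endpoint');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($row),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    curl_multi_add_handle($mh, $ch);
    $inFlight[spl_object_id($ch)] = $row['id'];
}

foreach (fetchRows(500) as $row) {  // seed the pool
    addRequest($mh, $inFlight, $row);
}

do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh, 1.0);

    // Collect finished requests and top the pool back up.
    while ($info = curl_multi_info_read($mh)) {
        $ch = $info['handle'];
        updateRow($inFlight[spl_object_id($ch)], curl_multi_getcontent($ch));
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
        unset($inFlight[spl_object_id($ch)]);
        foreach (fetchRows(1) as $row) {
            addRequest($mh, $inFlight, $row);
        }
    }
} while ($running > 0 || $inFlight);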