I have a DB with over 5 million rows, and for each row I have to do an HTTP POST to a server with some parameters, at a maximum of 500 concurrent connections. Each POST request takes 12 seconds to process, so as old connections complete I have to open new ones and maintain ~500 connections. I then have to update the DB with the values returned from these web calls.
How do I make the webcalls as above?
My app is in PHP. Can I use PHP, or should I switch to something else for this?
Actually you can definitely do this with PHP using a technique called long-polling. Basically how it works is the client machine pings the server and says "Do you have anything for me" and the server sees that it does not. Instead of responding it holds onto the request and responds when it has something to send.
Long polling is a method that is used by both DrupalChat and the APE project (AJAX Push Engine).
http://drupal.org/project/drupalchat
http://www.ape-project.org/
Here is some more info on push tech: http://en.wikipedia.org/wiki/Push_technology and http://en.wikipedia.org/wiki/Comet_%28programming%29
And here is a stackoverflow post about it: How do I implement basic "Long Polling"?
Now I have to say that 12 seconds is really dang long for a DB query to run. It sounds like either the query needs to be optimized or the DB does (or both). Have you normalized the database and set up good table and inter-table indexing?
Now as for preventing DB update collisions, you need to use transactions (which both PostgreSQL and newer versions of MySQL offer, along with most enterprise DB systems). Transactions allow you to roll back DB changes, reserve table IDs and things like that.
http://en.wikipedia.org/wiki/Database_transaction
PHP isn't the right tool for long-running scripts, since by default it has a maximum execution time which is pretty short. You might look into using Python for this task. Also note that you can call external scripts from PHP (such as Python scripts) using the system() function, if the only reason you're using PHP is to make it easy to integrate a web front-end.
However, you can do this in PHP with a cron job by simply having your PHP script handle only a single row at a time, and having the cron job call the PHP script every second (standard cron runs jobs at most once a minute, so a per-second schedule needs a small wrapper loop or a similar scheduler). Just maintain the index into the table elsewhere (either elsewhere in the DB or just write the number to a file).
If you wanted to saturate your 500-connection limit, have your script do 40 rows at a time. 500 connections / 12 seconds per request is roughly 42 new requests per second, so 40 rows / second keeps you just under the limit.
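The "start a new request whenever an old one finishes" behaviour can also be sketched with a fixed pool of workers pulling from a shared row iterator. This is a minimal sketch in Python, not PHP, under stated assumptions: `fake_post` is a hypothetical stand-in that only sleeps instead of making the real 12-second HTTP POST, and `process_rows` is an illustrative name, not part of any library.

```python
import asyncio

MAX_CONNECTIONS = 500  # connection cap taken from the question


async def fake_post(row_id):
    """Hypothetical stand-in for the real HTTP POST (about 12 s in the question)."""
    await asyncio.sleep(0.01)
    return f"result-{row_id}"


async def process_rows(row_ids, limit=MAX_CONNECTIONS, post=fake_post):
    """Run post() for every row with at most `limit` calls in flight."""
    rows = iter(row_ids)  # shared iterator: each worker pulls the next unprocessed row
    results = {}
    in_flight = 0
    peak = 0

    async def worker():
        nonlocal in_flight, peak
        for row_id in rows:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                results[row_id] = await post(row_id)
            finally:
                in_flight -= 1

    # Exactly `limit` workers exist, so concurrency can never exceed the cap.
    await asyncio.gather(*(worker() for _ in range(limit)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(process_rows(range(2000)))
    print(len(results), "rows done; peak concurrency", peak)
```

Because the workers pull rows lazily, the row source could be a generator that reads the table in batches, and the returned values could be written back to the DB in batches as well.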
I am trying to build a tracking system in which an Android app sends GPS data to a web server built with Laravel. I have read tutorials on how to do realtime apps, but as far as I understand, most of the guides only cover receiving data in realtime. I haven't yet seen examples of sending data every second or so.
I guess it's not good practice to POST data to a web server every second, especially when you already have a thousand users. I hope someone can suggest how I should approach this.
Also, as much as possible I would like to use only Laravel, without any NodeJS server.
Handle requests quickly
First you should estimate server capacity. With FPM, if you have 32 PHP processes and the server handles every POST request within 0.01 sec, capacity can be roughly estimated as N = 32 / 0.01 = 3200 requests per second.
So make the handling fast. If a request takes 0.1 sec to handle, that is too slow to serve a lot of clients from a single server. Enable opcache; it can decrease the time 5x. Inserting data into MySQL is a slow operation, so you will probably need to work on making it faster. For example, add the data to a fast cache (Redis/Memcached), and when the cache already contains 1000 elements or was created more than 0.5 seconds ago, move its contents to the database as a single INSERT query.
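The "flush at 1000 elements or after 0.5 seconds" rule can be sketched as a small buffer class. This is a Python sketch with assumptions: the buffer lives in process memory rather than in Redis or Memcached, `BatchBuffer` is a hypothetical name, and `flush` is whatever callable performs the single multi-row INSERT.

```python
import time


class BatchBuffer:
    """Collect rows in memory and hand them over as one batch.

    A flush happens when the buffer holds `max_items` rows or when the oldest
    buffered row is at least `max_age` seconds old, whichever comes first.
    """

    def __init__(self, flush, max_items=1000, max_age=0.5, clock=time.monotonic):
        self.flush = flush          # callable that receives a list of rows
        self.max_items = max_items
        self.max_age = max_age
        self.clock = clock
        self.rows = []
        self.first_added = None

    def add(self, row):
        if not self.rows:
            self.first_added = self.clock()  # age is measured from the first row
        self.rows.append(row)
        self.flush_if_due()

    def flush_if_due(self):
        if not self.rows:
            return
        too_many = len(self.rows) >= self.max_items
        too_old = self.clock() - self.first_added >= self.max_age
        if too_many or too_old:
            batch, self.rows = self.rows, []
            self.flush(batch)
```

The age check only runs when `add` or `flush_if_due` is called, so a real deployment would also need a timer or cron-style tick calling `flush_if_due` to drain a buffer that has stopped receiving rows.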
Spread the sends randomly
Most smartphones have the correct time, which can lead to a thousand simultaneous requests when the next second starts. The server then gets 1000 requests in the first 0.01 sec and idles for the remaining 0.99 sec. In the mobile code, insert a random delay of 0-0.9 sec that is fixed for each device and defined at first install or first request. This will load the server uniformly.
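One way to get a delay that is fixed per device is to derive it from a device identifier instead of storing a random number at install time. That is a swapped-in variant of the idea above, not the answer's exact method. A minimal Python sketch, where `send_offset` and the device ID format are hypothetical:

```python
import hashlib

MAX_DELAY = 0.9  # seconds, upper bound taken from the answer


def send_offset(device_id, max_delay=MAX_DELAY):
    """Map a device ID to a stable delay in the range [0, max_delay)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the fraction is always below 1.0.
    fraction = int.from_bytes(digest[:6], "big") / 2**48
    return fraction * max_delay
```

Each device would wait `send_offset(its_id)` seconds past the start of every second before sending, so the same device always lands in the same slot while different devices spread out across the second.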
There are at least two really important things you should consider:
Client's internet consumption
Server capacity
If you have a thousand users, sending every second would mean a lot of requests for your server to handle.
You should consider using some push techniques, like those described in this #Dipin answer:
And when it comes to the server, you should consider using a queue system to handle those jobs, as described in this article. There's probably some package providing the integration to use Firebase or GCM to handle that for you.
Good luck, hope it helps o/
I need to run long (minutes to hours) MATLAB code on the server side and send the user its progress status (0-100%). I can't send the data directly to the client side because the client may disconnect and check the status hours later.
Should I do it through the database? I thought about updating the database from MATLAB/PHP while the client side (PHP via JavaScript/AJAX) queries the database every few seconds, but I am afraid it's very "expensive" (many read and write operations for only one user).
What should I do?
By the way, it's an internal network with dozens of users, no more.
You did not mention the kind of database you are using.
If it is MySQL, and since you are only on an internal network with a few dozen users: yes, you can use the database. If you want to keep the cost of the read/write operations low, you can use the MEMORY storage engine for that purpose.
Also, you can use Memcache for interprocess-communication. One process writes into memcache, and another process reads the value out.
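The "one process writes, another reads" idea needs only a single row per job. A minimal Python sketch, assuming an in-memory SQLite database as a stand-in for the MySQL table; the table name `job_progress` and the function names are hypothetical:

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the progress store and create its single table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_progress ("
        " job_id TEXT PRIMARY KEY,"
        " percent INTEGER NOT NULL)"
    )
    return conn


def set_progress(conn, job_id, percent):
    """Called by the long-running job; overwrites the one row for job_id."""
    percent = max(0, min(100, int(percent)))  # clamp to 0-100
    conn.execute(
        "INSERT OR REPLACE INTO job_progress (job_id, percent) VALUES (?, ?)",
        (job_id, percent),
    )
    conn.commit()


def get_progress(conn, job_id):
    """Called by the polling page; returns None for an unknown job."""
    row = conn.execute(
        "SELECT percent FROM job_progress WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else None
```

With a few dozen users polling every few seconds, this amounts to a handful of primary-key lookups per second. In MySQL the same overwrite could be written with `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.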
Which is the better choice here: server-side or client-side?
I have a PHP function something like:
function insert($argument)
{
    // do some heavy MySQL work, such as a stored procedure call (sp_call),
    // that takes about 1.5 seconds
}
I have to call this function about 500 times.
for ($i = 1; $i <= 500; $i++)
{
    insert($argument);
}
I have two options:
a) call it in a loop in PHP (server-side) --> the server may time out
b) call it in a loop in JavaScript (AJAX) --> takes a long time.
Please suggest the best one, or a third option if there is one.
If I understand correctly, your server still needs to do all the work, so you can't use the client's computer to lessen the load on your server. That leaves you with the following choices:
Let the client ask the server 500 times. This easily lets you show the progress to the client, giving them the reassurance that something is happening, or
Let the server do everything, skipping the 500 extra round trips and the extra overhead of processing 500 requests.
I would probably go with 1 if it's important that the client doesn't give up early, or 2 if it's important that the job is done all the way through, as the client might stop the requests after 300.
EDIT: With regard to your comment, I would suggest having a "start work" button on the client that tells the server to start the job. Your server then tells a background service (which can be written in PHP) to do the work, and the service writes its progress to a file, a database or something similar. The client and the PHP server are then free to time out and log out without problems, and you can refresh the page to see whether the background work is completed, reading the status from the database or file. That way you minimize both time and dependencies.
You have not given any context for what you are trying to achieve - of key importance here are performance and whether a set of values should be treated as a single transaction.
The further the loop is from the physical storage (not just the DBMS), the bigger the performance impact. For most web applications the biggest performance bottleneck is the network latency between the client and web server. Even if you are relatively close, say 50 milliseconds away, and have keepalives working properly, it will take a minimum of 25 seconds (500 x 50 ms) to carry out this operation for 500 data items.
For optimal performance you should send the data to the DBMS in the smallest number of DML statements. You've mentioned MySQL, which supports multiple-row inserts, and if you're using MySQLi you can also submit multiple DML statements in the same database call (although the latter just eliminates the chatter between PHP and the DBMS, while a single DML statement inserting multiple rows also reduces chatter between the DBMS and the storage). Depending on the data structure and optimization, this should take in the region of tens of milliseconds to insert hundreds of rows. Both methods will be much, MUCH faster than having the loop running in the client, even if the latency were 0.
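A multiple-row insert can be sketched as one INSERT statement with many value tuples. This is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for MySQL; the `items` table and its columns are hypothetical:

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical two-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT NOT NULL, qty INTEGER NOT NULL)")
    return conn


def insert_rows(conn, rows):
    """Insert all rows with a single multi-row INSERT inside one transaction."""
    if not rows:
        return 0
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    flat = [value for row in rows for value in row]
    with conn:  # commits on success, rolls back on error
        conn.execute(f"INSERT INTO items (name, qty) VALUES {placeholders}", flat)
    return len(rows)


if __name__ == "__main__":
    conn = make_db()
    insert_rows(conn, [(f"item-{i}", i) for i in range(300)])
    print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0], "rows inserted")
```

Databases cap the number of bound parameters and the statement size, so very large batches would need to be split into chunks of a few hundred rows each.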
The length of time the transaction is in progress determines the likelihood of the transaction failing, so the faster method will also be thousands of times more reliable than the Ajax method.
As Krycke suggests, using the client to do some of the work will not save resource on your system - there is an additional overhead of the webserver, PHP instances and DBMS connection. Although these are relatively small, they add up quickly. If you test both approaches you will find that having the loop in PHP or in the database will result in significantly less effort and therefore greater capacity on your server.
I once had a script that ran for tens of minutes. My solution was to start the long request through AJAX with a timeout of 1 second and check for the result in other AJAX requests. The experience for the user is better than waiting too long for a response from PHP without AJAX.
$.ajax({
...
timeout: 1000
})
So finally I got this:
a) Use AJAX if you want to be sure that it will complete. It is also user-friendly, as the user gets regular responses between AJAX calls.
b) Use a server-side script if you are almost sure that the server will not go down in between and you want less load on the client.
Now I am using a server-side script with a waiting message window for the user. The user waits for a successful submission message; otherwise they have to try again.
The probability that it will succeed on the first attempt is 90-95%.
I have a PHP script which I use to make about 1 million requests every day to a specific web service.
The problem is that in a "normal" workflow the script has to run almost the whole day to complete the job.
Therefore I've worked on an additional component. Basically I developed a script which accesses the main script using a multi-curl GET request to generate a random tempid for each batch of 500 records, and finally makes another multi-curl request using POST with all the generated tempids.
However, I don't feel this is the right way, so I would like some advice/solutions for adding multithreading capabilities to the main script without using additional/external applications (e.g. the curl script that I'm currently using).
Here is the main script : http://pastebin.com/rUQ6pwGS
If you want to do it right you should install a message queue. My preference is Redis because it is a "data structure server since keys can contain strings, hashes, lists, sets and sorted sets". Redis is also extremely fast.
Use blpop to listen for new messages (work) and rpush to push new messages onto the queue, spawning a couple of worker processes with php <yourscript> to process work concurrently. Spawning processes is (relatively) expensive, and with a message queue it only has to be done once, when each worker is created.
I would go for phpredis if you can (you need to be able to recompile PHP), because it is an extension written in C and is therefore going to be a lot faster than the pure PHP clients. Otherwise Predis is also a pretty mature library you could use.
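The shape of the blocking-pop worker pattern can be sketched without a Redis server. This Python sketch is an in-process stand-in only: `queue.Queue.get()` plays the role of BLPOP and `put()` the role of RPUSH, threads stand in for the separately spawned PHP workers, and the function names are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to exit


def worker(jobs, results, handle):
    """Block on the queue and process jobs until the STOP sentinel arrives."""
    while True:
        job = jobs.get()  # blocks until work arrives, like BLPOP
        if job is STOP:
            break
        results.put(handle(job))


def run_workers(items, handle, worker_count=4):
    """Start the workers once, push every item, then collect the results."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, handle))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for item in items:
        jobs.put(item)  # producer side, like RPUSH
    for _ in threads:
        jobs.put(STOP)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return sorted(results.get() for _ in range(results.qsize()))
```

The workers are created once and then reused for every job, which is the cost saving the answer describes. With Redis in between, the producer and the workers could also live in different processes or on different machines.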
You could also use this brpop/rpush as some sort of lock(if you need to). This is because:
Multiple clients can block for the same key. They are put into a queue, so the first to be served will be the one that started to wait earlier, in a first-BLPOP first-served fashion.
I would advise you to have a look at Simon's redis tutorial to get an impression of the sheer power that redis has to offer.
This is a background process, correct? In that case, you should not run it via a web server. Run it from the command line, either as a daemon or as a cron job.
My preference is a "cron" job because you get automatic restart for free. Be sure that you don't have more instances of the program running than desired (You can achieve this by locking a file in the filesystem, doing something atomic in a database etc).
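The "locking a file in the filesystem" guard can be sketched with an advisory lock. This is a Python sketch that assumes a Unix-like system (the `fcntl` module is not available on Windows); the function name and lock path are hypothetical.

```python
import fcntl
import os
import tempfile


def acquire_single_instance_lock(path):
    """Return an open file holding an exclusive lock, or None if already locked."""
    handle = open(path, "a")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # typically BlockingIOError when another holder exists
        handle.close()
        return None
    return handle  # keep it open for the life of the process


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.gettempdir(), "webcalls-worker.lock")
    lock = acquire_single_instance_lock(lock_path)
    if lock is None:
        print("another instance is running; exiting")
    else:
        print("lock acquired; doing work")
```

The operating system releases the lock when the process exits, even after a crash, so the next cron run can start a fresh instance without any cleanup step.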
Then you just need to start the number of processes you want, and have them read work from a queue.
Normally the pattern for doing this is having a table containing columns to store who is currently executing a given task:
CREATE TABLE sometasks (
ID of some kind,
Other info required to do task,
some data we need to know if the task is due yet or complete,
locked_by_host VARCHAR(64) NULL,
locked_by_pid INT NULL
)
Then the process will do the following pseudo-query to lock a set of tasks (batch_size is how many per batch, can be 1):
UPDATE sometasks SET locked_by_host=my_hostname, locked_by_pid=my_pid
WHERE not_done_already AND locked_by_host IS NULL ORDER BY ID LIMIT batch_size
Then select the rows back out using a select to find the current process's tasks. Then process the tasks, and update them as being "done" and clear out the lock.
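The lock, select, process, and clear cycle can be sketched end to end. This Python sketch uses an in-memory SQLite database as a stand-in for the real one; the `payload` and `done` columns are hypothetical fillers for the placeholder columns. SQLite builds often lack `UPDATE ... ORDER BY ... LIMIT`, so the batch is picked with a subquery instead.

```python
import sqlite3


def make_db(task_count=10):
    """Create the task table with hypothetical payload/done columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sometasks ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0,"
        " locked_by_host TEXT NULL,"
        " locked_by_pid INTEGER NULL)"
    )
    conn.executemany(
        "INSERT INTO sometasks (payload) VALUES (?)",
        [(f"task-{n}",) for n in range(task_count)],
    )
    conn.commit()
    return conn


def claim_tasks(conn, host, pid, batch_size):
    """Lock up to batch_size free tasks; return IDs of all tasks this worker holds."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE sometasks SET locked_by_host = ?, locked_by_pid = ? "
            "WHERE id IN (SELECT id FROM sometasks "
            "WHERE done = 0 AND locked_by_host IS NULL ORDER BY id LIMIT ?)",
            (host, pid, batch_size),
        )
    rows = conn.execute(
        "SELECT id FROM sometasks "
        "WHERE done = 0 AND locked_by_host = ? AND locked_by_pid = ? ORDER BY id",
        (host, pid),
    ).fetchall()
    return [row[0] for row in rows]


def finish_task(conn, task_id):
    """Mark a task as done and clear its lock."""
    with conn:
        conn.execute(
            "UPDATE sometasks SET done = 1, locked_by_host = NULL, "
            "locked_by_pid = NULL WHERE id = ?",
            (task_id,),
        )
```

A real worker would pass something like `socket.gethostname()` and `os.getpid()` as the host and pid, and would call `finish_task` after each task so that a crash leaves only the unfinished tasks locked.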
I'd opt for a cron job with a controller process which starts up N child processes and monitors them. The child processes could periodically die (remember PHP does not have good GC, so it can easily leak memory) and be respawned to prevent resource leaks.
If the work is all done, the parent could quit, and wait to be respawned by cron (the next hour or something).
NB: locked_by_host can store the host name (pids aren't unique in different hosts) to allow for distributed processing, but maybe you don't need that, so you can omit it.
You can make this design more robust by putting a locked_time column and detecting when a task has been taking too long - you can alert, kill the process, and try again or something.
I have a basic HTML file, using jQuery's ajax, that is connecting to my polling.php script every 2 seconds.
The polling.php script simply connects to MySQL, checks for IDs newer than my hidden, stored current ID, and then echoes anything new. Since the JavaScript is connecting every 2 seconds, I am getting thousands of connections in TIME_WAIT, just for my client. This is because my script is reconnecting to MySQL over and over again. I have tried mysql_pconnect but it didn't help at all.
Is there any way I can get PHP to open 1 connection, and continue to query using it? Instead of reconnecting every single time and making all these TIME_WAIT connections. Unsure what to do here to make this work properly.
I actually ended up doing basic long polling. I made a simple PHP script that runs an infinite while loop and queries every 2 seconds. If it finds something new, it echoes it out and breaks the loop. My jQuery code simply connects to it via AJAX and waits for a response; on response, it updates my page and restarts the polling. Very simple!
PS, the Long Polling method also reduces browser memory issues, as well as drastically reduces the TIME_WAIT connections on the server.
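The server-side loop described above can be sketched as a function that holds the request until something new shows up. This is a Python sketch with assumptions: `fetch_newer` is a hypothetical callable that wraps the database query, and the `timeout` is an addition to the original infinite loop so that a request cannot outlive the script's execution limit.

```python
import time


def long_poll(fetch_newer, last_id, interval=2.0, timeout=30.0,
              sleep=time.sleep, clock=time.monotonic):
    """Block until fetch_newer(last_id) returns rows or `timeout` seconds pass.

    fetch_newer is a hypothetical callable returning a list of rows with an
    ID greater than last_id (an empty list if there are none).
    """
    deadline = clock() + timeout
    while True:
        rows = fetch_newer(last_id)  # reuses one DB connection per held request
        if rows:
            return rows
        if clock() + interval > deadline:
            return []  # nothing new: the client simply reconnects
        sleep(interval)
```

One held request replaces many short ones, so the number of HTTP connections opened and closed drops, which is where the TIME_WAIT reduction comes from.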
There's no trivial way of doing this, as pconnect doesn't work across multiple web page calls. However, some approaches to minimise the database throughput would be:
Lower the polling frequency. (Every 2 seconds is perhaps a bit excessive?)
Have a "master" PHP script that runs every 'n' seconds, extracts the data from the database and saves it in the appropriate format (serialised PHP array, XML, HTML data, etc.) in the filesystem. (I'd recommend writing to a temp file and then renaming over the existing one to minimise any partial file collection issues.) The Ajax requested PHP page would then simply use the information in this data file.
In terms of executing the master PHP script, you could either use cron or simply let the first user who requests the page trigger it when the contents of the file are deemed too stale. (You could use the data file's timestamp for this purpose via the filemtime function.) I'd personally use the latter approach, as cron is overkill for this purpose.
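The write-to-temp-then-rename step and the staleness check can be sketched together. This is a Python sketch with assumptions: JSON is the chosen file format, `rebuild` is a hypothetical callable that runs the database query, and `os.path.getmtime` plays the role of PHP's filemtime.

```python
import json
import os
import tempfile
import time


def write_cache(path, data):
    """Write data as JSON to a temp file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
        os.replace(tmp_path, path)  # atomic when both are on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_cache(path, max_age, rebuild):
    """Return cached data, calling rebuild() first if the file is missing or stale."""
    try:
        stale = time.time() - os.path.getmtime(path) > max_age
    except FileNotFoundError:
        stale = True
    if stale:
        write_cache(path, rebuild())
    with open(path) as handle:
        return json.load(handle)
```

Readers only ever see the old file or the complete new one, never a half-written file. Two requests arriving at the same stale moment may both rebuild, which is wasteful but harmless in this sketch.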
You could take this even further and use memcached instead of a flat file, etc. if so required. (That said, it would perhaps be an over-complex solution at this stage of affairs.)