I'm building a cron job that does the following:
1. Get records from DB
2. For each record, fire a curl request to an API (some requests are quick and some upload large images or videos).
3. If a request is not successful, create a new request with slightly different parameters (still based on the record) and send it again. This can happen several times.
4. On successful request do some DB select/inserts (based on the original record that caused sending this request).
Sending the requests should happen in parallel as some take minutes (large uploads) and some are very quick.
What would be most appropriate to do this - having a master script that gets the records from the DB and creates a process for each record to handle calling the API and parsing the response? Or using curl_multi to send multiple requests at the same time from the same script and parse each one as it returns?
If using multiple processes what would be the best way to do this - pcntl, popen, etc.?
If using curl_multi how would I know which DB record corresponds to which returning request?
EDIT: If using curl_multi I'd probably employ this technique: http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
so that it wouldn't wait for all requests to complete before I start processing the responses.
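For reference, here is a rough sketch of what I have in mind (the endpoint and field names are placeholders): each handle is tagged with its record ID via CURLOPT_PRIVATE, and the ID is read back with curl_getinfo() as each handle completes, which would also solve the record-to-request mapping problem.

// $records is assumed to be the rows fetched in step 1; the URL is a placeholder.
$mh = curl_multi_init();
foreach ($records as $record) {
    $ch = curl_init('https://api.example.com/upload');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, ['payload' => $record['payload']]);
    curl_setopt($ch, CURLOPT_PRIVATE, (string) $record['id']); // tag the handle with its record ID
    curl_multi_add_handle($mh, $ch);
}

do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
    // Process each response as soon as its handle finishes.
    while ($info = curl_multi_info_read($mh)) {
        $ch       = $info['handle'];
        $recordId = curl_getinfo($ch, CURLINFO_PRIVATE); // the tag set above
        $body     = curl_multi_getcontent($ch);
        // On success: do the DB selects/inserts for $recordId.
        // On failure: add a new handle with tweaked parameters for the same record.
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
} while ($running > 0);
curl_multi_close($mh);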
Thanks!
I had a similar issue once processing a large dataset.
The simplest answer for me was to make 4 separate scripts, each written to take a specific fourth of the DB records involved and, in my case, do the processing or, in your case, the curl requests. This prevents a big request in one of the processes from locking up the others.
In contrast, a single script using curl_multi is still going to block on a large request; it just allows you to queue up several at once.
Optimally I'd instead write this in a language with native support for multithreading, so you could have things happening concurrently without resorting to hacks, but that's understandably not always an option.
In the end I went with multiprocessing using pcntl (with a limit on the number of concurrent processes). It seemed to me that curl_multi wouldn't scale to thousands of requests.
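Roughly what that looked like, as a simplified sketch rather than the real code (handle_record() is a stand-in for the API call / retry / DB update logic):

$maxChildren = 10;   // limit on concurrent processes
$children = 0;

foreach ($records as $record) {
    if ($children >= $maxChildren) {
        pcntl_wait($status);           // block until any child exits
        $children--;
    }

    $pid = pcntl_fork();
    if ($pid === -1) {
        die('fork failed');
    } elseif ($pid === 0) {
        // Child process: open its own DB connection here (handles aren't fork-safe),
        // then call the API for this record, retry if needed, and update the DB.
        handle_record($record);        // hypothetical helper
        exit(0);
    }
    $children++;                       // parent just keeps looping
}

while ($children-- > 0) {
    pcntl_wait($status);               // reap the remaining children
}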
Related
I am trying to build a tracking system where an Android app sends GPS data to a web server using Laravel. I have read tutorials on how to do realtime apps, but as far as I understand, most of the guides only cover receiving data in realtime. I haven't yet seen examples of sending data every second or so.
I guess it's not good practice to POST data every second to a web server, especially when you already have a thousand users. I hope someone can suggest how I should approach this.
Also, as much as possible I would only like to use Laravel without any NodeJS server.
Handle requests quickly
First you should estimate server capacity. With PHP-FPM, if you have 32 PHP processes and every POST request is handled by the server within 0.01 sec, capacity can be roughly estimated as N = 32 / 0.01 = 3200 requests per second.
So just make the handling fast. If a request takes 0.1 sec to handle, that is too slow to serve a lot of clients from a single server. Enable opcache; it can cut the time roughly 5x. Inserting data into MySQL is a slow operation, so you probably need to work on making it faster. For example, add incoming data to a fast cache (Redis/Memcached), and when the cache already contains 1000 elements or was created more than 0.5 seconds ago, move it to the database as a single INSERT query.
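As a rough illustration of that buffering idea (key names, table, and columns are made up; using the phpredis extension and PDO, and ignoring locking between workers):

function bufferPoint(Redis $redis, PDO $pdo, array $point): void
{
    // Push the point onto a Redis list; remember when the buffer was started.
    if ($redis->rPush('gps:buffer', json_encode($point)) === 1) {
        $redis->set('gps:buffer:created', microtime(true));
    }

    $age = microtime(true) - (float) $redis->get('gps:buffer:created');
    if ($redis->lLen('gps:buffer') >= 1000 || $age > 0.5) {
        flushBuffer($redis, $pdo);
    }
}

function flushBuffer(Redis $redis, PDO $pdo): void
{
    $rows = $redis->lRange('gps:buffer', 0, -1);
    $redis->del('gps:buffer');
    if (!$rows) {
        return;
    }

    // One multi-row INSERT instead of one query per point.
    $placeholders = rtrim(str_repeat('(?, ?, ?),', count($rows)), ',');
    $stmt = $pdo->prepare("INSERT INTO positions (device_id, lat, lng) VALUES $placeholders");

    $params = [];
    foreach ($rows as $json) {
        $p = json_decode($json, true);
        array_push($params, $p['device_id'], $p['lat'], $p['lng']);
    }
    $stmt->execute($params);
}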
Randomize the sending
Most smartphones have an accurate clock, so a thousand devices may fire simultaneous requests the moment the next second starts: for the first 0.01 sec the server handles 1000 requests, then it sleeps for the remaining 0.99 sec. Add a random delay of 0-0.9 sec in the mobile code, fixed per device and chosen at first install or first request. This will load the server uniformly.
There are at least two really important things you should consider:
Client's internet consumption
Server capacity
If you have a thousand users, every second would mean a lot of requests for your server to handle.
You should consider using some push techniques, as described in this answer by #Dipin.
And when it comes to the server, you should consider using a queue system to handle those jobs, as described in this article. There's probably a package providing the integration with Firebase or GCM to handle that for you.
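For instance, with Laravel's built-in queues the endpoint can dispatch a job and return immediately, while a queue worker does the slow database write later (the table and field names here are only illustrative):

// app/Jobs/StoreGpsPoint.php
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\DB;

class StoreGpsPoint implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(private array $point) {}

    public function handle(): void
    {
        // Runs in the queue worker, not in the web request.
        DB::table('positions')->insert($this->point);
    }
}

// In the controller: queue the write and respond to the device right away.
StoreGpsPoint::dispatch([
    'device_id' => $request->device_id,
    'lat'       => $request->lat,
    'lng'       => $request->lng,
]);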
Good luck, hope it helps o/
I have created a "boggle"-like game for personal programming practice/portfolio.
I found a free API where I can verify words.
My question: if 3 players each have 15-20 words and a script starts running the calls to the API (it's an unlimited-use API as far as I can tell from research), then is there a "guarantee" that every call will run? How does PHP compare to JS's promise/asynchronous style? Is there anything to worry about with a lot of cURL calls in a row? How many requests/responses can an instance of PHP handle at one time?
PHP code runs synchronously: if you are using standard curl_exec(), it will only process one request at a time, and the only limits for a single script are how long the calls take and the configured time limit.
If you are using curl_multi_exec() then you can make asynchronous requests. There is theoretically no limit, but it depends on a number of other factors, such as available bandwidth and the limits on the number of network connections and/or open files on your system.
Some relevant info here:
libcurl itself has no particular limits, if you're referring to the number of concurrent transfers/handles. Your system/app may have a maximum number of open file handles that will prevent you from adding many thousands. Also, when going beyond a few hundred handles the regular curl_multi_perform() approach starts to show that it isn't suitable for many transfers, and you should rather switch to curl_multi_socket() - which I unfortunately believe the PHP binding has no support for.
Which one is better to choose: server-side or client-side?
I have a PHP function something like:
function insert($argument)
{
    // do some heavy MySQL work such as sp_call
    // that takes about 1.5 seconds
}
I have to call this function about 500 times.
for ($i = 1; $i <= 500; $i++)
{
    insert($argument);
}
I have two options:
a) call it in a loop in PHP (server-side) --> the server may time out
b) call it in a loop from JavaScript (AJAX) --> takes a long time.
Please suggest the best one, or a third option if there is one.
If I understand correctly, your server still needs to do all the work, so you can't use the client's computer to lessen the load on your server. You have a choice of the following:
Let the client ask the server 500 times. This easily lets you show progress to the client, giving them the reassuring knowledge that something is happening, or
Let the server do everything, skipping the 500 extra round trips and the extra overhead needed to process the 500 requests.
I would probably go with 1 if it's important that the client doesn't give up early, or 2 if it's important that the job is done all the way through, as the client might stop the requests after 300.
EDIT: With regard to your comment, I would then suggest having a "start work" button on the client that tells the server to start the job. Your server then tells a background service (which can be written in PHP) to do the work, and that service can write its progress to a file or a database or something. Then the client and the PHP server are free to time out and log out without problems, and you can update the page to see whether the background work is completed, which can be read from the database or file. That way you minimize both time and dependencies.
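A bare-bones sketch of that idea (the file names and worker script are placeholders; insert() is the function from the question): the web request launches a detached PHP worker, the worker writes its progress to a file, and a tiny status script lets the page poll for it.

// start.php - kick off the background job and return immediately
exec('nohup php worker.php > /dev/null 2>&1 &');
echo 'started';

// worker.php - does the 500 heavy calls and records its progress
set_time_limit(0);
for ($i = 1; $i <= 500; $i++) {
    insert($argument);                            // the existing heavy function
    file_put_contents('progress.txt', "$i/500");  // overwrite with current progress
}

// status.php - polled by the page (e.g. via AJAX) to show progress
echo file_exists('progress.txt') ? file_get_contents('progress.txt') : '0/500';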
You have not given any context for what you are trying to achieve - of key importance here are performance and whether a set of values should be treated as a single transaction.
The further the loop is from the physical storage (not just the DBMS), the bigger the performance impact. For most web applications the biggest performance bottleneck is the network latency between the client and the webserver - even if you are relatively close, say 50 milliseconds away, and have keepalives working properly, it will take a minimum of 25 seconds to carry out this operation for 500 data items.
For optimal performance you should be sending the data to the DBMS in the least number of DML statements - you've mentioned MySQL, which supports multi-row inserts, and if you're using MySQLi you can also submit multiple DML statements in the same database call (although the latter just eliminates the chatter between PHP and the DBMS, while a single DML statement inserting multiple rows also reduces chatter between the DBMS and the storage). Depending on the data structure and optimization, this should take in the region of tens of milliseconds to insert hundreds of rows - both methods will be much, MUCH faster than having the loop running in the client even if the latency were 0.
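A hedged sketch of the multi-row approach with MySQLi (table and column names are invented): one INSERT carrying all the rows instead of 500 single-row statements.

$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$values = [];
foreach ($rows as $row) {
    $values[] = sprintf("('%s', %d)",
        $mysqli->real_escape_string($row['name']),
        (int) $row['quantity']
    );
}

// Single round trip to the DBMS for the whole batch.
$mysqli->query('INSERT INTO items (name, quantity) VALUES ' . implode(',', $values));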
The length of time the transaction is in progress determines the likelihood of the transaction failing - the faster method will therefore be thousands of times more reliable than the Ajax method.
As Krycke suggests, using the client to do some of the work will not save resources on your system - there is additional overhead from the webserver, PHP instances, and DBMS connections. Although these are relatively small, they add up quickly. If you test both approaches you will find that having the loop in PHP or in the database results in significantly less effort and therefore greater capacity on your server.
Once I had a script that ran for tens of minutes. My solution was doing the long request through AJAX with a 1-second timeout and checking for the result in other AJAX requests. The experience for the user is better than waiting too long for a response from PHP without AJAX.
$.ajax({
...
timeout: 1000
})
So finally I settled on this:
a) Use AJAX if you want to be sure that it will complete. It is also user-friendly, as the user gets regular responses between AJAX calls.
b) Use a server-side script if you are fairly sure the server will not go down in between and you want less load on the client.
Now I am using the server-side script with a waiting message window for the user; the user waits for the successful-submission message, otherwise they have to try again,
with a probability of 90-95% that it will succeed on the first attempt.
Certain pages on my site are populated via numerous AJAX calls, often 20+ AJAX calls running asynchronously. The problem I'm having is that each call creates a new MySQL connection, causing me to receive the 'max_user_connections' error, which I'm told is capped at 20. I've tried turning async off, but as the jQuery documentation states, this will often freeze up the page. I have considered turning this into one AJAX call and letting PHP perform the loop, but the idea was to display each element as it becomes available (some of the data is gathered from external sources, so the timing can vary).
Is this something that can be fixed with mysql_pconnect?
Is this something that can be fixed with mysql_pconnect?
No - you need to either:
1) increase the number of connections mysql will handle (not recommended - not a scalable solution without replication)
2) improve the speed of the ajax call handler (e.g. query tuning, database/opcode/http level caching)
3) decrease the frequency of the ajax calls - e.g. by skipping the call if the last one occurred within 500 msec (a server-side sketch of this idea follows)
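A server-side sketch of option 3, assuming the APCu extension is available (the key scheme is made up): drop the request if the same session has hit the endpoint less than 500 msec ago.

session_start();
$key  = 'last_call_' . session_id();
$last = apcu_fetch($key);

if ($last !== false && (microtime(true) - $last) < 0.5) {
    http_response_code(429);   // tell the client to back off, no DB connection opened
    exit;
}

apcu_store($key, microtime(true), 60);
// ...handle the ajax request (and its single MySQL connection) here...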
20+ Ajax calls are a concern even in the absence of the MySQL issue. The browser won't make these 20 calls simultaneously, since it caps the maximum HTTP connections to a single domain at 2, 4, or 8.
I would combine these requests and bring the number of Ajax calls down to 6-8. As you mentioned, the rest of the calls get data from external sources. Do these calls talk to MySQL as well?
In addition, these things might help:
Can you cache some things (e.g. username, full name) in the HTTP session?
Can you cache some data from MySQL in memcached or even a file? When new data becomes available, the cache can be updated (a small cache-aside sketch follows).
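A small cache-aside sketch for the second point, using the Memcached extension (key and query are just examples):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function getUserProfile(Memcached $mc, PDO $pdo, int $userId): array
{
    $key  = "user_profile_$userId";
    $data = $mc->get($key);
    if ($data !== false) {
        return $data;                    // served from cache, no MySQL connection used
    }

    $stmt = $pdo->prepare('SELECT username, full_name FROM users WHERE id = ?');
    $stmt->execute([$userId]);
    $data = $stmt->fetch(PDO::FETCH_ASSOC) ?: [];

    $mc->set($key, $data, 300);          // keep it for 5 minutes; refresh when data changes
    return $data;
}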
I have a DB with over 5 million rows, and for each row I have to do an HTTP POST to a server with some parameters, at a maximum rate of 500 connections. Each POST request takes 12 seconds to process, so as old connections complete I have to open new ones and maintain ~500 connections. I then have to update the DB with the values returned from these web calls.
How do I make the web calls described above?
My app is in PHP. Can I use PHP, or should I switch to something else for this?
Actually you can definitely do this with PHP using a technique called long polling. Basically, how it works is that the client machine pings the server and says "Do you have anything for me?", and the server sees that it does not. Instead of responding immediately, it holds onto the request and responds when it has something to send.
Long polling is a method that is used by both DrupalChat and the APE project (AJAX Push Engine).
http://drupal.org/project/drupalchat
http://www.ape-project.org/
Here is some more info on push tech: http://en.wikipedia.org/wiki/Push_technology and http://en.wikipedia.org/wiki/Comet_%28programming%29
And here is a stackoverflow post about it: How do I implement basic "Long Polling"?
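A toy version of the server side of long polling looks roughly like this (checkForNewData() is a hypothetical helper that might poll a table or a cache key):

set_time_limit(30);
$start = time();

while (time() - $start < 25) {
    $data = checkForNewData();       // hypothetical: returns null while there is nothing new
    if ($data !== null) {
        echo json_encode($data);
        exit;                        // respond as soon as something is available
    }
    sleep(1);
}

echo json_encode(null);              // timed out; the client simply re-polls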
Now I have to say that 12 seconds is really dang long for a DB query to run. It sounds like either the query needs to be optimized or the DB does (or both). Have you normalized the database and set up good table and inter-table indexing?
Now as for preventing DB update collisions, you need to use transactions (which both PostgreSQL and newer versions of MySQL offer, along with most enterprise DB systems). Transactions will allow you to roll back DB changes and reserve table IDs and things like that.
http://en.wikipedia.org/wiki/Database_transaction
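For example, with PDO a transaction wraps the updates so a failure rolls everything back (generic table names):

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    $pdo->beginTransaction();
    $pdo->prepare('UPDATE rows SET status = ?, response = ? WHERE id = ?')
        ->execute(['done', $responseBody, $rowId]);
    $pdo->prepare('INSERT INTO audit_log (row_id, event) VALUES (?, ?)')
        ->execute([$rowId, 'webcall completed']);
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();   // nothing is half-applied if either statement fails
    throw $e;
}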
PHP isn't the ideal tool for long-running scripts, since by default it has a maximum execution time that is pretty short. You might look into using Python for this task. Also note that you can call external scripts from PHP (such as Python scripts) using the system() function, if the only reason you're using PHP is to make it easy to integrate a web front-end.
However, you *can* do this in PHP with a cron job by simply having your PHP script handle a single row at a time and having the cron job call the script every second. Just maintain the index into the table elsewhere (either elsewhere in the DB, or just write the number to a file).
If you wanted to saturate your 500-connection limit, have your script do 40 rows at a time: 40 rows/second is roughly 500 rows / 12 seconds.
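A minimal sketch of the single-row-per-invocation variant (file, table, and URL are placeholders; the offset handling is deliberately simplified and not race-safe): cron fires this script every second, and because each run spends ~12 seconds waiting on its request, the overlapping runs are what give you concurrency.

$offsetFile = '/tmp/webcall_offset.txt';
$offset = file_exists($offsetFile) ? (int) file_get_contents($offsetFile) : 0;
file_put_contents($offsetFile, $offset + 1);      // claim the next row up front

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$row = $pdo->query("SELECT id, payload FROM rows ORDER BY id LIMIT 1 OFFSET $offset")
           ->fetch(PDO::FETCH_ASSOC);
if (!$row) {
    exit;                                          // nothing left to process
}

$ch = curl_init('https://api.example.com/endpoint');   // placeholder URL
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $row['payload'],
]);
$result = curl_exec($ch);
curl_close($ch);

$pdo->prepare('UPDATE rows SET result = ? WHERE id = ?')
    ->execute([$result, $row['id']]);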