Reduce cURL Processor Usage - php

I'm working on a task that involves sending 6 sets of 8 requests per user, for a total of about 2000 users. It's a bunch of GET requests, used to send commands.
To speed up the sending, I've constructed 4 curl multi-handles, each holding 8 requests, firing them off one after the other, and then continuing on with the next user. The slight problem is that it eats 99% of my CPU while using only about 5 KB per second of my bandwidth. There are no leaks or anything, but sending 96,000 requests lags badly, taking a good 3 hours on my dual-core AMD Phenom.
Is there any way I can possibly speed this up? Using file_get_contents() instead of cURL ends up being 50% slower, but cURL uses only 5 kbps and eats my CPU.

Have you tried using fopen() for your requests instead of cURL? This could also be putting a load on wherever you are sending the requests, and each request won't return until the web server finishes it. Do you need the data back to present to the user? If not, can you run the requests in the background? The real question is why you are sending so many requests; it would be far better to consolidate them into fewer requests. There are a lot of variables in this setup that can contribute to the speed.
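One other thing worth checking (an assumption on my part, since the loop code isn't shown): if the loop that drives the multi-handles calls curl_multi_exec() in a tight loop without waiting on curl_multi_select(), it will spin at 100% CPU even though almost no data is moving. A minimal sketch of a select-based loop, with placeholder URLs:

```php
<?php
// Minimal sketch: drive one curl_multi batch while waiting on curl_multi_select()
// between iterations instead of spinning, which is what usually pegs the CPU.
// The URLs and options below are placeholders.
$urls = ['http://example.com/cmd?a=1', 'http://example.com/cmd?a=2'];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

$running = null;
do {
    curl_multi_exec($mh, $running);
    if ($running > 0) {
        // Block (up to 1 s) until a handle has activity; fall back to a short
        // usleep() if select reports an error, so we never busy-loop.
        if (curl_multi_select($mh, 1.0) === -1) {
            usleep(100000);
        }
    }
} while ($running > 0);

foreach ($handles as $ch) {
    $response = curl_multi_getcontent($ch);   // ...handle $response here...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```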

Related

Is it just fine to POST data to Laravel every second?

I am trying to build a tracking system in which an Android app sends GPS data to a web server running Laravel. I have read tutorials on how to do realtime apps, but as far as I understand, most of the guides only cover receiving data in realtime. I haven't yet seen examples of sending data every second or so.
I guess it's not good practice to POST data every second to a web server, especially when you already have a thousand users. I hope someone can suggest how I should approach this.
Also, as much as possible, I would like to use only Laravel, without any NodeJS server.
Handle requests quickly
First you should estimate server capacity. With FPM, if you have 32 PHP processes and every POST request is handled by the server within 0.01 s, capacity can be roughly estimated as N = 32 / 0.01 = 3200 requests per second.
So just make the handling fast. If a request takes 0.1 s to handle, that is too slow to serve a lot of clients from a single server. Enable OPcache; it can cut the time roughly 5x. Inserting data into MySQL is a slow operation, so you probably need to work on making it faster. For example, add the data to a fast cache (Redis/Memcached), and when the cache already contains 1000 elements or was created more than 0.5 seconds ago, move it to the database as a single INSERT query.
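As an illustration of that cache-then-batch idea, here is a rough sketch using the phpredis extension and PDO; the key names, thresholds, and table layout are invented for the example:

```php
<?php
// Sketch: buffer incoming GPS points in Redis and flush them to MySQL in one
// multi-row INSERT when the buffer is large or old enough. Names are examples.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$pdo = new PDO('mysql:host=127.0.0.1;dbname=tracking', 'user', 'pass');

function bufferPoint(Redis $redis, array $point): void {
    $redis->rPush('gps:buffer', json_encode($point));
    // Remember when the buffer was started, so we can also flush by age.
    $redis->setnx('gps:buffer:started', (string) microtime(true));
}

function maybeFlush(Redis $redis, PDO $pdo): void {
    $count = $redis->lLen('gps:buffer');
    if ($count === 0) {
        return;
    }
    $started = (float) $redis->get('gps:buffer:started');
    if ($count < 1000 && (microtime(true) - $started) < 0.5) {
        return; // buffer is neither full nor old enough yet
    }
    $rows = $redis->lRange('gps:buffer', 0, -1);
    $redis->del('gps:buffer', 'gps:buffer:started');

    // Build one multi-row INSERT instead of one INSERT per point.
    $placeholders = [];
    $values = [];
    foreach ($rows as $json) {
        $p = json_decode($json, true);
        $placeholders[] = '(?, ?, ?)';
        array_push($values, $p['device_id'], $p['lat'], $p['lng']);
    }
    $sql = 'INSERT INTO positions (device_id, lat, lng) VALUES ' . implode(',', $placeholders);
    $pdo->prepare($sql)->execute($values);
}
```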
Randomize the sending
Most smartphones have an accurate clock, so the moment the next second starts you can get a thousand simultaneous requests: the server handles 1000 requests in the first 0.01 s and then sleeps for the remaining 0.99 s. Add a random delay of 0-0.9 s in the mobile code, fixed for every device and chosen at first install or first request. That will load the server uniformly.
There are at least two really important things you should consider:
Client's internet consumption
Server capacity
If you have a thousand users, a request every second means a lot of requests for your server to handle.
You should consider using some push technique, like the one described in this answer by #Dipin.
And when it comes to the server, you should consider using a queue system to handle those jobs, like described in this article. There's probably some package providing the integration to use Firebase or GCM to handle that for you.
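For the queue part, a minimal Laravel sketch might look like the following; the job name, fields, and table are hypothetical, and the controller only dispatches the job and returns immediately while a queue worker does the insert:

```php
<?php
// Hypothetical job: app/Jobs/StorePosition.php
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Support\Facades\DB;

class StorePosition implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable;

    public function __construct(private array $position) {}

    public function handle(): void
    {
        // The slow work (the DB insert) happens in the worker, not in the request.
        DB::table('positions')->insert($this->position);
    }
}

// In the controller: validate, dispatch, and return right away, e.g.
// StorePosition::dispatch($request->only(['device_id', 'lat', 'lng']));
```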
Good luck, hope it helps o/

Waiting for multiple cUrls from PHP and when is too much?

I have created a "boggle"-like game for personal programming practice/portfolio.
I found a free API where I can verify words.
My question: if 3 players each have 15-20 words and a script starts running the calls to the API (it's an unlimited-use API, as far as I can tell from research), is there a "guarantee" that every call will run? How does PHP compare to JS's promise/asynchronous style? Is there anything to worry about with a lot of cURL calls in a row? How many requests/responses can an instance of PHP handle at one time?
PHP code runs synchronously: if you are using standard curl_exec(), it will only process one request at a time, and the only limits for a single script are how long the calls take and the configured time limit.
If you are using curl_multi_exec() then you can make asynchronous requests, and there is theoretically no limit, but it depends on a number of other factors, such as available bandwidth, limits on the number of network connections, and/or open files on your system.
Some relevant info here:
libcurl itself has no particular limits, if you're referring to amount of
concurrent transfer/handles or so. Your system/app may have a maximum amount
of open file handles that'll prevent you from adding many thousands. Also,
when going beyond a few hundred handles the regular curl_multi_perform()
approach start to show that it isn't suitable for many transfers and you
should rather switch to curl_multi_socket() - which I unfortunately believe
the PHP binding has no support for.

Does increasing the number of subdomains help multiple AJAX calls finish faster?

I have a question regarding multiple AJAX calls.
Say I have 100 AJAX calls to make. If I use a single subdomain it takes 30 seconds to finish; with 2 subdomains it takes 20 seconds, and with 3 subdomains it takes 18 seconds.
All the AJAX calls are dynamic. A single call takes at most 3 seconds to finish.
Each call needs to communicate with the DB. Previously I had a single DB for all 3 subdomains; now I have created 3 different databases.
My goal is to get them finished in 10 seconds.
Any suggestions please.
KR
If the subdomains are served by the same Apache server, the performance will be a bit slower or almost the same, because Apache needs to serve more virtual hosts.
So the right choice would be to group your requests into one, or to use a WebSocket to communicate with the server in real time.
The problem you're describing is the limit on simultaneous connections that a browser opens per hostname. If you make many calls to one server, some of them have to wait until the others finish, which causes delays. If you distribute the resources between servers you get around this per-server limit and the calls run simultaneously. However, for small amounts of data it is usually wiser to merge the requests together and send them as one package, because otherwise you lose time on each round trip, repeat useless headers, and reopen connections.
Check here for the current limits per browser. Browsers might not be strict in implementing those limits, though.
http://www.browserscope.org/?category=network
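To sketch the merging idea on the server side (the endpoint, parameter, and table names are invented): accept all the ids in a single request and return all the results in one response, instead of 100 separate calls:

```php
<?php
// Sketch of a batch endpoint: the client sends one request carrying all the ids
// and gets every result back at once, instead of making 100 separate AJAX calls.
$ids = array_filter(array_map('intval', explode(',', $_GET['ids'] ?? '')));

header('Content-Type: application/json');
if (count($ids) === 0) {
    echo json_encode([]);
    exit;
}

$pdo = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'pass');
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, payload FROM items WHERE id IN ($placeholders)");
$stmt->execute(array_values($ids));

echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```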

PHP Multi curl or multi threading

I'm building a cron job that does the following:
1. Get records from DB
2. For each record fire a curl request to an API. (some requests are quick and some are uploading large images or videos).
3. If a request is not successful, create a new request with slightly different parameters (still based on the record) and send it again. This can happen several times.
4. On successful request do some DB select/inserts (based on the original record that caused sending this request).
Sending the requests should happen in parallel as some take minutes (large uploads) and some are very quick.
What would be the most appropriate way to do this: having a master script that gets the records from the DB and spawns a process for each record to handle calling the API and parsing the response, or using curl_multi to send multiple requests at the same time from the same script and parse each one as it returns?
If using multiple processes, what would be the best way to do this: PCNTL, popen, etc.?
If using curl_multi, how would I know which DB record corresponds to which returning request?
EDIT: If using curl multi, I'd probably employ this technique: http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
so that it wouldn't wait for all requests to complete before I start processing the responses.
Thanks!
I had a similar issue once processing a large dataset.
The simplest answer for me was to make 4 separate scripts, each written to take a specific fourth of the DB records involved and, in my case, do the processing, or in your case fire the cURL requests. This prevents a big request in one of the processes from locking up the others.
In contrast, a single script using curl_multi is still going to block on a large request; it just allows you to queue up multiple requests at once.
Optimally I'd instead write this in a language with native support for multithreading, so things could happen concurrently without resorting to hacks, but that's understandably not always an option.
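On the OP's remaining question of matching a returning request back to its DB record when using curl_multi: a common pattern (a sketch only, with invented field names) is to keep the handles and records in parallel arrays and use curl_multi_info_read() as transfers complete:

```php
<?php
// Sketch: map each curl handle back to the DB record it was created for.
$records = [/* ... rows fetched from the DB, e.g. ['id' => 1, 'api_url' => '...'] ... */];

$mh = curl_multi_init();
$handles = [];   // index => curl handle
$pending = [];   // index => DB record, same index as $handles

foreach ($records as $i => $record) {
    $ch = curl_init($record['api_url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
    $pending[$i] = $record;
}

$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh, 1.0);

    // Handle each transfer as soon as it finishes, not when the whole batch is done.
    while ($info = curl_multi_info_read($mh)) {
        $ch   = $info['handle'];
        $i    = array_search($ch, $handles, true);  // works for resources and CurlHandle objects
        $rec  = $pending[$i];
        $ok   = ($info['result'] === CURLE_OK);
        $body = curl_multi_getcontent($ch);
        // ... if $ok: do the DB inserts for $rec; otherwise build a retry request ...
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
} while ($running > 0);

curl_multi_close($mh);
```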
In the end I went with multiprocessing using PCNTL (limiting the number of concurrent processes). It seemed to me that curl_multi wouldn't scale to thousands of requests.
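For illustration, a stripped-down version of that PCNTL approach might look like this; the worker cap and handleRecord() are placeholders:

```php
<?php
// Sketch: fork one worker per DB record, but never more than $maxWorkers at once.
// Requires the pcntl extension (CLI only). handleRecord() stands in for the
// cURL call + retries + DB writes described in the question.
function handleRecord(array $record): void {
    // placeholder for the real work
}

$records = [/* ... rows fetched from the DB ... */];
$maxWorkers = 10;
$active = 0;

foreach ($records as $record) {
    // At the cap? Wait for any child to exit before forking another.
    while ($active >= $maxWorkers) {
        pcntl_wait($status);
        $active--;
    }

    $pid = pcntl_fork();
    if ($pid === -1) {
        exit("fork failed\n");
    }
    if ($pid === 0) {
        handleRecord($record);   // child: do the work for this record, then exit
        exit(0);
    }
    $active++;                   // parent: move on to the next record
}

// Wait for the remaining children to finish.
while ($active-- > 0) {
    pcntl_wait($status);
}
```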

PHP and CPU - Process of chat + notifications

My site has a PHP process running for each open window/tab. It runs for a maximum of 1 minute and returns notifications/chat messages/people going online or offline. When JavaScript gets the output, it calls the same PHP process again, and so on.
This is like Facebook chat.
But it seems to take too much CPU while it is running. Do you have any idea how Facebook handles this problem? What do they do so their processes don't take too much CPU and bring their servers down?
My process has a "while(true)" with a "sleep(1)" at the end. Inside the cycle, it checks for notifications, checks whether any of the currently online people went offline or changed status, reads unread messages, etc.
Let me know if you need more info about how my process works.
Would calling other PHP scripts via "system()" (and waiting for their output) alleviate this?
I ask this because it makes other processes do the notification checks and flush when finished, while the main PHP script just collects the results.
Thank you.
I think your main problem here is parallelism. Apache and PHP do not excel at tasks like this, where 100+ users each hold an open HTTP request.
If in your while(true) you spend 0.1 second on CPU-bound work (checking status changes or other useful things) and 1 second on the sleep, this results in a CPU load of 100% as soon as you have 10 users online in the chat. So in order to serve more users with THIS model of a chat you would have to optimize the workload in your while(true) cycle and/or raise the sleep interval from 1 second to 3 or higher.
I had the same problem in an HTTP-based chat system I wrote many years ago, where at some point too many parallel MySQL SELECTs were slowing down the chat, creating heavy load on the system.
What I did was implement a fast "ring buffer" for messages and status information in shared memory (SysV back in the day; today I would probably use APC or memcached). All operations read and write the buffer, and the buffer itself gets periodically "flushed" into the database to persist it (but a lot less often than once per second per user). If no persistence is needed, you can of course omit the backend.
I was able to increase the number of users I could serve by roughly 500% that way.
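As a rough modern equivalent of such a buffer, here is a sketch using APCu instead of SysV shared memory; the key names, buffer size, and message shape are invented, and the periodic flush to the database is omitted:

```php
<?php
// Sketch: keep the last messages of a conversation in APCu so every poll
// reads from memory instead of hitting MySQL each second.
// Note: fetch + store is not atomic; good enough for a sketch, not production.
const BUFFER_SIZE = 100;

function pushMessage(string $conversationId, array $message): void {
    $key = "chat:$conversationId";
    $buffer = apcu_fetch($key) ?: [];
    $buffer[] = $message;
    if (count($buffer) > BUFFER_SIZE) {
        $buffer = array_slice($buffer, -BUFFER_SIZE); // keep only the newest entries
    }
    apcu_store($key, $buffer);
}

function messagesSince(string $conversationId, float $timestamp): array {
    $buffer = apcu_fetch("chat:$conversationId") ?: [];
    return array_values(array_filter(
        $buffer,
        fn (array $m) => $m['ts'] > $timestamp   // each message carries its own timestamp
    ));
}
```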
BUT as soon as you have solved this issue you will be faced with another: available system memory (100+ Apache processes at ~5 MB each - fun) and process context-switching overhead. The more active processes you have, the more time your operating system will spend on the overhead involved in assigning "fair enough" CPU slots, AFAIK.
You'll see it is very hard to scale efficiently with Apache and PHP alone for your use case. There are open-source tools, client- and server-based, to help though. One I remember sits in front of Apache and queues messages internally while keeping a very efficient multi-socket connection to the JavaScript clients, making real "push" events possible. Unfortunately I do not remember any names, so you'll have to research, or hope the Stack Overflow community brings in what my brain has already discarded ;)
Edit:
Hi Nuno,
the comment field has too few characters so I reply here.
Let's get to the 10 users in parallel again:
10*0.1 seconds of CPU time per cycle (assumed) is roughly 1 s of combined CPU time over a period of 1.1 seconds (1 second sleep + 0.1 second execute). That is 1 / 1.1, which I would boldly round to 100% CPU utilization even though it is "only" 90.9%.
If that 10*0.1 s of CPU time is "stretched" over a period of not 1.1 seconds but 3.1 (3 seconds sleep + 0.1 seconds execute), the calculation is 1 / 3.1 = 32%.
And it is logical: if your checking cycle queries your backend a third as often, you have only a third of the load on your system.
Regarding the shared memory: the name might imply otherwise, but if you use good IDs for your cache areas, like one ID per conversation or user, you will have private areas within the shared memory. Database tables also rely on you providing good IDs to separate private data from public information, so those should already be around :)
I would also not "split" any further. The fewer PHP processes you have to "juggle" in parallel, the easier it is for your systems and for you. Unless you see that it makes absolute sense because one type of notification takes a lot more querying resources than another and you want different refresh times, or something like that. But even this can be decided inside the while cycle: a user's "away" status could be checked every 30 seconds, while the messages he might have written get checked every 3. No reason to create more cycles; just use different counter variables or the right divisor in a modulo operation.
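A tiny sketch of that modulo idea inside the existing loop; the intervals and function names are just examples:

```php
<?php
// Sketch: one while(true) cycle that checks different things at different
// intervals via a counter and modulo, instead of spawning extra processes.
// The two check functions are placeholders for the real queries.
function checkNewMessages(): void { /* read unread messages */ }
function checkAwayStatus(): void  { /* check who went away/offline */ }

$tick = 0;
while (true) {
    if ($tick % 3 === 0) {
        checkNewMessages();   // every 3 seconds
    }
    if ($tick % 30 === 0) {
        checkAwayStatus();    // every 30 seconds
    }
    $tick++;
    sleep(1);
}
```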
The inventor of PHP said that he believes man is too limited to control parallel processes :)
Edit 2
OK, let's build a formula. We have these variables:
duration of execution (e)
duration of sleep (s)
duration of one cycle (c)
number of concurrent users (u)
CPU load (l)
c = e + s
l = u*e / c   # the ratio of CPU time needed per cycle (u*e) to the wall-clock length of the cycle (c)
l = u*e / (e + s)
For 30 users, ASSUMING that you have 0.1 s execution time and 1 second of sleep:
l = 30*0.1 / (0.1 + 1)
l = 2.73
l = 273% CPU utilization (i.e. you need 3 cores :P)
Exceeding the capability of your CPU means that cycles will run longer than you intend; the overall response time will increase (and the CPU runs hot).
PHP blocks on sleep() and system() calls. What you really need to research is pcntl_fork(). Fortunately, I ran into these problems over a decade ago, and you can look at most of my code.
I needed a PHP application that could connect to multiple IRC servers, sit in unlimited IRC chatrooms, moderate, interact with, and receive commands from people. All this and more was done in a process-efficient way.
You can check out the entire project at http://sourceforge.net/projects/phpegg/ The code you want is in source/connect.inc.
