I have a PHP REST server with one endpoint that is used far more than any other. Everything in that endpoint runs fast except for a cURL request it makes; neither the return value of curl_exec nor the status matters to me. Is there a way to put the cURL request into an AWS queue or something similar, so that AWS performs it later instead of the PHP server doing it inline, and the endpoint becomes much faster?
Edit: This can be solved in many ways (e.g. cron, fork, fsockopen), but I want the smartest approach, one that uses an Amazon cloud service such as SQS.
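For reference, the SQS pattern being asked about looks roughly like the sketch below: the endpoint enqueues a message and returns immediately, and a separate worker (cron job, daemon, or Lambda) polls the queue and performs the cURL call. This is only a sketch assuming the AWS SDK for PHP v3; the region, queue URL, and payload are placeholders.

require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

$sqs = new SqsClient(['region' => 'ap-southeast-1', 'version' => 'latest']);
$queueUrl = 'https://sqs.ap-southeast-1.amazonaws.com/XXXXXXX/QUEUENAME'; // placeholder

// Producer side (inside the hot endpoint): enqueue and return immediately.
$sqs->sendMessage([
    'QueueUrl'    => $queueUrl,
    'MessageBody' => json_encode(['url' => 'http://example.com/target']),
]);

// Worker side (a separate long-running CLI process or cron job): poll the
// queue and perform the cURL request the endpoint no longer waits for.
$result = $sqs->receiveMessage([
    'QueueUrl'        => $queueUrl,
    'WaitTimeSeconds' => 20,          // long polling
]);

foreach ((array) $result->get('Messages') as $message) {
    $job = json_decode($message['Body'], true);

    $ch = curl_init($job['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);                   // response and status are intentionally ignored
    curl_close($ch);

    $sqs->deleteMessage([
        'QueueUrl'      => $queueUrl,
        'ReceiptHandle' => $message['ReceiptHandle'],
    ]);
}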
You can solve this using Supervisord (on Linux) together with Gearman. See the Supervisord and Gearman documentation for more details.
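For completeness, a minimal Gearman sketch of that setup, assuming the pecl gearman extension and a gearmand server on localhost; the function name fire_webhook and the payload are placeholders. The endpoint submits a background job, and a worker kept alive by Supervisord performs the cURL call.

// Endpoint side: hand the work off as a background job and return at once.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('fire_webhook', json_encode(['url' => 'http://example.com/target']));

// Worker side (worker.php, kept running by Supervisord):
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('fire_webhook', function (GearmanJob $job) {
    $payload = json_decode($job->workload(), true);

    $ch = curl_init($payload['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);        // result and status are deliberately ignored
    curl_close($ch);
});

while ($worker->work()) {
    // Loop forever; Supervisord restarts the process if it dies.
}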
Related
I am creating a PHP (don't hate!!) script that maintains a long-term connection to Apple's new APNS server, as per their new documentation.
The general concept is a while(true) loop that sleeps for n seconds, and checks a queue for outbound push notifications, which are created and inserted into a database by a separate application.
I am getting stuck with comprehending the following section of the documentation, because of my lack of knowledge in the HTTP/2 spec and protocol.
Best Practices for Managing Connections
<snip> You can check the health of your connection using an HTTP/2 PING frame.
As this loop runs, I need to be alerted of the health of my connection, so that I can reconnect, in the case that I get disconnected, or if the connection is somehow terminated.
So, to summarize: how would I send an HTTP/2 PING frame using cURL, specifically PHP's cURL, and what might the response look like?
I suppose, since cURL uses nghttp2 as the low-level library to interact with HTTP/2, this has something to do with it, but I am not sure how to use nghttp2 functions from within curl: https://nghttp2.org/documentation/nghttp2_submit_ping.html
curl (currently) offers no API that allows an application to send specific HTTP/2 frames like PING.
My objective is to consume various Web Services and then merge the results.
I was doing this with PHP cURL, but as the number of Web Services increased, my service slowed down, because the process waited for one response before making the request to the next Web Service.
I solved this issue using curl_multi and everything was working fine.
Now I have a new problem: the new Web Services I have to add use the SOAP protocol, and I can't make simultaneous requests anymore, because I don't use cURL for SOAP Web Services, I use SoapClient.
I know I could build the SOAP XML myself and send it with cURL, but that seems like bad practice to me.
In short, is there some way to consume REST and SOAP Web Services simultaneously?
I would first try a unified, asynchronous guzzle setup as others have said. If that doesn't work out I suggest not using process forking or multithreading. Neither are simple to use or maintain. For example, mixing guzzle and threads requires special attention.
I don't know the structure of your application, but this might be a good case for a queue. Put a message into a queue for each API call and let multiple PHP daemons read out of the queue and make the actual requests. The code can be organized to use curl or SoapClient depending on the protocol or endpoint instead of trying to combine them. Simply start up as many daemons as you want to make requests in parallel. This avoids all of the complexity of threading or process management and scales easily.
When I use this architecture I also keep track of a "semaphore" in a key-value store or database. Start the semaphore with a count of API calls to be made. As each is complete the count is reduced. Each process checks when the count hits zero and then you know all of the work is done. This is only really necessary when there's a subsequent task, such as calculating something from all of the API results or updating a record to let users know the job is done.
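A minimal sketch of that counter, assuming Redis (via the phpredis extension) as the key-value store; the key name, job id, and call count are placeholders.

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Before enqueueing: record how many API calls belong to this job.
$totalApiCalls = 5;
$redis->set('job:123:remaining', $totalApiCalls);

// In each worker, after one API call completes: decrement and check.
$remaining = $redis->decr('job:123:remaining');
if ($remaining === 0) {
    // Every call has finished; run the follow-up step here
    // (aggregate the results, mark the job done, notify the user, ...).
}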
Now this setup sounds more complicated than process forking or multithreading, but each component is easily testable and it scales across servers.
I've put together a PHP library that helps build the architecture I'm describing. It's basic pipelining that allows a mix of synchronous and asynchronous processes. The async work is handled by a queue and semaphore. API calls that need to happen in sequence would each get a Process class. API calls that could be made concurrently go into a MultiProcess class. A ProcessList sets up the pipeline.
Yes, you can.
Use an HTTP client (e.g. Guzzle, Httpful). Most of them follow PSR-7, so you get a common contract, and most importantly they have plenty of plugins for SOAP and REST.
For example, if you choose Guzzle as your HTTP client, there are SOAP plugins for it. REST is just calling a service over HTTP, so you don't need an extra package for that; use Guzzle itself.
Write your API calls in an async (non-blocking) way to increase performance. One solution is to use promises; read more in your HTTP client's documentation.
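As a rough illustration of the promise-based approach, assuming Guzzle 7 with its bundled guzzlehttp/promises package; the endpoint URLs are placeholders.

require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

$client = new Client(['timeout' => 10]);

// Fire all requests without blocking on any single one.
$promises = [
    'users'  => $client->getAsync('https://api.example.com/users'),
    'orders' => $client->getAsync('https://api.example.com/orders'),
];

// Wait for all of them to settle, then merge the results.
foreach (Utils::settle($promises)->wait() as $name => $result) {
    if ($result['state'] === 'fulfilled') {
        echo $name, ': ', $result['value']->getBody(), PHP_EOL;
    }
}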
It's not something PHP is good at, and you can easily hit edge-case crash bugs doing it, but PHP CAN do multithreading: check pthreads and pcntl_fork. (Neither of which works on a web server behind php-fpm / mod_php, by the way, and pcntl_fork only works on Unix systems (Linux/BSD); it won't work on Windows.)
However, you'd probably be better off switching to a master process -> worker processes model with proc_open & co. This works behind web servers both under php-fpm and mod_php, does not depend on pthreads being installed, even works on Windows, and a single crashing worker won't take the other workers down. It also lets you drop PHP's curl_multi interface (which, IMO, is very cumbersome to get right) and keep using the simple curl_exec & co functions. (Here's an example of running several instances of ping: https://gist.github.com/divinity76/f5e57b0f3d8131d5e884edda6e6506d7 - but I'm suggesting using the PHP CLI for this, e.g. proc_open('php workerProcess.php', ...); I have done it several times before with success.)
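A rough sketch of that master/worker layout with proc_open; workerProcess.php is a hypothetical CLI script that takes a service name as its argument, performs the REST or SOAP call, and prints the result to stdout.

$services = ['rest-service-a', 'rest-service-b', 'soap-service-c'];
$handles = [];

// Start one CLI worker per service; they all run concurrently.
foreach ($services as $service) {
    $cmd  = sprintf('php workerProcess.php %s', escapeshellarg($service));
    $spec = [1 => ['pipe', 'w']];                 // we only need the worker's stdout
    $proc = proc_open($cmd, $spec, $pipes);
    if (is_resource($proc)) {
        $handles[$service] = ['proc' => $proc, 'stdout' => $pipes[1]];
    }
}

// Collect the output; stream_get_contents() blocks until each worker exits.
$results = [];
foreach ($handles as $service => $h) {
    $results[$service] = stream_get_contents($h['stdout']);
    fclose($h['stdout']);
    proc_close($h['proc']);
}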
You could run a cronjob.php with crontab and start other php scripts asynchronously:
// cronjob.php
$files = [
    'soap-client-1.php',
    'soap-client-2.php',
    'soap-client-3.php',
];

foreach ($files as $file) {
    // Launch each script in the background and discard its output.
    $cmd = sprintf('/usr/bin/php -f "%s" >> /dev/null &', $file);
    system($cmd);
}
// soap-client-1.php
$client = new SoapClient('http://www.webservicex.net/geoipservice.asmx?WSDL');
$parameters = array(
    'IPAddress' => '8.8.8.8',
);
$result = $client->GetGeoIP($parameters);
// #todo Save result
Each PHP script starts a new SOAP request and stores its result in the database. You can then process the data by reading the results back from the database.
This seems like an architecture problem. You should instead consume each service with a separate file/URL and scrape JSON from those into an HTML5/JS front end. That way, your service can be divided into many asynchronous chunks and the speed of each can be tweaked separately.
I am trying to find out if it's possible to fetch multiple objects from S3 in PHP.
Performing an HTTP request for each object in a blocking, sequential way can hurt performance quite badly, and I was wondering if there is a way to get around this. The API doesn't seem to have a batch get endpoint, only batch delete or copy. I was thinking maybe Guzzle could help me do it in parallel if AWS doesn't provide a way to do this.
Any help would be really appreciated.
You should be able to do concurrent requests following Guzzle's conventions via the AWS SDK. See Executing commands in parallel in the SDK's user guide.
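With version 3 of the SDK, the same idea looks roughly like the sketch below, using the async variants of the S3 commands together with Guzzle promises; the region, bucket, and keys are placeholders.

require 'vendor/autoload.php';

use Aws\S3\S3Client;
use GuzzleHttp\Promise\Utils;

$s3 = new S3Client(['region' => 'us-east-1', 'version' => 'latest']);

$keys = ['file1.json', 'file2.json', 'file3.json'];
$promises = [];
foreach ($keys as $key) {
    // getObjectAsync() sends the request without blocking and returns a promise.
    $promises[$key] = $s3->getObjectAsync([
        'Bucket' => 'my-bucket',
        'Key'    => $key,
    ]);
}

// Wait for every download to finish; results are keyed by object key.
foreach (Utils::unwrap($promises) as $key => $result) {
    echo $key, ': ', strlen((string) $result['Body']), " bytes\n";
}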
I have just started using Amazon SQS.
In order to make a request, I must first call $sqs->GetQueueURL(...) to retrieve the full URL of my queue.
ie: https://sqs.ap-southeast-1.amazonaws.com/XXXXXXX/QUEUENAME
As far as I can tell, I need to make this request every time before calling an operation on the queue (ie: ReceiveMessage or SendMessage).
Is there an inbuilt support for AWS PHP SDK to cache these URLs? Does it do it automatically?
I can't seem to find any detail on what (if any) caching is happening by default.
You don't have to fetch the queue URL each time, as it does not change.
Go to the SQS console, look at your queues, copy the URL you need, and use it directly.
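In other words, resolve the URL once (via GetQueueUrl or the console), store it in configuration, and pass it straight to each call. A sketch assuming the AWS SDK for PHP v3; the URL is the placeholder from the question.

use Aws\Sqs\SqsClient;

$sqs = new SqsClient(['region' => 'ap-southeast-1', 'version' => 'latest']);

// Looked up once and then kept in config; it never changes for a given queue.
$queueUrl = 'https://sqs.ap-southeast-1.amazonaws.com/XXXXXXX/QUEUENAME';

$sqs->sendMessage([
    'QueueUrl'    => $queueUrl,
    'MessageBody' => json_encode(['task' => 'example']),
]);

$messages = $sqs->receiveMessage(['QueueUrl' => $queueUrl]);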
I have a login script that passes data to another script for processing. The processing is unrelated to the login script but it does a bit of data checking and logging for internal analysis.
I am using cURL to pass this data, but cURL waits for the response. I don't want to wait for it, because it forces the user to wait for the analysis to complete before they can log in.
I am aware that the request could fail, but I am not overly concerned.
I basically want it to work like a multi-threaded application where cURL is used to fork off a process. Is there any way to do this?
My code is below:
// Log user in
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,'http://site.com/userdata.php?e=' . $email);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);
// Redirect user to their home page
That's all it does. But at the moment it has to wait for the cURL request to get a response.
Is there any way to make a get request and not wait for the response?
You don't need curl for this. Just open a socket and fire off a manual HTTP request and then close the socket. This is also useful because you can use a custom user agent so as not to skew your logging.
See this answer for an example.
Obviously, it's not "true" async/forking, but it should be quick enough.
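A minimal sketch of that fire-and-forget socket request; the host and path mirror the question's code, while the email value and the User-Agent string are placeholders.

// Write a GET request to a raw socket and close it without reading the
// response, so the caller never waits for the analysis script to finish.
function fire_and_forget_get(string $host, string $path): void
{
    $fp = @fsockopen($host, 80, $errno, $errstr, 1);   // 1-second connect timeout
    if ($fp === false) {
        return;                                        // a lost request is acceptable here
    }
    $request  = "GET {$path} HTTP/1.1\r\n";
    $request .= "Host: {$host}\r\n";
    $request .= "User-Agent: internal-logger\r\n";     // custom UA so logs can filter it out
    $request .= "Connection: Close\r\n\r\n";
    fwrite($fp, $request);
    fclose($fp);                                       // do not wait for the response
}

fire_and_forget_get('site.com', '/userdata.php?e=' . urlencode('user@example.com'));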
I like Matt's idea best; however, to speed up your request you could (both variants are sketched below):
a) just make a HEAD request (CURLOPT_NOBODY), which is significantly faster (no response body), or
b) set the request time limit really low, though you should test whether aborting the request is actually faster than only HEADing.
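Hedged illustrations of both options against the question's endpoint; the email value is a placeholder for whatever the login flow provides.

$email = urlencode('user@example.com');        // placeholder for the value from the login flow

$ch = curl_init('http://site.com/userdata.php?e=' . $email);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);        // (a) HEAD request: no response body
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 200);     // (b) give up after 200 ms (needs libcurl >= 7.16.2)
curl_exec($ch);                                // the return value is ignored on purpose
curl_close($ch);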
Another possibility: Since there's apparently no need to do the analysis immediately, why do it immediately? If your provider allows cron jobs, just have the script that curl calls store the passed data quickly in a database or file, and have a cron job execute the processing script once a minute or hour or day. Or, if you can't do that, set up your own local machine to regularly run a script that invokes the remote one which processes the stored data.
It strikes me that what you're describing is a queue. You want to kick off a bunch of offline processing jobs and process them independently of user interaction. There are plenty of systems for doing that, though I'd particularly recommend beanstalkd using pheanstalk in PHP. It's far more reliable and controllable (e.g. managing retries in case of failures) than a cron job, and it's also very easy to distribute processing across multiple servers.
The equivalent of your calling a URL and ignoring the response is creating a new job in a 'tube'. It solves your particular problem because it will return more or less instantly and there is no response body to speak of.
At the processing end you don't need exec - run a CLI script in an infinite loop that requests jobs from the queue and processes them.
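A minimal pheanstalk sketch of both ends, assuming beanstalkd on localhost and Pheanstalk v4; the tube name and payload are placeholders.

use Pheanstalk\Pheanstalk;

// In the login script: enqueue the job and return immediately.
$queue = Pheanstalk::create('127.0.0.1');
$queue->useTube('analytics')->put(json_encode(['email' => 'user@example.com']));

// In a long-running CLI worker (separate process):
$worker = Pheanstalk::create('127.0.0.1');
$worker->watch('analytics');

while (true) {
    $job  = $worker->reserve();            // blocks until a job is available
    $data = json_decode($job->getData(), true);
    // ... do the logging / analysis with $data ...
    $worker->delete($job);                 // remove the job once it has been processed
}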
You could also look at ZeroMQ.
Overall this is not dissimilar to what GZipp suggests; it's just using a system that's designed specifically for this mode of operation.
If you have a restrictive ISP that won't let you run other software, it may be time to find a new ISP - Amazon AWS will give you a free EC2 micro instance for a year.