I have been tasked with rebuilding an application (CakePHP 2.0, PHP 5.6) that receives a request, reformats/maps the body to API-specific fields, and makes requests to multiple APIs with the newly mapped body.
Once the responses come back, they are decoded and placed in the output array as the application's response.
Currently the decoding (mapping from API-specific fields) process happens in sequence once the multicurl requests return.
My idea is to process the responses as soon as they arrive, and I am attempting to do so in parallel.
One complexity is that every target API needs 4 very specific mapping functions, so every API object has a map and a reverse map for 2 different operations.
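Roughly, each API object implements a small contract like this (the operation and method names here are only illustrative, not the real ones):

// Illustrative only: two operations, each with an outbound map and an inbound reverse map.
interface ApiMapper
{
    // Operation 1
    public function mapSearchRequest(array $fields);
    public function reverseMapSearchResponse(array $rawResponse);

    // Operation 2
    public function mapBookingRequest(array $fields);
    public function reverseMapBookingResponse(array $rawResponse);
}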
A client requirement is to have the minimum number of dependencies; the solution should preferably be in raw PHP, with no libraries.
The KISS solution has been requested.
I have considered the following approaches but they all have drawbacks.
1. Multicurl: waits for the slowest response before returning all responses. This is the current approach; no parallel response processing.
2. pthreads: not compatible with Apache, command line only.
3. Sockets: can't pass complex objects (the API object) easily.
4. Too many dependencies and/or too immature:
a) Appserver
b) Kraken
c) RabbitMQ
d) socket.io
I am looking for PHP 7 alternatives (and nothing else) for this task.
Any suggestions?
It's worth noting that 'parallel' and 'asynchronous' are separate concepts.
e.g. ReactPHP and its ilk [node.js included] are asynchronous, but still single-threaded, relying on event loops, callbacks, and coroutines to allow out-of-order execution of code.
Responding to your assessment of approaches:
Accurate assessment of curl_multi().
However, your stated case is that all of this needs to take place within the context of a single request, so no matter what you do you're going to be stuck waiting on the slowest API response before you can serve your response.
Unless you're fundamentally changing your workflow you should probably just stick with this.
It's not that pthreads is incompatible with apache, it's that it's incompatible with mod_php.
Use an FCGI worker model like FPM and you can use pthreads all you want.
Why not? That's what serialization is for.
So long as you never accept it from users or want to use it outside of PHP, serialize() is one option.
In pretty much all other cases json_encode() is going to be the way to go.
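For example, either round trip works for handing a mapped payload to a worker over a socket or pipe (a minimal sketch, not tied to any particular transport):

// Ship a PHP value across a process boundary and back.
$payload = array('apiName' => 'exampleApi', 'fields' => array('a' => 1, 'b' => 2));

// PHP-to-PHP only, and never on untrusted input:
$wire = serialize($payload);
$copy = unserialize($wire);

// Portable (any consumer can read it), plain data only:
$wire = json_encode($payload);
$copy = json_decode($wire, true);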
If you're just going to write off solutions wholesale like this you're going to have a bad time, particularly if you're trying to do something that's inherently at odds with PHP, like parallel/async processing.
I have PHP and Node.js installed on my server. I am calling Node.js code which uses node-canvas via PHP, like this:
<?php
exec('node path/to/the/js/file');
?>
Problem:
Each execution consumes around 250 MB of memory because of node-canvas. So if my server has around 8 GB of memory, only 32 users can use the service at any given point in time, and there is also a risk of crashing the server if the number of users exceeds that.
Is there any scalable solution to this problem?
UPDATE: I have to use canvas server-side because of my business requirements, so I am using node-canvas, but it is consuming a huge amount of memory, which is a big problem.
Your problem is that you start a new node.js process for each request; that is why the memory footprint is so huge, and it isn't what node.js is built for.
node.js is built to handle a lot of different requests in a single process, so use that to your advantage.
What I advise you to do is have only one node.js process running, and find another way to communicate between your PHP process and the node.js process.
There are a lot of different ways to do that, some more performant than others, some harder to build than others. All have pros and cons, but since both are web-related languages, you can be sure there is support in both for HTTP requests.
So what you should do is build a basic node.js/Express server, probably with only one API endpoint, which executes the code you already wrote and returns the result. It is easy enough to do (especially if you use JSON to communicate between them), and while I don't know PHP, I'm pretty sure it is easy to send an HTTP request and interpret the answer.
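For example, the PHP side can be as small as this (the port, endpoint, and payload are hypothetical; the Node/Express side is just a normal route handler that reads JSON and responds with JSON):

// POST a job to the long-running node.js process and read the JSON result.
$payload = json_encode(array('width' => 800, 'height' => 600, 'text' => 'hello'));

$ch = curl_init('http://127.0.0.1:3000/render');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$response = curl_exec($ch);
curl_close($ch);

$result = json_decode($response, true);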
If you are ready to dig into node.js, you could try sockets or MQ, which should be more performant.
That way, you only have one node.js process, which shouldn't grow in memory and can handle a lot more clients; you will not have to use exec, and you get a first try with Express.
I am looking to hit multiple 3rd party APIs to gather information for a user's search query. I am planning to spin off a thread for each API I want to hit to minimize the response time on my end. I also want to limit the amount of threads my application can have running at any one time due to memory/cpu concerns.
Since I am using Laravel as my framework, I was trying to accomplish this using Laravel queues, but it seems that I might have trouble getting the response data from the Job.
Are Laravel queues the correct way to tackle this? If so, how do I listen for the job's status and retrieve the data once the job is complete? I see some things that point towards passing a closure to the job, but something just isn't clicking for me.
It depends. A job queue and worker pool might be appropriate if there are a really huge number of API calls to make, especially if those API calls can be very slow. But, I'd try to avoid all that architecture unless you're really sure you need it.
To start, I'd look at doing async requests to the external APIs and try to keep the whole thing in a single process. The Guzzle HTTP client library provides a very programmer-friendly API for doing this kind of asynchronous request.
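A rough sketch of that approach (assuming a recent Guzzle; the endpoints are placeholders, and on older guzzlehttp/promises versions Utils::settle() is exposed as the settle() function instead):

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

$query  = 'user search term';
$client = new Client(['timeout' => 10]);

// Each getAsync() call returns a promise immediately; the requests run concurrently.
$promises = [
    'providerA' => $client->getAsync('https://api.example-a.com/search?q=' . urlencode($query)),
    'providerB' => $client->getAsync('https://api.example-b.com/search?q=' . urlencode($query)),
];

// Wait until every promise is settled (fulfilled or rejected).
$results = Utils::settle($promises)->wait();

$merged = [];
foreach ($results as $name => $result) {
    if ($result['state'] === 'fulfilled') {
        $merged[$name] = json_decode((string) $result['value']->getBody(), true);
    } else {
        $merged[$name] = null; // the call failed; handle/log as appropriate
    }
}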
If the external requests are really numerous or slow, you might consider using a queue. But in that case, you're looking at implementing a bunch of logic to queue all the jobs, then poll until they're done (giving feedback to your user along the way), and finally return the merged result. That may end up being necessary, but I'd start with the simpler implementation I describe above.
My objective is to consume various Web Services and then merge the results.
I was doing this using PHP cURL, but as the number of Web Services increased, my service slowed, since the process waited for a response before making the request to the next Web Service.
I solved this issue using curl_multi and everything was working fine.
Now I have a new problem, because I have new Web Services to add to my service that use the SOAP protocol, and I can't do simultaneous requests anymore, because I don't use cURL for SOAP Web Services; I use SoapClient.
I know that I can build the XML with the SOAP directives and then send it with cURL, but this seems like bad practice to me.
In short, is there some way to consume REST and SOAP Web Services simultaneously?
I would first try a unified, asynchronous guzzle setup as others have said. If that doesn't work out I suggest not using process forking or multithreading. Neither are simple to use or maintain. For example, mixing guzzle and threads requires special attention.
I don't know the structure of your application, but this might be a good case for a queue. Put a message into a queue for each API call and let multiple PHP daemons read out of the queue and make the actual requests. The code can be organized to use curl or SoapClient depending on the protocol or endpoint instead of trying to combine them. Simply start up as many daemons as you want to make requests in parallel. This avoids all of the complexity of threading or process management and scales easily.
When I use this architecture I also keep track of a "semaphore" in a key-value store or database. Start the semaphore with a count of API calls to be made. As each is complete the count is reduced. Each process checks when the count hits zero and then you know all of the work is done. This is only really necessary when there's a subsequent task, such as calculating something from all of the API results or updating a record to let users know the job is done.
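A minimal sketch of that counter (assuming the phpredis extension, and a $jobId and $apiCalls that come from the surrounding job setup):

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Producer: before queueing, record how many API calls the batch contains.
$redis->set("batch:$jobId:remaining", count($apiCalls));

// Worker: after finishing one call, decrement atomically and check for zero.
$remaining = $redis->decr("batch:$jobId:remaining");
if ($remaining === 0) {
    // The last call just finished: merge stored results, notify the user, etc.
}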
Now this setup sounds more complicated than process forking or multithreading, but each component is easily testable and it scales across servers.
I've put together a PHP library that helps build the architecture I'm describing. It's basic pipelining that allows a mix of synchronous and asynchronous processes. The async work is handled by a queue and semaphore. API calls that need to happen in sequence would each get a Process class. API calls that could be made concurrently go into a MultiProcess class. A ProcessList sets up the pipeline.
Yes, you can.
Use an HTTP client (e.g. Guzzle, Httpful); most of them follow PSR-7, and even before that they give you a contract to code against. Most importantly, they have plenty of plugins for SOAP and REST.
For example, if you choose Guzzle as your HTTP client, it has plugins for SOAP. REST is all about calling a service, so you don't need an extra package for that; just use Guzzle itself.
Write your API calls in an async (non-blocking) way; that will increase performance. One solution is to use promises.
It's not something PHP is good at, and you can easily hit edge-case crash bugs doing it, but PHP CAN do multithreading: check out pthreads and pcntl_fork. (Neither of which works on a webserver behind php-fpm / mod_php, by the way, and pcntl_fork only works on Unix systems (Linux/BSD); it won't work on Windows.)
However, you'd probably be better off switching to a master process -> worker processes model with proc_open & co. This works behind webservers both in php-fpm and mod_php, does not depend on pthreads being installed, even works on Windows, and won't crash the other workers if a single worker crashes. You can also drop PHP's curl_multi interface (which IMO is very cumbersome to get right) and keep using the simple curl_exec & co functions. (Here's an example of running several instances of ping: https://gist.github.com/divinity76/f5e57b0f3d8131d5e884edda6e6506d7 - but I'm suggesting using the PHP CLI for the workers, e.g. proc_open('php workerProcess.php', ...); I have done this several times before with success.)
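A stripped-down version of that master/worker pattern (workerProcess.php is hypothetical here; it would read its job from argv and print a small result to stdout):

// Launch one worker process per job, then collect their output.
$jobs    = array('job-a', 'job-b', 'job-c');
$workers = array();

foreach ($jobs as $i => $job) {
    $descriptors = array(
        1 => array('pipe', 'w'), // child stdout
        2 => array('pipe', 'w'), // child stderr
    );
    $cmd = 'php workerProcess.php ' . escapeshellarg($job);
    $proc = proc_open($cmd, $descriptors, $pipes);
    $workers[$i] = array('proc' => $proc, 'pipes' => $pipes);
}

// All workers are now running concurrently; read their results and reap them.
// (Assumes small outputs; large stderr writes would need non-blocking reads.)
$results = array();
foreach ($workers as $i => $w) {
    $results[$i] = stream_get_contents($w['pipes'][1]); // blocks until this worker finishes
    fclose($w['pipes'][1]);
    fclose($w['pipes'][2]);
    proc_close($w['proc']);
}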
You could run a cronjob.php with crontab and start other php scripts asynchronously:
// cronjob.php
$files = [
    'soap-client-1.php',
    'soap-client-2.php',
    'soap-client-3.php',
];

foreach ($files as $file) {
    $cmd = sprintf('/usr/bin/php -f "%s" >> /dev/null &', $file);
    system($cmd);
}
// soap-client-1.php
$client = new SoapClient('http://www.webservicex.net/geoipservice.asmx?WSDL');
$parameters = array(
    'IPAddress' => '8.8.8.8',
);
$result = $client->GetGeoIP($parameters);
// #todo Save result
Each php script starts a new SOAP request and stores the result in the database. Now you can process the data by reading the result from the database.
This seems like an architecture problem. You should instead consume each service with a separate file/URL and scrape JSON from those into an HTML5/JS front-end. That way, your service can be divided into many asynchronous chunks and the speed of each can be tweaked separately.
I'm keeping myself busy working on an app that gets a feed from the Twitter search API, then needs to extract all the URLs from each status in the feed, and finally, since lots of the URLs are shortened, checks the response header of each URL to get the real URL it leads to.
For a feed of 100 entries this process can take more than a minute!! (I'm still working locally on my PC.)
I'm initiating the cURL resource once per feed and keeping it open until I've finished all the URL expansions. Though this helped a bit, I'm still wary that I'll be in trouble when going live.
Any ideas how to speed things up?
The issue is, as Asaph points out, that you're doing this in a single-threaded process, so all of the network latency is being serialized.
Does this all have to happen inside an http request, or can you queue URLs somewhere, and have some background process chew through them?
If you can do the latter, that's the way to go.
If you must do the former, you can do the same sort of thing.
Either way, you want to look at ways to chew through the requests in parallel. You could write a command-line PHP script that forks to accomplish this, though you might be better off looking into writing such a beast in a language that supports threading, such as Ruby or Python.
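If you stay in PHP, curl_multi is the built-in way to run those header checks in parallel; a rough sketch (the short URLs are placeholders):

// Expand a batch of shortened URLs concurrently and read the final location.
$urls = array('http://bit.ly/xxxx', 'http://t.co/yyyy');

$mh = curl_multi_init();
$handles = array();

foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only, no body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow the redirect chain
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// Drive all transfers at once.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);
    }
} while ($running && $status === CURLM_OK);

// The effective URL after redirects is the expanded link.
$expanded = array();
foreach ($handles as $i => $ch) {
    $expanded[$i] = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);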
You may be able to get significantly increased performance by making your application multithreaded. Multi-threading is not supported directly by PHP per se, but you may be able to launch several PHP processes, each working on a concurrent processing job.
I work with ASP.NET. IMHO, the asynchronous programming support in ASP.NET is beautiful. That is, we can have a BeginXXXX/EndXXXX pair of methods to improve scalability for resource-intensive tasks.
For example, one operation needs to get huge data from the database and render it on the response web page. If this operation is synchronous, the thread handling the request will be occupied for the whole page life cycle. Since threads are a limited resource, it is always better to program operations with I/O in an asynchronous way. That is, ASP.NET will allocate a thread to invoke the BeginXXXX method with a callback function. The thread that invokes BeginXXXX returns immediately and can be assigned to handle other requests. When the job is done, the callback function is triggered and ASP.NET will invoke EndXXXX to get the actual response.
This asynchronous programming model can fully take advantage of threading resources. Even though there is a limit on the ThreadPool, it can actually handle many more requests. However, if we program in a synchronous way and each request needs lengthy I/O, the number of concurrent requests cannot exceed the size of the thread pool.
Recently, I have had the chance to explore other web development solutions such as PHP and Ruby on Rails. To my surprise, these solutions don't have a counterpart to this asynchronous programming model. Each request is handled by one thread or process for its whole life cycle. That is, the thread or process is occupied until the last bit of the response is sent.
There is something similar to asynchronous handling (http://netevil.org/blog/2005/may/guru-multiplexing), but the baseline is that there is always one thread or process occupied for the request. This is not like ASP.NET.
So, I am wondering: why don't these popular web solutions have an asynchronous programming model like ASP.NET? Why has only ASP.NET evolved to use an asynchronous approach?
Is it because PHP and Ruby on Rails are mostly deployed on Linux, and Linux doesn't suffer the process/thread performance penalty that Microsoft Windows does?
Or is there actually an asynchronous solution for PHP and Ruby on Rails that I haven't found?
Thanks.
I don't have a definitive answer to your question, but I can make an educated guess.
Systems such as PHP and Ruby are designed to be very platform-independent, whereas ASP.NET is deeply integrated into the Windows platform. In addition, PHP is more like old-style ASP, with a linear, start-to-finish flow.
Full ASP.NET-style async pages require not only threads, but native async I/O to be used to their maximum impact. Async I/O is an OS-specific capability. Async pages also rely on the concept of a page life cycle, which is anathema to the linear flow style. Without a page life cycle, it becomes much more difficult to integrate the results of the async calls with the rest of your page.
Just my two cents, YMMV.