My objective is consume various Web Services and then merge the results.
I was doing this using PHP cURL, but as the number of Web Services has increased, my service slowed since the process was waiting for a response and then make the request to the next Web Service.
I solved this issue using curl_multi and everything was working fine.
Now, I have a new problem, because I have new Web Services to add in my service that use Soap Protocol and I can't do simultaneous requests anymore, because I don't use cURL for Soap Web Services, I use SoapClient.
I know that I can make the XML with the soap directives and then send it with cURL, but this seems to me a bad practice.
In short, is there some way to consume REST and SOAP Web Services simultaneously?
I would first try a unified, asynchronous guzzle setup as others have said. If that doesn't work out I suggest not using process forking or multithreading. Neither are simple to use or maintain. For example, mixing guzzle and threads requires special attention.
I don't know the structure of your application, but this might be a good case for a queue. Put a message into a queue for each API call and let multiple PHP daemons read out of the queue and make the actual requests. The code can be organized to use curl or SoapClient depending on the protocol or endpoint instead of trying to combine them. Simply start up as many daemons as you want to make requests in parallel. This avoids all of the complexity of threading or process management and scales easily.
When I use this architecture I also keep track of a "semaphore" in a key-value store or database. Start the semaphore with a count of API calls to be made. As each is complete the count is reduced. Each process checks when the count hits zero and then you know all of the work is done. This is only really necessary when there's a subsequent task, such as calculating something from all of the API results or updating a record to let users know the job is done.
Now this setup sounds more complicated than process forking or multithreading, but each component is easily testable and it scales across servers.
I've put together a PHP library that helps build the architecture I'm describing. It's basic pipelining that allows a mix of synchronous and asynchronous processes. The async work is handled by a queue and semaphore. API calls that need to happen in sequence would each get a Process class. API calls that could be made concurrently go into a MultiProcess class. A ProcessList sets up the pipeline.
Yes, you can.
Use an HTTP client(ex: guzzle, httpful) most of them are following PSR7, prior to that you will have a contract. Most importantly they have plenty of plugins for SOAP and REST.
EX: if you choose guzzle as your HTTP client it has plugins SOAP. You know REST is all about calling a service so you don't need extra package for that, just use guzzle itself.
**write your API calls in an async way (non-blocking) that will increase the performance. One solution is you can use promises
Read more
its not something php is good at, and you can easily find edge-case crash bugs by doing it, but php CAN do multithreading - check php pthreads and pcntl_fork. (neither of which works on a webserver behind php-fpm / mod_php , btw, and pcntl_fork only works on unix systems (linux/bsd), windows won't work)
however, you'd probably be better off by switching to a master process -> worker processes model with proc_open & co. this works behind webservers both in php-fpm and mod_php and does not depend on pthreads being installed and even works on windows, and won't crash the other workers if a single worker crash. also you you can drop using php's curl_multi interface (which imo is very cumbersome to get right), and keep using the simple curl_exec & co functions. (here's an example for running several instances of ping https://gist.github.com/divinity76/f5e57b0f3d8131d5e884edda6e6506d7 - but i'm suggesting using php cli for this, eg proc_open('php workerProcess.php',...); , i have done it several times before with success.)
You could run a cronjob.php with crontab and start other php scripts asynchronously:
// cronjob.php
$files = [
'soap-client-1.php',
'soap-client-2.php',
'soap-client-2.php',
];
foreach($files as $file) {
$cmd = sprintf('/usr/bin/php -f "%s" >> /dev/null &', $file);
system($cmd);
}
soap-client-1.php
$client = new SoapClient('http://www.webservicex.net/geoipservice.asmx?WSDL');
$parameters = array(
'IPAddress' => '8.8.8.8',
);
$result = $client->GetGeoIP($parameters);
// #todo Save result
Each php script starts a new SOAP request and stores the result in the database. Now you can process the data by reading the result from the database.
This seems like an architecture problem. You should instead consume each service with a separate file/URL and scrape JSON from those into an HTML5/JS front-end. That way, your service can be divided into many asynchronous chunks and the speed of each can been tweaked separately.
Related
To extend the request limits I want to fetch data from an API endpoint and provide them to my users from a third party hosting platform. They usually support php so I was thinking of using it. The data should update like once a minute or every two minutes. The fetching process itself could be as simple as possible, e.g. like this:
$json = file_get_contents('abc.com/xyz');
file_put_contents('example.json', $json);
Like this an endpoint would be fetched and written into a local file. But to repeat this step continuously and keep the data updated this script would be needed to run permanently or executed frequently. The only way I found was to use cron jobs for that issue but would that be recommendable to use to keep files updated? Or are there way better methods to do this?
I know that there are better setups to solve that issue like handling it with node.js but I consider using a platform like this so I only have to manage the communication between the API and the server and not between server and clients and didn’t find another way to do so but I‘m open to other suggestions!
While it can be done differently (like with node.js you mentioned or other methods), I believe that a system cron job to be run every X minutes (depending on how long it takes for the API to respond) will suffice and keep things simple.
Provided of course that you are able to set-up system cron jobs on your webserver.
I have been tasked with rebuilding an application (CakePHP 2.0, php 5.6) that receives a request, reformats/maps the body to API specific fields and makes requests to multiple APIs with the newly mapped body.
Once the responses are coming back they will be decoded and placed in the output array as a response from the application.
Currently the decoding (mapping from API specific fields) process happens in sequence once the Multicurl requests return.
My idea is to process the responses and soon as they arrive, and I am attempting to do so in parallel.
One complexity is that every target API needs 4 very specific mapping functions so every API object has a map and reverse map for 2 different operations.
A client requirement is to have the minimum number of dependencies, the solution should preferably be in raw php, no libraries wanted.
The KISS solution has been requested.
I have considered the following approaches but they all have drawbacks.
Multicurl waits for the slowest response to return all responses. This is the current approach, no parallel response processing.
pthreads not compatible with Apache, command line only.
Can't pass complex objects (API object) via Sockets easily.
Too many dependencies and/or too immature.
a) Appserver
b) Kraken
c) RabbitMQ
d) socket.io
I am looking for PHP 7 (nothing else) alternatives to this task.
Any suggestions?
It's worth noting that 'parallel' and 'asynchronous' are separate concepts.
eg: ReactPHP and it's ilk [node.js included] are asynchronous, but still single-threaded, relying on event loops, callbacks, and coroutines to allow out-of-order execution of code.
Responding to your assessment of approaches:
Accurate assessment of curl_multi().
However, your stated case is that all of this needs to take place within the context of a single request, so no matter what you do you're going to be stuck waiting on the slowest API response before you can serve your response.
Unless you're fundamentally changing your workflow you should probably just stick with this.
It's not that pthreads is incompatible with apache, it's that it's incompatible with mod_php.
Use use an FCGI worker model like FPM and you can use pthreads all you want.
Why not? That's what serialization is for.
So long as you never accept it from users or want to use it outside of PHP, serialize() is one option.
In pretty much all other cases json_encode() is going to be the way to go.
If you're just going to write off solutions wholesale like this you're going to have a bad time, particularly if you're trying to do something that's inherently at odds with PHP, like parallel/async processing.
I have php and nodejs installed in my server. I am calling a nodejs code which uses node canvas via php
Like:
<?php
exec(node path/to/the/js/file);
?>
Problem:
Each process of this execution consumes around 250 Mb of memory because of node canvas. So If my server has around 8Gb of memory, only 32 users can use the service at any given point of time and also there is a risk of crashing the server if the number of users exceeds.
Is there any scalable solution to this problem?
UPDATE I have to use Canvas server side because of my business requirements, so I am using node canvas, but it is heavily consuming the memory which is giving a huge problem.
Your problem is that you start a new node.js process for each request, that is why the memory footprint is so huge, and isn't what node.js is built for.
But node.js is built to handle a lot of different request in only one process, use that to your advantage.
What I advice you to do is to have only one node.js process started, and find another way to communicate between your PHP process and the node.js process.
There is a lot of different ways to do that, some more perfomant than others, some harder to build than other. All have pros and cons, but since both are web related language, you can be sure there is support in both for HTTP request.
So what you should do is a basic node.js/Express server, probably with only one API point, which execute the code you already did, and return the result. It is easy enought to do (especially if you use JSON to communicate between them), and while I don't know PHP, I m pretty sure it is easy to send a HTTP request and interpret the answer.
If you are ready to dig in node.js, you could try sockets or MQ, which should be more performant.
That way, you only have one node.js process, which shouldn't grow in memory and handle a lot more client, will not have to use exec, and have a first try with Express.
I'm writing a php code for a web server where it's required to do some heavy duty processes when requested before returning the results to the users.
My question is: does the apache server creates a separate thread/process for each client or should I use multi-threading to separate them?
The processes include calling the execution of other applications through cmd and downloading files to the server.
Well every request to the web server is a separate process which will try to use a free core from the CPU, and if there isn't a free one currently, it will go on a que and wait.
You can't have multithreading in php with apache within a single web request. You simply can't. Usually at each request apache forks a new O.S. process.
This is configurable, but typically chosen when working with php, since many methods of php standard library are not thread safe.
When I had to handle heavy computation I always choose to make the user request asynchronous, and let a third-process daemon to do the actual computation in background. In this case, after the user request, I let the client to poll the daemon (through others web-requests) to know when the computation is done.
My app takes a loooong list of urls, and split it in X (where X = $threads) so then I can start a thread.php and calculate the urls for it. Then it does GET and POST request to retrieve data
I am using this:
for($x=1;$x<=$threads;$x++){
$pid[] = exec("/path/bin/php thread.php <options> > /dev/null & echo \$!");
}
For "threading" (I know its not really threading, is it forking or what?), I save the pids into a file for later checking if N thread is running and to stop them.
Now I want to move out from php, I was thinking about using python because I'd like to learn more about it.
How can I achieve this kind of "threading" with python? (or ruby)
Or is there a better way to launch multiple background threads in python or ruby that runs in parallel (at the same time)?
The threads doesn't need to communicate between each other or with a main thread, they are independent, they do http request and interact with a mysql db, they may need to access/modify the same table entries (I haven't tought about this or how I will solve it yet).
The app works with "projects", each project has a "max threads" variable and I use a web interface to control it (so I could still use php for the interface [starting/stopping threads] in the new app).
I wanted to use
from threading import Thread
in python, but I've been told those threads wont run in parallel but once at a time.
The app is intended to run on linux web servers.
Any suggestion will be appreciated.
For Python 2.6+, consider the multiprocessing module:
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows
For Python 2.5, the same functionality is available via pyprocessing.
In addition to the example at the links above, here are some additional links to get you started:
multiprocessing Basics
Communication between processes with multiprocessing
You don't want threading. You want a work queue like Gearman that you can send jobs to asynchronously.
It's worth noting that this is a cross-platform, cross-language solution. There are bindings for many languages (including Python and PHP) provided officially, and many more unofficially with a bit of work with Google.
The original intent is effectively load balancing, but it works just as well with only one machine. Basically, you can create one or more Workers that listen for Jobs. You can control the number of Workers and the types of Jobs they can listen for.
If you insert five Jobs into the queue at the same time, and there happen to be five Workers waiting, each Worker will be handed one of the Jobs. If there are more Jobs than Workers, the Jobs get handled sequentially. Your Client (the thing that submits Jobs) can either wait for all of the Jobs it's created to complete, or it can simply place them in the queue and continue on.