I have php and nodejs installed in my server. I am calling a nodejs code which uses node canvas via php
Like:
<?php
exec(node path/to/the/js/file);
?>
Problem:
Each process of this execution consumes around 250 Mb of memory because of node canvas. So If my server has around 8Gb of memory, only 32 users can use the service at any given point of time and also there is a risk of crashing the server if the number of users exceeds.
Is there any scalable solution to this problem?
UPDATE I have to use Canvas server side because of my business requirements, so I am using node canvas, but it is heavily consuming the memory which is giving a huge problem.
Your problem is that you start a new node.js process for each request, that is why the memory footprint is so huge, and isn't what node.js is built for.
But node.js is built to handle a lot of different request in only one process, use that to your advantage.
What I advice you to do is to have only one node.js process started, and find another way to communicate between your PHP process and the node.js process.
There is a lot of different ways to do that, some more perfomant than others, some harder to build than other. All have pros and cons, but since both are web related language, you can be sure there is support in both for HTTP request.
So what you should do is a basic node.js/Express server, probably with only one API point, which execute the code you already did, and return the result. It is easy enought to do (especially if you use JSON to communicate between them), and while I don't know PHP, I m pretty sure it is easy to send a HTTP request and interpret the answer.
If you are ready to dig in node.js, you could try sockets or MQ, which should be more performant.
That way, you only have one node.js process, which shouldn't grow in memory and handle a lot more client, will not have to use exec, and have a first try with Express.
Related
this question has to do with theory as with real life programming I first asked it in (cs.stackexchange.com) because is theory most and I had the instruction to ask here (https://cs.stackexchange.com/questions/81472/question-about-implementing-websockets-theory-and-the-reality-in-php) .
I am experimenting with web sockets and PHP many years now (some of this code is already in production) , first I created from scratch a WebSocket (WS) Server with non blocking IO and everything worked fine , except in real life other methods needed by the app couldn’t be non blocking (e.g. connection to a DB and a query). Then I introduced async programming , meaning that the WS Server initiated various PHP requests to the sever and check in every loop if those requests have finished the results in order to send them to client. That worked well for few client side users connected to this WS server , the number had to do with what the operation was but it wouldn’t be more than 30 or 50. That were because if you use only one thread and you have many simultaneous requests you must check each one of them sequential if there is a finished result.
The next step was to analyze the code of popular approaches claiming that can hold and process many (some say 10000) clients in same time. Maybe they knew something that I didn’t (My issue isn’t if they are lying , the issue is if there is something I am missing (or maybe I am wrong) here). The results were frustrating. Most of them don’t use async by default advising you not to use blocking methods (something that is really impossible in real life programming) , but even if you put modules to them to make them async the same problem that I had arose.
The question isn’t what is the solution , because I implemented PHP pthreads and I could make it work , but with no real benefit (e.g. sharing objects , it had to serialize unserialize everything), I write C++ PHP extensions some years now , so I am working in a PHP extension that will do that efficiently.
The question here is , am I missing something ? How can they claim that the can handle a large amount of request simultaneously while even with async programming they have to check for each request in the loop that has finished ?
Thank you in advance for any new knowledge or direction to search that your answer might lead me.
Yes, there are projects that make it possible with PHP. One such project is Amp with its Aerys HTTP and WebSocket server. Yes, you can't just call blocking functions in the same thread. Yes, pthreads won't help, it's mostly like just running another PHP process, because everything in PHP is shared nothing. But how does it work then?
Use non-blocking implementations where possible. There are libraries that work with non-blocking I/O for database access, such as amphp/mysql.
If there's no such library, ask whether something like that can be implemented if you don't want to / can't implement it yourself.
Another possibility is to use libraries such as amphp/parallel that use persistent workers for blocking tasks. Spawning another worker for each blocking task would be horribly inefficient, so that library makes it easy to use worker pools and keep these workers alive for several tasks each.
One such library that makes use of amphp/parallel is amphp/file, which uses these workers for non-blocking filesystem access when no extensions like uv or eio are available, maybe you want to have a look at its ParallelDriver.
How many connections you will be able to handle concurrently depends a lot on your hardware and what you're doing with these connections. If you constantly stream data to each client, you will be able to keep much fewer connections open than in a situation where most connections are idle and only send / receive something in a small portion of the connected time.
If you want to handle more than ~1000 clients, you probably need an extension or recompile PHP because of the FD_MAXSIZE for stream_select, which is compiled in and limits stream_select to file descriptors lower than 1024.
My objective is consume various Web Services and then merge the results.
I was doing this using PHP cURL, but as the number of Web Services has increased, my service slowed since the process was waiting for a response and then make the request to the next Web Service.
I solved this issue using curl_multi and everything was working fine.
Now, I have a new problem, because I have new Web Services to add in my service that use Soap Protocol and I can't do simultaneous requests anymore, because I don't use cURL for Soap Web Services, I use SoapClient.
I know that I can make the XML with the soap directives and then send it with cURL, but this seems to me a bad practice.
In short, is there some way to consume REST and SOAP Web Services simultaneously?
I would first try a unified, asynchronous guzzle setup as others have said. If that doesn't work out I suggest not using process forking or multithreading. Neither are simple to use or maintain. For example, mixing guzzle and threads requires special attention.
I don't know the structure of your application, but this might be a good case for a queue. Put a message into a queue for each API call and let multiple PHP daemons read out of the queue and make the actual requests. The code can be organized to use curl or SoapClient depending on the protocol or endpoint instead of trying to combine them. Simply start up as many daemons as you want to make requests in parallel. This avoids all of the complexity of threading or process management and scales easily.
When I use this architecture I also keep track of a "semaphore" in a key-value store or database. Start the semaphore with a count of API calls to be made. As each is complete the count is reduced. Each process checks when the count hits zero and then you know all of the work is done. This is only really necessary when there's a subsequent task, such as calculating something from all of the API results or updating a record to let users know the job is done.
Now this setup sounds more complicated than process forking or multithreading, but each component is easily testable and it scales across servers.
I've put together a PHP library that helps build the architecture I'm describing. It's basic pipelining that allows a mix of synchronous and asynchronous processes. The async work is handled by a queue and semaphore. API calls that need to happen in sequence would each get a Process class. API calls that could be made concurrently go into a MultiProcess class. A ProcessList sets up the pipeline.
Yes, you can.
Use an HTTP client(ex: guzzle, httpful) most of them are following PSR7, prior to that you will have a contract. Most importantly they have plenty of plugins for SOAP and REST.
EX: if you choose guzzle as your HTTP client it has plugins SOAP. You know REST is all about calling a service so you don't need extra package for that, just use guzzle itself.
**write your API calls in an async way (non-blocking) that will increase the performance. One solution is you can use promises
Read more
its not something php is good at, and you can easily find edge-case crash bugs by doing it, but php CAN do multithreading - check php pthreads and pcntl_fork. (neither of which works on a webserver behind php-fpm / mod_php , btw, and pcntl_fork only works on unix systems (linux/bsd), windows won't work)
however, you'd probably be better off by switching to a master process -> worker processes model with proc_open & co. this works behind webservers both in php-fpm and mod_php and does not depend on pthreads being installed and even works on windows, and won't crash the other workers if a single worker crash. also you you can drop using php's curl_multi interface (which imo is very cumbersome to get right), and keep using the simple curl_exec & co functions. (here's an example for running several instances of ping https://gist.github.com/divinity76/f5e57b0f3d8131d5e884edda6e6506d7 - but i'm suggesting using php cli for this, eg proc_open('php workerProcess.php',...); , i have done it several times before with success.)
You could run a cronjob.php with crontab and start other php scripts asynchronously:
// cronjob.php
$files = [
'soap-client-1.php',
'soap-client-2.php',
'soap-client-2.php',
];
foreach($files as $file) {
$cmd = sprintf('/usr/bin/php -f "%s" >> /dev/null &', $file);
system($cmd);
}
soap-client-1.php
$client = new SoapClient('http://www.webservicex.net/geoipservice.asmx?WSDL');
$parameters = array(
'IPAddress' => '8.8.8.8',
);
$result = $client->GetGeoIP($parameters);
// #todo Save result
Each php script starts a new SOAP request and stores the result in the database. Now you can process the data by reading the result from the database.
This seems like an architecture problem. You should instead consume each service with a separate file/URL and scrape JSON from those into an HTML5/JS front-end. That way, your service can be divided into many asynchronous chunks and the speed of each can been tweaked separately.
I am considering building a site using php, but there are several aspects of it that would perform far, far better if made in node.js. At the same time, large portions of of the site need to remain in PHP. This is because a lot of functionality is already developed in PHP, and redeveloping, testing, and so forth would be too large of an undertaking, and quite frankly, those parts of the site run perfectly fine in PHP.
I am considering rebuilding the sections in node.js that would benefit from running most in node.js, then having PHP pass the request to node.js using Gearman. This way, I scan scale out by launching more workers and have gearman handle the load distribution.
Our site gets a lot of traffic, and I am concerned if gearman can handle this load. I wan't to keep this question productive, so let's focus largely on the following addressable points:
Can gearman handle all of our expected load assuming we have the memory (potentially around 3000+ queued jobs at at time, with several thousand being processed per second)?
Would this run better if I just passed the requests to node.js using CURL, and if so, does node.js provide any way to distribute the load over multiple instances of a given script?
Can gearman be configured in a way that there is no single point of failure?
What are some issues that you guys can see arising both in terms of development and scaling?
I am addressing these wide range of points so anyone viewing this post can collect a wide range of information in one place regarding matters that strongly affect each other.
Of course I will test all of this, but I want to collect as much information as possible before potentially undertaking something like this.
Edit: A large reason I am using gearman is not because of it's non-blocking structure, but because of it's sheer speed.
I can only speak to your questions on Gearman:
Can gearman handle all of our expected load assuming we have the memory (potentially around 3000+ queued jobs at at time, with several thousand being processed per second)?
Short: Yes
Long: Everything has its limit. If your job payloads are inordinately large you may run into issues. Gearman stores its queue in memory.. so if your payloads exceed the amount of memory available to Gearman you'll run into problems.
Can gearman be configured in a way that there is no single point of failure?
Gearman has a plugin/extension/component available to use MySQL as a persistence store. That way, if Gearman or the machine itself goes down you can bring it right back up where it left off. Multiple worker-servers can help keep things going if other workers go down.
Node has a cluster module that can do basic load balancing against n processes. You might find it useful.
A common architecture here in nodejs-land is to have your nodes talk http and then use some way of load balancing such as an http proxy or a service registry. I'm sure it's more or less the same elsewhere. I don't know enough about gearman to say if it'll be "good enough," but if this is the general idea then I'd imagine it would be fine. At the least, other people would be interested in hearing how it went I'm sure!
Edit: Remember, number-crunching will block node's event loop! This is somewhat obvious if you think about it, but definitely something to keep in mind.
No, I'm not trying to see how many buzzwords I can throw into a single question title.
I'm making REST requests through cURL in my PHP app to some webservices. These requests need to be made fairly often since much of the application depends on this API. However, there is severe latency with the requests (2-5 seconds) which just makes my app look painfully slow.
While I'm halfway to a solution with a recommendation to cache these requests in Memcached, I'm still not satisfied with that kind of latency ever appearing within the application.
So here was my thought: I can implement AJAX long-polling in the background so that the user never experiences the latency outright. The REST requests/Memcache lookups will be done all through AJAX at a set interval.
But this is all really new to me and I'm not sure if this is the best approach. And if I'm on the right track, I do know that PHP + Apache is not going to handle something like this well. But PHP is the only language I know. I'd ideally like to set up something like Tornado in Python, but I'm just not sure if I'm over-engineering right now or not.
Any thoughts here would be helpful and much appreciated.
This was some pretty quick turnaround, but I went back through and profiled my app by echoing out microtime() throughout the relevant processes. Turns out that I'm not parallelizing my cURL requests and that's where I take the real hit. It takes approximately 2 seconds to do that, which means very long delays while each cURL request is done in succession.
I'm keeping my self busy working on app that gets a feed from twitter search API, then need to extract all the URLs from each status in the feed, and finally since lots of the URLs are shortened I'm checking the response header of each URL to get the real URL it leads to.
for a feed of 100 entries this process can be more then a minute long!! (still working local on my pc)
i'm initiating Curl resource one time per feed and keep it open until I'm finished all the URL expansions though this helped a bit i'm still warry that i'l be in trouble when going live
any ideas how to speed things up?
The issue is, as Asaph points out, that you're doing this in a single-threaded process, so all of the network latency is being serialized.
Does this all have to happen inside an http request, or can you queue URLs somewhere, and have some background process chew through them?
If you can do the latter, that's the way to go.
If you must do the former, you can do the same sort of thing.
Either way, you want to look at way to chew through the requests in parallel. You could write a command-line PHP script that forks to accomplish this, though you might be better off looking into writing such a beast in language that supports threading, such as ruby or python.
You may be able to get significantly increased performance by making your application multithreaded. Multi-threading is not supported directly by PHP per se, but you may be able to launch several PHP processes, each working on a concurrent processing job.