From what I understand, basically, PHP server-side apps (PHP-FPM) load the entire app from scratch on every request and then close it down at the end of a request. Meaning that variables, containers, config and everything else gets read and built from zero in each separate request and there is no crossover. I can use this knowledge to structure the app better. For example, I would know that class statics hold their data only for the duration of the request and each new request will have its own value.
A Node.js server like Express.js works very differently, however. It is a single Node.js process that is running continually and listens for any new requests and passes them along to the correct handlers. This requires a different approach to development, as there is data that is kept in memory between requests. For example, class statics in such a case sound like they would hold data for the entire duration of the server uptime, not just for the duration of a single request.
So I have some questions about this:
Does it make sense to pre-load some data during Express.js startup (like reading private keys from file) so that it is already in memory when needed by a request and it would get re-used each time without being re-read from file? In a PHP server framework this wouldn't matter that much as everything gets built from 0 with each request.
How do I properly handle exceptions in a Node.js server process? If a PHP server script throws a fatal exception only that specific request dies, all other requests and any new ones run fine. If a fatal error happens in a Node.js server, it sounds like it would kill the entire process and thus all requests with it.
If you have any resources on this topic, it'd be great if you could share them too.
1-
Does it make sense to pre-load some data during Express.js startup (like reading private keys from file) so that it is already in memory when needed by a request and it would get re-used each time without being re-read from file? In a PHP server framework this wouldn't matter that much as everything gets built from 0 with each request.
Yes, totally. You would bootstrap connections to databases, data read from files, and similar tasks at application startup, so they are always available in every request.
There are some things to consider in this scenario:
During application startup, you can safely call synchronous methods like fs.readFileSync, because there are no concurrent requests on the single thread at this point.
CommonJS modules cache their exports the first time they are loaded. So if you choose to use a dedicated module to handle secrets read from a file, database connections, etc., you can do:
secrets.js
const fs = require('fs');
const gmailSecretApiKey = fs.readFileSync('path_to_file');
const mailgunSecretApiKey = fs.readFileSync('path_to_file');
...
module.exports = {
gmailSecretApiKey,
mailgunSecretApiKey,
...
}
Then require this at application startup. After this, any module that does:
const gmailKey = require('.../secrets').gmailSecretApiKey won't read from file again. The results are cached in the module.
This is important because it allows you to use require and import to consume configuration in your controllers and modules, without bothering to pass extra parameters to your HTTP controllers or adding them to req objects.
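If you want to convince yourself of the caching behaviour, here is a minimal sketch. It writes a throwaway module to a temp directory (the file name, the dummy key and the secretsLoadCount counter are all made up for the demo) and requires it twice:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical throwaway module, written to a temp dir so the example is self-contained.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'secrets-demo-'));
const modulePath = path.join(dir, 'secrets.js');
fs.writeFileSync(
  modulePath,
  // The module body bumps a global counter every time it is actually evaluated.
  'globalThis.secretsLoadCount = (globalThis.secretsLoadCount || 0) + 1;\n' +
  "module.exports = { apiKey: 'dummy-key' };\n"
);

const first = require(modulePath);
const second = require(modulePath);

console.log(first === second);            // true: both calls return the same cached object
console.log(globalThis.secretsLoadCount); // 1: the module body ran only once
```

The second require never touches the module body again, which is why a readFileSync inside it would not be repeated per request.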
Depending on your infrastructure, you may not be able to afford your application not handling requests during startup (e.g. you have only one machine up and don't want to return "service unavailable" to your clients). In such cases, you can expose all the configuration and shared resources as promises, and bootstrap your web controllers as fast as possible, waiting for the promises inside. Let's say we need Kafka up and running when handling a request on '/user':
kafka.js
function kafka() {
// return some promise of an object that can publish and read from kafka in a given port etc. etc.
}
module.exports = kafka();
So now in:
userController.js
const kafka = require('.../kafka');
router.get('/user', (req,res) => {
kafka.then(k => {
k.publish(req.user, 'userTopic'); // or whatever. This is just an example.
});
})
This way, if a user makes a request during bootstrap, the request will still be handled (but will take some time). Requests made when the promise is already resolved won't notice anything.
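Here is a rough, self-contained sketch of the same idea without Kafka or Express. The connect function and the fake client with its publish method are stand-ins I made up; only the pattern of sharing one promise matters:

```javascript
// Hypothetical slow resource: stands in for a Kafka or database client.
function connect() {
  return new Promise(resolve => {
    setTimeout(() => resolve({ publish: (msg) => `published ${msg}` }), 100);
  });
}

// Created once at module load; every caller shares the same promise.
const clientReady = connect();

// Stand-in for a request handler: it only has to wait if the client is not ready yet.
async function handleRequest(msg) {
  const client = await clientReady;
  return client.publish(msg);
}

// One "request" arrives during bootstrap, another one after it has finished.
const early = handleRequest('early');
const late = new Promise(resolve => setTimeout(resolve, 200))
  .then(() => handleRequest('late'));

const demo = Promise.all([early, late]).then(results => {
  console.log(results);
  return results;
});
```

Both calls succeed; the early one simply resolves later than it would have if the client had already been connected.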
Unless you explicitly use worker threads, all your request handlers run on a single thread in Node. Anything you declare in a CommonJS module or write to process will be available in every request.
2-
How do I properly handle exceptions in a Node.js server process? If a PHP server script throws a fatal exception only that specific request dies, all other requests and any new ones run fine. If a fatal error happens in a Node.js server, it sounds like it would kill the entire process and thus all requests with it.
This really depends on the kind of exception you encounter. Is it specifically related to the request being processed, or is it something critical for the whole application?
In the former case, you want to catch the exception and not let the whole process die. Now, 'catching the exception' in JavaScript is tricky, because try/catch cannot catch errors raised asynchronously, so you would likely use process.on('unhandledRejection') to handle those, like:
// main.js
try {
bootstrapMongoDb();
bootstrapKafka();
bootstrapSecrets();
// ... whatever
bootstrapExpress();
} catch(e){
// read what `e` brings and decide.
// However, it is worth mentioning that errors raised while handling
// HTTP requests won't ever be handled here, because they are
// asynchronous. try/catch in JavaScript doesn't catch asynchronous errors.
}
process.on('unhandledRejection', e => {
// Here we handle unhandled promise rejections; errors raised in promise-based
// Express controllers are likely to end up here. Note that this only covers
// promise rejections. Errors thrown in plain callbacks are not caught here;
// they surface as 'uncaughtException' instead.
// You should never `throw new Error` inside an asynchronous callback.
});
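To make the difference between the two process-level events concrete, here is a small sketch. The error messages and the seen array are just for the demo; in a real app you would log the error and decide whether to shut down:

```javascript
const seen = [];

// Fires for promise rejections that nobody handled.
process.on('unhandledRejection', (reason) => {
  seen.push(`unhandledRejection: ${reason.message}`);
});

// Fires for errors thrown synchronously inside callbacks (timers, I/O, ...).
process.on('uncaughtException', (err) => {
  seen.push(`uncaughtException: ${err.message}`);
});

// A rejected promise without a .catch().
Promise.reject(new Error('rejected promise'));

// An error thrown inside a timer callback.
setTimeout(() => { throw new Error('thrown in callback'); }, 10);

// Give both events time to fire, then report what was captured.
const report = new Promise(resolve => setTimeout(() => resolve(seen), 50));
report.then(list => console.log(list));
```

With both listeners installed the process keeps running; without them, either error would take it down in current Node versions.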
Handling errors in a Node application is a whole topic on its own, too broad to cover here. However, some tips shouldn't do any harm:
Never throw errors in callbacks. throw is synchronous. Callbacks and asynchronous code should rely on an error parameter or a promise rejection.
Get used to promises. Promises really improve error management in asynchronous code.
JavaScript errors can be decorated with extra fields, so you can attach trace IDs and other IDs that are useful when reading your system's logs, provided you log your unhandled errors.
Now, the latter case: sometimes a failure is totally disastrous for your app. Maybe you absolutely need a connection to a Kafka or Mongo server, and if it is broken, you may want to kill your application so that clients receive a 503 when trying to connect.
In some scenarios, you may want to kill your app and let another service restart it when the database is available again. This depends a lot on your infrastructure, and you may just as well never kill your app.
If you don't have infrastructure that handles the health and restart of your web service for you, it is probably safer to never let your application die. That said, it's a good idea to at least use a tool like nodemon or PM2 to ensure your app relaunches after going down.
Bonus: why you should not throw errors in callbacks
Thrown errors propagate up the call stack. Say function A calls B, which in turn calls C, and C throws an Error. All of them contain only synchronous code.
In that scenario, the error propagates to B and, if B doesn't catch it, on to A, and so on.
Now let's say that C doesn't throw an error itself, but instead calls fs.readFile(path, callback), and an error is thrown inside the callback function.
Here, by the time the callback is invoked and the error is thrown, A has already finished and left the stack, possibly hundreds of milliseconds ago or more.
This means that a catch block in A won't catch the error, because A is no longer on the stack:
function bootstrapTimeout() {
try {
setTimeout(() => {
throw new Error('foo');
console.log('paco');
}, 200);
} catch (e) {
console.log('error trapped!');
}
}
function bootstrapInterval() {
setInterval(() => {
console.log('interval')
}, 50);
}
console.log('start');
bootstrapTimeout();
bootstrapInterval();
If you run that snippet, you will see the error reach the top level and kill the process, even though the throw new Error('foo'); line was placed within a try/catch block.
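For contrast, here is one way the same timeout could be made catchable: wrap the timer in a promise and reject instead of letting the throw escape. The delay helper is something I wrote for this sketch, not a Node built-in:

```javascript
// Wrap the timer in a promise; errors inside the callback become rejections.
function delay(ms, work) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      try {
        resolve(work());
      } catch (e) {
        reject(e); // hand the error to the promise instead of letting it escape
      }
    }, ms);
  });
}

async function bootstrapTimeout() {
  try {
    await delay(20, () => { throw new Error('foo'); });
    return 'no error';
  } catch (e) {
    // This time the catch block does run, because await turns the rejection into a throw.
    return `error trapped: ${e.message}`;
  }
}

const outcome = bootstrapTimeout();
outcome.then(console.log); // error trapped: foo
```

The try/catch inside the timer callback is legitimate here because work() is called synchronously within that callback.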
error, result interface
Instead of using thrown Errors to signal failures in asynchronous code, Node.js has the convention of exposing an (error, result) interface for every callback you pass to an asynchronous method. If, for instance, fs.readFile fails because the file does not exist, it does not throw an error; it invokes the callback with the corresponding Error as the error parameter.
Like:
fs.readFile('notexists.png', (error, result) => {
  if (error) {
    // handle the read error here, e.g. log it
  } else {
    // http.post is a placeholder for any callback-style HTTP client call
    http.post('http://something.com', result, (error, response) => {
      if (error) {
        // oops, something went wrong with the http request
      } else {
        // keep working
        // etc.
        // maybe more callbacks, always with the dreadful 'if (error)'...
      }
    });
  }
});
You always handle errors from async operations inside the callback; you should never throw.
Now this is a pain in the ass. Promises allow for much better error control because you can control async errors in one single catch block:
fsReadFilePromise('something.png')
.then(res => someHttpRequestPromise(res))
.then(httpResponse => someOtherAsyncMethod(httpResponse))
.then(_ => maybeSomeLoggingOrWhatever() )
.catch(e => {
// here you can control any error thrown in the previous chain.
});
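fsReadFilePromise above is not a real function. One way to get something like it is util.promisify, which turns an (error, result) style function into one that returns a promise. A small sketch, using a file name that I assume does not exist:

```javascript
const fs = require('fs');
const util = require('util');

// Promise-returning version of the callback-style fs.readFile.
const readFile = util.promisify(fs.readFile);

const result = readFile('definitely-not-exists.png')
  .then(buf => buf.length)
  .then(len => `read ${len} bytes`)
  .catch(e => `failed: ${e.code}`); // a single catch covers the whole chain

result.then(console.log); // failed: ENOENT
```

Newer Node versions also ship fs.promises with ready-made promise versions of the fs functions.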
And there's also async/await, which allows you to mix async and sync code and handle promise rejections in catch blocks:
async function main() {
try {
a(); // some sync code
await b(); // some promise
} catch(e) {
console.log(e); // either an error thrown in a() or the rejection reason of b()
}
}
However keep in mind that await is no magic and you really need to understand promises and asynchrony well in order to use it properly.
At the end, you always end up with one error control flow for synchronous errors via try/catch, and another for asynchronous errors, via callback parameters or promise rejections.
Callbacks can use try/catch when consuming synchronous api's, but should never throw. Any function can use catch to handle synchronous errors, but cannot rely on catch blocks to handle asynchronous errors. Kinda messy.
Does it make sense to pre-load some data during Express.js startup (like reading private keys from file) so that it is already in memory when needed by a request and it would get re-used each time without being re-read from file?
Yes, it makes sense, provided you structure your code so that the data is available in the request handler. In the following example, as far as I know, staticResponse is read only once.
const express = require('express');
const fs = require('fs');
// Read once at startup; every request reuses the in-memory value.
const staticResponse = fs.readFileSync('./data');
const app = express();
app.get('/', function (req, res) {
res.json(staticResponse);
});
app.listen(3000, function () {
console.log('Example app listening on port 3000!');
});
How do I properly handle exceptions in a Node.js server process? If a fatal error happens in a Node.js server, it sounds like it would kill the entire process and thus all requests with it.
Exactly, an unhandled exception makes the entire Node.js process crash. There are multiple ways to manage errors; there isn't a one-size-fits-all solution. It depends on how you write your code.
all requests with it => keep in mind that Node.js is single-threaded.
app.post('/', function (req, res, next) {
try {
const data = JSON.parse(req.body.stringedData);
// use data
res.sendStatus(200);
} catch (err) {
return next(err);
}
});
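The try/catch above only covers synchronous code. In Express 4, a rejection inside an async handler is not forwarded to next automatically, so a small wrapper is a common workaround. This is a sketch; asyncHandler is a name I picked, and the req/res/next objects below are bare stand-ins so it runs without Express:

```javascript
// Wraps an async route handler so that a rejected promise is passed to next(err).
const asyncHandler = (fn) => (req, res, next) =>
  Promise.resolve().then(() => fn(req, res, next)).catch(next);

// Hypothetical handler that fails asynchronously.
const failing = asyncHandler(async (req, res) => {
  await new Promise(resolve => setTimeout(resolve, 10));
  JSON.parse(req.body.stringedData); // throws on invalid JSON
  res.sendStatus(200);
});

// Minimal stand-ins for Express's req/res/next, just to exercise the wrapper.
const forwarded = new Promise(resolve => {
  failing({ body: { stringedData: '{not json' } }, { sendStatus() {} }, resolve);
});

forwarded.then(err => console.log(err.name)); // SyntaxError
```

With real Express you would use it as app.post('/', asyncHandler(async (req, res) => { ... })) and let your error-handling middleware deal with whatever reaches next.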
I'm working on a process where I have a Queue, and I start with a known unit of work. As I process the unit of work, it will result in zero-or-more (unknown) units of work that get added to the Queue. I continue to process the queue until there's no more work to perform.
I'm working on a proof-of-concept using Guzzle where I accept a first URL to seed the queue, then process the body of the response which may result in more URLs that need to be processed. My goal is to add them to the queue and have Guzzle continue processing them until there's nothing left in the queue.
In other cases, I can define a variable as the queue, and pass it by-reference into a function so that it gets updated with new work. But in the case of Guzzle Async Pools (which I think is the most efficient way to handle this), there doesn't seem to be a clear way to update the queue in-process and have the Pool execute the requests.
Does Guzzle provide a built-in approach for updating the list of Pool requests from inside a fulfilled Promise callback?
use ArrayIterator;
use GuzzleHttp\Promise\EachPromise;
use GuzzleHttp\TransferStats;
use Psr\Http\Message\ResponseInterface;
// Re-usable callback which prints the URL being requested
function onStats(TransferStats $stats) {
echo sprintf(
'%s (%s)' . PHP_EOL,
$stats->getEffectiveUri(),
$stats->getTransferTime()
);
}
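// HTTP client shared by all requests (not defined in the original snippet)
$client = new \GuzzleHttp\Client();
// The queue of work to be performed; getAsync() returns a promise instead of blocking
$requests = new ArrayIterator([
$client->getAsync('http://httpbin.org/anything', [
'on_stats' => 'onStats',
])
]);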
// Process the queue, which results in more work to be performed
$p = (new EachPromise($requests, [
'concurrency' => 50,
'fulfilled' => function(ResponseInterface $response) use ($client, &$requests) {
$hash = bin2hex(random_bytes(10));
$requests[] = $client->getAsync(sprintf('http://httpbin.org/anything/%s', $hash), [
'on_stats' => 'onStats',
]);
},
'rejected' => function($reason) {
echo $reason . PHP_EOL;
},
]))->promise();
// Wait for everything to finish
$p->wait(true);
My question appears to be similar to Incrementally add requests to a Guzzle 5.0 Pool (Rolling Requests), but is different in that these refer to different major versions of Guzzle.
After posting this, I was able to do more searching and found some more SO threads and GitHub Issues for Guzzle. I found this library, which appears to address the problem.
https://github.com/alexeyshockov/guzzle-dynamic-pool
I played around with the PHP 7.2 runtime and HTTP trigger on Alibaba Cloud Function Compute. The basic example in the documentation is the following:
<?php
use RingCentral\Psr7\Response;
function handler($request, $context): Response{
/*
$body = $request->getBody()->getContents();
$queries = $request->getQueryParams();
$method = $request->getMethod();
$headers = $request->getHeaders();
$path = $request->getAttribute("path");
$requestURI = $request->getAttribute("requestURI");
$clientIP = $request->getAttribute("clientIP");
*/
return new Response(
200,
array(
"custom_header1" => "v1"
),
"hello world"
);
}
This works quite well. It's easy to get the query parameters from a URL. But the body content is only available as a single string via
$request->getBody()->getContents();
Although the documentation says that the $request parameter follows the PSR-7 HTTP Message standard, it is not possible to use $request->getParsedBody() to deliver the values submitted by POST method. It didn't work as expected - the result remains empty.
The reason is the underlying technology. Alibaba Cloud Function Compute makes use of the event-driven React PHP library to handle the requests (you can check this by analyzing the $request object). So the $_POST array is empty and there is no "easy way to get POST data".
Luckily, Alibaba's Function Compute handler provides the body content by $request->getBody()->getContents(); as a string like
"bar=lala&foo=bar"
So the solution turns out to be easier than it first seemed: you can, for example, use PHP's own parse_str() function:
$data = [];
$body = $request->getBody()->getContents();
parse_str($body,$data);
If you place this snippet in the handler function, the POST variables are stored in the $data array and ready for further processing.
I hope this helps anybody who has the same question as I did. :-)
Kind regards,
Ralf
As you can see in the documentation you need to add a RequestBodyParserMiddleware as middleware to get a parsed PSR-7 request. It seems you didn't do that.
Also keep in mind that only the Content-Types application/x-www-form-urlencoded and multipart/form-data are supported here. So make sure the client sends one of these headers so that the request can be parsed. For any other Content-Type you need to use another middleware.
See: https://github.com/reactphp/http#requestbodyparsermiddleware for more information.
I hope this helps!
@legionth: I apologize that I didn't use the comment feature here, but my answer is too long. :-)
Thanks a lot for your comments - the usage of RequestBodyParserMiddleware is a great solution if you can control the server code. But in the context of Alibaba Cloud Function Compute service this seems not possible. I tried to find out more information about the invocation process - here are my results:
Function Compute makes use of the Docker image defined in https://github.com/aliyun/fc-docker/blob/master/php7.2/run/Dockerfile .
In the build process they download a PHP runtime environment from https://my-fc-testt.oss-cn-shanghai.aliyuncs.com/php7.2.tgz . (I didn't find this on GitHub, but the code is publicly downloadable.)
A shell script start_server.sh starts a PHP-CGI binary and runs a PHP script server.php.
In server.php a React\Http\Server is started by:
$server = new Server(function (ServerRequestInterface $request) {
[...]
});
[...]
$socket = new \React\Socket\Server(sprintf('0.0.0.0:%s', $port), $loop);
$server->listen($socket);
$loop->run();
As seen in the Function Compute documentation (& example of FC console), I can only use two functions:
/*
if you open the initializer feature, please implement the initializer function, as below:
*/
function initializer($context) {
}
and the handler function you can find in my first post.
Maybe Alibaba will extend the PHP runtime in future to make it possible to use a custom middleware, but currently I didn't find a way to do this.
Thanks again & kind regards,
Ralf
So, I want to create an asynchronous web service in PHP. Why? Because I've a nice async front-end, but Chrome will block my requests if I have more than 6 active TCP connections. Of course I have read some similar questions like:
Async requests in PHP
Multiple PHP Requests Crashing Page
but these don't cover my question.
I installed pthreads with the intention of making multiple requests in different threads, so that my PHP wasn't blocking other requests (in my situation I start e.g. a long process and I want to be able to poll whether the process is still busy or not).
PHPReact seems to be a nice library (non-blocking I/O, async), but this won't work either (still sync).
Am I missing something or is this nowadays still not possible in PHP?
class Example{
private $url;
function __construct($url){
$this->url = $url;
echo 'pooooof request to ' . $this->url . ' sent <br />';
$request = new Request($this->url);
$request->start();
}
}
class Request extends Thread{
private $url;
function __construct($url){
$this->url = $url;
}
function run(){
// execute curl, multi_curl, file_get_contents but every request is sync
}
}
new Example('https://gtmetrix.com/why-is-my-page-slow.html');
new Example('http://php.net/manual/en/function.file-get-contents.php');
new Example('https://www.google.nl/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=php%20file%20get%20contents');
The ideal situation would be to make use of callbacks.
ps. I have seen some servers(like Node.js) that are providing this functionality, but I prefer a native approach. When this is not possible I'm really thinking of switching to Python, Java, Scala or some other language that supports async.
I can't really make sense of what you are doing ...
Asynchronous and Parallel are not interchangeable words.
Threads at the frontend of a web application don't make sense.
You don't need threads to make many I/O bound tasks concurrent; That is what non-blocking I/O is for (asynchronous concurrency).
Parallel concurrency seems like overkill here.
Regardless, the reason your requests appear synchronous is the way this constructor is written:
function __construct($url){
$this->url = $url;
echo 'pooooof request to ' . $this->url . ' sent <br />';
$request = new Request($this->url);
$request->start();
}
The Request thread will be joined before control is returned to the caller of __construct (new) because the variable goes out of scope, and so is destroyed (joining is part of destruction).
I have a php application that gets requests for part numbers from our server. At that moment, we reach out to a third party API to gather pricing information to make sure we have the latest pricing for that particular request. Sometimes the third party API is slow or it might be down, so we have a database that stores the latest pricing requests for each particular part number that we can use as a fallback. I'd like to run the request to the third party API and the database in parallel using Gearman. Here is the idea:
Receive request
Through gearman, create two jobs:
Request to third party API
MySQL database lookup
Wait in a loop and return the results based on the following conditions:
If the third party API has completed, return that result immediately
If a set amount of time has elapsed (e.g. 2 seconds) and the third party API hasn't responded, return the MySQL lookup data
Using gearman, my thoughts were to either run the two tasks in the foreground and break out of runTasks() within the setCompleteCallback() call, or to run them in the background and check in on the two tasks within a separate loop and check in on the tasks using jobStatus().
Unfortunately, I can't get either route to work for me while still getting access to the resulting data. Is there a better way, or are there some existing examples of how someone has made this work?
I think you've described a single blocking problem, namely waiting on the results of a 3rd-party API lookup. From my point of view there are two ways you can handle this: either you abort the attempt altogether once you decide that you've run out of time, or you report back to the client that you ran out of time but continue with the lookup anyway, just to update your local cache in case the API responds slower than you would like. I'll describe how I would go about the former approach because it is easier.
From the client side:
$request = array(
'productId' => 5,
);
$client = new GearmanClient( );
$client->addServer( '127.0.0.1', 4730 );
$results = json_decode($client->doNormal('apiPriceLookup', json_encode( $request )));
if ($results && property_exists($results, 'success') && $results->success) {
// The API lookup succeeded: use the fresh data in $results->data
} else {
// The API lookup failed or timed out: use local data
}
This will create a job on the job server with a function name of 'apiPriceLookup' and pass it the workload data containing a product id of 5. It will wait for the results to come back, and check for a success property. If it exists and is true, then the api lookup was successful.
The idea is to set the timeout condition then in the worker task, which completely depends on how you're implementing the API lookup. If you're using cURL (or some wrapper around cURL), you can see the answer to how to detect a timeout here.
From the worker side:
$worker= new GearmanWorker();
$worker->addServer();
$worker->addFunction("apiPriceLookup", "apiPriceLookup");
while ($worker->work());
function apiPriceLookup($job) {
$payload = json_decode($job->workload());
try {
$results = [
'data' => PerformApiLookupForProductId($payload->productId),
'success' => true,
];
} catch(Exception $e) {
$results = ['success' => false];
}
return json_encode($results);
}
This just creates a GearmanWorker object and subscribes it to the function apiPriceLookup. It will call the function apiPriceLookup whenever a client submits a task to the job server. That function calls out to another function, PerformApiLookupForProductId, which should be written so as to throw an exception whenever a timeout condition occurs.
I don't think this would be considered using exceptions to control logic flow; I think timeout conditions generally are (or should be) exceptional events. For instance, Guzzle will throw a GuzzleHttp\Exception\RequestException when it has decided to time out.