How to use amphp/parallel library for non-blocking process - php

I want to use the amphp/parallel library for non-blocking processing. I have a simple download function that uses curl to fetch a remote image file and save it locally. I call this method through a REST API. Basically, I want the download to happen asynchronously in the background: the REST API hits the function and the function effectively says, "OK, I'm downloading in the background, you can proceed." In other words, the call is non-blocking and the API gets an OK response immediately instead of waiting. Meanwhile, if the download fails because of a network error, a worker should be able to restart it after some time. How do I start?
I have tried the following code, but it did not work:
require_once "vendor/autoload.php";

use Amp\Loop;
use Amp\Parallel\Worker\CallableTask;
use Amp\Parallel\Worker\DefaultWorkerFactory;

Loop::run(function () {
    $remote_file_url = "some remote image url"; // http://example.com/some.png
    $file_save_path  = "save path for file";    // /var/www/html/some.png
    $factory = new DefaultWorkerFactory();
    $worker  = $factory->create();
    $result  = yield $worker->enqueue(new CallableTask('downloadFile', [$remote_file_url, $file_save_path]));
    $code    = yield $worker->shutdown();
});
//downloadFile is a simple download function
function downloadFile($remoteFile, $localFile) {
    if (!$remoteFile || !$localFile) {
        return;
    }
    set_time_limit(0);
    $fp = fopen($localFile, 'w+');
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $remoteFile);
    curl_setopt($ch, CURLOPT_TIMEOUT, 50);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $result = curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    return $result ? true : false;
}
I'm getting this error:
PHP Fatal error: Uncaught Amp\Parallel\Worker\TaskError: Uncaught Error in worker with message "Call to undefined function downloadFile()" and code "0" in /var/www/html/test/vendor/amphp/parallel/lib/Worker/Internal/TaskFailure.php:45
Stack trace:
#0 /var/www/html/test/vendor/amphp/parallel/lib/Worker/TaskWorker.php(126): Amp\Parallel\Worker\Internal\TaskFailure->promise()
#1 [internal function]: Amp\Parallel\Worker\TaskWorker->Amp\Parallel\Worker\{closure}()
#2 /var/www/html/test/vendor/amphp/amp/lib/Coroutine.php(76): Generator->send(Object(Amp\Parallel\Worker\Internal\TaskFailure))
#3 /var/www/html/test/vendor/amphp/amp/lib/Internal/Placeholder.php(130): Amp\Coroutine->Amp\{closure}(NULL, Object(Amp\Parallel\Worker\Internal\TaskFailure))
#4 /var/www/html/test/vendor/amphp/amp/lib/Coroutine.php(81): Amp\Coroutine->resolve(Object(Amp\Parallel\Worker\Internal\TaskFailure))
#5 /var/www/html/test/vendor/amphp/amp/lib/Internal/Placeholder.php(130): Amp\Coroutine->Amp\{closure}(NULL, Object(Amp\Parallel\Worker\Internal\TaskFailur in /var/www/html/test/vendor/amphp/parallel/lib/Worker/Internal/TaskFailure.php on line 45
I have a similar requirement to the one asked in "How does amphp work" regarding running a process in the background.

Generally, Amp doesn't magically keep working in the background. If you run PHP via PHP-FPM or similar, Amp will be shut down once the response is done, just like anything else.
If you want to move work from these requests into background processes, you need some kind of queue (e.g. beanstalkd) and a (permanent) worker to process the queued jobs. You can write such a daemonized worker with Amp, but it has to be started out-of-band.
That said, if you just want concurrent downloads, amphp/artax is better suited than amphp/parallel, as it has far lower overhead than a separate PHP process per HTTP request.
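To illustrate that last point, here is a minimal, untested sketch of concurrent downloads with amphp/artax (assuming artax v3 and amp v2; the URLs and target directory are placeholders):
<?php
require_once "vendor/autoload.php";

use Amp\Artax\DefaultClient;
use Amp\Loop;
use function Amp\call;

Loop::run(function () {
    $client = new DefaultClient;

    // placeholder URLs; in practice these come from your application
    $urls = [
        "http://example.com/some.png",
        "http://example.com/other.png",
    ];

    $promises = [];
    foreach ($urls as $url) {
        // each call() starts a coroutine, so the downloads run concurrently
        $promises[$url] = call(function () use ($client, $url) {
            $response = yield $client->request($url);
            $body = yield $response->getBody();
            file_put_contents("/var/www/html/" . basename($url), $body);
        });
    }

    // wait until all downloads have finished
    yield \Amp\Promise\all($promises);
});
Because everything runs inside one event loop, this avoids spawning a separate worker process per download.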

The question doesn't clarify where the downloadFile() function is defined. As per the amphp/parallel documentation, the callable must be autoloadable so that the worker process can find it when the task is executed.
Here's a suggestion:
Put the downloadFile() function in a separate file, say functions.inc.
In your composer.json, under autoload/files, add an entry for functions.inc.
{
    "autoload": {
        "files": ["functions.inc"]
    }
}
Run composer install so that autoload.php is regenerated to reflect the above change.
Then execute the file containing your first code snippet (the one with Loop::run()).
I think this should do the trick. Apart from this, please refer to kelunik's comment which contains valuable information.
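For clarity, a minimal sketch of that layout (the file and function names are the ones proposed above; the body is the downloadFile() from the question):
<?php
// functions.inc -- listed under "autoload" / "files" in composer.json,
// so both the parent process and the worker process can resolve it.

function downloadFile($remoteFile, $localFile)
{
    // ... same body as the downloadFile() shown in the question ...
}
After updating composer.json, composer dump-autoload (or a full composer install) regenerates vendor/autoload.php, and the Loop::run() snippet should then be able to find the function inside the worker.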

Related

TYPO3 v10 Image Processing in Backend Environment

We recently started our first TYPO3 10 project and are currently struggling with a custom import script that moves data to Algolia. Basically, everything works fine, but there is an issue with FAL images, specifically when they need to be processed.
From the logs I found something called DeferredBackendImageProcessor, but the docs don't mention it, or I'm not looking in the right place; I'm not sure.
Apparently, images within the backend environment are no longer simply processed. There is something called a "processingUrl" which has to be called once for the image to be processed.
I tried calling that URL with cURL, but it does not work. When I open the "processingUrl" in a browser it has no effect either, but if I open it in a browser where I am logged into the TYPO3 backend, then the image is processed.
I'm kind of lost here, as I need the images to be processed within the import script that runs via the scheduler from the backend (manually, not via cron).
This is the function where the problem occurs; the curl part has no effect here, sadly.
protected function processImage($image, $imageProcessingConfiguration)
{
    if ($image) {
        $scalingOptions = array(
            'width' => 170
        );
        $result = $this->contentObject->getImgResource('fileadmin/' . $image, $scalingOptions);
        if (isset($result[3]) && $result[3]) {
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, $result[3]);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            $output = curl_exec($ch);
            curl_close($ch);
            return '/fileadmin' . $result['processedFile']->getIdentifier();
        }
    }
    return '';
}
$result[3] being the processing url. Example of the url:
domain.com/typo3/index.php?route=%2Fimage%2Fprocess&token=6cbf8275c13623a0d90f15165b9ea1672fe5ad74&id=141
So my question is, how can I process the image from that import script?
I am not sure if there is a more elegant solution, but you could disable the deferred processing during your jobs:
// remember the original processor configuration
$processorConfiguration = $GLOBALS['TYPO3_CONF_VARS']['SYS']['fal']['processors'];

// remove the deferred processor, so the LocalImageProcessor is used instead
unset($GLOBALS['TYPO3_CONF_VARS']['SYS']['fal']['processors']['DeferredBackendImageProcessor']);

// ... run your image processing here ...

// restore the original configuration afterwards
$GLOBALS['TYPO3_CONF_VARS']['SYS']['fal']['processors'] = $processorConfiguration;
References:
https://github.com/TYPO3/TYPO3.CMS/blob/10.4/typo3/sysext/core/Classes/Resource/Processing/ProcessorRegistry.php
https://github.com/TYPO3/TYPO3.CMS/blob/10.4/typo3/sysext/core/Configuration/DefaultConfiguration.php#L284
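For illustration, a sketch of how that toggle might wrap the processImage() method from the question (the wrapper name is hypothetical and this is untested against TYPO3 10):
protected function processImageWithoutDeferral($image, $imageProcessingConfiguration)
{
    // remember the original processor registration
    $original = $GLOBALS['TYPO3_CONF_VARS']['SYS']['fal']['processors'];

    // temporarily remove the deferred backend processor so the
    // LocalImageProcessor handles the file immediately
    unset($GLOBALS['TYPO3_CONF_VARS']['SYS']['fal']['processors']['DeferredBackendImageProcessor']);

    try {
        // processImage() is the method shown in the question
        return $this->processImage($image, $imageProcessingConfiguration);
    } finally {
        // restore the configuration afterwards
        $GLOBALS['TYPO3_CONF_VARS']['SYS']['fal']['processors'] = $original;
    }
}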

Gateway Timeout 504 on multiple requests. Apache

I have an XML file locally. It contains data from a marketplace.
It roughly looks like this:
<offer id="2113">
    <picture>https://anotherserver.com/image1.jpg</picture>
    <picture>https://anotherserver.com/image2.jpg</picture>
</offer>
<offer id="2117">
    <picture>https://anotherserver.com/image3.jpg</picture>
    <picture>https://anotherserver.com/image4.jpg</picture>
</offer>
...
What I want is to save the images from the <picture> nodes locally.
There are about 9,000 offers and about 14,000 images.
When I iterate through them I can see the images being copied from that other server, but at some point I get a 504 Gateway Timeout.
The thing is that sometimes the error comes after 2,000 images, sometimes way more or way less.
I tried fetching a single image 12,000 times from that server (i.e. only https://anotherserver.com/image3.jpg) and it still gave the same error.
As I've read, that other server is probably blocking my requests after some quantity.
I tried using PHP sleep(20) after every 100th image, but it still gave me the same error (sleep(180) - same). When I tried a local image with a full path it didn't give any errors. I tried a second (non-local) server and the same thing occurred.
I use the PHP copy() function to fetch the image from that server.
I've also used file_get_contents() for testing purposes but got the same error.
I have
set_time_limit(300000);
ini_set('default_socket_timeout', 300000);
as well but no luck.
Is there any way to do this without chunking the requests?
Does the error occur on one specific image? It would be great to catch this error, or to keep track of the response delay and send the next request after some time, if that can be done.
Is there some fixed number of seconds I have to wait in order to keep those requests rollin'?
And please give me non-curl answers if possible.
UPDATE
Curl and exec(wget) didn't work either; they both ran into the same error.
Can the remote server be tweaked so it doesn't block me? (If it is doing that.)
P.S. if I do echo "<img src='https://anotherserver.com/image1.jpg' />" in a loop for all 12,000 images, they show up just fine.
Since you're accessing content on a server you have no control over, only that server's administrators know the blocking rules in place.
But you have a few options:
Run batches of 1000 or so, then sleep for a few hours.
Split the request up between computers that are requesting the information.
Maybe even something as simple as changing the requesting user agent info every 1000 or so images would be good enough to bypass the blocking mechanism.
Or some combination of all of the above.
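A rough, non-curl sketch of the batching and user-agent ideas above, using copy() with a stream context (the batch size, pause length and user-agent strings are arbitrary placeholders to tune against the remote server's limits; $imageUrls is assumed to be the list collected from the XML):
$userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (X11; Linux x86_64)',
];

$batchSize = 1000;
$count = 0;

foreach ($imageUrls as $url) {
    // rotate the user agent and set a per-request timeout
    $context = stream_context_create([
        'http' => [
            'timeout'    => 30,
            'user_agent' => $userAgents[$count % count($userAgents)],
        ],
    ]);

    $target = __DIR__ . '/images/' . basename(parse_url($url, PHP_URL_PATH));
    if (!@copy($url, $target, $context)) {
        error_log("Failed to copy $url"); // log and retry later instead of aborting
    }

    if (++$count % $batchSize === 0) {
        sleep(3600); // pause between batches (placeholder value)
    }
}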
I would suggest you try the following:
1. Reuse the previously opened connection with cURL:
$imageURLs = array('https://anotherserver.com/image1.jpg', 'https://anotherserver.com/image2.jpg', /* ... */);
$notDownloaded = array();

$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);

foreach ($imageURLs as $URL) {
    $filepath = parse_url($URL, PHP_URL_PATH);
    $fp = fopen(basename($filepath), "w");
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_URL, $URL);
    curl_exec($ch);
    fclose($fp);
    if (curl_getinfo($ch, CURLINFO_RESPONSE_CODE) == 504) {
        $notDownloaded[] = $URL;
    }
}
curl_close($ch);
// check to see if $notDownloaded is empty
// check to see if $notDownloaded is empty
2. If the images are accessible via both https and http, try using http instead (this will at least speed up the downloading).
3. Check the response headers when the 504 is returned, as well as when you load the URL in your browser. Make sure there are no X-RateLimit-* headers. BTW, what are the response headers actually? (A quick way to check them is sketched below.)
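A small sketch for inspecting those headers (the URL is a placeholder from the question):
// Fetch only the response headers of one of the images, e.g. to look for
// Retry-After or X-RateLimit-* hints from the remote server.
$headers = get_headers('https://anotherserver.com/image1.jpg', 1);
print_r($headers);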

Why does this code so negatively affect my server's performance?

I have a Silverstripe site that deals with very big data. I made an API that returns a very large dump, and I call that API on the front-end with an ajax GET.
When the ajax call hits the API, it takes 10 minutes for the data to return (very long JSON data, and the customer accepts that).
While they are waiting for the data to return, they open the same site in another tab to do other things, but the site is very slow until the previous ajax request is finished.
Is there anything I can do to avoid everything going unresponsive while waiting for the big JSON data?
Here's the code and an explanation of what it does:
I created a method named geteverything that resides on the web server, shown below. It accesses another server (the data server) to get data via a streaming API (sitting on the data server). There's a lot of data, and the data server is slow; my customer doesn't mind the request taking long, they mind how slow everything else becomes. Sessions are used to determine the particulars of the request.
protected function geteverything($http, $id) {
    if (($System = DataObject::get_by_id('ESM_System', $id))) {
        if (isset($_GET['AAA']) && isset($_GET['BBB']) && isset($_GET['CCC']) && isset($_GET['DDD'])) {
            /**
             * -- some condition checks and data formatting for AAA, BBB, CCC and DDD go here
             */
            $request = "http://dataserver/streaming?method=xxx";
            set_time_limit(120);
            $jsonstring = file_get_contents($request);
            echo($jsonstring);
        }
    }
}
How can I fix this, or what else would you need to know in order to help?
The reason it's taking so long is that you're downloading the entirety of the JSON to your server and THEN sending it all to the user. There's no need to wait until you have the whole file before you start sending it.
Rather than using file_get_contents, make the connection with curl and write the output directly to php://output.
For example, this script will copy http://example.com/ exactly as-is:
<?php
// Initialise cURL. You can specify the URL in curl_setopt instead if you prefer
$ch = curl_init("http://example.com/");
// Open a file handler to PHP's output stream
$fp = fopen('php://output', 'w');
// Turn off headers, we don't care about them
curl_setopt($ch, CURLOPT_HEADER, 0);
// Tell curl to write the response to the stream
curl_setopt($ch, CURLOPT_FILE, $fp);
// Make the request
curl_exec($ch);
// close resources
curl_close($ch);
fclose($fp);
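Applied to the geteverything() method from the question, the same approach might look roughly like this (an untested sketch; the condition checks and the dataserver URL are kept as in the question):
protected function geteverything($http, $id) {
    if (($System = DataObject::get_by_id('ESM_System', $id))) {
        if (isset($_GET['AAA']) && isset($_GET['BBB']) && isset($_GET['CCC']) && isset($_GET['DDD'])) {
            // ... same condition checks and data formatting as before ...
            $request = "http://dataserver/streaming?method=xxx";
            set_time_limit(120);

            // stream the upstream response straight to the client instead of
            // buffering the whole thing in $jsonstring first
            $ch = curl_init($request);
            $fp = fopen('php://output', 'w');
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_FILE, $fp);
            curl_exec($ch);
            curl_close($ch);
            fclose($fp);
        }
    }
}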

Timing out on command line

I had previously asked a question and got the answer, but I think I've run into another problem.
The PHP script I'm using does this:
1 - transfers a file to my server from my backup server
2 - when it's done transferring, it sends some POST data to it using curl, which creates a zip file
3 - when that's done, the result is echoed and, depending on what the result is, it either transfers the file back or does nothing.
My problem is this:
When the file is small enough (under 500MB) it creates the zip and transfers it back no problem. When it's larger, it times out; the zip still finishes being created on the remote server, but because the request timed out it never gets transferred.
I'm running this from a command line on the backup server. I have this in the php script:
set_time_limit(0); // ignore php timeout
ignore_user_abort(true); // keep on going even if user pulls the plug*
while(ob_get_level())ob_end_clean(); // remove output buffers
But it still times out when I run sudo php backup.php.
Is using curl making it time out like a browser would, on the other end where the zip is being made? I think the problem is that the response isn't being echoed out.
Edits:
(#symcbean)
I'm not seeing anything, which is why I'm struggling. When I run it from the browser, I see the loading indicator in the address bar; after about 30 seconds it just stops. When I do it from the command line, same deal: 30 seconds and it just stops. This only happens when large zips need to be created.
It's being invoked via a file. The file loads a class and sends the connection information to the class, which contacts the server to make the zip, transfers the zip back, does some stuff to it, then transfers it to S3 for archiving.
It logs into the remote server and uploads a file with curl. Upon a valid response, it curls again with the location of that file as a URL (I'll always know what it is), which fires up the PHP file I just transferred over. The zip ALWAYS gets created without a problem, even up to 22GB; it just sometimes takes a long time, of course. After that it waits for a response of "created". Waiting for that response is where it dies.
So the zip always gets created, but the waiting time is what I think is making it die.
Second Edit:
I tried this from the command line:
$ftp_connect = ftp_connect('domain.com');
$ftp_login   = ftp_login($ftp_connect, 'user', 'pass');
ftp_pasv($ftp_connect, true);
$upload = ftp_put($ftp_connect, 'filelist.php', 'filelist.php', FTP_ASCII);

$get_remote = 'filelist.php';
$post_data = array(
    'last_bu' => '0'
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'domain.com/' . $get_remote);
curl_setopt($ch, CURLOPT_HEADER, 0);
// adding the post variables to the request
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
// echo the following to get the response
$response = curl_exec($ch);
curl_close($ch);

echo $response;
and got this:
<HTML>
<HEAD>
<TITLE>500 Internal Server Error</TITLE>
</HEAD><BODY>
<H1>Internal Server Error</H1>
The server encountered an internal error or
misconfiguration and was unable to complete
your request.<P>
Please contact the server administrator to inform of the time the error occurred
and of anything you might have done that may have
caused the error.<P>
More information about this error may be available
in the server error log.<P>
<HR>
<ADDRESS>
Web Server at domain.com
</ADDRESS>
</BODY>
</HTML>
Again, the error log is blank and the zip still gets created, but because of the timeout at around 650MB into the creation, I can't get the response.
The problem is in the server code that generates the file to be returned.
Check the PHP error log.
It may be timing out for a few reasons, but the log should tell you why.
I fixed it, guys. Thank you so much to everyone who helped me; it pointed me in the right direction.
In the end, the problem was on the remote server: it was timing out the cURL connection, which meant the result I needed was never sent back.
What I did to fix it was add a function to my class that (again using curl) checks for the zip file I know is being created. When it's finished, the result is returned locally; if it's not finished yet, it sleeps for a few seconds and checks again.
private function watchDog() {
    $curl = curl_init($this->host . '/' . $this->grab_file);
    // don't fetch the actual page, you only want to check the connection is ok
    curl_setopt($curl, CURLOPT_NOBODY, true);
    // do request
    $result = curl_exec($curl);
    // if request did not fail
    if ($result !== false) {
        // if request was ok, check response code
        $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);
        if ($statusCode == 404) {
            // zip not there yet: wait a bit and check again
            sleep(7);
            return self::watchDog();
        }
        return 'zip created';
    }
    curl_close($curl);
}
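For context, a hypothetical call site for watchDog(): it would run right after the curl request that kicks off the zip creation, and only fetch the archive once the poll succeeds (downloadZip() and the surrounding class are placeholders, not part of the original code):
// after the curl call that asks the remote server to build the zip
if ($this->watchDog() === 'zip created') {
    // the file now exists on the remote server, so it is safe to transfer it back
    $this->downloadZip(); // placeholder for the existing transfer-back step
}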

Web Server CPU/Memory Resource LImited as a result of multiple ajax calls to PHP script

I'm hosting a website with Webhostingpad, and I'm running into an issue. As my homepage loads, I make 8 concurrent ajax calls to a PHP script that returns content used on the homepage. The 8 ajax calls hit a file called run.php. The job of this file is just to call a function from a class called amazon, which is defined in another file called amazon.php.
This is the URL being called 8 times via ajax. The only difference between the 8 calls is the query string:
http://my-domain.com/run.php?f=getItemsById&arg=id:B0043OYFKU,B001JKTTVQ,B004Y9D90Q,B003S516XO,B002XQ1YTK,B003V265QW,B00121UVU0,B004EDYQUE,B000P22TIY,B000E7WHLY
As you can see, I'm passing the function name in the "f" parameter of the url.
The run.php file looks like this:
require_once('amazon.php');
$function_name = $_REQUEST['f'];
$arg_parameter = $_REQUEST['arg'];
$arg_tmp = explode(";", $arg_parameter);
$arg_array = array();
foreach($arg_tmp as $key_value_pair){
$exploded = explode(':', $key_value_pair);
$key = $exploded[0];
$value = $exploded[1];
$arg_array[$key] = $value;
}
$amazon = new amazon();
echo $amazon->$function_name($arg_array);
As you can see, this file is simply calling a function from amazon.php and echoing the result so I can use it in the callback of the ajax function.
Here's the relevant code from amazon.php regarding the getItemsById() function:
class amazon {

    private $url;
    private $accessKey = 'AKIAISJ2OHTBA888311SD';
    private $secretAccessKey = 'RM8EG61w3dLwjymtAEVdfsdiesd883711lskdf';

    function __construct() {
        $this->url = 'http://webservices.amazon.com/onca/xml?Service=AWSECommerceService&AWSAccessKeyId=' . $this->accessKey . '&AssociateTag=global-18&Version=2011-08-01';
    }

    public function getItemsById($args = array()) {
        $itemIds = $args['id'];
        $url = $this->url;
        $url .= '&Operation=ItemLookup';
        $url .= '&ItemId=' . $itemIds;
        $url .= '&ResponseGroup=Images,Small,Offers,VariationSummary,EditorialReview';
        $signedUrl = $this->amazonSign($url, $this->secretAccessKey);
        $returned_xml = file_get_contents($signedUrl);
        return $returned_xml;
    }
}
As you can see above, this function calls a URL for Amazon's API and returns the XML using PHP's file_get_contents() function. My issue is that some of the ajax calls made to run.php are successfully executed, while others get HTTP 500 Internal Server Errors. When I run this on my local server, it works fine. When I run it on a development server at my office, it works fine. However, I consistently see the issue on my Webhostingpad server: some of the ajax calls return HTTP 500 errors.
I've spoken to Webhostingpad support and the only insight they have offered is that I'm exceeding my CPU/memory resource limit. The error logs from the server seem to confirm that:
[Tue Feb 19 21:36:39 2013] [error] [client 68.174.126.115] (12)Cannot
allocate memory: couldn't create child process: /opt/suphp/sbin/suphp
for /home/my-server/public_html/my-domain.com/run.php, referer:
http://my-domain.com/
My question for the community is: does anything stand out here as obviously memory-intensive? I feel like what I'm doing isn't that out of the ordinary, so I'm trying to figure out whether I should focus on optimizing my scripts or simply look for another hosting provider.
If executing the AJAX requests synchronously is not an option, you could still do something about the memory consumption of the PHP scripts. Currently the PHP script takes the entire XML that is returned from amazon in memory before it is echoed to the client. If the XML is large and you do it 8 times concurrently, running out of memory is not so strange.
A solution would be to do the request to amazon with cURL and use the cURL option CURLOPT_WRITEFUNCTION to echo the results to the client. This way, you could make the amazon.php script stream the result XML to the client and it won't use as much memory.
Example amazon.php code:
<?php
function writeCallback($handle, $data)
{
    echo $data;
    ob_flush();
    flush();
    return strlen($data);
}

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://webservices.amazon.com/onca/xml');
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'writeCallback');

ob_start();     // start output buffer
curl_exec($ch); // commence streaming
curl_close($ch);
