Flysystem S3 remote file download always corrupted - php

I recently started using Flysystem in an existing application with the intention of abstracting the local and remote (specifically, S3) filesystems. Everything was working fine in my development environment, where I had successfully configured the LocalAdapter. However, I cannot get S3 file downloads to work. I'd like to point out that file uploads are working perfectly: I can successfully download the uploaded file by manually browsing the S3 bucket in the AWS management console. That being said, I will skip the code that initializes the $filesystem variable.
My application is using a PSR-7 approach. That is, the code below is inside a function that is passed an object of type Psr\Http\Message\ServerRequestInterface as first argument and an object of type Psr\Http\Message\ResponseInterface as the second. Given that the local filesystem works fine, I think it is safe to assume that the problem doesn't lie there.
This is the code:
<?php
$stream = new \Zend\Diactoros\Stream($filesystem->readStream($filename));
$filesize = $stream->getSize();

return $response
    ->withHeader('Content-Type', 'application/pdf')
    ->withHeader('Content-Transfer-Encoding', 'Binary')
    ->withHeader('Content-Description', 'File Transfer')
    ->withHeader('Pragma', 'public')
    ->withHeader('Expires', '0')
    ->withHeader('Cache-Control', 'must-revalidate')
    ->withHeader('Content-Length', "{$filesize}")
    ->withBody($stream);
When I dump the $stream variable and the $filesize variable, the results are as expected. The remote file contents are successfully printed. However, the downloaded file is always corrupted and its size is always 0 bytes.
I am assuming that Flysystem takes care of everything behind the scenes and that I don't have to manually download the file to a temp folder first, before serving it to the client.
Any clue as to what the problem could be?
Update 1
I have also tried the following code, without any luck. It still works locally, though:
use Zend\Diactoros\CallbackStream;

$stream = new CallbackStream(function () use ($filesystem, $filename) {
    $resource = $filesystem->readStream($filename);
    while (!feof($resource)) {
        echo fread($resource, 1024);
    }
    fclose($resource);
    return '';
});
and
use Zend\Diactoros\CallbackStream;

$stream = new CallbackStream(function () use ($filesystem, $filename) {
    $resource = $filesystem->readStream($filename);
    fpassthru($resource);
    return '';
});

Removing the Content-Length header seems to solve the problem.
See https://github.com/thephpleague/flysystem/issues/543 for more details.
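For reference, a minimal sketch of the question's code with only the Content-Length header dropped (everything else unchanged):

$stream = new \Zend\Diactoros\Stream($filesystem->readStream($filename));

return $response
    ->withHeader('Content-Type', 'application/pdf')
    ->withHeader('Content-Transfer-Encoding', 'Binary')
    ->withHeader('Content-Description', 'File Transfer')
    ->withHeader('Pragma', 'public')
    ->withHeader('Expires', '0')
    ->withHeader('Cache-Control', 'must-revalidate')
    ->withBody($stream);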

Google Cloud Storage get temp filename (using fopen('php://temp'))

A similar question was asked here a few years ago, but got no answer:
Get path of temp file created via fopen('php://temp')
I am using Google Cloud Storage to download a number of large files in parallel and then upload them to another service. Essentially transferring from A to C, via my server B.
Under the hood, Google's StorageObject->downloadAsStream() uses Guzzle to get the file via fopen('php://temp', 'r+').
I am running into a disk space issue because Google's Cloud Storage library is not cleaning up the temp files if there is an exception thrown during the transfer. (This is expected behaviour per the docs). Every retry of the script dumps another huge file in my tmp dir which isn't cleaned up.
If Guzzle used tmpfile() I would be able to use stream_get_meta_data()['uri'] to get the file path, but because it uses php://temp, this option seems to be blocked off:
[
    "wrapper_type" => "PHP",
    "stream_type" => "TEMP",
    "mode" => "w+b",
    "unread_bytes" => 0,
    "seekable" => true,
    "uri" => "php://temp", // <<<<<<<< grr.
]
So: does anyone know of a way to get the temporary file name created by fopen('php://temp') such that I can perform a manual clean-up?
UPDATE:
It appears this isn't possible. Hopefully GCS will update their library to change the way the temp file is generated. Until then I am using the following clean-up code:
public function cleanTempDir(int $timeout = 7200) {
    foreach (glob(sys_get_temp_dir() . "/php*") as $f) {
        if (is_writable($f) && filemtime($f) < (time() - $timeout)) {
            unlink($f);
        }
    }
}
UPDATE 2
It is possible, see accepted answer below.
Something like the following should do the trick:
use Google\Cloud\Storage\StorageClient;

$client = new StorageClient;
$tempStream = tmpfile();
$tempFile = stream_get_meta_data($tempStream)['uri'];

try {
    $stream = $client->bucket('my-bucket')
        ->object('my-big-ol-file')
        ->downloadAsStream([
            'restOptions' => [
                'sink' => $tempStream
            ]
        ]);
} catch (\Exception $ex) {
    unlink($tempFile);
}
The restOptions option allows you to pass options through to the underlying HTTP 1.1 transport (Guzzle, by default). My apologies that this isn't clearly documented, but I hope it helps!
Google Cloud Platform Support here!
At the moment, it is not possible to get the temporary file name created by the downloadAsStream() method using the PHP Cloud Storage library. Therefore I have created a Feature Request on your behalf; you can follow it here.
As a workaround, you may be able to remove the file by hand, you can get the temp file name using the following command:
$filename = shell_exec("ls -lt | awk 'NR==2' | cut -d: -f2 | cut -d ' ' -f2");
After that, $filename will contain the name of the most recently modified file, which will be the one that failed and that you wish to remove. With the filename you can now proceed to remove it.
Notice that you will have to be in the folder backing php://temp before executing the command.
That will most likely be the system-configured temporary directory, which you can get via sys_get_temp_dir().
Note that php://temp only saves to a file if needed; smaller streams can reside entirely in memory.
https://www.php.net/manual/en/wrappers.php.php
Edit: OK, so the file does get created. Then you can probably use stream_get_meta_data() on the stream handle to get that information from the stream.
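For illustration, a minimal sketch of that memory-then-file behaviour; the maxmemory parameter (2 MB by default) sets the threshold at which php://temp spills over to a real temporary file:

// Keep at most 1 MiB in memory; anything larger spills to a temp file.
$handle = fopen('php://temp/maxmemory:1048576', 'r+');
fwrite($handle, str_repeat('x', 2 * 1024 * 1024)); // 2 MiB, so it spills to disk

// The metadata still reports the wrapper, not the backing file's path,
// which is exactly why the clean-up workaround above is needed.
var_dump(stream_get_meta_data($handle)['uri']); // "php://temp/maxmemory:1048576"
fclose($handle); // closing the handle removes the backing temp file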

php - unlink throws error: Resource temporarily unavailable

Here is the piece of code:
public function uploadPhoto() {
    $filename = '../storage/temp/image.jpg';
    file_put_contents($filename, file_get_contents('http://example.com/image.jpg'));
    $photoService->uploadPhoto($filename);
    echo("If file exists: " . file_exists($filename));
    unlink($filename);
}
I am trying to do the following things:
1. Get a photo from a URL and save it in a temp folder on my server. This works fine. The image file is created, and the echo prints "If file exists: 1".
2. Pass that file to another function that handles uploading the file to an Amazon S3 bucket. The file gets stored in my S3 bucket.
3. Delete the photo stored in the temp folder. This doesn't work! I get an error saying:
unlink(../storage/temp/image.jpg): Resource temporarily unavailable
If I use rename($filename, '../storage/temp/renimage.jpg'); instead of unlink($filename); I get an error:
rename(../storage/temp/image.jpg,../storage/temp/renimage.jpg): The process cannot access the file because it is being used by another process. (code: 32)
If I remove the function call $photoService->uploadPhoto($filename);, everything works perfectly fine.
If the file is being used by another process, how do I unlink it after the process has been completed and the file is no longer being used by any process? I do not want to use timers.
Please help! Thanks in advance.
Simplest solution:
gc_collect_cycles();
unlink($file);
Does it for me!
Straight after uploading a file to Amazon S3, it allows me to delete the file on my server.
See here: https://github.com/aws/aws-sdk-php/issues/841
The GuzzleHttp\Stream object holds onto a resource handle until its __destruct method is called. Normally, this means that resources are freed as soon as a stream falls out of scope, but sometimes, depending on the PHP version and whether a script has yet filled the garbage collector's buffer, garbage collection can be deferred. gc_collect_cycles will force the collector to run and call __destruct on all unreachable stream objects.
:)
I just had to deal with a similar error.
It seems your $photoService is holding on to the image for some reason...
Since you didn't share the code of $photoService, my suggestion would be to do something like this (assuming you don't need $photoService anymore):

[...]
    echo("If file exists: " . file_exists($filename));
    unset($photoService);
    unlink($filename);
}

unset() will destroy the given variable/object, so it can't "use" (or whatever it does) any files.
I sat on this problem for an hour or two, and finally realized that "temporarily unavailable" really means "temporarily".
In my case, concurrent PHP scripts access the file, either writing or reading. And when the unlink() call had poor timing, the whole thing failed.
The solution was quite simple: use the (generally not very advisable) @ operator to prevent the error from being shown to the user (sure, one could also stop errors from being printed), and then have another try:
$gone = false;
for ($trial = 0; $trial < 10; $trial++) {
    if ($gone = @unlink($filename)) {
        break;
    }
    // Wait a short time
    usleep(250000);
    // Maybe a concurrent script has deleted the file in the meantime
    clearstatcache();
    if (!file_exists($filename)) {
        $gone = true;
        break;
    }
}
if (!$gone) {
    trigger_error('Warning: Could not delete file ' . htmlspecialchars($filename), E_USER_WARNING);
}
After solving this issue and pushing my luck further, I could also trigger the "Resource temporarily unavailable" issue with file_put_contents(). Same solution, now everything works fine.
If I'm wise enough and/or unlinking fails in the future, I'll replace the @ with ob_start(), so the error message could tell me the exact error.
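For illustration, a minimal sketch of that ob_start() idea (this assumes display_errors is on, so the warning is printed into the output buffer instead of only being logged):

ob_start();
$gone = unlink($filename);       // no @, so a failure prints a warning
$warning = trim(ob_get_clean()); // capture the warning instead of showing it

if (!$gone) {
    // $warning now holds e.g. "Warning: unlink(...): Resource temporarily unavailable"
    error_log($warning);
}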
I had the same problem. The S3 client doesn't seem to release the file before unlink() is executed. If you extract the contents into a variable and set it as the 'Body' in the putObject array:
$fileContent = file_get_contents($filepath);

$result = $s3->putObject(array(
    'Bucket' => $bucket,
    'Key' => $folderPath,
    'Body' => $fileContent,
    //'SourceFile' => $filepath,
    'ContentType' => 'text/csv',
    'ACL' => 'public-read'
));
See this answer: How to unlock the file after AWS S3 Helper uploading file?
The unlink() method returns a bool value, so you can build a loop with a short wait and a retry limit to wait for your processes to complete.
Additionally, put @ on the unlink() call to hide the access error.
Throw another error/exception if the retry limit is reached.

Use file_get_contents to download multiple files at the same time in php?

UPDATE: THIS IS AN S3 BUCKET QUESTION (SEE ANSWER)
I am working with some code that reads files from an S3 bucket and uses the file_get_contents command to download one file at a time.
Start:
file_get_contents('s3://file1.json');
Wait until finished, then start the next download:
file_get_contents('s3://file2.json');
Instead, I want them all to start at once to save time, like this:
Start both at the same time:
file_get_contents('s3://file1.json');
file_get_contents('s3://file2.json');
Wait for both to finish at the same time.
I have seen multi cURL requests, but nothing on this topic for file_get_contents. Is it possible?
EDIT: Currently the code I am looking at uses s3://, which doesn't seem to work with cURL. This is a way of getting to Amazon's S3 bucket.
EDIT 2: Sample of the current code:
function get_json_file($filename = false) {
    if (!$filename) return false;
    // builds s3://somefile.on.amazon.com/file.json
    $path = $this->get_json_filename($filename);
    if (!$filename || !file_exists($path)) {
        return false;
    } elseif (file_exists($path)) {
        $data = file_get_contents($path);
    } else {
        $data = false;
    }
    return (empty($data) ? false : json_decode($data, true));
}
ANSWER: S3 SECURED, REQUIRES SPECIAL URI
Thank you guys for responding in comments. I was WAY off base with this earlier today when I asked this question.
Here is the deal: the version of PHP we are using does not allow for threading. Hence, to request multiple URLs you need to use the multi cURL option. The s3:// scheme somehow worked through a class file for S3 that I missed before; hence the weird naming.
ALSO, IMPORTANT, if you don't care about protecting the data on S3, you can just make it public and NOT have this issue. In my case the data needs to be somewhat protected so that requires a bunch of things in the URI.
You can use Amazon's S3 class to generate a secure link from the URI and the bucket name on S3. This will return the proper URL to use with your bucket. The S3 class can be downloaded manually or installed via Composer (in Laravel, for example). It requires that you set up a user with an access key in the AWS console.
$bucketName = "my-amazon-bucket";
$uri = "/somefile.txt";
$secure_url = $s3->getObjectUrl($bucketName, $uri, '+5 minutes');
This will generate a valid URL to access the file on S3, which can be used with cURL, etc.:
https://domain.s3.amazonaws.com/bucket/filename.zip?AWSAccessKeyId=myaccesskey&Expires=1305311393&Signature=mysignature
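To tie this together, a minimal sketch of the multi cURL approach (the bucket name, object keys, and $s3 client below are placeholders): presign the URLs as above, then fetch them in parallel with curl_multi:

// Presigned URLs, e.g. generated with $s3->getObjectUrl() as shown above.
$urls = [
    $s3->getObjectUrl($bucketName, '/file1.json', '+5 minutes'),
    $s3->getObjectUrl($bucketName, '/file2.json', '+5 minutes'),
];

$multi = curl_multi_init();
$handles = [];
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // collect instead of echo
    curl_multi_add_handle($multi, $ch);
    $handles[$i] = $ch;
}

// Start all transfers at once and wait until they are finished.
do {
    curl_multi_exec($multi, $running);
    curl_multi_select($multi);
} while ($running > 0);

$results = [];
foreach ($handles as $i => $ch) {
    $results[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($multi, $ch);
    curl_close($ch);
}
curl_multi_close($multi);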

Download file to server using API (it triggers prompt)

I want to store some data retrieved using an API on my server. Specifically, these are .mp3 files of (free) learning tracks. I'm running into a problem though. The mp3 link returned from the request isn't to a straight .mp3 file, but rather makes an ADDITIONAL API call which normally would prompt you to download the mp3 file.
file_put_contents doesn't seem to like that. The mp3 file is empty.
Here's the code:
$id = $_POST['cid'];
$title = $_POST['title'];

if (!file_exists("tags/" . $id . "_" . $title)) {
    mkdir("tags/" . $id . "_" . $title);
} else {
    echo "Dir already exists";
}

file_put_contents("tags/{$id}_{$title}/all.mp3", fopen($_POST['all'], 'r'));
And here is an example of the second API I mentioned earlier:
http://www.barbershoptags.com/dbaction.php?action=DownloadFile&dbase=tags&id=31&fldname=AllParts
Is there some way to bypass this intermediate step? If there's no way to access the direct URL of the mp3, is there a way to redirect the file download prompt to my server?
Thank you in advance for your help!
EDIT
Here is the current snippet. I should be echoing something, correct?
$handle = fopen("http://www.barbershoptags.com/dbaction.php?action=DownloadFile&dbase=tags&id=31&fldname=AllParts", 'rb');
$contents = stream_get_contents($handle);
echo $contents;
Because this echoes nothing.
SOLUTION
Ok, I guess file_get_contents is supposed to handle redirects just fine, but this wasn't happening. So I found this function: https://stackoverflow.com/a/4102293/2723783 to return the final redirect of the API. I plugged that URL into file_get_contents and voila!
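For reference, a minimal sketch of that approach (resolving the final redirect with cURL first, then downloading; the linked answer does essentially the same thing):

// Follow redirects with a headers-only request and report the final URL.
function get_final_url($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 3xx redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo anything
    curl_setopt($ch, CURLOPT_NOBODY, true);         // fetch headers only
    curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $final;
}

$finalUrl = get_final_url($_POST['all']);
file_put_contents("tags/{$id}_{$title}/all.mp3", file_get_contents($finalUrl));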
You seem to be just opening the file handle and not getting the contents using fread() or a similar function:
http://www.php.net/manual/en/function.fread.php
$handle = fopen($_POST['all'], 'rb');
file_put_contents("tags/{$id}_{$title}/all.mp3", stream_get_contents($handle));

Removing file after delivering response with Silex/Symfony

I'm generating a pdf using Knp\Snappy\Pdf in my Silex application. The file name is random and the file is saved to the tmp directory.
$filename = "/tmp/$random.pdf";
$snappy->generate('/tmp/body.html', $filename, array(), true);
I then return the pdf in the response:
$response = new Response(file_get_contents($filename));
$response->headers->set('Pragma', 'public');
$response->headers->set('Content-Type', 'application/pdf');
return $response;
The pdf is displayed in the web browser correctly. The file with the random filename still exists when the request is finished. I can't unlink the file before returning the response. I've tried registering a shutdown function with register_shutdown_function and unlinking the file from there. However that doesn't seem to work. Any ideas?
Even though this is old, I figured I'd post this in case anyone googles it more recently, like I did. This is the solution I found.
There is a deleteFileAfterSend() method on the BinaryFileResponse returned from sendFile in Silex.
So in your controller you can just do:
return $app->sendFile($filepath)
    ->setContentDisposition(ResponseHeaderBag::DISPOSITION_INLINE, $fileName)
    ->deleteFileAfterSend(true);
You can use the finish middleware for that:
A finish application middleware allows you to execute tasks after the Response has been sent to the client (like sending emails or logging)
This is how it could look:
$app->finish(function (Request $request, Response $response) use ($app) {
    if (isset($app["file_to_remove"])) {
        unlink($app["file_to_remove"]);
    }
});

// in your controller
$app["file_to_remove"] = $filename;
Maerlyn is right, but in this case you can also unlink the file before returning the response, since the content of the file is already in the $response.
