Stream from a temporary file, then delete when finished? - php

I'm writing a temporary file by running a couple of external Unix tools over a PDF file (basically I'm using QPDF and sed to alter the colour values. Don't ask.):
// Uncompress PDF using QPDF (doesn't read from stdin, so needs tempfile.)
$compressed_file_path = tempnam(sys_get_temp_dir(), 'cruciverbal');
file_put_contents($compressed_file_path, $response->getBody());
$uncompressed_file_path = tempnam(sys_get_temp_dir(), 'cruciverbal');
$command = "qpdf --qdf --object-streams=disable '$compressed_file_path' '$uncompressed_file_path'";
exec($command, $output, $return_value);
// Run through sed (could do this bit with streaming stdin/stdout)
$fixed_file_path = tempnam(sys_get_temp_dir(), 'cruciverbal');
$command = "sed s/0.298039215/0.0/g < '$uncompressed_file_path' > '$fixed_file_path'";
exec($command, $output, $return_value);
So, when this is done I'm left with a temporary file on disk at $fixed_file_path. (NB: While I could do the whole sed process streamed in-memory without a tempfile, the QPDF utility requires an actual file as input, for good reasons.)
In my existing process, I then read the whole $fixed_file_path file in as a string, delete it, and hand the string off to another class to go do things with.
I'd now like to change that last part to using a PSR-7 stream, i.e. a \Guzzle\Psr7\Stream object. I figure it'll be more memory-efficient (I might have a few of these in the air at once) and it'll need to be a stream in the end.
However, I'm not sure then how I'd delete the temporary file when the (third-party) class I'd handed the stream off to is finished with it. Is there a method of saying "...and delete that when you're finished with it"? Or auto-cleaning my temporary files in some other way, without keeping track of them manually?
I'd been vaguely considering rolling my own SelfDestructingFileStream, but that seemed like overkill and I thought I might be missing something.

Sounds like what you want is something like this:
<?php
class TempFile implements \Psr\Http\Message\StreamInterface {
private $resource;
public function __construct() {
$this->resource = tmpfile();
}
public function __destruct() {
$this->close();
}
public function getFilename() {
return $this->getMetadata('uri');
}
public function getMetadata($key = null) {
$data = stream_get_meta_data($this->resource);
if($key) {
return $data[$key];
}
return $data;
}
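public function close() {
// Guard against a double close: __destruct() also calls close().
if (is_resource($this->resource)) {
fclose($this->resource);
}
}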
// TODO: implement methods from https://github.com/php-fig/http-message/blob/master/src/StreamInterface.php
}
Have QPDF write to $tmpFile->getFilename(), then pass the whole object off to your Guzzle POST, since it's PSR-7 compliant. The file will delete itself when the object goes out of scope, because closing a handle created by tmpfile() removes the underlying file.
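The self-deleting behaviour this answer relies on can be checked in isolation. The following is a minimal sketch, not part of the original answer; the variable names and the placeholder bytes are illustrative only. One assumption worth verifying for your own pipeline: if an external tool replaces the file at that path instead of writing into it in place, the open handle may still point at the old contents.

```php
<?php
// Minimal sketch: a tmpfile() handle removes its file when it is closed.
$handle = tmpfile();
$path = stream_get_meta_data($handle)['uri']; // real path on disk

fwrite($handle, "placeholder bytes\n");
$existedWhileOpen = file_exists($path);

fclose($handle);   // closing the handle deletes the file
clearstatcache();  // drop any cached stat result for $path
$existsAfterClose = file_exists($path);

var_dump($existedWhileOpen, $existsAfterClose);
```

A class like TempFile above only has to make sure fclose() runs exactly once, which is why close() and __destruct() need to cooperate.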

Related

Parse command line output using Symfony Process

Within my Symfony application I need to perform several operations on files: list the files in a directory, decrypt them using gpg, parse the output with external software, and encrypt them again.
My first question is: is this the right approach for this problem? In another scenario, I'd have written bash/python scripts to do this, but since the info (user ids, passphrases, etc.) is read from a Symfony API, I thought it was quite convenient to embed the calls into the application.
My second question is more specific: is there any way to efficiently handle the command line output and errors? For instance, when I call 'ls', how can I easily convert the output into an array of file names?
private function decryptAction()
{
$user_data_source = '/Users/myuser/datafiles/';
// Scan directory and get a list of all files
$process = new Process('ls ' . $user_data_source);
try {
$process->mustRun();
$files = explode(' ', $process->getOutput());
return $files;
} catch (ProcessFailedException $e) {
return $e->getMessage();
}
}
Found the answer for my second question, but I am still very interested in your thoughts about the entire approach.
// Scan directory and get a list of all files
$process = new Process('ls -1 ' . $user_data_source);
try {
$process->mustRun();
$files = array_filter( explode("\n", $process->getOutput()), 'strlen');
return $files;
} catch (ProcessFailedException $e) {
return $e->getMessage();
}
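As a variation on the snippet above, the line splitting can be pulled into a small helper. This is only a sketch: splitLines is a hypothetical name, not something Symfony provides. Unlike array_filter(), which keeps the original array keys, preg_split() with PREG_SPLIT_NO_EMPTY returns a sequentially indexed list, and \R also copes with Windows-style line endings.

```php
<?php
/**
 * Hypothetical helper: turn raw command output into a list of non-empty lines.
 */
function splitLines(string $output): array
{
    // \R matches \n, \r\n and \r; PREG_SPLIT_NO_EMPTY drops blank entries.
    return preg_split('/\R/', $output, -1, PREG_SPLIT_NO_EMPTY);
}

// Sample input standing in for $process->getOutput().
$files = splitLines("a.gpg\nb.gpg\r\n\nc.gpg\n");
print_r($files);
```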
Unless you really need an immediate response from the call, this kind of task is better left to a background process.
So what I would do is write one or more Symfony commands that perform the described processes (read, decrypt, and so on).
Those processes can be executed via crontab, or "daemonized" via another scheduler like Supervisord.
Then, the API call only creates some kind of "semaphore" file that triggers the actual execution, or even better you can use some kind of queue system.

Get output of a Symfony command and save it to a file

I'm using Symfony 2.0.
I have created a command in Symfony and I want to take its output and write it to a file.
All I want is to take everything that is written to standard output (on the console) and to have it in a variable. By "everything" I mean things echoed in the command, exceptions caught in other files called by the command, and so on. I want the output both on the screen and in a variable (in order to write the content of the variable to a file). I will write to the file at the end of the execute() method of the command.
Something like this:
protected function execute(InputInterface $input, OutputInterface $output)
{
// some logic and calls to services and functions
echo 'The operation was successful.';
$this->writeLogToFile($file, $output???);
}
And in the file I want to have:
[Output from the calls to other services, if any]
The operation was successful.
Can you please help me?
I tried something like this:
$stream = $output->getStream();
$content = stream_get_contents($stream, 5);
but the command doesn't finish in that way. :(
You could just forward the command output using standard shell methods with php app/console your:command > output.log. Or, if this is not an option, you could introduce a wrapper for the OutputInterface that would write to a stream and then forward calls to the wrapped output.
I needed the same thing. In my case, I wanted to email the console output for debugging and auditing, so I made an anonymous PHP class wrapper which stores the line data and then passes it to the original output instance. This will work only on PHP 7+.
protected function execute(InputInterface $input, OutputInterface $output) {
$loggableOutput = new class {
private $linesData;
public $output;
public function write($data) {
$this->linesData .= $data;
$this->output->write($data);
}
public function writeln($data) {
$this->linesData .= $data . "\n";
$this->output->writeln($data);
}
public function getLinesData() {
return $this->linesData;
}
};
$loggableOutput->output = $output;
//do some work with output
var_dump($loggableOutput->getLinesData());
}
Note this will only store the data written using the write and writeln OutputInterface methods; it will not store any PHP warnings etc.
Sorry for bringing this up again.
I'm in a similar situation and if you browse the code for Symfony versions (2.7 onwards), there already is a solution.
You can easily adapt this to your specific problem:
// use Symfony\Component\Console\Output\BufferedOutput;
// You can use NullOutput() if you don't need the output
$output = new BufferedOutput();
$application->run($input, $output);
// return the output, don't use if you used NullOutput()
$content = $output->fetch();
This should neatly solve the problem.

Fast way of detecting directory changes [duplicate]

We have a PHP application, and were thinking it might be advantageous to have the application know if there was a change in its makeup since the last execution. Mainly due to managing caches and such, and knowing that our applications are sometimes accessed by people who don't remember to clear the cache on changes. (Changing the people is the obvious answer, but alas, not really achievable)
We've come up with this, which is the fastest we've managed to eke out, running an average 0.08 on a developer machine for a typical project. We've experimented with shasum, md5 and crc32, and this is the fastest. We are basically md5'ing the contents of every file, and md5'ing that output. Security isn't a concern; we're just interested in detecting filesystem changes via a differing checksum.
time (find application/ -path '*/.svn' -prune -o -type f -print0 | xargs -0 md5 | md5)
I suppose the question is, can this be optimised any further?
(I realise that pruning svn will have a cost, but find takes the least amount of time out of the components, so it will be pretty minimal. We're testing this on a working copy atm)
You can be notified of filesystem modifications using the inotify extension.
It can be installed with pecl:
pecl install inotify
Or manually (download, phpize && ./configure && make && make install as usual).
This is a raw binding over the Linux inotify syscalls, and is probably the fastest solution on Linux.
See this example of a simple tail implementation: http://svn.php.net/viewvc/pecl/inotify/trunk/tail.php?revision=262896&view=markup
If you want a higher-level library, or support for non-Linux systems, take a look at Lurker.
It works on any system, and can use inotify when it's available.
See the example from the README:
$watcher = new ResourceWatcher;
$watcher->track('an arbitrary id', '/path/to/views');
$watcher->addListener('an arbitrary id', function (FilesystemEvent $event) {
echo $event->getResource() . 'was' . $event->getTypeString();
});
$watcher->start();
Instead of going by file contents, you can use the same technique with filename and timestamps:
find . -name '.svn' -prune -o -type f -printf '%m%c%p' | md5sum
This is much faster than reading and hashing the contents of each file.
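The same metadata-only idea can be sketched in PHP for cases where shelling out to find is not an option. This is an illustrative sketch, not the answerer's code: directoryFingerprint and its $skipDirs parameter are hypothetical names, and it hashes path, size and modification time rather than the exact fields printed by the find command above.

```php
<?php
/**
 * Hypothetical helper: fingerprint a directory tree from paths, sizes and
 * modification times only, without reading any file contents.
 */
function directoryFingerprint(string $root, array $skipDirs = ['.svn']): string
{
    $dirIterator = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
    // Prune skipped directories so they are never descended into.
    $filtered = new RecursiveCallbackFilterIterator(
        $dirIterator,
        function (SplFileInfo $current) use ($skipDirs): bool {
            return !($current->isDir() && in_array($current->getFilename(), $skipDirs, true));
        }
    );

    $entries = [];
    foreach (new RecursiveIteratorIterator($filtered) as $file) {
        if ($file->isFile()) {
            $entries[] = $file->getPathname() . '|' . $file->getSize() . '|' . $file->getMTime();
        }
    }
    sort($entries); // iteration order is not guaranteed, so normalise it

    return md5(implode("\n", $entries));
}

// Usage: the fingerprint changes when a file's mtime changes,
// but not when something changes inside a skipped directory.
$dir = sys_get_temp_dir() . '/fp_' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.txt", 'one');
$before = directoryFingerprint($dir);

touch("$dir/a.txt", time() + 10);
clearstatcache();
$after = directoryFingerprint($dir);

mkdir("$dir/.svn");
file_put_contents("$dir/.svn/entries", 'x');
$afterSvnChange = directoryFingerprint($dir);

// Clean up the scratch directory.
unlink("$dir/.svn/entries");
rmdir("$dir/.svn");
unlink("$dir/a.txt");
rmdir($dir);
```

The trade-off is the usual one for timestamp-based checks: an edit that preserves both size and mtime goes unnoticed.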
Instead of actively searching for changes, why not get notified when something changes? Have a look at PHP's FAM - File Alteration Monitor API
FAM monitors files and directories, notifying interested applications of changes. More information about FAM is available at http://oss.sgi.com/projects/fam/. A PHP script may specify a list of files for FAM to monitor using the functions provided by this extension. The FAM process is started when the first connection from any application to it is opened. It exits after all connections to it have been closed.
CAVEAT: requires an additional daemon on the machine and the PECL extension is unmaintained.
We didn't want to use FAM, since we would need to install it on the server, and that's not always possible. Sometimes clients insist we deploy on their broken infrastructure. Since it's discontinued, it's also hard to get that change approved through the red tape.
The only way to improve the speed of the original version in the question is to make sure your file list is as succinct as possible, i.e. only hash the directories/files that really matter if changed. Cutting out directories that aren't relevant can give big speed boosts.
Past that, the application was using the function to check if there were changes in order to perform a cache clear if there were. Since we don't really want to halt the application while it's doing this, this sort of thing is best farmed out carefully as an asynchronous process using fsockopen. That gives the best 'speed boost' overall; just be careful of race conditions.
Marking this as the 'answer' and upvoting the FAM answer.
since you have svn, why don't you go by revisions? i realise you are skipping svn folders, but i suppose you did that for the speed advantage and that you do not have modified files on your production servers...
that being said, you do not have to reinvent the wheel.
you could speed up the process by only looking at metadata read from the directory indexes (modification timestamp, filesize, etc.)
you could also stop once you have spotted a difference (should theoretically reduce the time by half on average) etc. there is a lot.
i honestly think the best way in this case is to just use the tools already available.
the linux tool diff has a -q option (quick).
you will need to use it with the recursive parameter -r as well.
diff -r -q dir1/ dir2/
it uses a lot of optimisations and i seriously doubt you can significantly improve upon it without considerable effort.
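The early-exit idea mentioned in this answer (compare metadata and stop at the first difference) could look roughly like this in PHP. It is a sketch under assumptions, not the answerer's code: filesUnder, takeSnapshot and hasChanged are hypothetical names, and the snapshot format (path mapped to "size|mtime") is chosen purely for illustration.

```php
<?php
/** Hypothetical helper: yield every regular file below $root. */
function filesUnder(string $root): Generator
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile()) {
            yield $file;
        }
    }
}

/** Hypothetical helper: record [path => "size|mtime"] for later comparison. */
function takeSnapshot(string $root): array
{
    $snapshot = [];
    foreach (filesUnder($root) as $file) {
        $snapshot[$file->getPathname()] = $file->getSize() . '|' . $file->getMTime();
    }
    return $snapshot;
}

/** Compare against a snapshot and stop at the first difference found. */
function hasChanged(string $root, array $snapshot): bool
{
    $seen = 0;
    foreach (filesUnder($root) as $file) {
        $path = $file->getPathname();
        $stamp = $file->getSize() . '|' . $file->getMTime();
        if (!isset($snapshot[$path]) || $snapshot[$path] !== $stamp) {
            return true; // new or modified file: no need to look further
        }
        $seen++;
    }
    // Every file matched; a lower count means something was deleted.
    return $seen !== count($snapshot);
}

// Usage on a scratch directory.
$dir = sys_get_temp_dir() . '/snap_' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.txt", 'one');
$snapshot = takeSnapshot($dir);
$changedBefore = hasChanged($dir, $snapshot);

file_put_contents("$dir/b.txt", 'two');
clearstatcache();
$changedAfter = hasChanged($dir, $snapshot);

unlink("$dir/a.txt");
unlink("$dir/b.txt");
rmdir($dir);
```

The early return only saves time when something actually changed; an unchanged tree still has to be walked in full.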
Inotify is definitely what you should be using: it's fast and easy to configure, and you have multiple options, either directly from bash or PHP, or by dedicating a simple node-inotify instance to this task.
Inotify does not work on Windows, but you can easily write a command-line application with FileSystemWatcher or FindFirstChangeNotification and call it via exec.
If you are looking for a PHP-only solution, it's pretty difficult, and you might not get the performance you want, because the only way is to scan the folder continuously.
Here is a Simple Experiment
Don't use in production
Can not manage large file set
Does not support file monitoring
Only Support NEW , DELETED and MODIFIED
Does not support Recursion
Example
if (php_sapi_name() !== 'cli')
die("CLI ONLY");
date_default_timezone_set("America/Los_Angeles");
$sm = new Monitor(__DIR__ . "/test");
// Add hook when new files are created
$sm->hook(function ($file) {
// Send a mail or log to a file
printf("#EMAIL NEW FILE %s\n", $file);
}, Monitor::NOTIFY_NEW);
// Add hook when files are Modified
$sm->hook(function ($file) {
// Do something meaningful like calling curl
printf("#HTTP POST MODIFIED FILE %s\n", $file);
}, Monitor::NOTIFY_MODIFIED);
// Add hook when files are Deleted
$sm->hook(function ($file) {
// Crazy ... Send SMS fast or IVR the Boss that you messed up
printf("#SMS DELETED FILE %s\n", $file);
}, Monitor::NOTIFY_DELETED);
// Start Monitor
$sm->start();
Cache Used
// Simple Cache
// Can be replaced with Memcache
class Cache {
public $f;
function __construct() {
$this->f = fopen("php://temp", "w+");
}
function get($k) {
rewind($this->f);
return json_decode(stream_get_contents($this->f), true);
}
function set($k, $data) {
// Truncate first so a shorter payload leaves no stale bytes behind
ftruncate($this->f, 0);
rewind($this->f);
fwrite($this->f, json_encode($data));
return $k;
}
function run() {
}
}
The Experiment Class
// The Experiment
class Monitor {
private $dir;
private $info;
private $timeout = 1; // sec
private $timeoutStat = 60; // sec
private $cache;
private $current, $stable, $hook = array();
const NOTIFY_NEW = 1;
const NOTIFY_MODIFIED = 2;
const NOTIFY_DELETED = 4;
const NOTIFY_ALL = 7;
function __construct($dir) {
$this->cache = new Cache();
$this->dir = $dir;
$this->info = new SplFileInfo($this->dir);
$this->scan(true);
}
public function start() {
$i = 0;
$this->stable = (array) $this->cache->get(md5($this->dir));
while(true) {
// Clear System Cache at Intervals
if ($i % $this->timeoutStat == 0) {
clearstatcache();
}
$this->scan(false);
if ($this->stable != $this->current) {
$this->cache->set(md5($this->dir), $this->current);
$this->stable = $this->current;
}
sleep($this->timeout);
$i ++;
// printf("Memory Usage : %0.4f \n", memory_get_peak_usage(false) /
// 1024);
}
}
private function scan($new = false) {
$rdi = new FilesystemIterator($this->dir, FilesystemIterator::SKIP_DOTS);
$this->current = array();
foreach($rdi as $file) {
// Skip files that are not readable
if (! $file->isReadable())
continue;
$path = addslashes($file->getRealPath());
$keyHash = md5($path);
$fileHash = $file->isFile() ? md5_file($path) : "#";
$hash["t"] = $file->getMTime();
$hash["h"] = $fileHash;
$hash["f"] = $path;
$this->current[$keyHash] = json_encode($hash);
}
if ($new === false) {
$this->process();
}
}
public function hook(Callable $call, $type = Monitor::NOTIFY_ALL) {
$this->hook[$type][] = $call;
}
private function process() {
if (isset($this->hook[self::NOTIFY_NEW])) {
$diff = array_flip(array_diff(array_keys($this->current), array_keys($this->stable)));
$this->notify(array_intersect_key($this->current, $diff), self::NOTIFY_NEW);
unset($diff);
}
if (isset($this->hook[self::NOTIFY_DELETED])) {
$deleted = array_flip(array_diff(array_keys($this->stable), array_keys($this->current)));
$this->notify(array_intersect_key($this->stable, $deleted), self::NOTIFY_DELETED);
}
if (isset($this->hook[self::NOTIFY_MODIFIED])) {
$this->notify(array_diff_assoc(array_intersect_key($this->stable, $this->current), array_intersect_key($this->current, $this->stable)), self::NOTIFY_MODIFIED);
}
}
private function notify(array $files, $type) {
if (empty($files))
return;
foreach($this->hook as $t => $hooks) {
if ($t & $type) {
foreach($hooks as $hook) {
foreach($files as $file) {
$info = json_decode($file, true);
$hook($info['f'], $type);
}
}
}
}
}
}

How can I go to the nth line without using fgets() & file()?

I'm currently developing a class which allows me to open a file and read it line by line.
class File
{
protected $path = null;
protected $cursor = null;
protected $lineCount = 0;
public function isOpen()
{
return !is_null($this->cursor);
}
public function open($flag = 'r')
{
if(!$this->isOpen())
$this->cursor = fopen($this->path, $flag);
}
public function getLine()
{
$this->open();
$line = fgets($this->cursor);
$this->lineCount++;
return $line;
}
public function close()
{
if($this->isOpen())
fclose($this->cursor);
}
}
For some reason, I would like the file to open at the line given by the lineCount property. I don't know how to update the open() method to do that.
Instead of using the line count, I could use the offset in bytes from the beginning of the file and use fseek() to move the cursor to the right place. But I don't know how to get the size of a line in bytes when I call fgets().
Thanks
Given that a text file can have any amount of text in a line, there's no 100% reliable method for quickly jumping to a position. Unless the text format is exactly fixed and known, you'll have to read line by line until you reach the line number you want.
If the file doesn't change between sessions, you can store the 'pointer' in the file using ftell() (basically how far into the file you've read), and later jump to that position via fseek(). You could also have your getLine method store the offsets as it reads each line, so you build an array of lines/offsets as you go. This'd let you jump backwards in the file to any arbitrary position. It would not, however, let you jump 'forward' into unknown parts of the file.
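The offset bookkeeping described above can be sketched as follows. This is a minimal illustration, not a drop-in replacement for the asker's File class: IndexedLineReader and seekToLine are hypothetical names. It also covers the "size of a line" question indirectly, since ftell() after each fgets() gives the byte position where the next line starts.

```php
<?php
/**
 * Hypothetical reader that records the byte offset at which each line starts,
 * so a line that has already been read can be revisited with fseek().
 */
class IndexedLineReader
{
    private $handle;
    private $offsets = [0]; // offsets[n] = byte position where line n starts
    private $nextLine = 0;

    public function __construct(string $path)
    {
        $this->handle = fopen($path, 'r');
        if ($this->handle === false) {
            throw new RuntimeException("Cannot open $path");
        }
    }

    /** Read the next line, or return false at end of file. */
    public function getLine()
    {
        $line = fgets($this->handle);
        if ($line !== false) {
            $this->nextLine++;
            $this->offsets[$this->nextLine] = ftell($this->handle);
        }
        return $line;
    }

    /** Jump to a line whose offset is already known (0-based). */
    public function seekToLine(int $lineNumber): void
    {
        if (!isset($this->offsets[$lineNumber])) {
            throw new OutOfRangeException("Line $lineNumber has not been reached yet");
        }
        fseek($this->handle, $this->offsets[$lineNumber]);
        $this->nextLine = $lineNumber;
    }

    public function __destruct()
    {
        if (is_resource($this->handle)) {
            fclose($this->handle);
        }
    }
}

// Usage: read three lines, then jump back to the second one.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path, "alpha\nbeta\ngamma\n");

$reader = new IndexedLineReader($path);
$reader->getLine();
$reader->getLine();
$reader->getLine();
$reader->seekToLine(1);
$again = $reader->getLine();

unset($reader); // close the handle before removing the file
unlink($path);
```

To resume in a later session, the offsets array (or just the last ftell() value) would have to be persisted somewhere, and it is only valid as long as the file has not changed.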

