php://memory & php://temp; preserving stream data on subsequent handle creation

This question is closely related to my new findings regarding a previous question.
Is there any way to preserve the in-stream data of php://memory or php://temp between handles? I read (somewhere I can't source offhand) that subsequent openings of the aforementioned streams clear existing data.
$mem1 = fopen('php://memory', 'r+');
fwrite($mem1, 'hello world');
rewind($mem1);
fpassthru($mem1); // "hello world"
$mem2 = fopen('php://memory', 'r+');
rewind($mem2);
fpassthru($mem2); // empty
So again my question is: is there any way to force existing data to persist in the stream when creating a new handle to it?
(The latter call to fpassthru() would of course dump "hello world" if this were possible.)

Opening one of the pseudo-streams php://temp or php://memory always opens a new stream, which means that every stream you open this way is unique. So you can't read, through one handle, the content you have previously written through another.

If you need an in-memory virtual stream that persists data, you can use https://github.com/mikey179/vfsStream. Although it's mainly used for testing I/O operations, it should fulfil your requirements: it stores data within internal objects that are identified by virtual URLs, so you can access the same in-memory data by accessing the same URL.
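For example, a minimal sketch (assuming vfsStream is installed via Composer and autoloading is already set up):
use org\bovigo\vfs\vfsStream;

// Register the vfs:// wrapper with a root directory named 'root'.
vfsStream::setup('root');

// Write through one handle...
file_put_contents(vfsStream::url('root/data.txt'), 'hello world');

// ...and read the same in-memory data later through a completely separate handle.
echo file_get_contents(vfsStream::url('root/data.txt')); // "hello world"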

The handles are unique, so you'll have to pass the handle around, or (god forbid) keep it in a global:
$GLOBALS['my_global_memory_stream'] = fopen('php://memory', 'r+');
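For completeness, a small illustration (the function name here is made up) of reusing that single global handle later in the same request:
// Continuing from the global handle above: write once, read it back elsewhere.
fwrite($GLOBALS['my_global_memory_stream'], 'hello world');

function dump_shared_stream() {
    rewind($GLOBALS['my_global_memory_stream']);
    fpassthru($GLOBALS['my_global_memory_stream']); // prints "hello world"
}
dump_shared_stream();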

Related

Innocent PHP function that returns a resource?

Some PHP functions, like fopen(), have a return value of type "resource".
However, most of these functions require some actual outside resource, such as a file or a database, or they require an additional PHP extension to be installed, such as curl_init() from the curl extension.
I sometimes want to experiment with different value types on https://3v4l.org, where I cannot rely on external resources.
Another scenario where this might be relevant is unit tests, where we generally want as few side effects as possible.
So, what is the simplest way to get a value of type resource, without external side effects, 3rd party extensions, or external dependencies?
I use fopen('php://memory', 'w'); or fopen('php://temp', 'w'); when I just need a file stream resource to play with.
php://temp is better if the buffer will exceed 2 MB, because it transparently switches to a temporary file once the data grows past that point.
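As a rough sketch of the difference (the 2 MB threshold is the documented default for php://temp and can be tuned with the maxmemory option, given in bytes):
$small = fopen('php://memory', 'w+');                 // always kept in RAM
$large = fopen('php://temp/maxmemory:5242880', 'w+'); // spills to a temp file once it exceeds 5 MB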
You can use php://memory or php://temp as a resource. The first one doesn't even need access to the system /tmp folder.
Example:
$resource = fopen('php://temp', 'w+');
The best I've come up with so far is tmpfile().
It does work on https://3v4l.org/00VlY; they have probably set up some kind of sandboxed filesystem.
$resource = tmpfile();
var_dump(gettype($resource));
var_dump($resource);
var_dump(intval($resource));
I would say it is still not completely free of side effects, because it does something with a file somewhere. Better ideas are welcome!

Broadcast stream with PHP within localhost

Maybe I'm asking the impossible, but I want to clone a stream multiple times: a sort of multicast emulation. The idea is to write a 1300-byte buffer into a .sock file every 0.002 seconds (instead of using IP:port, to avoid overhead) and then to read the same .sock file from other scripts multiple times.
Doing it through a regular file is not doable. It works only within the same script that generates the buffer file and then echoes it. The other scripts will misread it badly.
This works perfectly with the script that generates the chunks:
$handle = @fopen($url, 'rb');
$buffer = 1300;
while (1) {
    $chunk = fread($handle, $buffer);
    $handle2 = fopen('/var/tmp/stream_chunck.tmp', 'w');
    fwrite($handle2, $chunk);
    fclose($handle2);
    readfile('/var/tmp/stream_chunck.tmp');
}
BUT the output of another script that reads the chunks:
while (1) {
    readfile('/var/tmp/stream_chunck.tmp');
}
is messy. I don't know how to synchronize the reading of the chunks, and I thought that sockets might work a miracle.
It works only within the same script that generates the buffer file and then echoes it. The other scripts will misread it badly
Using a single file without any sort of flow control shouldn't be a problem - tail -F does just that. The disadvantage is that the data will just accumulate indefinitely on the filesystem as long as a single client has an open file handle (even if you truncate the file).
But if you're writing chunks, then write each chunk to a different file (using an atomic write mechanism, one example of which is sketched after the loop below); then everyone can read it by polling for available files:
do {
    while (!file_exists("$dir/$prefix.$current_chunk")) {
        clearstatcache();
        usleep(1000);
    }
    process(file_get_contents("$dir/$prefix.$current_chunk"));
    $current_chunk++;
} while (!$finished);
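For the writer side, one possible atomic-write approach (a sketch only, reusing $dir, $prefix, the source $handle from the question's generator script and its 1300-byte chunk size) is to write each chunk under a private temporary name and then rename() it into place, since rename() is atomic within the same filesystem:
$chunk_index = 0;
while (!feof($handle)) {
    $chunk = fread($handle, 1300);
    if ($chunk === false || $chunk === '') {
        continue;                                  // nothing new from the source yet
    }
    $tmp = "$dir/$prefix.tmp";
    file_put_contents($tmp, $chunk);               // write somewhere readers never look
    rename($tmp, "$dir/$prefix.$chunk_index");     // atomic publish on the same filesystem
    $chunk_index++;
}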
Equally, you could do this with a database - which should have slightly lower overhead for the polling, and simplifies the garbage collection of old chunks.
But this is all about how to make your solution workable - it doesn't really address the problem you are trying to solve. If we knew what you were trying to achieve then we might be able to advise on a more appropriate solution - e.g. if it's a chat application, video broadcast, something else....
I suspect a more appropriate solution would be a multi-processing, single-memory-model server - and when we're talking about PHP (which doesn't really do threading very well) that means an event-based/asynchronous server. There's a bit more involved than simply calling socket_select(), but there are some good scripts available which do most of the complicated stuff for you.
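As a very rough illustration only (the socket path, the upstream URL and the 1300-byte chunk size are placeholders, and there is no error handling), a single-process broadcaster built around stream_select() might look like this:
// One process owns the upstream and fans each chunk out to every connected client.
@unlink('/var/tmp/broadcast.sock');
$server  = stream_socket_server('unix:///var/tmp/broadcast.sock', $errno, $errstr);
$source  = fopen('http://example.com/stream', 'rb');   // the stream being cloned
$clients = array();

while (true) {
    // Poll for new client connections without blocking the broadcast loop.
    $read = array_merge(array($server), $clients);
    $write = $except = null;
    if (stream_select($read, $write, $except, 0, 2000) === false) {
        break;
    }
    if (in_array($server, $read, true)) {
        $clients[] = stream_socket_accept($server);    // new subscriber
    }

    $chunk = fread($source, 1300);
    if ($chunk === false || $chunk === '') {
        continue;                                      // nothing new from upstream yet
    }
    foreach ($clients as $i => $client) {
        if (@fwrite($client, $chunk) < strlen($chunk)) {
            fclose($client);                           // drop clients that went away or can't keep up
            unset($clients[$i]);
        }
    }
}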

Processing huge yaml-files via php

I need to process a huge YAML file - 450 MB - to get the data into a database, so I tried to use "spyc". But the file is too big for that.
Every chapter begins with the line --- !de.db.net,DB::Util::M10lDocument, and I need the content of every chapter as an array. I don't know how to split the file into those chapters.
Is it possible to read the complete file just block by block?
Does anyone have an idea how to work with that big file?
--- is the document boundary marker for a YAML stream. Using a YAML parser that processes the file as a stream should allow you to process the file in document sized chunks as long as each document is small enough to fit in available memory.
The yaml_parse_file function provided by the yaml PECL extension includes the ability to parse a single document out of a stream of documents. There is no built-in method to iterate over the documents (e.g. foreach support), but you could implement your own loop that fetches sequential documents and halts when yaml_parse_file() returns false, indicating that the requested document was not found.
<?php
$docNum = 0;
while (false !== ($doc = yaml_parse_file('example.yaml', $docNum))) {
    var_dump($doc);
    $docNum++;
}
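Since yaml_parse_file() takes a filename, each call opens and scans the file again; if that proves too slow for a 450 MB stream, another approach (a sketch only, assuming --- only appears at the start of a line when it begins a new document) is to split the stream yourself and hand each document to yaml_parse():
<?php
$fh = fopen('example.yaml', 'r');
$buffer = '';
while (($line = fgets($fh)) !== false) {
    // A line starting with "---" begins the next document; flush the previous one.
    if (strncmp($line, '---', 3) === 0 && trim($buffer) !== '') {
        $doc = yaml_parse($buffer);
        // ... insert $doc into the database here ...
        $buffer = '';
    }
    $buffer .= $line;
}
if (trim($buffer) !== '') {
    $doc = yaml_parse($buffer);   // the final document in the file
    // ... insert $doc into the database here ...
}
fclose($fh);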

uploading large object to Cloudfiles returns different md5

So I have this code and I'm trying to upload large files as per https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/Storage/Object.md to Rackspace:
$src_path = 'pathtofile.zip'; //about 700MB
$md5_checksum = md5_file($src_path); //result is f210775ccff9b0e4f686ea49ac4932c2
$trans_opts = array(
    'name' => $md5_checksum,
    'concurrency' => 6,
    'partSize' => 25000000
);
$trans_opts['path'] = $src_path;
$transfer = $container->setupObjectTransfer($trans_opts);
$response = $transfer->upload();
Which allegedly uploads the file just fine
However when I try to download the file as recommended here https://github.com/rackspace/php-opencloud/blob/master/docs/userguide/ObjectStore/USERGUIDE.md:
$name = 'f210775ccff9b0e4f686ea49ac4932c2';
$object = $container->getObject($name);
$objectContent = $object->getContent();
$pathtofile = 'destinationpathforfile.zip';
$objectContent->rewind();
$stream = $objectContent->getStream();
file_put_contents($pathtofile, $stream);
$md5 = md5_file($pathtofile);
The result of md5_file() ends up being different from 'f210775ccff9b0e4f686ea49ac4932c2'; moreover, the downloaded zip ends up being unopenable/corrupted.
What did I do wrong?
It's recommended that you only use multipart uploads for files over 5 GB. For files under this threshold, you can use the normal uploadObject method.
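For a 700 MB file that would look roughly like this (a sketch based on the library's documented uploadObject() method, reusing the variables from the question; passing an open stream avoids loading the whole file into memory):
$handle = fopen($src_path, 'r');
$object = $container->uploadObject($md5_checksum, $handle);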
When you use the transfer builder, it segments your large file into smaller segments (you provide the part size) and concurrently uploads each one. When this process has finished, a manifest file is created which contains a list of all these segments. When you download the manifest file, it collates them all together, effectively pretending to be the big file itself. But it's just really an organizer.
To get back to answering your question, the ETag header of a manifest file is not calculated the way you might think. What you're currently doing is taking the MD5 checksum of the entire 700 MB file and comparing it against the MD5 checksum of the manifest file. But these aren't comparable. To quote the documentation:
the ETag header is calculated by taking the ETag value of each segment, concatenating them together, and then returning the MD5 checksum of the result.
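So if you want to verify the transfer locally, you would have to reproduce that calculation rather than hash the whole file. A rough sketch, under the assumption that the segments are consecutive partSize-byte slices of the source file:
$partSize = 25000000;                          // must match the 'partSize' you uploaded with
$fh = fopen($src_path, 'rb');
$etags = '';
while (!feof($fh)) {
    $segment = stream_get_contents($fh, $partSize);
    if ($segment === '' || $segment === false) {
        break;
    }
    $etags .= md5($segment);                   // each segment's ETag is its MD5 checksum
}
fclose($fh);
$expectedEtag = md5($etags);                   // MD5 of the concatenated segment ETags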
There are also downsides to using this DLO operation that you need to be aware of:
End-to-end integrity is not assured. The eventual consistency model means that although you have uploaded a segment object, it might not appear in the container list immediately. If you download the manifest before the object appears in the container, the object will not be part of the content returned in response to a GET request.
If you think there's been an error in transmission, perhaps it's because an HTTP request failed along the way. You can use retry strategies (using the backoff plugin) to retry failed requests.
You can also turn on HTTP logging to check every network transaction to help with debugging. Be careful, though: doing the above will echo the HTTP request body (>25 MB) to STDOUT. You might want to use this instead:
use Guzzle\Plugin\Log\LogPlugin;
use Guzzle\Log\ClosureLogAdapter;

$stream = fopen('php://output', 'w');

$logSubscriber = new LogPlugin(new ClosureLogAdapter(function ($m) use ($stream) {
    fwrite($stream, $m . PHP_EOL);
}), "# Request:\n{url} {method}\n\n# Response:\n{code} {phrase}\n\n# Connect time: {connect_time}\n\n# Total time: {total_time}", false);

$client->addSubscriber($logSubscriber);
As you can see, you're using a template to dictate what's outputted. There's a full list of template variables here.

Best way to store an image from a url in php?

I would like to know the best way to save an image from a URL in php.
At the moment I am using
file_put_contents($pk, file_get_contents($PIC_URL));
which is not ideal. I am unable to use curl. Is there a method specifically for this?
Using file_get_contents is fine, unless the file is very large. In that case, you don't really need to be holding the entire thing in memory.
For a large retrieval, you could fopen() the remote file, fread() it, say, 32 KB at a time, and fwrite() it locally in a loop until the whole file has been read.
For example:
$fout = fopen('/tmp/verylarge.jpeg', 'w');
$fin = fopen("http://www.example.com/verylarge.jpeg", "rb");
while (!feof($fin)) {
    $buffer = fread($fin, 32*1024);
    fwrite($fout, $buffer);
}
fclose($fin);
fclose($fout);
(Devoid of error checking for simplicity!)
Alternatively, you could forego using the url wrappers and use a class like PEAR's HTTP_Request, or roll your own HTTP client code using fsockopen etc. This would enable you to do efficient things like send If-Modified-Since headers if you are maintaining a cache of remote files.
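A bare-bones sketch of that idea (hypothetical host, path and cache file; no redirects, chunked encoding or error handling):
$fp = fsockopen('www.example.com', 80, $errno, $errstr, 10);
if ($fp) {
    // Ask the server to skip the body if our cached copy is still current.
    $since = gmdate('D, d M Y H:i:s', filemtime('/tmp/cached.jpeg')) . ' GMT';
    fwrite($fp, "GET /verylarge.jpeg HTTP/1.0\r\n"
              . "Host: www.example.com\r\n"
              . "If-Modified-Since: $since\r\n"
              . "Connection: close\r\n\r\n");
    $response = '';
    while (!feof($fp)) {
        $response .= fread($fp, 8192);   // headers plus body (a 304 has no body)
    }
    fclose($fp);
    // Inspect the status line: "HTTP/1.0 304 Not Modified" means keep the cached file.
}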
I'd recommend using Paul Dixon's strategy, but replacing fopen with fsockopen(). The reason is that some server configurations disallow URL access for fopen() and file_get_contents(). The setting may be found in php.ini and is called allow_url_fopen.
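If you're unsure which configuration you're running under, you can check that setting at runtime before choosing a strategy:
if (ini_get('allow_url_fopen')) {
    // URL wrappers are available: fopen()/file_get_contents() on http:// URLs will work.
} else {
    // Fall back to fsockopen() (or another HTTP client) as suggested above.
}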
