I use a PHP script to upload daily videos to a YouTube channel (based on this code sample: https://developers.google.com/youtube/v3/code_samples/php#resumable_uploads).
The problem is after this loop:
// Read the media file and upload it chunk by chunk.
$status = false;
$handle = fopen($videoPath, "rb");
while (!$status && !feof($handle)) {
    $chunk = fread($handle, $chunkSizeBytes);
    $status = $media->nextChunk($chunk);
}
Normally the $status variable holds the video id ($status['id']) after the upload completes, but since mid-January every upload has failed with one of these errors:
The $status variable remains false.
Inside the catch block, the Google_Service_Exception message reads: "A service error occurred: Error calling PUT https://www.googleapis.com/upload/youtube/v3/videos?part=status%2Csnippet&uploadType=resumable&upload_id=xxx: (400) Invalid request. The number of bytes uploaded is required to be equal or greater than 262144, except for the final request (it's recommended to be the exact multiple of 262144). The received request contained nnn bytes, which does not meet this requirement.", where nnn is less than 262144 and seems to correspond to the last request.
When I access the YouTube channel I can see the new videos with the status "Preparing upload" or stuck at a fixed percentage.
My code has not changed for months, but now I can't upload any new video.
Can anyone help me figure out what's wrong? Thanks in advance!
The solution proposed by @pom (thanks by the way) doesn't really solve this issue if you need to implement a progress indicator.
I'm facing the same problem as @astor; after calling ->nextChunk for the final chunk I get:
The number of bytes uploaded is required to be equal or greater than
262144, except for the final request (it's recommended to be the exact
multiple of 262144). The received request contained 38099 bytes,
which does not meet this requirement.
See this log file:
The code is copy-pasted from the google/google-api-php-client docs.
The log file shows the size of the first chunk being slightly larger than the others (except the last one), which I can't explain. Another "strange" thing is that its size changes from test to test.
The size of the last chunk seems correct, and by the end all bytes should have been uploaded.
However, uploading the remaining bytes in the last chunk with ->nextChunk($chunk) throws this error.
One important detail: my source file is on AWS S3, and the file operations (filesize, fread, fopen) are done through the Amazon S3 Stream Wrapper. It may add some headers, or something else that causes the problem.
EDIT: I don't have this problem with local files.
Has anyone run into the same problem?
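For anyone debugging the same setup: a quick way to confirm whether the stream wrapper is the culprit is to log how many bytes each fread() actually returns. This is only a diagnostic sketch of mine (the bucket and key names are placeholders, and it assumes the AWS SDK, aws/aws-sdk-php, is installed):

require 'vendor/autoload.php';

$s3 = new Aws\S3\S3Client(['region' => 'us-east-1', 'version' => 'latest']);
$s3->registerStreamWrapper();

$handle = fopen('s3://my-bucket/video.mp4', 'rb');
$requested = 5 * 1024 * 1024;
while (!feof($handle)) {
    $chunk = fread($handle, $requested);
    // On network-backed streams fread() may return fewer bytes than requested,
    // which produces undersized (non-262144-multiple) upload chunks.
    printf("requested %d, got %d\n", $requested, strlen($chunk));
}
fclose($handle);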
...
$chunkSizeBytes = 5 * 1024 * 1024;
$client->setDefer(true);
$request = $service->files->create($file);
$media = new Google_Http_MediaFileUpload(
    $client,
    $request,
    'text/plain',
    null,
    true,
    $chunkSizeBytes
);
$media->setFileSize(filesize(TESTFILE));
// Upload the various chunks. $status will be false until the process is
// complete.
$status = false;
$handle = fopen(TESTFILE, "rb");
while (!$status && !feof($handle)) {
    // read until you get $chunkSizeBytes from TESTFILE
    // fread will never return more than 8192 bytes if the stream is read
    // buffered and it does not represent a plain file
    // An example of a read buffered file is when reading from a URL
    $chunk = readVideoChunk($handle, $chunkSizeBytes);
    $status = $media->nextChunk($chunk);
}
// The final value of $status will be the data from the API for the object
// that has been uploaded.
$result = false;
if ($status != false) {
    $result = $status;
}
fclose($handle);
}
function readVideoChunk($handle, $chunkSize)
{
    $byteCount = 0;
    $giantChunk = "";
    while (!feof($handle)) {
        // fread will never return more than 8192 bytes if the stream is read
        // buffered and it does not represent a plain file
        $chunk = fread($handle, 8192);
        $byteCount += strlen($chunk);
        $giantChunk .= $chunk;
        if ($byteCount >= $chunkSize) {
            return $giantChunk;
        }
    }
    return $giantChunk;
}
Maybe you can try it like this, and then let MediaFileUpload cut the chunks itself:
$media = new \Google_Http_MediaFileUpload(
    $client,
    $insertRequest,
    'video/*',
    file_get_contents($pathToFile), // <== pass the file content instead of null
    true,
    $chunkSizeBytes
);
$media->setFileSize($size);
$status = false;
while (!$status) {
    $status = $media->nextChunk();
}
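If you still need the progress indicator mentioned above, Google_Http_MediaFileUpload exposes a getProgress() method that reports how many bytes the API has acknowledged so far. A sketch of how it could be combined with this variant (my addition, not part of the original answer):

$status = false;
while (!$status) {
    $status = $media->nextChunk();
    // getProgress() returns the byte offset confirmed by the server so far
    printf("Uploaded %d of %d bytes\n", $media->getProgress(), $size);
}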
In the previous answer from @eveyrat everything is okay except one thing: readVideoChunk() doesn't always return exactly the number of bytes required by the chunked-upload docs. Moreover, Google accepts only as many bytes per request as were sent in the first chunk request. I fixed that issue by custom buffering of the overlapping bytes:
(Consider the following piece of code as a method of a class; the class must have a $remaining property, initialized to an empty string.)
function readVideoChunk($handle, $chunkSize)
{
    $byteCount = 0;
    $giantChunk = "";
    while (!feof($handle)) {
        // fread will never return more than 8192 bytes if the stream is read
        // buffered and it does not represent a plain file
        $chunk = fread($handle, 8192);
        $byteCount += strlen($chunk);
        $giantChunk .= $chunk;
        if ($byteCount > $chunkSize) {
            break;
        }
    }
    $chunks = str_split($this->remaining . $giantChunk, $chunkSize);
    if (count($chunks) > 1) {
        $giantChunk = $chunks[0];
        // keep every byte beyond the first full chunk, not just the second
        // piece, or data is lost whenever the buffer spans three pieces
        $this->remaining = implode('', array_slice($chunks, 1));
    } else {
        $giantChunk = $chunks[0];
        $this->remaining = '';
    }
    return $giantChunk;
}
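One caveat with the loop that consumes this method (my observation, not from the original answer): if feof() becomes true while $remaining still holds buffered bytes, a plain !feof($handle) loop condition exits early and silently drops the tail of the file. A sketch of a loop that also drains the buffer, assuming the method above lives on a $reader object with a hypothetical getRemaining() accessor:

$status = false;
$handle = fopen($videoPath, 'rb');
while (!$status && (!feof($handle) || $reader->getRemaining() !== '')) {
    // readVideoChunk() keeps flushing $remaining even after EOF is reached
    $chunk = $reader->readVideoChunk($handle, $chunkSizeBytes);
    $status = $media->nextChunk($chunk);
}
fclose($handle);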
I want to play video from a remote server, so I wrote this function:
$remoteFile = 'blabla.com/video_5GB.mp4';
play($remoteFile);
function play($url){
    ini_set('memory_limit', '1024M');
    set_time_limit(3600);
    ob_start();
    $opts = array();
    if (isset($_SERVER['HTTP_RANGE'])) $opts['http']['header'] = "Range: " . $_SERVER['HTTP_RANGE'];
    $opts['http']['method'] = "HEAD";
    $conh = stream_context_create($opts);
    $opts['http']['method'] = "GET";
    $cong = stream_context_create($opts);
    $out[] = file_get_contents($url, false, $conh);
    $out[] = $http_response_header;
    ob_end_clean();
    array_map("header", $http_response_header);
    readfile($url, false, $cong);
}
The above function works very well for playing videos, but I don't want to burden the remote server.
My question is: how can I cache the video files on my server, refreshing them every 5 hours? If possible, the cache folder should contain small files (5MB/10MB chunks) of the remote video.
As mentioned in my comment, the following code has been tested only on a small selection of MP4 files. It could probably do with some more work, but it fills your immediate needs as it is.
It uses exec() to spawn a separate process that generates the cache files when they are needed, i.e. on the first request or after 5 hours. Each video must have its own cache folder because the cached chunks are simply called 1, 2, 3, etc. Please see additional comments in the code.
play.php - This is the script that will be called by the users from the browser
<?php
ini_set('memory_limit', '1024M');
set_time_limit(3600);

$remoteFile = 'blabla.com/video_5GB.mp4';
play($remoteFile);

/**
 * @param string $url
 *
 * This will serve the video from the remote url
 */
function playFromRemote($url)
{
    ob_start();
    $opts = array();
    if(isset($_SERVER['HTTP_RANGE']))
    {
        $opts['http']['header'] = "Range: ".$_SERVER['HTTP_RANGE'];
    }
    $opts['http']['method'] = "HEAD";
    $conh = stream_context_create($opts);
    $opts['http']['method'] = "GET";
    $cong = stream_context_create($opts);
    $out[] = file_get_contents($url, false, $conh);
    $out[] = $http_response_header;
    ob_end_clean();
    $fh = fopen('response.log', 'a');
    if($fh !== false)
    {
        fwrite($fh, print_r($http_response_header, true)."\n\n\n\n");
        fclose($fh);
    }
    array_map("header", $http_response_header);
    readfile($url, false, $cong);
}
/**
 * @param string $cacheFolder Directory in which to find the cached chunk files
 * @param string $url
 *
 * This will serve the video from the cache, it uses a "completed.log" file which holds the byte ranges of each chunk
 * this makes it easier to locate the first chunk of a range request. The file is generated by the cache script
 */
function playFromCache($cacheFolder, $url)
{
    $bytesFrom = 0;
    $bytesTo = 0;
    if(isset($_SERVER['HTTP_RANGE']))
    {
        //the client asked for a specific range, extract those from the http_range server var
        //can take the form "bytes=123-567" or just a from "bytes=123-"
        $matches = array();
        if(preg_match('/^bytes=(\d+)-(\d+)?$/', $_SERVER['HTTP_RANGE'], $matches))
        {
            $bytesFrom = intval($matches[1]);
            if(!empty($matches[2]))
            {
                $bytesTo = intval($matches[2]);
            }
        }
    }
    //completed log is a json_encoded file containing an array of byte ranges that directly
    //correspond with the chunk files generated by the cache script
    $log = json_decode(file_get_contents($cacheFolder.DIRECTORY_SEPARATOR.'completed.log'));
    $totalBytes = 0;
    $chunk = 0;
    foreach($log as $ind => $bytes)
    {
        //find the first chunk file we need to open
        if($bytes[0] <= $bytesFrom && $bytes[1] > $bytesFrom)
        {
            $chunk = $ind + 1;
        }
        //and while we are at it save the last byte range "to" which is the total number of bytes of all the chunk files
        $totalBytes = $bytes[1];
    }
    if($bytesTo === 0)
    {
        if($totalBytes === 0)
        {
            //if we get here then something is wrong with the cache, revert to serving from the remote
            playFromRemote($url);
            return;
        }
        $bytesTo = $totalBytes - 1;
    }
    //calculate how many bytes will be returned in this request
    $contentLength = $bytesTo - $bytesFrom + 1;

    //send some headers - I have hardcoded MP4 here because that is all I have developed with
    //if you are using different video formats then testing and changes will no doubt be required
    header('Content-Type: video/mp4');
    header('Content-Length: '.$contentLength);
    header('Accept-Ranges: bytes');

    //Send a header so we can recognise that the content was indeed served by the cache
    header('X-Cached-Date: '.(date('Y-m-d H:i:s', filemtime($cacheFolder.DIRECTORY_SEPARATOR.'completed.log'))));
    if($bytesFrom > 0)
    {
        //We are sending back a range so it needs a header and the http response must be 206: Partial Content
        header(sprintf('Content-Range: bytes %s-%s/%s', $bytesFrom, $bytesTo, $totalBytes));
        http_response_code(206);
    }
    $bytesSent = 0;
    while(is_file($cacheFolder.DIRECTORY_SEPARATOR.$chunk) && $bytesSent < $contentLength)
    {
        $cfh = fopen($cacheFolder.DIRECTORY_SEPARATOR.$chunk, 'rb');
        if($cfh !== false)
        {
            //if we are fetching a range then we might need to seek the correct starting point in the first chunk we look at
            //this check will be performed on all chunks but only the first one should need seeking so no harm done
            if($log[$chunk - 1][0] < $bytesFrom)
            {
                fseek($cfh, $bytesFrom - $log[$chunk - 1][0]);
            }
            //read and send data until the end of the file or we have sent what was requested
            while(!feof($cfh) && $bytesSent < $contentLength)
            {
                $data = fread($cfh, 1024);
                //check we are not going to be sending too much back and if we are then truncate the data to the correct length
                if($bytesSent + strlen($data) > $contentLength)
                {
                    $data = substr($data, 0, $contentLength - $bytesSent);
                }
                $bytesSent += strlen($data);
                echo $data;
            }
            fclose($cfh);
        }
        //move to the next chunk
        $chunk++;
    }
}
function play($url)
{
    //I have chosen a simple way to make a folder name, this can be improved any way you need
    //IMPORTANT: Each video must have its own cache folder
    $cacheFolder = sha1($url);
    if(!is_dir($cacheFolder))
    {
        mkdir($cacheFolder, 0755, true);
    }
    //First check if we are currently in the process of generating the cache and so just play from remote
    if(is_file($cacheFolder.DIRECTORY_SEPARATOR.'caching.log'))
    {
        playFromRemote($url);
    }
    //Otherwise check if we have never completed the cache or it was completed 5 hours ago and if so spawn a process to generate the cache
    elseif(!is_file($cacheFolder.DIRECTORY_SEPARATOR.'completed.log') || filemtime($cacheFolder.DIRECTORY_SEPARATOR.'completed.log') + (5 * 60 * 60) < time())
    {
        //fork the caching to a separate process - the & echo $! at the end causes the process to run as a background task
        //and print the process ID, returning immediately
        //The cache script can be anywhere, pass the location to sprintf in the first position
        //A base64 encoded url is passed in as argument 1, sprintf second position
        $cmd = sprintf('php %scache.php %s & echo $!', __DIR__.DIRECTORY_SEPARATOR, base64_encode($url));
        $pid = exec($cmd);
        //with that started we need to serve the request from the remote url
        playFromRemote($url);
    }
    else
    {
        //if we got this far then we have a completed cache so serve from there
        playFromCache($cacheFolder, $url);
    }
}
cache.php - This script will be called by play.php via exec()
<?php
//This script expects as argument 1 a base64 encoded url
if(count($argv) !== 2)
{
    die('Invalid Request!');
}
$url = base64_decode($argv[1]);

//make sure to use the same method of obtaining the cache folder name as the main play script
//or change the code to pass it in as an argument
$cacheFolder = sha1($url);
if(!is_dir($cacheFolder))
{
    die('Invalid Arguments!');
}
//double check it is not already running
if(is_file($cacheFolder.DIRECTORY_SEPARATOR.'caching.log'))
{
    die('Already Running');
}
//create a file so we know this has started, the file will be removed at the end of the script
file_put_contents($cacheFolder.DIRECTORY_SEPARATOR.'caching.log', date('d/m/Y H:i:s'));

//get rid of the old completed log
if(is_file($cacheFolder.DIRECTORY_SEPARATOR.'completed.log'))
{
    unlink($cacheFolder.DIRECTORY_SEPARATOR.'completed.log');
}
$bytesFrom = 0;
$bytesWritten = 0;
$totalBytes = 0;
//this is the size of the chunk files, currently 10MB
$maxSizeInBytes = 10 * 1024 * 1024;
$chunk = 1;

//open the url for binary reading and the first chunk for binary writing
$fh = fopen($url, 'rb');
$cfh = fopen($cacheFolder.DIRECTORY_SEPARATOR.$chunk, 'wb');
if($fh !== false && $cfh !== false)
{
    $log = array();
    while(!feof($fh))
    {
        $data = fread($fh, 1024);
        fwrite($cfh, $data);
        $totalBytes += strlen($data); //use actual length here
        $bytesWritten += strlen($data);
        //if we are on or passed the chunk size then close the chunk and open a new one
        //keeping a log of the byte range of the chunk
        if($bytesWritten >= $maxSizeInBytes)
        {
            $log[$chunk - 1] = array($bytesFrom, $totalBytes);
            $bytesFrom = $totalBytes;
            fclose($cfh);
            $chunk++;
            $bytesWritten = 0;
            $cfh = fopen($cacheFolder.DIRECTORY_SEPARATOR.$chunk, 'wb');
        }
    }
    fclose($fh);
    $log[$chunk - 1] = array($bytesFrom, $totalBytes);
    fclose($cfh);
    //write the completed log. This is a json encoded string of the chunk byte ranges and will be used
    //by the play script to quickly locate the starting chunk of a range request
    file_put_contents($cacheFolder.DIRECTORY_SEPARATOR.'completed.log', json_encode($log));
    //finally remove the caching log so the play script doesn't think the process is still running
    unlink($cacheFolder.DIRECTORY_SEPARATOR.'caching.log');
}
I have a gzip-compressed file which I'm uncompressing to a normal file, so I use fwrite(). It works fine when the uncompression is done in a single PHP request.
Because the compressed files are large, I have to mind PHP timeouts: I uncompress for up to 30 seconds, stop the process and store the current offset of the gz file using gztell(), then continue from where I left off in the next PHP request.
I do gzseek() with the stored offset and continue the uncompression, but gzread() keeps returning an empty string.
function gz_uncompress_file($source, $offset = 0){
    $dest = str_replace('.gz', '', $source);
    $fp_in = gzopen($source, 'rb');
    if (empty($fp_in)) {
        return 'Cannot open gzfile to uncompress sql';
    }
    $fp_out = fopen($dest, 'ab');
    if (empty($fp_out)) {
        return 'Cannot open temp file to uncompress sql';
    }
    gzseek($fp_in, $offset);
    $break = false;
    while (!gzeof($fp_in)) {
        $chunk_data = gzread($fp_in, 1024 * 512);
        if (empty($chunk_data)) {
            return 'empty string so stop the process';
        }
        fwrite($fp_out, $chunk_data);
        //Clearing to save memory
        unset($chunk_data);
        if (!is_time_out()) {
            continue;
        }
        $break = true;
        $offset = gztell($fp_in);
        break;
    }
    fclose($fp_out);
    gzclose($fp_in);
    if ($break) {
        //Saving offset
        $this->set_option('sql_gz_uncompression_offset', $offset);
        return 'continue_from_call';
    }
    echo "Uncompression done";
}
I am using some code from another answer to stream audio over HTTP, but I don't know what chunk size would be best... The files can be very large and are streamed to an audio tag on a web page; I'd like a quick start-up... The files are mainly .wav and .mp3.
function streamfile($filename, $retbytes = TRUE) {
    $CHUNK_SIZE = 1024 * 1024; // Size (in bytes) of each chunk
    $buffer = '';
    $cnt = 0;
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        $buffer = fread($handle, $CHUNK_SIZE);
        echo $buffer;
        ob_flush();
        flush();
        if ($retbytes) {
            $cnt += strlen($buffer);
        }
    }
    $status = fclose($handle);
    if ($retbytes && $status) {
        return $cnt; // return num. bytes delivered like readfile() does.
    }
    return $status;
}
All the advice seems to be "suck it and see"!
I tried a range of sizes from 1 kB to 1024 kB, getting various results, but nothing remarkable.
I also experimented with variable block sizes: get some data out quickly to start the track playing, then use that time to send bigger blocks. But again, it's trial and error (see the sketch after this answer).
I also had to tweak various settings to stop PHP, the web server, etc. from "helpfully" doing their own caching!
I have settled on 512 kB for now... it seems as good as any(!).
Hope this helps someone!
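For what it's worth, here is a minimal sketch of the variable-block-size idea described above (the ramp-up values are arbitrary choices of mine, not from the original answer):

function streamfile_ramped($filename)
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    // Start with small chunks for a fast first byte, then double the size
    // each pass until reaching a larger steady-state chunk.
    $chunkSize = 16 * 1024;     // 16 kB opening chunks
    $maxChunkSize = 512 * 1024; // 512 kB steady state
    while (!feof($handle)) {
        echo fread($handle, $chunkSize);
        ob_flush();
        flush();
        $chunkSize = min($chunkSize * 2, $maxChunkSize);
    }
    return fclose($handle);
}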
On a Linux system, if you monitor the system activity with the command strace, you will soon find out where the limitations of your script are.
For example, while monitoring the performance of a curl request, you might find that the bandwidth is the limiting factor.
It might show, for example:
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 41
fcntl(41, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(41, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(41, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("<some.ip>")}, 16) = -1 EINPROGRESS (Operation now in progress)
sendto(41, "GET /some-service/some"..., 3480, MSG_NOSIGNAL, NULL, 0) = 3480
recvfrom(41, "HTTP/1.1 200 OK\r\nTransaction-Id:"..., 16384, 0, NULL, NULL) = 14720
recvfrom(41, "\332\347\330Y\271\263F\314\7\225\271\264[16b\336_\346B[\n\350\336\210\371\10\371\226\373:\363"..., 16384, 0, NULL, NULL) = 16384
recvfrom(41, "\255W\23#\10\325\365A<\26\1\4\\\246\252\347\350\17\346kH\250\\&`BU\337\352\346m\225"..., 16384, 0, NULL, NULL) = 12223
close(41) = 0
In this example you can see that you received three packets of roughly 14, 16 and 12 kB, with an upper limit of 16 kB by default.
The first packet is not completely filled because of bandwidth limitations that do not permit filling the internal buffer within the available read time.
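A trace like the one above can be captured by running the script under strace (a sketch; the script name is a placeholder, and -e trace=network limits the output to socket-related calls):
$ strace -f -e trace=network php stream.php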
If you want to change the internal chunk limit of the PHP engine, you can do that with:
$handle = fopen($filename, 'rb');
stream_set_chunk_size($handle, $CHUNK_SIZE);
stream_set_read_buffer($handle, $CHUNK_SIZE);
I'm facing a problem and I'm not really sure if this is the right way of doing this: I need to copy a file from a remote server to my server with PHP.
I use the following script:
public function download($file_source, $file_target) {
    $rh = fopen($file_source, 'rb');
    $wh = fopen($file_target, 'w+b');
    if (!$rh || !$wh) {
        return false;
    }
    while (!feof($rh)) {
        if (fwrite($wh, fread($rh, 4096)) === FALSE) {
            return false;
        }
        echo ' ';
        flush();
    }
    fclose($rh);
    fclose($wh);
    return true;
}
but in the end, the file size remains at 0.
EDIT: I'm updating my question because there are still some things I didn't understand:
For fread I used 2048 bytes, but it didn't work.
I then found the script above, which uses 4096 bytes.
My question: how do I determine what read size to use so that the file gets downloaded reliably every time? This one works on a specific (dedicated) machine, but will it work on a shared host where I cannot modify php.ini?
Thanks again
filesize() expects a filename/path. You're passing in a file handle, which means filesize() will fail and return boolean false.
You then use that false as the size argument for your fread(), where it gets cast to integer 0. So essentially you're telling PHP to read the file 0 bytes at a time.
You cannot reliably get the size of a remote file anyway, so just have fread() read some fixed number of bytes, e.g. 2048, at a time:
while (!feof($handle)) {
    $contents = fread($handle, 2048);
    fwrite($f, $contents);
}
and if that file isn't too big and/or your PHP can handle it:
file_put_contents('local.mp4', file_get_contents('http://whatever/foo.mp4'));
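If the file is too big for that, stream_copy_to_stream() keeps the one-liner convenience while copying in chunks (a sketch of mine, not part of the original answer):

$src = fopen('http://whatever/foo.mp4', 'rb');
$dst = fopen('local.mp4', 'wb');
if ($src !== false && $dst !== false) {
    // copies the stream chunk by chunk; never holds the whole file in memory
    stream_copy_to_stream($src, $dst);
    fclose($src);
    fclose($dst);
}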
PHP has built-in support for reading EXIF and IPTC metadata, but I can't find any way to read XMP?
XMP data is literally embedded into the image file, so you can extract it with PHP's string functions from the image file itself.
The following demonstrates this procedure (I'm using SimpleXML, but every other XML API or even simple and clever string parsing may give you equal results):
$content = file_get_contents($image);
$xmp_data_start = strpos($content, '<x:xmpmeta');
$xmp_data_end = strpos($content, '</x:xmpmeta>');
$xmp_length = $xmp_data_end - $xmp_data_start;
$xmp_data = substr($content, $xmp_data_start, $xmp_length + 12); // 12 = strlen('</x:xmpmeta>')
$xmp = simplexml_load_string($xmp_data);
Just two remarks:
XMP makes heavy use of XML namespaces, so you'll have to keep an eye on that when parsing the XMP data with some XML tools.
considering the possible size of image files, you'll perhaps not be able to use file_get_contents() as this function loads the whole image into memory. Using fopen() to open a file stream resource and checking chunks of data for the key-sequences <x:xmpmeta and </x:xmpmeta> will significantly reduce the memory footprint.
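Regarding the first remark: with SimpleXML you have to register the namespaces before running XPath queries. A sketch (the dc:creator path assumes the file carries Dublin Core fields, which may not be true for your images):

$xmp = simplexml_load_string($xmp_data);
// Register the namespaces used by the XPath query below.
$xmp->registerXPathNamespace('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
$xmp->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');
foreach ($xmp->xpath('//dc:creator//rdf:li') as $creator) {
    echo (string)$creator, "\n";
}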
I'm only replying to this after so much time because this seems to be the best result when searching Google for how to parse XMP data. I've seen this nearly identical snippet used in code a few times and it's a terrible waste of memory. Here is an example of the fopen() method Stefan mentions after his example.
<?php
function getXmpData($filename, $chunkSize)
{
    if (!is_int($chunkSize)) {
        throw new RuntimeException('Expected integer value for argument #2 (chunkSize)');
    }
    if ($chunkSize < 12) {
        throw new RuntimeException('Chunk size cannot be less than 12 argument #2 (chunkSize)');
    }
    if (($file_pointer = fopen($filename, 'r')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }

    $startTag = '<x:xmpmeta';
    $endTag = '</x:xmpmeta>';
    $buffer = NULL;
    $hasXmp = FALSE;

    while (($chunk = fread($file_pointer, $chunkSize)) !== FALSE) {
        if ($chunk === "") {
            break;
        }
        $buffer .= $chunk;
        $startPosition = strpos($buffer, $startTag);
        $endPosition = strpos($buffer, $endTag);

        if ($startPosition !== FALSE && $endPosition !== FALSE) {
            $buffer = substr($buffer, $startPosition, $endPosition - $startPosition + 12);
            $hasXmp = TRUE;
            break;
        } elseif ($startPosition !== FALSE) {
            $buffer = substr($buffer, $startPosition);
            $hasXmp = TRUE;
        } elseif (strlen($buffer) > (strlen($startTag) * 2)) {
            $buffer = substr($buffer, strlen($startTag));
        }
    }

    fclose($file_pointer);
    return ($hasXmp) ? $buffer : NULL;
}
A simple way on Linux is to call the exiv2 program, available in an eponymous package on Debian.
$ exiv2 -e X extract image.jpg
will produce image.xmp containing the embedded XMP, which is now yours to parse.
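Called from PHP, this might look like the following sketch (it assumes exiv2 is installed and that the sidecar file lands next to the image):

$image = 'image.jpg';
// "-e X extract" writes the embedded XMP packet to image.xmp
exec('exiv2 -e X extract ' . escapeshellarg($image), $output, $exitCode);
$sidecar = preg_replace('/\.[^.]+$/', '.xmp', $image);
if ($exitCode === 0 && is_file($sidecar)) {
    $xmp = simplexml_load_file($sidecar);
}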
I know... this is kind of an old thread, but it was helpful to me when I was looking for a way to do this, so I figured this might be helpful to someone else.
I took this basic solution and modified it so it handles the case where the tag is split between chunks. This allows the chunk size to be as large or small as you want.
<?php
function getXmpData($filename, $chunk_size = 1024)
{
    if (!is_int($chunk_size)) {
        throw new RuntimeException('Expected integer value for argument #2 (chunk_size)');
    }
    if ($chunk_size < 12) {
        throw new RuntimeException('Chunk size cannot be less than 12 argument #2 (chunk_size)');
    }
    if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }

    $tag = '<x:xmpmeta';
    $buffer = false;

    // find open tag
    while ($buffer === false && ($chunk = fread($file_pointer, $chunk_size)) !== false) {
        if (strlen($chunk) <= 10) {
            break;
        }
        if (($position = strpos($chunk, $tag)) === false) {
            // if open tag not found, back up just in case the open tag is on the split.
            fseek($file_pointer, -10, SEEK_CUR);
        } else {
            $buffer = substr($chunk, $position);
        }
    }

    if ($buffer === false) {
        fclose($file_pointer);
        return false;
    }

    $tag = '</x:xmpmeta>';
    $offset = 0;
    while (($position = strpos($buffer, $tag, $offset)) === false && ($chunk = fread($file_pointer, $chunk_size)) !== FALSE && !empty($chunk)) {
        $offset = strlen($buffer) - 12; // subtract the tag size just in case it's split between chunks.
        $buffer .= $chunk;
    }
    fclose($file_pointer);

    if ($position === false) {
        // this would mean the open tag was found, but the close tag was not. Maybe file corruption?
        throw new RuntimeException('No close tag found. Possibly corrupted file.');
    } else {
        $buffer = substr($buffer, 0, $position + 12);
    }
    return $buffer;
}
?>
Bryan's solution was the best one so far, but it had a few issues, so I modified it to simplify it and remove some functionality.
There were three issues I found with his solution:
A) If the chunk extracted falls right in between one of the strings we're searching for, it won't find it. Small chunk sizes are more likely to cause this issue.
B) If the chunk contains both the start AND the end, it won't find it. This is an easy one to fix with an extra if statement to recheck the chunk that the start is found in to see if the end is also found.
C) The else statement added to the end to break the while loop if it doesn't find the XMP data has a side effect: if the start element isn't found on the first pass, it will not check any more chunks. This is likely easy to fix too, but with the first issue it's not worth it.
My solution below isn't as powerful, but it's more robust. It will only check one chunk, and extract the data from that. It will only work if the start and end are in that chunk, so the chunk size needs to be large enough to ensure that it always captures that data. From my experience with Adobe Photoshop/Lightroom exported files, the xmp data typically starts at around 20kB, and ends at around 45kB. My chunk size of 50k seems to work nicely for my images, it would be much less if you strip some of that data on export, such as the CRS block that has a lot of develop settings.
function getXmpData($filename)
{
    $chunk_size = 50000;
    $buffer = NULL;
    if (($file_pointer = fopen($filename, 'r')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }
    $chunk = fread($file_pointer, $chunk_size);
    if (($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
        $buffer = substr($chunk, $posStart);
        $posEnd = strpos($buffer, '</x:xmpmeta>');
        $buffer = substr($buffer, 0, $posEnd + 12);
    }
    fclose($file_pointer);
    return $buffer;
}
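Usage could look like this (a small sketch; photo.jpg is a placeholder):

$xmpPacket = getXmpData('photo.jpg');
if ($xmpPacket !== NULL) {
    $xmp = simplexml_load_string($xmpPacket);
}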
Thank you Sebastien B. for that shortened version :). If you want to avoid the problem where chunk_size is just too small for some files, just add recursion:
function getXmpData($filename, $chunk_size = 50000){
    $buffer = NULL;
    if (($file_pointer = fopen($filename, 'r')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }
    $chunk = fread($file_pointer, $chunk_size);
    if (($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
        $buffer = substr($chunk, $posStart);
        $posEnd = strpos($buffer, '</x:xmpmeta>');
        $buffer = substr($buffer, 0, $posEnd + 12);
    }
    fclose($file_pointer);
    // recursion here: retry with a doubled chunk size while the close tag is
    // missing, but stop once the chunk already covers the whole file
    // (otherwise files without any XMP data would recurse forever)
    if (!strpos($buffer, '</x:xmpmeta>') && $chunk_size < filesize($filename)) {
        $buffer = getXmpData($filename, $chunk_size * 2);
    }
    return $buffer;
}
I've developed the XMP PHP Toolkit extension: it's a PHP 5 extension based on the Adobe XMP Toolkit, which provides the main classes and methods to read/write/parse XMP metadata from JPEG, PSD, PDF, video, audio... This extension is under the GPL licence. A new release will be available soon for PHP 5.3 (it is currently only compatible with PHP 5.2.x), and it should be available on Windows and macOS (currently only for FreeBSD and Linux systems).
http://xmpphptoolkit.sourceforge.net/
If you have ExifTool available (a very useful tool) and can run external commands, you can use its option to extract XMP data (-xmp:all) and output it in JSON format (-json), which you can then easily convert to a PHP object:
$command = 'exiftool -g -json -struct -xmp:all "'.$image_path.'"';
exec($command, $output, $return_var);
$metadata = implode('', $output);
$metadata = json_decode($metadata);
There is now also a GitHub repo, installable via Composer, that can read XMP data:
https://github.com/jeroendesloovere/xmp-metadata-extractor
composer require jeroendesloovere/xmp-metadata-extractor