How can I read PNG Metadata from PHP?

This is what I have so far:
<?php
$file = "18201010338AM16390621000846.png";
$test = file_get_contents($file, FILE_BINARY);
echo str_replace("\n","<br>",$test);
?>
The output is sorta what I want, but I really only need lines 3-7 (inclusively). This is what the output looks like now: http://silentnoobs.com/pbss/collector/test.php. I am trying to get the data from "PunkBuster Screenshot (±) AAO Bridge Crossing" to "Resulting: w=394 X h=196 sample=2". I think it'd be fairly straightforward to read through the file and store each line in an array; line[0] would need to be "PunkBuster Screenshot (±) AAO Bridge Crossing", and so on. All those lines are subject to change, so I can't just search for something finite.
I've tried for a few days now, and it doesn't help much that I'm poor at php.

The PNG file format defines that a PNG document is split up into multiple chunks of data. You must therefore navigate your way to the chunk you desire.
The data you want to extract seems to be defined in a tEXt chunk. I've written the following class to allow you to extract chunks from PNG files.
class PNG_Reader
{
    private $_chunks;
    private $_fp;

    function __construct($file) {
        if (!file_exists($file)) {
            throw new Exception('File does not exist');
        }

        $this->_chunks = array();

        // Open the file in binary mode
        $this->_fp = fopen($file, 'rb');

        if (!$this->_fp)
            throw new Exception('Unable to open file');

        // Read the magic bytes and verify
        $header = fread($this->_fp, 8);

        if ($header != "\x89PNG\x0d\x0a\x1a\x0a")
            throw new Exception('Is not a valid PNG image');

        // Loop through the chunks. Byte 0-3 is length, Byte 4-7 is type
        $chunkHeader = fread($this->_fp, 8);
        while ($chunkHeader) {
            // Extract length and type from binary data
            $chunk = @unpack('Nsize/a4type', $chunkHeader);

            // Store position into internal array
            if (!isset($this->_chunks[$chunk['type']]))
                $this->_chunks[$chunk['type']] = array();
            $this->_chunks[$chunk['type']][] = array(
                'offset' => ftell($this->_fp),
                'size'   => $chunk['size']
            );

            // Skip to next chunk (over body and CRC)
            fseek($this->_fp, $chunk['size'] + 4, SEEK_CUR);

            // Read next chunk header
            $chunkHeader = fread($this->_fp, 8);
        }
    }

    function __destruct() { fclose($this->_fp); }

    // Returns all chunks of said type
    public function get_chunks($type) {
        if (!isset($this->_chunks[$type]))
            return null;

        $chunks = array();

        foreach ($this->_chunks[$type] as $chunk) {
            if ($chunk['size'] > 0) {
                fseek($this->_fp, $chunk['offset'], SEEK_SET);
                $chunks[] = fread($this->_fp, $chunk['size']);
            } else {
                $chunks[] = '';
            }
        }

        return $chunks;
    }
}
You may use it as follows to extract your desired tEXt chunk:
$file = '18201010338AM16390621000846.png';

$png = new PNG_Reader($file);
$rawTextData = $png->get_chunks('tEXt');

$metadata = array();

foreach ($rawTextData as $data) {
    // each tEXt chunk is "keyword\0text"
    $sections = explode("\0", $data);

    if (count($sections) > 1) {
        $key = array_shift($sections);
        $metadata[$key] = implode("\0", $sections);
    } else {
        $metadata[] = $data;
    }
}
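From there, a quick way to see what was actually extracted (my addition, not part of the original answer):

// dump whatever tEXt entries were found, one per line
foreach ($metadata as $key => $value) {
    echo htmlspecialchars("$key: $value") . "<br>\n";
}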

<?php
$fp = fopen('18201010338AM16390621000846.png', 'rb');

$sig = fread($fp, 8);
if ($sig != "\x89PNG\x0d\x0a\x1a\x0a") {
    print "Not a PNG image";
    fclose($fp);
    die();
}

while (!feof($fp)) {
    $data = unpack('Nlength/a4type', fread($fp, 8));

    if ($data['type'] == 'IEND') break;

    if ($data['type'] == 'tEXt') {
        list($key, $val) = explode("\0", fread($fp, $data['length']));
        echo "<h1>$key</h1>";
        echo nl2br($val);
        fseek($fp, 4, SEEK_CUR); // skip the CRC
    } else {
        // skip chunk body and CRC
        fseek($fp, $data['length'] + 4, SEEK_CUR);
    }
}
fclose($fp);
?>
It assumes a basically well-formed PNG file.

I ran into this problem a few days ago, so I made a library to extract the metadata (EXIF, XMP and GPS) of a PNG in PHP, 100% native. I hope it helps. :) PNGMetadata

How about:
http://www.php.net/manual/en/function.getimagesize.php
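For completeness: getimagesize() won't read the tEXt chunks the asker is after, but it does give you the basic properties quickly. A minimal sketch, reusing the file name from the question:

// returns width, height, a type constant and a mime entry
$info = getimagesize('18201010338AM16390621000846.png');
echo $info[0] . ' x ' . $info[1] . ' (' . $info['mime'] . ')';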

Related

How to hash file with multiple algorithms at the same time in PHP?

I would like to hash a given file using multiple algorithms but now I'm doing it sequentially, like this:
return [
hash_file('md5', $uri),
hash_file('sha1', $uri),
hash_file('sha256', $uri)
];
Is there any way to hash that file opening only one stream, and not N streams where N is the number of algos I want to use? Something like this:
return hash_file(['md5', 'sha1', 'sha256'], $uri);
You can open a file pointer and then use hash_init() with hash_update() to calculate the hash on the file without opening the file many times, then use hash_final() to get the resulting hash.
<?php
function hash_file_multi($algos = [], $filename) {
    if (!is_array($algos)) {
        throw new \InvalidArgumentException('First argument must be an array');
    }

    if (!is_string($filename)) {
        throw new \InvalidArgumentException('Second argument must be a string');
    }

    if (!file_exists($filename)) {
        throw new \InvalidArgumentException('Second argument, file not found');
    }

    $result = [];
    $fp = fopen($filename, 'rb');

    if ($fp) {
        // init hash contexts
        foreach ($algos as $algo) {
            $ctx[$algo] = hash_init($algo);
        }

        // calculate hash
        while (!feof($fp)) {
            $buffer = fgets($fp, 65536);
            if ($buffer === false) {
                break;
            }
            foreach ($ctx as $key => $context) {
                hash_update($ctx[$key], $buffer);
            }
        }

        // finalise hash and store in return
        foreach ($algos as $algo) {
            $result[$algo] = hash_final($ctx[$algo]);
        }

        fclose($fp);
    } else {
        throw new \InvalidArgumentException('Could not open file for reading');
    }

    return $result;
}
$result = hash_file_multi(['md5', 'sha1', 'sha256'], $uri);
var_dump($result['md5'] === hash_file('md5', $uri)); //true
var_dump($result['sha1'] === hash_file('sha1', $uri)); //true
var_dump($result['sha256'] === hash_file('sha256', $uri)); //true
Also posted to PHP manual: http://php.net/manual/en/function.hash-file.php#122549
Here's a modification of Lawrence Cherone's solution* that reads the file only once, and works even for non-seekable streams such as STDIN:
<?php
function hash_stream_multi($algos = [], $stream) {
    if (!is_array($algos)) {
        throw new \InvalidArgumentException('First argument must be an array');
    }

    if (!is_resource($stream)) {
        throw new \InvalidArgumentException('Second argument must be a resource');
    }

    $result = [];
    foreach ($algos as $algo) {
        $ctx[$algo] = hash_init($algo);
    }

    while (!feof($stream)) {
        $chunk = fread($stream, 1 << 20); // read data in 1 MiB chunks
        foreach ($algos as $algo) {
            hash_update($ctx[$algo], $chunk);
        }
    }

    foreach ($algos as $algo) {
        $result[$algo] = hash_final($ctx[$algo]);
    }
    return $result;
}

// test: hash standard input with MD5, SHA-1 and SHA-256
$result = hash_stream_multi(['md5', 'sha1', 'sha256'], STDIN);
print_r($result);
Try it online!
It works by reading the data from the input stream with fread() in chunks (of one megabyte, which should give a reasonable balance between performance and memory use) and feeding the chunks to each hash with hash_update().
*) Lawrence updated his answer while I was writing this, but I feel that mine is still sufficiently distinct to justify keeping both of them. The main differences between this solution and Lawrence's updated version are that my function takes an input stream instead of a filename, and that I'm using fread() instead of fgets() (since for hashing, there's no need to split the input on newlines).
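For a regular file, a minimal usage sketch (the path is hypothetical):

$fp = fopen('/path/to/some/file', 'rb');
$hashes = hash_stream_multi(['md5', 'sha1', 'sha256'], $fp);
fclose($fp);
print_r($hashes);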

Tailing Log File and Write results to new file

I'm not sure how to word this so I'll type it out and then edit and answer any questions that come up..
Currently on my local network device (PHP4 based) I'm using this to tail a live system log file: http://commavee.com/2007/04/13/ajax-logfile-tailer-viewer/
This works well, and every 1 second it loads an external page (logfile.php) that does a tail -n 100 logfile.log. The script doesn't do any buffering, so the results it displays onscreen are the last 100 lines from the log file.
The logfile.php contains :
<?php
// logtail.php
$cmd = "tail -10 /path/to/your/logs/some.log";
exec("$cmd 2>&1", $output);

foreach ($output as $outputline) {
    echo "$outputline\n";
}
?>
This part is working well.
I have adapted the logfile.php page to write the $outputline to a new text file, simply using fwrite($fp,$outputline."\n");
Whilst this works I am having issues with duplication in the new file that is created.
Obviously each time tail -n 100 runs it can produce some of the same lines as the previous run; as this repeats, I can end up with multiple duplicated lines in the new text file.
I can't directly compare the line I'm about to write to previous lines as there could be identical matches.
Is there any way I can compare this current block of 100 lines with the previous block and then only write the lines that are not matching.. Again possible issue that block A & B will contain identical lines that are needed...
Is it possible to update logfile.php to note the position it last looked at in my logfile and then only read the next 100 lines from there and write those to the new file?
The log file could be up to 500MB, so I don't want to read it all in each time.
Any advice or suggestions welcome..
Thanks
UPDATE # 16:30
I've sort of got this working using :
$file = "/logs/syst.log";
$handle = fopen($file, "r");

if (isset($_SESSION['ftell'])) {
    clearstatcache();
    fseek($handle, $_SESSION['ftell']);

    while ($buffer = fgets($handle)) {
        echo $buffer . "<br/>";
        #ob_flush(); #flush();
    }

    fclose($handle);
    #$_SESSION['ftell'] = ftell($handle);
} else {
    fseek($handle, -1024, SEEK_END);
    fclose($handle);
    #$_SESSION['ftell'] = ftell($handle);
}
This seems to work, but it loads the entire file first and then just the updates.
How would I get it to start with the last 50 lines and then just the updates?
Thanks :)
UPDATE 04/06/2013
Whilst this works it's very slow with large files.
I've tried this code and it seems faster, but it doesn't just read from where it left off.
function last_lines($path, $line_count, $block_size = 512) {
    $lines = array();

    // we will always have a fragment of a non-complete line
    // keep this in here till we have our next entire line.
    $leftover = "";

    $fh = fopen($path, 'r');
    // go to the end of the file
    fseek($fh, 0, SEEK_END);

    do {
        // need to know whether we can actually go back
        // $block_size bytes
        $can_read = $block_size;
        if (ftell($fh) < $block_size) {
            $can_read = ftell($fh);
        }

        // go back as many bytes as we can
        // read them to $data and then move the file pointer
        // back to where we were.
        fseek($fh, -$can_read, SEEK_CUR);
        $data = fread($fh, $can_read);
        $data .= $leftover;
        fseek($fh, -$can_read, SEEK_CUR);

        // split lines by \n. Then reverse them,
        // now the last line is most likely not a complete
        // line which is why we do not directly add it, but
        // append it to the data read the next time.
        $split_data = array_reverse(explode("\n", $data));
        $new_lines = array_slice($split_data, 0, -1);
        $lines = array_merge($lines, $new_lines);
        $leftover = $split_data[count($split_data) - 1];
    } while (count($lines) < $line_count && ftell($fh) != 0);

    if (ftell($fh) == 0) {
        $lines[] = $leftover;
    }

    fclose($fh);
    // Usually, we will read too many lines, correct that here.
    return array_slice($lines, 0, $line_count);
}
Is there any way this can be amended so it will read from the last known position?
Thanks
Introduction
You can tail a file by tracking the last position.
Example
$file = __DIR__ . "/a.log";
$tail = new TailLog($file);
$data = $tail->tail(100);
// Save $data to new file
TailLog is a simple class I wrote for this task. Here is a simple example to show it's actually tailing the file.
Simple Test
$file = __DIR__ . "/a.log";
$tail = new TailLog($file);
// Some Random Data
$data = array_chunk(range("a", "z"), 3);
// Write Log
file_put_contents($file, implode("\n", array_shift($data)));
// First Tail (2) Run
print_r($tail->tail(2));
// Run Tail (2) Again
print_r($tail->tail(2));
// Write Another data to Log
file_put_contents($file, "\n" . implode("\n", array_shift($data)), FILE_APPEND);
// Call Tail Again after writing Data
print_r($tail->tail(2));
// See the full content
print_r(file_get_contents($file));
Output
// First Tail (2) Run
Array
(
[0] => c
[1] => b
)
// Run Tail (2) Again
Array
(
)
// Call Tail Again after writing Data
Array
(
[0] => f
[1] => e
)
// See the full content
a
b
c
d
e
f
Real Time Tailing
while(true) {
$data = $tail->tail(100);
// write data to another file
sleep(5);
}
Note: tailing 100 lines does not mean it will always return 100 lines. It returns the new lines added; 100 is just the maximum number of lines to return. This might not be efficient where you have heavy logging of more than 100 lines per second.
Tail Class
class TailLog {
    private $file;
    private $data;
    private $timeout = 5;
    private $lock;

    function __construct($file) {
        $this->file = $file;
        $this->lock = new TailLock($file);
    }

    public function tail($lines) {
        $pos = -2;
        $t = $lines;
        $fp = fopen($this->file, "r");
        $break = false;
        $line = "";
        $text = array();

        while ($t > 0) {
            $c = "";

            // Search for end of line
            while ($c != "\n" && $c != PHP_EOL) {
                if (fseek($fp, $pos, SEEK_END) == -1) {
                    $break = true;
                    break;
                }
                if (ftell($fp) < $this->lock->getPosition()) {
                    break;
                }
                $c = fgetc($fp);
                $pos--;
            }

            if (ftell($fp) < $this->lock->getPosition()) {
                break;
            }

            $t--;
            $break && rewind($fp);
            $text[$lines - $t - 1] = fgets($fp);

            if ($break) {
                break;
            }
        }

        // Move to end
        fseek($fp, 0, SEEK_END);
        // Save position
        $this->lock->save(ftell($fp));
        // Close file
        fclose($fp);
        return array_map("trim", $text);
    }
}
Tail Lock
class TailLock {
    private $file;
    private $lock;
    private $data;

    function __construct($file) {
        $this->file = $file;
        $this->lock = $file . ".tail";
        touch($this->lock);

        if (!is_file($this->lock))
            throw new Exception("Can't create lock file");

        $this->data = json_decode(file_get_contents($this->lock));

        // Check that the lock file contains valid JSON and that data in the
        // original file has not been deleted; you expect the log to grow,
        // not shrink.
        if (!$this->data || $this->data->size > filesize($this->file)) {
            $this->reset();
        }
    }

    function getPosition() {
        return $this->data->position;
    }

    function reset() {
        $this->data = new stdClass();
        $this->data->size = filesize($this->file);
        $this->data->modification = filemtime($this->file);
        $this->data->position = 0;
        $this->update();
    }

    function save($pos) {
        $this->data = new stdClass();
        $this->data->size = filesize($this->file);
        $this->data->modification = filemtime($this->file);
        $this->data->position = $pos;
        $this->update();
    }

    function update() {
        return file_put_contents($this->lock, json_encode($this->data, JSON_PRETTY_PRINT));
    }
}
Not really clear on how you want to use the output, but would something like this work?
$dat = file_get_contents("tracker.dat");

$fp = fopen("/logs/syst.log", "r");
fseek($fp, $dat, SEEK_SET);

ob_start();
// alternatively you can do a while fgets if you want to interpret the file or do something
fpassthru($fp);
$pos = ftell($fp);
fclose($fp);

echo nl2br(ob_get_clean());

file_put_contents("tracker.dat", $pos);
tracker.dat is just a text file that contains where the read position was from the previous run. I'm just seeking to that position and piping the rest to the output buffer.
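One caveat (my addition, not part of the original answer): on the very first run tracker.dat won't exist yet, so a guard along these lines is needed:

// fall back to position 0 when the tracker file doesn't exist yet
$dat = is_file("tracker.dat") ? (int) file_get_contents("tracker.dat") : 0;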
Use tail -c <number of bytes> instead of <number of lines>, and then check the file size. The rough idea is:
$old_file_size = 0;
$max_bytes = 512;

function last_lines($path) {
    global $old_file_size, $max_bytes;

    $new_file_size = filesize($path);
    $pending_bytes = $new_file_size - $old_file_size;

    if ($pending_bytes > $max_bytes) $pending_bytes = $max_bytes;

    exec("tail -c " . $pending_bytes . " " . escapeshellarg($path), $output);

    $old_file_size = $new_file_size;
    return $output;
}
The advantage is that you can do away with all the special processing stuff and get good performance. The disadvantage is that you have to manually split the output into lines, and you could end up with unfinished lines. But this isn't a big deal: you can easily work around it by omitting the last line from the output (and appropriately subtracting the last line's number of bytes from old_file_size), as sketched below.
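A minimal sketch of that workaround (my addition; exec() has already split $output on newlines, and the byte accounting is approximate since exec() trims line endings):

// drop the possibly-incomplete last line and don't count its bytes,
// so the next run re-reads it once it has been completed
$partial = array_pop($output);
$old_file_size -= strlen($partial);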

Reversing A text file using PHP

I have a file that is sorted using natsort() (in ascending order), but actually I want to sort it in descending order.
I mean the last line of the document must become the first line and vice versa.
Please let me know if there is any function or snippet to achieve this.
I'm not that good at PHP; I appreciate all responses irrespective of quality. Thank you.
Use natsort() and then array_reverse().
Also refer to PHP Grab last 15 lines in txt file;
it might help you.
array_reverse will give the contents in descending order
$reverse = array_reverse($array, true);
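Put together with the natsort() suggestion above, a minimal sketch (file names are hypothetical):

$lines = file('input.txt', FILE_IGNORE_NEW_LINES);
natsort($lines);                         // ascending natural order
$reverse = array_reverse($lines, true);  // descending, keys preserved
file_put_contents('output.txt', implode(PHP_EOL, $reverse));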
Whilst not the most efficient approach for a large text file, you could use file, array_reverse and file_put_contents to achieve this as follows...
<?php
// Fetch each line from the file into an array
$fileLines = file('/path/to/text/file.txt');
// Swap the order of the array
$invertedLines = array_reverse($fileLines);
// Write the data back to disk
file_put_contents('/path/to/write/new/file/to.txt', $invertedLines);
?>
...to achieve what you're after.
For longer files:
<?php
function rfopen($path, $mode)
{
    $fp = fopen($path, $mode);
    fseek($fp, -1, SEEK_END);
    if (fgetc($fp) !== PHP_EOL) fseek($fp, 1, SEEK_END);
    return $fp;
}

function rfgets($fp, $strip = false)
{
    $s = '';
    while (true) {
        if (fseek($fp, -2, SEEK_CUR) === -1) {
            if (!empty($s)) break;
            return false;
        }
        if (($c = fgetc($fp)) === PHP_EOL) break;
        $s = $c . $s;
    }
    if (!$strip) $s .= PHP_EOL;
    return $s;
}

$file = '/path/to/your/file.txt';

$src = rfopen($file, 'rb');
$tgt = fopen("$file.rev", 'w');

while ($line = rfgets($src)) {
    fwrite($tgt, $line);
}

fclose($src);
fclose($tgt);

// rename("$file.rev", $file);
Replace '/path/to/your/file.txt' with the path to your file.
Uncomment the last line to overwrite your file.

reading and counting words from pdf document

I have been working on this text extraction project for various file extensions,
but I am having the most pain with PDF and PowerPoint. Here is the code for PDF.
Does anyone here know how to read text from existing PDF documents using any tool or library (TCPDF, xpdf or FPDI)? I haven't seen any exact solution for reading text from PDF or PPT, but please no Zend solutions.
function pdf2txt($filename) {
    $data = getFileData($filename);

    $result_data = "";
    $a_chunks = array();
    $j = 0;

    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data, "obj", "endobj");

    foreach ($a_obj as $obj) {
        $a_filter = getDataArray($obj, "<<", ">>");
        if (is_array($a_filter)) {
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj, "stream\r\n", "endstream");
            if (is_array($a_data)) {
                $a_chunks[$j]["data"] = substr($a_data[0], strlen("stream\r\n"), strlen($a_data[0]) - strlen("stream\r\n") - strlen("endstream"));
            }
        }
    }

    // decode the chunks
    foreach ($a_chunks as $chunk) {
        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        if (isset($chunk["data"]) && $chunk["data"] != "") {
            // look at the filter to find out which encoding has been used
            if (strpos($chunk["filter"], "FlateDecode") !== false) {
                $data = @gzuncompress($chunk["data"]);
                if (trim($data) != "") {
                    $result_data .= ps2txt($data);
                }
            }
        }
    }

    return $result_data;
}
// Function    : ps2txt()
// Arguments   : $ps_data - postscript data you want to convert to plain text
// Description : Does a very basic parse of postscript data to
//             : return the plain text
// Author      : Jonathan Beckett, 2005-05-02
function ps2txt($ps_data) {
    $result = "";

    $a_data = getDataArray($ps_data, "[", "]");
    if (is_array($a_data)) {
        foreach ($a_data as $ps_text) {
            $a_text = getDataArray($ps_text, "(", ")");
            if (is_array($a_text)) {
                foreach ($a_text as $text) {
                    $result .= substr($text, 1, strlen($text) - 2);
                }
            }
        }
    } else {
        // the data may just be in raw format (outside of [] tags)
        $a_text = getDataArray($ps_data, "(", ")");
        if (is_array($a_text)) {
            foreach ($a_text as $text) {
                $result .= substr($text, 1, strlen($text) - 2);
            }
        }
    }
    return $result;
}
// Function    : getFileData()
// Arguments   : $filename - filename you want to load
// Description : Reads data from a file into a variable
//               and passes that data back
// Author      : Jonathan Beckett, 2005-05-02
function getFileData($filename) {
    $handle = fopen($filename, "rb");
    $data = fread($handle, filesize($filename));
    fclose($handle);
    return $data;
}

// Function    : getDataArray()
// Arguments   : $data - data you want to chop up
//               $start_word - delimiting characters at start of each chunk
//               $end_word - delimiting characters at end of each chunk
// Description : Loop through an array of data and put all chunks
//               between start_word and end_word in an array
// Author      : Jonathan Beckett, 2005-05-02
function getDataArray($data, $start_word, $end_word) {
    $start = 0;
    $end = 0;
    // stays null when nothing is found, so callers can test is_array()
    $a_result = null;

    while ($start !== false && $end !== false) {
        $start = strpos($data, $start_word, $end);
        if ($start !== false) {
            $end = strpos($data, $end_word, $start);
            if ($end !== false) {
                // data is between start and end
                $a_result[] = substr($data, $start, $end - $start + strlen($end_word));
            }
        }
    }
    return $a_result;
}
This one is for PowerPoint. I found it somewhere on here, but it isn't working either.
function parsePPT($filename) {
    // This approach uses detection of the string
    // chr(0x0f) . <hex value> . chr(0x00) . chr(0x00) . chr(0x00)
    // to find text strings, which are then terminated by another NUL chr(0x00).
    // [1] Get text between delimiters [2]
    $fileHandle = fopen($filename, "rb");
    $line = @fread($fileHandle, filesize($filename));
    $lines = explode(chr(0x0f), $line);

    $outtext = '';
    foreach ($lines as $thisline) {
        if (strpos($thisline, chr(0x00) . chr(0x00) . chr(0x00)) == 1) {
            $text_line = substr($thisline, 4);
            $end_pos = strpos($text_line, chr(0x00));
            $text_line = substr($text_line, 0, $end_pos);
            $text_line = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t#\/\_\(\)]/", "", $text_line);
            if (substr($text_line, 0, 20) != "Click to edit Master")
                if (strlen($text_line) > 1) {
                    $outtext .= substr($text_line, 0, $end_pos) . "\n<br>";
                }
        }
    }
    return $outtext;
}
Why are you trying to reinvent the wheel? You could resort to using e.g. xpdf or a similar tool to extract the text data inside the PDF, and afterwards process the plain text file resulting from that operation, as sketched below. The same approach can be used for virtually any file format that contains text (i.e. first convert to a plain text version, then process that).
Indexing PDF Documents with Zend_Search_Lucene could be an interesting read if you opt for that solution.
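For the word-counting part, a minimal sketch of shelling out to pdftotext (the command-line tool that ships with xpdf/poppler; the path is hypothetical):

// convert the PDF to plain text on stdout ("-") and count the words
$pdf = escapeshellarg('/path/to/document.pdf');
exec("pdftotext $pdf -", $output, $status);

if ($status === 0) {
    $text = implode("\n", $output);
    echo str_word_count($text) . " words\n";
}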

How can I read XMP data from a JPG with PHP?

PHP has built in support for reading EXIF and IPTC metadata, but I can't find any way to read XMP?
XMP data is literally embedded into the image file, so you can extract it with PHP's string functions from the image file itself.
The following demonstrates this procedure (I'm using SimpleXML, but every other XML API or even simple and clever string parsing may give you equal results):
$content = file_get_contents($image);

$xmp_data_start = strpos($content, '<x:xmpmeta');
$xmp_data_end = strpos($content, '</x:xmpmeta>');
$xmp_length = $xmp_data_end - $xmp_data_start;

// + 12 = strlen('</x:xmpmeta>'), so the closing tag is included
$xmp_data = substr($content, $xmp_data_start, $xmp_length + 12);

$xmp = simplexml_load_string($xmp_data);
Just two remarks:
XMP makes heavy use of XML namespaces, so you'll have to keep an eye on that when parsing the XMP data with some XML tools (see the sketch after these remarks).
Considering the possible size of image files, you'll perhaps not be able to use file_get_contents(), as this function loads the whole image into memory. Using fopen() to open a file stream resource and checking chunks of data for the key sequences <x:xmpmeta and </x:xmpmeta> will significantly reduce the memory footprint.
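Regarding the namespace remark, a minimal sketch of pulling a Dublin Core field out of the parsed $xmp with XPath (dc:creator is an assumption; adjust to whatever your files actually contain):

// register the namespaces used in typical XMP packets before querying
$xmp->registerXPathNamespace('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
$xmp->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');

foreach (($xmp->xpath('//dc:creator//rdf:li') ?: array()) as $li) {
    echo (string) $li, "\n";
}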
I'm only replying to this after so much time because this seems to be the best result when searching Google for how to parse XMP data. I've seen this nearly identical snippet used in code a few times and it's a terrible waste of memory. Here is an example of the fopen() method Stefan mentions after his example.
<?php
function getXmpData($filename, $chunkSize)
{
    if (!is_int($chunkSize)) {
        throw new RuntimeException('Expected integer value for argument #2 (chunkSize)');
    }

    if ($chunkSize < 12) {
        throw new RuntimeException('Chunk size cannot be less than 12 argument #2 (chunkSize)');
    }

    if (($file_pointer = fopen($filename, 'r')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }

    $startTag = '<x:xmpmeta';
    $endTag = '</x:xmpmeta>';
    $buffer = NULL;
    $hasXmp = FALSE;

    while (($chunk = fread($file_pointer, $chunkSize)) !== FALSE) {
        if ($chunk === "") {
            break;
        }

        $buffer .= $chunk;
        $startPosition = strpos($buffer, $startTag);
        $endPosition = strpos($buffer, $endTag);

        if ($startPosition !== FALSE && $endPosition !== FALSE) {
            $buffer = substr($buffer, $startPosition, $endPosition - $startPosition + 12);
            $hasXmp = TRUE;
            break;
        } elseif ($startPosition !== FALSE) {
            $buffer = substr($buffer, $startPosition);
            $hasXmp = TRUE;
        } elseif (strlen($buffer) > (strlen($startTag) * 2)) {
            $buffer = substr($buffer, strlen($startTag));
        }
    }

    fclose($file_pointer);
    return ($hasXmp) ? $buffer : NULL;
}
A simple way on Linux is to call the exiv2 program, available in an eponymous package on Debian.
$ exiv2 -e X extract image.jpg
will produce image.xmp containing embedded XMP which is now yours to parse.
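From there, parsing the sidecar file is straightforward (a sketch; image.xmp is the file produced above):

// load the extracted XMP packet and inspect its namespaces
$xmp = simplexml_load_file('image.xmp');
print_r($xmp->getNamespaces(true));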
I know... this is kind of an old thread, but it was helpful to me when I was looking for a way to do this, so I figured this might be helpful to someone else.
I took this basic solution and modified it so it handles the case where the tag is split between chunks. This allows the chunk size to be as large or small as you want.
<?php
function getXmpData($filename, $chunk_size = 1024)
{
    if (!is_int($chunk_size)) {
        throw new RuntimeException('Expected integer value for argument #2 (chunk_size)');
    }

    if ($chunk_size < 12) {
        throw new RuntimeException('Chunk size cannot be less than 12 argument #2 (chunk_size)');
    }

    if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }

    $tag = '<x:xmpmeta';
    $buffer = false;

    // find open tag
    while ($buffer === false && ($chunk = fread($file_pointer, $chunk_size)) !== false) {
        if (strlen($chunk) <= 10) {
            break;
        }
        if (($position = strpos($chunk, $tag)) === false) {
            // if open tag not found, back up just in case the open tag is on the split.
            fseek($file_pointer, -10, SEEK_CUR);
        } else {
            $buffer = substr($chunk, $position);
        }
    }

    if ($buffer === false) {
        fclose($file_pointer);
        return false;
    }

    $tag = '</x:xmpmeta>';
    $offset = 0;

    while (($position = strpos($buffer, $tag, $offset)) === false && ($chunk = fread($file_pointer, $chunk_size)) !== FALSE && !empty($chunk)) {
        $offset = strlen($buffer) - 12; // subtract the tag size just in case it's split between chunks.
        $buffer .= $chunk;
    }

    fclose($file_pointer);

    if ($position === false) {
        // this would mean the open tag was found, but the close tag was not. Maybe file corruption?
        throw new RuntimeException('No close tag found. Possibly corrupted file.');
    } else {
        $buffer = substr($buffer, 0, $position + 12);
    }

    return $buffer;
}
?>
Bryan's solution was the best one so far, but it had a few issues, so I modified it to simplify it and remove some functionality.
There were three issues I found with his solution:
A) If the chunk extracted falls right in between one of the strings we're searching for, it won't find it. Small chunk sizes are more likely to cause this issue.
B) If the chunk contains both the start AND the end, it won't find it. This is an easy one to fix with an extra if statement to recheck the chunk that the start is found in to see if the end is also found.
C) The else statement added to the end to break the while loop if it doesn't find the XMP data has a side effect: if the start element isn't found on the first pass, it will not check any more chunks. This is likely easy to fix too, but with the first issue it's not worth it.
My solution below isn't as powerful, but it's more robust. It will only check one chunk, and extract the data from that. It will only work if the start and end are in that chunk, so the chunk size needs to be large enough to ensure that it always captures that data. From my experience with Adobe Photoshop/Lightroom exported files, the xmp data typically starts at around 20kB, and ends at around 45kB. My chunk size of 50k seems to work nicely for my images, it would be much less if you strip some of that data on export, such as the CRS block that has a lot of develop settings.
function getXmpData($filename)
{
    $chunk_size = 50000;
    $buffer = NULL;

    if (($file_pointer = fopen($filename, 'r')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }

    $chunk = fread($file_pointer, $chunk_size);

    if (($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
        $buffer = substr($chunk, $posStart);
        $posEnd = strpos($buffer, '</x:xmpmeta>');
        $buffer = substr($buffer, 0, $posEnd + 12);
    }

    fclose($file_pointer);
    return $buffer;
}
Thank you Sebastien B. for that shortened version :). If you want to avoid the problem when chunk_size is just too small for some files, add recursion:
function getXmpData($filename, $chunk_size = 50000) {
    $buffer = NULL;

    if (($file_pointer = fopen($filename, 'r')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }

    $chunk = fread($file_pointer, $chunk_size);

    if (($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
        $buffer = substr($chunk, $posStart);
        $posEnd = strpos($buffer, '</x:xmpmeta>');
        $buffer = substr($buffer, 0, $posEnd + 12);
    }

    fclose($file_pointer);

    // recursion here: retry with a bigger chunk until the close tag is included
    if (!strpos($buffer, '</x:xmpmeta>')) {
        $buffer = getXmpData($filename, $chunk_size * 2);
    }
    return $buffer;
}
I've developed the XMP PHP Toolkit extension: it's a PHP 5 extension based on the Adobe XMP Toolkit, which provides the main classes and methods to read/write/parse XMP metadata from JPEG, PSD, PDF, video, audio... This extension is under the GPL licence. A new release will be available soon for PHP 5.3 (it is currently only compatible with PHP 5.2.x), and should be available on Windows and Mac OS X (currently only for FreeBSD and Linux systems).
http://xmpphptoolkit.sourceforge.net/
If you have ExifTool available (a very useful tool) and can run external commands, you can use its option to extract XMP data (-xmp:all) and output it in JSON format (-json), which you can then easily convert to a PHP object:
$command = 'exiftool -g -json -struct -xmp:all "'.$image_path.'"';
exec($command, $output, $return_var);
$metadata = implode('', $output);
$metadata = json_decode($metadata);
There is now also a GitHub repo you can add via Composer that can read XMP data:
https://github.com/jeroendesloovere/xmp-metadata-extractor
composer require jeroendesloovere/xmp-metadata-extractor
