How can I read XMP data from a JPG with PHP? - php

PHP has built in support for reading EXIF and IPTC metadata, but I can't find any way to read XMP?

XMP data is literally embedded into the image file so can extract it with PHP's string-functions from the image file itself.
The following demonstrates this procedure (I'm using SimpleXML but every other XML API or even simple and clever string parsing may give you equal results):
$content = file_get_contents($image);
$xmp_data_start = strpos($content, '<x:xmpmeta');
$xmp_data_end = strpos($content, '</x:xmpmeta>');
$xmp_length = $xmp_data_end - $xmp_data_start;
$xmp_data = substr($content, $xmp_data_start, $xmp_length + 12);
$xmp = simplexml_load_string($xmp_data);
Just two remarks:
XMP makes heavy use of XML namespaces, so you'll have to keep an eye on that when parsing the XMP data with some XML tools.
considering the possible size of image files, you'll perhaps not be able to use file_get_contents() as this function loads the whole image into memory. Using fopen() to open a file stream resource and checking chunks of data for the key-sequences <x:xmpmeta and </x:xmpmeta> will significantly reduce the memory footprint.

I'm only replying to this after so much time because this seems to be the best result when searching Google for how to parse XMP data. I've seen this nearly identical snippet used in code a few times and it's a terrible waste of memory. Here is an example of the fopen() method Stefan mentions after his example.
<?php
function getXmpData($filename, $chunkSize)
{
if (!is_int($chunkSize)) {
throw new RuntimeException('Expected integer value for argument #2 (chunkSize)');
}
if ($chunkSize < 12) {
throw new RuntimeException('Chunk size cannot be less than 12 argument #2 (chunkSize)');
}
if (($file_pointer = fopen($filename, 'r')) === FALSE) {
throw new RuntimeException('Could not open file for reading');
}
$startTag = '<x:xmpmeta';
$endTag = '</x:xmpmeta>';
$buffer = NULL;
$hasXmp = FALSE;
while (($chunk = fread($file_pointer, $chunkSize)) !== FALSE) {
if ($chunk === "") {
break;
}
$buffer .= $chunk;
$startPosition = strpos($buffer, $startTag);
$endPosition = strpos($buffer, $endTag);
if ($startPosition !== FALSE && $endPosition !== FALSE) {
$buffer = substr($buffer, $startPosition, $endPosition - $startPosition + 12);
$hasXmp = TRUE;
break;
} elseif ($startPosition !== FALSE) {
$buffer = substr($buffer, $startPosition);
$hasXmp = TRUE;
} elseif (strlen($buffer) > (strlen($startTag) * 2)) {
$buffer = substr($buffer, strlen($startTag));
}
}
fclose($file_pointer);
return ($hasXmp) ? $buffer : NULL;
}

A simple way on linux is to call the exiv2 program, available in an eponymous package on debian.
$ exiv2 -e X extract image.jpg
will produce image.xmp containing embedded XMP which is now yours to parse.

I know... this is kind of an old thread, but it was helpful to me when I was looking for a way to do this, so I figured this might be helpful to someone else.
I took this basic solution and modified it so it handles the case where the tag is split between chunks. This allows the chunk size to be as large or small as you want.
<?php
function getXmpData($filename, $chunk_size = 1024)
{
if (!is_int($chunkSize)) {
throw new RuntimeException('Expected integer value for argument #2 (chunkSize)');
}
if ($chunkSize < 12) {
throw new RuntimeException('Chunk size cannot be less than 12 argument #2 (chunkSize)');
}
if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
throw new RuntimeException('Could not open file for reading');
}
$tag = '<x:xmpmeta';
$buffer = false;
// find open tag
while ($buffer === false && ($chunk = fread($file_pointer, $chunk_size)) !== false) {
if(strlen($chunk) <= 10) {
break;
}
if(($position = strpos($chunk, $tag)) === false) {
// if open tag not found, back up just in case the open tag is on the split.
fseek($file_pointer, -10, SEEK_CUR);
} else {
$buffer = substr($chunk, $position);
}
}
if($buffer === false) {
fclose($file_pointer);
return false;
}
$tag = '</x:xmpmeta>';
$offset = 0;
while (($position = strpos($buffer, $tag, $offset)) === false && ($chunk = fread($file_pointer, $chunk_size)) !== FALSE && !empty($chunk)) {
$offset = strlen($buffer) - 12; // subtract the tag size just in case it's split between chunks.
$buffer .= $chunk;
}
fclose($file_pointer);
if($position === false) {
// this would mean the open tag was found, but the close tag was not. Maybe file corruption?
throw new RuntimeException('No close tag found. Possibly corrupted file.');
} else {
$buffer = substr($buffer, 0, $position + 12);
}
return $buffer;
}
?>

Bryan's solution was the best one so far, but it had a few issues so I modified it to simplify it, and remove some functionality.
There were three issues I found with his solution:
A) If the chunk extracted falls right in between one of the strings we're searching for, it won't find it. Small chunk sizes are more likely to cause this issue.
B) If the chunk contains both the start AND the end, it won't find it. This is an easy one to fix with an extra if statement to recheck the chunk that the start is found in to see if the end is also found.
C) The else statement added to the end to break the while loop if it doesn't find the xmp data has a side effect that if the start element isn't found on the first pass, it will not check anymore chunks. This is likely easy to fix too, but with the first issue it's not worth it.
My solution below isn't as powerful, but it's more robust. It will only check one chunk, and extract the data from that. It will only work if the start and end are in that chunk, so the chunk size needs to be large enough to ensure that it always captures that data. From my experience with Adobe Photoshop/Lightroom exported files, the xmp data typically starts at around 20kB, and ends at around 45kB. My chunk size of 50k seems to work nicely for my images, it would be much less if you strip some of that data on export, such as the CRS block that has a lot of develop settings.
function getXmpData($filename)
{
$chunk_size = 50000;
$buffer = NULL;
if (($file_pointer = fopen($filename, 'r')) === FALSE) {
throw new RuntimeException('Could not open file for reading');
}
$chunk = fread($file_pointer, $chunk_size);
if (($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
$buffer = substr($chunk, $posStart);
$posEnd = strpos($buffer, '</x:xmpmeta>');
$buffer = substr($buffer, 0, $posEnd + 12);
}
fclose($file_pointer);
return $buffer;
}

Thank you Sebastien B. for that shortened version :). If you want to avoid the problem, when chunk_size is just too small for some files, just add recursion.
function getXmpData($filename, $chunk_size = 50000){
$buffer = NULL;
if (($file_pointer = fopen($filename, 'r')) === FALSE) {
throw new RuntimeException('Could not open file for reading');
}
$chunk = fread($file_pointer, $chunk_size);
if (($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
$buffer = substr($chunk, $posStart);
$posEnd = strpos($buffer, '</x:xmpmeta>');
$buffer = substr($buffer, 0, $posEnd + 12);
}
fclose($file_pointer);
// recursion here
if(!strpos($buffer, '</x:xmpmeta>')){
$buffer = getXmpData($filename, $chunk_size*2);
}
return $buffer;
}

I've developped the Xmp Php Tookit extension : it's a php5 extension based on the adobe xmp toolkit, which provide the main classes and method to read/write/parse xmp metadatas from jpeg, psd, pdf, video, audio... This extension is under gpl licence. A new release will be available soon, for php 5.3 (now only compatible with php 5.2.x), and should be available on windows and macosx (now only for freebsd and linux systems).
http://xmpphptoolkit.sourceforge.net/

If you have ExifTool available (a very useful tool) and can run external commands, you can use it's option to extract XMP data (-xmp:all) and output it in JSON format (-json), which you can then easily convert to a PHP object:
$command = 'exiftool -g -json -struct -xmp:all "'.$image_path.'"';
exec($command, $output, $return_var);
$metadata = implode('', $output);
$metadata = json_decode($metadata);

There is now also a github repo you can add via composer that can read xmp data:
https://github.com/jeroendesloovere/xmp-metadata-extractor
composer require jeroendesloovere/xmp-metadata-extractor

Related

How can I read newly appended lines from a LARGE (4GB+) open file?

Using PHP 7.3, I'm trying to achieve "tail -f" functionality: open a file, waiting for some other process to write to it, then read those new lines.
Unfortunately, it seems that fgets() caches the EOF condition. Even when there's new data available (filemtime changes), fgets() returns a blank line.
The important part: I cannot simply close, reopen, then seek, because the file size is tens of gigs in size, well above the 32 bit limit. The file must stay open in order to be able to read new data from the correct position.
I've attached some code to demonstrate the problem. If you append data to the input file, filemtime() detects the change, but fgets() reads nothing new.
fread() does seem to work, picking up the new data but I'd rather not have to come up with a roll-your-own "read a line" solution.
Does anyone know how I might be able to poke fgets() into realising that it's not the EOF?
$fn = $argv[1];
$fp = fopen($fn, "r");
fseek($fp, -1000, SEEK_END);
$filemtime = 0;
while (1) {
if (feof($fp)) {
echo "got EOF\n";
sleep(1);
clearstatcache();
$tmp = filemtime($fn);
if ($tmp != $filemtime) {
echo "time $filemtime -> $tmp\n";
$filemtime = $tmp;
}
}
$l = trim(fgets($fp, 8192));
echo "l=$l\n";
}
Update: I tried excluding the call to feof (thinking that may be where the state becomes cached) but the behaviour doesn't change; once fgets reaches the original file pointer position, any further fgets reads will return false, even if more data is subsequently appended.
Update 2: I ended up rolling my own function that will continue returning new data after the first EOF is reached (in fact, it has no concept of EOF, just data available / data not available). Code not heavily tested, so use at your own risk. Hope this helps someone else.
*** NOTE this code was updated 20th June 2021 to fix an off-by-one error. The comment "includes line separator" was incorrect up to this point.
define('FGETS_TAIL_CHUNK_SIZE', 4096);
define('FGETS_TAIL_SANITY', 65536);
define('FGETS_TAIL_LINE_SEPARATOR', 10);
function fgets_tail($fp) {
// Get complete line from open file which may have additional data written to it.
// Returns string (including line separator) or FALSE if there is no line available (buffer does not have complete line, or is empty because of EOF)
global $fgets_tail_buf;
if (!isset($fgets_tail_buf)) $fgets_tail_buf = "";
if (strlen($fgets_tail_buf) < FGETS_TAIL_CHUNK_SIZE) { // buffer not full, attempt to append data to it
$t = fread($fp, FGETS_TAIL_CHUNK_SIZE);
if ($t != false) $fgets_tail_buf .= $t;
}
$ptr = strpos($fgets_tail_buf, chr(FGETS_TAIL_LINE_SEPARATOR));
if ($ptr !== false) {
$rv = substr($fgets_tail_buf, 0, $ptr + 1); // includes line separator
$fgets_tail_buf = substr($fgets_tail_buf, $ptr + 1); // may reduce buffer to empty
return($rv);
} else {
if (strlen($fgets_tail_buf) < FGETS_TAIL_SANITY) { // line separator not found, try to append some more data
$t = fread($fp, FGETS_TAIL_CHUNK_SIZE);
if ($t != false) $fgets_tail_buf .= $t;
}
}
return(false);
}
The author found the solution himself how to create PHP tail viewer for gians log files 4+ Gb in size.
To mark this question as replied, I summary the solution:
define('FGETS_TAIL_CHUNK_SIZE', 4096);
define('FGETS_TAIL_SANITY', 65536);
define('FGETS_TAIL_LINE_SEPARATOR', 10);
function fgets_tail($fp) {
// Get complete line from open file which may have additional data written to it.
// Returns string (including line separator) or FALSE if there is no line available (buffer does not have complete line, or is empty because of EOF)
global $fgets_tail_buf;
if (!isset($fgets_tail_buf)) $fgets_tail_buf = "";
if (strlen($fgets_tail_buf) < FGETS_TAIL_CHUNK_SIZE) { // buffer not full, attempt to append data to it
$t = fread($fp, FGETS_TAIL_CHUNK_SIZE);
if ($t != false) $fgets_tail_buf .= $t;
}
$ptr = strpos($fgets_tail_buf, chr(FGETS_TAIL_LINE_SEPARATOR));
if ($ptr !== false) {
$rv = substr($fgets_tail_buf, 0, $ptr + 1); // includes line separator
$fgets_tail_buf = substr($fgets_tail_buf, $ptr + 1); // may reduce buffer to empty
return($rv);
} else {
if (strlen($fgets_tail_buf) < FGETS_TAIL_SANITY) { // line separator not found, try to append some more data
$t = fread($fp, FGETS_TAIL_CHUNK_SIZE);
if ($t != false) $fgets_tail_buf .= $t;
}
}
return(false);
}

What is the best way to read last lines (i.e. "tail") from a file using PHP?

In my PHP application I need to read multiple lines starting from the end of
many files (mostly logs). Sometimes I need only the last one, sometimes I need
tens or hundreds. Basically, I want something as flexible as the Unix tail
command.
There are questions here about how to get the single last line from a file (but
I need N lines), and different solutions were given. I'm not sure about which
one is the best and which performs better.
Methods overview
Searching on the internet, I came across different solutions. I can group them
in three approaches:
naive ones that use file() PHP function;
cheating ones that runs tail command on the system;
mighty ones that happily jump around an opened file using fseek().
I ended up choosing (or writing) five solutions, a naive one, a cheating one
and three mighty ones.
The most concise naive solution,
using built-in array functions.
The only possible solution based on tail command, which has
a little big problem: it does not run if tail is not available, i.e. on
non-Unix (Windows) or on restricted environments that don't allow system
functions.
The solution in which single bytes are read from the end of file searching
for (and counting) new-line characters, found here.
The multi-byte buffered solution optimized for large files, found
here.
A slightly modified version of solution #4 in which buffer length is
dynamic, decided according to the number of lines to retrieve.
All solutions work. In the sense that they return the expected result from
any file and for any number of lines we ask for (except for solution #1, that can
break PHP memory limits in case of large files, returning nothing). But which one
is better?
Performance tests
To answer the question I run tests. That's how these thing are done, isn't it?
I prepared a sample 100 KB file joining together different files found in
my /var/log directory. Then I wrote a PHP script that uses each one of the
five solutions to retrieve 1, 2, .., 10, 20, ... 100, 200, ..., 1000 lines
from the end of the file. Each single test is repeated ten times (that's
something like 5 × 28 × 10 = 1400 tests), measuring average elapsed
time in microseconds.
I run the script on my local development machine (Xubuntu 12.04,
PHP 5.3.10, 2.70 GHz dual core CPU, 2 GB RAM) using the PHP command line
interpreter. Here are the results:
Solution #1 and #2 seem to be the worse ones. Solution #3 is good only when we need to
read a few lines. Solutions #4 and #5 seem to be the best ones.
Note how dynamic buffer size can optimize the algorithm: execution time is a little
smaller for few lines, because of the reduced buffer.
Let's try with a bigger file. What if we have to read a 10 MB log file?
Now solution #1 is by far the worse one: in fact, loading the whole 10 MB file
into memory is not a great idea. I run the tests also on 1MB and 100MB file,
and it's practically the same situation.
And for tiny log files? That's the graph for a 10 KB file:
Solution #1 is the best one now! Loading a 10 KB into memory isn't a big deal
for PHP. Also #4 and #5 performs good. However this is an edge case: a 10 KB log
means something like 150/200 lines...
You can download all my test files, sources and results
here.
Final thoughts
Solution #5 is heavily recommended for the general use case: works great
with every file size and performs particularly good when reading a few lines.
Avoid solution #1 if you
should read files bigger than 10 KB.
Solution #2
and #3
aren't the best ones for each test I run: #2 never runs in less than
2ms, and #3 is heavily influenced by the number of
lines you ask (works quite good only with 1 or 2 lines).
This is a modified version which can also skip last lines:
/**
* Modified version of http://www.geekality.net/2011/05/28/php-tail-tackling-large-files/ and of https://gist.github.com/lorenzos/1711e81a9162320fde20
* #author Kinga the Witch (Trans-dating.com), Torleif Berger, Lorenzo Stanco
* #link http://stackoverflow.com/a/15025877/995958
* #license http://creativecommons.org/licenses/by/3.0/
*/
function tailWithSkip($filepath, $lines = 1, $skip = 0, $adaptive = true)
{
// Open file
$f = #fopen($filepath, "rb");
if (#flock($f, LOCK_SH) === false) return false;
if ($f === false) return false;
if (!$adaptive) $buffer = 4096;
else {
// Sets buffer size, according to the number of lines to retrieve.
// This gives a performance boost when reading a few lines from the file.
$max=max($lines, $skip);
$buffer = ($max < 2 ? 64 : ($max < 10 ? 512 : 4096));
}
// Jump to last character
fseek($f, -1, SEEK_END);
// Read it and adjust line number if necessary
// (Otherwise the result would be wrong if file doesn't end with a blank line)
if (fread($f, 1) == "\n") {
if ($skip > 0) { $skip++; $lines--; }
} else {
$lines--;
}
// Start reading
$output = '';
$chunk = '';
// While we would like more
while (ftell($f) > 0 && $lines >= 0) {
// Figure out how far back we should jump
$seek = min(ftell($f), $buffer);
// Do the jump (backwards, relative to where we are)
fseek($f, -$seek, SEEK_CUR);
// Read a chunk
$chunk = fread($f, $seek);
// Calculate chunk parameters
$count = substr_count($chunk, "\n");
$strlen = mb_strlen($chunk, '8bit');
// Move the file pointer
fseek($f, -$strlen, SEEK_CUR);
if ($skip > 0) { // There are some lines to skip
if ($skip > $count) { $skip -= $count; $chunk=''; } // Chunk contains less new line symbols than
else {
$pos = 0;
while ($skip > 0) {
if ($pos > 0) $offset = $pos - $strlen - 1; // Calculate the offset - NEGATIVE position of last new line symbol
else $offset=0; // First search (without offset)
$pos = strrpos($chunk, "\n", $offset); // Search for last (including offset) new line symbol
if ($pos !== false) $skip--; // Found new line symbol - skip the line
else break; // "else break;" - Protection against infinite loop (just in case)
}
$chunk=substr($chunk, 0, $pos); // Truncated chunk
$count=substr_count($chunk, "\n"); // Count new line symbols in truncated chunk
}
}
if (strlen($chunk) > 0) {
// Add chunk to the output
$output = $chunk . $output;
// Decrease our line counter
$lines -= $count;
}
}
// While we have too many lines
// (Because of buffer size we might have read too many)
while ($lines++ < 0) {
// Find first newline and remove all text before that
$output = substr($output, strpos($output, "\n") + 1);
}
// Close file and return
#flock($f, LOCK_UN);
fclose($f);
return trim($output);
}
This would also work:
$file = new SplFileObject("/path/to/file");
$file->seek(PHP_INT_MAX); // cheap trick to seek to EoF
$total_lines = $file->key(); // last line number
// output the last twenty lines
$reader = new LimitIterator($file, $total_lines - 20);
foreach ($reader as $line) {
echo $line; // includes newlines
}
Or without the LimitIterator:
$file = new SplFileObject($filepath);
$file->seek(PHP_INT_MAX);
$total_lines = $file->key();
$file->seek($total_lines - 20);
while (!$file->eof()) {
echo $file->current();
$file->next();
}
Unfortunately, your testcase segfaults on my machine, so I cannot tell how it performs.
My little copy paste solution after reading all this here.
/**
* #param $pathname
* #param $lines
* #param bool $echo
* #return int
*/
private function tailonce($pathname, $lines, $echo = true)
{
$realpath = realpath($pathname);
$fp = fopen($realpath, 'r', FALSE);
$flines = 0;
$a = -1;
while ($flines <= $lines) {
fseek($fp, $a--, SEEK_END);
$char = fread($fp, 1);
if ($char == "\n") $flines++;
}
$out = fread($fp, 1000000);
fclose($fp);
if ($echo) echo $out;
return $a+2;
}
A continuous tail function as in tail -f
It does not close $fp cause you must kill it with
Ctrl-C anyway. usleep for saving your cpu time, only tested on windows so far.
/**
* #param $pathname
*/
private function tail($pathname)
{
$realpath = realpath($pathname);
$fp = fopen($realpath, 'r', FALSE);
$lastline = '';
fseek($fp, $this->tailonce($pathname, 1, false), SEEK_END);
do {
$line = fread($fp, 1000);
if ($line == $lastline) {
usleep(50);
} else {
$lastline = $line;
echo $lastline;
}
} while ($fp);
}
You need to put this code into a class!
Yet another function, you can use regexes to separate items. Usage
$last_rows_array = file_get_tail('logfile.log', 100, array(
'regex' => true, // use regex
'separator' => '#\n{2,}#', // separator: at least two newlines
'typical_item_size' => 200, // line length
));
The function:
// public domain
function file_get_tail( $file, $requested_num = 100, $args = array() ){
// default arg values
$regex = true;
$separator = null;
$typical_item_size = 100; // estimated size
$more_size_mul = 1.01; // +1%
$max_more_size = 4000;
extract( $args );
if( $separator === null ) $separator = $regex ? '#\n+#' : "\n";
if( is_string( $file )) $f = fopen( $file, 'rb');
else if( is_resource( $file ) && in_array( get_resource_type( $file ), array('file', 'stream'), true ))
$f = $file;
else throw new \Exception( __METHOD__.': file must be either filename or a file or stream resource');
// get file size
fseek( $f, 0, SEEK_END );
$fsize = ftell( $f );
$fpos = $fsize;
$bytes_read = 0;
$all_items = array(); // array of array
$all_item_num = 0;
$remaining_num = $requested_num;
$last_junk = '';
while( true ){
// calc size and position of next chunk to read
$size = $remaining_num * $typical_item_size - strlen( $last_junk );
// reading a bit more can't hurt
$size += (int)min( $size * $more_size_mul, $max_more_size );
if( $size < 1 ) $size = 1;
// set and fix read position
$fpos = $fpos - $size;
if( $fpos < 0 ){
$size -= -$fpos;
$fpos = 0;
}
// read chunk + add junk from prev iteration
fseek( $f, $fpos, SEEK_SET );
$chunk = fread( $f, $size );
if( strlen( $chunk ) !== $size ) throw new \Exception( __METHOD__.": read error?");
$bytes_read += strlen( $chunk );
$chunk .= $last_junk;
// chunk -> items, with at least one element
$items = $regex ? preg_split( $separator, $chunk ) : explode( $separator, $chunk );
// first item is probably cut in half, use it in next iteration ("junk") instead
// also skip very first '' item
if( $fpos > 0 || $items[0] === ''){
$last_junk = $items[0];
unset( $items[0] );
} // … else noop, because this is the last iteration
// ignore last empty item. end( empty [] ) === false
if( end( $items ) === '') array_pop( $items );
// if we got items, push them
$num = count( $items );
if( $num > 0 ){
$remaining_num -= $num;
// if we read too much, use only needed items
if( $remaining_num < 0 ) $items = array_slice( $items, - $remaining_num );
// don't fix $remaining_num, we will exit anyway
$all_items[] = array_reverse( $items );
$all_item_num += $num;
}
// are we ready?
if( $fpos === 0 || $remaining_num <= 0 ) break;
// calculate a better estimate
if( $all_item_num > 0 ) $typical_item_size = (int)max( 1, round( $bytes_read / $all_item_num ));
}
fclose( $f );
//tr( $all_items );
return call_user_func_array('array_merge', $all_items );
}
I like the following method, but it won't work on files up to 2GB.
<?php
function lastLines($file, $lines) {
$size = filesize($file);
$fd=fopen($file, 'r+');
$pos = $size;
$n=0;
while ( $n < $lines+1 && $pos > 0) {
fseek($fd, $pos);
$a = fread($fd, 1);
if ($a === "\n") {
++$n;
};
$pos--;
}
$ret = array();
for ($i=0; $i<$lines; $i++) {
array_push($ret, fgets($fd));
}
return $ret;
}
print_r(lastLines('hola.php', 4));
?>

PHP: Retrieving lines from the end of a large text file

I've searched for an answer for quite a while, and haven't found anything that works correctly.
I have log files, some reaching 100MB in size, around 140,000 lines of text.
With PHP, I am trying to get the last 500 lines of the file.
How would I get the 500 lines? With most functions, the file is read into memory, and that isn't a plausible case for this matter. I would preferably stay away from executing system commands.
If you are on a 'nix machine, you should be able to use shell escaping and the tool 'tail'.
It's been a while, but something like this:
$lastLines = `tail -n 500`;
notice the use of tick marks, which executes the string in BASH or similar and returns the results.
I wrote this function which seems to work quite nicely to me. It returns an array of lines just like file. If you want it to return a string like file_get_contents, then just change the return statement to return implode('', array_reverse($lines));:
function file_get_tail($filename, $num_lines = 10){
$file = fopen($filename, "r");
fseek($file, -1, SEEK_END);
for ($line = 0, $lines = array(); $line < $num_lines && false !== ($char = fgetc($file));) {
if($char === "\n"){
if(isset($lines[$line])){
$lines[$line][] = $char;
$lines[$line] = implode('', array_reverse($lines[$line]));
$line++;
}
}else
$lines[$line][] = $char;
fseek($file, -2, SEEK_CUR);
}
fclose($file);
if($line < $num_lines)
$lines[$line] = implode('', array_reverse($lines[$line]));
return array_reverse($lines);
}
Example:
file_get_tail('filename.txt', 500);
If you want to do it in PHP:
<?php
/**
Read last N lines from file.
#param $filename string path to file. must support seeking
#param $n int number of lines to get.
#return array up to $n lines of text
*/
function tail($filename, $n)
{
$buffer_size = 1024;
$fp = fopen($filename, 'r');
if (!$fp) return array();
fseek($fp, 0, SEEK_END);
$pos = ftell($fp);
$input = '';
$line_count = 0;
while ($line_count < $n + 1)
{
// read the previous block of input
$read_size = $pos >= $buffer_size ? $buffer_size : $pos;
fseek($fp, $pos - $read_size, SEEK_SET);
// prepend the current block, and count the new lines
$input = fread($fp, $read_size).$input;
$line_count = substr_count(ltrim($input), "\n");
// if $pos is == 0 we are at start of file
$pos -= $read_size;
if (!$pos) break;
}
fclose($fp);
// return the last 50 lines found
return array_slice(explode("\n", rtrim($input)), -$n);
}
var_dump(tail('/var/log/syslog', 50));
This is largely untested, but should be enough for you to get a fully working solution.
The buffer size is 1024, but can be changed to be bigger or larger. (You could even dynamically set it based on $n * estimate of line length.) This should be better than seeking character by character, although it does mean we need to do substr_count() to look for new lines.

Reading large files from end

Can I read a file in PHP from my end, for example if I want to read last 10-20 lines?
And, as I read, if the size of the file is more than 10mbs I start getting errors.
How can I prevent this error?
For reading a normal file, we use the code :
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
$i1++;
$content[$i1]=$buffer;
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}
My file might go over 10mbs, but I just need to read the last few lines. How do I do it?
Thanks
You can use fopen and fseek to navigate in file backwards from end. For example
$fp = #fopen($file, "r");
$pos = -2;
while (fgetc($fp) != "\n") {
fseek($fp, $pos, SEEK_END);
$pos = $pos - 1;
}
$lastline = fgets($fp);
It's not pure PHP, but the common solution is to use the tac command which is the revert of cat and loads the file in reverse. Use exec() or passthru() to run it on the server and then read the results. Example usage:
<?php
$myfile = 'myfile.txt';
$command = "tac $myfile > /tmp/myfilereversed.txt";
exec($command);
$currentRow = 0;
$numRows = 20; // stops after this number of rows
$handle = fopen("/tmp/myfilereversed.txt", "r");
while (!feof($handle) && $currentRow <= $numRows) {
$currentRow++;
$buffer = fgets($handle, 4096);
echo $buffer."<br>";
}
fclose($handle);
?>
It depends how you interpret "can".
If you wonder whether you can do this directly (with PHP function) without reading the all the preceding lines, then the answer is: No, you cannot.
A line ending is an interpretation of the data and you can only know where they are, if you actually read the data.
If it is a really big file, I'd not do that though.
It would be better if you were to scan the file starting from the end, and gradually read blocks from the end to the file.
Update
Here's a PHP-only way to read the last n lines of a file without reading through all of it:
function last_lines($path, $line_count, $block_size = 512){
$lines = array();
// we will always have a fragment of a non-complete line
// keep this in here till we have our next entire line.
$leftover = "";
$fh = fopen($path, 'r');
// go to the end of the file
fseek($fh, 0, SEEK_END);
do{
// need to know whether we can actually go back
// $block_size bytes
$can_read = $block_size;
if(ftell($fh) < $block_size){
$can_read = ftell($fh);
}
// go back as many bytes as we can
// read them to $data and then move the file pointer
// back to where we were.
fseek($fh, -$can_read, SEEK_CUR);
$data = fread($fh, $can_read);
$data .= $leftover;
fseek($fh, -$can_read, SEEK_CUR);
// split lines by \n. Then reverse them,
// now the last line is most likely not a complete
// line which is why we do not directly add it, but
// append it to the data read the next time.
$split_data = array_reverse(explode("\n", $data));
$new_lines = array_slice($split_data, 0, -1);
$lines = array_merge($lines, $new_lines);
$leftover = $split_data[count($split_data) - 1];
}
while(count($lines) < $line_count && ftell($fh) != 0);
if(ftell($fh) == 0){
$lines[] = $leftover;
}
fclose($fh);
// Usually, we will read too many lines, correct that here.
return array_slice($lines, 0, $line_count);
}
Following snippet worked for me.
$file = popen("tac $filename",'r');
while ($line = fgets($file)) {
echo $line;
}
Reference: http://laughingmeme.org/2008/02/28/reading-a-file-backwards-in-php/
If your code is not working and reporting an error you should include the error in your posts!
The reason you are getting an error is because you are trying to store the entire contents of the file in PHP's memory space.
The most effiicent way to solve the problem would be as Greenisha suggests and seek to the end of the file then go back a bit. But Greenisha's mecanism for going back a bit is not very efficient.
Consider instead the method for getting the last few lines from a stream (i.e. where you can't seek):
while (($buffer = fgets($handle, 4096)) !== false) {
$i1++;
$content[$i1]=$buffer;
unset($content[$i1-$lines_to_keep]);
}
So if you know that your max line length is 4096, then you would:
if (4096*lines_to_keep<filesize($input_file)) {
fseek($fp, -4096*$lines_to_keep, SEEK_END);
}
Then apply the loop I described previously.
Since C has some more efficient methods for dealing with byte streams, the fastest solution (on a POSIX/Unix/Linux/BSD) system would be simply:
$last_lines=system("last -" . $lines_to_keep . " filename");
For Linux you can do
$linesToRead = 10;
exec("tail -n{$linesToRead} {$myFileName}" , $content);
You will get an array of lines in $content variable
Pure PHP solution
$f = fopen($myFileName, 'r');
$maxLineLength = 1000; // Real maximum length of your records
$linesToRead = 10;
fseek($f, -$maxLineLength*$linesToRead, SEEK_END); // Moves cursor back from the end of file
$res = array();
while (($buffer = fgets($f, $maxLineLength)) !== false) {
$res[] = $buffer;
}
$content = array_slice($res, -$linesToRead);
If you know about how long the lines are, you can avoid a lot of the black magic and just grab a chunk of the end of the file.
I needed the last 15 lines from a very large log file, and altogether they were about 3000 characters. So I just grab the last 8000 bytes to be safe, then read the file as normal and take what I need from the end.
$fh = fopen($file, "r");
fseek($fh, -8192, SEEK_END);
$lines = array();
while($lines[] = fgets($fh)) {}
This is possibly even more efficient than the highest rated answer, which reads the file character by character, compares each character, and splits based on newline characters.
Here is another solution. It doesn't have line length control in fgets(), you can add it.
/* Read file from end line by line */
$fp = fopen( dirname(__FILE__) . '\\some_file.txt', 'r');
$lines_read = 0;
$lines_to_read = 1000;
fseek($fp, 0, SEEK_END); //goto EOF
$eol_size = 2; // for windows is 2, rest is 1
$eol_char = "\r\n"; // mac=\r, unix=\n
while ($lines_read < $lines_to_read) {
if (ftell($fp)==0) break; //break on BOF (beginning...)
do {
fseek($fp, -1, SEEK_CUR); //seek 1 by 1 char from EOF
$eol = fgetc($fp) . fgetc($fp); //search for EOL (remove 1 fgetc if needed)
fseek($fp, -$eol_size, SEEK_CUR); //go back for EOL
} while ($eol != $eol_char && ftell($fp)>0 ); //check EOL and BOF
$position = ftell($fp); //save current position
if ($position != 0) fseek($fp, $eol_size, SEEK_CUR); //move for EOL
echo fgets($fp); //read LINE or do whatever is needed
fseek($fp, $position, SEEK_SET); //set current position
$lines_read++;
}
fclose($fp);
Well while searching for the same thing, I can across the following and thought it might be useful to others as well so sharing it here:
/* Read file from end line by line */
function tail_custom($filepath, $lines = 1, $adaptive = true) {
// Open file
$f = #fopen($filepath, "rb");
if ($f === false) return false;
// Sets buffer size, according to the number of lines to retrieve.
// This gives a performance boost when reading a few lines from the file.
if (!$adaptive) $buffer = 4096;
else $buffer = ($lines < 2 ? 64 : ($lines < 10 ? 512 : 4096));
// Jump to last character
fseek($f, -1, SEEK_END);
// Read it and adjust line number if necessary
// (Otherwise the result would be wrong if file doesn't end with a blank line)
if (fread($f, 1) != "\n") $lines -= 1;
// Start reading
$output = '';
$chunk = '';
// While we would like more
while (ftell($f) > 0 && $lines >= 0) {
// Figure out how far back we should jump
$seek = min(ftell($f), $buffer);
// Do the jump (backwards, relative to where we are)
fseek($f, -$seek, SEEK_CUR);
// Read a chunk and prepend it to our output
$output = ($chunk = fread($f, $seek)) . $output;
// Jump back to where we started reading
fseek($f, -mb_strlen($chunk, '8bit'), SEEK_CUR);
// Decrease our line counter
$lines -= substr_count($chunk, "\n");
}
// While we have too many lines
// (Because of buffer size we might have read too many)
while ($lines++ < 0) {
// Find first newline and remove all text before that
$output = substr($output, strpos($output, "\n") + 1);
}
// Close file and return
fclose($f);
return trim($output);
}
As Einstein said every thing should be made as simple as possible but no simpler. At this point you are in need of a data structure, a LIFO data structure or simply put a stack.
A more complete example of the "tail" suggestion above is provided here. This seems to be a simple and efficient method -- thank-you. Very large files should not be an issue and a temporary file is not required.
$out = array();
$ret = null;
// capture the last 30 files of the log file into a buffer
exec('tail -30 ' . $weatherLog, $buf, $ret);
if ( $ret == 0 ) {
// process the captured lines one at a time
foreach ($buf as $line) {
$n = sscanf($line, "%s temperature %f", $dt, $t);
if ( $n > 0 ) $temperature = $t;
$n = sscanf($line, "%s humidity %f", $dt, $h);
if ( $n > 0 ) $humidity = $h;
}
printf("<tr><th>Temperature</th><td>%0.1f</td></tr>\n",
$temperature);
printf("<tr><th>Humidity</th><td>%0.1f</td></tr>\n", $humidity);
}
else { # something bad happened }
In the above example, the code reads 30 lines of text output and displays the last temperature and humidity readings in the file (that's why the printf's are outside of the loop, in case you were wondering). The file is filled by an ESP32 which adds to the file every few minutes even when the sensor reports only nan. So thirty lines gets plenty of readings so it should never fail. Each reading includes the date and time so in the final version the output will include the time the reading was taken.

How can I read PNG Metadata from PHP?

This is what I have so far:
<?php
$file = "18201010338AM16390621000846.png";
$test = file_get_contents($file, FILE_BINARY);
echo str_replace("\n","<br>",$test);
?>
The output is sorta what I want, but I really only need lines 3-7 (inclusively). This is what the output looks like now: http://silentnoobs.com/pbss/collector/test.php. I am trying to get the data from "PunkBuster Screenshot (±) AAO Bridge Crossing" to "Resulting: w=394 X h=196 sample=2". I think it'd be fairly straight forward to read through the file, and store each line in an array, line[0] would need to be "PunkBuster Screenshot (±) AAO Bridge Crossing", and so on. All those lines are subject to change, so I can't just search for something finite.
I've tried for a few days now, and it doesn't help much that I'm poor at php.
The PNG file format defines that a PNG document is split up into multiple chunks of data. You must therefore navigate your way to the chunk you desire.
The data you want to extract seem to be defined in a tEXt chunk. I've written the following class to allow you to extract chunks from PNG files.
class PNG_Reader
{
private $_chunks;
private $_fp;
function __construct($file) {
if (!file_exists($file)) {
throw new Exception('File does not exist');
}
$this->_chunks = array ();
// Open the file
$this->_fp = fopen($file, 'r');
if (!$this->_fp)
throw new Exception('Unable to open file');
// Read the magic bytes and verify
$header = fread($this->_fp, 8);
if ($header != "\x89PNG\x0d\x0a\x1a\x0a")
throw new Exception('Is not a valid PNG image');
// Loop through the chunks. Byte 0-3 is length, Byte 4-7 is type
$chunkHeader = fread($this->_fp, 8);
while ($chunkHeader) {
// Extract length and type from binary data
$chunk = #unpack('Nsize/a4type', $chunkHeader);
// Store position into internal array
if ($this->_chunks[$chunk['type']] === null)
$this->_chunks[$chunk['type']] = array ();
$this->_chunks[$chunk['type']][] = array (
'offset' => ftell($this->_fp),
'size' => $chunk['size']
);
// Skip to next chunk (over body and CRC)
fseek($this->_fp, $chunk['size'] + 4, SEEK_CUR);
// Read next chunk header
$chunkHeader = fread($this->_fp, 8);
}
}
function __destruct() { fclose($this->_fp); }
// Returns all chunks of said type
public function get_chunks($type) {
if ($this->_chunks[$type] === null)
return null;
$chunks = array ();
foreach ($this->_chunks[$type] as $chunk) {
if ($chunk['size'] > 0) {
fseek($this->_fp, $chunk['offset'], SEEK_SET);
$chunks[] = fread($this->_fp, $chunk['size']);
} else {
$chunks[] = '';
}
}
return $chunks;
}
}
You may use it as such to extract your desired tEXt chunk as such:
$file = '18201010338AM16390621000846.png';
$png = new PNG_Reader($file);
$rawTextData = $png->get_chunks('tEXt');
$metadata = array();
foreach($rawTextData as $data) {
$sections = explode("\0", $data);
if($sections > 1) {
$key = array_shift($sections);
$metadata[$key] = implode("\0", $sections);
} else {
$metadata[] = $data;
}
}
<?php
$fp = fopen('18201010338AM16390621000846.png', 'rb');
$sig = fread($fp, 8);
if ($sig != "\x89PNG\x0d\x0a\x1a\x0a")
{
print "Not a PNG image";
fclose($fp);
die();
}
while (!feof($fp))
{
$data = unpack('Nlength/a4type', fread($fp, 8));
if ($data['type'] == 'IEND') break;
if ($data['type'] == 'tEXt')
{
list($key, $val) = explode("\0", fread($fp, $data['length']));
echo "<h1>$key</h1>";
echo nl2br($val);
fseek($fp, 4, SEEK_CUR);
}
else
{
fseek($fp, $data['length'] + 4, SEEK_CUR);
}
}
fclose($fp);
?>
It assumes a basically well formed PNG file.
I found this problem a few days ago, so I made a library to extract the metadata (Exif, XMP and GPS) of a PNG in PHP, 100% native, I hope it helps. :) PNGMetadata
How about:
http://www.php.net/manual/en/function.getimagesize.php

Categories