ImageMagick / GraphicsMagick / libvips Images randomly corrupted - php

We are using ImageMagick for resizing/thumbnailing JPGs to a specific size. The source file is loaded via HTTP. It's working as expected, but from time to time some images are partially broken.
We already tried different software like GraphicsMagick or VIPS, but the problem is still there. It also only seems to happen if there are parallel processes. So the whole script is locked via sempahores, but it also does not help
We found multiple similar problems, but all without any solution: https://legacy.imagemagick.org/discourse-server/viewtopic.php?t=22506
We also wonder, why it is the same behaviour in all these softwares. We also tried different PHP versions. It seems to happen more often on source images with a huge dimension/filesize.
Any idea what to do here?
Example 1 Example 2 Example 3

I would guess the source image has been truncated for some reason. Perhaps something timed out during the download?
libvips is normally permissive, meaning that it'll try to give you something, even if the input is damaged. You can make it strict with the fail flag (ie. fail on the first warning).
For example:
$ head -c 10000 shark.jpg > truncated.jpg
$ vipsthumbnail truncated.jpg
(vipsthumbnail:9391): VIPS-WARNING **: 11:24:50.439: read gave 2 warnings
(vipsthumbnail:9391): VIPS-WARNING **: 11:24:50.439: VipsJpeg: Premature end of JPEG file
$ echo $?
0
I made a truncated jpg file, then ran thumbnail. It gave a warning, but did not fail. If I run:
$ vipsthumbnail truncated.jpg[fail]
VipsJpeg: Premature end of input file
$ echo $?
1
Or in php:
$thumb = Vips\Image::thumbnail('truncated.jpg[fail]', 128);
Now there's no output, and there's an error code. I'm sure there's an imagemagick equivalent, though I don't know it.
There's a downside: thumbnailing will now fail if there's anything wrong with the image, and it might be something you don't care about, like invalid resolution.

After some additional investigation we discovered that indeed the sourceimage was already damaged. It was downloaded via a vpn connection which was not stable enough. Sometimes the download stopped, so the JPG was only half written.

Related

How to validate a PDF?

I have a PDF file, generated in php (using tFDF) that opens fine in chrome, firefox and safari. But not in Edge or Adobe acrobat reader. Edge displays a white page, and Adobe reports "An error exists on this page. Acrobat may not be able to display the page correctly".
Is it possible to retrieve what that error might be? The PDF file uses some embedded fonts that might be the problem, but im not sure.
You can use Ghostscript like this (paths are pure conjectures):
# On Linux
gs -o /dev/null -sDEVICE=nullpage -dBATCH -dNOPAUSE /home/ebaars/sample.pdf
# On Windows, using gswin32
gswin32 -o nul -sDEVICE=nullpage -dBATCH -dNOPAUSE C:\Users\Eric\Desktop\Sample.pdf
Or you can use iText (that is, pdftk) and ask to, say, (un)compress the file and reparse it to another file. Meanwhile, the library will performs the checks.
You can also check out this other answer.
update
This error, "'0,686 is not an operator" -- it means that it found a number where it expected an operator. I assume by "tFDF" you mean "tcPDF"? I suspect - I might be wrong - that we're looking at a i18n error, where a number such as 2/3, which should be "0.66666", is represented by the server code with a decimal comma, making it what the PDF interpreter believes a list ("0,666").
To be sure, I would either need the PDF - I would uncompress it with iText, then rewrite 0,686 etc. as 0.686 etc., then see whether this way it works or not - or the exact PHP code that generated the file, plus the configuration of the server (to verify whether locale settings are appropriate).
My guess is that it is a library bug. Check software versions, in case it is possible to update the code and maybe get rid of the problem that way.
I have met this bug several times, since I am from Italy and "one thousand and one cent" is written here as "1.000,01" or "1'000,01" instead of "1000.01".

check if file is fully downloaded using wget

I'm using php wget to download mp4 files from another server
exec("wget -P files/ $http_url");
but I didn't find any option to check if file downloaded correctly, or not yet.
I tried to get duration file using getID3(), but it always return good value, even if file not downloaded correctly
// Check file duration
$file = $getID3->analyze($filepath);
echo $file['playtime_string']; // 15:00 always good value
there is any function to check that?
Thanks
First off I would try https instead. If the server(s) you're connecting to happen to support it, you get around this entire issue because lost bytes are usually caused by flaky hardware or bad MTU settings on a router on their network. The http connections gracefully degrade to giving you as much of the file as it could manage, whereas https connections just plain fail when they lose bytes because you can't decrypt non-intact packets.
Lazy IT people tend to get prodded to fix complete failures of https, but they get less pressure to diagnose and fix corner cases like missing bytes that only occur larger transactions over http.
If https is not available, keep reading.
An HTTP server response may include a Content-Length header indicating the number of bytes in a particular transaction.
If the header is there, you should be able to see it by running wget directly, adding the -v flag.
If it's not there, I believe wget will report Length: unspecified followed by the content-type header's value.
If it tells you (and assuming the byte count is accurate) then you can just compare the byte count of the file you got and the one in the transaction.
If the server(s) you're contacting don't provide this header, you're left with less exact methods, like finding some player that will basically play the mp3 until it ends and then see how long it took and then compare that to the length listed in the ID3 tag (which is in the very beginning of the file). You're not going to be able to get an exact match though, because the time in the tag (if it's there) is only accurate to the second, meaning half a second could be gone from the end of the file and you wouldn't know.

PHP: Save Dynamic URL Image to Disk

Having trouble capturing the following dynamic image on disk, all I get is a 1K size file
http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER
I have setup PHP cURL feature to work just fine on static imagery, but does not work for the above link. Similarly, also copy function, file_put_contents (file_get_contents)...they all work fine for static image. Plenty of references in SO for usage of these PHP functions, so I will not get into details here. Just the copy command:
copy('http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER', 'precip5.png');
Behavior is same, getting precip5.png size 760 bytes, on my windows development box and linux staging box, so can rule OS issues out. Again, all PHP functions do exactly the same thing - generate a file - but empty. Command line curl program is also generating that same junk 1K file.
So, the issue seems to be source and the best I can tell is that it is a dynamic (streaming?) image.
Ideally, I would like this be done in PHP or some command line utility like curl. I am trying to avoid adding java (imageio) dependency just for this...until I absolutely have have to go there...
I am trying to understand the nature of the beast (the image) first ;-)...
The URL you are saving produces HTML output, not the image. You are missing the parameter &print=1
http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER&print=1

How to configure logrotate with php logs

I'm running php5 FPM with APC as an opcode and application cache. As is usual, I am logging php errors into a file.
Since that is becoming quite large, I tried to configure logrotate. It works, but after rotation, php continues to log to the existing logfile, even when it is renamed. This results in scripts.log being a 0B file, and scripts.log.1 continuing to grow further.
I think (haven't tried) that running php5-fpm reload in postrotate could resolve this, but that would clear my APC cache each time.
Does anybody know how to get this working properly?
I found that "copytruncate" option to logrotate ensures that the inode doesn't change. Basically what is [sic!] was looking for.
This is probably what you're looking for. Taken from: How does logrotate work? - Linuxquestions.org.
As written in my comment, you need to prevent PHP from writing into the same (renamed) file. Copying a file normally creates a new one, and the truncating is as well part of that options' name, so I would assume, the copytruncate option is an easy solution (from the manpage):
copytruncate
Truncate the original log file in place after creating a copy,
instead of moving the old log file and optionally creating a new
one, It can be used when some program can not be told to close
its logfile and thus might continue writing (appending) to the
previous log file forever. Note that there is a very small time
slice between copying the file and truncating it, so some log-
ging data might be lost. When this option is used, the create
option will have no effect, as the old log file stays in place.
See Also:
Why we should use create and copytruncate together?
Another solution I found on a server of mine is to tell php to reopen the logs. I think nginx has this feature too, which makes me think it must be quite common place. Here is my configuration :
/var/log/php5-fpm.log {
rotate 12
weekly
missingok
notifempty
compress
delaycompress
postrotate
invoke-rc.d php5-fpm reopen-logs > /dev/null
endscript
}

Error when unzipping a group of images

I am importing public domain books from archive.org to my site, and have a php import script set up to do it. However, when I attempt to import the images and run
exec( "unzip $images_file_arg -d $book_dir_arg", $output, $status );
it will occasionally return me a $status of 1. Is this ok? I have not had any problems with the imported images so far. I looked up the man page for unzip, but it didn't tell me much. Could this possibly cause problems, and do I have to check each picture individually, or am I safe?
EDIT: Oops. I should have checked the manpage straight away. They tell us what the error codes mean:
The exit status (or error level) approximates the exit codes defined by PKWARE and takes on the following values, except under VMS:
normal; no errors or warnings detected.
one or more warning errors were encountered, but processing completed successfully anyway. This includes zipfiles where one or more files was skipped due to unsupported compression method or encryption with an unknown password.
a generic error in the zipfile format was detected. Processing may have completed successfully anyway; some broken zipfiles created by other archivers have simple work-arounds.
a severe error in the zipfile format was detected. Processing probably failed immediately.
(many more)
So, apparently some archives may have had files in them skipped, but zip didn't break down; it just did all it could.
It really should work, but there are complications with certain filenames. Are any of them potentially tricky with unusual characters? escapeshellarg is certainly something to look into. If you get a bad return status, you should be concerned because that means zip exited with some error or other. At the very least, I would suggest you log the filenames in those case (error_log($filename)) and see if there is anything that might cause problems. zip itself runs totally independantly of PHP and will do everything fine if it's getting passed the right arguments by the shell, and the files really are downloaded and ready to unzip.
Maybe you are better suited with the php integrated ziparchive-class.
http://www.php.net/manual/de/class.ziparchive.php
Especially http://www.php.net/manual/de/function.ziparchive-extractto.php it returns you TRUE, if extracting was successful, otherwise FALSE.

Categories