The following code retrieves an image and saves it to a local folder. A .jpg file is indeed saved to local disk, at around 40 KB (which seems correct). But when I put the local path in an img tag, the file does not display.
Firebug > Inspect Element shows a size of 0 x 0, and I'm unable to view the image when I save it to my desktop.
file_put_contents(), file_get_contents() and getimagesize() don't return failures, and $url IS a valid image. The problem is just saving it locally; the file seems to be corrupt. How come?
$url = $image->request_url; // the image generated on the remote server
//print_r(getimagesize($url)); die;

// path to our local cache folder + unique filename
$img = 'thumbalizr/cache/screenshot_' . $row['id'] . '.jpg';

if (!$captured_file = file_get_contents($url)) {
    die('file could not be retrieved');
} elseif (!file_put_contents($img, $captured_file, FILE_APPEND)) {
    // write the image to our local cache
    die('file could not be written to local disk');
}
"Are you sure the path is correct? Have you tried an absolute path?" YES
"Have you checked that the image is downloaded correctly, perhaps with another utility (e.g. ftp, diff)?" I can download the img via ftp but it does not open on my local computer either.
"What do you get if you call the URL directly in the browser?" FF just prints out the URL instead of showing the image
"Why are you using FILE_APPEND? if the target already exists, this writes to the end, which will naturally give you a corrupt image" I removed FILE_APPEND, no difference
"source and final extension are the same?" Yes I tried with jpg, jpeg and png - no difference
"First of all, example code is wrong. Can't use $capture_file in file_put_content because that variable is not defied becouso of if else if block logic." - WRONG, that code does run!
"Can you look into the image file" - no! Although the file has a realistic file size and I can download it, it's impossible to open it.
First off, check the files you've been downloading in a text editor to see whether you're getting HTML error pages instead of binary image data.
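A minimal sketch of that check, reusing $img from your snippet (the magic numbers are standard: JPEGs start with the bytes 0xFF 0xD8, while HTML error pages start with '<'):

$head = file_get_contents($img, false, null, 0, 2); // read the first two bytes
if ($head === "\xFF\xD8") {
    echo "looks like a JPEG\n";
} elseif ($head !== false && $head[0] === '<') {
    echo "looks like HTML, not an image\n";
}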
Second, I would use cURL for this, as it provides better success/error information. Here's your example modified to use it:
// path to our local cache folder + unique filename
$img_path = 'thumbalizr/cache/screenshot_' . $row['id'] . '.jpg';

$c = curl_init();
curl_setopt($c, CURLOPT_URL, $image->request_url);
curl_setopt($c, CURLOPT_HEADER, 0);
curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($c, CURLOPT_AUTOREFERER, true);
curl_setopt($c, CURLOPT_BINARYTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($c, CURLOPT_FORBID_REUSE, true);

// curl can write the file out for you, but CURLOPT_FILE expects an
// open file handle, not a path string
$fp = fopen($img_path, 'wb');
curl_setopt($c, CURLOPT_FILE, $fp);

// You can get more elaborate success/error info from
// curl_getinfo(): http://www.php.net/manual/en/function.curl-getinfo.php
if (!curl_exec($c)) {
    die('file could not be retrieved');
}
curl_close($c);
fclose($fp);
Related
So I found the perfect example of a page I would like to download images from: http://www.habbo.com/habbo-imaging/avatarimage?figure=ch-215-110.hd-180-7.lg-275-110.hr-893-61&direction=3&head_direction=3&headonly=1&gesture=sml&size=1
Now, when you save the image to your desktop, it reads as a PNG file, but I am trying to save it using PHP and I want it saved as a GIF.
What I have so far is:
$ch = curl_init('https://www.habbo.com/habbo-imaging/avatarimage?figure=hr-125&direction=3&head_direction=3&headonly=1&gesture=sml&size=1');
$fp = fopen('game/c_images/badges/' . $badge_id . '.gif', 'wb');
curl_setopt($ch, CURLOPT_FILE, $fp);  // write the response straight to $fp
curl_setopt($ch, CURLOPT_HEADER, 0);  // don't write the HTTP headers into the file
curl_exec($ch);
curl_close($ch);
fclose($fp);
I managed to get the permissions working and an image saves, but it's just the file name; the photo isn't there. So I'm guessing it has something to do with me saving a PNG file as a GIF. Surely there's something I'm missing.
Yes, you can save the image with plain PHP file functions, for example:

$new_image_name = '2018_' . mt_rand();
$base_url = 'https://www.habbo.com/habbo-imaging/avatarimage?figure=hr-125&direction=3&head_direction=3&headonly=1&gesture=sml&size=1';
$img = "./image_path/$new_image_name.jpg";
// use file_get_contents(), not file(): file() splits binary data into
// an array of "lines" and can corrupt the image
file_put_contents($img, file_get_contents($base_url));
$local_image_url = "http://example.org/test/path_to_image/$new_image_name.jpg";
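Note that renaming the file to .gif does not change the encoding: the server sends PNG bytes regardless of the extension you save under. If you genuinely need a GIF, a minimal sketch using the GD extension (assuming it is installed; $badge_id is from your question) would be:

// re-encode the downloaded PNG as a real GIF with GD
$png = imagecreatefromstring(file_get_contents($base_url));
if ($png === false) {
    die('downloaded data is not a valid image');
}
imagegif($png, 'game/c_images/badges/' . $badge_id . '.gif');
imagedestroy($png);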
I was pulling images from a URL when I ran into something I had never seen before. The header check returned a 403 error, and although the image extensions were listed as .jpg, they were returned as application/octet-stream, and checking the content type returned text/html.
I have read that a 403 is "typically" meant to prevent screen scraping, but this is just on the images.
I found it odd that I could view the source of the web page, see the image src, and click on it to get the image in the browser, but not retrieve it via code.
Is there a way to convert the image URL into an actual image? I eventually want to pull height, width and size info from the images and save them to a folder on my server.
$html = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
    $image_src = $tag->getAttribute('src');
    print_r(get_headers($image_src, 1)); // returns a 403 Forbidden error
    echo image_type_to_mime_type(exif_imagetype($image_src)); // returns application/octet-stream
    $i = getimagesize($image_src);
    var_dump($i); // returns bool(false)

    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_CUSTOMREQUEST, 'HEAD');
    curl_setopt($c, CURLOPT_HEADER, 1);
    curl_setopt($c, CURLOPT_NOBODY, true);
    curl_setopt($c, CURLOPT_URL, $image_src);
    curl_exec($c);
    echo $content_type = curl_getinfo($c, CURLINFO_CONTENT_TYPE); // returns text/html
}
In my experience, getting application/octet-stream when you expect an image MIME type (image/jpeg, image/png, etc.) means the script could not process the image correctly, sometimes due to PHP configuration (for example, an image bigger than the max file upload or POST size reports a MIME of octet-stream).
To use file_get_contents() on a URL, you need to ensure that allow_url_fopen is enabled, so that fopen() is allowed to read a URL as though it were a local file (see the PHP INI setting allow_url_fopen).
Alternatively, look at using cURL to download the URL and go from there. Try both the config change and the cURL approach to see if they yield the same results.
However, the fact that you are getting a 403 error suggests that something on the remote side is not allowing you to retrieve the images through your specific request. As you correctly identified, this could be an attempt to stop scraping; one common trigger is a request that lacks a browser-like User-Agent or Referer header, as in the sketch below. Have you tried grabbing images from a different website, or from a server under your control?
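A hedged sketch of a request that mimics a browser; the header values and the output path are assumptions, and $url is the page the img tags came from:

$c = curl_init($image_src);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
// many hosts return 403 to requests without a browser-like User-Agent
curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)');
curl_setopt($c, CURLOPT_REFERER, $url); // the page that embedded the image
$data = curl_exec($c);
if ($data !== false && curl_getinfo($c, CURLINFO_HTTP_CODE) == 200) {
    file_put_contents('downloaded.jpg', $data); // hypothetical output path
}
curl_close($c);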
Hope something here helps :)
I've written a script that searches through existing legal case dockets for things like "motion to intervene" and "motion to compel". If the regular expression returns true, it then looks to see whether there is a scanned image of the document online for public use. That image is a TIFF file, but not an ordinary TIFF file. Here is a link to an example of what I'm trying to copy to my own server:
http://www.oscn.net/applications/oscn/getimage.tif?submitted=true&casemasterid=2565129&db=OKLAHOMA&barcode=1012443256
(You get an error page if you request http://www.oscn.net/applications/oscn/getimage.tif without the query string.)
It is a TIFF file, but a dynamically generated one. I've tried fopen(), cURL, etc. without success. I've used these functions with JPG images from random sites just to make sure my server allows this type of thing, and it worked.
I don't have PDFlib installed on the server (I checked PEAR and it's not available there either, though I'm not 100% sure that's where it would be). My host uses cPanel and the server runs Apache. I'm not sure where else to look for a solution to this problem.
I've seen some solutions that used PDFlib, but each of those grabbed a normal TIFF image, not one that was dynamically created. My thought, though, is that it shouldn't matter: if I can get the image data to stream, shouldn't I be able to use fopen() and write or buffer that data into my own .tif file?
Thanks for any input, and Happy Thanksgiving!
UPDATE: The issue wasn't with cURL, it was with the URL I scraped and passed to cURL. When I printed the $url to the screen it looked right, but it wasn't: somewhere & had been turned into &amp;, which threw off cURL because it was fetching an invalid URL (invalid at least according to the remote server hosting the TIF file).
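If you hit the same thing, decoding HTML entities in the scraped URL before fetching it is one hedged fix ($scraped_url is a placeholder for whatever your scraper extracted):

// '&amp;' in page source must become '&' in the actual request URL
$url = html_entity_decode($scraped_url, ENT_QUOTES | ENT_HTML5);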
For those of you finding this later, here is the script that works perfectly.
//*******************************************************************************
$url = 'http://www.oscn.net/applications/oscn/getimage.tif';
$url .= '?submitted=true&casemasterid=2565129&db=OKLAHOMA&barcode=1016063497';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // set the url
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // get the transfer as a string, rather than output it directly
print "Attempting to fetch file...\n";
$img = curl_exec($ch); // get the image
curl_close($ch); // free the handle; $img already holds the bytes
//I used the time() so that in testing I would know when a new file was created rather than always overwriting the old file. This will be changed for final version
if($img){
$fh = fopen('oscn_docs/' . time(). '.tif', 'w'); // this will simply overwrite the file. If that's not what you want to do, you'll have to change the 'w' argument!
if($fh){
$byteswritten = fwrite($fh, $img);
fclose($fh);
}else{
print "Unable to open file.\n";
}
}else{
print "Unable to fetch file.\n";
}
print "Done.\n";
exit(0);
//*******************************************************************************
I want to use thumbnails on my website, which is essentially a directory of websites.
I've been thinking of saving URL thumbnails into a certain directory.
Example:
I'm going to use a free website thumbnail service that gives me code to show the thumbnail image of any URL, as follows:
<img src='http://thumbnails_provider.com/code=MY_ID&url=ANY_SITE.COM'/>
This shows the thumbnail of ANY_SITE.COM.
I want to save the generated thumbnail image into a certain directory, my_site.com/thumbnails.
Why am I doing this?
My database table is like my_table {id, url, image}. I'm going to give the image thumbnail a random name, store the new name in my_table related to its URL, and then I can call it back anytime. I know how to do all of that, but I don't know how to save the image into a certain directory.
Any help is appreciated, thanks!
Using cURL should work for you:

$file = 'the URL';
$ch = curl_init($file);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
$rawdata = curl_exec($ch);
curl_close($ch);

$fullpath = 'path to destination';
$fp = fopen($fullpath, 'wb'); // fopen() requires a mode; 'wb' = binary write
fwrite($fp, $rawdata);
fclose($fp);
You could use cURL to fetch the remote image and have it save the file for you by opening a local file and passing the handle via CURLOPT_FILE (it takes an open handle, not a path string). The id could be something simple like a hash of the original URL. Obviously you'd have to check that the directories exist before you save the file (using is_dir(), and creating them with mkdir() if they don't), as in the sketch below.
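A minimal sketch of that approach; the provider URL and the thumbnails path are assumptions taken from your question:

// hash of the source URL gives a stable, unique local file name
$url = 'http://thumbnails_provider.com/code=MY_ID&url=ANY_SITE.COM';
$dir = __DIR__ . '/thumbnails';
if (!is_dir($dir)) {
    mkdir($dir, 0755, true); // create the directory tree if it is missing
}
$path = $dir . '/' . md5($url) . '.jpg';

$ch = curl_init($url);
$fp = fopen($path, 'wb'); // CURLOPT_FILE wants an open handle
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
curl_close($ch);
fclose($fp);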
The following code transfers an image that is created on the fly from a server to a client site using cURL. It stopped working recently, and I have not been able to find out what the problem is:
// get_image.php
ob_start();
// create a new CURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, 'url/to/image.php');
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// set timeouts
set_time_limit(30);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
// open a stream for writing ($fileDestination is defined elsewhere in the client code)
$outFile = fopen($fileDestination, 'wb');
curl_setopt($ch, CURLOPT_FILE, $outFile);
// grab file from URL
curl_exec($ch);
fclose($outFile);
// close CURL resource, and free up system resources
curl_close($ch);
ob_end_clean();
//image.php
/*
* Create image based on client site ...
*/
$filePath = 'path/to/image.png';
$imageFile = file_get_contents($filePath);
header('Content-Type: image/png');
echo $imageFile;
unlink($filePath);
The file get_image.php is located in a client site and calls the file image.php located in my server.
After running this code, the image on the client site is about 7 bytes larger than the original; these bytes seem to be line breaks. After debugging for several hours, I found that the bytes are added when I echo $imageFile. If the 7 bytes are manually removed from the resulting image, it displays correctly.
No errors or exceptions are thrown, and the image is created on the server with no issues. The only output in FF is "The image 'url/to/image.php' cannot be displayed, because it contains errors".
I am not sure what is causing this. Help is greatly appreciated.
UPDATE:
http://files.droplr.com/files/38059844/V5Jd.Screen%20shot%202011-01-12%20at%2012.17.53%20PM.png
http://files.droplr.com/files/38059844/QU4Z.Screen%20shot%202011-01-12%20at%2012.23.37%20PM.png
Some things to check:
- That both files are stored without BOMs.
- That '<?php' are the first five characters and '?>' the last two in both files.
- That when you remove the ob_start() and ob_end_clean() calls, no error messages appear.
- If you put the unlink() before the generation, you can see the generated file; check that it is valid.
You might want to adopt the practice of leaving the final ?> off the end of your PHP files. It isn't necessary, and it can cause exactly this kind of problem when whitespace or newlines follow the closing delimiter.
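For example, image.php could be rewritten like this; a minimal sketch, where the Content-Length header is an addition that protects against trailing junk (leading whitespace, before <?php or from a BOM, still has to be removed):

<?php
// image.php: no closing ?> tag, so trailing whitespace cannot leak
// into the binary output
$filePath = 'path/to/image.png';
$imageFile = file_get_contents($filePath);
header('Content-Type: image/png');
header('Content-Length: ' . strlen($imageFile));
echo $imageFile;
unlink($filePath);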