Making domdocument img search more lightweight - php

This grabs all images from a page, and is supposed to check if image has more width and height than 200. if so, grab the first of it. But its an expensive process, and im wondering if there are more lightweight approaches to this than using getimagesize. Does anyone know of a different approach without the use of external services like YQL etc?
if($ogimage!=''|| !empty($ogimage)){
$arrimg = $ogimage;
} else {
$imgarr = array();
foreach ($doc->getElementsByTagName('img') as $img) {
$arrimg_push = $img->getAttribute('src');
array_push($imgarr, $arrimg_push);
}
$i=0;
foreach($imgarr as $img){
list($width, $height, $type, $attr) = getimagesize($img);
if($width > 200 && $height > 200){
if($i > 0){
$arrimg = $img;
$i++;
}
}
}
}

ImageMagick's pingImage or pingImageFile will read as little of the image file as possible to get the basic attributes, which can then be accessed using getImageWidth and getImageHeight.

Related

Scrape a webpage and get images more than particular width and particular height using php

Am able to scrape the images from a website using php but I want to scrape the only first image that has height greater than 200px and width 200px. How can I get the dimensions of first image source? Here is my code..
$html_3 = file_get_contents('http://beignindian.com');
preg_match_all( '|<img.*?src=[\'"](.*?)[\'"].*?>|i',$html_3, $matches );
$main_image_1 = $matches[ 1 ][ 0 ];
You can use getimagesize function to get the image height and width. Once you get it then add if condition to execute further code.
list($width, $height) = getimagesize($main_image_1); // I am assuming that $main_image_1 has image source.
echo "width: " . $width . "<br />";
echo "height: " . $height;
if($width > 200 && $height > 200) {
// perform something here.
}
Update:
If you need to loop through all the images from a website then use following code:
$host = "http://www.beingindian.com/";
$html = file_get_contents($host);
// create new DOMDocument
$document = new DOMDocument('1.0', 'UTF-8');
// set error level
$internalErrors = libxml_use_internal_errors(true);
// load HTML
$document->loadHTML($html);
// Restore error level
libxml_use_internal_errors($internalErrors);
$images = $document->getElementsByTagName('img');
foreach ($images as $image) {
$image_source = $image->getAttribute('src');
// check if image URL is an absolute URL or relative URL
$image_url = (filter_var($image_source, FILTER_VALIDATE_URL))?$image_source:$host.$image_source;
list($width, $height) = getimagesize($image_url);
if($width > 200 && $height > 200) {
// perform something here.
}
else {
// perform something here.
}
}

PHP trying to understand how to generate thumbnail from directory

This is currently what I have. When I include it in my index.php and then call the function on pageload, I get a blank page. So something is wrong here, but I don't know what. I feel like I'm really close though. I just want to create thumbnails of images in a directory, and then show them in HTML as a list of images you can click that trigger lightboxes.
I'm still really shaky in PHP. I'm trying to wrap my head around editing images in a directory.
<?php
function buildThumbGallery(){
$h = opendir('/Recent_Additions/'); //Open the current directory
while (false !== ($curDir = readdir($h))) {
if (!file_exists('/Recent_Additions/thumbs/')) {
$thumbDir = mkdir('/Recent_Additions/thumbs/', 0777, true);
}else{
$thumbDir = '/Recent_Additions/thumbs/';
}
$width = 200;
$height = 200;
foreach ($curDir as $image) {
$filePath = $curDir."/".$image;
$genThumbImg = $image->scaleImage($width, $height, true);
$newThumb = imagejpeg($genThumbImg, $thumbDir, 100);
echo '<li> <a class="fancybox" data-fancybox-group="'.basename($curDir).'" href="'.$filePath.'" title="'.basename($curDir)." ".strpbrk(basename($filePath, ".jpg"), '-').'"><img src="'.$newThumb.'"/>'.basename($curDir).'</a>';
imagedestroy($newThumb);
}echo '</li>';
}
?>
You are doing several things wrong:
You're not closing the while loop.
Readdir already loops through a directory, your foreach is not doing anything.
You are missing quotes in your echo.
You are calling the method scaleImage on a string, I think you meant to call the function imagescale.
You're missing and misunderstanding a lot of stuff, take a look at how to create a thumbnail here: https://stackoverflow.com/a/11376379/4193448
Also see if you can enable PHP errors, getting a blank page while your code is full of errors is not really helping is it?
::EDIT::
With help from #swordbeta, I got my script working properly. Here is the code for future reference:
<?php
function buildThumbGallery(){
$curDir = "./Recent_Additions/";
$thumbsPath = $curDir."thumbs/";
if (!file_exists($thumbsPath)) {
mkdir($thumbsPath, 0777, true);
}
foreach(scandir($curDir) as $image){
if ($image === '.' || $image === '..' || $image === 'thumbs') continue;
if(!file_exists($thumbsPath.basename($image, ".jpg")."_thumb.jpg")){
// Max vert or horiz resolution
$maxsize=200;
// create new Imagick object
$thumb = new Imagick($curDir.$image); //'input_image_filename_and_location'
$thumb->setImageFormat('jpg');
// Resizes to whichever is larger, width or height
if($thumb->getImageHeight() <= $thumb->getImageWidth()){
// Resize image using the lanczos resampling algorithm based on width
$thumb->resizeImage($maxsize,0,Imagick::FILTER_LANCZOS,1);
}else{
// Resize image using the lanczos resampling algorithm based on height
$thumb->resizeImage(0,$maxsize,Imagick::FILTER_LANCZOS,1);
}
// Set to use jpeg compression
$thumb->setImageCompression(Imagick::COMPRESSION_JPEG);
$thumb->setImageCompressionQuality(100);
// Strip out unneeded meta data
$thumb->stripImage();
// Writes resultant image to output directory
$thumb->writeImage($thumbsPath.basename($image, ".jpg")."_thumb.jpg"); //'output_image_filename_and_location'
// Destroys Imagick object, freeing allocated resources in the process
$thumb->destroy();
}else{
echo '<a class="fancybox" data-fancybox-group="'.basename($curDir).'" href="'.$curDir.basename($image, "_thumb.jpg").'" title="Recent Addition - '.basename($image, ".jpg").'"><img src="'.$thumbsPath.basename($image, ".jpg")."_thumb.jpg".'"/></a>';
echo '<figcaption>'.basename($image, ".jpg").'</figcaption>' . "<br/>";
}
}
}
?>
::Original Post::
Ok, after going back and doing some more research and suggestions from #swordbeta, i've got something that works. My only issue now is I can't get the images to show in my index.php. I'll style the output in CSS later, right now I just want to see the thumbnails, and then later build them into lightbox href links:
<?php
function buildThumbGallery(){
$curDir = "./Recent_Additions/";
$thumbsPath = $curDir."/thumbs/";
if (!file_exists($thumbsPath)) {
mkdir($thumbsPath, 0777, true);
}
$width = 200;
foreach(scandir($curDir) as $image){
if ($image === '.' || $image === '..') continue;
if(file_exists($thumbsPath.basename($image)."_thumb.jpg")){
continue;
}else{
// Max vert or horiz resolution
$maxsize=200;
// create new Imagick object
$thumb = new Imagick($curDir.$image); //'input_image_filename_and_location'
// Resizes to whichever is larger, width or height
if($thumb->getImageHeight() <= $thumb->getImageWidth()){
// Resize image using the lanczos resampling algorithm based on width
$thumb->resizeImage($maxsize,0,Imagick::FILTER_LANCZOS,1);
}else{
// Resize image using the lanczos resampling algorithm based on height
$thumb->resizeImage(0,$maxsize,Imagick::FILTER_LANCZOS,1);
}
// Set to use jpeg compression
$thumb->setImageCompression(Imagick::COMPRESSION_JPEG);
$thumb->setImageCompressionQuality(100);
// Strip out unneeded meta data
$thumb->stripImage();
// Writes resultant image to output directory
$thumb->writeImage($thumbsPath.basename($image)."_thumb.jpg"); //'output_image_filename_and_location'
// Destroys Imagick object, freeing allocated resources in the process
$thumb->destroy();
}
} echo '<img src="'.$thumbsPath.basename($image)."_thumb.jpg".'" />' . "<br/>";
}
?>
At the moment, the output from the echo isn't showing anything, but the rest of the script is working properly (i.e. generating thumbnail images in a thumbs directory).
I'm guessing i'm not formatting my echo properly. This script is called in my index.php as <?php buildThumbGallery(); ?> inside a styled <div> tag.

Get the size of an image while parsing with the DOMDocument

I am parsing website (not any in particular but a variety) and saving various information from them including images. So far I am saving the image src and then going back through again to check the size of the image. What I would like to do is check the image size when I first parse the page....
Here's what I have...
$imgs = $dom->getElementsByTagName('img');
$imageArray = array();
foreach ($imgs as $img) {
$image = $this->returnProperURL($img->getAttribute('src'));
$imageSize = getimagesize($image);
$imageWidth = $imageSize[0];
$imageHeight = $imageSize[1];
if($imageWidth > $this->minImageWidth && $imageHeight > $this->minImageHeight){
$imageArray[] = $image;
}
}
instead of using the getimagesize() after I have the image src is there a way I can check the image size the first time through when I am parsing it? As I'm sure you suspect, it's taking twice as long as it should.

PHP - Get information about image (Height and Width) without loading it

Is it possible to get image information without loading the actual image with PHP? In my case I want the Height and Width.
I have this code to fetch images from a directory. I echo out the image's url and fetch it with JS.
<?php
$directory = "./images/photos/";
$sub_dirs = glob($directory . "*");
$i = 0;
$len = count($sub_dirs);
foreach($sub_dirs as $sub_dir)
{
$images = glob($sub_dir . '/*.jpg');
$j = 0;
$len_b = count($images);
foreach ($images as $image)
{
if ($j == $len_b - 1) {
echo $image;
} else {
echo $image . "|";
}
$j++;
}
if ($i == $len - 1) {
} else {
echo "|";
}
$i++;
}
?>
getImageSize() is the proper way to get this information in PHP
It does a minimal amount of work based on the type of image. For example, a GIF image's height/width are stored in a header. Very easy to access and read. So this is how the function most likely gets that information from the file. For a JPEG, it has to do a little more work, using the SOFn markers.
The fastest way to access this information would be to maintain a database of file dimensions every time a new one is uploaded.
Given your current situation. I recommend writing a PHP script that takes all of your current image files, gets the size with this function, and then inserts the info into a database for future use.
It depends what you mean by "without loading it".
The built-in getimagesize() does this.
list($w, $h) = getimagesize($filename);
You can programmatically get the image and check the dimensions using Javascript...
var img = new Image();
img.onload = function() {
alert(this.width + 'x' + this.height);
}
img.src = 'http://www.google.com/intl/en_ALL/images/logo.gif';
This can be useful if the image is not a part of the markup ;)
You can store the width'n'height information in a text file, and load it later on.
list($width, $height, $type, $attr) = getimagesize($_FILES["Artwork"]['tmp_name']);
Use this.

Choosing a thumbnail from an external link

I am trying to build a script that retrieves a list of thumbnail images from an external link, much like Facebook does when you share a link and can choose the thumbnail image that is associated with that post.
My script currently works like this:
file_get_contents on the URL
preg_match_all to match any <img src="" in the contents
Works out the full URL to each image and stores it in an array
If there are < 10 images it loops through and uses getimagesize to find width and height
If there are > 10 images it loops through and uses fread and imagecreatefromstring to find width and height (for speed)
Once all width and heights are worked out it loops through and only adds the images to a new array that have a minimum width and height (so only larger images are shown, smaller images are less likely to be descriptive of the URL)
Each image has its new dimensions worked out (scaled down proportionally) and are returned...
<img src="'.$image[0].'" width="'.$image[1].'" height="'.$image[2].'"><br><br>
At the moment this works fine, but there are a number of problems I can potentially have:
SPEED! If the URL has many images on the page it will take considerably longer to process
MEMORY! Using getimagesize or fread & imagecreatefromstring will store the whole image in memory, any large images on the page could eat up the server's memory and kill my script (and server)
One solution I have found is being able to retrieve the image width and height from the header of the image without having to download the whole image, though I have only found some code to do this for JPG's (it would need to support GIF & PNG).
Can anyone make any suggestions to help me with either problem mentioned above, or perhaps you can suggest another way of doing this I am open to ideas... Thanks!
** Edit: Code below:
// Example images array
$images = array('http://blah.com/1.jpg', 'http://blah.com/2.jpg');
// Find the image sizes
$image_sizes = $this->image_sizes($images);
// Find the images that meet the minimum size
for ($i = 0; $i < count($image_sizes); $i++) {
if ($image_sizes[$i][0] >= $min || $image_sizes[$i][1] >= $min) {
// Scale down the original image size
$dimensions = $this->resize_dimensions($scale_width, $scale_height, $image_sizes[$i][0], $image_sizes[$i][1]);
$img[] = array($images[$i], $dimensions['width'], $dimensions['height']);
}
}
// Output the images
foreach ($img as $image) echo '<img src="'.$image[0].'" width="'.$image[1].'" height="'.$image[2].'"><br><br>';
/**
* Retrieves the image sizes
* Uses the getimagesize() function or the filesystem for speed increases
*/
public function image_sizes($images) {
$out = array();
if (count($images) < 10) {
foreach ($images as $image) {
list($width, $height) = #getimagesize($image);
if (is_numeric($width) && is_numeric($height)) {
$out[] = array($width, $height);
}
else {
$out[] = array(0, 0);
}
}
}
else {
foreach ($images as $image) {
$handle = #fopen($image, "rb");
$contents = "";
if ($handle) {
while(true) {
$data = fread($handle, 8192);
if (strlen($data) == 0) break;
$contents .= $data;
}
fclose($handle);
$im = #imagecreatefromstring($contents);
if ($im) {
$out[] = array(imagesx($im), imagesy($im));
}
else {
$out[] = array(0, 0);
}
#imagedestroy($im);
}
else {
$out[] = array(0, 0);
}
}
}
return $out;
}
/**
* Calculates restricted dimensions with a maximum of $goal_width by $goal_height
*/
public function resize_dimensions($goal_width, $goal_height, $width, $height) {
$return = array('width' => $width, 'height' => $height);
// If the ratio > goal ratio and the width > goal width resize down to goal width
if ($width/$height > $goal_width/$goal_height && $width > $goal_width) {
$return['width'] = floor($goal_width);
$return['height'] = floor($goal_width/$width * $height);
}
// Otherwise, if the height > goal, resize down to goal height
else if ($height > $goal_height) {
$return['width'] = floor($goal_height/$height * $width);
$return['height'] = floor($goal_height);
}
return $return;
}
getimagesize reads only header, but imagecreatefromstring reads whole image. Image read by GD, ImageMagick or GraphicsMagic is stored as bitmap so it consumes widthheight(3 or 4) bytes, and there's nothing you can do about it.
The best possible solution for your problem is to make curl multi-request (see http://ru.php.net/manual/en/function.curl-multi-select.php ), and then one by one process recieved images with GD or any other library. And to make memory consumption a bit lower, you can store image files on disk, not in memory.
The only idea that comes to mind for your current approach (which is impressive) is to check the HTML for existing width and height attributes and skip the file read process altogether if they exist.

Categories