Parse external HTML and return images

I'm building a site that depends on bookmarklets. These bookmarklets pull the URL and a couple of other elements. However, I need to select 1 image from the page the user bookmarks. Currently I'm trying to use the PHP Simple HTML DOM Parser http://simplehtmldom.sourceforge.net/
It pulls the HTML as expected, and returns the tags as expected. However, I want to take this a step further and only return images with a min width of 40px. I know about the function getimagesize() but from what I understand, this is resource heavy. Is there a better method available to pre-process the image and achieve the results I'm looking for?
Thanks!

First check if the image HTML tag has a width attribute. If it's above 40, skip over it. As Matthew mentioned, it will get false positives where people sized down a large image to 40px wide, but that's no big deal; the point of this step is to quickly weed out the first dozen or so images that are obviously too big.
Once the script catches an image that SAYS it's under 40px wide, check the header information to deduce a general width based on the size of the file. This is faster than getimagesize because you don't have to download the image to get the info.
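function get_image_bytes($url) {
    // Request the headers as an associative array instead of relying on a fixed index
    $headers = get_headers($url, true);
    if ($headers === false) {
        return null; // request failed
    }
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['content-length'])) {
        return null; // server did not report a size
    }
    // After a redirect the value is an array; the last entry belongs to the final response
    $len = $headers['content-length'];
    return (int) (is_array($len) ? end($len) : $len);
}
// get_headers() needs a full URL, not a bare file name
$imageBytes = get_image_bytes('http://example.com/test1.jpg');
// Rough guess: a 40x80 image is about 2000 bytes (2 KB)
$cutoffSize = 2000;
if ($imageBytes !== null && $imageBytes < $cutoffSize) {
    // this is the one!
}
else {
    // it was a phoney, keep scraping
}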
Setting the cutoff at 2000 bytes will also let through images that are 100x30, which isn't good.
However, at this point you've weeded out most of the huge 800 KB files that would really slow you down, and because we know the file is under 2 KB, it's not too taxing to test this one with getimagesize() to get an accurate width.
You can tweak the process depending on how picky you are about the 40px mark. As usual, higher accuracy takes more time, and vice versa.
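As a rough sketch of the first step above (reading the declared width attribute before making any network request), something like the following could work. The function name declared_image_widths is made up for illustration, and it uses PHP's built-in DOMDocument rather than the Simple HTML DOM Parser from the question. Whether a given width means "keep" or "skip" is left to the caller.

```php
<?php
// Hypothetical helper: map each <img> src to its declared width, or null if none is declared.
function declared_image_widths(string $html): array
{
    $dom = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $widths = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src === '') {
            continue; // nothing to fetch later
        }
        $width = trim($img->getAttribute('width'));
        // Only plain integers count; "50%" or a missing attribute gives null.
        $widths[$src] = ctype_digit($width) ? (int) $width : null;
    }
    return $widths;
}

$html = '<html><body><img src="a.png" width="30"><img src="b.png"><img src="c.png" width="120"></body></html>';
foreach (declared_image_widths($html) as $src => $width) {
    echo $src, ' => ', $width === null ? 'unknown' : $width, "\n";
}
```

Images whose width comes back as unknown still need the header check or getimagesize(), since the markup says nothing about them.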

Related

separating image tag from image resizing

I currently have a function in PHP called img(), that takes at least two arguments, as such:
print img("foo.jpg", 100);
That would return something like this:
<img src='/cache/foo_XXXXX.jpg' width='100' height='76' />
So, the function takes the arguments, resizes the original file, saves it as a cached file on the server, and returns the image tag. Obviously, if the cached version already exists, it just returns that and skips the processing.
Now, this works perfectly in 99 cases out of 100, but some sites have pages that are quite image-heavy, and since the cached versions of the images expire after X number of days, the img() function then needs to recreate 50+ small thumbnails from high-resolution images. Loading such a page takes several seconds, and that's not acceptable.
So my idea is to separate this. So doing this:
print img("foo.jpg", 100);
would return this:
<img src='img.php?ARGUMENTS' width='?' height='?' />
Which means that the img() function doesn't actually handle the file at all, but rather output HTML that tells the browser to get the image from a PHP script rather than a static image file on the server.
Now, the "ARGUMENT" part would of course contain all the information that img.php needs to deal with the file in the same way that img() has dealt with it - perhaps a serialized string version of all the arguments to the function?
But the problem is the width/height values of the image tag! Since processing is moved outside the scope of the current script, it has no idea how large the resulting image would be, right? I obviously can't wait for img.php to complete and return it. I am using imagemagick for the processing (so the second argument can be 100, "50%", "<40" and whatnot) and perhaps there is a way to "dryrun" it with imagemagick - i.e. having imagemagick return the resulting size of a specific file using a specific command?
How would you solve this?

You have the list of all images somewhere in a database, right? You can add fields for the original width and height and calculate the scaled width and height while generating the HTML. You could also store the thumbnail width and height in the database.
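A minimal sketch of that calculation, assuming the original dimensions are already stored and the resize keeps the aspect ratio. The helper name scaled_dimensions and the 1000x760 original are invented for illustration; only an integer width and a percentage string are handled, and the rounding may differ by a pixel from what ImageMagick produces.

```php
<?php
// Hypothetical helper: predict the size of a resized image from the stored original size.
// $target is either an integer width in pixels or a percentage string such as "50%".
function scaled_dimensions(int $origWidth, int $origHeight, $target): array
{
    if (is_string($target) && substr($target, -1) === '%') {
        $scale = ((float) rtrim($target, '%')) / 100;
    } else {
        $scale = ((int) $target) / $origWidth;
    }
    // Rounding is an assumption; the resizing tool may round slightly differently.
    return [
        max(1, (int) round($origWidth * $scale)),
        max(1, (int) round($origHeight * $scale)),
    ];
}

// Assumed original size of 1000x760, resized to a width of 100.
list($w, $h) = scaled_dimensions(1000, 760, 100);
echo "<img src='img.php?ARGUMENTS' width='$w' height='$h' />\n";
```

With this, img() can emit the width and height attributes immediately and leave the actual resizing to img.php.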

How does the three20 gallery load images?

On my server, I have three files per image.
A thumbnail file, which is cropped to 128 by 128.
A small file, which I aspect fit to a max of 160 by 240.
A large file, which I aspect fit to a max of 960 by 540.
My method for returning these URLs to three20's gallery looks like this:
- (NSString*)URLForVersion:(TTPhotoVersion)version {
switch (version) {
case TTPhotoVersionLarge:
return _urlLarge;
case TTPhotoVersionMedium:
return _urlSmall;
case TTPhotoVersionSmall:
return _urlSmall;
case TTPhotoVersionThumbnail:
return _urlThumb;
default:
return nil;
}
}
After having logged when these various values are called, the following happens:
When the thumbnail page loads, only thumbnails are called (as expected)
When an image is tapped, the thumbnail appears, and not the small image.
After that thumbnail appears, the large image is loaded directly (without the small image being displayed).
What I desire to happen is the following
This is the same (thumbnails load as expected on the main page)
When the image is tapped, the small image is loaded first
Then after that, the large image is loaded.
Or, the following
Thumbnails
Straight to large image.
The problem with the thumb, is that I crop it so it is a square.
This means that when a thumbnail image is displayed in the main viewer (after thumb was tapped), it is oversized, and when the large image loads, it immediately scales down to fit.
That looks really bad, and to me, it would make far more sense if it loaded the thumbs in the thumbnail view, and then the small image followed by the large image in the detail view.
Does anyone have any suggestions on how to fix this?
Is the best way simply to make the thumbs the same aspect ratio?
I would appreciate any advice on this issue

Looking at the three20 source I can see that TTPhotoView loads the preview image using the following logic:
- (BOOL)loadPreview:(BOOL)fromNetwork {
if (![self loadVersion:TTPhotoVersionLarge fromNetwork:NO]) {
if (![self loadVersion:TTPhotoVersionSmall fromNetwork:NO]) {
if (![self loadVersion:TTPhotoVersionThumbnail fromNetwork:fromNetwork]) {
return NO;
}
}
}
return YES;
}
The problem is that, because your small image is on the server and not cached locally, the code skips it and uses the thumbnail for the preview.
I would suggest that your best solution would be to edit the thumbnails so that they have the same aspect ratio as the large images. This is what the developer of this class seems to have expected!

I think you have four ways to go here:
1. modify the actual loadPreview implementation from TTPhotoView so that it implements the logic you want (i.e., allowing loading the small version from the network);
2. subclass TTPhotoView and override loadPreview to the same effect as above;
3. pre-cache the small versions of your photos; i.e., modify/subclass TTThumbView so that when TTPhotoVersionThumbnail is set, it pre-caches the TTPhotoVersionSmall version; in this case, since the image is already present locally, loadPreview will find it without needing to go out to the network; as an aside, you might do the pre-caching at any time that you see fit for your app; to pre-cache the image you would create a TTButton with the proper URL (this will deal with both the TTURLRequest and the cache for you);
4. otherwise, you could do the crop on the fly from the small version to the thumbnail version by using this UIImage category; in this case you should also tweak the way your TTThumbView is drawn by overriding its imageForCurrentState method so that the cropping is applied when necessary. Again, either you modify TTThumbView directly or you subclass it; alternatively, you can define layoutSubviews in your photo view controller and modify each of your TTThumbViews there:
- (void)layoutSubviews {
    [super layoutSubviews];
    for (NSInteger i = 0; i < _thumbViews.count; ++i) {
        TTThumbView* tv = [_thumbViews objectAtIndex:i];
        [tv contentForCurrentState].image = <cropped image>;
    }
}
If you prefer not to use the private method contentForCurrentState, you could simply do this inside the loop instead:
[tv addSubview:<cropped image>];
As you can see, each option will have its pros and cons; 1 and 2 are the easiest to implement, but the small version will be loaded from the network so it could add some delay; the same holds true for 4, although the approach is different; 3 gives you the most responsive implementation (no additional delay from the network, since you pre-cache), but it is possibly the most complex solution to implement (either you download the image and cache it yourself, or use TTButton to do that for you, which is kind of not very "clean").
Anyway, hope it helps.

How to improve Image Scraping (using PHP and JS) to Imitate Facebook Previewer

I've developed an image-scraping mechanism in PHP+JS that allows a user to share URLs and get a rendered preview (very much like Facebook's previewer when you share links). However, the whole process sometimes gets slow or sometimes fetches wrong images, so in general, I'd like to know how to improve it, especially its speed and accuracy. Stuff like parsing the DOM faster or getting image sizes faster. Here's the process I'm using, for those who want to know more:
A. Get the HTML of the page using PHP (I actually use one of CakePHP's classes, which in turn uses fwrite and fread to fetch the HTML. I wonder if cURL would be significantly better).
B. Parse the HTML using DOMDocument to get the img tags, while also filtering out any "image" that is not a png, jpg, or gif (you know, sometimes people place tracking scripts inside img tags).
$DOM = new DOMDocument();
@$DOM->loadHTML($html); // $html is the string returned from step A; @ silences warnings about malformed markup
$images = $DOM->getElementsByTagName('img');
$imagesSRCs = array();
foreach ($images as $image) {
    $src = trim($image->getAttribute('src'));
    // skip anything that does not look like a png, jpg, or gif (case-insensitive)
    if (!preg_match('/\.(jpeg|jpg|png|gif)/i', $src)) {
        continue;
    }
    $src = urldecode($src);
    $src = url_to_absolute($url, $src); // custom function; $url is the link shared
    $imagesSRCs[] = $src;
}
$imagesSRCs = array_unique($imagesSRCs); // eliminates copies of the same image
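The extension filter in step B matches anywhere in the string, so a tracking URL such as track.php?f=a.gif would still pass. One possible tightening, sketched here with a made-up helper name (has_image_extension), is to look only at the path part of the URL:

```php
<?php
// Hypothetical helper: true if the path part of $src ends in a known image extension.
function has_image_extension(string $src): bool
{
    // Ignore the query string and fragment; only the path carries the file name.
    $path = parse_url($src, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path at all, or an unparseable URL
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, ['jpeg', 'jpg', 'png', 'gif'], true);
}

$candidates = [
    'http://example.com/photos/cat.JPG?size=large',
    'http://tracker.example.com/track.php?f=a.gif',
    'images/logo.png',
];
foreach ($candidates as $src) {
    echo $src, ' => ', has_image_extension($src) ? 'keep' : 'skip', "\n";
}
```

This still misses images served from extensionless URLs, so it is a cheap pre-filter rather than a guarantee.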
C. Send an array with all those images to a page that processes them using JavaScript (specifically, jQuery). This processing consists mostly of discarding images that are less than 80 pixels on their shortest side (so I don't get blank gifs, hundreds of tiny icons, etc.). Because it must calculate each image's size, I decided to use JS instead of PHP's getimagesize(), which was insanely slow. Thus, as the images get loaded by the browser, it does the following:
$('.fetchedThumb').on('load', function() {
    var smallestDim = Math.min(this.width, this.height);
    if (smallestDim < 80) {
        $(this).parent().parent().remove(); // removes container divs and below
    }
});

Rather than downloading the content like this, why not create a server-side component that uses something like wkhtmltoimage or PhantomJS to render an image of the page, and then just scale the image down to a preview size?

This is exactly why I made jQueryScrape
It's a very lightweight jQuery plugin + PHP proxy that lets you scrape remote pages asynchronously, and it's blazing fast. That demo I linked above goes to around 8 different sites and pulls in tons of content, usually in less than 2 seconds.
The biggest bottleneck when scraping this way is that the browser will try to download all referenced content (meaning images) as soon as the fetched markup is parsed into DOM nodes. To avoid this, the proxy in jQueryScrape actually breaks image tags on the server before sending the markup to the client (by changing all img tags to span tags).
The jQuery plugin then provides a span2img method that converts those span tags back to images, so the downloading of images is left to the browser and happens as the content is rendered. You can at that point use the result as a normal jQuery object for parsing and rendering selections of the remote content. See the github page for basic usage.
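To make the idea concrete, here is a minimal sketch of the server-side half. It is not jQueryScrape's actual code, and the function name disable_images is invented; it only shows how renaming the tag keeps a browser from fetching the images while the rest of the markup stays intact.

```php
<?php
// Illustrative only: this is not jQueryScrape's actual code.
// Renames <img ...> tags to <span ...> so a browser parsing the markup will not fetch them.
function disable_images(string $html): string
{
    // \b keeps tags such as <imgfoo> untouched; the i flag also matches <IMG>.
    return preg_replace('/<img\b/i', '<span', $html);
}

echo disable_images('<p><img src="photo.jpg" alt="x"></p>'), "\n";
```

The attributes survive the rename, which is what lets a client-side step such as span2img turn the spans back into images later.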

Reuse PHP image randomizer

I wrote a simple image randomizer in PHP that picks a random image from a list using the rand() function. The code works perfectly, and a random image is generated when I include it in my HTML as a picture.
The problem comes when I try to include it twice in the same HTML. A random image WILL be generated and displayed both times I include it, but it will be the same image. In other words, I get a repeated random image on my page.
An easy way to solve this is to simply copy randomizer.php, give it a new name, and include both images in the HTML. The reason I don't want to do this is that my final HTML will have about 25 pictures, and I simply feel like there should be a better way to do this. Keep in mind that I CANNOT add any PHP functions to my HTML, given that my files are hosted on different servers, and my HTML server does not support PHP.
If anyone knows of a better fix than creating 25 copies of my randomizer.php file (or creating 25 different files that include it), please let me know. I will most definitely appreciate your input!!
Thank you very, very much!!
Here's a snippet of the code:
if (count($fileList) > 0) {
    do { // keep drawing a random image until we hit one that has not been used yet in this session
        $imageNumber = rand(0, count($fileList) - 1); // get random image from fileList
        $iterations++;
    } while (!empty($_SESSION['img' . $imageNumber]) && $iterations < 200); // was "iterations" without the $
    $_SESSION['img' . $imageNumber] = true; // this image number has been displayed
    $_SESSION['shown']++; // increments the number of shown pictures in this signature
    $img = $folder . $fileList[$imageNumber];
}

It may be that the browser thinks it is the same image and is caching it. Try setting the name of the image (emit a header with Content-Disposition/filename, IIRC) and/or adding a unique random string to the end of the image URL (e.g. image.jpg?e0.6613725793930488).

My guess is that rand() either didn't reseed, or is seeded with the same value.
Have you considered calling srand() - or "the better random number generator" combination of mt_srand() and mt_rand()?
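If the retry loop in the question is part of the trouble, an alternative sketch is to pick directly from the indices that have not been used yet, using mt_rand() as suggested above. The helper name pick_unused_index and the way the used list is passed in are assumptions; in the question's setup that list would come from the session.

```php
<?php
// Hypothetical helper: pick a random index in [0, $count) that is not in $used.
// Returns null once every index has been used.
function pick_unused_index(int $count, array $used): ?int
{
    if ($count <= 0) {
        return null;
    }
    // Indices that are still available, re-indexed from 0.
    $free = array_values(array_diff(range(0, $count - 1), $used));
    if (count($free) === 0) {
        return null;
    }
    return $free[mt_rand(0, count($free) - 1)];
}

// Draw until the list is exhausted; no index repeats and no retry limit is needed.
$used = [];
while (($index = pick_unused_index(5, $used)) !== null) {
    $used[] = $index;
}
echo implode(',', $used), "\n";
```

This only prevents repeats within one list of used indices; the browser caching issue mentioned above is separate and still needs distinct URLs or cache headers.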

Use face from an image for postcard (see screenshot)

I am looking to build an app similar to santayourself (a Facebook application); see the screenshot below.
The application will accept a photo and then the users can move it with controls for zoom and rotate and moving image right, left, up and down. Then once the face is as required then the user can click save to save the image on the server (php, apache, linux).
Any recommendations on how to go about this project? I guess a javascript solution will be better. Any suggestions welcome.

JavaScript and the PHP GD library would do it; most of the things described above can be done with JavaScript alone. The fastest way to do this would be to have the Santa mask done with a transparent PNG absolutely positioned over a similarly positioned client photo, which sits in a div the same size as the mask with overflow set to hidden. Since the client photo is absolutely positioned within the div, it can be moved around and resized by the user through some mechanism as shown above. However, rotation will be painful, and here you will have to use the PHP GD library or ImageMagick (personally I would drop rotation). This is a simple-ish job but time consuming, and the user interface for image manipulation is tricky. If the output is for print-from-screen, I would not bother with further server-side manipulation, but rather just store the image-to-mask positional relationship (about half a KB of data)...

yep. javascript is the way to go about interactive things like this. I can see this easily being done with a simple script and some PNGs (though you might have to do something creative for the rotation). PHP would only be needed for saving.
EDIT: Actually, now that I think of it, an HTML 5 canvas approach would be best. It's got lots of transformation and pixel-manipulation methods, and can even save the image client-side! Remember, though, that HTML 5 canvas is not supported in every browser (it works in basically everything except IE).
(HTML 5 Canvas Spec)
The drawImage method is what you're looking for:
(I quote from spec)
void drawImage(in HTMLImageElement image, in float dx, in float dy, in optional float dw, in float dh);
So, your HTML would have a canvas element that draws the user's picture:
<canvas id="canvasElement" width="xx px" height="xx px">
<!-- What to display in browsers that don't support canvas -->
<p>Your browser doesn't support canvas</p>
</canvas>
Then, your javascript:
var view;
var context;
var userPhoto = new Image();
userPhoto.src = "uploaded.jpg";
// Update these with UI settings
var position = {x: 0, y: 0};
var scale = 1;
var rotation = 0; // in degrees
function init() {
    // Run this once at the loading of your page
    view = document.getElementById("canvasElement");
    context = view.getContext("2d");
}
function update() {
    // Run this every time you want the picture size, position, rotation updated
    context.clearRect(0, 0, view.width, view.height);
    // Save the untransformed state so scale and rotation do not accumulate across calls
    context.save();
    // Scale X and Y
    context.scale(scale, scale);
    // Rotate (convert degrees to radians)
    context.rotate(rotation * Math.PI / 180);
    // Draw the image at X and Y
    context.drawImage(userPhoto, position.x, position.y);
    context.restore();
}
HTML 5 Canvas is very powerful, so there's tons of other things you can do to your image if you go this direction. However, another viable solution would be to use flash, which is supported everywhere — but I recommend HTML 5 as it is the way of the future (Steve Jobs: Thoughts on Flash).
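For the saving step on the PHP side, one possible sketch is below. It assumes the page sends the result of canvas.toDataURL('image/png') to the server as a string; the function name decode_png_data_url is invented, and the demo string is placeholder text rather than a real image.

```php
<?php
// Hypothetical helper: turn a base64 PNG data URL (as produced by canvas.toDataURL('image/png'))
// into raw bytes. Returns null if the input is not in that form.
function decode_png_data_url(string $dataUrl): ?string
{
    $prefix = 'data:image/png;base64,';
    if (strncmp($dataUrl, $prefix, strlen($prefix)) !== 0) {
        return null; // not a PNG data URL
    }
    // Strict mode rejects characters outside the base64 alphabet.
    $bytes = base64_decode(substr($dataUrl, strlen($prefix)), true);
    return $bytes === false ? null : $bytes;
}

// Demo with placeholder bytes instead of a real image.
$example = 'data:image/png;base64,' . base64_encode('not a real image');
var_dump(decode_png_data_url($example) === 'not a real image');
```

In a real handler the decoded bytes should be validated as an image (for example with getimagesizefromstring()) before being written to disk with file_put_contents().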

Have a look at the jCrop library (jQuery), you may be able to tweak it enough to do what you want to do.
http://deepliquid.com/content/Jcrop.html (they obviously supply a few demos)
