Fetching images from Google using DOM - PHP

I want to fetch images from Google using PHP, so I looked for help online and found a script that does what I need, but it shows this fatal error:
Fatal error: Call to a member function find() on a non-object in C:\wamp\www\nq\qimages.php on line 7
Here is my script:
<?php
include "simple_html_dom.php";

$search_query = "car";
$search_query = urlencode($search_query);
$html = file_get_html("https://www.google.com/search?q=$search_query&tbm=isch");
$image_container = $html->find('div#rcnt', 0);
$images = $image_container->find('img');
$image_count = 10; // number of images to be shown
$i = 0;
foreach ($images as $image) {
    if ($i == $image_count) break;
    $i++;
    // do whatever you want with the image here (the image element is '$image'):
    echo $image;
}
?>
I am also using Simple HTML DOM.

Look at my example, which works and gets the first image from the Google results:
<?php
$url = "https://www.google.hr/search?q=aaaa&biw=1517&bih=714&source=lnms&tbm=isch&sa=X&ved=0CAYQ_AUoAWoVChMIyKnjyrjQyAIVylwaCh06nAIE&dpr=0.9";
$content = file_get_contents($url);
libxml_use_internal_errors(true); // collect parse warnings from real-world HTML instead of printing them
$dom = new DOMDocument;
@$dom->loadHTML($content);
$images_dom = $dom->getElementsByTagName('img');
foreach ($images_dom as $img) {
    if ($img->hasAttribute('src')) {
        $image_url = $img->getAttribute('src');
    }
    break; // only the first <img> is wanted
}
// this is the first image on the url
echo $image_url;

This error usually means that $html isn't an object.
It's odd that you say this seems to work. What happens if you output $html? I'd imagine that the URL isn't available and that $html is null.
Edit: Looks like this may be an error in the parser. Someone has submitted a bug and added a check in his code as a workaround.
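For reference, here is a minimal sketch of that kind of guard, assuming the usual simple_html_dom behavior (file_get_html() returns false when the fetch fails, and find() with an index returns null when the selector matches nothing):

<?php
include "simple_html_dom.php";

$html = file_get_html("https://www.google.com/search?q=car&tbm=isch");
if ($html === false) {
    die("Could not fetch the search page.");
}

// find() with an index returns null when nothing matches,
// so check before chaining another find() on the result.
$image_container = $html->find('div#rcnt', 0);
if ($image_container === null) {
    die("div#rcnt was not found in the fetched HTML.");
}

foreach ($image_container->find('img') as $image) {
    echo $image;
}
?>

With a guard like this you at least get a readable message instead of the "member function find() on a non-object" fatal error.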

Related

PHP: Why is my code not entering the foreach loop on line 20? file_get_contents() doesn't seem to be working

This screenshot shows that the URL is getting stored in $url
This screenshot shows that after I add echo $html to the code, it says "undefined variable $url" and "file_get_contents(): filename cannot be empty".
Also, I have tried almost everything that's on Stack Overflow, including file_get_html() and cURL. Nothing seems to work. Please tell me where I'm going wrong here.
<?php
include_once('simple_html_dom.php');

$base_url = "https://www.instagram.com/";
$html = "";
if (isset($_POST['username'])) {
    // concatenate $base_url and the username to build the full URL
    $url = $base_url . htmlspecialchars($_POST['username']) . "/";
}
$html = file_get_contents($url); // fetch the URL stored in $url
$doc = new DOMDocument;
@$doc->loadHTML($html); // parse the HTML returned by file_get_contents
$tags = $doc->getElementsByTagName('img');
$arr = (array)$tags;
if (empty($arr)) {
    echo 'emptyarray';
}
foreach ($tags as $tag) {
    echo $tag->getAttribute('src');
}
?>
Edit:
If 'http://stackoverflow.com/questions' is used instead of 'https://www.instagram.com/its_kushal_here', file_get_contents() works fine and does not fail.
When you refreshed the page, did you make sure your POST parameters carried through to the new request?
The issue seems to be here
if (isset($_POST['username'])) {
    $url = $base_url . htmlspecialchars($_POST['username']) . "/";
}
If $_POST['username'] is not set, then $url will never be defined. Also remove the @ from @$doc->loadHTML($html); so you can see the errors it outputs. That will help you work out what fails after that point.
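A minimal sketch of that fix, keeping the variable names from the question (urlencode() is swapped in for htmlspecialchars() here, since the value ends up in a URL rather than in HTML output):

<?php
$base_url = "https://www.instagram.com/";
if (!isset($_POST['username'])) {
    die('No username posted - submit the form first.');
}
$url = $base_url . urlencode($_POST['username']) . "/";

$html = file_get_contents($url); // $url is now guaranteed to be set
if ($html === false) {
    die('file_get_contents() failed for ' . htmlspecialchars($url));
}

$doc = new DOMDocument;
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
foreach ($doc->getElementsByTagName('img') as $tag) {
    echo $tag->getAttribute('src'), "\n";
}
?>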

Getting Notice: Undefined offset error: 1 when using file_get_contents and explode

I am trying to figure out some things about getting data from an external page using the PHP file_get_contents function.
This is the PHP code I am trying to get to work:
$url = 'http://www.controller.com/listings/aircraft/for-sale/list/category/3/jet-aircraft/manufacturer/cessna/model/citation-mustang';
$content = file_get_contents($url);
$first_step = explode('<div class="listing">',$content);
$second_step = explode("</div>",$first_step[1]);
echo $second_step[0];
It's a simple piece of code meant to echo the content of the divs with class 'listing' on a page. For one reason or another, I keep getting the notice
Notice: Undefined offset: 1
and can't figure out a way to fix it. When I turn off error reporting, it just returns an empty page. I have read that it has something to do with empty arrays, but I'm not sure how to fix it.
Thanks in advance!
You can get elements by class name using DOMDocument:
$url = 'http://www.controller.com/listings/aircraft/for-sale/list/category/3/jet-aircraft/manufacturer/cessna/model/citation-mustang';
$content = file_get_contents($url);
$doc = new DOMDocument();
if (!$doc->loadHTML($content)) {
    die('error');
}
$xpath = new DOMXPath($doc);
$class = 'listing';
$divs = $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]");
// $divs contains every div with "listing" in its class
// you can read the content like this:
foreach ($divs as $div) {
    echo $div->nodeValue;
    // or
    echo $div->textContent;
}
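If you would rather keep the explode() approach from the question, the same defensive idea applies there: explode() only produces index 1 when the delimiter actually occurs in the fetched page, so check before using it. A sketch, assuming the markup really contains <div class="listing">:

$content = file_get_contents($url);
$first_step = explode('<div class="listing">', $content);
if (count($first_step) < 2) {
    die('No <div class="listing"> found - the markup may have changed.');
}
$second_step = explode("</div>", $first_step[1]);
echo $second_step[0];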
More info in this Stack Overflow question: Get all elements by class name using DOMDocument

Scraping Thumbnail from NYTimes

My scraping code works for just about every site I've come across while testing... except for nytimes.com articles. I use AJAX with the following PHP code (I've left out some details to focus on my specific problem):
$link = "http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp";
$article = new DOMDocument;
$article->loadHTMLFile($link);
// generate image array
$images = $article->getElementsByTagName("img");
foreach ($images as $image) {
    $source = $image->getAttribute("src");
    echo '<img src="' . $source . '" alt="alt"><br><br>';
}
My problem is that the main images on nytimes.com pages don't even seem to get picked up by getElementsByTagName(). Pinterest finds a way to scrape the main images from this site, for example: http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp whereas I cannot. Any suggestions?
OK, so this is what I tried so far, as I found your question interesting.
When I run this in the browser console using jQuery, I do get results for the images. My query was:
var a = new Array();
$('img[src]').each(function() { a.push($(this).attr('src')); });
console.log(a);
Also see the screenshot of the results.
Note that console.log(arrayname) works in the Chrome browser.
So ideally your code should work. Please consider adding an is_null check like I've done.
Below is the code where I try loading the URL using a different approach (perhaps a better one, too) to get to the root cause of why you only get a single image, the NYT logo.
A screenshot of the resulting HTML is attached.
<?php
$html = file_get_contents("http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp");
echo $html;
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->recover = true;
@$doc->loadHTML("<html><body>" . $html . "</body></html>");
$xpath = new DOMXpath($doc);
$images = $xpath->query("//*/img");
if (!is_null($images)) {
    echo $images->length; // sizeof() does not count a DOMNodeList
    foreach ($images as $image) {
        $source = $image->getAttribute('src');
        echo '<img src="' . $source . '" alt="alt"><br><br>';
    }
}
?>
You can't get the content via the feed unless you are authenticated.
You can try:
- using the context parameter in the file_get_contents method (see the sketch below);
- consuming the RSS/ATOM feeds of the article;
- downloading the page as HTML and then loading it with file_get_contents. PS: this works.
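For the first suggestion, here is a minimal sketch of passing a context to file_get_contents(); the User-Agent string is just an example value, the point being that some sites serve different (or no) markup to PHP's default agent:

$link = "http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp";
$context = stream_context_create(array(
    'http' => array(
        'method' => 'GET',
        'header' => "User-Agent: Mozilla/5.0 (compatible; ExampleScraper/1.0)\r\n",
    ),
));
$html = file_get_contents($link, false, $context);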

Regex Replacement Dependent On Class

I have the following code that replaces all <img> tags on a page and adds the nCode image resizer to them. The code is as follows:
function ncode_the_content($content) {
    return preg_replace("/<img([^`|>]*)>/im", "<img onload=\"NcodeImageResizer.createOn(this);\"$1>", $content);
}
What I need to do is make it so that if an image has the class "noresize", the replacement is not applied to it.
I have only managed to get it so that if the "noresize" class appears anywhere on the page, it stops resizing all images instead of just the one with that class.
Any suggestions?
UPDATE:
Am I even remotely in the right ballpark with this?
function ncode_the_content($content) {
    // Load the HTML page
    $html = file_get_contents($content);
    // Parse it. Here we use loadHTML as a static method
    // to parse the HTML and create the DOM object in one go.
    @$dom = DOMDocument::loadHTML($html);
    // Init the XPath object
    $xpath = new DOMXpath($dom);
    // Query the DOM
    $linksnoresize = $xpath->query('//img[@class = "noresize"]');
    $links = $xpath->query('//img');
    // Display the results as in the previous example
    foreach ($links as $link) {
        echo $link->getAttribute('onload'), 'NcodeImageResizer.createOn(this);';
    }
    foreach ($linksnoresize as $link) {
        echo $link->getAttribute('onload'), '';
    }
}
Here's some untested code:
$dom = DOMDocument::loadHTML($content);
$images = $dom->getElementsByTagName("img");
foreach ($images as $image) {
    if (!strstr($image->getAttribute("class"), "noresize")) {
        $image->setAttribute("onload", "NcodeImageResizer.createOn(this);");
    }
}
$content = $dom->saveHTML(); // write the modified DOM back to a string
But, if it were me, I would eschew any such inline event handler and instead just find the appropriate elements with JavaScript.
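If you do want to stay with the regex route from the question, a preg_replace_callback() version can skip tags that carry the class. This is only a sketch: the strpos() test is a blunt check that assumes "noresize" appears nowhere in an <img> tag except as a class name:

function ncode_the_content($content) {
    return preg_replace_callback(
        '/<img([^>]*)>/im',
        function ($matches) {
            // leave the tag untouched when its attributes mention "noresize"
            if (strpos($matches[1], 'noresize') !== false) {
                return $matches[0];
            }
            return '<img onload="NcodeImageResizer.createOn(this);"' . $matches[1] . '>';
        },
        $content
    );
}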
I ended up just using pure CSS and adding a <div> around the images I didn't want resized. I forced the width and height of that div back to auto and then removed the warning message that was displayed above them. It seems to work fine. Thanks for your help :)

simplexml load on Google weather API problem

Hi, I have been having problems with the Google weather API, with errors like Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 2: parser error ....
I tried using the main author's script (thinking my edited script was the problem) but I still get these errors. I tried both
//komunitasweb.com/2009/09/showing-the-weather-with-php-and-google-weather-api/
and
//tips4php.net/2010/07/local-weather-with-php-and-google-weather/
The weird part is that sometimes it fixes itself, then goes back to the error. I have been using it for months without any problem; this just started happening yesterday. Also, the authors' demo pages work, and I have the exact same code. Any help, please?
this is my site http://j2sdesign.com/weather/widgetlive1.php
@Mike I added your code
<?php
$xml = file_get_contents('http://www.google.com/ig/api?weather=jakarta');
if (!simplexml_load_string($xml)) {
    file_put_contents('malformed.xml', $xml);
}
$xml = simplexml_load_file('http://www.google.com/ig/api?weather=jakarta');
$information = $xml->xpath("/xml_api_reply/weather/forecast_information");
$current = $xml->xpath("/xml_api_reply/weather/current_conditions");
$forecast_list = $xml->xpath("/xml_api_reply/weather/forecast_conditions");
?>
and made a log of the errors, but I can't seem to catch the error because it keeps fixing itself and then, after some time, goes back to the error.
Here is the content of the file:
<?php
include_once('simple_html_dom.php');

// create document
$dom = new DOMDocument("1.0");

// display document in browser as plain text
// for readability purposes
//header("Content-Type: text/plain");

// create root element
$xmlProducts = $dom->createElement("products");
$dom->appendChild($xmlProducts);

$pages = array(
    'http://myshop.com/small_houses.html',
    'http://myshop.com/medium_houses.html',
    'http://myshop.com/large_houses.html'
);

foreach ($pages as $page) {
    $product = array();
    $source = file_get_html($page);
    foreach ($source->find('img') as $src) {
        if (strpos($src->src, "http://myshop.com") === false) {
            $product['image'] = "http://myshop.com/$src->src";
        }
    }
    foreach ($source->find('p[class*=imAlign_left]') as $description) {
        $product['description'] = $description->innertext;
    }
    foreach ($source->find('span[class*=fc3]') as $title) {
        $product['title'] = $title->innertext;
    }
    // debug purposes!
    echo "Current Page: " . $page . "\n";
    print_r($product);
    echo "\n\n\n"; // clear separator
}
?>
When simplexml_load_string() fails, you need to store the data you're trying to load somewhere for review. Examining the data is the first step in diagnosing what is causing the error.
$xml = file_get_contents('http://example.com/file.xml');
if (!simplexml_load_string($xml)) {
    file_put_contents('malformed.xml', $xml);
}
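Once the malformed payload has been captured, libxml's error collection (sketched below) will usually name the exact line and reason the parse failed:

libxml_use_internal_errors(true); // collect errors instead of emitting warnings
$xml = file_get_contents('http://example.com/file.xml');
if (!simplexml_load_string($xml)) {
    file_put_contents('malformed.xml', $xml);
    foreach (libxml_get_errors() as $error) {
        echo "line {$error->line}: {$error->message}";
    }
    libxml_clear_errors();
}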
