Get img src with PHP Simple HTML DOM - php

I need to get the image src from the following code
HTML
<div class="avatar profile_CF48B2B4A31B43EC96F0561F498CE6BF ">
<a onclick="">
<img id="lazyload_-247847544_0" height="74" width="74" class="avatar potentialFacebookAvatar avatarGUID:CF48B2B4A31B43EC96F0561F498CE6BF" src="http://media-cdn.tripadvisor.com/media/photo-l/05/f3/67/c3/lilrazzy.jpg" />
</a>
</div>
I tried the following PHP:
foreach($html->find('div[class=profile_CF48B2B4A31B43EC96F0561F498CE6BF] a img') as $element) {
$img = $element->getAttribute('src');
echo $img;
}
But it reports that the src key doesn't exist. How can I scrape the review avatar images?
UPDATE:
The image URL is not present when I look at the page source, but Firebug shows it. In the page source the img looks like this:
<img id='lazyload_1953171323_17' height='24' alt='4 helpful votes' width='25' class='icon lazy'/>
Here is my page's source code:
<div class="col1of2">
<div class="member_info">
<div id="UID_3E0FAF58557D3375508A9E5D9A7BD42F-SRC_175428572" class="memberOverlayLink" onmouseover="ta.trackEventOnPage('Reviews','show_reviewer_info_window','user_name_photo'); ta.call('ta.overlays.Factory.memberOverlayWOffset', event, this, 's3 dg rgba_gry update2012', 0, (new Element(this)).getElement('.avatar')&&(new Element(this)).getElement('.avatar').getStyle('border-radius')=='100%'?-10:0);">
<div class="avatar profile_3E0FAF58557D3375508A9E5D9A7BD42F ">
<a onclick=>
<img id='lazyload_1953171323_15' height='74' width='74' class='avatar potentialFacebookAvatar avatarGUID:3E0FAF58557D3375508A9E5D9A7BD42F'/>
</a>
</div>
<div class="username mo">
<span class="expand_inline scrname hvrIE6 mbrName_3E0FAF58557D3375508A9E5D9A7BD42F" onclick="ta.trackEventOnPage('Reviews', 'show_reviewer_info_window', 'user_name_name_click')">Prataspeles</span>
</div>
</div>
<div class="location">
Latvia
</div>
</div>
<div class="memberBadging">
<div id="UID_3E0FAF58557D3375508A9E5D9A7BD42F-CONT" class="totalReviewBadge badge no_cpu" onclick="ta.trackEventOnPage('Reviews','show_reviewer_info_window','review_count'); ta.util.cookie.setPIDCookie('15984'); ta.call('ta.overlays.Factory.memberOverlayWOffset', event, this, 's3 dg rgba_gry update2012', -10, -50);">
<div class="reviewerTitle">Reviewer</div>
<img id='lazyload_1953171323_16' height='24' alt='4 reviews' width='25' class='icon lazy'/>
<span class="badgeText">4 reviews</span>
</div>
<div id="UID_3E0FAF58557D3375508A9E5D9A7BD42F-HV" class="helpfulVotesBadge badge no_cpu" onclick="ta.trackEventOnPage('Reviews','show_reviewer_info_window','helpful_count'); ta.util.cookie.setPIDCookie('15983'); ta.call('ta.overlays.Factory.memberOverlayWOffset', event, this, 's3 dg rgba_gry update2012', -22, -50);">
<img id='lazyload_1953171323_17' height='24' alt='4 helpful votes' width='25' class='icon lazy'/>
<span class="badgeText">4 helpful votes</span>
</div>
</div>
</div>
Is there any problem because of using lazyload?
UPDATE 2
Because of lazyload, my images are loaded only after the page has loaded. I tried getting the image ids and comparing them with the lazyload JS array, but the ids don't match the ones in the lazyload var array.
Question:
How do I extract this JS array and parse it as JSON?
Example:
{"id":"lazyload_-205858383_0","tagType":"img","scroll":true,"priority":100,"data":"http://media-cdn.tripadvisor.com/media/photo-l/05/f3/67/c3/lilrazzy.jpg"}
, {"id":"lazyload_-205858383_1","tagType":"img","scroll":true,"priority":100,"data":"http://c1.tacdn.com/img2/icons/gray_flag.png"}
, {"id":"lazyload_-205858383_2","tagType":"img","scroll":true,"priority":100,"data":"http://media-cdn.tripadvisor.com/media/photo-l/01/2a/fd/98/avatar.jpg"}
, {"id":"lazyload_-205858383_3","tagType":"img","scroll":true,"priority":100,"data":"http://c1.tacdn.com/img2/icons/gray_flag.png"}
, {"id":"lazyload_-205858383_4","tagType":"img","scroll":true,"priority":100,"data":"http://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/5e/avatar036.jpg"}
, {"id":"lazyload_-205858383_5","tagType":"img","scroll":false,"priority":100,"data":"http://c1.tacdn.com/img2/badges/badge_helpful.png"}

You are having difficulty because JavaScript is used to lazy-load the image after the page has loaded. Use the DOM parser to find the id of the element, and then use a regular expression (or the decoded JSON) to find the relevant image based on this id.
To achieve this, try something like:
// Pass true as the second argument so each entry is an associative array
$json = json_decode("<JSONSTRING HERE>", true);
foreach($html->find('div[class=profile_CF48B2B4A31B43EC96F0561F498CE6BF] a img') as $element) {
    $imgId = $element->getAttribute('id');
    foreach ($json as $lazy)
    {
        if ($lazy["id"] == $imgId) echo $lazy["data"];
    }
}
The above is untested, so you will need to work out the kinks. The key is to extract the relevant JavaScript and decode it as JSON.
Alternatively, you can use string search functions to find the row that contains the information about the img and extract the required value.
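The id-matching idea above can be sketched without network access. This is a minimal sketch under stated assumptions, not the poster's code: it swaps in PHP's built-in DOMDocument for Simple HTML DOM, the sample markup is invented, the `lazyImgs` variable name is taken from a later answer in this thread, and `mapLazyImages` is a hypothetical helper name.

```php
<?php
// Hypothetical sample page: the <img> has no src; the URL lives in a script array.
$page = <<<'HTML'
<html><body>
<div class="avatar profile_CF48"><a><img id="lazyload_1_0" height="74" width="74"/></a></div>
<script>
var lazyImgs = [
{"id":"lazyload_1_0","tagType":"img","scroll":true,"priority":100,"data":"http://example.com/a.jpg"}
, {"id":"lazyload_1_1","tagType":"img","scroll":true,"priority":100,"data":"http://example.com/flag.png"}
];
</script>
</body></html>
HTML;

// Build an id => URL map from the inline "var lazyImgs = [...]" array (name assumed).
function mapLazyImages(string $source): array
{
    if (!preg_match('/var\s+lazyImgs\s*=\s*(\[.*?\])\s*;/s', $source, $m)) {
        return [];
    }
    $entries = json_decode($m[1], true); // true => associative arrays
    if (!is_array($entries)) {
        return [];
    }
    $map = [];
    foreach ($entries as $entry) {
        if (isset($entry['id'], $entry['data'])) {
            $map[$entry['id']] = $entry['data'];
        }
    }
    return $map;
}

$map = mapLazyImages($page);

// Parse the markup and look up each avatar image id in the map.
$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($page);
libxml_clear_errors();

$avatars = [];
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//div[contains(@class, "avatar")]//img') as $img) {
    $id = $img->getAttribute('id');
    $avatars[$id] = $map[$id] ?? null; // null when the script has no entry for this id
}
print_r($avatars);
```

Building the map once avoids the nested loop over every JSON entry for every image, which matters on pages with many lazy-loaded images.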

If you're looking for all IDs that contain the substring, "lazyload", you might try the wildcard selector and upon a hit look at the 'src' property of the element found. See the jsfiddle below. Good luck!
$(document.body).find('img[id*=lazyload]').each(function() {
console.log($(this).prop('src'));
});
Jsfiddle

Try this -
foreach($html->find('div[class=profile_CF48B2B4A31B43EC96F0561F498CE6BF ] a img') as $element) {
$img = $element->getAttribute('src');
echo $img;
}
There is a space after the class name in the markup, so you have to add a space at the end of the class name in the selector.
OR
use the full class attribute value:
$html->find('div[class=avatar profile_CF48B2B4A31B43EC96F0561F498CE6BF ] a img');

Use jQuery selectors i.e. $('#lazyload_-247847544_0') and you can get the image source using this
var src = $('#lazyload_-247847544_0').attr('src');
Or more specifically
$('.profile_CF48B2B4A31B43EC96F0561F498CE6BF #lazyload_-247847544_0').attr('src');
Thanks

function getReviews(){
    $url = 'http://www.tripadvisor.com/Hotel_Review-g274965-d952833-Reviews-Ezera_Maja-Liepaja_Kurzeme_Region.html';
    $html = file_get_html($url);
    $array = array();
    // IMG ID: collect the id of every avatar image
    foreach($html->find('div[class=avatar] a img') as $element) {
        $array[] = array('id' => $element->getAttribute('id'));
    }
    // IMG SRC: the real URLs live in the inline "var lazyImgs = [...]" script
    $source = (string) $html;
    $p1 = strpos( $source, 'var lazyImgs =' ) + 14;
    $p2 = strpos( $source, ']', $p1 );
    $raw = substr( $source, $p1, $p2 - $p1 ) . ']';
    $images = json_decode($raw);
    foreach ($images as $image){
        foreach ($array as $key => $element){
            // Attach the URL to the entry whose id matches, not to a running index
            if ( $element['id'] == $image->id ){
                $array[$key]['image'] = $image->data;
            }
        }
    }
    $html->clear();
    unset($html);
    return $array;
}
Collect the img ids in an array. Then extract the lazyImgs var from the page source and decode it as JSON. Then compare the two arrays and, if the ids match, add the data to the array.
Thanks to everybody!

Related

Use Simple HTML DOM Parser to JSON?

I'm trying to group the elements of a scraped website and convert them into JSON, but it doesn't seem to be working.
<?php
// Include the php dom parser
include_once 'simple_html_dom.php';
header('Content-type: application/json');
// Create DOM from URL or file
$html = file_get_html('urlhere');
foreach($html->find('hr ul') as $ul)
{
foreach($ul->find('div.product') as $li)
$data[$count]['products'][]['li']= $li->innertext;
$count++;
}
echo json_encode($data);
?>
This returns
{"":{"products":[{"li":" <a class=\"th\" href=\"\/products\/56942-haters-crewneck-sweatshirt\"> <div style=\"background-image:url('http:\/\/s0.merchdirect.com\/images\/15814\/v600_B_AltApparel_Crew.png');\"> <img src=\"http:\/\/s0.com\/images\/6398\/product-image-placeholder-600.png\"> <\/div> <\/a> <div class=\"panel panel-info\" style=\"display: none;\"> <div class=\"name\"> <a href=\"\/products\/56942-haters-crewneck-sweatshirt\"> Haters Crewneck Sweatshirt <\/a> <\/div> <div class=\"subtitle\"> $60.00 <\/div> <\/div> "}
When I'm actually hoping to achieve:
{"products":[{
"link":"/products/56942-haters-crewneck-sweatshirt",
"image":"http://s0.com/images/15814/v600_B_AltApparel_Crew.png",
"name":"Haters Crewneck Sweatshirt",
"subtitle":"60.00"}
]}
How do I get rid of all of the redundant information and probably name each element in the reformatted json?
Thanks!
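You simply need to extend your logic within the inner loop, and append to a single 'products' list instead of indexing by the undefined $count:
$data = array('products' => array());
foreach($html->find('hr ul') as $ul)
{
    foreach($ul->find('div.product') as $li) {
        $product = array();
        $product['link'] = $li->find('a.th', 0)->href;
        $product['name'] = trim($li->find('div.name a', 0)->innertext);
        // Drop the leading "$" to match the desired "60.00"
        $product['subtitle'] = ltrim(trim($li->find('div.subtitle', 0)->innertext), '$');
        // The image URL sits between the single quotes of the inline style
        $parts = explode("'", $li->find('div', 0)->style);
        $product['image'] = isset($parts[1]) ? $parts[1] : null;
        $data['products'][] = $product;
    }
}
echo json_encode($data);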
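The same extraction can be sketched with PHP's built-in DOM extension, which makes it easy to try without fetching a page. This is only an illustration: the sample markup is a trimmed version of the output shown in the question, `extractProducts` is a hypothetical helper name, and DOMXPath is used here in place of Simple HTML DOM.

```php
<?php
// Sample markup modeled on the output shown in the question (trimmed).
$markup = <<<'HTML'
<div class="product">
<a class="th" href="/products/56942-haters-crewneck-sweatshirt">
<div style="background-image:url('http://s0.merchdirect.com/images/15814/v600_B_AltApparel_Crew.png');"></div>
</a>
<div class="panel panel-info">
<div class="name"><a href="/products/56942-haters-crewneck-sweatshirt"> Haters Crewneck Sweatshirt </a></div>
<div class="subtitle"> $60.00 </div>
</div>
</div>
HTML;

// Hypothetical helper: turn each div.product into a flat, named record.
function extractProducts(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $products = [];
    foreach ($xpath->query('//div[@class="product"]') as $node) {
        $link  = $xpath->query('.//a[@class="th"]', $node)->item(0);
        $name  = $xpath->query('.//div[@class="name"]/a', $node)->item(0);
        $price = $xpath->query('.//div[@class="subtitle"]', $node)->item(0);
        $bg    = $xpath->query('.//div[contains(@style, "background-image")]', $node)->item(0);

        // Pull the URL out of the inline style's url('...') value.
        $image = null;
        if ($bg !== null && preg_match('/url\(\s*[\'"]?([^\'")]+)/', $bg->getAttribute('style'), $m)) {
            $image = $m[1];
        }
        $products[] = [
            'link'     => $link ? $link->getAttribute('href') : null,
            'image'    => $image,
            'name'     => $name ? trim($name->textContent) : null,
            'subtitle' => $price ? ltrim(trim($price->textContent), '$') : null,
        ];
    }
    return ['products' => $products];
}

$result = extractProducts($markup);
echo json_encode($result, JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT), PHP_EOL;
```

Each lookup is guarded, so a product that lacks one of the pieces yields null for that field instead of a fatal error on a missing node.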

Import data from xml

I'm trying to get data from an XML file using SimpleXML. So far I can get all the data except the images. How can I access a single image?
<?php
$ur="http://services2.jupix.co.uk/api/get_properties.php?clientID=35871cc1b6d9ec6237aaaf94aa0e0836&passphrase=cvYG9f";
$xml = simplexml_load_file($ur);
foreach ($xml->property as $property):
var_dump($property->images->image);
echo 'images->image">'; // this is not displaying
endforeach;?>
My code's output is shown below. How can I display image number 1?
public 1 => string 'http://media2.jupix.co.uk/v3/clients/657/properties/1356/IMG_1356_9_large.jpg' (length=77)
I think SimpleXMLElement::xpath can do what you are looking for:
You can give this a try:
<?php
$ur="http://services2.jupix.co.uk/api/get_properties.php?clientID=35871cc1b6d9ec6237aaaf94aa0e0836&passphrase=cvYG9f";
$xml = simplexml_load_file($ur);
$image = $xml->xpath('//property/images/image[@modified="2014-07-23 14:22:05"]')[1]->__toString();
var_dump($image);
Or you can loop through all the images and check for the name that you are looking for:
$images = $xml->xpath('//property/images/image');
foreach ($images as $image) {
$url = $image->__toString();
if (false !== strpos($url, "_9_large.jpg")) {
var_dump($url);
}
}
If you want to get the second image of each /images section, then you could do it like this:
$images = $xml->xpath('//property/images');
foreach ($images as $image) {
if (isset($image->children()[1])) {
var_dump($image->children()[1]->__toString());
}
}
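One detail that is easy to trip over in the snippets above is indexing: SimpleXML child lists are zero-based, while XPath positions are one-based. The following sketch illustrates both on an inline feed. The feed's structure and URLs are assumptions modeled on the question, not the real service response.

```php
<?php
// Hypothetical feed with the same shape as the question's XML (structure assumed).
$feed = <<<'XML'
<properties>
  <property>
    <addressStreet>High Street</addressStreet>
    <images>
      <image modified="2014-07-23 14:22:05">http://example.com/IMG_1_1_large.jpg</image>
      <image modified="2014-07-23 14:22:05">http://example.com/IMG_1_2_large.jpg</image>
    </images>
  </property>
  <property>
    <addressStreet>Mill Lane</addressStreet>
    <images>
      <image modified="2014-07-24 09:00:00">http://example.com/IMG_2_1_large.jpg</image>
    </images>
  </property>
</properties>
XML;

$xml = simplexml_load_string($feed);

$secondImages = [];
foreach ($xml->property as $property) {
    $street = (string) $property->addressStreet;
    // SimpleXML child lists are zero-indexed, so [1] is the second <image>.
    $secondImages[$street] = isset($property->images->image[1])
        ? (string) $property->images->image[1]
        : null;
}

// XPath positions are one-based: image[1] selects the first <image> of each <images>.
$firstImages = array_map('strval', $xml->xpath('//property/images/image[1]'));

print_r($secondImages);
print_r($firstImages);
```

The isset() guard keeps the loop safe for properties that have fewer images than the index being requested.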
Thanks, guys, I found a solution to my problem.
Looking back at the question, it seems I did not phrase it properly.
All I wanted was to display the images within that section. XPath was not necessary, but I have learned from it. Here is my solution; if you can improve it, you are most welcome.
$url ="http://services2.jupix.co.uk/api/get_properties.php?clientID=35871cc1b6d9ec6237aaaf94aa0e0836&passphrase=cvYG9f";
$xml = simplexml_load_file($url);
foreach ($xml->property as $property):
?>
<li>
<h3> <?php echo $property->addressStreet;?> </h3>
<?php
$imgCount = count($property->images->image);
for ($i=0; $i < $imgCount; $i++) { ?>
<img src="<?php echo $property->images->image[$i];?>">
<?php } ?>
<p><?php echo limit_text($property->fullDescription,30);?></p>
<h4>£ <?php echo $property->price;?> </h4>
</li>
<?php endforeach; ?>

Getting element inside other element by class php DOMDocument

Hi guys, I have this HTML code:
<div class="post-thumbnail2">
<a href="http://example.com" title="Title">
<img src="http://linkimgexample/image.png" alt="Title"/>
</a>
</div>
I want to get the value of the image src (http://linkimgexample/image.png) and the value of the link href (http://example.com) using PHP DOMDocument.
What I did to get the link was something like this:
$divs = $dom->getElementsByTagName("div");
foreach($divs as $div) {
$cl = $div->getAttribute("class");
if ($cl == "post-thumbnail2") {
$links = $div->getElementsByTagName("a");
foreach ($links as $link)
echo $link->getAttribute("href")."<br/>";
}
}
I could do the same for the img src:
$imgs = $div->getElementsByTagName("img");
foreach ($imgs as $img)
echo $img->getAttribute("src")."<br/>";
But sometimes there is no image on the website, and the HTML code looks like this:
<div class="post-thumbnail2">
</div>
So my question is: how can I get the two values at the same time, and show a message when there is no image?
To be more clear, this is an example:
<div class="post-thumbnail2">
<a href="http://example1.com" title="Title">
<img src="http://linkimgexample/image1.png" alt="Title"/>
</a>
</div>
<div class="post-thumbnail2">
</div>
<div class="post-thumbnail2">
<a href="http://example3.com" title="Title">
<img src="http://linkimgexample/image2.png" alt="Title"/>
</a>
</div>
I want the result to be:
http://example1.com - http://linkimgexample/image1.png
http://example2.com - there is no image here !
http://example3.com - http://linkimgexample/image2.pn
DOMElement::getElementsByTagName returns a DOMNodeList, which means you can find out whether an img element was found by checking the length property.
$imgs = $div->getElementsByTagName("img");
if($imgs->length > 0) {
foreach ($imgs as $img)
echo $img->getAttribute("src")."<br/>";
} else {
echo "there is no image here!<br/>";
}
You should think about using XPath - it makes your life traversing the DOM a bit easier:
$doc = new DOMDocument();
if($doc->loadHtml($xmlData)) {
$xpath = new DOMXPath($doc);
$postThumbLinks = $xpath->query("//div[@class='post-thumbnail2']/a");
foreach($postThumbLinks as $link) {
$imgList = $xpath->query("./img", $link);
$imageLink = "there is no image here!";
if($imgList->length > 0) {
$imageLink = $imgList->item(0)->getAttribute('src');
}
echo $link->getAttribute('href'), " - ", $link->getAttribute('title'),
" - ", $imageLink, "<br/>", PHP_EOL;
}
} else {
echo "can't load HTML document!", PHP_EOL;
}
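Because the query above selects the links, a div with no link at all produces no output line. A variant that iterates over the divs instead reports empty ones too. This is a sketch using the question's own sample markup; the "there is no link here!" placeholder is an assumption, since the question only specifies a message for the missing image.

```php
<?php
// Sample markup from the question: the second div is empty.
$html = <<<'HTML'
<div class="post-thumbnail2">
<a href="http://example1.com" title="Title"><img src="http://linkimgexample/image1.png" alt="Title"/></a>
</div>
<div class="post-thumbnail2">
</div>
<div class="post-thumbnail2">
<a href="http://example3.com" title="Title"><img src="http://linkimgexample/image2.png" alt="Title"/></a>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings for fragment input
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$lines = [];
// Iterate over the container divs so that empty ones still yield a line.
foreach ($xpath->query('//div[@class="post-thumbnail2"]') as $div) {
    $link = $xpath->query('./a', $div)->item(0);    // null when the div has no link
    $img  = $xpath->query('.//img', $div)->item(0); // null when the div has no image
    $href = $link ? $link->getAttribute('href') : 'there is no link here!';
    $src  = $img ? $img->getAttribute('src') : 'there is no image here!';
    $lines[] = $href . ' - ' . $src;
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```

DOMNodeList::item() returns null for an index that does not exist, which is what makes the two ternary checks sufficient.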

php: find an url image in specific div

I have this piece of code:
if (!$thumbdone) {
if (intval($fb_image_use_content)==1) {
$imgreg = '/<img .*src=["\']([^ ^"^\']*)["\']/';
preg_match_all($imgreg, trim($post->post_content), $matches);
if (isset($matches[1][0])) {
//There's an image on the content
$image=$matches[1][0];
$pos = strpos($image, site_url());
if ($pos === false) {
if (stristr($image, 'http://') || stristr($image, 'https://')) {
//Complete URL - offsite
$fb_image=$image;
} else {
$fb_image=site_url().$image;
}
} else {
//Complete URL - onsite
$fb_image=$image;
}
$thumbdone=true;
}
}
}
It finds the first image in the post content.
The problem is that I want to find the first image inside a div structured like this:
<div style="display:block" class="ogimage">
<img class="aligncenter wp-image-1030 size-full" src="http:/site.com/image.jpg" alt="image" width="600" height="315">
</div>
I Googled this:
https://www.google.it/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=php%20get%20image%20from%20div
I also tried with http://www.phpliveregex.com/
but nothing. Help?
Use the right tool for the job instead of trying to parse HTML using a regular expression.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$img = $xpath->query('//div[@class="ogimage"]/img');
echo $img->item(0)->getAttribute('src');
Working Demo
You can use explode like this:
$html = '<div style="display:block" class="ogimage"><img class="aligncenter wp-image-1030 size-full" src="http:/site.com/image.jpg" alt="image" width="600" height="315"></div>';
// Split on double quotes; the eighth piece (index 7) is the src value for this exact markup
$result = explode('"', $html)[7];

How to get the title or another attribute of an Image in PHP

I am building a custom shirt builder for a website: http://mytempsite.net/gotie/mixandmatch
In step one, the user selects a shirt color from 12 different shirts and then moves on to the next step, where they choose a tie. I need to pass a variable to that next page telling it to pull only the tie images for the selected shirt, for example the red shirt.
My idea is to store that value in the image's alt or title attribute and then read that attribute from the image currently being displayed.
What I need to know is how.
I tried using this code as a start:
<?php
$url=$this->helper('core/url')->getCurrentUrl();;
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$divs = $doc->getElementByID('loadarea');
foreach ($divs as $div) {
echo "Found the loadarea div <br />";
}
?>
But it didn't work, and it also caused my page to load really slowly.
Just in case, here is the code for the thumbnails:
<?php
$products = Mage::getModel('catalog/product')->getCollection();
foreach($products as $prod) {
$product = Mage::getModel('catalog/product')->load($prod->getId());
$pro_title = $product->getName();
$img = $product->getImageUrl();
echo "<a href='".$img."' title='".$pro_title."' rel='enlargeimage' rev='targetdiv:loadarea,enabletitle:no,trigger:click,preload:none,fx:fade'><img src='".$img."' width='100px'/></a>";
}?>
I hope that I worded this question correctly so that it is not too localized. If not, I will reword it.
I believe $someVar = get_field('image file') should return an array of all the tagged data in the image. You can then access the data from the array (ie. $someVar['alt']).
Here is a simpler solution that doesn't require you make the image into an object.
$html=file_get_contents("URL OF YOUR SITE");
$doc = new DOMDocument();
@$doc -> loadHTML($html);
// Add a class to your big image to identify it (ex. selected)
$tags = getElementsByClassName($doc, 'selected');
foreach ($tags as $tag) {
echo $tag->getAttribute('alt');
}
// Not my function but its very useful. I'll track down where I got it
// and add source later.
function getElementsByClassName(DOMDocument $DOMDocument, $ClassName) {
$Elements = $DOMDocument -> getElementsByTagName("*");
$Matched = array();
foreach($Elements as $node)
{
if( ! $node -> hasAttributes())
continue;
$classAttribute = $node -> attributes -> getNamedItem('class');
if( ! $classAttribute)
continue;
$classes = explode(' ', $classAttribute -> nodeValue);
if(in_array($ClassName, $classes))
$Matched[] = $node;
}
return $Matched;
}
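A shorter alternative to walking every element is to let XPath do the class matching. This is a minimal sketch, not part of the original answer: `findByClass` is a hypothetical helper name, the sample markup is invented, and the class name is assumed to contain no quote characters because it is inserted into the query unescaped.

```php
<?php
// Hypothetical helper: XPath equivalent of a class-name lookup.
function findByClass(DOMDocument $doc, string $className): array
{
    $xpath = new DOMXPath($doc);
    // Pad with spaces so "selected" does not match "preselected".
    // Assumes $className contains no quotes (it is not escaped here).
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $className
    );
    return iterator_to_array($xpath->query($query));
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$doc->loadHTML('<div id="loadarea"><img class="big selected" alt="red shirt" src="a.jpg">'
    . '<img class="preselected" alt="blue shirt" src="b.jpg"></div>');
libxml_clear_errors();

// Read the alt attribute of every element carrying the "selected" class.
$alts = [];
foreach (findByClass($doc, 'selected') as $node) {
    $alts[] = $node->getAttribute('alt');
}
print_r($alts);
```

The concat/normalize-space idiom compares whole class tokens, so it behaves like the explode-and-in_array check in the function above while leaving the traversal to libxml.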
What I ended up doing is this: when a thumbnail image is clicked, a hidden text box receives the product_id of that shirt, and when you go to the next page it automatically pulls the product image of that product. I have included the code for my entire form in case anyone ever needs to do this in the future :)
Happy Coding!
<form id="GoTie_Builder" method="POST" action="/gotie/mixandmatch/tie">
<script>
function changeInput(pro_id)
{
var my_form = document.getElementById('GoTie_Builder');
my_form.shirt_color.value = pro_id;
}
function changePattern(pattern)
{
var my_div = document.getElementById('shirt_zoom');
my_div.innerHTML = '<img src="http://www.tuxedojunction.com/Content/Products/Vests/LegacyBlueVelvet_s_1.jpg" />';
}
</script>
<div id="shirt_zoom" style="width:300px; height:100px;">
<img src="http://www.tuxedojunction.com/Content/Products/Vests/LegacyBlueVelvet_s_1.jpg" />
</div>
<div class="Builder_thumbnails" style="float:left;">
<?php
$cat_id = 8;
$products = Mage::getModel('catalog/category')->load($cat_id)->getProductCollection();
echo '<input id="shirt_color" type="text" name="shirt_color" value="0">';
foreach($products as $prod) {
$product = Mage::getModel('catalog/product')->load($prod->getId());
$pattern = $this->helper('catalog/image')->init($product, 'thumbnail');
//echo "<img onclick='changeInput($pro_id)' class='product_thumbnail' src='".$pattern."' alt='".$pro_id."' width='100px'/>";
$pro_id = $product->getId();
$pro_title = $product->getName();
$img = $product->getImageUrl();
$input_id = "shirt_color";
echo "<a href='".$img."' title='".$pro_title."' rel='enlargeimage' rev='targetdiv:loadarea,enabletitle:no,trigger:click,preload:none,fx:fade'><img onclick='changeInput($pro_id)' class='product_thumbnail' src='".$this->helper('catalog/image')->init($product, 'thumbnail')."' alt='".$pro_id."' width='100px' height='100px'/></a>";
}?>
</div>
<div id="loadarea" style="width:300px;top: 0px;right: 0px;float: right;position: relative;"><img src="http://cdn4.blackenterprise.com/wp-content/blogs.dir/1/files/2011/07/White-Shirt-620x480.jpg" width="500px;"/>
</div>
<input type="submit" value="Choose a Tie" />
</form>
