How to get data attribute value? - php

I have a url within a data-attribute and I need to get the first one:
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://esportareinsvizzera.com/site/wp-content/uploads/8.jpg">
</div>
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://www.finanziamentiprestitimutui.com/wp-content/uploads/2014/09/esportazioni-finanziamento-credito.jpg">
</div>
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://www.infologis.biz/wp-content/uploads/2013/09/Export.jpg">
</div>
<div class="carousel-cell">
<img onerror="this.parentNode.removeChild(this);" class="carousel-cell-image" data-flickity-lazyload="http://www.cigarettespedia.com/images/2/25/Esportazione_horizontal_name_ks_20_s_green_italy.jpg">
</div>
I have read several answers on this, but I am not a PHP person.
I was using the following to get the first img, but now I need the actual data attribute value instead:
<?php
$custom_image = usp_get_meta(false, 'usp-custom-4');
$custom_image = htmlspecialchars_decode($custom_image);
$custom_image = nl2br($custom_image);
$custom_image = preg_replace('/<br \/>/iU', '', $custom_image);
preg_match('/<img.+src=[\'"](?P<src>.+?)[\'"].*>/i',$custom_image, $image);
?>
<img src="<?php echo $image['src']; ?>" alt="<?php the_title(); ?>">

Use DOMDocument to parse the HTML, get the elements corresponding to img tags and get the data-flickity-lazyload attribute of the first img tag:
...
$DOM = new DOMDocument;
$DOM->loadHTML($custom_image);
$items = $DOM->getElementsByTagName('img');
$mySrc = $items->item(0)->getAttribute('data-flickity-lazyload');
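The snippet above assumes the HTML always contains at least one img. A slightly more defensive sketch follows; the function name first_lazyload_url and the sample markup are made up for illustration, and the libxml calls only silence warnings about imperfect HTML:

```php
<?php
// Sample markup standing in for the contents of $custom_image (assumption).
$custom_image = '<div class="carousel-cell">'
    . '<img class="carousel-cell-image" data-flickity-lazyload="http://example.com/first.jpg">'
    . '</div>'
    . '<div class="carousel-cell">'
    . '<img class="carousel-cell-image" data-flickity-lazyload="http://example.com/second.jpg">'
    . '</div>';

/**
 * Hypothetical helper: return the data-flickity-lazyload value of the first
 * <img> in an HTML fragment, or null when there is no such image/attribute.
 */
function first_lazyload_url(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $dom->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('data-flickity-lazyload')) {
        return null;
    }

    return $img->getAttribute('data-flickity-lazyload');
}

$url = first_lazyload_url($custom_image);
```

When printing the value into a src attribute, it is probably worth passing it through htmlspecialchars() first.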

Related

PHP replace image src and add a new attribute in image tag from a string containing different html tags [duplicate]

I have a site where I get product descriptions from the database, decode the HTML in PHP like this, and display it on the frontend:
$data['description'] = html_entity_decode($product_info['description'], ENT_QUOTES, 'UTF-8');
It returns html like the following:
<div class="container">
<div class="textleft">
<p>
<span style="font-size:medium">
<strong>Product Name:</strong>
</span>
<br />
<span style="font-size:14px">Some description here Click here to see full details.</span>
</p>
</div>
<div class="imageblock">
<a href="some-link">
<img src="http://myproject.com/image/catalog/image1.jpg" style="width: 500px; height: 150px;" />
</a>
</div>
<div style="clear:both">
</div>
<div class="container">
<div class="textleft">
<p>
<span style="font-size:medium">
<strong>Product Name:</strong>
</span>
<br />
<span style="font-size:14px">Some description here Click here to see full details.</span>
</p>
</div>
<div class="imageblock">
<a href="some-link">
<img src="http://myproject.com/image/catalog/image2.jpg" style="width: 500px; height: 150px;" />
</a>
</div>
<div style="clear:both">
</div>
There could be many images in the product description; I have included just two in my example. I need to replace the src of every image with src="image/catalog/blank.gif" and add a new attribute
data-src="http://myproject.com/image/catalog/image1.jpg"
for image 1 and
data-src="http://myproject.com/image/catalog/image2.jpg"
for image 2. That is, the data-src attribute should receive the original src value of each image. How can I achieve that?
I have tried preg_replace like this:
$data['description'] = preg_replace('((\n)?src="\b.*?")', 'src="image/catalog/blank.gif', $data['description']);
It replaces the src attribute of every image, but how can I add data-src with the original image path? I need this done before the page loads, so is there a way to do it with PHP?
Simply adjust your regular expression. Capture the text you want using (parentheses), then reference that first group as $1 or \1 in the replacement.
$data['description'] = preg_replace('(src="(.*?)")', 'src="image/catalog/blank.gif" data-src="$1"', $data['description']);
Demo: https://repl.it/repls/SpottedZanyDiscussion
I think this might be what you are looking for:
http://php.net/manual/en/domdocument.getelementsbytagname.php
$data['description'] = html_entity_decode($product_info['description'], ENT_QUOTES, 'UTF-8');
$doc = new DOMDocument();
$doc->loadHTML($data['description']);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
$old_src = $tag->getAttribute('src');
$new_src_url = 'image/catalog/blank.gif';
$tag->setAttribute('src', $new_src_url);
$tag->setAttribute('data-src', $old_src);
}
$data['description'] = $doc->saveHTML();
I haven't tested this though, so don't just copy and paste.
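One thing to watch with the DOMDocument approach: saveHTML() without arguments serializes a whole document, so a fragment comes back wrapped in a doctype plus html and body tags. Below is a minimal sketch of one way around that, assuming the description is a fragment with balanced tags. The function name lazify_images and the wrapper id lazify-root are invented for this example.

```php
<?php
// Hypothetical fragment standing in for $data['description'].
$description = '<div class="imageblock"><a href="some-link">'
    . '<img src="http://myproject.com/image/catalog/image1.jpg" style="width: 500px; height: 150px;">'
    . '</a></div>'
    . '<div class="imageblock"><a href="some-link">'
    . '<img src="http://myproject.com/image/catalog/image2.jpg">'
    . '</a></div>';

/**
 * Hypothetical helper: move each img's src into data-src and point src
 * at a placeholder, returning the fragment without html/body wrappers.
 */
function lazify_images(string $html, string $placeholder = 'image/catalog/blank.gif'): string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment in one known root element so it can be found again.
    $doc->loadHTML('<div id="lazify-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('data-src')) {
            continue; // already processed, keep the original URL
        }
        $img->setAttribute('data-src', $img->getAttribute('src'));
        $img->setAttribute('src', $placeholder);
    }

    // The wrapper is the first div in document order (html > body > div).
    $root = $doc->getElementsByTagName('div')->item(0);

    // Serialize only the wrapper's children, skipping doctype/html/body.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

$result = lazify_images($description);
```

Unlike the regular expression, this only touches src on img elements, so a src attribute on a script or iframe in the description is left alone.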

symfony domcrawler parsing not working

I would like to grab multiple values from the following HTML:
<div class="video">
<a href="https://example.com/23422" class="hRotator">
<div class="thumb_container" data-previewvideo="http://example.com/vid.mp4">
<img src="http://example.com/thumb.jpg" class="thumb" alt="">
<img class="hSprite" src="https://example.com/spacer.gif" sprite="https://example.com/23422.jpg" id="23422">
<video autoplay="autoplay" loop="loop" muted="muted" playsinline="" webkit-playsinline="" poster="https://example.com/poster.jpg" src="https://example.com/23422.mp4"></video>
</div>
</div>
</a>
</div>
I'm using the Symfony DomCrawler but don't seem to get it right:
$crawler->filter('div .videoList')->first()->filter('div .video')->each(function($video) {
$link = $video->filter("a");
$href = $link->attr("href");
$thumb_container = $link->filter("div .thumb_container");
$preview_video = $thumb_container->attr("data-previewvideo");
$thumbnail_image = $thumb_container->filter("img .thumb")->attr("src");
$hSprite = $thumb_container->filter("img .hSprite")->first();
$image_sprite = $hSprite->attr("sprite");
$id = $hSprite->attr("id");
});
How should I parse the HTML?
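One likely culprit is the space inside the selectors: in CSS, "img .thumb" matches an element with class thumb nested inside an img, whereas "img.thumb" matches the img itself (the same goes for "div .video" versus "div.video"). To show the intended traversal without requiring Composer, here is a rough equivalent using PHP's built-in DOMXPath rather than DomCrawler. The has_class helper, the cleaned-up sample markup and the surrounding videoList div are assumptions made for this sketch.

```php
<?php
// Tidied sample markup; the outer "videoList" div is assumed from the selector.
$html = '<div class="videoList"><div class="video">'
    . '<a href="https://example.com/23422" class="hRotator">'
    . '<div class="thumb_container" data-previewvideo="http://example.com/vid.mp4">'
    . '<img src="http://example.com/thumb.jpg" class="thumb" alt="">'
    . '<img class="hSprite" src="https://example.com/spacer.gif" sprite="https://example.com/23422.jpg" id="23422">'
    . '</div></a></div></div>';

// Hypothetical helper: XPath predicate matching one whole class token,
// so that "video" does not also match "videoList".
function has_class(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
$videos = [];

foreach ($xpath->query('//div[' . has_class('video') . ']') as $video) {
    // Each query starts with "." so it is relative to the current video div.
    $link      = $xpath->query('.//a', $video)->item(0);
    $container = $xpath->query('.//div[' . has_class('thumb_container') . ']', $video)->item(0);
    $thumb     = $xpath->query('.//img[' . has_class('thumb') . ']', $video)->item(0);
    $sprite    = $xpath->query('.//img[' . has_class('hSprite') . ']', $video)->item(0);

    $videos[] = [
        'href'      => $link ? $link->getAttribute('href') : null,
        'preview'   => $container ? $container->getAttribute('data-previewvideo') : null,
        'thumbnail' => $thumb ? $thumb->getAttribute('src') : null,
        'sprite'    => $sprite ? $sprite->getAttribute('sprite') : null,
        'id'        => $sprite ? $sprite->getAttribute('id') : null,
    ];
}
```

In the DomCrawler version, the corresponding change would be to write the selectors without the space (for example img.thumb and img.hSprite).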

Recursive context nodes for xpath->query

Basically what I'm trying to achieve is replacing the content of the src-attributes of a bunch of img-nodes by the content of the corresponding data-src-nodes in a page like the following one.
<html>
<body>
<div id="a">
<img src="" data-src="myValue" />
<img src="" data-src="myValue2" />
</div>
<img src="" data-src="myValue" />
</body>
</html>
I want to do this by finding a common base node (in this case, each img node in the div with id a) and, based on that node, locating:
the node containing the value to copy, and
the node receiving the value.
Script
<?PHP
$html = '<html><body><div id="a"><img src="" data-src="myValue"/><img src="" data-src="myValue2"/></div><img src="" data-src="myValue"/></body></html>';
$doc = new DOMDocument();
@$doc->loadHTML($html);
$basenode = false;
$xpath = new DOMXPath($doc);
$entries = $xpath->query('(//div[@id="a"])');
if ($entries->length > 0) $basenode = $entries->item(0);
if ($basenode) {
$img = $xpath->query('//img', $basenode);
foreach ($img as $curImg) {
$from = $xpath->query('//@data-src', $curImg);
$to = $xpath->query('//@src', $curImg);
$to->item(0)->value = $from->item(0)->value;
}
echo $doc->saveXML();
}
?>
Expected output
<html>
<body>
<div id="a">
<img src="myValue" data-src="myValue" />
<img src="myValue2" data-src="myValue2" />
</div>
<img src="" data-src="myValue" />
</body>
</html>
Actual output
<html>
<body>
<div id="a">
<img src="myValue" data-src="myValue" />
<img src="" data-src="myValue2" />
</div>
<img src="" data-src="myValue" />
</body>
</html>
So, the line
$from = $xpath->query('//@data-src', $curImg);
seems to actually base its search on the root node and not the img-node selected before. How can I solve this?
(I know that a possible workaround would be to skip selecting the img nodes explicitly and use something like from='//div[@id="a"]/img/@data-src' and to='//div[@id="a"]/img/@src', but I'm a bit concerned that I might end up copying values between attributes of different nodes.)
A / at the beginning of an expression makes it an absolute location path (i.e., evaluated from the document root), even when a context node is passed. Instead, you want to use a relative one (relative to the context node).
For example, .//@data-src (or simply @data-src for an attribute of the context node itself), or descendant::img/@data-src relative to the div, and so on.
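Applied to the script from the question, the relative paths could look like the sketch below. It uses setAttribute() to write the value instead of assigning to an attribute node, which is a substitution on my part rather than something the question asked for; the $srcValues array at the end exists only to make the result easy to inspect.

```php
<?php
$html = '<html><body><div id="a"><img src="" data-src="myValue"/>'
    . '<img src="" data-src="myValue2"/></div>'
    . '<img src="" data-src="myValue"/></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The base node is looked up from the document root, so "//" is fine here.
$basenode = $xpath->query('//div[@id="a"]')->item(0);

if ($basenode !== null) {
    // ".//img" is relative: only img elements below the base node.
    foreach ($xpath->query('.//img', $basenode) as $curImg) {
        // "@data-src" is relative: the attribute of this img only.
        $from = $xpath->query('@data-src', $curImg)->item(0);
        if ($from !== null) {
            $curImg->setAttribute('src', $from->value);
        }
    }
}

// Collect the resulting src values in document order for inspection.
$srcValues = [];
foreach ($xpath->query('//img') as $img) {
    $srcValues[] = $img->getAttribute('src');
}
```

The img outside the div keeps its empty src, because both the img lookup and the attribute lookup are now scoped to their context nodes.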

php remove tags before a specified tag

I want to remove all image-tags before the headline starts, but they are not nested the same way. And then remove the empty tags.
<div class="c2">
<img src="image/file" width="480" height="360" alt="Image" />
</div>
<div class="c2">
<div class="headline">
headline
</div>
<div class="headline">
headline2
</div>
</div>
and different nested tags like
<div class="c2">
<p>
<img src="image/A.JPG" width="480" height="319" alt="Image" />
</p>
<div class="headline">
A headline
</div>
</div>
I think this could be solved recursively, but I don't know how.
Thanks for your help!
EDIT: if you want to remove only an <img> followed by <div><div class="headline"> or <div class="headline">, use this xpath:
$imgs = $xpath->query("//img[../following-sibling::div[1]/div/@class='headline' or ../following-sibling::div[1]/@class='headline']");
see it working: http://codepad.viper-7.com/QhprLP
Do it like this:
$doc = new DOMDocument();
$doc->loadHTML($x); // assuming HTML in $x
$xpath = new DOMXpath($doc);
$imgs = $xpath->query("//img"); // select all <img> nodes
foreach ($imgs as $img) { // loop through list of all <img> nodes
$parent = $img->parentNode;
$parent->removeChild($img); // delete <img> node
if (trim($parent->textContent) === '' && $parent->getElementsByTagName('*')->length === 0) // if parent node of <img> is now empty (whitespace only), delete it
$parent->parentNode->removeChild($parent);
}
echo htmlentities($doc->saveHTML()); // display the new HTML
see it working: http://codepad.viper-7.com/350Hw6
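If the goal is "every img that comes before the first headline, however it is nested", the preceding and following axes may be a simpler fit than sibling checks. A sketch under that reading of the question; the sample markup, including the image after the headline, is made up for illustration:

```php
<?php
// Made-up sample: two images before the first headline, one after it.
$html = '<div class="c2"><img src="image/file" alt="Image"></div>'
    . '<div class="c2"><p><img src="image/A.JPG" alt="Image"></p>'
    . '<div class="headline">A headline</div>'
    . '<img src="image/after.jpg" alt="Kept"></div>';

$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// Images with a headline somewhere after them and none before them.
$imgs = $xpath->query(
    "//img[following::div[@class='headline'] and not(preceding::div[@class='headline'])]"
);

foreach ($imgs as $img) {
    $node = $img->parentNode;
    $node->removeChild($img);

    // Walk upwards, removing wrappers that are left with only whitespace.
    while ($node instanceof DOMElement
        && $node->nodeName !== 'body'
        && trim($node->textContent) === ''
        && $node->getElementsByTagName('*')->length === 0
    ) {
        $parent = $node->parentNode;
        $parent->removeChild($node);
        $node = $parent;
    }
}

$cleaned = $doc->saveHTML();
```

The upward loop is what handles the differently nested cases: an img wrapped in a p inside a div is removed together with whichever wrappers end up empty.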

how do I get sets of data with xpath

The code below retrieves a series of images from the search results of a site, along with the corresponding age data. It works, but I get a list of images followed by a list of the values from the age field:
img img img img age age age age and so on.
How do I combine these so I can display them in sets: img age img age img age
<?php
error_reporting(-1);
$html = new DOMDocument();
@$html->loadHtmlFile('http://www.site.com/searchresults.html');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@class='age']" );
$tags = $html->getElementsByTagName('img');
foreach ($tags as $tag) {
$image = $tag->getAttribute('src');
echo '<img src='. $image .' alt="image" ><br>';
}
foreach ($nodelist as $n)
{
echo $n->nodeValue."<br>";
}
?>
Sample page; I want to extract the img source and the title data from <div class="age" title="30 usa">:
<div id="sr-15763292" class="search-result">
<div class="thumb-wrapper">
<a class="bioLink" href="http://www.site.com/user/" title="View user"><img src="http://www.site.com/img/15763292.jpg" class="thumb" alt="user" width="140" height="105"></a>
<p class="status"><a href="http://www.site.com/user/" >Online</a></p>
</div>
<div class="rating">
<div class="rating-stars rating4"></div>
</div>
<div class="age" title="30 usa">
<p>30</p>
<p class="gender m">m</p>
<p>USA</p>
</div>
<div>
<p class="headline">Hello there.</p>
</div>
</div>
It's hard to answer if we don't know what the HTML looks like! Assuming it looks something like this
<div class="age"><p>21</p>
<img src="a.jpg" />
</div>
<div class="age"><p>51</p>
<img src="b.jpg" />
</div>
you need to find each div and then find the image inside each div. getElementsByTagName() will give you a list even if there's only one result, so use item() to fetch the first.
error_reporting(-1);
$html = new DOMDocument();
@$html->loadHtmlFile('results.html');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@class='age']" );
foreach ($nodelist as $node) {
$tags = $node->getElementsByTagName('img');
$image = $tags->item(0)->getAttribute('src');
echo '<img src="'. $image .'" alt="image" ><br>';
echo $node->textContent . '<br>';
}
If the HTML is like this
<div class="age"><p>21</p></div><img src="a.jpg" />
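you can try (note that nextSibling is a property, not a method, and that it may return a whitespace text node rather than the img)
$node->nextSibling
As a general point, trace through the HTML and ask: how do I get from A to B? Forwards? Backwards? Up to the parent, across to the next node, and down again?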
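For the sample page posted in the question, where the img and the age div are both inside a search-result div, one way to apply that advice is to loop over each search-result and run relative queries from it. A sketch, assuming every result has the structure shown (the markup below is a trimmed copy of the sample):

```php
<?php
// Trimmed copy of the sample markup from the question.
$html = '<div id="sr-15763292" class="search-result">'
    . '<div class="thumb-wrapper">'
    . '<a class="bioLink" href="http://www.site.com/user/" title="View user">'
    . '<img src="http://www.site.com/img/15763292.jpg" class="thumb" alt="user" width="140" height="105"></a>'
    . '</div>'
    . '<div class="age" title="30 usa"><p>30</p><p class="gender m">m</p><p>USA</p></div>'
    . '</div>';

$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
$results = [];

// One iteration per search result keeps each image paired with its age.
foreach ($xpath->query("//div[@class='search-result']") as $result) {
    // The leading "." makes both queries relative to the current result.
    $img = $xpath->query(".//img[@class='thumb']", $result)->item(0);
    $age = $xpath->query(".//div[@class='age']", $result)->item(0);

    $results[] = [
        'image' => $img ? $img->getAttribute('src') : null,
        'age'   => $age ? $age->getAttribute('title') : null,
    ];
}

foreach ($results as $row) {
    echo '<img src="' . htmlspecialchars($row['image'] ?? '') . '" alt="image"> '
        . htmlspecialchars($row['age'] ?? '') . "<br>\n";
}
```

Because each row is built from a single search-result node, a result that lacks an image or an age div yields null for that field instead of shifting the pairs out of step.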
