How to remove one image from a string via PHP DOM - php

I have a string with some <img> tags in it.
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';
$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$doc->loadHTML(mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img)
{
    if ($img->getAttribute('src') == 'pic.jpg')
    {
        // I want to delete that picture from the string
        $img->parentNode->removeChild($img);
    }
    else
        $img->setAttribute('class', 'image normall');
}
$string = $doc->saveHTML();
echo $string;
At the end, when I print $string, the target picture has been deleted, but no class has been added to the other picture!
But if I remove $img->parentNode->removeChild($img);, the class is added.
What am I doing wrong?
EDIT:
Please check with this sample string:
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';

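You can delete nodes if you iterate backwards. getElementsByTagName() returns a live DOMNodeList, so removing a node shifts the indices of the nodes after it, and a forward foreach loses its place.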
Simply change
// Forward iteration
foreach ($imgs as $img) {
to
// Reverse iteration
for($i = $imgs->length; --$i >= 0;) {
$img = $imgs->item($i);
Ref: http://php.net/manual/en/class.domnodelist.php#83390
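Put together, the reverse loop could look like this when applied to the question's string. It is a minimal sketch that assumes plain ASCII input and therefore drops the mb_convert_encoding(..., 'HTML-ENTITIES', ...) step, which is deprecated as of PHP 8.2.

```php
<?php
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';

$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($string);            // assumption: the input is plain ASCII
libxml_clear_errors();

$imgs = $doc->getElementsByTagName('img');   // a live DOMNodeList

// Walk from the last index down to 0: removing item $i only shifts the
// items after $i, and those have already been processed.
for ($i = $imgs->length - 1; $i >= 0; $i--) {
    $img = $imgs->item($i);
    if ($img->getAttribute('src') === 'pic.jpg') {
        $img->parentNode->removeChild($img);
    } else {
        $img->setAttribute('class', 'image normall');
    }
}

$html = $doc->saveHTML();
echo $html;
```

Both images are now handled: pic.jpg is gone and pic2.jpg carries the class.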

Finally I solved it with the following change. I hope it helps others:
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';
$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$doc->loadHTML(mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
$imgs = $doc->getElementsByTagName('img');
$imgs1 = $imgs2 = array();
foreach ($imgs as $img) {
    if ($img->getAttribute('src') == 'pic.jpg') {
        $imgs1[] = $img;
    } else {
        $imgs2[] = $img;
    }
}
foreach ($imgs1 as $img) {
    $img->parentNode->removeChild($img);
}
foreach ($imgs2 as $img) {
    $img->setAttribute('class', 'image normall');
}
$string = $doc->saveHTML();
echo $string;

Related

replace image src with full image tag in string

I have this code to replace the image tags in a string with their respective src values:
$url='<img class="emojioneemoji" src="http://localhost/sng/assets/js/plugins/em/2.1.4/assets/png/1f602.png">checking<img class="emojioneemoji" src="http://localhost/sng/assets/js/plugins/em/2.1.4/assets/png/1f601.png"><img class="emojioneemoji" src="http://localhost/sng/assets/js/plugins/em/2.1.4/assets/png/1f62c.png">working check<img class="emojioneemoji" src="http://localhost/sng/assets/js/plugins/em/2.1.4/assets/png/1f600.png">';
$doc = new DOMDocument();
@$doc->loadHTML($url); // @ suppresses libxml warnings about the HTML fragment
$tags = $doc->getElementsByTagName('img');
$str = "-";
foreach ($tags as $tag) {
    $img_path = $tag->getAttribute('src');
    $directory = $img_path;
    $ee = pathinfo($directory);
    $pic_name = $ee['basename'];
    $next = "";
    if ($tag->nextSibling && get_class($tag->nextSibling) == "DOMText") {
        $next = $tag->nextSibling->wholeText . "-";
    }
    $str .= $pic_name . "-" . $next;
}
echo $str;
The output of the above code is:
-1f602.png-checking-1f601.png-1f62c.png-working check-1f600.png-
Now, how can I replace each image name enclosed in "-" with the full image tag, as in the original string?
You can store the values in an array, and then use that array for any further processing.
$url='<img class="emojioneemoji" src="http://localhost/sng/assets/js/plugins/em/2.1.4/assets/png/1f602.png">checking<img class="emojioneemoji" src="http://localhost/sng/assets/js/plugins/em/2.1.4/assets/png/1f601.png"><img class="emojioneemoji" src="http://localhost/sng/assets/js/plugins/em/2.1.4/assets/png/1f62c.png">working check<img class="emojioneemoji" src="http://localhost/sng/assets/js/plugins/em/2.1.4/assets/png/1f600.png">';
$doc = new DOMDocument();
@$doc->loadHTML($url); // @ suppresses libxml warnings about the HTML fragment
$tags = $doc->getElementsByTagName('img');
$str = [];
foreach ($tags as $tag) {
    $img_path = $tag->getAttribute('src');
    $ee = pathinfo($img_path);
    $pic_name = $ee['basename'];
    $next = "";
    if ($tag->nextSibling && get_class($tag->nextSibling) == "DOMText") {
        $next = $tag->nextSibling->wholeText;
    }
    $str[] = ['src' => $pic_name, 'next' => $next];
}
echo "<pre>";
print_r($str);
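Going the other way, that array can be turned back into the original markup. A minimal sketch, assuming every emoji image lives in the same directory as in the question's example ($baseUrl is copied from that example, not derived from the data) and that the array has the src/next shape built above. Text that appears before the first image is not stored in that array, so it would be lost.

```php
<?php
// Assumption: all emoji images share this directory (taken from the question).
$baseUrl = 'http://localhost/sng/assets/js/plugins/em/2.1.4/assets/png/';

// Array shape produced by the answer: one entry per <img>,
// plus the text node that directly follows it (possibly empty).
$str = [
    ['src' => '1f602.png', 'next' => 'checking'],
    ['src' => '1f601.png', 'next' => ''],
    ['src' => '1f62c.png', 'next' => 'working check'],
    ['src' => '1f600.png', 'next' => ''],
];

$html = '';
foreach ($str as $item) {
    // Rebuild the tag, escaping the URL for use inside an attribute
    $html .= '<img class="emojioneemoji" src="' . htmlspecialchars($baseUrl . $item['src']) . '">';
    // Escape the following text so it cannot inject markup
    $html .= htmlspecialchars($item['next']);
}
echo $html;
```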

PHP DOMDocument not replacing all images in string

I have a function that should replace all <img> elements:
$content = '<img width="500" height="500" src="hhhh.jpg" /> AWDEQWE ASdAa <p>sdasdasdas</p> <img width="500" height="500" src="hhhh.jpg" /> <p>awedaweq</p>';
$document = new DOMDocument;
$document->loadHTML($content);
$imgs= $document->getElementsByTagName('img');
foreach ($imgs as $img) {
    $src = $img->getAttribute('src');
    $width = $img->getAttribute('width');
    $height = $img->getAttribute('height');
    $link = $document->createElement('a');
    $link->setAttribute('class', 'player');
    $link->setAttribute('href', $src);
    $link->setAttribute('style', "display: block; width: {$width}px; height: {$height}px;");
    $img->parentNode->replaceChild($link, $img);
}
return $document->saveHTML();
It works fine, but only for the first image.
What is wrong with my code?
The replaceChild() call affects the live DOMNodeList you are iterating over: it actually removes the node from $imgs. After this mutation, the loop (i.e. the iterator underlying it) loses track of where it was in the original list, and so the loop exits.
The solution is to first copy $imgs into a standard array, and then loop over that:
$images = [];
foreach ($imgs as $img) {
    $images[] = $img;
}
// now proceed with the loop you really want:
foreach ($images as $img) {
    // ...etc
}
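A compact way to make that copy is iterator_to_array(), which accepts any Traversable, and DOMNodeList is one. The sketch below applies it to the question's markup; building the style attribute with sprintf() is just a stylistic choice.

```php
<?php
$content = '<img width="500" height="500" src="hhhh.jpg" /> AWDEQWE ASdAa <p>sdasdasdas</p> <img width="500" height="500" src="hhhh.jpg" /> <p>awedaweq</p>';

$document = new DOMDocument();
libxml_use_internal_errors(true);   // silence warnings about the HTML fragment
$document->loadHTML($content);
libxml_clear_errors();

// Copy the live DOMNodeList into a plain PHP array, so replacing
// nodes inside the loop no longer disturbs the iteration.
$imgs = iterator_to_array($document->getElementsByTagName('img'));

foreach ($imgs as $img) {
    $link = $document->createElement('a');
    $link->setAttribute('class', 'player');
    $link->setAttribute('href', $img->getAttribute('src'));
    $link->setAttribute('style', sprintf(
        'display: block; width: %spx; height: %spx;',
        $img->getAttribute('width'),
        $img->getAttribute('height')
    ));
    $img->parentNode->replaceChild($link, $img);
}

$html = $document->saveHTML();
echo $html;
```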
Your problem is the loop counter: the DOMNodeList is live, so it changes as soon as you replace a node. Iterate from the last index down to 0 instead, so each replacement only shifts positions you have already handled:
$content = '<img width="400" height="400" src="asdf.jpg" /> AWDEQWE ASdAa <p>sdasdasdas</p> <img width="500" height="500" src="hhhh.jpg" /> <p>awedaweq</p>';
$document = new DOMDocument;
$document->loadHTML($content);
$imgs= $document->getElementsByTagName('img');
$i = $imgs->length - 1;
while ($i > -1)
{
    $node = $imgs->item($i);
    $link = $document->createElement('a');
    $link->setAttribute('class', 'player');
    $link->setAttribute('href', $node->getAttribute('src'));
    $link->setAttribute('style', "display: block; width: {$node->getAttribute('width')}px; height: {$node->getAttribute('height')}px;");
    $imgs->item($i)->parentNode->replaceChild($link, $node);
    $i--;
}
var_dump($document->saveHTML());
I solved it this way:
$document = new DOMDocument;
$document->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));
$imgs = $document->getElementsByTagName('img');
$i = $imgs->length - 1;
while ($i > -1) {
    $image = $imgs->item($i);
    // getAttribute() returns '' for a missing attribute, whereas
    // getNamedItem('alt')->value triggers a warning when it is absent
    $width = $image->getAttribute('width');
    $height = $image->getAttribute('height');
    $src = $image->getAttribute('src');
    $alt = $image->getAttribute('alt');
    $class = $image->getAttribute('class');
    $class = str_replace(array("alignleft", "alignright"), array("amp_alignleft", "amp_alignright"), $class);
    $new_img = $document->createElement('amp-img', '');
    $new_img->setAttribute('src', $src);
    $new_img->setAttribute('class', $class);
    $new_img->setAttribute('width', $width);
    $new_img->setAttribute('height', $height);
    $new_img->setAttribute('alt', $alt);
    $new_img->setAttribute('layout', 'responsive');
    $image->parentNode->replaceChild($new_img, $image);
    $i--;
}
return $document->saveHTML();

PHP - Weird file_get_contents behavior

When I run the first snippet, it works well: the echo prints the image sources.
<?php
$html = file_get_contents('https://feedback.aliexpress.com/display/productEvaluation.htm?productId=32795887882&ownerMemberId=230515078&withPictures=true&i18n=true&Page=3');
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src');
    echo "<br>";
}
?>
But when I run the following code, passing the URL as a parameter, nothing is returned:
index.php?url=https://feedback.aliexpress.com/display/productEvaluation.htm?productId=32795887882&ownerMemberId=230515078&withPictures=true&i18n=true&Page=3
<?php
$html = file_get_contents($_GET["url"]);
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src');
    echo "<br>";
}
?>
Does anyone have an idea?
Update:
Probably not the best or cleanest solution, but it works :)
<?php
$url = urldecode($_GET['url']);
$ownerMemberId = urldecode($_GET['ownerMemberId']);
$withPictures = urldecode($_GET['withPictures']);
$page = urldecode($_GET['Page']);
$newurl = $url . "&ownerMemberId=" . $ownerMemberId .
    "&withPictures=true&i18n=true&Page=" . $page;
$html = file_get_contents($newurl);
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo "<img src='";
    echo $image->getAttribute('src');
    echo "'>";
    echo "<br>";
}
?>
Decode the url parameter, since it carries another URL:
$url = urldecode($_GET['url']);
$html = file_get_contents($url);
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src');
    echo "<br>";
}
Hope that works for you.
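The reason the second snippet returns nothing is that the inner URL's & characters are treated as separators of the outer query string, so $_GET['url'] ends at productId=32795887882 and the remaining values arrive as separate parameters (which is what the update above works around). A minimal sketch, using parse_str() to simulate how the query string gets split: urlencode() on the side that builds the link keeps the whole URL in a single parameter, and $_GET values arrive already decoded.

```php
<?php
// Target URL from the question.
$target = 'https://feedback.aliexpress.com/display/productEvaluation.htm?productId=32795887882&ownerMemberId=230515078&withPictures=true&i18n=true&Page=3';

// Unencoded: the inner '&' characters split the outer query string.
parse_str('url=' . $target, $broken);

// Encoded: the whole target survives as a single 'url' parameter.
$link = 'index.php?url=' . urlencode($target);
parse_str(parse_url($link, PHP_URL_QUERY), $fixed);

echo $broken['url'], "\n";            // stops at productId=32795887882
var_dump($fixed['url'] === $target);  // bool(true)
```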

php DOMDocument extract links with anchor or alt

I wish to extract all the links on a page, together with their anchor text, or with the alt attribute of an image inside the link if the image comes first.
$html = '<a href="lien.fr">Anchor</a>';
Must return "lien.fr;Anchor"
$html = '<a href="lien.fr"><img alt="Alt Anchor">Anchor</a>';
Must return "lien.fr;Alt Anchor"
$html = '<a href="lien.fr">Anchor<img alt="Alt Anchor"></a>';
Must return "lien.fr;Anchor"
I did:
$doc = new DOMDocument();
$doc->loadHTML($html);
$out = []; // must be an array: $out[$n]['link'] fails on a string
$n = 0;
$links = $doc->getElementsByTagName('a');
foreach ($links as $element) {
    $href = $img_alt = $anchor = "";
    $href = $element->getAttribute('href');
    $n++;
    if (strpos($href, "panier?") === false) {
        if ($element->firstChild->nodeName == "img") {
            $imgs = $element->getElementsByTagName('img');
            foreach ($imgs as $img) {
                if ($anchor = $img->getAttribute('alt')) {
                    break;
                }
            }
        }
        if (($anchor == "") && ($element->nodeValue)) {
            $anchor = $element->nodeValue;
        }
        $out[$n]['link'] = $href;
        $out[$n]['anchor'] = $anchor;
    }
}
This seems to work, but not when there is whitespace or indentation inside the link, such as:
$html = '<a href="link.fr">
<img src="ceinture-gris" alt="alt anchor"/>
</a>';
In that case $element->firstChild->nodeName is #text (the whitespace text node before the <img>), not img.
Something like this:
$doc = new DOMDocument();
$doc->loadHTML($html);
// Output texts that will later be joined with ';'
$out = [];
// Maximum number of items to add to $out
$max_out_items = 2;
// List of img tag attributes that will be parsed by the loop below
// (in the order specified in this array!)
$img_attributes = ['alt', 'src', 'title'];
$links = $doc->getElementsByTagName('a');
foreach ($links as $element) {
    if ($href = trim($element->getAttribute('href'))) {
        $out []= $href;
        if (count($out) >= $max_out_items)
            break;
    }
    foreach ($element->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE &&
            $text = trim($child->nodeValue))
        {
            $out []= $text;
            if (count($out) >= $max_out_items)
                break;
        } elseif ($child->nodeName == 'img') {
            foreach ($img_attributes as $attr_name) {
                if ($attr_value = trim($child->getAttribute($attr_name))) {
                    $out []= $attr_value;
                    if (count($out) >= $max_out_items)
                        goto Result;
                }
            }
        }
    }
}
Result:
echo $out = implode(';', $out);
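A variant that returns one "href;label" string per link, skipping whitespace-only text nodes so that indentation no longer matters. extractLinks() is a hypothetical helper name, and the rule "the first non-empty text node or the first img alt wins" is an assumption derived from the three examples in the question.

```php
<?php
// Hypothetical helper: one "href;label" entry per <a> element.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // silence warnings on HTML fragments
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $label = '';
        foreach ($a->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                // Whitespace-only nodes (indentation) trim to ''
                $label = trim($child->nodeValue);
            } elseif ($child instanceof DOMElement && $child->nodeName === 'img') {
                $label = trim($child->getAttribute('alt'));
            }
            if ($label !== '') {
                break;   // first non-empty candidate wins
            }
        }
        $result[] = $a->getAttribute('href') . ';' . $label;
    }
    return $result;
}

print_r(extractLinks('<a href="lien.fr"><img alt="Alt Anchor">Anchor</a>'));
print_r(extractLinks("<a href=\"link.fr\">\n    <img src=\"ceinture-gris\" alt=\"alt anchor\"/>\n</a>"));
```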

Extract href from html page using php

I am trying to extract the news headlines and the link (href) of each headline using the code below, but the link extraction is not working: it only gets the headline. Please help me find out what's wrong with the code.
The page from which I want to get the headlines and links:
http://web.tmxmoney.com/news.php?qm_symbol=BCM
<?php
$data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new domDocument;
@$dom->loadHTML($data); // @ suppresses libxml warnings about malformed HTML
$dom->preserveWhiteSpace = true;
$xpath = new DOMXPath($dom);
$rows = $xpath->query('//div');
foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('span');
    $newstitle = $cols->item(0)->nodeValue;
    $link = $cols->item(0)->nodeType === HTML_ELEMENT_NODE ? $cols->item(0)->getElementsByTagName('a')->item(0)->getAttribute('href') : '';
    echo $newstitle . '<br>';
    echo $link . '<br><br>';
}
?>
Thanks in advance for your help!
Try to do this:
<?php
$data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new DOMDocument();
@$dom->loadHTML($data); // @ suppresses libxml warnings about malformed HTML
$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('/html/body//a');
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    $url = filter_var($url, FILTER_SANITIZE_URL);
    // keep only values that validate as absolute URLs
    if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
        echo $url . '<br />';
    }
}
?>
I have found the solution. Here it goes:
<?php
$data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new domDocument;
@$dom->loadHTML($data); // @ suppresses libxml warnings about malformed HTML
$dom->preserveWhiteSpace = true;
$xpath = new DOMXPath($dom);
$rows = $xpath->query('//div');
foreach ($rows as $row) {
    // guard against <div>s that have no <a> or <span> descendants
    $cols1 = $row->getElementsByTagName('a');
    $link = $cols1->length > 0 ? $cols1->item(0)->getAttribute('href') : '';
    $cols2 = $row->getElementsByTagName('span');
    $title = $cols2->length > 0 ? $cols2->item(0)->nodeValue : '';
    $source = $cols2->length > 1 ? $cols2->item(1)->nodeValue : '';
    echo $title . '<br>';
    echo $source . '<br>';
    echo $link . '<br><br>';
}
?>
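A sketch of the same idea with relative XPath queries, so that only <div>s that actually contain a linked headline are visited and no null checks are needed. The markup below is hypothetical and only stands in for the real page, whose structure may differ; the expressions //div[span/a], span/a and span[2] would need adjusting to the actual layout.

```php
<?php
// Hypothetical markup standing in for the news page.
$html = '<div class="news"><span><a href="/news/1">First headline</a></span><span>Source A</span></div>'
      . '<div class="ad">no links here</div>'
      . '<div class="news"><span><a href="/news/2">Second headline</a></span><span>Source B</span></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // silence warnings on the fragment
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$items = [];
// Only <div>s with a <span> child that itself contains an <a>
foreach ($xpath->query('//div[span/a]') as $row) {
    $a = $xpath->query('span/a', $row)->item(0);        // relative to $row
    $source = $xpath->query('span[2]', $row)->item(0);  // second <span> child
    $items[] = [
        'title'  => trim($a->textContent),
        'link'   => $a->getAttribute('href'),
        'source' => $source ? trim($source->textContent) : '',
    ];
}
print_r($items);
```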
