PHP DOMDocument not replacing all images in string - php

I have a function that should replace every <img> with a link:
$content = '<img width="500" height="500" src="hhhh.jpg" /> AWDEQWE ASdAa <p>sdasdasdas</p> <img width="500" height="500" src="hhhh.jpg" /> <p>awedaweq</p>';
$document = new DOMDocument;
$document->loadHTML($content);
$imgs= $document->getElementsByTagName('img');
foreach ($imgs as $img) {
$src= $img->getAttribute('src');
$width= $img->getAttribute('width');
$height= $img->getAttribute('height');
$link= $document->createElement('a');
$link->setAttribute('class', 'player');
$link->setAttribute('href', $src);
$link->setAttribute('style', "display: block; width: {$width}px; height: {$height}px;");
$img->parentNode->replaceChild($link, $img);
}
return $document->saveHTML();
It works fine, but only for the first image.
What is wrong with my code?

The replaceChild call affects the live node list you are iterating over: it removes the node from $imgs. After this mutation the iterator behind the loop loses track of where it was in the original list, so the loop exits early.
The solution is to first copy $imgs into a standard array, and then loop over that:
$images = [];
foreach ($imgs as $img) {
$images[] = $img;
}
// now proceed with the loop you really want:
foreach ($images as $img) {
// ...etc
}
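As a minimal sketch of the same idea, the copy can also be taken in one call with iterator_to_array(). The markup and the a.jpg / b.jpg file names below are made up for illustration; only the snapshot-then-mutate pattern matters.

```php
<?php
// Sketch: snapshot the live DOMNodeList before mutating the document.
// The sample markup is an assumption, not the asker's real content.
$content = '<p><img width="500" height="500" src="a.jpg"> text <img width="300" height="200" src="b.jpg"></p>';

$document = new DOMDocument;
$document->loadHTML($content);

// iterator_to_array() copies the nodes currently in the list into a plain
// PHP array, which later changes to the document do not affect.
$images = iterator_to_array($document->getElementsByTagName('img'));

foreach ($images as $img) {
    $link = $document->createElement('a');
    $link->setAttribute('class', 'player');
    $link->setAttribute('href', $img->getAttribute('src'));
    $img->parentNode->replaceChild($link, $img);
}

$remaining = $document->getElementsByTagName('img')->length;
$linkCount = $document->getElementsByTagName('a')->length;
echo $document->saveHTML();
```

Because the loop runs over the array rather than the node list, every image is visited exactly once regardless of how the document changes in the meantime.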

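The problem is the forward iteration: a DOMNodeList is live, so it changes as soon as you replace a node. Walk the list from the last index down to the first instead, so that replacing a node never shifts the ones still to be visited.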
$content = '<img width="400" height="400" src="asdf.jpg" /> AWDEQWE ASdAa <p>sdasdasdas</p> <img width="500" height="500" src="hhhh.jpg" /> <p>awedaweq</p>';
$document = new DOMDocument;
$document->loadHTML($content);
$imgs= $document->getElementsByTagName('img');
$i = $imgs->length - 1;
while ($i > -1)
{
$node = $imgs->item($i);
$link= $document->createElement('a');
$link->setAttribute('class', 'player');
$link->setAttribute('href', $node->getAttribute('src'));
$link->setAttribute('style', "display: block; width: {$node->getAttribute('width')}px; height: {$node->getAttribute('height')}px;");
$imgs->item($i)->parentNode->replaceChild($link, $node);
$i--;
}
var_dump($document->saveHTML());
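The live behaviour that makes the reverse loop necessary can be observed directly. The following is only an illustrative sketch with made-up markup: it removes one node and then looks at the same list object again.

```php
<?php
// Sketch: a DOMNodeList from getElementsByTagName() tracks the document.
$document = new DOMDocument;
$document->loadHTML('<p><img src="a.jpg"><img src="b.jpg"><img src="c.jpg"></p>');

$imgs = $document->getElementsByTagName('img');
$lengths = [$imgs->length];          // length before any change

// Removing a node from the document also removes it from the live list.
$first = $imgs->item(0);
$first->parentNode->removeChild($first);
$lengths[] = $imgs->length;          // length after the removal

// Index 0 now refers to what used to be the second image.
$newFirstSrc = $imgs->item(0)->getAttribute('src');

echo implode(',', $lengths), ' ', $newFirstSrc, PHP_EOL;
```

A forward loop keeps advancing its position while the list shrinks underneath it, which is why nodes get skipped; counting down from the end avoids that.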

I solved it this way:
$document = new DOMDocument;
$document->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));
$imgs = $document->getElementsByTagName('img');
$i = $imgs->length - 1;
while ($i > -1) {
$image = $imgs->item($i);
$ignore = false;
$width = $image->attributes->getNamedItem('width')->value;
$height = $image->attributes->getNamedItem('height')->value;
$src = $image->attributes->getNamedItem('src')->value;
$alt = $image->attributes->getNamedItem('alt')->value;
$class = $image->attributes->getNamedItem('class')->value;
$class = str_replace( array( "alignleft", "alignright" ), array( "amp_alignleft", "amp_alignright" ), $class );
$new_img = $document->createElement('amp-img', '');
$new_img->setAttribute('src', $src);
$new_img->setAttribute('class', $class);
$new_img->setAttribute('width', $width);
$new_img->setAttribute('height', $height);
$new_img->setAttribute('alt', $alt);
$new_img->setAttribute('layout', 'responsive');
$image->parentNode->replaceChild( $new_img, $image );
$i--;
}
return $document->saveHTML();


PHP - Weird file_get_contents behavior

When I run the first code it works well. The echo works.
<?php
$html = file_get_contents('https://feedback.aliexpress.com/display/productEvaluation.htm?productId=32795887882&ownerMemberId=230515078&withPictures=true&i18n=true&Page=3');
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo $image->getAttribute('src');
echo "<br>";
}
?>
But when I run the following code and pass the URL as a parameter, nothing is returned:
index.php?url=https://feedback.aliexpress.com/display/productEvaluation.htm?productId=32795887882&ownerMemberId=230515078&withPictures=true&i18n=true&Page=3
<?php
$html = file_get_contents($_GET["url"]);
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo $image->getAttribute('src');
echo "<br>";
}
?>
Anyone got any idea?
Update:
Probably not the best or cleanest solution, but it works :)
<?php
$url = urldecode($_GET['url']);
$ownerMemberId = urldecode($_GET['ownerMemberId']);
$withPictures = urldecode($_GET['withPictures']);
$page = urldecode($_GET['Page']);
$newurl = $url . "&ownerMemberId=" . $ownerMemberId .
"&withPictures=true&i18n=true&Page=" . $page;
$html = file_get_contents($newurl);
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo "<img src='";
echo $image->getAttribute('src');
echo "'>";
echo "<br>";
}
?>
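The nested URL has to be percent-encoded when it is passed as a parameter. Otherwise every & inside it starts a new parameter of index.php, and $_GET['url'] only holds the part before the first &. PHP already decodes $_GET values, so once the request is encoded correctly no explicit urldecode() call is needed:
$url = $_GET['url'];
$html = file_get_contents($url);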
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo $image->getAttribute('src');
echo "<br>";
}
Hope that works for you.
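To see why the original request loses most of the URL, here is a small sketch that uses parse_str(), which splits a query string the same way PHP fills $_GET. The target URL is shortened from the one in the question, and index.php is simply the script name used there.

```php
<?php
// Shortened version of the URL from the question (illustration only).
$target = 'https://feedback.aliexpress.com/display/productEvaluation.htm?productId=32795887882&Page=3';

// Unencoded: everything after the first "&" becomes a separate parameter.
parse_str('url=' . $target, $unencoded);

// Encoded: http_build_query() percent-encodes the value, so it survives intact.
$query = http_build_query(['url' => $target]);
parse_str($query, $encoded);

echo $unencoded['url'], PHP_EOL;   // cut off at the first "&"
echo $encoded['url'], PHP_EOL;     // the full target URL
echo 'index.php?' . $query, PHP_EOL;
```

Building the link with http_build_query() (or urlencode() on the single value) is therefore the part that matters; the receiving script can read $_GET['url'] as it is.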

php DOMDocument extract links with anchor or alt

I wish to extract every link on a page together with its anchor text, or with the alt attribute of an image inside the link if the image comes first.
$html = '<a href="lien.fr">Anchor</a>';
Must return "lien.fr;Anchor"
$html = '<a href="lien.fr"><img alt="Alt Anchor">Anchor</a>';
Must return "lien.fr;Alt Anchor"
$html = '<a href="lien.fr">Anchor<img alt="Alt Anchor"></a>';
Must return "lien.fr;Anchor"
I did:
$doc = new DOMDocument();
$doc->loadHTML($html);
$out = [];
$n = 0;
$links = $doc->getElementsByTagName('a');
foreach ($links as $element) {
$href = $img_alt = $anchor = "";
$href = $element->getAttribute('href');
$n++;
if (!strrpos($href, "panier?")) {
if ($element->firstChild->nodeName == "img") {
$imgs = $element->getElementsByTagName('img');
foreach ($imgs as $img) {
if ($anchor = $img->getAttribute('alt')) {
break;
}
}
}
if (($anchor == "") && ($element->nodeValue)) {
$anchor = $element->nodeValue;
}
$out[$n]['link'] = $href;
$out[$n]['anchor'] = $anchor;
}
}
This seems to work, but it fails when the markup contains whitespace or indentation, as in:
$html = '<a href="link.fr">
<img src="ceinture-gris" alt="alt anchor"/>
</a>';
In that case $element->firstChild->nodeName is #text, not img.
Something like this:
$doc = new DOMDocument();
$doc->loadHTML($html);
// Output texts that will later be joined with ';'
$out = [];
// Maximum number of items to add to $out
$max_out_items = 2;
// List of img tag attributes that will be parsed by the loop below
// (in the order specified in this array!)
$img_attributes = ['alt', 'src', 'title'];
$links = $doc->getElementsByTagName('a');
foreach ($links as $element) {
if ($href = trim($element->getAttribute('href'))) {
$out []= $href;
if (count($out) >= $max_out_items)
break;
}
foreach ($element->childNodes as $child) {
if ($child->nodeType === XML_TEXT_NODE &&
$text = trim($child->nodeValue))
{
$out []= $text;
if (count($out) >= $max_out_items)
break;
} elseif ($child->nodeName == 'img') {
foreach ($img_attributes as $attr_name) {
if ($attr_value = trim($child->getAttribute($attr_name))) {
$out []= $attr_value;
if (count($out) >= $max_out_items)
goto Result;
}
}
}
}
}
Result:
echo $out = implode(';', $out);
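A more compact way to express the "first meaningful child wins" rule is sketched below. firstLinkWithAnchor() is a hypothetical helper name, and it only handles the first link in the markup; it skips whitespace-only text nodes, which is the case the original code tripped over.

```php
<?php
// Hypothetical helper: returns "href;anchor" for the first <a> in $html,
// or null when no link with usable text or alt is found.
function firstLinkWithAnchor(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        foreach ($a->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                $text = trim($child->nodeValue);
                if ($text !== '') {
                    return $a->getAttribute('href') . ';' . $text;
                }
                continue;               // whitespace-only text node: skip it
            }
            if ($child->nodeName === 'img' && $child->getAttribute('alt') !== '') {
                return $a->getAttribute('href') . ';' . $child->getAttribute('alt');
            }
        }
    }
    return null;
}

$indented = "<a href=\"link.fr\">\n  <img src=\"ceinture-gris\" alt=\"alt anchor\"/>\n</a>";
echo firstLinkWithAnchor($indented), PHP_EOL;
```

Collecting every link instead of the first would only require gathering the results in an array rather than returning early.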

how to remove one image in string via php dom

I have a string with some <img> tags in it.
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';
$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$doc->loadHTML(mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img)
{
if($img->getAttribute('src') == 'pic.jpg')
{
// I want to delete that picture from the string
$img->parentNode->removeChild($img);
}
else
$img->setAttribute('class', 'image normall');
}
$string = $doc->saveHTML();
echo $string;
At the end of the function, when I print $string, the target picture has been deleted, but no class has been added to the other picture.
If I remove $img->parentNode->removeChild($img);, the class is added.
What am I doing wrong?
EDIT
please check for this sample string:
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';
You can delete nodes if you iterate backwards.
Simply change
// Forward iteration
foreach ($imgs as $img) {
to
// Reverse iteration
for($i = $imgs->length; --$i >= 0;) {
$img = $imgs->item($i);
Ref: http://php.net/manual/en/class.domnodelist.php#83390
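Put together with the sample string from the question, the reverse loop might look like the sketch below. It assumes plain ASCII input, so the encoding conversion from the question is left out.

```php
<?php
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';

$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_clear_errors();

$imgs = $doc->getElementsByTagName('img');

// Walk from the last index to the first so that removing a node
// never shifts the position of a node that is still to be visited.
for ($i = $imgs->length - 1; $i >= 0; $i--) {
    $img = $imgs->item($i);
    if ($img->getAttribute('src') === 'pic.jpg') {
        $img->parentNode->removeChild($img);
    } else {
        $img->setAttribute('class', 'image normall');
    }
}

$result = $doc->saveHTML();
echo $result;
```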
I finally solved it with the following change; I hope it helps others:
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';
$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$doc->loadHTML(mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
$imgs = $doc->getElementsByTagName('img');
$imgs1 = $imgs2 = array();
foreach($imgs as $img) {
if($img->getAttribute('src') == 'pic.jpg')
{
$imgs1[] = $img;
}
else
$imgs2[] = $img;
}
foreach($imgs1 as $img) {
$img->parentNode->removeChild($img);
}
foreach ($imgs2 as $img)
{
$img->setAttribute('class', 'image normall');
}
$string = $doc->saveHTML();
echo $string;

How to make this PHP code run more efficiently?

PHP code:
$AutomaticallyOpenShow = $AutomaticallyOpen."/";
$images = scandir($AutomaticallyOpen,0);
$counter = 0;
foreach($images as $curimg) {
if (strpos($curimg, '.jpg')>0 || strpos($curimg, '.JPG')>0) {
if($counter==1){$ImageView_1 = $AutomaticallyOpenShow.$curimg; }
elseif($counter==2){$ImageView_2 = $AutomaticallyOpenShow.$curimg; }
elseif($counter==3){$ImageView_3 = $AutomaticallyOpenShow.$curimg; }
elseif($counter==4){$ImageView_4 = $AutomaticallyOpenShow.$curimg; }
$counter++;
}
}
HTML code:
<img src="<?php echo $ImageView_1; ?>" width="500" height="500" />
Thanks to Kurro1 and RaggaMuffin-420 for their answers.
I finally integrated them:
PHP code:
$AutomaticallyOpenShow = $AutomaticallyOpen."/";
$images = scandir($AutomaticallyOpen,0);
$counter = 1;
foreach($images as $curimg) {
if (preg_match('/\.jpe?g$/i', $curimg)) {
$ImageView[$counter++] = $AutomaticallyOpenShow.$curimg;
}
}
HTML code:
<img src="<?php echo $ImageView[????] ; ?>" width="500" height="500" />
How about using an array for storage instead of four variables?
$AutomaticallyOpenShow = $AutomaticallyOpen."/";
$images = scandir($AutomaticallyOpen,0);
$counter = 0;
$ImageView = array();
foreach($images as $curimg) {
// the condition could also be optimized
if (strpos($curimg, '.jpg')>0 || strpos($curimg, '.JPG')>0) {
// write result in array at appropriate position
$ImageView[$counter++] = $AutomaticallyOpenShow.$curimg;
}
}
The html code would be like the following:
<img src="<?php echo $ImageView[0]; ?>" width="500" height="500" />
$AutomaticallyOpenShow = $AutomaticallyOpen."/";
$images = scandir($AutomaticallyOpen,0);
$counter = 1;
foreach($images as $curimg) {
if (preg_match('/\.jpe?g$/i', $curimg)) {
if($counter >= 1 && $counter <= 4){
$varName = 'ImageView_'.$counter;
$$varName = $AutomaticallyOpenShow.$curimg;
}
$counter++;
}
}
Based on the first code you posted.
PHP code:
$AutomaticallyOpenShow = $AutomaticallyOpen."/";
$images = scandir($AutomaticallyOpen,0);
$ImageView = Array();
$counter = 0;
foreach($images as $curimg) {
if (strpos($curimg, '.jpg')>0 || strpos($curimg, '.JPG')>0) {
$ImageView[$counter++] = $AutomaticallyOpenShow.$curimg;
}
}
HTML code:
<img src="<?php echo $ImageView[0]; ?>" width="500" height="500" />
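The extension check itself can be made case-insensitive without a regular expression by using pathinfo(). The sketch below is one possible shape; jpegPaths() and the sample file names are hypothetical, and in the question the names would come from scandir($AutomaticallyOpen).

```php
<?php
// Hypothetical helper: keep only JPEG file names and prefix them with $dir.
function jpegPaths(array $fileNames, string $dir): array
{
    $paths = [];
    foreach ($fileNames as $name) {
        // Lower-case the extension so .jpg, .JPG and .jpeg are all accepted.
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        if ($ext === 'jpg' || $ext === 'jpeg') {
            $paths[] = rtrim($dir, '/') . '/' . $name;
        }
    }
    return $paths;
}

// Sample listing standing in for the result of scandir().
$ImageView = jpegPaths(['.', '..', 'a.JPG', 'notes.txt', 'b.jpeg', 'c.jpg'], 'photos');
print_r($ImageView);
```

Appending with $paths[] keeps the indexes contiguous from 0, so no manual counter is needed.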

Convert Image To base64 while fetching them from other urls

I'm using PHP code to fetch images and data from other URLs,
but I need to convert the images to base64 strings.
The code is:
<?php
function getMetaTitle($content){
$pattern = "|<[\s]*title[\s]*>([^<]+)<[\s]*/[\s]*title[\s]*>|Ui";
if(preg_match($pattern, $content, $match))
return $match[1];
else
return false;
}
function fetch_record($path)
{
$file = fopen($path, "r");
if (!$file)
{
exit("Problem occured");
}
$data = '';
while (!feof($file))
{
$data .= fgets($file, 1024);
}
return $data;
}
$url = $_POST['url'];
$data = array();
// get url title
$content = @file_get_contents($url);
$data['title'] = getMetaTitle($content);
// get url description from meta tag
$tags = @get_meta_tags($url);
$data['description'] = $tags['description'];
$string = fetch_record($url);
// fetch images
$image_regex = '/<img[^>]*'.'src=[\"|\'](.*)[\"|\']/Ui';
preg_match_all($image_regex, $content, $img, PREG_PATTERN_ORDER);
$images_array = $img[1];
$k=1;
for ($i=0;$i<sizeof($images_array);$i++)
{
if(@$images_array[$i])
{
if(@getimagesize(@$images_array[$i]))
{
list($width, $height, $type, $attr) = getimagesize(@$images_array[$i]);
if($width > 50 && $height > 50 ){
$data['images'] = "<img src='".@$images_array[$i]."' id='".$k."' width='100%'>";
$k++;
}
}
}
}
$data['form'] = '<input type="hidden" name="images" value="'.$data['images'].'"/>
<input type="hidden" name="title" value="'.$data['title'].'"/>
<input type="hidden" name="description" value="'.$data['description'].'"/>';
$dom = new domDocument;
@$dom->loadHTML($content);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach($images as $img)
{
$url = $img->getAttribute('src');
$alt = $img->getAttribute('alt');
$pos = strpos($url, 'http://');
if ($pos === false) {
// $data['images'] = '<img src="'.$_POST['url'].''.$url.'" title="'.$alt.'"/>';
} else {
// $data['images'] = '<img src="'.$url.'" title="'.$alt.'"/>';
}
}
echo json_encode($data);
?>
This code uses each image's original URL on this line:
$data['images'] = "<img src='".@$images_array[$i]."' id='".$k."' width='100%'>";
I want to convert the images to base64 and use that instead.
Once you have the URL of the image, you just need to fetch it with cURL and call base64_encode.
chunk_split only wraps the encoded output into 76-character lines, which is cosmetic.
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true );
$ret_val = curl_exec($curl);
// TODO: error checking!!!
$b64_image_data = chunk_split(base64_encode($ret_val));
curl_close($curl);
Another option is to get PHP to filter the image's binary data into a Base64 encoded value with a stream conversion filter.
$img_url = 'http://www.php.net/images/php.gif';
$b64_url = 'php://filter/read=convert.base64-encode/resource='.$img_url;
$b64_img = file_get_contents($b64_url);
echo $b64_img;
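Either way, the encoded string still has to be turned into something an <img> tag can display, which usually means a data URI. The following is a minimal sketch: toDataUri() is a hypothetical helper, the MIME type is passed in by the caller rather than detected, and the six stand-in bytes are only the start of a GIF header, not a displayable image.

```php
<?php
// Hypothetical helper: wrap raw image bytes in a data URI for <img src="...">.
function toDataUri(string $bytes, string $mimeType): string
{
    // No chunk_split() here: line breaks are best avoided inside a data URI.
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Stand-in bytes; in practice they would come from curl_exec() or file_get_contents().
$bytes = "GIF89a";
$uri = toDataUri($bytes, 'image/gif');

echo '<img src="' . htmlspecialchars($uri) . '" width="100%">', PHP_EOL;
```

In the question's loop, the resulting URI would take the place of $images_array[$i] in the src attribute.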
