PHP DOMDocument parentNode->replaceChild causing foreach to skip next item - php

I am parsing the HTML in the $content variable with DOMDocument to replace all iframes with images. The foreach only replaces the odd-numbered iframes (the 1st, 3rd, 5th, and so on). I removed all the other code in the foreach and found that the line causing this is: '$iframe->parentNode->replaceChild($link, $iframe);'
Why would the foreach skip every other iframe?
The code:
$count = 1;
$dom = new DOMDocument;
$dom->loadHTML($content);
$iframes = $dom->getElementsByTagName('iframe');
foreach ($iframes as $iframe) {
$src = $iframe->getAttribute('src');
$width = $iframe->getAttribute('width');
$height = $iframe->getAttribute('height');
$link = $dom->createElement('img');
$link->setAttribute('class', 'iframe-'.self::return_video_type($iframe->getAttribute('src')).' iframe-'.$count.' iframe-ondemand-placeholderImg');
$link->setAttribute('src', $placeholder_image);
$link->setAttribute('height', $height);
$link->setAttribute('width', $width);
$link->setAttribute('data-iframe-src', $src);
$iframe->parentNode->replaceChild($link, $iframe);
echo "here:".$count;
$count++;
}
$content = $dom->saveHTML();
return $content;
This is the problem line of code
$iframe->parentNode->replaceChild($link, $iframe);

A DOMNodeList, such as that returned from getElementsByTagName, is "live":
that is, changes to the underlying document structure are reflected in all relevant NodeList... objects
So when you remove the element (in this case by replacing it with another one) it no longer exists in the node list, and the next one in line takes its position in the index. Then when foreach hits the next iteration, and hence the next index, one will be effectively skipped.
Don't remove elements from the DOM via foreach like this.
An approach that works instead would be to use a while loop to iterate and replace until your $iframes node list is empty.
Example:
while ($iframes->length) {
$iframe = $iframes->item(0);
$src = $iframe->getAttribute('src');
$width = $iframe->getAttribute('width');
$height = $iframe->getAttribute('height');
$link = $dom->createElement('img');
$link->setAttribute('class', 'iframe-'.self::return_video_type($iframe->getAttribute('src')).' iframe-'.$count.' iframe-ondemand-placeholderImg');
$link->setAttribute('src', $placeholder_image);
$link->setAttribute('height', $height);
$link->setAttribute('width', $width);
$link->setAttribute('data-iframe-src', $src);
$iframe->parentNode->replaceChild($link, $iframe);
echo "here:".$count;
$count++;
}
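Another way to sidestep the live list, as a minimal sketch: copy the nodes into a plain PHP array with iterator_to_array() before looping, so that replacing a node cannot shift the positions still to be visited. The sample HTML and the placeholder.png file name below are made up for illustration and are not from the question.

```php
<?php
// Sketch: iterate over a static copy of the live node list so that
// replacing nodes cannot shift the positions being iterated.
// The sample HTML and placeholder.png are assumptions for illustration.
$html = '<div><iframe src="a" width="1" height="2"></iframe>'
      . '<iframe src="b"></iframe><iframe src="c"></iframe></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// iterator_to_array() copies the current nodes into a plain PHP array.
$iframes = iterator_to_array($dom->getElementsByTagName('iframe'));

$replaced = 0;
foreach ($iframes as $iframe) {
    $img = $dom->createElement('img');
    $img->setAttribute('src', 'placeholder.png');
    $img->setAttribute('data-iframe-src', $iframe->getAttribute('src'));
    $iframe->parentNode->replaceChild($img, $iframe);
    $replaced++;
}

// A fresh query confirms that no iframe was skipped.
$remaining = $dom->getElementsByTagName('iframe')->length;
echo "replaced: $replaced, remaining: $remaining\n";
```

The trade-off is one extra array holding references to the nodes, which is usually negligible next to the document itself.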

I faced this issue today and, guided by the answer above, wrote a simple solution:
$iframes = $dom->getElementsByTagName('iframe');
for ($i=0; $i< $iframes->length; $i++) {
$iframe = $iframes->item($i);
if("condition to replace"){
// do some replace thing
$i--;
}
}
Hope this helps.

Related

How to check if namespace element exists in PHP

I'm trying to check if the element media:content exists so that a thumbnail image can be shown from an RSS feed, but I'm not sure how to validate its existence.
I can get the media:content url and show the image fine, but checking whether it exists isn't working out. I've tried isset and defined, but I'm clearly doing something wrong.
The main line I'm concerned about is:
if(defined($item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url')))
<?php
$feed = new DOMDocument();
$feed->load('http://rss.cnn.com/rss/cnn_topstories.rss');
$json = array();
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
$json['item'] = array();
foreach($items as $item) {
$title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$description = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
$text = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
if(defined($item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url'))){
$image = $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url');
}
else{
$image = '';
}
echo $image;
}
?>
When you use getElementsByTagName, this always returns a DOMNodeList. The main thing is to check if the node list has 0 elements.
So rather than defined() or isset(), use ->length...
$nodes=$domDocument->getElementsByTagName('book') ;
if ($nodes->length==0) {
// no results
}
(Example from http://php.net/manual/en/domdocument.getelementsbytagname.php)
Nigel was right, using ->length is the way to do it.
if($item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->length > 0){
$image = $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url');
}
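The same check can be written so that the lookup runs only once per item. This is a rough sketch against an invented two-item feed (not the real CNN feed), so the element layout is an assumption:

```php
<?php
// Sketch: test for a namespaced element via DOMNodeList::length.
// The feed below is invented sample data, not the real CNN feed.
$xml = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item><title>With image</title><media:content url="http://example.com/a.jpg"/></item>
    <item><title>No image</title></item>
  </channel>
</rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($xml);

$urls = [];
foreach ($feed->getElementsByTagName('item') as $item) {
    // Query once and reuse the list instead of repeating the lookup.
    $media = $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content');
    // An empty list means the element is absent, so fall back to ''.
    $urls[] = $media->length > 0 ? $media->item(0)->getAttribute('url') : '';
}

print_r($urls);
```

Holding the node list in $media avoids walking the item's subtree twice and keeps the condition and the read in sync.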

PHP dealing with Invalid argument and expects parameter's

Good evening. I am trying to teach myself PHP as I go, and decided to try building something for our LAN server at work.
The following code works and displays the images from a directory. I am building a booking-in system for work and naming each image by its job number.
For this test I'm naming the images test, but I have an issue if there aren't any images.
<?php
$directory = "saved_images/";
$images = glob("" . $directory . "test*.jpg");
$imgs = '';
foreach($images as $image){ $imgs[] = "$image"; }
$imgs = array_slice($imgs, 0, 20);
$result = count($imgs);
if ($result == 0)
{
$img="No Photos";
echo $img;
} else {
foreach ($imgs as $img) {
echo "<img src='$img' /> ";
}}
?>
The issue is that if there aren't any photos, I would like it to echo "No Photos" instead of the following errors:
array_slice() expects parameter 1 to be array,
referring to this line
$imgs = array_slice($imgs, 0, 20);
and
Invalid argument supplied for foreach()
referring to this line
foreach ($imgs as $img)
I have seen someone with a similar problem, but sadly they were advised to ignore the issue and turn off error reporting, which didn't seem right. This isn't causing any problems for the rest of the project, but I would like to know how to fix it so that if I encounter it again I know what to do.
All you need to do is set $imgs as an array (not a string), and check whether $imgs is empty or not. While you're at it, you will want to check if $images is empty or not, too.
$directory = "saved_images/";
$images = glob("" . $directory . "test*.jpg");
// since the entire script relies on $images not being empty,
// we should check for that to be sure before moving on
// you can also test for glob() returning FALSE on error, if you anticipate that it might
if ( ! empty($images) ) {
$imgs = []; // this should be set as an array, not a string
foreach ($images as $image)
{
$imgs[] = $image;
}
if ( empty($imgs) ) {
echo 'No Photo';
}
else {
$imgs = array_slice($imgs, 0, 20);
$result = count($imgs);
foreach ($imgs as $img)
{
echo "<img src='$img'> ";
}
}
}
else {
echo 'No images in ' . $directory;
}
Why are you initializing $imgs as a string?
$imgs = '';
and then treating it as an array?
foreach($images as $image){ $imgs[] = "$image"; }
If you'd initialize it as an array, e.g.
$imgs = array();
Then even if the foreach doesn't add anything to the array, it'll still be an (empty) array when you pass it into array_slice.
Basically, you create a pizza, then wonder why PHP is complaining that it's not a chocolate cake.
wrap
$imgs = array_slice($imgs, 0, 20);
inside an if(isset()) statement, like so
if(isset($imgs)){
$imgs = array_slice($imgs, 0, 20);
}
If it still doesn't work add
&& count($imgs) > 0
to it
Edit:
And what @MarcB said, replace $imgs = ''; with $imgs = array();
You can use the PHP function is_array($myArrayVar) to test whether $imgs is an array before the line where you call array_slice($imgs, 0, 20). Something like:
if (is_array($imgs)){
// $imgs is an array
} else {
//$imgs is not an array
}
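Pulling the suggestions above together, one possible shape is to make sure the variable is an array from the start, so neither array_slice() nor foreach can receive anything else. This is only a sketch: list_images() is a hypothetical helper introduced here, not part of the original script, and the htmlspecialchars() call is an extra precaution the question did not ask for.

```php
<?php
// Sketch: always work with an array, even when glob() finds nothing.
// list_images() is a hypothetical helper, not part of the original script.
function list_images(string $directory, string $prefix, int $limit = 20): array
{
    // glob() can return false on error, so fall back to an empty array.
    $matches = glob($directory . $prefix . '*.jpg') ?: [];
    return array_slice($matches, 0, $limit);
}

$imgs = list_images('saved_images/', 'test');

if (count($imgs) === 0) {
    echo 'No Photos';
} else {
    foreach ($imgs as $img) {
        // Escape the file name before putting it into an attribute.
        echo "<img src='" . htmlspecialchars($img, ENT_QUOTES) . "' /> ";
    }
}
```

Because the helper's return type is array, the copy loop and the is_array() guard both become unnecessary.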

Loading content from remote site doesn't work, but why?

I'm still working on this catalogue for a client, which loads images from a remote site via PHP and the Simple DOM Parser.
// Code excerpt from http://internetvolk.de/fileadmin/template/res/scrape.php, this is just one case of a select
$subcat = $_GET['subcat'];
$url = "http://pinesite.com/meubelen/index.php?".$subcat."&lang=de";
$html = file_get_html(html_entity_decode($url));
$iframe = $html->find('iframe',0);
$url2 = $iframe->src;
$html->clear();
unset($html);
$fullurl = "http://pinesite.com/meubelen/".$url2;
$html2 = file_get_html(html_entity_decode($fullurl));
$pagecount = 1;
$titles = $html2->find('.tekst');
$images = $html2->find('.plaatje');
$output='';
$i=0;
foreach ($images as $image) {
$item['title'] = $titles[$i]->find('p',0)->plaintext;
$imagePath = $image->find('img',0)->src;
$item['thumb'] = resize("http://pinesite.com".str_replace('thumb_','',$imagePath),array("w"=>225, "h"=>162));
$item['image'] = 'http://pinesite.com'.str_replace('thumb_','',$imagePath);
$fullurl2 = "http://pinesite.com/meubelen/prog/showpic.php?src=".str_replace('thumb_','',$imagePath)."&taal=de";
$html3 = file_get_html($fullurl2);
$item['size'] = str_replace(' ','',$html3->find('td',1)->plaintext);
unset($html3);
$output[] = $item;
$i++;
}
if (count($html2->find('center')) > 1) {
// ok, multi-page here, let's find out how many there are
$pagecount = count($html2->find('center',0)->find('a'))-1;
for ($i=1;$i<$pagecount; $i++) {
$startID = $i*20;
$newurl = html_entity_decode($fullurl."&beginrec=".$startID);
$html3 = file_get_html($newurl);
$titles = $html3->find('.tekst');
$images = $html3->find('.plaatje');
$a=0;
foreach ($images as $image) {
$item['title'] = $titles[$a]->find('p',0)->plaintext;
$item['image'] = 'http://pinesite.com'.str_replace('thumb_','',$image->find('img',0)->src);
$item['thumb'] = resize($item['image'],array("w"=>225, "h"=>150));
$output[] = $item;
$a++;
}
$html3->clear();
unset ($html3);
}
}
echo json_encode($output);
So what it should do (and does with some categories): Output the images, the titles and the thumbnails from this page: http://pinesite.com
This works, for example, if you pass it a "?function=images&subcat=antiek", but not if you pass it a "?function=images&subcat=stoelen". I don't even think it's a problem with the remote page, so there has to be an error in my code.
Ehm..trying to state the obvious maybe but 'stoele'?
As it turns out, my code was completely fine, it was a missing space in the HTML of the remote site that got the Simple PHP DOM Parser to not recognize the iframe I was looking for. I fixed it on my end by running a str_replace on the code first to replace the faulty code.
I know it's a dirty solution, but it works :)

having problems passing array values

I'm building a PHP program that basically grabs only image links from my twitter feed and displays them on a page, I have 3 components that I have set up that all work fine on their own.
The first component is the twitter oauth component which grabs the tweet text and creates an array, this works fine by itself.
The second is a function that processes the tweets and only returns tweets that contain image links, this as well works fine.
The program breaks down during the third section, when the links are processed and an image is displayed. I had no issues running this on its own, and from my attempts to troubleshoot, it appears that it breaks down at the $images array, as that array is empty.
I'm sure I've made a silly mistake but I've been trying to find this for over a day now and can't seem to fix it. Any help would be great! Thanks guys!
code:
<?php
if ($result['socialorigin']== "twitter"){
$twitterObj = new EpiTwitter($consumer_key, $consumer_secret);
$token = $twitterObj->getAccessToken();
$twitterObj->setToken($result['oauthtoken'], $result['oauthsecret']);
$tweets = $twitterObj->get('/statuses/home_timeline.json',array('count'=>'200'));
$all_tweets = array();
$hosts = "lockerz|yfrog|twitpic|tumblr|mypict|ow.ly|instagr";
foreach($tweets as $tweet) {
$twtext = $tweet->text;
if(preg_match("~http://($hosts)~", $twtext)){
preg_match_all("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#ise", $twtext, $matches, PREG_PATTERN_ORDER);
foreach($matches[0] as $key2 => $link){
array_push($all_tweets,"$link");
}
}
}
function height_compare($a1, $b1)
{
if ($a1 == $b1) {
return 0;
}
return ($a1 > $b1) ? -1 : 1;
}
foreach($all_tweets as $alltweet => $tlink){
$doc = new DOMDocument();
// Okay, this HTML is kind of screwy
// So we're going to suppress errors
@$doc->loadHTMLFile($tlink);
// Get all images
$images_list = $doc->getElementsByTagName('img');
$images = array();
foreach($images_list as $image) {
// Get the src attribute
$image_source = $image->getAttribute('src');
if (substr($image_source,0,7)=="http://"){
$image_size_info = getimagesize($image_source);
$images[$image_source] = $image_size_info[1];
}
}
// Do a numeric sort on the height
uasort($images, "height_compare");
$tallest_image = array_slice($images, 0,1);
$mainimg = key($tallest_image);
echo "<img src='$mainimg' />";
}
print_r($all_tweets);
print_r($images);
}
Move the initialisation of the $images array OUTSIDE the foreach loop that fetches the actual images. This will prevent the loop from clearing it each time through.
$images = array();
foreach($all_tweets as $alltweet => $tlink){
$doc = new DOMDocument();
// Okay, this HTML is kind of screwy
// So we're going to suppress errors
@$doc->loadHTMLFile($tlink);
// Get all images
$images_list = $doc->getElementsByTagName('img');
foreach($images_list as $image) {
// Get the src attribute
$image_source = $image->getAttribute('src');
if (substr($image_source,0,7)=="http://"){
$image_size_info = getimagesize($image_source);
$images[$image_source] = $image_size_info[1];
}
}
// Do a numeric sort on the height
uasort($images, "height_compare");
$tallest_image = array_slice($images, 0,1);
$mainimg = key($tallest_image);
echo "<img src='$mainimg' />";
}
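To illustrate why the position of the initialisation matters, here is a small sketch with no network access. The $pages array is invented data standing in for the per-link getimagesize() results, and arsort() with array_key_first() (PHP 7.3+) is swapped in for the custom comparison function purely to keep the example short.

```php
<?php
// Sketch: collect image heights across several pages, then pick the tallest.
// $pages stands in for the per-link getimagesize() results; the data is invented.
$pages = [
    'http://example.com/1' => ['http://example.com/a.jpg' => 120, 'http://example.com/b.jpg' => 480],
    'http://example.com/2' => ['http://example.com/c.jpg' => 300],
];

$images = []; // initialised once, before the loop, so earlier results survive
foreach ($pages as $link => $found) {
    foreach ($found as $src => $height) {
        $images[$src] = $height;
    }
}

arsort($images);                     // sort by height, tallest first, keeping keys
$tallest = array_key_first($images); // requires PHP 7.3+

echo count($images) . " images, tallest: $tallest\n";
```

Had $images = [] been placed inside the outer loop, only the entries from the last page would still be there when the loop ends.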

Extract an attribute from a specific element in DOM

I want to be able to extract only the src of the second image in an html file. I am using the PHP DOM parser:
foreach($html->find('img[src]') as $element)
$src = $element->getAttribute('src');
echo $src;
However, I am getting the src of the last image in the page, instead of the one I am looking for.
Can I display only a specific src outside of the foreach loop?
Your loop is missing {}, so it is equivalent to:
foreach($html->find('img[src]') as $element) {
$src = $element->getAttribute('src');
}
echo $src;
so, the echo gets the $src after the last iteration of your loop, which is the last element.
Using the example from their website, I'd go with this (braces are key here):
$count = 1;
foreach($html->find('img') as $element) {
if ($count == 2) {
echo $element->src;
break;
}
$count += 1;
}
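If a loop is not needed at all, the second element can also be fetched directly by index. The sketch below uses PHP's built-in DOM extension rather than the Simple HTML DOM parser from the question, and the markup is invented sample data:

```php
<?php
// Sketch using PHP's built-in DOM extension rather than Simple HTML DOM.
// The markup is invented sample data.
$html = '<p><img src="first.png"><img src="second.png"><img src="third.png"></p>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// item() is zero-based, so index 1 is the second image; it returns null
// when the page has fewer than two images.
$second = $dom->getElementsByTagName('img')->item(1);
$src = $second !== null ? $second->getAttribute('src') : null;

echo $src ?? 'no second image';
```

The null check matters because item() does not throw for an out-of-range index.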
