How to check if namespace element exists in PHP - php

I'm trying to check if the element media:content exists so that a thumbnail image can be shown from an rss feed but not sure how to validate it's existence.
I can get the media:content url and show the image fine but checking to see if it exists isn't working out. I've tried isset and defined but I'm clearly doing something wrong.
Main line i'm concerned about below is:
if(defined($item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url')))
<?php
$feed = new DOMDocument();
$feed->load('http://rss.cnn.com/rss/cnn_topstories.rss');
$json = array();
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
$json['item'] = array();
foreach($items as $item) {
$title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$description = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
$text = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
if(defined($item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url'))){
$image = $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url');
}
else{
$image = '';
}
echo $image;
}
?>

When you use getElementsByTagName, this always returns a DOMNodeList. The main thing is to check if the node list has 0 elements.
So rather than defined() or isset(), use ->length...
$nodes=$domDocument->getElementsByTagName('book') ;
if ($nodes->length==0) {
// no results
}
( Example from http://php.net/manual/en/domdocument.getelementsbytagname.php)

Nigel was right, using ->length is the way to do it.
if($item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->length > 0){
$image = $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url');
}

Related

How do I access individual <span> for scraping web data in PHP?

I'm trying to scrape pricing data from a few home depot URLs. I'm using simple_html_dom.php which can be found here: https://simplehtmldom.sourceforge.io/
I'm having issues figuring out how to get the individual span class that I need in order to get the data I want.
Here's an image of the inspect element with the various fields I'm trying to access:
https://prnt.sc/qx5ujt
Here's the code I have so far which returns an empty array:
?<php
require 'simple_html_dom.php';
$dom = file_get_html('https://www.homedepot.com/p/Zinsco-20-Amp-1-1-2-in-2-Pole-Replacement-Thick-Circuit-Breaker-UBIZ220/100183119', false);
$answer = array();
if(empty($dom))
{
echo "EMPTY";
exit;
}
$divClass = "";
$dollars = "";
$cents = "";
$i = 0;
foreach($dom->find('price__wrapper') as $divClass)
{
foreach($divClass->find('span[class=price__dollars]') as $dollars) //dollars
{
$answer[$i]['dollars'] = $dollars->plaintext;
}
foreach($divClass->find('span[class=price__cents]') as $cents) //cents
{
$answer[$i]['cents'] = $cents->plaintext;
}
$i++;
}
print_r($answer);
exit;
?>
If the HTML really looks always like in the screenshot, than you can go on the content attribute of the parent span element.
This you can archive like below (untested example).
// Get the parent span element with ID ajaxPrice
$element = $dom->find('span#ajaxPrice',0);
// Check if attribute "content" exists
if(isset($element->content)) {
$price = $element->content; // This is the price as string value
$priceFloat = floatval($price); // This is the price as float value
// Splitting price in dollar and cent
$price = explode('.', $price);
$dollar = $price[0];
$cent = $price[1];
}

Fetch images URL from XML

I am using the below code to fetch the $movie->id from the response XML
<?php
$movie_name='Dabangg 2';
$url ='http://api.themoviedb.org/2.1/Movie.search/en/xml/accd3ddbbae37c0315fb5c8e19b815a5/%22Dabangg%202%22';
$xml = simplexml_load_file($url);
$movies = $xml->movies->movie;
foreach ($movies as $movie){
$arrMovie_id = $movie->id;
}
?>
the response xml structure is
How to fetch image URL with thumb size?
See the below an easy way to get only specific images.
$xml = simplexml_load_file($url);
$images = $xml->xpath("//image");
//echo "<pre>";print_r($images);die;
foreach ($images as $image){
if($image['size'] == "thumb"){
echo "URL:".$image['url']."<br/>";
echo "SIZE:".$image['size']."<br/>";
echo "<hr/>";
}
}
Use the attributes() method of SimpleXmlElement.
Example:
$imageAttributes = $movie->images[0]->attributes();
$size = $imageAttributes['size'];
See the documentation at: http://www.php.net/manual/en/simplexmlelement.attributes.php
EDIT: select only URL attributes with size = "thumb" and type = "poster":
$urls = $xml->xpath("//image[#size='thumb' and #type='poster']/#url");
if you expect only 1 url, do:
$url = (string)$xml->xpath("//image[#size='thumb' and #type='poster']/#url")[0];
echo $url;
working live demo: http://codepad.viper-7.com/wdmEay

Loading content from remote site doesn't work, but why?

I'm still working on this catalogue for a client, which loads images from a remote site via PHP and the Simple DOM Parser.
// Code excerpt from http://internetvolk.de/fileadmin/template/res/scrape.php, this is just one case of a select
$subcat = $_GET['subcat'];
$url = "http://pinesite.com/meubelen/index.php?".$subcat."&lang=de";
$html = file_get_html(html_entity_decode($url));
$iframe = $html->find('iframe',0);
$url2 = $iframe->src;
$html->clear();
unset($html);
$fullurl = "http://pinesite.com/meubelen/".$url2;
$html2 = file_get_html(html_entity_decode($fullurl));
$pagecount = 1;
$titles = $html2->find('.tekst');
$images = $html2->find('.plaatje');
$output='';
$i=0;
foreach ($images as $image) {
$item['title'] = $titles[$i]->find('p',0)->plaintext;
$imagePath = $image->find('img',0)->src;
$item['thumb'] = resize("http://pinesite.com".str_replace('thumb_','',$imagePath),array("w"=>225, "h"=>162));
$item['image'] = 'http://pinesite.com'.str_replace('thumb_','',$imagePath);
$fullurl2 = "http://pinesite.com/meubelen/prog/showpic.php?src=".str_replace('thumb_','',$imagePath)."&taal=de";
$html3 = file_get_html($fullurl2);
$item['size'] = str_replace(' ','',$html3->find('td',1)->plaintext);
unset($html3);
$output[] = $item;
$i++;
}
if (count($html2->find('center')) > 1) {
// ok, multi-page here, let's find out how many there are
$pagecount = count($html2->find('center',0)->find('a'))-1;
for ($i=1;$i<$pagecount; $i++) {
$startID = $i*20;
$newurl = html_entity_decode($fullurl."&beginrec=".$startID);
$html3 = file_get_html($newurl);
$titles = $html3->find('.tekst');
$images = $html3->find('.plaatje');
$a=0;
foreach ($images as $image) {
$item['title'] = $titles[$a]->find('p',0)->plaintext;
$item['image'] = 'http://pinesite.com'.str_replace('thumb_','',$image->find('img',0)->src);
$item['thumb'] = resize($item['image'],array("w"=>225, "h"=>150));
$output[] = $item;
$a++;
}
$html3->clear();
unset ($html3);
}
}
echo json_encode($output);
So what it should do (and does with some categories): Output the images, the titles and the the thumbnails from this page: http://pinesite.com
This works, for example, if you pass it a "?function=images&subcat=antiek", but not if you pass it a "?function=images&subcat=stoelen". I don't even think it's a problem with the remote page, so there has to be an error in my code.
Ehm..trying to state the obvious maybe but 'stoele'?
As it turns out, my code was completely fine, it was a missing space in the HTML of the remote site that got the Simple PHP DOM Parser to not recognize the iframe I was looking for. I fixed it on my end by running a str_replace on the code first to replace the faulty code.
I know it's a dirty solution, but it works :)

having problems passing array values

I'm building a PHP program that basically grabs only image links from my twitter feed and displays them on a page, I have 3 components that I have set up that all work fine on their own.
The first component is the twitter oauth component which grabs the tweet text and creates an array, this works fine by itself.
The second is a function that processes the tweets and only returns tweets that contain image links, this as well works fine.
The program breaks down during the third section when the links are processed and an image is displayed, I had no issues running this on its own and from my attempts to trouble shoot it appears that it breaks down at the $images(); array, as that array is empty.
I'm sure I've made a silly mistake but I've been trying to find this for over a day now and can't seem to fix it. Any help would be great! Thanks guys!
code:
<?php
if ($result['socialorigin']== "twitter"){
$twitterObj = new EpiTwitter($consumer_key, $consumer_secret);
$token = $twitterObj->getAccessToken();
$twitterObj->setToken($result['oauthtoken'], $result['oauthsecret']);
$tweets = $twitterObj->get('/statuses/home_timeline.json',array('count'=>'200'));
$all_tweets = array();
$hosts = "lockerz|yfrog|twitpic|tumblr|mypict|ow.ly|instagr";
foreach($tweets as $tweet) {
$twtext = $tweet->text;
if(preg_match("~http://($hosts)~", $twtext)){
preg_match_all("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#ise", $twtext, $matches, PREG_PATTERN_ORDER);
foreach($matches[0] as $key2 => $link){
array_push($all_tweets,"$link");
}
}
}
function height_compare($a1, $b1)
{
if ($a1 == $b1) {
return 0;
}
return ($a1 > $b1) ? -1 : 1;
}
foreach($all_tweets as $alltweet => $tlink){
$doc = new DOMDocument();
// Okay this is HTML is kind of screwy
// So we're going to supress errors
#$doc->loadHTMLFile($tlink);
// Get all images
$images_list = $doc->getElementsByTagName('img');
$images = array();
foreach($images_list as $image) {
// Get the src attribute
$image_source = $image->getAttribute('src');
if (substr($image_source,0,7)=="http://"){
$image_size_info = getimagesize($image_source);
$images[$image_source] = $image_size_info[1];
}
}
// Do a numeric sort on the height
uasort($images, "height_compare");
$tallest_image = array_slice($images, 0,1);
$mainimg = key($tallest_image);
echo "<img src='$mainimg' />";
}
print_r($all_tweets);
print_r($images);
}
Change the for loop where you fetch the actual images to move the images array OUTSIDE the for loop. This will prevent the loop from clearing it each time through.
$images = array();
foreach($all_tweets as $alltweet => $tlink){
$doc = new DOMDocument();
// Okay this is HTML is kind of screwy
// So we're going to supress errors
#$doc->loadHTMLFile($tlink);
// Get all images
$images_list = $doc->getElementsByTagName('img');
foreach($images_list as $image) {
// Get the src attribute
$image_source = $image->getAttribute('src');
if (substr($image_source,0,7)=="http://"){
$image_size_info = getimagesize($image_source);
$images[$image_source] = $image_size_info[1];
}
}
// Do a numeric sort on the height
uasort($images, "height_compare");
$tallest_image = array_slice($images, 0,1);
$mainimg = key($tallest_image);
echo "<img src='$mainimg' />";
}

Retrieving single node value from a nodelist

I'm having difficulty extracting a single node value from a nodelist.
My code takes an xml file which holds several fields, some containing text, file paths and full image names with extensions.
I run an expath query over it, looking for the node item with a certain id. It then stores the matched node item and saves it as $oldnode
Now my problem is trying to extract a value from that $oldnode. I have tried to var_dump($oldnode) and print_r($oldnode) but it returns the following: "object(DOMElement)#8 (0) { } "
Im guessing the $oldnode variable is an object, but how do I access it?
I am able to echo out the whole node list by using: echo $oldnode->nodeValue;
This displays all the nodes in the list.
Here is the code which handles the xml file. line 6 is the line in question...
$xpathexp = "//item[#id=". $updateID ."]";
$xpath = new DOMXpath($xml);
$nodelist = $xpath->query($xpathexp);
if((is_null($nodelist)) || (! is_numeric($nodelist))) {
$oldnode = $nodelist->item(0);
echo $oldnode->nodeValue;
//$imgUpload = strchr($oldnode->nodeValue, ' ');
//$imgUpload = strrchr($imgUpload, '/');
//explode('/',$imgUpload);
//$imgUpload = trim($imgUpload);
$newItem = new DomDocument;
$item_node = $newItem ->createElement('item');
//Create attribute on the node as well
$item_node ->setAttribute("id", $updateID);
$largeImageText = $newItem->createElement('largeImgText');
$largeImageText->appendChild( $newItem->createCDATASection($largeImgText));
$item_node->appendChild($largeImageText);
$urlANode = $newItem->createElement('urlA');
$urlANode->appendChild( $newItem->createCDATASection($urlA));
$item_node->appendChild($urlANode);
$largeImg = $newItem->createElement('largeImg');
$largeImg->appendChild( $newItem->createCDATASection($imgUpload));
$item_node->appendChild($largeImg);
$thumbnailTextNode = $newItem->createElement('thumbnailText');
$thumbnailTextNode->appendChild( $newItem->createCDATASection($thumbnailText));
$item_node->appendChild($thumbnailTextNode);
$urlB = $newItem->createElement('urlB');
$urlB->appendChild( $newItem->createCDATASection($urlA));
$item_node->appendChild($urlB);
$thumbnailImg = $newItem->createElement('thumbnailImg');
$thumbnailImg->appendChild( $newItem->createCDATASection(basename($_FILES['thumbnailImg']['name'])));
$item_node->appendChild($thumbnailImg);
$newItem->appendChild($item_node);
$newnode = $xml->importNode($newItem->documentElement, true);
// Replace
$oldnode->parentNode->replaceChild($newnode, $oldnode);
// Display
$xml->save($xmlFileData);
//header('Location: index.php?a=112&id=5');
Any help would be great.
Thanks
Wasn't it supposed to be echo $oldnode->firstChild->nodeValue;? I remember this because technically you need the value from the text node.. but I might be mistaken, it's been a while. You could give it a try?
After our discussion in the comments on this answer, I came up with this solution. I'm not sure if it can be done cleaner, perhaps. But it should work.
$nodelist = $xpath->query($xpathexp);
if((is_null($nodelist)) || (! is_numeric($nodelist))) {
$oldnode = $nodelist->item(0);
$largeImg = null;
$thumbnailImg = null;
foreach( $oldnode->childNodes as $node ) {
if( $node->nodeName == "largeImg" ) {
$largeImg = $node->nodeValue;
} else if( $node->nodeName == "thumbnailImg" ) {
$thumbnailImg = $node->nodeValue;
}
}
var_dump($largeImg);
var_dump($thumbnailImg);
}
You could also use getElementsByTagName on the $oldnode, then see if it found anything (and if a node was found, $oldnode->getElementsByTagName("thumbnailImg")->item(0)->nodeValue). Which might be cleaner then looping through them.

Categories