Laravel: Parsing XML with SimpleXML namespace issue [duplicate] - php
This question has two parts.
Part 1. Yesterday I had some code which would echo the entire content of the XML from an RSS feed. Then I deleted it from my php document, saved over it, and I am totally kicking myself.
I believe the syntax went something like this:
$xml = simplexml_load_file($url);
echo $xml;
I tried that again and it is not working, so apparently I forgot the correct syntax and could use your help, dear stackoverflow question answerers.
I keep trying to figure out what I was doing and I am unable to find an example on Google or the PHP site. I tried the print_r($url); command, and it gives me what appears to be an atomized version of the feed. I want the whole string, warts and all. I realize that I could just type the RSS link into the window and see it, but it was helpful to have it on my PHP page as I am coding and noding.
Part 2 The main reason I wanted to reconstruct this is because I am trying to parse nodes off a blog RSS in order to display it on a webpage hosted on a private domain. I posted a dummy blog and discovered an awkward formatting glitch when I failed to add a title to one of the dummy posts.
So what does one do in this situation? I tried a little:
if(entry->title == "")
{$entryTitle = "untitled";}
That did not work at all.
Here's my entire php script for the handling of the blog:
<?php
/*create variables*/
$subtitle ="";
$entryTitle="";
$html = "";
$pubDate ="";
/*Store RSS feed address in new variable*/
$url = "http://www.blogger.com/feeds/6552111825067891333/posts/default";
/*Retrieve BLOG XML and store it in PHP object*/
$xml = simplexml_load_file($url);
print_r($xml);
/*Parse blog subtitle into HTML and echo it on the page*/
$subtitle .= "<h2 class='blog'>" . $xml->subtitle . "</h2><br />";
echo $subtitle;
/*Go through all the entries and parse them into HTML*/
foreach($xml->entry as $entry){
/*retrieve publication date*/
$xmlDate = $entry->published;
/*Convert XML timestamp into PHP timestamp*/
$phpDate = new DateTime(substr($xmlDate,0,19));
/*Format PHP timestamp to something humans understand*/
$pubDate .= $phpDate->format('l\, F j\, Y h:i A');
if ($entry->title == "")
{
$entryTitle .= "Untitled";
}
echo $entry->title;
/*Pick through each entry and parse each XML tree node into an HTML ready blog post*/
$html .= "<h3 class='blog'>".$entry->title . "<span class='pubDate'> | " .$pubDate . "</span></h3><p class='blog'>" . $entry->content . "</p>";
/*Print the HTML to the web page*/
echo $html;
/*Set the variables back to empty strings so they do not repeat data upon reiteration*/
$html = "";
$pubDate = "";
}
?>
According to the php manual:
$xml = new SimpleXMLElement($string);
.
.
.
then if you want to echo the result:
echo $xml->asXML();
or save the xml to a file:
$xml->asXML('blog.xml');
References
http://php.net/manual/fr/simplexmlelement.asxml.php
http://spotlesswebdesign.com/blog.php?id=14
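For Part 1, here is a minimal sketch of dumping the raw markup onto the page with asXML(). The inline $source string is a made-up stand-in for the real feed, and the htmlspecialchars() call is my addition so that the browser shows the tags instead of interpreting them.

```php
<?php
// Made-up stand-in for the feed; in practice this would come from the feed URL.
$source = '<feed><title>Demo</title><entry><title>Hello</title></entry></feed>';

$xml = new SimpleXMLElement($source);

// asXML() with no argument returns the whole document as a string.
$raw = $xml->asXML();

// Escape the markup so the tags are displayed rather than rendered.
$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo '<pre>' . $escaped . '</pre>';
```

Without the escaping step the browser would swallow the tags and show only the text nodes, which may be why a bare echo looked "atomized".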
Part 1
This is still not exactly what I wanted, but rather a very tidy and organized way of echoing the xml data:
$url = "http://www.blogger.com/feeds/6552111825067891333/posts/default";
$xml = simplexml_load_file($url);
echo '<pre>';
print_r($xml);
echo '</pre>';
Part 2
I had to get firephp running so I could see exactly what elements php was encountering when it reached an entry without a blog title. Ultimately it is an empty array. Therefore, the simple:
if(empty($entry->title))
works perfectly. For string comparison, I found that you can simply cast it as a string. For my purposes, that was unnecessary.
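A small sketch of the string-cast variant mentioned above, assuming a simplified feed without namespaces (the inline XML is invented for illustration). It treats a missing title element and an empty one the same way.

```php
<?php
// Invented sample: one normal entry, one empty title, one entry with no title at all.
$feed = <<<XML
<feed>
  <entry><title>First post</title></entry>
  <entry><title></title></entry>
  <entry></entry>
</feed>
XML;

$xml = simplexml_load_string($feed);

$titles = [];
foreach ($xml->entry as $entry) {
    // Casting to string yields '' for both an empty and a missing element.
    $title = trim((string) $entry->title);
    $titles[] = $title === '' ? 'Untitled' : $title;
}

print_r($titles);
```

The cast matters mainly when you want to compare or concatenate the value; for a plain existence check, empty() on the element is enough.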
simplexml_load_file() returns a SimpleXMLElement, so:
print_r($xml);
will show its nested objects and arrays.
After your tweaks you can call $xml->asXML("filename.xml"); as @Tim Withers pointed out.
Part 1: echo $xml->asXML(); - http://www.php.net/manual/en/simplexmlelement.asxml.php
Part 2: php SimpleXML check if a child exists
$html .= "<h3 class='blog'>".($entry->title!=null?$entry->title:'No Title')
. "<span class='pubDate'> | " .$pubDate . "</span></h3><p class='blog'>"
. $entry->content . "</p>";
Note I would probably load the url like this:
$feedUrl = 'http://www.blogger.com/feeds/6552111825067891333/posts/default';
$rawFeed = file_get_contents($feedUrl);
$xml = new SimpleXMLElement($rawFeed);
Based on your comment regarding part 1, I am not sure the XML is being loaded completely. If you try loading it this way, it should display all the XML data.
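If you want to know whether the XML loaded at all, one option is to parse the raw string defensively. This is only a sketch: parseFeed() is a hypothetical helper name of mine, the inline strings stand in for whatever file_get_contents() returned, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement, or null if $raw is not well-formed XML.
function parseFeed(string $raw): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($raw);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml === false ? null : $xml;
}

$good = parseFeed('<feed><subtitle>My blog</subtitle></feed>');
$bad  = parseFeed('<feed><subtitle>');

echo $good === null ? 'parse failed' : (string) $good->subtitle, PHP_EOL;
echo $bad === null ? 'parse failed' : 'parsed', PHP_EOL;
```

With a real feed you would also check that file_get_contents() did not return false before handing the string to the parser.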
Related
Retrieving Barometric and Other Climate Data Using simple_html_dom.php
I want to periodically (once a day or so) collect the barometric pressure reading for various USA weather stations. Using simple_html_dom.php I can scrape the entire page of this site, for example (https://www.localconditions.com/weather-alliance-nebraska/69301/). However, I don't know how to then parse this down to just the barometric pressure reading: in this case "30.26". Here's the code that grabs all the HTML. Obviously the find('Barometer') call isn't working.
<?php
// example of how to use basic selector to retrieve HTML contents
include('simple_html_dom.php');
// get DOM from URL or file
$html = file_get_html('https://www.localconditions.com/weather-alliance-nebraska/69301/');
// find all span tags with class=gb1
foreach($html->find('strong') as $e)
    echo $e->outertext . '<HR>';
// get an element representing the second paragraph
$element = $html->find("Barometer");
echo $e->outertext . '<br>';
// extract text from HTML
echo $html->plaintext;
?>
Any advice on how to parse this? Thanks!
As mentioned by @bato3 in his comment, queries like this are far better handled with XPath. Unfortunately, neither DOMDocument nor SimpleXML (which I usually use to parse XML/HTML) could digest the HTML of this site (at least not when I tried). So we have to do it with simple_html_dom and resort to (somewhat inelegant) CSS selectors and string manipulation:
$dest = $html->find("//div[class='col-sm-6 col-md-6'] > p:has(> strong)");
foreach($dest as $e) {
    $target = $e->innertext;
    if (strpos($target, "Barometer") !== false) {
        $pressure = explode(" ", $target);
        echo $pressure[2];
    }
}
Output:
30.25 inHg.
PHP, Convert iTunes RSS to JSON
I am trying to utilize SimpleXML to convert an iTunes RSS feed to JSON so I can better parse it. The issue I am having is that it is not coming back as correctly formatted JSON.
$feed_url = 'https://podcasts.subsplash.com/c2yjpyh/podcast.rss';
$feed_contents = file_get_contents($feed_url);
$xml = simplexml_load_string($feed_contents);
$podcasts = json_decode(json_encode($xml));
print_r($podcasts);
Is there a better way to be attempting this to get the correct result?
Thanks to IMSoP for pointing me in the right direction! This took a bit of studying but the solution ends up being very simple! Instead of trying to convert to a JSON format, just use SimpleXML. However, due to the namespaces, it does require an additional line to map the itunes: prefix. So in my iTunes feed RSS, the following line exists:
xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
So we just reference this to make accessing the values very easy. Here is a quick example:
$rss = simplexml_load_file('https://podcasts.example.com/podcast.rss');
foreach ($rss->channel->item as $item) {
    // Now we define the map for the itunes: namespace
    $itunes = $item->children('http://www.itunes.com/dtds/podcast-1.0.dtd');
    // This is a value WITHOUT the itunes: namespace
    $title = $item->title;
    // This is a value WITH the itunes: namespace
    $author = $itunes->author;
    echo $title . '<br>';
    echo $author . '<br>';
}
The other little issue that I ran into is getting attributes such as the url for images and audio links. That is accomplished by using the attributes() function like so:
// Access attributes WITH itunes: namespace
$image = $itunes->image->attributes();
// Access attributes WITHOUT itunes: namespace
$audio = $item->enclosure->attributes();
// To echo these we simply add the desired attribute in `[]`:
echo $image['href'] . '<br>';
echo $audio['url'] . '<br>';
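If JSON is still the end goal, one approach is to pull the namespaced values into a plain array first and encode that, rather than passing the SimpleXMLElement straight to json_encode(). A self-contained sketch follows; the inline feed, the example.com URLs and the array keys are all invented for illustration.

```php
<?php
// Invented sample feed using the itunes: namespace from the answer above.
$rss = <<<XML
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
    <item>
      <title>Episode 1</title>
      <itunes:author>Jane Doe</itunes:author>
      <itunes:image href="https://example.com/cover.jpg"/>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
  </channel>
</rss>
XML;

$ns  = 'http://www.itunes.com/dtds/podcast-1.0.dtd';
$xml = simplexml_load_string($rss);

$episodes = [];
foreach ($xml->channel->item as $item) {
    // children($ns) exposes the elements that carry the itunes: prefix.
    $itunes     = $item->children($ns);
    $imageAttrs = $itunes->image->attributes();
    $audioAttrs = $item->enclosure->attributes();

    // Cast every value to string so json_encode() sees plain scalars.
    $episodes[] = [
        'title'  => (string) $item->title,
        'author' => (string) $itunes->author,
        'image'  => (string) $imageAttrs['href'],
        'audio'  => (string) $audioAttrs['url'],
    ];
}

echo json_encode($episodes, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Building the array by hand costs a few lines, but the resulting JSON has exactly the keys you chose instead of whatever SimpleXML happens to expose.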
getting xml from URL into variable
I am trying to get an XML feed from the URL http://api.eve-central.com/api/marketstat?typeid=1230&regionlimit=10000002 but seem to be failing miserably. I have tried new SimpleXMLElement, file_get_contents and http_get, yet none of these seem to echo a nice XML feed when I either echo or print_r. The end goal is to eventually parse this data, but getting it into a variable would sure be a nice start. I have attached my code below. This is contained within a loop and $typeID does in fact give the correct ID as seen above.
$url = 'http://api.eve-central.com/api/marketstat?typeid='.$typeID.'&regionlimit=10000002';
echo $url."<br />";
$xml = new SimpleXMLElement($url);
print_r($xml);
I should state that the other strange thing I am seeing is that when I echo $url, I get http://api.eve-central.com/api/marketstat?typeid=1230®ionlimit=10000002 where the ® is the registered trademark symbol. I am unsure if this is a "feature" in my browser, or a "feature" in my code.
Try the following:
<?php
$typeID = 1230;
// set feed URL
$url = 'http://api.eve-central.com/api/marketstat?typeid='.$typeID.'&regionlimit=10000002';
echo $url."<br />";
// read feed into SimpleXML object
$sxml = simplexml_load_file($url);
// then you can do
var_dump($sxml);
// And now you'll be able to call `$sxml->marketstat->type->buy->volume` as well as other properties.
echo $sxml->marketstat->type->buy->volume;
// And if you want to fetch multiple IDs:
foreach($sxml->marketstat->type as $type){
    echo $type->buy->volume . "<br>";
}
?>
You need to fetch the data from the URL in order to make an XML object.
$url = 'http://api.eve-central.com/api/marketstat?typeid='.$typeID.'&regionlimit=10000002';
$xml = new SimpleXMLElement(file_get_contents($url));
// pre tags to format nicely
echo '<pre>';
print_r($xml);
echo '</pre>';
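As for the ® symbol: the string in $url is fine. When it is echoed into an HTML page unescaped, the browser reads the "&reg" in "&regionlimit" as the entity for ®. A sketch of one way to avoid that, building the query with http_build_query() and escaping only the copy that gets printed ($printable is just an illustrative variable name):

```php
<?php
$typeID = 1230;

// Build the query string from an array; '&' is passed explicitly as the separator.
$query = http_build_query(['typeid' => $typeID, 'regionlimit' => 10000002], '', '&');
$url   = 'http://api.eve-central.com/api/marketstat?' . $query;

// Escape only the copy that is shown in the page; keep $url untouched for the request.
$printable = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

echo $printable . "<br />";
```

The unescaped $url is what you would pass to simplexml_load_file() or file_get_contents().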
Parsing XML with PHP (simplexml)
Firstly, may I point out that I am a newcomer to all things PHP, so apologies if anything here is unclear, and I'm afraid the more layman the response the better. I've been having real trouble parsing an XML file into PHP to then populate an HTML table for my website. At the moment, I have been able to get the full XML feed into a string which I can then echo and view, and all seems well. I then thought I would be able to use SimpleXML to pick out specific elements and print their content but have been unable to do this. The XML feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed into the right format within a string, although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?!
$file = fopen("compress.zlib://$url", 'r');
$xmlstr = file_get_contents($url);
$xml = new SimpleXMLElement($url,null,true);
foreach($xml as $name) {
    echo "{$name->awCat}\r\n";
}
Many, many thanks in advance, Chris
PS The actual feed
Since no one followed my close vote, I think I can just as well put my own comments as an answer: First of all, SimpleXML can load URIs directly and it can do so with stream wrappers, so your three calls in the beginning can be shortened to (note that you are not using $file at all)
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values you can either use the implicit SimpleXML API and drill down to the wanted elements (as shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
    echo $prod->cat->awCat , PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly:
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
    echo $awCat, PHP_EOL;
}
Note that fetching all awCat elements from the source XML is rather pointless though, because all of them have "Bodycare & Fitness" for value. Of course you can also mix XPath and the implicit API and just fetch the prod elements and then drill down to the various children of them.
Using XPath should be somewhat faster than iterating over the SimpleXMLElement object graph, though it should be noted that the difference is negligible (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML.
For additional examples see "A simple program to CRUD node and node values of xml file" and "PHP Manual - SimpleXml Basic Examples".
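To illustrate mixing XPath with the implicit API, here is a self-contained sketch. The inline document is an invented, uncompressed miniature that only mimics the element names quoted in this thread; the real feed would be loaded through compress.zlib:// as shown above.

```php
<?php
// Invented miniature of the feed structure (merchantProductFeed/merchant/prod/...).
$source = <<<XML
<merchantProductFeed>
  <merchant>
    <prod><text><name>Yoga mat</name></text><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>
    <prod><text><name>Kettlebell</name></text><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>
  </merchant>
</merchantProductFeed>
XML;

$xml = new SimpleXMLElement($source);

$rows = [];
// XPath selects the prod elements; property access drills into each one.
foreach ($xml->xpath('/merchantProductFeed/merchant/prod') as $prod) {
    $rows[] = (string) $prod->text->name . ' => ' . (string) $prod->cat->awCat;
}

echo implode(PHP_EOL, $rows), PHP_EOL;
```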
Try this... $url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/"; $zd = gzopen($url, "r"); $data = gzread($zd, 1000000); gzclose($zd); if ($data !== false) { $xml = 
simplexml_load_string($data); foreach ($xml->merchant->prod as $pr) { echo $pr->cat->awCat . "<br>"; } }
<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can traverse the XML tree however you want
foreach ($xml->merchant->prod as $line) {
    // $line->cat->awCat -> you can use this
}
Use print_r($xml) to see the structure of the parsed XML feed. Then it becomes obvious how you would traverse it:
foreach ($xml->merchant->prod as $prod) {
    print $prod->pId;
    print $prod->text->name;
    print $prod->cat->awCat; # <-- which is what you wanted
    print $prod->price->buynow;
}
$url = 'your url here';
$f = gzopen ($url, 'r');
$xml = new SimpleXMLElement (fread ($f, 1000000));
foreach($xml->xpath ('//prod') as $name) {
    echo (string) $name->cat->awCatId, "\r\n";
}
Retrieve XML from a third party page in PHP
I need to read in and parse data from a third-party website which sends XML data. All of this needs to be done server side. What is the best way to do this using PHP?
You can obtain the remote XML data with, e.g. $xmldata = file_get_contents("http://www.example.com/xmldata"); or with curl. Then use SimpleXML, DOM, whatever.
A good way of parsing XML is often to use an XML pull parsing (XPP) library. PHP has an implementation of it called XMLReader. http://php.net/manual/en/class.xmlreader.php
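A minimal sketch of the pull-parsing style with XMLReader, assuming a tiny invented RSS string in place of the third-party response. It walks the document node by node and collects the text of every title element.

```php
<?php
// Invented sample standing in for the remote XML.
$source = '<rss><channel><item><title>One</title></item><item><title>Two</title></item></channel></rss>';

$reader = new XMLReader();
$reader->XML($source);

$titles = [];
// read() advances to the next node and returns false at the end of the document.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
        // readString() returns the text content of the current element.
        $titles[] = $reader->readString();
    }
}
$reader->close();

print_r($titles);
```

Because only the current node is held in memory, this style tends to suit very large feeds better than loading the whole tree.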
I would suggest using DOMDocument (a built-in PHP class). A simple example of its power could be the following code:
/***********************************************************************************************
Takes the RSS news feeds found at $url and prints them as HTML code.
Each news item is rendered in a <div class="rss"> block in the order: date + title + description.
***********************************************************************************************/
function Render($url, $max_feeds = 1000)
{
    $doc = new DOMDocument();
    // @ suppresses warnings from load(); failure is handled by the else branch
    if(@$doc->load($url, LIBXML_NOCDATA|LIBXML_NOBLANKS))
    {
        $feed_count = 0;
        $items = $doc->getElementsByTagName("item");
        //echo $items->length; //DEBUG
        foreach($items as $item)
        {
            // stop once $max_feeds items have been rendered
            if($feed_count >= $max_feeds)
                break;
            //Unfortunately, inside the <item> node the elements are not always in the same order, therefore we have to call getElementsByTagName many times
            //WARNING: using the iconv function instead of utf8_decode because the latter did not properly convert some characters like apostrophe 0x19 from techsport.it feeds.
            $title = iconv('UTF-8', 'CP1252', $item->getElementsByTagName("title")->item(0)->firstChild->textContent); //can use "CP1252//TRANSLIT"
            $description = iconv('UTF-8', 'CP1252', $item->getElementsByTagName("description")->item(0)->firstChild->textContent); //can use "CP1252//TRANSLIT"
            $link = iconv('UTF-8', 'CP1252', $item->getElementsByTagName("link")->item(0)->firstChild->textContent); //can use "CP1252//TRANSLIT"
            //pubDate tag is not mandatory in RSS [RSS2 spec: http://cyber.law.harvard.edu/rss/rss.html]
            $pub_date = $item->getElementsByTagName("pubDate");
            $date_html = "";
            //play with date here if you want
            echo "<div class='rss'>\n<p class='title'><a href='" . $link . "'>" . $title . "</a></p>\n<p class='description'>" . $description . "</p>\n</div>\n\n";
            $feed_count++;
        }
    }
    else
        echo "<div class='rss'>Service not available.</div>";
}
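One caveat with chained lookups like ->item(0)->firstChild->textContent is that they fail when an element is missing or empty. A hedged sketch of a guard follows; childText() is a hypothetical helper name of mine, and the inline RSS is invented for illustration.

```php
<?php
// Hypothetical helper: text of the first <$tag> under $item, or $default when missing or empty.
function childText(DOMElement $item, string $tag, string $default = ''): string
{
    // item(0) returns null when no such element exists.
    $node = $item->getElementsByTagName($tag)->item(0);
    $text = $node === null ? '' : trim($node->textContent);

    return $text === '' ? $default : $text;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<rss><channel>'
    . '<item><title>News</title><link>https://example.com/a</link></item>'
    . '<item><link>https://example.com/b</link></item>'
    . '</channel></rss>'
);

$lines = [];
foreach ($doc->getElementsByTagName('item') as $item) {
    $lines[] = childText($item, 'title', 'Untitled') . ' | ' . childText($item, 'link');
}

echo implode(PHP_EOL, $lines), PHP_EOL;
```

The same fallback idea covers the untitled blog post case from the first question.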
I have been using simpleXML for a while.