PHP parsing a georss namespace with simpleXML


Trying to parse out lat/lon from a google maps rss feed:
$file = "http://maps.google.com/maps/ms?ie=UTF8&hl=en&vps=1&jsv=327b&msa=0&output=georss&msid=217909142388190116501.000473ca1b7eb5750ebfe";
$xml = simplexml_load_file($file);
$loc = $xml->channel->item;
echo $loc[0]->title;
echo $loc[0]->point;
The title shows up alright, but point gives me nothing. Each node looks like this:
<item>
<guid isPermaLink="false">0004740950fd067393eb4</guid>
<pubDate>Sun, 20 Sep 2009 21:47:49 +0000</pubDate>
<title>Big Wong King Restaurant</title>
<description><![CDATA[<div dir="ltr">$4.99 full meals!</div>]]></description>
<author>neufuture</author>
<georss:point>
40.716236 -73.998413
</georss:point>
<georss:elev>0.000000</georss:elev>
</item>

<?php
$file = "http://maps.google.com/maps/ms?ie=UTF8&hl=en&vps=1&jsv=327b&msa=0&output=georss&msid=217909142388190116501.000473ca1b7eb5750ebfe";
$xml = simplexml_load_file($file);
$loc = $xml->channel->item;
foreach ($loc->children('http://www.georss.org/georss') as $geo) {
echo $geo;
}
?>
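Note that calling children() on $loc only looks at the first item in the list. A minimal sketch that walks every item and splits the point into latitude and longitude might look like the following; the inline feed is a cut-down stand-in for the Google Maps output, and the $points array and its keys are illustrative names, not part of the original answer.

```php
<?php
// Cut-down stand-in for the feed (assumption: the real feed declares the same georss URI).
$feed = <<<XML
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <item>
      <title>Big Wong King Restaurant</title>
      <georss:point>
        40.716236 -73.998413
      </georss:point>
      <georss:elev>0.000000</georss:elev>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);
$points = [];

foreach ($xml->channel->item as $item) {
    // children() with a namespace URI exposes the georss:* elements of this item.
    $geo = $item->children('http://www.georss.org/georss');

    // The point is "lat lon" padded with whitespace, so trim before splitting.
    [$lat, $lon] = preg_split('/\s+/', trim((string) $geo->point));

    $points[] = [
        'title' => (string) $item->title,
        'lat'   => (float) $lat,
        'lon'   => (float) $lon,
    ];
}

foreach ($points as $p) {
    echo $p['title'], ': ', $p['lat'], ', ', $p['lon'], PHP_EOL;
}
```

Casting to string before trimming matters here, because SimpleXML returns element objects rather than plain strings.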

Related

How do I get the child nodes of this RSS feed?

How can I get the contest logo and start date from this RSS feed? I can get the dc:modified child for example but always get a blank for anything from dc:dataset.
My code:
$feed_url = 'https://www.website.com/?call_custom_simple_rss=1&csrp_post_type=contest&csrp_posts_per_page=2&csrp_show_meta=1';
$feed = file_get_contents($feed_url);
$rss = simplexml_load_string($feed);
foreach($rss->channel->item as $entry) {
echo $entry->children("dc", true)->modified . "<br>";
echo $entry->children("dc", true)->dataset->contest_logo . "<br>";
echo $entry->children("dc", true)->dataset->start_date . "<br>";
}
The RSS feed:
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:wp="http://wordpress.org/export/1.2/" xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/" version="2.0">
<channel>
<title>RSS Title</title>
<description>A website</description>
<lastBuildDate>Wed, 17 Feb 2021 15:03:03 +0000</lastBuildDate>
<item>
<title>
<![CDATA[ Photography Awards ]]>
</title>
<link>
<![CDATA[ /contests/photography-awards/ ]]>
</link>
<pubDate>Mon, 11 Jan 2021 13:52:27 -0600</pubDate>
<dc:identifier>619116</dc:identifier>
<dc:modified>2021-02-09 07:50:10</dc:modified>
<dc:created unix="1610373147">2021-01-11 13:52:27</dc:created>
<dc:dataset>
<contest_logo>
<![CDATA[ 619130 ]]>
</contest_logo>
<start_date>
<![CDATA[ 20210110 ]]>
</start_date>
</dc:dataset>
</item>
</channel>
</rss>

The contest_logo and start_date elements are in the empty namespace, so you have to switch back to it with children(''). Additionally, it is not a good idea to rely on namespace prefixes defined in the document. Use the namespace URI instead (for example, defined in a mapping array in your code).
$rss = simplexml_load_string($feed);
$xmlns = [
'dc' => 'http://purl.org/dc/elements/1.1/'
];
foreach($rss->channel->item as $entry) {
echo $entry->children($xmlns['dc'])->modified . "<br>";
echo $entry->children($xmlns['dc'])->dataset->children('')->contest_logo . "<br>";
echo $entry->children($xmlns['dc'])->dataset->children('')->start_date . "<br>";
}
Output:
2021-02-09 07:50:10<br>
619130
<br>
20210110
<br>
In DOM you would register an alias on the Xpath processor and use it in the expressions. Here is a demo:
$document = new DOMDocument();
$document->loadXML($feed);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');
foreach ($xpath->evaluate('/rss/channel/item') as $entry) {
echo $xpath->evaluate('string(dc:modified)', $entry). "<br>";
echo $xpath->evaluate('string(dc:dataset/contest_logo)', $entry). "<br>";
echo $xpath->evaluate('string(dc:dataset/start_date)', $entry). "<br>";
}

Another alternative - use xpath:
echo $rss->xpath('//dc:dataset/contest_logo')[0] . "\r\n";
echo $rss->xpath('//dc:modified')[0] . "\r\n";
echo $rss->xpath('//start_date')[0] . "\r\n";
Output:
619130
2021-02-09 07:50:10
20210110
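These xpath() calls work only because the feed itself declares the dc prefix. A sketch that binds its own prefix with registerXPathNamespace() should keep working if the feed chooses a different prefix. The prefix d, the $rows array, and the trimmed-down feed (which deliberately uses the prefix dublin) are assumptions made for the illustration.

```php
<?php
// Stand-in feed that uses a different prefix ("dublin") for the same Dublin Core URI.
$feed = <<<XML
<rss version="2.0" xmlns:dublin="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <dublin:modified>2021-02-09 07:50:10</dublin:modified>
      <dublin:dataset>
        <contest_logo><![CDATA[ 619130 ]]></contest_logo>
        <start_date><![CDATA[ 20210110 ]]></start_date>
      </dublin:dataset>
    </item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($feed);
$rows = [];

foreach ($rss->channel->item as $entry) {
    // Bind our own prefix "d" to the namespace URI on the element we query from.
    $entry->registerXPathNamespace('d', 'http://purl.org/dc/elements/1.1/');

    // Relative paths keep each lookup inside the current item.
    $modified = $entry->xpath('d:modified');
    $logo     = $entry->xpath('d:dataset/contest_logo');
    $start    = $entry->xpath('d:dataset/start_date');

    $rows[] = [
        'modified' => isset($modified[0]) ? trim((string) $modified[0]) : null,
        'logo'     => isset($logo[0]) ? trim((string) $logo[0]) : null,
        'start'    => isset($start[0]) ? trim((string) $start[0]) : null,
    ];
}

print_r($rows);
```

The isset() checks guard against items that lack one of the elements, since xpath() returns an empty array when nothing matches.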

How to get the first image from a Tumblr RSS feed in PHP

Here is the relevant part of my RSS feed:
<channel>
<description></description>
<title>Untitled</title>
<generator>Tumblr (3.0; #xxx)</generator>
<link>http://xxx.tumblr.com/</link>
<item>
<title>Title</title>
<description>&lt;figure&gt;&lt;img src="https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg"/&gt;&lt;/figure&gt;</description>
<link>http://xxx.tumblr.com/post/99569244093</link>
<guid>http://xxx.tumblr.com/post/99569244093</guid>
<pubDate>Thu, 09 Oct 2014 11:19:33 -0400</pubDate>
</item>
</channel>
Using the answer from other questions on here I tried this:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$imgs = $feed->channel->item[0]->description->xpath('//img');
foreach($imgs as $image) {
echo (string)$image['src'];
};
This is returning an empty array for $imgs
Does it have something to do with the tags being &lt; &gt; etc?
and if so what can I do?

You can get it from the description, which seems to include a HTML image tag for the image, by using a simple regular expression with preg_match:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$img = (string)$feed->channel->item[0]->description;
if (preg_match('/src="(.*?)"/', $img, $matches)) {
$src = $matches[1];
echo "src = $src", PHP_EOL;
}
Output:
src = http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg

Before you can use xpath() on the description, you need to create a new XML document out of it:
$url = "http://xxx.tumblr.com/rss";
$desc = simplexml_load_file($url)->xpath('//item/description[1]')[0];
$src = simplexml_load_string("<x>$desc</x>")->xpath('//img/@src')[0];
echo $src;
Output:
http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg

I'm not sure if you can use this approach. As kjhughes already mentioned in a comment, your input XML does not contain any img element. But it's possible to retrieve the image source using the XPath substring functions:
substring-before(substring-after(substring-after(//item/description[contains(.,'img')],
'src='),'"'),'"')
Result:
https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg
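Because the description holds escaped HTML rather than XML, a regular expression or a re-parse as XML can break on markup that is not well-formed. One alternative, sketched here with an inline stand-in for the Tumblr feed and a placeholder example.com image URL, is to hand the decoded string to DOMDocument::loadHTML(). The variable names are illustrative.

```php
<?php
// Inline stand-in for the feed; the image URL is a placeholder, not real data.
$content = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>Title</title>
      <description>&lt;figure&gt;&lt;img src="https://example.com/image.jpg"/&gt;&lt;/figure&gt;</description>
    </item>
  </channel>
</rss>
XML;

$feed = new SimpleXMLElement($content);

// Casting to string decodes the entities, giving the HTML fragment as text.
$html = (string) $feed->channel->item[0]->description;

// Collect parser warnings quietly (libxml's HTML parser may complain about tags like <figure>).
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

libxml_clear_errors();
libxml_use_internal_errors($previous);

// Take the first <img> if there is one; $src stays null otherwise.
$src = null;
$images = $dom->getElementsByTagName('img');
if ($images->length > 0) {
    $src = $images->item(0)->getAttribute('src');
}

echo $src, PHP_EOL;
```

An HTML parser tolerates unclosed tags and unquoted attributes, which is the main reason to prefer it over wrapping the fragment in an XML root element.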

Parsing MRSS with PHP - Child elements

I know child elements have been discussed a lot, but I've gone through the helpful answers to related questions and can't seem to get it working (new to coding, so bear with me).
Here's what I'm working with:
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:bc="http://www.brightcove.tv/link" xmlns:dcterms="http://purl.org/dc/terms/" version="2.0">
<channel>
<title>Search Videos By Criteria</title>
<link>...</link>
<description/>
<copyright>Copyright 2014</copyright>
<lastBuildDate>Thu, 25 Sep 2014 13:29:49 -0700</lastBuildDate>
<generator>http://www.brightcove.com/?v=1.0</generator>
<item>
<title>5 best guards in Lakers history</title>
<link/>
<description>...</description>
<guid>video3805826070001</guid>
<pubDate>Thu, 25 Sep 2014 05:11:39 -0700</pubDate>
<media:content duration="121" medium="video" type="video/mp4" url="http://videos.usatoday.net/Brightcove2/29906170001/2014/09/29906170001_3805837947001_5-BEST-GUARDS-IN-LAKERS--HISTORY-final.mp4?videoId=3805826070001"/>
<media:group>...</media:group>
<media:keywords>jerry west,derek fisher,Gail Goodrich,losangeleslakers,SMGV,USA Today Sports,Kobe Bryant,video big board,sports,basketball,lakers,magic johnson,nba
</media:keywords>
<media:thumbnail height="90" url="http://videos.usatoday.net/Brightcove2/29906170001/2014/09/29906170001_3805822421001_Screen-Shot-2014-09-25-at-8-06-28-AM.jpg?pubId=29906170001" width="120"/>
<media:thumbnail height="360" url="http://videos.usatoday.net/Brightcove2/29906170001/2014/09/29906170001_3805709286001_Screen-Shot-2014-09-25-at-8-06-28-AM.jpg?pubId=29906170001" width="480"/>
<bc:titleid>3805826070001</bc:titleid>
<bc:duration>121</bc:duration>
<dcterms:valid/>
<bc:accountid>44854217001</bc:accountid>
</item>
I'm using the following SimpleXML_Parser script to pull most of the info out that I need:
<?php
$html = "";
$url = "http://api.brightcove.com/services/library?command=search_videos&any=tag:NBA&output=mrss&media_delivery=http&sort_by=CREATION_DATE:DESC&token=NU-nMdtzfF8z9NNinlAgM4c9S-9BBfKpm6gFISdwyk-AnQ84efFBbQ..";
$xml = simplexml_load_file($url);
for($i = 0; $i < 80; $i++){
$title = $xml->channel->item[$i]->video;
$link = $xml->channel->item[$i]->link;
$title = $xml->channel->item[$i]->title;
$pubDate = $xml->channel->item[$i]->pubDate;
$description = $xml->channel->item[$i]->description;
/* The code below starting with $html is where you setup how the parsed data will look on the webpage */
$html .= "<div><h3>$title</h3><br/>$description<p><br/>$pubDate<p><br/>$link<p><br/>$titleid<p><br/></div><iframe width='580' height='360' src='http://link.brightcove.com/services/player/bcpid3742068445001?bckey=/*deleted API key&bctid=$titleid' frameborder='0'></iframe><hr/>";
}
echo $html;
/* tutorial for this script is here https://www.youtube.com/watch?v=4ZLZkdiKGE0 */
?>
What I need to be able to parse out of the feed is the string of numbers assigned to "titleid".
I have tried adding in variations on approaches for pulling out child elements, such as:
$titleid = $xml->children(‘media’, true)->div->children(‘bc’, true)->div[$i]->titled;
But not having any luck. I'm sure it's something obvious to a seasoned developer, but again, I'm a newbie.
Any suggestions?
Thanks for any help!

To parse MRSS properly, first call getNamespaces(true) to get a map of every namespace prefix used in the document to its URI.
Then select the namespace with $item->children($namespaces['bc']). Finally, read the value you want from the result, in your case titleid:
<?php
$url = "http://api.brightcove.com/services/library?command=search_videos&any=tag:NBA&output=mrss&media_delivery=http&sort_by=CREATION_DATE:DESC&token=NU-nMdtzfF8z9NNinlAgM4c9S-9BBfKpm6gFISdwyk-AnQ84efFBbQ..";
$xml = simplexml_load_file($url);
$namespaces = $xml->getNamespaces(true); // prefix => URI for all namespaces in the document
// foreach visits only the items that exist, instead of assuming there are 80 of them
foreach ($xml->channel->item as $item) {
$link = $item->link;
$title = $item->title;
$pubDate = $item->pubDate;
$description = $item->description;
// <bc:titleid> is in the Brightcove namespace, so select it through children()
$titleid = $item->children($namespaces['bc'])->titleid;
echo $titleid . '<br>';
}
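Attributes of namespaced elements such as media:thumbnail can be read in a similar way. The sketch below uses a shortened inline copy of the feed with placeholder example.com thumbnail URLs, and hard-codes the namespace URIs instead of calling getNamespaces(). The $videos array and its keys are illustrative.

```php
<?php
// Shortened stand-in for the MRSS feed; thumbnail URLs are placeholders.
$feed = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:bc="http://www.brightcove.tv/link" version="2.0">
  <channel>
    <item>
      <title>5 best guards in Lakers history</title>
      <media:thumbnail height="90" url="http://example.com/small.jpg" width="120"/>
      <media:thumbnail height="360" url="http://example.com/large.jpg" width="480"/>
      <bc:titleid>3805826070001</bc:titleid>
    </item>
  </channel>
</rss>
XML;

// Namespace URIs as declared on the <rss> element of the feed.
$nsMedia = 'http://search.yahoo.com/mrss/';
$nsBc    = 'http://www.brightcove.tv/link';

$xml = simplexml_load_string($feed);
$videos = [];

foreach ($xml->channel->item as $item) {
    $thumbs = [];
    foreach ($item->children($nsMedia)->thumbnail as $thumb) {
        // The attributes carry no prefix, so attributes() without arguments returns them.
        $attrs = $thumb->attributes();
        $thumbs[] = [
            'url'   => (string) $attrs->url,
            'width' => (int) $attrs->width,
        ];
    }

    $videos[] = [
        'title'      => (string) $item->title,
        'titleid'    => (string) $item->children($nsBc)->titleid,
        'thumbnails' => $thumbs,
    ];
}

print_r($videos);
```

Hard-coding the URIs avoids depending on whichever prefixes the feed happens to use.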

Displaying XML feed contents

Using SimpleXML, how do you display the contents of a feed? I mean actually display the XML on a page so that I can see the schema.

In my example I will use an XML file like this:
<?xml version="1.0" encoding="UTF-8"?>
<module>
<name>Menus</name>
<folder>menus</folder>
<install>install.php</install>
</module>
You can load the XML file into a variable:
$mod = simplexml_load_file($XMLfile);
and read the fields. The loaded object already represents the root <module> element, so its children are accessed directly:
echo $mod->name;
echo $mod->folder;
echo $mod->install;
Hope this helps.

$xml = simplexml_load_file("URL GOES HERE");
echo "<pre>";
print_r($xml);
echo "</pre>";
This prints a readable dump of the object.
A valid RSS 2.0 feed has this structure:
foreach($xml->channel->item as $items){
$title = $items->title;
$link = $items->link;
$description = $items->description;
$pubDate = strtotime($items->pubDate);
$pubDate = date('Y-m-d H:i:s', $pubDate);
}
Along with a few other attributes.
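To see the markup itself rather than the object dump, one option is to serialise the document with asXML() and escape it for HTML output. This is a sketch with a small inline document standing in for the feed; the variable names are illustrative.

```php
<?php
// Small inline document standing in for the feed.
$source = '<module><name>Menus</name><folder>menus</folder><install>install.php</install></module>';

$xml = simplexml_load_string($source);

// asXML() with no argument returns the whole document as a string.
$markup = $xml->asXML();

// Escape the angle brackets so the browser shows the tags instead of interpreting them.
$escaped = htmlspecialchars($markup, ENT_QUOTES, 'UTF-8');

echo '<pre>', $escaped, '</pre>';
```

Unlike print_r(), this output also includes namespaced elements and attributes, since it is the serialised document rather than a view of the object.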

PHP and XML parsing

I'm struggling to parse an XML file in PHP:
Here's the XML
<rss xmlns:ac="http://palm.com/app.catalog.rss.extensions" version="2.0">
<channel>
<title>Device App Updates for US</title>
<link>http://www.palm.com</link>
<description>Updates</description>
<language>en-US</language>
<pubDate>Mon, 04 Jan 2010 18:23:51 -0800</pubDate>
<lastBuildDate>Mon, 04 Jan 2010 18:23:51 -0800</lastBuildDate>
<ac:distributionChannel>Device</ac:distributionChannel>
<ac:countryCode>US</ac:countryCode>
<item>
<title><![CDATA[My App]]></title>
<link>http://developer.palm.com/appredirect/?packageid=com.palm.myapp</link>
<description><![CDATA[My fun app.]]></description>
<pubDate>2009-12-21 21:00:58</pubDate>
<guid>334.232</guid>
<ac:total_downloads>1234</ac:total_downloads>
<ac:total_comments>12</ac:total_comments>
<ac:country>US</ac:country>
</item>
</channel>
</rss>
My Problem is; when I use:
$strURL = "http://developer.palm.com/rss/D/appcatalog.update.rss.xml";
$ch = curl_init($strURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
$data = curl_exec($ch);
curl_close($ch);
$doc = new SimpleXmlElement($data, LIBXML_NOCDATA);
print_r($doc);
I'm not able to display the values? The one I'm most interested in is <ac:total_downloads>1000</ac> but I don't seem to be able to parse it.
What am I doing wrong?
Many thanks

You don't need to use curl to retrieve that file, SimpleXML can fetch external resources.
Use children() to access namespaced nodes. Here's how to do it:
$rss = simplexml_load_file($strURL);
$ns = 'http://palm.com/app.catalog.rss.extensions';
foreach ($rss->channel->item as $item)
{
echo 'Title: ', $item->title, "\n";
echo 'Downloads: ', $item->children($ns)->total_downloads, "\n\n";
}
