Parsing MRSS with PHP - Child elements - php

I know child elements have been discussed a lot, but I've gone through the helpful answers to related questions and can't seem to get it working (new to coding, so bear with me).
Here's what I'm working with:
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:bc="http://www.brightcove.tv/link" xmlns:dcterms="http://purl.org/dc/terms/" version="2.0">
<channel>
<title>Search Videos By Criteria</title>
<link>...</link>
<description/>
<copyright>Copyright 2014</copyright>
<lastBuildDate>Thu, 25 Sep 2014 13:29:49 -0700</lastBuildDate>
<generator>http://www.brightcove.com/?v=1.0</generator>
<item>
<title>5 best guards in Lakers history</title>
<link/>
<description>...</description>
<guid>video3805826070001</guid>
<pubDate>Thu, 25 Sep 2014 05:11:39 -0700</pubDate>
<media:content duration="121" medium="video" type="video/mp4" url="http://videos.usatoday.net/Brightcove2/29906170001/2014/09/29906170001_3805837947001_5-BEST-GUARDS-IN-LAKERS--HISTORY-final.mp4?videoId=3805826070001"/>
<media:group>...</media:group>
<media:keywords>jerry west,derek fisher,Gail Goodrich,losangeleslakers,SMGV,USA Today Sports,Kobe Bryant,video big board,sports,basketball,lakers,magic johnson,nba
</media:keywords>
<media:thumbnail height="90" url="http://videos.usatoday.net/Brightcove2/29906170001/2014/09/29906170001_3805822421001_Screen-Shot-2014-09-25-at-8-06-28-AM.jpg?pubId=29906170001" width="120"/>
<media:thumbnail height="360" url="http://videos.usatoday.net/Brightcove2/29906170001/2014/09/29906170001_3805709286001_Screen-Shot-2014-09-25-at-8-06-28-AM.jpg?pubId=29906170001" width="480"/>
<bc:titleid>3805826070001</bc:titleid>
<bc:duration>121</bc:duration>
<dcterms:valid/>
<bc:accountid>44854217001</bc:accountid>
</item>
I'm using the following SimpleXML_Parser script to pull most of the info out that I need:
<?php
$html = "";
$url = "http://api.brightcove.com/services/library?command=search_videos&any=tag:NBA&output=mrss&media_delivery=http&sort_by=CREATION_DATE:DESC&token=NU-nMdtzfF8z9NNinlAgM4c9S-9BBfKpm6gFISdwyk-AnQ84efFBbQ..";
$xml = simplexml_load_file($url);
for($i = 0; $i < 80; $i++){
$title = $xml->channel->item[$i]->video;
$link = $xml->channel->item[$i]->link;
$title = $xml->channel->item[$i]->title;
$pubDate = $xml->channel->item[$i]->pubDate;
$description = $xml->channel->item[$i]->description;/* The code below starting with $html is where you setup how the parsed data will look on the webpage */
$html .= "<div><h3>$title</h3><br/>$description<p><br/>$pubDate<p><br/>$link<p><br/>$titleid<p><br/></div><iframe width='580' height='360' src='http://link.brightcove.com/services/player/bcpid3742068445001?bckey=/*deleted API key&bctid=$titleid' frameborder='0'></iframe><hr/>";}
echo $html;/* tutorial for this script is here https://www.youtube.com/watch?v=4ZLZkdiKGE0 */?>
What I need to be able to parse out of the feed is the string of numbers assigned to "titleid".
I have tried adding in variations on approaches for pulling out child elements, such as:
$titleid = $xml->children(‘media’, true)->div->children(‘bc’, true)->div[$i]->titled;
But not having any luck. I'm sure it's something obvious to a seasoned developer, but again, I'm a newbie.
Any suggestions?
Thanks for any help!

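To parse MRSS properly, first call getNamespaces(true) so that you get every namespace declared in the document, not just those used on the root element.
Then select the children in the bc namespace with $xml->channel->item[$i]->children($namespaces['bc']). Finally, read the value you want from the result, in your case titleid.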
<?php
$html = "";
$url = "http://api.brightcove.com/services/library?command=search_videos&any=tag:NBA&output=mrss&media_delivery=http&sort_by=CREATION_DATE:DESC&token=NU-nMdtzfF8z9NNinlAgM4c9S-9BBfKpm6gFISdwyk-AnQ84efFBbQ..";
$xml = simplexml_load_file($url);
$namespaces = $xml->getNamespaces(true); // get namespaces
for($i = 0; $i < 80; $i++){
$link = $xml->channel->item[$i]->link;
$title = $xml->channel->item[$i]->title;
$pubDate = $xml->channel->item[$i]->pubDate;
$description = $xml->channel->item[$i]->description;
// bc:titleid is in the "bc" namespace, so select that namespace before reading it
$titleid = $xml->channel->item[$i]->children($namespaces['bc'])->titleid;
echo $titleid .'<br>';
}
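A self-contained sketch of the same idea, assuming a trimmed inline copy of the feed from the question instead of the live Brightcove URL (the $mrss string and the $titleIds array are illustrative names, not part of the original script). It iterates with foreach, so it never indexes past the last item:

```php
<?php
// Trimmed sample of the feed from the question; a real script would load the URL instead.
$mrss = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:bc="http://www.brightcove.tv/link" version="2.0">
  <channel>
    <item>
      <title>5 best guards in Lakers history</title>
      <bc:titleid>3805826070001</bc:titleid>
      <bc:duration>121</bc:duration>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($mrss);
$namespaces = $xml->getNamespaces(true); // maps prefix => namespace URI

$titleIds = [];
foreach ($xml->channel->item as $item) {
    // children() with a namespace URI exposes only the elements in that namespace
    $bc = $item->children($namespaces['bc']);
    $titleIds[] = (string) $bc->titleid;
}

echo implode("\n", $titleIds), "\n";
```

Casting to string matters here: without it, $titleIds would hold SimpleXMLElement objects rather than plain text.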

Related

Parse media:content from RSS feed with PHP

I am trying to parse media:content from RSS with PHP and then show it using HTML.
I went through numerous posts on the same topic, but since I'm a beginner I couldn't figure it out because the code was different from mine.
Currently I don't have any line that tries to get the image from the XML.
<?php
$html = "";
$url = "url.rss";
$xml = simplexml_load_file($url);
$namespaces = $xml->getNamespaces(true);
for($i = 0; $i < 50; $i++){
$title = $xml->channel->item[$i]->title;
$link = $xml->channel->item[$i]->link;
$description = $xml->channel->item[$i]->description;
$pubDate = $xml->channel->item[$i]->pubDate;
$author = $xml->channel->item[$i]->author;
$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "$description";
$html .= "<p>$pubDate</p>";
$html .= "<p>$author</p><hr>";
}
echo $html;
?>
This is the info I need from the XML file:
<media:content url="www.image.jpg" medium="image" type="image/jpeg" width="850" height="425" />
Thanks!
I expect the PHP file to show the media file.
Can you please give us more info?
What is the value of '$xml' after you run simplexml_load_file? (did you get the correct data?)
What error message did you get?
EDIT - according to your comment
try using
$xml->channel->item[$i]->children('media', true)->content->attributes();
Here 'media' is the namespace prefix of the 'content' element.
The boolean true tells SimpleXML to treat 'media' as a prefix rather than as a namespace URI.
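As a rough sketch of how that call might be used, here is a one-item feed built around the media:content line from the question (the surrounding rss/channel/item wrapper and the variable names are assumptions for illustration):

```php
<?php
// Hypothetical one-item feed wrapped around the media:content line from the question.
$rss = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Example</title>
      <media:content url="www.image.jpg" medium="image" type="image/jpeg" width="850" height="425" />
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);
$item = $xml->channel->item[0];

// 'media' is a prefix here, so the second argument must be true
$attrs = $item->children('media', true)->content->attributes();
$imageUrl = (string) $attrs->url;
$width = (int) $attrs->width;

// Escape the URL before placing it inside an HTML attribute
echo "<img src='" . htmlspecialchars($imageUrl, ENT_QUOTES) . "' width='$width'>\n";
```

The attributes themselves (url, width, and so on) carry no namespace, which is why attributes() is called without arguments.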

Trouble creating a valid RSS feed in PHP

I'm trying to get an RSS feed, change some text, and then serve it again as an RSS feed. However, the code I've written doesn't validate properly. I get these errors:
line 3, column 0: Missing rss attribute: version
line 14, column 6: Undefined item element: content (10 occurrences)
Here is my code:
<?php
header("Content-type: text/xml");
echo "<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl'?>
<?xml-stylesheet type='text/xsl' media='screen'
href='/~d/styles/rss2full.xsl'?>
<rss xmlns:content='http://purl.org/rss/1.0/modules/content/'>
<channel>
<title>Blaakdeer</title>
<description>Blog RSS</description>
<language>en-us</language>
";
$html = "";
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$xml = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++){
$title = $xml->channel->item[$i]->title;
$description = $xml->channel->item[$i]->description;
$content = $xml->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
echo "<item>
<title>$title</title>
<description>$description</description>
<content>$content</content>
</item>";
}
echo "</channel></rss>";
Just as you don't treat XML as a string when parsing it, you don't treat it as a string when you create it. Use the proper tools to create your XML; in this case, the DomDocument class.
You had a number of problems with your XML; biggest is that you were creating a <content> element, but the original RSS had a <content:encoded> element. That means the element name is encoded but it's in the content namespace. Big difference between that and an element named content. I've added comments to explain the other steps.
<?php
// create the XML document with version and encoding
$xml = new DomDocument("1.0", "UTF-8");
$xml->formatOutput = true;
// add the stylesheet PI
$xml->appendChild(
$xml->createProcessingInstruction(
'xml-stylesheet',
'type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"'
)
);
// create the root element
$root = $xml->appendChild($xml->createElement('rss'));
// add the version attribute
$v = $root->appendChild($xml->createAttribute('version'));
$v->appendChild($xml->createTextNode('2.0'));
// add the namespace
$root->setAttributeNS(
'http://www.w3.org/2000/xmlns/',
'xmlns:content',
'http://purl.org/rss/1.0/modules/content/'
);
// create some child elements
$ch = $root->appendChild($xml->createElement('channel'));
// specify the text directly as second argument to
// createElement because it doesn't need escaping
$ch->appendChild($xml->createElement('title', 'Blaakdeer'));
$ch->appendChild($xml->createElement('description', 'Blog RSS'));
$ch->appendChild($xml->createElement('language', 'en-us'));
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$rss = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++) {
if (empty($rss->channel->item[$i])) {
continue;
}
$title = $rss->channel->item[$i]->title;
$description = $rss->channel->item[$i]->description;
$content = $rss->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
$item_el = $ch->appendChild($xml->createElement('item'));
$title_el = $item_el->appendChild($xml->createElement('title'));
// this stuff is unknown so it has to be escaped
// so have to create a separate text node
$title_el->appendChild($xml->createTextNode($title));
$desc_el = $item_el->appendChild($xml->createElement('description'));
// the other alternative is to create a cdata section
$desc_el->appendChild($xml->createCDataSection($description));
// the content:encoded element is not the same as a content element
// the element must be created with the proper namespace prefix
$cont_el = $item_el->appendChild(
$xml->createElementNS(
'http://purl.org/rss/1.0/modules/content/',
'content:encoded'
)
);
$cont_el->appendChild($xml->createCDataSection($content));
}
header("Content-type: text/xml");
echo $xml->saveXML();
The first error is just a missing attribute, easy enough:
<rss version="2.0" ...>
For the <p> and other HTML elements, you need to escape them. The file should look like this:
&lt;p&gt;...
There are other ways, but this is the easiest way. In PHP you can just call a function to encode entities.
$output .= htmlspecialchars(" <p>Paragraph</p> ");
As for the <content> tag problem, it should be <description> instead. The <content> tag currently generates two errors. Changing it to <description> in both places should fix both errors.
Otherwise it looks like you understand the basics. You <open> and </close> tags, and those have to match. You can also use what are called empty tags: <empty/>, which exist on their own, contain no content, and have no closing tag.
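To make the escaping step concrete, here is a minimal sketch assuming a made-up snippet of post HTML ($raw is illustrative). It passes ENT_XML1 so that htmlspecialchars only emits entities that are valid in XML:

```php
<?php
// Illustrative input; in the feed script this would be the post text.
$raw = '<p>Tom & Jerry</p>';

// ENT_XML1 limits the output to XML-safe entities; ENT_QUOTES also escapes both quote styles
$escaped = htmlspecialchars($raw, ENT_XML1 | ENT_QUOTES, 'UTF-8');

echo "<description>$escaped</description>\n";
```

A feed reader decodes the entities again, so the paragraph markup survives while the feed itself stays well-formed.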

Wordpress Feeds RSS XML - PHP media

I have this code, but I don't know how to get the media.
Could someone please help me?
I need to show the image of each post on my website.
Below are my PHP and the XML from the WordPress feed.
PHP
$html = "";
$url = "https://intercambioemgalway.com.br/feed/";
$xml = simplexml_load_file( $url, 'SimpleXMLElement', LIBXML_NOCDATA );
for ($i = 0; $i < 5; $i++){
$img = $xml->channel->item->media;
$html .= "$img";
}
echo $html;
XML
<item>
<title>Qual escola vou estudar?</title>
<link>https://intercambioemgalway.com.br/2017/09/11/qual-escola-estudar/</link>
<comments>https://intercambioemgalway.com.br/2017/09/11/qual-escola-estudar/#respond</comments>
<pubDate>Mon, 11 Sep 2017 17:36:59 +0000</pubDate>
<dc:creator><![CDATA[admin]]></dc:creator>
<category><![CDATA[Meu intercâmbio]]></category>
<guid isPermaLink="false">https://intercambioemgalway.com.br/?p=179</guid>
<description><![CDATA[<p>Hoje vou contar um pouquinho da escola que vou estudar em Galway, a Atlantic Language.</p>]]></description>
<wfw:commentRss>https://intercambioemgalway.com.br/2017/09/11/qual-escola-estudar/feed/</wfw:commentRss>
<slash:comments>0</slash:comments>
<media:content url="https://intercambioemgalway.com.br/wp-content/uploads/2017/09/Untitled-1.jpg" type="image/jpeg" medium="image" width="900" height="600">
<media:title type="plain">
<![CDATA[atlantic-language-galway]]>
</media:title>
<media:thumbnail url="https://intercambioemgalway.com.br/wp-content/uploads/2017/09/Untitled-1-150x150.jpg" width="150" height="150" />
<media:description type="plain">
<![CDATA[]]>
</media:description>
<media:copyright>
admin
</media:copyright>
</media:content>
Each $item is of type SimpleXMLElement.
Now you can loop the children using the namespace http://search.yahoo.com/mrss/ and then you can access the attributes.
For example:
foreach($xml->channel->item as $item) {
foreach($item->children("http://search.yahoo.com/mrss/") as $media) {
$img = (string)$media->attributes()->url;
}
}
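In the feed above, media:thumbnail is nested inside media:content, so the namespace has to be selected at each level. A short sketch, assuming a trimmed inline copy of the item (the MEDIA_NS constant and the $images array are illustrative names):

```php
<?php
// Trimmed item from the question's feed.
$feed = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Qual escola vou estudar?</title>
      <media:content url="https://intercambioemgalway.com.br/wp-content/uploads/2017/09/Untitled-1.jpg" type="image/jpeg" medium="image" width="900" height="600">
        <media:thumbnail url="https://intercambioemgalway.com.br/wp-content/uploads/2017/09/Untitled-1-150x150.jpg" width="150" height="150" />
      </media:content>
    </item>
  </channel>
</rss>
XML;

const MEDIA_NS = 'http://search.yahoo.com/mrss/';

$xml = simplexml_load_string($feed);
$images = [];
foreach ($xml->channel->item as $item) {
    $content = $item->children(MEDIA_NS)->content;
    // media:thumbnail sits inside media:content, so select the namespace again
    $thumb = $content->children(MEDIA_NS)->thumbnail;
    $images[] = [
        'full'  => (string) $content->attributes()->url,
        'thumb' => (string) $thumb->attributes()->url,
    ];
}

foreach ($images as $image) {
    echo $image['full'], "\n", $image['thumb'], "\n";
}
```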

How to get the first image from a Tumblr RSS feed in PHP

Here is the relevant part of my RSS feed:
<channel>
<description></description>
<title>Untitled</title>
<generator>Tumblr (3.0; #xxx)</generator>
<link>http://xxx.tumblr.com/</link>
<item>
<title>Title</title>
<description>&lt;figure&gt;&lt;img src="https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg"/&gt;&lt;/figure&gt;</description>
<link>http://xxx.tumblr.com/post/99569244093</link>
<guid>http://xxx.tumblr.com/post/99569244093</guid>
<pubDate>Thu, 09 Oct 2014 11:19:33 -0400</pubDate>
</item>
</channel>
Using the answer from other questions on here I tried this:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$imgs = $feed->channel->item[0]->description->xpath('//img');
foreach($imgs as $image) {
echo (string)$image['src'];
};
This returns an empty array for $imgs.
Does it have something to do with the tags being &lt; &gt; etc.?
If so, what can I do?
You can get it from the description, which seems to include a HTML image tag for the image, by using a simple regular expression with preg_match:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$img = (string)$feed->channel->item[0]->description;
if (preg_match('/src="(.*?)"/', $img, $matches)) {
$src = $matches[1];
echo "src = $src", PHP_EOL;
}
Output:
src = http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg
Before you can use xpath() on the description, you need to create a new XML document out of it:
$url = "http://xxx.tumblr.com/rss";
$desc = simplexml_load_file($url)->xpath('//item/description[1]')[0];
$src = simplexml_load_string("<x>$desc</x>")->xpath('//img/@src')[0];
echo $src;
Output:
http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg
I'm not sure if you can use this approach. As kjhughes already mentioned in a comment, your input XML does not contain any img element; the markup inside description is escaped text. But it's possible to retrieve the image source using the XPath substring functions:
substring-before(substring-after(substring-after(//item/description[contains(.,'img')],
'src='),'"'),'"')
Result:
https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg
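One way to run that expression from PHP is DOMXPath::evaluate(), which returns a plain string for string-valued expressions. This is only a sketch, assuming a trimmed inline copy of the feed from the question rather than the live URL:

```php
<?php
// Trimmed copy of the feed from the question; the markup inside <description> is entity-escaped text.
$rss = <<<XML
<channel>
<item>
<title>Title</title>
<description>&lt;figure&gt;&lt;img src="https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg"/&gt;&lt;/figure&gt;</description>
</item>
</channel>
XML;

$doc = new DOMDocument();
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);

// Same expression as above, written on one line
$expr = "substring-before(substring-after(substring-after(//item/description[contains(.,'img')],'src='),'\"'),'\"')";
$src = $xpath->evaluate($expr);

echo $src, "\n";
```

Because the expression works on the string value of the first matching description, it only ever yields the first src in the first matching item.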

PHP parsing a georss namespace with simpleXML

I'm trying to parse the lat/lon out of a Google Maps RSS feed:
$file = "http://maps.google.com/maps/ms?ie=UTF8&hl=en&vps=1&jsv=327b&msa=0&output=georss&msid=217909142388190116501.000473ca1b7eb5750ebfe";
$xml = simplexml_load_file($file);
$loc = $xml->channel->item;
echo $loc[0]->title;
echo $loc[0]->point;
The title shows up alright, but point gives me nothing. Each node looks like this:
<item>
<guid isPermaLink="false">0004740950fd067393eb4</guid>
<pubDate>Sun, 20 Sep 2009 21:47:49 +0000</pubDate>
<title>Big Wong King Restaurant</title>
<description><![CDATA[<div dir="ltr">$4.99 full meals!</div>]]></description>
<author>neufuture</author>
<georss:point>
40.716236 -73.998413
</georss:point>
<georss:elev>0.000000</georss:elev>
</item>
<?php
$file = "http://maps.google.com/maps/ms?ie=UTF8&hl=en&vps=1&jsv=327b&msa=0&output=georss&msid=217909142388190116501.000473ca1b7eb5750ebfe";
$xml = simplexml_load_file($file);
$loc = $xml->channel->item;
foreach ($loc->children('http://www.georss.org/georss') as $geo) {
echo $geo;
}
?>
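The loop above prints every georss child of the first item. To get the latitude and longitude as separate numbers for each item, something along these lines may help; it is a sketch that assumes a trimmed inline copy of the item from the question, and $points is an illustrative name:

```php
<?php
// Trimmed item from the question; the namespace URI is the one used in the answer above.
$georss = <<<XML
<rss xmlns:georss="http://www.georss.org/georss" version="2.0">
  <channel>
    <item>
      <title>Big Wong King Restaurant</title>
      <georss:point>
        40.716236 -73.998413
      </georss:point>
      <georss:elev>0.000000</georss:elev>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($georss);

$points = [];
foreach ($xml->channel->item as $item) {
    $geo = $item->children('http://www.georss.org/georss');
    // georss:point holds "lat lon" separated by whitespace and padded with newlines
    [$lat, $lon] = preg_split('/\s+/', trim((string) $geo->point));
    $points[(string) $item->title] = [(float) $lat, (float) $lon];
}

foreach ($points as $title => [$lat, $lon]) {
    echo "$title: $lat, $lon\n";
}
```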
