Parse media:content from RSS feed with PHP


I am trying to parse media:content from an RSS feed with PHP and then display it in HTML.
I went through numerous posts on the same topic, but since I'm a beginner I couldn't figure it out because their code was different from mine.
Currently I don't have any line that tries to get the image from the XML.
<?php
$html = "";
$url = "url.rss";
$xml = simplexml_load_file($url);
$namespaces = $xml->getNamespaces(true);
for($i = 0; $i < 50; $i++){
$title = $xml->channel->item[$i]->title;
$link = $xml->channel->item[$i]->link;
$description = $xml->channel->item[$i]->description;
$pubDate = $xml->channel->item[$i]->pubDate;
$author = $xml->channel->item[$i]->author;
$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "$description";
$html .= "<p>$pubDate</p>";
$html .= "<p>$author</p><hr>";
}
echo $html;
?>
This is the info I need from the XML file:
<media:content url="www.image.jpg" medium="image" type="image/jpeg" width="850" height="425" />
Thanks!
I expect the PHP file to show the media file.

Can you please give us more info?
What is the value of '$xml' after you run simplexml_load_file? (did you get the correct data?)
What error message did you get?
EDIT - according to your comment
try using
$xml->channel->item[$i]->children('media', true)->content->attributes();
The 'media' passed to children() is the namespace prefix of the 'content' element.
The boolean true tells SimpleXML to treat 'media' as a namespace prefix rather than as a full namespace URI.
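A minimal sketch of how that call could be used to collect the image URLs. The feed below is a hypothetical stand-in (only the media:content line comes from the question), and the namespace URI is the usual Media RSS one, which is an assumption about the real feed:

```php
<?php
// Hypothetical sample feed; only the media:content element is taken from the question.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example item</title>
      <media:content url="www.image.jpg" medium="image" type="image/jpeg" width="850" height="425" />
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($rss);
$images = [];
foreach ($xml->channel->item as $item) {
    // Select the children that use the "media" prefix
    $media = $item->children('media', true);
    if (isset($media->content)) {
        // url, width, etc. are plain (un-prefixed) attributes
        $attrs = $media->content->attributes();
        $images[] = (string) $attrs['url'];
    }
}

foreach ($images as $src) {
    echo "<img src='" . htmlspecialchars($src, ENT_QUOTES) . "'>", PHP_EOL;
}
```

Looping with foreach over the items also avoids reading past the end of the feed when it has fewer than 50 entries.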

Related

Trouble creating a valid RSS feed in PHP

I'm trying to get an RSS feed, change some text, and then serve it again as an RSS feed. However, the code I've written doesn't validate properly. I get these errors:
line 3, column 0: Missing rss attribute: version
line 14, column 6: Undefined item element: content (10 occurrences)
Here is my code:
<?php
header("Content-type: text/xml");
echo "<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl'?>
<?xml-stylesheet type='text/xsl' media='screen'
href='/~d/styles/rss2full.xsl'?>
<rss xmlns:content='http://purl.org/rss/1.0/modules/content/'>
<channel>
<title>Blaakdeer</title>
<description>Blog RSS</description>
<language>en-us</language>
";
$html = "";
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$xml = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++){
$title = $xml->channel->item[$i]->title;
$description = $xml->channel->item[$i]->description;
$content = $xml->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
echo "<item>
<title>$title</title>
<description>$description</description>
<content>$content</content>
</item>";
}
echo "</channel></rss>";

Just as you don't treat XML as a string when parsing it, you don't treat it as a string when you create it. Use the proper tools to create your XML; in this case, the DomDocument class.
You had a number of problems with your XML; the biggest is that you were creating a <content> element, but the original RSS had a <content:encoded> element. That means the element is named encoded and lives in the content namespace, which is very different from an element named content. I've added comments to explain the other steps.
<?php
// create the XML document with version and encoding
$xml = new DomDocument("1.0", "UTF-8");
$xml->formatOutput = true;
// add the stylesheet PI
$xml->appendChild(
$xml->createProcessingInstruction(
'xml-stylesheet',
'type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"'
)
);
// create the root element
$root = $xml->appendChild($xml->createElement('rss'));
// add the version attribute
$v = $root->appendChild($xml->createAttribute('version'));
$v->appendChild($xml->createTextNode('2.0'));
// add the namespace
$root->setAttributeNS(
'http://www.w3.org/2000/xmlns/',
'xmlns:content',
'http://purl.org/rss/1.0/modules/content/'
);
// create some child elements
$ch = $root->appendChild($xml->createElement('channel'));
// specify the text directly as second argument to
// createElement because it doesn't need escaping
$ch->appendChild($xml->createElement('title', 'Blaakdeer'));
$ch->appendChild($xml->createElement('description', 'Blog RSS'));
$ch->appendChild($xml->createElement('language', 'en-us'));
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$rss = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++) {
if (empty($rss->channel->item[$i])) {
continue;
}
$title = $rss->channel->item[$i]->title;
$description = $rss->channel->item[$i]->description;
$content = (string) $rss->channel->item[$i]->children("content", true)->encoded;
$content = preg_replace("/The post.*/","", $content);
$item_el = $ch->appendChild($xml->createElement('item'));
$title_el = $item_el->appendChild($xml->createElement('title'));
// this stuff is unknown so it has to be escaped
// so have to create a separate text node
$title_el->appendChild($xml->createTextNode($title));
$desc_el = $item_el->appendChild($xml->createElement('description'));
// the other alternative is to create a cdata section
$desc_el->appendChild($xml->createCDataSection($description));
// the content:encoded element is not the same as a content element
// the element must be created with the proper namespace prefix
$cont_el = $item_el->appendChild(
$xml->createElementNS(
'http://purl.org/rss/1.0/modules/content/',
'content:encoded'
)
);
$cont_el->appendChild($xml->createCDataSection($content));
}
header("Content-type: text/xml");
echo $xml->saveXML();

The first error is just a missing attribute, easy enough:
<rss version="2.0" ...>
For the <p> and other HTML elements, you need to escape them. The file should look like this:
&lt;p&gt;...
There are other ways, but this is the easiest way. In PHP you can just call a function to encode entities.
$output .= htmlspecialchars(" <p>Paragraph</p> ");
As for the <content> tag problem, it should be <description> instead. The <content> tag currently generates two errors. Changing it to <description> in both places should fix both errors.
Otherwise it looks like you understand the basics. You <open> and </close> tags, and those have to match. You can also use what are called empty tags: <empty/>, which stand on their own with no content and no closing tag.
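As a rough illustration of the escaping advice, here is a sketch that builds one item by hand. The $title and $body values are hypothetical placeholders for text pulled from the source feed, and the ENT_XML1 flag is my choice so that only XML-safe entities are emitted:

```php
<?php
// Hypothetical values standing in for text read from the original feed.
$title = 'Tips & tricks';
$body  = '<p>Paragraph</p>';

// Escape &, <, > and quotes so the markup survives as text inside the XML.
$item = "<item>\n"
      . "<title>" . htmlspecialchars($title, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</title>\n"
      . "<description>" . htmlspecialchars($body, ENT_XML1 | ENT_QUOTES, 'UTF-8') . "</description>\n"
      . "</item>";

echo $item, PHP_EOL;
```

The resulting fragment parses as XML, and a feed reader decodes the description back into the original HTML.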

How to get first image from a Tumblr RSS feed in PHP

Here is the relevant part of my RSS feed:
<channel>
<description></description>
<title>Untitled</title>
<generator>Tumblr (3.0; #xxx)</generator>
<link>http://xxx.tumblr.com/</link>
<item>
<title>Title</title>
<description>&lt;figure&gt;&lt;img src="https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg"/&gt;&lt;/figure&gt;</description>
<link>http://xxx.tumblr.com/post/99569244093</link>
<guid>http://xxx.tumblr.com/post/99569244093</guid>
<pubDate>Thu, 09 Oct 2014 11:19:33 -0400</pubDate>
</item>
</channel>
Using the answer from other questions on here I tried this:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$imgs = $feed->channel->item[0]->description->xpath('//img');
foreach($imgs as $image) {
echo (string)$image['src'];
};
This returns an empty array for $imgs.
Does it have something to do with the tags being &lt; &gt; etc.?
And if so, what can I do?

You can get it from the description, which seems to include a HTML image tag for the image, by using a simple regular expression with preg_match:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$img = (string)$feed->channel->item[0]->description;
if (preg_match('/src="(.*?)"/', $img, $matches)) {
$src = $matches[1];
echo "src = $src", PHP_EOL;
}
Output:
src = http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg

Before you can use xpath() on the description, you need to create a new XML document out of it:
$url = "http://xxx.tumblr.com/rss";
$desc = simplexml_load_file($url)->xpath('//item/description[1]')[0];
$src = simplexml_load_string("<x>$desc</x>")->xpath('//img/@src')[0];
echo $src;
Output:
http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg

I'm not sure if you can use this approach. As kjhughes already mentioned in a comment, your input XML does not contain any img element. But it's possible to retrieve the image source using XPath substring functions:
substring-before(substring-after(substring-after(//item/description[contains(.,'img')],
'src='),'"'),'"')
Result:
https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg
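For what it's worth, that expression can be run from PHP with DOMXPath::evaluate(). This is only a sketch: the feed is a hypothetical cut-down version of the one in the question, with an example.com image URL, and it assumes the description holds escaped HTML:

```php
<?php
// Hypothetical feed modelled on the question; the description contains escaped HTML.
$rss = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>Title</title>
      <description>&lt;figure&gt;&lt;img src="https://example.com/a.jpg"/&gt;&lt;/figure&gt;</description>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);

// Same substring expression as above; evaluate() returns a plain string here.
$expr = "substring-before(substring-after(substring-after("
      . "//item/description[contains(.,'img')],'src='),'\"'),'\"')";
$src = $xpath->evaluate($expr);

echo $src, PHP_EOL;
```

Note that the expression only looks at the first matching description, so it yields one URL per call.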

Parsing MRSS with PHP - Child elements

I know child elements have been discussed a lot, but I've gone through the helpful answers to related questions and can't seem to get it working (new to coding, so bear with me).
Here's what I'm working with:
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:bc="http://www.brightcove.tv/link" xmlns:dcterms="http://purl.org/dc/terms/" version="2.0">
<channel>
<title>Search Videos By Criteria</title>
<link>...</link>
<description/>
<copyright>Copyright 2014</copyright>
<lastBuildDate>Thu, 25 Sep 2014 13:29:49 -0700</lastBuildDate>
<generator>http://www.brightcove.com/?v=1.0</generator>
<item>
<title>5 best guards in Lakers history</title>
<link/>
<description>...</description>
<guid>video3805826070001</guid>
<pubDate>Thu, 25 Sep 2014 05:11:39 -0700</pubDate>
<media:content duration="121" medium="video" type="video/mp4" url="http://videos.usatoday.net/Brightcove2/29906170001/2014/09/29906170001_3805837947001_5-BEST-GUARDS-IN-LAKERS--HISTORY-final.mp4?videoId=3805826070001"/>
<media:group>...</media:group>
<media:keywords>jerry west,derek fisher,Gail Goodrich,losangeleslakers,SMGV,USA Today Sports,Kobe Bryant,video big board,sports,basketball,lakers,magic johnson,nba
</media:keywords>
<media:thumbnail height="90" url="http://videos.usatoday.net/Brightcove2/29906170001/2014/09/29906170001_3805822421001_Screen-Shot-2014-09-25-at-8-06-28-AM.jpg?pubId=29906170001" width="120"/>
<media:thumbnail height="360" url="http://videos.usatoday.net/Brightcove2/29906170001/2014/09/29906170001_3805709286001_Screen-Shot-2014-09-25-at-8-06-28-AM.jpg?pubId=29906170001" width="480"/>
<bc:titleid>3805826070001</bc:titleid>
<bc:duration>121</bc:duration>
<dcterms:valid/>
<bc:accountid>44854217001</bc:accountid>
</item>
I'm using the following SimpleXML_Parser script to pull most of the info out that I need:
<?php
$html = "";
$url = "http://api.brightcove.com/services/library?command=search_videos&any=tag:NBA&output=mrss&media_delivery=http&sort_by=CREATION_DATE:DESC&token=NU-nMdtzfF8z9NNinlAgM4c9S-9BBfKpm6gFISdwyk-AnQ84efFBbQ..";
$xml = simplexml_load_file($url);
for($i = 0; $i < 80; $i++){
$title = $xml->channel->item[$i]->video;
$link = $xml->channel->item[$i]->link;
$title = $xml->channel->item[$i]->title;
$pubDate = $xml->channel->item[$i]->pubDate;
$description = $xml->channel->item[$i]->description;/* The code below starting with $html is where you setup how the parsed data will look on the webpage */
$html .= "<div><h3>$title</h3><br/>$description<p><br/>$pubDate<p><br/>$link<p><br/>$titleid<p><br/></div><iframe width='580' height='360' src='http://link.brightcove.com/services/player/bcpid3742068445001?bckey=/*deleted API key&bctid=$titleid' frameborder='0'></iframe><hr/>";}
echo $html;/* tutorial for this script is here https://www.youtube.com/watch?v=4ZLZkdiKGE0 */?>
What I need to parse out of the feed is the string of numbers assigned to "titleid".
I have tried adding variations on approaches for pulling out child elements, such as:
$titleid = $xml->children('media', true)->div->children('bc', true)->div[$i]->titled;
But I'm not having any luck. I'm sure it's something obvious to a seasoned developer, but again, I'm a newbie.
Any suggestions?
Thanks for any help!

To parse MRSS properly, first call getNamespaces(true) so that the namespaces used anywhere in the document are returned.
Then select the namespace with $xml->channel->item[$i]->children($namespaces['bc']). Finally, you can extract the wanted value from it, in your case titleid:
<?php
$html = "";
$url = "http://api.brightcove.com/services/library?command=search_videos&any=tag:NBA&output=mrss&media_delivery=http&sort_by=CREATION_DATE:DESC&token=NU-nMdtzfF8z9NNinlAgM4c9S-9BBfKpm6gFISdwyk-AnQ84efFBbQ..";
$xml = simplexml_load_file($url);
$namespaces = $xml->getNamespaces(true); // get namespaces
for($i = 0; $i < 80; $i++){
if (!isset($xml->channel->item[$i])) {
break; // the feed has fewer than 80 items
}
$link = $xml->channel->item[$i]->link;
$title = $xml->channel->item[$i]->title;
$pubDate = $xml->channel->item[$i]->pubDate;
$description = $xml->channel->item[$i]->description;
// bc:titleid lives in the "bc" namespace, so select that namespace first
$titleid = $xml->channel->item[$i]->children($namespaces['bc'])->titleid;
echo $titleid .'<br>';
}
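Here is a self-contained sketch of the same idea that can be run without the Brightcove API. The feed is a trimmed, hypothetical copy of the one in the question (the thumbnail URLs are placeholders), and it also reads the media:thumbnail attributes:

```php
<?php
// Trimmed, hypothetical copy of the feed from the question; thumbnail URLs are placeholders.
$mrss = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:bc="http://www.brightcove.tv/link" version="2.0">
  <channel>
    <item>
      <title>5 best guards in Lakers history</title>
      <media:thumbnail height="90" url="http://example.com/small.jpg" width="120"/>
      <media:thumbnail height="360" url="http://example.com/large.jpg" width="480"/>
      <bc:titleid>3805826070001</bc:titleid>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($mrss);
$namespaces = $xml->getNamespaces(true); // prefix => URI for every namespace in use

$rows = [];
foreach ($xml->channel->item as $item) {
    $bc    = $item->children($namespaces['bc']);
    $media = $item->children($namespaces['media']);

    $thumbs = [];
    foreach ($media->thumbnail as $thumb) {
        // url is an un-prefixed attribute, so read it through attributes()
        $thumbs[] = (string) $thumb->attributes()->url;
    }

    $rows[] = [
        'title'   => (string) $item->title,
        'titleid' => (string) $bc->titleid,
        'thumbs'  => $thumbs,
    ];
}

print_r($rows);
```

Casting to string early keeps the values usable in array keys and comparisons, since SimpleXML otherwise hands back objects.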

Issues with API's XML PHP parsing

I am using an API, but the way they set up their returned XML is incorrect, so I need to come up with a solution for parsing it. I am unable to convert to JSON (my preferred return format) because they don't support it. Below I have listed my XML and PHP.
XML Returned by API
<?xml version="1.0" encoding="utf-8"?>
<interface-response>
<Domain>example.com</Domain>
<Code>211</Code>
<Domain>example.net</Domain>
<Code>210</Code>
<Domain>example.org</Domain>
<Code>211</Code>
</interface-response>
Each Code is for the previous domain. I have no idea how to tie these two together and still be able to loop through all of the results returned. There will essentially be one Domain and one Code returned for each Top Level Domain, so a lot of results.
PHP code so far:
<?php
$xml = new SimpleXMLElement($data);
$html .= '<table>';
foreach($xml->children() as $children){
$html .= '<tr>';
$html .= '<td>'.$xml->Domain.'</td>';
if($xml->Code == 211){
$html .= '<td>This domain is not avaliable.</td>';
}elseif($xml->Code == 210){
$html .= '<td>This domain is avaliable.</td>';
}else{
$html .= '<td>I have no idea.</td>';
}
$html .= '<tr>';
}
$html .= '</table>';
echo $html;
?>

If you don't want to deal with crappy XML (I'm not saying XML is crappy in general, but this one is) you could consider something like this:
<?php
$responses = [];
$responses['210'] = 'This domain is avaliable.';
$responses['211'] = 'This domain is not avaliable.';
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<interface-response>
<Domain>example.com</Domain>
<Code>211</Code>
<Domain>example.net</Domain>
<Code>210</Code>
<Domain>example.org</Domain>
<Code>211</Code>
</interface-response>
XML;
$data = (array) simplexml_load_string($xml);
$c = count($data['Domain']);
for($i = 0; $i < $c; $i++)
{
echo $data['Domain'][$i], PHP_EOL;
echo array_key_exists($data['Code'][$i], $responses) ? $responses[$data['Code'][$i]] : 'I have no idea', PHP_EOL;
}
Output
example.com
This domain is not avaliable.
example.net
This domain is avaliable.
example.org
This domain is not avaliable.
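An alternative sketch that does not rely on the array cast: walk the children in document order and attach each Code to the Domain that came just before it. This assumes the API always emits a Domain immediately followed by its Code, as in the sample:

```php
<?php
// Shortened copy of the sample response from the question.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<interface-response>
<Domain>example.com</Domain>
<Code>211</Code>
<Domain>example.net</Domain>
<Code>210</Code>
</interface-response>
XML;

$results = [];   // domain => code
$current = null; // last Domain seen that is still waiting for its Code

foreach (simplexml_load_string($xml)->children() as $node) {
    if ($node->getName() === 'Domain') {
        $current = (string) $node;
    } elseif ($node->getName() === 'Code' && $current !== null) {
        $results[$current] = (int) (string) $node;
        $current = null;
    }
}

foreach ($results as $domain => $code) {
    echo $domain, ': ', $code, PHP_EOL;
}
```

With the pairs in one array, the table from the question can be built in a single loop over $results.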

Parsing XML with multiple namespaces in PHP

I have XML in the following form that I want to parse with PHP (I can't change the format of the XML). Neither SimpleXML nor DOM seem to handle the different namespaces - can anyone give me sample code? The code below gives no results.
<atom:feed>
<atom:entry>
<atom:id />
<otherns:othervalue />
</atom:entry>
<atom:entry>
<atom:id />
<otherns:othervalue />
</atom:entry>
</atom:feed>
$doc = new DOMDocument();
$doc->load($url);
$entries = $doc->getElementsByTagName("atom:entry");
foreach($entries as $entry) {
$id = $entry->getElementsByTagName("atom:id");
echo $id;
$othervalue = $entry->getElementsByTagName("otherns:othervalue");
echo $othervalue;
}

I just want to post an answer to this awful question. Sorry.
Namespaces are irrelevant with DOM here; I just wasn't getting the nodeValue from the element.
$doc = new DOMDocument();
$doc->load($url);
$feed = $doc->getElementsByTagName("entry");
foreach($feed as $entry) {
$id = $entry->getElementsByTagName("id")->item(0)->nodeValue;
echo $id;
$othervalue = $entry->getElementsByTagName("othervalue")->item(0)->nodeValue;
echo $othervalue;
}

You need to register your namespaces. Otherwise SimpleXML will ignore them.
I got this bit of code from the PHP manual and used it in my own project:
$xmlsimple = simplexml_load_string('YOUR XML');
$namespaces = $xmlsimple->getNamespaces(true);
$extensions = array_keys($namespaces);
foreach ($extensions as $extension )
{
$xmlsimple->registerXPathNamespace($extension,$namespaces[$extension]);
}
After that you can use xpath() on $xmlsimple.
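A sketch of how this could look end to end. The XML in the question has no namespace declarations, so the xmlns URIs and element values below are assumptions added purely to make the example parse:

```php
<?php
// Hypothetical feed: the xmlns declarations and text values are invented for the example.
$feed = <<<XML
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" xmlns:otherns="http://example.com/otherns">
  <atom:entry>
    <atom:id>first</atom:id>
    <otherns:othervalue>one</otherns:othervalue>
  </atom:entry>
  <atom:entry>
    <atom:id>second</atom:id>
    <otherns:othervalue>two</otherns:othervalue>
  </atom:entry>
</atom:feed>
XML;

$xml = simplexml_load_string($feed);

// Register every prefix found in the document, as in the answer above
foreach ($xml->getNamespaces(true) as $prefix => $uri) {
    $xml->registerXPathNamespace($prefix, $uri);
}

$pairs = [];
foreach ($xml->xpath('//atom:entry') as $entry) {
    // children($prefix, true) selects child elements by namespace prefix
    $id    = (string) $entry->children('atom', true)->id;
    $other = (string) $entry->children('otherns', true)->othervalue;
    $pairs[$id] = $other;
}

print_r($pairs);
```

Inside the loop the children() calls are used instead of further xpath() calls, so nothing has to be registered again on each $entry.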
