I'm trying to parse an iTunes Atom feed with a PHP script. If you visit the iTunes RSS Generator, you can generate an Atom feed like this:
http://itunes.apple.com/us/rss/topsongs/limit=10/genre=16/explicit=true/xml
which gives an iTunes RSS feed result like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns:im="http://itunes.apple.com/rss" xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<id>http://itunes.apple.com/us/rss/topsongs/limit=10/genre=16/explicit=true/xml</id>
<title>iTunes Store: Top Songs in Soundtrack</title>
<updated>2012-04-01T07:22:41-07:00</updated>
<link rel="alternate" type="text/html" href="http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewTop?id=17&amp;popId=1"/>
<link rel="self" href="http://itunes.apple.com/us/rss/topsongs/limit=10/genre=16/explicit=true/xml"/>
<icon>http://phobos.apple.com/favicon.ico</icon>
<author><name>iTunes Store</name><uri>http://www.apple.com/itunes/</uri></author>
<rights>Copyright 2008 Apple Inc.</rights>
<entry>
<updated>2012-04-01T07:22:41-07:00</updated>
<id im:id="509605055">http://itunes.apple.com/us/album/eyes-open/id509605019?i=509605055&amp;uo=2</id>
<title>Eyes Open - Taylor Swift</title>
<im:name>Eyes Open</im:name>
<link rel="alternate" type="text/html" href="http://itunes.apple.com/us/album/eyes-open/id509605019?i=509605055&amp;uo=2"/>
<im:contentType term="Music" label="Music"><im:contentType term="Track" label="Track"/></im:contentType>
<category term="Soundtrack" scheme="http://itunes.apple.com/us/genre/music-soundtrack/id16?uo=2" label="Soundtrack"/>
<link title="Preview" rel="enclosure" type="audio/x-m4a" href="http://a2.mzstatic.com/us/r1000/116/Music/88/70/a6/mzi.gcauwkkw.aac.p.m4a" im:assetType="preview"><im:duration>30000</im:duration></link>
<im:artist href="http://itunes.apple.com/us/artist/taylor-swift/id159260351?uo=2">Taylor Swift</im:artist>
<im:price amount="1.29000" currency="USD">$1.29</im:price>
<im:image height="55">http://a3.mzstatic.com/us/r1000/069/Music/v4/15/59/19/15591949-a525-99e8-0c50-45697b0ec78b/UMG_cvrart_00602527969206_01_RGB72_1200x1200_12UMGIM10247.55x55-70.jpg</im:image>
<im:image height="60">http://a5.mzstatic.com/us/r1000/069/Music/v4/15/59/19/15591949-a525-99e8-0c50-45697b0ec78b/UMG_cvrart_00602527969206_01_RGB72_1200x1200_12UMGIM10247.60x60-50.jpg</im:image>
<im:image height="170">http://a3.mzstatic.com/us/r1000/069/Music/v4/15/59/19/15591949-a525-99e8-0c50-45697b0ec78b/UMG_cvrart_00602527969206_01_RGB72_1200x1200_12UMGIM10247.170x170-75.jpg</im:image>
<rights>2012 Universal Republic Records, a division of UMG Recordings, Inc.</rights>
<im:releaseDate label="March 20, 2012">2012-03-20T00:00:00-07:00</im:releaseDate>
<im:collection><im:name>The Hunger Games (Songs from District 12 and Beyond)</im:name><link rel="alternate" type="text/html" href="http://itunes.apple.com/us/album/hunger-games-songs-from-district/id509605019?uo=2"/><im:contentType term="Music" label="Music"><im:contentType term="Album" label="Album"/></im:contentType></im:collection>
(etc...)
With the PHP script, I'm able to get results for things like the title, id, and im:image for each entry to use in the script output. What I need is the URL from one of the link elements. Specifically, I need the URL from the "Preview" link:
<link title="Preview" rel="enclosure" type="audio/x-m4a" href="http://a2.mzstatic.com/us/r1000/116/Music/88/70/a6/mzi.gcauwkkw.aac.p.m4a" im:assetType="preview"><im:duration>30000</im:duration></link>
In this case, we would need the a2.mzstatic.com/us/r1000/116/Music/88/70/a6/mzi.gcauwkkw.aac.p.m4a link for use in the script results for each of the 10 entries.
How do I capture that href for the .m4a audio file "Preview" link in the above Atom feed?
Here is a portion of the PHP script where we get the contents of the iTunes Atom url, cycle through the 10 results, and generate HTML for each entry via $rssresults that is called in a site template.
$string = file_get_contents('http://itunes.apple.com/us/rss/topsongs/limit=10/genre=16/explicit=true/xml');
// Remove the colon ":" in the <xxx:yyy> to be <xxxyyy>
$string = preg_replace("/(<\/?)(\w+):([^>]*>)/", "$1$2$3", $string);
if ($f = @fopen($cache_file, 'w')) {
fwrite ($f, $string, strlen($string));
fclose($f);
}
}
$xml = simplexml_load_string($string);
// Output
$rssresults = '';
$count = 1;
$max = 11;
foreach ($xml->entry as $val) {
if ($count < $max) {
$rssresults .= '
<img src="'.$val->imimage[2].'" alt="'.$val->title.'">
// .m4a preview url?
<div><a href=" ">Preview</a></div>
<div><strong>'.$count.'. '.$val->title.'</strong></div>
<div> from '.$val->imcollection->imname.'</div>';
}
$count++;
}
Any ideas on how to add the ".m4a preview url" to the above script for each entry?
Appreciate any help.
In your foreach loop, $val->link[1]["href"] gives you the URL:
foreach ($xml->entry as $val) {
// echo the link of Preview
echo $val->link[1]["href"];
}
Explanation:
Because each entry has multiple link elements, you can access them by array index, so index 1 selects the second link element. Each attribute of an element can be accessed by using its name as a key on that element. Hence $val->link[1]["href"] gives you http://a2.mzstatic.com/us/r1000/116/Music/88/70/a6/mzi.gcauwkkw.aac.p.m4a
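Index-based access depends on the Preview link always being the second link in an entry. A minimal sketch that selects it by its rel="enclosure" attribute instead is shown here; the helper name preview_url and the trimmed sample feed are assumptions made for illustration, not part of the original script.

```php
<?php
// Trimmed-down sample entry; the real feed has many more elements.
$atom = <<<'XML'
<feed xmlns:im="http://itunes.apple.com/rss" xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Eyes Open - Taylor Swift</title>
    <link rel="alternate" type="text/html" href="http://itunes.apple.com/us/album/eyes-open/id509605019"/>
    <link title="Preview" rel="enclosure" type="audio/x-m4a" href="http://a2.mzstatic.com/us/r1000/116/Music/88/70/a6/mzi.gcauwkkw.aac.p.m4a" im:assetType="preview"><im:duration>30000</im:duration></link>
  </entry>
</feed>
XML;

// Hypothetical helper: returns the href of the first rel="enclosure" link, or null if none exists.
function preview_url(SimpleXMLElement $entry): ?string
{
    foreach ($entry->link as $link) {
        if ((string) $link['rel'] === 'enclosure') {
            return (string) $link['href'];
        }
    }
    return null;
}

$xml = simplexml_load_string($atom);
foreach ($xml->entry as $val) {
    echo preview_url($val), "\n";
}
```

Inside the original loop, the returned string could be placed in the href of the Preview anchor.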
I'm using PHP to process XML. How can I get the video id from the YouTube XML feed?
This is what I have:
> $vid['title'] = $video->title; $vid['date'] = $video->updated;
That works, but I also want to get the video id:
$vid['id'] = ?
I use this XML as an example; of course, in practice I use the real feed.
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:yt="http://www.youtube.com/xml/schemas/2015" xmlns:media="http://search.yahoo.com/mrss/" xmlns="http://www.w3.org/2005/Atom">
<link rel="self" href="http://www.youtube.com/feeds/videos.xml?channel_id=UCCXoCcu9Rp7NPbTzIvogpZg&amp;orderby=published"/>
<id>yt:channel:UCCXoCcu9Rp7NPbTzIvogpZg</id>
<yt:channelId>UCCXoCcu9Rp7NPbTzIvogpZg</yt:channelId>
<title>Fox Business</title>
<link rel="alternate" href="https://www.youtube.com/channel/UCCXoCcu9Rp7NPbTzIvogpZg"/>
<author>
<name>Fox Business</name>
<uri>https://www.youtube.com/channel/UCCXoCcu9Rp7NPbTzIvogpZg</uri>
</author>
<published>2008-02-04T12:35:54+00:00</published>
<entry>
<id>yt:video:yt9cwC3bySI</id>
<yt:videoId>yt9cwC3bySI</yt:videoId>
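(1) For <yt:videoId>, you can try the following (plain property access does not reach elements under a namespace prefix, so children() is used with the yt prefix):
$vid['id'] = (string) $video->children('yt', true)->videoId;
(2) For <id>, you can try the following (note that the value keeps its yt:video: prefix):
$vid['id'] = (string) $video->id;
Or, as a last resort:
Read the feed not as XML but as a plain string, and use the PHP string functions to extract the text between <yt:videoId> and </yt:videoId>.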
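The string-based fallback could look roughly like the sketch below. The helper name extract_video_id and the one-line sample are assumptions for illustration; it only returns the first match and does no XML validation.

```php
<?php
// Hypothetical helper: pulls the first <yt:videoId> value out of raw feed text.
function extract_video_id(string $feedText): ?string
{
    if (preg_match('~<yt:videoId>([^<]+)</yt:videoId>~', $feedText, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

// Made-up one-line sample based on the entry shown in the question.
$sample = '<entry><id>yt:video:yt9cwC3bySI</id><yt:videoId>yt9cwC3bySI</yt:videoId></entry>';
echo extract_video_id($sample), "\n";
```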
I am implementing YouTube push notifications and have implemented a webhook. YouTube sends updates in the form of an Atom feed. My problem is that I can't parse that feed.
This is the XML:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:yt="http://www.youtube.com/xml/schemas/2015">
<link rel="hub" href="https://pubsubhubbub.appspot.com" />
<link rel="self" href="https://www.youtube.com/xml/feeds/videos.xml?channel_id=UCaNoTnXcQQt3ody_cLZSihw" />
<title>YouTube video feed</title>
<updated>2018-03-01T07:21:59.144766801+00:00</updated>
<entry>
<id>yt:video:vNQyYJqFopE</id>
<yt:videoId>vNQyYJqFopE</yt:videoId>
<yt:channelId>UCaNoTnXcQQt3ody_cLZSihw</yt:channelId>
<title>Test Video 4</title>
<link rel="alternate" href="https://www.youtube.com/watch?v=vNQyYJqFopE" />
<author>
<name>Testing</name>
<uri>https://www.youtube.com/channel/UCaNoTnXcQQt3ody_cLZSihw</uri>
</author>
<published>2018-03-01T07:21:48+00:00</published>
<updated>2018-03-01T07:21:59.144766801+00:00</updated>
</entry>
<?php
$xml = '<?xml versio......';
$obj = simplexml_load_string($xml);
echo '<pre>';print_r($obj);echo '</pre>';
How do I get the value of the yt:videoId element? I am new to PHP; if I did anything wrong, please correct me.
The XML elements in the yt namespace (e.g. <yt:videoId>) are parsed by simplexml_load_string, but they do not show up in print_r or through plain property access, because those only reach elements without a namespace prefix; prefixed elements have to be requested explicitly. In your case the video id is also present in the <id> element, so an easy workaround is to cut off the yt:video: in front of it.
It also works if you use a direct XPath to the <yt:videoId> element, like this:
echo $obj->xpath('//yt:videoId')[0];
// output: vNQyYJqFopE
xpath() returns an array of matches, so you need to take the first element with [0].
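Another option is SimpleXML's children() method, which returns the child elements that belong to a given namespace. The sketch below passes the namespace URI declared in the feed; the trimmed sample document is an assumption for illustration.

```php
<?php
// Trimmed copy of the notification feed from the question.
$feed = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:yt="http://www.youtube.com/xml/schemas/2015">
  <entry>
    <id>yt:video:vNQyYJqFopE</id>
    <yt:videoId>vNQyYJqFopE</yt:videoId>
    <yt:channelId>UCaNoTnXcQQt3ody_cLZSihw</yt:channelId>
    <title>Test Video 4</title>
  </entry>
</feed>
XML;

$obj = simplexml_load_string($feed);

// Ask for the children of <entry> that live in the yt namespace.
$yt = $obj->entry->children('http://www.youtube.com/xml/schemas/2015');

$videoId   = (string) $yt->videoId;
$channelId = (string) $yt->channelId;

echo $videoId, ' ', $channelId, "\n";
```

Using the namespace URI rather than the yt prefix keeps the code working even if the feed were to declare the same namespace under a different prefix.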
Try this (updated)
$str = $obj->entry->id;
echo substr($str, strpos($str, "video:")+ 6);
Get the channel
$chan = $obj->entry->author->uri;
echo substr($chan , strpos($chan , "channel/")+ 8);
I'm trying to optimize a little php app I wrote for parsing YouTube profiles. Enter a YouTube user name and the app returns a simple listing of the number of videos uploaded, favorites, subscribers and so on by parsing the XML returned by a gdata query for the account profile.
For example this XML:
<?xml version="1.0" encoding="UTF-8" ?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xmlns:gd="http://schemas.google.com/g/2005" xmlns:yt="http://gdata.youtube.com/schemas/2007" gd:etag="W/&quot;C0EHSX47eCp7I2A9Wh5aEU4.&quot;">
<id>tag:youtube.com,2008:user:H5m_qmnr3dHOO8x7m7dtvw</id>
<published>2006-06-15T22:59:11.000Z</published>
<updated>2014-01-07T21:53:58.000Z</updated>
<category scheme="http://schemas.google.com/g/2005#kind" term="http://gdata.youtube.com/schemas/2007#userProfile" />
<category scheme="http://gdata.youtube.com/schemas/2007/channeltypes.cat" term="DIRECTOR" />
<title>epontius</title>
<summary>channel page of epontius</summary>
<link rel="alternate" type="text/html" href="https://www.youtube.com/channel/UCH5m_qmnr3dHOO8x7m7dtvw" />
<link rel="self" type="application/atom+xml" href="https://gdata.youtube.com/feeds/api/users/H5m_qmnr3dHOO8x7m7dtvw?v=2" />
<author>
<name>epontius</name>
<uri>https://gdata.youtube.com/feeds/api/users/epontius</uri>
<yt:userId>H5m_qmnr3dHOO8x7m7dtvw</yt:userId>
</author>
<yt:channelId>UCH5m_qmnr3dHOO8x7m7dtvw</yt:channelId>
<gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.subscriptions" href="https://gdata.youtube.com/feeds/api/users/epontius/subscriptions?v=2" countHint="161" />
<gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.liveevent" href="https://gdata.youtube.com/feeds/api/users/epontius/live/events?v=2" countHint="0" />
<gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.favorites" href="https://gdata.youtube.com/feeds/api/users/epontius/favorites?v=2" countHint="73" />
<gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.contacts" href="https://gdata.youtube.com/feeds/api/users/epontius/contacts?v=2" countHint="184" />
<gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.inbox" href="https://gdata.youtube.com/feeds/api/users/epontius/inbox?v=2" />
<gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.playlists" href="https://gdata.youtube.com/feeds/api/users/epontius/playlists?v=2" />
<gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.uploads" href="https://gdata.youtube.com/feeds/api/users/epontius/uploads?v=2" countHint="26" />
<gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.newsubscriptionvideos" href="https://gdata.youtube.com/feeds/api/users/epontius/newsubscriptionvideos?v=2" />
<gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.recentactivity" href="https://gdata.youtube.com/feeds/api/users/epontius/events?v=2" />
<yt:googlePlusUserId>105085469892080308187</yt:googlePlusUserId>
<yt:location>US</yt:location>
<yt:statistics lastWebAccess="1970-01-01T00:00:00.000Z" subscriberCount="245" videoWatchCount="0" viewCount="0" totalUploadViews="68815" />
<media:thumbnail url="https://yt3.ggpht.com/-N_Et9Qg1APc/AAAAAAAAAAI/AAAAAAAAAAA/VIuQ_GuzA0Q/s88-c-k-no/photo.jpg" />
<yt:userId>H5m_qmnr3dHOO8x7m7dtvw</yt:userId>
<yt:username display="epontius">epontius</yt:username>
</entry>
The app as it stands works fine, but every so often YouTube changes the order of some elements or adds/removes items in the lines, so the app returns incorrect data as I am accessing the 'countHint' attribute.
For example:
$uploadscount = $gd->feedLink[6]->attributes();
$uploads = $uploadscount['countHint'];
echo 'Number of uploads: ' . '<span class="datatext">' . $uploads . '</span>' . '<br />';
Which would return 26 in this case. But if the number or order of the feedLink lines changes, I'd get incorrect information or an error since the index number of the feedLink is hard coded.
Each feedLink seems to have a unique rel= attribute and I was hoping to be able to use xpath and some sort of loop like a foreach to search for a specific rel value (ie. rel = "http://gdata.youtube.com/schemas/2007#user.uploads" ) and then be able to grab its countHint attribute value to assign it to a variable or at least grab its node index number (ie 6 in the case of the uploads) to then access the appropriate countHint attribute. And then repeat this for each of the feedLink lines and attributes for the data I want to grab.
That way it will be more accurate and dynamic in the event these feedLink lines are modified.
I just can't get my head around how to do it. The feedLink elements are empty elements in a different namespace (gd) and there are multiples which makes using xpath kind of confusing for me. I keep returning empty values and getting lost.
Any suggestions would be appreciated.
Ok. Think I'm getting somewhere thanks to suggestions.
foreach ($gd->feedLink as $feedLink) {
$attributes = $feedLink->attributes();
if (strpos($attributes['rel'], '#user.uploads')) {
$uploads = $attributes['countHint'];
}
elseif (strpos($attributes['rel'], '#user.favorites')) {
$favs = $attributes['countHint'];
}
elseif (strpos($attributes['rel'], '#user.subscriptions')) {
$subscriptions = $attributes['countHint'];
}
elseif (strpos($attributes['rel'], '#user.liveevent')) {
$liveevents = $attributes['countHint'];
}
elseif (strpos($attributes['rel'], '#user.contacts')) {
$friends = $attributes['countHint'];
}
}
That will return the proper values I'm looking for, but I'm worried now that I'm doing extra processing doing the loop, since I would assume each loop tests each line regardless of whether it has already found that value in a previous loop?
You're on the right path with using foreach to loop over the XML data. I would just do a strpos() on each feedLink until I found the uploads element, and then set $uploadscount.
Something like this, maybe:
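foreach ($gd->feedLink as $feedLink) {
    // Variable names are case-sensitive: $feedLink, not $feedlink
    $attributes = $feedLink->attributes();
    // Compare against false explicitly, since strpos() returns an offset that may be 0
    if (strpos($attributes['rel'], '#user.uploads') !== false) {
        $uploadscount = $attributes;
        // Stop looping once the uploads element has been found
        break;
    }
}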
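If several counts are needed, a single pass over the feedLink elements can collect every countHint keyed by its rel value, so nothing depends on element order and each element is inspected only once. This is a sketch under assumptions: the helper name count_hints and the trimmed sample profile are made up for illustration.

```php
<?php
// Trimmed sample of the profile XML from the question.
$profile = <<<'XML'
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005">
  <gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.subscriptions" href="https://gdata.youtube.com/feeds/api/users/epontius/subscriptions?v=2" countHint="161" />
  <gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.inbox" href="https://gdata.youtube.com/feeds/api/users/epontius/inbox?v=2" />
  <gd:feedLink rel="http://gdata.youtube.com/schemas/2007#user.uploads" href="https://gdata.youtube.com/feeds/api/users/epontius/uploads?v=2" countHint="26" />
</entry>
XML;

// Hypothetical helper: maps the part of rel after '#' to its countHint (null when absent).
function count_hints(SimpleXMLElement $entry): array
{
    $counts = [];
    foreach ($entry->children('http://schemas.google.com/g/2005')->feedLink as $feedLink) {
        $attrs = $feedLink->attributes();
        $rel   = (string) $attrs['rel'];
        $pos   = strrpos($rel, '#');
        $key   = $pos === false ? $rel : substr($rel, $pos + 1);
        $counts[$key] = isset($attrs['countHint']) ? (int) $attrs['countHint'] : null;
    }
    return $counts;
}

$counts = count_hints(simplexml_load_string($profile));
echo 'Number of uploads: ', $counts['user.uploads'], "\n";
```

The other values are then available as $counts['user.subscriptions'], $counts['user.favorites'], and so on, without a chain of elseif branches.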
First, I am a php newbie. I have looked at the question and solution here. For my needs however, the parsing does not go deep enough into the various articles.
A small sampling of my rss feed reads like this:
<channel>
<atom:link href="http://mywebsite.com/rss" rel="self" type="application/rss+xml" />
<title>My Web Site</title>
<description>My Feed</description>
<link>http://mywebsite.com/</link>
<image>
<url>http://mywebsite.com/views/images/banner.jpg</url>
<title>My Title</title>
<link>http://mywebsite.com/</link>
<description>Visit My Site</description>
</image>
<item>
<title>Article One</title>
<guid isPermaLink="true">http://mywebsite.com/details/e8c5106</guid>
<link>http://mywebsite.com/geturl/e8c5106</link>
<comments>http://mywebsite.com/details/e8c5106#comments</comments>
<pubDate>Wed, 09 Jan 2013 02:59:45 -0500</pubDate>
<category>Category 1</category>
<description>
<![CDATA[<div>
<img src="http://mywebsite.com/myimages/1521197-main.jpg" width="120" border="0" />
<ul><li>Poster: someone's name;</li>
<li>PostDate: Tue, 08 Jan 2013 21:49:35 -0500</li>
<li>Rating: 5</li>
<li>Summary:Lorem ipsum dolor </li></ul></div><div style="clear:both;">]]>
</description>
</item>
<item>..
The image links that I want to parse out are the ones way inside each Item > Description
The code in my php file reads:
<?php
$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1&r=ceddfb43483437b1ed08ab8a72cbc3d5');
$imgs = $xml->xpath('/item/description/img');
foreach($imgs as $image) {
echo $image->src;
}
?>
Can someone please help me figure out how to configure the php code above?
Also a very newbie question... once I get the resulting image urls, how can I display the images in a row on my html?
Many thanks!!!
Hernando
The <img> tags inside that RSS feed are not actually elements of the XML document, contrary to the syntax highlighting on this site - they are just text inside the <description> element which happen to contain the characters < and >.
The string <![CDATA[ tells the XML parser that everything from there until it encounters ]]> is to be treated as a raw string, regardless of what it contains. This is useful for embedding HTML inside XML, since the HTML tags wouldn't necessarily be valid XML. It is equivalent to escaping the whole HTML (e.g. with htmlspecialchars) so that the <img> tags would look like &lt;img&gt;. (I went into more technical details on another answer.)
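That equivalence can be checked with a small sketch. The two one-line sample documents are made up for illustration; both should yield the same text once the element is cast to a string.

```php
<?php
// Same content, once wrapped in CDATA and once entity-escaped.
$withCdata = '<description><![CDATA[<img src="a.jpg" />]]></description>';
$escaped   = '<description>&lt;img src="a.jpg" /&gt;</description>';

// Casting a SimpleXMLElement to string returns its text content.
$a = (string) simplexml_load_string($withCdata);
$b = (string) simplexml_load_string($escaped);

var_dump($a === $b);
echo $a, "\n";
```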
So to extract the images from the RSS requires two steps: first, get the text of each <description>, and second, find all the <img> tags in that text.
$xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1&r=ceddfb43483437b1ed08ab8a72cbc3d5');
$descriptions = $xml->xpath('//item/description');
foreach ( $descriptions as $description_node ) {
// The description may not be valid XML, so use a more forgiving HTML parser mode
$description_dom = new DOMDocument();
$description_dom->loadHTML( (string)$description_node );
// Switch back to SimpleXML for readability
$description_sxml = simplexml_import_dom( $description_dom );
// Find all images, and extract their 'src' param
$imgs = $description_sxml->xpath('//img');
foreach($imgs as $image) {
echo (string)$image['src'];
}
}
I don't have much experience with XPath, but you could try the following:
$imgs = $xml->xpath('//item//img');
This selects all img elements that are inside item elements, regardless of whether there are other elements in between. The leading // searches for item anywhere in the document, not just directly under the root; otherwise, you'd need something like /rss/channel/item.... Note that this only matches img elements that are actual XML nodes, so it will not find markup that sits inside a CDATA section.
As for displaying the images: just output <img> tags followed by line breaks, like so:
foreach($imgs as $image) {
    // Attributes are read with array syntax, not as properties
    echo '<img src="' . $image['src'] . '" /><br />';
}
The preferred way would be to use CSS instead of <br>-tags, but I think they are simpler for a start.
I have a small site in PHP (which is more static than dynamic, anyway). I see that many sites publish RSS feeds.
I wonder whether it is possible to load such an RSS feed, filter it by the site's theme keywords, and display it as a small thematic news column using PHP.
Is this task complicated? Where should I start? I am completely new to the field, so sorry if such questions have already been answered.
Update: There were a few errors in this answer's code. Thanks to @obelizsk for identifying them; I've since updated the answer.
Given an RSS feed:
<rss version="2.0">
<channel>
<title>(Title)</title>
<description>(Description)</description>
<link>http://www.link.to/the/feed/</link>
<item>
<title> New RSS Creation Tool </title>
<description> FeedForAll generates rss feeds so webmasters do not need to struggle with feed creation </description>
<link> http://www.feedforall.com </link>
<pubDate> Aug, 22 2004 00:12:30 EST </pubDate>
<category> software </category>
</item>
</channel>
</rss>
You could scan the title, description and/or category tags for keywords you're interested in.
Let's say you have an array in your PHP script, i.e.
$keywords = array("php", "mysql", "open source");
Then, using SimpleXML you can parse the RSS feed:
function has_keywords($haystack, $wordlist)
{
$found = false;
foreach ($wordlist as $w)
{
if (stripos($haystack, $w) !== false) {
$found = true;
break;
}
}
return $found;
}
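$rss = simplexml_load_file("http://www.mywebsite.com/my/rss/feed/");
// Start with an empty list so $news is defined even when nothing matches
$news = array();
foreach ($rss->channel->item as $i)
{
    if (
        has_keywords($i->title, $keywords)
        || has_keywords($i->description, $keywords)
        || has_keywords($i->category, $keywords)
    )
    {
        // Cast to string so $news holds plain text rather than SimpleXMLElement objects
        $news[] = array
        (
            "title" => (string) $i->title,
            "description" => (string) $i->description,
            "link" => (string) $i->link
        );
    }
}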
This will provide you with an array $news, populated with the data you've selected through the keyword check.
You can render this with any HTML code you want, by simply iterating through $news.
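As a rough sketch of that rendering step, the loop below turns $news into an HTML list. The helper name render_news, the markup, and the sample item are assumptions for illustration; values are passed through htmlspecialchars because feed content should not be trusted as safe HTML.

```php
<?php
// Sample data standing in for the $news array built by the keyword filter.
$news = [
    [
        'title'       => 'New RSS Creation Tool',
        'description' => 'FeedForAll generates rss feeds',
        'link'        => 'http://www.feedforall.com',
    ],
];

// Hypothetical helper: turns the filtered items into an HTML list.
function render_news(array $news): string
{
    $html = "<ul class=\"news\">\n";
    foreach ($news as $item) {
        $html .= sprintf(
            "  <li><a href=\"%s\">%s</a><br />%s</li>\n",
            htmlspecialchars(trim((string) $item['link'])),
            htmlspecialchars(trim((string) $item['title'])),
            htmlspecialchars(trim((string) $item['description']))
        );
    }
    return $html . "</ul>\n";
}

echo render_news($news);
```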
On a side note, you can also perform the same task with JavaScript and XMLHttpRequest, with no need of PHP intervention. Load the feed as an XML and go with the same procedure. To render the data, you can use document.createElement() to append the child nodes containing the information you've requested.