I am using PHP and simpleXML to read the following rss feed:
http://feeds.bbci.co.uk/news/england/rss.xml
I can get most of the information I want like so:
$rss = simplexml_load_file('http://feeds.bbci.co.uk/news/england/rss.xml');
echo '<h1>'. $rss->channel->title . '</h1>';
foreach ($rss->channel->item as $item) {
echo '<h2>' . $item->title . "</h2>";
echo "<p>" . $item->pubDate . "</p>";
echo "<p>" . $item->description . "</p>";
}
But how would I output the thumbnail image that is in the following tag:
<media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/51078000/jpg/_51078953_226alanpotbury.jpg"/>
As you already know, SimpleXML lets you select a node's child using the object property operator -> or a node's attribute using the array access ['name']. That's convenient, but it only works if what you select belongs to the same namespace as the node you start from.
If you want to "hop" from one namespace to another, you can use the children() or attributes() methods. In your case this is a bit trickier, because <item/> is in the global namespace, the node you're looking for is in the "media" namespace*, and its attributes are in the global namespace again (they are not prefixed). So instead of the plain object/array notation, you'll have to "hop" twice:
foreach ($rss->channel->item as $item)
{
// we load the attributes into $thumbAttr
// you can either use the namespace prefix
$thumbAttr = $item->children('media', true)->thumbnail->attributes();
// or preferably the namespace name, read note below for an explanation
$thumbAttr = $item->children('http://search.yahoo.com/mrss/')->thumbnail->attributes();
echo $thumbAttr['url'];
}
*Note
I refer to the namespace as the "media" namespace, but that's not really correct. The namespace name is http://search.yahoo.com/mrss/, and "media" is just a prefix, a sort of alias if you will. What's important to keep in mind is that http://search.yahoo.com/mrss/ is the real name of the namespace. At some point your RSS provider might decide to change the prefix to, say, "yahoo", and your script will stop working if it refers to the "media" prefix. If you use the namespace name instead, it will keep working no matter what the prefix is.
SimpleXML is pretty bad at handling namespaces. You have two choices. The simplest hack is to read the contents of the feed into a string and strip the namespace prefix:
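To make the prefix-versus-URI point concrete, here is a minimal sketch. The two tiny item documents and the thumbnailUrl() helper are hypothetical; only the MRSS namespace URI comes from the feed. The same lookup by URI succeeds whichever prefix the document uses, while the lookup by the "media" prefix stops matching once the prefix changes.

```php
<?php
// Hypothetical sketch: two copies of the same item that differ only in the
// prefix bound to the MRSS namespace URI.
const MRSS_NS = 'http://search.yahoo.com/mrss/';

$withMedia = '<item xmlns:media="http://search.yahoo.com/mrss/">'
    . '<media:thumbnail url="http://example.com/a.jpg"/></item>';
$withYahoo = '<item xmlns:yahoo="http://search.yahoo.com/mrss/">'
    . '<yahoo:thumbnail url="http://example.com/a.jpg"/></item>';

// Hypothetical helper: returns the thumbnail URL, or '' when no thumbnail
// exists in the requested namespace (given as a URI or as a prefix).
function thumbnailUrl(SimpleXMLElement $item, string $ns, bool $isPrefix): string
{
    $children = $item->children($ns, $isPrefix);
    if (!isset($children->thumbnail)) {
        return '';
    }
    // attributes() with no argument hops back to the unprefixed attributes
    return (string) $children->thumbnail->attributes()['url'];
}

$mediaItem = simplexml_load_string($withMedia);
$yahooItem = simplexml_load_string($withYahoo);

$byUriMedia    = thumbnailUrl($mediaItem, MRSS_NS, false); // lookup by URI
$byUriYahoo    = thumbnailUrl($yahooItem, MRSS_NS, false); // still by URI
$byPrefixYahoo = thumbnailUrl($yahooItem, 'media', true);  // prefix no longer exists

var_dump($byUriMedia, $byUriYahoo, $byPrefixYahoo);
```

Note that reading ['url'] directly on the thumbnail element returned by children() would look for the attribute in the MRSS namespace and find nothing, which is why the sketch calls attributes() first.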
$feed = file_get_contents('http://feeds.bbci.co.uk/news/england/rss.xml');
$feed = str_replace(array('<media:', '</media:'), array('<', '</'), $feed);
$rss = simplexml_load_string($feed);
...
Now you can access the thumbnail element directly.
The more elegant (not really) method is to find out which URI the namespace uses. If you look at the source of http://feeds.bbci.co.uk/news/england/rss.xml, you'll see that the media prefix is bound to http://search.yahoo.com/mrss/.
Now you can pass this URI to the children() method of a SimpleXMLElement to get the contents of the media:thumbnail element:
$rss = simplexml_load_file('http://feeds.bbci.co.uk/news/england/rss.xml');
foreach ($rss->channel->item as $item) {
$media = $item->children('http://search.yahoo.com/mrss/');
...
}
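As an alternative to walking each item with children(), SimpleXML's xpath() can do the namespace lookup, provided you first register a prefix for the URI. This is a different technique from the answer above, shown as a minimal sketch on a made-up three-item feed (not the real BBC data). The prefix "m" is our own choice and does not need to match the feed's "media" prefix.

```php
<?php
// Hypothetical sample feed; the second item has no thumbnail.
$xmlString = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item><title>One</title><media:thumbnail url="http://example.com/1.jpg"/></item>
    <item><title>Two</title></item>
    <item><title>Three</title><media:thumbnail url="http://example.com/3.jpg"/></item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($xmlString);

// Bind a prefix of our own choosing to the namespace URI found in the feed source.
$rss->registerXPathNamespace('m', 'http://search.yahoo.com/mrss/');

// One query for the url attribute of every thumbnail; items without one are skipped.
$urls = array_map('strval', $rss->xpath('//item/m:thumbnail/@url'));

print_r($urls);
```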
Related
I am trying to utilize simplexml to convert an iTunes RSS Feed to JSON so I can better parse it. The issue I am having is that it is not coming back as correctly formatted JSON.
$feed_url = 'https://podcasts.subsplash.com/c2yjpyh/podcast.rss';
$feed_contents = file_get_contents($feed_url);
$xml = simplexml_load_string($feed_contents);
$podcasts = json_decode(json_encode($xml));
print_r($podcasts);
Is there a better way to be attempting this to get the correct result?
Thanks to IMSoP for pointing me in the right direction! This took a bit of studying but the solution ends up being very simple! Instead of trying to convert to a JSON format, just use SimpleXML. However, due to the namespaces, it does require an additional line to map the itunes: prefix.
So in my iTunes feed RSS, the following line exists: xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd". We just reference this URI to make accessing the values very easy. Here is a quick example:
$rss = simplexml_load_file('https://podcasts.example.com/podcast.rss');
foreach ($rss->channel->item as $item){
// Now we define the map for the itunes: namespace
$itunes = $item->children('http://www.itunes.com/dtds/podcast-1.0.dtd');
// This is a value WITHOUT the itunes: namespace
$title = $item->title;
// This is a value WITH the itunes: namespace
$author = $itunes->author;
echo $title . '<br>';
echo $author . '<br>';
}
The other little issue that I ran into was getting attributes such as the URL for images and audio links. That is accomplished by using the attributes() method like so:
// Attributes of an element WITH the itunes: namespace
$image = $itunes->image->attributes();
// Attributes of an element WITHOUT the itunes: namespace
$audio = $item->enclosure->attributes();
// To echo these we simply add the desired attribute name in `[]`:
echo $image['href'] . '<br>';
echo $audio['url'] . '<br>';
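The pieces above can be combined into one loop. This is a minimal sketch on a made-up one-episode feed (titles, names and URLs are hypothetical); only the itunes namespace URI comes from the answer. If JSON is still wanted afterwards, building a plain array of strings first and encoding that avoids the namespaced values silently disappearing, which is what happens when a SimpleXMLElement is passed straight to json_encode().

```php
<?php
// Hypothetical sample feed with one episode.
$xmlString = <<<XML
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <item>
      <title>Episode 1</title>
      <itunes:author>Jane Doe</itunes:author>
      <itunes:image href="http://example.com/ep1.jpg"/>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="1234"/>
    </item>
  </channel>
</rss>
XML;

const ITUNES_NS = 'http://www.itunes.com/dtds/podcast-1.0.dtd';

$rss = simplexml_load_string($xmlString);
$episodes = [];
foreach ($rss->channel->item as $item) {
    // Children in the itunes namespace, looked up by URI
    $itunes = $item->children(ITUNES_NS);
    $episodes[] = [
        'title'  => (string) $item->title,                            // no namespace
        'author' => (string) $itunes->author,                         // itunes:author
        'image'  => (string) $itunes->image->attributes()['href'],   // unprefixed attribute
        'audio'  => (string) $item->enclosure['url'],                 // no namespace at all
    ];
}

// Plain strings encode cleanly.
echo json_encode($episodes), PHP_EOL;
```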
I wrote code that gets data from http://ax.itunes.apple.com/WebObjects/MZStoreServices.woa/ws/RSS/topsongs/limit=10/xml
for ($i = 0; $i < 10; $i++) {
    $title = $xml->entry[$i]->title;    // works
    $name = $xml->entry[$i]->im:name;   // does not work
    $html .= "<br />$title<hr />";
}
echo $html;
The problem is the im: prefix: I cannot get its data. How can I solve it?
You cannot access <im:name> that way.
im: is a namespace prefix: to work with namespaced tags you have to use a specific syntax.
You can retrieve all of the document's namespace URIs using:
$namespaces = $xml->getDocNamespaces();
and you will obtain this array:
Array
(
[im] => http://itunes.apple.com/rss
[] => http://www.w3.org/2005/Atom
)
Each array key is a namespace prefix and each array value is a namespace URI. The URI with an empty key is the document's default namespace.
SimpleXML does not have the best syntax for working with namespaces (IMO DOMDocument is a better choice). In your case, you have to do it this way:
$children = $xml->entry[$i]->children( 'im', True );
echo $children->artist . '<br>';
echo $children->name . '<br>';
Result for ->entry[0]:
Prince & The Revolution
Purple Rain
The second parameter of ->children(), True, means “treat the first parameter as a prefix”; alternatively, you can pass the complete namespace URI:
$children = $xml->entry[$i]->children( 'http://itunes.apple.com/rss' );
... ♪ I only wanted to see you laughing in the purple rain ♪ ...
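Here is a self-contained sketch of the same steps, run against a trimmed, made-up stand-in for the Apple feed (one entry, with the artist and song name taken from the result shown above). It reads the declared namespaces with getDocNamespaces(), then reaches the im: children by URI. The <entry> elements sit in the default Atom namespace, which plain -> access still matches because they carry no prefix.

```php
<?php
// Hypothetical, trimmed stand-in for the iTunes top-songs feed.
$xmlString = <<<XML
<feed xmlns:im="http://itunes.apple.com/rss" xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <im:name>Purple Rain</im:name>
    <im:artist>Prince &amp; The Revolution</im:artist>
  </entry>
</feed>
XML;

$xml = simplexml_load_string($xmlString);

// Prefix => URI map for the namespaces declared on the root element.
$namespaces = $xml->getDocNamespaces();

$rows = [];
foreach ($xml->entry as $entry) {
    // Hop into the im namespace by URI, so a renamed prefix would not matter.
    $im = $entry->children('http://itunes.apple.com/rss');
    $rows[] = (string) $im->name . ' by ' . (string) $im->artist;
}

print_r($namespaces);
print_r($rows);
```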
I'm trying to process an RSS feed using PHP, and there are some tags such as 'itunes:image' which I need to process. The code I'm using is below, and for some reason these elements are not returning any value: the output's length is 0.
How can I read these tags and get their attributes?
$f = $_REQUEST['feed'];
$feed = new DOMDocument();
$feed->load($f);
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
foreach($items as $key => $item)
{
$title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
$description = $item->getElementsByTagName('description')->item(0)->textContent; // textContent
$arrt = $item->getElementsByTagName('itunes:image');
print_r($arrt);
}
getElementsByTagName is specified by DOM, and PHP is just following that. It doesn't consider namespaces. Instead, use getElementsByTagNameNS, which requires the full namespace URI (not the prefix). This appears to be http://www.itunes.com/dtds/podcast-1.0.dtd*. So:
$img = $item->getElementsByTagNameNS('http://www.itunes.com/dtds/podcast-1.0.dtd', 'image');
// Set a preemptive fallback, then set the value if the check passes.
// A DOMNodeList is always truthy, so check its length instead.
$urlImage = '';
if ($img->length > 0) {
    $urlImage = $img->item(0)->getAttribute('href');
}
Or put the namespace in a constant.
You might be able to get away with simply removing the prefix and getting all image tags of any namespace with getElementsByTagName.
Make sure to check whether a given item has an itunes:image element at all (example now given); in the example podcast, some don't, and I suspect that was also giving you trouble. (If there's no href attribute, getAttribute will return either null or an empty string per the DOM spec without erroring out.)
*In case you're wondering, there is no actual DTD file hosted at that location, and there hasn't been for about ten years.
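Put together in a full loop, the answer's approach might look like the minimal sketch below. The two-item feed is made up, with one item deliberately lacking <itunes:image>. It loads from a string with loadXML() rather than from $_REQUEST['feed'], purely so the example runs on its own.

```php
<?php
const ITUNES_NS = 'http://www.itunes.com/dtds/podcast-1.0.dtd';

// Hypothetical sample feed; the second item has no artwork.
$xmlString = <<<XML
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <item><title>Has art</title><itunes:image href="http://example.com/a.jpg"/></item>
    <item><title>No art</title></item>
  </channel>
</rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($xmlString);

$images = [];
foreach ($feed->getElementsByTagName('item') as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->textContent;
    // Namespace URI plus local name; the "itunes" prefix plays no part here.
    $img = $item->getElementsByTagNameNS(ITUNES_NS, 'image');
    // Fall back to '' for items without an itunes:image element.
    $images[$title] = $img->length > 0 ? $img->item(0)->getAttribute('href') : '';
}

print_r($images);
```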
<?php
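$rss_feed = simplexml_load_file("url link");
if (!empty($rss_feed)) {
    foreach ($rss_feed->channel->item as $feed_item) {
        // <itunes:image> is a child of each <item>; its href attribute is unprefixed
        $itunes = $feed_item->children('itunes', true);
        if (isset($itunes->image)) {
            echo $itunes->image->attributes()->href;
        }
    }
}
?>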
When trying to parse an XML document in PHP, nothing is returned.
The XML document I'm trying to use:
http://cdn.content.easports.com/media2011/fifa11zoneplayer/25068538/632A0001_10_ZONE_PLAYER_iUa.xml
The code I have tried:
$player = simplexml_load_file('http://cdn.content.easports.com/media2011/fifa11zoneplayer/25068538/632A0001_10_ZONE_PLAYER_iUa.xml');
foreach ($player->PlayerName as $playerInfo) {
echo $playerInfo['firstName'];
}
I have also tried:
$player = simplexml_load_file('http://cdn.content.easports.com/media2011/fifa11zoneplayer/25068538/632A0001_10_ZONE_PLAYER_iUa.xml');
echo "Name: " . $player->PlayerName[0]['firstName'];
What do I need to change for the attributes to show?
You might try to print_r the whole structure yourself to find what you need:
var_dump($player->Player->PlayerName->Attrib['value']->__toString());
//⇒ string(7) "Daniele"
To list all "values" (firstName, lastName, ...) you need to loop over all children and read their attributes:
$xml = simplexml_load_file('http://cdn.content.easports.com/media2011/fifa11zoneplayer/25068538/632A0001_10_ZONE_PLAYER_iUa.xml');
foreach ($xml as $player) {
foreach ($player->PlayerName->children() as $attrib) {
echo $attrib['name'] . ': ' . $attrib['value'] . PHP_EOL;
}
}
Output:
firstName: Daniele
lastName: Viola
commonName: Viola D.
commentaryName:
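If you need those name/value pairs later, collecting them into an associative array makes lookups easy. This is a minimal sketch on a hypothetical, trimmed stand-in for the EA document: the <Player>/<PlayerName>/<Attrib> nesting follows the code above, the values follow the output above, and the root name <Root> is a placeholder.

```php
<?php
// Hypothetical stand-in; the real document's root element name is not shown here.
$xmlString = <<<XML
<Root>
  <Player>
    <PlayerName>
      <Attrib name="firstName" value="Daniele"/>
      <Attrib name="lastName" value="Viola"/>
      <Attrib name="commonName" value="Viola D."/>
      <Attrib name="commentaryName" value=""/>
    </PlayerName>
  </Player>
</Root>
XML;

$xml = simplexml_load_string($xmlString);

// Turn the name/value attribute pairs into a lookup table.
$playerName = [];
foreach ($xml->Player->PlayerName->Attrib as $attrib) {
    $playerName[(string) $attrib['name']] = (string) $attrib['value'];
}

echo $playerName['firstName'], ' ', $playerName['lastName'], PHP_EOL;
```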
This does not work, since you are trying to access an attribute and not a node value.
You might also run into problems because the XML is not "valid" for SimpleXML. See my blog post about the issues with parsing XML with PHP here: http://dracoblue.net/dev/gotchas-when-parsing-xml-html-with-php/
If you use my Craur ( https://github.com/DracoBlue/Craur ) library instead, it will look like this:
$xml_string = file_get_contents('http://cdn.content.easports.com/media2011/fifa11zoneplayer/25068538/632A0001_10_ZONE_PLAYER_iUa.xml');
$craur = Craur::createFromXml($xml_string);
echo $craur->get('Player.PlayerName.Attrib#value'); // works since the first attrib entry is the name
If you want to be sure about the attribute (or select another one) use:
$xml_string = file_get_contents('http://cdn.content.easports.com/media2011/fifa11zoneplayer/25068538/632A0001_10_ZONE_PLAYER_iUa.xml');
$craur = Craur::createFromXml($xml_string);
foreach ($craur->get('Player.PlayerName.Attrib[]') as $attribute)
{
if ($attribute->get('#name') == 'firstName')
{
echo $attribute->get('#value');
}
}
I'm trying to read a large XML file (about 40 MB) and use its data to update my application's database.
It seems I've found a good compromise in terms of elapsed time and memory using XMLReader and simplexml_import_dom(), but I can't get the value of attributes with a colon in their name, for example <g:attr_name>.
If I simply use the $reader->read() function for each "product" node, I can retrieve the value as $reader->value, but if I expand() the node and copy it with $doc->importNode, these attributes are ignored.
$reader = new XMLReader();
$reader->open(__XML_FILE__);
$doc = new DOMDocument;
while ($reader->read()) {
switch ($reader->nodeType) {
case (XMLREADER::ELEMENT):
if($reader->localName=="product"){
$node = simplexml_import_dom($doc->importNode($reader->expand(), true));
echo $node->attr_name."<br><br>";
$reader->next('product');
}
}
}
I'm probably missing something... any advice would be really appreciated!
Thanks.
Attributes with colons in their name have a namespace.
The part before the colon is a prefix that is registered to some namespace (usually in the root node). To access the namespaced attributes of a SimpleXMLElement, you have to pass the namespace to the attributes() method:
$attributes = $element->attributes('some-namespace'); // or
$attributes = $element->attributes('g', TRUE); // and then
echo $attributes['name'];
The same applies to element children of a node. Access them via the children() method:
$children = $element->children('some-namespace'); // or
$children = $element->children('g', TRUE); // and then
echo $children->elementName;
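A minimal sketch of both calls on a hypothetical product element. The g: URI used here is assumed to be the one Google product feeds declare; check the xmlns:g attribute in your own file. The sketch reads a namespaced attribute by URI and by prefix, an unprefixed attribute with plain array access, and a namespaced child element.

```php
<?php
// Assumed namespace URI for the g: prefix (verify against your file's xmlns:g).
const G_NS = 'http://base.google.com/ns/1.0';

// Hypothetical product element.
$xmlString = <<<XML
<product xmlns:g="http://base.google.com/ns/1.0" g:id="42" sku="A-1">
  <g:attr_name>Blue widget</g:attr_name>
</product>
XML;

$element = simplexml_load_string($xmlString);

// Namespaced attribute: by URI, or by prefix with TRUE as the second argument.
$gId = (string) $element->attributes(G_NS)['id'];
$gIdByPrefix = (string) $element->attributes('g', true)['id'];

// Unprefixed attribute: plain array access still works.
$sku = (string) $element['sku'];

// Namespaced child element: same idea with children().
$attrName = (string) $element->children(G_NS)->attr_name;

echo "$gId $sku $attrName", PHP_EOL;
```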
On a sidenote, if you want to import this data to your database, you can also try to do so directly:
http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html#xml-5.1-importing
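Applied to the streaming setup in the question, a minimal sketch might look like this. The small products document is hypothetical, read from a string with XMLReader::XML() instead of __XML_FILE__, and the g: URI is assumed as above. The expanded node keeps its namespace after importNode(), so the fix is simply to ask for the g children by URI. The loop advances only with next('product'), because calling next() inside a read() loop lets the following read() step into the next product and skip it.

```php
<?php
// Assumed namespace URI for the g: prefix.
const G_NS = 'http://base.google.com/ns/1.0';

// Hypothetical stand-in for the large feed.
$xmlString = <<<XML
<products xmlns:g="http://base.google.com/ns/1.0">
  <product><g:attr_name>First</g:attr_name></product>
  <product><g:attr_name>Second</g:attr_name></product>
  <product><g:attr_name>Third</g:attr_name></product>
</products>
XML;

$reader = new XMLReader();
$reader->XML($xmlString);
$doc = new DOMDocument();
$names = [];

// Advance to the first <product> element.
while ($reader->read() && $reader->localName !== 'product') {
    // keep reading
}

// Visit each sibling <product> using next() only.
while ($reader->localName === 'product') {
    $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
    // g:attr_name is in the g namespace, so hop into it by URI.
    $names[] = (string) $node->children(G_NS)->attr_name;
    $reader->next('product');
}
$reader->close();

print_r($names);
```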