PHP xml array parsing data - php

Hey all, I have this type of XML that I am trying to get data from. This is just a snippet of the larger XML:
<entry>
<id>http://www.google.com/calendar/feeds/[Letters/numbers here]group.calendar.google.com/public/basic/[Letters/numbers here]</id>
<published>2013-08-01T13:40:24.000Z</published>
<updated>2013-08-01T13:40:24.000Z</updated>
<title type='html'>[Title Here]</title>
<summary type='html'>When: Tue Sep 24, 2013 7am</summary>
<content type='html'>When: Tue Sep 24, 2013 7am
<br />Event Status: confirmed
</content>
<link rel='alternate' type='text/html' href='https://www.google.com/calendar/event?eid=[Letters/numbers here]' title='alternate'/>
<link rel='self' type='application/atom+xml' href='https://www.google.com/calendar/feeds/[Letters/numbers here]group.calendar.google.com/public/basic/[Letters/numbers here]'/>
<author>
<name>[email here]</name>
<email>[email here]</email>
</author>
</entry>
etc... etc....
Currently I can get both published and updated just fine by doing the following:
<?php
$url = strtolower($_GET['url']);
$doc = new DOMDocument();
$doc->load('http://www.google.com/calendar/feeds/[number/letters here].calendar.google.com/public/basic');
$entries = $doc->getElementsByTagName("entry");
foreach ($entries as $entry) {
$tmpPublished = $entry->getElementsByTagName("published");
$published = $tmpPublished->item(0)->nodeValue;
$tmpUpdated = $entry->getElementsByTagName("updated");
$updated = $tmpUpdated->item(0)->nodeValue;
}
?>
However, I am unsure how to get data out of an attribute of a child element, link in this case.
So I need to get
link->href
I would imagine it would be:
$tmpLink = $entry->getElementsByTagName("link");
$link = $tmpLink->item( 2 )->nodeValue;
Any help would be great!

You can use:
$links = $doc->getElementsByTagName("link");
foreach ($links as $link) {
$href = $link->getAttribute("href");
}
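if you want to get the href. Note that this loops over every link in the document; call getElementsByTagName on $entry instead of $doc to stay inside one entry. I hope I understood what you wanted :)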
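If you only need one particular link per entry, a minimal sketch along these lines might help. It assumes the link you want is the one with rel='alternate'; the sample XML and the example.com URLs are placeholders made up for illustration, not the real feed.

```php
<?php
// Placeholder data modelled on the entry in the question (URLs are invented).
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <published>2013-08-01T13:40:24.000Z</published>
    <link rel="alternate" type="text/html" href="https://example.com/event?eid=abc" title="alternate"/>
    <link rel="self" type="application/atom+xml" href="https://example.com/feeds/abc"/>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$hrefs = [];
foreach ($doc->getElementsByTagName('entry') as $entry) {
    // Search only inside this entry, not the whole document.
    foreach ($entry->getElementsByTagName('link') as $link) {
        // Pick the link by its rel attribute instead of by position.
        if ($link->getAttribute('rel') === 'alternate') {
            $hrefs[] = $link->getAttribute('href');
            break;
        }
    }
}
```

Selecting by rel rather than by item(N) avoids depending on the order in which the feed happens to list its links.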

You can do this with simplexml_load_string, like the following code:
$feed = simplexml_load_string($string);
// Iterate over the <entry> children only, not every child of the root element.
foreach ($feed->entry as $entry) {
    echo $entry->published;
    echo $entry->updated;
    foreach ($entry->link as $link) {
        echo $link->attributes()->type;
        echo $link->attributes()->rel;
        echo $link->attributes()->href;
    }
}

Related

Read colon tags values XML PHP

I've already read these topics:
PHP library for parsing XML with a colons in tag names? and
Simple XML - Dealing With Colons In Nodes, but I couldn't implement those solutions.
<item>
<title> TITLE </title>
<itunes:author> AUTHOR </itunes:author>
<description> TEST </description>
<itunes:subtitle> TEST </itunes:subtitle>
<itunes:summary> TEST </itunes:summary>
<itunes:image href="yoyoyoyo.jpg"/>
<pubDate> YESTERDAY </pubDate>
<itunes:block>no</itunes:block>
<itunes:explicit>no</itunes:explicit>
<itunes:duration>99:99:99</itunes:duration>
<itunes:keywords>key, words</itunes:keywords>
</item>
I want to get only itunes:duration and itunes:image. Here is my code:
$result = simplexml_load_file("http://blablabla.com/feed.xml");
$items = $result->xpath("//item");
foreach ($items as $item) {
echo $item->title;
echo $item->pubDate;
}
I tried using the children() method, but when I try to print_r the result it says that the node no longer exists.
You should use children() on the $item element to get its child elements:
$str =<<< END
<item>
<title> TITLE </title>
<itunes:author> AUTHOR </itunes:author>
<description> TEST </description>
<itunes:subtitle> TEST </itunes:subtitle>
<itunes:summary> TEST </itunes:summary>
<itunes:image href="yoyoyoyo.jpg"/>
<pubDate> YESTERDAY </pubDate>
<itunes:block>no</itunes:block>
<itunes:explicit>no</itunes:explicit>
<itunes:duration>99:99:99</itunes:duration>
<itunes:keywords>key, words</itunes:keywords>
</item>
END;
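// @ suppresses the warnings libxml raises because the itunes prefix is not declared in this snippet.
$result = @simplexml_load_string($str);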
$items = $result->xpath("//item");
foreach ($items as $item) {
echo $item->title . "\n";
echo $item->pubDate . "\n";
echo $item->children()->{'itunes:duration'} . "\n";
}
Output:
TITLE
YESTERDAY
99:99:99
Here is my alternative solution, in case Dekel's doesn't work for someone, using the getNamespaces method:
$result = simplexml_load_file("http://blablabla.com/feed.xml");
$items = $result->xpath("//item");
foreach ($items as $item)
{
$itunesSpace = $item->getNamespaces(true);
$nodes = $item->children($itunesSpace['itunes']);
//TEST
echo $nodes->subtitle;
//99:99:99
echo $nodes->duration;
//If you want the image href
$imageAux = $nodes->image->attributes();
//yoyoyoyo.jpg
echo $imageAux['href'];
}
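For anyone who wants to try the namespace approach without a live feed, here is a small self-contained sketch. It assumes the feed declares the itunes prefix on its root element (the namespace URI shown is the usual iTunes podcast one, but treat it as an assumption about your feed), and it selects the children by prefix so the exact URI does not matter to the code.

```php
<?php
// Sample feed; the xmlns:itunes declaration is an assumption about the real feed.
$xml = <<<'XML'
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
    <item>
      <title>TITLE</title>
      <itunes:image href="yoyoyoyo.jpg"/>
      <itunes:duration>99:99:99</itunes:duration>
    </item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($xml);

$found = [];
foreach ($rss->channel->item as $item) {
    // Second argument true means 'itunes' is treated as a prefix, not a URI.
    $itunes = $item->children('itunes', true);
    $found[] = [
        'duration' => (string) $itunes->duration,
        // href itself has no namespace, so plain attributes() returns it.
        'image'    => (string) $itunes->image->attributes()->href,
    ];
}
```

Casting to string matters here: without it the array would hold SimpleXMLElement objects rather than plain values.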

Parse feed media group with array of children

I have this XML feed:
<item>
<title>Title</title>
<media:group>
<media:content url="http://example.it/image.jpg" type="image/jpeg">
<media:thumbail url="http://example.it/image.jpg" type="image/png"/>
<media:credit>Credit</media:credit>
</media:content>
<media:content url="http://example.it/image2.jpg" type="image/jpeg">
<media:thumbail url="http://example.it/image2.jpg" type="image/png"/>
<media:credit>Credit2</media:credit>
</media:content>
</media:group>
</item>
This is my PHP code to read it:
$rss = new SimpleXMLElement($url);
foreach ($rss->channel->item as $item) {
$title = $item->title;
}
Reading the "title" item is no problem, but how can I read "url", "thumbnail", and "credit" for each media:content?
-------SOLVED-------
$rss = new SimpleXMLElement($url);
foreach ($rss->channel->item as $item) {
$title = $item->title;
$gallerie = $item->children('http://search.yahoo.com/mrss/')->group->content;
foreach($gallerie as $g) {
echo $g->attributes()['url'] ."<br/>";
}
}
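The solution above only prints the url attribute. A possible sketch for collecting the thumbnail and credit as well is shown below. It assumes the child element is spelled media:thumbnail (the sample in the question spells it media:thumbail, so adjust the property name to whatever your feed really uses), and the sample XML with its thumb URLs is made up for illustration.

```php
<?php
// Invented sample; element is assumed to be spelled media:thumbnail.
$xml = <<<'XML'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Title</title>
      <media:group>
        <media:content url="http://example.it/image.jpg" type="image/jpeg">
          <media:thumbnail url="http://example.it/thumb.jpg"/>
          <media:credit>Credit</media:credit>
        </media:content>
        <media:content url="http://example.it/image2.jpg" type="image/jpeg">
          <media:thumbnail url="http://example.it/thumb2.jpg"/>
          <media:credit>Credit2</media:credit>
        </media:content>
      </media:group>
    </item>
  </channel>
</rss>
XML;

$ns  = 'http://search.yahoo.com/mrss/';
$rss = new SimpleXMLElement($xml);

$media = [];
foreach ($rss->channel->item as $item) {
    foreach ($item->children($ns)->group->content as $content) {
        // The nested elements are in the media namespace too.
        $inner = $content->children($ns);
        $media[] = [
            'url'       => (string) $content->attributes()->url,
            'thumbnail' => (string) $inner->thumbnail->attributes()->url,
            'credit'    => (string) $inner->credit,
        ];
    }
}
```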

PHP reading RSS feed gets error on the third link in a node

I am reading an RSS feed and each node has 3 links:
<link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2202110476673931679/6339893542751280730/comments/default/1280042367141045524'/>
<link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2202110476673931679/6339893542751280730/comments/default/1280042367141045524'/>
<link rel='alternate' type='text/html' href='http://misterika.blogspot.com/2016/04/blog-post_11.html?showComment=1460801110852#c1280042367141045524' title=''/>
I read the "href" attribute with this:
'link' => $node->getElementsByTagName('link')->item(0)->getAttribute('href')
There is no problem when I use item(0) for the first link or item(1) for the second link, but when I use item(2) for the third link I get this error:
Fatal error: Call to a member function getAttribute() on a non-object
Any idea how I can solve this?
Here is my full code:
<?php
$rss = new DOMDocument();
$rss->load('http://misterika.blogspot.com/feeds/comments/default');
$feed = array();
foreach ($rss->getElementsByTagName('entry') as $node) {
$item = array (
'title' => $node->getElementsByTagName('name')->item(0)->nodeValue,
'desc' => $node->getElementsByTagName('content')->item(0)->nodeValue,
'link' => $node->getElementsByTagName('link')->item(2)->getAttribute('href'),
'date' => $node->getElementsByTagName('published')->item(0)->nodeValue,
);
array_push($feed, $item);
}
$limit = 5;
for($x=0;$x<$limit;$x++) {
$title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
$link = $feed[$x]['link'];
$description = $feed[$x]['desc'];
$date = date('l F d, Y', strtotime($feed[$x]['date']));
echo '<p><strong>'.$title.'</strong><br />';
echo '<small><em>Posted on '.$date.'</em></small></p>';
echo '<p>'.$link.'</p>';
echo '<p>'.$description.'</p>';
}
?>
It works when I test it with the sample snippet below.
<?php
$xml = "<root><entry><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2202110476673931679/6339893542751280730/comments/default/1280042367141045524'/>
<link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2202110476673931679/6339893542751280730/comments/default/1280042367141045524'/>
<link rel='alternate' type='text/html' href='http://misterika.blogspot.com/2016/04/blog-post_11.html?showComment=1460801110852#c1280042367141045524' title=''/></entry>
<entry><link rel='edit' type='application/atom+xml' href='http://google.com/'/>
<link rel='self' type='application/atom+xml' href='http://jenson.in/'/></entry></root>";
$node = new DOMDocument;
$node->loadXML($xml);
foreach($node->getElementsByTagName("entry") as $entry)
{
$link = $entry->getElementsByTagName("link");
// Use the per-entry list in $link, not a document-wide lookup on $node.
echo $link->item(0)->getAttribute('href')."<br/>";
echo $link->item(1)->getAttribute('href')."<br/>";
//Below code checks if third link exists or not.
echo (($link->length > 2) ? $link->item(2)->getAttribute('href') : "No alternate link!")."<br/>";
}
?>
UPDATE:
In your feed XML, there is no third link after http://misterika.blogspot.com/2016/03/blog-post_20.html?showComment=1462627509971#c2966841279736454385: only two links are available in that entry node. That is why you are getting the error.
EDIT
After looking at the URL you provided I made adjustments to the code using DOMXPath, like this:
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$rss = file_get_contents('http://misterika.blogspot.com/feeds/comments/default');
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
$links = $xpath->query('/atom:feed/atom:entry/atom:link[@href]');
foreach ($links as $link) {
$node = $link->nodeName;
$href = $link->getAttribute('href');
echo "{$node} - {$href}\n";
}
The key here is to register the default namespace in order for the code to work.
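Building on that, here is a rough sketch of how the per-entry lookup could tolerate entries that have no third link. It assumes the link you are after is the one with rel="alternate"; the two-entry sample feed and its example.com URLs are invented so the snippet runs without network access.

```php
<?php
// Invented sample: the second entry deliberately has no alternate link.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <link rel="edit" href="http://example.com/edit/1"/>
    <link rel="self" href="http://example.com/self/1"/>
    <link rel="alternate" href="http://example.com/post/1"/>
  </entry>
  <entry>
    <link rel="edit" href="http://example.com/edit/2"/>
    <link rel="self" href="http://example.com/self/2"/>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

$links = [];
foreach ($xpath->query('/atom:feed/atom:entry') as $entry) {
    // Relative query evaluated against the current entry only.
    $alt = $xpath->query('atom:link[@rel="alternate"]', $entry)->item(0);
    // item(0) is null when the entry has no such link, so guard before calling getAttribute().
    $links[] = $alt ? $alt->getAttribute('href') : null;
}
```

The null check is what prevents the "Call to a member function getAttribute() on a non-object" error from the question.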

Parsing xml with simplexml_load

I am trying to parse an XML feed, but I run into a problem when fetching the image URL.
My xml is:
<entry>
<title>The Title</title>
<id>http://example.com/post/367327.html</id>
<summary>Some extra text</summary>
<link rel="enclosure" href="http://example.com/photos/f_0px_30px/image687.jpg" type="image/jpeg" length="" />
</entry>
So far I am using the code below to fetch the other data:
$url = "http://msdssite.com/feeds/xml/myxml.xml";
$xml = simplexml_load_file($url);
foreach($xml->entry as $PRODUCT)
{
$my_title = trim($PRODUCT->title);
$url = trim($PRODUCT->id);
$myimg = $PRODUCT->link;
}
How can I parse the href from this: <link rel="enclosure" href="http://example.com/photos/f_0px_30px/image687.jpg" type="image/jpeg" length="" />
Since it seems that your entries can contain several link tags, you need to check that the type attribute has the value image/jpeg to be sure to obtain a link to an image:
ini_set("display_errors", "On");
$feedURL = 'http://OLDpost.gr/feeds/xml/category-takhs-xatzhs.xml';
$feed = simplexml_load_file($feedURL);
$results = array();
foreach($feed->entry as $entry) {
$result = array('title' => (string)$entry->title,
'url' => (string)$entry->id);
$links = $entry->link;
foreach ($links as $link) {
$linkAttr = $link->attributes();
if (isset($linkAttr['type']) && $linkAttr['type']=='image/jpeg') {
$result['img'] = (string)$linkAttr['href'];
break;
}
}
$results[] = $result;
}
print_r($results);
Note that using SimpleXML like that (a foreach loop to find the right link tag) isn't very convenient. It's better to use an XPath query:
foreach($feed->entry as $entry) {
$entry->registerXPathNamespace('e', 'http://www.w3.org/2005/Atom');
$results[] = array(
'title' => (string)$entry->title,
'url' => (string)$entry->id,
'img' => (string)$entry->xpath('e:link[@type="image/jpeg"]/@href')[0][0]
);
}
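One caveat with indexing the XPath result directly is that an entry without an image link has no element 0. A hedged variant that checks the result first could look like this; the second, image-less entry in the sample is made up to show that case, and the feed is assumed to use the Atom namespace as in the answer above.

```php
<?php
// First entry is from the question; the second is invented to show the missing-image case.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>The Title</title>
    <id>http://example.com/post/367327.html</id>
    <link rel="enclosure" href="http://example.com/photos/f_0px_30px/image687.jpg" type="image/jpeg" length="" />
  </entry>
  <entry>
    <title>No image</title>
    <id>http://example.com/post/2.html</id>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);

$results = [];
foreach ($feed->entry as $entry) {
    $entry->registerXPathNamespace('e', 'http://www.w3.org/2005/Atom');
    $hrefs = $entry->xpath('e:link[@type="image/jpeg"]/@href');
    $results[] = [
        'title' => (string) $entry->title,
        'url'   => (string) $entry->id,
        // An empty result means this entry has no JPEG link.
        'img'   => $hrefs ? (string) $hrefs[0] : null,
    ];
}
```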
If that's the exact XML, actually there is no need for a foreach. Try this:
$xml = simplexml_load_file($url);
$my_title = (string) $xml->title;
$myimg = (string) $xml->link->attributes()['href']; // 5.4 or above
echo $myimg; // http://example.com/photos/f_0px_30px/image687.jpg
Try:
foreach($xml->entry as $PRODUCT)
{
$my_title = trim($PRODUCT->title[0]);
$url = trim($PRODUCT->id[0]);
$myimg = $PRODUCT->link[0];
}

PHP SimpleXml - Retrieving attributes of namespaced children

I'm parsing an external Atom feed, some entries have a collection of namespaced children - I'm failing to retrieve attributes from those children. Abbreviated example:
$feed = <<<EOD
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:ai="http://activeinterface.com/thincms/2012">
<entry>
<title>Some Title</title>
<ai:image>path/to/some/image</ai:image>
<ai:ocurrence dateid="20120622" date="Fri, June 22, 2012" time="6:00 pm" />
<ai:ocurrence dateid="20120720" date="Fri, July 20, 2012" time="6:00 pm" />
</entry>
</feed>
EOD;
$xml = new SimpleXmlElement($feed);
foreach ($xml->entry as $entry){
echo $entry->title;
$namespaces = $entry->getNameSpaces(true);
$ai = $entry->children($namespaces['ai']);
echo $ai->image;
foreach($ai->ocurrence as $o){
echo $o['date'];
}
}
Everything but the attribute retrieval of the namespaced children works fine - if the children's tagnames aren't namespaced, it works fine. If grabbing the node value (rather than an attribute), even if namespaced, it works fine. What am I missing?
Try this
$xml = new SimpleXmlElement($feed);
foreach ($xml->entry as $entry)
{
$namespaces = $entry->getNameSpaces(true);
$ai = $entry->children($namespaces['ai']);
foreach ($ai->ocurrence as $o)
{
$date=$o->attributes();
echo $date['date'];
echo "<br/>";
}
}
I don't know why, but apparently array access won't work here; you need the attributes() method:
echo $o->attributes()->date;
