I am trying to parse an XML feed, but I run into a problem when I try to fetch the image URL.
My XML is:
<entry>
<title>The Title</title>
<id>http://example.com/post/367327.html</id>
<summary>Some extra text</summary>
<link rel="enclosure" href="http://example.com/photos/f_0px_30px/image687.jpg" type="image/jpeg" length="" />
</entry>
So far I am using the code below to fetch the other data:
$url = "http://msdssite.com/feeds/xml/myxml.xml";
$xml = simplexml_load_file($url);
foreach($xml->entry as $PRODUCT)
{
$my_title = trim($PRODUCT->title);
$url = trim($PRODUCT->id);
$myimg = $PRODUCT->link;
}
How can I get the href attribute from this element: <link rel="enclosure" href="http://example.com/photos/f_0px_30px/image687.jpg" type="image/jpeg" length="" />
Since it seems that your entries can contain several link tags, you need to check that the type attribute has the value image/jpeg to be sure to obtain a link to an image:
ini_set("display_errors", "On");
$feedURL = 'http://OLDpost.gr/feeds/xml/category-takhs-xatzhs.xml';
$feed = simplexml_load_file($feedURL);
$results = array();
foreach($feed->entry as $entry) {
$result = array('title' => (string)$entry->title,
'url' => (string)$entry->id);
$links = $entry->link;
foreach ($links as $link) {
$linkAttr = $link->attributes();
if (isset($linkAttr['type']) && $linkAttr['type']=='image/jpeg') {
$result['img'] = (string)$linkAttr['href'];
break;
}
}
$results[] = $result;
}
print_r($results);
Note that using SimpleXML like that (a foreach loop to find the right link tag) isn't very handy. An XPath query is more convenient:
foreach($feed->entry as $entry) {
$entry->registerXPathNamespace('e', 'http://www.w3.org/2005/Atom');
$results[] = array(
'title' => (string)$entry->title,
'url' => (string)$entry->id,
'img' => (string)$entry->xpath('e:link[@type="image/jpeg"]/@href')[0][0]
);
}
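A self-contained sketch of the XPath approach, assuming the feed declares the Atom namespace (http://www.w3.org/2005/Atom) as its default namespace. The sample feed and the helper name imageHref() are hypothetical. Unlike indexing [0] directly, it returns null when an entry has no image/jpeg link instead of raising an undefined-offset notice.

```php
<?php
// Hypothetical sample feed with the Atom namespace as default namespace.
$atom = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>The Title</title>
    <id>http://example.com/post/367327.html</id>
    <link rel="alternate" type="text/html" href="http://example.com/post/367327.html"/>
    <link rel="enclosure" type="image/jpeg" href="http://example.com/photos/f_0px_30px/image687.jpg"/>
  </entry>
  <entry>
    <title>No Image</title>
    <id>http://example.com/post/1.html</id>
  </entry>
</feed>
XML;

// Hypothetical helper: href of the first image/jpeg link, or null if there is none.
function imageHref(SimpleXMLElement $entry): ?string
{
    // Bind a prefix to the Atom namespace so XPath can match its elements.
    $entry->registerXPathNamespace('e', 'http://www.w3.org/2005/Atom');
    $hrefs = $entry->xpath('e:link[@type="image/jpeg"]/@href');
    // xpath() returns an empty array when nothing matches.
    return $hrefs ? (string) $hrefs[0] : null;
}

$feed = simplexml_load_string($atom);
$results = [];
foreach ($feed->entry as $entry) {
    $results[] = [
        'title' => (string) $entry->title,
        'url'   => (string) $entry->id,
        'img'   => imageHref($entry),
    ];
}
print_r($results);
```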
If that is the exact XML (a single <entry> as the root element), there is no need for a foreach loop. Try this:
$xml = simplexml_load_file($url);
$my_title = (string) $xml->title;
$myimg = (string) $xml->link->attributes()['href']; // array dereferencing needs PHP 5.4 or above
echo $myimg; // http://example.com/photos/f_0px_30px/image687.jpg
Try:
foreach($xml->entry as $PRODUCT)
{
$my_title = trim($PRODUCT->title[0]);
$url = trim($PRODUCT->id[0]);
$myimg = (string) $PRODUCT->link[0]['href']; // the URL is in the href attribute
}
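The URL lives in an attribute, so casting the <link> element itself gives an empty string. A minimal sketch using the XML from the question, wrapped in a hypothetical <feed> root without a namespace, showing the difference; SimpleXML exposes attributes through array syntax on the element:

```php
<?php
$xml = simplexml_load_string(
    '<feed><entry>'
    . '<title>The Title</title>'
    . '<link rel="enclosure" href="http://example.com/photos/f_0px_30px/image687.jpg" type="image/jpeg" length="" />'
    . '</entry></feed>'
);

foreach ($xml->entry as $PRODUCT) {
    // <link .../> is an empty element, so its text content is ''.
    $elementText = (string) $PRODUCT->link[0];
    // The URL is stored in the href attribute, read with array syntax.
    $myimg = (string) $PRODUCT->link[0]['href'];
}

var_dump($elementText); // string(0) ""
echo $myimg, "\n";      // http://example.com/photos/f_0px_30px/image687.jpg
```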
I convert an Atom feed into RSS using atom2rss.xsl, which works fine.
Then, using DOMDocument, I try to get the post title and URL:
$feed = new DOMDocument();
$feed->loadHTML('<?xml encoding="utf-8" ?>' . $html);
if (!empty($feed) && is_object($feed) ) {
foreach ($feed->getElementsByTagName("item") as $item){
echo 'url: '. $item->getElementsByTagName("link")->item(0)->nodeValue;
echo 'title'. $item->getElementsByTagName("title")->item(0)->nodeValue;
}
return;
}
But the post URL is empty.
See this eval, which contains the HTML. What am I doing wrong? I suspect I am not reading the link tag properly via $item->getElementsByTagName("link")->item(0)->nodeValue.
I think the problem is that there are several <link> elements in each item, and the one you're (I think) interested in is the one with rel="self". The quickest way (without messing around with XPath) is to loop over each <link> element, check for the right rel value, and then take the href attribute from that element:
if (!empty($feed) && is_object($feed) ) {
foreach ($feed->getElementsByTagName("item") as $item){
$url = "";
// Look for the 'right' link tag and extract URL from that
foreach ( $item->getElementsByTagName("link") as $link ) {
if ( $link->getAttribute("rel") == "self" ) {
$url = $link->getAttribute("href");
break;
}
}
echo 'url: '. $url;
echo 'title'. $item->getElementsByTagName("title")->item(0)->nodeValue;
}
return;
}
which gives...
url: https://www.blogger.com/feeds/2984353310628523257/posts/default/1947782625877709813titleExtraordinary Genius - Cp274
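For comparison, a sketch of the same lookup with DOMXPath, assuming the original Atom XML is loaded with loadXML(). It may also be worth dropping loadHTML() for feed markup: libxml's HTML parser treats <link> as an empty (void) element, so text inside an RSS <link>...</link> is not kept as that element's nodeValue. The sample entry and the atom prefix are hypothetical.

```php
<?php
// Hypothetical Atom entry with several <link> elements.
$atom = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Extraordinary Genius - Cp274</title>
    <link rel="alternate" type="text/html" href="http://example.com/2016/04/post.html"/>
    <link rel="self" type="application/atom+xml" href="http://example.com/feeds/posts/default/1"/>
  </entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($atom); // XML parser, not loadHTML()

$xpath = new DOMXPath($doc);
// The Atom namespace must be bound to a prefix to be used in the query.
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

$found = [];
foreach ($xpath->query('//atom:entry') as $entry) {
    // string(...) yields '' when there is no matching link.
    $url   = $xpath->evaluate('string(atom:link[@rel="self"]/@href)', $entry);
    $title = $xpath->evaluate('string(atom:title)', $entry);
    $found[] = ['url' => $url, 'title' => $title];
    echo 'url: ', $url, "\n", 'title: ', $title, "\n";
}
```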
function get_links($link)
{
$ret = array();
$dom = new DOMDocument();
@$dom->loadHTML(file_get_contents($link));
$dom->preserveWhiteSpace = false;
$links = $dom->getElementsByTagName('a');
foreach ($links as $tag){
$ret[$tag->getAttribute('href')] = $tag->textContent; // textContent avoids reading a property of null when an <a> has no child nodes
}
return $ret;
}
print_r(get_links('http://www.google.com'));
Or you can use DOMXPath:
$html = file_get_contents('http://www.google.com');
$dom = new DOMDocument();
@$dom->loadHTML($html);
// take all links
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
echo $url . "\n";
}
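A self-contained variant of the two snippets above, using an inline HTML string instead of fetching http://www.google.com, and collecting parse warnings with libxml_use_internal_errors() instead of silencing them with the @ operator. The sample markup is hypothetical.

```php
<?php
// Hypothetical HTML; in practice this would come from file_get_contents().
$html = '<html><body>'
    . '<p><a href="/about">About</a> <a href="/news"><b>News</b></a></p>'
    . '<a href="/empty"></a>'
    . '</body></html>';

// Collect HTML parse warnings internally instead of suppressing them with @.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$links = [];
foreach ($xpath->query('//a[@href]') as $a) {
    // textContent is '' for an empty <a>, so no property is read from null.
    $links[$a->getAttribute('href')] = trim($a->textContent);
}
print_r($links);
```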
I am reading an RSS feed, and each entry node has three links:
<link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2202110476673931679/6339893542751280730/comments/default/1280042367141045524'/>
<link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2202110476673931679/6339893542751280730/comments/default/1280042367141045524'/>
<link rel='alternate' type='text/html' href='http://misterika.blogspot.com/2016/04/blog-post_11.html?showComment=1460801110852#c1280042367141045524' title=''/>
I read the "href" attribute with this:
'link' => $node->getElementsByTagName('link')->item(0)->getAttribute('href')
Using item(0) for the first link and item(1) for the second link works fine, but when I use item(2) for the third link I get this error:
Fatal error: Call to a member function getAttribute() on a non-object
Any idea how I can solve it?
Here is my full code:
<?php
$rss = new DOMDocument();
$rss->load('http://misterika.blogspot.com/feeds/comments/default');
$feed = array();
foreach ($rss->getElementsByTagName('entry') as $node) {
$item = array (
'title' => $node->getElementsByTagName('name')->item(0)->nodeValue,
'desc' => $node->getElementsByTagName('content')->item(0)->nodeValue,
'link' => $node->getElementsByTagName('link')->item(2)->getAttribute('href'),
'date' => $node->getElementsByTagName('published')->item(0)->nodeValue,
);
array_push($feed, $item);
}
$limit = 5;
for($x=0;$x<$limit;$x++) {
$title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
$link = $feed[$x]['link'];
$description = $feed[$x]['desc'];
$date = date('l F d, Y', strtotime($feed[$x]['date']));
echo '<p><strong>'.$title.'</strong><br />';
echo '<small><em>Posted on '.$date.'</em></small></p>';
echo '<p>'.$link.'</p>';
echo '<p>'.$description.'</p>';
}
?>
It works when I test it with the sample snippet below.
<?php
$xml = "<root><entry><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2202110476673931679/6339893542751280730/comments/default/1280042367141045524'/>
<link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2202110476673931679/6339893542751280730/comments/default/1280042367141045524'/>
<link rel='alternate' type='text/html' href='http://misterika.blogspot.com/2016/04/blog-post_11.html?showComment=1460801110852#c1280042367141045524' title=''/></entry>
<entry><link rel='edit' type='application/atom+xml' href='http://google.com/'/>
<link rel='self' type='application/atom+xml' href='http://jenson.in/'/></entry></root>";
$node = new DOMDocument;
$node->loadXML($xml);
foreach($node->getElementsByTagName("entry") as $entry)
{
$link = $entry->getElementsByTagName("link");
echo $link->item(0)->getAttribute('href')."<br/>";
echo $link->item(1)->getAttribute('href')."<br/>";
// Check whether a third link exists in this entry before reading it.
echo (($link->length > 2) ? $link->item(2)->getAttribute('href') : "No alternate link!")."<br/>";
}
?>
UPDATE:
In your feed XML, there is no third link after http://misterika.blogspot.com/2016/03/blog-post_20.html?showComment=1462627509971#c2966841279736454385; only two links are available in that entry node. That's why you're getting the error.
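Because the number of <link> elements varies between entries, selecting by position with item(2) is fragile. A minimal sketch that picks the link by its rel attribute instead and returns null when it is missing; the helper name hrefByRel() and the sample feed are hypothetical.

```php
<?php
// Hypothetical helper: href of the first <link> inside $entry with the given rel, or null.
function hrefByRel(DOMElement $entry, string $rel): ?string
{
    foreach ($entry->getElementsByTagName('link') as $link) {
        if ($link->getAttribute('rel') === $rel) {
            return $link->getAttribute('href');
        }
    }
    return null;
}

$xml = "<feed>"
    . "<entry><link rel='edit' href='http://example.com/edit/1'/>"
    . "<link rel='self' href='http://example.com/self/1'/>"
    . "<link rel='alternate' href='http://example.com/post-1.html'/></entry>"
    . "<entry><link rel='edit' href='http://example.com/edit/2'/>"
    . "<link rel='self' href='http://example.com/self/2'/></entry>"
    . "</feed>";

$doc = new DOMDocument();
$doc->loadXML($xml);

$alternates = [];
foreach ($doc->getElementsByTagName('entry') as $entry) {
    // Fall back to a placeholder when the entry has no alternate link.
    $alternates[] = hrefByRel($entry, 'alternate') ?? 'No alternate link!';
}
print_r($alternates);
```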
EDIT
After looking at the URL you provided, I adjusted the code to use DOMXPath, like this:
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$rss = file_get_contents('http://misterika.blogspot.com/feeds/comments/default');
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
$links = $xpath->query('/atom:feed/atom:entry/atom:link[@href]');
foreach ($links as $link) {
$node = $link->nodeName;
$href = $link->getAttribute('href');
echo "{$node} - {$href}\n";
}
The key here is registering the feed's default (Atom) namespace under a prefix and using that prefix in the query; without it, the XPath expression matches no elements.
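A small sketch of why the registration matters: in XPath 1.0 an unprefixed name such as entry only matches elements in no namespace, so against an Atom document it finds nothing. The two-entry sample feed is hypothetical.

```php
<?php
$atom = '<feed xmlns="http://www.w3.org/2005/Atom">'
    . '<entry><link rel="alternate" href="http://example.com/a"/></entry>'
    . '<entry><link rel="alternate" href="http://example.com/b"/></entry>'
    . '</feed>';

$doc = new DOMDocument();
$doc->loadXML($atom);
$xpath = new DOMXPath($doc);

// Unprefixed names look for elements in no namespace: no matches here.
$withoutPrefix = $xpath->query('/feed/entry/link[@href]')->length;

// Bind the Atom namespace URI to a prefix and use it in every step.
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
$withPrefix = $xpath->query('/atom:feed/atom:entry/atom:link[@href]')->length;

echo "without prefix: {$withoutPrefix}, with prefix: {$withPrefix}\n"; // without prefix: 0, with prefix: 2
```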
I have a problem with an RSS feed in PHP. I want to get the image URL from the "enclosure" element, but it's not working.
My code so far:
$rss = simplexml_load_file($url);
$i = 0;
if($rss)
{
$items = $rss->channel->item;
foreach($items as $item)
{
$title = $item->title;
$link = $item->link;
$published_on = $item->pubDate;
$phpDate = strtotime($published_on);
$enclosure = $item['enclosure'][0]['url'];
From the RSS:
<enclosure url="http://www.svenskafans.com/image/7/141433/Snalla-Pelle-stanna-i-Gefle.jpg" lenght="51265" type="image/jpeg" />
Note that some items have no enclosure tag, so the code must also work when it is missing.
What about this:
$rss=simplexml_load_file('http://www.svenskafans.com/rss/team/77.aspx');
foreach ($rss->channel->item as $item) {
if (isset($item->enclosure)) {
echo $item->enclosure['url'].'<br>';
}
}
It outputs:
http://www.svenskafans.com/image/7/393988/Bilder-fran-tifot-for-Hugo-och-Bernhard.jpg
http://www.svenskafans.com/image/7/141433/Snalla-Pelle-stanna-i-Gefle.jpg
http://www.svenskafans.com/image/7/392527/Efter-Gefle-Elfsborg-En-skitmatch-i-regnet-gav-5-insikter.jpg
http://www.svenskafans.com/image/7/363552/Infor-Gefle-IF-IF-Elfsborg.jpg
http://www.svenskafans.com/image/7/211783/Gefles-Silly-Season-2013-2014-Angekeepern-Lloyd-Saxton-provtranar-med-Gefle.jpg
http://www.svenskafans.com/image/7/363058/Gefle-Panelen-17-Pensionera-Hugos-och-Bernhards-trojnummer.jpg
http://www.svenskafans.com/image/7/328214/Kungsbacksv-24-17-Hoppas-Hugo-satter-en-straff-mot-Elfsborg-i-89e-minuten.jpg
http://www.svenskafans.com/image/7/192682/Intervju-med-Daniel-Bernhardsson-Gefle-har-en-ljus-framtid.jpg
http://www.svenskafans.com/image/7/74875/Besked-idag-Bade-Bernhard-och-Hugo-spelar-sin-sista-match-i-Gefle-IF-pa-sondag.jpg
http://www.svenskafans.com/image/7/343968/Overraskande-piggt-Gefle-nar-Oremo-och-Jawo-natade.jpg
http://www.svenskafans.com/image/7/330399/Tack-AIK-nu-klart-till-100-att-Gefle-spelar-i-Allsvenskan-2014.jpg
http://www.svenskafans.com/image/7/363552/Rosta-fram-Gefles-MVP-2013.jpg
http://www.svenskafans.com/image/7/220468/Par-Asp-berattar-om-tiden-i-Gefle-roligaste-matchen-och-om-att-spela-med-Guidetti.jpg
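The same isset() check can be wrapped so that every item yields a result, with null where the enclosure is missing. A sketch with a hypothetical inline feed instead of the svenskafans.com URL:

```php
<?php
$rssXml = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>With image</title>
      <enclosure url="http://example.com/image/1.jpg" length="51265" type="image/jpeg" />
    </item>
    <item>
      <title>Without image</title>
    </item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($rssXml);
$images = [];
foreach ($rss->channel->item as $item) {
    // isset() is false when the item has no <enclosure> child.
    $images[(string) $item->title] = isset($item->enclosure)
        ? (string) $item->enclosure['url']
        : null;
}
var_dump($images);
```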
I am parsing the following RSS feed (relevant part shown):
<item>
<title>xxx</title>
<link>xxx</link>
<guid>xxx</guid>
<description>xxx</description>
<prx:proxy>
<prx:ip>101.226.74.168</prx:ip>
<prx:port>8080</prx:port>
<prx:type>Anonymous</prx:type>
<prx:ssl>false</prx:ssl>
<prx:check_timestamp>1369199066</prx:check_timestamp>
<prx:country_code>CN</prx:country_code>
<prx:latency>20585</prx:latency>
<prx:reliability>9593</prx:reliability>
</prx:proxy>
<prx:proxy>...</prx:proxy>
<prx:proxy>...</prx:proxy>
<pubDate>xxx</pubDate>
</item>
<item>...</item>
<item>...</item>
<item>...</item>
I parse it with this PHP code:
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$proxylist_xml = new SimpleXmlElement($proxylist_rss);
foreach($proxylist_xml->channel->item as $item) {
var_dump($item); // Ok, Everything marked with xxx
var_dump($item->title); // Ok, title
foreach($item->proxy() as $entry) {
var_dump($entry); //empty
}
}
While I can access everything marked with xxx, I cannot access anything inside prx:proxy, mainly because a colon cannot appear in a valid PHP property name.
How can I reach prx:ip, for example?
Take a look at SimpleXMLElement::children(); you can access the namespaced elements with it.
For example:
<?php
$xml = '<xml xmlns:prx="http://example.org/">
<item>
<title>xxx</title>
<link>xxx</link>
<guid>xxx</guid>
<description>xxx</description>
<prx:proxy>
<prx:ip>101.226.74.168</prx:ip>
<prx:port>8080</prx:port>
<prx:type>Anonymous</prx:type>
<prx:ssl>false</prx:ssl>
<prx:check_timestamp>1369199066</prx:check_timestamp>
<prx:country_code>CN</prx:country_code>
<prx:latency>20585</prx:latency>
<prx:reliability>9593</prx:reliability>
</prx:proxy>
</item>
</xml>';
$sxe = new SimpleXMLElement($xml);
foreach($sxe->item as $item)
{
$proxy = $item->children('prx', true)->proxy;
echo $proxy->ip; // 101.226.74.168
}
I would just strip out the "prx:" prefix:
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$proxylist_rss = str_replace('prx:', '', $proxylist_rss);
$proxylist_xml = new SimpleXmlElement($proxylist_rss);
foreach($proxylist_xml->channel->item as $item) {
foreach($item->proxy as $entry) {
var_dump($entry);
}
}
http://phpfiddle.org/main/code/jsz-vga
Try it like this:
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$feed = simplexml_load_string($proxylist_rss);
$ns=$feed->getNameSpaces(true);
foreach ($feed->channel->item as $item){
var_dump($item);
var_dump($item->title);
$proxy = $item->children($ns["prx"]);
$proxy = $proxy->proxy;
foreach ($proxy as $key => $value){
var_dump($value);
}
}
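Putting the children() approach together: a sketch that turns every prx:proxy into a plain array, looking the namespace URI up by its prefix via getNamespaces(true) as the answer above does. The namespace URI http://example.org/ and the sample data are hypothetical.

```php
<?php
$rssXml = <<<XML
<rss xmlns:prx="http://example.org/">
  <channel>
    <item>
      <title>xxx</title>
      <prx:proxy>
        <prx:ip>101.226.74.168</prx:ip>
        <prx:port>8080</prx:port>
      </prx:proxy>
      <prx:proxy>
        <prx:ip>10.0.0.1</prx:ip>
        <prx:port>3128</prx:port>
      </prx:proxy>
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($rssXml);
// Map of prefix => namespace URI used anywhere in the document.
$ns = $feed->getNamespaces(true);

$proxies = [];
foreach ($feed->channel->item as $item) {
    foreach ($item->children($ns['prx'])->proxy as $proxy) {
        $row = [];
        // Iterate the proxy's children in the prx namespace; keys are local names.
        foreach ($proxy->children($ns['prx']) as $name => $value) {
            $row[$name] = (string) $value;
        }
        $proxies[] = $row;
    }
}
print_r($proxies);
```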
I have the following RSS format, and I can't extract the 'content:encoded' value:
<item>
<title>some title</title>
<link>some link</link>
<pubDate>Sat, 07 Apr 2012 5:07:00 -0700</pubDate>
<content:encoded><![CDATA[this value]]></content:encoded>
</item>
I wrote this function. Everything works except the 'content:encoded' field, which gives me this error: 'Notice: Trying to get property of non-object'
function rssReader($url) {
$doc = new DOMDocument();
$doc->load($url);
$fields = array('title', 'description', 'link', 'pubDate', 'content:encoded');
$nodes = array();
foreach ($doc->getElementsByTagName('item') as $node) {
$item = array();
var_export($node, true);
foreach ($fields as $field)
$item[$field] = $node->getElementsByTagName($field)->item(0)->nodeValue;
$nodes[] = $item;
}
return $nodes;
}
You need to use getElementsByTagNameNS instead of getElementsByTagName for the 'content:encoded' tag:
foreach ($fields as $field){
if( $field == 'content:encoded' ){
$item[$field] = $node->getElementsByTagNameNS('contentNamespaceURI','encoded')->item(0)->nodeValue;
}else{
$item[$field] = $node->getElementsByTagName($field)->item(0)->nodeValue;
}
}
You can find the actual value of 'contentNamespaceURI' in the RSS document itself. There should be a declaration like:
xmlns:content="contentNamespaceURI"
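A self-contained sketch of the namespace-aware lookup. It assumes the feed declares the commonly used content module namespace, http://purl.org/rss/1.0/modules/content/; check the xmlns:content declaration in your own feed. The constant name CONTENT_NS is hypothetical.

```php
<?php
// Assumption: the feed binds xmlns:content to the standard content module URI.
const CONTENT_NS = 'http://purl.org/rss/1.0/modules/content/';

$rssXml = <<<XML
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>some title</title>
      <link>some link</link>
      <pubDate>Sat, 07 Apr 2012 5:07:00 -0700</pubDate>
      <content:encoded><![CDATA[this value]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($rssXml);

$items = [];
foreach ($doc->getElementsByTagName('item') as $node) {
    // Namespace URI + local name; item(0) is null when the element is absent.
    $encoded = $node->getElementsByTagNameNS(CONTENT_NS, 'encoded')->item(0);
    $items[] = [
        'title'   => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'content' => $encoded !== null ? $encoded->nodeValue : null,
    ];
}
print_r($items);
```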
The tag name here is "encoded". Just use:
$content = $node->getElementsByTagName('encoded')->item(0)->nodeValue;