How to get the first image from a Tumblr RSS feed in PHP

Here is the relevant part of my RSS feed:
<channel>
<description></description>
<title>Untitled</title>
<generator>Tumblr (3.0; #xxx)</generator>
<link>http://xxx.tumblr.com/</link>
<item>
<title>Title</title>
<description>&lt;figure&gt;&lt;img src="https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg"/&gt;&lt;/figure&gt;</description>
<link>http://xxx.tumblr.com/post/99569244093</link>
<guid>http://xxx.tumblr.com/post/99569244093</guid>
<pubDate>Thu, 09 Oct 2014 11:19:33 -0400</pubDate>
</item>
</channel>
Using the answer from other questions on here I tried this:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$imgs = $feed->channel->item[0]->description->xpath('//img');
foreach($imgs as $image) {
echo (string)$image['src'];
};
This returns an empty array for $imgs.
Does it have something to do with the tags being escaped as &lt; &gt; etc.?
If so, what can I do?

You can get it from the description, which seems to include an HTML image tag for the image, by using a simple regular expression with preg_match:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$img = (string)$feed->channel->item[0]->description;
if (preg_match('/src="(.*?)"/', $img, $matches)) {
$src = $matches[1];
echo "src = $src", PHP_EOL;
}
Output:
src = http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg
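If the description ever contains another tag with a src attribute ahead of the image, the regular expression above would pick up the wrong URL. A minimal sketch of an alternative, assuming the description holds an HTML fragment: parse it with DOMDocument::loadHTML() and read the first img element. The helper name firstImageSrc and the inline sample feed (with an example.com URL) are made up for illustration.

```php
<?php
// Hypothetical sample standing in for the real feed; the description is
// entity-escaped HTML, as in the question.
$content = <<<XML
<rss version="2.0"><channel><item>
<title>Title</title>
<description>&lt;figure&gt;&lt;img src="https://example.com/a.jpg"/&gt;&lt;/figure&gt;</description>
</item></channel></rss>
XML;

// Return the src of the first <img> in an HTML fragment, or null if there is none.
function firstImageSrc(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect parser warnings (e.g. for tags libxml does not know) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $doc->getElementsByTagName('img')->item(0);
    return $img !== null ? $img->getAttribute('src') : null;
}

$feed = new SimpleXMLElement($content);
// Casting the element to string gives the unescaped HTML fragment.
$src = firstImageSrc((string) $feed->channel->item[0]->description);
echo $src, PHP_EOL;
```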

Before you can use xpath() on the description, you need to create a new XML document out of it:
$url = "http://xxx.tumblr.com/rss";
$desc = simplexml_load_file($url)->xpath('//item/description[1]')[0];
$src = simplexml_load_string("<x>$desc</x>")->xpath('//img/@src')[0];
echo $src;
Output:
http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg

I'm not sure if you can use this approach: as kjhughes already mentioned in a comment, your input XML does not contain any img element, only escaped text. But it's possible to retrieve the image source using the XPath substring functions:
substring-before(substring-after(substring-after(//item/description[contains(.,'img')],
'src='),'"'),'"')
Result:
https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg
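SimpleXMLElement::xpath() returns a list of nodes, so a string-valued expression like the one above cannot be evaluated with it. A sketch of running the same expression through DOMXPath::evaluate() instead; the inline feed is a made-up sample with an example.com URL.

```php
<?php
// Hypothetical sample feed; the description contains escaped HTML only.
$xml = <<<XML
<rss version="2.0"><channel><item>
<description>&lt;figure&gt;&lt;img src="https://example.com/a.jpg"/&gt;&lt;/figure&gt;</description>
</item></channel></rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Nowdoc keeps the XPath quotes untouched by PHP.
$expr = <<<'XPATH'
substring-before(substring-after(substring-after(//item/description[contains(.,'img')],'src='),'"'),'"')
XPATH;

// evaluate() returns a PHP string for string-valued XPath expressions.
$src = $xpath->evaluate($expr);
echo $src, PHP_EOL;
```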

Related

Trouble creating a valid RSS feed in PHP

I'm trying to get an RSS feed, change some text, and then serve it again as an RSS feed. However, the code I've written doesn't validate properly. I get these errors:
line 3, column 0: Missing rss attribute: version
line 14, column 6: Undefined item element: content (10 occurrences)
Here is my code:
<?php
header("Content-type: text/xml");
echo "<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl'?>
<?xml-stylesheet type='text/xsl' media='screen'
href='/~d/styles/rss2full.xsl'?>
<rss xmlns:content='http://purl.org/rss/1.0/modules/content/'>
<channel>
<title>Blaakdeer</title>
<description>Blog RSS</description>
<language>en-us</language>
";
$html = "";
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$xml = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++){
$title = $xml->channel->item[$i]->title;
$description = $xml->channel->item[$i]->description;
$content = $xml->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
echo "<item>
<title>$title</title>
<description>$description</description>
<content>$content</content>
</item>";
}
echo "</channel></rss>";
Just as you don't treat XML as a string when parsing it, you don't treat it as a string when you create it. Use the proper tools to create your XML; in this case, the DOMDocument class.
You had a number of problems with your XML; the biggest is that you were creating a <content> element, but the original RSS had a <content:encoded> element. That means the element name is encoded and it lives in the content namespace, which is very different from an element named content. I've added comments to explain the other steps.
<?php
// create the XML document with version and encoding
$xml = new DomDocument("1.0", "UTF-8");
$xml->formatOutput = true;
// add the stylesheet PI
$xml->appendChild(
$xml->createProcessingInstruction(
'xml-stylesheet',
'type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"'
)
);
// create the root element
$root = $xml->appendChild($xml->createElement('rss'));
// add the version attribute
$v = $root->appendChild($xml->createAttribute('version'));
$v->appendChild($xml->createTextNode('2.0'));
// add the namespace
$root->setAttributeNS(
'http://www.w3.org/2000/xmlns/',
'xmlns:content',
'http://purl.org/rss/1.0/modules/content/'
);
// create some child elements
$ch = $root->appendChild($xml->createElement('channel'));
// specify the text directly as second argument to
// createElement because it doesn't need escaping
$ch->appendChild($xml->createElement('title', 'Blaakdeer'));
$ch->appendChild($xml->createElement('description', 'Blog RSS'));
$ch->appendChild($xml->createElement('language', 'en-us'));
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$rss = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++) {
if (empty($rss->channel->item[$i])) {
continue;
}
$title = $rss->channel->item[$i]->title;
$description = $rss->channel->item[$i]->description;
$content = $rss->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
$item_el = $ch->appendChild($xml->createElement('item'));
$title_el = $item_el->appendChild($xml->createElement('title'));
// this stuff is unknown so it has to be escaped
// so have to create a separate text node
$title_el->appendChild($xml->createTextNode($title));
$desc_el = $item_el->appendChild($xml->createElement('description'));
// the other alternative is to create a cdata section
$desc_el->appendChild($xml->createCDataSection($description));
// the content:encoded element is not the same as a content element
// the element must be created with the proper namespace prefix
$cont_el = $item_el->appendChild(
$xml->createElementNS(
'http://purl.org/rss/1.0/modules/content/',
'content:encoded'
)
);
$cont_el->appendChild($xml->createCDataSection($content));
}
header("Content-type: text/xml");
echo $xml->saveXML();
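The namespace point also matters on the reading side. A small sketch, using a made-up inline feed, of how SimpleXML exposes content:encoded only through children() with the namespace URI, and not as a plain property:

```php
<?php
// Hypothetical one-item feed that declares the content namespace.
$xml = <<<XML
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel><item>
<title>T</title>
<content:encoded><![CDATA[<p>Body</p>]]></content:encoded>
</item></channel></rss>
XML;

$rss = simplexml_load_string($xml);
$item = $rss->channel->item[0];

// Select children by namespace URI, then pick the element named "encoded".
$ns = 'http://purl.org/rss/1.0/modules/content/';
$encoded = (string) $item->children($ns)->encoded;

// Without the namespace, the element is not visible at all.
$plain = (string) $item->encoded;

echo $encoded, PHP_EOL;
```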
The first error is just a missing attribute, easy enough:
<rss version="2.0" ...>
For the <p> and other HTML elements, you need to escape them. The file should look like this:
&lt;p&gt;...
There are other ways, but this is the easiest way. In PHP you can just call a function to encode entities.
$output .= htmlspecialchars(" <p>Paragraph</p> ");
As for the <content> tag problem, it should be <description> instead. The <content> tag currently generates two errors. Changing it to <description> in both places should fix both errors.
Otherwise it looks like you understand the basics. You <open> and </close> tags, and those have to match. You can also use what are called empty tags: <empty/>, which exist on their own and have neither content nor a closing tag.
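To make the escaping advice concrete, here is a sketch that escapes a made-up title for use inside an XML element and checks that it parses back to the original text. The ENT_XML1 flag is my choice here, not something the answer specifies.

```php
<?php
// Hypothetical text containing characters that are special in XML.
$title = 'Fish & <b>Chips</b>';

// ENT_XML1 restricts the output to entities that are valid in XML.
$escaped = htmlspecialchars($title, ENT_XML1 | ENT_QUOTES, 'UTF-8');
$element = "<title>$escaped</title>";
echo $element, PHP_EOL;

// Round trip: an XML parser turns the entities back into the original text.
$roundTrip = (string) simplexml_load_string($element);
```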

RSS feed not returning anything with PHP?

I am trying to parse this feed with PHP. This is the structure of feed:
<item>
<title> ... TITLE ... </title>
<link> ... LINK .... </link>
<comments> .. COMMENTS .. </comments>
.... More tags here ....
<description><![CDATA[.. HTML ...]]></description>
</item>
This is my PHP code:
$rss = new DOMDocument();
$rss->loadHTML($feed_url);
foreach ($rss->getElementsByTagName('item') as $node) {
$description = $node->getElementsByTagName('description')->item(0)->nodeValue;
echo $description;
}
but it echoes nothing. I have tried using cURL but even then I can't echo the description tag.
What do I need to change in this code for it to work? Please let me know if I need to post the code of the alternate cURL method.
loadHTML() parses a string of HTML markup; it does not fetch a URL, and RSS is XML rather than HTML. To read an RSS feed, use one of the solutions below.
Method 1
$feed_url = 'http://thechive.com/feed/';
$rss = new DOMDocument();
$rss->load($feed_url);
foreach ($rss->getElementsByTagName('item') as $node) {
$description = $node->getElementsByTagName('description')->item(0)->nodeValue;
echo $description;
}
Method 2
$feed_url = 'http://thechive.com/feed/';
$content = file_get_contents($feed_url);
$x = new SimpleXmlElement($content);
foreach($x->channel->item as $entry) {
echo $entry->description;
}
Hope it will help you...
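Both methods assume the feed loads and parses cleanly. A sketch of the same loop with basic failure handling, working on a string so it can be tried without network access; the function name itemDescriptions and the sample feed are made up for illustration.

```php
<?php
// Return the text of every <item>'s <description>, or an empty array
// if the input is not well-formed XML.
function itemDescriptions(string $xml): array
{
    if (trim($xml) === '') {
        return [];
    }
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return [];
    }

    $out = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        $desc = $item->getElementsByTagName('description')->item(0);
        if ($desc !== null) {
            $out[] = $desc->nodeValue; // CDATA content comes back as plain text
        }
    }
    return $out;
}

// Hypothetical two-item feed.
$sample = <<<XML
<rss version="2.0"><channel>
<item><title>A</title><description><![CDATA[<b>Hi</b>]]></description></item>
<item><title>B</title><description>Plain</description></item>
</channel></rss>
XML;

print_r(itemDescriptions($sample));
```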

PHP parsing a georss namespace with simpleXML

Trying to parse out lat/lon from a google maps rss feed:
$file = "http://maps.google.com/maps/ms?ie=UTF8&hl=en&vps=1&jsv=327b&msa=0&output=georss&msid=217909142388190116501.000473ca1b7eb5750ebfe";
$xml = simplexml_load_file($file);
$loc = $xml->channel->item;
echo $loc[0]->title;
echo $loc[0]->point;
The title shows up alright, but point gives me nothing. Each node looks like this:
<item>
<guid isPermaLink="false">0004740950fd067393eb4</guid>
<pubDate>Sun, 20 Sep 2009 21:47:49 +0000</pubDate>
<title>Big Wong King Restaurant</title>
<description><![CDATA[<div dir="ltr">$4.99 full meals!</div>]]></description>
<author>neufuture</author>
<georss:point>
40.716236 -73.998413
</georss:point>
<georss:elev>0.000000</georss:elev>
</item>
<?php
$file = "http://maps.google.com/maps/ms?ie=UTF8&hl=en&vps=1&jsv=327b&msa=0&output=georss&msid=217909142388190116501.000473ca1b7eb5750ebfe";
$xml = simplexml_load_file($file);
$loc = $xml->channel->item;
foreach ($loc->children('http://www.georss.org/georss') as $geo) {
echo $geo;
}
?>
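The snippet above reads the namespaced children of the first item only. A sketch that walks every item and splits the point into latitude and longitude, using an inline sample modeled on the node in the question; the $points array layout is just one possible choice.

```php
<?php
// Inline sample modeled on the item shown in the question.
$xml = <<<XML
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
<channel>
<item>
<title>Big Wong King Restaurant</title>
<georss:point>
40.716236 -73.998413
</georss:point>
<georss:elev>0.000000</georss:elev>
</item>
</channel></rss>
XML;

$feed = simplexml_load_string($xml);
$points = [];
foreach ($feed->channel->item as $item) {
    // Elements in the georss namespace are only reachable via children().
    $geo = $item->children('http://www.georss.org/georss');
    // The point is "lat lon" surrounded by whitespace.
    $parts = preg_split('/\s+/', trim((string) $geo->point));
    if (count($parts) === 2) {
        $points[(string) $item->title] = [(float) $parts[0], (float) $parts[1]];
    }
}
print_r($points);
```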

Extract content from MediaWiki API call (XML, cURL)

URL:
http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Lost_(TV_series)&format=xml
This outputs something like:
<api><parse><text xml:space="preserve">text...</text></parse></api>
How do I get just the content between <text xml:space="preserve"> and </text>?
I used curl to fetch all the content from this URL. So this gives me:
$html = curl_exec($curl_handle);
What's the next step?
Use PHP DOM to parse it. Do it like this:
//you already have input text in $html
$html = '<api><parse><text xml:space="preserve">text...</text></parse></api>';
//parsing begins here:
$doc = new DOMDocument();
$doc->loadXML($html); // the API response is XML, so load it as XML
$nodes = $doc->getElementsByTagName('text');
//display what you need:
echo $nodes->item(0)->nodeValue;
This outputs:
text...
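For a response this small, SimpleXML is a shorter alternative. A sketch assuming the same structure as above; the escaped paragraph inside the text element is made-up sample content.

```php
<?php
// Hypothetical response body; real responses carry escaped HTML inside <text>.
$response = '<api><parse><text xml:space="preserve">&lt;p&gt;text...&lt;/p&gt;</text></parse></api>';

$api = simplexml_load_string($response);
if ($api === false) {
    exit('Could not parse the response');
}

// Casting the element to string returns its unescaped text content.
$text = (string) $api->parse->text;
echo $text, PHP_EOL;
```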

XML Parsing Error

Here I am creating an XML file dynamically at run time, but I'm getting this error:
XML Parsing Error: junk after document element
Location: http://localhost/tam/imagedata.php?imageid=8
Line Number 9, Column 1:
^
$id=$_GET['imageid'];
$dom = new DomDocument('1.0');
$query="select * from tbl_image_gallery where imageId='$id'";
$select=mysql_query($query);
while($res=mysql_fetch_array($select))
{
$content = $dom->appendChild($dom->createElement('content'));
$image = $content->appendChild($dom->createElement('image'));
$small_image_path = $image->appendChild($dom->createElement('small_image_path'));
$small_image_path->appendChild($dom->createTextNode("load/images/small/".$res['image']));
$big_image_path = $image->appendChild($dom->createElement('big_image_path'));
$big_image_path->appendChild($dom->createTextNode("load/images/big/".$res['image']));
$description = $image->appendChild($dom->createElement('description'));
$description->appendChild($dom->createTextNode($res['description']));
$dom->formatOutput = true;
}
echo $test1 = $dom->saveXML();
The XML output is:
<?xml version="1.0"?>
<content>
<image>
<small_image_path>load/images/small/1.jpg</small_image_path>
<big_image_path>load/images/big/1.jpg</big_image_path>
<description>hgjghj</description>
</image>
<image><small_image_path>load/images/small/2.jpg</small_image_path><big_image_path>load/images/big/2.jpg</big_image_path><description>fgsdfg</description></image><image><small_image_path>load/images/small/3.jpg</small_image_path><big_image_path>load/images/big/3.jpg</big_image_path><description>sdfgsdfg</description></image><image><small_image_path>load/images/small/4.jpg</small_image_path><big_image_path>load/images/big/4.jpg</big_image_path><description>gsbhsg</description></image><image><small_image_path>load/images/small/4.jpg</small_image_path><big_image_path>load/images/big/4.jpg</big_image_path><description>gsbhsg</description></image><image><small_image_path>load/images/small/avatar.jpg</small_image_path><big_image_path>load/images/big/avatar.jpg</big_image_path><description></description></image></content>
Could it be that you are posting HTML code into the description field?
It could be useful to add a CDATA section instead of a text node:
$cdata = $dom->createCDATASection($res['description']);
// append to <description>, in place of the createTextNode() call
$description->appendChild($cdata);
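Separately, a well-formed XML document has exactly one root element, so the root is normally created once, before the loop. A sketch of the whole build with that change and the CDATA section, assuming the database rows have already been fetched into an array; the $rows data here is made up.

```php
<?php
// Hypothetical rows standing in for the database result.
$rows = [
    ['image' => '1.jpg', 'description' => 'first <b>image</b>'],
    ['image' => '2.jpg', 'description' => ''],
];

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;

// Create the single root element once, outside the loop.
$content = $dom->appendChild($dom->createElement('content'));

foreach ($rows as $res) {
    $image = $content->appendChild($dom->createElement('image'));

    $small = $image->appendChild($dom->createElement('small_image_path'));
    $small->appendChild($dom->createTextNode('load/images/small/' . $res['image']));

    $big = $image->appendChild($dom->createElement('big_image_path'));
    $big->appendChild($dom->createTextNode('load/images/big/' . $res['image']));

    // CDATA keeps any markup in the description as literal text.
    $description = $image->appendChild($dom->createElement('description'));
    $description->appendChild($dom->createCDATASection($res['description']));
}

$xmlOut = $dom->saveXML();
echo $xmlOut;
```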
