0Here is the relevant part of my rss feed:
<channel>
<description></description>
<title>Untitled</title>
<generator>Tumblr (3.0; #xxx)</generator>
<link>http://xxx.tumblr.com/</link>
<item>
<title>Title</title>
<description><figure><img src="https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg"/></figure></description>
<link>http://xxx.tumblr.com/post/99569244093</link>
<guid>http://xxx.tumblr.com/post/99569244093</guid>
<pubDate>Thu, 09 Oct 2014 11:19:33 -0400</pubDate>
</item>
</channel>
Using the answer from other questions on here I tried this:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$imgs = $feed->channel->item[0]->description->xpath('//img');
foreach($imgs as $image) {
echo (string)$image['src'];
};
This is returning an empty array for $imgs
Does it have something to do with the tags being < > etc?
and if so what can I do?
You can get it from the description, which seems to include a HTML image tag for the image, by using a simple regular expression with preg_match:
$content = file_get_contents("http://xxx.tumblr.com/rss");
$feed = new SimpleXmlElement($content);
$img = (string)$feed->channel->item[0]->description;
if (preg_match('/src="(.*?)"/', $img, $matches)) {
$src = $matches[1];
echo "src = $src", PHP_EOL;
}
Output:
src = http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg
Before you can use xapth() on the description, you need to create a new XML document out of it:
$url = "http://xxx.tumblr.com/rss";
$desc = simplexml_load_file($url)->xpath('//item/description[1]')[0];
$src = simplexml_load_string("<x>$desc</x>")->xpath('//img/#src')[0];
echo $src;
Output:
http://40.media.tumblr.com/58d24c3009638514325b113859ba369f/tumblr_nk0mwfhKXU1sl87kjo1_500.jpg
I'm not sure if you can use this approach - as already mentioned by kjhughes as comment, your input XML does not contain any img element. But it's possible to retrieve the image source using XPath substring-functions:
substring-before(substring-after(substring-after(//item/description[contains(.,'img')],
'src='),'"'),'"')
Result:
https://31.media.tumblr.com/c78c7t3abd23423549d3bb0f705/tumblr_inline_nkp9z234d0uj.jpg
Related
I'm trying to get an RSS feed, change some text, and then serve it again as an RSS feed. However, the code I've written doesn't validate properly. I get these errors:
line 3, column 0: Missing rss attribute: version
line 14, column 6: Undefined item element: content (10 occurrences)
Here is my code:
<?php
header("Content-type: text/xml");
echo "<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl'?>
<?xml-stylesheet type='text/xsl' media='screen'
href='/~d/styles/rss2full.xsl'?>
<rss xmlns:content='http://purl.org/rss/1.0/modules/content/'>
<channel>
<title>Blaakdeer</title>
<description>Blog RSS</description>
<language>en-us</language>
";
$html = "";
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$xml = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++){
$title = $xml->channel->item[$i]->title;
$description = $xml->channel->item[$i]->description;
$content = $xml->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
echo "<item>
<title>$title</title>
<description>$description</description>
<content>$content</content>
</item>";
}
echo "</channel></rss>";
Just as you don't treat XML as a string when parsing it, you don't treat it as as string when you create it. Use the proper tools to create your XML; in this case, the DomDocument class.
You had a number of problems with your XML; biggest is that you were creating a <content> element, but the original RSS had a <content:encoded> element. That means the element name is encoded but it's in the content namespace. Big difference between that and an element named content. I've added comments to explain the other steps.
<?php
// create the XML document with version and encoding
$xml = new DomDocument("1.0", "UTF-8");
$xml->formatOutput = true;
// add the stylesheet PI
$xml->appendChild(
$xml->createProcessingInstruction(
'xml-stylesheet',
'type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"'
)
);
// create the root element
$root = $xml->appendChild($xml->createElement('rss'));
// add the version attribute
$v = $root->appendChild($xml->createAttribute('version'));
$v->appendChild($xml->createTextNode('2.0'));
// add the namespace
$root->setAttributeNS(
'http://www.w3.org/2000/xmlns/',
'xmlns:content',
'http://purl.org/rss/1.0/modules/content/'
);
// create some child elements
$ch = $root->appendChild($xml->createElement('channel'));
// specify the text directly as second argument to
// createElement because it doesn't need escaping
$ch->appendChild($xml->createElement('title', 'Blaakdeer'));
$ch->appendChild($xml->createElement('description', 'Blog RSS'));
$ch->appendChild($xml->createElement('language', 'en-us'));
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$rss = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++) {
if (empty($rss->channel->item[$i])) {
continue;
}
$title = $rss->channel->item[$i]->title;
$description = $rss->channel->item[$i]->description;
$content = $rss->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
$item_el = $ch->appendChild($xml->createElement('item'));
$title_el = $item_el->appendChild($xml->createElement('title'));
// this stuff is unknown so it has to be escaped
// so have to create a separate text node
$title_el->appendChild($xml->createTextNode($title));
$desc_el = $item_el->appendChild($xml->createElement('description'));
// the other alternative is to create a cdata section
$desc_el->appendChild($xml->createCDataSection($description));
// the content:encoded element is not the same as a content element
// the element must be created with the proper namespace prefix
$cont_el = $item_el->appendChild(
$xml->createElementNS(
'http://purl.org/rss/1.0/modules/content/',
'content:encoded'
)
);
$cont_el->appendChild($xml->createCDataSection($content));
}
header("Content-type: text/xml");
echo $xml->saveXML();
The first error is just a missing attribute, easy enough:
<rss version="2.0" ...>
For the <p> and other HTML elements, you need to escape them. The file should look like this:
<p>...
There are other ways, but this is the easiest way. In PHP you can just call a function to encode entities.
$output .= htmlspecialchars(" <p>Paragraph</p> ");
As for the <content> tag problem, it should be <description> instead. The <content> tag currently generates two errors. Changing it to <description> in both places should fix both errors.
Otherwise it looks like you understand the basics. You <open> and </close> tags and those have to match. You can also use what is called empty tags: <empty/> which exist on their own but to not include content and no closing tag.
I am trying to parse this feed with PHP. This is the structure of feed:
<item>
<title> ... TITLE ... </title>
<link> ... LINK .... </link>
<comments> .. COMMENTS .. </comments>
.... More tags here ....
<description><![CDATA[.. HTML ...]]></description>
</item>
This is my PHP code:
$rss = new DOMDocument();
$rss->loadHTML($feed_url);
foreach ($rss->getElementsByTagName('item') as $node) {
$description = $node->getElementsByTagName('description')->item(0)->nodeValue;
echo $description;
}
but it echoes nothing. I have tried using cURL but even then I can't echo the description tag.
What do I need to change in this code for it to work? Please let me know If I need to post the code of alternate cURL method.
loadHTML is used to load html content, to read rss use below solution
Method 1
$feed_url = 'http://thechive.com/feed/';
$rss = new DOMDocument();
$rss->load($feed_url);
foreach ($rss->getElementsByTagName('item') as $node) {
$description = $node->getElementsByTagName('description')->item(0)->nodeValue;
echo $description;
}
Method 2
$feed_url = 'http://thechive.com/feed/';
$content = file_get_contents($feed_url);
$x = new SimpleXmlElement($content);
foreach($x->channel->item as $entry) {
echo $entry->description;
}
Hope it will help you...
Trying to parse out lat/lon from a google maps rss feed:
$file = "http://maps.google.com/maps/ms?ie=UTF8&hl=en&vps=1&jsv=327b&msa=0&output=georss&msid=217909142388190116501.000473ca1b7eb5750ebfe";
$xml = simplexml_load_file($file);
$loc = $xml->channel->item;
echo $loc[0]->title;
echo $loc[0]->point;
The title shows up alright, but point gives me nothing. Each node looks like this:
<item>
<guid isPermaLink="false">0004740950fd067393eb4</guid>
<pubDate>Sun, 20 Sep 2009 21:47:49 +0000</pubDate>
<title>Big Wong King Restaurant</title>
<description><![CDATA[<div dir="ltr">$4.99 full meals!</div>]]></description>
<author>neufuture</author>
<georss:point>
40.716236 -73.998413
</georss:point>
<georss:elev>0.000000</georss:elev>
</item>
<?php
$file = "http://maps.google.com/maps/ms?ie=UTF8&hl=en&vps=1&jsv=327b&msa=0&output=georss&msid=217909142388190116501.000473ca1b7eb5750ebfe";
$xml = simplexml_load_file($file);
$loc = $xml->channel->item;
foreach ($loc->children('http://www.georss.org/georss') as $geo) {
echo $geo;
}
?>
URL:
http://en.wikipedia.org/w/api.php?action=parse&prop=text&page=Lost_(TV_series)&format=xml
This outputs something like:
<api><parse><text xml:space="preserve">text...</text></parse></api>
How do I get just the content between <text xml:space="preserve"> and </text>?
I used curl to fetch all the content from this URL. So this gives me:
$html = curl_exec($curl_handle);
What's the next step?
Use PHP DOM to parse it. Do it like this:
//you already have input text in $html
$html = '<api><parse><text xml:space="preserve">text...</text></parse></api>';
//parsing begins here:
$doc = new DOMDocument();
#$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('text');
//display what you need:
echo $nodes->item(0)->nodeValue;
This outputs:
text...
here i am creating xml file dynamically at run time but i m getting error
XML Parsing Error: junk after document element
Location: http://localhost/tam/imagedata.php?imageid=8
Line Number 9, Column 1:
^
$id=$_GET['imageid'];
$dom = new DomDocument('1.0');
$query="select * from tbl_image_gallery where imageId='$id'";
$select=mysql_query($query);
while($res=mysql_fetch_array($select))
{
$content = $dom->appendChild($dom->createElement('content'));
$image = $content->appendChild($dom->createElement('image'));
$small_image_path = $image->appendChild($dom->createElement('small_image_path'));
$small_image_path->appendChild($dom->createTextNode("load/images/small/".$res['image']));
$big_image_path = $image->appendChild($dom->createElement('big_image_path'));
$big_image_path->appendChild($dom->createTextNode("load/images/big/".$res['image']));
$description = $image->appendChild($dom->createElement('description'));
$description->appendChild($dom->createTextNode($res['description']));
$dom->formatOutput = true;
}
echo $test1 = $dom->saveXML();
and xml format is
<?xml version="1.0"?>
<content>
<image>
<small_image_path>load/images/small/1.jpg</small_image_path>
<big_image_path>load/images/big/1.jpg</big_image_path>
<description>hgjghj</description>
</image>
<image><small_image_path>load/images/small/2.jpg</small_image_path><big_image_path>load/images/big/2.jpg</big_image_path><description>fgsdfg</description></image><image><small_image_path>load/images/small/3.jpg</small_image_path><big_image_path>load/images/big/3.jpg</big_image_path><description>sdfgsdfg</description></image><image><small_image_path>load/images/small/4.jpg</small_image_path><big_image_path>load/images/big/4.jpg</big_image_path><description>gsbhsg</description></image><image><small_image_path>load/images/small/4.jpg</small_image_path><big_image_path>load/images/big/4.jpg</big_image_path><description>gsbhsg</description></image><image><small_image_path>load/images/small/avatar.jpg</small_image_path><big_image_path>load/images/big/avatar.jpg</big_image_path><description></description></image></content>
Can it be that you are posting html code into the description field?
Could be usefull to add a CDataSection instead of a TextNode
$cdata = $dom->createCDATASection($res['description']);
$image->appendChild($cdata);