I am adding an RSS feed to my website. I created the RSS.xml index file and next I want to display its contents in a nicely formatted way in a webpage.
Using PHP, I can do this:
$index = file_get_contents ($path . 'RSS.xml');
echo $index;
But all that does is dump the contents as a long stream of text with the tags removed.
I know that treating RSS.xml as a link, like this:
<a href="../blogs/RSS.xml">
<img src="../blogs/feed-icon-16.gif">Blog Index
</a>
causes my browser to parse and display it in a reasonable way when the user clicks the link. However, I want to embed it directly in the web page instead of making the user click through to it.
What is the proper way to do what I want?
Use the following code:
include_once('Simple/autoloader.php');
$feed = new SimplePie();
$feed->set_feed_url($url); // $url is the URL of your RSS feed
$feed->enable_cache(false);
$feed->set_output_encoding('utf-8');
$feed->init();
$items = $feed->get_items();
foreach ($items as $item) {
/* Get the title, link, description and date of each feed item */
$title = $item->get_title();
$link = $item->get_permalink();
$desc = $item->get_description();
$date = $item->get_date();
echo "<h3><a href='$link'>$title</a></h3>";
echo "<p>$date</p><p>$desc</p>";
}
Download the Simple folder from: https://github.com/jewelhuq/Online-News-Grabber/tree/master/worldnews/Simple
Hope it works for you. Here, $url is the URL of your RSS feed. Let me know if it works.
It turns out this is simple with PHP's SimpleXML parser:
$xml = simplexml_load_file ($path . 'RSS.xml');
$channel = $xml->channel;
$channel_title = $channel->title;
$channel_description = $channel->description;
echo "<h1>$channel_title</h1>";
echo "<h2>$channel_description</h2>";
foreach ($channel->item as $item)
{
$title = $item->title;
$link = $item->link;
$descr = $item->description;
echo "<h3><a href='$link'>$title</a></h3>";
echo "<p>$descr</p>";
}
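The SimpleXML approach above can be wrapped in a small function. This is a minimal sketch, not the poster's exact code: it assumes the standard RSS 2.0 layout (<rss><channel><item>), takes the feed as a string instead of reading RSS.xml from $path, and escapes every value with htmlspecialchars(). The helper names render_rss() and esc() and the sample feed are hypothetical.

```php
<?php
// Sketch: render an RSS 2.0 document (given as a string) to HTML with SimpleXML.
// Assumption: the usual <rss><channel><item> layout. All text is escaped.

function esc(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

function render_rss(string $rssXml): string
{
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $xml = simplexml_load_string($rssXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return '<p>Could not parse the feed.</p>';
    }
    $channel = $xml->channel;
    // Cast to string: element access returns SimpleXMLElement objects.
    $out = '<h1>' . esc((string) $channel->title) . '</h1>';
    $out .= '<h2>' . esc((string) $channel->description) . '</h2>';
    foreach ($channel->item as $item) {
        $link = esc((string) $item->link);
        $title = esc((string) $item->title);
        $descr = esc((string) $item->description);
        $out .= "<h3><a href='$link'>$title</a></h3><p>$descr</p>";
    }
    return $out;
}

// Hypothetical sample feed
$sample = <<<XML
<rss version="2.0">
  <channel>
    <title>My Blog</title>
    <description>Posts &amp; notes</description>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <description>Hello</description>
    </item>
  </channel>
</rss>
XML;

echo render_rss($sample);
```

If your descriptions intentionally contain HTML, you could drop esc() for $descr, at the cost of trusting the feed's markup.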
Related
I am trying to get the src of every image on this page (https://www.vfmii.com/exc/aspquery?command=invoke&ipid=HL26423&ids=42337&RM=N) with the code below,
but it outputs nothing. Can you suggest a better way?
<?php
include_once 'simple_html_dom.php';
$html = file_get_html('https://www.vfmii.com/exc/aspquery?command=invoke&ipid=HL26423&ids=42337&RM=N');
// Find all images
foreach($html->find('img') as $element) {
echo $element->src. "<br>";
}
?>
The content is loaded via XHR, but you can fetch the JSON directly:
$js = file_get_contents('https://www.vfmii.com/exc/aspquery?command=invoke&ipid=HL26423&ids=42337&RM=N&out=json&lang=en');
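// The JSON is wrapped in extra characters: strip the 8-character prefix and 2-character suffix
$json = substr($js,8,-2) ;
$data = json_decode($json, true);
// print_r(array_keys($data)); // uncomment to inspect the top-level keys
// Example: print the URL of the last encoding of each item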
foreach ($data['rcoData'] as $rcoData) {
if (isset($rcoData['encodings'])) {
$last = end($rcoData['encodings'])['url'] ;
echo $last ;
}
}
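The substr($js,8,-2) call above assumes the wrapper around the JSON is always exactly 8 characters before and 2 after. A slightly more tolerant alternative, sketched below under the assumption that the response has the JSONP form name( ... ) with an optional trailing semicolon, uses a regular expression instead. unwrap_jsonp() and the sample body are hypothetical; the real response may be shaped differently.

```php
<?php
// Sketch: strip a JSONP-style wrapper such as  callback({...});  without
// hard-coding character offsets. Assumption: body looks like  name( <json> )  plus optional ';'.
function unwrap_jsonp(string $body): ?array
{
    if (!preg_match('/^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$/s', $body, $m)) {
        return null; // not in the expected wrapper format
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : null;
}

// Hypothetical response body with the same keys the answer uses
$body = 'handler({"rcoData":[{"encodings":[{"url":"a.jpg"},{"url":"b.jpg"}]}]});';
$data = unwrap_jsonp($body);
foreach ($data['rcoData'] as $rcoData) {
    if (isset($rcoData['encodings'])) {
        $encodings = $rcoData['encodings'];
        echo end($encodings)['url'], "\n"; // URL of the last encoding
    }
}
```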
The site you're trying to scrape loads its content with JavaScript after the page has loaded. "PHP Simple HTML DOM Parser" only sees the content that is present in the page's initial HTML.
My scraping code works for just about every site I've come across while testing, except for nytimes.com articles. I use Ajax with the following PHP code (I've left out some details to focus on my specific problem):
$link = "http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp";
$article = new DOMDocument;
$article->loadHTMLFile($link);
//generate image array
$images = $article->getElementsByTagName("img");
foreach ($images as $image) {
$source = $image->getAttribute("src");
echo '<img src="' . $source . '" alt="alt"><br><br>';
}
My problem is that the main images on nytimes pages don't seem to be picked up by getElementsByTagName at all. Pinterest, for example, manages to scrape the main images from this page: http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp whereas I cannot. Any suggestions?
I found your question interesting, so here is what I have tried so far.
When I run this in the browser console using jQuery, I do get the images. My query was:
var a= new Array();
$('img[src]').each(function(){ a.push($(this).attr('src'));});
console.log(a);
See also the screenshot of the results.
Note that console.log(arrayName) works in the Chrome console.
So in principle your code should work. Consider adding a check on the query result, as I've done below.
Below is code that loads the URL using a different (perhaps better) approach and shows the root cause of why you get only a single image, the NYT logo.
A screenshot of the resulting HTML is attached.
<?php
$html = file_get_contents("http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp");
echo $html; // dump what the server actually returned
libxml_use_internal_errors(true); // don't print warnings for malformed HTML
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$images = $xpath->query("//img");
if ($images !== false) { // query() returns false on an invalid expression
echo $images->length;
foreach ($images as $image) {
$source = $image->getAttribute('src');
echo '<img src="' . $source . '" alt="alt"><br><br>';
}
}
?>
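As a self-contained variant of the DOMXPath code above, this sketch extracts image sources from an HTML string you already have (for example a page saved to disk). It tolerates malformed markup via libxml_use_internal_errors(). img_sources() and the sample HTML are hypothetical. Like the code above, it only sees images present in the HTML you pass in, not ones added later by JavaScript or shown only to logged-in users.

```php
<?php
// Sketch: list the src of every <img> in an HTML string using DOMDocument + DOMXPath.
function img_sources(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // tolerate real-world, invalid HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $sources = [];
    foreach ($xpath->query('//img[@src]') as $img) { // only <img> elements that have a src
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

$html = '<html><body><img src="logo.png"><p>text</p><img alt="no src"><img src="/photo.jpg"></body></html>';
print_r(img_sources($html));
```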
You can't get the content this way unless you are authenticated.
You can try:
- Using the context parameter of file_get_contents (a stream context).
- Consuming the RSS/Atom feeds of the article.
- Downloading the page as HTML and then loading the saved file with file_get_contents. PS: This works.
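For the first suggestion, a stream context is passed as the third argument of file_get_contents(). This minimal sketch only builds the context; the User-Agent and Cookie values are placeholders (hypothetical), and there is no guarantee a particular site will accept them. make_http_context() is a hypothetical helper.

```php
<?php
// Sketch: build an HTTP stream context carrying extra request headers.
// Header values used below are hypothetical placeholders.
function make_http_context(array $headers, int $timeout = 10)
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => implode("\r\n", $lines), // one "Name: value" per line
            'timeout' => $timeout,                // seconds
        ],
    ]);
}

$context = make_http_context([
    'User-Agent' => 'Mozilla/5.0 (compatible; MyFeedReader/1.0)', // hypothetical
    'Cookie'     => 'session=YOUR_SESSION_COOKIE',                 // hypothetical
]);
// $html = file_get_contents($url, false, $context);
```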
I'm trying to read the XML that Tumblr provides to build a kind of news feed from a Tumblr blog, but I'm very stuck.
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if (!$xml)
{
exit('Failed to retrieve data.');
}
else
{
foreach ($xml->posts[0] AS $post)
{
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo .$title.;
echo '<p>'.$small_post.'</p>';
}
}
?>
It always breaks as soon as it tries to go through the nodes, so basically "tumblr->posts;...etc." is displayed on my HTML page.
I've tried saving the information as a local XML file. I've tried creating the SimpleXML object in different ways, such as loading it as a string (probably a silly idea). I double-checked that my web hosting was running PHP 5. So basically, I'm stuck on why this isn't working.
EDIT: OK, I changed the starting point back to the original (starting from tumblr was just another, admittedly silly, attempt to fix it). It still breaks right after the first ->, so "posts[0] AS $post...etc." is displayed on screen.
This is the first thing I've ever done in PHP so there might be something obvious that I should have set up beforehand or something. I don't know and couldn't find anything like that though.
This should work (note the corrected echo $title; line):
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if ( !$xml ){
exit('Failed to retrieve data.');
}else{
foreach ( $xml->posts[0] AS $post){
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo $title;
echo '<p>'.$small_post.'</p>';
echo '<hr>';
}
}
The first problem in your code was that you went through the root element, which you should not do.
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if (!$xml)
{
exit('Failed to retrieve data.');
}
else
{
foreach ($xml->posts->post as $post)
{
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo $title;
echo '<p>'.$small_post.'</p>';
}
}
?>
$xml->posts returns the posts nodes, so to iterate over the post nodes use $xml->posts->post, which iterates through the post nodes inside the first posts node.
Also, as Needhi pointed out, you shouldn't go through the root node (tumblr), because $xml itself represents the root node. (I have fixed my answer accordingly.)
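To make the difference concrete, here is a sketch that iterates $xml->posts->post against a small, hypothetical sample shaped like a Tumblr v1 API response (a real response has more elements and attributes). It casts values to string explicitly instead of overwriting the loop variable $post. post_summaries() is a hypothetical helper.

```php
<?php
// Sketch: $xml is already the root <tumblr> element, so start at $xml->posts.
function post_summaries(SimpleXMLElement $xml, int $length = 320): array
{
    $summaries = [];
    foreach ($xml->posts->post as $post) {
        // Cast to string: element access returns SimpleXMLElement objects.
        $title = (string) $post->{'regular-title'};
        $body  = (string) $post->{'regular-body'};
        $summaries[] = ['title' => $title, 'body' => substr($body, 0, $length)];
    }
    return $summaries;
}

// Hypothetical, trimmed-down sample response
$sample = <<<XML
<tumblr version="1.0">
  <posts start="0" total="2">
    <post id="1" type="regular">
      <regular-title>Hello</regular-title>
      <regular-body>First body</regular-body>
    </post>
    <post id="2" type="regular">
      <regular-title>Second</regular-title>
      <regular-body>Another body</regular-body>
    </post>
  </posts>
</tumblr>
XML;

$xml = simplexml_load_string($sample);
foreach (post_summaries($xml) as $s) {
    echo $s['title'], ': ', $s['body'], "\n";
}
```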
I can't seem to solve this. I want to get the media:thumbnail from an RSS file (http://feeds.bbci.co.uk/news/rss.xml).
I did some research and tried to incorporate insights from
https://stackoverflow.com/questions/6707315/getting-xml-attribute-from-mediathumbnail-in-bbc-rss-feed
and from other sources.
This is what I got:
$source_link = "http://feeds.bbci.co.uk/news/rss.xml";
$source_xml = simplexml_load_file($source_link);
$namespace = "http://search.yahoo.com/mrss/";
foreach ($source_xml->channel->item as $rss) {
$title = $rss->title;
$description = $rss->description;
$link = $rss->link;
$date_raw = $rss->pubDate;
$date = date("Y-m-j G:i:s", strtotime($date_raw));
$image = $rss->attributes($namespace);
print_r($image);
}
When I run the script, all I see is a white page. If I echo or print_r any of the other variables, it works like a charm. Only $image causes problems. Why isn't this working? Thanks for any help!
OK, it works now. I replaced
$image = $rss->attributes($namespace);
with
$image = $rss->children($namespace)->thumbnail[1]->attributes();
$image_link = $image['url'];
and it works like a charm now.
Based on a blog post titled "Processing media:thumbnail in RSS feeds with php".
The solution I found works best: load the XML file as a string, replace 'media:thumbnail' with a plain 'thumbnail', and then convert it back to XML with simplexml_load_string:
$xSource = 'http://feeds.bbci.co.uk/news/rss.xml';
$xsourcefile = file_get_contents( $xSource );
// Drop the prefix so SimpleXML can reach the element without namespaces
$xsourcefile = str_replace("media:thumbnail","thumbnail",$xsourcefile);
$xml = simplexml_load_string( $xsourcefile );
foreach ($xml->channel->item as $item) {
echo ':' . $item->title . '<BR>';
echo ':' . $item->thumbnail['url'] . '<BR>';
}
$image = $rss->attributes($namespace);
This says "Give me all attributes of this <item> element which are in the media namespace". There are no attributes on the item element (much less any in the media namespace), so this returns nothing.
You want this:
$firstimage = $rss->children($namespace)->thumbnail[0];
$firstimage_url = (string) $firstimage->attributes()['url'];
BTW, when you use SimpleXML you need to be careful to cast your SimpleXMLElements to string when you need the text value of the element. Something like $rss->title is a SimpleXMLElement, not a string.
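Putting the children()/attributes() distinction together: media:thumbnail is a child element in the Media RSS namespace, while its url attribute has no prefix, so it is read with attributes() (as in the self-answer above). This sketch collects every thumbnail URL of an item and casts each to string. The sample feed and thumbnail_urls() are hypothetical; the real BBC feed may carry a different number of thumbnails per item.

```php
<?php
// Sketch: read media:thumbnail URLs from RSS items with SimpleXML.
const MEDIA_NS = 'http://search.yahoo.com/mrss/';

function thumbnail_urls(SimpleXMLElement $item): array
{
    $urls = [];
    // children(MEDIA_NS) switches to the media namespace to find <media:thumbnail>
    foreach ($item->children(MEDIA_NS)->thumbnail as $thumb) {
        $attrs = $thumb->attributes();     // unprefixed attributes such as url
        $urls[] = (string) $attrs['url']; // cast to get a plain string
    }
    return $urls;
}

// Hypothetical, trimmed-down sample feed
$sample = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Story</title>
      <media:thumbnail width="66" height="49" url="http://example.com/small.jpg"/>
      <media:thumbnail width="144" height="81" url="http://example.com/large.jpg"/>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($sample);
foreach ($xml->channel->item as $item) {
    echo (string) $item->title, "\n";
    print_r(thumbnail_urls($item));
}
```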
I want to pull the news feed from "http://rapfix.mtv.com/feed" for a website that I'm creating. I have everything working except pulling the image URL for each article.
In this feed, the image URL is showing up like this in the code:
<media:content url="http://rapfix.mtv.com/wp-content/uploads/2011/05/tyler-handcuff.jpg" type="image/jpeg" height="300" width="575">
<media:text type="plain"><![CDATA[tyler-handcuff]]></media:text>
</media:content>
I read in another Stack Overflow question that you can pull information from the node using something like this:
$item_pic = $article->getElementsByTagNameNS('http://purl.org/rss/1.0/modules/content/', 'content')->item(0);
Now I'm trying to get the url attribute out of it. Here's my code:
$xml=("http://rapfix.mtv.com/feed");
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
$x = $xmlDoc->getElementsByTagName('item');
foreach($x as $article){
$item_title = $article->getElementsByTagName('title')->item(0)->nodeValue;
$item_link = $article->getElementsByTagName('link')->item(0)->nodeValue;
$item_desc = $article->getElementsByTagName('description')->item(0)->nodeValue;
$item_pic = $article->getElementsByTagNameNS('http://purl.org/rss/1.0/modules/content/', 'content')->item(0);
echo ("<strong><a href='".$item_link."' target='_blank'>".$item_title."</a></strong><br />");
echo ("<div><div class='FloatLeft'><img src='".$item_pic."' width='100' height='100'/></div><div class='FloatLeft'>".$item_desc." - <a href='".$item_link."' target='_blank'>Read More</a></div>^");
}
Any ideas on how to get this done?
Your target element's namespace prefix is media and its local name is content. The media prefix is bound to the namespace URI http://search.yahoo.com/mrss/. Thus:
foreach($x as $article)
{
$nlContent = $article->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content');
if( $nlContent->length > 0 )
$item_pic = $nlContent->item(0)->getAttribute('url');
else
$item_pic = '/images/noimageavailable.jpg';
echo $item_pic . "\n";
}
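The same approach as a self-contained function: this sketch loads the feed from a string, looks up media:content in the http://search.yahoo.com/mrss/ namespace for each <item>, and falls back to the placeholder path from the answer when an item has none. item_images() and the sample feed are hypothetical.

```php
<?php
// Sketch: first media:content url per <item>, via DOMDocument::getElementsByTagNameNS.
const MRSS_NS = 'http://search.yahoo.com/mrss/';

function item_images(string $rssXml, string $fallback = '/images/noimageavailable.jpg'): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($rssXml)) {
        return []; // not well-formed XML
    }
    $images = [];
    foreach ($doc->getElementsByTagName('item') as $article) {
        // Match by namespace URI + local name, not by the "media:" prefix
        $nlContent = $article->getElementsByTagNameNS(MRSS_NS, 'content');
        $images[] = $nlContent->length > 0
            ? $nlContent->item(0)->getAttribute('url')
            : $fallback;
    }
    return $images;
}

// Hypothetical sample feed: one item with an image, one without
$sample = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>With image</title>
      <media:content url="http://example.com/tyler.jpg" type="image/jpeg" height="300" width="575">
        <media:text type="plain"><![CDATA[tyler]]></media:text>
      </media:content>
    </item>
    <item>
      <title>Without image</title>
    </item>
  </channel>
</rss>
XML;

foreach (item_images($sample) as $url) {
    echo $url, "\n";
}
```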