How to display RSS feeds from other sites - php

I have been researching this topic for a few days now and I'm still none the wiser as to how to do it.
I want to pull an RSS feed from forexfactory.com into my website. I want to do some formatting on what's happening, and I also want the latest information from them (although those last two points can wait as long as I have some sort of feed running).
Preferably I'd like to develop this from the ground up, if anyone knows of a tutorial or something I could use.
If not, I will settle for using a third-party API or something like that, as long as I get to do some of the work.
I'm not sure what it is, but there is something about RSS that I'm not getting, so if anyone knows of any good, probably basic, tutorials, that would help me out a lot. It's kind of hard going through page after page of Google searches.
Also, I'm not too fussed about the language it's output in; JavaScript, PHP or HTML would be great, though.
Thanks for the help.

It looks like SimplePie may be what you are looking for. It's a very basic RSS plugin which is quite easy to use and is customisable too. You can download it from the website.
You can use it at its bare bones or you can delve deeper into the plugin if you wish. Here's a demo on their website.

index.php
<?php
include('rss_class.php');
// $feed_url must hold the URL of the RSS feed to display.
$feedlist = new rss($feed_url);
echo $feedlist->display(2,"Feed Title");
rss_class.php
<?php
class rss {
public $feed;
// PHP 8 no longer treats a method named after the class as a constructor.
function __construct($feed){
$this->feed = $feed;
}
function parse(){
$rss = simplexml_load_file($this->feed);
//print_r($rss);die; /// Check here for attributes
$rss_split = array();
foreach ($rss->channel->item as $item) {
$title = (string) $item->title;
$link = (string) $item->link;
$pubDate = (string) $item->pubDate;
$description = (string) $item->description;
// Read the enclosure of the current item (the original always read the first item's).
$image_url = isset($item->enclosure['url']) ? (string) $item->enclosure['url'] : '';
$rss_split[] = '
<li>
<h5>'.$title.'</h5>
<span class="dateWrap">'.$pubDate.'</span>
<p>'.$description.'</p>
<a href="'.$link.'">Read Full Story</a>
</li>
';
}
return $rss_split;
}
function display($numrows,$head){
$rss_split = $this->parse();
$i = 0;
$rss_data = '<h2>'.$head.'</h2><ul class="newsBlock">';
while($i<$numrows && $i<count($rss_split)){ // stop early if the feed has fewer items
$rss_data .= $rss_split[$i];
$i++;
}
$trim = str_replace('', '',$this->feed);
$user = str_replace('&lang=en-us&format=rss_200','',$trim);
$rss_data.='</ul>';
return $rss_data;
}
}
?>

I didn't incorporate the <TABLE> tags, as there might be more than one article that you would like to display.
class RssFeed
{
public $rss = "";
public function __construct($article)
{
$this->rss = simplexml_load_file($article, 'SimpleXMLElement', LIBXML_NOERROR | LIBXML_NOWARNING);
if($this->rss !== false)
{
printf("<TR>\r\n");
printf("<TD>\r\n");
printf("<h3>%s</h3>\r\n", $this->rss->channel->title);
printf("</TD></TR>\r\n");
foreach($this->rss->channel->item as $value)
{
printf("<TR>\r\n");
printf("<TD id=\"feedmiddletd\">\r\n");
printf("<A target=\"_blank\" HREF=\"%s\">%s</A><BR/>\r\n", $value->link, $value->title);
printf("%s\r\n", $value->description); // never pass feed text as the format string
printf("</TD></TR>\r\n");
}
}
}
}
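
For anyone still unsure what RSS actually is: an RSS 2.0 feed is plain XML, with a channel element that contains one item element per entry. Here is a minimal sketch that keeps the parsing separate from the HTML output. The sample feed and the function name rss_items are made up for illustration; a real feed would be loaded with simplexml_load_file($feed_url) as in the answers above.

```php
<?php
// Hypothetical sample feed; a real one would come from simplexml_load_file($feed_url).
$sample = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First</title><link>http://example.com/1</link><description>One</description></item>
    <item><title>Second</title><link>http://example.com/2</link><description>Two</description></item>
    <item><title>Third</title><link>http://example.com/3</link><description>Three</description></item>
  </channel>
</rss>
XML;

// Return at most $limit items as plain arrays (hypothetical helper).
function rss_items(string $xml, int $limit): array
{
    $rss = simplexml_load_string($xml);
    if ($rss === false) {
        return []; // the input was not well-formed XML
    }
    $items = [];
    foreach ($rss->channel->item as $item) {
        if (count($items) >= $limit) {
            break;
        }
        $items[] = [
            'title'       => (string) $item->title,
            'link'        => (string) $item->link,
            'description' => (string) $item->description,
        ];
    }
    return $items;
}

foreach (rss_items($sample, 2) as $entry) {
    echo $entry['title'], ' -> ', $entry['link'], "\n";
}
```

Once the items are plain arrays, the formatting can be done however you like, without touching the parsing code.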

Related

make relative links into absolute links, with simplehtmldom

I have a script that I'm trying to finish with simple_html_dom.
I can scrape the web pages I want, but the links are invalid. I want to make the links valid, so I've been trying different things, without getting it to work.
I can get it to scrape, or to fix links from a previously saved page, but I can't seem to both scrape the pages and fix the links so they reference the correct domain.
I might be misusing or misunderstanding how to use simplehtmldom's "save" function.
Here is what I've got right now:
<?php
include 'simple_html_dom.php';
$file1 = "http://www.indeed.com/jobs?q=Electrician&l=maine";
$file2 = "http://www.indeed.com/jobs?q=Electronic&l=maine";
$file3 = "http://www.indeed.com/jobs?q=Electronics+Tech&l=maine";
$file4 = "http://www.indeed.com/jobs?q=Helpdesk&l=maine";
$file5 = "http://www.indeed.com/jobs?q=Trades&l=maine";
$SEARCH = array($file1, $file2, $file3, $file4, $file5);
//Fix links
$domain = "http://www.indeed.com";
$rep['/href="(?!https?:\/\/)(?!data:)(?!#)/'] = 'href="'.$domain;
$rep['/src="(?!https?:\/\/)(?!data:)(?!#)/'] = 'src="'.$domain;
$rep['/#import[\n+\s+]"\//'] = '#import "'.$domain;
$rep['/#import[\n+\s+]"\./'] = '#import "'.$domain;
//Find this: data-tn-component="organicJob"
//<div class=" row result" id="p_a8a968e2788dad48" data-jk="a8a968e2788dad48" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">
$html = new simple_html_dom();
for ($i = 0; $i<6; $i++)
{
$html->load_file($SEARCH[$i]);
foreach($html->find('div[data-tn-component="organicJob"]') as $div)
{
$str = $html->save($div);
$output = preg_replace(array_keys($rep), array_values($rep), $str);
echo $output->innertext . "\n";
}
}
?>
How can I scrape the pages and fix the links to point to the correct domain?
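
One possible direction, offered as a sketch rather than a tested fix for this page: preg_replace() returns a plain string, so its result can be echoed directly instead of reading ->innertext from it, and in simple_html_dom the markup of a node should be available as $div->outertext. The link rewriting itself can be tried in isolation. The helper name absolutize_links is made up, and it only handles root-relative URLs (those starting with a single slash):

```php
<?php
// Hypothetical helper: prefix root-relative href/src values with a domain.
function absolutize_links(string $html, string $domain): string
{
    $domain = rtrim($domain, '/');
    // Match href="/..." or src="/..." but leave protocol-relative "//..." URLs alone.
    return preg_replace('~\b(href|src)="/(?!/)~', '$1="' . $domain . '/', $html);
}

$snippet = '<a href="/rc/clk?jk=abc">Job</a> <a href="http://example.com/x">Ext</a>';
echo absolutize_links($snippet, 'http://www.indeed.com'), "\n";
```

Inside the loop this would be called on each node's markup, for example echo absolutize_links($div->outertext, $domain);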

Display rss xml index in html

I am adding an RSS feed to my website. I created the RSS.xml index file and next I want to display its contents in a nicely formatted way in a webpage.
Using PHP, I can do this:
$index = file_get_contents ($path . 'RSS.xml');
echo $index;
But all that does is dump the contents as a long stream of text with the tags removed.
I know that treating RSS.xml as a link, like this:
<a href="../blogs/RSS.xml">
<img src="../blogs/feed-icon-16.gif">Blog Index
</a>
causes my browser to parse and display it in a reasonable way when the user clicks on the link. However I want to embed it directly in the web page and not make the user go through another click.
What is the proper way to do what I want?
Use the following code:
include_once('Simple/autoloader.php');
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->enable_cache(false);
$feed->set_output_encoding('utf-8');
$feed->init();
$i=0;
$items = $feed->get_items();
foreach ($items as $item) {
$i++;
/*You are getting title,description,date of your rss by the following code*/
$title = $item->get_title();
$url = $item->get_permalink();
$desc = $item->get_description();
$date = $item->get_date();
}
Download the Simple folder data from : https://github.com/jewelhuq/Online-News-Grabber/tree/master/worldnews/Simple
Hope it works for you. Here $url is your RSS feed URL. If it works, please let me know.
Turns out, it's simple using PHP's XML parser function:
$xml = simplexml_load_file ($path . 'RSS.xml');
$channel = $xml->channel;
$channel_title = $channel->title;
$channel_description = $channel->description;
echo "<h1>$channel_title</h1>";
echo "<h2>$channel_description</h2>";
foreach ($channel->item as $item)
{
$title = $item->title;
$link = $item->link;
$descr = $item->description;
echo "<h3><a href='$link'>$title</a></h3>";
echo "<p>$descr</p>";
}
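
One caveat when echoing feed values straight into a page: titles and links can contain characters such as & or quotes that break the surrounding HTML. Below is a small sketch of escaping them, assuming the title and link are meant as plain text. The helper name render_item and the sample item are made up; descriptions often carry intentional HTML, so whether to escape those is a separate decision.

```php
<?php
// Hypothetical helper: build the markup for one item, escaping text and attribute values.
function render_item(SimpleXMLElement $item): string
{
    $title = htmlspecialchars((string) $item->title, ENT_QUOTES, 'UTF-8');
    $link  = htmlspecialchars((string) $item->link, ENT_QUOTES, 'UTF-8');
    return "<h3><a href='$link'>$title</a></h3>";
}

// Invented sample item with characters that need escaping.
$item = simplexml_load_string(
    "<item><title>Tom &amp; Jerry's blog</title><link>http://example.com/?a=1&amp;b=2</link></item>"
);
echo render_item($item), "\n";
```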

Replacing link with plain text with php simple html dom

I have a program that removes certain pages from a web; I want to then traverse the remaining pages and "unlink" any links to those removed pages. I'm using simplehtmldom. My function takes a source page ($source) and an array of pages ($skipList). It finds the links, and I'd like to then manipulate the DOM to convert each matching element into its $link->innertext, but I don't know how. Any help?
function RemoveSpecificLinks($source, $skipList) {
// $source is the html source file;
// $skipList is an array of link destinations (hrefs) that we want unlinked
$docHtml = file_get_contents($source);
$htmlObj = str_get_html($docHtml);
$links = $htmlObj->find('a');
if (isset($links)) {
foreach ($links as $link) {
if (in_array($link->href, $skipList)) {
$link->href = ''; // Should convert to simple text element
}
}
}
$docHtml = $htmlObj->save();
$htmlObj->clear();
unset($htmlObj);
return($docHtml);
}
I have never used simplehtmldom, but this is what I think should solve your problem:
function RemoveSpecificLinks($source, $skipList) {
// $source is the HTML source file;
// $skipList is an array of link destinations (hrefs) that we want unlinked
$docHtml = file_get_contents($source);
$htmlObj = str_get_html($docHtml);
$links = $htmlObj->find('a');
if (isset($links)) {
foreach ($links as $link) {
if (in_array($link->href, $skipList)) {
$link->outertext = $link->plaintext; // THIS SHOULD WORK
// IF THIS DOES NOT WORK TRY:
// $link->outertext = $link->innertext;
}
}
}
$docHtml = $htmlObj->save();
$htmlObj->clear();
unset($htmlObj);
return($docHtml);
}
Please let me know whether this worked, and if so, which method worked.
Update: Maybe you would prefer this:
$link->outertext = $link->href;
This way you get the link displayed, but not clickable.
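
If simplehtmldom gives trouble, roughly the same thing can be done with PHP's built-in DOM extension. Note that this is a different technique from the answer above, not the simplehtmldom API. The function name unlink_anchors and the sample markup are made up for illustration:

```php
<?php
// Hypothetical helper: replace <a> elements whose href is in $skipList with their text.
function unlink_anchors(string $html, array $skipList): string
{
    $doc = new DOMDocument();
    // Suppress warnings from real-world markup that is not perfectly valid.
    @$doc->loadHTML($html);
    // Copy the live node list first; replacing nodes while iterating it can skip entries.
    $links = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($links as $link) {
        if (in_array($link->getAttribute('href'), $skipList, true)) {
            $text = $doc->createTextNode($link->textContent);
            $link->parentNode->replaceChild($text, $link);
        }
    }
    return $doc->saveHTML();
}

$page = '<html><body><p><a href="old.html">Old page</a> and <a href="keep.html">Kept page</a></p></body></html>';
echo unlink_anchors($page, ['old.html']);
```

The text of the removed link stays in place, and links that are not in the skip list are left untouched.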

How to get 5 day Yahoo weather forecast

I've been working on a weather feed for my website.
I'm currently only able to get forecasts for the next 2 days. I want forecasts for the next 5 days.
Here's my code:
$ipaddress = $_SERVER['REMOTE_ADDR'];
$locationstr = "http://api.locatorhq.com/?user=MYAPIUSER&key=MYAPIKEY&ip=".$ipaddress."&format=xml";
$xml = simplexml_load_file($locationstr);
$city = $xml->city;
switch ($city)
{
case "Pretoria":
$loccode = "SFXX0044";
$weatherfeed = file_get_contents("http://weather.yahooapis.com/forecastrss?p=".$loccode."&u=c");
if (!$weatherfeed) die("weather check failed, check feed URL");
$weather = simplexml_load_string($weatherfeed);
readWeather($loccode);
break;
}
function readWeather($loccode)
{
$doc = new DOMDocument();
$doc->load("http://weather.yahooapis.com/forecastrss?p=".$loccode."&u=c");
$channel = $doc->getElementsByTagName("channel");
$arr;
foreach($channel as $ch)
{
$item = $ch->getElementsByTagName("item");
foreach($item as $rcvd)
{
$desc = $rcvd->getElementsByTagName("description");
$_SESSION["weather"] = $desc->item(0)->nodeValue;
}
}
}
I'd like to direct your attention to the lines that query for the weather:
$doc = new DOMDocument();
$doc->load("http://weather.yahooapis.com/forecastrss?p=".$loccode."&u=c");
// url resolves to http://weather.yahooapis.com/forecastrss?p=SFXX0044&u=c in this case
Searching google, I found this link which suggested I use this url instead:
$doc->load("http://xml.weather.yahoo.com/forecastrss/SFXX0044_c.xml");
While this also works and I see a 5 day forecast in the XML file, I still only see 2 days forecast on my site.
I have a feeling this is because I'm leveraging the channel child element found in the RSS feed, while the XML feed has no such child element.
If anyone can provide any insight here, I would really appreciate it.
This is what I get for asking questions too early...
As I was looking over my code again, I noticed that I had the yahooapis URL referenced twice: once in the switch, and again in readWeather.
Having removed the redundant reference and updated the URL as per the thread mentioned, I see that it does work now.
See updated code for reference:
switch ($city)
{
case "Pretoria":
$loccode = "SFXX0044";
readWeather($loccode);
break;
}
function readWeather($loccode)
{
$doc = new DOMDocument();
$doc->load("http://xml.weather.yahoo.com/forecastrss/".$loccode."_c.xml");
$channel = $doc->getElementsByTagName("channel");
foreach($channel as $ch)
{
$item = $ch->getElementsByTagName("item");
foreach($item as $rcvd)
{
$desc = $rcvd->getElementsByTagName("description");
$_SESSION["weather"] = $desc->item(0)->nodeValue;
}
}
}
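
A side note on the loop above: it overwrites $_SESSION["weather"] on every pass, so only the description of the last item survives. If a feed ever carries more than one item, collecting the descriptions in an array keeps all of them. This is a sketch with an invented sample document and a made-up helper name, item_descriptions; the real document would be loaded with $doc->load(...) as above.

```php
<?php
// Invented sample; it does not reflect the actual content of the weather feed.
$sample = '<rss version="2.0"><channel>'
        . '<item><description>Mon: Sunny</description></item>'
        . '<item><description>Tue: Cloudy</description></item>'
        . '</channel></rss>';

// Hypothetical helper: return the description text of every item in the feed.
function item_descriptions(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $descriptions = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        $desc = $item->getElementsByTagName('description');
        if ($desc->length > 0) {
            $descriptions[] = $desc->item(0)->nodeValue;
        }
    }
    return $descriptions;
}

print_r(item_descriptions($sample));
```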

PHP SimpleXML Breaking when trying to traverse nodes

I'm trying to read the xml information that tumblr provides to create a kind of news feed off the tumblr, but I'm very stuck.
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if (!$xml)
{
exit('Failed to retrieve data.');
}
else
{
foreach ($xml->posts[0] AS $post)
{
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo .$title.;
echo '<p>'.$small_post.'</p>';
}
}
?>
It always breaks as soon as it tries to go through the nodes. So basically "tumblr->posts;... etc." is displayed on my HTML page.
I've tried saving the information as a local XML file. I've tried using different ways to create the SimpleXML object, like loading it as a string (probably a silly idea). I double-checked that my web hosting was running PHP5. So basically, I'm stuck on why this wouldn't be working.
EDIT: OK, I changed it back to the way it originally was; starting from tumblr was just another (actually silly) attempt to fix it. It still breaks right after the first ->, so "posts[0] AS $post... etc." is displayed on screen.
This is the first thing I've ever done in PHP so there might be something obvious that I should have set up beforehand or something. I don't know and couldn't find anything like that though.
This should work:
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if ( !$xml ){
exit('Failed to retrieve data.');
}else{
foreach ( $xml->posts[0] AS $post){
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo $title;
echo '<p>'.$small_post.'</p>';
echo '<hr>';
}
}
The first problem in your code is that you referenced the root element, which should not be used.
<?php
$request_url = 'http://candybrie.tumblr.com/api/read?type=post&start=0&num=5&type=text';
$xml = simplexml_load_file($request_url);
if (!$xml)
{
exit('Failed to retrieve data.');
}
else
{
foreach ($xml->posts->post as $post)
{
$title = $post->{'regular-title'};
$post = $post->{'regular-body'};
$small_post = substr($post,0,320);
echo $title;
echo '<p>'.$small_post.'</p>';
}
}
?>
$xml->posts returns the posts nodes, so to iterate over the post nodes you should use $xml->posts->post, which lets you iterate through the post nodes inside the first posts node.
Also, as Needhi pointed out, you shouldn't go through the root node (tumblr), because $xml itself represents the root node. (So I fixed my answer.)
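
The traversal can be tried offline with a small inline document. The sample below is invented and only mimics the tumblr > posts > post nesting discussed here; the real API response has more elements and attributes.

```php
<?php
// Invented sample shaped like the structure discussed above (tumblr > posts > post).
$sample = <<<XML
<tumblr>
  <posts>
    <post><regular-title>Hello</regular-title><regular-body>First body text</regular-body></post>
    <post><regular-title>Again</regular-title><regular-body>Second body text</regular-body></post>
  </posts>
</tumblr>
XML;

$xml = simplexml_load_string($sample);

$titles = [];
// $xml is already the root element (tumblr), so the path starts at posts.
foreach ($xml->posts->post as $post) {
    // Hyphenated element names need the brace syntax.
    $title = (string) $post->{'regular-title'};
    $body  = (string) $post->{'regular-body'};
    $titles[] = $title;
    echo $title, ': ', substr($body, 0, 10), "\n";
}
```

Casting to string before calling substr() makes it explicit that the text of the element is wanted, not the SimpleXMLElement object.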
