Lazy loading a RSS feed in PHP

My WordPress website is slowed down by a plugin I wrote myself. The plugin loads events from an RSS feed on a different site.
After I disabled the plugin, the Google PageSpeed score improved by 20 points.
How can I lazy load the RSS feed with Ajax or JavaScript?
The code I have:
$rss = simplexml_load_file(get_option('capu_url'));
foreach ($rss->channel->item as $item) {
    echo '<h4>' . $item->title . "</h4>";
    //echo "<p>" . $item->description . "</p>";
    $dom = new DOMDocument;
    $dom->strictErrorChecking = FALSE;
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    $dom->loadHTML((string) $item->description);
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//ul[@class="ee-event-datetimes-ul"]'); // get <ul>'s with class 'ee-event-datetimes-ul'
    foreach ($nodes as $node) { // loops through each <ul>
        foreach ($node->getElementsByTagName('li') as $li) { // loops through the <li>'s
            echo $li->nodeValue . "<br/>\n"; // echoes the text of each <li>
        }
    }
}

Depending on the access you have to the server, consider running a cron job (for example, once an hour) that retrieves the RSS feed, converts it to JSON, and saves it as a file somewhere within the web root.
Your page can then use JavaScript to fetch that JSON file asynchronously from your own server and display the events, so rendering no longer waits on the remote feed.
If you don't have shell access, try WordPress's built-in WP-Cron instead, for example by scheduling the job with wp_schedule_event.
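
The conversion step of that cron job could look roughly like the sketch below. It is only an illustration: the function name rss_to_json and the choice of fields (title, link, description) are assumptions, not part of the original plugin.

```php
<?php
// Hypothetical helper: turn an RSS 2.0 document (passed as a string)
// into a JSON array with one object per <item>.
function rss_to_json(string $rssXml): string
{
    // Collect XML parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $rss = simplexml_load_string($rssXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($rss === false) {
        return json_encode([]); // unparsable feed: publish an empty list
    }

    $items = [];
    foreach ($rss->channel->item as $item) {
        $items[] = [
            'title'       => (string) $item->title,
            'link'        => (string) $item->link,
            'description' => (string) $item->description,
        ];
    }
    return json_encode($items);
}
```

The scheduled job would then fetch the feed (for example with file_get_contents on the URL stored in the capu_url option) and write the result of rss_to_json to a file under the web root with file_put_contents; the file name and location are up to you.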

Related

How to access an HTML attribute and retrieve data from it in PHP?

I'm new to PHP and I would like to know how to retrieve data from an HTML element such as an src?
It's very easy to do that in jQuery:
$('img').attr('src');
But I have no idea how to do it in PHP (if it is possible).
Here's an example I'm working on:
I loaded $result into SimpleXMLElement and stored it into $xml:
$xml = simplexml_load_string($result) or die("Error: Cannot create object");
Then used foreach to loop over all elements:
foreach($xml->links->link as $link){
    echo 'Image: ' . $link->{'link-code-html'}[0] . '<br>';
    // returns something similar to: <a href='....'><img src='....'></a>
}
Inside the foreach I'm trying to access the link (src) of the img.
Is there a way to access the src of the img nested inside the a? The markup is visible when I output it to the screen:
echo 'Image: ' . $link->{'link-code-html'}[0] . '<br>';

I would do this with the built-in DOMDocument and DOMXPath APIs, and then you can use the getAttribute method on any matching img node:
$doc = new DOMDocument();
// Load some example HTML. If you need to load from file, use ->loadHTMLFile
$doc->loadHTML("<a href='abc.com'><img src='ping1.png'></a>
<a href='def.com'><img src='ping2.png'></a>
<a href='ghi.com'>something else</a>");
$xpath = new DOMXpath($doc);
// Collect the images that are children of anchor elements
$imgs = $xpath->query("//a/img");
foreach($imgs as $img) {
echo "Image: " . $img->getAttribute("src") . "\n";
}
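
Applied to the question's case, where each link-code-html value is an HTML snippet held in a string, the same approach could be wrapped in a small helper. This is a minimal sketch; the function name extract_img_srcs is made up for illustration.

```php
<?php
// Hypothetical helper: return the src of every <img> that is a direct
// child of an <a> in the given HTML snippet.
function extract_img_srcs(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Keep warnings about sloppy HTML out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $srcs = [];
    foreach ($xpath->query('//a/img[@src]') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}
```

Inside the question's foreach, it would be called as extract_img_srcs((string) $link->{'link-code-html'}[0]); the cast turns the SimpleXMLElement into the plain string that loadHTML expects.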

Display rss xml index in html

I am adding an RSS feed to my website. I created the RSS.xml index file and next I want to display its contents in a nicely formatted way in a webpage.
Using PHP, I can do this:
$index = file_get_contents ($path . 'RSS.xml');
echo $index;
But all that does is dump the contents as a long stream of text with the tags removed.
I know that treating RSS.xml as a link, like this:
<a href="../blogs/RSS.xml">
<img src="../blogs/feed-icon-16.gif">Blog Index
</a>
causes my browser to parse and display it in a reasonable way when the user clicks on the link. However I want to embed it directly in the web page and not make the user go through another click.
What is the proper way to do what I want?

Use the following code, where $url is the URL of your RSS feed:
include_once('Simple/autoloader.php');
$feed = new SimplePie();
$feed->set_feed_url($url);
$feed->enable_cache(false);
$feed->set_output_encoding('utf-8');
$feed->init();
$items = $feed->get_items();
foreach ($items as $item) {
    /* Read the title, link, description and date of each feed item */
    $title = $item->get_title();
    $link = $item->get_permalink(); // separate variable, so the feed URL in $url is not overwritten
    $desc = $item->get_description();
    $date = $item->get_date();
}
Download the Simple folder from: https://github.com/jewelhuq/Online-News-Grabber/tree/master/worldnews/Simple
I hope this works for you. If it does, please let me know.

It turns out this is simple with PHP's SimpleXML parser:
$xml = simplexml_load_file($path . 'RSS.xml');
$channel = $xml->channel;
$channel_title = $channel->title;
$channel_description = $channel->description;
echo "<h1>$channel_title</h1>";
echo "<h2>$channel_description</h2>";
foreach ($channel->item as $item)
{
    $title = $item->title;
    $link = $item->link;
    $descr = $item->description;
    echo "<h3><a href='$link'>$title</a></h3>";
    echo "<p>$descr</p>";
}

Retrieve data from html page using xpath and php

I know there are similar questions, but while studying PHP I ran into this problem and I want to understand why it occurs.
<?php
$url = 'http://aice.anie.it/quotazione-lme-rame/';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTML($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tbody/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
?>
This prints just "hello!". I want to print the value extracted with the XPath query, but the last echo doesn't output anything.

There are several errors in your code:
You try to get the table from the URL http://aice.anie.it/quotazione-lme-rame/, but the table is actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp, so load the iframe URL directly.
You use the function loadHTML(), which parses an HTML string. What you need is loadHTMLFile(), which takes the path or URL of an HTML document as its parameter (see http://www.php.net/manual/fr/domdocument.loadhtmlfile.php).
You assume there is a tbody element on the page, but there is none, so remove it from your query.
Working code:
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}
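
The loadHTML() and tbody points can be checked without touching the network. The sketch below parses an inline HTML string; the table contents are made-up sample values, not data from the real page, and the helper name query_texts is hypothetical.

```php
<?php
// Hypothetical helper: parse an HTML *string* and return the text of
// every node matched by the XPath expression.
function query_texts(string $html, string $expression): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html); // loadHTML() takes markup, not a URL
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query($expression) as $node) {
        $texts[] = $node->nodeValue;
    }
    return $texts;
}

// Sample markup with no <tbody>, mirroring the structure described above.
$sample = '<table id="table33">'
        . '<tr><td>a</td><td>b</td><td><b>c</b></td></tr>'
        . '<tr><td>d</td><td>e</td><td><b>sample value</b></td></tr>'
        . '</table>';

$withTbody    = query_texts($sample, ".//*[@id='table33']/tbody/tr[2]/td[3]/b");
$withoutTbody = query_texts($sample, ".//*[@id='table33']/tr[2]/td[3]/b");
```

Unlike a browser, DOMDocument does not insert a tbody element that is missing from the source, so only the second query matches. Copying an XPath from browser developer tools is a common way to end up with a tbody step that the raw HTML does not contain.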

Scraping Thumbnail from NYtimes

My scraping code works for just about every site I've come across while testing, except for nytimes.com articles. I use Ajax with the following PHP code (I've left out some details to focus on my specific problem):
$link = "http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp";
$article = new DOMDocument;
$article->loadHTMLFile($link);
//generate image array
$images = $article->getElementsByTagName("img");
foreach ($images as $image) {
$source = $image->getAttribute("src");
echo '<img src="' . $source . '" alt="alt"><br><br>';
}
My problem is that the main images on nytimes pages don't even seem to get picked up by the getElementsByTagName. Pinterest finds a way to scrape the main images from this site for example: http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp whereas I cannot. Any suggestions?

OK, here is what I have tried so far, as I found your question interesting.
When I run this in the browser console using jQuery, I do get results for the images. My query was:
var a = new Array();
$('img[src]').each(function(){ a.push($(this).attr('src')); });
console.log(a);
Note that console.log(arrayname) works in the Chrome browser.
So ideally your code should work. Please consider adding an is_null check like I've done.
Below is code that loads the URL using a different approach (perhaps a better one) and shows the root cause of why you only get a single image, the NYT logo.
<?php
$html = file_get_contents("http://www.nytimes.com/2014/02/07/us/huge-leak-of-coal-ash-slows-at-north-carolina-power-plant.html?hp");
echo $html;
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->recover=true;
@$doc->loadHTML("<html><body>".$html."</body></html>");
$xpath = new DOMXpath($doc);
$images = $xpath->query("//*/img");
if (!is_null($images)) {
echo sizeof($images);
foreach ($images as $image) {
$source = $image->getAttribute('src');
echo '<img src="' . $source . '" alt="alt"><br><br>';
}
}
?>
You can't get the content via the feed unless you are authenticated.
You can try the following:
Use the context parameter of file_get_contents.
Consume the RSS/Atom feeds of the article.
Download the page as HTML and then load it with file_get_contents. PS: this works.
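
The context parameter mentioned above is built with stream_context_create. The following is a minimal sketch; the helper name build_http_context, the user-agent string, and the 10-second timeout are all illustrative assumptions, and nothing here guarantees that a particular site will serve the full page.

```php
<?php
// Hypothetical helper: build a stream context that sends a custom
// User-Agent header and sets a request timeout.
function build_http_context(string $userAgent, int $timeoutSeconds = 10)
{
    return stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => "User-Agent: $userAgent\r\n",
            'timeout' => $timeoutSeconds,
        ],
    ]);
}

$context = build_http_context('Mozilla/5.0 (compatible; ExampleFetcher/1.0)');
```

The context is passed as the third argument, as in file_get_contents($link, false, $context), and the returned HTML string can then be given to DOMDocument::loadHTML.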

Extract and dump a DOM node (and its children) in PHP

I have the following scenario and I've already spent hours trying to handle it: I'm developing a WordPress theme (hence PHP) and I want to check whether the content of a post (which is HTML) contains a tag with a certain id/class. If so, I want to extract it from the content and place it somewhere else.
Example: Let's say the text content of the Wordpress post is
<?php
/* $content actually comes from WP function get_the_content() */
$content = '<p>some text and so forth that I don\'t care about...</p> <div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';
?>
So how can I extract that div with the class (could also live with giving it an ID), output it (with tags and all that) in one place of the template, and output the rest (without the extracted tag, of course) in another place of the template?
I've already tried with the DOMDocument class, p.i.t.a. to me, maybe I'm too stupid.

Try:
$content = '<p>some text and so forth that I don\'t care about...</p> <div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';
$dom = new DomDocument;
$dom->loadHtml($content);
$xpath = new DomXpath($dom);
$contents = '';
// take the first <div> whose class attribute is exactly "the-wanted-element"
foreach ($xpath->query('//div[@class="the-wanted-element"]') as $node) {
    $contents = $dom->saveXml($node);
    break;
}
echo $contents;
How to get the remaining xml/html:
$content = '<p>some text and so forth that I don\'t care about...</p> <div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';
$dom = new DomDocument;
$dom->loadHtml($content);
$xpath = new DomXpath($dom);
// remove the wanted <div> from the document
foreach ($xpath->query('//div[@class="the-wanted-element"]') as $node) {
    $node->parentNode->removeChild($node);
    break;
}
// serialize whatever is left inside <body>
$contents = '';
foreach ($xpath->query('//body/*') as $node) {
    $contents .= $dom->saveXml($node);
}
echo $contents;
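
Both steps can be combined into one function. The sketch below is a variation under stated assumptions rather than the answer's exact method: the name split_content is made up, it uses saveHTML instead of saveXml, it assumes $class is a plain class name without quotes, and it matches the class as one token among several (so class="the-wanted-element featured" also matches, which an exact @class comparison would miss).

```php
<?php
// Hypothetical helper: pull the first element carrying $class out of
// $html and return both the extracted element and the remaining markup.
function split_content(string $html, string $class): array
{
    if (trim($html) === '') {
        return ['extracted' => '', 'rest' => ''];
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Token match on the class attribute; $class is assumed to be a simple name.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $extracted = '';
    $node = $xpath->query($query)->item(0);
    if ($node !== null) {
        $extracted = $dom->saveHTML($node);
        $node->parentNode->removeChild($node);
    }

    // Serialize everything still left inside <body>.
    $rest = '';
    foreach ($xpath->query('//body/node()') as $child) {
        $rest .= $dom->saveHTML($child);
    }

    return ['extracted' => $extracted, 'rest' => $rest];
}
```

In a theme template, the two array entries can then be echoed in different places, with the content from get_the_content() as the first argument.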
