Alternative for DOMDocument() - php

I am using DOMDocument() to include an RSS feed in my code. However, I get this error:
URL file-access is disabled in the server configuration
and that's because my server doesn't allow me to modify the php.ini file or to set allow_url_fopen to ON.
Is there a workaround for this? This is my full code:
<?php
$rss = new DOMDocument();
$rss->load('rss.php');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $item = array(
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc'  => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link'  => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date'  => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
    );
    array_push($feed, $item);
}
$limit = 5;
echo '<table>';
for ($x = 0; $x < $limit; $x++) {
    $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
    $link = $feed[$x]['link'];
    echo <<<EOF
<tr>
<td><b>$title</b></td>
</tr>
EOF;
}
echo '</table>';
?>
Thank you.

Okay, I solved it myself.
<?php
$url = 'rss.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$rss = curl_exec($ch);
curl_close($ch);
$xml = simplexml_load_string($rss, 'SimpleXMLElement', LIBXML_NOCDATA);
$feed = array();
foreach ($xml->channel->item as $node) {
    $item = array(
        'title' => (string) $node->title,
        'desc'  => (string) $node->description,
        'link'  => (string) $node->link,
        'date'  => (string) $node->pubDate,
    );
    array_push($feed, $item);
}
$limit = 5;
echo '<table>';
for ($x = 0; $x < $limit; $x++) {
    $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
    $link = $feed[$x]['link'];
    echo <<<EOF
<tr>
<td><b>$title</b></td>
</tr>
EOF;
}
echo '</table>';
?>

Use cURL. You really should be using it for server-to-server interactions rather than trying to pass URLs to constructors anyway.
Here is the cURL documentation - http://us1.php.net/curl
I also have a simple cURL-based REST client you can feel free to use - https://github.com/mikecbrant/php-rest-client
Basically, all you are looking to do is use cURL to retrieve the remote content instead of trying to open it directly via the fopen wrapper. Once you retrieve the content, you pass it into DOMDocument.
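For example, a minimal sketch of that approach, reusing the rss.php feed from the question (cURL needs an absolute URL, so the address below is a hypothetical placeholder; adjust it and the options to your setup):
<?php
// Minimal sketch: fetch the feed with cURL, then hand the string to DOMDocument.
$feedUrl = 'https://example.com/rss.php';        // hypothetical absolute URL; cURL won't resolve a bare 'rss.php'
$ch = curl_init($feedUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_HEADER, false);         // body only, no HTTP headers
$xmlString = curl_exec($ch);
curl_close($ch);

$rss = new DOMDocument();
$rss->loadXML($xmlString);                       // parse the fetched string; no URL wrapper needed
// ...then loop over getElementsByTagName('item') exactly as before
?>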

Related

Why is new SimpleXMLElement causing a 500 error?

I have a simple script that until yesterday had worked fine for two years. I'm just taking an XML feed from a WordPress site and formatting it to be displayed on a different website. Here is the code:
<?php
function download_page($path) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $path);
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $retValue = curl_exec($ch);
    curl_close($ch);
    return $retValue;
}
$sXML = download_page('https://example.com/tradeblog/feed/atom/');
$oXML = new SimpleXMLElement($sXML);
$items = $oXML->entry;
$i = 0;
foreach ($items as $item) {
    $title = $item->title;
    $link = $item->link;
    echo '<li>';
    foreach ($link as $links) {
        $loc = $links['href'];
        $href = str_replace("/feed/atom/", "", $loc);
        echo "<a href=\"$href\" target=\"_blank\">";
    }
    echo $title;
    echo "</a>";
    echo "</li>";
    if (++$i == 3) break;
}
?>
I can echo out $sXML and it will display the entire XML contents as expected. When I try to echo $oXML I get the 500 error. Any use of $oXML causes the 500. What changed? Is there a different/better way to do this using PHP?
It seems your XML source is not actually valid XML. I tried to validate it using the W3Schools validator and it throws an error. I tried a second validator too, and got the same error.
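If you want to see the actual parse errors instead of a bare 500, you could wrap the parse in libxml error handling; a rough sketch, reusing the download_page() helper and Atom URL from the question:
<?php
// Rough sketch: report the libxml parse errors rather than letting the page die.
libxml_use_internal_errors(true);
$sXML = download_page('https://example.com/tradeblog/feed/atom/');
$oXML = simplexml_load_string($sXML);
if ($oXML === false) {
    foreach (libxml_get_errors() as $error) {
        echo htmlspecialchars($error->message) . '<br>';
    }
    libxml_clear_errors();
}
?>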
Not sure why, but this worked:
<?php
$rss = new DOMDocument();
$rss->load('https://example.com/tradeblog/feed/rss2/');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $item = array(
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'link'  => $node->getElementsByTagName('link')->item(0)->nodeValue,
    );
    array_push($feed, $item);
}
$limit = 3;
for ($x = 0; $x < $limit; $x++) {
    $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
    $link = $feed[$x]['link'];
    echo '<li>'.$title.'</li>';
}
?>

Simple Html Dom Scraping half the page

I am trying to scrape the URL https://nrg91.gr/nrg-airplay-chart/ using simple-html-dom, but it does not seem to get the full HTML source code. This code:
include_once('simple_html_dom.php');
$html = file_get_html('https://nrg91.gr/nrg-airplay-chart');
echo $html->plaintext;
displays the content up to the h1, just before the content I am after. And from the simple-html-dom manual examples, this should display all links from that URL:
foreach($html->find('a') as $e)
echo $e->href . '<br>';
but it only displays the links up to the main navigation menu, not from the main body or footer.
I also tried using prerender.com to fully load the URL before passing it to file_get_html, but the result was the same. What am I doing wrong?
That library looks like it hasn't been updated in 7 years. I'd always recommend using PHP's built-in functions:
$url = "https://nrg91.gr/nrg-airplay-chart/";
$dom = new DomDocument();
libxml_use_internal_errors(true);
$dom->load($url);
foreach($dom->getElementsByTagName("a") as $e) {
echo $e->getAttribute("href") . "\n";
}
Here's my super dirty approach to fetching the rank/artist/title/youtube data using both DOMDocument and SimpleXML.
The concept is to locate each "row" of data via the xpath //ul[@id="chart_ul"]/li, then use dom_import_simplexml( $set )->getNodePath() to build a new xpath to select the individual elements where the desired data can be located.
$temp = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'nrg-airplay-chart.html';
if (file_exists($temp) === false or filemtime($temp) < time() - 3600) {
    file_put_contents($temp, $html = file_get_contents('https://nrg91.gr/nrg-airplay-chart/'));
} else {
    $html = file_get_contents($temp);
}
$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from imperfect real-world HTML
$dom->loadHTML($html);
$xml = simplexml_import_dom($dom);
$array = array();
foreach ($xml->xpath('//ul[@id="chart_ul"]/li') as $index => $set) {
    $basexpath = dom_import_simplexml($set)->getNodePath();
    $array[] = array(
        'ranking' => (string) $xml->xpath($basexpath . '//span[@id="ranking"]')[0],
        'artist'  => (string) $xml->xpath($basexpath . '//p[@id="artist"]/b')[0],
        'title'   => (string) $xml->xpath($basexpath . '//p[@id="title"]')[0],
        'youtube' => (string) $xml->xpath($basexpath . '//div[@id="media"]/a/@href')[0],
    );
}
print_r($array);
Another approach you might want to try:
<?php
function get_content($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $htmlContent = curl_exec($ch);
    curl_close($ch);
    return $htmlContent;
}
$link = "https://nrg91.gr/nrg-airplay-chart/";
$xml = get_content($link);
$dom = new DOMDocument();
@$dom->loadHTML($xml); // suppress warnings from real-world HTML
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//li[contains(@id,"wprs_chart-")]') as $items) {
    $artist = $xpath->query('.//p[@id="artist"]/b', $items)->item(0)->nodeValue;
    $title = $xpath->query('.//p[@id="title"]', $items)->item(0)->nodeValue;
    echo "{$artist} -- {$title}<br>";
}
?>
The output you should get is something like:
PORTOGAL THE MAN -- Feel It Still
JAX JONEW Feat INA WROLDSEN -- Breathe
CAMILA CABELLO -- Havana
CARBI B, J BALVIN & BAD BUNNY -- I Like It
ZAYN Feat SIA -- Dusk Till Dawn

Illegal string offset 'date'

I am trying to upgrade a friend's WordPress site to the latest versions of WordPress and PHP.
All is working fine except for a scrolling news ticker he uses on his homepage, which errors out with "Illegal string offset 'date'", and no news is shown.
This is the script:
<?php
$xmlOption = get_option('xmlFeed');
if (!isset($xmlOption)) {
    $buildURL = "https://wordpress.org/news/feed/";
    $request = curl_init();
    curl_setopt($request, CURLOPT_URL, $buildURL);
    curl_setopt($request, CURLOPT_HEADER, false);
    curl_setopt($request, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($request);
    curl_close($request);
    $xml = new SimpleXMLElement($result);
    $channel = $xml->channel;
    delete_option('xmlFeed');
    $otion = array(
        'xml' => $channel,
        'date' => date('y-m-d')
    );
    add_option('xmlFeed', $option);
}
if ($xmlOption['date'] == date('y-m-d')) {
    $channel = $xmlOption['xml'];
} else {
    $buildURL = "https://wordpress.org/news/feed/";
    $request = curl_init();
    curl_setopt($request, CURLOPT_URL, $buildURL);
    curl_setopt($request, CURLOPT_HEADER, false);
    curl_setopt($request, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($request);
    curl_close($request);
    $xml = new SimpleXMLElement($result);
    $channel = $xml->channel;
    delete_option('xmlFeed');
    $otion = array(
        'xml' => $channel,
        'date' => date('y-m-d')
    );
    add_option('xmlFeed', $option);
}
$i = 0;
while ($i <= 5) {
    echo "<li><a href='" . $channel->item->$i->link . "' target='_blank'>" . $channel->item->$i->title . "</a></li>";
    $i++;
}
?>
I noticed the use of $otion two times, which I thought might be a typo. But when I changed that to $option, the rest of the page was not parsed, so I guess that isn't the problem.
I am not a coder, and I have been pulling my hair out for two nights now.
Time to get some help before I have none left.
Can anyone help me with this one?
It is not a real answer to my question, but I found another script that, with some small changes, works perfectly. So I'm happy.
<?php
$rss = new DOMDocument();
$rss->load('http://wordpress.org/news/feed/');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $item = array(
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'link'  => $node->getElementsByTagName('link')->item(0)->nodeValue,
    );
    array_push($feed, $item);
}
$limit = 5;
for ($x = 0; $x < $limit; $x++) {
    $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
    $link = $feed[$x]['link'];
    echo '<li>'.$title.'</li>';
}
?>
It is smaller and cleaner.
Thanks for your help @Marcus

PHP ReadFile Insert a code in readfile

I have a php readfile script, like this:
<?php
$contentFile = "http://google.com";
readfile( $contentFile );
?>
I want to insert some code at a specific line in the output of readfile.
Example:
<html>
{top_code}
{Code i want to insert here}
{bottom-code}
</html>
How can I make this possible?
You can't. readfile() streams whatever you're reading straight out to the user's browser. You could use the output buffering mechanism to capture that data instead, but then you might as well just use file_get_contents() and save yourself a few extra lines of code.
file_get_contents returns the requested file/URL as a string. Then you use standard string or DOM operations to manipulate that 'page'.
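For completeness, the output-buffering route mentioned above would look roughly like this (a sketch only; the file_get_contents() version below is simpler):
<?php
// Sketch: capture readfile()'s output with output buffering instead of
// letting it stream straight to the browser.
ob_start();
readfile('http://google.com');
$html = ob_get_clean();   // $html now holds the fetched page as a string
// ...manipulate $html here, then echo it
?>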
This can do the job
$contentFile = "http://google.com";
$html = file_get_contents($contentFile);
$html = explode("\n", $html);
$line_number = 2;                       // example value: the 1-based line where the code should be inserted
$line = $line_number - 1;
array_splice($html, $line, 0, "Burim Shala");
$html = implode("\n", $html);
I found this to be the solution to the problem of RSS feeds using PHP without MySQL.
Thanks to http://bavotasan.com/2010/display-rss-feed-with-php/
$rss = new DOMDocument();
$rss->load('http://wordpress.org/news/feed/');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $item = array(
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc'  => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link'  => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date'  => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
    );
    array_push($feed, $item);
}
$limit = 5;
for ($x = 0; $x < $limit; $x++) {
    $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
    $link = $feed[$x]['link'];
    $description = $feed[$x]['desc'];
    $date = date('l F d, Y', strtotime($feed[$x]['date']));
    echo '<p><strong>'.$title.'</strong><br />';
    echo '<small><em>Posted on '.$date.'</em></small></p>';
    echo '<p>'.$description.'</p>';
}

From DOMDocument to CURL?

I'm using DOMDocument to download an RSS feed into my PHP script, simply with:
$doc = new DOMDocument();
$doc->load($source);
I want to use cURL instead of DOMDocument. How can I change those two lines of code so the rest of my script keeps working? This is my complete script, by the way:
<?php
// PUBLIC VARS
$arrFeeds = array();
$downItems = 0;
$time_taken = 0;
// *PUBLIC VARS
function getRSS($source) {
    $start = microtime(true);
    ini_set('default_socket_timeout', 1);
    global $arrFeeds, $downItems, $time_taken;
    $arrFeeds = array();
    $doc = new DOMDocument();
    $doc->load($source);
    foreach ($doc->getElementsByTagName('item') as $node) {
        $itemRSS = array(
            'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
            'desc'  => $node->getElementsByTagName('description')->item(0)->nodeValue,
            'link'  => $node->getElementsByTagName('link')->item(0)->nodeValue
        );
        array_push($arrFeeds, $itemRSS);
        $downItems += 1;
    }
    $time_taken = microtime(true) - $start;
}
//getRSS("http://www.atm-mi.it/_layouts/atm/apps/PublishingRSS.aspx?web=388a6572-890f-4e0f-a3c7-a3dd463f7252&c=News%20Infomobilita");
//echo(strip_tags($arrFeeds[0]['title'])."<br><br>".$time_taken);
?>
Thanks for the help!
This ought to do it:
$ch = curl_init($source);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec($ch);
curl_close($ch);
$doc = new DOMDocument();
$doc->loadXML($content);
Your mileage may vary, of course, and you might have to add more cURL options, but that's basic enough functionality to get it all started.
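For instance, options like these are often added before the curl_exec($ch) call (the values here are only illustrative):
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow redirects
curl_setopt($ch, CURLOPT_TIMEOUT, 15);               // give up after 15 seconds
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');  // some hosts block the default cURL agent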
Use loadXML.
http://www.php.net/manual/en/domdocument.loadxml.php
