I'm trying to create a small application that will simply read an RSS feed and then layout the info on the page.
All the instructions I find make this seem simplistic but for some reason it just isn't working. I have the following
include_once(ABSPATH.WPINC.'/rss.php');
$feed = file_get_contents('http://feeds.bbci.co.uk/sport/0/football/rss.xml?edition=int');
$items = simplexml_load_file($feed);
That's it, it then breaks on the third line with the following error
Error: [2] simplexml_load_file() [function.simplexml-load-file]: I/O warning : failed to load external entity "<?xml version="1.0" encoding="UTF-8"?> <?xm
The rest of the XML file is shown.
I have turned on allow_url_fopen and allow_url_include in my settings but still nothing.
I've tried multiple feeds that all end up with the same result?
I'm going mad here
simplexml_load_file() interprets an XML file (either a file on your disk or a URL) into an object. What you have in $feed is a string.
You have two options:
Use file_get_contents() to get the XML feed as a string, and use e simplexml_load_string():
$feed = file_get_contents('...');
$items = simplexml_load_string($feed);
Load the XML feed directly using simplexml_load_file():
$items = simplexml_load_file('...');
You can also load the content with cURL, if file_get_contents insn't enabled on your server.
Example:
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,"http://feeds.bbci.co.uk/sport/0/football/rss.xml?edition=int");
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
$output = curl_exec($ch);
curl_close($ch);
$items = simplexml_load_string($output);
this also works:
$url = "http://www.some-url";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xmlresponse = curl_exec($ch);
$xml=simplexml_load_string($xmlresponse);
then I just run a forloop to grab the stuff from the nodes.
like this:`
for($i = 0; $i < 20; $i++) {
$title = $xml->channel->item[$i]->title;
$link = $xml->channel->item[$i]->link;
$desc = $xml->channel->item[$i]->description;
$html .="<div><h3>$title</h3>$link<br />$desc</div><hr>";
}
echo $html;
***note that your node names will differ, obviously..and your HTML might be structured differently...also your loop might be set to higher or lower amount of results.
$url = 'http://legis.senado.leg.br/dadosabertos/materia/tramitando';
$xml = file_get_contents("xml->{$url}");
$xml = simplexml_load_file($url);
Related
I'm trying to create a small application that will simply read an RSS feed and then layout the info on the page.
All the instructions I find make this seem simplistic but for some reason it just isn't working. I have the following
include_once(ABSPATH.WPINC.'/rss.php');
$feed = file_get_contents('http://feeds.bbci.co.uk/sport/0/football/rss.xml?edition=int');
$items = simplexml_load_file($feed);
That's it, it then breaks on the third line with the following error
Error: [2] simplexml_load_file() [function.simplexml-load-file]: I/O warning : failed to load external entity "<?xml version="1.0" encoding="UTF-8"?> <?xm
The rest of the XML file is shown.
I have turned on allow_url_fopen and allow_url_include in my settings but still nothing.
I've tried multiple feeds that all end up with the same result?
I'm going mad here
simplexml_load_file() interprets an XML file (either a file on your disk or a URL) into an object. What you have in $feed is a string.
You have two options:
Use file_get_contents() to get the XML feed as a string, and use e simplexml_load_string():
$feed = file_get_contents('...');
$items = simplexml_load_string($feed);
Load the XML feed directly using simplexml_load_file():
$items = simplexml_load_file('...');
You can also load the content with cURL, if file_get_contents insn't enabled on your server.
Example:
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,"http://feeds.bbci.co.uk/sport/0/football/rss.xml?edition=int");
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
$output = curl_exec($ch);
curl_close($ch);
$items = simplexml_load_string($output);
this also works:
$url = "http://www.some-url";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xmlresponse = curl_exec($ch);
$xml=simplexml_load_string($xmlresponse);
then I just run a forloop to grab the stuff from the nodes.
like this:`
for($i = 0; $i < 20; $i++) {
$title = $xml->channel->item[$i]->title;
$link = $xml->channel->item[$i]->link;
$desc = $xml->channel->item[$i]->description;
$html .="<div><h3>$title</h3>$link<br />$desc</div><hr>";
}
echo $html;
***note that your node names will differ, obviously..and your HTML might be structured differently...also your loop might be set to higher or lower amount of results.
$url = 'http://legis.senado.leg.br/dadosabertos/materia/tramitando';
$xml = file_get_contents("xml->{$url}");
$xml = simplexml_load_file($url);
i got a url like this one http://somedomain.com/frieventexport.php and the content of this url is a plain XML structure if I check out the source-code of the site:
How can I parse this URL and use it in PHP?
This script gives me an error… "Error loading XML".
<?php
$xml_url = "http://somedomain.com/frieventexport.php";
if (($response_xml_data = file_get_contents($xml_url))===false){
echo "Error fetching XML\n";
} else {
libxml_use_internal_errors(true);
$data = simplexml_load_string($response_xml_data);
if (!$data) {
echo "Error loading XML\n";
foreach(libxml_get_errors() as $error) {
echo "\t", $error->message;
}
} else {
print_r($data);
}
}
?>
One of the best options in my opinion is to use CURL to get the raw XML data from the url:
$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, "http://somedomain.com/frieventexport.php" );
$xml = curl_exec( $curl );
curl_close( $curl );
You can then use DOMDocument to parse the xml:
$document = new DOMDocument;
$document->loadXML( $xml );
I would also recommend using <![CDATA[]> tags in your XML. Please read the following:
What does <![CDATA[]]> in XML mean?
CDATA Sections in XML
More information about DOMDocument and usage
PHP.net DOMDocument documentation
W3Schools DOMDocument example
file_get_contents() is usually disabled if this is shared hosting, if not you can set allow_url_fopen in php.ini (however beware of the security risks). You can check the setting of this using php_info() or var_dump(ini_get('allow_url_fopen')) to show if it's allowed or not.
If you cannot do the above, you can use CURL to fetch external content
Try the following:
$url = 'xml-file.xml';
$xml = simplexml_load_file($url);
print_r($xml);
Try this :
$livescore_data = new SimpleXMLElement($file);
I have this code that will retrieve every link in the $curl_scrapped_page:
require_once ('simple_html_dom.php');
$des_array = array();
$url = 'http://citeseerx.ist.psu.edu/search?q=mean&t=doc&sort=rlv';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
$html = new simple_html_dom();
$html->load($curl_scraped_page);
Then I want to get abstract for each of link (on the page of that link) I scrapped. (I also get other things like title, description and so on, but the problem only lies on this abstract):
foreach ($html->find('div.result h3 a') as $des) {
$des2 = 'http://citeseerx.ist.psu.edu' . $des->href;
$ch = curl_init($des2);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page2 = curl_exec($ch);
libxml_use_internal_errors(true);
$dom = new DomDocument();
$dom->loadHtml($curl_scraped_page2);//line 72
libxml_use_internal_errors(false);
$xpath2 = new DomXPath($dom);
$thing = $xpath2->query('//p[preceding::h3[preceding::div]]')->item(1)->textContent; //line 75
array_push($des_array, $thing);
}
curl_close ($ch);
This is the display code:
for ($i = 0; $i < 10; $i++) {
echo $des_array[$i];
}
When I checked it on my browser, it gave me this, thrice:
Warning: DOMDocument::loadHTML(): Empty string supplied as input in C:\xampp\htdocs\MSP\Citeseerx.php on line 72
Notice: Trying to get property of non-object in C:\xampp\htdocs\MSP\Citeseerx.php on line 75
I realised I pushed an empty string to the $des_array. So I tried this:
if (empty($thing)){
array_push($des_array,'');
}
else{
array_push($des_array, $thing);
}
And this: if ($thing!=''){..}.
It still gave me that error.
What should I do?
Thanks..
curl_exec() may return false. In that case check with curl_error() what's the error. For example if the href attribute does not begin with / you will pass invalid url to the curl_init function. Also you may use curl_info() to get more information about the server response
Actually the $curl_scraped_page should be an handle for an open file not a variable since you are returning the transfer as a. Binary it should be read to file you can't pass to a varible since it is not a string
I am trying to use the currentcy exchange rate feeds of the European Central Bank (ECB)
http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml
They have provided documentation on how to parse the xml but none of the options works for me: I checked that allow_url_fopen=On is set.
http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html
For instance, I used but it doesn't echo anything and it seems the $XML object is always empty.
<?php
//This is aPHP(5)script example on how eurofxref-daily.xml can be parsed
//Read eurofxref-daily.xml file in memory
//For the next command you will need the config option allow_url_fopen=On (default)
$XML=simplexml_load_file("http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml");
//the file is updated daily between 2.15 p.m. and 3.00 p.m. CET
foreach($XML->Cube->Cube->Cube as $rate){
//Output the value of 1EUR for a currency code
echo '1€='.$rate["rate"].' '.$rate["currency"].'<br/>';
//--------------------------------------------------
//Here you can add your code for inserting
//$rate["rate"] and $rate["currency"] into your database
//--------------------------------------------------
}
?>
Update:
As I am behind proxy at my test environment, I tried this but still I don't get to read the XML:
function curl($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_close ($ch);
return curl_exec($ch); }
$address = urlencode($address);
$data = curl("http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml");
$XML = simplexml_load_file($data);
var_dump($XML); -> returns boolean false
Please help me. Thanks!
I didn't find any relevant settings in php.ini. Check with phpinfo() if you have SimpleXML support and cURLsupport enabled. (You should have them both and especially SimpleXML since you're using it and it returns false, it doesn't complain about missing function.)
Proxy might be an issue here. See this and this answer. Using cURL could be an answer to your problem.
Here's one alternative foud here.
$url = file_get_contents('http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml');
$xml = new SimpleXMLElement($url) ;
//file put contents - same as fopen, wrote and close
//need to output "asXML" - simple xml returns an object based upon the raw xml
file_put_contents(dirname(__FILE__)."/loc.xml", $xml->asXML());
foreach($xml->Cube->Cube->Cube as $rate){
echo '1€='.$rate["rate"].' '.$rate["currency"].'<br/>';
}
This solution works for me:
$data = [];
$url = "http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist-90d.xml";
$xmlRaw = file_get_contents($url);
$doc = new DOMDocument();
$doc->preserveWhiteSpace = FALSE;
$doc->loadXML($xmlRaw);
$node1 = $doc->getElementsByTagName('Cube')->item(0);
foreach ($node1->childNodes as $node2) {
$value = [];
foreach ($node2->childNodes as $node3) {
$value['date'] = $node2->getAttribute('time');
$value['currency'] = $node3->getAttribute('currency');
$value['rate'] = $node3->getAttribute('rate');
$data[] = $value;
unset($value);
}
}
echo "<pre"> . print_r($data) . "</pre>";
i just want to get the name of 'channel' tag i.e. CHANNEL...the script works fine when i use it to parse the rss from Google..............but when i use it for some other provider it gives an output '#text' instead of giving 'channel' which is the intended output.......the following is my script plz help me out.
$url = 'http://ibnlive.in.com/ibnrss/rss/sports/cricket.xml';
$get = perform_curl($url);
$xml = new DOMDocument();
$xml -> loadXML($get['remote_content']);
$fetch = $xml -> documentElement;
$gettitle = $fetch -> firstChild -> nodeName;
echo $gettitle;
function perform_curl($rss_feed_provider_url){
$url = $rss_feed_provider_url;
$curl_handle = curl_init();
// Do we have a cURL session?
if ($curl_handle) {
// Set the required CURL options that we need.
// Set the URL option.
curl_setopt($curl_handle, CURLOPT_URL, $url);
// Set the HEADER option. We don't want the HTTP headers in the output.
curl_setopt($curl_handle, CURLOPT_HEADER, false);
// Set the FOLLOWLOCATION option. We will follow if location header is present.
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, true);
// Instead of using WRITEFUNCTION callbacks, we are going to receive the remote contents as a return value for the curl_exec function.
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
// Try to fetch the remote URL contents.
// This function will block until the contents are received.
$remote_contents = curl_exec($curl_handle);
// Do the cleanup of CURL.
curl_close($curl_handle);
$remote_contents = utf8_encode($remote_contents);
$handle = #simplexml_load_string($remote_contents);
$return_result = array();
if(is_object($handle)){
$return_result['handle'] = true;
$return_result['remote_content'] = $remote_contents;
return $return_result;
}
else{
$return_result['handle'] = false;
$return_result['content_error'] = 'INVALID RSS SOURCE, PLEASE CHECK IF THE SOURCE IS A VALID XML DOCUMENT.';
return $return_result;
}
} // End of if ($curl_handle)
else{
$return_result['curl_error'] = 'CURL INITIALIZATION FAILED.';
return false;
}
}
php
it gives an output '#text' instead of giving 'channel' which is the intended output it happens because the $fetch -> firstChild -> nodeType is 3, which is a TEXT_NODE or just some text. You could select channel by
echo $fetch->getElementsByTagName('channel')->item(0)->nodeName;
and
$gettitle = $fetch -> firstChild -> nodeValue;
var_dump($gettitle);
gives you
string(5) "
"
or spaces and a new line symbol which happens to appear between the xml tags due to formatting.
ps: and RSS feed by your link fails validation at http://validator.w3.org/feed/
Take a look at the XML - it's been pretty printed with whitespace so it is being parsed correctly. The first child of the root node is a text node. I'd suggest using SimpleXML if you want an easier time of it, or use XPath queries on your DomDocument to obtain the tags of interest.
Here's how you'd use SimpleXML
$xml = new SimpleXMLElement($get['remote_content']);
print $xml->channel[0]->title;