<?php
$url='http://bart.gov/dev/eta/bart_eta.xml';
$c = curl_init($url);
curl_setopt($c, CURLOPT_MUTE, 1);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
$rawXML = curl_exec($c);
curl_close($c);
$fixedupXML = htmlspecialchars($rawXML);
foreach($fixedupXML->eta as $eta) {
echo $eta->destination;
}
?>
As a way to get introduced to PHP, I've decided to parse BART's XML feed and display it on my webpage. I managed (with help from this site) to fetch the data and preserve the XML tags. However, when I try to output the XML data, using what I found to be the simplest method, nothing happens.
foreach($fixedupXML->eta as $eta){
echo $eta->destination;
}
Am I not getting the nested elements right in the foreach loop?
Here is the BART XML feed http://www.bart.gov/dev/eta/bart_eta.xml
Thanks!
You may want to look at simplexml, which is a fantastic and really simple way to work with XML.
Here's a great example:
$xml = simplexml_load_file('http://bart.gov/dev/eta/bart_eta.xml');
Then you can run a print_r on $xml to see its contents:
print_r($xml);
And you should be able to work with it from there :)
If you still need to use curl to get the feed data for some reason, you can feed the XML into simplexml like this:
$xml = simplexml_load_string($rawXML);
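For example, here is a minimal sketch of iterating the feed (assuming, as the question's own loop suggests, that <eta> elements carry a <destination> child; run print_r first and adjust the path to match the actual nesting):
<?php
$xml = simplexml_load_file('http://bart.gov/dev/eta/bart_eta.xml');
// Adjust this path to whatever nesting print_r($xml) reveals.
foreach ($xml->eta as $eta) {
    echo $eta->destination, "\n";
}
?>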
Related
I am working on a small project to get information from several webpages based on the HTML markup of each page, and I do not know where to start at all.
The basic idea is to get the title from the <h1></h1> tags, the content from the <p></p> tags, and whatever other important information is required.
I would have to set up each case for each source for it to work the way it needs to. I believe the right method is using the $_GET method with PHP. The goal of the project is to build a database of information.
What is the best method to grab the information which I need?
First of all: PHP's $_GET is not a method. As you can see in the documentation, $_GET is simply an array initialized with the GET parameters your web server received during the current request. As such, it is not what you want to use for this kind of thing.
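For instance, for a hypothetical request to script.php?keyword=php&page=2 (made-up URL and parameter names, purely for illustration), PHP fills the array like this:
<?php
// Request: http://example.com/script.php?keyword=php&page=2
// (hypothetical URL and parameters, for illustration only)
echo $_GET['keyword']; // prints "php"
echo $_GET['page'];    // prints "2"
?>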
What you should look into is cURL, which allows you to compose even fairly complex queries, send them to the destination server, and retrieve the response. For example, for a POST request you could do something like:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"http://www.mysite.com/tester.phtml");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,
"postvar1=value1&postvar2=value2&postvar3=value3");
// in real life you should use something like:
// curl_setopt($ch, CURLOPT_POSTFIELDS,
// http_build_query(array('postvar1' => 'value1')));
// receive server response ...
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$server_output = curl_exec ($ch);
curl_close ($ch);
Source
Of course, if you don't need any complex queries but just simple GET requests, you can go with the PHP function file_get_contents.
After you have received the web page content, you have to parse it. IMHO the best way to do this is by using PHP's DOM functions. How to use them should really be another question, but you can find tons of examples without much effort.
<?php
// Fetch the remote page (placeholder URL from the original example).
$remote = file_get_contents('http://www.remote_website.html');

// Parse the HTML; @ suppresses warnings from malformed markup.
$doc = new DOMDocument();
@$doc->loadHTML($remote);

// Collect the text of every <h1> element.
$titles = array();
foreach ($doc->getElementsByTagName('h1') as $cell) {
    $titles[] = $cell->nodeValue;
}

// Collect the text of every <p> element.
$content = array();
foreach ($doc->getElementsByTagName('p') as $cell) {
    $content[] = $cell->nodeValue;
}
?>
You can get the HTML source of a page with:
<?php
$html= file_get_contents('http://www.example.com/');
echo $html;
?>
Then, once you have the structure of the page, you can pull out the tag you need with substr() and strpos(), as sketched below.
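For example, a rough sketch of extracting the first <h1> this way (fragile compared to the DOM approach above; it assumes a plain <h1> tag with no attributes):
<?php
$html = file_get_contents('http://www.example.com/');

// Find the first <h1>...</h1> pair; assumes the tag has no attributes.
$start = strpos($html, '<h1>');
if ($start !== false) {
    $start += strlen('<h1>');               // skip past the opening tag
    $end = strpos($html, '</h1>', $start);  // find the matching closing tag
    if ($end !== false) {
        echo substr($html, $start, $end - $start);
    }
}
?>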
I am grabbing a page and then converting it into an xml format, the function im using is below
public function getXML($url){
$ch = curl_init();
//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
//curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
$xml = simplexml_load_string($response);
return $xml;
}
print_r($curl->getXML("http://www.amazon.co.uk/gp/offer-listing/0292783760/ref=tmm_pap_new_olp_sr?ie=UTF8&condition=used"));
After trying different URLs, nothing is returned. The page loads fine, so the problem must be with the line $xml = simplexml_load_string($response);
What could be wrong with this code?
I'm not sure exactly what you're up to, but it looks like you're trying to scrape that Amazon web page? If I pull up the URL in my browser, it isn't declared as XHTML in the headers or in the document itself, and I suspect it isn't valid XML at all. I don't think simplexml can handle that.
(Does cURL do the conversion to XML for you? I don't think so, but I'm not a master of all things cURL. If so, it might be an incompatibility between cURL's output and what simplexml--which is fairly limited--will take in.)
You might try working with DOMDocument instead, although my PHP could be a bit out of date--there may be better utilities these days.
A quick googling brought up this tutorial
<?php
// $html holds the page source fetched earlier (e.g. via cURL).
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE; // tolerate malformed HTML
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc); // now query it with SimpleXML
?>
I don't think this is a complete answer, but it was a bit much for a comment; so take it with a grain of salt and a healthy serving of doubt. I hope it inspires some ideas.
I am new to PHP, and I am using it to POST data from an iPhone. I have written some basic code to get data from the iPhone, post it to the PHP script on my server, and then, using PHP, send that data on to another web server. However, that web server returns a response in XML, and since I am a newbie to PHP, I need help with it.
My code to send data:
<?php
$ch = curl_init("http://api.online-convert.com/queue-insert");
$count = $_POST['count'];
$request["queue"] = file_get_contents($count);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $request);
$response = curl_exec($ch);
curl_close ($ch);
echo $response;
?>
I know I need to parse the XML response, but I have no idea how to do that. The XML response would be something like this:
<?xml version="1.0" encoding="utf-8"?>
<queue-answer>
<status>
<code>0</code>
<message>Successfully inserted job into queue.</message>
</status>
<params>
<downloadUrl>http://www.online-convert.com/result/07d6c1491bb5929acd71c531122d2906</downloadUrl>
<hash>07d6c1491bb5929acd71c531122d2906</hash>
</params>
</queue-answer>
You're probably looking for SimpleXML or DOMDocument.
SimpleXML
To load data from a string into an object: simplexml_load_string().
DOMDocument
To create one from a string: DOMDocument->loadXML().
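For example, with SimpleXML you could pull the interesting values out of the response shown above (a minimal sketch; the element names come straight from that sample):
<?php
// $response holds the XML string returned by curl_exec() above.
$xml = simplexml_load_string($response);

// Element names taken from the sample queue-answer response.
echo $xml->status->code;        // 0
echo $xml->status->message;     // Successfully inserted job into queue.
echo $xml->params->downloadUrl; // the result URL
?>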
Read about SimpleXML: http://php.net/manual/en/book.simplexml.php
Google is your friend!
The built-in PHP XML Parser is your best bet.
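If you go that route, the event-based parser looks roughly like this (a minimal sketch, reusing $response from the code above):
<?php
// Create an event-based parser and register callbacks.
$parser = xml_parser_create();

// Called for every opening and closing tag.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) { echo "start: $name\n"; },
    function ($parser, $name) { echo "end: $name\n"; }
);

// Called for the text between tags.
xml_set_character_data_handler($parser, function ($parser, $data) {
    if (trim($data) !== '') echo trim($data), "\n";
});

xml_parse($parser, $response, true); // true = this is the final chunk
xml_parser_free($parser);
?>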
Using cURL, I send an XML file to a server and receive a 200 response and a reciprocal XML file from said server.
$response = curl_exec($session);
So far so good. If I do print_r($response) I see that the URL I need is indeed inside $response.
The question I have is: how do I parse it out? I've tried variations of the following, but nothing seems to work:
$xml = new SimpleXMLElement($response);
Pointer in the right direction would be great.
Thanks!
You need to set the right curl options. It looks like the header information is being included in the response data, which of course makes the response invalid XML. You can see the curl options here:
http://www.php.net/manual/en/function.curl-setopt.php
You'll want to turn off including the headers like this:
curl_setopt($ch, CURLOPT_HEADER, false);
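Putting it together, something like this should hand clean XML to SimpleXMLElement (a sketch; $url stands in for your actual endpoint):
<?php
// $url is a placeholder for the server you send the XML file to.
$session = curl_init($url);
curl_setopt($session, CURLOPT_HEADER, false);        // keep headers out of the body
curl_setopt($session, CURLOPT_RETURNTRANSFER, true); // return the body as a string
$response = curl_exec($session);
curl_close($session);

$xml = new SimpleXMLElement($response); // valid XML now that headers are excluded
?>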
You need to use the following structure:
$xml = new SimpleXMLElement($response);
echo $xml->movie[0]->plot;
//Or
$xml = simplexml_load_file($file, 'SimpleXMLElement', LIBXML_NOCDATA);
where movie is a node from your XML structure.
I have seen some questions similar to this on the internet, none with an answer.
I want to return the source of a remote XML page into a string. The remote XML page, for the purposes of this question, is:
http://www.test.com/foo.xml
In a regular webbrowser, I can view the page and the source is an XML document. When I use file_get_contents('http://www.test.com/foo.xml'), however, it returns a string with the corresponding URL.
Is there a way to retrieve the XML content? I don't care whether it uses file_get_contents or not, just something that will work.
You need to have allow_url_fopen set on your server for this to work.
If you don't, then you can use this function as a replacement:
<?php
function curl_get_file_contents($URL)
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($c, CURLOPT_URL, $URL);
    $contents = curl_exec($c);
    curl_close($c);

    if ($contents) return $contents;
    else return FALSE;
}
?>
Borrowed from here.
That seems odd. Does file_get_contents() return any valid data for other sites (not only XML)? A URL can only be used as the filename parameter if the fopen wrappers have been enabled (which they are by default).
I'm guessing you're going to process the retrieved XML later on; then you should be able to load it into SimpleXML directly using simplexml_load_file():
// simplexml_load_file() returns FALSE on failure (it emits warnings
// rather than throwing exceptions by default), so test for that:
$xml = simplexml_load_file('http://www.test.com/foo.xml');
if ($xml !== false) {
    print_r($xml);
}
I recommend using SimpleXML for reading XML-files, it's very easy to use.