Parse an xml document located on the internet and convert to json - php

I found http://www.ibm.com/developerworks/xml/library/x-xml2jsonphp/ , but I don't know how to use this code to get the XML from my web server. Any ideas?

The simplest way is with file_get_contents()
$xmlString = file_get_contents('http://www...../file.xml');
If you want a SimpleXML object you can use simplexml_load_file()
$xml = simplexml_load_file('http://www...../file.xml');
Both these methods require allow_url_fopen to be enabled. If it isn't, you can use curl - this is more complicated but also gives you more flexibility.
$c = curl_init('http://www...../file.xml');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$xmlString = curl_exec($c);
$error = curl_error($c);
curl_close($c);
if ($error)
die('Error: ' . $error);
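Once the XML is available as a string, the conversion to JSON asked about in the question can be sketched as follows. This is a minimal sketch, not the approach from the linked IBM article: xml_to_json() is a hypothetical helper name, and the inline $sample stands in for the downloaded document.

```php
<?php
// Hypothetical helper: converts an XML string into a JSON string.
function xml_to_json(string $xmlString): string
{
    // Collect libxml errors instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        throw new RuntimeException('Could not parse XML');
    }
    // json_encode() serializes the element's children; attributes
    // are placed under an "@attributes" key.
    return json_encode($xml);
}

// Inline sample standing in for the downloaded document (an assumption).
$sample = '<catalog><book id="1"><title>PHP Basics</title></book></catalog>';
echo xml_to_json($sample), "\n";
```

Note that the root element's name is not part of the output, and this shortcut does not handle namespaces or mixed content well, so more complex documents may need a hand-written conversion.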

Related

Getting whole HTML element with PHP

I want to get the whole <article> element, which represents one listing (the image, the title, its link and the description), but it doesn't work. Can someone help me please?
<?php
$url = 'http://www.polkmugshot.com/';
$content = file_get_contents($url);
$first_step = explode( '<article>' , $content );
$second_step = explode("</article>" , $first_step[3] );
echo $second_step[0];
?>
You should definitely be using curl for this type of request.
function curl_download($url){
// is cURL installed?
if (!function_exists('curl_init')){
die('cURL is not installed!');
}
$ch = curl_init();
// URL to download
curl_setopt($ch, CURLOPT_URL, $url);
// User agent
curl_setopt($ch, CURLOPT_USERAGENT, "Set your user agent here...");
// Include header in result? (0 = no, 1 = yes)
curl_setopt($ch, CURLOPT_HEADER, 0);
// Should cURL return or print out the data? (true = return, false = print)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Timeout in seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
// Download the given URL, and return output
$output = curl_exec($ch);
// Close the cURL resource, and free system resources
curl_close($ch);
return $output;
}
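For best results, combine it with an HTML DOM parser (the find() calls below follow the PHP Simple HTML DOM Parser API). curl_download() returns a plain string, so convert it with str_get_html() first, then use it like this:
$html = str_get_html(curl_download($url));
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';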
Good Luck!
I'm not sure I understand you correctly, but I guess you need a PHP DOM parser. I suggest this one (it is a great PHP library for parsing HTML).
Also you can get whole HTML code like this:
$url = 'http://www.polkmugshot.com/';
$html = file_get_html($url);
echo $html;
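Probably a better way would be to parse the document and run some XPath queries over it afterwards. Note that simplexml_load_file() only succeeds if the page is well-formed XML, which many HTML pages are not:
$url = 'http://www.polkmugshot.com/';
$xml = simplexml_load_file($url);
// The element is called <article>, so query for "//article"
$articles = $xml->xpath("//article");
foreach ($articles as $article) {
// do something useful here
}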
Read about SimpleXML here.
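Since simplexml_load_file() expects well-formed XML and real-world HTML often is not, the same XPath idea can be sketched with DOMDocument's more forgiving HTML parser plus DOMXPath. The inline markup and the helper name extract_articles() are assumptions made for illustration; the real page's structure may differ.

```php
<?php
// Inline sample standing in for the downloaded page (an assumption).
$html = '<html><body>'
      . '<article class="listing"><h2>First</h2><p>One</p></article>'
      . '<article class="listing"><h2>Second</h2><p>Two</p></article>'
      . '</body></html>';

// Hypothetical helper: returns the outer HTML of every <article> element.
function extract_articles(string $html): array
{
    // libxml reports HTML5 tags such as <article> as errors; collect
    // them internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $articles = [];
    foreach ($xpath->query('//article') as $node) {
        // saveHTML($node) serializes the element including its tags.
        $articles[] = $doc->saveHTML($node);
    }
    return $articles;
}

foreach (extract_articles($html) as $article) {
    echo $article, "\n";
}
```

The XPath expression can be narrowed if only some listings are wanted, for example '//article[@class="listing"]' for the sample markup above.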
Extract the articles with DOMDocument. Working example:
<?php
$url = 'http://www.polkmugshot.com/';
$content = file_get_contents($url);
$domd = new DOMDocument();
// @ suppresses the warnings libxml raises for HTML5 tags such as <article>
@$domd->loadHTML($content);
foreach($domd->getElementsByTagName("article") as $article){
var_dump($domd->saveHTML($article));
}
And as pointed out by @Guns, you'd better use curl, for several reasons:
1: file_get_contents will fail if allow_url_fopen is not set to true in php.ini
2: until PHP 5.5.0 (somewhere around there), file_get_contents kept reading from the connection until it was actually closed, which for many servers can be many seconds after all content is sent, while curl only reads until it has received the number of bytes given in the Content-Length HTTP header, which makes for much faster transfers (luckily this was fixed)
3: curl supports gzip and deflate compressed transfers, which again makes for much faster transfers when the content is compressible (such as HTML), while file_get_contents always transfers uncompressed

PHP: read a php url as xml and parse it

I have a URL like this one, http://somedomain.com/frieventexport.php, and the content of this URL is a plain XML structure if I check the source code of the page:
How can I parse this URL and use it in PHP?
This script gives me an error… "Error loading XML".
<?php
$xml_url = "http://somedomain.com/frieventexport.php";
if (($response_xml_data = file_get_contents($xml_url))===false){
echo "Error fetching XML\n";
} else {
libxml_use_internal_errors(true);
$data = simplexml_load_string($response_xml_data);
if (!$data) {
echo "Error loading XML\n";
foreach(libxml_get_errors() as $error) {
echo "\t", $error->message;
}
} else {
print_r($data);
}
}
?>
One of the best options in my opinion is to use CURL to get the raw XML data from the url:
$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, "http://somedomain.com/frieventexport.php" );
$xml = curl_exec( $curl );
curl_close( $curl );
You can then use DOMDocument to parse the xml:
$document = new DOMDocument;
$document->loadXML( $xml );
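After loadXML() the document can be walked with the usual DOM methods. The following is a minimal sketch under assumptions: the question does not show the feed's structure, so the <event>, <title> and id names are hypothetical, as is the helper parse_events().

```php
<?php
// Inline sample; the element and attribute names are assumptions.
$xml = '<events>'
     . '<event id="7"><title>Concert</title></event>'
     . '<event id="8"><title>Market</title></event>'
     . '</events>';

// Hypothetical helper: parses the XML and returns one array per <event>.
function parse_events(string $xml): array
{
    libxml_use_internal_errors(true);
    $document = new DOMDocument();
    if (!$document->loadXML($xml)) {
        // Gather the parser messages so the caller can see what went wrong.
        $messages = [];
        foreach (libxml_get_errors() as $error) {
            $messages[] = trim($error->message);
        }
        libxml_clear_errors();
        throw new RuntimeException('Error loading XML: ' . implode('; ', $messages));
    }

    $events = [];
    foreach ($document->getElementsByTagName('event') as $event) {
        $events[] = [
            'id'    => $event->getAttribute('id'),
            'title' => $event->getElementsByTagName('title')->item(0)->textContent,
        ];
    }
    return $events;
}

print_r(parse_events($xml));
```

Reporting the libxml messages is useful when the response is not the XML you expect, for example an HTML error page returned by the server.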
I would also recommend using <![CDATA[]]> sections in your XML. Please read the following:
What does <![CDATA[]]> in XML mean?
CDATA Sections in XML
More information about DOMDocument and usage
PHP.net DOMDocument documentation
W3Schools DOMDocument example
Fetching URLs with file_get_contents() is often disabled on shared hosting. If it is not, you can enable allow_url_fopen in php.ini (but beware of the security risks). You can check the setting with phpinfo() or var_dump(ini_get('allow_url_fopen')).
If you cannot do the above, you can use cURL to fetch external content.
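The check described above can be done at runtime before deciding how to fetch the document. This is only a sketch; available_fetcher() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: reports which HTTP fetching method this PHP
// installation supports.
function available_fetcher(): string
{
    // ini_get() returns the setting as a string such as "1", "" or "On".
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'file_get_contents';
    }
    // The cURL extension is loaded if its functions exist.
    if (function_exists('curl_init')) {
        return 'curl';
    }
    return 'none';
}

echo 'Fetch method: ', available_fetcher(), "\n";
```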
Try the following:
$url = 'xml-file.xml';
$xml = simplexml_load_file($url);
print_r($xml);
Try this :
$livescore_data = new SimpleXMLElement($file);

Display xml data in a webpage with php

I am trying to display XML data that I get from an external source by passing some parameters like this:
http://www.somewebsite.com/phpfile.php?vendor_key=xxx&checkin=2012-11-02&checkout=2012-11-05&city_id=5&guests=3
When I pass these parameters I get an XML result. Now I want to display that XML data in a nicely designed way on my web page. How can I do that? I am new to XML and don't know what this technique is called; if anybody can tell me its name, that would also help.
Take a look at simplexml_load_string.
You can use curl or the file_get_contents function to make the HTTP request. After that, you can use DOM or SimpleXML to parse the XML response from the requested URL.
If you already have the XML, then try:
echo $xml->asXML();
A full example
<?php
$curl = curl_init();
curl_setopt ($curl, CURLOPT_URL, 'http://rss.news.yahoo.com/rss/topstories');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec ($curl);
if ($result === false) {
die('Error fetching data: ' . curl_error($curl));
}
curl_close ($curl);
//we can at this point echo the XML if you want
//echo $result;
//parse xml string into SimpleXML objects
$xml = simplexml_load_string($result);
if ($xml === false) {
die('Error parsing XML');
}
//now we can loop through the xml structure
foreach ($xml->channel->item as $item) {
print $item->title;
}
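To display the parsed data on a web page rather than just print it, the loop can build HTML instead. The following is a minimal sketch that assumes an RSS-like structure (channel/item/title/link) and uses an inline sample in place of the fetched feed; render_items() is a hypothetical helper name.

```php
<?php
// Inline sample standing in for the fetched feed (an assumption).
$result = '<rss><channel>'
        . '<item><title>Tom &amp; Jerry</title><link>http://example.com/1</link></item>'
        . '<item><title>Second story</title><link>http://example.com/2</link></item>'
        . '</channel></rss>';

// Hypothetical helper: turns the feed items into an HTML list of links.
function render_items(string $xmlString): string
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return "<p>Could not parse the feed.</p>\n";
    }
    $html = "<ul>\n";
    foreach ($xml->channel->item as $item) {
        // Escape the values before placing them into HTML.
        $title = htmlspecialchars((string) $item->title);
        $link  = htmlspecialchars((string) $item->link);
        $html .= "  <li><a href=\"$link\">$title</a></li>\n";
    }
    return $html . "</ul>\n";
}

echo render_items($result);
```

The markup can then be styled with CSS; escaping with htmlspecialchars() keeps characters such as & or < in the feed from breaking the page.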

php parse exchange rate feed XML

I am trying to use the currency exchange rate feeds of the European Central Bank (ECB)
http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml
They have provided documentation on how to parse the xml but none of the options works for me: I checked that allow_url_fopen=On is set.
http://www.ecb.int/stats/exchange/eurofxref/html/index.en.html
For instance, I used the following, but it doesn't echo anything and it seems the $XML object is always empty.
<?php
//This is a PHP(5) script example on how eurofxref-daily.xml can be parsed
//Read eurofxref-daily.xml file in memory
//For the next command you will need the config option allow_url_fopen=On (default)
$XML=simplexml_load_file("http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml");
//the file is updated daily between 2.15 p.m. and 3.00 p.m. CET
foreach($XML->Cube->Cube->Cube as $rate){
//Output the value of 1EUR for a currency code
echo '1€='.$rate["rate"].' '.$rate["currency"].'<br/>';
//--------------------------------------------------
//Here you can add your code for inserting
//$rate["rate"] and $rate["currency"] into your database
//--------------------------------------------------
}
?>
Update:
As I am behind proxy at my test environment, I tried this but still I don't get to read the XML:
function curl($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_close ($ch);
return curl_exec($ch); }
$address = urlencode($address);
$data = curl("http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml");
$XML = simplexml_load_file($data);
var_dump($XML); // returns bool(false)
Please help me. Thanks!
I didn't find any relevant settings in php.ini. Check with phpinfo() whether you have SimpleXML support and cURL support enabled. (You should have both, SimpleXML in particular: you are calling it and it returns false rather than failing with an undefined-function error.)
Proxy might be an issue here. See this and this answer. Using cURL could be an answer to your problem.
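In the cURL attempt from the question, the handle is closed before curl_exec() runs, and the fetched string is then passed to simplexml_load_file(), which expects a path or URL rather than XML content. A corrected version might look like the sketch below. The function names fetch_url() and parse_rates() and the "host:port" proxy value are assumptions; the inline sample mirrors the structure of the ECB feed so the parsing part can be tried without network access.

```php
<?php
// Hypothetical helper. $proxy is an assumed "host:port" string; pass null
// when no proxy is needed.
function fetch_url(string $url, ?string $proxy = null): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    if ($proxy !== null) {
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
    }
    $body  = curl_exec($ch);   // execute BEFORE closing the handle
    $error = curl_error($ch);
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . $error);
    }
    return $body;
}

// Hypothetical helper: maps currency code => rate.
function parse_rates(string $xmlString): array
{
    // The data is already a string, so use simplexml_load_string(),
    // not simplexml_load_file().
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        throw new RuntimeException('Error parsing XML');
    }
    $rates = [];
    foreach ($xml->Cube->Cube->Cube as $rate) {
        $rates[(string) $rate['currency']] = (float) $rate['rate'];
    }
    return $rates;
}

// Inline sample mirroring the feed's structure; the values are made up.
$sample = <<<XML
<gesmes:Envelope xmlns:gesmes="http://www.gesmes.org/xml/2002-08-01" xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
  <gesmes:subject>Reference rates</gesmes:subject>
  <Cube>
    <Cube time="2012-11-02">
      <Cube currency="USD" rate="1.2850"/>
      <Cube currency="JPY" rate="103.55"/>
    </Cube>
  </Cube>
</gesmes:Envelope>
XML;

print_r(parse_rates($sample));
```

Against the live feed the call would be something like parse_rates(fetch_url('http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml', 'proxy.example.com:8080')), where the proxy address is a placeholder for your own.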
Here's one alternative found here.
$url = file_get_contents('http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml');
$xml = new SimpleXMLElement($url) ;
//file_put_contents - same as fopen, fwrite and fclose
//need to output "asXML" - simple xml returns an object based upon the raw xml
file_put_contents(dirname(__FILE__)."/loc.xml", $xml->asXML());
foreach($xml->Cube->Cube->Cube as $rate){
echo '1€='.$rate["rate"].' '.$rate["currency"].'<br/>';
}
This solution works for me:
$data = [];
$url = "http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist-90d.xml";
$xmlRaw = file_get_contents($url);
$doc = new DOMDocument();
$doc->preserveWhiteSpace = FALSE;
$doc->loadXML($xmlRaw);
$node1 = $doc->getElementsByTagName('Cube')->item(0);
foreach ($node1->childNodes as $node2) {
foreach ($node2->childNodes as $node3) {
$value = [];
$value['date'] = $node2->getAttribute('time');
$value['currency'] = $node3->getAttribute('currency');
$value['rate'] = $node3->getAttribute('rate');
$data[] = $value;
}
}
// print_r() needs true as its second argument to return the string instead of printing it
echo "<pre>" . print_r($data, true) . "</pre>";

PHP Regex for IP to Location API

How would I use a regex to get the information from an IP to Location API?
This is the API
http://ipinfodb.com/ip_query.php?ip=74.125.45.100
I would need to get the Country Name, Region/State, and City.
I tried this:
$ip = $_SERVER["REMOTE_ADDR"];
$contents = @file_get_contents('http://ipinfodb.com/ip_query.php?ip=' . $ip . '');
$pattern = "/<CountryName>(.*)<CountryName>/";
preg_match($pattern, $contents, $regex);
$regex = !empty($regex[1]) ? $regex[1] : "FAIL";
echo $regex;
When I do echo $regex I always get FAIL. How can I fix this?
As Aaron has suggested. Best not to reinvent the wheel so try parsing it with simplexml_load_string()
// Init the CURL
$curl = curl_init();
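// Setup the curl settings ($url is the API URL from the question)
$url = 'http://ipinfodb.com/ip_query.php?ip=' . $_SERVER["REMOTE_ADDR"];
curl_setopt($curl, CURLOPT_URL, $url);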
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 0);
// grab the XML file
$raw_xml = curl_exec($curl);
curl_close($curl);
// Setup the xml object
$xml = simplexml_load_string( $raw_xml );
You can now access any part of the $xml variable as an object. With that in mind, here is an example of the response for the URL you posted.
<Response>
<Ip>74.125.45.100</Ip>
<Status>OK</Status>
<CountryCode>US</CountryCode>
<CountryName>United States</CountryName>
<RegionCode>06</RegionCode>
<RegionName>California</RegionName>
<City>Mountain View</City>
<ZipPostalCode>94043</ZipPostalCode>
<Latitude>37.4192</Latitude>
<Longitude>-122.057</Longitude>
<Timezone>0</Timezone>
<Gmtoffset>0</Gmtoffset>
<Dstoffset>0</Dstoffset>
</Response>
Now after you have loaded this XML string into the simplexml_load_string() you can access the response's IP address like so.
$xml->Ip; // element names are case-sensitive: the tag is <Ip>, not <IP>
simplexml_load_string() will transform a well-formed XML string into an object that you can manipulate. Beyond that, the best advice is to try it out and experiment with it.
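For the three values asked about in the question (country name, region/state and city), the access pattern can be sketched like this. The inline $raw_xml is a shortened copy of the sample response above and stands in for the API call; the $location array keys are arbitrary names chosen for illustration.

```php
<?php
// Shortened copy of the sample response, standing in for the API call.
$raw_xml = '<Response>'
         . '<Ip>74.125.45.100</Ip>'
         . '<Status>OK</Status>'
         . '<CountryName>United States</CountryName>'
         . '<RegionName>California</RegionName>'
         . '<City>Mountain View</City>'
         . '</Response>';

$xml = simplexml_load_string($raw_xml);
if ($xml === false) {
    die('Error parsing XML');
}

// Casting to string converts each SimpleXMLElement into its text content.
$location = [
    'country' => (string) $xml->CountryName,
    'region'  => (string) $xml->RegionName,
    'city'    => (string) $xml->City,
];

print_r($location);
```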
EDIT:
Source
http://www.php.net/manual/en/function.simplexml-load-string.php
You really are better off using a XML parser to pull the information.
For example, this script will parse it into an array.
Regex really shouldn't be used to parse HTML or XML.
If you really need to use regular expressions, then you should correct the one you are using. "|<CountryName>([^<]*)</CountryName>|i" would work better.
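The difference between the two patterns can be checked against a sample response. The original pattern never matches because the closing tag is written without its slash. This is a small sketch; country_from_xml() is a hypothetical helper name and the inline $sample stands in for the API response.

```php
<?php
// Hypothetical helper using the corrected pattern from the answer above.
function country_from_xml(string $contents): string
{
    $pattern = '|<CountryName>([^<]*)</CountryName>|i';
    return preg_match($pattern, $contents, $matches) === 1 ? $matches[1] : 'FAIL';
}

// Inline sample standing in for the API response (an assumption).
$sample = '<Response><CountryName>United States</CountryName></Response>';

// Pattern from the question: the closing tag lacks the "/".
$broken = '/<CountryName>(.*)<CountryName>/';

echo country_from_xml($sample), "\n";
echo preg_match($broken, $sample) === 1 ? 'match' : 'no match', "\n";
```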
