How to extract XML data using PHP
I need to extract the information for a particular "area" from this large XML document, but I'm not familiar with extracting XML. I've looked through the site and tried various approaches, but all I get back is "Error in my_thread_global_end(): 1 threads didn't exit".
Here's the URL of the XML I'm getting my data from:
ftp://ftp2.bom.gov.au/anon/gen/fwo/IDV10753.xml
I would like to retrieve all 7 forecast-period elements located under the Swan Hill "area" only.
Help please
I agree that using php's simple xml parser is the way to go with this one.
You can make your life easy here using the xpath method of extracting data from the xml.
There's an xpath tutorial here: http://www.w3schools.com/xpath/
And php documentation for it here: http://www.php.net/manual/en/simplexmlelement.xpath.php
Try this out
<?php
/*
Get the file with CURL
$curl_handle=curl_init();
curl_setopt($curl_handle,CURLOPT_URL,'ftp://ftp2.bom.gov.au/anon/gen/fwo/IDV10753.xml');
curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,true);
$xml_data = curl_exec($curl_handle);
curl_close($curl_handle);
*/
/*
Open the file locally
*/
$xml_data = file_get_contents("weather.xml");
$xml = simplexml_load_string($xml_data);
$result = $xml->xpath("//area[@description='Swan Hill']/forecast-period"); // @ selects an attribute in XPath
date_default_timezone_set('America/New_York');
foreach ($result as $day) {
//print_r($day);
$day_of_the_week = date("l", strtotime($day["start-time-local"])); //start-time-local is an attribute of a result, so use the [] syntax
$forecast = $day->text; //text is a child node, so use the -> syntax
printf("%s: %s\n", $day_of_the_week, $forecast);
}
?>
EDIT More illustrative example
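The same attribute predicate can be tried without touching the FTP server. This is a minimal sketch against a hand-written sample: the element and attribute names follow the query above, but the aac codes, dates and forecast texts are made up, and the real feed contains many more areas and fields.

```php
<?php
// Hypothetical, heavily trimmed sample shaped like the feed queried above.
$sample = <<<XML
<product>
  <forecast>
    <area aac="VIC_PT001" description="Swan Hill" type="location">
      <forecast-period index="0" start-time-local="2012-03-01T05:00:00+11:00">
        <text type="precis">Sunny.</text>
      </forecast-period>
      <forecast-period index="1" start-time-local="2012-03-02T00:00:00+11:00">
        <text type="precis">Possible shower.</text>
      </forecast-period>
    </area>
    <area aac="VIC_PT002" description="Mildura" type="location">
      <forecast-period index="0" start-time-local="2012-03-01T05:00:00+11:00">
        <text type="precis">Hot.</text>
      </forecast-period>
    </area>
  </forecast>
</product>
XML;

$xml = simplexml_load_string($sample);

// Collect only the periods that sit under the Swan Hill area.
$forecasts = [];
foreach ($xml->xpath("//area[@description='Swan Hill']/forecast-period") as $period) {
    // Attributes use [] and child elements use ->; cast both to plain PHP values.
    $forecasts[(int) $period['index']] = (string) $period->text;
}

print_r($forecasts);
```

The Mildura period is skipped because the predicate is applied to the area element before descending into forecast-period.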
Related
Loading a Search and Retrieve via URL (SRU) in php with simplexml_load_string returns an empty object
I'm trying to load search results from a library API using Search and Retrieve via URL (SRU) at: https://data.norge.no/data/bibsys/bibsys-bibliotekbase-bibliografiske-data-sru
If you look at the search result links there, the output looks pretty much like XML, but when I parse it with the code below, the way I have parsed XML before, it just returns an empty object, SimpleXMLElement {#546}. What's going on here?
My PHP function in my Laravel project:
public function bokId($bokid) {
    $apiUrl = "http://sru.bibsys.no/search/biblio?version=1.2&operation=searchRetrieve&startRecord=1&maximumRecords=10&query=ibsen&recordSchema=marcxchange";
    $filename = "bok.xml";
    $xmlfile = file_get_contents($apiUrl);
    file_put_contents($filename, $xmlfile); // xml file is saved.
    $fileXml = simplexml_load_string($xmlfile);
    dd($fileXml);
}
If I do dd($xmlfile); instead, it echoes out the raw response, which leaves me very confused that I cannot get an object to work with. Code like this has worked fine for me before.
It may be that the data you're being provided has changed format, but the data is still there and you can still use it. The main problem with using something like dd() is that it doesn't work well with SimpleXMLElement objects; it tends to have its own idea of which parts of the data you want to see. In this case, as usual, the namespaces are the problem. The following code shows a quick way of getting the data from a specific namespace, which you can then access as normal. I use ->children("srw", true) to fetch all child elements that are in the namespace srw (the second argument indicates that this is the prefix and not the URL)...
$apiUrl = "http://sru.bibsys.no/search/biblio?version=1.2&operation=searchRetrieve&startRecord=1&maximumRecords=10&query=ibsen&recordSchema=marcxchange";
$filename = "bok.xml";
$xmlfile = file_get_contents($apiUrl);
file_put_contents($filename, $xmlfile); // xml file is saved.
$fileXml = simplexml_load_string($xmlfile);
foreach ( $fileXml->children("srw", true)->records->record as $record) {
    echo "recordIdentifier=".$record->recordIdentifier.PHP_EOL;
}
This outputs...
recordIdentifier=792012771
recordIdentifier=941956423
recordIdentifier=941956466
recordIdentifier=950546232
recordIdentifier=802109055
recordIdentifier=910941041
recordIdentifier=940589451
recordIdentifier=951721941
recordIdentifier=080703852
recordIdentifier=011800283
As I'm not sure which data you want to retrieve as the title, I just wanted to show the idea of how to fetch data when you have a list of possibilities. In this example I'm using XPath to look in each <srw:record> element and find the <marc:datafield tag="100"...> element and in that the <marc:subfield code="a"> element. This is done using .//marc:datafield[@tag='100']/marc:subfield[@code='a']. The leading . keeps the search inside the current record; a bare // would search the whole document on every iteration. You may need to adjust the @tag= bit to the datafield you're after and the @code= to point to the subfield you're after.
$fileXml = simplexml_load_string($xmlfile);
foreach ( $fileXml->children("srw", true)->records->record as $record) {
    echo "recordIdentifier=".$record->recordIdentifier.PHP_EOL;
    // register the prefix on the element that xpath() is called on
    $record->registerXPathNamespace("marc","info:lc/xmlns/marcxchange-v1");
    $data = $record->xpath(".//marc:datafield[@tag='100']/marc:subfield[@code='a']");
    if ( !empty($data) ) {
        echo "Data=".(string)$data[0].PHP_EOL;
    }
}
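The two namespace techniques above, children() with a prefix and a per-record XPath query, can be checked offline. The sketch below uses a hand-written response: the record identifiers and author names are invented, and the srw namespace URI is an assumption; only the marc URI comes from the answer.

```php
<?php
// Hypothetical SRU-like response; identifiers and names are made up.
$sample = <<<XML
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:records>
    <srw:record>
      <srw:recordIdentifier>rec-1</srw:recordIdentifier>
      <srw:recordData>
        <marc:record xmlns:marc="info:lc/xmlns/marcxchange-v1">
          <marc:datafield tag="100">
            <marc:subfield code="a">Author One</marc:subfield>
          </marc:datafield>
        </marc:record>
      </srw:recordData>
    </srw:record>
    <srw:record>
      <srw:recordIdentifier>rec-2</srw:recordIdentifier>
      <srw:recordData>
        <marc:record xmlns:marc="info:lc/xmlns/marcxchange-v1">
          <marc:datafield tag="100">
            <marc:subfield code="a">Author Two</marc:subfield>
          </marc:datafield>
        </marc:record>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
XML;

$xml = simplexml_load_string($sample);

$authors = [];
foreach ($xml->children('srw', true)->records->record as $record) {
    // Register the prefix on the element the query runs on.
    $record->registerXPathNamespace('marc', 'info:lc/xmlns/marcxchange-v1');
    // ".//" searches below this record only; "//" would start at the document root.
    $hits = $record->xpath(".//marc:datafield[@tag='100']/marc:subfield[@code='a']");
    $authors[(string) $record->recordIdentifier] = $hits ? (string) $hits[0] : null;
}

print_r($authors);
```

Each record maps to its own author here, which is the behaviour the relative query is meant to give.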
how to display SimpleXMLElement with php
Hi, I have never used XML but need to now, so I am trying to learn quickly, but I think I'm struggling with the structure. This is just to display the weather at the top of someone's website. I want to display Melbourne weather using this XML link: ftp://ftp2.bom.gov.au/anon/gen/fwo/IDV10753.xml
Basically I am trying to get the Melbourne forecast for 3 days (whatever, just something that works). There is a forecast-period array [0] to [6]. I used this print_r to view the structure:
$url = "linkhere";
$xml = simplexml_load_file($url);
echo "<pre>";
print_r($xml);
and tried this just to get something:
$url = "linkhere";
$xml = simplexml_load_file($url);
$data = (string) $xml->forecast->area[52]->description;
echo $data;
This gave me nothing (I expected 'Melbourne'). Obviously I need to learn, and I am, but if someone could help that would be great.
Because description is an attribute of <area>, you need to use
$data = (string) $xml->forecast->area[52]['description'];
I also wouldn't rely on Melbourne being the area node at index 52 (though this is really up to the data maintainers). I'd go by its aac attribute as this appears to be unique, e.g.
$search = $xml->xpath('forecast/area[@aac="VIC_PT042"]');
if (count($search)) {
    $melbourne = $search[0];
    echo $melbourne['description'];
}
This is a working example for you:
<?php
$forecastdata = simplexml_load_file('ftp://ftp2.bom.gov.au/anon/gen/fwo/IDV10753.xml','SimpleXMLElement',LIBXML_NOCDATA);
foreach($forecastdata->forecast->area as $singleregion) {
    $area = $singleregion['description'];
    $weather = $singleregion->{'forecast-period'}->text;
    echo $area.': '.$weather.'<hr />';
}
?>
You can edit the example above to extract the tags and attributes you want. A good way to understand the structure of your XML object is to print out its content using, for instance, print_r. In the XML you proposed, cities are specified through attributes (description), so you have to read those attributes using ['attribute name']. Notice also that the tag {'forecast-period'} is wrapped in curly brackets because it contains a hyphen; without them PHP would parse the hyphen as a minus sign and the access would not work as intended.
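To make the difference between attribute access, child access and hyphenated names concrete, here is a small sketch on an inline sample. The sample is a hypothetical fragment with the forecast element as its root, and the forecast text is invented.

```php
<?php
// Hypothetical fragment; only the element and attribute names mirror the feed.
$sample = '<forecast>'
        . '<area aac="VIC_PT042" description="Melbourne">'
        . '<forecast-period index="0"><text>Fine.</text></forecast-period>'
        . '</area>'
        . '</forecast>';

$forecast = simplexml_load_string($sample);
$area = $forecast->area[0];

// description is an attribute, so array syntax finds it.
$name = (string) $area['description'];

// There is no <description> child element, so property syntax yields an empty string.
$missing = (string) $area->description;

// A hyphenated element name has to be quoted inside curly braces.
$text = (string) $area->{'forecast-period'}[0]->text;

echo $name, ': ', $text, PHP_EOL;
```

The empty $missing value is the same symptom as in the question: the data is present, but it was requested as a child element instead of an attribute.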
SimpleXML feed showing blank arrays - how do I get the content out?
I'm trying to get the image out of an RSS feed using SimpleXML, parsing the data out via an array and back into the foreach loop. In the source code the array for [description] is shown as blank, though I've managed to pull it out using another loop. However, I can't for the life of me work out how to pull in the next array, and subsequently the image for each post! Help?
You can view my progress here: http://dev.thebarnagency.co.uk/tfolphp.php
Here's the original feed: feed://feeds.feedburner.com/TheFutureOfLuxury?format=xml
$xml_feed_url = 'http://feeds.feedburner.com/TheFutureOfLuxury?format=xml';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $xml_feed_url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xml = curl_exec($ch);
curl_close($ch);
function produce_XML_object_tree($raw_XML) {
    libxml_use_internal_errors(true);
    try {
        $xmlTree = new SimpleXMLElement($raw_XML);
    } catch (Exception $e) {
        // Something went wrong.
        $error_message = 'SimpleXMLElement threw an exception.';
        foreach(libxml_get_errors() as $error_line) {
            $error_message .= "\t" . $error_line->message;
        }
        trigger_error($error_message);
        return false;
    }
    return $xmlTree;
}
$feed = produce_XML_object_tree($xml);
print_r($feed);
foreach ($feed->channel->item as $item) {
    // $desc = $item->description;
    echo 'link<br>';
    foreach ($item->description as $desc) {
        echo $desc;
    }
}
Thanks
Can you use wp_remote_get( $url, $args )? I found it here: http://dynamicweblab.com/2012/09/10-useful-wordpress-functions-to-reduce-your-development-time More details about this function are here: http://codex.wordpress.org/Function_API/wp_remote_get Hope this helps.
I'm not entirely clear what your problem is here - the code you provided appears to work fine. You mention "the image for each post", but I can't see any images specifically labelled in the XML. What I can see is that inside the HTML in the content node of the XML, there is often an <img> tag. As far as the XML document is concerned, this entire blob of HTML is just one string delimited with the special tokens <![CDATA[ and ]]>. If you get this string into a PHP variable (using (string)$item->content) you can then find a way of extracting the <img> tag from inside it - but note that the HTML is unlikely to be valid XML.
The other thing to mention is that SimpleXML is not, as you repeatedly refer to it, an array - it is an object, and a particularly magic one at that. Everything you do to the SimpleXML object - foreach ( $nodeList as $node ), isset($node), count($nodeList), $node->childNode, $node['attribute'], etc - is actually a function call, often returning another SimpleXML object. It's designed for convenience, so in many cases writing what seems natural will be more helpful than inspecting the object.
For instance, since each item has only one description you don't need the inner foreach loop - the following will all have the same effect:
foreach ($item->description as $desc) { echo $desc; }
(loop over all child elements with tag name description)
echo $item->description[0];
(access the first description child node specifically)
echo $item->description;
(access the first/only description child node implicitly; this is why you can write $feed->channel->item and it would still work if there was a second channel element, it would just ignore it)
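One way to pull the <img> tag out of such an HTML string is to hand the string to DOMDocument::loadHTML(), which tolerates HTML that is not valid XML. This is a sketch on a made-up feed item, not the actual feed; the sample keeps the HTML in description, and the image URL is a placeholder.

```php
<?php
// Hypothetical RSS item whose description carries HTML inside CDATA.
$sample = <<<XML
<rss version="2.0"><channel><item>
  <title>Post</title>
  <description><![CDATA[<p>Hello <img src="http://example.com/a.jpg" alt="a"> world</p>]]></description>
</item></channel></rss>
XML;

$feed = simplexml_load_string($sample);

// Casting to string returns the text inside the CDATA section.
$html = (string) $feed->channel->item->description;

// Parse the fragment as HTML; warnings about sloppy markup are collected, not printed.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

For a real feed the same loop would run once per item, with the element name swapped for whichever node holds the HTML.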
I had an issue where simplexml_load_file was returning some array sections blank as well, even though they contained data when you view the source url directly. Turns out the data was there, but it was CDATA so it was not properly being displayed. Is this perhaps the same issue op was having? Anyways my solution was this: So initially I used this: $feed = simplexml_load_file($rss_url); And I got empty description back like this: [description] => SimpleXMLElement Object ( ) But then I found this solution in comments of PHP.net site, saying I needed to use LIBXML_NOCDATA: https://www.php.net/manual/en/function.simplexml-load-file.php $feed = simplexml_load_file($rss_url, "SimpleXMLElement", LIBXML_NOCDATA); After making this change, I got description like this: [description] => My description text!
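As a cross-check on the CDATA behaviour described above, this sketch parses the same made-up item with and without LIBXML_NOCDATA. In both cases an explicit string cast returns the text, so the flag mainly matters when the structure is inspected with print_r or similar.

```php
<?php
// Made-up item; the text matches the example output quoted above.
$raw = '<item><description><![CDATA[My description text!]]></description></item>';

// Default parse: the CDATA section stays a separate node type.
$plain = simplexml_load_string($raw);

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$merged = simplexml_load_string($raw, 'SimpleXMLElement', LIBXML_NOCDATA);

$fromPlain  = (string) $plain->description;
$fromMerged = (string) $merged->description;

echo $fromPlain, PHP_EOL, $fromMerged, PHP_EOL;
```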
Use PHP to load XML Data into Oracle
I'm fairly new to PHP, although I've been programming for a couple of years. I'm working on a project whose end goal is to load certain elements of an XML file into an Oracle table on a nightly basis. I have a script which runs nightly and saves the file on my local machine. I've searched endlessly for answers but have been unsuccessful. Here is an aggregated example of the XML file.
<?xml version="1.0" encoding="UTF-8" ?>
<Report account="7869" start_time="2012-02-23T00:00:00+00:00" end_time="2012-02-23T15:27:59+00:00" user="twilson" more_sessions="false">
  <Session id="ID742247692" realTimeID="4306650378">
    <Visitor id="5390643113837">
      <ip>128.XXX.XX.XX</ip>
      <agent>MSIE 8.0</agent>
    </Visitor>
  </Session>
  <Session id="ID742247695" realTimeID="4306650379">
    <Visitor id="7110455516320">
      <ip>173.XX.XX.XXX</ip>
      <agent>Chrome 17.0.963.56</agent>
    </Visitor>
  </Session>
</Report>
One thing to note is that the XML file will contain several objects which I will need to load into my table; the example above would produce just two rows of data. I'm familiar with the whole process of connecting and loading data into Oracle and have set up similar scripts which perform ETL of .txt and .csv files using PHP. Unfortunately for me, in this case the data is stored in XML. The approach I've taken when loading a .csv file is to load the data into an array and proceed from there. I'm pretty certain that I can do something similar, perhaps creating a variable for each element, but I'm not really sure how to do that with an XML file.
$xml = simplexml_load_file('C:/Dev/report.xml');
echo $xml->Report->Session->Visitor->agent;
In the above code I'm just trying to return the agent associated with each visitor. This returns an error: 'Trying to get property of non-object in C:\PHP\chatTest.php on line 11'
The end result for the example I provided would be two rows loaded into my table, looking similar to the rows below; I think I can handle that part if I'm able to get the data into an array or something similar.
IP|AGENT
128.XXX.XX.XX MSIE 8.0
173.XX.XX.XXX Chrome 17.0.963.56
Any help would be greatly appreciated.
Revised Code:
$doc = new DOMDocument();
$doc->load( 'C:/Dev/report.xml' );
$sessions = $doc->getElementsByTagName( "Session" );
foreach( $sessions as $session ) {
    $sessionid = $session->getAttribute( 'realTimeID' );
    $visitors = $session->getElementsByTagName( "Visitor" );
    foreach( $visitors as $visitor ) {
        $ips = $visitor->getElementsByTagName( "ip" );
        $ip = $ips->item(0)->nodeValue;
        $agents = $visitor->getElementsByTagName( "agent" );
        $agent = $agents->item(0)->nodeValue; // read from $agents, not $ips
        echo "$sessionid- $ip- $agent\n";
    }
}
The -> operator in PHP accesses a property or calls a method on an object. simplexml_load_file() returns the root element, so $xml already is <Report>; $xml->Report looks for a <Report> child that doesn't exist, and continuing the chain from that empty result produces the error that you are trying to get a property of a non-object. You can try something like this (untested, and I haven't written PHP for a long time):
$doc = new DOMDocument();
$doc->loadXML($content); // $content holds the XML as a string
foreach ($doc->getElementsByTagName('Session') as $node) {
    $agent = $node->getElementsByTagName('Visitor')->item(0)->getElementsByTagName('agent')->item(0)->nodeValue;
}
Edit: adding items to an array in PHP is as easy as this:
$arr = array();
$arr[] = "some data";
$arr[] = "some more data";
PHP arrays can be treated as lists, since they can be resized on the fly.
I was able to figure this out using simplexml_load_file rather than the DOM approach. Although DOM works after modifying Leon's suggestion, the approach below is what I would suggest.
$xml_object = simplexml_load_file('C:/Dev/report.xml');
foreach($xml_object->Session as $session) {
    foreach($session->Visitor as $visitor) {
        $ip = $visitor->ip;
        $agent = $visitor->agent;
        echo $ip.','.$agent."\n"; // one line per visitor
    }
}
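Since the stated goal is to get the values into an array before inserting them into Oracle, the loop above can collect rows instead of echoing them. This is a sketch on the sample report from the question, trimmed to the fields used; the array keys session_id, ip and agent are names chosen for illustration, and the database step is left out.

```php
<?php
// Sample report from the question, trimmed to the fields used here.
$report = <<<XML
<Report account="7869" user="twilson">
  <Session id="ID742247692" realTimeID="4306650378">
    <Visitor id="5390643113837">
      <ip>128.XXX.XX.XX</ip>
      <agent>MSIE 8.0</agent>
    </Visitor>
  </Session>
  <Session id="ID742247695" realTimeID="4306650379">
    <Visitor id="7110455516320">
      <ip>173.XX.XX.XXX</ip>
      <agent>Chrome 17.0.963.56</agent>
    </Visitor>
  </Session>
</Report>
XML;

// $xml is the <Report> root itself, so iteration starts at Session.
$xml = simplexml_load_string($report);

$rows = [];
foreach ($xml->Session as $session) {
    foreach ($session->Visitor as $visitor) {
        // Cast to string so the rows hold plain values, not SimpleXMLElement objects.
        $rows[] = [
            'session_id' => (string) $session['realTimeID'],
            'ip'         => (string) $visitor->ip,
            'agent'      => (string) $visitor->agent,
        ];
    }
}

print_r($rows);
```

Each entry in $rows can then be bound to an insert statement the same way rows from a CSV file would be.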
Parsing XML with PHP (simplexml)
Firstly, may I point out that I am a newcomer to all things PHP, so apologies if anything here is unclear; the more layman the response, the better. I've been having real trouble parsing an XML file in PHP to then populate an HTML table for my website. At the moment, I have been able to get the full XML feed into a string which I can then echo and view, and all seems well. I then thought I would be able to use SimpleXML to pick out specific elements and print their content, but have been unable to do this. The XML feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed into the right format within a string, although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?!
$file = fopen("compress.zlib://$url", 'r');
$xmlstr = file_get_contents($url);
$xml = new SimpleXMLElement($url,null,true);
foreach($xml as $name) {
    echo "{$name->awCat}\r\n";
}
Many, many thanks in advance, Chris
PS The actual feed
Since no one followed my close vote, I think I can just as well put my own comments as an answer. First of all, SimpleXML can load URIs directly and it can do so with stream wrappers, so your three calls in the beginning can be shortened to (note that you are not using $file at all)
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values you can either use the implicit SimpleXML API and drill down to the wanted elements (as shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
    echo $prod->cat->awCat , PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly:
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
    echo $awCat, PHP_EOL;
}
Note that fetching all awCat elements from the source XML is rather pointless though, because all of them have "Bodycare & Fitness" for value. Of course you can also mix XPath and the implicit API and just fetch the prod elements and then drill down to their various children. Using XPath should be somewhat faster than iterating over the SimpleXMLElement object graph, though the difference is negligible (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML. For additional examples see A simple program to CRUD node and node values of xml file and PHP Manual - SimpleXml Basic Examples
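Both access styles, and the compress.zlib:// wrapper, can be tried without the remote feed by gzipping a small sample into a temporary file first. This is a sketch assuming the zlib extension is loaded; the two-product sample is invented and only reuses the element names from the queries above.

```php
<?php
// Hypothetical two-product feed using the element names from the answer.
$sample = '<merchantProductFeed><merchant>'
        . '<prod><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>'
        . '<prod><cat><awCat>Bodycare &amp; Fitness</awCat></cat></prod>'
        . '</merchant></merchantProductFeed>';

// Write a gzip-compressed copy to a temporary file to stand in for the download.
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, gzencode($sample));

// The third argument marks the first one as a URI; the wrapper decompresses on read.
$xml = new SimpleXMLElement("compress.zlib://$path", 0, true);

// Style 1: drill down through properties.
$viaProperties = [];
foreach ($xml->merchant->prod as $prod) {
    $viaProperties[] = (string) $prod->cat->awCat;
}

// Style 2: one XPath query straight to the target elements.
$viaXPath = array_map('strval', $xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat'));

unlink($path);

print_r($viaProperties);
print_r($viaXPath);
```

Both arrays hold the same values, which matches the point that the choice between the two styles is mostly one of convenience for a feed of this size.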
Try this... $url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/"; $zd = gzopen($url, "r"); $data = gzread($zd, 1000000); gzclose($zd); if ($data !== false) { $xml = 
simplexml_load_string($data); foreach ($xml->merchant->prod as $pr) { echo $pr->cat->awCat . "<br>"; } }
<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can traverse the xml tree however you want
foreach ($xml->merchant->prod as $line) {
    // $line->cat->awCat -> you can use this
}
Use print_r($xml) to see the structure of the parsed XML feed. Then it becomes obvious how you would traverse it:
foreach ($xml->merchant->prod as $prod) {
    print $prod->pId;
    print $prod->text->name;
    print $prod->cat->awCat; # <-- which is what you wanted
    print $prod->price->buynow;
}
$url = 'your url here';
$f = gzopen ($url, 'r');
$xml = new SimpleXMLElement (fread ($f, 1000000)); // reads at most 1,000,000 bytes
foreach($xml->xpath ('//prod') as $name) {
    echo (string) $name->cat->awCatId, "\r\n";
}
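The gzopen-based answers read a fixed 1,000,000 bytes, which would cut off a larger feed and leave SimpleXML with incomplete XML. A hedged alternative is to read in chunks until gzeof() reports the end. The sketch below uses a local temporary file and an invented one-product sample in place of the real URL, and the 8192-byte chunk size is an arbitrary choice.

```php
<?php
// Invented sample standing in for the downloaded feed.
$sample = '<merchantProductFeed><merchant>'
        . '<prod><cat><awCatId>123</awCatId></cat></prod>'
        . '</merchant></merchantProductFeed>';

$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, gzencode($sample));

// Read the decompressed stream chunk by chunk until it is exhausted.
$zd = gzopen($path, 'r');
$data = '';
while (!gzeof($zd)) {
    $data .= gzread($zd, 8192);
}
gzclose($zd);
unlink($path);

$xml = simplexml_load_string($data);
$firstId = (string) $xml->merchant->prod[0]->cat->awCatId;

echo $firstId, PHP_EOL;
```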