Parsing XML with PHP (simplexml) - php
Firstly, may I point out that I am a newcomer to all things PHP, so apologies if anything here is unclear, and I'm afraid the more layman the response the better. I've been having real trouble parsing an XML file into PHP to then populate an HTML table for my website. At the moment, I have been able to get the full XML feed into a string which I can then echo and view, and all seems well. I then thought I would be able to use simplexml to pick out specific elements and print their content, but have been unable to do this.
The XML feed will be constantly changing (structure remaining the same) and is in compressed format. From various sources I've identified the following commands to get my feed into the right format within a string, although I am still unable to print specific elements. I've tried every combination without any luck and suspect I may be barking up the wrong tree. Could someone please point me in the right direction?!
$file = fopen("compress.zlib://$url", 'r');
$xmlstr = file_get_contents($url);
$xml = new SimpleXMLElement($url,null,true);
foreach($xml as $name) {
echo "{$name->awCat}\r\n";
}
Many, many thanks in advance,
Chris
PS The actual feed
Since no one followed my closevote, I think I can just as well put my own comments as an answer:
First of all, SimpleXml can load URIs directly and it can do so with stream wrappers, so your three calls in the beginning can be shortened to (note that you are not using $file at all)
$merchantProductFeed = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
To get the values you can either use the implicit SimpleXml API and drill down to the wanted elements (like shown multiple times elsewhere on the site):
foreach ($merchantProductFeed->merchant->prod as $prod) {
echo $prod->cat->awCat , PHP_EOL;
}
or you can use an XPath query to get at the wanted elements directly
$xml = new SimpleXMLElement("compress.zlib://$url", null, TRUE);
foreach ($xml->xpath('/merchantProductFeed/merchant/prod/cat/awCat') as $awCat) {
echo $awCat, PHP_EOL;
}
Note that fetching all awCat elements from the source XML is rather pointless though, because all of them have "Bodycare & Fitness" as their value. Of course you can also mix XPath and the implicit API: fetch just the prod elements with XPath and then drill down to their various children.
Using XPath should be somewhat faster than iterating over the SimpleXMLElement object graph, though the difference is negligible (read 0.000x vs 0.000y) for your feed. Still, if you plan to do more XML work, it pays off to familiarize yourself with XPath, because it's quite powerful. Think of it as SQL for XML.
For additional examples see
A simple program to CRUD node and node values of xml file and
PHP Manual - SimpleXml Basic Examples
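To illustrate the "mix XPath and the implicit API" idea mentioned above, here is a minimal sketch. The inline sample XML and the product names are made up for illustration; only the element layout (merchantProductFeed > merchant > prod > cat > awCat) follows the feed discussed in this thread, so treat it as an assumption rather than the real data.

```php
<?php
// Hypothetical sample that mimics the layout of the feed; the real feed is
// much larger and is loaded from a compress.zlib:// URL instead.
$sample = <<<'XML'
<merchantProductFeed>
  <merchant id="2891">
    <prod id="1">
      <text><name>Yoga Mat</name></text>
      <cat><awCatId>1</awCatId><awCat>Bodycare &amp; Fitness</awCat></cat>
    </prod>
    <prod id="2">
      <text><name>Skipping Rope</name></text>
      <cat><awCatId>1</awCatId><awCat>Bodycare &amp; Fitness</awCat></cat>
    </prod>
  </merchant>
</merchantProductFeed>
XML;

$xml = simplexml_load_string($sample);

$rows = [];
// XPath selects the prod elements; property access drills into each one.
foreach ($xml->xpath('/merchantProductFeed/merchant/prod') as $prod) {
    $rows[] = [
        'name'     => (string) $prod->text->name,
        'category' => (string) $prod->cat->awCat,
    ];
}

// Print one line per product, e.g. as the basis for an HTML table row.
foreach ($rows as $row) {
    echo $row['name'], ' | ', $row['category'], PHP_EOL;
}
```

Casting to string matters here: without it the array would hold SimpleXMLElement objects rather than plain text.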
Try this...
$url = "http://datafeed.api.productserve.com/datafeed/download/apikey/58bc4442611e03a13eca07d83607f851/cid/97,98,142,144,146,129,595,539,147,149,613,626,135,163,168,159,169,161,167,170,137,171,548,174,183,178,179,175,172,623,139,614,189,194,141,205,198,206,203,208,199,204,201,61,62,72,73,71,74,75,76,77,78,79,63,80,82,64,83,84,85,65,86,87,88,90,89,91,67,92,94,33,54,53,57,58,52,603,60,56,66,128,130,133,212,207,209,210,211,68,69,213,216,217,218,219,220,221,223,70,224,225,226,227,228,229,4,5,10,11,537,13,19,15,14,18,6,551,20,21,22,23,24,25,26,7,30,29,32,619,34,8,35,618,40,38,42,43,9,45,46,651,47,49,50,634,230,231,538,235,550,240,239,241,556,245,244,242,521,576,575,577,579,281,283,554,285,555,303,304,286,282,287,288,173,193,637,639,640,642,643,644,641,650,177,379,648,181,645,384,387,646,598,611,391,393,647,395,631,602,570,600,405,187,411,412,413,414,415,416,649,418,419,420,99,100,101,107,110,111,113,114,115,116,118,121,122,127,581,624,123,594,125,421,604,599,422,530,434,532,428,474,475,476,477,423,608,437,438,440,441,442,444,446,447,607,424,451,448,453,449,452,450,425,455,457,459,460,456,458,426,616,463,464,465,466,467,427,625,597,473,469,617,470,429,430,615,483,484,485,487,488,529,596,431,432,489,490,361,633,362,366,367,368,371,369,363,372,373,374,377,375,536,535,364,378,380,381,365,383,385,386,390,392,394,396,397,399,402,404,406,407,540,542,544,546,547,246,558,247,252,559,255,248,256,265,259,632,260,261,262,557,249,266,267,268,269,612,251,277,250,272,270,271,273,561,560,347,348,354,350,352,349,355,356,357,358,359,360,586,590,592,588,591,589,328,629,330,338,493,635,495,507,563,564,567,569,568/mid/2891/columns/merchant_id,merchant_name,aw_product_id,merchant_product_id,product_name,description,category_id,category_name,merchant_category,aw_deep_link,aw_image_url,search_price,delivery_cost,merchant_deep_link,merchant_image_url/format/xml/compression/gzip/";
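$zd = gzopen($url, "r");
$data = '';
// read the whole feed in chunks instead of only the first 1,000,000 bytes
while ($zd && !gzeof($zd)) {
    $data .= gzread($zd, 8192);
}
if ($zd) {
    gzclose($zd);
}
if ($data !== '') {
    $xml = simplexml_load_string($data);
    foreach ($xml->merchant->prod as $pr) {
        echo $pr->cat->awCat . "<br>";
    }
}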
<?php
$xmlstr = file_get_contents("compress.zlib://$url");
$xml = simplexml_load_string($xmlstr);
// you can traverse the xml tree however you want
foreach ($xml->merchant->prod as $line) {
// $line->cat->awCat -> you can use this
}
Use print_r($xml) to see the structure of the parsed XML feed.
Then it becomes obvious how you would traverse it:
foreach ($xml->merchant->prod as $prod) {
print $prod->pId;
print $prod->text->name;
print $prod->cat->awCat; # <-- which is what you wanted
print $prod->price->buynow;
}
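$url = 'your url here';
$f = gzopen($url, 'r');
$data = '';
// read until the end of the stream; a single 1,000,000 byte read would truncate larger feeds
while (!gzeof($f)) {
    $data .= gzread($f, 8192);
}
gzclose($f);
$xml = new SimpleXMLElement($data);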
foreach($xml->xpath ('//prod') as $name)
{
echo (string) $name->cat->awCatId, "\r\n";
}
Related
Loading a Search and Retrieve via URL (SRU) in php with simplexml_load_string returns an empty object
I'm trying to load search results from a library API using Search and Retrieve via URL (SRU) at: https://data.norge.no/data/bibsys/bibsys-bibliotekbase-bibliografiske-data-sru
If you look at the search result links there, they look pretty much like XML, but when I try the code below, as I have done before with XML, it just returns an empty object, SimpleXMLElement {#546}. What's going on here?
My PHP function in my Laravel project:
public function bokId($bokid) {
    $apiUrl = "http://sru.bibsys.no/search/biblio?version=1.2&operation=searchRetrieve&startRecord=1&maximumRecords=10&query=ibsen&recordSchema=marcxchange";
    $filename = "bok.xml";
    $xmlfile = file_get_contents($apiUrl);
    file_put_contents($filename, $xmlfile); // xml file is saved.
    $fileXml = simplexml_load_string($xmlfile);
    dd($fileXml);
}
If I do dd($xmlfile); instead, the raw XML is echoed out, which makes me very confused that I cannot get an object to work with. The code I present has worked fine before.
It may be that the data you're being provided has changed format, but the data is still there and you can still use it. The main problem with using something like dd() is that it doesn't work well with SimpleXMLElement objects; it tends to have its own idea of which parts of the data you want to see. In this case the namespaces are the usual problem. The following code shows a quick way of getting the data from a specific namespace, which you can then access as normal. Here ->children("srw", true) fetches all child elements that are in the namespace srw (the second argument indicates that this is the prefix and not the URI):
$apiUrl = "http://sru.bibsys.no/search/biblio?version=1.2&operation=searchRetrieve&startRecord=1&maximumRecords=10&query=ibsen&recordSchema=marcxchange";
$filename = "bok.xml";
$xmlfile = file_get_contents($apiUrl);
file_put_contents($filename, $xmlfile); // xml file is saved.
$fileXml = simplexml_load_string($xmlfile);
foreach ($fileXml->children("srw", true)->records->record as $record) {
    echo "recordIdentifier=" . $record->recordIdentifier . PHP_EOL;
}
This outputs:
recordIdentifier=792012771
recordIdentifier=941956423
recordIdentifier=941956466
recordIdentifier=950546232
recordIdentifier=802109055
recordIdentifier=910941041
recordIdentifier=940589451
recordIdentifier=951721941
recordIdentifier=080703852
recordIdentifier=011800283
As I'm not sure which data you want to retrieve as the title, I just want to show the idea of how to fetch data when you have a list of possibilities. In this example I'm using XPath to look in each <srw:record> element and find the <marc:datafield tag="100"...> element and in that the <marc:subfield code="a"> element. This is done using .//marc:datafield[@tag='100']/marc:subfield[@code='a']. The leading dot keeps the search inside the current record; an expression starting with a bare // is evaluated against the whole document, so it would return the same first match for every record. You may need to adjust the @tag= part to the datafield you're after and the @code= part to the subfield you're after.
$fileXml = simplexml_load_string($xmlfile);
foreach ($fileXml->children("srw", true)->records->record as $record) {
    echo "recordIdentifier=" . $record->recordIdentifier . PHP_EOL;
    // register the prefix on the element that runs the query
    $record->registerXPathNamespace("marc", "info:lc/xmlns/marcxchange-v1");
    $data = $record->xpath(".//marc:datafield[@tag='100']/marc:subfield[@code='a']");
    if (!empty($data)) {
        echo "Data=" . (string)$data[0] . PHP_EOL;
    }
}
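The namespace handling described in this answer can be tried without calling the live service. The following is a small sketch, assuming a response shaped like the SRU output discussed above; the inline sample document, the record identifiers and the author names are invented for illustration.

```php
<?php
// Hypothetical, heavily shortened response; only the srw/marc prefixes and
// the datafield/subfield layout follow the answer above.
$sample = <<<'XML'
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/" xmlns:marc="info:lc/xmlns/marcxchange-v1">
  <srw:records>
    <srw:record>
      <srw:recordIdentifier>rec-1</srw:recordIdentifier>
      <srw:recordData>
        <marc:record>
          <marc:datafield tag="100"><marc:subfield code="a">Ibsen, Henrik</marc:subfield></marc:datafield>
        </marc:record>
      </srw:recordData>
    </srw:record>
    <srw:record>
      <srw:recordIdentifier>rec-2</srw:recordIdentifier>
      <srw:recordData>
        <marc:record>
          <marc:datafield tag="100"><marc:subfield code="a">Hamsun, Knut</marc:subfield></marc:datafield>
        </marc:record>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
XML;

$xml = simplexml_load_string($sample);

$authors = [];
foreach ($xml->children('srw', true)->records->record as $record) {
    // Register the prefix on the element that runs the query.
    $record->registerXPathNamespace('marc', 'info:lc/xmlns/marcxchange-v1');
    // The leading "." restricts the search to the current record.
    $hits = $record->xpath(".//marc:datafield[@tag='100']/marc:subfield[@code='a']");
    $id = (string) $record->recordIdentifier;
    $authors[$id] = $hits ? (string) $hits[0] : null;
}

foreach ($authors as $id => $author) {
    echo $id, ' => ', $author, PHP_EOL;
}
```

With a plain `//` instead of `.//`, both records would report the first author in the document, which is an easy mistake to miss when every record happens to share the same value.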
PHP foreach statement issue with accessing XML data
First, I am pretty clueless with PHP, so be kind! I am working on a site for an SPCA (I'm a vet and a part-time geek). The PHP accesses an XML file from a portal used to administer the shelter and store images and info. The file writes that XML data to JSON, and then I use the JSON data in a Handlebars template, etc. I am having a problem getting some data from the XML file to print out to JSON. The XML file is like this:
<DataFeedAnimal>
    <AdditionalPhotoUrls>
        <string>doc_73737.jpg</string>
        <string>doc_74483.jpg</string>
        <string>doc_74484.jpg</string>
    </AdditionalPhotoUrls>
    <PrimaryPhotoUrl>19427.jpg</PrimaryPhotoUrl>
    <Sex>Male</Sex>
    <Type>Cat</Type>
    <YouTubeVideoUrls>
        <string>http://www.youtube.com/watch?v=6EMT2s4n6Xc</string>
    </YouTubeVideoUrls>
</DataFeedAnimal>
In the PHP file, written by a friend, the code below (just part of it) accesses that XML data and writes it to JSON:
<?php
$url = "http://eastbayspcapets.shelterbuddy.com/DataFeeds/AnimalsForAdoption.aspx";
if ($_GET["type"] == "found") {
    $url = "http://eastbayspcapets.shelterbuddy.com/DataFeeds/foundanimals.aspx";
} else if ($_GET["type"] == "lost") {
    $url = "http://eastbayspcapets.shelterbuddy.com/DataFeeds/lostanimals.aspx";
}
$response_xml_data = file_get_contents($url);
$xml = simplexml_load_string($response_xml_data);
$data = array();
foreach($xml->DataFeedAnimal as $animal) {
    $item = array();
    $item['sex'] = (string)$animal->Sex;
    $item['photo'] = (string)$animal->PrimaryPhotoUrl;
    $item['videos'][] = (string)$animal->YouTubeVideoUrls;
    $item['photos'][] = (string)$animal->PrimaryPhotoUrl;
    foreach($animal->AdditionalPhotoUrls->string as $photo) {
        $item['photos'][] = (string)$photo;
    }
    $item['videos'] = array();
    $data[] = $item;
}
echo file_put_contents('../adopt.json', json_encode($data));
echo json_encode($data);
?>
The JSON output works well but I am unable to get 'videos' to write out to the JSON file as the 'photos' do. I just get '/n'!
Since the friend who helped with this is no longer around, I am stuck. I have tried similar code to the foreach statement for photos but am getting nowhere. Any help would be appreciated and the pets would appreciate it as well!
The trick with such implementations is to always look at what you have got by dumping data structures to a log file or the command line, and then to read the documentation for the data you see. That way you know exactly what data you are working with and how to work with it ;-)
Here it turns out that the video URLs you are interested in are placed inside an object of type SimpleXMLElement with public properties, which is not really surprising if you look at the XML structure. The documentation of class SimpleXMLElement shows the method children(), which iterates through all children. Just what we are looking for...
That means a clean implementation to access those sets should go along these lines:
foreach($animal->AdditionalPhotoUrls->children() as $photo) {
    $item['photos'][] = (string)$photo;
}
foreach($animal->YouTubeVideoUrls->children() as $video) {
    $item['videos'][] = (string)$video;
}
Take a look at this full and working example:
<?php
$response_xml_data = <<< EOT
<DataFeedAnimal>
    <AdditionalPhotoUrls>
        <string>doc_73737.jpg</string>
        <string>doc_74483.jpg</string>
        <string>doc_74484.jpg</string>
    </AdditionalPhotoUrls>
    <PrimaryPhotoUrl>19427.jpg</PrimaryPhotoUrl>
    <Sex>Male</Sex>
    <Type>Cat</Type>
    <YouTubeVideoUrls>
        <string>http://www.youtube.com/watch?v=6EMT2s4n6Xc</string>
        <string>http://www.youtube.com/watch?v=hgfg83mKFnd</string>
    </YouTubeVideoUrls>
</DataFeedAnimal>
EOT;
$animal = simplexml_load_string($response_xml_data);
$item = [];
$item['sex'] = (string)$animal->Sex;
$item['photo'] = (string)$animal->PrimaryPhotoUrl;
$item['photos'][] = (string)$animal->PrimaryPhotoUrl;
foreach($animal->AdditionalPhotoUrls->children() as $photo) {
    $item['photos'][] = (string)$photo;
}
$item['videos'] = [];
foreach($animal->YouTubeVideoUrls->children() as $video) {
    $item['videos'][] = (string)$video;
}
echo json_encode($item);
The output of this is:
{
    "sex":"Male",
    "photo":"19427.jpg",
    "photos":["19427.jpg","doc_73737.jpg","doc_74483.jpg","doc_74484.jpg"],
    "videos":["http:\/\/www.youtube.com\/watch?v=6EMT2s4n6Xc","http:\/\/www.youtube.com\/watch?v=hgfg83mKFnd"]
}
I would however like to add a short hint: in my eyes it is questionable to convert such structured information into an associative array. Why not a simple json_encode($animal)? The structure is perfectly fine and should be easy to work with! The output of that would be:
{
    "AdditionalPhotoUrls":{
        "string":[
            "doc_73737.jpg",
            "doc_74483.jpg",
            "doc_74484.jpg"
        ]
    },
    "PrimaryPhotoUrl":"19427.jpg",
    "Sex":"Male",
    "Type":"Cat",
    "YouTubeVideoUrls":{
        "string":[
            "http:\/\/www.youtube.com\/watch?v=6EMT2s4n6Xc",
            "http:\/\/www.youtube.com\/watch?v=hgfg83mKFnd"
        ]
    }
}
That structure describes objects (items with an inner structure, enclosed in JSON by {...}), not just arbitrary arrays (sets without a structure, enclosed in JSON by [...]). Arrays are only used for the two unstructured sets of strings in there: photos and videos. This is much more logical, once you think about it...
Assuming the XML data and the JSON data are intended to have the same structure, I would take a look at this: PHP convert XML to JSON. You may not need for loops at all.
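A loop-free conversion along the lines suggested above could look like the sketch below. It uses the sample element from the question with a single video URL; the `(array)` casts are an addition of this sketch to cover the case where a list contains only one entry, which json_encode() turns into a plain string instead of an array.

```php
<?php
// Sample element taken from the question's structure (one video URL only).
$xmlString = <<<'XML'
<DataFeedAnimal>
  <AdditionalPhotoUrls>
    <string>doc_73737.jpg</string>
    <string>doc_74483.jpg</string>
    <string>doc_74484.jpg</string>
  </AdditionalPhotoUrls>
  <PrimaryPhotoUrl>19427.jpg</PrimaryPhotoUrl>
  <Sex>Male</Sex>
  <Type>Cat</Type>
  <YouTubeVideoUrls>
    <string>http://www.youtube.com/watch?v=6EMT2s4n6Xc</string>
  </YouTubeVideoUrls>
</DataFeedAnimal>
XML;

$animal = simplexml_load_string($xmlString);

// Round-trip through JSON to get nested PHP arrays without writing loops.
$asArray = json_decode(json_encode($animal), true);

// A single <string> child decodes to a string, several decode to an array;
// casting to array gives a list in both cases.
$photos = (array) $asArray['AdditionalPhotoUrls']['string'];
$videos = (array) $asArray['YouTubeVideoUrls']['string'];

echo json_encode(['sex' => $asArray['Sex'], 'photos' => $photos, 'videos' => $videos]), PHP_EOL;
```

The trade-off is that attributes, namespaces and empty elements do not survive this round trip cleanly, so explicit loops remain the safer choice when the feed is less regular than this one.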
Breaking foreach at certain string/ Reading through text file and generating XML
I don't know if this is the right way to go about it, but right now I am dealing with a very large text file of membership details. It is really inconsistent, but typically conforms to this format:
Name
School
Department
Address
Phone
Email
&&^ (indicating the end of the individual record)
What I want to do with this information is read through it and then format it into XML. So right now I have a foreach reading through the long file like this:
<?php
$textline = file("asrlist.txt");
foreach($textline as $showline){
    echo $showline . "<br>";
}
?>
And that's where I don't know how to continue. Can anybody give me some hints on how I could organize these records into XML?
Here is a straightforward solution using simplexml:
$text = file_get_contents('asrlist.txt'); // read the file as one string so it can be split on the record marker
$members = explode('&&^', $text); // building array $members
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><members></members>');
$fieldnames = array('name','school','department','address','phone','email');
// set $fieldsep to the character(s) that separate fields from each other in your text file
$fieldsep = "\n"; // a guess: one field per line
foreach ($members as $member) {
    $m = explode($fieldsep, trim($member)); // build array $m; $m[0] would contain "name" etc.
    if ($m === array('')) continue; // skip the empty chunk after the last marker
    $xmlmember = $xml->addChild('member');
    foreach ($m as $key => $data) {
        // addChild() does not escape "&", so escape the value first
        if (isset($fieldnames[$key])) $xmlmember->addChild($fieldnames[$key], htmlspecialchars(trim($data)));
    }
} // foreach $members
$xml->asXML('mymembers.xml');
For reading and parsing the text file, CSV-related functions could be a good alternative, as mentioned by other users.
To read big files you can use fgetcsv
If &&^ works as a delimiter for records in that file, you could start by replacing it with </member><member>. Prepend <member> to the whole file and append </member> at the end, and you will have something XML-like. How to replace? You might find Unix tools like sed useful:
sed 's|&&\^|</member><member>|g' input.txt > output.xml
You can also accomplish it with PHP, using str_replace():
foreach($textline as $showline){
    echo str_replace('&&^', '</member><member>', $showline) . "<br>";
}
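Putting the pieces from these answers together, a record-by-record conversion might be sketched as follows. The two inline records, the one-field-per-line layout and the field order are assumptions based on the format described in the question; real data that deviates from it would need extra handling.

```php
<?php
// Hypothetical input in the format described in the question:
// one field per line, each record terminated by "&&^".
$text = "Ann Lee\nNorth High\nArts & Science\n1 Main St\n555-0100\nann@example.com\n&&^\n"
      . "Bob Ray\nSouth High\nMath\n2 Side St\n555-0101\nbob@example.com\n&&^\n";

$fieldNames = ['name', 'school', 'department', 'address', 'phone', 'email'];
$xml = new SimpleXMLElement('<members/>');

foreach (explode('&&^', $text) as $record) {
    // Split the record into trimmed, non-empty lines.
    $lines = array_values(array_filter(array_map('trim', explode("\n", $record)), 'strlen'));
    if ($lines === []) {
        continue; // nothing left after the final marker
    }
    $member = $xml->addChild('member');
    foreach ($lines as $i => $value) {
        if (!isset($fieldNames[$i])) {
            break; // ignore surplus lines in inconsistent records
        }
        // Property assignment escapes "&" and "<" in the value.
        $member->{$fieldNames[$i]} = $value;
    }
}

echo $xml->asXML();
```

Assigning through a property instead of passing the value to addChild() is a deliberate choice in this sketch, since it avoids having to escape ampersands by hand.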
How to extract only certain tags from HTML document using PHP?
I'm using a crawler to retrieve the HTML content of certain pages on the web. I currently have the entire HTML stored in a single PHP variable:
$string = "<PRE>".htmlspecialchars($crawler->results)."</PRE>\n";
What I want to do is select all "p" tags (for example) and store their contents in an array. What is the proper way to do that? I've tried the following, using XPath, but it doesn't show anything (most probably because the document itself isn't XML; I just copy-pasted the example given in its documentation).
$xml = new SimpleXMLElement ($string);
$result=$xml->xpath('/p');
while(list( , $node)=each($result)){
    echo '/p: ' , $node, "\n";
}
Hopefully someone with (a lot) more experience in PHP will be able to help me out :D
Try using DOMDocument along with DOMDocument::getElementsByTagName. The workflow should be quite simple. Pass the raw HTML to loadHTML(), not the htmlspecialchars() version, because escaping turns the tags into plain text. Something like:
$doc = new DOMDocument();
$doc->loadHTML($crawler->results);
$pNodes = $doc->getElementsByTagName('p');
This returns a DOMNodeList.
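A self-contained version of that workflow, collecting the text of every p element into an array, could look like this sketch. The inline HTML string stands in for `$crawler->results`, which is specific to the asker's crawler and not available here.

```php
<?php
// Stand-in for the crawled page; in the question this would be $crawler->results.
$html = '<html><body><p>foo</p><div><p>foo <b>1</b></p></div><p>foo 2</p></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$paragraphs = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    // textContent returns the text of the element including nested tags' text.
    $paragraphs[] = trim($p->textContent);
}

print_r($paragraphs);
```

getElementsByTagName() finds the elements at any depth, so paragraphs nested inside other containers are included.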
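I vote for using a regexp. For the p tag (note the non-greedy .*? so that each paragraph is matched separately, and that the captured contents end up in $arr[1]):
preg_match_all('/<p\b[^>]*>(.*?)<\/p>/is', '<p>foo</p><p>foo 1</p><p>foo 2</p>', $arr, PREG_PATTERN_ORDER);
foreach ($arr[1] as $value) {
    echo $value . "<br>";
}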
Check out Simple HTML Dom. It will grab external pages and process them with fairly accurate detail. http://simplehtmldom.sourceforge.net/ It can be used like this:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}
How extract XML data using PHP
I am required to extract the information for a particular "area" from this large collection of XML, but I'm not familiar with extracting XML. I've looked through the site and tried various ways, but all I get back is "Error in my_thread_global_end(): 1 threads didn't exit". Here's the URL of the XML I'm getting my data from: ftp://ftp2.bom.gov.au/anon/gen/fwo/IDV10753.xml I would like to retrieve all 7 forecast-period elements located only in the Swan Hill "area". Help please
I agree that using PHP's simple XML parser is the way to go with this one. You can make your life easy here by using the xpath method to extract data from the XML. There's an XPath tutorial here: http://www.w3schools.com/xpath/ And the PHP documentation for it is here: http://www.php.net/manual/en/simplexmlelement.xpath.php
Try this out:
<?php
/* Get the file with CURL
$curl_handle=curl_init();
curl_setopt($curl_handle,CURLOPT_URL,'ftp://ftp2.bom.gov.au/anon/gen/fwo/IDV10753.xml');
curl_setopt($curl_handle,CURLOPT_RETURNTRANSFER,true);
$xml_data = curl_exec($curl_handle);
curl_close($curl_handle);
*/
/* Open the file locally */
$xml_data = file_get_contents("weather.xml");
$xml = simplexml_load_string($xml_data);
$result = $xml->xpath("//area[@description='Swan Hill']/forecast-period");
date_default_timezone_set('America/New_York');
foreach ($result as $day) {
    //print_r($day);
    $day_of_the_week = date("l", strtotime($day["start-time-local"])); // start-time-local is an attribute of a result, so use the [] syntax
    $forecast = $day->text; // text is a child node, so use the -> syntax
    printf("%s: %s\n", $day_of_the_week, $forecast);
}
?>
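The attribute predicate used above can be checked against a small inline document. This is only a sketch: the sample XML is a made-up, shortened imitation of the forecast file (area elements with a description attribute, forecast-period children with a start-time-local attribute and a text child), not the real feed.

```php
<?php
// Hypothetical, shortened imitation of the forecast file.
$sample = <<<'XML'
<product>
  <forecast>
    <area description="Mildura" type="location">
      <forecast-period start-time-local="2024-01-01T05:00:00+11:00"><text>Sunny.</text></forecast-period>
    </area>
    <area description="Swan Hill" type="location">
      <forecast-period start-time-local="2024-01-01T05:00:00+11:00"><text>Hot.</text></forecast-period>
      <forecast-period start-time-local="2024-01-02T00:00:00+11:00"><text>Cloudy.</text></forecast-period>
    </area>
  </forecast>
</product>
XML;

$xml = simplexml_load_string($sample);

$forecasts = [];
// [@description='...'] filters on an attribute value; "@" marks attributes in XPath.
foreach ($xml->xpath("//area[@description='Swan Hill']/forecast-period") as $period) {
    // Attributes use array syntax, child elements use property syntax.
    $start = new DateTimeImmutable((string) $period['start-time-local']);
    $forecasts[$start->format('Y-m-d')] = (string) $period->text;
}

foreach ($forecasts as $date => $text) {
    echo $date, ': ', $text, PHP_EOL;
}
```

Parsing the timestamp with DateTimeImmutable keeps the offset given in the attribute, so the date does not depend on the server's configured time zone.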