Parsing through XML with namespace

I am trying to parse an XML file that uses several namespaces, but it does not work.
I hope you can help me with this issue.
This is my XML file, which is generated from a URL:
<searchresultresponse xmlns="urn:veloconnect:catalog-1.1" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-1.0" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-1.0" xmlns:vct="urn:veloconnect:transaction-1.0" xmlns:vco="urn:veloconnect:order-1.1" xmlns:vcc="urn:veloconnect:catalog-1.1">
<vct:buyersid>417641</vct:buyersid>
<vct:responsecode>200</vct:responsecode>
<vct:transactionid>dmVsb2Nvbm5lY3Rfc2VhcmNoLkhJUl9TUkNfMTIwMg</vct:transactionid>
<vct:statuscode>2</vct:statuscode>
<startindex>0</startindex>
<count>500</count>
<totalcount>51691</totalcount>
<resultformat>ITEM_TYPE</resultformat>
<cac:item>
<cbc:description>BREMSBELAG GALFER ORGAN. FD171-G1054 (KBA)</cbc:description>
<cac:sellersitemidentification>
<cac:id>04303400</cac:id>
</cac:sellersitemidentification>
<cac:standarditemidentification>
<cac:id identificationschemeid="EAN/UCC-13">8400160001718</cac:id>
</cac:standarditemidentification>
<cac:manufacturersitemidentification>
<cac:id>FD171-G1054</cac:id>
<cac:issuerparty>
<cac:partyname>
<cbc:name>Galfer</cbc:name>
</cac:partyname>
</cac:issuerparty>
</cac:manufacturersitemidentification>
<cac:baseprice>
<cbc:priceamount amountcurrencyid="EUR">5.95</cbc:priceamount>
<cbc:basequantity quantityunitcode="EA">1</cbc:basequantity>
</cac:baseprice>
<cac:recommendedretailprice>
<cbc:priceamount amountcurrencyid="EUR">10.9</cbc:priceamount>
<cbc:basequantity quantityunitcode="EA">1</cbc:basequantity>
</cac:recommendedretailprice>
</cac:item>
I fetch it from the URL with PHP like this:
<?php
error_reporting(E_ALL);
$url = "http://somedomain.com/feed";
set_time_limit(0);
$ch = curl_init($url);// or any url you can pass which gives you the xml file
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$xml = curl_exec($ch);
curl_close($ch);
$namespaces = $xml->getNameSpaces(true);
$cac = $xml->children($namespaces['cac']);
foreach ($xml as $entry){
    $cac = $xml->children($namespaces['cac']);
    echo $cac->item;
}
?>
I always get all of the data printed at once, without line breaks, but I need to fetch specific elements so I can save them in an array later.
The namespaces here are really weird.
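A likely cause of the symptom is that curl_exec() is called without CURLOPT_RETURNTRANSFER, so cURL prints the response straight to the browser (which shows the XML as run-together text) and returns true instead of a string. The string also has to be parsed, for example with simplexml_load_string(), before getNamespaces() or children() can be called. The sketch below assumes the body was fetched with CURLOPT_RETURNTRANSFER set, and uses an abbreviated copy of the sample XML with the root element closed; parseItems() and the array keys are hypothetical names.

```php
<?php
/**
 * Collect each <cac:item> of a Veloconnect search result into a plain array.
 * Namespaced children are reached with children($uri); plain property access
 * such as $xml->item only matches unprefixed elements and finds nothing here.
 */
function parseItems(string $body): array
{
    $xml = simplexml_load_string($body);
    if ($xml === false) {
        throw new RuntimeException('Response is not well-formed XML');
    }
    $ns = $xml->getNamespaces(true); // prefix => URI for every namespace used
    $items = [];
    foreach ($xml->children($ns['cac'])->item as $item) {
        $cac = $item->children($ns['cac']);
        $cbc = $item->children($ns['cbc']);
        // <cbc:priceamount> sits inside <cac:baseprice>, so switch namespace again
        $price = $cac->baseprice->children($ns['cbc'])->priceamount;
        $rrp = $cac->recommendedretailprice->children($ns['cbc'])->priceamount;
        $items[] = [
            'description' => (string) $cbc->description,
            'sellerId'    => (string) $cac->sellersitemidentification->id,
            'ean'         => (string) $cac->standarditemidentification->id,
            'price'       => (float) $price,
            // the currency attribute has no prefix, so read it via attributes()
            'currency'    => (string) $price->attributes()->amountcurrencyid,
            'rrp'         => (float) $rrp,
        ];
    }
    return $items;
}

// Abbreviated sample: one item, root element closed so it is well-formed.
$body = <<<'XML'
<searchresultresponse xmlns="urn:veloconnect:catalog-1.1"
  xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-1.0"
  xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-1.0">
  <totalcount>51691</totalcount>
  <cac:item>
    <cbc:description>BREMSBELAG GALFER ORGAN. FD171-G1054 (KBA)</cbc:description>
    <cac:sellersitemidentification><cac:id>04303400</cac:id></cac:sellersitemidentification>
    <cac:standarditemidentification><cac:id identificationschemeid="EAN/UCC-13">8400160001718</cac:id></cac:standarditemidentification>
    <cac:baseprice><cbc:priceamount amountcurrencyid="EUR">5.95</cbc:priceamount></cac:baseprice>
    <cac:recommendedretailprice><cbc:priceamount amountcurrencyid="EUR">10.9</cbc:priceamount></cac:recommendedretailprice>
  </cac:item>
</searchresultresponse>
XML;

print_r(parseItems($body));
```

Each element of the returned array can then be stored or formatted as needed; further fields follow the same children($uri) pattern.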

Related

Regular expression to extract the content inside the script tag in php

I am trying to extract the download URLs from a web page.
This is the code I tried:
function getbinaryurl ($url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FRESH_CONNECT, true);
    $value1 = curl_exec($curl);
    curl_close($curl);
    $start = preg_quote('<script type="text/x-component">', '/');
    $end = preg_quote('</script>', '/');
    $rx = preg_match("/$start(.*?)$end/", $value1, $matches);
    var_dump($matches);
}
$url = "https://www.sourcetreeapp.com/download-archives";
getbinaryurl($url);
This way I am getting the tag information, not the content inside the script tag. How do I get the content inside?
The expected result is:
https://product-downloads.atlassian.com/software/sourcetree/ga/Sourcetree_4.0.1_234.zip,
https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourceTreeSetup-3.3.6.exe,
https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourcetreeEnterpriseSetup_3.3.6.msi
I am very new to writing regular expressions. Can anyone help me, please?
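The regex in the question already has a capture group: $matches[0] holds the whole match including the tags, while $matches[1] holds only the text between them. A minimal sketch of the regex route, assuming the script body may span several lines (hence the s modifier, so that . also matches newlines) and that there may be more than one such tag (hence preg_match_all()); scriptBodies() and the sample HTML are hypothetical.

```php
<?php
/**
 * Return the inner text of every <script type="text/x-component"> element.
 * Group 1 of each match excludes the opening and closing tags.
 */
function scriptBodies(string $html): array
{
    $start = preg_quote('<script type="text/x-component">', '/');
    $end = preg_quote('</script>', '/');
    preg_match_all("/$start(.*?)$end/s", $html, $matches);
    return $matches[1]; // $matches[0] would still include the tags
}

// Hypothetical page fragment; the JSON body spans several lines.
$html = "<script type=\"text/x-component\">\n"
    . '{"params":{"macURL":"https://product-downloads.atlassian.com/software/sourcetree/ga/Sourcetree_4.0.1_234.zip"}}'
    . "\n</script>";

foreach (scriptBodies($html) as $body) {
    $data = json_decode($body, true); // the body is JSON, so decode it
    echo $data['params']['macURL'] ?? '(no macURL)', PHP_EOL;
}
```

Regex works for a fixed, predictable tag like this one, but it breaks easily if attributes are reordered or added, which is why a DOM-based approach is usually more robust.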

Instead of using a regex, using DOMDocument and XPath gives you more control over the elements you select.
Although XPath can be difficult (as can regex), it may look more intuitive to some. The code uses //script[@type="text/x-component"][contains(text(), "macURL")], which breaks down as:
//script = any script node
[@type="text/x-component"] = which has an attribute called type with the specific value
[contains(text(), "macURL")] = whose text contains the string macURL
The query() method returns a list of matches, so loop over them. The content is JSON, so decode it and output the values:
function getbinaryurl ($url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FRESH_CONNECT, true);
    $value1 = curl_exec($curl);
    curl_close($curl);
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    $doc->loadHTML($value1);
    libxml_use_internal_errors(false);
    $xp = new DOMXPath($doc);
    $srcs = $xp->query('//script[@type="text/x-component"][contains(text(), "macURL")]');
    foreach ( $srcs as $src ) {
        $content = json_decode( $src->textContent, true);
        echo $content['params']['macURL'] . PHP_EOL;
        echo $content['params']['windowsURL'] . PHP_EOL;
        echo $content['params']['enterpriseURL'] . PHP_EOL;
    }
}
$url = "https://www.sourcetreeapp.com/download-archives";
getbinaryurl($url);
which outputs:
https://product-downloads.atlassian.com/software/sourcetree/ga/Sourcetree_4.0.1_234.zip
https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourceTreeSetup-3.3.8.exe
https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourcetreeEnterpriseSetup_3.3.8.msi
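To try the XPath query without fetching the live page, the same logic can run against an HTML string. A sketch, assuming the JSON layout the answer relies on (a params object with macURL, windowsURL and enterpriseURL keys); extractDownloadUrls() and the sample fragment are hypothetical, and the URLs are copied from the output above.

```php
<?php
/**
 * Return the download URLs stored as JSON inside <script type="text/x-component">,
 * using the XPath query from the answer.
 */
function extractDownloadUrls(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors(false);

    $xp = new DOMXPath($doc);
    $nodes = $xp->query('//script[@type="text/x-component"][contains(text(), "macURL")]');
    $urls = [];
    foreach ($nodes as $node) {
        $content = json_decode($node->textContent, true);
        foreach (['macURL', 'windowsURL', 'enterpriseURL'] as $key) {
            if (isset($content['params'][$key])) { // skip keys the page does not provide
                $urls[] = $content['params'][$key];
            }
        }
    }
    return $urls;
}

// Hypothetical fragment: the second script lacks "macURL" and is filtered out by XPath.
$html = '<html><body>'
    . '<script type="text/x-component">{"params":{'
    . '"macURL":"https://product-downloads.atlassian.com/software/sourcetree/ga/Sourcetree_4.0.1_234.zip",'
    . '"windowsURL":"https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourceTreeSetup-3.3.8.exe",'
    . '"enterpriseURL":"https://product-downloads.atlassian.com/software/sourcetree/windows/ga/SourcetreeEnterpriseSetup_3.3.8.msi"}}</script>'
    . '<script type="text/x-component">{"params":{"other":1}}</script>'
    . '</body></html>';

print_r(extractDownloadUrls($html));
```

Returning an array instead of echoing makes it easy to store the URLs or pick one per platform.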

PHP curl Inside Foreach

EDIT: What is really happening is that a new XML file is created each time, but the new $html information is added to the previous information, so by the time the loop reaches the last element in the list, the file contains parsed information from all previous requests. I can't figure out what is wrong.
I'm having trouble with a curl request not executing as expected. In the code below, a foreach loop goes through a list ($textarray) and passes each list element to a curl request; the element is also used as the file name for an XML file. The curl request returns $html, which is then parsed and saved to an XML file. The script runs, the list is passed, and the URL is built and passed to the curl function. I get an echo showing the correct URL, a response comes back, and each response is parsed and saved to the appropriate file. The problem seems to be that curl is not actually fetching the new $url: I get exactly the same information saved in every XML file. I know this is not correct, but I'm not sure why it is happening. Any help is appreciated.
Function FeedXml($textarray){
    $doc=new DOMDocument('1.0', 'UTF-8');
    $feed=$doc->createElement("feed");
    Foreach ($textarray as $text){
        $url="http://xxx/xxx/".$text;
        echo "PATH TO CURL".$url."<br>";
        $html=curlurl($url);
        $xmlsave="http://xxxx/xxx/".$text;
        $dom = new DOMDocument(); //NEW dom FOR EACH SHOW
        libxml_use_internal_errors(true);
        $dom->loadHTML($html);
        $xpath = new DOMXPath($dom);
        $dom->formatOutput = true;
        $dom->preserveWhiteSpace = true;
        //PARSE EACH RETURN INFORMATION
        $images= $dom->getElementsByTagName('img');
        foreach($images as $img){
            $icon= $img ->getAttribute('src');
            if( preg_match('/\.(jpg|jpeg|gif)(?:[\?\#].*)?$/i', $icon) ) {
                // ITEM TAG
                $item= $doc->createElement("item");
                $sdAttribute = $doc->createAttribute("sdImage");
                $sdAttribute->value = $icon;
                $item->appendChild($sdAttribute);
            } // IMAGAGE FOR EACH
            $feed->appendChild($item);
            $doc->appendChild($feed);
            $doc->save($xmlsave);
        }
    }
}
Function curlurl($url){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch,CURLOPT_FRESH_CONNECT, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);//0-FALSE 1 TRUE
    curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, FALSE);
    curl_setopt($ch,CURLOPT_SSL_VERIFYPEER ,FALSE);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT,'10');
    $html = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    echo $httpcode;
    return $html;
}

Thanks for pointing out my shortcomings above. I have figured out the problem: the following two lines needed to be moved into the foreach loop.
$doc=new DOMDocument('1.0', 'UTF-8');
$feed=$doc->createElement("feed");
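A sketch of the fix with the network and HTML parsing stripped out so it runs on its own: buildFeeds() and its input array (feed name mapped to image URLs) are hypothetical stand-ins for the curl and getElementsByTagName('img') steps. Besides creating the document inside the loop, it appends an item only when one was actually created and serializes once per feed rather than once per image; those two changes go beyond the answer.

```php
<?php
/**
 * Build one XML feed per entry, each from a fresh DOMDocument, so items from
 * earlier iterations cannot leak into later files.
 */
function buildFeeds(array $imagesByText): array
{
    $feeds = [];
    foreach ($imagesByText as $text => $images) {
        // Created inside the loop: this is the fix described in the answer.
        $doc = new DOMDocument('1.0', 'UTF-8');
        $doc->formatOutput = true;
        $feed = $doc->createElement('feed');
        $doc->appendChild($feed);
        foreach ($images as $icon) {
            if (preg_match('/\.(jpg|jpeg|gif)(?:[\?\#].*)?$/i', $icon)) {
                $item = $doc->createElement('item');
                $item->setAttribute('sdImage', $icon);
                $feed->appendChild($item); // append only items that were created
            }
        }
        // The real script would call $doc->save($path) here, once per feed.
        $feeds[$text] = $doc->saveXML();
    }
    return $feeds;
}

$feeds = buildFeeds([
    'show-a' => ['a1.jpg', 'logo.png'],
    'show-b' => ['b1.gif'],
]);
echo $feeds['show-b'];
```

Because each iteration starts from an empty document, the output for show-b contains only its own image.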

Parsing XML with PHP?

This has been driving me insane for about the last hour. I'm trying to parse a bit of XML from Last.fm's API, and I've tried about 35 different permutations of the code below, all of which have failed. I'm really bad at XML parsing. Can anyone help me get the first toptags > tag > name value from this XML API in PHP?
http://ws.audioscrobbler.com/2.0/?method=track.getinfo&api_key=b25b959554ed76058ac220b7b2e0a026&artist=Owl+city&track=fireflies
In this case, that would be 'electronic'.
Right now, all I have is this:
<?
$xmlstr = file_get_contents("http://ws.audioscrobbler.com/2.0/?method=track.getinfo&api_key=b25b959554ed76058ac220b7b2e0a026&artist=Owl+city&track=fireflies");
$genre = new SimpleXMLElement($xmlstr);
echo $genre->lfm->track->toptags->tag->name;
?>
It returns a blank result, with no errors either, which is what's incredibly annoying!
Thank you very much!
Any help is greatly, and by greatly I mean really, really greatly, appreciated!

The <tag> element appears several times, so SimpleXML gives you a list of them; loop through them with foreach or a similar construct. In your case, just grabbing the first would look like this:
<?php
$xmlstr = file_get_contents("http://ws.audioscrobbler.com/2.0/?method=track.getinfo&api_key=b25b959554ed76058ac220b7b2e0a026&artist=Owl+city&track=fireflies");
$genre = new SimpleXMLElement($xmlstr);
echo $genre->track->toptags->tag[0]->name;
Also note that the <lfm> element is not needed in the path.
UPDATE
I find it's much easier to grab exactly what I'm looking for in a SimpleXMLElement by using print_r(). It'll show you what's an array, what's a simple string, what's another SimpleXMLElement, etc.
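A self-contained sketch of these points, run against a trimmed, hypothetical stand-in for the track.getinfo response (only 'electronic' comes from the question; the second tag is invented for illustration):

```php
<?php
$xmlstr = <<<'XML'
<lfm status="ok">
  <track>
    <name>Fireflies</name>
    <toptags>
      <tag><name>electronic</name></tag>
      <tag><name>pop</name></tag>
    </toptags>
  </track>
</lfm>
XML;

$genre = new SimpleXMLElement($xmlstr); // $genre is the <lfm> root itself

// First <tag> only: index 0 of the repeated element.
$first = (string) $genre->track->toptags->tag[0]->name;
// Without an index, SimpleXML also returns the first matching element.
$alsoFirst = (string) $genre->track->toptags->tag->name;
echo $first, PHP_EOL;

// All tags: iterate over the repeated element.
$all = [];
foreach ($genre->track->toptags->tag as $tag) {
    $all[] = (string) $tag->name;
}
echo implode(', ', $all), PHP_EOL;

// The question's path starts with ->lfm, but the root has no <lfm> child.
var_dump(isset($genre->lfm)); // bool(false)

// print_r($genre) shows the whole structure while exploring.
```

Casting to string keeps plain text in the variables instead of SimpleXMLElement objects.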

Try using
$url = "http://ws.audioscrobbler.com/2.0/?method=track.getinfo&api_key=b25b959554ed76058ac220b7b2e0a026&artist=Owl+city&track=fireflies";
$xml = simplexml_load_file($url);
echo $xml->track->toptags->tag[0]->name;

Suggestion: insert a statement to echo $xmlstr, and make sure you are getting something back from the API.

You don't need to reference lfm; $genre already is the <lfm> element. Try this:
echo $genre->track->toptags->tag->name;

If you want to read XML data, follow these steps:
$xmlURL = "your xml url / file name goes here";
try {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $xmlURL);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Content-type: text/xml'
    ));
    $content = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    $obj = new SimpleXMLElement($content);
    echo "<pre>";
    var_dump($obj);
    echo "</pre>";
}
catch(Exception $e){
    var_dump($e);exit;
}
You will get a dump of the whole XML file's structure.
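The dump shows a SimpleXMLElement object tree rather than a PHP array. If a real array is needed, a common shortcut is a JSON round trip; a sketch, with xmlToArray() as a hypothetical helper name. Expect some information loss: attributes appear under an "@attributes" key and elements in other namespaces are dropped.

```php
<?php
/**
 * Convert an XML string into nested PHP arrays via json_encode()/json_decode().
 * Repeated sibling elements become lists; leaf elements become strings.
 */
function xmlToArray(string $xml): array
{
    $obj = new SimpleXMLElement($xml); // throws Exception on malformed XML
    return json_decode(json_encode($obj), true);
}

$arr = xmlToArray('<lfm><track><name>Fireflies</name><listeners>2</listeners></track></lfm>');
print_r($arr);
```

For anything beyond quick inspection, walking the SimpleXMLElement directly gives more control over which data is kept.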
Thanks.

Styling Feed Content returned in XML Format

I have this code that pulls information from my database and then checks it against an API. I'm trying to format the returned information as a table. I've included a picture showing what I would like. I've searched everywhere and I don't know how to code this. Don't worry about the CSS; I just need to get the information into a table.
<?php
$key = "********************";
$address = urlencode($CustomFields->field('jr_address',$listing,false,false));
$city = $listing['Category']['title'];
$zip = $CustomFields->field('jr_zipcode',$listing,false,false);
$url = "http://api.greatschools.org/schools/nearby?key={$key}&address={$address}&city={$city}&state=MI&zip={$zip}&schoolType=public-charter&levelCode=elementary-schools&minimumSchools=50&radius=10&limit=5";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($curl);
curl_close($curl);
print_r($response);
?><style type='text/css'>gsId{display:none;}name{color:#9BB055; font-size:18px;}type{display:none;}gradeRange{}enrollment{}gsRating{}city{display:none;}state{display:none;}district{}districtNCESId{display:none;}address{display:none;}phone{display:none; }fax{display:none;}ncesId{display:none;}lat{display:none;}lon{display:none;}</style>
The returned XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schools>
<school>
<gsId>6350</gsId>
<name>Chinese Education Center</name>
<type>public</type>
<gradeRange>K-5</gradeRange>
<enrollment>63</enrollment>
<gsRating>3</gsRating>
<city>San Francisco</city>
<state>CA</state>
<district>San Francisco Unified School District</district>
<districtNCESId>0634410</districtNCESId>
<address>657 Merchant St., San Francisco, CA 94111</address>
<phone>(415) 291-7918</phone>
<fax>(415) 291-7965</fax>
<ncesId>063441005596</ncesId>
<lat>37.795</lat>
<lon>-122.4042</lon>
<overviewLink>http://www.greatschools.org/california/san-francisco/6350-Chinese-Education-Center/?s_cid=gsapi</overviewLink>
<ratingsLink>http://www.greatschools.org/school/rating.page?state=CA&amp;id=6350&amp;s_cid=gsapi</ratingsLink>
<reviewsLink>http://www.greatschools.org/school/parentReviews.page?state=CA&amp;id=6350&amp;s_cid=gsapi</reviewsLink>
</school>
<school>
<gsId>6389</gsId>
<name>Gordon J. Lau Elementary School</name>
<type>public</type>
<gradeRange>K-5</gradeRange>
<enrollment>667</enrollment>
<gsRating>7</gsRating>
<city>San Francisco</city>
<state>CA</state>
<district>San Francisco Unified School District</district>
<districtNCESId>0634410</districtNCESId>
<address>950 Clay St., San Francisco, CA 94108</address>
<phone>(415) 291-7921</phone>
<fax>(415) 291-7952</fax>
<website>http://www.gjles.org/</website>
<ncesId>063441005599</ncesId>
<lat>37.794</lat>
<lon>-122.4086</lon>
<overviewLink>http://www.greatschools.org/california/san-francisco/6389-Gordon-J.-Lau-Elementary-School/?s_cid=gsapi</overviewLink>
<ratingsLink>http://www.greatschools.org/school/rating.page?state=CA&amp;id=6389&amp;s_cid=gsapi</ratingsLink>
<reviewsLink>http://www.greatschools.org/school/parentReviews.page?state=CA&amp;id=6389&amp;s_cid=gsapi</reviewsLink>
</school>
</schools>
I don't know how to write the script to make it look like the table in the image I linked.
So the new file looks like this?
<?php
$key = "xxxxxxxxxxxxxxxxxxxxxxxx";
$address = urlencode($CustomFields->field('jr_address',$listing,false,false));
$city = $listing['Category']['title'];
$zip = $CustomFields->field('jr_zipcode',$listing,false,false);
$url = "http://api.greatschools.org/schools/nearby?key={$key}&address={$address}&city={$city}&state=MI&zip={$zip}&schoolType=public-charter&levelCode=elementary-schools&minimumSchools=50&radius=10&limit=5";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($curl);
curl_close($curl);
echo (xml2html($response, "/xslt/schools.xsl");
?>
I get an error on the echo line.
Where does this go?
<?php
function xml2html($xmldata, $xslPath)
{
    /* $xmldata -> your XML */
    /* $xsl -> XSLT file */
    $arguments = array('/_xml' => $xmldata);
    $xsltproc = xslt_create();
    xslt_set_encoding($xsltproc, 'ISO-8859-1');
    $html =
        xslt_process($xsltproc, 'arg:/_xml', $xslPath, NULL, $arguments);
    if (empty($html)) {
        die('XSLT processing error: '. xslt_error($xsltproc));
    }
    xslt_free($xsltproc);
    return $html;
}
?>

Browsers can't display custom XML (such as your schools XML) as a formatted page.
So you need to transform it into HTML yourself. I know of two solutions:
Parse the XML with the XML parser functions.
Transform the XML with an XSLT stylesheet (in my opinion, the better approach).
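The first option, walking the XML directly in PHP, could look roughly like this. A sketch using SimpleXML rather than the event-based xml parser functions; schoolsToTable() is a hypothetical helper, and the columns are an assumption based on the elements the question's CSS leaves visible (name, gradeRange, enrollment, gsRating, district).

```php
<?php
/**
 * Render a GreatSchools <schools> response as an HTML table.
 */
function schoolsToTable(string $xml): string
{
    $schools = simplexml_load_string($xml);
    if ($schools === false) {
        return "<p>Could not parse the school list.</p>\n";
    }
    // element name => column header
    $cols = ['name' => 'School', 'gradeRange' => 'Grades', 'enrollment' => 'Enrollment',
             'gsRating' => 'GS Rating', 'district' => 'District'];
    $html = "<table>\n<tr>";
    foreach ($cols as $label) {
        $html .= '<th>' . htmlspecialchars($label) . '</th>';
    }
    $html .= "</tr>\n";
    foreach ($schools->school as $school) {
        $html .= '<tr>';
        foreach (array_keys($cols) as $field) {
            // escape text so characters such as & or < cannot break the markup
            $html .= '<td>' . htmlspecialchars((string) $school->$field) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}

// Abbreviated sample of the response above.
$response = <<<'XML'
<schools>
  <school><name>Chinese Education Center</name><gradeRange>K-5</gradeRange><enrollment>63</enrollment>
    <gsRating>3</gsRating><district>San Francisco Unified School District</district></school>
  <school><name>Gordon J. Lau Elementary School</name><gradeRange>K-5</gradeRange><enrollment>667</enrollment>
    <gsRating>7</gsRating><district>San Francisco Unified School District</district></school>
</schools>
XML;

echo schoolsToTable($response);
```

In the page, the sample string would be replaced by the $response returned from curl_exec(), and the CSS can then target the table cells instead of the raw XML tags.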
For the XML you posted, I've written an XSLT stylesheet, hosted at an external link (sorry for the external link; I had unexplainable trouble pasting XSLT code here).
The transformation can be applied with the xslt_process function. Store the XSL file somewhere in your server's folders (for example "xslt/schools.xsl") and convert the XML response:
<?php
function xml2html($xmldata, $xslPath)
{
    /* $xmldata -> your XML */
    /* $xslPath -> path to the XSLT file */
    $arguments = array('/_xml' => $xmldata);
    $xsltproc = xslt_create();
    xslt_set_encoding($xsltproc, 'ISO-8859-1');
    $html =
        xslt_process($xsltproc, 'arg:/_xml', $xslPath, NULL, $arguments);
    if (empty($html)) {
        die('XSLT processing error: '. xslt_error($xsltproc));
    }
    xslt_free($xsltproc);
    return $html;
}
// making request...
$response = curl_exec($curl);
curl_close($curl);
echo(xml2html($response, "/xslt/schools.xsl"));
?>
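Note that xslt_create(), xslt_process() and the other xslt_* functions belong to the old Sablotron-based XSLT extension, which is no longer bundled with PHP as of PHP 5. On current PHP, the XSL extension's XSLTProcessor class does the same job. A sketch under that assumption (the xsl extension must be installed), with a tiny hypothetical stylesheet inline instead of the externally hosted one:

```php
<?php
/**
 * PHP 5+ counterpart of xml2html() built on XSLTProcessor (ext/xsl).
 * $xslSource is the stylesheet text; use $xsl->load($path) for a file instead.
 */
function xml2html(string $xmldata, string $xslSource): string
{
    $xml = new DOMDocument();
    if (!$xml->loadXML($xmldata)) {
        throw new RuntimeException('Response is not well-formed XML');
    }
    $xsl = new DOMDocument();
    $xsl->loadXML($xslSource);
    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    $html = $proc->transformToXml($xml);
    if ($html === false || $html === null) {
        throw new RuntimeException('XSLT processing error');
    }
    return $html;
}

// Hypothetical minimal stylesheet: one table row per <school>.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/schools">
    <table>
      <xsl:for-each select="school">
        <tr><td><xsl:value-of select="name"/></td><td><xsl:value-of select="gradeRange"/></td></tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
XSL;

$response = '<schools><school><name>Chinese Education Center</name><gradeRange>K-5</gradeRange></school></schools>';
if (class_exists('XSLTProcessor')) { // only run where ext/xsl is available
    echo xml2html($response, $xslSource);
}
```

The stylesheet stays a separate, reusable file in practice; only the loading call changes.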

Parsing XML data with Namespaces in PHP

I'm trying to work with this XML feed that uses namespaces, and I'm not able to get past the colon in the tags. Here's what the XML feed looks like:
<r25:events pubdate="2010-05-19T13:58:08-04:00">
<r25:event xl:href="event.xml?event_id=328" id="BRJDMzI4" crc="00000022" status="est">
<r25:event_id>328</r25:event_id>
<r25:event_name>Testing 09/2005-08/2006</r25:event_name>
<r25:alien_uid/>
<r25:event_priority>0</r25:event_priority>
<r25:event_type_id xl:href="evtype.xml?type_id=105">105</r25:event_type_id>
<r25:event_type_name>CABINET</r25:event_type_name>
<r25:node_type>C</r25:node_type>
<r25:node_type_name>cabinet</r25:node_type_name>
<r25:state>1</r25:state>
<r25:state_name>Tentative</r25:state_name>
<r25:event_locator>2005-AAAAMQ</r25:event_locator>
<r25:event_title/>
<r25:favorite>F</r25:favorite>
<r25:organization_id/>
<r25:organization_name/>
<r25:parent_id/>
<r25:cabinet_id xl:href="event.xml?event_id=328">328</r25:cabinet_id>
<r25:cabinet_name>cabinet 09/2005-08/2006</r25:cabinet_name>
<r25:start_date>2005-09-01</r25:start_date>
<r25:end_date>2006-08-31</r25:end_date>
<r25:registration_url/>
<r25:last_mod_dt>2008-02-27T14:22:43-05:00</r25:last_mod_dt>
<r25:last_mod_user>abc00296004</r25:last_mod_user>
</r25:event>
</r25:events>
And here is the code I'm using. I'm trying to put these into a set of arrays so I can format the output however I want:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://somedomain.com/blah.xml");
curl_setopt ($ch, CURLOPT_HTTPHEADER, Array("Content-Type: text/xml"));
curl_setopt($ch, CURLOPT_USERPWD, "username:password");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
$xml = new SimpleXmlElement($output);
foreach ($xml->events->event as $entry){
    $dc = $entry->children('http://www.collegenet.com/r25');
    echo $entry->event_name . "<br />";
    echo $entry->event_id . "<br /><br />";
}

I figured out that the issue was with the XML feed rather than the code.
The feed's root element was missing its namespace declarations; it should have been:
<r25:events xmlns:r25="http://www.collegenet.com/r25" xmlns:xl="http://www.w3.org/1999/xlink" pubdate="2010-05-19T13:58:08-04:00">
Thanks for the help, though.
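With the declarations in place, namespaced elements can be read with children() and the namespace URI. The root <r25:events> element is what new SimpleXmlElement() returns, so the loop starts at the root's children rather than at $xml->events. A sketch, with r25Events() and the R25 constant as hypothetical names and an abbreviated sample feed:

```php
<?php
const R25 = 'http://www.collegenet.com/r25';

/** Collect id/name pairs from an R25 feed that declares xmlns:r25. */
function r25Events(string $output): array
{
    $xml = new SimpleXMLElement($output); // $xml already is <r25:events>
    $events = [];
    foreach ($xml->children(R25)->event as $event) {
        $r25 = $event->children(R25); // switch to the r25 namespace for the fields
        $events[] = ['id' => (string) $r25->event_id, 'name' => (string) $r25->event_name];
    }
    return $events;
}

$output = <<<'XML'
<r25:events xmlns:r25="http://www.collegenet.com/r25" xmlns:xl="http://www.w3.org/1999/xlink" pubdate="2010-05-19T13:58:08-04:00">
  <r25:event xl:href="event.xml?event_id=328" id="BRJDMzI4" crc="00000022" status="est">
    <r25:event_id>328</r25:event_id>
    <r25:event_name>Testing 09/2005-08/2006</r25:event_name>
  </r25:event>
</r25:events>
XML;

foreach (r25Events($output) as $e) {
    echo $e['name'], '<br />', $e['id'], '<br /><br />', PHP_EOL;
}
```

Collecting plain arrays first, as in the question's goal, leaves the output formatting free.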

"All kinds of errors" isn't a helpful description; what errors are you actually getting?
You should give the object a namespace option like this:
$xml = new SimpleXMLElement($output, 0, false, 'r25', true);
The last argument tells SimpleXML that 'r25' is a prefix rather than a namespace URI. See the manual.
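A runnable sketch of the constructor approach, using the manual's parameter order (data, options, dataIsURL, namespaceOrPrefix, isPrefix). It assumes the feed declares xmlns:r25, and the sample is abbreviated from the question:

```php
<?php
$output = <<<'XML'
<r25:events xmlns:r25="http://www.collegenet.com/r25" pubdate="2010-05-19T13:58:08-04:00">
  <r25:event id="BRJDMzI4">
    <r25:event_id>328</r25:event_id>
    <r25:event_name>Testing 09/2005-08/2006</r25:event_name>
  </r25:event>
</r25:events>
XML;

// isPrefix = true: "r25" is a prefix, and property access stays in that namespace.
$xml = new SimpleXMLElement($output, 0, false, 'r25', true);
$rows = [];
foreach ($xml->event as $event) {
    $rows[] = (string) $event->event_id . ': ' . (string) $event->event_name;
}
echo implode(PHP_EOL, $rows), PHP_EOL;
```

Compared with calling children() at every level, this sets the namespace once for the whole tree.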

Alternatively, since r25 is the only namespace used and therefore isn't especially helpful, I just run
$output = preg_replace('/r25:/', '', $output);
on the raw XML string before parsing it. That strips out the namespace prefix, and you can then navigate much more easily with SimpleXML, just like in your example.
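A sketch of the prefix-stripping approach, applied to the raw XML string before it is parsed. Because the regex is a plain text replacement, it would also alter any "r25:" appearing in text content or attribute values; the sample is abbreviated from the question.

```php
<?php
$output = <<<'XML'
<r25:events xmlns:r25="http://www.collegenet.com/r25">
  <r25:event>
    <r25:event_id>328</r25:event_id>
    <r25:event_name>Testing 09/2005-08/2006</r25:event_name>
  </r25:event>
</r25:events>
XML;

// Remove the prefix from the string; the unused xmlns:r25 declaration is harmless.
$plain = preg_replace('/r25:/', '', $output);
$xml = simplexml_load_string($plain);

$names = [];
foreach ($xml->event as $event) { // unprefixed now, so plain property access works
    $names[] = (string) $event->event_name;
}
print_r($names);
```

This is a shortcut; if the feed ever mixes several namespaces, the children() approach is safer.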
