Using XPath on a PHP SimpleXML object, SOAP + namespaces (not working)

After researching this on SO and Google for hours, I hope to get some help here
(I am one step away from running a regex to remove the namespaces completely).
First, this is the XML:
<?xml version="1.0" encoding="utf-16"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Header xmlns="http://webservices.site.com/definitions">
<SessionId>0119A|1</SessionId>
</soap:Header>
<soap:Body>
<Security_AuthenticateReply xmlns="http://xml.site.com/QQ">
<processStatus>
<statusCode>P</statusCode>
</processStatus>
</Security_AuthenticateReply>
</soap:Body>
</soap:Envelope>
Now this is what my code in PHP looks like:
$response = simplexml_load_string( $str ,NULL,
false, "http://schemas.xmlsoap.org/soap/envelope/" );
// just making sure the namespace is "registered"
// but I tested all examples also with this removed
$response->registerXPathNamespace("soap",
"http://schemas.xmlsoap.org/soap/envelope/");
$_res = $response->xpath('//soap:Header');
print_r($_res);
/*** result: simple query for the root "soap" namespace, this looks good! (so far..)
Array
(
[0] => SimpleXMLElement Object
(
[SessionId] => 0119A|1
)
)
***/
// now we query for the "SessionId" element in the XML
$_res = $response->xpath('//soap:Header/SessionId');
print_r($_res);
/*** result: this does not return anything!
Array
(
)
***/
// another approach
$_res = $response->xpath('//soap:Header/SessionId/text()');
print_r($_res);
/*** result: this does not return anything at all!
***/
// Finally, without using XPath this does work
$_res = $response->xpath('//soap:Header');
$_res = (string)$_res[0]->SessionId;
echo $_res;
/*** result: this worked
0119A|1
***/
How can I get XPath queries working on this SOAP message?
Thanks,
Roman

The multiple namespaces are messing with it; adding the following works for me:
$response->registerXPathNamespace("site", "http://webservices.site.com/definitions");
$_res = $response->xpath('//site:SessionId');
Also, see this previous Stack Overflow question.

You need to register the default namespace used by the <SessionId> element as well. Because <SessionId> is in the default namespace, it does not have any prefix, but in order for your XPath to work you also need to bind this namespace to some prefix and then use that prefix in your XPath expression.
$response->registerXPathNamespace("ns",
"http://webservices.site.com/definitions");
$_res = $response->xpath('//soap:Header/ns:SessionId');
In XPath 1.0, a name test without a namespace prefix only ever matches elements in no namespace.
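A self-contained sketch of the fix from the two answers above, run against the XML from the question. It assumes the string really contains UTF-8, so the declaration is changed from utf-16 to utf-8 (a UTF-8 string labelled utf-16 can be rejected by libxml). The prefix ns follows the answer above and qq is an arbitrary name chosen here; only the URIs have to match the document. The body element needs the same treatment, because Security_AuthenticateReply declares its own default namespace.

```php
<?php
// The question's SOAP response, with the declaration changed to utf-8
// (assumption: the string really is UTF-8).
$str = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Header xmlns="http://webservices.site.com/definitions">
<SessionId>0119A|1</SessionId>
</soap:Header>
<soap:Body>
<Security_AuthenticateReply xmlns="http://xml.site.com/QQ">
<processStatus>
<statusCode>P</statusCode>
</processStatus>
</Security_AuthenticateReply>
</soap:Body>
</soap:Envelope>
XML;

$response = simplexml_load_string($str);

// Bind a prefix to every namespace the path passes through, including the
// two default (unprefixed) namespaces. The prefix names are arbitrary.
$response->registerXPathNamespace('soap', 'http://schemas.xmlsoap.org/soap/envelope/');
$response->registerXPathNamespace('ns', 'http://webservices.site.com/definitions');
$response->registerXPathNamespace('qq', 'http://xml.site.com/QQ');

$sessionId = (string) $response->xpath('//soap:Header/ns:SessionId')[0];
$statusCode = (string) $response->xpath(
    '//soap:Body/qq:Security_AuthenticateReply/qq:processStatus/qq:statusCode'
)[0];

echo $sessionId, "\n";  // 0119A|1
echo $statusCode, "\n"; // P
```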

Related

Getting Exception: String could not be parsed as XML when calling SimpleXMLElement

I am trying to parse this XML (http://numismatics.org/search/apis/getNuds?identifiers=1995.11.282) and pull out the elements under digRep, but when I call SimpleXMLElement I get "Exception: String could not be parsed as XML". The API is from http://numismatics.org/search/apis.
Here is what the XML basically looks like from the link above:
<nuds xmlns="http://nomisma.org/nuds" xmlns:mets="http://www.loc.gov/METS/" xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://nomisma.org/nuds http://nomisma.org/nuds.xsd" recordType="physical">
...deleted to shorten question
<digRep>
<mets:fileSec>
<mets:fileGrp USE="obverse">
<mets:file USE="iiif">
<mets:FLocat LOCYPE="URL" xlink:href="http://images.numismatics.org/collectionimages%2F19501999%2F1995%2F1995.11.282.obv.noscale.jpg"/>
</mets:file>
...deleted to shorten question
</mets:fileGrp>
</mets:fileSec>
</digRep>
</nuds>
Here is the test code I'm using, based on this example:
$detail = simplexml_load_file("http://numismatics.org/search/apis/getNuds?identifiers=1995.11.282");
$digRep = new \SimpleXMLElement($detail->nuds->digRep); //the slash before the SimpleXMLElement is required for Laravel
$digRep->registerXPathNamespace('c', 'http://www.loc.gov/METS');
$result = $digRep->xpath('//c:mets:fileSec');
foreach ($result as $title) {
echo $title . "\n";
}
I am wondering if this has to do with "mets:fileSec" but I am not quite sure what "fileSec" is in this context.
mets is just the alias (prefix) for the namespace http://www.loc.gov/METS/ (see the xmlns:mets definition). This is perfectly fine XML, and it parses in the browser if called directly. I suggest checking whether PHP (for example with file_get_contents()) really receives the XML and not some error message. Services sometimes return different results for different user agents.
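Assuming PHP does receive the XML, the query in the question also needs two changes: the URI passed to registerXPathNamespace() must be exactly the one in the document, including the trailing slash (http://www.loc.gov/METS/), and each name in the XPath takes a single prefix (mets:fileSec, not c:mets:fileSec). The sketch below uses a trimmed inline copy of the question's XML instead of the live URL, and queries the loaded document directly: $detail already is the nuds element, and new SimpleXMLElement() expects an XML string, not a child element.

```php
<?php
// Trimmed copy of the getNuds response from the question (the parts the
// question deleted are omitted); real code would fetch it from the API URL.
$xmlString = <<<'XML'
<nuds xmlns="http://nomisma.org/nuds" xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" recordType="physical">
<digRep>
<mets:fileSec>
<mets:fileGrp USE="obverse">
<mets:file USE="iiif">
<mets:FLocat LOCYPE="URL" xlink:href="http://images.numismatics.org/collectionimages%2F19501999%2F1995%2F1995.11.282.obv.noscale.jpg"/>
</mets:file>
</mets:fileGrp>
</mets:fileSec>
</digRep>
</nuds>
XML;

$detail = simplexml_load_string($xmlString);
if ($detail === false) {
    throw new RuntimeException('Response was not well-formed XML');
}

// Use the exact URIs from the document (note the trailing slash on METS).
$detail->registerXPathNamespace('mets', 'http://www.loc.gov/METS/');
$detail->registerXPathNamespace('xlink', 'http://www.w3.org/1999/xlink');

// One prefix per name; attribute results are cast to their string values.
$hrefs = [];
foreach ($detail->xpath('//mets:fileGrp/mets:file/mets:FLocat/@xlink:href') as $href) {
    $hrefs[] = (string) $href;
}
print_r($hrefs);
```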

Get Attribute value from Soap Response PHP

I am getting a SOAP response as expected and then converting it to an array. Here is my code:
$response = $client->__getLastResponse();
$response = preg_replace("/(<\/?)(\w+):([^>]*>)/", "$1$2$3", $response);
$xml = new SimpleXMLElement($response);
$body = $xml->xpath('//soapBody')[0];
$array = json_decode( str_replace('@', '', json_encode((array)$body)), TRUE);
print_r($array);
here is the output:
Array (
[GetCompanyCodeResponse] => Array (
[GetCompanyCodeResult] => Array (
[Customers] => Array (
[Customer] => Array (
[attributes] => Array (
[CustomerNo] => 103987
[CustomerName] => epds api testers Inc
[ContactId] => 219196
)
)
)
)
)
)
How do I echo the ContactId? I've tried the following:
$att = $array->attributes();
$array->attributes()->{'ContactId'};
print_r($array);
I get the following error:
Fatal error: Uncaught Error: Call to a member function attributes() on array
Also tried:
$array->Customer['CustomerId'];
I get following error:
Notice: Trying to get property 'Customer' of non-object
I expect to get 219196.
I found the solution to the above problem. I'm not sure it is the most elegant way to do it, but it returns the result as expected. If there is a more efficient way to get the ContactId, I am open to suggestions.
print_r($array['GetCompanyCodeResponse']['GetCompanyCodeResult']
['Customers']['Customer']['attributes']['ContactId']);
You have followed some very bad advice on how to parse the XML, and completely thrown away the functionality of SimpleXML.
Specifically, the reason you can't run the attributes() method is that you've converted the SimpleXML object to a plain array using this ugly hack:
$array = json_decode( str_replace('@', '', json_encode((array)$body)), TRUE);
To use SimpleXML as its authors intended, I suggest you read:
The examples in the PHP manual
This reference answer on handling XML namespaces
Since you didn't paste the actual XML in the question, I'm going to take a guess that it looks like this:
<?xml version = "1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
<soap:Body xmlns="http://www.example.org/companyInfo">
<GetCompanyCodeResponse>
<GetCompanyCodeResult>
<Customers>
<Customer CustomerNo="103987" CustomerName="epds api testers Inc" ContactId="219196" />
</Customers>
</GetCompanyCodeResult>
</GetCompanyCodeResponse>
</soap:Body>
</soap:Envelope>
If that is in $response, we don't need to do any weirdness with str_replace or json_encode, we can use the methods built into SimpleXML to navigate around the XML:
$xml = new SimpleXMLElement($response);
// The Body is in the SOAP Envelope namespace
$body = $xml->children('http://www.w3.org/2001/12/soap-envelope')->Body;
// The element inside that is in some other namespace
$innerResponse = $body->children('http://www.example.org/companyInfo')->GetCompanyCodeResponse;
// We need to traverse the XML to get to the node we're interested in
$customer = $innerResponse->GetCompanyCodeResult->Customers->Customer;
// Unprefixed attributes aren't technically in any namespace (an oddity in the XML namespace spec!)
$attributes = $customer->attributes(null);
// Here's the value you were looking for
echo $attributes['ContactId'];
Unlike your previous code, this won't break if:
The server starts using a different local prefix instead of soap:, or adds a prefix to the GetCompanyCodeResponse element
The response comes back with more than one Customer (the ->Customer always means the same as ->Customer[0], the first child element with that name)
The Customer element has child elements or text content as well as attributes
It also allows you to use other features of SimpleXML, like using an xpath expression to search the document or even switching to the full DOM API for more complex operations.
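As an illustration of that last point, here is a sketch that pulls the same ContactId with XPath instead of property traversal. It assumes the response looks like the guessed XML above; the prefixes env and ci are arbitrary local names bound to the two namespace URIs from that guess, and they do not need to match the prefixes the server uses.

```php
<?php
// The guessed response from the answer above (assumption: the real
// service uses these namespace URIs).
$response = <<<'XML'
<?xml version = "1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
<soap:Body xmlns="http://www.example.org/companyInfo">
<GetCompanyCodeResponse>
<GetCompanyCodeResult>
<Customers>
<Customer CustomerNo="103987" CustomerName="epds api testers Inc" ContactId="219196" />
</Customers>
</GetCompanyCodeResult>
</GetCompanyCodeResponse>
</soap:Body>
</soap:Envelope>
XML;

$xml = new SimpleXMLElement($response);

// Prefixes are local to the query; only the URIs must match the document.
$xml->registerXPathNamespace('env', 'http://www.w3.org/2001/12/soap-envelope');
$xml->registerXPathNamespace('ci', 'http://www.example.org/companyInfo');

// Unprefixed attributes are in no namespace, so @ContactId takes no prefix.
$contactIds = $xml->xpath(
    '/env:Envelope/env:Body/ci:GetCompanyCodeResponse'
    . '/ci:GetCompanyCodeResult/ci:Customers/ci:Customer/@ContactId'
);

foreach ($contactIds as $id) {
    echo (string) $id, "\n"; // 219196
}
```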

How to echo information from xml in PHP

I have some problems echoing a line from an XML file.
How can I do it properly?
I tried:
$test = file_get_contents('');
$test = iconv('WINDOWS-1251', 'UTF-8', $test);
$test = "<xmp>".$test."</xmp>";
Then I tried to find it with preg_match_all, but it doesn't work.
preg_match_all('/<ya:created dc:date="\d+\-\d+\-\d+\T\d+\:\d+\:\d+/', $test, $output_array);
It works on https://www.phpliveregex.com/ but doesn't work on my site.
https://www.phpliveregex.com/p/qCH
My XML:
<?xml version="1.0" encoding="WINDOWS-1251"?>
<rdf:RDF
xml:lang="ru"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:ya="http://blogs.yandex.ru/schema/foaf/"
xmlns:img="http://blogs.yandex.ru/schema/foaf/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<foaf:Person>
<ya:publicAccess>allowed</ya:publicAccess>
<foaf:gender>male</foaf:gender>
<ya:created dc:date="2011-01-30T16:43:45+03:00"/>
<ya:lastLoggedIn dc:date="2019-01-16T18:54:55+03:00"/>
<ya:modified dc:date="2019-01-13T21:15:43+03:00"/>
</foaf:Person>
</rdf:RDF>
You would be better off accessing it using something like SimpleXML and XPath. There are at least two ways of doing it; both rely on XPath, but you have to use the namespaces (the ya: bit) to ensure you get the right element. As XPath returns a list of matches, I just use [0] to get the first one; if there are multiple matches, you can use a loop...
$test = file_get_contents("data.xml");
$xml = simplexml_load_string($test);
// Version 1
// Fetch the ya:created element
$created = $xml->xpath("//ya:created")[0];
// Extract the attributes and print the date
echo $created[0]->attributes("http://purl.org/dc/elements/1.1/")->date;
// Version 2
// Extract the dc:date attribute directly (using the @)
$createdDate = $xml->xpath("//ya:created/@dc:date")[0];
echo $createdDate;
Forgot to say: if you want to use these values for a database etc., you may need to cast them to a string, because XPath returns SimpleXMLElement objects rather than plain strings:
$date = (string)$createdDate;
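If you need all of the dated entries rather than just the first, you can loop over the XPath result. This sketch uses a shortened inline copy of the question's XML (unused namespace declarations dropped, and the declaration changed to UTF-8 as an assumption; the content is plain ASCII either way), and it registers the foaf, ya and dc prefixes explicitly instead of relying on the ones declared on the root element.

```php
<?php
// Shortened copy of the question's FOAF document.
$test = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
xml:lang="ru"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:ya="http://blogs.yandex.ru/schema/foaf/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<foaf:Person>
<ya:publicAccess>allowed</ya:publicAccess>
<foaf:gender>male</foaf:gender>
<ya:created dc:date="2011-01-30T16:43:45+03:00"/>
<ya:lastLoggedIn dc:date="2019-01-16T18:54:55+03:00"/>
<ya:modified dc:date="2019-01-13T21:15:43+03:00"/>
</foaf:Person>
</rdf:RDF>
XML;

$xml = simplexml_load_string($test);
$xml->registerXPathNamespace('foaf', 'http://xmlns.com/foaf/0.1/');
$xml->registerXPathNamespace('ya', 'http://blogs.yandex.ru/schema/foaf/');
$xml->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// Every ya:* child of foaf:Person that carries a dc:date attribute,
// returned in document order.
$dates = [];
foreach ($xml->xpath('//foaf:Person/ya:*[@dc:date]') as $node) {
    // getName() returns the local name, e.g. "created"
    $dates[$node->getName()] = (string) $node->attributes('http://purl.org/dc/elements/1.1/')->date;
}

foreach ($dates as $name => $date) {
    echo $name, ': ', $date, "\n";
}
```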

Remove namespace from XML using PHP

I have an XML document that looks like this:
<Data
xmlns="http://www.domain.com/schema/data"
xmlns:dmd="http://www.domain.com/schema/data-metadata"
>
<Something>...</Something>
</Data>
I am parsing the information using SimpleXML in PHP. I am dealing with arrays and I seem to be having a problem with the namespace.
My question is: How do I remove those namespaces? I read the data from an XML file.
Thank you!
I found the answer above to be helpful, but it didn't quite work for me.
This ended up working better:
// Gets rid of all namespace definitions
$xml_string = preg_replace('/xmlns[^=]*="[^"]*"/i', '', $xml_string);
// Gets rid of all namespace references
$xml_string = preg_replace('/[a-zA-Z]+:([a-zA-Z]+[=>])/', '$1', $xml_string);
If you're using XPath, then it's a limitation of XPath and not PHP; look at this explanation of XPath and default namespaces for more info.
More specifically, it's the default namespace declaration (xmlns="...") on the root node that causes the problem. This means that you'll need to register the namespace and then use a QName to refer to elements.
$feed = simplexml_load_file('http://www.sitepoint.com/recent.rdf');
$feed->registerXPathNamespace("a", "http://www.domain.com/schema/data");
$result = $feed->xpath("/a:Data/a:Something/...");
Important: The URI used in the registerXPathNamespace call must be identical to the one that is used in the actual XML file.
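A self-contained version of that approach, using the document from the question rather than a remote feed. The text content of the two Something elements is made up for the example. The path starts with / because SimpleXML evaluates XPath relative to the element you call xpath() on, which here is the Data root itself.

```php
<?php
// The question's document; the element text is hypothetical.
$data = <<<'XML'
<Data
xmlns="http://www.domain.com/schema/data"
xmlns:dmd="http://www.domain.com/schema/data-metadata"
>
<Something>first</Something>
<Something>second</Something>
</Data>
XML;

$xml = simplexml_load_string($data);

// The URI must match the xmlns value character for character.
$xml->registerXPathNamespace('a', 'http://www.domain.com/schema/data');

// Absolute path: the root element itself is a:Data.
$values = array_map('strval', $xml->xpath('/a:Data/a:Something'));
echo implode(', ', $values), "\n"; // first, second
```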
The following PHP code automatically detects the default namespace specified in the XML file and registers it under the alias "default". All XPath queries then have to include the "default:" prefix.
So if you want to read XML files whether or not they contain a default namespace definition, and you want to query all Something elements, you could use the following code:
$xml = simplexml_load_file($name);
$namespaces = $xml->getDocNamespaces();
if (isset($namespaces[''])) {
$defaultNamespaceUrl = $namespaces[''];
$xml->registerXPathNamespace('default', $defaultNamespaceUrl);
$nsprefix = 'default:';
} else {
$nsprefix = '';
}
$somethings = $xml->xpath('//'.$nsprefix.'Something');
echo count($somethings).' times found';
When you just want your XML parsed and don't care about any namespaces,
you can simply remove them. Regular expressions work for this, and are much faster than my method below.
But for a safer approach when removing namespaces, one could parse the xml with SimpleXML and ask for the namespaces it has, like below:
$xml = '...';
$namespaces = simplexml_load_string($xml)->getDocNamespaces(true);
//getDocNamespaces() returns the default namespace with an empty key, like this: '' => 'url'
//So we remove any default namespace from the list of prefixes
$namespaces = array_filter(array_keys($namespaces), function($k){return !empty($k);});
$namespaces = array_map(function($ns){return "$ns:";}, $namespaces);
$ns_clean_xml = str_replace("xmlns=", "ns=", $xml);
$ns_clean_xml = str_replace($namespaces, array_fill(0, count($namespaces), ''), $ns_clean_xml);
$xml_obj = simplexml_load_string($ns_clean_xml);
This way you only replace the namespace prefixes, avoiding removing anything else the XML might contain.
I actually use it as a function:
function refined_simplexml_load_string($xml_string) {
if(false === ($x1 = simplexml_load_string($xml_string)) ) return false;
$namespaces = array_keys($x1->getDocNamespaces(true));
$namespaces = array_filter($namespaces, function($k){return !empty($k);});
$namespaces = array_map(function($ns){return "$ns:";}, $namespaces);
return simplexml_load_string($ns_clean_xml = str_replace(
array_merge(["xmlns="], $namespaces),
array_merge(["ns="], array_fill(0, count($namespaces), '')),
$xml_string
));
}
To remove the namespace completely, you'll need to use Regular Expressions (RegEx). For example:
$feed = file_get_contents("http://www.sitepoint.com/recent.rdf");
$feed = preg_replace('/\sxmlns\s*=\s*("[^"]*"|\'[^\']*\')/i', '', $feed); // This removes ALL default namespace declarations (xmlns="..."), keeping the elements themselves.
$xml_feed = simplexml_load_string($feed);
This strips the default namespace declarations before you load the XML. Be careful with the regex, though, because if you have any fields with something like:
<![CDATA[ <Transfer xmlns="http://redeux.example.com">cool.</Transfer> ]]>
Then it will strip the xmlns from inside the CDATA which may lead to unexpected results.
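A parser-based alternative avoids that CDATA problem: let DOM parse the document, then rebuild it using only local names. This is a sketch of a hypothetical helper (stripNamespaces and copyWithoutNs are names made up here, not part of PHP or of any answer above). Namespace declarations are not exposed through DOMElement::$attributes, so they are simply not copied; text, CDATA sections and comments are imported unchanged; and if two attributes share a local name once their prefixes are dropped, the last one wins.

```php
<?php
// Hypothetical helper: parse with DOM, rebuild without namespaces, and
// hand the result to SimpleXML.
function stripNamespaces(string $xml): SimpleXMLElement
{
    $src = new DOMDocument();
    if (!$src->loadXML($xml)) {
        throw new RuntimeException('Input is not well-formed XML');
    }
    $dst = new DOMDocument('1.0', 'UTF-8');
    $dst->appendChild(copyWithoutNs($src->documentElement, $dst));
    return simplexml_import_dom($dst);
}

function copyWithoutNs(DOMNode $node, DOMDocument $dst): ?DOMNode
{
    if ($node instanceof DOMElement) {
        // Recreate the element with its local name only ("soap:Body" -> "Body").
        $el = $dst->createElement($node->localName);
        foreach ($node->attributes as $attr) {
            // Prefixed attributes keep only their local name.
            $el->setAttribute($attr->localName, $attr->value);
        }
        foreach ($node->childNodes as $child) {
            $copy = copyWithoutNs($child, $dst);
            if ($copy !== null) {
                $el->appendChild($copy);
            }
        }
        return $el;
    }
    // Text, CDATA, comments and processing instructions are copied as-is.
    $copy = $dst->importNode($node, true);
    return $copy === false ? null : $copy;
}

$clean = stripNamespaces('<Data xmlns="http://www.domain.com/schema/data"><Something>...</Something></Data>');
echo count($clean->xpath('/Data/Something')), "\n"; // 1
```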

PHP simplexml: why does xpath stop working?

A strange thing happened after a supplier changed the XML header a bit. I used to be able to read stuff using xpath, but now I can't even get a reply with
$xml->xpath('/');
They changed it from this...
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE NewsML SYSTEM "http://www.newsml.org/dl.php?fn=NewsML/1.2/specification/NewsML_1.2.dtd" [
<!ENTITY % nitf SYSTEM "http://www.nitf.org/IPTC/NITF/3.4/specification/dtd/nitf-3-4.dtd">
%nitf;
]>
<NewsML>
...
to this:
<?xml version="1.0" encoding="iso-8859-1"?>
<NewsML
xmlns="http://iptc.org/std/NewsML/2003-10-10/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://iptc.org/std/NewsML/2003-10-10/ http://www.iptc.org/std/NewsML/1.2/specification/NewsML_1.2.xsd http://iptc.org/std/NITF/2006-10-18/ http://contentdienst.pressetext.com/misc/nitf-3-4.xsd"
>
...
Most likely this is because they've introduced a default namespace (xmlns="http://iptc.org/std/NewsML/2003-10-10/") into their document. SimpleXML's support for default namespaces is not very good, to put it mildly.
Try explicitly registering a namespace prefix:
$xml->registerXPathNamespace("n", "http://iptc.org/std/NewsML/2003-10-10/");
$xml->xpath('/n:NewsML');
You would have to adapt your XPath expressions to use the "n:" prefix on every element. Here is some additional info: http://people.ischool.berkeley.edu/~felix/xml/php-and-xmlns.html.
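A minimal sketch of the difference, assuming a stand-in document: only the root element and its default namespace come from the question, and the NewsItem child is hypothetical. The unprefixed path finds nothing once the default namespace is present, while the prefixed one matches.

```php
<?php
// Stand-in for the supplier's new document (child element is hypothetical).
$xml = simplexml_load_string(
    '<NewsML xmlns="http://iptc.org/std/NewsML/2003-10-10/"><NewsItem/></NewsML>'
);

// Without a prefix, the name test only matches elements in no namespace.
$unprefixed = $xml->xpath('/NewsML');

// With a prefix bound to the default namespace URI, the elements are found.
$xml->registerXPathNamespace('n', 'http://iptc.org/std/NewsML/2003-10-10/');
$prefixed = $xml->xpath('/n:NewsML/n:NewsItem');

echo count($unprefixed), ' vs ', count($prefixed), "\n"; // 0 vs 1
```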
EDIT: As per the spec:
The registerXPathNamespace() function creates a prefix/ns context for the next XPath query.
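The registration belongs to the SimpleXMLElement it was called on, so every element you call xpath() on (including elements returned by an earlier query) needs its own registration. A function to wrap XPath queries is therefore the natural thing to do: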
function simplexml_xpath_ns($element, $xpath, $xmlns)
{
foreach ($xmlns as $prefix_uri)
{
list($prefix, $uri) = explode("=", $prefix_uri, 2);
$element->registerXPathNamespace($prefix, $uri);
}
return $element->xpath($xpath);
}
Usage:
$xmlns = ["n=http://iptc.org/std/NewsML/2003-10-10/"];
$result = simplexml_xpath_ns($xml, '/n:NewsML', $xmlns);

Categories