I'm trying to parse out the CDATA from a SOAP response using SimpleXML and XPath. I get the output that I'm looking for, but it is returned as one continuous line of data with no separators that would allow me to parse it.
I appreciate any help!
Here is the SOAP response containing the CDATA that I need to parse:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns1:getIPServiceDataResponse xmlns:ns1="http://ws.icontent.idefense.com/V3/2">
<ns1:return xsi:type="ns1:IPServiceDataResponse" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns1:status>Success</ns1:status>
<ns1:serviceType>IPservice_TIIncremental_ALL_xml_v1</ns1:serviceType>
<ns1:ipserviceData><![CDATA[<?xml version="1.0" encoding="utf-8"?><threat_indicators><tidata><indicator>URL</indicator><format>STRING</format><value>http://update.lflink.com/aspnet_vil/debug.swf</value><role>EXPLOIT</role><sample_md5/><last_observed>2012-11-02 18:13:43.587000</last_observed><comment>APT Blade2009 - CVE-2012-5271</comment><ref_id/></tidata><tidata><indicator>URL</indicator><format>STRING</format><value>http://update.lflink.com/crossdomain.xml</value><role>EXPLOIT</role><sample_md5/><last_observed>2012-11-02 18:14:04.108000</last_observed><comment>APT Blade2009 - CVE-2012-5271</comment><ref_id/></tidata><tidata><indicator>DOMAIN</indicator><format>STRING</format><value>update.lflink.com</value><role>EXPLOIT</role><sample_md5/><last_observed>2012-11-02 18:15:10.445000</last_observed><comment>APT Blade2009 - CVE-2012-5271</comment><ref_id/></tidata></threat_indicators>]]></ns1:ipserviceData>
</ns1:return>
</ns1:getIPServiceDataResponse>
</soapenv:Body>
</soapenv:Envelope>
Here is PHP code I'm using to try to parse the CDATA:
<?php
$xml = simplexml_load_string($soap_response);
$xml->registerXPathNamespace('ns1', 'http://ws.icontent.idefense.com/V3/2');
foreach ($xml->xpath("//ns1:ipserviceData") as $item)
{
echo '<pre>';
print_r($item);
echo '</pre>';
}
?>
Here's the print_r output:
SimpleXMLElement Object
(
[0] => URLSTRINGhttp://update.lflink.com/aspnet_vil/debug.swfEXPLOIT2012-11-02 18:13:43.587000APT Blade2009 - CVE-2012-5271URLSTRINGhttp://update.lflink.com/crossdomain.xmlEXPLOIT2012-11-02 18:14:04.108000APT Blade2009 - CVE-2012-5271DOMAINSTRINGupdate.lflink.comEXPLOIT2012-11-02 18:15:10.445000APT Blade2009 - CVE-2012-5271
)
Any ideas what I can do to make the output usable? For example, parsing out each element of the CDATA output such as: <indicator></indicator>, <value></value>, <role></role>, etc.
FYI: I also tried using LIBXML_NOCDATA, with no change in the output.
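You get it as a single string because that is what you asked for: the text content of the element, which is just a string. (The tags are actually still inside that string. They only seem to be missing because the browser treats <indicator>, <value> and so on as unknown HTML tags and hides them; view the page source or escape the output with htmlspecialchars() to see them.)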
If you want to parse that string as XML, create a new SimpleXML object from it.
That gives you a second parser working on the string, one that can parse the embedded XML (yes, it's that simple):
$soap = simplexml_load_string($soapXML);
$soap->registerXPathNamespace('ns1', 'http://ws.icontent.idefense.com/V3/2');
$ipserviceData = simplexml_load_string((string) $soap->xpath('//ns1:ipserviceData')[0]);
// <threat_indicators><tidata><indicator>URL</indicator>
echo $ipserviceData->tidata->indicator, "\n"; # URL
By the way, the LIBXML_NOCDATA flag only controls whether the <![CDATA[...]]> sections are kept as CDATA nodes or merged into text nodes.
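A slightly fuller sketch of the same idea, assuming the SOAP response is already in a string (here a shortened, hypothetical $soapXML with a single tidata record): the CDATA is read as text via a string cast, parsed as a second document, and each <tidata> record is collected into an associative array with the fields the question asks about.

```php
<?php
// Hypothetical, shortened SOAP response with the same structure as in the question.
$soapXML = <<<'XML'
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns1:getIPServiceDataResponse xmlns:ns1="http://ws.icontent.idefense.com/V3/2">
<ns1:return>
<ns1:status>Success</ns1:status>
<ns1:ipserviceData><![CDATA[<?xml version="1.0" encoding="utf-8"?><threat_indicators><tidata><indicator>URL</indicator><format>STRING</format><value>http://update.lflink.com/crossdomain.xml</value><role>EXPLOIT</role><sample_md5/><last_observed>2012-11-02 18:14:04.108000</last_observed><comment>APT Blade2009 - CVE-2012-5271</comment><ref_id/></tidata></threat_indicators>]]></ns1:ipserviceData>
</ns1:return>
</ns1:getIPServiceDataResponse>
</soapenv:Body>
</soapenv:Envelope>
XML;

$soap = simplexml_load_string($soapXML);
$soap->registerXPathNamespace('ns1', 'http://ws.icontent.idefense.com/V3/2');

$records = [];
foreach ($soap->xpath('//ns1:ipserviceData') as $node) {
    // To SimpleXML the CDATA is just text, so parse that text as its own document.
    $inner = simplexml_load_string((string) $node);
    foreach ($inner->tidata as $tidata) {
        $records[] = [
            'indicator'     => (string) $tidata->indicator,
            'value'         => (string) $tidata->value,
            'role'          => (string) $tidata->role,
            'last_observed' => (string) $tidata->last_observed,
            'comment'       => (string) $tidata->comment,
        ];
    }
}

print_r($records);
```

Casting each field with (string) turns the SimpleXMLElement objects into plain PHP strings, which makes the resulting array easy to store or compare.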
I am trying to parse this XML (http://numismatics.org/search/apis/getNuds?identifiers=1995.11.282) and pull out the elements under but when I call SimpleXMLElement I get: "Exception: String could not be parsed as XML" (the API is from http://numismatics.org/search/apis)
Here is what the XML basically looks like from the link above:
<nuds xmlns="http://nomisma.org/nuds" xmlns:mets="http://www.loc.gov/METS/" xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://nomisma.org/nuds http://nomisma.org/nuds.xsd" recordType="physical">
...deleted to shorten question
<digRep>
<mets:fileSec>
<mets:fileGrp USE="obverse">
<mets:file USE="iiif">
<mets:FLocat LOCYPE="URL" xlink:href="http://images.numismatics.org/collectionimages%2F19501999%2F1995%2F1995.11.282.obv.noscale.jpg"/>
</mets:file>
...deleted to shorten question
</mets:fileGrp>
</mets:fileSec>
</digRep>
</nuds>
Here is the test code I'm using, adapted from an example:
$detail = simplexml_load_file("http://numismatics.org/search/apis/getNuds?identifiers=1995.11.282");
$digRep = new \SimpleXMLElement($detail->nuds->digRep); //the slash before the SimpleXMLElement is required for Laravel
$digRep->registerXPathNamespace('c', 'http://www.loc.gov/METS');
$result = $digRep->xpath('//c:mets:fileSec');
foreach ($result as $title) {
echo $title . "\n";
}
I am wondering if this has to do with "mets:fileSec" but I am not quite sure what "fileSec" is in this context.
mets is just the alias (prefix) for the namespace http://www.loc.gov/METS/ (see the xmlns:mets declaration). This is perfectly fine XML, and it parses in the browser if you call the URL directly. I suggest checking whether PHP really receives the XML (for example with file_get_contents()) and not some error message. Services sometimes return different results for different user agents.
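Besides checking what PHP actually receives, the posted code has a few problems of its own. simplexml_load_file() already returns the root <nuds> element, so $detail->nuds->digRep refers to nothing, and new \SimpleXMLElement() then most likely receives an empty string, which would explain the "String could not be parsed as XML" exception. The URI passed to registerXPathNamespace() must match the xmlns:mets declaration exactly, trailing slash included. And an XPath name step takes a single prefix, so c:mets:fileSec is not a valid expression. A sketch with those points addressed, run against a shortened inline copy of the document (an assumption, so it works offline) and using the hypothetical prefixes m and xl:

```php
<?php
// Namespace URIs copied from the document's xmlns declarations.
const METS_NS  = 'http://www.loc.gov/METS/'; // trailing slash included
const XLINK_NS = 'http://www.w3.org/1999/xlink';

// Shortened, hypothetical copy of the getNuds response so the sketch runs offline.
$xmlString = <<<'XML'
<nuds xmlns="http://nomisma.org/nuds" xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" recordType="physical">
<digRep>
<mets:fileSec>
<mets:fileGrp USE="obverse">
<mets:file USE="iiif">
<mets:FLocat xlink:href="http://images.numismatics.org/collectionimages%2F19501999%2F1995%2F1995.11.282.obv.noscale.jpg"/>
</mets:file>
</mets:fileGrp>
</mets:fileSec>
</digRep>
</nuds>
XML;

// The loader returns the root <nuds> element itself; there is no ->nuds level below it.
$nuds = simplexml_load_string($xmlString);
$nuds->registerXPathNamespace('m', METS_NS);

$images = [];
// One prefix per name step: m:fileSec, not c:mets:fileSec.
foreach ($nuds->xpath('//m:fileSec/m:fileGrp') as $fileGrp) {
    // Register the prefixes again on each result object to be safe.
    $fileGrp->registerXPathNamespace('m', METS_NS);
    $fileGrp->registerXPathNamespace('xl', XLINK_NS);
    $use = (string) $fileGrp['USE'];
    foreach ($fileGrp->xpath('m:file/m:FLocat/@xl:href') as $href) {
        $images[$use][] = (string) $href;
    }
}

print_r($images);
```

With the live service, simplexml_load_string() would be replaced by simplexml_load_file() on the API URL from the question, provided PHP actually receives the XML.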
I want to create dynamic tags in XML using PHP,
like this: <wsse:Username>fqsuser01</wsse:Username>
The main thing is that I want to be able to change the prefix inside the tags ("wsse" in this example).
What do I need to do to create this XML file with PHP?
Thanks,
For this purpose you can use XMLWriter, for example (another option is SimpleXML). Both are part of PHP core, so no third-party libraries are needed. wsse is a namespace prefix.
Here is some example code:
<?php
//create a new xmlwriter object
$xml = new XMLWriter();
//using memory for string output
$xml->openMemory();
//set the indentation to true (if false all the xml will be written on one line)
$xml->setIndent(true);
//create the document tag, you can specify the version and encoding here
$xml->startDocument();
//Create an element
$xml->startElement("root");
//Write to the element
$xml->writeElement("r1:id", "1");
$xml->writeElement("r2:id", "2");
$xml->writeElement("r3:id", "3");
$xml->endElement(); //End the element
//output the xml
echo $xml->outputMemory();
?>
Result:
<?xml version="1.0"?>
<root>
<r1:id>1</r1:id>
<r2:id>2</r2:id>
<r3:id>3</r3:id>
</root>
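Note that the result above uses the prefixes r1, r2 and r3 without declaring them, so a namespace-aware parser such as SimpleXML will report namespace errors when reading it back. A sketch for the "dynamic wsse" case, assuming the prefix comes from a variable and using the WS-Security namespace URI as an example value (the Security wrapper element is hypothetical): XMLWriter::startElementNs() writes the xmlns declaration for the prefix, and child elements can then reuse the prefix by name.

```php
<?php
// Example values: the prefix can be any valid XML name chosen at run time.
$prefix   = 'wsse';
$nsUri    = 'http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd';
$username = 'fqsuser01';

$xml = new XMLWriter();
$xml->openMemory();
$xml->setIndent(true);
$xml->startDocument('1.0', 'UTF-8');

// Hypothetical wrapper element; startElementNs() also writes xmlns:<prefix>="<uri>" on it.
$xml->startElementNs($prefix, 'Security', $nsUri);
// The prefix is now declared on the parent, so the child can simply use it in its name.
$xml->writeElement("$prefix:Username", $username);
$xml->endElement();
$xml->endDocument();

$output = $xml->outputMemory();
echo $output;
```

Changing $prefix to another name changes every tag, while the namespace URI, which is what identifies the elements to a parser, stays the same.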
You could use a string and convert it to XML using simplexml_load_string(). The string must be well formed.
<?php
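$usernames = array(
    'username01',
    'username02',
    'username03'
);
// Declare the wsse prefix on the root element; otherwise libxml reports a namespace error
$xml_string = '<wsse:Usernames xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">';
foreach ($usernames as $username) {
    $xml_string .= "<wsse:Username>$username</wsse:Username>";
}
$xml_string .= '</wsse:Usernames>';
// Backspace the closing XML; line below all the way to the left
$note =
<<<XML
$xml_string
XML;
$xml = simplexml_load_string($note);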
?>
If you want to be able to change the namespace prefix on each XML element, you would do something very similar to what is shown above: build the string with dynamic prefixes.
The closing XML; line that I told you to backspace all the way to the left ends a heredoc, which is why it behaves oddly: before PHP 7.3 the closing identifier must start at the very beginning of its line. See https://www.w3schools.com/php/func_simplexml_load_string.asp for an example that you can copy and paste.
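For the SimpleXML option mentioned in the first answer, a sketch that avoids concatenating the whole document: the root element declares the chosen prefix, and SimpleXMLElement::addChild() is given a prefixed name plus the namespace URI. The prefix, URI and usernames are example values.

```php
<?php
// Example values: prefix, namespace URI and usernames are placeholders.
$prefix    = 'wsse';
$nsUri     = 'http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd';
$usernames = ['username01', 'username02', 'username03'];

// The root element declares the prefix, so the document is namespace well-formed.
$root = new SimpleXMLElement(
    sprintf('<%1$s:Usernames xmlns:%1$s="%2$s"/>', $prefix, $nsUri)
);

foreach ($usernames as $username) {
    // Prefixed name plus URI: the existing declaration on the root is reused.
    $root->addChild("$prefix:Username", $username, $nsUri);
}

echo $root->asXML();
```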
If I use the following PHP code to convert XML to JSON:
<?php
header("Content-Type:text/json");
$resultXML = "
<QUERY>
<Company>fcsf</Company>
<Details>
fgrtgrthtyfgvb
</Details>
</QUERY>
";
$sxml = simplexml_load_string($resultXML);
echo json_encode($sxml);
?>
I get
{"Company":"fcsf","Details":"\n fgrtgrthtyfgvb\n "}
However, if I use CDATA in the Details element as follows:
<?php
header("Content-Type:text/json");
$resultXML = "
<QUERY>
<Company>fcsf</Company>
<Details><![CDATA[
fgrtgrthtyfgvb]]>
</Details>
</QUERY>
";
$sxml = simplexml_load_string($resultXML);
echo json_encode($sxml);
?>
I get the following
{"Company":"fcsf","Details":{}}
In this case the Details element is blank. Any idea why Details is blank and how to correct this?
This is not a problem with the JSON encoding: var_dump($sxml->Details) shows that the content is already missing from SimpleXML's object view, as you will only get
object(SimpleXMLElement)#2 (0) {
}
that is, an “empty” SimpleXMLElement. The CDATA text is still in the document ((string) $sxml->Details returns it), but var_dump() and json_encode() do not show CDATA nodes.
Once we know that, googling for “simplexml cdata” leads straight to the first user comment on the SimpleXML Functions manual page, which has the solution:
If you are having trouble accessing CDATA in your simplexml document, you don't need to str_replace/preg_replace the CDATA out before loading it with simplexml.
You can do this instead, and all your CDATA contents will be merged into the element contents as strings.
$xml = simplexml_load_file($xmlfile, 'SimpleXMLElement', LIBXML_NOCDATA);
So, use
$sxml = simplexml_load_string($resultXML, 'SimpleXMLElement', LIBXML_NOCDATA);
in your code, and you’ll get
{"Company":"fcsf","Details":"\n fgrtgrthtyfgvb\n "}
after JSON-encoding it.
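A compact check of both points, using a one-line version of the question's XML (an assumption that keeps whitespace out of the result): without the flag the CDATA text is still reachable with a string cast, and with LIBXML_NOCDATA it becomes an ordinary text node that json_encode() picks up.

```php
<?php
// One-line version of the question's XML, so no surrounding whitespace ends up in the text.
$resultXML = '<QUERY><Company>fcsf</Company><Details><![CDATA[fgrtgrthtyfgvb]]></Details></QUERY>';

// Without the flag the CDATA stays a separate node: json_encode() does not show it,
// but a string cast still returns its text.
$plain   = simplexml_load_string($resultXML);
$details = (string) $plain->Details;

// With LIBXML_NOCDATA the CDATA is merged into a normal text node.
$merged = simplexml_load_string($resultXML, 'SimpleXMLElement', LIBXML_NOCDATA);
$json   = json_encode($merged);

echo $details, "\n", $json, "\n";
```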
I've come across a weird but apparently valid XML string being returned by an API. I've been parsing XML with SimpleXML because it's really easy to pass it to a function and convert it into a handy array.
The following is parsed incorrectly by SimpleXML:
<?xml version="1.0" standalone="yes"?>
<Response>
<CustomsID>010912-1
<IsApproved>NO</IsApproved>
<ErrorMsg>Electronic refunds...</ErrorMsg>
</CustomsID>
</Response>
SimpleXML results in:
SimpleXMLElement Object ( [CustomsID] => 010912-1 )
Is there a way to parse this with SimpleXML? Or is there another XML library that returns an object reflecting the XML structure?
That is an odd response: the text sits alongside other element nodes. If you traverse it manually (not as an array, but as an object), you should be able to get inside:
<?php
$xml = '<?xml version="1.0" standalone="yes"?>
<Response>
<CustomsID>010912-1
<IsApproved>NO</IsApproved>
<ErrorMsg>Electronic refunds...</ErrorMsg>
</CustomsID>
</Response>';
$sObj = new SimpleXMLElement( $xml );
var_dump( $sObj->CustomsID );
exit;
?>
This results in a second object:
object(SimpleXMLElement)#2 (2) {
["IsApproved"]=>
string(2) "NO"
["ErrorMsg"]=>
string(21) "Electronic refunds..."
}
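The var_dump() above leaves out the text 010912-1 that sits directly inside <CustomsID>. Casting the element to a string returns only its own text nodes (not the text of its children), so trimming the cast gives the ID. A sketch that collects everything into a flat array; the array keys are chosen for illustration:

```php
<?php
$xml = '<?xml version="1.0" standalone="yes"?>
<Response>
<CustomsID>010912-1
<IsApproved>NO</IsApproved>
<ErrorMsg>Electronic refunds...</ErrorMsg>
</CustomsID>
</Response>';

$sObj    = new SimpleXMLElement($xml);
$customs = $sObj->CustomsID;

$result = [
    // (string) collects only the text directly inside <CustomsID>; trim() drops the newlines.
    'CustomsID'  => trim((string) $customs),
    'IsApproved' => (string) $customs->IsApproved,
    'ErrorMsg'   => (string) $customs->ErrorMsg,
];

print_r($result);
```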
You already parse the XML with SimpleXML. I guess you want to parse it into a handy array, which you don't define any further.
The problem with the XML you have is that its structure is not very distinct. If it does not change much, you can convert it into an array using a SimpleXMLIterator instead of a SimpleXMLElement:
$it = new SimpleXMLIterator($xml);
$mode = RecursiveIteratorIterator::SELF_FIRST;
$rit = new RecursiveIteratorIterator($it, $mode);
$array = array_map('trim', iterator_to_array($rit));
print_r($array);
For the XML-string in question this gives:
Array
(
[CustomsID] => 010912-1
[IsApproved] => NO
[ErrorMsg] => Electronic refunds...
)
See also How to parse and process HTML/XML with PHP?
I know how to drill down into the nodes of an XML document, as described here:
http://www.php.net/manual/en/simplexml.examples-basic.php
but I'm at a loss as to how to extract the value in the following example:
$xmlStr = '<Error>Hello world. There is an Error</Error>';
$xml = simplexml_load_string($xmlStr);
simplexml_load_string returns an object of type SimpleXMLElement whose properties hold the data of the XML string.
In your case there is no enclosing root element such as <xml>...</xml>; <Error> itself is the root element, which is why there is no child to drill down into.
If the string were wrapped in such a root element, you could get the data between the <Error> tags like this:
$xmlStr = '<xml><Error>Hello world. There is an Error</Error></xml>';
$xml = simplexml_load_string($xmlStr);
echo $xml->Error; // prints "Hello world. There is an Error"
What do you know. The value of the tag is just:
$error = (string) $xml;
Thanks for looking :)
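To spell this out: simplexml_load_string() returns the root element, which for the question's string is <Error> itself, so its text is obtained by casting $xml to a string (echo does that cast implicitly). A short sketch comparing the unwrapped string with a wrapped variant, where the <xml> wrapper is only an example of an enclosing root element:

```php
<?php
// The question's string: <Error> is the root element.
$xml   = simplexml_load_string('<Error>Hello world. There is an Error</Error>');
$error = (string) $xml; // a plain PHP string, not a SimpleXMLElement
echo $error, "\n";

// Only when <Error> is nested inside another root do you step into it by name.
$wrapped     = simplexml_load_string('<xml><Error>Hello world. There is an Error</Error></xml>');
$nestedError = (string) $wrapped->Error;
echo $nestedError, "\n";
```

If in doubt about which element a SimpleXMLElement represents, $xml->getName() returns its tag name.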