SimpleXML outputs unicode in a strange way - php

I use SimpleXML to process an XML file that contains Cyrillic characters. I also use dom_import_simplexml, importNode and appendChild to copy trees from file to file and from place to place.
At the end of processing I do a print_r of the resulting SimpleXMLElement and everything is OK. But I also call asXml('outputfile.xml'), and there something strange happens: all Cyrillic characters that were not wrapped in CDATA (some tag bodies and all attributes) are replaced with their numeric character references.
For example, the output of print_r (just a fragment):
SimpleXMLElement Object ( [@attributes] => Array
( [NAME] => Государственный аппарат и механизм
[COSTYES] => 3.89983579639 [COSTNO] => 0
[ID] => 9 )
[COMMENTYES] => Вы совершенно правы.
[COMMENTNO] => Нет, Вы ошиблись. ) ) )
But in the file that asXml generates, I get something like this:
<QUEST NAME="&#x422;&#x435;&#x43E;&#x440;&#x438;&#x44F; &#x434;&#x432;&#x443;&#x445; &#x43C;&#x435;&#x447;&#x435;&#x439;"
style="educ" ID="1">
<DESC><![CDATA[Теория происхождения государства, известная как теория "двух мечей" [2, с.40],
представляет из себя...
]]></DESC>
I set utf-8 locale everywhere it's possible, googled every combination of words "simplexml, unicode, cyrillic, asXml, etc" but nothing worked.
UPD: It looks like one of the functions involved does the equivalent of htmlentities(). So, thanks to voitcus, the solution is to use html_entity_decode() as advised here.

I wonder whether you declared the encoding when you first imported the XML document. The following two calls give you different output.
$simplexml = simplexml_load_string('<QUEST NAME="Государственный" />');
if (!$simplexml) { exit('parse failed'); }
print_r($simplexml->asXml());
$simplexml = simplexml_load_string('<?xml version="1.0" encoding="UTF-8"?><QUEST NAME="Государственный" />');
if (!$simplexml) { exit('parse failed'); }
print_r($simplexml->asXml());
A SimpleXMLElement object knows its encoding from the original XML declaration. If no encoding was declared, it writes non-ASCII characters as numeric character references, for safety, I guess.
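If you cannot change the input file, another option is to set the encoding on the underlying DOM document before saving. This is only a sketch: it assumes the PHP source file itself is saved as UTF-8, and the variable names ($dom, $output) are mine, not from the question.

```php
<?php
// Input without an XML declaration, as in the first case above.
$simplexml = simplexml_load_string('<QUEST NAME="Государственный" />');
if (!$simplexml) { exit('parse failed'); }

// Get the DOMDocument that owns the SimpleXML tree.
$dom = dom_import_simplexml($simplexml)->ownerDocument;

// Declare the output encoding explicitly; the serializer then writes
// the characters themselves instead of numeric character references.
$dom->encoding = 'UTF-8';

$output = $dom->saveXML();
echo $output;
```

To write a file instead of a string, $dom->save('outputfile.xml') can be used in place of saveXML().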

SimpleXML removes tags in node

I'd like to parse an XML file which is generated by an application called Folker. It's an application for transcribing spoken text. Sometimes it saves the lines in a format that SimpleXML parses fine, but sometimes it doesn't.
This line is good:
<contribution speaker-reference="KU" start-reference="TLI_107" end-reference="TLI_109" parse-level="1">
<unparsed>ich überLEG mir das [nochma:l,]</unparsed>
</contribution>
This line is not:
<contribution speaker-reference="VK" start-reference="TLI_108" end-reference="TLI_111" parse-level="1">
<unparsed>[JA:_a; ]<time timepoint-reference="TLI_109"/>ja,<time timepoint-reference="TLI_110"/>also (.) wie [geSAGT;]</unparsed>
</contribution>
In the second line SimpleXML removes the tags which are inside the unparsed node.
How can I get SimpleXML to keep these tags and either parse them as deeper nodes or output them as an object, for example like this (written as JSON just for better understanding):
"contribution": {
    "speaker-reference": "VK",
    "start-reference": "TLI_108",
    "end-reference": "TLI_111",
    "parse-level": "1",
    "unparsed": {
        "content": "[JA:_a; ]",
        "time": [
            {
                "timepoint-reference": "TLI_109",
                "content": "ja,"
            },
            {
                "timepoint-reference": "TLI_110",
                "content": "also (.) wie [geSAGT;]"
            }
        ]
    }
}
No, it does not remove them. This works flawlessly (interesting app btw):
<?php
$string = '<contribution speaker-reference="VK" start-reference="TLI_108" end-reference="TLI_111" parse-level="1">
<unparsed>[JA:_a; ]<time timepoint-reference="TLI_109"/>ja,<time timepoint-reference="TLI_110"/>also (.) wie [geSAGT;]</unparsed>
</contribution>';
$xml = simplexml_load_string($string);
$t = $xml->unparsed->time[0];
print_r($t->attributes());
?>
// output:
SimpleXMLElement Object
(
[@attributes] => Array
(
[timepoint-reference] => TLI_109
)
)
You can even iterate over them:
$times = $xml->unparsed->children();
foreach ($times as $t) {
$attributes = $t->attributes();
// do sth. useful with them afterwards
}
Hint: Presumably you were using print_r() or var_dump() on the XML tree. These sometimes give misleading results, because SimpleXML builds its properties on demand and the dump does not show everything in the tree. Better to use echo $xml->asXML(); to see the actual XML string.
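If you also need the text that sits between the time tags, in document order, SimpleXML alone is awkward because it does not expose text nodes individually. A rough sketch of one way to do it is to switch to DOM for that node. The $segments structure below is my own choice, loosely following the JSON in the question.

```php
<?php
$string = '<contribution speaker-reference="VK" start-reference="TLI_108" end-reference="TLI_111" parse-level="1">
<unparsed>[JA:_a; ]<time timepoint-reference="TLI_109"/>ja,<time timepoint-reference="TLI_110"/>also (.) wie [geSAGT;]</unparsed>
</contribution>';
$xml = simplexml_load_string($string);

// Switch to DOM for the mixed-content node; DOM exposes text nodes.
$unparsed = dom_import_simplexml($xml->unparsed);

$segments = array();
$currentRef = null; // timepoint of the most recent <time/> seen, if any

foreach ($unparsed->childNodes as $child) {
    if ($child instanceof DOMElement && $child->nodeName === 'time') {
        // Remember the timepoint; it applies to the text that follows.
        $currentRef = $child->getAttribute('timepoint-reference');
    } elseif ($child instanceof DOMText) {
        $segments[] = array(
            'timepoint-reference' => $currentRef,
            'content' => $child->nodeValue,
        );
    }
}

print_r($segments);
```

Each entry pairs a piece of text with the time element that precedes it; the first piece has no timepoint, so its reference stays null.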

XML parsing with XPath and PHP handle empty values

While parsing an XML tree like this:
<vco:ItemDetail>
<cac:Item>
<cbc:Description>ROLLENKERNSATZ 20X12 MM 10,5 GR.</cbc:Description>
<cac:SellersItemIdentification>
<cac:ID>78392636</cac:ID>
</cac:SellersItemIdentification>
<cac:ManufacturersItemIdentification>
<cac:ID>RMS100400370</cac:ID>
<cac:IssuerParty>
<cac:PartyName>
<cbc:Name></cbc:Name>
</cac:PartyName>
</cac:IssuerParty>
</cac:ManufacturersItemIdentification>
</cac:Item>
<vco:Availability>
<vco:Code>available</vco:Code>
</vco:Availability>
</vco:ItemDetail>
If cbc:Name is empty, I always get whitespace in the output that breaks my CSV structure, which then looks like this:
"ROLLENKERNSATZ 20X12 MM 10,5 GR.";78392636;;RMS100400370;"";available
The "available" string ends up on a new line, so my CSV is no longer structured.
My XPath array looks like this:
$columns = array('Description' => 'string(cac:Item/cbc:Description)',
'SellersItemIdentification' => 'string(cac:Item/cac:SellersItemIdentification/cac:ID)',
'StandardItemIdentification' => 'string(cac:Item/cac:StandardItemIdentification/cac:ID)',
'Availability' => 'string(vco:Availability/vco:Code)',
'Producer' => 'string(cac:Item/cac:ManufacturersItemIdentification/cac:IssuerParty/cac:PartyName/cbc:Name)');
Is there any exception handling or replacement I can use here, such as replacing the empty node value with "no producer" or something like that?
Thank you
If the value to use as the 'default' can somehow be made to exist somewhere in your input document, the problem can be solved with a quasi-coalesce, e.g.:
<vco:ItemDetail xmlns:vco="x1" xmlns:cac="x2" xmlns:cbc="x3">
<ValueIfNoProducer>No Producer</ValueIfNoProducer>
<cac:Item>
...
Then this XPath 1.0 expression will apply the default if the Name element is empty or contains only whitespace:
(cac:Item/cac:ManufacturersItemIdentification/cac:IssuerParty
/cac:PartyName/cbc:Name[normalize-space(.)] | ValueIfNoProducer)[1]
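I think the following is possible directly in XPath 2.0, but I stand to be corrected. The union operator | only accepts nodes, so the string literal has to be supplied with the sequence (comma) operator instead. PHP's SimpleXML and DOMXPath are built on libxml2, which only implements XPath 1.0, so this form is not available there:
(cac:Item/cac:ManufacturersItemIdentification/cac:IssuerParty
/cac:PartyName/cbc:Name[normalize-space(.)], 'No Producer')[1]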
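If changing the input document is not an option, the default can also be applied on the PHP side after evaluating the expression. The sketch below assumes DOMXPath and reuses the placeholder namespace URIs x1, x2, x3 from the example above. The real URIs, and the variable names, would differ in your code.

```php
<?php
$xmlString = '<vco:ItemDetail xmlns:vco="x1" xmlns:cac="x2" xmlns:cbc="x3">
<cac:Item>
<cac:ManufacturersItemIdentification>
<cac:ID>RMS100400370</cac:ID>
<cac:IssuerParty>
<cac:PartyName>
<cbc:Name>
</cbc:Name>
</cac:PartyName>
</cac:IssuerParty>
</cac:ManufacturersItemIdentification>
</cac:Item>
</vco:ItemDetail>';

$doc = new DOMDocument();
$doc->loadXML($xmlString);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('vco', 'x1'); // placeholder URIs, see above
$xpath->registerNamespace('cac', 'x2');
$xpath->registerNamespace('cbc', 'x3');

// normalize-space() strips leading/trailing whitespace and collapses
// line breaks, so a Name containing only a newline becomes ''.
$producer = $xpath->evaluate(
    'normalize-space(cac:Item/cac:ManufacturersItemIdentification/cac:IssuerParty/cac:PartyName/cbc:Name)',
    $doc->documentElement
);

// Apply the default in PHP instead of in XPath.
if ($producer === '') {
    $producer = 'no producer';
}

echo $producer, "\n";
```

Using normalize-space() instead of string() for every column should also keep stray line breaks out of the CSV.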

SimpleXMLElement: asXML() "on a non-object" that really is an object

I am parsing an SVG file using SimpleXMLElement in PHP. The SVG file was carefully constructed in Adobe Illustrator to follow a layer format that I am attempting to dissect.
Consider this code:
// Create an XML object out of the SVG
$svg = new SimpleXMLElement('floorplan.svg', null, true);
// Register the SVG namespace
$svg->registerXPathNamespace('svg', 'http://www.w3.org/2000/svg');
// Get the normal floorplan layer
$normal = $svg->xpath('svg:g[@id="Normal"]');
// If the normal layer has content, continue
if(count($normal) > 0) {
// If there are floors, continue
if(count($normal[0]->g > 0)) {
// Loop through each floor
foreach($normal[0]->g as $floor) {
// Declare the namespace for the floor
$floor->registerXPathNamespace('svg', 'http://www.w3.org/2000/svg');
// Select the base floorplan
$floorsvg = $floor->xpath('svg:g[@id="Base"]')[0];
var_dump($floorsvg);
echo $floorsvg->asXML(); // THIS CAUSES THE ERROR
}
}
}
When I do a var_dump on $floorsvg, it is declaring it is a SimpleXMLElement object:
object(SimpleXMLElement)[9]
public '@attributes' =>
array (size=1)
'id' => string 'Base' (length=4)
public 'g' =>
array (size=859)
0 => ...
However, when I run asXML() on the object, I am presented with the following PHP error:
PHP Fatal error: Call to a member function asXML() on a non-object
I'm uncertain why asXML() is failing, considering it is an object. Can anyone shed any light on why this problem is occurring and what I might try to remedy it?
EDIT: Adding an echo $normal->asXML(); up above results in the same error. It seems like xpath is causing the object to become malformed somehow.
EDIT 2: The SVG file being parsed can be seen here: http://pastebin.com/zK1yRFA7
There are a few issues. Your code assumes that:
$floor->xpath('svg:g[@id="Base"]')
will return an array with at least one element. The <g id="Second_Floor"> element does not contain any child elements that match that XPath expression, so for that floor xpath() returns an empty array. Accessing element 0 of an empty array yields null, and calling asXML() on it gives the error you are seeing.
Adding a simple guard expression:
// Select the base floorplan
$floorsvg = $floor->xpath('svg:g[@id="Base"]');
if (count($floorsvg) > 0) {
echo $floorsvg[0]->asXML();
}
will resolve that. Secondly, you have some misplaced parentheses in this line:
if(count($normal[0]->g > 0)) {
It should be:
if(count($normal[0]->g) > 0) {
This appears to be a simple typo and doesn't affect the outcome of this particular script one way or another. Note, though, that it passes a boolean to count(), which PHP 8 rejects with a TypeError.
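Putting both fixes together, a self-contained sketch might look like the following. The inline SVG is a made-up stand-in for the real floorplan (one floor with a Base layer, one without), and $found is a name I introduced to collect the results.

```php
<?php
// Hypothetical miniature floorplan: only First_Floor has a "Base" layer.
$svgString = '<svg xmlns="http://www.w3.org/2000/svg">
<g id="Normal">
<g id="First_Floor"><g id="Base"><rect width="10" height="10"/></g></g>
<g id="Second_Floor"><g id="Other"/></g>
</g>
</svg>';

$svg = new SimpleXMLElement($svgString);
$svg->registerXPathNamespace('svg', 'http://www.w3.org/2000/svg');

$normal = $svg->xpath('svg:g[@id="Normal"]');
$found = array(); // floor id => XML of its Base layer

// empty() covers both an empty array and a false return value.
if (!empty($normal)) {
    foreach ($normal[0]->g as $floor) {
        $floor->registerXPathNamespace('svg', 'http://www.w3.org/2000/svg');
        $base = $floor->xpath('svg:g[@id="Base"]');
        // Only touch element 0 when the query actually matched something.
        if (!empty($base)) {
            $found[(string) $floor['id']] = $base[0]->asXML();
        }
    }
}

print_r(array_keys($found));
```

Floors without a Base layer are simply skipped instead of triggering the fatal error.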

can't figure out SimpleXML syntax

I'm trying to parse some items from a log of soap request XML and can't seem to figure out how to get to the innards of a SimpleXMLElement. Here's an example of the XML I'm using:
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="http://fedex.com/ws/rate/v9">
<SOAP-ENV:Body>
<ns1:RateRequest>
<ns1:WebAuthenticationDetail>
<ns1:UserCredential>
<ns1:Key>aaaaaaaaaaa</ns1:Key>
<ns1:Password>aaaaaaaaaaaaaaaaaaa</ns1:Password>
</ns1:UserCredential>
</ns1:WebAuthenticationDetail>
<ns1:ClientDetail>
<ns1:AccountNumber>11111111</ns1:AccountNumber>
<ns1:MeterNumber>88888888</ns1:MeterNumber>
</ns1:ClientDetail>
<ns1:TransactionDetail>
<ns1:CustomerTransactionId>1</ns1:CustomerTransactionId>
</ns1:TransactionDetail>
<ns1:Version>
<ns1:ServiceId>crs</ns1:ServiceId>
<ns1:Major>9</ns1:Major>
<ns1:Intermediate>0</ns1:Intermediate>
<ns1:Minor>0</ns1:Minor>
</ns1:Version>
<ns1:ReturnTransitAndCommit>true</ns1:ReturnTransitAndCommit>
<ns1:RequestedShipment>
<ns1:ShipTimestamp>2013-08-06T12:39:26-04:00</ns1:ShipTimestamp>
<ns1:DropoffType>REGULAR_PICKUP</ns1:DropoffType>
<ns1:Shipper>
<ns1:AccountNumber>11111111</ns1:AccountNumber>
<ns1:Address>
<ns1:StreetLines>24 Seaview Blvd</ns1:StreetLines>
<ns1:City>Port Washington</ns1:City>
<ns1:StateOrProvinceCode>NY</ns1:StateOrProvinceCode>
<ns1:PostalCode>11050</ns1:PostalCode>
<ns1:CountryCode>US</ns1:CountryCode>
</ns1:Address>
</ns1:Shipper>
<ns1:Recipient>
<ns1:Address>
<ns1:StreetLines>1234 Fifth Street</ns1:StreetLines>
<ns1:City>Sixton</ns1:City>
<ns1:StateOrProvinceCode>AR</ns1:StateOrProvinceCode>
<ns1:PostalCode>72712</ns1:PostalCode>
<ns1:CountryCode>US</ns1:CountryCode>
</ns1:Address>
</ns1:Recipient>
<ns1:ShippingChargesPayment>
<ns1:Payor>
<ns1:AccountNumber>11111111</ns1:AccountNumber>
<ns1:CountryCode>US</ns1:CountryCode>
</ns1:Payor>
</ns1:ShippingChargesPayment>
<ns1:RateRequestTypes>ACCOUNT</ns1:RateRequestTypes>
<ns1:PackageCount>1</ns1:PackageCount>
<ns1:PackageDetail>INDIVIDUAL_PACKAGES</ns1:PackageDetail>
<ns1:RequestedPackageLineItems>
<ns1:Weight>
<ns1:Units>LB</ns1:Units>
<ns1:Value>14</ns1:Value>
</ns1:Weight>
<ns1:Dimensions>
<ns1:Length>20</ns1:Length>
<ns1:Width>20</ns1:Width>
<ns1:Height>9</ns1:Height>
<ns1:Units>IN</ns1:Units>
</ns1:Dimensions>
</ns1:RequestedPackageLineItems>
</ns1:RequestedShipment>
</ns1:RateRequest>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
I'm trying to get out values like the CountryCode, PostalCode Weight->Value and all of the Dimensions but the namespaces are confusing me. Any time I parse it, and look in the debugger all I see for any given variable is {SimpleXMLElement} [0] yet some will show output with ->asXML() but most of my attempts to access data either error out, return false, or return another {SimpleXMLElement} [0]. I just want the string/value for these nodes!
The following for example, outputs absolutely nothing:
$fReq = simplexml_load_string($xmlRequest);
$fReq->registerXPathNamespace('ns1', 'http://fedex.com/ws/rate/v9');
$ns = $fReq->getNameSpaces(true);
$rr = $fReq->children($ns['ns1']);
echo $rr->asXML()."\n";
I prefer using XMLReader: http://us3.php.net/manual/en/class.xmlreader.php
This is an example from the PHP manual, with small modifications to match your requirements:
function xmlParse($content,$wrapperName,$callback,$limit=NULL){
$xml = new XMLReader();
$xml->xml($content);
$n=0;
$x=0;
while($xml->read()){
if($xml->nodeType==XMLReader::ELEMENT && $xml->name == $wrapperName){
while($xml->read() && $xml->name != $wrapperName){
if($xml->nodeType==XMLReader::ELEMENT){
$name = $xml->name;
$xml->read();
$value = $xml->value;
if(preg_match("/[^\s]/",$value)){
$subarray[$name] = $value;
}
}
}
if($limit==NULL || $x<$limit){
if($callback($subarray)){
$x++;
}
unset($subarray);
}
$n++;
}
}
$xml->close();
}
xmlParse($test,'SOAP-ENV:Envelope','callback');
function callback($array){
var_dump($array);
}
Just execute this with your XML and it will return:
Array
(
[ns1:Key] => aaaaaaaaaaa
[ns1:Password] => aaaaaaaaaaaaaaaaaaa
[ns1:AccountNumber] => 11111111
[ns1:MeterNumber] => 88888888
[ns1:CustomerTransactionId] => 1
[ns1:ServiceId] => crs
[ns1:Major] => 9
[ns1:Intermediate] => 0
[ns1:Minor] => 0
[ns1:ReturnTransitAndCommit] => true
[ns1:ShipTimestamp] => 2013-08-06T12:39:26-04:00
[ns1:DropoffType] => REGULAR_PICKUP
[ns1:StreetLines] => 1234 Fifth Street
[ns1:City] => Sixton
[ns1:StateOrProvinceCode] => AR
[ns1:PostalCode] => 72712
[ns1:CountryCode] => US
[ns1:RateRequestTypes] => ACCOUNT
[ns1:PackageCount] => 1
[ns1:PackageDetail] => INDIVIDUAL_PACKAGES
[ns1:Units] => IN
[ns1:Value] => 14
[ns1:Length] => 20
[ns1:Width] => 20
[ns1:Height] => 9
)
Regards
You don't need to call registerXPathNamespace unless you're using XPath. You do, however, need to traverse through the SOAP Body element, which means you need to think about two namespaces, not just one.
It's also a good idea not to rely on namespace prefixes like ns1 staying the same in the future; the part that's guaranteed is the actual URI in the xmlns attribute. One way to make this more readable is to define constants for the XML namespaces used in your application, and pass those to the ->children() and ->attributes() methods when necessary.
Note also that ->children() "selects" a namespace until further notice, so once you've selected the Fedex namespace and traversed to the RateRequest node, you can just access elements as though namespaces weren't an issue (since everything below that in this document is in that same namespace).
Here's a completed example showing how this might look (see a live demo here).
// Give your own aliases to namespaces, don't rely on the XML always having the same ones
define('NS_SOAP', 'http://schemas.xmlsoap.org/soap/envelope/');
define('NS_FEDEX_RATE', 'http://fedex.com/ws/rate/v9');
// Parse the XML
$sx = simplexml_load_string($xml_string);
// $sx now represents the <SOAP-ENV:Envelope> element
// Traverse through the SOAP Body
$sx_soap_body = $sx->children(NS_SOAP)->Body;
// Now "switch" to the Fedex namespace, and start with the RateRequest node
$sx_rate_request = $sx_soap_body->children(NS_FEDEX_RATE)->RateRequest;
// Now traverse as normal, e.g. to the Recipient's CountryCode
// (the string cast isn't necessary for echo, but is a good habit to get into)
echo (string)$sx_rate_request->RequestedShipment->Recipient->Address->CountryCode;
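The same traversal reaches the other values asked about (PostalCode, the Weight value and the Dimensions). Below is a sketch using a cut-down copy of the request so that it runs on its own. The $values array is just my way of collecting the strings.

```php
<?php
define('NS_SOAP', 'http://schemas.xmlsoap.org/soap/envelope/');
define('NS_FEDEX_RATE', 'http://fedex.com/ws/rate/v9');

// Cut-down version of the request from the question.
$xml_string = '<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="http://fedex.com/ws/rate/v9">
<SOAP-ENV:Body><ns1:RateRequest><ns1:RequestedShipment>
<ns1:Recipient><ns1:Address>
<ns1:PostalCode>72712</ns1:PostalCode><ns1:CountryCode>US</ns1:CountryCode>
</ns1:Address></ns1:Recipient>
<ns1:RequestedPackageLineItems>
<ns1:Weight><ns1:Units>LB</ns1:Units><ns1:Value>14</ns1:Value></ns1:Weight>
<ns1:Dimensions><ns1:Length>20</ns1:Length><ns1:Width>20</ns1:Width><ns1:Height>9</ns1:Height><ns1:Units>IN</ns1:Units></ns1:Dimensions>
</ns1:RequestedPackageLineItems>
</ns1:RequestedShipment></ns1:RateRequest></SOAP-ENV:Body>
</SOAP-ENV:Envelope>';

$sx = simplexml_load_string($xml_string);

// Envelope -> Body (SOAP namespace) -> RateRequest (Fedex namespace)
$shipment = $sx->children(NS_SOAP)->Body
    ->children(NS_FEDEX_RATE)->RateRequest
    ->RequestedShipment;
$package = $shipment->RequestedPackageLineItems;

// Cast each node to string to get its text content.
$values = array(
    'CountryCode' => (string) $shipment->Recipient->Address->CountryCode,
    'PostalCode'  => (string) $shipment->Recipient->Address->PostalCode,
    'Weight'      => (string) $package->Weight->Value,
    'Length'      => (string) $package->Dimensions->Length,
    'Width'       => (string) $package->Dimensions->Width,
    'Height'      => (string) $package->Dimensions->Height,
);

print_r($values);
```

Without the (string) casts the array would hold SimpleXMLElement objects, which is probably what the debugger was showing as {SimpleXMLElement} [0].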

trying format json a certain way in PHP using array from mysql

I am trying to build a restful web service for my website. I have a php mysql query using the following code:
function mysql_fetch_rowsarr($result, $taskId, $num, $count){
$got = array();
if(mysql_num_rows($result) == 0)
return $got;
mysql_data_seek($result, 0);
while ($row = mysql_fetch_assoc($result)) {
$got[]=$row;
}
print_r($row);
print_r(json_encode($result));
return $got;
}
which returns the following using the print_r($data) in the code above
Array ( [0] => Array ( [show] => Blip TV Photoshop Users TV [region] => UK [url] => http://blip.tv/photoshop-user-tv/rss [resourceType] => RSS / Atom feed [plugin] => Blip TV ) [1] => Array ( [show] => TV Highlights [region] => UK [url] => http://feeds.bbc.co.uk/iplayer/highlights/tv [resourceType] => RSS / Atom feed [plugin] => iPlayer (UK) ) )
Here is the json it returns:
[{"show":"Blip TV Photoshop Users TV","region":"UK","url":"http:\/\/blip.tv\/photoshop-user-tv\/rss","resourceType":"RSS \/ Atom feed","plugin":"Blip TV"},{"show":"TV Highlights","region":"UK","url":"http:\/\/feeds.bbc.co.uk\/iplayer\/highlights\/tv","resourceType":"RSS \/ Atom feed","plugin":"iPlayer (UK)"}]
I am using the following code to add some items to the array then convert it to json and return the json.
$got=array(array("resource"=>$taskId,"requestedSize"=>$num,"totalSize"=>$count,"items"),$got);
using the following code to convert it to json and return it.
$response->body = json_encode($result);
return $response;
this gives me the following json.
[{"resource":"video","requestedSize":2,"totalSize":61,"0":"items"},[{"show":"Blip TV Photoshop Users TV","region":"UK","url":"http:\/\/blip.tv\/photoshop-user-tv\/rss","resourceType":"RSS \/ Atom feed","plugin":"Blip TV"},{"show":"TV Highlights","region":"UK","url":"http:\/\/feeds.bbc.co.uk\/iplayer\/highlights\/tv","resourceType":"RSS \/ Atom feed","plugin":"iPlayer (UK)"}]]
The consumers of the API want the JSON in the following format, and I cannot figure out how to get it to come out this way. I have searched and tried everything I can find and still cannot get it. And I have not even started on the XML formatting.
{"resource":"video", "returnedSize":2, "totalSize":60,"items":[{"show":"Blip TV Photoshop Users TV","region":"UK","url":"http://blip.tv/photoshop-user-tv/rss","resourceType":"RSS / Atom feed","plugin":"Blip TV"},{"show":"TV Highlights","region":"UK", "url":"http://feeds.bbc.co.uk/iplayer/highlights/tv","resourceType":"RSS / Atom feed","plugin":"iPlayer (UK)"}]}
I appreciate any and all help with this. I have set up a copy of the database with read-only access and can share all the source code if that will help. I will warn you that I am only now learning PHP. I learned to program in BASIC and Fortran 77, so the PHP is pretty messy and, I would guess, pretty bloated.
OK, the question above about JSON encoding was answered. The API consumers also want the "/" character not to be escaped, since it is part of a URL. I tried the JSON_UNESCAPED_SLASHES option in json_encode and got the following error:
json_encode() expects parameter 2 to be long
Your $result line should look like
$result=array(
"resource"=>$taskId,
"requestedSize"=>$num,
"totalSize"=>$count,
"items" => $got
);
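For the follow-up about slashes: JSON_UNESCAPED_SLASHES only exists from PHP 5.4 onwards. On older versions the undefined constant is treated as a string, which is most likely where the "expects parameter 2 to be long" message comes from. A sketch assuming PHP 5.4 or later, with made-up sample values standing in for the query results:

```php
<?php
// Sample values standing in for the real query results (assumptions).
$taskId = 'video';
$num = 1;
$count = 61;
$got = array(
    array(
        'show' => 'TV Highlights',
        'region' => 'UK',
        'url' => 'http://feeds.bbc.co.uk/iplayer/highlights/tv',
    ),
);

// "items" is a key whose value is the list of rows, not a separate element.
$result = array(
    'resource' => $taskId,
    'requestedSize' => $num,
    'totalSize' => $count,
    'items' => $got,
);

// Pass the constant itself (no quotes) as the second argument.
$json = json_encode($result, JSON_UNESCAPED_SLASHES);
echo $json, "\n";
```

The escaped form \/ is still valid JSON and decodes to the same URL, so this only matters if the consumers read the raw text.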
