How to manage JMS Serializer serialization rules - php

I am using the xsd2php library to parse an XSD that describes an API request body. Then, using the same library (which itself uses jms-serializer), I try to serialize objects:
$payload = new TrackRequest;
$searchCriteria = new SearchCriteriaAType;
$searchCriteria->addToConsignmentNumber(11111);
$payload->setSearchCriteria($searchCriteria);
$levelOfDetail = new LevelOfDetailAType;
$levelOfDetail->setSummary(true);
$payload->setLevelOfDetail($levelOfDetail);
Using basic serializer settings:
$serializerBuilder = SerializerBuilder::create();
$serializerBuilder->addMetadataDir(__DIR__ . '/../../metadata/Tracking', 'TNTExpressConnect\Tracking\XSD');
$serializerBuilder->setPropertyNamingStrategy(new IdenticalPropertyNamingStrategy);
$serializerBuilder->configureHandlers(function (HandlerRegistryInterface $handler) use ($serializerBuilder) {
    $serializerBuilder->addDefaultHandlers();
    $handler->registerSubscribingHandler(new BaseTypesHandler()); // XMLSchema List handling
    $handler->registerSubscribingHandler(new XmlSchemaDateHandler()); // XMLSchema date handling
});
Serialization results in:
<?xml version="1.0" encoding="UTF-8"?>
<result>
    <searchCriteria>
        <account/>
        <alternativeConsignmentNumber/>
        <consignmentNumber>
            <entry><![CDATA[11111]]></entry>
        </consignmentNumber>
        <customerReference/>
        <pieceReference/>
    </searchCriteria>
    <levelOfDetail>
        <summary>true</summary>
    </levelOfDetail>
</result>
Regarding these results I have several questions:
Why is the root element <result> and not <TrackRequest>?
How do I get rid of CDATA?
How do I get rid of the <entry> tags in favor of creating a separate consignmentNumber tag for each entry?
How do I replace <summary>true</summary> with the self-closing tag <summary/>?
I guess I can create a dedicated handler for each of these cases, but maybe there is a built-in solution that I overlooked in the documentation (maybe some config options that can be placed in YAML).
And if I have to create handlers, maybe someone can point me to a more sophisticated example that explains how to do it right.
I'm not a big fan of annotations, so I would prefer to use separate config files.
Thank you in advance.

You should have a look at the YAML Reference. A lot of things can be set up in the metadata files.
To change the "result" root element to "TrackRequest", add this line to the file:
Vendor\MyBundle\Model\ClassName:
    xml_root_name: TrackRequest ## Changes the root element
To get rid of CDATA in entry, change the property:
    properties:
        entry:
            xml_element:
                cdata: false ## Add this to disable cdata tags
I just came across the same problems as you did. I hope it helps.

Related

Load an XLSX spreadsheet having XML namespaced

I have a set of XLSX files that PhpSpreadsheet cannot load, because simplexml_load_string returns an empty SimpleXMLElement from (for instance) the workbook XML file.
The file has the following format, that can be loaded by simplexml after removing all occurrences of the x: namespace, and the declaration itself (that is, for instance, the <x:workbook> tag has been converted to <workbook>).
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<x:workbook xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6" xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" mc:Ignorable="x15 xr xr6 xr10 xr2" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<x:fileVersion appName="xl" lastEdited="7" lowestEdited="4" rupBuild="23801" />
<x:workbookPr codeName="ThisWorkbook" />
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
<mc:Choice Requires="x15">
<x15ac:absPath xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac" url=".........." />
</mc:Choice>
</mc:AlternateContent>
<xr:revisionPtr revIDLastSave="0" documentId=".........." xr6:coauthVersionLast="46" xr6:coauthVersionMax="46" xr10:uidLastSave="{00000000-0000-0000-0000-000000000000}" />
<x:bookViews>
<x:workbookView xWindow="-120" yWindow="-120" windowWidth="29040" windowHeight="15840" xr2:uid="{00000000-000D-0000-FFFF-FFFF00000000}" />
</x:bookViews>
<x:sheets>
<x:sheet name="......" sheetId="1" r:id="rId1" />
</x:sheets>
<x:calcPr calcId="191029" />
</x:workbook>
I'm not sure the XML file is wrong, since the XLSX file(s) can be opened, for instance, with LibreOffice. Anyway, I have managed to load the file(s) by hacking a simple-minded function cleanup_xml() into Xlsx.php:
//~ http://schemas.openxmlformats.org/spreadsheetml/2006/main"
$xmlWorkbook = simplexml_load_string(
cleanup_xml($this->securityScanner->scan($this->getFromZipArchive($zip, "{$rel['Target']}"))),
'SimpleXMLElement',
Settings::getLibXmlLoaderOptions()
);
Is there a proper, clean way to make the SimpleXML API load such files?
edit:
I was wrong thinking all problems were gone after the cleanup_xml hack.
Seems that also the data rows XML file has problems, probably the same as above...
edit:
Indeed, I moved cleanup_xml() into XmlScanner::scan, to apply to every loaded XML, and now seems to work...
edit:
Seems the namespace declaration is correct, at least, from this simple example...
Then, I wonder why simplexml_load_string doesn't accept the format:
<x:workbook ... xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
....
</x:workbook>
while it apparently accepts
<workbook ... xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
....
</workbook>
edit:
I have dug into the SimpleXML API, and this answer helped me understand the problem. Now I can try to rewrite my hackish cleanup_xml to account for namespaces... I'm just wondering if PhpSpreadsheet offers a better way... it seems strange that this problem has gone unnoticed before...
edit:
OK, now I've found the bug report...
This appears to be a bug in PhpSpreadsheet.
Opening an XLSX file I created this week with a real copy of Microsoft Excel, the "workbook.xml" starts like this:
<workbook
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
mc:Ignorable="x15 xr xr6 xr10 xr2"
xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2">
This declares eight different namespaces that will be used in the document. One happens to be defined as the "default namespace", and the other seven are assigned prefixes - but all of that is just local to this specific file.
If we look at your XML document, we can see all the same namespaces in use, plus an extra one:
<x:workbook
xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"
mc:Ignorable="x15 xr xr6 xr10 xr2"
xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
The only difference is that the namespace "http://schemas.openxmlformats.org/spreadsheetml/2006/main" has been assigned prefix "x", rather than set as the default namespace, but that makes no difference to its meaning. A different library might label the namespaces completely differently, just because of the way it generates the XML:
<ns0:workbook
xmlns:ns0="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:ns1="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:ns2="http://schemas.openxmlformats.org/markup-compatibility/2006"
ns2:Ignorable="x15 xr xr6 xr10 xr2"
xmlns:ns3="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:ns4="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:ns5="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:ns6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:ns7="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2">
As explained in this reference answer, SimpleXML's namespace handling is based around using the ->children() method to select the namespace you want to work with. The correct way to use this is to always specify the namespace URI you want, e.g. "http://schemas.openxmlformats.org/spreadsheetml/2006/main" or "http://schemas.microsoft.com/office/spreadsheetml/2016/revision10".
However, because the same program generally creates XML documents with the same choice of prefixes, it's easy to write incorrect code which relies on:
A particular namespace being the default, and therefore selected before you first call ->children()
Particular namespaces being bound to particular prefixes, and therefore selectable by looking up that prefix
The author of PhpSpreadsheet appears to have made both mistakes, meaning that when you try to load a document created by a different program, it doesn't find the namespaces it expects even though they're actually there.
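The URI-based approach described above can be sketched as follows. This is a minimal illustration, not PhpSpreadsheet code: the helper `sheetNames()` and the two trimmed-down sample documents are assumptions made for the example. Because the helper only ever refers to namespace URIs, it should read the prefixed and the default-namespace form identically:

```php
<?php
// Namespace URIs identify the vocabulary; prefixes are local to each file.
const NS_MAIN = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';
const NS_REL  = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships';

// Hypothetical helper: maps each sheet name to its relationship id.
function sheetNames(string $xml): array
{
    $workbook = simplexml_load_string($xml);
    $result = [];
    // Select children by namespace URI at each level, never by prefix.
    foreach ($workbook->children(NS_MAIN)->sheets->children(NS_MAIN)->sheet as $sheet) {
        // Unprefixed attributes are in no namespace: attributes() without arguments.
        $name = (string) $sheet->attributes()->name;
        // The r:id attribute lives in the relationships namespace.
        $relationId = (string) $sheet->attributes(NS_REL)->id;
        $result[$name] = $relationId;
    }
    return $result;
}

// Same content, once with the "x" prefix and once with a default namespace.
$prefixed = '<x:workbook xmlns:x="' . NS_MAIN . '" xmlns:r="' . NS_REL . '">'
    . '<x:sheets><x:sheet name="Sheet1" sheetId="1" r:id="rId1"/></x:sheets></x:workbook>';
$default = '<workbook xmlns="' . NS_MAIN . '" xmlns:r="' . NS_REL . '">'
    . '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets></workbook>';

var_dump(sheetNames($prefixed) === sheetNames($default));
```

Note the explicit `attributes()` call: after a namespace has been selected with `->children()`, plain array access such as `$sheet['name']` may no longer find unprefixed attributes.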

Tell XMLReader to ignore provided namespace info

I'm using an instance of PHP's built-in XMLReader to read a user-generated XML file. Usually the content of these XML files starts like the following sample snippet, and everything works fine:
<?xml version="1.0" encoding="UTF-8"?>
<openimmo>
<uebertragung art="OFFLINE" umfang="VOLL" version="1.2.7" (...)
However, another user uses a different software to send and generate the XML file. The XML generated by this software starts like:
<?xml version="1.0" encoding="UTF-8"?>
<openimmo xmlns="http://www.openimmo.de" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openimmo.de openimmo.xsd">
<uebertragung art="OFFLINE" umfang="VOLL" version="1.2.7" (...)
Which causes my importer to fail with the following error:
XMLReader::read(): Element '{http://www.openimmo.de}openimmo': No matching global declaration available for the validation root.
I'm already doing validation by manually applying some XSD schema. The passed file follows the same schema, just explicitly specifies the xmlns attributes. How can I work around this issue? How can I tell XMLReader to just ignore that xmlns statement?
My code (simplified to the relevant sections) looks like the following snippet:
$reader = new XMLReader();
$success = @$reader->open($path);
if (!$success) { /* error handling */ }
$reader->setSchema($localOpenImmoXsdPath);
/* then starts reading and throws the above exception */
Namespace information is fundamental and there's no way an XML parser is going to ignore it.
Your options are either (a) send the file back to sender, saying it doesn't conform to the agreed schema, or (b) transform the file sent to you so that it does conform, by changing the namespace. That's a fairly simple XSLT transformation.
My immediate instinct was to look at the OpenImmo specs to see what they say about namespaces and schema conformance, but unfortunately access to the specs requires registration and licensing. Basically, either the specs allow both these formats, which would be a pretty shoddy spec, or they only allow one of them, in which case you shouldn't be accepting both.
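Option (b) could look roughly like the sketch below. It assumes that PHP's xsl extension is enabled and that the local schema has no target namespace, so the transformation simply drops all namespace bindings; the helper name `stripNamespaces()` and the shortened sample document are made up for the illustration:

```php
<?php
// The stylesheet rebuilds every element under its local name only,
// which removes the namespace binding from the result.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    exclude-result-prefixes="xsi">
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
  <!-- Drop xsi:* attributes such as xsi:schemaLocation. -->
  <xsl:template match="@xsi:*"/>
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Hypothetical helper: returns the document with all namespaces removed.
function stripNamespaces(string $xml, string $stylesheet): string
{
    $xsl = new DOMDocument();
    $xsl->loadXML($stylesheet);
    $source = new DOMDocument();
    $source->loadXML($xml);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($xsl);
    return (string) $processor->transformToXml($source);
}

// Shortened stand-in for the second sender's file.
$input = '<openimmo xmlns="http://www.openimmo.de"'
    . ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    . ' xsi:schemaLocation="http://www.openimmo.de openimmo.xsd">'
    . '<uebertragung art="OFFLINE" umfang="VOLL" version="1.2.7"/></openimmo>';

$clean = stripNamespaces($input, $stylesheet);
echo $clean;
```

The cleaned string could then be handed to the reader (for example through `XMLReader::XML()`) before `setSchema()` is called. If the agreed schema does declare a target namespace, the stylesheet would have to map elements into that namespace instead of removing it.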

SimpleXML parent - child issue

I am having an issue with parsing an XML file using SimpleXML and PHP.
The XML file in question is provided by a third party and includes a number of child elements (going down multiple levels) within it. I know which elements I require and can see them within the XML file, but I just can't seem to get them to print using PHP.
Example XML feed for test.xml:
<?xml version="1.0" encoding="utf-8"?>
<Element1 xmlns="" release="8.1" environment="Production" lang="en-US">
<Element2>
<Element3>
<Element4>
<Element5>it worked</Element5>
</Element4>
</Element3>
</Element2>
</Element1>
The file only includes one of each element, so I can be very specific with the request. The code I have so far is below:
$lib=simplexml_load_file("test.xml");
$make=$lib->Element1->Element2->Element3->Element4->Element5;
print $make;
I have tried to look this up before asking, but the only solutions I can find are for cases where the child elements are unknown or there are multiple results for each request, which is not the case here.
Any help or guidance would be gratefully received.
Thanks
In your code above, $lib is Element1. So you just need to drop one of your references. This:
$make=$lib->Element1->Element2->Element3->Element4->Element5;
Should become this:
$make=$lib->Element2->Element3->Element4->Element5;
Also, SimpleXML is an awful awful awful awful interface (considering that "Simple" is in the name and there is mass confusion about how to use it). I would always recommend DOMDocument instead.
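A short sketch of the point about the root element, with the DOMDocument equivalent next to it for comparison. The sample string is a shortened copy of the feed from the question; the variable names are only illustrative:

```php
<?php
$source = '<Element1 release="8.1"><Element2><Element3><Element4>'
    . '<Element5>it worked</Element5></Element4></Element3></Element2></Element1>';

// SimpleXML: the loaded object already represents the root element.
$lib = simplexml_load_string($source);
$rootName = $lib->getName();
$viaSimpleXml = (string) $lib->Element2->Element3->Element4->Element5;

// DOMDocument: the document and its root element are separate objects.
$dom = new DOMDocument();
$dom->loadXML($source);
$viaDom = $dom->getElementsByTagName('Element5')->item(0)->textContent;

echo $rootName, ': ', $viaSimpleXml, ' / ', $viaDom, PHP_EOL;
// prints "Element1: it worked / it worked"
```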
I'd strongly recommend using XPath, as it will give you more flexibility, e.g. it allows you to restrict results based on XML node attributes.
$xml = simplexml_load_string('<?xml version="1.0" encoding="utf-8"?>
<Element1 xmlns="" release="8.1" environment="Production" lang="en-US">
<Element2>
<Element3>
<Element4>
<Element5>it worked</Element5>
</Element4>
</Element3>
</Element2>
</Element1>');
$data=$xml->xpath('/Element1/Element2/Element3/Element4/Element5');
echo (string)$data[0]; //outputs 'it worked'
//this also works
$data=$xml->xpath('//Element5');
echo (string)$data[0]; //outputs 'it worked'

Update DomNode pointers when inserting to another document

I have a DOM structure which acts as a template for building a larger document. The template looks something like this (oversimplified example)
<book> // $cache[0]
<data></data>
<author></author> // $cache[1]
<published>
<company></company> // $cache[3]
<date></date>
</published>
<blurb></blurb>
<related></related> // $cache[2]
</book>
As you can hopefully see, I cache certain nodes within this template with the hope of doing expensive searches only once. (XPath is unusable in this situation due to the strict standards of the template.)
The above template will be added to a document looking like this:
<store>
<genre>
<computing>
// Insert here
</computing>
<nature>
// Again here
</nature>
</genre>
</store>
Basically, it can be inserted anywhere. The problem I can't figure out how to solve is how to keep or quickly update the cache points after the template has been inserted with methods like appendChild and insertBefore. The only solution I can see is to re-search the inserted node, but like I mentioned, this is expensive and certain tags which aided the first search will have been removed.
I find the insert points the way any template engine does, by iterating over the DOM and performing actions on certain handlers, e.g. {{book}} will request that the above template be inserted.
The cache is simply an array of DomNodes, but this can easily be changed if there is a better cross-document method. I'm open to suggestions or pointers to code that has implemented something similar.
I solved this by not caching the DomNode but rather a path to the node. I first looked at getNodePath() which returns an XPath to the node, but just looking at the returned path I saw that XPath must do a lot of branching under the hood. So I came up with this:
foreach ( $node->childNodes as $child ) {
    $index++;
    $path = $path . "->childNodes->item($index)";
}
Then after inserting the node into the second document, those cache points can be quickly referenced by
eval("\$node = \$node$path;");
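The same idea can be sketched without eval() by caching a list of child indexes instead of a code string. This is an alternative illustration rather than the code used above; `indexPath()` and `resolvePath()` are hypothetical helper names, and the sample documents contain no whitespace text nodes, so the indexes stay easy to follow:

```php
<?php
// Hypothetical helper: records the child index at each level from $root down to $node.
function indexPath(DOMNode $root, DOMNode $node): array
{
    $path = [];
    while (!$node->isSameNode($root)) {
        if ($node->parentNode === null) {
            throw new InvalidArgumentException('Node is not a descendant of the root.');
        }
        // Count preceding siblings to get this node's position under its parent.
        $index = 0;
        for ($sibling = $node->previousSibling; $sibling !== null; $sibling = $sibling->previousSibling) {
            $index++;
        }
        array_unshift($path, $index);
        $node = $node->parentNode;
    }
    return $path;
}

// Hypothetical helper: follows a recorded index path from any copy of the root.
function resolvePath(DOMNode $root, array $path): DOMNode
{
    foreach ($path as $index) {
        $root = $root->childNodes->item($index);
    }
    return $root;
}

$template = new DOMDocument();
$template->loadXML('<book><data/><author/><published><company/><date/></published></book>');
$company = $template->getElementsByTagName('company')->item(0);
$path = indexPath($template->documentElement, $company);

// Copy the template into a second document and find the cached node again.
$store = new DOMDocument();
$store->loadXML('<store><genre><computing/></genre></store>');
$copy = $store->importNode($template->documentElement, true);
$store->getElementsByTagName('computing')->item(0)->appendChild($copy);

echo resolvePath($copy, $path)->nodeName, PHP_EOL; // prints "company"
```

The path only stays valid as long as the structure of the inserted copy matches the template, which is the same limitation the eval() version has.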

Cakephp generating xml error - blank space

I am trying to generate a dynamic xml document in CakePHP to output to the browser.
Here is my controller code:
Configure::write ('debug', 0);
$this->layout = null;
header('Content-type: text/xml');
echo "<?xml version=\"1.0\"?>";
View is something like this:
<abc>
something
</abc>
The output is probably as expected:
<?xml version="1.0"?><abc>something</abc>
The only problem is that there is a space before <?xml giving me an error:
XML Parsing Error: XML or text declaration not at start of entity
Line Number 1, Column 2:
<?xml version="1.0"?><abc> something </abc>
-^
I know this problem in PHP, when you have php-start and end tags it leaves a space and creates problems, so, I tried to move the line echo "<?xml ver... to controller from the view to avoid that but it didn't help.
Thanks in advance.
-happyhardik
Yes, the problem is most likely a space after a PHP end tag somewhere.
As the PHP end tag is not mandatory, remove any end tag from all your models (if there are any), from the controller you're asking about, from app_controller.php and app_model.php, and from your view helpers... It will be somewhere, but it is not easy to find.
EDIT: In fact it could also be a space before the PHP begin tag. Look into those files and check that the begin tag is at the absolute beginning of the file.
EDIT AGAIN: Some people have written scripts that do this for you automatically. Take a look at:
http://ragrawal.wordpress.com/2007/11/07/script-for-removing-blank-spaces-before-and-after-php-tags/
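A check along those lines can also be sketched in a few lines of PHP. This is not the linked script, just a minimal illustration; `findStrayWhitespace()` is a hypothetical helper that inspects the source of a single file:

```php
<?php
/*
 * Hypothetical helper: reports output-producing whitespace around the PHP
 * tags in one file's source. PHP itself swallows a single newline directly
 * after a closing tag, so that case is not reported.
 */
function findStrayWhitespace(string $source): array
{
    $problems = [];
    // Whitespace or a UTF-8 byte order mark in front of the opening tag.
    if (preg_match('/\A(?:\xEF\xBB\xBF|\s)+<\?php/', $source)) {
        $problems[] = 'leading';
    }
    // Anything more than one newline after a closing tag at the end of the file.
    if (preg_match('/\?>(\s+)\z/', $source, $match)
        && !in_array($match[1], ["\n", "\r\n"], true)) {
        $problems[] = 'trailing';
    }
    return $problems;
}

var_dump(findStrayWhitespace(" <?php echo 1;"));
var_dump(findStrayWhitespace("<?php echo 1; ?>\n\n"));
```

To scan a project, the function could be applied to `file_get_contents()` of every .php and view file in the app directory.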
Actually, I find that it is most often a space AFTER the closing ?> tag in the layout file.
Also you should know that if you use the RequestHandler component and Router::parseExtensions( 'xml' ) in your routes.php you will automatically get the XmlHelper for use in your xml views.
The XmlHelper has a few neat functions in it. Check it out.
<?php
echo $xml->header();
/* outputs <?xml version="1.0" encoding="UTF-8" ?> (a block comment is used here because a closing tag inside a // comment would end PHP mode) */
?>
The links for the RequestHandler component and the XmlHelper:
http://book.cakephp.org/view/174/Request-Handling
http://book.cakephp.org/view/380/XML
Even though this does not answer the question directly, I thought it would be worth mentioning how easy it is to create dynamic XML views automatically using CakePHP's JSON and XML views, in case people don't want to do it manually, as seems to be the case above.
Step one: Add Router::parseExtensions(); to your routes.php file.
Step two: Ensure the RequestHandler component is included in the relevant controller by adding public $components = array('RequestHandler');
Step three: Now we only have to load some data and then display it as XML or JSON automatically. Add something like the code below:
public function xml_view () {
    $this->set('data_array', $this->Model->find('all'));
    $this->set('_serialize', array('data_array'));
}
That's literally all we need to do to generate an XML or JSON response for the xml_view action. It is not even necessary to set up a view file. When your request is .../controller/xml_view.xml, CakePHP will return an XML document, and when .json is the extension, a JSON response will be generated. So easy I can't believe it!
