Generate an XML file with ZF2 - php

When I started learning ZF2, I saw that Zend\Config\Writer can write a config as XML and put it into a file. I also came across the PHP classes DOMDocument and SimpleXMLElement.
I have to generate a complex and extremely large XML file with a lot of data from my database.
With DOMDocument, with about 30% of the work done, my code is already too complicated and no longer maintainable.
Here is a sample of what my XML has to look like:
//...a lot more XML before
<Lines>
<line lineNum="Num">
</line>
<line lineNum="Num2">
</line>
</Lines>
//A lot more XML after
Those lines can be created by a foreach loop. Zend\Config\Writer can't do that (especially the attribute parts), can it?
My question is:
Is there a better way, which I don't know of yet, to generate XML with Zend Framework 2?
P.S.: I'm looking for an object-oriented solution.
Thanks for your help.

I'm looking for a ZF2-internal solution too, @Hooli. Hoping to get some answers on Twitter too:
What is the best way to generate / write #XML in #ZF2? Make use of \Zend\Feed\Writer or an external class like #SimpleXML? Any known module?
Maybe one can use the Zend\Feed\Writer for these cases as a workaround but I would prefer a ZF2 module or similar.
Otherwise @Tim is right, SimpleXML is a good solution. For bigger files, XMLWriter is recommended.
Click here for a comparison of the two and DOM.

I just took a look at the Zend\Config\Writer\Xml class.
Fortunately, XMLWriter is already available as a PHP extension!
You can easily inspect the processConfig() method and adapt it to your needs. For my use case I even reused the complete addBranch() method.
About XMLWriter:
PHP Manual
This extension represents a writer that provides a non-cached, forward-only means of generating streams or files containing XML data.
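As a rough illustration of the XMLWriter approach for the <Lines> fragment from the question, here is a minimal sketch. The $rows array and the buildLinesXml() helper are hypothetical stand-ins for the real database data; nothing here comes from ZF2 itself.

```php
<?php
// Hypothetical rows, standing in for data fetched from the database.
$rows = [
    ['lineNum' => 'Num'],
    ['lineNum' => 'Num2'],
];

// Builds the <Lines> fragment with XMLWriter, one <line> per row.
function buildLinesXml(array $rows): string
{
    $writer = new XMLWriter();
    $writer->openMemory();          // openUri('file.xml') would stream to disk instead
    $writer->setIndent(true);
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('Lines');
    foreach ($rows as $row) {
        $writer->startElement('line');
        $writer->writeAttribute('lineNum', $row['lineNum']);
        $writer->endElement();      // closes <line>
    }
    $writer->endElement();          // closes <Lines>
    $writer->endDocument();
    return $writer->outputMemory();
}

$xml = buildLinesXml($rows);
echo $xml;
```

Because XMLWriter is forward-only, memory use stays low even for very large documents, which is why it tends to suit "heavy" exports better than DOMDocument.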

Load an XLSX spreadsheet with namespaced XML

I have a set of XLSX files that PhpSpreadsheet cannot load, because simplexml_load_string returns an empty SimpleXMLElement from (for instance) the workbook XML file.
The file has the following format, which SimpleXML can load once all occurrences of the x: namespace prefix, and the declaration itself, have been removed (that is, for instance, the <x:workbook> tag has been converted to <workbook>).
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<x:workbook xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6" xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" mc:Ignorable="x15 xr xr6 xr10 xr2" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<x:fileVersion appName="xl" lastEdited="7" lowestEdited="4" rupBuild="23801" />
<x:workbookPr codeName="ThisWorkbook" />
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
<mc:Choice Requires="x15">
<x15ac:absPath xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac" url=".........." />
</mc:Choice>
</mc:AlternateContent>
<xr:revisionPtr revIDLastSave="0" documentId=".........." xr6:coauthVersionLast="46" xr6:coauthVersionMax="46" xr10:uidLastSave="{00000000-0000-0000-0000-000000000000}" />
<x:bookViews>
<x:workbookView xWindow="-120" yWindow="-120" windowWidth="29040" windowHeight="15840" xr2:uid="{00000000-000D-0000-FFFF-FFFF00000000}" />
</x:bookViews>
<x:sheets>
<x:sheet name="......" sheetId="1" r:id="rId1" />
</x:sheets>
<x:calcPr calcId="191029" />
</x:workbook>
I'm not sure the XML file is wrong, since the XLSX file(s) can be opened, for instance, with LibreOffice. Anyway, I have managed to load the file(s) by hacking a simple-minded function cleanup_xml() into Xlsx.php:
//~ http://schemas.openxmlformats.org/spreadsheetml/2006/main"
$xmlWorkbook = simplexml_load_string(
cleanup_xml($this->securityScanner->scan($this->getFromZipArchive($zip, "{$rel['Target']}"))),
'SimpleXMLElement',
Settings::getLibXmlLoaderOptions()
);
Maybe there is a proper/clean way to force the SimpleXML API to load such files?
edit:
I was wrong in thinking all problems were gone after the cleanup_xml hack.
It seems that the data rows XML file also has problems, probably the same as above...
edit:
Indeed, I moved cleanup_xml() into XmlScanner::scan, so that it applies to every loaded XML file, and now it seems to work...
edit:
It seems the namespace declaration is correct, at least judging from this simple example...
So I wonder why simplexml_load_string doesn't accept the format:
<x:workbook ... xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
....
</x:workbook>
while it apparently accepts
<workbook ... xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
....
</workbook>
edit
I have dug into the SimpleXML API; this answer helped me understand the problem. Now I can try to rewrite my hackish cleanup_xml to account for namespaces... I just wonder whether PhpSpreadsheet offers a better way... it seems strange that this problem has gone unnoticed before...
edit
ok, now I've found the bug report...
This appears to be a bug in PhpSpreadsheet.
Opening an XLSX file I created this week with a real copy of Microsoft Excel, the "workbook.xml" starts like this:
<workbook
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
mc:Ignorable="x15 xr xr6 xr10 xr2"
xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2">
This declares eight different namespaces that will be used in the document. One happens to be defined as the "default namespace", and the other seven are assigned prefixes - but all of that is just local to this specific file.
If we look at your XML document, we can see all the same namespaces in use, plus an extra one:
<x:workbook
xmlns:x15ac="http://schemas.microsoft.com/office/spreadsheetml/2010/11/ac"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"
mc:Ignorable="x15 xr xr6 xr10 xr2"
xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
The only difference is that the namespace "http://schemas.openxmlformats.org/spreadsheetml/2006/main" has been assigned prefix "x", rather than set as the default namespace, but that makes no difference to its meaning. A different library might label the namespaces completely differently, just because of the way it generates the XML:
<ns0:workbook
xmlns:ns0="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:ns1="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:ns2="http://schemas.openxmlformats.org/markup-compatibility/2006"
ns2:Ignorable="x15 xr xr6 xr10 xr2"
xmlns:ns3="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main"
xmlns:ns4="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:ns5="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6"
xmlns:ns6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10"
xmlns:ns7="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2">
As explained in this reference answer, SimpleXML's namespace handling is based around using the ->children() method to select the namespace you want to work with. The correct way to use this is to always specify the namespace URI you want, e.g. "http://schemas.openxmlformats.org/spreadsheetml/2006/main" or "http://schemas.microsoft.com/office/spreadsheetml/2016/revision10".
However, because the same program generally creates XML documents with the same choice of prefixes, it's easy to write incorrect code which relies on:
A particular namespace being the default, and therefore selected before you first call ->children()
Particular namespaces being bound to particular prefixes, and therefore selectable by looking up that prefix
The author of PhpSpreadsheet appears to have made both mistakes, meaning that when you try to load a document created by a different program, it doesn't find the namespaces it expects even though they're actually there.
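To make the point about selecting by namespace URI concrete, here is a minimal sketch. It is not PhpSpreadsheet's code: the sheetNames() helper and the two trimmed-down sample documents are made up for illustration. The same function reads the sheet name whether the main namespace is bound to the prefix "x" or declared as the default.

```php
<?php
const MAIN_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';

// Hypothetical helper: returns the sheet names of a workbook.xml string,
// selecting elements by namespace URI rather than by prefix.
function sheetNames(string $xml): array
{
    $workbook = simplexml_load_string($xml);
    $names = [];
    foreach ($workbook->children(MAIN_NS)->sheets->children(MAIN_NS)->sheet as $sheet) {
        // The "name" attribute has no prefix, so it is in no namespace;
        // attributes() with no argument selects exactly those attributes.
        $names[] = (string) $sheet->attributes()->name;
    }
    return $names;
}

// Same content, two different ways of labelling the namespace.
$prefixed = '<x:workbook xmlns:x="' . MAIN_NS . '">'
    . '<x:sheets><x:sheet name="Data" sheetId="1"/></x:sheets></x:workbook>';
$default = '<workbook xmlns="' . MAIN_NS . '">'
    . '<sheets><sheet name="Data" sheetId="1"/></sheets></workbook>';

print_r(sheetNames($prefixed));
print_r(sheetNames($default));
```

Code written this way does not care which prefixes the generating program happened to choose.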

read a password-protected page

I'm trying to read specific div elements of a website with a script written in either PHP or Perl.
Unfortunately, the site requires a login before that specific page can be read. As far as I can see, it's SSL-protected. I'm not looking for a complete solution; I just need a hint about the best way to give the script the information needed for logging in (user + password) before reading parts of the source code of the page that comes afterwards.
I'm not quite sure whether it's better to do this with Perl or PHP, so I have tagged this question with both languages.
Mojo::UserAgent (see cookbook) has a built-in cookie jar and can do SSL if you have IO::Socket::SSL installed. It has a DOM parser which can easily use CSS3 selectors to traverse the returned result. And if that wasn't good enough, the whole thing can be used non-blocking (if that's something you need).
Mojo::UserAgent and the other tools listed above are parts of the Mojolicious suite of tools. It's a Perl library, and I would certainly recommend Perl for this task since it is a more general purpose language than PHP is.
Here is a very simplistic example that gets the text from all the links inside a div with the class myclass:
use Mojo::UserAgent;
my $ua = Mojo::UserAgent->new;
$ua->post( 'http://mysite.com/login' => form => { ... } );
my @link_text =
$ua->get( 'http://mysite.com/protected/page' )
->res
->dom('div.myclass a')
# map('text') calls ->text on every matched element
->map('text')
->each;
In fact, running this shell command may be enough to get you started (depending on permissions):
curl -L cpanmin.us | perl - -n Mojolicious IO::Socket::SSL
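Since the question also allows PHP, the extraction half of the Perl example can be sketched with DOMDocument and DOMXPath. This is only an illustration under assumptions: the linkTextInDiv() helper and the sample HTML are hypothetical, and the login step (posting the credentials and keeping the session cookie) is left out, so the HTML is passed in as a string.

```php
<?php
// Hypothetical helper: returns the text of every link inside a div
// that carries the given class.
function linkTextInDiv(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]//a',
        $class
    );
    $texts = [];
    foreach ($xpath->query($query) as $link) {
        $texts[] = trim($link->textContent);
    }
    return $texts;
}

// Sample page standing in for the protected page fetched after login.
$html = '<html><body>'
    . '<div class="box myclass"><a href="/a">First</a> <a href="/b">Second</a></div>'
    . '<div class="other"><a href="/c">Ignored</a></div>'
    . '</body></html>';

print_r(linkTextInDiv($html, 'myclass'));
```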

Parsing Wordpress XML file in PHP

I'm migrating a big WordPress site to a custom CMS. I need to extract information from a big (20MB+) XML file exported from WordPress.
I don't have any experience with XML in PHP and I don't know how to start reading the file.
Wordpress file contains structures like this:
<excerpt:encoded><![CDATA[Encoded text here]]></excerpt:encoded>
and I don't know how to handle this in PHP.
You are probably going to do fine with simplexml:
$xml = simplexml_load_file('big_xml_file.xml');
foreach ($xml->element as $el) {
echo $el->name;
}
See php.net for more info
Unfortunately, your XML example didn't come through.
PHP5 ships with two extensions for working with XML - DOM and "SimpleXML".
Generally speaking, I recommend looking into SimpleXML first since it's the more accessible library of the two.
For starters, use "simplexml_load_file()" to read an XML file into an object for further processing.
You should also check out the "SimpleXML basic examples page on php.net".
I don't have any experience in XML under PHP
Take a look at simplexml_load_file() or DomDocument.
<excerpt:encoded><![CDATA[Encoded text here]]></excerpt:encoded>
This should not be a problem for the XML parser. However, you will have a problem with the content exported by WordPress. For example, it can contain WordPress shortcodes, which will come across in their raw format instead of expanded.
Better Approach
Determine if what you are migrating to supports an export from WordPress feature. Many other systems do - Drupal, Joomla, Octopress, etc.
Although Adam is absolutely right, his answer needs a bit more detail. Here's a simple script that should get you going.
$xmlfile = simplexml_load_file('yourxmlfile.xml');
foreach ($xmlfile->channel->item as $item) {
var_dump($item->xpath('title'));
var_dump($item->xpath('wp:post_type'));
}
simplexml_load_file() is the way to go for creating an object, but you will also need to use xpath, as WordPress uses namespaces. If I remember correctly, SimpleXML does not handle namespaces well, or at all.
$xml = simplexml_load_file( $file );
$xml->xpath('/rss/channel/wp:category');
I would recommend looking at what WordPress uses for importing the files.
https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/class-wp-importer.php
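For the <excerpt:encoded> element from the question, here is a minimal sketch that reads it with SimpleXML by selecting the namespace through children(). The namespace URI and the tiny sample document are assumptions for illustration; check the xmlns:excerpt declaration at the top of your own export file and use that URI.

```php
<?php
// Assumed namespace URI; copy the real one from the xmlns:excerpt
// declaration in your own export file.
const EXCERPT_NS = 'http://wordpress.org/export/1.2/excerpt/';

// Tiny stand-in for a WordPress export file.
$wxr = <<<XML
<rss version="2.0" xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/">
  <channel>
    <item>
      <title>Hello world</title>
      <excerpt:encoded><![CDATA[Encoded text here]]></excerpt:encoded>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($wxr);   // simplexml_load_file() for a real file
$excerpts = [];
foreach ($xml->channel->item as $item) {
    // children(URI) selects the elements in the excerpt namespace;
    // casting to string returns the CDATA content as plain text.
    $excerpts[(string) $item->title] = (string) $item->children(EXCERPT_NS)->encoded;
}

print_r($excerpts);
```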

Multi-language XML for PHP integration

I have a website where I want to show some strings that may change according to the user's language and other parameters. I was thinking of an XML file like:
<strings>
<EN>
<userop1>This is the option 1</userop1>
</EN>
<ES>
<userop1>Esta es la opcion 1</userop1>
</ES>
</strings>
Then, in PHP, something like: echo("You selected: ".$userop1);
I really don't know if this is the smartest way to structure the XML, so I'm asking for suggestions (please include an example reading script). Thanks for any help!
Why are you using XML? It adds performance overhead.
You should use constants or arrays:
$lang['en']['title'] = "title";
or separate files for each set of constants/language:
file: translate.en.php
define('TITLE', 'title');
Since PHP is stateless, every page hit in your app will cause the system to parse the XML again.
There is no need for that.
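A minimal sketch of the array-based approach, assuming the strings from the question. The translate() helper and its fallback behaviour are illustrative choices, not something the answer specifies.

```php
<?php
// Hypothetical string tables; in practice each language could live in
// its own PHP file that returns an array.
$lang = [
    'en' => ['userop1' => 'This is the option 1'],
    'es' => ['userop1' => 'Esta es la opcion 1'],
];

// Looks up a key for the given locale, falls back to the default
// language, and finally to the key itself.
function translate(array $lang, string $locale, string $key, string $fallback = 'en'): string
{
    return $lang[$locale][$key] ?? $lang[$fallback][$key] ?? $key;
}

echo 'You selected: ' . translate($lang, 'es', 'userop1'), PHP_EOL;
```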
I think you shouldn't have all languages in a single XML file: it may get too big, will be harder to maintain, and so on. Instead, make one XML file for each language.
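If you do stay with XML, a reading script for the one-file-per-language layout could look like the sketch below. The file names and the loadStrings() helper are hypothetical, and the file contents are inlined as strings to keep the example self-contained; with real files you would call simplexml_load_file() instead.

```php
<?php
// Stand-ins for the contents of hypothetical strings.en.xml / strings.es.xml.
$files = [
    'en' => '<strings><userop1>This is the option 1</userop1></strings>',
    'es' => '<strings><userop1>Esta es la opcion 1</userop1></strings>',
];

// Turns one language file into a plain key => text array.
function loadStrings(string $xml): array
{
    $strings = [];
    foreach (simplexml_load_string($xml)->children() as $node) {
        $strings[$node->getName()] = (string) $node;
    }
    return $strings;
}

$strings = loadStrings($files['es']);
echo 'You selected: ' . $strings['userop1'], PHP_EOL;
```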

Creating a "two way" configuration file

I am writing a PHP application targeted at non-geeks, non-programmers. I need to create an option page with a bunch of "options" and then store those options...somewhere. Using a database application (MySQL/PostgreSQL/SQLite) is out of the question because it will require more configuration than the user needs to do (I don't want the user to do any kind of configuration if he doesn't want to). So the only solution left is to write the configuration to a configuration file. On the other hand, I also want that configuration file to be human-readable in case the user is a geek and he wants to edit the config file directly (or if he wants to edit the file remotely via SSH or any kind of reason...)
Here are the couple of potential solutions I found:
Using a JSON file...
...Retrieve the data from the file, using json_decode to convert the data, output it into HTML, retrieve any changes, encode back using json_encode, etc. You get the picture. There are a couple of things that I don't like about this method, the main one being that the JSON data encoded by PHP will not be well formatted and will be very hard to edit without being reformatted beforehand.
Using an XML file
I won't describe that solution because I don't really like it either...and I don't know how to use XSLT and don't really want to learn...and because it's a pretty heavyweight solution, at least compared to the JSON solution. Correct me if I'm wrong.
Using an INI file
I love INI files, really I love them! I think they're really the most readable, and it's hard to mess up (ie: syntax errors). The problem with that solution is that there is no native way to write/edit an ini file. I found a topic showing a custom method to write one...that might be the solution I will adopt if I don't find anything better...
Using two files
That last solution seems as reasonable as the INI solution. In fact, I could use an INI file as "input" (the file that the user would edit if he wants to) and an XML/JSON file as output (the file that will be edited by PHP every time the user changes options using the web front-end). At this point, the best solution would be to ask the user to reload the configuration manually if he edited the config file directly, so that the "output" file is always up to date.
I know none of the solutions above are perfect, and that's why I created this topic to ask for advice. What is the best solution? Maybe (probably) I missed yet another solution.
One last thing: YAML isn't a valid solution because it's a lot easier to mess up the syntax if you're not used to it. PHP is not a solution either because editing PHP with PHP is a pain. PHP is only a good solution if I want to retrieve some configuration but not edit it directly via a web front-end.
ini
I'd write ini files, myself. As you said, the syntax is very simple, and that's what you want in a config file. The ini format's "key+value" pairing is exactly what you'd get with a database—without the database.
Related SO you may have seen already: create ini file, write values in PHP
Plus you can use parse_ini_file() to read it.
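Since PHP has no built-in INI writer, here is a minimal sketch of one, assuming a two-level section => key => value array. The buildIni() helper and the sample options are hypothetical, and the sketch deliberately ignores edge cases such as values that contain double quotes.

```php
<?php
// Hypothetical helper: serialises a section => [key => value] array
// to INI text. Values containing double quotes are NOT handled.
function buildIni(array $config): string
{
    $out = '';
    foreach ($config as $section => $values) {
        $out .= "[$section]\n";
        foreach ($values as $key => $value) {
            if (is_bool($value)) {
                $value = $value ? 'true' : 'false';
            } elseif (!is_numeric($value)) {
                $value = '"' . $value . '"';   // quote plain strings
            }
            $out .= "$key = $value\n";
        }
        $out .= "\n";
    }
    return $out;
}

// Sample options, made up for illustration.
$config = [
    'site' => ['title' => 'My site', 'items_per_page' => 20],
    'mail' => ['enabled' => true],
];

$ini = buildIni($config);
echo $ini;   // file_put_contents('config.ini', $ini) would save it

// Read it back; parse_ini_file() does the same for a file on disk.
$roundTrip = parse_ini_string($ini, true, INI_SCANNER_TYPED);
```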
XML
XML isn't all that bad. It may be more work to write it (and may not be as clear to the user as an ini file), but reading it is really easy.
Create it:
<?php
// Create file
$xml = new SimpleXMLElement( '<?xml version="1.0" ?><config></config>' );
// Add stuff to it
$xml->addChild( 'option1' );
$xml->option1->addAttribute( 'first_name', 'billy' );
$xml->option1->addAttribute( 'middle_name', 'bob' );
$xml->option1->addAttribute( 'last_name', 'thornton' );
$xml->addChild( 'option2' );
$xml->option2->addAttribute( 'fav_dessert', 'cookies' );
// Save
$xml->asXML( 'config.xml' );
?>
Read it:
<?php
// Load
$config = new SimpleXMLElement( file_get_contents( 'config.xml' ) );
// Grab parts of option1
foreach( $config->option1->attributes() as $var )
{
echo $var.' ';
}
// Grab option2
echo 'likes '.$config->option2['fav_dessert'];
?>
Which gives you:
billy bob thornton likes cookies
Documentation for SimpleXML
SimpleXML Docs index
Basic Examples
Details on addChild() and addAttribute(), showing how to generate various XML structures (nested tags vs. attributes, for example)
I'd go with the ini. They're really not that hard to write. I personally hate XML. It's so bloated... even if the file size doesn't matter, it still makes me cringe at its wordiness and the amount of typing I have to do. Plus, people are dumb. They won't close their tags.
The standard way would be XML files. They don't create that much overhead and are easily extensible. However, JSON files are the easiest on the programming end.
I'd rank my preference:
XML
JSON
ini (last resort)
Unless you have 1000+ options, you really shouldn't worry about the XML file size. The goal here is to keep things easy for the user. This means that whichever method you choose (JSON shouldn't be one of them in my opinion), it should be heavily documented at each config line.
Your two file solution brings me back to the days of sendmail config and makes me shudder.
I would just go with XML; it's self-documenting to a point: <Email>hi@hi.hi</Email>
Well, you could use PHP's serialize(), and although it is human readable, it isn't the most human readable thing there is. It's on the same level as JSON to implement.
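On the JSON option: the formatting concern raised in the question can be eased with the JSON_PRETTY_PRINT flag of json_encode() (available since PHP 5.4). A minimal sketch, with made-up option names:

```php
<?php
// Sample options, made up for illustration.
$options = [
    'site_title'     => 'My site',
    'items_per_page' => 20,
    'features'       => ['comments' => true],
];

// Pretty-printed output puts one key per line, which is much easier to
// edit by hand than the default single-line form.
$json = json_encode(
    $options,
    JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
);
echo $json, PHP_EOL;   // file_put_contents('config.json', $json) would save it

// Reading it back yields the same array.
$decoded = json_decode($json, true);
```

JSON still has no comment syntax, so per-option documentation would have to live elsewhere.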
