Why use an XML parser? - php

I'm a somewhat experienced PHP scripter, however I just dove into parsing XML and all that good stuff.
I just can't seem to wrap my head around why one would use a separate XML parser instead of just using the explode function, which seems to be just as simple. Here's what I've been doing (assuming there is a valid XML file at the path xml.php):
$contents = file_get_contents("xml.php");
$array1 = explode("<a_tag>", $contents);
$array2 = explode("</a_tag>", $array1[1]);
$data = $array2[0];
So my question is, what is the practical use for an XML parser if you can just separate the values into arrays and extract the data from that point?
Thanks in advance! :)

Excuse me for not going into details but for starters try parsing
$contents = '<a xmlns="urn:something">
<a_tag>
<b>..</b>
<related>
<a_tag>...</a_tag>
</related>
</a_tag>
<foo:a_tag xmlns:foo="urn:something">
<![CDATA[This is another <a_tag> element]]>
</foo:a_tag>
</a>';
with your explode-approach. When you're done we can continue with some trickier things ;-)

In a nutshell, its consistency. Before XML came into wide use there were numerous undocumented formats for keeping information in files. One of the motivators behind XML was to create a well defined, standard document format. With this well defined format in place, a general set of parsing tools could be developed that would work consistently on documents so long as the documents adhered to the aforementioned well defined format.
In some specific cases, your example code will work. However, if the document changes
...
<!-- adding an attribute -->
<a_tag foo="bar">Contents of the Tag</a_tag>
...
...
<!-- adding a comment to the contents -->
<a_tag>Contents <!-- foobar --> of the Tag</a_tag>
...
Your parsing code will probably break. Code written using a correctly defined XML parser will not.

XML parsers:
Handle encoding
May have xpath support
Allow you to easily modify and save the XML; append/remove child nodes, add/remove attributes, etc.
Don't need to load the whole file into memory (except from DOM parsers)
Know about namespaces
...

How would you explode the same file if a_tag had an attribute?
explode("<a_tag>" ... will work differently than explode("<a_tag attr='value'>" ..., after all.
XML Parsers understand the XML specification. Explode can only handle the simplest of cases, and will most likely fail in a lot of instances of that case.

Using a proven XML parsing method will make the code more maintainable and easy to read. It will also make it more easily adaptable should the schema change, and it can make it easier to determine error conditions. XPath and XSLT exist for a reason, they are proven ways to deal with XML data in a sensible, legible manner. I'd suggest you use whichever is applicable in your given situation. Remember, just because you think you're only writing code for one specific purpose, you never know what a piece of well-written code could evolve into.

Related

SimpleHtmlDom Edit and Save file

http://simplehtmldom.sourceforge.net/
Looking to parse through an HTML file, make some small changes, and overwrite the current file with the updates. Was wondering if this was possible through simplehtmldom
As of right now, I can access everything, but have no way of displaying the entirely of the HTML With all of the changes included. Is it only possible to grab specific values?
If so, what other methods could I use to accomplish this? I'm afraid the environment I have to work in is extremely limited due to security issues.
The simplehtmldom object can be used as a string, and will just contain the full contents of the modified document.
For example:
echo($html);
file_put_contents($filename, $html);
If you prefer an object-oriented method, you can use save():
$updated = $html->save();
or
$html->save($filename);

Changing/deleting html from file_get_contents

I'm currently using this code:
$blog= file_get_contents("http://powback.tumblr.com/post/" . $post);
echo $blog;
And it works. But tumblr has added a script that activates each time you enter a password-field. So my question is:
Can i remove certain parts with file_get_contents? Or just remove everything above the <html> tag? could i possibly kill a whole div so it wont load at all? And if so; how?
edit:
I managed to do it the simple way. By skipping 766 characters. The script now work as intended!
$blog= file_get_contents("powback.tumblr.com/post/"; . $post, NULL, NULL, 766);
After file_get_contents returns, you have in your hands a string. You can do anything you want to it, including cutting out parts of it.
There are two ways to actually do the cutting:
Using string functions like str_replace, preg_replace and others; the exact recipe depends on what you need to do. This approach is kind of frowned upon because you are working at the wrong level of abstraction, but in some cases it has an unmatched performance to time spent ratio.
Parsing the HTML into a DOM tree, modifying it appropriately (this time working at the appropriate level of abstraction) and then turn it back into a string and echo it. This can be more convenient to work with if your requirements are not dead simple and is easier to maintain, but it typically requires more code to be written.
If you want to do something that's most naturally expressed in HTML document terms ("cutting out this <div>") then don't be tempted and go with the second approach.
At that point, $blog is just a string, so you can use normal PHP functions to alter it. Look into these 2:
http://php.net/manual/en/function.str-replace.php
http://us2.php.net/manual/en/function.preg-replace.php
You can parse your output using simple html dom parser and display olythe contents thatyou really want to display

PHP library for HTML output

Is there an standard output library that "knows" that php outputs to html?
For instance:
var_dump - this should be wrapped in <pre> or maybe in a table if the variable is an array
a version of echo that adds a "<br/>\n" in the end
Somewhere in the middle of PHPcode I want to add an H3 title:
.
?><h3><?= $title ?></h3><?
Out of php and then back in. I'd rather write:
tag_wrap($title, 'h3');
or
h3($title);
Obviously I can write a library myself, but I would prefer to use a conventional way if there is one.
Edit
3 Might not be a good example - I don't get much for using alternative syntax and I could have made it shorter.
1 and 2 are useful for debugging and quick testing.
I doubt that anyone would murder me for using some high-level html emitting functions of my own making when it saves a lot of writing.
In regards to #1, try xdebug's var_dump override, if you control your server and can install PHP extensions. The remote debugger and performance tools provided by xdebug are great additions to your arsenal. If you're looking only for pure PHP code, consider Kint or dBug to supplement var_dump.
In regards to #2 and #3, you don't need to do this. Rather, you probably shouldn't do this.
PHP makes a fine HTML templating language. Trying to create functions to emit HTML is going to lead you down a horrible road of basically implementing the DOM in a horribly awkward and backwards way. Considering how horribly awkward the DOM already is, that'll be quite an accomplishment. The future maintainers of your code are going to want to murder you for it.
There is no shame in escaping out of PHP to emit large blocks of HTML. Escaping out to emit a single tag, though, is completely silly. Don't do that, and don't create functions that do that. There are better ways.
First, don't forget that print and echo aren't functions, they're built in to the language parser. Because they're special snowflakes, they can take a list without parens. This can make some awkward HTML construction far less awkward. For example:
echo '<select name="', htmlspecialchars($select_name), '</select>';
foreach($list as $key => $value) {
echo '<option value="',
htmlspecialchars($key),
'">',
htmlspecialchars($value),
'</option>'
}
echo '</select>';
Next, PHP supports heredocs, a method of creating a double-quoted string without the double-quotes:
$snippet = <<<HERE
<h1>$heading</h1>
<p>
<span class="aside">$aside_content</span>
$actual_content
</p>
HERE;
With these two tools in your arsenal, you may find yourself breaking out of PHP far less frequently.
While there is a case for helper functions (there are only so many ways you can build a <select>, for example), you want to use these carefully and create them to reduce copy and paste, not simply to create them. The people that will be taking care of the code you're writing five years from now will appreciate you for it.
You should use a php template engine and just separate the entire presentation and logic. It make no sense for a educated programmer to try to create a library like that.

Dealing with XML in PHP

I'm currently working a project that has me working with XML a lot. I have to take an XML response and decrypt each text node and then do various tasks with the data. The problem I'm having is taking the response and processing each text node. Originally I was using the XMLToArray library, and that worked fine I would change the XML into an array and then loop through the array and decrypt the values. However some of the XML response I'm dealing with have repeated tags and the XMLToArray library will only return the last values.
Is there a good way that I can take an XML response and process all the text nodes and easily putting the values into an array that has a similar structure to the response?
Thanks in advance.
I would use SimpleXML.
Here's a small example of using it. It loads and parses XML from http://www.w3schools.com/xml/plant_catalog.xml and then outputs values of "COMMON" and "PRICE" tags of each "PLANT" tag.
$xml = simplexml_load_file('http://www.w3schools.com/xml/plant_catalog.xml');
foreach ( $xml->PLANT as $plantNode ) {
echo $plantNode->COMMON, ' - ', $plantNode->PRICE, "\n";
}
If you have any problems with adapting it to your needs, just give an example of your XML so that we can help with it.
All those XML to array libraries are a remain of the times where PHP 4 would force you to write your own XML parser almost from scratch. In recent PHP versions you have a good set of XML libraries that do the hard job. I particularly recommend SimpleXML (for small files) and XMLReader (for large files). If you still find them complicate, you can try phpQuery.
You might want to give SimpleXML a try. Plus it comes by default in php so you dont need to install
Check out SimpleXML, it may offer a bit more for what you are looking for.

Is it better to use pure DOM methods or string concatenation to generate a very dynamic xml file?

I basically have to query the database to grab all the active properties, then grab content under each property for sections such as accommodations/experiences, and based on that generate basically a site map.
I'm wondering if I should go with using purely DOM methods ( would have me doing a hundred or so createElements, inside of loops, appendChild, etc ) or just do a giant string concatenation and validate that then render it as xml?
Personally, I always try to use the DOM methods for XML generation. The reason is simple, it performs all the necessary escaping and entity generation for you.
$xmlBlock = '<foo>';
$xmlBlock .= '<bar>'.htmlspecialchars('baz', ENT_NOQUOTES, 'utf-8', false).'</bar>';
$xmlBlock .= '</foo>';
Compared to:
$node = $dom->createElement('foo');
$node->appendChild($dom->createElement('bar', 'baz'));
But then again, that's just my personal preference...
I would generate it with strings, in most cases.
$myXml = "<?xml ...... ";
$myXml .= "<rootNode>";
$myXml .= "<child>";
etc..
Performance-wise, string concatenation then doing a double-pass on the string is at a disadvantage. XML validation in the case of DOM methods is done for you automatically.
If you're concerned about performance, use output buffering to build the XML content. Also, build or find a trivial library that maintains a tag stack, does XML escaping, etc, so that you can guarantee well-formed output. That way, you can skip the validation pass unless you really need to verify conformance to a schema or DTD.
I don't know of anything in PHP, but the Android XmlSerializer is a good model of the minimum API you need to produce guaranteed-well-formed XML without building a DOM in memory as part of the process. The code involved isn't complicated and can be built and tested separately from any applications that use it.

Categories