GetElementsByTagName alternative to DOMDocument - php

I am creating an HTML document with DOMDocument, but I have a problem when searching it with the getElementsByTagName method. Because I am generating the document on the fly, it does not seem to recognize the tags I have inserted.
I tried DOMXPath, but to no avail.
For now, my workaround is to walk through all the children of a node and store them in an array, but I need to convert that array to a DOMNodeList, and doing
return (DOMNodeList) $my_array;
generates a syntax error.
My specific question is: how can I search for tags with getElementsByTagName, or what alternative could achieve the same task?
Keep in mind that I am generating the DOMDocument at that moment.
If you need more information, I'll gladly add it to the question.
Sure Jonathan Sampson.
I apologize for the way I edited the question; I did not quite understand this forum's format.
To make it clearer what I am doing, here is the inheritance chain.
I have this base class:
abstract class ElementoBase {
...
}
And I have this class, which inherits from the previous one and declares an abstract function insertar (insert):
abstract class Elemento extends ElementoBase {
...
public abstract function insertar ( $elemento );
}
Then I have a whole series of classes that represent HTML tags and inherit from the class above, e.g.
class A extends Elemento {
}
...
Now, the code I use to insert the tags into the document is as follows:
public function insertar ( $elemento ) {
$this->getElemento ()->appendChild ( $elemento->getElemento () );
}
where the function getElemento() returns a DOMElement.
Before inserting the element, I also run some validations that depend on the HTML tag being inserted,
because each tag has very specific rules.
Since I am generating the HTML code at that moment, there is obviously no HTML file.
As for your question, in theory I should do this:
$myListTags = $this->getElemento ()->getElementsByTagName ( $tag );
but it always returns null. From what I have read, this is because I am not loading an HTML file, since if I do
$myHtmlFile = $this->getDocumento ()->loadHTMLFile ( $filename );
$myListTags = $myHtmlFile->getElementsByTagName ( $etiqueta );
it does return the list of HTML tags.

I am assuming you have created a valid HTML file with DOMDocument. Your basic problem is then to parse or search the HTML document for a particular tag name.
To search an HTML file, one convenient option in PHP is the Simple HTML DOM parser library.
You can just run the following code and you are done!
$html = file_get_html('url to your html file');
foreach($html->find('tag name') as $element)
{
// perform the action you want to do here.
// example: echo $element->someproperty;
}

$doc = new DOMDocument('1.0', 'iso-8859-1');
$doc->appendChild(
$doc->createElement('Filiberto', 'It works!')
);
$nodeList = $doc->getElementsByTagName('Filiberto');
var_dump($nodeList->item(0)->nodeValue);
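To expand on the snippet above, here is a minimal sketch of searching a document that exists only in memory, with both getElementsByTagName and DOMXPath. The tag names and the href value are made up for illustration. The assumption is that every element was created by the same DOMDocument and actually appended to it; a node that was created but never attached is not part of the tree and will not be found.

```php
<?php
// Build a small document purely in memory (no file is loaded).
$doc  = new DOMDocument('1.0', 'UTF-8');
$html = $doc->appendChild($doc->createElement('html'));
$body = $html->appendChild($doc->createElement('body'));

// appendChild() returns the node that was appended.
$link = $body->appendChild($doc->createElement('a', 'Home'));
$link->setAttribute('href', '/');
$body->appendChild($doc->createElement('p', 'Some text'));

// Created but never appended: this node is not part of the tree.
$orphan = $doc->createElement('a', 'Orphan');

// Search from the document ...
$byTag = $doc->getElementsByTagName('a');

// ... from a single element ...
$fromBody = $body->getElementsByTagName('a');

// ... or with XPath, which also returns a DOMNodeList.
$xpath   = new DOMXPath($doc);
$byXPath = $xpath->query('//a[@href]');

foreach ($byTag as $node) {
    echo $node->nodeName, ': ', $node->nodeValue, "\n";
}
```

Both calls return a real DOMNodeList, so there should be no need to build one from an array by hand; PHP has no (DOMNodeList) cast, which is why that line is a syntax error.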

Related

Using XSD schema validation for XPath queries

I'm using the following code to create a DOMDocument and validate it against an external xsd file.
<?php
$xmlPath = "/xml/some/file.xml";
$xsdPath = "/xsd/some/schema.xsd";
$doc = new \DOMDocument();
$doc->loadXML(file_get_contents($xmlPath), LIBXML_NOBLANKS);
if (!$doc->schemaValidate($xsdPath)) {
throw new InvalidXmlFileException();
}
Update 2 (rewritten question)
This works fine, meaning that if the XML doesn't match the definitions in the XSD it will throw a meaningful exception.
Now, I want to retrieve information from the DOMDocument using XPath. That works fine as well; however, from this point on the DOMDocument is completely detached from the XSD! For example, if I have a DOMNode I cannot know whether it is a simpleType or a complexType. I can check whether the node has child nodes (hasChildNodes()), but this is not the same thing. There is also a lot more information in the XSD (such as the minimum and maximum number of occurrences).
The question really is: do I have to query the XSD myself, or is there a programmatic way of asking these kinds of questions, e.g. is this DOMNode a complex or a simple type?
In another post it was suggested "to process the schema using a real schema processor, and then use its API to ask questions about the contents of the schema". Does XPath have an API to retrieve information from the XSD, or is there a different convenient way with DOMDocument?
For the record, the original question
Now, I wanted to proceed to parse information from the DOMDocument using XPath. To improve the integrity of the data I store in a database and to give meaningful error messages to the client, I wanted to use the schema information throughout to validate the queries. That is, I wanted to validate fetched child nodes against the allowed child nodes defined in the XSD, and I wanted to do that by using XPath on the XSD document.
However, I stumbled across this post. It basically says that doing this yourself is a rather quirky approach and that you should instead use a real schema processor and its API to make the queries. If I understand that correctly, I am using a real schema processor with schemaValidate, but what is meant by using its API?
I kind of guessed already I'm not using the schema in a correct way, but I have no idea how to research a proper usage.
The question
If I use schemaValidate on the DOMDocument, is that a one-time validation (true or false), or does the schema stay tied to the DOMDocument afterwards? More precisely, can I also use the validation when adding nodes, or can I use it to select the nodes I'm interested in, as suggested by the referenced SO post?
Update
The question was rated unclear, so I want to try again. Say I would like to add a node or edit a node value. Can I use the schema provided in the xsd so that I can validate the user input? Originally, in order to do that I wanted to query the xsd manually with another XPath instance to get the specs for a certain node. But as suggested in the linked article this is not best practice. So the question would be, does the DOM lib offer any API to make such a validation?
Maybe I'm overthinking it. Maybe I just add the node and run the validation again and see where/why it breaks? In that case, the answer of the custom error handling would be correct. Can you confirm?
Your question is not very clear, but it sounds like you want detailed reporting about any schema validation failures. While DOMDocument::schemaValidate() only returns a boolean, you can use the libxml error functions to get more detailed information.
We can start with your original code, only changing one thing at the top:
<?php
// without this, errors are echoed directly to screen and/or log
libxml_use_internal_errors(true);
$xmlPath = "file.xml";
$xsdPath = "schema.xsd";
$doc = new \DOMDocument();
$doc->loadXML(file_get_contents($xmlPath), LIBXML_NOBLANKS);
if (!$doc->schemaValidate($xsdPath)) {
throw new InvalidXmlFileException();
}
And then we can make the interesting stuff happen in the exception which is presumably (based on the code you've provided) caught somewhere higher up in the code.
<?php
class InvalidXmlFileException extends \Exception
{
private $errors = [];
public function __construct()
{
foreach (libxml_get_errors() as $err) {
$this->errors[] = self::formatXmlError($err);
}
libxml_clear_errors();
}
/**
* Return an array of error messages
*
* @return array
*/
public function getXmlErrors(): array
{
return $this->errors;
}
/**
* Return a human-readable error message from a libxml error object
*
* @return string
*/
private static function formatXmlError(\LibXMLError $error): string
{
$return = "";
switch ($error->level) {
case \LIBXML_ERR_WARNING:
$return .= "Warning $error->code: ";
break;
case \LIBXML_ERR_ERROR:
$return .= "Error $error->code: ";
break;
case \LIBXML_ERR_FATAL:
$return .= "Fatal Error $error->code: ";
break;
}
$return .= trim($error->message) .
"\n Line: $error->line" .
"\n Column: $error->column";
if ($error->file) {
$return .= "\n File: $error->file";
}
return $return;
}
}
So now when you catch your exception you can just iterate over $e->getXmlErrors():
try {
// do stuff
} catch (InvalidXmlFileException $e) {
foreach ($e->getXmlErrors() as $err) {
echo "$err\n";
}
}
For the formatXmlError function I just copied an example from the PHP documentation that parses the error into something human readable, but no reason you couldn't return some structured data or whatever you like.
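The "just add the node and run the validation again" idea from the question can be sketched roughly like this. It is only an illustration under assumptions: the inline schema, the element names and the helper appendIfValid() are all made up, and DOMDocument::schemaValidateSource() is used instead of schemaValidate() only so that the example needs no files.

```php
<?php
// Hypothetical schema: <items> holds any number of integer <item> elements.
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="items">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:int" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

/**
 * Hypothetical helper: append a new element to the root, re-validate the
 * whole document, and roll the change back if validation fails.
 * Returns the list of validation messages (empty when the edit is valid).
 */
function appendIfValid(DOMDocument $doc, string $name, string $value, string $xsd): array
{
    $node = $doc->documentElement->appendChild($doc->createElement($name, $value));

    libxml_clear_errors();
    if ($doc->schemaValidateSource($xsd)) {
        return [];
    }

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();

    // Undo the edit so the document stays valid.
    $doc->documentElement->removeChild($node);

    return $messages;
}

$doc = new DOMDocument();
$doc->loadXML('<items><item>1</item></items>');

$ok     = appendIfValid($doc, 'item', '2', $xsd);   // valid integer
$errors = appendIfValid($doc, 'item', 'abc', $xsd); // not an integer

foreach ($errors as $message) {
    echo $message, "\n";
}
```

The validation is a one-off check of the tree as it is at that moment, so it has to be repeated after every edit; nothing from the schema stays attached to the DOMDocument.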
I think what you're looking for is the PSVI (post-schema-validation infoset); see this answer for some references.
Another option would be to use XPath 2.0, which has operators to check schema types.
I don't know whether there are PHP libraries that let you access the PSVI or run XPath 2.0 queries; in Java there is Xerces for the PSVI and Saxon for XPath 2.0.
For example, with Xerces it is possible to cast a DOM Element to a Xerces ElementPSVI in order to get the schema information of an element.
Be warned that using XPath on the schema (as you were doing) will work only for simple cases, since the XML of the schema is very different from the actual schema model (the assembled schema). That model is a graph of components whose properties are indeed computed from the XML declarations in the schema file, but by very complex rules that are almost impossible to recreate with XPath.
So you need at least the PSVI or XPath 2.0 queries but, in my experience, obtaining decent validation messages for application users from an XML schema is difficult.
What are you trying to achieve?

Get attributes from item(tag) using SimplePie

I'm trying to get the attributes of the "id" tag in a feed using SimplePie.
This is the fragment of code from feed:
<updated>2012-03-12T08:26:29-07:00</updated>
<id im:id="488627" im:bundleId="dmtmobile">http://www.example.com</id>
<title>Draw Something by OMGPOP - OMGPOP</title>
I want to get the number (488627) from the im:id attribute of the id tag.
How can I get this?
I tried $item->get_item_tags('','im:id') but it didn't work.
If this is in an Atom 1.0 feed, you'll want to use the Atom namespace:
$data = $item->get_item_tags(SIMPLEPIE_NAMESPACE_ATOM_10,'id');
From there, you should then find that the attributes you want are:
$id = $data[0]['attribs'][IM_NAMESPACE]['id'];
$bundleID = $data[0]['attribs'][IM_NAMESPACE]['bundleId'];
where IM_NAMESPACE is set to the im XML namespace (i.e. what the value of xmlns:im is).
The reason SimplePie asks for a namespace is because it internally stores the node elements under the given namespace. If you don't know what your specific namespace is, use print_r to dump it:
print_r($item->data['child']);
You can also directly access the child elements if you know the namespace, or write a simple seeker function to step through each namespace and look for a matching tag.
$data = $item->data['child']['im']['bundleId'][0]['data'];
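The "simple seeker function" mentioned above could look something like the sketch below. It does not call SimplePie at all: it works on a plain array shaped like $item->data['child'] (namespace, then tag name, then a list of elements with 'data' and 'attribs'), and both the sample array and the function name findTagInAnyNamespace() are made up for illustration. The im namespace URI in the sample is an assumption; use whatever xmlns:im is set to in your feed.

```php
<?php
/**
 * Hypothetical helper: look for $tag under every namespace key of a
 * SimplePie-style 'child' array and return the first match.
 */
function findTagInAnyNamespace(array $child, string $tag): ?array
{
    foreach ($child as $namespace => $tags) {
        if (isset($tags[$tag])) {
            return ['namespace' => $namespace, 'elements' => $tags[$tag]];
        }
    }
    return null;
}

// Sample data in the same shape as $item->data['child'] (assumed values).
$child = [
    'http://www.w3.org/2005/Atom' => [
        'id' => [
            [
                'data'    => 'http://www.example.com',
                'attribs' => [
                    'http://itunes.apple.com/rss' => [
                        'id'       => '488627',
                        'bundleId' => 'dmtmobile',
                    ],
                ],
            ],
        ],
    ],
];

$found = findTagInAnyNamespace($child, 'id');

// Attributes are grouped by namespace too; take the first group here.
$attribs   = $found['elements'][0]['attribs'];
$imAttribs = reset($attribs);

echo $imAttribs['id'], "\n";
```

With a real feed you would pass $item->data['child'] instead of the sample array.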
The get_item_tags() function is stupid and doesn't usually do what you want, but it's also very simple and easy to replace with your own special purpose functions. Original source is:
public function get_item_tags($namespace, $tag)
{
if (isset($this->data['child'][$namespace][$tag]))
{
return $this->data['child'][$namespace][$tag];
}
else
{
return null;
}
}

saving parts of xml object as object

I created an xml file that stores some information for me. Now I want to get elements that meet some conditions.
At the moment this looks like this:
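function getElements($xmlObject, $name){
// collect every <feature> whose path contains $name (case-insensitive)
$aSubFeatures = array();
foreach($xmlObject->feature as $feature){
if(stristr($feature->path, $name)){
array_push($aSubFeatures, $feature);
}
}
return $aSubFeatures;
}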
But I'd prefer getting an object as a return value. I used simpleXML for getting the xml file as an object.
I also tried using DOM (creating a new DOMDocument and trying to append the feature elements I had found), but without a reasonable result.
Would deleting all the non-matching parts of the XML be a solution? I did not find a way to delete specific elements...
Thanks for your help
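To append an element from an existing DOMDocument to a new DOMDocument, you first have to call $newdom->importNode($nodeInOldDOM, true) and then append the node it returns (the second argument requests a deep copy that includes the children). You cannot simply appendChild a node that belongs to another document.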
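As a rough sketch of that approach applied to the question: the function below copies every matching feature into a fresh DOMDocument and returns it, so the result is an object rather than an array. The sample XML, the root element name features and the function name filterFeatures() are assumptions made for the example.

```php
<?php
/**
 * Hypothetical helper: return a new document containing only the
 * <feature> elements whose <path> contains $name (case-insensitive).
 */
function filterFeatures(DOMDocument $source, string $name): DOMDocument
{
    $result = new DOMDocument('1.0', 'UTF-8');
    $root   = $result->appendChild($result->createElement('features'));

    foreach ($source->getElementsByTagName('feature') as $feature) {
        $path = $feature->getElementsByTagName('path')->item(0);
        if ($path !== null && stripos($path->textContent, $name) !== false) {
            // importNode() copies the node into $result; true = deep copy.
            $root->appendChild($result->importNode($feature, true));
        }
    }

    return $result;
}

// Made-up sample data.
$source = new DOMDocument();
$source->loadXML(
    '<features>'
    . '<feature><path>/a/login</path></feature>'
    . '<feature><path>/b/logout</path></feature>'
    . '<feature><path>/c/LOGIN-form</path></feature>'
    . '</features>'
);

$filtered = filterFeatures($source, 'login');
echo $filtered->saveXML();

// If a SimpleXML object is preferred, the result can be converted back.
$asSimpleXml = simplexml_import_dom($filtered);
```

The source document is left untouched, which avoids having to delete the non-matching elements.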

Recursive tree rendering with Agile Toolkit

I have the following situation. I have a Model A with the following properties:
id int
name varchar(255)
parent_id int (references same Model A).
Now, I need to render a tree view using that Model A. Of course, I could just load all the data, group it by parent_id and "render it" using traditional string concatenation, e.g.
class Model_A extends Model_Table {
...
function render_branch($nodes, $parent){
if (!isset($nodes[$parent])){
return null;
}
$out = "<ul>";
foreach ($nodes[$parent] as $node){
$out .= "<li>" . $node["name"];
$out .= $this->render_branch($nodes, $node["id"]);
$out .= "</li>";
}
$out .= "</ul>"; // close the list opened above
return $out;
}
function init(){
parent::init();
$nodes = array(); // preload from db and arrange so that key = parent and content is array of childs
$this->template->set("tree", $this->render_branch($nodes, 0));
}
}
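The preload step that the snippet above leaves as a comment could be done along these lines. This is only a sketch outside of Agile Toolkit: the sample rows, the function names groupByParent() and renderBranch(), and the convention that top-level rows have parent_id NULL (mapped to key 0) are all assumptions.

```php
<?php
/**
 * Group flat rows by parent id, so that $nodes[$parentId] is the list
 * of direct children. Rows without a parent are filed under key 0.
 */
function groupByParent(array $rows): array
{
    $nodes = [];
    foreach ($rows as $row) {
        $parent = $row['parent_id'] === null ? 0 : (int) $row['parent_id'];
        $nodes[$parent][] = $row;
    }
    return $nodes;
}

/** Render one branch as nested <ul> lists; returns '' for a leaf. */
function renderBranch(array $nodes, int $parent): string
{
    if (!isset($nodes[$parent])) {
        return '';
    }
    $out = '<ul>';
    foreach ($nodes[$parent] as $node) {
        $out .= '<li>' . htmlspecialchars($node['name'])
            . renderBranch($nodes, (int) $node['id'])
            . '</li>';
    }
    return $out . '</ul>';
}

// Made-up rows, as they might come back from the database.
$rows = [
    ['id' => 1, 'name' => 'Root',    'parent_id' => null],
    ['id' => 2, 'name' => 'Child A', 'parent_id' => 1],
    ['id' => 3, 'name' => 'Child B', 'parent_id' => 1],
    ['id' => 4, 'name' => 'Leaf',    'parent_id' => 2],
];

$tree = renderBranch(groupByParent($rows), 0);
echo $tree, "\n";
```

Grouping once up front means the whole tree is rendered from a single query result, with no extra lookup per node.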
Now, I would instead like to use the native atk4 lister / SMlite template parser for this. But if you try to do that, you end up with a nasty lister where, in format row, you would still have to substitute a specific tag with the output of another lister, which you would then have to destroy to avoid runtime memory overflows.
Any suggestions?
P.S. The code above is not tested; it just shows the concept.
Thanks!
Okay, the right time has come and a proper add-on has been created. To use it, bring your add-ons and atk4 up to date and follow this article to learn how.
http://www.ambienttech.lv/blog/2012-07-06/tree_view_in_agile_toolkit.html
As per Jancha's comment
Okay, after spending some time looking at possible options, I found that
the easiest thing to do in this particular case was to use the above-mentioned example.
The only way to make it more native would be to use an external template for
the nodes and use SMlite with clone region + render to move the HTML out into the
template. Apart from that, using a traditional lister did not seem to
be efficient enough. So, atk4 guys, follow up with query tree view
plugin and create a proper backend! It would be cool. Thanks, j

Using DOMXPath to replace a node while maintaining its position

Ok, so I had this neat little idea the other night to create a helper class for DOMDocument that mimics, to some extent, jQuery's ability to manipulate the DOM of an HTML or XML-based string. Instead of CSS selectors, XPath is used. For example:
$Xml->load($source)
->path('//root/items')
->each(function($Context)
{
echo $Context->nodeValue;
});
This would invoke a callback function on every resulting node. Unfortunately, PHP versions before 5.3 do not support lambda functions or closures, so I'm forced to do something a bit more like this for the time being:
$Xml->load($source)
->path('//root/items')
->walk('printValue', 'param1', 'param2');
Everything is working great at the moment and I think this project would be useful to a lot of people, but I'm stuck on one of the functions. I am attempting to mimic jQuery's 'replace' method. The intended usage looks like this:
$Xml->load($source)
->path('//root/items')
->replace($Xml->createElement('foo', 'bar')); // can be an object, string or XPath pattern
The code behind this method is:
public function replace($Content)
{
foreach($this->results as $Element)
{
$Element->parentNode->appendChild($Content->cloneNode(true));
$Element->parentNode->removeChild($Element);
}
return $this;
}
Now, this works. It replaces every resulting element with a cloned version of $Content. The problem is that it adds them to the bottom of the parent node's list of children. The question is, how do I clone this element to replace other elements, while still retaining the original position in the DOM?
I was thinking about reverse-engineering the node I was to replace. Basically, copying over values, attributes and element name from $Content, but I am unable to change the actual element name of the target element.
Reflection could be a possibility, but there's gotta be an easier way to do this.
Anybody?
Use replaceChild instead of appendChild/removeChild.
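A minimal sketch of that suggestion, written as a standalone function rather than as the method of the helper class (the sample XML and the name replaceNodes() are made up). DOMNode::replaceChild() takes the new node first and the old node second, and puts the new node exactly where the old one was.

```php
<?php
/**
 * Replace every node in $nodes with a deep clone of $content,
 * keeping the original position among the siblings.
 */
function replaceNodes(array $nodes, DOMNode $content): void
{
    foreach ($nodes as $node) {
        // replaceChild(new, old) swaps the node in place.
        $node->parentNode->replaceChild($content->cloneNode(true), $node);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<root><a/><items>x</items><b/></root>');

$xpath = new DOMXPath($doc);

// Copy the matches into an array before modifying the tree.
$matches = iterator_to_array($xpath->query('//root/items'));

replaceNodes($matches, $doc->createElement('foo', 'bar'));

$result = $doc->saveXML($doc->documentElement);
echo $result, "\n";
// prints: <root><a/><foo>bar</foo><b/></root>
```

The replacement must belong to the same document as the nodes it replaces; content from another document would need importNode() first.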
Check whether $Element has a nextSibling before removing it; if so, do an insertBefore on that next sibling, otherwise simply append.
public function replace($Content)
{
foreach($this->results as $Element)
{
if ($Element->nextSibling) {
$NextSiblingReference = $Element->nextSibling;
$Element->parentNode->insertBefore($Content->cloneNode(true),$NextSiblingReference);
}
else {
$Element->parentNode->appendChild($Content->cloneNode(true));
}
$Element->parentNode->removeChild($Element);
}
return $this;
}
Totally untested, though.
Or, as AnthonyWJones suggested, use replaceChild. How did I miss that? :)
