I have a bunch of .html files that I am including on a page. Conditionally, I need to add classes to some of the components in these files, for example:
<div id='foo' class='bar'></div>
to
<div id='foo' class='bar bar2'></div>
I know I can do this with some inline PHP like this
<div id='foo' class="bar <?php echo " bar2"; ?>"></div>
However, having PHP in any of the files I'm including is not an option.
I also looked into including a file and then modifying its output afterward, but that doesn't seem possible. My next idea was to read the files line by line and add the class as I go.
Is there a nicer way I'm not thinking of?
Since having PHP in the included files is not an option, you could use PHP's DOM extension (DOMDocument) with an XPath selector:
$dom = new DOMDocument();
$dom->loadHTMLFile($htmlFile);
$finder = new DOMXPath($dom);
// select every element whose class attribute contains 'bar'
$nodes = $finder->query("//*[contains(@class, 'bar')]");
// append the new class with setAttribute, keeping the existing ones
foreach ($nodes as $node) {
    $node->setAttribute('class', $node->getAttribute('class') . ' bar2');
}
// modified HTML source
$html = $dom->saveHTML();
That should get you started.
You can use the DOMDocument class in PHP to retrieve the markup from the file and then add attributes and data.
I don't really remember the code for DOMDocument so I haven't included any code here (sorry), but here are some links:
Use this method to get the HTML from your file:
http://php.net/manual/en/domdocument.loadhtmlfile.php
Review the DOMDocument class:
http://php.net/manual/en/class.domdocument.php
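Since the answer above has no code, here is a minimal sketch of what the two linked pages lead to. It assumes the element can be found by its id (foo, as in the question); the temporary file only stands in for one of the real .html includes so the example runs on its own.

```php
<?php
// Write a throwaway fragment so the sketch is self-contained; in practice
// $file would be the path of one of the existing .html includes.
$file = tempnam(sys_get_temp_dir(), 'frag');
file_put_contents($file, "<div id='foo' class='bar'></div>");

$dom = new DOMDocument();
$dom->loadHTMLFile($file);
unlink($file);

// Look the element up by id and append the extra class to the existing value.
$div = $dom->getElementById('foo');
if ($div !== null) {
    $div->setAttribute('class', trim($div->getAttribute('class') . ' bar2'));
}

// Passing the node to saveHTML() serializes only that element, without the
// html/body wrapper that the HTML parser adds around a fragment.
$html = $div !== null ? $dom->saveHTML($div) : '';
echo $html, "\n";
```

Echoing $html in place of the original include keeps the .html files free of PHP.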
You may need to use .php instead of .html for the included file.
For example:
$variableClass="bar2";
include("htmlfilename.html");
where htmlfilename.html consists of
<div id='foo' class="bar <?php echo $variableClass; ?>"></div>
Depends on what you actually want to achieve - but basically this tends to be better solved by jQuery on the client.
But anyway you might put your HTML fragments in a DOM object, analyze and modify it, and read the HTML back after the modifications, for example:
// including an HTML file writes to the output stream, so buffer this
ob_start();
include('myfile.html');
$html = ob_get_clean();
// make a DOMDocument
$doc = new DOMDocument();
$doc->loadHTML($html);
// make the changes you need to
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//div[@id="foo"]');
// etc...
// get modified HTML
$html = $doc->saveHTML();
Hope this helps.
Related
I am studying parsing HTML on PHP and I am using DOM for this.
I write this code inside my php file:
<?php
$site = new DOMDocument();
$div = $site->createElement("div");
$class = $site->createAttribute("class");
$class->nodeValue = "wrapper";
$div->appendChild($class);
$site->appendChild($div);
$html = $site->saveHTML();
echo $html;
?>
When I run this in the browser and view the page source, only this code comes out:
<div class="wrapper"></div>
I don't know why it is not showing the whole HTML document that is supposed to be there. I am using XAMPP v3.2.1.
Please tell me where I went wrong with this. Thanks.
It's showing the whole HTML you created: a div node with a class attribute of wrapper.
See the example in the docs. There the html, head, etc. nodes are explicitly created.
PHP only adds missing DOCTYPE, html and body elements when loading HTML, not when saving.
Adding $site->loadHTML($site->saveHTML()); before $html = $site->saveHTML(); will demonstrate this.
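As a rough sketch of the other route (creating the outer nodes explicitly, as the docs example does), the question's code could be extended like this. The html and body elements are created by hand here; nothing adds them automatically on save.

```php
<?php
$site = new DOMDocument();

// Build the outer structure explicitly; appendChild() returns the appended node.
$html = $site->appendChild($site->createElement('html'));
$body = $html->appendChild($site->createElement('body'));

$div = $body->appendChild($site->createElement('div'));
$div->setAttribute('class', 'wrapper');

$output = $site->saveHTML();
echo $output;
```

setAttribute() is used here instead of createAttribute() plus appendChild(); both end up with the same class attribute on the div.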
I have a problem. I want to load an HTML snippet with namespaces in it with DOMDocument.
<div class="something-first">
<div class="something-child something-good another something-great">
<my:text value="huhu">
</div>
</div>
But I can't figure out how to preserve the namespaces. I tried loading it with loadHTML(), but HTML does not have namespaces, so they get stripped.
I tried loading it with loadXML(), but that doesn't work either because <my:text value="huhu"> is not well-formed XML.
What I need is a loadHTML() method that doesn't strip namespaces, or a loadXML() method that does not validate the markup: in other words, a combination of these two methods.
My code so far:
$html = '<div class="something-first">
<div class="something-child something-good another something-great">
<my:text value="huhu">
</div>
</div>';
libxml_use_internal_errors(true);
$domDoc = new DOMDocument();
$domDoc->formatOutput = false;
$domDoc->resolveExternals = false;
$domDoc->substituteEntities = false;
$domDoc->strictErrorChecking = false;
$domDoc->validateOnParse = false;
$domDoc->loadHTML($html/*, LIBXML_NOERROR | LIBXML_NOWARNING*/);
$xpath = new DOMXPath($domDoc);
$xpath->registerNamespace ( 'my', 'http://www.example.com/' );
// -----> This results in zero nodes cause namespace gets stripped by loadHTML()
$nodes = $xpath->query('//my:*');
var_dump($nodes);
Is there a way to achieve what I want? I would be very grateful for any advice.
EDIT: I opened an enhancement request for libxml2 to provide an option to preserve namespaces in HTML: https://bugzilla.gnome.org/show_bug.cgi?id=711670
First, namespaces are allowed in XML (or XHTML) only. HTML does not support namespaces.
Given that it is XHTML and the xmlns declaration is present in the snippet, then you can access elements by namespace using DOMDocument::getElementsByTagNameNS():
$html = <<<EOF
<div xmlns:my="http://www.example.com/" class="something-first">
<div class="something-child something-good another something-great">
<my:text value="huhu" />
</div>
</div>
EOF;
$domDoc = new DOMDocument();
$domDoc->loadXML($html);
var_dump(
// it is possible to use wildcard `*` here
$domDoc->getElementsByTagNameNS('http://www.example.com/', '*')
);
However, since the namespace declaration is commonly defined on the root element <html> rather than on sub-nodes, a snippet will usually arrive without it, and the code above will not work in most cases.
So part two of the solution would be to check whether the declaration is present and, if not, inject it. (working on this)
As I said, the code above works for XML / XHTML only. It is still open how to do that with HTML. (check the discussion below)
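One possible way to do the injection is to wrap the snippet in a throwaway root element that carries the declaration whenever the snippet has none. This is only a sketch under assumptions: the wrapper element name and the simple strpos() check are made up for illustration, and the snippet still has to be well-formed XML (so my:text must be closed).

```php
<?php
$ns = 'http://www.example.com/';

// Snippet without its own xmlns:my declaration, but otherwise well-formed XML.
$snippet = '<div class="something-first">'
         . '<div class="something-child"><my:text value="huhu" /></div>'
         . '</div>';

// Inject the declaration only when the snippet does not already carry one.
// "wrapper" is an arbitrary element name chosen for this sketch.
if (strpos($snippet, 'xmlns:my=') === false) {
    $snippet = '<wrapper xmlns:my="' . $ns . '">' . $snippet . '</wrapper>';
}

$domDoc = new DOMDocument();
$domDoc->loadXML($snippet);

// Every element in the my: namespace, regardless of local name.
$found = $domDoc->getElementsByTagNameNS($ns, '*');
echo $found->length, "\n";
```

This does not help with markup that is only valid as HTML; for that case the discussion above still applies.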
Technically it's neither valid XML nor HTML (nor XHTML), because HTML does not allow namespaced elements, while well-formed XML requires that empty elements be self-closing and that the namespace prefix be declared. So you're basically asking "how can I have DOMDocument treat this invalid HTML as valid XML even though it's not valid XML either?", which is going to prove difficult, and one might ask why libxml should be updated to allow for this. If I update your snippet to:
$html = <<<XML
<div xmlns:my="http://www.example.com/" class="something-first">
<div class="something-child something-good another something-great">
<my:text value="huhu" />
</div>
</div>
XML;
adding in the NS registration and closing the my:text, it works just fine with:
$domDoc = new DOMDocument();
$domDoc->loadXML($html);
echo $domDoc->saveXML();
Notice that the namespace is not stripped out here. In your original snippet it is stripped out, as I understand it, because the markup is not valid XML or HTML. XPath can't query by the namespace because the namespace was never declared via xmlns and was therefore dropped.
So I guess the question is: Why are you petitioning for invalid XML support rather than adding that closing slash? Is it because the data is from an external source or because in some context the empty non-closing tag is valid?
I need to add classes to the navigation HTML being output from a function in a custom CMS.
The only way I can get the output I need is to parse the HTML with PHP.
I am using PHP's DOM methods to look through the HTML and add a class to any <li> element that contains a child <ul> (top level navigation items).
So far it's working, but I have 2 questions:
1. Is there a more efficient way for me to go through this DOM data? It seems cumbersome to me, but that could just be my lack of experience.
2. In some cases, my <li> elements may already have a class. How can I add to the existing class attribute without destroying what may or may not already be there?
<?php
$mcms_nav = getContent(
// call to cms that returns navigation html as a string
// ex. <ul id="pnav"><li>home</li>....</ul>
);
$dom = new DOMDocument();
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($mcms_nav);
$x = new DOMXPath($dom);
foreach($x->query('//ul/li/ul') as $node)
{
$parent = $node->parentNode;
$parent_attr = $dom->createAttribute('class');
$parent_attr->value = 'has-flyout';
$parent->appendChild($parent_attr);
$flyout_attr = $dom->createAttribute('class');
$flyout_attr->value = 'flyout';
$node->appendChild($flyout_attr);
}
$mcms_nav = $dom->getElementById('pnav');
echo $dom->saveHTML($mcms_nav);
?>
Not really. You could take the XML class from the CakePHP framework, turn this into an array, manipulate the array, and turn it back. Not sure if that's an option in your case. http://book.cakephp.org/2.0/en/core-utility-libraries/xml.html
You can call hasAttribute() and getAttribute() on the element node (DOMElement) to read the existing class attribute, if there is one, and then write the combined value back with setAttribute().
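A minimal sketch of that idea, applied to the navigation case. The addClass() helper is hypothetical (it is not part of the DOM extension), and the sample markup only stands in for the CMS output.

```php
<?php
// Hypothetical helper: merge $class into whatever class list already exists.
function addClass(DOMElement $el, string $class): void
{
    $classes = $el->hasAttribute('class')
        ? preg_split('/\s+/', $el->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY)
        : [];
    if (!in_array($class, $classes, true)) {
        $classes[] = $class;
    }
    $el->setAttribute('class', implode(' ', $classes));
}

// Sample navigation standing in for the string returned by the CMS.
$nav = '<ul id="pnav"><li class="current">home</li>'
     . '<li class="top">about<ul><li>team</li></ul></li></ul>';

$dom = new DOMDocument();
$dom->loadHTML($nav);
$xpath = new DOMXPath($dom);

// li[ul] selects every <li> that has a <ul> child, so no parentNode walk is needed.
foreach ($xpath->query('//li[ul]') as $li) {
    addClass($li, 'has-flyout');
}
// The nested lists themselves get the flyout class.
foreach ($xpath->query('//li/ul') as $ul) {
    addClass($ul, 'flyout');
}

$out = $dom->saveHTML($dom->getElementById('pnav'));
echo $out, "\n";
```

Selecting with //li[ul] instead of //ul/li/ul is a small simplification of the query in the question; for a well-formed nav list the two should pick out the same items.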
Also, a new job wouldn't hurt ;)
How can I (with PHP) remove the style attributes from DIVs that have a certain class?
Because of a drag-and-drop process, some DIV elements get polluted with unnecessary styles, which can lead to problems later on.
I know I can remove the style attributes with JavaScript after the drag-and-drop process, but I only want to remove them when the HTML is being processed by the server (for sending the HTML as an e-mail).
This isn't a particularly difficult problem, so far as I can tell. You need to load the HTML into a DOMDocument structure, then use a simple XPath attribute selector to find the relevant elements and DOMElement::removeAttribute to remove the style attribute. Your code might look like this:
$dom = new DOMDocument;
$dom->loadHTML($dirtyHtml);
$xpath = new DOMXPath($dom);
// @class="someclass" matches only when the attribute is exactly "someclass"
$divs = $xpath->query('//div[@class="someclass"]');
foreach ($divs as $div) {
    $div->removeAttribute('style');
}
$cleanHtml = $dom->saveHTML();
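If the DIVs can carry several classes, an exact match on the class attribute will miss them. A common XPath 1.0 workaround is to pad the attribute with spaces and test for the padded token, sketched here with made-up sample markup (someclass is the placeholder class name from above).

```php
<?php
// Sample input: only the first div carries "someclass", next to another class.
$dirtyHtml = '<div class="item someclass" style="left:10px">a</div>'
           . '<div class="other" style="top:5px">b</div>';

$dom = new DOMDocument();
$dom->loadHTML($dirtyHtml);
$xpath = new DOMXPath($dom);

// Match "someclass" as a whole token anywhere in the class list.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " someclass ")]';
foreach ($xpath->query($query) as $div) {
    $div->removeAttribute('style');
}

$cleanHtml = $dom->saveHTML();
echo $cleanHtml;
```

Unlike a plain contains(@class, "someclass"), this does not match longer names such as someclass-wide.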
I need to work with namespaced XML tags that appear in my generated output and process them. For instance:
<system:include file="./test.php" cache="true" />
That tag would be part of the final output, but the special tags (like system:include) have to be processed before the content is sent to the client.
So I will go through all elements of the final output looking for namespaced or otherwise special tags. The problem is that if I use DOMDocument and read the content as XML, I run into problems with the namespace declaration (Namespace prefix system on include is not defined in Entity).
My test code is:
<?php
$document = new DOMDocument();
$document->loadXML('
<system:include file="./test.php" cache="true" />
');
foreach($document->childNodes as $node) {
var_dump($node->nodeName);
}
?>
I need this because I have to process some special tags and convert them to real HTML. For instance: convert <b> to <strong> (just an example!), or do something more useful, like including and caching a specific page via tags.
Another example:
<h7>Hello World!</h7>
Converts to:
<div class="h7">Hello World!</div>
Note: the output buffer (ob) contents will be sent to a specific method that searches for these special tags. So I don't know if I can add the namespace declarations beforehand (it would probably be hard and slow).
Bye!
I can get it to work if I specify a root element in the XML, and then declare the system namespace inside the root element. <root xmlns:system="system">...</root>
<?php
// recursively print the name of every node below $root
function dump($root) {
    foreach($root->childNodes as $node) {
        echo $node->nodeName;
        echo "\n";
        dump($node);
    }
}
$doc = new DOMDocument();
$doc->loadXML('<root xmlns:system="system"><system:include file="./test.php" cache="true" /></root>');
dump($doc);
?>
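Building on the wrapper idea, here is a rough sketch of how the special tags could then be found and replaced. The namespace URI and the comment used as a replacement are placeholders I made up; the real include and cache logic would go where the comment node is created.

```php
<?php
// Sample output fragment; in practice this would be the buffered page content.
$fragment = '<p>before</p><system:include file="./test.php" cache="true" /><p>after</p>';

// Assumed namespace URI: any URI works as long as both uses below agree.
$ns = 'urn:example:system';

$doc = new DOMDocument();
$doc->loadXML('<root xmlns:system="' . $ns . '">' . $fragment . '</root>');

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('system', $ns);

foreach ($xpath->query('//system:include') as $include) {
    // Stand-in for the real include logic: leave a comment naming the file.
    $replacement = $doc->createComment(' include ' . $include->getAttribute('file') . ' ');
    $include->parentNode->replaceChild($replacement, $include);
}

// Serialize only the children of the wrapper so <root> never reaches the client.
$out = '';
foreach ($doc->documentElement->childNodes as $child) {
    $out .= $doc->saveXML($child);
}
echo $out, "\n";
```

Note that this only works when the rest of the output is well-formed XML; ordinary HTML with unclosed tags would make loadXML() fail.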