PHP DOM function to create a div with an ID from HTML5 elements

I am using the following function to replace HTML5 elements with div elements:
<?php
function nonHTML5($content) {
    $dom = new DOMDocument;
    // Hide errors about HTML5 elements that libxml does not recognize
    libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();
    $xp = new DOMXPath($dom);
    // Select the elements to replace, skipping any inside <pre> or <code>
    $elements = $xp->query('//*[self::header | self::footer]
        [not(ancestor::pre) and not(ancestor::code)]');
    // Loop through them
    foreach ($elements as $element) {
        // Create the replacement 'div' tag
        $newElement = $dom->createElement('div');
        while ($element->childNodes->length) {
            // Move the child nodes over, one at a time
            $childElement = $element->childNodes->item(0);
            $newElement->appendChild($dom->importNode($childElement, true));
        }
        while ($element->attributes->length) {
            // Move the attributes over, one at a time
            $attributeNode = $element->attributes->item(0);
            $newElement->setAttributeNode($dom->importNode($attributeNode));
        }
        $element->parentNode->replaceChild($newElement, $element);
    }
    $content = $dom->saveXML($dom->documentElement);
    return $content;
}
?>
I know we can use HTML5 Shiv, but I want to do this primarily for old browsers with JavaScript disabled.
My Challenge:
I am not able to add an id="" attribute to it. For example:
<header>
<h1>I am the header</h1>
</header>
Should become
<div id="header">
<h1>I am the header</h1>
</div>
I tried:
$newElement = $dom->createElement('div id ="' . $element . '"');
but it did not work.
My question
What should be the correct code?
Please note: I am not a PHP expert, so please be a little descriptive in your answers/comments.

Here is how you can do it:
NOTE: I have added comments to clarify exactly what happens in each statement of the code.
CREATING AN HTML ELEMENT WITH AN ATTRIBUTE USING DOM:
<?php
// Initiate a new DOMDocument
$dom = new DOMDocument();
// Create an element
$div = $dom->createElement("div", "HERE DIV CONTENTS");
// Create an attribute, i.e. id
$divAttr = $dom->createAttribute('id');
// Assign a value to the attribute, i.e. id="value"
// (an id must not contain spaces, so use a single token)
$divAttr->value = 'header';
// Add the attribute (id) to the element (div)
$div->setAttributeNode($divAttr);
// Shorter equivalent of the three attribute lines above:
// $div->setAttribute('id', 'header');
// Add the element (div) to the DOM
$dom->appendChild($div);
// Print the DOM
echo $dom->saveHTML();
?>
CODE OUTPUT:
<div id="header">HERE DIV CONTENTS</div>
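The attempt `createElement('div id ="' . $element . '"')` cannot work for two reasons. createElement() accepts only a tag name, and spaces and quotes are not allowed in one. Also, `$element` is a DOMElement object, not a string. The tag name is available as `$element->nodeName`, and the id is set afterwards with setAttribute().

Below is a minimal sketch of the question's function with that step added. It is not tested against real pages and rests on two assumptions. First, each header/footer appears only once per page, so the tag name is a unique id. Second, an existing id on the element should be kept. It also moves the children directly instead of going through importNode():

```php
<?php
// Sketch: turn <header>/<footer> into <div id="header">/<div id="footer">.
// Assumption: each of these elements appears at most once per page, so the
// tag name is a unique id; change the id scheme if that is not the case.
function nonHTML5($content)
{
    $dom = new DOMDocument();
    // Older libxml versions do not know HTML5 tags; collect the warnings quietly
    libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $elements = $xp->query(
        '//*[self::header or self::footer][not(ancestor::pre) and not(ancestor::code)]'
    );

    // XPath results are a snapshot, so replacing nodes inside the loop is safe
    foreach ($elements as $element) {
        $newElement = $dom->createElement('div');

        // Move (not copy) every child node into the new div
        while ($element->firstChild) {
            $newElement->appendChild($element->firstChild);
        }
        // Copy the original attributes (class, etc.)
        foreach ($element->attributes as $attr) {
            $newElement->setAttribute($attr->nodeName, $attr->nodeValue);
        }
        // The missing step: use the tag name as the id, unless one already exists
        if (!$newElement->hasAttribute('id')) {
            $newElement->setAttribute('id', $element->nodeName);
        }

        $element->parentNode->replaceChild($newElement, $element);
    }
    return $dom->saveXML($dom->documentElement);
}

echo nonHTML5('<header><h1>I am the header</h1></header>');
```

If a page can contain several header elements (for example one per article), the id would need a counter or some other unique suffix.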

Related

PHP Using DOMXPath to strip tags and remove nodes

I am trying to work with DOMDocument but I am encountering some problems. I have a string like this:
Some Content to keep
<span class="ice-cts-1 ice-del" data-changedata="" data-cid="5" data-time="1414514760583" data-userid="1" data-username="Site Administrator" undefined="Site Administrator">
This content should remain, but span around it should be stripped
</span>
Keep this content too
<span>
<span class="ice-cts-1 ice-ins" data-changedata="" data-cid="2" data-time="1414512278297" data-userid="1" data-username="Site Administrator" undefined="Site Administrator">
This whole node should be deleted
</span>
</span>
What I want to do is, if the span has a class like ice-del keep the inner content but remove the span tags. If it has ice-ins, remove the whole node.
If it is just an empty span <span></span> remove it as well. This is the code I have:
// this gets the string shown above
$getVal = $array['body'][0][$a];
$dom = new DOMDocument;
$dom->loadHTML($getVal);
$xPath = new DOMXPath($dom);
$delNodes = $xPath->query('//span[@class="ice-cts-1 ice-del"]');
$insNodes = $xPath->query('//span[@class="ice-cts-1 ice-ins"]');
foreach ($insNodes as $span) {
    // reject these changes, so remove the whole node
    $span->parentNode->removeChild($span);
}
foreach ($delNodes as $span) {
    // accept these changes, so just strip out the tags but keep the content
}
$newString = $dom->saveHTML();
So, my code works to delete the entire span node, but how do I take a node, strip out its tags, and keep its content?
Also, how would I delete just an empty span? I'm sure I could do this using regex or replace, but I want to do it using the DOM.
Thanks
No, I wouldn't recommend regex. I strongly recommend building on what you have right now with this HTML parser. You could use ->replaceChild in this case:
$dom = new DOMDocument;
$dom->loadHTML($getVal);
$xPath = new DOMXPath($dom);
$spans = $xPath->query('//span');
foreach ($spans as $span) {
    $class = $xPath->evaluate('string(./@class)', $span);
    if (strpos($class, 'ice-ins') !== false || $class == '') {
        $span->parentNode->removeChild($span);
    } elseif (strpos($class, 'ice-del') !== false) {
        $span->parentNode->replaceChild(new DOMText($span->nodeValue), $span);
    }
}
$newString = $dom->saveHTML();
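The answer above has two side effects. It replaces an ice-del span with a single text node built from nodeValue, which flattens any markup inside it. It also removes every span with an empty class, even one that still has content.

If inner markup must survive and only genuinely empty spans should go, one possible variation works differently. It moves the span's children in front of the span before removing it, and it matches class tokens with the common `contains(concat(" ", normalize-space(@class), " "), ...)` idiom, so class order doesn't matter. This is a sketch: the function name `cleanIceSpans` and the shortened sample string are hypothetical.

```php
<?php
// Sketch (hypothetical helper name): apply the ice-* change markers.
//  - span.ice-ins: remove the span and everything in it
//  - span.ice-del: remove the span tags but keep its children (markup included)
//  - spans with no element children and only whitespace text: remove
function cleanIceSpans($html)
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xPath = new DOMXPath($dom);

    // XPath that matches a class token regardless of its position in @class
    $hasClass = function ($name) {
        return '//span[contains(concat(" ", normalize-space(@class), " "), " ' . $name . ' ")]';
    };

    foreach ($xPath->query($hasClass('ice-ins')) as $span) {
        $span->parentNode->removeChild($span);
    }

    foreach ($xPath->query($hasClass('ice-del')) as $span) {
        // Move each child in front of the span, then drop the now-empty span
        while ($span->firstChild) {
            $span->parentNode->insertBefore($span->firstChild, $span);
        }
        $span->parentNode->removeChild($span);
    }

    // Repeat so that wrapper spans emptied by earlier removals are caught too
    do {
        $empty = $xPath->query('//span[not(*) and normalize-space(.) = ""]');
        foreach ($empty as $span) {
            $span->parentNode->removeChild($span);
        }
    } while ($empty->length > 0);

    return $dom->saveHTML();
}

$sample = 'Some Content to keep <span class="ice-cts-1 ice-del">This content should remain</span> '
    . 'Keep this content too <span><span class="ice-cts-1 ice-ins">This whole node should be deleted</span></span>';
echo cleanIceSpans($sample);
```

Running the queries one after another matters here. The ice-del query only sees spans that are still in the document after the ice-ins removals.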
For a more generic solution that deletes any HTML tag from a DOM tree, use this:
$dom = new DOMDocument;
$dom->loadHTML($getVal);
$xPath = new DOMXPath($dom);
$tagName = $xPath->query('//table'); // use whatever tag you want, e.g. div, span etc.
foreach ($tagName as $t) {
    $t->parentNode->removeChild($t);
}
$newString = $dom->saveHTML();
Example HTML:
<html>
<head></head>
<body>
<table>
<tr><td>Hello world</td></tr>
</table>
</body>
</html>
Output after processing:
<html>
<head></head>
<body></body>
</html>

Retrieve data from an HTML page using XPath and PHP

I know there are similar questions, but while studying PHP I ran into this error and I want to understand why it occurs.
<?php
$url = 'http://aice.anie.it/quotazione-lme-rame/';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTML($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tbody/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}
?>
This prints only "hello!". I want to print the value extracted with XPath, but the last echo doesn't output anything.
You have several errors in your code:
You try to get the table from the URL http://aice.anie.it/quotazione-lme-rame/, but it's actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp, so load the iframe URL directly.
You use the function loadHTML(), which loads an HTML string. What you need is the loadHTMLFile() function, which takes the path or URL of an HTML document as a parameter (see http://www.php.net/manual/fr/domdocument.loadhtmlfile.php).
You assume there is a tbody element on the page, but there isn't one, so remove it from your query.
Working code:
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}
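The tbody point is worth illustrating. Browsers insert an implicit tbody into tables, so XPath expressions copied from browser developer tools often contain it. libxml's HTML parser, which DOMDocument uses, does not add one.

Putting `//` between the table and the row makes the expression work either way. The sketch below parses a small hypothetical inline table instead of fetching the remote page, and collects parser warnings with libxml_use_internal_errors() instead of silencing them with `@`. For the real page you would call loadHTMLFile() with the iframe URL in place of loadHTML($html).

```php
<?php
// Hypothetical stand-in for the remote page: a table WITHOUT a tbody,
// which is how libxml leaves such markup after parsing.
$html = <<<HTML
<table id="table33">
<tr><td>a</td><td>b</td><td>c</td></tr>
<tr><td>d</td><td>e</td><td><b>42</b></td></tr>
</table>
HTML;

$doc = new DOMDocument();
// Collect parser warnings instead of hiding them with @
libxml_use_internal_errors(true);
// For the real page: $doc->loadHTMLFile('http://www.aiceweb.it/it/frame_rame.asp')
if (!$doc->loadHTML($html)) {
    throw new RuntimeException('Could not parse the document');
}
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// "//" between the table and the row matches with or without a tbody in between
$nodes = $xpath->query("//*[@id='table33']//tr[2]/td[3]/b");
foreach ($nodes as $n) {
    echo trim($n->nodeValue), "\n";
}
```

One caveat: `//tr[2]` also matches second rows of tables nested inside table33, so a stricter path is preferable if the page nests tables.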

DOMDocument grab html between two p tags [duplicate]

I'm trying to replace video links inside a string - here's my code:
$doc = new DOMDocument();
$doc->loadHTML($content);
foreach ($doc->getElementsByTagName("a") as $link)
{
    $url = $link->getAttribute("href");
    if (strpos($url, ".flv") !== false)
    {
        echo $link->outerHTML();
    }
}
Unfortunately, outerHTML doesn't work when I try to get the HTML code for the full hyperlink, like <a href='http://www.myurl.com/video.flv'></a>.
Any ideas how to achieve this?
As of PHP 5.3.6 you can pass a node to saveHtml, e.g.
$domDocument->saveHtml($nodeToGetTheOuterHtmlFrom);
Earlier versions of PHP did not support this. You'd have to use saveXml() instead, but that produces XML-compliant markup. For an <a> element, that shouldn't be an issue, though.
See http://blog.gordon-oheim.biz/2011-03-17-The-DOM-Goodie-in-PHP-5.3.6/
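Here is a sketch of the question's loop rewritten this way. It assumes PHP 5.3.6 or later, and the sample markup is made up. It also compares strpos() with `!== false`, because strpos() returns 0 (which is falsy) for a match at the very start of the string.

```php
<?php
// Made-up sample markup: one .flv link and one ordinary link
$content = '<p><a href="http://www.myurl.com/video.flv">video</a> <a href="/about.html">about</a></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

$matches = array();
foreach ($doc->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    // strpos() returns 0 for a match at position 0, so test against false
    if (strpos($url, '.flv') !== false) {
        // Passing the node to saveHTML() serializes just that element: its outer HTML
        $matches[] = $doc->saveHTML($link);
    }
}
echo implode("\n", $matches), "\n";
```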
You can find a couple of suggestions in the user notes of the DOM section of the PHP Manual.
For example, here's one posted by xwisdom :
<?php
// code taken from the Raxan PDI framework
// returns the html content of an element
// (this is a class method; drop "protected" to use it as a plain function)
protected function nodeContent($n, $outer = false) {
    $d = new DOMDocument('1.0');
    $b = $d->importNode($n->cloneNode(true), true);
    $d->appendChild($b);
    $h = $d->saveHTML();
    // remove outer tags
    if (!$outer) $h = substr($h, strpos($h, '>') + 1, -(strlen($n->nodeName) + 4));
    return $h;
}
?>
Another option is to define your own function that returns the outer HTML:
function outerHTML($e) {
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($e, true));
return $doc->saveHTML();
}
Then you can use it in your code:
echo outerHTML($link);
Save the file containing the links as links.html, or change "links.html" in the code to another source, say google.com/fly.html, that has .flv links in it. Change "flv" to "wmv" etc. for whichever links you want the href from.
If there are other matching hrefs, it will pick them up as well.
<?php
$contents = file_get_contents("links.html");
$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = false;
$domdoc->loadHTML($contents);
$xpath = new DOMXPath($domdoc);
// select every href attribute in the document
$query = '//@href';
$nodeList = $xpath->query($query);
foreach ($nodeList as $node) {
    // strpos() can return 0, so compare with !== false
    if (strpos($node->nodeValue, ".flv") !== false) {
        $url = htmlspecialchars($node->nodeValue, ENT_QUOTES);
        echo "<a href='" . $url . "'>" . $url . "</a><br />";
    }
}
echo "done";
?>

Replace element using phpquery (php version of jquery)

I want to replace all <span> tags with <p> using phpquery. What is wrong with my code? It finds the span but the replaceWith function is not doing anything.
$event = phpQuery::newDocumentHTML(file_get_contents('event.html'));
$formatted_event = $event->find('span')->replaceWith('<p>');
This documentation indicates this is possible:
http://code.google.com/p/phpquery/wiki/Manipulation#Replacing
http://api.jquery.com/replaceWith/
This is the HTML that gets returned with and without ->replaceWith('<p></p>') in the code:
<span class="Subhead1">Event 1<br></span><span class="Subhead2">Event 2<br>
August 12, 2010<br>
2:35pm <br>
Free</span>
If you don't mind a plain DOMDocument solution (DOMDocument is used under the hood of phpQuery to parse the HTML fragments), I did something similar a while ago. I adapted the code to do what you need:
$document = new DOMDocument();
// load html from file
$document->loadHTMLFile('event.html');
// find all span elements in the document
$spanElements = $document->getElementsByTagName('span');
$spanElementsToReplace = array();
// use a temp array to store the span elements,
// as you can't iterate a live NodeList while replacing its nodes
foreach ($spanElements as $spanElement) {
    $spanElementsToReplace[] = $spanElement;
}
// create a p element, append the children of the span to the p element,
// then replace the span element with the p element
foreach ($spanElementsToReplace as $spanElement) {
    $p = $document->createElement('p');
    foreach ($spanElement->childNodes as $child) {
        $p->appendChild($child->cloneNode(true));
    }
    $spanElement->parentNode->replaceChild($p, $spanElement);
}
// print innerHTML of the body element
print DOMinnerHTML($document->getElementsByTagName('body')->item(0));
// --------------------------------
// Utility function to get the innerHTML of an element
// -> "stolen" from: http://www.php.net/manual/de/book.dom.php#89718
function DOMinnerHTML($element) {
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child) {
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}
Maybe this can point you in the right direction on how to do the replacement in phpQuery.
EDIT:
I took another look at the jQuery documentation for replaceWith, and it seems that you have to pass in the whole HTML fragment that you want as the new, replaced content.
This code snippet worked for me:
$event = phpQuery::newDocumentHTML(...);
// iterate over the spans
foreach($event->find('span') as $span) {
// make $span a phpQuery object, otherwise it's just a DOMElement object
$span = pq($span);
// fetch the innerHTML of the span, and replace the span with <p>
$span->replaceWith('<p>' . $span->html() . '</p>');
}
print (string) $event;
I couldn't find any way to do this with chained method calls in one line.
Won't str_replace do a better job at this? It's faster and easier to debug.
Always keep in mind that external libraries may have bugs :)
$htmlContent = str_replace("<span", "<p", $htmlContent);
$htmlContent = str_replace("</span>", "</p>", $htmlContent);
Try:
$formatted_event = $event->find('span')->replaceWith('<p></p>');
Did you try:
$formatted_event = $event->find('span')->replaceWith('p');
You could also use the wrap() function:
$event = phpQuery::newDocumentHTML(...);
// iterate over the spans
foreach($event->find('span') as $span) {
// make $span a phpQuery object, otherwise it's just a DOMElement object
$span = pq($span);
// wrap the span with <p>
$span->wrap('<p></p>');
}
print (string) $event;

PHP DOM problem: how to insert HTML code into a particular div

I am trying to replace the html code inside the div 'resultsContainer' with the html of $response.
The result of my unsuccessful code is that the contents of 'resultsContainer' remain and the html of $response shows up on screen as text rather than being parsed as html.
Finally, I would like to inject the content of $response inside 'resultsContainer' without creating any new div. I need this: <div id='resultsContainer'>Html inside $response here...</div> and NOT THIS: <div id='resultsContainer'><div>Html inside $response here...</div></div>
// Set Config
libxml_use_internal_errors(true);
$doc = new DomDocument();
$doc->strictErrorChecking = false;
$doc->validateOnParse = true;
// load the html page
$app = file_get_contents('index.php');
$doc->loadHTML($app);
// get the dynamic content
$response = file_get_contents('search.php'.$query);
$response = utf8_decode($response);
// add dynamic content to corresponding div
$node = $doc->createElement('div', $response);
$doc->getElementById('resultsContainer')->appendChild($node);
// echo html snapshot
echo $doc->saveHTML();
If $response is plain text:
// add dynamic content to corresponding div
$node = $doc->createTextNode($response);
$doc->getElementById('resultsContainer')->appendChild($node);
If it can contain HTML (one could use createDocumentFragment, but that creates its own set of problems with entities, DTDs, etc.):
// add dynamic content to corresponding div
$frag = new DomDocument();
$frag->strictErrorChecking = false;
$frag->validateOnParse = true;
$frag->loadHTML($response);
$target = $doc->getElementById('resultsContainer');
// remove the existing children, last to first
if (isset($target->childNodes) && $target->childNodes->length) {
    for ($i = $target->childNodes->length - 1; $i >= 0; $i--) {
        $target->removeChild($target->childNodes->item($i));
    }
}
// if there's lots of content in $target, you might try this instead
// (reassign $target so the appends below go to the new, empty clone):
// $empty = $target->cloneNode(false);
// $target->parentNode->replaceChild($empty, $target);
// $target = $empty;
foreach ($frag->getElementsByTagName('body')->item(0)->childNodes as $node) {
    $target->appendChild($doc->importNode($node, true));
}
Which goes to show that DOMDocument really isn't well suited (or is at least cumbersome) as a templating engine.
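If this has to be done in more than one place, the steps above can be wrapped in a helper. The sketch below is one way to do it, under a few assumptions:

- The name `replaceInnerHtml` and the wrapper id `__fragment_root` are hypothetical.
- The fragment is wrapped in a div with a known id, so its nodes can be found without relying on where libxml places them.
- The markup is ASCII or already in the target document's encoding. loadHTML() treats input as ISO-8859-1 unless a charset is declared.

```php
<?php
// Sketch: replace the children of the element with the given id by parsed HTML.
// Hypothetical names: replaceInnerHtml() and the "__fragment_root" wrapper id.
function replaceInnerHtml(DOMDocument $doc, $id, $html)
{
    $target = $doc->getElementById($id);
    if ($target === null) {
        throw new RuntimeException("No element with id '$id'");
    }
    // Remove the current contents
    while ($target->firstChild) {
        $target->removeChild($target->firstChild);
    }
    // Parse the fragment in its own document, inside a wrapper we can find again
    $frag = new DOMDocument();
    libxml_use_internal_errors(true);
    $frag->loadHTML('<div id="__fragment_root">' . $html . '</div>');
    libxml_clear_errors();
    $root = $frag->getElementById('__fragment_root');
    // importNode() copies each node into $doc so it can be appended there
    foreach ($root->childNodes as $node) {
        $target->appendChild($doc->importNode($node, true));
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="resultsContainer">old results</div></body></html>');
replaceInnerHtml($doc, 'resultsContainer', '<p>new <b>results</b></p>');
echo $doc->saveHTML();
```

The new content ends up directly inside the existing resultsContainer div, with no extra wrapper div, which is what the question asked for.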
