PHP XPath how to wrap contents of p's in a span

I don't know if you read JavaScript/jQuery, but this is what I'd like to do on the server side instead of the client side: $('p').wrapInner('<span class="contentsInP" />'); I'd like to take all existing paragraphs from a page and wrap their contents in a new span with a specific class.
Luckily all my documents are HTML5 in its XML flavour and are valid so that in PHP I can do this (simplified):
$xml=new DOMDocument();
$xml->loadXML($html);
$xpath = new DOMXPath($xml);
// How to go on in here to wrap my p's?
$output=$xml->saveXML();
How do I get PHP's DOMXPath to do my wrapping?
EDIT: Fiddled with this based on the comment but couldn't make it work
// based on http://stackoverflow.com/questions/8426391/wrap-all-images-with-a-div-using-domdocument
$xml=new DOMDocument();
$xml->loadXML(utf8_encode($temp));
$xpath = new DOMXPath($xml);
//Create new wrapper span
$new_span = $xml->createElement('span');
$new_span->setAttribute('class','contentsInP');
//Find all p
$ps = $xml->getElementsByTagName('p');
//Iterate through p
foreach ($ps AS $p) {
//Clone our created span
$new_span_clone = $new_span->cloneNode();
//Replace p with this wrapper span
$p->parentNode->replaceChild($new_span_clone,$p);
//Append the p's contents to wrapper span
// THIS IS THE PROBLEM RIGHT NOW:
$new_span_clone->appendChild($p);
}
$temp=$xml->saveXML();
The above wraps each p in a span, but I need the span to wrap the p's contents, with the p staying around the span. Furthermore, the above fails if the p has a class: then it isn't touched at all.

To adapt that other answer, the main change is to move the child nodes of each <p> into the <span>: collect all child nodes of the <p>, remove each one from the <p>, and append it to the <span>. Finally, append the <span> as the only child of the <p>.
$html = <<<HTML
<!DOCTYPE html>
<html>
<head><title>xyz</title></head>
<body>
<div>
<p><a>inner 1</a></p>
<p><a>inner 2</a><div>stuff</div><div>more stuff</div></p>
</div>
</body>
</html>
HTML;
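$xml = new DOMDocument();
// utf8_encode() is deprecated as of PHP 8.2 and only matters for ISO-8859-1 input;
// the sample above is plain ASCII, so it is loaded directly
$xml->loadXML($html);
// Create the wrapper span once; it is cloned for each <p>
$new_span = $xml->createElement('span');
$new_span->setAttribute('class','contentsInP');
// Find all p
$ps = $xml->getElementsByTagName('p');
// Iterate through the p elements
foreach ($ps as $p) {
// Clone our created span (a shallow clone keeps its attributes)
$new_span_clone = $new_span->cloneNode();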
// Get an array of child nodes from the <p>
// (because the foreach won't work properly over a live nodelist)
$children = array();
foreach ($p->childNodes as $child) {
$children[] = $child;
}
// Loop over that list of child nodes..
foreach ($children as $child) {
// Remove the child from the <p>
$p->removeChild($child);
// Append it to the span
$new_span_clone->appendChild($child);
}
// Lastly, append the <span> as a child to the <p>
$p->appendChild($new_span_clone);
}
$temp=$xml->saveXML();
Given the input HTML fragment, this should produce output like:
<!DOCTYPE html>
<html>
<head><title>xyz</title></head>
<body>
<div>
<p><span class="contentsInP"><a>inner 1</a></span></p>
<p><span class="contentsInP"><a>inner 2</a><div>stuff</div><div>more stuff</div></span></p>
</div>
</body>
</html>
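Since the question asks about DOMXPath specifically, here is a minimal sketch of the same wrapping done through an XPath query. The sample markup is made up for illustration. It assumes the document declares no default namespace; if your XHTML carries xmlns="http://www.w3.org/1999/xhtml", the expression //p matches nothing until you register a prefix with registerNamespace() and query for that prefix instead.

```php
<?php
// Hypothetical sample input; any well-formed XML string without a default namespace will do.
$html = '<html><body><p class="intro">Hello <b>world</b></p><p>Bye</p></body></html>';

$xml = new DOMDocument();
$xml->loadXML($html);
$xpath = new DOMXPath($xml);

// query() returns a snapshot, not a live list, so the tree can be changed while looping.
foreach ($xpath->query('//p') as $p) {
    $span = $xml->createElement('span');
    $span->setAttribute('class', 'contentsInP');

    // Appending a node that is already in the tree moves it,
    // so this drains the <p> one child at a time.
    while ($p->firstChild !== null) {
        $span->appendChild($p->firstChild);
    }

    // The <p> is now empty; the span becomes its only child.
    $p->appendChild($span);
}

// Serialise only the root element (no XML declaration).
$output = $xml->saveXML($xml->documentElement);
echo $output, PHP_EOL;
```

Because every p is selected regardless of its attributes, paragraphs that carry a class are wrapped as well.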

Related

Fetch nested tags in php using simplehtmldom

Let's say I have this code. I want to fetch the text of all p tags from nested div tags. There can be up to 15 nested div tags, so I want to write a script that digs through all the divs and returns the p tag data.
<div>
<div>
<div>
<p>Hi</p>
</div>
<p>Hello</p>
</div>
<p>Hey</p>
</div>
Required output (any order):
Hi
Hello
Hey
I have attempted the following:
function divDigger($div)
{
$internalP = $div->getElementsByTagName('p');
echo $internalP->innertext;
$internalDiv = $div->getElementsByTagName('div');
if (count($internalDiv) > 0) {
foreach ($internalDiv as $div) {
divDigger($div);
}
}
}
You may use the XPath API for this:
$doc = new \DOMDocument();
$doc->loadHTML($yourHtml);
$xpath = new \DOMXPath($doc);
foreach ($xpath->query('//div//p') as $pWithinDiv) {
echo $pWithinDiv->textContent, PHP_EOL;
}
This finds every <p> element that is a descendant of a <div>, at any depth, and prints its text content. To match only direct children, change the expression to //div/p.
Demo: https://3v4l.org/43QqX
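For comparison with the recursive attempt in the question, here is a small sketch that uses no recursion at all. It relies on getElementsByTagName() returning matching descendants at every depth, in document order. The $yourHtml value is just the sample from the question, and unlike the XPath expression above this collects every p, whether or not it sits inside a div.

```php
<?php
// Sample markup from the question, collapsed onto one line.
$yourHtml = '<div><div><div><p>Hi</p></div><p>Hello</p></div><p>Hey</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($yourHtml);

// getElementsByTagName() already walks the whole subtree,
// so there is no need to descend into each div by hand.
$texts = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $texts[] = $p->textContent;
}

echo implode(PHP_EOL, $texts), PHP_EOL;
```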

appendXML stripping out img element

I need to insert an image with a div element in the middle of an article. The page is generated using PHP from a CRM. I have a routine to count the characters for all the paragraph tags, and insert the HTML after the paragraph that has the 120th character. I am using appendXML and it works, until I try to insert an image element.
When I put the <img> element in, it is stripped out. I understand it is looking for XML, however, I am closing the <img> tag which I understood would help.
Is there a way to use appendXML and not strip out the img elements?
$mcustomHTML = '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></img></div>';
$doc = new DOMDocument();
$doc->loadHTML('<?xml encoding="utf-8" ?>' . $content);
// read all <p> tags and count the text until reach character 120
// then add the custom html into current node
$pTags = $doc->getElementsByTagName('p');
foreach($pTags as $tag) {
$characterCounter += strlen($tag->nodeValue);
if($characterCounter > 120) {
// this is the desired node, so put html code here
$template = $doc->createDocumentFragment();
$template->appendXML($mcustomHTML);
$tag->appendChild($template);
break;
}
}
return $doc->saveHTML();
This should work for you. It uses a temporary DOM document to convert the HTML string that you have into something workable. Then we import the contents of the temporary document into the main one. Once it's imported we can simply append it like any other node.
<?php
$mcustomHTML = '<div style="position:relative; overflow:hidden;"><img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" /></div>';
$customDoc = new DOMDocument();
$customDoc->loadHTML($mcustomHTML, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$doc = new DOMDocument();
$doc->loadHTML($content);
$customImport = $doc->importNode($customDoc->documentElement, true);
// read all <p> tags and count the text until reach character 120
// then add the custom html into current node
$characterCounter = 0; // running total; must be initialised before the loop
$pTags = $doc->getElementsByTagName('p');
foreach($pTags as $tag) {
$characterCounter += strlen($tag->nodeValue);
if($characterCounter > 120) {
// this is the desired node, so put html code here
$tag->appendChild($customImport);
break;
}
}
return $doc->saveHTML();
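If you would rather stay with appendXML(), keep in mind that it parses its argument as XML, so the fragment has to be well-formed. The following is only a sketch under that assumption: the img is written as a single self-closed tag with no extra closing tag after it, and $content is a made-up one-paragraph document.

```php
<?php
// Hypothetical page content; the real one comes from the CRM.
$content = '<p>Some paragraph text.</p>';

// Well-formed XML: the <img ... /> is self-closed and has no separate end tag.
$mcustomHTML = '<div style="position:relative; overflow:hidden;">'
    . '<img src="https://s3.amazonaws.com/a.example.com/image.png" alt="No image" />'
    . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($content);

$template = $doc->createDocumentFragment();
// appendXML() returns false when the string cannot be parsed as XML.
$ok = $template->appendXML($mcustomHTML);

if ($ok) {
    $doc->getElementsByTagName('p')->item(0)->appendChild($template);
}

$result = $doc->saveHTML();
echo $result;
```

Checking the return value of appendXML() makes a parse failure visible instead of silently appending an empty fragment.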

How to find a h3 tag with a certain value

Well, I have an HTML file with the following structure:
<h3>Heading 1</h3>
<table>
<!-- contains a <thead> and <tbody> which also contain several columns/lines -->
</table>
<h3>Heading 2</h3>
<table>
<!-- contains a <thead> and <tbody> which also contain several columns/lines -->
</table>
</table>
I want to get JUST the first table with all its content. So I'll load the HTML File
<?php
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents('http://www.example.com'));
libxml_clear_errors();
?>
All tables share the same classes and have no IDs, so the only way I could think of was to grab the h3 tag with the value "Heading 1". I already found one approach that works for me, but it becomes fragile if other tables and captions are added later.
How could I grab the h3 tag with the value "Heading 1"? And how could I select the table that follows it?
EDIT#1: I don't have access to the HTML File, so I can't edit it.
EDIT#2: My Solution (thanks to Martin Henriksen) for now is:
<?php
$doc = new DOMDocument('1.0');
libxml_use_internal_errors(true);
$doc->loadHTML(file_get_contents('http://example.com'));
libxml_clear_errors();
foreach($doc->getElementsByTagName('h3') as $element){
// Braces are needed so that everything below only runs for the matching h3
if($element->nodeValue == 'exampleString') {
// Assumes a whitespace text node sits between the h3 and the table
$table = $element->nextSibling->nextSibling;
$innerHTML= '';
$children = $table->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveXML( $child );
}
echo $innerHTML;
file_put_contents("test.xml", $innerHTML);
}
}
?>
You can find any tag in HTML using the simple_html_dom.php class. You can download the file from https://sourceforge.net/projects/simplehtmldom/?source=typ_redirect
Then:
<?php
include_once('simple_html_dom.php');
$htm = "**YOUR HTML CODE**";
$html = str_get_html($htm);
$h3_tag = $html->find("h3",0)->innertext; // the selector is the tag name, without angle brackets
echo "HTML code in h3 tag";
print_r($h3_tag);
?>
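You can fetch all the DOMElements with the tag h3 and check each one's value through nodeValue. Once you have found the h3 tag, you can reach the next node in the DOM tree through nextSibling. Note that nextSibling may be a whitespace text node rather than the table, so skip ahead until you reach an element.
foreach($dom->getElementsByTagName('h3') as $element)
{
if($element->nodeValue == 'Heading 1') {
$table = $element->nextSibling;
// skip whitespace text nodes between the h3 and the table
while ($table !== null && $table->nodeType !== XML_ELEMENT_NODE) {
$table = $table->nextSibling;
}
}
}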
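The same lookup can also be written as a single XPath expression, which avoids counting siblings by hand. This is a sketch with made-up table contents: normalize-space() trims the heading text before comparing, and following-sibling::table[1] picks the nearest table after the matching h3.

```php
<?php
// Hypothetical markup with the same shape as in the question.
$html = '<h3>Heading 1</h3>
<table><tbody><tr><td>first</td></tr></tbody></table>
<h3>Heading 2</h3>
<table><tbody><tr><td>second</td></tr></tbody></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Nearest <table> sibling that follows the h3 whose text is "Heading 1".
$query = "//h3[normalize-space() = 'Heading 1']/following-sibling::table[1]";
$table = $xpath->query($query)->item(0);

// item(0) is null when nothing matched.
$tableHtml = $table !== null ? $dom->saveHTML($table) : '';
echo $tableHtml, PHP_EOL;
```

Whitespace text nodes between the h3 and the table do not matter here, because the following-sibling::table step only considers table elements.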

How to use DOMDocument insertBefore

I have a div and I'm trying to insert a couple of elements (h3 and p) into the div ahead of the existing h3 and p elements already living inside the div. The PHP documentation for insertBefore (http://www.php.net/manual/en/domnode.insertbefore.php) says this is exactly what should happen, but instead of inserting ahead of the existing elements, it's replacing all existing elements inside my 'content' div.
Here's my code:
$webpage = new DOMDocument();
$webpage->loadHTMLFile("news.html");
$headerelement = $webpage->createElement('h3', $posttitle);
$pelement = $webpage->createElement('p', $bodytext);
$webpage->formatOutput = true;
$webpage->getElementById('content')->insertBefore($headerelement);
$webpage->getElementById('content')->insertBefore($pelement);
$webpage->saveHTMLFile("newpost.html");
I'm sure I'm just not understanding something... any help would be appreciated, thanks.
It's because you're not specifying a reference node that the inserted node should be inserted before. Think of it like this:
$whatTheElementIsInsertedInto->insertBefore($theElement, $whatItIsInsertedBefore)
$dom = new DOMDocument();
$dom->loadHTML('
<html><head></head>
<body>
<div id="content">
<h3>Original h3</h3>
</div>
</body>
</html>
');
//find the "content" div
$content = $dom->getElementById('content');
//find the first h3 tag in "content"
$origH3 = $content->getElementsByTagName('h3')->item(0);
//create a new h3
$newH3 = $dom->createElement('h3', 'new h3!');
//insert the new h3 before the original h3 of "content"
$content->insertBefore($newH3, $origH3);
echo $dom->saveHTML();
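Applied to the original question, where both a new h3 and a new p should land ahead of everything already in the div, a sketch could look like this. The markup and the values of $posttitle and $bodytext are made up for illustration. Using the div's current first child as the reference node for both calls keeps the new h3 ahead of the new p.

```php
<?php
// Hypothetical values standing in for the real post data.
$posttitle = 'New title';
$bodytext = 'New body';

$webpage = new DOMDocument();
$webpage->loadHTML(
    '<html><head></head><body>'
    . '<div id="content"><h3>Old title</h3><p>Old text</p></div>'
    . '</body></html>'
);

$content = $webpage->getElementById('content');

// Remember what is currently first; both new nodes go in front of it.
$first = $content->firstChild;

// Text nodes are used for the values so that special characters are escaped on output.
$headerelement = $webpage->createElement('h3');
$headerelement->appendChild($webpage->createTextNode($posttitle));
$pelement = $webpage->createElement('p');
$pelement->appendChild($webpage->createTextNode($bodytext));

$content->insertBefore($headerelement, $first);
$content->insertBefore($pelement, $first);

$result = $webpage->saveHTML($content);
echo $result, PHP_EOL;
```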

Rewriting HTML tags with DOM/Xpath (PHP)

I'm parsing a block of HTML with DOM/Xpath in PHP. Within this HTML, there are a few p tags that I want to convert to h4 tags, instead.
Raw HTML =>
<p class="archive">Awesome line of text</p>
Desired HTML =>
<h4>Awesome line of text</h4>
How can I do this with Xpath? I think I need to call on appendChild, but I'm not sure. Thank you for any guidance.
Something along these lines should do it:
<?php
$html = <<<END
<html>
<head>
<title>Test</title>
</head>
<body>
<p>hi</p>
<p class="archive">Awesome line of text</p>
<p>bye</p>
<p class="archive">Another line of <b>text</b></p>
<p>welcome</p>
<p class="archive">Another <u>line</u> of <b>text</b></p>
</body>
</html>
END;
$doc = new DOMDocument();
$doc->loadXML($html);
$xpath = new DOMXPath($doc);
// Find the nodes we want to change
$nodes = $xpath->query("//p[@class = 'archive']");
foreach ($nodes as $node) {
// Create a new H4 node
$h4 = $doc->createElement('h4');
// Move the children of the current node to the new one
while ($node->hasChildNodes())
$h4->appendChild($node->firstChild);
// Replace the current node with the new
$node->parentNode->replaceChild($h4, $node);
}
echo $doc->saveXML();
?>
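If you need this in more than one place, the loop body can be pulled into a small helper. The renameElement() function below is a hypothetical name, not part of the DOM extension; it is only a sketch of the same move-children-then-replace idea, with an optional flag for copying the attributes across (off by default, since the desired h4 has no class).

```php
<?php
// Hypothetical helper: replaces $node with a new element called $newName.
function renameElement(DOMElement $node, string $newName, bool $keepAttributes = false): DOMElement
{
    $new = $node->ownerDocument->createElement($newName);

    if ($keepAttributes) {
        foreach ($node->attributes as $attr) {
            $new->setAttribute($attr->nodeName, $attr->nodeValue);
        }
    }

    // Appending an attached node moves it, so this empties $node.
    while ($node->firstChild !== null) {
        $new->appendChild($node->firstChild);
    }

    $node->parentNode->replaceChild($new, $node);
    return $new;
}

$doc = new DOMDocument();
$doc->loadXML('<body><p>hi</p><p class="archive">Another <u>line</u> of <b>text</b></p></body>');
$xpath = new DOMXPath($doc);

foreach ($xpath->query("//p[@class = 'archive']") as $node) {
    renameElement($node, 'h4');
}

$result = $doc->saveXML($doc->documentElement);
echo $result, PHP_EOL;
```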
