Need to get divs from string based on matching class


I have a variable $company_id = 8; and a block of HTML content stored as a string called all_content:
<div class="company-id-8">
Content One
</div>
<div class="company-id-9">
Content Two
</div>
<div class="company-id-8">
Content Three
</div>
<div class="company-id-3">
Content Four
</div>
I need to remove all of the divs from all_content that don't match the current company ID class. So, once filtered, the above html should become:
<div class="company-id-8">
Content One
</div>
<div class="company-id-8">
Content Three
</div>
I have the following code to filter out divs that don't belong to the current company:
$dom = new DomDocument();
$dom->loadHTML( $all_content );
$finder = new DomXPath($dom);
$classname = "company-id-" . $company_id;
$nodes = $finder->query("//div[contains(@class, '$classname')]");
foreach ( $nodes as $node ) {
$filtered_content .= ;
}
I can't seem to work out how to get my filtered div nodes back into the filtered_content string though?
How can I tidy this up and get it working?

Solution is to do the following:
$filtered_content = "";
foreach ( $nodes as $node ) {
$tmp_doc = new DOMDocument();
$tmp_doc->appendChild($tmp_doc->importNode($node,true));
$filtered_content .= $tmp_doc->saveHTML();
}
filtered_content ends up being a usable HTML string with the correct content.
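One thing to watch with contains(@class, ...): it is a substring test, so company-id-8 would also match a div with the class company-id-80. Below is a minimal sketch of one way around that, assuming a PHP version where saveHTML() accepts a node argument. The sample markup (including the company-id-80 div and the extra "box" class) is made up for illustration, and this is an alternative to the temporary-document approach above, not a replacement for it.

```php
<?php
// Hypothetical input mirroring the question, plus one near-miss class.
$all_content = '<div class="company-id-8">Content One</div>'
    . '<div class="company-id-80">Other company</div>'
    . '<div class="box company-id-8">Content Three</div>';
$company_id = 8;

$dom = new DOMDocument();
// Wrap the fragment so the parser sees a complete document.
$dom->loadHTML('<html><body>' . $all_content . '</body></html>');
$xpath = new DOMXPath($dom);

// Pad the class attribute with spaces so only whole class names match.
$classname = 'company-id-' . $company_id;
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

$filtered_content = '';
foreach ($xpath->query($query) as $node) {
    // saveHTML($node) serializes just this element, including its own tag.
    $filtered_content .= $dom->saveHTML($node);
}
echo $filtered_content;
```

With this query, the company-id-80 div is skipped while both company-id-8 divs are kept, even when another class sits next to it.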

Related

DomDocument get all divs and put inside an array

I have some divs with the same id and the same class, as you can see below:
<div id="results_information" class="control_results">
<!-- I have divs, subDivs, span, images inside -->
</div>
<div id="results_information" class="control_results">
<!-- I have divs, subDivs, span, images inside -->
</div>
....
In my case I want to save all of them inside an array to be used later, I want to save in this format:
[0] => '<div id="results_information" class="control_results">
<!-- I have divs, subDivs, span, images inside -->
</div>',
[1] => '<div id="results_information" class="control_results">
<!-- I have divs, subDivs, span, images inside -->
</div>',
....
For that I'm using this code below:
$dom = new DOMDocument(); // Create DOMDocument object.
$dom->loadHTMLFile($htmlOut); // Load target file.
$div =$dom->getElementById('results_information'); // Take all div elements.
But it doesn't work, how I can solve this problem and put my divs inside an array?

To solve this, follow the steps below.
First, select the elements by class rather than by ID, because an id is supposed to be unique within a document and getElementById() returns at most one element.
Assume the following HTML is stored in a variable called $htmlOut:
<div id="results_information" class="control_results">
<span style="background:black; color:white">
hellow world
</span>
<strong>2</strong>
</div>
<div id="results_information" class="control_results">
<strong>2</strong>
<img src="hello.png" />
</div>
We need to extract the HTML inside the two elements with the class control_results and put it in an array. For this we can use DomDocument and DomXPath:
$array = array();
$dom = new DomDocument();
$dom->loadHtml($htmlOut);
$finder = new DomXPath($dom);
$classname = "control_results";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
That code selects every element whose class attribute contains control_results and stores the matches in $nodes.
Now we need to loop over $nodes (a DOMNodeList, which can be iterated like an array) and extract the HTML of those two elements. I created a function for this:
function get_inner_html( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveXML( $child );
}
return $innerHTML;
}
This function serializes every child node (all the HTML inside the control_results element) and returns the result as a string.
Now you only need to loop over $nodes with foreach and call that function, like this:
foreach ($nodes as $rowNode) {
$array[] = get_inner_html($rowNode);
}
var_dump($array);
Below is the complete code:
$htmlOut = '
<div id="results_information" class="control_results">
<span style="background:black; color:white">
hellow world
</span>
<strong>2</strong>
</div>
<div id="results_information" class="control_results">
<strong>2</strong>
<img src="hello.png" />
</div>
';
$array = array();
$dom = new DomDocument();
$dom->loadHtml($htmlOut);
$finder = new DomXPath($dom);
$classname = "control_results";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $rowNode) {
$array[] = get_inner_html($rowNode);
}
var_dump($array);
function get_inner_html( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveXML( $child );
}
return $innerHTML;
}
But this code has a small problem. If you check the result, the array contains only the inner HTML:
0 => string '<span style="background:black; color:white">hellow world</span><strong>2</strong>',
1 => string '<strong>2</strong><img src="hello.png"/>'
instead of:
0 => string '<div id="results_information" class="control_results"><span style="background:black; color:white">hellow world</span><strong>2</strong></div>',
1 => string '<div id="results_information" class="control_results"><strong>2</strong><img src="hello.png"/></div>'
In that case, you can loop over the array, add the opening div tag at the start of each entry and the closing div tag at the end, and save the result back into the array.
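As an alternative to re-adding the wrapper by hand, the element itself can be serialized. The following is a rough sketch, assuming a PHP version where saveHTML() accepts a node argument. The shortened sample markup is made up for illustration, and the libxml error handling is only there because duplicate id values would otherwise produce parser warnings.

```php
<?php
// Shortened, hypothetical version of the markup from the question.
$htmlOut = '<div id="results_information" class="control_results"><strong>1</strong></div>'
    . '<div id="results_information" class="control_results"><strong>2</strong></div>';

// Collect parser warnings (e.g. about the duplicate id) instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<html><body>' . $htmlOut . '</body></html>');
libxml_clear_errors();

$finder = new DOMXPath($dom);
$array = array();
foreach ($finder->query("//*[contains(@class, 'control_results')]") as $node) {
    // Passing the node to saveHTML() returns its outer HTML, wrapper div included.
    $array[] = $node->ownerDocument->saveHTML($node);
}
var_dump($array);
```

Each array entry then starts with the opening div tag and ends with the closing one, so no second pass over the array is needed.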

You will need to use XPath and get the elements by class name.
$dom = new DOMDocument();
$dom->loadHTMLFile($htmlOut); // load the HTML before querying it
$xpath = new DOMXpath($dom);
$div = $xpath->query('//div[contains(@class, "control_results")]');

How to find element in already parsed HTML data

Here I have a very simple code to grab all the 'div' elements with the classname 'info_block'. I am wondering how would I go about finding another element with the classname 'price' from within 'info_block' and display it instead of the whole 'info_block' element.
Main Goal: Find the price in each element with classname 'info_block'. but do inside the foreach, because I may need to find other elements.
<?php
$page = file_get_contents('example.com');
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
$div1 = $xpath->query('//div[@class="info_block"]');
foreach ($div1 as $var1){
//echo $dom->saveHTML($var1);
}
?>
There is an element in each 'info_block' with the classname 'price', and I would like to display only that element. Like so...
foreach ($div1 as $var1){
$dom2 = new DOMDocument();
$dom2->loadHTML($dom->saveHTML($var1));
$xpath2 = new DOMXPath($dom2);
$div2 = $xpath2->query('//div[@class="price"]');
$div2 = $div2->item(0);
echo $dom2->saveHTML($div2);
}
But instead of just giving me the price it returns the whole HTML for 'info_block' as it did before.

You can pass each <div class="info_block"> that was found as the context node when searching for <div class="price">, by providing it as the second argument of ->query():
$div1 = $xpath->query('//div[@class="info_block"]');
foreach ($div1 as $var1){
// $var1 (the current info_block div) is the context node for the relative query
$div2 = $xpath->query('./div[@class="price"]', $var1);
$div2 = $div2->item(0);
echo $dom->saveHTML($div2);
}
Note: You do not need to create another instance of DOM and DOMXpath.
This example assumes HTML structured like this:
<div class="info_block"> <!-- each info block -->
<div class="price">1</div> <!-- has a price directly inside it -->
</div>
<div class="info_block">
<div class="price">2</div>
</div>
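How the expression starts matters when a context node is passed. Here is a small sketch comparing the three forms. The markup is hypothetical and deliberately nests the price one level deeper than in the example above (and uses span instead of div) so that the forms give different results.

```php
<?php
// Hypothetical markup: the price is a grandchild of info_block, not a direct child.
$html = '<div class="info_block"><p><span class="price">1</span></p></div>'
      . '<div class="info_block"><p><span class="price">2</span></p></div>';

$dom = new DOMDocument();
$dom->loadHTML('<html><body>' . $html . '</body></html>');
$xpath = new DOMXPath($dom);

$counts = array();
foreach ($xpath->query('//div[@class="info_block"]') as $block) {
    $counts[] = array(
        // "./"  : direct children of the context node only
        'child'      => $xpath->query('./span[@class="price"]', $block)->length,
        // ".//" : any descendant of the context node
        'descendant' => $xpath->query('.//span[@class="price"]', $block)->length,
        // "//"  : starts at the document root and ignores the context node
        'absolute'   => $xpath->query('//span[@class="price"]', $block)->length,
    );
}
print_r($counts);
```

For each block this reports 0 direct children, 1 descendant, and 2 matches for the absolute form, which is why a query meant to stay inside the current info_block should start with "." .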

You can combine queries in XPath with the | operator to find all the desired elements in one go:
$xpath->query('//div[@class="info_block"]|//div[@class="price"]');

You can pass a DOM element as the context node for a relative XPath query. It is the optional second argument of the $xpath->query() method:
<?php
$page = file_get_contents('example.com');
$dom = new DOMDocument();
$dom->loadHTML($page);
$xpath = new DOMXPath($dom);
$div1 = $xpath->query('//div[@class="info_block"]');
foreach ($div1 as $var1){
// The leading "." keeps the search inside the current info_block.
$div2 = $xpath->query('.//a[@class="price"]', $var1);
foreach ($div2 as $var2) {
echo $var2->nodeValue. "\n";
}
}
?>
For more details, see the PHP documentation for DOMXPath::query.

DOMDocument: problems with replaceChild()

I'm attempting to find the first <p> in a <div>:
<div class="embed-left">
<h4>Bookmarks</h4>
<p>Something goes here.</p>
<p>Read more...</p>
</div>
Which I've done.
Now, however, I need to replace the found text with a link, as assigned to the <span> before then being used in the $url createElement() method:
$results_links = $this->data_migration->process_embed_find_links();
$dom = new DOMDocument();
foreach ($results_links as $notes):
$dom->loadHTML($notes['note']);
$x = $dom->getElementsByTagName('div')->length;
// Loop through the <div> elements found in the HTML...
for ($i = 0; $i < $x; $i++):
$parentNode = $dom->getElementsByTagName('div')->item($i);
// Here's a <h4> element.
$childNodeHeading = $dom->getElementsByTagName('div')->item($i)->childNodes->item(1);
// If the <h4> element is "Bookmarks"...
if ( $childNodeHeading->nodeValue == "Bookmarks" ):
// ... then grab the first <p> element.
$childNodeTitle = $dom->getElementsByTagName('div')->item($i)->childNodes->item(3);
// Create the appropriate <p> element.
$title = $dom->createElement('p', $childNodeTitle->nodeValue);
echo "<p>" . $title->nodeValue . "</p>";
// Find the `notes_links.from-asset` rows.
$results_bookmarks_links = $this->data_migration->process_embed_find_links_bookmarks_links(array(
'note_id' => $notes['note_id'],
// Send the first <p> tag in the <div> element.
'title' => htmlentities($childNodeTitle->nodeValue)
));
// Loop through the data (one row returned, but it's more neat to run it through a foreach() function)...
foreach ($results_bookmarks_links as $index => $link):
// Assuming there are values (which there has to be, by virtue of the fact that we found the <div> elements in the first place...
if ( isset($results_bookmarks_links) && ( count($results_bookmarks_links) > 0 ) ):
// Create the <span> element for the link item, according to Sina's design.
$span = '<span>[#' . $notes['note_id'] . ']</span>';
**$url = $dom->createElement('span', $span);**
**$parentNode->replaceChild(
$url,
$title
);**
endif;
endforeach;
endif;
endfor;
endforeach;
Which I've had no success with.
I'm unable to figure out either the parent element, or the proper parameters to use in the replaceChild() method.
I've emboldened the main bits that I'm having trouble with, if that helps.

The important thing is to replace the existing p with a newly-created p that contains the child nodes.
Here's an example, using XPath to select the nodes to be replaced:
<?php
$html = <<<END
<div class="embed-left">
<h4>Bookmarks</h4>
<p>Something goes here.</p>
<p>Read more...</p>
</div>
END;
$doc = new DOMDocument;
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[h4[text()="Bookmarks"]]/p[1]');
foreach ($nodes as $oldnode) {
$note = 'TODO'; // build `$note` somewhere
$link = $doc->createElement('a');
$link->setAttribute('href', '#');
$link->textContent = sprintf('[#%s]', $note);
$span = $doc->createElement('span');
$span->appendChild($link);
$newnode = $doc->createElement('p');
$newnode->appendChild($span);
$oldnode->parentNode->replaceChild($newnode, $oldnode);
}
print $doc->saveHTML($doc->documentElement);

PHP DOMDocument: Delete elements by class

I'm trying to delete every node with a given class.
To find the elements I use:
$xpath = new DOMXPath($dom);
foreach( $xpath->query('//div[contains(attribute::class, "foo")]') as $e ) {
// Delete this node
}
But how can I delete the elements in this foreach-loop?
Edit: By the way, how can I check first whether there is an element with the class "foo" in the DOM (before starting the loop)?
Update:
This is my HTML:
<div class="main">
<div class="delete_this" contenteditable="true">Target</div>
<div class="class1"></div>
<div class="content"><p>Anything</p></div>
</div>
This doesn't work for the example above:
$xpath = new DOMXPath($dom);
foreach( $xpath->query('//div[contains(attribute::class, "delete_this")]') as $e ) {
$e->parentNode->removeChild($e);
}

You need to use the removeChild() method of the parent element:
$xpath = new DOMXPath($dom);
foreach($xpath->query('//div[contains(attribute::class, "foo")]') as $e ) {
// Delete this node
$e->parentNode->removeChild($e);
}
Btw, about your second question, if there are no elements found, the loop won't iterate at all.
Here comes a working example:
$html = <<<EOF
<div class="main">
<div class="delete_this" contenteditable="true">Target</div>
<div class="class1"></div>
<div class="content"><p>Anything</p></div>
</div>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);
foreach($selector->query('//div[contains(attribute::class, "delete_this")]') as $e ) {
$e->parentNode->removeChild($e);
}
echo $doc->saveHTML($doc->documentElement);

For the second part of the question, the result of the query has a length property which you can use to see if anything was matched:
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[contains(attribute::class, "foo")]');
printf('Removing %d nodes', $nodes->length);

The code above removes only the div elements with that class.
To remove every element with the class, whatever its tag, use * in the query:
$selector = new \DOMXPath( $doc );
foreach ( $selector->query( '//*[contains(attribute::class, "' . $class . '")]' ) as $e ) {
$e->parentNode->removeChild( $e );
}

PHP: Fetch content from a html page using xpath()

I'm trying to fetch the content of a div in a html page using xpath and domdocument. This is the structure of the page:
<div id="content">
<div class="div1"></div>
<span class="span1"></span>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<div class="div2"></div>
</div>
I want to get only the content of the p elements, not the spans and divs. I came up with the XPath expression .//*[@id='content']/p, but something isn't right because I'm getting only the first p. I tried other expressions with following-sibling and node(), but they all return the first p only.
.//*[@id='content']/span/following-sibling::p
.//*[@id='content']/node()[self::p]
This is how I use XPath:
$domDocument=new DOMDocument();
$domDocument->encoding = 'UTF-8';
$domDocument->loadHTML($page);
$domXPath = new DOMXPath($domDocument);
$domNodeList = $domXPath->query($this->xpath);
$content = $this->GetHTMLFromDom($domNodeList);
And this is how I get the HTML from the nodes:
private function GetHTMLFromDom($domNodeList){
$domDocument = new DOMDocument();
$node = $domNodeList->item(0);
foreach($node->childNodes as $childNode)
$domDocument->appendChild($domDocument->importNode($childNode, true));
return $domDocument->saveHTML();
}

This XPath expression:
//div[@id='content']/p
results in the wanted node set (five p elements).
EDIT: Now it's clear what your problem is. You need to iterate over the NodeList instead of taking only item(0):
private function GetHTMLFromDom($domNodeList){
$domDocument = new DOMDocument();
foreach ($domNodeList as $node) {
$domDocument->appendChild($domDocument->importNode($node, true));
}
return $domDocument->saveHTML();
}
