Get a div from an external page, then delete another div from it - PHP

I need some help getting content from an external web page.
I need to get a div and then delete another div from inside it. Can someone help me?
This is the relevant portion of the HTML:
<html>
...
<body class="domain-4 page-product-detail" > ...
<div id="informacio" class="htab-fragment"> <!-- must select this -->
<h2 class="description-heading htab-name">Utazás leírása</h2>
<div class="htab-mobile tab-content">
<p class="tab-annot">* Hivatalos ismertető</p>
<div id="trip-detail-question"> <!-- must delete this -->
<form> ...</form>
</div>
<h3>USP</h3><p>Nagy, jól szervezett és családbarát ...</p>
<div class="message warning-message">
<p>Az árak már minden aktuális kedvezményt tartalmaznak!</p>
<span class="ico"></span>
</div>
</div>
</div>
...
</body>
</html>
I need to get the div with id="informacio", and after that I need to delete the div with id="trip-detail-question" from it, including the form it contains.
This is my code, but it's not working correctly:
function get_content($url){
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$query = "//div[@id='informacio']";
$entries = $xpath->query($query)->item(0);
foreach($xpath->query("div[@id='trip-detail-question']", $entries) as $node)
$node->parentNode->removeChild($node);
$var = $doc->saveXML($entries);
return $var;
}

Your second XPath expression is incorrect. Evaluated in the context of the div you selected first, it looks for a div that is a direct child of that div. In effect, you are trying to select:
//div[@id='informacio']/div[@id='trip-detail-question']
and that node does not exist. You want this node:
//div[@id='informacio']/div/div[@id='trip-detail-question']
which you can also select like this (allowing any element, not just div, as the intermediate parent):
//div[@id='informacio']/*/div[@id='trip-detail-question']
or like this (allowing any number of nesting levels):
//div[@id='informacio']//div[@id='trip-detail-question']
In the context of the first div, the correct XPath expression would be:
.//div[@id='trip-detail-question']
If you change it in your code, it should work:
foreach($xpath->query(".//div[@id='trip-detail-question']", $entries) as $node)
    $node->parentNode->removeChild($node);
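For reference, here is a minimal, self-contained sketch of the whole flow with the descendant query. It assumes the HTML is already available as a string; the function name remove_inner_div and the shortened sample markup are made up for illustration, and the IDs are assumed not to contain quote characters.

```php
<?php
// Hypothetical helper, not part of the original code: returns the outer HTML of
// the div with id $outerId after removing any nested div with id $innerId.
function remove_inner_div(string $html, string $outerId, string $innerId): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $outer = $xpath->query("//div[@id='$outerId']")->item(0);
    if ($outer === null) {
        return '';                      // outer div not found
    }

    // ".//" searches all descendants of the context node, at any depth.
    // Copy the result to an array first so removals do not disturb the iteration.
    foreach (iterator_to_array($xpath->query(".//div[@id='$innerId']", $outer)) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveHTML($outer);
}

// Shortened, ASCII-only stand-in for the page in the question.
$sample = '<html><body><div id="informacio"><h2>Title</h2><div class="tab-content">'
        . '<div id="trip-detail-question"><form><input name="q"></form></div>'
        . '<p>Kept text</p></div></div></body></html>';

$result = remove_inner_div($sample, 'informacio', 'trip-detail-question');
echo $result, PHP_EOL;
```

To read from a URL as in the question, loadHTML() could be swapped for loadHTMLFile($url).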

Related

Replace a whole div code (with divs inside) with different code using php

I am capturing the code of a page using output buffering, and I am trying to replace all divs in the page that have the class locked-content.
While it's easy with jQuery, with PHP it's a bit harder. Let's say, for example, I have this HTML code:
<div class='class'> cool content </div>
<div class='class more-class life-is-hard locked-content'>
<div class='cool-div'></div>
<div class='anoter-cool-div'></div>
some more code here
</div>
<div class='class'> cool content </div>
This seems like a complex task. I think I need to somehow detect how many divs are opened after the div with the class 'locked-content', count how many closing div tags follow, work out where the wanted div is closed, and then replace that code with new code, looping in case the div exists more than once.
Does anyone have an idea how to do something like this?
Thanks
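You can do it via DOMXPath:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// select every div whose class attribute contains "locked-content"
$nodes = $xpath->query('//div[contains(@class, "locked-content")]');
foreach ($nodes as $node) {
    // copy the live child list to an array so it can be modified while looping
    foreach (iterator_to_array($node->childNodes) as $cNode) {
        if ($cNode instanceof DOMElement && $cNode->tagName === 'div') {
            // replaceWith() requires PHP 8.0 or later
            $cNode->replaceWith(/* Whatever */);
        }
    }
}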
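If the goal is to replace each locked-content div itself, along with everything nested in it, a sketch along these lines may help. The function name replace_locked_divs, the temporary wrapper id tmp-root, and the placeholder class locked-placeholder are all made up for illustration. The class test matches locked-content as a whole class token rather than as a substring, and replaceChild() is used so the sketch does not depend on PHP 8.

```php
<?php
// Hypothetical helper: swaps every div carrying the class "locked-content"
// for a placeholder div containing $replacementText.
function replace_locked_divs(string $html, string $replacementText): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    // Wrap the fragment in a temporary root so it can be serialized back
    // without the <html>/<body> elements that loadHTML() adds.
    $dom->loadHTML('<div id="tmp-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Match "locked-content" as a complete class token.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' locked-content ')]";

    // Copy the node list to an array before modifying the tree.
    foreach (iterator_to_array($xpath->query($query)) as $locked) {
        $placeholder = $dom->createElement('div');
        $placeholder->setAttribute('class', 'locked-placeholder');
        $placeholder->appendChild($dom->createTextNode($replacementText));
        $locked->parentNode->replaceChild($placeholder, $locked);
    }

    // Serialize only the children of the temporary root.
    $root = $xpath->query("//div[@id='tmp-root']")->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = "<div class='class'> cool content </div>\n"
      . "<div class='class more-class life-is-hard locked-content'>\n"
      . "<div class='cool-div'></div>\n"
      . "some more code here\n"
      . "</div>\n"
      . "<div class='class'> cool content </div>";

$out = replace_locked_divs($html, 'Members only');
echo $out, PHP_EOL;
```

Because the parser tracks the nesting, there is no need to count opening and closing div tags by hand.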

Fetch nested tags in php using simplehtmldom

Let's say I have this code. I want to fetch the text of all p tags from nested div tags. There can be as many as 15 nested div tags, so I want to write a script that digs through all the divs and returns the p tag data from them.
<div>
<div>
<div>
<p>Hi</p>
</div>
<p>Hello</p>
</div>
<p>Hey</p>
</div>
Required output (any order):
Hi
Hello
Hey
I have attempted the following:
function divDigger($div)
{
$internalP = $div->getElementsByTagName('p');
echo $internalP->innertext;
$internalDiv = $div->getElementsByTagName('div');
if (count($internalDiv) > 0) {
foreach ($internalDiv as $div) {
divDigger($div);
}
}
}
You may use the XPath API for this:
$doc = new \DOMDocument();
$doc->loadHTML($yourHtml);
$xpath = new \DOMXPath($doc);
foreach ($xpath->query('//div//p') as $pWithinDiv) {
echo $pWithinDiv->textContent, PHP_EOL;
}
This will find any <p> element that is a descendant of a <div>, not necessarily a direct child, and display its text content. To match only direct children, change the expression to //div/p.
Demo: https://3v4l.org/43QqX
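To see how the two expressions differ, here is a small sketch. The helper name texts_for and the sample markup are assumptions made for illustration; in this sample one paragraph sits inside a blockquote, so it is a descendant of the div but not a direct child.

```php
<?php
// Hypothetical helper: returns the trimmed text of every node matched by $expression.
function texts_for(string $html, string $expression): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query($expression) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = '<div><blockquote><p>Deep</p></blockquote><p>Direct</p></div>';

$anyDepth   = texts_for($html, '//div//p');  // p elements at any depth below a div
$directOnly = texts_for($html, '//div/p');   // p elements that are direct children of a div

print_r($anyDepth);
print_r($directOnly);
```

Only the first expression picks up the paragraph nested inside the blockquote.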

Create a Container Div with DOM/XPATH

I just created a new DOMXPath object.
After a couple of operations, I stored my result with saveHTML():
$string[] = $dom->saveHTML();
Then I put the content into a file:
file_put_contents($filename, $string);
The HTML structure is something like this:
<div if="rand11">
</div>
<div if="rand24">
</div>
<div if="rand51">
</div>
There are some methods for creating new divs: you can use ->createElement, and you can place the new element with ->parentNode->insertBefore. But I can't find a way to create a container div, like this:
<div if="container-div">
<div if="rand11">
</div>
<div if="rand24">
</div>
<div if="rand51">
</div> </div>
I tried multiple ways to do it without success.
So, I have a couple of questions:
1. Is it possible to create a container div by modifying the DOM directly?
2. Is it possible to add a new HTML element to an array that contains $dom->saveHTML() data?
Sure it is. Most of it actually uses the same methods. For example, appendChild() and insertBefore() are not just for new nodes; they can also move existing nodes.
$html = <<<'HTML'
<div if="rand11"></div>
<div if="rand24"></div>
<div if="rand51"></div>
HTML;
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
// fetch the nodes that should be put inside the container
$nodes = $xpath->evaluate('//div[starts-with(@if, "rand")]');
// validate that there is at least one node
if ($first = $nodes[0]) {
// create the container and insert it before the first
$container = $first->parentNode->insertBefore(
$document->createElement('div'), $first
);
$container->setAttribute('class', 'container-div');
// move all the fetched nodes into the container
foreach($nodes as $node) {
$container->appendChild($node);
}
// output formatted
$document->formatOutput = TRUE;
echo $document->saveHTML($container);
}
Output:
<div class="container-div">
<div if="rand11"></div>
<div if="rand24"></div>
<div if="rand51"></div>
</div>

Remove the last child in DomXPath

Current structure looks like
<div class="...">
//more html
<div class="message-right">
<div class="item1"> //more html </div>
<div class="item2"> //more html </div>
<div class="item3"> //more html </div>
</div>
//more html
</div>
I want to be able to get the html content inside the class 'message-right', and remove the last child. (In this case 'item3')
I should be left with the html from 'item1' and 'item2'
So far I have
$dom = new DomDocument();
@$dom->loadHTML($html);
$finder = new DomXPath($dom);
$classname = "message-right";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
//this is where I am stuck, need to remove the last child, 'item3'
//this returns the html from 'message-right'
$html = $nodes->item(0)->c14n();
Fetch the last child element (XPath will make that easier) and delete it.
$delete = $finder->query("./*[last()]", $nodes->item(0))->item(0);
$delete->parentNode->removeChild($delete);
Depending on what you really need you might want to fetch (and subsequently delete) that element directly using
//*[contains(@class, '$classname')]/*[last()]
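As a rough end-to-end sketch of this approach, the following removes the last child element and returns the remaining inner HTML. The function name inner_html_without_last and the sample markup are assumptions for illustration. It matches the class as a whole token, and it serializes the children with saveHTML() instead of c14n() so the result stays plain HTML.

```php
<?php
// Hypothetical helper: returns the inner HTML of the first element with class
// $className, minus that element's last child element.
function inner_html_without_last(string $html, string $className): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $finder = new DOMXPath($dom);
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $className ')]";
    $box = $finder->query($query)->item(0);
    if ($box === null) {
        return '';                      // no element with that class
    }

    // "./*[last()]" is the last child element of the context node.
    $last = $finder->query('./*[last()]', $box)->item(0);
    if ($last !== null) {
        $box->removeChild($last);
    }

    // Concatenate the remaining children to get the inner HTML.
    $inner = '';
    foreach ($box->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

$html = '<div class="outer"><div class="message-right">'
      . '<div class="item1">one</div><div class="item2">two</div><div class="item3">three</div>'
      . '</div></div>';

$inner = inner_html_without_last($html, 'message-right');
echo $inner, PHP_EOL;
```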

PHP DOMDocument: insertBefore, how to make it work?

I would like to place a new node element before a given element. I'm using insertBefore for that, without success!
Here's the code:
<DIV id="maindiv">
<!-- I would like to place the new element here -->
<DIV id="child1">
<IMG />
<SPAN />
</DIV>
<DIV id="child2">
<IMG />
<SPAN />
</DIV>
//$div is a new div node element,
//The code I'm trying is the following:
$maindiv->item(0)->parentNode->insertBefore( $div, $maindiv->item(0) );
//Note: this code actually places the new node before maindiv
//$maindiv is an object(DOMNodeList)[5], from getElementsByTagName( 'div' )
//echo $maindiv->item(0)->nodeName gives 'div'
//echo $maindiv->item(0)->nodeValue gives the correct data on that div 'some random text'
//this code actually places the new $div element before <DIV id="maindiv">
http://pastie.org/1070788
Any kind of help is appreciated, thanks!
If maindiv is from getElementsByTagName(), then $maindiv->item(0) is the div with id=maindiv. So your code is working correctly because you're asking it to place the new div before maindiv.
To make it work like you want, you need to get the children of maindiv:
$dom = new DOMDocument();
$dom->load($yoursrc);
$maindiv = $dom->getElementById('maindiv');
$items = $maindiv->getElementsByTagName('DIV');
$items->item(0)->parentNode->insertBefore($div, $items->item(0));
Note that if you don't have a DTD, getElementById() returns null. For getElementById() to work, you need to have a DTD or specify which attributes are IDs:
foreach ($dom->getElementsByTagName('DIV') as $node) {
$node->setIdAttribute('id', true);
}
From scratch, this seems to work too:
$str = '<DIV id="maindiv">Here is text<DIV id="child1"><IMG /><SPAN /></DIV><DIV id="child2"><IMG /><SPAN /></DIV></DIV>';
$doc = new DOMDocument();
$doc->loadHTML($str);
$divs = $doc->getElementsByTagName("div");
$divs->item(0)->appendChild($doc->createElement("div", "here is some content"));
print_r($divs->item(0)->nodeValue);
Found a solution:
$child = $maindiv->item(0);
$child->insertBefore( $div, $child->firstChild );
I don't know how much sense this makes, but well, it worked.
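The reason this works is that insertBefore() is called on the parent, and the reference node must be one of that parent's children. Here is a small sketch under a few assumptions: the markup is loaded with loadHTML(), maindiv is located with XPath so no DTD is needed, and the id "inserted" is made up for illustration.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc->loadHTML('<div id="maindiv"><div id="child1"></div><div id="child2"></div></div>');
libxml_clear_errors();

// Locate the parent by its id attribute.
$xpath = new DOMXPath($doc);
$maindiv = $xpath->query("//div[@id='maindiv']")->item(0);

// Build the new element (the id "inserted" is only for this example).
$div = $doc->createElement('div', 'new first child');
$div->setAttribute('id', 'inserted');

// Call insertBefore() on the parent, with one of its own children as the reference node.
$maindiv->insertBefore($div, $maindiv->firstChild);

// Collect the ids of maindiv's element children, in order.
$ids = [];
foreach ($maindiv->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $ids[] = $child->getAttribute('id');
    }
}
echo implode(', ', $ids), PHP_EOL;  // prints "inserted, child1, child2"
```

If maindiv has no children, firstChild is null and insertBefore() simply appends the new node.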
