DOMXPath to find specific image tag - php

My question is simple. Here is my HTML DOM:
<html>
...
<div class="A B">
<div class="C">
<img src="..." >
</div>
<div>
...
<div class="A">
</div>
...
</html>
Now I want to get the src of the image at div[class="A B"] -> div[class="C"] -> img using DOMXPath in PHP.
The main puzzle is that I do not know how to write its path correctly.
Update
I have tried the approach from "How to get data from HTML using regex", but it still doesn't work.
The actual html structure is :
My php code:
$doc = new DOMDocument();
$doc->loadHTML($html);
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
$XPath = new DOMXPath($doc);
$vipImg = $XPath->query('//div[@class="show-midpic active-pannel"]/a/div[@class="zoomPad"]/img');
var_dump($vipImg);
foreach($vipImg as $vip)
{
var_dump($vip);
}
And the output is :
object(DOMNodeList)#2 (1) { ["length"]=> int(0) }
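The path from the simplified markup at the top of the question can be sketched as follows. This is only a sketch under assumptions: the markup is rebuilt from the question's outline, and `photo.jpg` is a placeholder for the elided src value. It shows an exact attribute match next to a token-based match, because `@class="A B"` only succeeds when the attribute is literally that string. An empty result like the one above can also mean that the raw HTML differs from what the browser inspector shows, for example when a script adds classes or wrapper elements after the page loads.

```php
<?php
// Simplified markup modeled on the question; the src value is a placeholder.
$html = '
<html><body>
<div class="A B">
  <div class="C">
    <img src="photo.jpg">
  </div>
</div>
<div class="A"></div>
</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Exact match: the class attribute must be the literal string "A B".
$exact = $xpath->query('//div[@class="A B"]/div[@class="C"]/img/@src');

// Token match: the div must carry the given class, in any position,
// possibly alongside other classes.
$hasClass = function (string $name): string {
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
};
$tokenQuery = '//div[' . $hasClass('A') . ' and ' . $hasClass('B') . ']'
            . '/div[' . $hasClass('C') . ']/img/@src';
$byToken = $xpath->query($tokenQuery);

// Selecting @src returns attribute nodes; nodeValue holds the attribute text.
$src = $byToken->length > 0 ? $byToken->item(0)->nodeValue : null;
echo $src, PHP_EOL;
```

The token-based form is longer, but it keeps working if the class order changes or extra classes are added to the element.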

Related

Extracting value of a node after a certain tag

Trying to extract the value "Output" between the span tags, but only if the title is "ABCD (1,2)", using PHP. Basically, find the span and extract "Output".
Here is the section of html:
<div class="wrap">
<strong title="ABCD (1,2)" class="name">ABCD (1,2):</strong>
<div id="test1">
<div class="testclass" id="test2">
<span>Output</span>
</div>
</div>
</div>
Here is the code I like to use:
<?php
$html = file_get_contents('test.html');
$dom = new DOMDocument;
@$dom->loadHTML($html);
//Some code needs to go here!
$tags = $dom->getElementsByTagName('strong');
?>
One way is to use XPath with a query that selects the desired element: find the element with that title, move to the following sibling div, and descend to the span:
Example (using the markup above):
$html = '
<div class="wrap">
<strong title="ABCD (1,2)" class="name">ABCD (1,2):</strong>
<div id="test1">
<div class="testclass" id="test2">
<span>Output</span>
</div>
</div>
</div>
';
$search_string = 'ABCD (1,2)';
$dom = new DOMDocument;
@$dom->loadHTML($html);
$query = "//strong[@title = '{$search_string}']/following-sibling::div/div/span";
$xpath = new DOMXPath($dom);
$result = $xpath->query($query);
if($result->length > 0) {
echo $result->item(0)->nodeValue;
}

Add attributes to outer tags of html fragments

I am trying to add attributes to the outer tags of HTML code fragments. I prepared some code, but it behaves strangely.
The test string has two outer tags: a div and a paragraph. But only the div gets the new attribute,
and the paragraph is moved into the div. What is wrong with the code?
Thanks
https://ideone.com/6Fu2zy
<?php
$html = '
<div>
<a>
<h1>Article 02</h1>
</a>
<img src="abc.jpg">
</div>
<p>
<span>dsaf</span>
</p>';
$dom = new DOMDocument();
@$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$x = new DOMXPath($dom);
foreach ($x->query("/*") as $node) {
$node->setAttribute("style", "xxxx");
}
$newHtml = $dom->saveHtml();
echo $newHtml;
edit:
So I could wrap the nodes in <root> tags and then add the attributes. But I did not know how to do that, so I simply kept the outer <html> and <body> tags.
Adding the attributes succeeded, but then I did not know how to remove the outer <html> and <body> tags from the output.
I tried the same approach as before, but it did not work.
https://ideone.com/6Fu2zy
<?php
$html = '
<div>
<a>
<h1>Article 02</h1>
</a>
<img src="abc.jpg">
</div>
<p>
<span>dsaf</span>
</p>';
$dom = new DOMDocument();
@$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
$x = new DOMXPath($dom);
foreach ($x->query("/html/body/*") as $node) {
$node->setAttribute("style", "xxxx");
}
$newHtml = @$dom->saveHtml();
@$dom->loadHTML($newHtml, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$newHtml2 = @$dom->saveHtml();
echo $newHtml2;
The problem is that your HTML has no root element, so DOMDocument turns the first element (<div>) into a wrapper for all the other nodes.
Your HTML:
<div>
<a><h1>Article 02</h1></a>
<img src="abc.jpg">
</div>
<p><span>dsaf</span></p>
once loaded by DOMDocument, becomes:
<div>
<a><h1>Article 02</h1></a>
<img src="abc.jpg">
<p><span>dsaf</span></p>
</div>
Consequently, the /* pattern returns only one node.
Add a root element to your HTML:
<root>
<div>
<a><h1>Article 02</h1></a>
<img src="abc.jpg">
</div>
<p><span>dsaf</span></p>
</root>
then use this path:
/root/*
After the transformation, if you need to output only the inner HTML, DOMDocument unfortunately has no built-in feature for that. You can do something like this:
$innerHTML = "";
foreach( $dom->getElementsByTagName( 'root' )->item(0)->childNodes as $child )
{
$innerHTML .= $dom->saveHTML( $child );
}
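The same child-by-child serialization also works on the <body> element that the parser adds by default, which avoids inventing a <root> tag. The following is a sketch of that variation, not the answer's exact method: it loads the question's fragment without LIBXML_HTML_NOIMPLIED, sets the attribute on every direct child of <body>, and then rebuilds the fragment from those children.

```php
<?php
$html = '
<div>
  <a><h1>Article 02</h1></a>
  <img src="abc.jpg">
</div>
<p><span>dsaf</span></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
// No LIBXML_HTML_NOIMPLIED here, so the parser adds <html><body> itself.
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Every direct element child of <body> is an "outer tag" of the fragment.
foreach ($xpath->query('/html/body/*') as $node) {
    $node->setAttribute('style', 'xxxx');
}

// Serialize only the children of <body>, dropping the implied wrappers.
$body = $dom->getElementsByTagName('body')->item(0);
$innerHTML = '';
foreach ($body->childNodes as $child) {
    $innerHTML .= $dom->saveHTML($child);
}
echo $innerHTML;
```

Both the div and the paragraph should receive the style attribute here, and the output contains neither <html> nor <body>.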

Retrieve a text node with Simple HTML DOM Parser

I'm quite new to Simple HTML DOM Parser. I want to get a child element from the following HTML:
<div class="article">
<div style="text-align:justify">
<img src="image.jpg" title="image">
<br>
<br>
"Text to grab"
<div>......</div>
<br></br>
................
................
</div>
</div>
I'm trying to get the text "Text to grab"
So far I've tried the following query:
$html->find('div[class=article] div')->children(3);
But it's not working. Any idea how to solve this ?
You don't need simple_html_dom here. It can be done with DOMDocument and DOMXPath. Both are part of the PHP core.
Example:
// your sample data
$html = <<<EOF
<div class="article">
<div style="text-align:justify">
<img src="image.jpg" title="image">
<br>
<br>
"Text to grab"
<div>......</div>
<br></br>
................
................
</div>
</div>
EOF;
// create a document from the above snippet
// if you are loading HTML from a remote url use:
// $doc->loadHTMLFile($url);
$doc = new DOMDocument();
$doc->loadHTML($html);
// initialize a XPath selector
$selector = new DOMXPath($doc);
// get the text node (text in XML/HTML is also represented as nodes)
$query = '//div[@class="article"]/div/br[2]/following-sibling::text()[1]';
$textToGrab = $selector->query($query)->item(0);
// remove newlines on start and end using trim() and output the text
echo trim($textToGrab->nodeValue);
Output:
"Text to grab"
If it's always in the same place you can do:
$html->find('.article text', 4);

Extract HTML data using php/DOM

I'm a newbie with DOM, so can someone tell me how to parse the following in PHP?
<div class="classname1">
<div class="description">some description</div>
<div class="classname2">
<div class="classname3">some text 1</div>
<div class="classname4">some text 2</div>
<div class="classname6">some text 4</div>
</div>
</div>
I would like to retrieve the text in the classes above. There could be more divs before and after the HTML shown. I know I should create a DOM:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="classname1"]');
foreach ($divs as $div) {
//...
}
I don't know how to access the classnames data.
You can use getAttribute() on the DOMElement:
http://www.php.net/manual/en/domelement.getattribute.php
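A minimal sketch of how that could look with the markup from the question. It assumes the goal is a map from class name to text for each innermost div; the `$texts` array and the `not(div)` filter are illustrative choices, not part of the original answer.

```php
<?php
$html = '
<div class="classname1">
  <div class="description">some description</div>
  <div class="classname2">
    <div class="classname3">some text 1</div>
    <div class="classname4">some text 2</div>
    <div class="classname6">some text 4</div>
  </div>
</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query('//div[@class="classname1"]') as $outer) {
    // The second argument makes the query relative to $outer.
    // not(div) keeps only divs without a div child, i.e. the innermost ones.
    foreach ($xpath->query('.//div[not(div)]', $outer) as $div) {
        $texts[$div->getAttribute('class')] = trim($div->textContent);
    }
}
print_r($texts);
```

getAttribute() returns the class name, and textContent returns the text inside the element, so each entry pairs a class with its text in document order.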

PHP Strip_tags for div with a specific ID?

Does anybody know if a modified strip_tags function exists where you can specify the ID of the tags to be stripped, and possibly also specify that all the data inside the tags be removed? Take for example:
<div id="one">
<div id="two">
bla bla bla
</div>
</div>
Running:
new_strip_tags($data, 'two', true);
Must return:
<div id="one">
</div>
Is there something like this out there?
You can use DOMDocument and DOMXPath for that.
<?php
$html = '<html><head><title>...</title></head><body>
<div id="one">
<div id="two">
bla bla bla
</div>
</div>
</body></html>';
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$ns = $xpath->query('//div[@id="two"]');
// there can be only one... but anyway
foreach($ns as $node) {
$node->parentNode->removeChild($node);
}
echo $doc->saveHTML();
That's not exactly what strip_tags does: it strips the tags but leaves the content. What you want is something like this:
function remove_div_with_id($html, $id) {
return preg_replace('/<div[^>]+id="'.preg_quote($id, '/').'"[^>]*>(.*?)<\/div>/s', '', $html);
}
Note that this will not work correctly with nested tags. If you need that, you might want to use a DOM representation of your HTML.
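For the nested case, a DOM-based helper might look like the sketch below. The function name `remove_element_by_id` is hypothetical, and the sketch assumes the input is a non-empty HTML fragment and that the id contains no quote characters. It removes the matching element together with everything inside it, then returns the fragment without the <html> and <body> wrappers that the parser adds.

```php
<?php
// Hypothetical helper: removes the element with the given id, children included.
function remove_element_by_id(string $html, string $id): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);              // the parser wraps fragments in <html><body>
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Assumption: $id has no quote characters, which would break the expression.
    foreach ($xpath->query('//*[@id="' . $id . '"]') as $node) {
        $node->parentNode->removeChild($node);
    }

    // Rebuild the fragment from the children of <body>.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// The inner div contains another div, the case the regex cannot handle.
$data = '<div id="one"><div id="two">bla <div>nested</div> bla</div></div>';
$result = remove_element_by_id($data, 'two');
echo $result;
```

Because the parser tracks nesting, the whole subtree under the matching element goes away and only the outer div remains.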
