I have this HTML from stagram:
<div id="photo351321902758808423_176859145" class="photoeach">
<div class="photoeachinner">
<div class="left">
<div class="photowrapper">
<div class="infomation_box clearfix">
<div class="profimage_small">
<div id="photo351295515670923844_176859145" class="photoeach">
<div class="photoeachinner">
<div class="left">
<div class="photowrapper">
<div class="infomation_box clearfix">
I need to find the elements with class photoeach and extract the ID from each one, e.g. 352034826703915686_176859145.
I tried a regex but had no luck, so I'm trying to do it with DOMDocument.
I followed the steps from
Getting DOM elements by classname
$dom = new DomDocument();
$dom->load($filePath);
$finder = new DomXPath($dom);
$classname="photoeach";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
But I can't figure out how to extract the ID.
As Dave already mentioned you are not really that far off:
$dom = new DomDocument();
$dom->load($filePath);
$finder = new DomXPath($dom);
$classname="photoeach";
$nodes = $finder->query("//*[@class = '$classname']");
foreach ($nodes as $node) {
echo 'Id: ' , substr($node->getAttribute('id'), 5) , '<br>';
}
Demo: http://codepad.viper-7.com/xEdYLr
Note that I have changed the contains selector of the class to only match exact matches, otherwise the photoeachinner would also be matched. If this is not what you want you can easily revert that change.
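A third option sits between the two: matching the class as a whole token, so that "photoeachinner" is skipped but an element with additional classes is still found. The following is only a sketch under assumptions: the markup, the shortened IDs, and the extra "featured" class are made up for illustration.

```php
<?php
// Hypothetical markup modelled on the structure in the question.
$html = '<div id="photo1_9" class="photoeach"><div class="photoeachinner"></div></div>'
      . '<div id="photo2_9" class="photoeach featured"></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// Substring match: also catches "photoeachinner".
$loose = $finder->query("//*[contains(@class, 'photoeach')]");

// Exact match: misses elements that carry additional classes.
$exact = $finder->query("//*[@class = 'photoeach']");

// Token match: pad the attribute with spaces and look for " photoeach ".
$token = $finder->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' photoeach ')]"
);

$ids = [];
foreach ($token as $node) {
    // Strip the leading "photo" prefix (5 characters), as in the answer.
    $ids[] = substr($node->getAttribute('id'), 5);
}
echo implode(', ', $ids), "\n";
```

With this sample markup the substring query matches three elements, the exact query one, and the token query two.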
Related
Using preg_match_all is it possible to match elements within a parent that has a specific class name?
For example I have this HTML markup:
<div class="red lorem-ipsum">
<a href="...">Some link</a>
</div>
<div class="red">
<a href="...">Some link</a>
</div>
<div class="something red lorem-ipsum">
<a href="...">Some link</a>
</div>
Can I match each <a> that's within a parent with class name red?
I tried this but it does not work:
~(?:<div class="red">|\G)\s*<a [^>]+>~i
Use DOMDocument in combination with DOMXPath. Here the HTML is in the $html variable:
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$matches = $xpath->query("//div[contains(concat(' ', @class, ' '), ' red ')]/a");
foreach($matches as $a) {
echo $a->textContent;
}
I just created a new DOMXPath object.
After a couple of operations, I stored my result with saveHTML():
$String[] = $dom->saveHTML();
Then I wrote the content to a file:
file_put_contents($filename, $string);
The HTML structure is something like this:
<div if="rand11">
</div>
<div if="rand24">
</div>
<div if="rand51">
</div>
There are methods for creating new divs: you can use ->createElement, and you can place the new element with ->parentNode->insertBefore. But it does not seem possible to create a container div around the existing ones, like this:
<div if="container-div">
<div if="rand11">
</div>
<div if="rand24">
</div>
<div if="rand51">
</div> </div>
I tried multiple ways to do it, without success.
So, I have a couple of questions:
1. Is it possible to create a container div by modifying the DOM directly?
2. Is it possible to add a new HTML element to an array that contains $dom->saveHTML() data?
Sure it is. Most of it actually uses the same methods. For example, appendChild() and insertBefore() are not just for new nodes; they can also move existing nodes.
$html = <<<'HTML'
<div if="rand11"></div>
<div if="rand24"></div>
<div if="rand51"></div>
HTML;
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
// fetch the nodes that should be put inside the container
$nodes = $xpath->evaluate('//div[starts-with(@if, "rand")]');
// validate that there is at least one node
if ($first = $nodes[0]) {
// create the container and insert it before the first
$container = $first->parentNode->insertBefore(
$document->createElement('div'), $first
);
$container->setAttribute('class', 'container-div');
// move all the fetched nodes into the container
foreach($nodes as $node) {
$container->appendChild($node);
}
// output formatted
$document->formatOutput = TRUE;
echo $document->saveHTML($container);
}
Output:
<div class="container-div">
<div if="rand11"></div>
<div if="rand24"></div>
<div if="rand51"></div>
</div>
I have layout like this:
<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>**Wow**</h4>
</div>
<div class="moon">
<h4>**Great**</h4>
</div>
</div>
</div>
</div>
First I run an XPath query:
$a = $xpath->query("//div[@class='fly']"); //to get all elements in class fly
foreach ($a as $p) {
$t = $p->getElementsByTagName('img');
echo ($t->item(0)->getAttributes('data-original'));
}
When I run the code, it produces no result. After tracing it, I found that <img class="badge"> is processed first. How can I get the data-original value from <img class="aye">, and also get the values "Wow" and "Great" from the <h4> tags?
Thank you,
Alternatively, you could run another XPath query relative to each node in your current code.
To get the attribute, use ->getAttribute():
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXpath($dom);
$parent_div = $xpath->query("//div[@class='fly']"); //to get all elements in class fly
foreach($parent_div as $div) {
$aye = $xpath->query('./img[@class="aye"]', $div)->item(0)->getAttribute('data-original');
echo $aye . '<br/>'; // get the data-original
$others = $xpath->query('./div[@class="to"]/div[@class="clearfix"]', $div)->item(0);
foreach($xpath->query('./div/h4', $others) as $node) {
echo $node->nodeValue . '<br/>'; // echo the two h4 values
}
echo '<hr/>';
}
Thank you for your code!
I tried the code but it failed, and I don't know why. So I changed your code a bit, and now it works!
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXpath($dom);
$parent_div = $xpath->query("//div[@class='fly']"); //to get all elements in class fly
foreach($parent_div as $div) {
$aye = $xpath->query('descendant::img[@class="aye"]', $div)->item(0)->getAttribute('data-original');
echo $aye . '<br/>'; // get the data-original
$others = $xpath->query('descendant::div[@class="to"]/div[@class="clearfix"]', $div)->item(0);
foreach($xpath->query('.//div/h4', $others) as $node) {
echo $node->nodeValue . '<br/>'; // echo the two h4 values
}
echo '<hr/>';
}
I have no idea what the difference between ./ and descendant:: is, but my code works fine using descendant::.
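For reference, `./img` is short for `child::img` and only looks at direct children of the context node, while `descendant::img` (or the shorthand `.//img`) searches at any depth. The sketch below illustrates this, assuming the real page wraps the image in an extra element; the `<span>` wrapper is hypothetical and not part of the markup shown in the question.

```php
<?php
// Hypothetical markup: the <img> is nested one level deeper than in the question.
$markup = '<div class="fly"><span><img class="aye" data-original="b.png"></span>'
        . '<div class="to"><h4>Fly To The Moon</h4></div></div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);
$div = $xpath->query("//div[@class='fly']")->item(0);

// child axis: direct children of $div only, so the nested <img> is not found
$child = $xpath->query('./img[@class="aye"]', $div);

// descendant axis: any depth below $div
$descendant = $xpath->query('descendant::img[@class="aye"]', $div);

// abbreviated form that also reaches any depth below $div
$shorthand = $xpath->query('.//img[@class="aye"]', $div);

echo $child->length, ' ', $descendant->length, ' ', $shorthand->length, "\n";
```

If `./img` returns an empty list, calling `->item(0)->getAttribute()` on it fails because `item(0)` is null, which would be consistent with the failure described above.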
Given the following XML:
<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>**Wow**</h4>
</div>
<div class="moon">
<h4>**Great**</h4>
</div>
</div>
</div>
</div>
you asked:
how can I get data-original value from <img class="aye">and also get the value "Wow" and "Great" from <h4> tag?
With XPath you can obtain the values as string directly:
string(//div[@class='fly']/img/@data-original)
This is the string value of the first data-original attribute of an img tag within all divs with class="fly".
string(//div[@class='fly']//h4[not(following-sibling::*//h4)][1])
string(//div[@class='fly']//h4[not(following-sibling::*//h4)][2])
These are the string values of the first and second <h4> tags that are not followed on their own level by another <h4> tag, within all divs with class="fly".
The leading parts look a bit in the way right now, but with iteration they are no longer needed, because the XPath expressions then become relative:
//div[@class='fly']
string(./img/@data-original)
string(.//h4[not(following-sibling::*//h4)][1])
string(.//h4[not(following-sibling::*//h4)][2])
To use xpath string(...) expressions in PHP you must use DOMXPath::evaluate() instead of DOMXPath::query(). This would then look like the following:
$aye = $xpath->evaluate("string(//div[@class='fly']/img/@data-original)");
$h4_1 = $xpath->evaluate("string(//div[@class='fly']//h4[not(following-sibling::*//h4)][1])");
$h4_2 = $xpath->evaluate("string(//div[@class='fly']//h4[not(following-sibling::*//h4)][2])");
A full example with iteration and output:
// all <div> tags with class="fly"
$divs = $xpath->evaluate("//div[@class='fly']");
foreach ($divs as $div) {
// the first data-original attribute of an <img> inside $div
echo $xpath->evaluate("string(./img/@data-original)", $div), "<br/>\n";
// all <h4> tags anywhere inside the $div
$h4s = $xpath->evaluate('.//h4[not(following-sibling::*//h4)]', $div);
foreach ($h4s as $h4) {
echo $h4->nodeValue, "<br/>\n";
}
}
As the example shows, you can use evaluate() for node lists as well. The values of the <h4> tags are no longer obtained with string(), since I assume there could be more than just two.
Online Demo including special string output (just exemplary):
echo <<<HTML
{$xpath->evaluate("string(//div[@class='fly']/img/@data-original)")}<br/>
{$xpath->evaluate("string(//div[@class='fly']//h4[not(following-sibling::*//h4)][1])")}<br/>
{$xpath->evaluate("string(//div[@class='fly']//h4[not(following-sibling::*//h4)][2])")}<br/>
<hr/>
HTML;
I need a little help, with getting content from external webpages.
I need to get a div, and then delete another div from inside it. This is my code, can someone help me?
This is the relevant portion of my XML code:
<html>
...
<body class="domain-4 page-product-detail" > ...
<div id="informacio" class="htab-fragment"> <!-- must select this -->
<h2 class="description-heading htab-name">Utazás leírása</h2>
<div class="htab-mobile tab-content">
<p class="tab-annot">* Hivatalos ismertető</p>
<div id="trip-detail-question"> <!-- must delete this -->
<form> ...</form>
</div>
<h3>USP</h3><p>Nagy, jól szervezett és családbarát ...</p>
<div class="message warning-message">
<p>Az árak már minden aktuális kedvezményt tartalmaznak!</p>
<span class="ico"></span>
</div>
</div>
</div>
...
</body>
</html>
I need to get the div with id="informacio", and after that I need to delete the div id="trip-detail-question" from it including the form it contains.
This is my code, but it's not working correctly :(
function get_content($url){
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->strictErrorChecking = false;
$doc->recover = true;
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$query = "//div[@id='informacio']";
$entries = $xpath->query($query)->item(0);
foreach($xpath->query("div[@id='trip-detail-question']", $entries) as $node)
$node->parentNode->removeChild($node);
$var = $doc->saveXML($entries);
return $var;
}
Your second XPath expression is incorrect. It tries to select a div in the context of the div you selected previously as its child node. You are trying to select:
//div[@id='informacio']/div[@id='trip-detail-question']
and that node does not exist. You want this node:
//div[@id='informacio']/div/div[@id='trip-detail-question']
which you can also select like this (allowing any element, not just div):
//div[@id='informacio']/*/div[@id='trip-detail-question']
or (allowing more than one nesting level)
//div[@id='informacio']//div[@id='trip-detail-question']
In the context of the first div, the correct XPath expression would be:
.//div[@id='trip-detail-question']
If you change it in your code, it should work:
foreach($xpath->query(".//div[@id='trip-detail-question']", $entries) as $node)
$node->parentNode->removeChild($node);
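Put together, the fix can be tried on an inline string without fetching a URL. This is a minimal sketch, assuming a trimmed-down version of the markup from the question; the `$html` string and the variable names `$container` and `$targets` are made up for illustration.

```php
<?php
// Hypothetical, trimmed-down markup based on the question.
$html = '<div id="informacio"><div class="tab-content">'
      . '<div id="trip-detail-question"><form></form></div>'
      . '<h3>USP</h3></div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$container = $xpath->query("//div[@id='informacio']")->item(0);

// Defensive: copy the matches into an array before removing anything.
$targets = iterator_to_array(
    $xpath->query(".//div[@id='trip-detail-question']", $container)
);
foreach ($targets as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize only the container; the nested div and its form are gone.
$result = $doc->saveHTML($container);
echo $result, "\n";
```

The leading `.//` makes the second query relative to `$container` while still searching at any depth below it.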
Current structure looks like
<div class="...">
//more html
<div class="message-right">
<div class="item1"> //more html </div>
<div class="item2"> //more html </div>
<div class="item3"> //more html </div>
</div>
//more html
</div>
I want to be able to get the html content inside the class 'message-right', and remove the last child. (In this case 'item3')
I should be left with the html from 'item1' and 'item2'
So far I have
$dom = new DomDocument();
@$dom->loadHTML($html);
$finder = new DomXPath($dom);
$classname = "message-right";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
//this is where I am stuck, need to remove the last child, 'item3'
//this returns the html from 'message-right'
$html = $nodes->item(0)->c14n();
Fetch the last child element (XPath will make that easier) and delete it.
$delete = $finder->query("./*[last()]", $nodes->item(0))->item(0);
$delete->parentNode->removeChild($delete);
Depending on what you really need you might want to fetch (and subsequently delete) that element directly using
//*[contains(@class, '$classname')]/*[last()]
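The direct variant might look like the sketch below, which also collects the remaining inner HTML of the container. It assumes simplified markup in place of the elided parts of the question; the "wrap" class and the text inside the items are placeholders.

```php
<?php
// Hypothetical markup standing in for the structure in the question.
$html = '<div class="wrap"><div class="message-right">'
      . '<div class="item1">one</div>'
      . '<div class="item2">two</div>'
      . '<div class="item3">three</div>'
      . '</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$classname = 'message-right';

// Select the last child element of the container directly and remove it.
$last = $finder->query("//*[contains(@class, '$classname')]/*[last()]")->item(0);
if ($last !== null) {
    $last->parentNode->removeChild($last);
}

// Serialize the remaining children to get the container's inner HTML.
$container = $finder->query("//*[contains(@class, '$classname')]")->item(0);
$inner = '';
foreach ($container->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}
echo $inner, "\n";
```

Checking for null before removing avoids an error when the container is missing or has no child elements.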