DOMXpath remove tag - php

I have this HTML code:
<div class ="lvlone">
<div class = "lvltwo"> Hello
<span>World</span>
</div>
</div>
I do this:
$res = $xpath->query("//div[@class='lvlone']/div[@class='lvltwo']");
I get "Hello World", including the text inside the <span> tag, but I don't want it!
I only want "Hello".
What can I do?
Thanks!

As TheZ points out, you can use the text() node test in XPath, which selects only the text nodes that are direct children of the div:
$nodes = $xpath->query('//div[@class="lvltwo"]/text()');
echo trim($nodes->item(0)->nodeValue); // Prints 'Hello'
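A self-contained sketch of the same text() idea, with the question's markup inlined. The first matching text node is actually " Hello" followed by a newline, which is why trim() is applied; the variable names $html and $hello are only illustrative.

```php
<?php
// Markup from the question.
$html = '<div class ="lvlone">
<div class = "lvltwo"> Hello
<span>World</span>
</div>
</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// text() selects only the text nodes that are direct children of div.lvltwo,
// so the "World" text inside <span> is not part of the result.
$nodes = $xpath->query("//div[@class='lvlone']/div[@class='lvltwo']/text()");

// The first text node holds the surrounding whitespace as well; trim() removes it.
$hello = trim($nodes->item(0)->nodeValue);
echo $hello; // Hello
```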

Related

Removing portion between two \n when a specific sub-string is there [PHP]

I have a variable in which I store some HTML code.
Let's say:
<div>
<span> test of {my_string} </span>
{my_string}
test of {my_string}
</div>
<h1> {my_string} </h1>
I need to remove the lines containing a specific value, so that the end result looks like:
<div>
</div>
So I was thinking of getting the position of the string with strpos() and then finding the \n characters before and after it. But how can I search backwards with strpos() when I have already specified an offset?
$rep_pos = strpos($message, 'my_string');
$line_begining = ????
$line_end = strpos($message, "\n", $rep_pos);
I can't use strip_tags() because I don't know in advance which tags will surround the string, and other strings may use the same tags.
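For the strpos()-based approach the question describes, searching backwards is what strrpos() does: with a negative offset it only considers matches at or before that point counted from the end of the string. A minimal sketch, assuming "\n" line endings; removeLinesContaining() is a hypothetical helper, not part of the question or of PHP.

```php
<?php
// Hypothetical helper: removes every line that contains $needle.
// Assumes "\n" line endings.
function removeLinesContaining(string $message, string $needle): string
{
    while (($rep_pos = strpos($message, $needle)) !== false) {
        // Search backwards: with a negative offset, strrpos() only considers
        // "\n" characters at or before $rep_pos.
        $line_begin = strrpos($message, "\n", $rep_pos - strlen($message));
        $line_begin = ($line_begin === false) ? 0 : $line_begin + 1;

        // Search forwards for the "\n" that ends the line
        // (double quotes, so "\n" is a real newline, not backslash + n).
        $line_end = strpos($message, "\n", $rep_pos);
        $line_end = ($line_end === false) ? strlen($message) : $line_end + 1;

        // Cut out the whole line, including its trailing newline.
        $message = substr($message, 0, $line_begin) . substr($message, $line_end);
    }
    return $message;
}

$message = "<div>\n<span> test of {my_string} </span>\n{my_string}\ntest of {my_string}\n</div>\n<h1> {my_string} </h1>";
$result = removeLinesContaining($message, 'my_string');
echo $result; // "<div>\n</div>\n"
```

This treats the HTML purely as text, so it only works when each value to remove sits on lines you are happy to drop entirely.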
You should use DOMDocument to parse the HTML string. Here we use the XPath query //*[text()=" my_string "], which selects every element that has a text node child exactly equal to " my_string " (including the surrounding spaces).
<?php
ini_set('display_errors', 1);
$string='<html>
<body>
<div>
<span> my_string </span>
</div>
<h1> my_string </h1>
</body>
</html>';
$domobject= new DOMDocument();
$domobject->loadHTML($string);
$xpath= new DOMXPath($domobject);
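// select every element that has a text child exactly equal to " my_string "
$result=$xpath->query('//*[text()=" my_string "]');
// the list returned by query() is not live, so removing nodes inside the loop is safe
foreach($result as $node)
{
$node->parentNode->removeChild($node);
}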
echo $domobject->saveHTML();

How can I strip html tags except some of them?

I need to remove all HTML tags from a PHP string except:
<p>
<em>
<small>
The strip_tags() function is good, but it strips all HTML tags. How can I tell it to remove all HTML except the tags above?
You should check out the manual: Example #1 in the strip_tags() documentation.
Syntax: strip_tags ( Your-string, Allowable-Tags )
If you pass the second parameter, these tags will not be stripped.
strip_tags($string, '<p><em><small>');
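A short sketch of the allowed-tags parameter in action. Note that strip_tags() keeps the text content of the removed tags; the array form of the second argument assumes PHP 7.4 or later.

```php
<?php
$html = '<div><p>Keep <b>this</b> <em>text</em></p><small>note</small></div>';

// Allowed tags given as a string: <p>, <em> and <small> survive,
// <div> and <b> are stripped but their text is kept.
$clean = strip_tags($html, '<p><em><small>');

// Since PHP 7.4 the allowed tags can also be passed as an array.
$clean2 = strip_tags($html, ['p', 'em', 'small']);

echo $clean; // <p>Keep this <em>text</em></p><small>note</small>
```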
According to your comment, you want to remove HTML elements only if they have some class or attribute. You'll need to build up a DOM then:
<?php
$data = <<<DATA
<div>
<p>These line shall stay</p>
<p class="myclass">Remove this one</p>
<p>I will be deleted as well</p>
<p>But keep this</p>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED);
$xpath = new DOMXPath($dom);
$elements_to_be_removed = $xpath->query("//*[count(@*)>0]");
foreach ($elements_to_be_removed as $element) {
$element->parentNode->removeChild($element);
}
// just to check
echo $dom->saveHTML();
?>
To change which elements are removed, you need to change the query; for example, to remove all elements with the class myclass, it must read "//*[@class='myclass']".
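A sketch of that class-based variant, using a trimmed copy of the sample data. @class='myclass' compares the whole attribute value, so an element with class="myclass other" would not match; for that case the common XPath 1.0 idiom is contains(concat(' ', normalize-space(@class), ' '), ' myclass '). LIBXML_HTML_NODEFDTD is added here only to keep the output free of a doctype.

```php
<?php
$data = <<<DATA
<div>
<p>This line shall stay</p>
<p class="myclass">Remove this one</p>
<p>But keep this</p>
</div>
DATA;

$dom = new DOMDocument();
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Matches elements whose class attribute is exactly "myclass".
$elements_to_be_removed = $xpath->query("//*[@class='myclass']");
foreach ($elements_to_be_removed as $element) {
    $element->parentNode->removeChild($element);
}

echo $dom->saveHTML();
```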

Get specific html portion with regex string matching in php

I am trying to get a specific portion of HTML code with preg_match_all(), matching it by its class attribute, but it returns an empty array.
This is the HTML portion I want to extract from the complete HTML:
<div class="details">
<div class="title">
<a href="citation.cfm?id=2892225&CFID=598850954&CFTOKEN=15595705"
target="_self">Restrictification of function arguments</a>
</div>
</div>
I am using this regex:
preg_match_all('~<div class=\'details\'>\s*(<div.*?</div>\s*)?(.*?)</div>~is', $html, $matches );
NOTE: the $html variable holds the whole HTML that I want to search.
Thanks.
Your regex looks for single quotes, whereas the HTML in $html uses double quotes.
Your regex should look like:
'~<div class="details">\s*(<div.*?</div>\s*)?(.*?)</div>~is'
or better:
'~<div class=[\'"]details[\'"]>\s*(<div.*?</div>\s*)?(.*?)</div>~is'
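A quick check of the quote-agnostic regex against the markup from the question. It works for this exact shape (one nested div); with deeper or differently nested divs the lazy matches can stop at the wrong </div>, which is one reason to prefer a DOM parser.

```php
<?php
$html = '<div class="details">
<div class="title">
<a href="citation.cfm?id=2892225&CFID=598850954&CFTOKEN=15595705"
target="_self">Restrictification of function arguments</a>
</div>
</div>';

// [\'"] accepts either quote style around the class value.
$count = preg_match_all('~<div class=[\'"]details[\'"]>\s*(<div.*?</div>\s*)?(.*?)</div>~is', $html, $matches);

echo $count, "\n";   // number of matches
echo $matches[0][0]; // the whole <div class="details"> block
```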
Better yet, use a DOM approach!
<?php
$html = '<div class="details">
<div class="title">
<a href="citation.cfm?id=2892225&CFID=598850954&CFTOKEN=15595705"
target="_self">Restrictification of function arguments</a>
</div>
</div>';
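libxml_use_internal_errors(true); // silence parser warnings about the unescaped & in the href
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$divs = $xpath->query('//div[@class="title"]');
foreach ($divs as $div) {
    echo $doc->saveHTML($div); // outer HTML of each matched div
}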
?>
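If what you ultimately need is the link inside the block rather than its markup, DOMXPath::evaluate() with string(...) returns plain PHP strings. A sketch under that assumption; $title and $href are illustrative names.

```php
<?php
$html = '<div class="details">
<div class="title">
<a href="citation.cfm?id=2892225&CFID=598850954&CFTOKEN=15595705"
target="_self">Restrictification of function arguments</a>
</div>
</div>';

// Collect libxml parser warnings (e.g. about the unescaped "&") instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// string(...) makes evaluate() return a string instead of a node list.
$title = $xpath->evaluate('string(//div[@class="details"]/div[@class="title"]/a)');
$href  = $xpath->evaluate('string(//div[@class="details"]/div[@class="title"]/a/@href)');

echo $title, "\n";
echo $href, "\n";
```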

How to select 2nd element with same tag using dom xpath?

I have layout like this:
<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>Wow</h4>
</div>
<div class="moon">
<h4>Great</h4>
</div>
</div>
</div>
</div>
First I get query from xpath :
$a = $xpath->query("//div[@class='fly']"); //to get all elements in class fly
foreach ($a as $p) {
$t = $p->getElementsByTagName('img');
echo ($t->item(0)->getAttributes('data-original'));
}
When I run the code, it produces no result. After tracing it, I found that <img class="badge"> is processed first. How can I get the data-original value from <img class="aye"> and also get the values "Wow" and "Great" from the <h4> tags?
Thank you,
Alternatively, you could run another XPath query relative to each matched div, on top of your current code.
To get the attribute, use ->getAttribute():
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXpath($dom);
$parent_div = $xpath->query("//div[@class='fly']"); //to get all elements in class fly
foreach($parent_div as $div) {
$aye = $xpath->query('./img[@class="aye"]', $div)->item(0)->getAttribute('data-original');
echo $aye . '<br/>'; // get the data-original
$others = $xpath->query('./div[@class="to"]/div[@class="clearfix"]', $div)->item(0);
foreach($xpath->query('./div/h4', $others) as $node) {
echo $node->nodeValue . '<br/>'; // echo the two h4 values
}
echo '<hr/>';
}
Thank you for your code!
I tried it, but it failed and I don't know why. So I changed it a bit, and now it works:
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXpath($dom);
$parent_div = $xpath->query("//div[@class='fly']"); //to get all elements in class fly
foreach($parent_div as $div) {
$aye = $xpath->query('descendant::img[@class="aye"]', $div)->item(0)->getAttribute('data-original');
echo $aye . '<br/>'; // get the data-original
$others = $xpath->query('descendant::div[@class="to"]/div[@class="clearfix"]', $div)->item(0);
foreach($xpath->query('.//div/h4', $others) as $node) {
echo $node->nodeValue . '<br/>'; // echo the two h4 values
}
echo '<hr/>';
}
I have no idea what the difference between ./ and descendant:: is, but my code works fine using descendant::.
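The difference, briefly: ./x is shorthand for child::x, so it only finds elements directly inside the context node, while descendant::x (and .//x) searches at any depth. In the sample markup, the aye <img> is a direct child of div.fly, so both versions should find it; if the real page nests it deeper, only the descendant:: version would. A sketch that counts <h4> elements both ways:

```php
<?php
$markup = '<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>Wow</h4>
</div>
<div class="moon">
<h4>Great</h4>
</div>
</div>
</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);
$div = $xpath->query("//div[@class='fly']")->item(0);

// "./h4" means "child::h4": only <h4> elements directly inside $div.
$children = $xpath->query('./h4', $div);

// "descendant::h4" looks at any depth below $div.
$descendants = $xpath->query('descendant::h4', $div);

echo $children->length, "\n";    // 0: div.fly has no direct <h4> children
echo $descendants->length, "\n"; // 3: "Fly To The Moon", "Wow", "Great"
```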
Given the following XML:
<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>Wow</h4>
</div>
<div class="moon">
<h4>Great</h4>
</div>
</div>
</div>
</div>
You asked:
how can I get data-original value from <img class="aye"> and also get the value "Wow" and "Great" from <h4> tag?
With XPath you can obtain the values as strings directly:
string(//div[@class='fly']/img/@data-original)
This is the string value of the first data-original attribute on an <img> that is a direct child of a div with class="fly".
string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])
string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])
These are the string values of the first and second <h4> elements, in document order, that have no following sibling containing another <h4>, within the divs with class="fly". The parentheses matter: without them, [1] and [2] would count <h4> elements per parent element, and because "Wow" and "Great" each sit in their own parent div, [2] would select nothing.
The leading //div[@class='fly'] part may look like it is in the way right now, but when you iterate over the divs it is no longer needed, because the XPath expressions then become relative to each div:
//div[@class='fly']
string(./img/@data-original)
string((.//h4[not(following-sibling::*//h4)])[1])
string((.//h4[not(following-sibling::*//h4)])[2])
To use XPath string(...) expressions in PHP you must use DOMXPath::evaluate() instead of DOMXPath::query(). This would then look like the following:
$aye = $xpath->evaluate("string(//div[@class='fly']/img/@data-original)");
$h4_1 = $xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])");
$h4_2 = $xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])");
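A self-contained sketch of these evaluate() calls, with the question's markup inlined. It also shows why the positional predicate is wrapped in parentheses: per the XPath 1.0 rules, //h4[2] means "the second <h4> child of each parent", not "the second <h4> overall". The variable $without_parens is only there for the comparison.

```php
<?php
$markup = '<div class="fly">
<img src="a.png" class="badge">
<img class="aye" data-original="b.png" width="130" height="253" />
<div class="to">
<h4>Fly To The Moon</h4>
<div class="clearfix">
<div class="the">
<h4>Wow</h4>
</div>
<div class="moon">
<h4>Great</h4>
</div>
</div>
</div>
</div>';

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

// Only the second <img> has a data-original attribute.
$aye = $xpath->evaluate("string(//div[@class='fly']/img/@data-original)");

// Parentheses make [1] and [2] count over the whole node set, in document order.
$h4_1 = $xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])");
$h4_2 = $xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])");

// Without parentheses, [2] is applied per parent element. Each matching <h4>
// is alone in its parent div, so nothing is selected and string() returns "".
$without_parens = $xpath->evaluate("string(//div[@class='fly']//h4[not(following-sibling::*//h4)][2])");

echo $aye, "\n", $h4_1, "\n", $h4_2, "\n";
```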
A full example with iteration and output:
// all <div> tags with class="fly"
$divs = $xpath->evaluate("//div[@class='fly']");
foreach ($divs as $div) {
// the first data-original attribute of an <img> that is a direct child of $div
echo $xpath->evaluate("string(./img/@data-original)", $div), "<br/>\n";
// the <h4> tags inside $div that have no following sibling containing another <h4>
$h4s = $xpath->evaluate('.//h4[not(following-sibling::*//h4)]', $div);
foreach ($h4s as $h4) {
echo $h4->nodeValue, "<br/>\n";
}
}
As the example shows, you can use evaluate() for node lists, too. The values of the <h4> tags are no longer obtained with string() here, because I assume there could be more than just two.
Output using the string expressions inside a heredoc (just as an example):
echo <<<HTML
{$xpath->evaluate("string(//div[@class='fly']/img/@data-original)")}<br/>
{$xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[1])")}<br/>
{$xpath->evaluate("string((//div[@class='fly']//h4[not(following-sibling::*//h4)])[2])")}<br/>
<hr/>
HTML;

Php get a value from url using a class

The div code below appears on several different domains, and I want to display the total on my homepage. I tried file_get_html, but it displays the whole div content; I want to save the number inside <dd></dd> in a variable, add up the numbers from all pages, and display the sum on my page.
Here is the div code:
<div class="stats">
<dl class="statscount">
<dt>total:</dt>
<dd>5,299</dd>
</dl>
20000
</div>
And here is my current code:
<?php
include 'simple_html_dom.php';
$html = file_get_html('http://www.targetdomain.com');
$result = $html->find('dl[class=statscount]', 0); //Output: THESE
$result = str_replace(",", "", $result);
echo $result;
?>
But there is a small problem: I don't need to fetch all the data in the class, just the data in the <dd></dd> tag within it. Can you please tell me how to achieve this? Basically, I want to fetch the number within <dd>5,299</dd>, add up the numbers from the different pages, and display the total on my website. Thanks
I would use XPath for this. That way you won't need simple_html_dom, because DOM and XPath support ships with PHP (since PHP 5):
$html = <<<EOF
<div class="stats">
<dl class="statscount">
<dt>total posts:</dt>
<dd>5,299</dd>
</dl>
20000
</div>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);
$value = $selector
->query('//dl[@class="statscount"]/dd/text()')
->item(0)
->nodeValue;
var_dump($value); // Output: string(5) "5,299"
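To cover the second half of the question (adding up the numbers from several pages), a sketch that wraps the same XPath in a helper and sums the results. statsCount() and the $pages array are hypothetical; in practice each string would come from fetching one page, for example with file_get_contents($url).

```php
<?php
// Hypothetical helper: returns the number inside dl.statscount > dd, or 0.
function statsCount(string $html): int
{
    if ($html === '') {
        return 0; // loadHTML() rejects empty input
    }
    libxml_use_internal_errors(true); // real pages often trigger parser warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    // string(...) yields "" when no <dd> matches, which casts to 0 below.
    $value = $xpath->evaluate('string(//dl[@class="statscount"]/dd)');
    return (int) str_replace(',', '', $value); // "5,299" -> 5299
}

// Hypothetical page contents standing in for the fetched pages.
$pages = [
    '<div class="stats"><dl class="statscount"><dt>total:</dt><dd>5,299</dd></dl></div>',
    '<div class="stats"><dl class="statscount"><dt>total:</dt><dd>1,001</dd></dl></div>',
];

$total = 0;
foreach ($pages as $html) {
    $total += statsCount($html);
}
echo number_format($total); // 6,300
```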
Alternatively, for markup this simple, a regex may do:
preg_match('/<dd>([^<]*)<\/dd>/', $htmlcode, $matches);
$result = $matches[1]; // "5,299"
