How can I extract the full HTML content inside a div? I tried this code:
$html= '<html>
<body>
<div id="test">
<div id="mydiv1">Hello</div>
<div id="mydiv2">How are you</div>
</div>
</body>
</html>';
$attr = "id";
$value = "test";
$tag_regex = '/<div[^>]*'.$attr.'="'.$value.'">(.*?)<\\/div>/si';
preg_match($tag_regex,$html,$matches);
echo $matches[0];
Running this code gives the result:
<div id="test">
<div id="mydiv1">Hello</div>
Expected result:
<div id="test">
<div id="mydiv1">Hello</div>
<div id="mydiv2">How are you</div>
</div>
In my code, the regular expression matches only up to the first occurrence of </div>. How can I get the full markup inside <div id="test">?
With DOMDocument:
$dom = new DOMDocument;
$dom->loadHTML($html);
$div = $dom->getElementById('test');
$result = $dom->saveHTML($div);
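The regex in the question fails because the lazy (.*?) stops at the first </div> it meets, and a pattern of this kind cannot count nested tags. As a minimal sketch of the DOM approach, assuming only the element's inner markup is wanted, the hypothetical helper innerHtmlById() below (not part of the original answer) serializes each child of the matched element:

```php
<?php
// Hypothetical helper: returns the markup *inside* the element with the given id,
// or null when no such element exists.
function innerHtmlById(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    // Collect parser warnings instead of printing them; real-world HTML is rarely clean.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $dom->getElementById($id);
    if ($node === null) {
        return null;
    }

    // Serialize each child node; nesting is resolved by the parser, not by a pattern.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

$html = '<html><body><div id="test">'
      . '<div id="mydiv1">Hello</div><div id="mydiv2">How are you</div>'
      . '</div></body></html>';
echo innerHtmlById($html, 'test');
```

Passing the element itself to saveHTML(), as in the answer above, returns the outer markup including <div id="test">; looping over childNodes drops that wrapper.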
Related
I have an external file with a lot of information, e.g.
http://domain.com/thefile.html
Each record in the file is wrapped in a <div> element:
....
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
<div class="lineData">
<div class="lineLData">xbox one</div>
<div class="lineRData">not awesome</div>
</div>
<div class="lineData">
<div class="lineLData">wii u</div>
<div class="lineRData">mhhhh</div>
</div>
....
Now I want to search the whole file for the keyword "Playstation" and echo the whole enclosing <div>:
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
Is this possible with PHP?
Assuming the resource URL is in $url:
$result = array();
$dom = new DOMDocument;
$dom->loadHTML(file_get_contents($url));
Find all <div>s with the class lineData using DOMXPath:
$xpath = new DomXPath($dom);
$lineDatas = $xpath->query('//div[contains(@class,"lineData")]');
Add all lineData <div>s containing "playstation" to the $result array:
foreach($lineDatas as $lineData) {
if (strpos(strtolower($lineData->nodeValue), 'playstation') !== false) {
$result[] = $lineData;
}
}
Example of outputting the result:
foreach($result as $lineData) {
echo $dom->saveHTML($lineData);
}
This outputs
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
when tested on the example HTML in the question.
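The keyword filter can also be pushed into the XPath expression itself, so that no PHP-side strpos() loop is needed. The following is only a sketch under a few assumptions: the sample markup is inlined rather than fetched from $url, the keyword is lower-case and contains no quote characters, and the class is matched as a whole token rather than as a substring.

```php
<?php
// Sample markup based on the question; in practice it would come from file_get_contents($url).
$html = '<div class="lineData"><div class="lineLData">Playstation</div><div class="lineRData">awesome</div></div>'
      . '<div class="lineData"><div class="lineLData">xbox one</div><div class="lineRData">not awesome</div></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// XPath 1.0 has no lower-case(), so translate() maps A-Z to a-z for a case-insensitive match.
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';
$keyword = 'playstation'; // assumed lower-case and free of quote characters

$query = sprintf(
    '//div[contains(concat(" ", normalize-space(@class), " "), " lineData ")]'
    . '[contains(translate(., "%s", "%s"), "%s")]',
    $upper,
    $lower,
    $keyword
);

$matches = [];
foreach ($xpath->query($query) as $div) {
    // Store the serialized markup of every matching record.
    $matches[] = $dom->saveHTML($div);
}

echo implode("\n", $matches);
```

The concat/normalize-space idiom matches "lineData" only as a complete class name, which avoids accidental hits on longer class names that merely contain the same letters.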
Use DOMDocument for this purpose.
$dom = new DOMDocument;
$dom->loadHTMLFile("file.html");
Now you can search for the div:
$xpath = new DOMXPath($dom);
$res = $xpath->query("//*[contains(@class, 'lineData')]");
Now you have the matching divs as a DOMNodeList. Saving the first one should be possible with these few lines:
$div = $res->item(0);
$html = $div->ownerDocument->saveHTML($div);
Let's say I have this comment block containing HTML:
<html>
<body>
<code class="hidden">
<!--
<div class="a">
<div class="b">
<div class="c">
Link Test 1
</div>
<div class="c">
Link Test 2
</div>
<div class="c">
Link Test 3
</div>
</div>
</div>
-->
</code>
<code>
<!-- test -->
</code>
</body>
</html>
Using DOMXPath for PHP, how do I get the links and text within the tag?
This is what I have so far:
$dom = new DOMDocument();
$dom->loadHTML("HTML STRING"); # not actually in code
$xpath = new DOMXPath($dom);
$query = '/html/body/code/comment()';
$divs = $dom->getElementsByTagName('div')->item(0);
$entries = $xpath->query($query, $divs);
foreach($entries as $entry) {
# shows entire text block
echo $entry->textContent;
}
How do I navigate so that I can get the "c" classes and then put the links into an array?
EDIT: Please note that there are multiple <code> tags within the page, so I can't just get an element with the code attribute.
You can already target the comment containing the links, so just follow through: load the comment's content into a second document and run another query inside it. Example:
$sample_markup = '<html>
<body>
<code class="hidden">
<!--
<div class="a">
<div class="b">
<div class="c">
Link Test 1
</div>
<div class="c">
Link Test 2
</div>
<div class="c">
Link Test 3
</div>
</div>
</div>
-->
</code>
</body>
</html>';
$dom = new DOMDocument();
$dom->loadHTML($sample_markup); # not actually in code
$xpath = new DOMXPath($dom);
$query = '/html/body/code/comment()';
$entries = $xpath->query($query);
foreach ($entries as $key => $comment) {
$value = $comment->nodeValue;
$html_comment = new DOMDocument();
$html_comment->loadHTML($value);
$xpath_sub = new DOMXpath($html_comment);
$links = $xpath_sub->query('//div[@class="c"]/a'); // target the links!
// loop each link, do what you have to do
foreach($links as $link) {
echo $link->getAttribute('href') . '<br/>';
}
}
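To actually collect the links into an array, as the question asks, the same two-step parse can be wrapped up as below. This is a sketch, not the answerer's code: the <a> elements and their href values are invented for illustration (the sample above shows only the link text), and the query is narrowed to code[@class="hidden"] on the assumption that this is the block of interest.

```php
<?php
// Hypothetical markup: the <a> elements and their href values are invented for illustration.
$markup = '<html><body>
<code class="hidden"><!--
<div class="a"><div class="b">
<div class="c"><a href="/one">Link Test 1</a></div>
<div class="c"><a href="/two">Link Test 2</a></div>
</div></div>
--></code>
<code><!-- test --></code>
</body></html>';

libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$links = [];
// Only look at comments inside <code class="hidden">, since the page has several <code> tags.
foreach ($xpath->query('//code[@class="hidden"]/comment()') as $comment) {
    $inner = new DOMDocument();
    // Skip comments that are empty or cannot be parsed as markup.
    if (trim($comment->nodeValue) === '' || !$inner->loadHTML($comment->nodeValue)) {
        continue;
    }
    $innerXpath = new DOMXPath($inner);
    foreach ($innerXpath->query('//div[@class="c"]/a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
}
libxml_clear_errors();

print_r($links);
```

Each entry of $links then holds one link's href and visible text.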
I got a page source from a file using PHP, and its output is similar to:
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
From this I need to get only one particular div, complete with its contents, when I give a class name such as 'under' as input. Can anybody suggest how to do this using PHP? The expected result:
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
Try this:
$html = <<<HTML
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="under"]');
$div = $div->item(0);
echo $dom->saveXML($div);
This will output:
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
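Since the question asks for the class name to be an input, the query above can be wrapped in a small function. The sketch below uses a hypothetical helper name, outerHtmlByClass(), and assumes the class name contains no quote characters; it also matches the class as one token, so elements with several classes are found as well.

```php
<?php
// Hypothetical helper: returns the outer HTML of the first element whose class list
// contains $class, or null when nothing matches.
function outerHtmlByClass(string $html, string $class): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumes $class contains no quote characters; XPath 1.0 string literals cannot
    // escape quotes, so untrusted input should be validated before it gets here.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $node = $xpath->query($query)->item(0);

    return $node === null ? null : $dom->saveHTML($node);
}

$html = '<div class="basic"><div class="math"><div class="winner">'
      . '<div class="under"><div class="checker"><strong>check</strong></div></div>'
      . '</div></div></div>';
echo outerHtmlByClass($html, 'under');
```

saveHTML() is used here instead of saveXML() so that empty elements are serialized in HTML style; either works for this particular sample.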
Function to extract the contents from a specific div id from any webpage
The below function extracts the contents from the specified div and returns it. If no divs with the ID are found, it returns false.
function getHTMLByID($id, $html) {
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$node = $dom->getElementById($id);
if ($node) {
return $dom->saveXML($node);
}
return FALSE;
}
$id is the ID of the <div> whose content you're trying to extract, $html is your HTML markup.
Usage example:
$html = file_get_contents('http://www.mysql.com/');
echo getHTMLByID('tagline', $html);
Output:
The world's most popular open source database
I'm not sure what you're asking, but this might be it:
preg_match_all('~<div class="under">(.*?)</div>~s', $htmlsource, $output);
$output[1] should now contain the inner content of that div, up to the first closing </div>.
I have a $content with
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div class="myclass">
...
</div>
...
...
<div class="myclass">
...
</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
I would like to remove via PHP all the divs with class="myclass" except the first one, and add another div in place of the others, so that the result is:
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div>Check all divs here</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
I would be grateful if someone could point me to a solution.
UPDATE 2:
some similar question here
from that I came up with the following test code:
$content = '<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
</div>
<div class="myclass">
</div>
<div class="myclass">
</div>
</div>
<div class="nav">
</div>
</div>
some other text here, <p></p> bla-bla-bla';
$dom = new DOMDocument();
$dom->loadHtml($content);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@class="myClass" and position()>1]') as $liNode) {
$liNode->parentNode->removeChild($liNode);
}
echo $dom->saveXml($dom->documentElement);
Any ideas where I can test it?
Here is what you are looking for (similar to your edit, but it removes the added html tags):
$doc = new DOMDocument();
$doc->loadHTML($content);
$xp = new DOMXpath($doc);
$elements = $xp->query("//div[@class='myclass']");
if($elements->length > 1)
{
$newElem = $doc->createElement("div");
$newElem->appendChild($doc->createTextNode("Check all divs "));
$newElemLink = $newElem->appendChild($doc->createElement("a"));
$newElemLink->setAttribute("href", "myurl");
$newElemLink->appendChild($doc->createTextNode("here"));
$elements->item(1)->parentNode->replaceChild($newElem, $elements->item(1));
for($i = $elements->length - 1; $i > 1 ; $i--)
{
$elements->item($i)->parentNode->removeChild($elements->item($i));
}
}
echo $doc->saveXML($doc->getElementsByTagName('div')->item(0));
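Note that saving only the first <div> drops the trailing text and <p> that follow it in $content. If that tail has to survive, one option is to serialize every child of the implied <body> instead. The sketch below assumes a simplified, ASCII-only version of the sample markup and a plain-text replacement div; it is a variation on the answer above, not a correction of it.

```php
<?php
// Simplified stand-in for $content from the question (class names shortened, ASCII only).
$content = '<div class="slidesWrap"><div class="slides"><div class="myclass">1</div>'
         . '<div class="myclass">2</div><div class="myclass">3</div></div>'
         . '<div class="nav">n</div></div>some other text here, <p></p> bla-bla-bla';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

$xp = new DOMXPath($doc);
// Copy the matches into a plain array so the loop below works on a fixed snapshot.
$nodes = iterator_to_array($xp->query('//div[@class="myclass"]'));

if (count($nodes) > 1) {
    // Swap the second match for the notice, then drop every later match.
    $notice = $doc->createElement('div', 'Check all divs here');
    $nodes[1]->parentNode->replaceChild($notice, $nodes[1]);
    foreach (array_slice($nodes, 2) as $extra) {
        $extra->parentNode->removeChild($extra);
    }
}

// loadHTML() wraps the fragment in <html><body>; emit only what is inside <body>.
$body = $doc->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

Iterating over the body's children keeps the stray text and the empty <p> in the output while still leaving out the <html> and <body> wrappers that loadHTML() adds.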
// jQuery (client-side), not PHP: select every .myclass element except the first
var $others = $('.myclass').not(':first');
$others.removeClass('myclass');
$others.addClass('some_other_Class');
If I got you right, you've got a string called $content with all that content in it.
It's probably not the best solution, but here is my attempt (which works fine for me):
if( substr_count($content, '<div class="myclass') > 1 ) {
$parts = explode('<div class="myclass',$content);
echo '<div class="myclass'.$parts[1];
echo '<div>Check all divs here</div>';
}
else {echo $content;}
I want to grab all the content inside the body.
<html>
<head><title>Test</title>
</head>
<body>
<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>
</body>
</html>
The final result that I want is:
<div id="dummy"></div>
<p class="p"></p>
<div id="example"></div>
Not like this:
<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>
$content = '<html>
<head><title>Test</title>
</head>
<body>
<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>
</body>
</html>';
preg_match('/(?:<body[^>]*>)(.*)<\/body>/isU', $content, $matches);
$bodycontent = $matches[1];
echo htmlspecialchars($bodycontent);
preg_match_all('/<[^>]*>/isU', $bodycontent, $matches2);
$tags = implode("",$matches2[0]);
echo htmlspecialchars($tags);
Although this would work:
if (preg_match('%<(body)[^>]*>(.*)<\s*/\1\s*>%s', $subject, $regs)) {
$result = $regs[2];
}
I wouldn't recommend it. PHP has much better tools for this job, for example the Simple HTML DOM parser:
# create and load the HTML
include('simple_html_dom.php');
$html = new simple_html_dom();
$html->load('<html>
<head><title>Test</title></head>
<body>
<div id="dummy">Your contents</div>
<p class="p">Paragraph</p>
<div id="example">My Content</div>
</body>
</html>');
# get an element representing the body
$element = $html->find("body", 0);
Edit:
Since you insist...
$result = preg_replace('%(<(div)[^>]*>).*<\s*/\2\s*>%', '\1</\2>', $subject);
This will remove the contents of a div tag. You can swap the div tag for other tags as well. That said, I really don't know what you are getting at with this, and I don't recommend it.
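For completeness, the same "tags without text" result can be sketched with DOMDocument instead of a regex. This assumes the goal is to keep every element inside <body> and discard only the text nodes; the variable names are illustrative.

```php
<?php
$content = '<html><head><title>Test</title></head><body>'
         . '<div id="dummy">Your contents</div>'
         . '<p class="p">Paragraph</p>'
         . '<div id="example">My Content</div>'
         . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Copy the node list to an array first so removals happen against a fixed snapshot.
foreach (iterator_to_array($xpath->query('//body//text()')) as $textNode) {
    $textNode->parentNode->removeChild($textNode);
}

// Serialize what is left inside <body>, one top-level element per line.
$body = $dom->getElementsByTagName('body')->item(0);
$skeleton = '';
foreach ($body->childNodes as $child) {
    $skeleton .= $dom->saveHTML($child) . "\n";
}
echo $skeleton;
```

Unlike the regex version, this keeps nested elements intact and does not depend on the markup being on a single line.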