Let's say I have this comment block containing HTML:
<html>
<body>
<code class="hidden">
<!--
<div class="a">
<div class="b">
<div class="c">
Link Test 1
</div>
<div class="c">
Link Test 2
</div>
<div class="c">
Link Test 3
</div>
</div>
</div>
-->
</code>
<code>
<!-- test -->
</code>
</body>
</html>
Using DOMXPath in PHP, how do I get the links and text inside that comment?
This is what I have so far:
$dom = new DOMDocument();
$dom->loadHTML("HTML STRING"); # not actually in code
$xpath = new DOMXPath($dom);
$query = '/html/body/code/comment()';
$divs = $dom->getElementsByTagName('div')->item(0);
$entries = $xpath->query($query, $divs);
foreach($entries as $entry) {
# shows entire text block
echo $entry->textContent;
}
How do I navigate to the divs with class "c" and put their links into an array?
EDIT: Please note that there are multiple <code> tags on the page, so I can't just grab the <code> element by tag name.
You can already target the comment containing the links, so follow through on that: load the comment's text into a second DOMDocument and run another query inside it. Example:
$sample_markup = '<html>
<body>
<code class="hidden">
<!--
<div class="a">
<div class="b">
<div class="c">
Link Test 1
</div>
<div class="c">
Link Test 2
</div>
<div class="c">
Link Test 3
</div>
</div>
</div>
-->
</code>
</body>
</html>';
$dom = new DOMDocument();
$dom->loadHTML($sample_markup);
$xpath = new DOMXPath($dom);
$query = '/html/body/code/comment()';
$entries = $xpath->query($query);
foreach ($entries as $key => $comment) {
$value = $comment->nodeValue;
$html_comment = new DOMDocument();
$html_comment->loadHTML($value);
$xpath_sub = new DOMXpath($html_comment);
$links = $xpath_sub->query('//div[@class="c"]/a'); // target the links!
// loop each link, do what you have to do
foreach($links as $link) {
echo $link->getAttribute('href') . '<br/>';
}
}
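A self-contained sketch of the same two-step idea (query the comment node, then parse its text as a second document). It assumes the real markup wraps each label in an <a href> inside div.c (the sample above has lost those anchors) and that only the comment under <code class="hidden"> matters, which also copes with the several <code> tags mentioned in the edit. The function name extractCommentLinks and the example.com URLs are hypothetical.

```php
<?php
// Hypothetical helper: collect [href, text] pairs from links inside div.c
// elements that live in an HTML comment under <code class="hidden">.
function extractCommentLinks(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer parser warnings

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $links = [];
    // Only comments that are direct children of <code class="hidden">.
    foreach ($xpath->query('//code[@class="hidden"]/comment()') as $comment) {
        // The comment's text is plain markup: parse it as its own document.
        $inner = new DOMDocument();
        $inner->loadHTML($comment->nodeValue);
        $innerXpath = new DOMXPath($inner);

        foreach ($innerXpath->query('//div[@class="c"]/a') as $a) {
            $links[] = [
                'href' => $a->getAttribute('href'),
                'text' => trim($a->textContent),
            ];
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $links;
}

// Sample markup with hypothetical URLs restored inside the comment.
$html = '<html><body>
<code class="hidden"><!--
<div class="a"><div class="b">
<div class="c"><a href="https://example.com/1">Link Test 1</a></div>
<div class="c"><a href="https://example.com/2">Link Test 2</a></div>
</div></div>
--></code>
<code><!-- test --></code>
</body></html>';

print_r(extractCommentLinks($html));
```

The second <code> element's comment is skipped because the XPath step requires class="hidden".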
Related
I have an external file with lots of information, e.g.
http://domain.com/thefile.html
Each record in the file is wrapped in a <div> element:
....
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
<div class="lineData">
<div class="lineLData">xbox one</div>
<div class="lineRData">not awesome</div>
</div>
<div class="lineData">
<div class="lineLData">wii u</div>
<div class="lineRData">mhhhh</div>
</div>
....
Now I want to search the whole file for the keyword "Playstation" and echo the whole <div>:
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
Is this possible with PHP?
Assuming the resource/URL is $url:
$result = array();
$dom = new DOMDocument;
$dom->loadHTML(file_get_contents($url));
Find all <div>s with the class lineData using DOMXPath:
$xpath = new DOMXPath($dom);
$lineDatas = $xpath->query('//div[contains(@class,"lineData")]');
Add every lineData <div> whose text contains "playstation" (case-insensitively) to the $result array:
foreach($lineDatas as $lineData) {
if (strpos(strtolower($lineData->nodeValue), 'playstation') !== false) {
$result[] = $lineData;
}
}
Example of outputting the result:
foreach($result as $lineData) {
echo $dom->saveHTML($lineData);
}
This outputs:
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
when tested on the example HTML in the question.
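The keyword test can also be pushed into the XPath expression itself. A minimal sketch, assuming the keyword contains no apostrophe (it is inlined into a single-quoted XPath literal) and that an ASCII-only case fold via translate() is good enough; the function name findLineData is hypothetical. The class test pads @class with spaces so that only the whole token lineData matches.

```php
<?php
// Hypothetical helper: return the outer HTML of every div.lineData whose
// text contains $keyword, compared case-insensitively for ASCII letters.
function findLineData(string $html, string $keyword): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $lower = 'abcdefghijklmnopqrstuvwxyz';
    // Assumes $keyword has no apostrophe, since it is placed inside '...'.
    $needle = strtolower($keyword);
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' lineData ')]"
           . "[contains(translate(., '$upper', '$lower'), '$needle')]";

    $result = [];
    foreach ($xpath->query($query) as $node) {
        $result[] = $dom->saveHTML($node); // outer HTML of the matching div
    }
    return $result;
}

$html = '<div class="lineData"><div class="lineLData">Playstation</div><div class="lineRData">awesome</div></div>
<div class="lineData"><div class="lineLData">xbox one</div><div class="lineRData">not awesome</div></div>';

foreach (findLineData($html, 'playstation') as $match) {
    echo $match, "\n";
}
```

Filtering in XPath avoids walking every lineData node in PHP, at the cost of the quoting restriction noted above.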
Use DOMDocument for this purpose.
$dom = new DOMDocument;
$dom->loadHTMLFile("file.html");
Now you can search for the div:
$xpath = new DOMXPath($dom);
$res = $xpath->query("//*[contains(@class, 'lineData')][contains(., 'Playstation')]");
Now $res is a DOMNodeList of the matching divs. Saving the first match as a string takes one more line:
$html = $dom->saveHTML($res->item(0));
How do I extract the full HTML content inside a div? I tried this code:
$html= '<html>
<body>
<div id="test">
<div id="mydiv1">Hello</div>
<div id="mydiv2">How are you</div>
</div>
</body>
</html>';
$attr = "id";
$value = "test";
$tag_regex = '/<div[^>]*'.$attr.'="'.$value.'">(.*?)<\\/div>/si';
preg_match($tag_regex,$html,$matches);
echo $matches[0];
Running this code gives me:
<div id="test">
<div id="mydiv1">Hello</div>
Expected result:
<div id="test">
<div id="mydiv1">Hello</div>
<div id="mydiv2">How are you</div>
</div>
In my code, the regular expression only matches up to the first occurrence of </div>. How can I get the full code inside <div id="test">?
With DOMDocument:
$dom = new DOMDocument;
$dom->loadHTML($html);
$div = $dom->getElementById('test');
$result = $dom->saveHTML($div);
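saveHTML($div) returns the div together with its own tags (its outer HTML). If only what is inside the div is wanted, one approach is to serialize each child node and concatenate the results. innerHTML below is a hypothetical helper name, not a DOMDocument method.

```php
<?php
// Hypothetical helper: serialize the children of $node, without $node's own tags.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$html = '<html><body><div id="test"><div id="mydiv1">Hello</div><div id="mydiv2">How are you</div></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$div = $dom->getElementById('test');

echo $dom->saveHTML($div), "\n"; // outer HTML: includes <div id="test">
echo innerHTML($div), "\n";      // inner HTML: only the two child divs
```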
I'm trying to implement a search filter in a Laravel project. Because I store all of a page's content together through a WYSIWYG editor, the problem I'm facing now is how to get only the "p" tags from the content of the pages I retrieve from the database. I've already tried a few things, but nothing works. Can anyone point me in the right direction? Here is what I'm trying so far:
<div id="page-content" class="container">
@foreach($searchResults as $searchResult)
<div class="searchResult_div col-xs-12">
<div class="col-xs-9 col-sm-10 search-result-body">
<?php
$sr = HTML::decode($searchResult['body'], 450, "...");
$dom = new DOMDocument();
@$dom->loadHTML($sr);
$sr_text = $dom->getElementsByTagName("p");
echo $sr_text;
?>
</div>
</div>
@endforeach
</div>
Error: Object of class DOMNodeList could not be converted to string (View: /Users/ruirosa/Documents/AptanaStudio3Workspace/Marave4/app/views/search/search.blade.php)
This should work:
<?php
$html = <<<HTML
<!DOCTYPE html>
<html>
<head></head>
<body>
<div id="container">
<p>text content</p>
<section>
<div>
<p>inner text</p>
</div>
</section>
</div>
</body>
</html>
HTML;
$dom = new DomDocument();
@$dom->loadHTML($html); # @ suppresses libxml parser warnings
$para = $dom->getElementsByTagName('p'); #DOMNodeList
if ($para instanceof DOMNodeList) {
foreach ($para as $node) {
# $node is a DOMElement instance
printf ("Node name: %s\n", $node->nodeName);
printf ("Node value: %s\n", $node->nodeValue);
}
}
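Output:
Node name: p
Node value: text content
Node name: p
Node value: inner text
Also, because $sr_text is a DOMNodeList (an iterable, Traversable object), you can't echo it directly; you must iterate over it.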
Read the docs:
The DOMNodeList class
Traversable Interface
Interface NodeList (from W3C WD-DOM-Level-3)
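For the search-result case in the question, where the goal is to show only paragraph text, a minimal sketch that collects each <p>'s text into an array of strings, which can then be joined and echoed. It swaps the @ operator for libxml_use_internal_errors(); paragraphText is a hypothetical name, and the sample body is made up.

```php
<?php
// Hypothetical helper: return the trimmed text of every <p> in $html.
function paragraphText(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // instead of @$dom->loadHTML()
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    // getElementsByTagName() returns a DOMNodeList: iterate it, don't echo it.
    foreach ($dom->getElementsByTagName('p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

$body = '<h2>Title</h2><p>First paragraph.</p><div><p>Second <b>one</b>.</p></div>';
echo implode(' ', paragraphText($body)), "\n";
```

In a Blade view, the joined string could then be echoed in place of $sr_text.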
I got a page source from a file using PHP, and its output is similar to:
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
From this, I need to get only one particular div, the whole element with everything inside it, as shown below, when I give 'under' (the class name) as input. Can anybody suggest how to do this in PHP?
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
Try this:
$html = <<<HTML
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="under"]');
$div = $div->item(0);
echo $dom->saveXML($div);
This will output:
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
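Since the question wants to pass the class name in as input, the query can be wrapped in a function. This sketch (getDivByClass is a hypothetical name) matches the class as a whitespace-separated token, so class="under big" would also match while "underline" would not, and it returns null when nothing is found. It assumes the class name contains no apostrophe.

```php
<?php
// Hypothetical helper: outer HTML of the first <div> carrying $class, or null.
function getDivByClass(string $html, string $class): ?string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad @class with spaces so ' under ' matches whole tokens only.
    // Assumes $class has no apostrophe.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    $nodes = $xpath->query($query);

    return $nodes->length > 0 ? $dom->saveHTML($nodes->item(0)) : null;
}

$html = '<div class="basic"><div class="math"><div class="winner">'
      . '<div class="under"><div class="checker"><strong>check</strong></div></div>'
      . '</div></div></div>';

echo getDivByClass($html, 'under'), "\n";
```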
Function to extract the contents from a specific div id from any webpage
The function below extracts the div with the specified ID, including its own tag, and returns it as a string. If no element with that ID is found, it returns false.
function getHTMLByID($id, $html) {
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$node = $dom->getElementById($id);
if ($node) {
return $dom->saveXML($node);
}
return FALSE;
}
$id is the ID of the <div> whose content you're trying to extract, $html is your HTML markup.
Usage example:
$html = file_get_contents('http://www.mysql.com/');
echo getHTMLByID('tagline', $html);
Output:
The world's most popular open source database
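One caveat about getHTMLByID() as written: libxml_use_internal_errors(true) switches libxml's error mode for the rest of the script, and the buffered warnings are never cleared. A variant that restores the previous setting (getHTMLByIDScoped is a hypothetical name), exercised on an inline string with a made-up <p> element so it runs without network access:

```php
<?php
// Variant of getHTMLByID() that leaves libxml's error mode as it found it.
function getHTMLByIDScoped(string $id, string $html)
{
    $previous = libxml_use_internal_errors(true); // returns the old setting
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();                         // drop buffered parser warnings
    libxml_use_internal_errors($previous);         // restore the caller's setting

    $node = $dom->getElementById($id);
    return $node ? $dom->saveXML($node) : false;
}

$html = '<html><body><p id="tagline">The world\'s most popular open source database</p></body></html>';
var_dump(getHTMLByIDScoped('tagline', $html));
var_dump(getHTMLByIDScoped('nope', $html));
```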
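I'm not sure what you're asking, but this might be it:
preg_match_all('/<div class="under">(.*?)<\/div>/s', $htmlsource, $output);
$output[1] should now contain the content between each <div class="under"> and the first </div> after it. With nested divs, as in this sample, the match stops at the inner </div>, so it will not capture the whole block.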
I have a $content string containing:
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div class="myclass">
...
</div>
...
...
<div class="myclass">
...
</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
I would like to use PHP to remove all the divs with class="myclass" except the first one, and add another div in place of the others, so that the result is:
<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
...
</div>
<div>Check all divs here</div>
</div>
<div class="nav">
...
</div>
</div>
some other text here, <p></p> bla-bla-bla
I would be grateful if someone could point me to a solution.
UPDATE 2:
There is a similar question here.
From that, I came up with the following test code:
$content = '<div class="slidesWrap">
<div class="slidesСontainer">
<div class="myclass">
</div>
<div class="myclass">
</div>
<div class="myclass">
</div>
</div>
<div class="nav">
</div>
</div>
some other text here, <p></p> bla-bla-bla';
$dom = new DOMDocument();
$dom->loadHtml($content);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//*[@class="myclass" and position()>1]') as $liNode) {
$liNode->parentNode->removeChild($liNode);
}
echo $dom->saveXml($dom->documentElement);
Any ideas where I can test it?
Here is what you are looking for (similar to your update, but it leaves out the <html> and <body> tags that loadHTML() adds):
$doc = new DOMDocument();
$doc->loadHTML($content);
$xp = new DOMXpath($doc);
$elements = $xp->query("//div[@class='myclass']");
if($elements->length > 1)
{
$newElem = $doc->createElement("div");
$newElem->appendChild($doc->createTextNode("Check all divs "));
$newElemLink = $newElem->appendChild($doc->createElement("a"));
$newElemLink->setAttribute("href", "myurl");
$newElemLink->appendChild($doc->createTextNode("here"));
$elements->item(1)->parentNode->replaceChild($newElem, $elements->item(1));
for($i = $elements->length - 1; $i > 1 ; $i--)
{
$elements->item($i)->parentNode->removeChild($elements->item($i));
}
}
echo $doc->saveXML($doc->getElementsByTagName('div')->item(0));
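A sketch of the same replacement that also keeps the text after the slider and avoids the <html><body> wrapper, by wrapping $content in a temporary container and serializing only that container's children. The wrapper id tmp-root and the function name collapseSlides are hypothetical. The query uses (//div[@class='myclass'])[position() > 1]: the parentheses make position() count matches across the whole document rather than among each parent's children.

```php
<?php
// Replace every div.myclass after the first with a single placeholder div.
function collapseSlides(string $content): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Hypothetical wrapper so the fragment (divs plus trailing text) has one root.
    $dom->loadHTML('<div id="tmp-root">' . $content . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $extra = $xpath->query("(//div[@class='myclass'])[position() > 1]");

    if ($extra->length > 0) {
        $placeholder = $dom->createElement('div', 'Check all divs here');
        // Put the placeholder where the second slide was, then drop the rest.
        $extra->item(0)->parentNode->replaceChild($placeholder, $extra->item(0));
        for ($i = 1; $i < $extra->length; $i++) {
            $extra->item($i)->parentNode->removeChild($extra->item($i));
        }
    }

    // Serialize the wrapper's children only, i.e. the original fragment.
    $out = '';
    foreach ($dom->getElementById('tmp-root')->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$content = '<div class="slidesWrap"><div class="slidesContainer">'
         . '<div class="myclass">one</div><div class="myclass">two</div><div class="myclass">three</div>'
         . '</div><div class="nav">nav</div></div>'
         . 'some other text here, <p></p> bla-bla-bla';

echo collapseSlides($content), "\n";
```

The XPath result list is a snapshot, so removing nodes inside the loop does not shift the remaining items.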
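If this can be done client-side with jQuery instead of PHP:
var $extra = $('.myclass').slice(1); // every .myclass element except the first
$extra.removeClass('myclass');
$extra.addClass('some_other_Class');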
If I understand you correctly, you have a string called $content with all that markup in it.
It's probably not the best solution, but here is my attempt (which works fine for me):
if( substr_count($content, '<div class="myclass') > 1 ) {
$parts = explode('<div class="myclass',$content);
echo '<div class="myclass'.$parts[1];
echo '<div>Check all divs here</div>';
}
else {echo $content;}