PHP - search for a value in a file and echo the whole <div>

I have an external file with lots of information, e.g.
http://domain.com/thefile.html
Each data entry in the file is wrapped in a <div> element:
....
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
<div class="lineData">
<div class="lineLData">xbox one</div>
<div class="lineRData">not awesome</div>
</div>
<div class="lineData">
<div class="lineLData">wii u</div>
<div class="lineRData">mhhhh</div>
</div>
....
Now I want to search the whole file for the keyword "Playstation" and echo the whole <div>:
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
Is this possible with PHP?

Assuming the resource/URL is in $url:
$result = array();
$dom = new DOMDocument;
$dom->loadHTML(file_get_contents($url));
Find all <div>s with the class lineData using DOMXPath:
$xpath = new DomXPath($dom);
$lineDatas = $xpath->query('//div[contains(@class,"lineData")]');
Add every lineData <div> containing "playstation" (case-insensitive match) to the $result array:
foreach($lineDatas as $lineData) {
if (strpos(strtolower($lineData->nodeValue), 'playstation') !== false) {
$result[] = $lineData;
}
}
Example of outputting the result:
foreach($result as $lineData) {
echo $dom->saveHTML($lineData);
}
This outputs
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
when tested on the example HTML in the question.
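The same approach can be packaged as a reusable function. This is a minimal sketch, assuming the markup is already available as a string (for example from file_get_contents($url)); `findLineDataByKeyword()` is a hypothetical name, the keyword match is case-insensitive via `stripos()`, and the class is matched as a whole token so `lineLData` and `lineRData` are never picked up.

```php
<?php
// Minimal sketch: return the outer HTML of every "lineData" <div> whose text
// contains $keyword (case-insensitive). findLineDataByKeyword() is a
// hypothetical helper name, not part of PHP's DOM extension.
function findLineDataByKeyword(string $html, string $keyword): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match "lineData" as a whole class token, so "lineLData" never matches.
    $nodes = $xpath->query(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' lineData ')]"
    );

    $matches = [];
    foreach ($nodes as $node) {
        // textContent covers both the lineLData and lineRData children.
        if (stripos($node->textContent, $keyword) !== false) {
            $matches[] = $dom->saveHTML($node);
        }
    }
    return $matches;
}

// Sample markup from the question.
$html = <<<HTML
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
<div class="lineData">
<div class="lineLData">xbox one</div>
<div class="lineRData">not awesome</div>
</div>
<div class="lineData">
<div class="lineLData">wii u</div>
<div class="lineRData">mhhhh</div>
</div>
HTML;

foreach (findLineDataByKeyword($html, 'playstation') as $fragment) {
    echo $fragment, "\n";
}
```

Because the whole row text is searched, a keyword such as "awesome" also matches the "not awesome" row; compare only the lineLData child if that distinction matters.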

Use DOMDocument for this purpose.
$dom = new DOMDocument;
$dom->loadHTMLFile("file.html");
Now you can search for the lineData div that contains the keyword:
$xpath = new DOMXPath($dom);
$res = $xpath->query("//*[contains(@class, 'lineData')][contains(., 'Playstation')]");
$res is a DOMNodeList, not a DOMElement, so take the first match with item(0) and save it as HTML:
$html = $dom->saveHTML($res->item(0));
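If you would rather let XPath do all of the filtering, a predicate on the child <div> can select the matching row directly. A sketch, assuming the keyword should match the lineLData text exactly; the temporary file only stands in for file.html so the snippet runs on its own.

```php
<?php
// Sketch: load the file, then select the matching row with a single XPath query.
// A temporary file stands in for "file.html" so the snippet is self-contained.
$path = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($path,
    '<div class="lineData"><div class="lineLData">Playstation</div>'
    . '<div class="lineRData">awesome</div></div>'
    . '<div class="lineData"><div class="lineLData">wii u</div>'
    . '<div class="lineRData">mhhhh</div></div>'
);

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTMLFile($path);
unlink($path);

$xpath = new DOMXPath($dom);
// Keep only the lineData <div> whose lineLData child reads exactly "Playstation".
$res = $xpath->query(
    "//div[@class='lineData'][div[@class='lineLData' and normalize-space(.)='Playstation']]"
);

// item(0) is null when nothing matched, so check the count first.
$html = $res->length > 0 ? $dom->saveHTML($res->item(0)) : '';
echo $html, "\n";
```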

Related

Regular expression to extract full content inside a div

How do I extract the full HTML content inside a div? I tried this code:
$html= '<html>
<body>
<div id="test">
<div id="mydiv1">Hello</div>
<div id="mydiv2">How are you</div>
</div>
</body>
</html>';
$attr = "id";
$value = "test";
$tag_regex = '/<div[^>]*'.$attr.'="'.$value.'">(.*?)<\\/div>/si';
preg_match($tag_regex,$html,$matches);
echo $matches[0];
Running this code, I get the result:
<div id="test">
<div id="mydiv1">Hello</div>
Expected result:
<div id="test">
<div id="mydiv1">Hello</div>
<div id="mydiv2">How are you</div>
</div>
In my code, the regular expression matches only up to the first occurrence of </div>. How can I get the full code inside <div id="test">?
With DOMDocument:
$dom = new DOMDocument;
$dom->loadHTML($html);
$div = $dom->getElementById('test');
$result = $dom->saveHTML($div);
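saveHTML($div) returns the element together with its own tags, which is what the expected result shows. If you only want what is inside the div, the classic DOMDocument API has no innerHTML property, but you can concatenate the serialized children. A sketch; `innerHTML()` is a hypothetical helper name:

```php
<?php
// Sketch: outer HTML (element plus its own tags) vs. inner HTML (children only).
// innerHTML() is a hypothetical helper, not a DOMDocument method.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$html = '<html><body><div id="test">'
    . '<div id="mydiv1">Hello</div><div id="mydiv2">How are you</div>'
    . '</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$div = $dom->getElementById('test');

$outer = $dom->saveHTML($div); // <div id="test">...</div>
$inner = innerHTML($div);      // just the two child <div>s
echo $outer, "\n", $inner, "\n";
```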

DOMXPath / DOMDocument - Getting divs within a comment block

Let's say I have this comment block containing HTML:
<html>
<body>
<code class="hidden">
<!--
<div class="a">
<div class="b">
<div class="c">
Link Test 1
</div>
<div class="c">
Link Test 2
</div>
<div class="c">
Link Test 3
</div>
</div>
</div>
-->
</code>
<code>
<!-- test -->
</code>
</body>
</html>
Using DOMXPath in PHP, how do I get the links and text within the tag?
This is what I have so far:
$dom = new DOMDocument();
$dom->loadHTML("HTML STRING"); # not actually in code
$xpath = new DOMXPath($dom);
$query = '/html/body/code/comment()';
$divs = $dom->getElementsByTagName('div')->item(0);
$entries = $xpath->query($query, $divs);
foreach($entries as $entry) {
# shows entire text block
echo $entry->textContent;
}
How do I navigate so that I can get the "c" classes and then put the links into an array?
EDIT: Please note that there are multiple <code> tags within the page, so I can't just get an element with the code attribute.
You can already target the comment containing the links; just follow through on that and run another query inside it. Example:
$sample_markup = '<html>
<body>
<code class="hidden">
<!--
<div class="a">
<div class="b">
<div class="c">
Link Test 1
</div>
<div class="c">
Link Test 2
</div>
<div class="c">
Link Test 3
</div>
</div>
</div>
-->
</code>
</body>
</html>';
$dom = new DOMDocument();
$dom->loadHTML($sample_markup);
$xpath = new DOMXPath($dom);
$query = '/html/body/code/comment()';
$entries = $xpath->query($query);
foreach ($entries as $key => $comment) {
$value = $comment->nodeValue;
$html_comment = new DOMDocument();
$html_comment->loadHTML($value);
$xpath_sub = new DOMXpath($html_comment);
$links = $xpath_sub->query('//div[@class="c"]/a'); // target the links!
// loop each link, do what you have to do
foreach($links as $link) {
echo $link->getAttribute('href') . '<br/>';
}
}
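The sample markup has text but no <a> elements inside the "c" divs, so the loop above prints nothing for it. A sketch that collects each "c" entry into an array, recording its text and, if present, the href of the first link inside it; the `text`/`href` array keys are illustrative choices, not anything the question specifies:

```php
<?php
// Sketch: collect the <div class="c"> entries that live inside HTML comments.
$markup = '<html><body>
<code class="hidden"><!--
<div class="a"><div class="b">
<div class="c">Link Test 1</div>
<div class="c">Link Test 2</div>
<div class="c">Link Test 3</div>
</div></div>
--></code>
<code><!-- test --></code>
</body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$items = [];
foreach ($xpath->query('//code/comment()') as $comment) {
    $body = trim($comment->nodeValue);
    if ($body === '') {
        continue; // loadHTML() rejects an empty string
    }
    // To the outer document the comment is just text, so parse it on its own.
    $sub = new DOMDocument();
    $sub->loadHTML($body);
    $subXpath = new DOMXPath($sub);
    foreach ($subXpath->query('//div[@class="c"]') as $div) {
        $link = $subXpath->query('.//a', $div)->item(0);
        $items[] = [
            'text' => trim($div->textContent),
            // The sample has no <a> elements, so href is null here.
            'href' => $link instanceof DOMElement ? $link->getAttribute('href') : null,
        ];
    }
}
print_r($items);
```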

Setting a nodeValue of a DOMelement : getElementbyId returns null

When running this PHP script:
$doc = new DOMDocument();
$doc->loadHTMLFile("../counter.html");
$ele2 = $doc->getElementById ( "coupon_id" );
if($ele2){
$ele2->nodeValue = $result["coupon_code"];
}
$response["list"]= $doc->saveHTML();
$ele2 turns out to be null, so the code never enters the if block. Here is my counter.html file:
<div class="panel panel-success">
<div class="panel-heading">
<h3 id="coupon" class="panel-title">Coupon name 1</h3>
</div>
<p id="coupon_id" hidden>coupon id</p>
<div id="counter-up" class="panel-body">
0
</div>
</div>
I have already made sure the HTML file loads successfully.
Your $doc->getElementById() call is returning null; you would have to find out why. Using XPath you can achieve this:
<?php
$xml = '<div class="panel panel-success">
<div class="panel-heading">
<h3 id="coupon" class="panel-title">Coupon name 1</h3>
</div>
<p id="coupon_id">coupon id</p>
<div id="counter-up" class="panel-body">
0
</div>
</div>';
//create dom object
$doc = new DOMDocument();
// load the HTML string
$doc->loadHTML($xml);
$xpath = new DOMXPath($doc);
$result = $xpath->query("//*[@id='coupon_id']")->item(0);
$result->nodeValue = 'hello world';
echo $doc->saveHTML();
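The assignment in the XPath route above fails if nothing matches, because item(0) then returns null. A sketch that checks for that first; `setTextById()` is a hypothetical helper, it assumes the id contains no single quote (the value is inlined into the XPath expression), and it sets textContent, which replaces the element's children with a single text node. The sample keeps the bare `hidden` attribute from counter.html to show that the XPath lookup still finds the element.

```php
<?php
// Sketch: find an element by id with XPath and only touch it if it exists.
// setTextById() is a hypothetical helper; $id must not contain a single quote,
// because it is inlined into the XPath expression.
function setTextById(DOMDocument $doc, string $id, string $text): bool
{
    $xpath = new DOMXPath($doc);
    $node = $xpath->query("//*[@id='" . $id . "']")->item(0);
    if ($node === null) {
        return false; // nothing matched: report it instead of failing
    }
    // textContent replaces the children with a single text node.
    $node->textContent = $text;
    return true;
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML('<div class="panel panel-success"><p id="coupon_id" hidden>coupon id</p></div>');

var_dump(setTextById($doc, 'coupon_id', 'hello world'));
echo $doc->saveHTML();
```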
Two alternative solutions are listed below; they may be useful for you.
<?php
$xml = '<div class="panel panel-success">
<div class="panel-heading">
<h3 id="coupon" class="panel-title">Coupon name 1</h3>
</div>
<p id="coupon_id">coupon id</p>
<div id="counter-up" class="panel-body">
0
</div>
</div>';
//create dom object
$doc = new DOMDocument();
// load the HTML string
$doc->loadHTML($xml);
// get all <p> elements
$ele2 = $doc->getElementsByTagName("p");
// process each element
foreach($ele2 as $obj)
{
// change the node value
$obj->nodeValue = 'hello';
}
//display the html
echo $doc->saveHTML();
?>
Using SimpleXML you can achieve this as below:
<?php
// create an object from the $xml string (the same markup as in the examples above)
$simplxml = simplexml_load_string($xml);
// overwrite the value of the first <p> child
$simplxml->p = 'hello world';
// display the XML
echo $simplxml->asXML();
// inspect the object's details with a debug dump
print_r($simplxml);
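SimpleXML needs well-formed XML, so the bare `hidden` attribute in the original counter.html would make simplexml_load_string() fail; the answer's sample omits it. A sketch that targets the <p> by id instead of by position, using the self-reference idiom `$node[0] = ...` to set the text of an element returned by xpath():

```php
<?php
// Sketch: SimpleXML with an XPath lookup by id.
$xml = '<div class="panel panel-success">
<div class="panel-heading"><h3 id="coupon" class="panel-title">Coupon name 1</h3></div>
<p id="coupon_id">coupon id</p>
<div id="counter-up" class="panel-body">0</div>
</div>';

$simplexml = simplexml_load_string($xml);
if ($simplexml === false) {
    throw new RuntimeException('The markup is not well-formed XML');
}

$matches = $simplexml->xpath('//p[@id="coupon_id"]');
if (!empty($matches)) {
    // Offset [0] refers to the matched element itself, so this sets its text.
    $matches[0][0] = 'hello world';
}
echo $simplexml->asXML();
```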

Retrieve elements with xpath and DOMDocument

I have a list of ads in the html code below.
What I need is a PHP loop to get the following elements for each ad:
ad URL (href attribute of <a> tag)
ad image URL (src attribute of <img> tag)
ad title (html content of <div class="title"> tag)
<div class="ads">
<a href="http://path/to/ad/1">
<div class="ad">
<div class="image">
<div class="wrapper">
<img src="http://path/to/ad/1/image.jpg">
</div>
</div>
<div class="detail">
<div class="title">Ad #1</div>
</div>
</div>
</a>
<a href="http://path/to/ad/2">
<div class="ad">
<div class="image">
<div class="wrapper">
<img src="http://path/to/ad/2/image.jpg">
</div>
</div>
<div class="detail">
<div class="title">Ad #2</div>
</div>
</div>
</a>
</div>
I managed to get the ad URL with the PHP code below.
$d = new DOMDocument();
$d->loadHTML($ads); // the variable $ads contains the HTML code above
$xpath = new DOMXPath($d);
$ls_ads = $xpath->query('//a');
foreach ($ls_ads as $ad) {
$ad_url = $ad->getAttribute('href');
print("AD URL : $ad_url");
}
But I didn't manage to get the two other elements (image URL and title). Any idea?
I managed to get what I need with this code (based on Khue Vu's answer):
$d = new DOMDocument();
$d->loadHTML($ads); // the variable $ads contains the HTML code above
$xpath = new DOMXPath($d);
$ls_ads = $xpath->query('//a');
foreach ($ls_ads as $ad) {
// get ad url
$ad_url = $ad->getAttribute('href');
// copy the current ad into a new DOMDocument so it can be queried on its own
$ad_Doc = new DOMDocument();
$cloned = $ad->cloneNode(TRUE);
$ad_Doc->appendChild($ad_Doc->importNode($cloned, True));
$xpath = new DOMXPath($ad_Doc);
// get ad title
$ad_title_tag = $xpath->query("//div[@class='title']");
$ad_title = trim($ad_title_tag->item(0)->nodeValue);
// get ad image
$ad_image_tag = $xpath->query("//img/@src");
$ad_image = $ad_image_tag->item(0)->nodeValue;
}
For the other elements, you just do the same:
foreach ($ls_ads as $ad) {
$ad_url = $ad->getAttribute('href');
print("AD URL : $ad_url");
$ad_Doc = new DOMDocument();
// a fresh DOMDocument has no documentElement, so append a deep copy (true) of the <a> node directly
$ad_Doc->appendChild($ad_Doc->importNode($ad, true));
$xpath = new DOMXPath($ad_Doc);
$img_src = $xpath->query("//img[@src]");
$title = $xpath->query("//div[@class='title']");
}
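Instead of copying every ad into its own DOMDocument, you can pass the <a> node as the context node to DOMXPath::query() and use relative paths that start with `.`; an expression that starts with `//` searches the whole document even when a context node is given. A sketch using the sample markup (condensed); the url/image/title array keys are illustrative:

```php
<?php
// Sketch: query each ad relative to its <a> node (the context node).
$ads = '<div class="ads">
<a href="http://path/to/ad/1">
<div class="ad">
<div class="image"><div class="wrapper"><img src="http://path/to/ad/1/image.jpg"></div></div>
<div class="detail"><div class="title">Ad #1</div></div>
</div>
</a>
<a href="http://path/to/ad/2">
<div class="ad">
<div class="image"><div class="wrapper"><img src="http://path/to/ad/2/image.jpg"></div></div>
<div class="detail"><div class="title">Ad #2</div></div>
</div>
</a>
</div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$d = new DOMDocument();
$d->loadHTML($ads);
$xpath = new DOMXPath($d);

$result = [];
foreach ($xpath->query('//a') as $ad) {
    // ".//" searches only below the current <a>; "//" would search the whole document.
    $img = $xpath->query('.//img/@src', $ad)->item(0);
    $title = $xpath->query('.//div[@class="title"]', $ad)->item(0);
    $result[] = [
        'url'   => $ad->getAttribute('href'),
        'image' => $img !== null ? $img->nodeValue : null,
        'title' => $title !== null ? trim($title->textContent) : null,
    ];
}
print_r($result);
```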

get complete 'div' content using class name or id using php

I got a page source from a file using PHP, and its output is similar to:
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
From this I need to get only a particular div, the whole div with its contents, as below, when I give 'under' (the class name) as input. Can anybody suggest how to do this in PHP?
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
Try this:
$html = <<<HTML
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="under"]');
$div = $div->item(0);
echo $dom->saveXML($div);
This will output:
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
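This can be generalized into a class-name counterpart of the getElementById() approach. A sketch; `getHTMLByClass()` is a hypothetical helper that returns the outer HTML of the first element whose class list contains the given name as a whole word (or false), and it assumes the class name contains no quote characters, since the value is inlined into the XPath expression:

```php
<?php
// Sketch: class-name counterpart to getElementById().
// getHTMLByClass() is a hypothetical helper; $class must not contain quotes.
function getHTMLByClass(string $class, string $html)
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class list with spaces so "under" does not also match e.g. "underline".
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );
    $node = $xpath->query($query)->item(0);
    return $node !== null ? $dom->saveHTML($node) : false;
}

$html = '<div class="basic"><div class="math"><div class="winner">'
    . '<div class="under"><div class="checker"><strong>check</strong></div></div>'
    . '</div></div></div>';

echo getHTMLByClass('under', $html), "\n";
```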
Function to extract the contents of a div with a specific id from any web page
The function below extracts the specified div (including its own tag) and returns it as markup. If no element with that ID is found, it returns false.
function getHTMLByID($id, $html) {
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$node = $dom->getElementById($id);
if ($node) {
return $dom->saveXML($node);
}
return FALSE;
}
$id is the ID of the <div> whose content you're trying to extract, $html is your HTML markup.
Usage example:
$html = file_get_contents('http://www.mysql.com/');
echo getHTMLByID('tagline', $html);
Output:
The world's most popular open source database
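I'm not sure what you're asking, but this might be it:
preg_match_all('~<div class="under">(.*?)</div>~s', $htmlsource, $output);
$output[1] should now contain the inner content of each matching div. Note that the lazy (.*?) stops at the first </div>, so with nested divs like the ones above it returns only part of the content.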
