I want to get a specific tag from a URL. For example,
if I have this content:
<div id="hey">
<div id="bla"></div>
</div>
<div id="hey">
<div id="bla"></div>
</div>
I want to get all divs with the id "hey" (I think it's done with preg_match_all). How can I do that?
The content inside the tag can change.
I recommend using the DOMDocument class instead of regular expressions (it uses fewer resources and is clearer, IMHO).
$content = '<div id="hey">
<div id="bla"></div>
</div>
<div id="hey">
<div id="bla"></div>
</div>';
$doc = new DOMDocument();
@$doc->loadHTML($content); // @ suppresses warnings for possibly non-standard HTML
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//div[@id='hey']");
/* @var $elements DOMNodeList */
for ($i = 0; $i < $elements->length; $i++) {
    /* @var $curr_element DOMElement */
    $curr_element = $elements->item($i);
    // Here do what you want with the element
    var_dump($curr_element);
}
If you want to get the content from a URL, you can use this line instead to fill the variable $content:
$content = file_get_contents('http://yourserver/urls/page.php');
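As a rough sketch of the same idea, the version below collects libxml warnings with libxml_use_internal_errors() instead of suppressing them with @, and keeps the outer HTML of each match. It assumes the markup from the question; the variable names $matches and $previous are illustrative only. Since an id is supposed to be unique in HTML, the XPath attribute test is used rather than getElementById().

```php
<?php
// Sample markup from the question (two divs sharing the same id).
$content = '<div id="hey"><div id="bla"></div></div>'
         . '<div id="hey"><div id="bla"></div></div>';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
$matches = [];
foreach ($xpath->query("//div[@id='hey']") as $element) {
    // saveHTML($node) serializes just that node, including its children.
    $matches[] = $doc->saveHTML($element);
}

foreach ($matches as $html) {
    echo $html, PHP_EOL;
}
```

A DOMNodeList can be iterated with foreach directly, so the index-based loop is optional.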
I want to replace the content of elements with specific classes in an HTML string.
The HTML contains other content that I don't want to change.
In the code below, I want to change the content of classes one and three only; the content of class two should stay as it is.
I need to do this dynamically.
<div class="one"> I want to change this </div>
<div class="two"> I don't want to change this </div>
<div class="three"> I want to change this </div>
The DOM functions are helpful here; see the PHP manual.
//your html file content
$str = '...<div class="one"> I want to change this </div>
<div class="two"> I don\'t want to change this </div>
<div class="three"> I want to change this </div>... ';
$dom = new DOMDocument();
$dom->loadHtml($str);
$domXpath = new DOMXPath($dom);
//query the nodes matched
$list = $domXpath->query('//div[@class!="two"]');
if ($list->length > 0) {
foreach ($list as $node) {
//change node value
$node->nodeValue = 'Content changed!';
}
}
//get the result
$new_str = $dom->saveHTML();
var_dump($new_str);
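The query above changes every div whose class attribute is anything other than "two", which may also catch divs elsewhere in the page. A minimal sketch of the opposite approach, listing the classes to change explicitly, could look like this. The $targets array is a hypothetical input, and the class names are assumed not to contain quote characters.

```php
<?php
$str = '<div class="one"> I want to change this </div>'
     . '<div class="two"> I don\'t want to change this </div>'
     . '<div class="three"> I want to change this </div>';

// Hypothetical list of classes whose content should be replaced.
$targets = ['one', 'three'];

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($str);
libxml_clear_errors();

// Builds a query such as: //div[@class="one" or @class="three"]
$conditions = array_map(
    function ($class) {
        return '@class="' . $class . '"';
    },
    $targets
);
$query = '//div[' . implode(' or ', $conditions) . ']';

$xpath = new DOMXPath($dom);
foreach ($xpath->query($query) as $node) {
    // Replace the text content of each matched div.
    $node->nodeValue = 'Content changed!';
}

$new_str = $dom->saveHTML();
echo $new_str;
```

Note that saveHTML() without an argument returns the whole document, including the doctype, html and body wrappers that loadHTML() adds.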
Neither of these work:
$html = file_get_html("https://www.example.com/page/");
print($html->find('[data-reactid=10]', 0)->plaintext);
print($html->find('[data-reactid=11]', 0)->plaintext);
where the html looks like this:
<div class="stuff" data-reactid="10">
<span data-reactid="11">Value I want</span>
</div>
what am I doing wrong?
FYI. this does work:
print($html->find('[data-reactid=5]', 0)->plaintext);
where:
<div class="stuff" data-reactid="5">
<!-- react-text: 6 -->
Value I want
<!-- /react-text: -->
</div>
So how do I get the value with the span?
I can get the value with the div.
This works.
$html_str = '
<div class="stuff" data-reactid="10">
<span data-reactid="11">Value I want</span>
</div>
';
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($html_str);
// Get the value
echo $html->find('div[data-reactid=10]', 0)->find('span', 0)->plaintext;
I have an external file with lots of information, e.g.
http://domain.com/thefile.html
Each data item in the file is wrapped in a <div> element:
....
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
<div class="lineData">
<div class="lineLData">xbox one</div>
<div class="lineRData">not awesome</div>
</div>
<div class="lineData">
<div class="lineLData">wii u</div>
<div class="lineRData">mhhhh</div>
</div>
....
Now I want to search the whole file for the keyword "Playstation" and echo the whole enclosing <div>:
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
Is this possible with PHP?
If we assume the resource/URL is $url:
$result = array();
$dom = new DOMDocument;
$dom->loadHTML(file_get_contents($url));
Find all <div>s with the class lineData using DOMXPath:
$xpath = new DOMXPath($dom);
$lineDatas = $xpath->query('//div[contains(@class,"lineData")]');
Add all lineData <div>s containing "playstation" to the $result array:
foreach($lineDatas as $lineData) {
if (strpos(strtolower($lineData->nodeValue), 'playstation') !== false) {
$result[] = $lineData;
}
}
Example of outputting the result:
foreach($result as $lineData) {
echo $dom->saveHTML($lineData);
}
This outputs
<div class="lineData">
<div class="lineLData">Playstation</div>
<div class="lineRData">awesome</div>
</div>
when tested on the example HTML in the question.
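The keyword filter can also be pushed into the XPath expression itself. The sketch below is one way to do that, assuming a lowercase $keyword without double quotes (both assumptions, not part of the original answer). XPath 1.0 has no lower-case() function, so translate() is used for the case-insensitive comparison.

```php
<?php
$html = '<div class="lineData"><div class="lineLData">Playstation</div>'
      . '<div class="lineRData">awesome</div></div>'
      . '<div class="lineData"><div class="lineLData">xbox one</div>'
      . '<div class="lineRData">not awesome</div></div>';

// Assumed to be lowercase and free of double quotes.
$keyword = 'playstation';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// translate() maps A-Z to a-z so the comparison ignores case.
$query = '//div[@class="lineData"][contains('
       . 'translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), '
       . '"' . $keyword . '")]';

$found = [];
foreach ($xpath->query($query) as $lineData) {
    // Serialize each matching lineData div with its children.
    $found[] = $dom->saveHTML($lineData);
}
echo implode(PHP_EOL, $found), PHP_EOL;
```

Whether this is preferable to filtering in PHP with strpos() is mostly a matter of taste; the PHP loop is easier to read, while the XPath version returns only the nodes of interest.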
Use DOMDocument for this purpose.
$dom = new DOMDocument;
$dom->loadHTMLFile("file.html");
Now you can search for the div:
$xpath = new DOMXPath($dom);
$res = $xpath->query("//*[contains(@class, 'lineData')]");
Now $res is a DOMNodeList of the matching divs, each one a DOMElement. A single match can be serialized like this:
$html = $dom->saveHTML($res->item(0));
I have a list of ads in the html code below.
What I need is a PHP loop to get the following elements for each ad:
ad URL (href attribute of <a> tag)
ad image URL (src attribute of <img> tag)
ad title (html content of <div class="title"> tag)
<div class="ads">
<a href="http://path/to/ad/1">
<div class="ad">
<div class="image">
<div class="wrapper">
<img src="http://path/to/ad/1/image.jpg">
</div>
</div>
<div class="detail">
<div class="title">Ad #1</div>
</div>
</div>
</a>
<a href="http://path/to/ad/2">
<div class="ad">
<div class="image">
<div class="wrapper">
<img src="http://path/to/ad/2/image.jpg">
</div>
</div>
<div class="detail">
<div class="title">Ad #2</div>
</div>
</div>
</a>
</div>
I managed to get the ad URL with the PHP code below.
$d = new DOMDocument();
$d->loadHTML($ads); // the variable $ads contains the HTML code above
$xpath = new DOMXPath($d);
$ls_ads = $xpath->query('//a');
foreach ($ls_ads as $ad) {
$ad_url = $ad->getAttribute('href');
print("AD URL : $ad_url");
}
But I didn't manage to get the two other elements (image URL and title). Any ideas?
I managed to get what I need with this code (based on Khue Vu's code) :
$d = new DOMDocument();
$d->loadHTML($ads); // the variable $ads contains the HTML code above
$xpath = new DOMXPath($d);
$ls_ads = $xpath->query('//a');
foreach ($ls_ads as $ad) {
// get ad url
$ad_url = $ad->getAttribute('href');
// set current ad object as new DOMDocument object so we can parse it
$ad_Doc = new DOMDocument();
$cloned = $ad->cloneNode(TRUE);
$ad_Doc->appendChild($ad_Doc->importNode($cloned, True));
$xpath = new DOMXPath($ad_Doc);
// get ad title
$ad_title_tag = $xpath->query("//div[@class='title']");
$ad_title = trim($ad_title_tag->item(0)->nodeValue);
// get ad image
$ad_image_tag = $xpath->query("//img/@src");
$ad_image = $ad_image_tag->item(0)->nodeValue;
}
For the other elements, you just do the same:
foreach ($ls_ads as $ad) {
$ad_url = $ad->getAttribute('href');
print("AD URL : $ad_url");
$ad_Doc = new DOMDocument();
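// a new DOMDocument has no documentElement yet, so append to the document itself;
// pass true to importNode so the children are imported as well
$ad_Doc->appendChild($ad_Doc->importNode($ad, true));
$xpath = new DOMXPath($ad_Doc);
$img_src = $xpath->query("//img/@src");
$title = $xpath->query("//div[@class='title']");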
}
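Copying each ad into a new document is not strictly needed: DOMXPath::query() accepts a context node as its second argument, and a path starting with "." is evaluated relative to that node. A minimal sketch under that approach, using the ad markup from the question (the $results array is an illustrative name):

```php
<?php
$ads = '<div class="ads">'
     . '<a href="http://path/to/ad/1"><div class="ad">'
     . '<div class="image"><div class="wrapper">'
     . '<img src="http://path/to/ad/1/image.jpg"></div></div>'
     . '<div class="detail"><div class="title">Ad #1</div></div>'
     . '</div></a>'
     . '<a href="http://path/to/ad/2"><div class="ad">'
     . '<div class="image"><div class="wrapper">'
     . '<img src="http://path/to/ad/2/image.jpg"></div></div>'
     . '<div class="detail"><div class="title">Ad #2</div></div>'
     . '</div></a>'
     . '</div>';

$d = new DOMDocument();
libxml_use_internal_errors(true);
$d->loadHTML($ads);
libxml_clear_errors();

$xpath = new DOMXPath($d);
$results = [];
foreach ($xpath->query('//a') as $ad) {
    // The leading "." makes each path relative to the current <a> node.
    $title = $xpath->query('.//div[@class="title"]', $ad)->item(0);
    $image = $xpath->query('.//img', $ad)->item(0);
    $results[] = [
        'url'   => $ad->getAttribute('href'),
        'title' => $title ? trim($title->nodeValue) : null,
        'image' => $image ? $image->getAttribute('src') : null,
    ];
}
print_r($results);
```

The null checks guard against ads that lack a title or an image, which the original markup does not rule out.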
I got a page source from a file using PHP, and its output is similar to this:
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
From this I need to get only one particular div, including the div itself and all of its contents, as shown below, when I give 'under' (the class name) as input. Can anybody suggest how to do this using PHP?
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
Try this:
$html = <<<HTML
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="under"]');
$div = $div->item(0);
echo $dom->saveXML($div);
This will output:
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
Function to extract the contents from a specific div id from any webpage
The below function extracts the contents from the specified div and returns it. If no divs with the ID are found, it returns false.
function getHTMLByID($id, $html) {
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$node = $dom->getElementById($id);
if ($node) {
return $dom->saveXML($node);
}
return FALSE;
}
$id is the ID of the <div> whose content you're trying to extract, $html is your HTML markup.
Usage example:
$html = file_get_contents('http://www.mysql.com/');
echo getHTMLByID('tagline', $html);
Output:
The world's most popular open source database
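I'm not sure what you're asking, but this might be it:
preg_match_all('~<div class="under">(.*?)</div>~s', $htmlsource, $output);
$output[1] should now contain the inner content of that div. Note that with nested divs the non-greedy match stops at the first closing </div>, so the result is cut short; a DOM parser is more reliable for nested markup.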
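For comparison, here is a sketch of a class-based counterpart to getHTMLByID that uses the DOM instead of a regular expression. The function name getHTMLByClass is hypothetical, and $class is assumed not to contain quote characters. The class attribute is padded with spaces so that "under" matches only as a whole class name and not, say, "underline".

```php
<?php
// Hypothetical helper: returns the outer HTML of the first element whose
// class attribute contains $class as a whole token, or false if none match.
function getHTMLByClass($class, $html) {
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Surround both sides with spaces to compare complete class tokens.
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " '
           . $class . ' ")]';
    $node = $xpath->query($query)->item(0);

    return $node ? $dom->saveHTML($node) : false;
}

$html = '<div class="basic"><div class="math"><div class="winner">'
      . '<div class="under"><div class="checker"><strong>check</strong></div></div>'
      . '</div></div></div>';

$result = getHTMLByClass('under', $html);
echo $result, PHP_EOL;
```

Unlike the regular expression, this returns the whole div including its nested children.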