Zend_Dom_Query multiple selects - php

I'm trying to use Zend_Dom_Query to get some specific content from a webpage.
I got a query working to get the content from one DOM element. Now I want to select a second DOM element and get its content as well.
This is the HTML:
<div class="blocks">
<div class="w2">
<h2>Some title</h2>
<p>some text</p>
<p>more text</p>
<p class="more-info">link</p>
</div>
<div class="w2">
<h2>Some title</h2>
<p>some text</p>
<p>more text</p>
<p class="more-info">link</p>
</div>
</div>
My code so far:
$client = new Zend_Http_Client();
$client->setUri('http://awsomewebsite');
$result = $client->request('GET');
$response = $result->getBody();
$dom = new Zend_Dom_Query($response);
foreach ($dom->query('div.w2') as $content) {
echo $content->getElementsByTagName('h2')->item(0)->nodeValue; // this gives me the h2 value
echo $content->getElementsByTagName('a')->item(0)->getAttribute('href');
}
Now the problem is that this solution doesn't work when there is more than one anchor link. My question is: what is the correct way to select multiple elements? Or can I use a new query inside this foreach to select the right element?

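foreach ($dom->query('div.w2') as $content) {
echo $content->getElementsByTagName('h2')->item(0)->nodeValue; // this gives me the h2 value
// getElementsByTagName() returns every matching descendant of this block,
// so loop over all anchors instead of indexing with a counter shared across blocks
foreach ($content->getElementsByTagName('a') as $anchor) {
echo $anchor->getAttribute('href');
}
}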
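On the second part of the question (a new query inside the foreach): the loop above already calls getElementsByTagName() on each result, so the results are plain DOM elements, and a query scoped to one element returns every match in that block. The sketch below swaps in PHP's built-in DOMDocument and DOMXPath so it runs without Zend Framework; the function name extractBlocks and the sample links are hypothetical, because the anchors in the question's HTML did not survive.

```php
<?php
// Collect each div.w2 block's title and *all* of its link hrefs.
function extractBlocks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $blocks = [];
    // Outer query: every <div class="w2"> (exact class match, for simplicity).
    foreach ($xpath->query('//div[@class="w2"]') as $div) {
        // Inner queries are relative to $div because it is passed as the context node.
        $h2 = $xpath->query('.//h2', $div)->item(0);
        $links = [];
        foreach ($xpath->query('.//a', $div) as $a) {
            $links[] = $a->getAttribute('href');
        }
        $blocks[] = [
            'title' => $h2 !== null ? trim($h2->textContent) : null,
            'links' => $links,
        ];
    }
    return $blocks;
}

// Hypothetical sample: the first block has two links, the second has one.
$html = '<div class="blocks">'
    . '<div class="w2"><h2>First</h2><p class="more-info"><a href="/a">a</a> <a href="/b">b</a></p></div>'
    . '<div class="w2"><h2>Second</h2><p class="more-info"><a href="/c">c</a></p></div>'
    . '</div>';

foreach (extractBlocks($html) as $block) {
    echo $block['title'], ': ', implode(', ', $block['links']), "\n";
}
```

Passing $div as the second argument to DOMXPath::query() is what keeps .//a from matching anchors in the other blocks.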

Related

How to use Simple HTML DOM PHP to get span data-reactid value?

Neither of these work:
$html = file_get_html("https://www.example.com/page/");
print($html->find('[data-reactid=10]', 0)->plaintext);
print($html->find('[data-reactid=11]', 0)->plaintext);
where the html looks like this:
<div class="stuff" data-reactid="10">
<span data-reactid="11">Value I want</span>
</div>
What am I doing wrong?
FYI, this does work:
print($html->find('[data-reactid=5]', 0)->plaintext);
where:
<div class="stuff" data-reactid="5">
<!-- react-text: 6 -->
Value I want
<!-- /react-text: -->
</div>
So how do I get the value with the span?
I can get the value with the div.
This works.
include_once('simple_html_dom.php');
$html_str = '
<div class="stuff" data-reactid="10">
<span data-reactid="11">Value I want</span>
</div>
';
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($html_str);
// Get the span's data-reactid attribute ("11"); use ->plaintext instead of ->{'data-reactid'} to get "Value I want"
echo $html->find('div[data-reactid=10]', 0)->find('span', 0)->{'data-reactid'};
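For comparison, a minimal sketch of the same lookups with PHP's built-in DOM extension (swapped in for Simple HTML DOM, not a fix to it). It reads both the span's text and its data-reactid attribute, since those are two different values the question could be after.

```php
<?php
$html = '<div class="stuff" data-reactid="10">'
      . '<span data-reactid="11">Value I want</span>'
      . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // silence warnings from imperfect markup
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// The span's text, located directly by its data-reactid attribute.
$span = $xpath->query('//span[@data-reactid="11"]')->item(0);
$text = $span !== null ? $span->textContent : null;

// The span's attribute value, reached through its parent div.
$child = $xpath->query('//div[@data-reactid="10"]/span')->item(0);
$attr = $child !== null ? $child->getAttribute('data-reactid') : null;

echo $text, "\n", $attr, "\n";
```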

How to replace custom html tag with html code in php

This is my scenario: in a custom CMS developed in PHP, I need to parse an HTML string, searching for some custom tags, and replace them with some HTML code. Here is an example to clarify:
<h2>Some Title</h2>
<p>Some text</p>
[[prod_id=123]] [[prod_id=165]] // custom tag
<p>More text</p>
I need to find the custom tags and replace them with the item's template, resulting in:
<h2>Some Title</h2>
<p>Some text</p>
<!--Start Product123-->
<p>Title Product 123</p>
<!--End Product123-->
<!--Start Product165-->
<p>Title Product 165</p>
<!--End Product165-->
<p>More text</p>
That alone would be very helpful, but I also need to do something else: detect blocks of adjacent tags and add some code before and after each block, only once per block. In this example, the final code needed would be something like:
<h2>Some Title</h2>
<p>Some text</p>
<div><!-- Here the start of the block -->
<!--Start Product123-->
<p>Title Product 123</p>
<!--End Product123-->
<!--Start Product165-->
<p>Title Product 165</p>
<!--End Product165-->
</div><!-- Here the end of the block -->
<p>More text</p>
The perfect solution for me would be a function with the original HTML code as argument, and returning the final html code. Any help is appreciated.
I would advise against using regex on HTML; it can cause a lot of problems. Instead, store the text/content of your articles separately and process only that.
But for the sake of completeness, you can use something like this:
$html = preg_replace_callback("/\[\[prod_id=(\d+)\]\]/",
function($matches)
{
$prod_id = $matches[1];
return '<p>Title Product ' . $prod_id . '</p>';
},
$html); // where $html is the html you want to process
If you don't have the HTML in a variable, you can capture it with ob_start() and ob_get_clean():
ob_start();
?>
<h2>Some Title</h2>
<p>Some text</p>
[[prod_id=123]] [[prod_id=165]] // custom tag
<p>More text</p>
<?php
$html = ob_get_clean();
// run the preg_replace_callback() call on $html here
I haven't tested this; I wrote it off the top of my head, so there might be some typos!
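The answer above replaces each tag on its own but does not add the once-per-block wrapper the question asks for. A hedged sketch of the requested function: one regex match covers a whole run of adjacent tags, so each run gets exactly one <div>. renderProduct() and replaceProductTags() are hypothetical names, renderProduct() stands in for the CMS's real template, and treating tags separated only by whitespace (including newlines) as one block is an assumption.

```php
<?php
// Hypothetical stand-in for the CMS's real product template.
function renderProduct(int $id): string
{
    return "<!--Start Product{$id}-->\n"
         . "<p>Title Product {$id}</p>\n"
         . "<!--End Product{$id}-->\n";
}

// Replace every run of adjacent [[prod_id=N]] tags with the rendered
// products, wrapped in a single <div> per run (block).
function replaceProductTags(string $html): string
{
    // One match = one block: a tag followed by any number of
    // whitespace-separated tags.
    $blockPattern = '/\[\[prod_id=\d+\]\](?:\s*\[\[prod_id=\d+\]\])*/';

    return preg_replace_callback($blockPattern, function (array $m): string {
        // Pull the individual IDs out of this block, in order.
        preg_match_all('/\[\[prod_id=(\d+)\]\]/', $m[0], $ids);
        $inner = '';
        foreach ($ids[1] as $id) {
            $inner .= renderProduct((int) $id);
        }
        return "<div>\n" . $inner . "</div>";
    }, $html);
}

$input = "<h2>Some Title</h2>\n<p>Some text</p>\n[[prod_id=123]] [[prod_id=165]]\n<p>More text</p>\n";
echo replaceProductTags($input);
```

Because the block pattern is matched first and the individual IDs are extracted inside the callback, the wrapper is emitted once per block rather than once per tag.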

PHP Simple HTML DOM: select inner text only, not child inner text

I'm using PHP simple HTML DOM class to parse html.
I want to select the inner text of the div with id="content", but when I call $selector->plaintext, it also returns the text of the sub-divs.
Sample HTML
<div id="content">
Hello World.
<div id="sub-content1">
Text I don't want to select.
</div>
<div id="sub-content2">
Text I don't want to select
</div>
</div>
Sample code
//suppose $html contains above html
$selector = $html->find("div#content", 0);
echo $selector->plaintext;
//it outputs "Hello World. Text I don't want to select. Text I don't want to select"
But I want only "Hello World."
include_once('simple_html_dom.php');
$html = new simple_html_dom();
$text = '<div id="content">
Hello World.
<div id="sub-content1">
Text I don\'t want to select.
</div>
<div id="sub-content2">
Text I don\'t want to select
</div>
</div>';
$html->load($text);
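// Blank out every element nested inside div#content
$selector =$html->find("div#content",0)->find("*");
foreach($selector as $node){
$node->outertext = '';
}
// Serialize the modified document and parse it again
$html->load($html->save());
$selector =$html->find("div#content",0);
echo $selector->innertext; // only the div's own text remains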
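An alternative sketch using PHP's built-in DOMDocument instead of Simple HTML DOM: rather than deleting the child elements and re-parsing, it keeps only the text nodes that are direct children of the div. The function name ownText is hypothetical, and it assumes the id contains no double quotes.

```php
<?php
// Return only the text that sits directly inside the element with the
// given id, skipping text that belongs to nested child elements.
function ownText(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Assumes $id has no double quotes (it is inserted into the XPath as-is).
    $el = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    if ($el === null) {
        return null;
    }
    $text = '';
    foreach ($el->childNodes as $child) {
        // Direct text nodes only; the nested <div>s are element nodes and are skipped.
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

$html = '<div id="content">
Hello World.
<div id="sub-content1">Text I don\'t want to select.</div>
<div id="sub-content2">Text I don\'t want to select</div>
</div>';

echo ownText($html, 'content'), "\n"; // Hello World.
```

In pure XPath the same selection is //div[@id="content"]/text(), which matches only the direct text-node children.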

Get complete 'div' content using class name or id in PHP

I got a page source from a file using PHP, and its output is similar to:
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
From this, I need to get only one particular div, including the div itself and everything inside it, when I give 'under' (the class name) as input, like below. Can anybody suggest how to do this in PHP?
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
Try this:
$html = <<<HTML
<div class="basic">
<div class="math">
<div class="winner">
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
</div>
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="under"]');
$div = $div->item(0);
echo $dom->saveXML($div);
This will output:
<div class="under">
<div class="checker">
<strong>check</strong>
</div>
</div>
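The XPath above compares the class attribute exactly, so it would miss an element such as <div class="box under">. A hedged generalization that takes the class name as input, as the question asks; the function name outerHtmlByClass is hypothetical, and it assumes the class name contains no double quotes because it is inserted into the XPath expression as-is.

```php
<?php
// Return the outer HTML of the first element carrying the given class,
// or null when there is none. Also matches "under" inside class="a under b".
function outerHtmlByClass(string $html, string $class): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Padding with spaces on both sides avoids matching e.g. "under-x" or "underline".
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $node = $xpath->query($query)->item(0);
    return $node !== null ? $dom->saveHTML($node) : null;
}

$html = '<div class="basic"><div class="math under-x">'
      . '<div class="box under"><strong>check</strong></div>'
      . '</div></div>';

echo outerHtmlByClass($html, 'under'), "\n";
```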
Function to extract the contents from a specific div id from any webpage
The function below extracts the contents of the element with the specified ID and returns it. If no element with that ID is found, it returns false.
function getHTMLByID($id, $html) {
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$node = $dom->getElementById($id);
if ($node) {
return $dom->saveXML($node);
}
return FALSE;
}
$id is the ID of the <div> whose content you're trying to extract, and $html is your HTML markup.
Usage example:
$html = file_get_contents('http://www.mysql.com/');
echo getHTMLByID('tagline', $html);
Output:
The world's most popular open source database
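I'm not sure what you're asking, but this might be it:
preg_match_all('#<div class="under">(.*?)</div>#s', $htmlsource, $output);
$output[1] should now contain the content after each opening <div class="under"> up to the first closing </div>. With nested divs like the ones in the question, that first </div> belongs to an inner div, so the match is cut short; a DOM parser handles nesting correctly.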

How to parse HTML with nested tags using Simple DOM Parser?

I have an HTML file that I'm trying to parse. It has a bunch of DIVs like this:
<div class="doc-overview">
<h2>Description</h2>
<div id="doc-description-container" class="" style="max-height: 605px;">
<div class="doc-description toggle-overflow-contents" data-collapsed-height="200">
<div id="doc-original-text">
Content of the div without paragraph tags.
<p>Content from the first paragraph </p>
<p>Content from the second paragraph</p>
<p>Content from the third paragraph</p>
</div>
</div>
<div class="doc-description-overflow"></div>
</div>
I tried this:
foreach($html->find('div[id=doc-original-text]') as $div) {
echo $div->innertext;
}
As you can see, I search for doc-original-text directly, but I also tried parsing from the outer divs down to the inner ones.
Try this:
foreach($html->find('div#doc-original-text') as $div) {
echo $div->innertext;
}
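A sketch of parsing from the outer div down to the inner one, using PHP's built-in DOMXPath instead of Simple DOM Parser: each query after the first is relative to the node found by the previous one. The function name originalTextParagraphs and the trimmed sample markup are hypothetical.

```php
<?php
// Walk from the outer .doc-overview container down to #doc-original-text
// and return the text of each <p> directly inside it.
function originalTextParagraphs(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $paragraphs = [];
    foreach ($xpath->query('//div[@class="doc-overview"]') as $overview) {
        // Relative query: only the text div inside *this* overview block.
        $original = $xpath->query('.//div[@id="doc-original-text"]', $overview)->item(0);
        if ($original === null) {
            continue;
        }
        foreach ($xpath->query('./p', $original) as $p) {
            $paragraphs[] = trim($p->textContent);
        }
    }
    return $paragraphs;
}

$html = '<div class="doc-overview"><h2>Description</h2>'
    . '<div id="doc-description-container"><div class="doc-description">'
    . '<div id="doc-original-text">Content of the div without paragraph tags.'
    . '<p>Content from the first paragraph </p>'
    . '<p>Content from the second paragraph</p>'
    . '</div></div></div></div>';

print_r(originalTextParagraphs($html));
```

Scoping the inner query to each .doc-overview block means that, if the page repeats these blocks, each paragraph is collected from its own block rather than from wherever the first match in the document happens to be.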
