Extracting the href attribute and link text using the PHP DOM parser

From the given markup I have to extract each hyperlink and the title of every hyperlink:
<span></span>
<span>Chapter1</span>
<span>Chapter2</span>
<span>Chapter3</span>
For this I've written the following code, but it's not working:
$doc = new DOMDocument();
$doc->loadHTML($page_links);
$tags = $doc->getElementsByTagName('span');
foreach ($tags as $tag) {
    echo '\n'.$tag->nodeValue;
    if ($tag->hasChildNodes()) {
        echo $tag->childNodes->getAttribute('href');
    } else {
        echo 'default.htm';
    }
}
I am expecting this output:
Chapter1 default.htm
Chapter2 page2.htm
Chapter3 page3.htm
and so on

Could you please try this?
$doc = new DOMDocument();
$doc->loadHTML($page_links);
$tags = $doc->getElementsByTagName('span');
for ($i = 0; $i < $tags->length; $i++) {
    echo $tags->item($i)->nodeValue;
    if ($tags->item($i)->hasChildNodes()) {
        // The first child is an <a> element only when the span wraps a link;
        // otherwise it is a text node such as "Chapter1".
        if ($tags->item($i)->firstChild->nodeName == 'a') {
            echo " ".$tags->item($i)->firstChild->getAttribute('href').'<br/>';
        } else {
            echo " default.htm<br/>";
        }
    }
}
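The markup in the question appears to be missing its <a> tags, so the sketch below assumes that Chapter2 and Chapter3 are wrapped in links to page2.htm and page3.htm, matching the expected output. It is a minimal variant of the answer above (extractChapterLinks() is a hypothetical helper name): instead of checking only firstChild, it searches each span's descendants for an <a> element and falls back to default.htm.

```php
<?php
// Sketch: for each <span>, collect its text and the href of the first <a>
// element inside it, falling back to "default.htm" when there is none.
// The sample markup below is an assumption reconstructed from the expected output.

function extractChapterLinks(string $html, string $default = 'default.htm'): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about fragmentary HTML instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('span') as $span) {
        $title = trim($span->textContent);
        if ($title === '') {
            continue; // skip empty spans such as <span></span>
        }
        // getElementsByTagName() searches all descendants, so the link is
        // found even when it is not the span's first child node.
        $anchors = $span->getElementsByTagName('a');
        $href = $anchors->length > 0 ? $anchors->item(0)->getAttribute('href') : $default;
        $result[] = [$title, $href];
    }
    return $result;
}

$sample = '<span></span>'
    . '<span>Chapter1</span>'
    . '<span><a href="page2.htm">Chapter2</a></span>'
    . '<span><a href="page3.htm">Chapter3</a></span>';

foreach (extractChapterLinks($sample) as [$title, $href]) {
    echo $title . ' ' . $href . PHP_EOL;
}
```

Searching descendants rather than relying on firstChild means the link is still found if it is wrapped in another element such as <em>.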

Related

How can I get the span tag value using PHP cURL and Simple HTML DOM Parser? The exact value does not show

Code for this:
<span class="file-count-label" ng-init="totalResultCount=304575" ng-show="totalResultCount" style="">304,575</span>
$videos = $html->find('span[class=file-count-label]');
foreach ($videos as $e) {
    echo $e->plaintext;
}
Output: {{totalResultCount | number}}, but I want 304,575.
You can use an XPath selector:
<?php
$html = '<span class="file-count-label" ng-init="totalResultCount=304575" ng-show="totalResultCount" style="">304,575</span>';
$doc = new DOMDocument;
$doc->loadHTML($html);
$finder = new DOMXPath($doc);
$classname = "file-count-label";
$videos = $finder->query("//*[contains(@class, '$classname')]");
foreach ($videos as $e) {
    echo $e->nodeValue;
}
Output: https://eval.in/1056186
Reference: https://stackoverflow.com/a/6366390/4248328
It works for me:
include 'simple_html_dom.php';
$html = str_get_html('<span class="file-count-label" ng-init="totalResultCount=304575" ng-show="totalResultCount" style="">304,575</span>');
$videos = $html->find('span[class=file-count-label]');
foreach ($videos as $e) {
    echo $e->plaintext; // 304,575
}
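The {{totalResultCount | number}} output suggests that the scraped page is an AngularJS template rendered only in the browser, so the fetched HTML may never contain 304,575 as text. Assuming that is the case, a minimal sketch that reads the number from the ng-init attribute instead (totalResultCount() is a hypothetical helper name):

```php
<?php
// Sketch, assuming the fetched HTML still contains the unrendered AngularJS
// template: the visible text is "{{totalResultCount | number}}" and the
// number only appears in the ng-init attribute.

function totalResultCount(string $html): ?int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $spans = $xpath->query('//span[@class="file-count-label"]');
    if ($spans->length === 0) {
        return null;
    }

    // Read the attribute instead of the text content.
    $init = $spans->item(0)->getAttribute('ng-init'); // e.g. "totalResultCount=304575"
    if (preg_match('/totalResultCount=(\d+)/', $init, $m) !== 1) {
        return null;
    }
    return (int) $m[1];
}

$page = '<span class="file-count-label" ng-init="totalResultCount=304575" '
      . 'ng-show="totalResultCount">{{totalResultCount | number}}</span>';

echo number_format(totalResultCount($page)), PHP_EOL; // 304,575
```

number_format() then reproduces the comma-separated form that the Angular number filter would display.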

Get the first li with Simple HTML DOM Parser

I'm trying to write a small Simple HTML DOM script.
The target markup is:
<ul id=filter><li><a href="url1"></li><li><a href="url2"></li></ul>
<ul id=filter><li><a href="url3"></li><li><a href="url4"></li></ul>
How do I get only the first li of every ul?
I have tried this:
$html = file_get_html($url);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$first_list_links = $xpath->evaluate('//ul[@id="filter"]/li/a');
foreach ($first_list_links as $links) {
    echo $dom->saveHTML($links);
}
but all li elements are still included.
You can achieve this using the PHP Simple HTML DOM Parser:
// str_get_html() parses an HTML string; file_get_html() expects a URL or file path
$html = str_get_html('<ul class="filter"><li><a href="url1"></li><li><a href="url2"></li></ul><ul class="filter"><li><a href="url3"></li><li><a href="url4"></li></ul>');
$urls = [];
foreach ($html->find('.filter') as $element) {
    // first <li> of this list, then the first <a> inside it
    $url = $element->firstChild()->find('a', 0)->href;
    if (!in_array($url, $urls)) {
        echo $url . "<br/>";
        $urls[] = $url;
    }
}
should output:
url1
url3
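For comparison, the question's own DOMXPath attempt only needs a positional predicate: li[1] selects the first li child of each matching ul. A minimal sketch using PHP's built-in DOM extension (firstListLinks() is a hypothetical helper name; the sample markup closes the <a> tags for clarity):

```php
<?php
// Sketch using DOMXPath instead of Simple HTML DOM.
// The predicate [1] applies per <ul>, keeping only each list's first <li>.

function firstListLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // e.g. duplicate id="filter" warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    foreach ($xpath->query('//ul[@id="filter"]/li[1]/a') as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

$html = '<ul id="filter"><li><a href="url1"></a></li><li><a href="url2"></a></li></ul>'
      . '<ul id="filter"><li><a href="url3"></a></li><li><a href="url4"></a></li></ul>';

echo implode(PHP_EOL, firstListLinks($html)), PHP_EOL; // url1, then url3
```

Note that (//ul[@id="filter"]/li)[1], with parentheses, would instead select only the first matching li in the whole document.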

HTML DOM program to find an href value

I am new to PHP and have been assigned a project to fetch the href value from the following HTML snippet:
<p class="title">
<a href="http://canon.com/">Canon Pixma iP100 + Accu Kit
</a>
</p>
For this I am using the following code:
$dom = new DOMDocument();
@$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('p') as $link) {
    # Show the <a href>
    foreach ($link->getElementsByTagName('a') as $link) {
        echo $link->getAttribute('href');
        echo "<br />";
    }
}
This code gives me the href value of every <a> inside every <p> tag on the page. I want to parse only the <p> elements with the class "title". I can't use Simple_HTML_DOM or any other library here.
Thanks in advance.
Alternatively, you could use DOMXPath for this, like so:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// target <a> tags that are children of <p> tags whose class is "title"
$target_element = $xpath->query('//p[@class="title"]/a');
if ($target_element->length > 0) {
    foreach ($target_element as $link) {
        echo $link->getAttribute('href'); // http://canon.com/
    }
}
Or, if you want to traverse the tree yourself, you need to check each element manually:
foreach ($dom->getElementsByTagName('p') as $p) {
    // if the p tag has the "title" class
    if ($p->getAttribute('class') == 'title') {
        foreach ($p->childNodes as $child) {
            // only element nodes have tagName; skip text nodes such as whitespace
            if ($child instanceof DOMElement && $child->tagName == 'a' && $child->hasAttribute('href')) {
                echo $child->getAttribute('href'); // http://canon.com/
            }
        }
    }
}
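Both approaches compare the class attribute with "title" exactly, so a paragraph such as <p class="title featured"> would be skipped. If that matters, a common XPath 1.0 idiom matches "title" as one whitespace-separated class token. A minimal sketch (titleLinks() and the example.com URLs are hypothetical):

```php
<?php
// Sketch: match "title" as a class token, so class="title featured" matches
// but class="subtitle" does not.

function titleLinks(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Pad the normalized class list with spaces, then look for " title ".
    $query = '//p[contains(concat(" ", normalize-space(@class), " "), " title ")]//a[@href]';

    $hrefs = [];
    foreach ($xpath->query($query) as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

$html = '<p class="title"><a href="http://canon.com/">Canon Pixma iP100 + Accu Kit</a></p>'
      . '<p class="title featured"><a href="http://example.com/a">A</a></p>'
      . '<p class="subtitle"><a href="http://example.com/b">B</a></p>';

print_r(titleLinks($html));
```

A plain contains(@class, "title") would also match "subtitle", which is why the class list is padded with spaces first.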

Simple HTML DOM Not Finding DIV

I have code that tries to extract the event SKU from the Robot Events page; here is an example. The code I am using doesn't find any of the SKUs on the page. The SKU is on line 411, in a div with the class "product-sku". My code doesn't even find the div on the page; it just downloads all the events. Here is my code:
<?php
require('simple_html_dom.php');
$html = new simple_html_dom();
if (!$events) {
    echo mysqli_error($con);
}
while ($event = mysqli_fetch_row($events)) {
    $htmldown = file_get_html($event[4]);
    $html->load($htmldown);
    echo "Downloaded";
    foreach ($html->find('div[class=product-sku]') as $row) {
        $sku = $row->plaintext;
        echo $sku;
    }
}
?>
Can anyone help me fix my code?
This code uses PHP's DOMDocument class. It works for the sample HTML below. Please try it:
// new DOM object
$dom = new DOMDocument();
// discard whitespace-only text nodes (must be set before loading)
$dom->preserveWhiteSpace = false;
// HTML string
$html_string = '<html>
<body>
<div class="product-sku1" name="div_name">This is the div content product-sku</div>
<div class="product-sku2" name="div_name">This is the div content product-sku</div>
<div class="product-sku" name="div_name">This is the div content product-sku</div>
</body>
</html>';
// load the HTML
$dom->loadHTML($html_string);
// get all DIV elements by tag name
$divs = $dom->getElementsByTagName('div');
// loop over all DIVs
foreach ($divs as $div) {
    if ($div->hasAttributes()) {
        foreach ($div->attributes as $attribute) {
            if ($attribute->name === 'class' && $attribute->value == 'product-sku') {
                // print the DIV class name and content
                echo 'DIV Class Name: '.$attribute->value.PHP_EOL;
                echo 'DIV Content: '.$div->nodeValue.PHP_EOL;
            }
        }
    }
}
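The nested attribute loop above can also be written as a single XPath query. A minimal sketch (skuDivContents() is a hypothetical helper name) using shortened sample markup; since @class="product-sku" is an exact comparison, the product-sku1 and product-sku2 divs are not selected:

```php
<?php
// Sketch: select <div class="product-sku"> directly instead of looping over
// every div and every attribute.

function skuDivContents(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $contents = [];
    foreach ($xpath->query('//div[@class="product-sku"]') as $div) {
        $contents[] = trim($div->textContent);
    }
    return $contents;
}

$html = '<div class="product-sku1">first</div>'
      . '<div class="product-sku2">second</div>'
      . '<div class="product-sku">third</div>';

print_r(skuDivContents($html)); // only the "third" div matches
```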
I would use a regular expression (regex) to pull the SKUs out.
The regex:
preg_match('~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~',$html,$matches);
See the PHP regex (PCRE) documentation.
New code:
<?php
if (!$events) {
    echo mysqli_error($con);
}
while ($event = mysqli_fetch_row($events)) {
    $htmldown = curl_init($event[4]);
    curl_setopt($htmldown, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($htmldown);
    curl_close($htmldown);
    echo "Downloaded";
    preg_match('~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~', $html, $matches);
    foreach ($matches as $row) {
        echo $row;
    }
}
?>
And since that page contains only one SKU, instead of:
foreach ($matches as $row) {
    echo $row;
}
You could just use: echo $matches[1]; (Index 1 is used because $matches[0] holds the text matched by the whole pattern, while $matches[1] holds only the first capturing group, which contains the SKU.)
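To illustrate the indexing, here is a small self-contained run of the same pattern against a sample string; the SKU value RE-VRC-00-0000 is a made-up placeholder, not a real event code:

```php
<?php
// Illustration of how preg_match() fills $matches.
// "RE-VRC-00-0000" is a placeholder value.

$html = '<div class="product-sku"><b>Event Code:</b>RE-VRC-00-0000</div>';

if (preg_match('~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~', $html, $matches) === 1) {
    echo $matches[0], PHP_EOL; // the whole matched <div>...</div>
    echo $matches[1], PHP_EOL; // only the captured SKU: RE-VRC-00-0000
}
```

The lazy (.*?) stops at the first </div>, so the group never spills into later markup on the same line.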
Try this:
require('simple_html_dom.php');
if (!$events) {
    echo mysqli_error($con);
}
while ($event = mysqli_fetch_row($events)) {
    // $event[4] holds a URL, so fetch and parse it with file_get_html()
    $htmldown = file_get_html($event[4]);
    echo "Downloaded";
    foreach ($htmldown->find('div[class=product-sku]') as $row) {
        $sku = $row->plaintext;
        echo $sku;
    }
}
And if the "product-sku" class is used only on divs, you can use:
$htmldown->find('.product-sku')

innerHTML of each link not working

I have the following code:
function DOMinnerHTML($element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child) {
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}
$doc = new DOMDocument();
$doc->loadHTMLFile('http://www.google.com/');
$links = $doc->getElementsByTagName('a');
foreach ($links as $m) {
    echo DOMinnerHTML($links[$m]).'<br />';
}
And it outputs nothing.
How can I make it output the content of each link on http://google.com?
It seems this:
echo DOMinnerHTML($links[$m]).'<br />';
should be just this:
echo DOMinnerHTML($m).'<br />';
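As a side note, the temporary document in DOMinnerHTML() is not required: DOMDocument::saveHTML() accepts an optional node argument (since PHP 5.3.6), so each child can be serialized by its owning document. A sketch of that variant (innerHTML() is a hypothetical name) that parses a small inline string instead of fetching http://www.google.com/:

```php
<?php
// Sketch: serialize each child node with the owning document's saveHTML().

function innerHTML(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<p><a href="/x">Hello <b>world</b></a> <a href="/y">Bye</a></p>');
libxml_clear_errors();

foreach ($doc->getElementsByTagName('a') as $link) {
    // $link is already a DOMElement, so it is passed directly.
    echo innerHTML($link), '<br />', PHP_EOL;
}
```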
