Extracting href attribute and the value using php dom parser

From the markup below I have to extract each hyperlink and the title of every hyperlink:
<span></span>
<span>Chapter1</span>
<span>Chapter2</span>
<span>Chapter3</span>
For this I've written the following code, but it's not working:
$doc = new DOMDocument();
$doc->loadHTML($page_links);
$tags = $doc->getElementsByTagName('span');
foreach ($tags as $tag) {
echo '\n'.$tag->nodeValue;
if($tag->hasChildNodes()) {
echo $tag->childNodes->getAttribute('href');
} else {
echo 'default.htm';
}
}
I am expecting this output:
Chapter1 default.htm
Chapter2 page2.htm
Chapter3 page3.htm
and so on

Could you please try this?
$doc = new DOMDocument();
$doc->loadHTML($page_links);
$tags = $doc->getElementsByTagName('span');
for($i=0;$i<$tags->length;$i++){
echo $tags->item($i)->nodeValue;
if($tags->item($i)->hasChildNodes()) {
if($tags->item($i)->firstChild->nodeName=='a'){
echo " ".$tags->item($i)->firstChild->getAttribute('href').'<br/>';
}else{
echo " default.htm<br/>";
}
}
}
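The same idea can also be sketched so that it looks for an anchor anywhere inside each span rather than only as the first child. The sample markup below is an assumption reconstructed from the expected output (the anchors in the question did not survive), and `extractChapterLinks` is a hypothetical helper name:

```php
<?php
// Assumed sample markup: the anchors are a guess based on the expected output.
$page_links = '<span></span>'
    . '<span>Chapter1</span>'
    . '<span><a href="page2.htm">Chapter2</a></span>'
    . '<span><a href="page3.htm">Chapter3</a></span>';

// Hypothetical helper: returns [title, href] pairs, one per non-empty span.
function extractChapterLinks(string $html, string $default = 'default.htm'): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('span') as $span) {
        $title = trim($span->nodeValue);
        if ($title === '') {
            continue;                   // skip empty spans
        }
        // Look for an <a> anywhere inside the span, not only as the first child.
        $anchor = $span->getElementsByTagName('a')->item(0);
        $href = ($anchor !== null && $anchor->hasAttribute('href'))
            ? $anchor->getAttribute('href')
            : $default;
        $rows[] = [$title, $href];
    }
    return $rows;
}

foreach (extractChapterLinks($page_links) as [$title, $href]) {
    echo $title . ' ' . $href . "\n";
}
```

Returning pairs instead of echoing inside the loop keeps the extraction separate from the output format, so the caller can choose `<br/>` or a newline.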

Related

How can I get the span tag value using PHP cURL and Simple HTML DOM Parser? The exact value does not show

Code for this:
<span class="file-count-label" ng-init="totalResultCount=304575" ng-show="totalResultCount" style="">304,575</span>
$videos=$html->find('span[class=file-count-label]');
foreach($videos as $e)
{
echo $e->plaintext;
}
Output: {{totalResultCount | number}}, but I want 304,575.

You can use an XPath selector:
<?php
$html = '<span class="file-count-label" ng-init="totalResultCount=304575" ng-show="totalResultCount" style="">304,575</span>';
$doc = new DOMDocument;
$doc->loadHTML($html);
$finder = new DomXPath($doc);
$classname="file-count-label";
$videos = $finder->query("//*[contains(@class, '$classname')]");
foreach($videos as $e)
{
echo $e->nodeValue;
}
Output:- https://eval.in/1056186
Reference taken:- https://stackoverflow.com/a/6366390/4248328
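Note that `contains()` on the class attribute is a substring test, so it would also match a class such as `file-count-label-small`. A common XPath 1.0 idiom for matching a whole class token is sketched below. `textByClass` is a hypothetical helper, the second span is an assumed extra element added for contrast, and the class name is assumed to contain no quote characters:

```php
<?php
// Hypothetical helper: text of every element whose class list contains
// $class as a whole token. Assumes $class contains no quote characters.
function textByClass(string $html, string $class): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings silently
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Pad the class list with spaces so only whole tokens can match.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

    $out = [];
    foreach ($xpath->query($query) as $node) {
        $out[] = trim($node->nodeValue);
    }
    return $out;
}

$html = '<span class="file-count-label" ng-init="totalResultCount=304575">304,575</span>'
      . '<span class="file-count-label-small">ignored</span>';
print_r(textByClass($html, 'file-count-label'));
```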

It works for me:
include 'simple_html_dom.php';
$html = str_get_html('<span class="file-count-label" ng-init="totalResultCount=304575" ng-show="totalResultCount" style="">304,575</span>');
$videos=$html->find('span[class=file-count-label]');
foreach($videos as $e)
{
echo $e->plaintext;
// 304,575
}

Get first li Simple DOM Parser

I am trying to write a small script with Simple HTML DOM.
The target markup is:
<ul id=filter><li><a href="url1"></li><li><a href="url2"></li></ul>
<ul id=filter><li><a href="url3"></li><li><a href="url4"></li></ul>
How to get just first li result for every ul?
I have tried this:
$html = file_get_html($url);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$first_list_links = $xpath->evaluate('//ul[@id="filter"]/li/a');
foreach($first_list_links as $links) {
echo $dom->saveHTML($links);
}
but all li elements are still included.

You can achieve this using the PHP Simple HTML DOM Parser:
// str_get_html() parses a string; file_get_html() expects a file path or URL
$html = str_get_html('<ul class="filter"><li><a href="url1"></li><li><a href="url2"></li></ul><ul class="filter"><li><a href="url3"></li><li><a href="url4"></li></ul>');
$urls = [];
foreach($html->find('.filter') as $element) {
$url = $element->firstChild()->find('a', 0)->href;
if (!in_array($url, $urls)) {
echo $url . "<br/>";
$urls[] = $url;
}
}
This should output the first link of each list:
url1
url3
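For the DOMDocument approach attempted in the question, a positional predicate in the XPath query restricts the match to the first li of each ul. This is a minimal sketch. The sample markup is assumed to have its anchor tags closed, and it uses a class instead of the repeated id:

```php
<?php
// Sample markup based on the question, with the <a> tags closed and the
// duplicated id replaced by a class (an assumption for this sketch).
$html = '<ul class="filter"><li><a href="url1"></a></li><li><a href="url2"></a></li></ul>'
      . '<ul class="filter"><li><a href="url3"></a></li><li><a href="url4"></a></li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// li[1] selects the first <li> child of each matching <ul>.
$hrefs = [];
foreach ($xpath->query('//ul[@class="filter"]/li[1]/a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}
echo implode("\n", $hrefs), "\n";
```

The position in `li[1]` is evaluated relative to each parent ul, which is why one link per list is returned rather than only the first link in the document.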

html DOM program to find href value

I am new to PHP and have been assigned a project to fetch the href value from the following HTML snippet:
<p class="title">
<a href="http://canon.com/">Canon Pixma iP100 + Accu Kit
</a>
</p>
For this I am using the following code:
$dom = new DOMDocument();
@$dom->loadHTML($html);
foreach($dom->getElementsByTagName('p') as $link) {
# Show the <a href>
foreach($link->getElementsByTagName('a') as $link)
{
echo $link->getAttribute('href');
echo "<br />";
}
}
This code gives me the href value of every <a> inside every <p> tag on the page. I want to parse only the <p> with the class "title". I can't use Simple_HTML_DOM or any other library here.
Thanks in advance.

Alternatively, you could use DOMXpath for this one. Like this:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
// target anchor tags inside p tags that have the class "title"
$target_element = $xpath->query('//p[@class="title"]/a');
if($target_element->length > 0) {
foreach($target_element as $link) {
echo $link->getAttribute('href'); // http://canon.com/
}
}
Or, if you want to traverse the DOM instead, you need to search it manually:
foreach($dom->getElementsByTagName('p') as $p) {
// if p tag has a "title" class
if($p->getAttribute('class') == 'title') {
foreach($p->childNodes as $child) {
// if has an anchor children
if($child instanceof DOMElement && $child->tagName == 'a' && $child->hasAttribute('href')) {
echo $child->getAttribute('href'); // http://canon.com/
}
}
}
}
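Instead of silencing `loadHTML()` with the `@` operator, parser warnings can be collected through libxml's internal error handling. The sketch below combines that with the XPath query above. The second paragraph and its example.com URL are assumptions added only to show that other paragraphs are skipped:

```php
<?php
// Sample markup: the first <p> is from the question, the second is assumed.
$html = '<p class="title"><a href="http://canon.com/">Canon Pixma iP100 + Accu Kit</a></p>'
      . '<p class="other"><a href="http://example.com/">Other</a></p>';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);  // collect warnings instead of printing them
$dom->loadHTML($html);
$warnings = libxml_get_errors();               // available for logging if needed
libxml_clear_errors();
libxml_use_internal_errors($previous);         // restore the earlier setting

$xpath = new DOMXPath($dom);

// Only anchors with an href that sit directly inside <p class="title">.
$hrefs = [];
foreach ($xpath->query('//p[@class="title"]/a[@href]') as $link) {
    $hrefs[] = $link->getAttribute('href');
}
print_r($hrefs);
```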

Simple HTML DOM Not Finding DIV

I have code that tries to extract the event SKU from the Robot Events page (here is an example). The code that I am using doesn't find any SKU on the page. The SKU is on line 411, in a div with the class "product-sku". My code doesn't even find the div on the page and just downloads all the events. Here is my code:
<?php
require('simple_html_dom.php');
$html = new simple_html_dom();
if(!$events)
{
echo mysqli_error($con);
}
while($event = mysqli_fetch_row($events))
{
$htmldown = file_get_html($event[4]);
$html->load($htmldown);
echo "Downloaded";
foreach ($html->find('div[class=product-sku]') as $row) {
$sku = $row->plaintext;
echo $sku;
}
}
?>
Can anyone help me fix my code?

This code uses the PHP DOMDocument class. It works for the sample HTML below. Please try it.
// new dom object
$dom = new DOMDocument();
// HTML string
$html_string = '<html>
<body>
<div class="product-sku1" name="div_name">The this the div content product-sku</div>
<div class="product-sku2" name="div_name">The this the div content product-sku</div>
<div class="product-sku" name="div_name">The this the div content product-sku</div>
</body>
</html>';
//discard white space (must be set before loading to take effect)
$dom->preserveWhiteSpace = false;
//load the html
$dom->loadHTML($html_string);
//get the divs by their tag name
$divs = $dom->getElementsByTagName('div');
// loop over the all DIVs
foreach ($divs as $div) {
if ($div->hasAttributes()) {
foreach ($div->attributes as $attribute){
if($attribute->name === 'class' && $attribute->value == 'product-sku'){
// Print the DIV class name and content
echo 'DIV Class Name: '.$attribute->value.PHP_EOL;
echo 'DIV Content: '.$div->nodeValue.PHP_EOL;
}
}
}
}

I would use a regex (regular expression) to accomplish pulling skus out.
The regex:
preg_match('~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~',$html,$matches);
See php regex docs.
New code:
<?php
if(!$events)
{
echo mysqli_error($con);
}
while($event = mysqli_fetch_row($events))
{
$htmldown = curl_init($event[4]);
curl_setopt($htmldown, CURLOPT_RETURNTRANSFER, true);
$html=curl_exec($htmldown);
curl_close($htmldown);
echo "Downloaded";
preg_match('~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~',$html,$matches);
foreach ($matches as $row) {
echo $row;
}
}
?>
In this case (using that web page) there is only one SKU, so instead of:
foreach ($matches as $row) {
echo $row;
}
You could just use: echo $matches[1]; (The reason for array index 1 is because the whole regex pattern plus the sku will be in $matches[0] but just the subgroup containing the sku is in $matches[1].)
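The difference between the two indexes can be seen in a small standalone sketch. The HTML string and the SKU value "RE-VRC-00-0000" are made up for illustration, since the real page markup is not shown here:

```php
<?php
// Assumed sample: the real page markup is not shown; the SKU is made up.
$html = '<div class="product-sku"><b>Event Code:</b> RE-VRC-00-0000</div>';

$pattern = '~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~';

$sku = null;
if (preg_match($pattern, $html, $matches) === 1) {
    // $matches[0] is the full match; $matches[1] is the captured group.
    $sku = trim($matches[1]);
}

echo $sku === null ? "No SKU found\n" : $sku . "\n";
```

Checking the return value of `preg_match()` before reading `$matches[1]` avoids an undefined index notice on pages where the pattern does not match.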

Try using:
require('simple_html_dom.php');
$html = new simple_html_dom();
if(!$events)
{
echo mysqli_error($con);
}
while($event = mysqli_fetch_row($events))
{
// $event[4] holds a URL, so fetch and parse it with file_get_html()
$htmldown = file_get_html($event[4]);
echo "Downloaded";
foreach ($htmldown->find('div[class=product-sku]') as $row) {
$sku = $row->plaintext;
echo $sku;
}
}
If the class "product-sku" is only used on divs, you can shorten the selector to:
$htmldown->find('.product-sku')

innerHTML of each link not working

I have the following code
function DOMinnerHTML($element)
{
$innerHTML="";
$children=$element->childNodes;
foreach($children as $child)
{
$tmp_dom=new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child,true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
$doc=new DOMDocument();
$doc->loadHtmlFile('http://www.google.com/');
$links=$doc->getElementsByTagName('a');
foreach($links as $m)
{
echo DOMinnerHTML($links[$m]).'<br />';
}
And it outputs nothing.
How can I make it output the content of each link on http://google.com?

In a foreach loop $m is already the node itself, not an index, so it seems this:
echo DOMinnerHTML($links[$m]).'<br />';
should be just this:
echo DOMinnerHTML($m).'<br />';
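The helper itself can also be shortened, because `DOMDocument::saveHTML()` accepts a node argument (PHP 5.3.6 and later), so no temporary document is needed. This is a minimal sketch that uses an assumed inline HTML string in place of the remote page so that it runs without a network request:

```php
<?php
// Returns the markup inside $element by serialising each child node.
function DOMinnerHTML(DOMNode $element): string
{
    $innerHTML = '';
    foreach ($element->childNodes as $child) {
        // saveHTML() with a node argument serialises just that node.
        $innerHTML .= $element->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

// Assumed inline sample so the sketch runs without a network request.
$doc = new DOMDocument();
$doc->loadHTML('<p><a href="/a">First <b>link</b></a> <a href="/b">Second</a></p>');

$contents = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    // $link is the node itself, so it is passed straight to the helper.
    $contents[] = DOMinnerHTML($link);
}
echo implode("<br />\n", $contents), "\n";
```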
