Extract HTML from a URL with DOM - PHP

I've already searched for this, but most of the topics use Java, and I need to do it with DOM in PHP. I want to extract this element from example.com:
<div id="download" class="large-12 medium-12 columns hide-for-small-only">
<a href="javascript:void(0)" link="https://mediamusic.com/media/mp3/mp3-256/Mas.mp3" target="_blank" class="mp3_download_link">
<i class="fa fa-cloud-download">Download Now</i>
</a>
</div>
How can I get the element with the mp3_download_link class from this markup using DOM in PHP? As I said, I have already searched for this, but I am still confused.

You can use a library to parse the DOM, for example https://github.com/tburry/pquery
Usage:
$dom = pQuery::parseStr($html);
$class = $dom->query('#download a')->attr('class');

You can try file_get_html (from the Simple HTML DOM library) to parse the HTML:
$html = file_get_html('http://demo.com');
Then use the loop below to get all the attributes of the anchor tag:
foreach ($html->find('div[id=download] a') as $a) {
    var_dump($a->attr);
}

Let's assume you have this DOM as a string. Then you can use the built-in DOM extension to get the link you need. Here is an example:
$domstring = '<div id="download" class="large-12 medium-12 columns hide-for-small-only">
<a href="javascript:void(0)" link="https://mediamusic.com/media/mp3/mp3-256/Mas.mp3" target="_blank" class="mp3_download_link">
<i class="fa fa-cloud-download">Download Now</i>
</a>
</div>';
$links = array();
$dom = new DOMDocument('1.0', 'utf-8');
// $domstring is the string containing the HTML you posted in your question
$dom->loadHTML($domstring);
$node_list = $dom->getElementsByTagName('a');
foreach ($node_list as $node) {
    $links[] = $node->getAttribute('link');
}
print_r(array_shift($links));
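If the page contains other anchors, collecting every `a` tag may pick up unrelated links. The following is a minimal sketch, assuming the markup from the question, that narrows the match to the mp3_download_link class with DOMXPath. The variable names are illustrative only.

```php
<?php
// Assumption: $domstring holds the markup from the question.
$domstring = '<div id="download" class="large-12 medium-12 columns hide-for-small-only">
<a href="javascript:void(0)" link="https://mediamusic.com/media/mp3/mp3-256/Mas.mp3" target="_blank" class="mp3_download_link">
<i class="fa fa-cloud-download">Download Now</i>
</a>
</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($domstring);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match <a> elements whose class list contains "mp3_download_link".
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " mp3_download_link ")]';

$links = [];
foreach ($xpath->query($query) as $anchor) {
    // "link" is the custom attribute that carries the MP3 URL in the question.
    $links[] = $anchor->getAttribute('link');
}
print_r($links);
```

To read the page from a URL instead, the string could be filled with file_get_contents() before calling loadHTML().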

Related

Remove span tag from element html dom parser

I have code like this, which fetches data from another website.
require('simple_html_dom.php');
$html = file_get_html("www.example.com");
$info['diesel'] = $html->find(".on .price",0)->innertext;
$info['pb95'] = $html->find(".pb .price",0)->innertext;
$info['lpg'] = $html->find(".lpg .price",0)->innertext;
The HTML from the other website looks like this:
<a href="#" class="station-detail-wrapper on text-center active">
<h3 class="fuel-header">ON</h3>
<div class="price">
5,97
<span>zł</span>
</div>
</a>
So if I use echo $info['diesel'], it shows 5,97 zł. I would like to remove <span>zł</span> so that only the price is shown.

Maybe you can replace that span tag with an empty string:
echo $info['diesel']=str_replace("<span>zł</span>","",$info['diesel']);
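The string replacement only works while the span looks exactly like `<span>zł</span>`. As an alternative, here is a minimal sketch that uses the built-in DOM extension (not Simple HTML DOM, which the question uses) and reads only the text nodes directly inside the price div, so any child element is skipped. The $markup string is an assumption copied from the question.

```php
<?php
// Assumption: $markup is the price block shown in the question.
$markup = '<div class="price">
5,97
<span>zł</span>
</div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$price = '';
// text() selects only text nodes that are direct children of the div,
// so the content of the nested <span> is not included.
foreach ($xpath->query('//div[@class="price"]/text()') as $text) {
    $price .= $text->nodeValue;
}
$price = trim($price);
echo $price;
```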

Replace class content using php

I want to replace the text inside specific classes in an HTML document.
The HTML contains other content that I don't want to change.
In the code below I want to change the content of classes one and three only; the content of class two should stay as it is.
I need to do this dynamically.
<div class="one"> I want to change this </div>
<div class="two"> I don't want to change this </div>
<div class="three"> I want to change this </div>

The DOM functions are helpful here; see the PHP manual.
//your html file content
$str = '...<div class="one"> I want to change this </div>
<div class="two"> I don\'t want to change this </div>
<div class="three"> I want to change this </div>... ';
$dom = new DOMDocument();
$dom->loadHTML($str);
$domXpath = new DOMXPath($dom);
// query the matching nodes: every div whose class is not "two"
$list = $domXpath->query('//div[@class!="two"]');
if ($list->length > 0) {
    foreach ($list as $node) {
        // change node value
        $node->nodeValue = 'Content changed!';
    }
}
// get the result
$new_str = $dom->saveHTML();
var_dump($new_str);
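Excluding class two also matches any other div on the page that has a class attribute. A minimal sketch of the opposite approach follows: it builds the XPath condition from an explicit list of classes to change. The $targets list is hypothetical, and the class names are assumed to contain no quote characters.

```php
<?php
$str = '<div class="one"> I want to change this </div>
<div class="two"> I don\'t want to change this </div>
<div class="three"> I want to change this </div>';

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Hypothetical list of classes whose content should be replaced.
$targets = ['one', 'three'];
$conditions = array_map(function ($class) {
    return sprintf('@class="%s"', $class);
}, $targets);

// Produces: //div[@class="one" or @class="three"]
$query = '//div[' . implode(' or ', $conditions) . ']';
foreach ($xpath->query($query) as $node) {
    $node->nodeValue = 'Content changed!';
}

// Collect the text of every div, keyed by class, to inspect the result.
$result = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    $result[$div->getAttribute('class')] = $div->nodeValue;
}
print_r($result);
```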

How to remove a link from Simple HTML DOM data

I have this code. It gets the info, but along with the data I also get the link. For example:
require_once('simple_html_dom.php');
set_time_limit(0);
$url = 'www.domain.com';
$html = file_get_html($url);
// I read the first div
foreach ($html->find('#content') as $element) {
    // I read the second
    foreach ($element->find('p') as $phone) {
        echo $phone;
    }
}
This prints:
Mobile Pixel 2 -
google << there is the link
But I need to remove this link. The problem is that this is the markup I scrape:
<p>the info that i really need is here<p>
<p class="text-right"><a class="btn btn-default espbott aplus" role="button"
href="brand/google.html">Google</a></p>
I read this:
Simple HTML Dom: How to remove elements?
But I can't find the answer.
Update: if I use this:
foreach ($element->find('p[class="text-right"]');
it selects the links, but I can't remove them from the scraped data.

You can use file_get_contents with str_get_html and strip the matched elements from the content:
include 'simple_html_dom.php';
$content = file_get_contents($url);
$html = str_get_html($content);
// I read the first div
foreach ($html->find('#content') as $element) {
    // I read the second
    foreach ($element->find('p[class="text-right"]') as $phone) {
        // remove the paragraph's markup from the raw page content
        $content = str_replace($phone, '', $content);
    }
}
print $content;
die;

Or here is a native version using DOMDocument.
PHP code:
$sHtml = '<p>the info that i really need is here<p>
<p class="text-right"><a class="btn btn-default espbott aplus" role="button"
href="brand/google.html">Google</a></p>';
$sHtml = '<div id="wrapper">' . $sHtml . '</div>';
echo "org:\n";
echo $sHtml;
echo "\n\n";
$doc = new DOMDocument();
$doc->loadHtml($sHtml);
foreach( $doc->getElementsByTagName( 'a' ) as $element ) {
$element->parentNode->removeChild( $element );
}
echo "res:\n";
echo $doc->saveHTML($doc->getElementById('wrapper'));
Output
org:
<div id="wrapper"><p>the info that i really need is here<p>
<p class="text-right"><a class="btn btn-default espbott aplus" role="button"
href="brand/google.html">Google</a></p></div>
res:
<div id="wrapper">
<p>the info that i really need is here</p>
<p>
</p>
<p class="text-right"></p>
</div>
https://3v4l.org/RhuEU
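One caveat with the loop above: the list returned by getElementsByTagName() is live, so removing nodes while iterating over it can skip elements when there is more than one match. A minimal sketch of a workaround, assuming a paragraph with two links (the second link is invented for illustration), copies the list to an array first:

```php
<?php
// Assumption: sample markup with two links in the same paragraph.
$sHtml = '<div id="wrapper"><p>the info that i really need is here</p>
<p class="text-right"><a href="brand/google.html">Google</a> <a href="brand/other.html">Other</a></p></div>';

$doc = new DOMDocument();
$doc->loadHTML($sHtml);

// Take a static snapshot of the live node list before modifying the tree.
$anchors = iterator_to_array($doc->getElementsByTagName('a'));
foreach ($anchors as $anchor) {
    $anchor->parentNode->removeChild($anchor);
}

// Count the anchors left in the document after removal.
$remaining = $doc->getElementsByTagName('a')->length;
echo $doc->saveHTML($doc->getElementById('wrapper'));
```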

Fetching Image from particular div Only via DOMDocument in PHP

I have a website where I have posted a few images inside a particular div:
<div class="posts">
<div class="separator">
<img src="http://www.example.com/image.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
<div class="separator">
<img src="http://www.example.com/imagesda.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
.... few more images
</div>
From my second website, I want to fetch all the images in that particular div. I have the code below.
<?php
$htmlget = new DOMDocument();
@$htmlget->loadHTMLFile('http://www.example.com');
$xpath = new DOMXPath($htmlget);
$nodelist = $xpath->query("//img/@src");
foreach ($nodelist as $images) {
    $value = $images->nodeValue;
    echo "<img src='" . $value . "' /><br />";
}
?>
But this fetches all images from my website, not just those in the particular div. It also prints my RSS image, social icon images, and so on.
Can I specify a particular div in my PHP code so that it only fetches images from the div with the posts class?

First, give an id to the outer div container. Then get the container by its id and read its child nodes.
An example:
$container = $dom->getElementById('node_id');
// get the number of child nodes in the container
echo $container->childNodes->length;
// content of each child
foreach ($container->childNodes as $child) {
    echo $child->ownerDocument->saveHTML($child);
}
Maybe this link will help you. It has a good tutorial.
http://www.binarytides.com/php-tutorial-parsing-html-with-domdocument/

With PHP Simple HTML DOM Parser, this would be:
include('simple_html_dom.php');
$html = file_get_html("http://your_web_site.com");
foreach ($html->find('div.posts img') as $img_posts) {
    echo $img_posts->src . '<br>'; // to show the source attribute
}
I am still reading about PHP Simple HTML DOM Parser, and so far it is faster (to implement) than regex.

Here is another snippet that may help. You are looking for
$doc->getElementsByTagName
which can target a tag directly.
<?php
$myhtml = <<<EOF
<html>
<body>
<div class="posts">
<div class="separator">
<img src="http://www.example.com/image.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
<div class="separator">
<img src="http://www.example.com/imagesda.jpg" />
<p>Be, where I am today, and i will be one where you will search me tomorrow</p>
</div>
.... few more images
</div>
</body>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($myhtml);
$divs = $doc->getElementsByTagName('img');
foreach ($divs as $div) {
    foreach ($div->attributes as $attr) {
        $name = $attr->nodeName;
        $value = $attr->nodeValue;
        echo "Attribute '$name' :: '$value'<br />";
    }
}
?>
Demo here http://codepad.org/keZkC377
The answer to this question may also provide further insight:
Not finding elements using getElementsByTagName() using DomDocument
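Note that getElementsByTagName('img') still returns every image in the document. To limit the result to the div with the posts class, as the question asks, an XPath query can be scoped to that container. This is a minimal sketch; the rss.png image outside the container is invented to show that it is excluded.

```php
<?php
// Assumption: sample page with one image outside div.posts (hypothetical rss.png).
$myhtml = '<html><body>
<img src="http://www.example.com/rss.png" />
<div class="posts">
<div class="separator"><img src="http://www.example.com/image.jpg" /></div>
<div class="separator"><img src="http://www.example.com/imagesda.jpg" /></div>
</div>
</body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($myhtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select the src attribute of every <img> nested anywhere inside a div
// whose class list contains "posts".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " posts ")]//img/@src';

$sources = [];
foreach ($xpath->query($query) as $src) {
    $sources[] = $src->nodeValue;
}
print_r($sources);
```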

PHP - GET tag from url

I want to get a specific tag from a URL. For example,
if I have this content:
<div id="hey">
<div id="bla"></div>
</div>
<div id="hey">
<div id="bla"></div>
</div>
And I want to get all divs with the id "hey" (I think it is done with preg_match_all). How can I do that?
The content inside the tag can change.

I recommend using the DOMDocument class instead of regular expressions (it is clearer and, IMHO, less resource-intensive).
$content = '<div id="hey">
<div id="bla"></div>
</div>
<div id="hey">
<div id="bla"></div>
</div>';
$doc = new DOMDocument();
@$doc->loadHTML($content); // @ suppresses warnings for possibly non-standard HTML
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//div[@id='hey']");
/* @var $elements DOMNodeList */
for ($i = 0; $i < $elements->length; $i++) {
    /* @var $curr_element DOMElement */
    $curr_element = $elements->item($i);
    // Here do what you want with the element
    var_dump($curr_element);
}
If you want to get the content from a URL, you can use this line instead to fill the variable $content:
$content = file_get_contents('http://yourserver/urls/page.php');
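The @ operator hides every error raised by that call. As a hedged alternative, libxml_use_internal_errors() collects only the parser's warnings so they can be inspected or discarded. A minimal sketch using the same sample content follows; saving each match with saveHTML() is one possible way to use the elements.

```php
<?php
$content = '<div id="hey">
<div id="bla"></div>
</div>
<div id="hey">
<div id="bla"></div>
</div>';

// Buffer libxml warnings (for example about non-standard markup)
// instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($content);
$warnings = libxml_get_errors(); // array of LibXMLError objects, possibly empty
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$fragments = [];
foreach ($xpath->query("//div[@id='hey']") as $element) {
    // Serialize each matching div, including its children.
    $fragments[] = $doc->saveHTML($element);
}
echo count($fragments), " matches\n";
```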
