I want to extract content from a div, but not needed the contents from its childrens. I m using simplehtmldom parser and the following code
//html code
<div id="frame">
Needed this content
Not needed
</div>
//php code
$elem = file_get_html($url);
$content = $elem->find('div#frame')->plaintext;
echo $content;
but this code results,
Needed this contentNot needed
I want the result as,
Needed this content
How to change th code for getting that output. Help plz. thanks in advance
The only way I can think of, is to delete all your div's children, then print the left content... Here's how:
// includes Simple HTML DOM Parser
include "simple_html_dom.php";
$text = '<div id="frame">
<span><b>Not needed</b></span>
Needed this content
Not needed
</div>';
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($text);
$content = $html->find('div#frame',0);
// Before
echo $content->innertext;
// Delete all unwanted children
foreach( $content->children() as $i => $unwantedTags ) {
echo "<br/>$i => ".$unwantedTags->tag;
$unwantedTags->outertext = '';
}
// After
echo "<br/>".$content->innertext;
// Clear dom object
$html->clear();
unset($html);
See this working DEMO
Related
I need some help with my php. I'm trying to fetch the list of elements from html tag called streams. When I try to fetch the elements from the html tags, it will not fetch it as it will show as a blank.
Here is the code:
<?php
//ini_set('max_execution_time', 1000);
error_reporting(0);
//$errmsg_arr = array();
//$errflag = false;
$baseUrl = file_get_contents('http://www.example.com/streams.php');
$domdoc = new DOMDocument();
$domdoc->strictErrorChecking = false;
$domdoc->recover=true;
#$domdoc->loadHTML($baseUrl);
$links = $domdoc->getElementById('streams');
echo $links;
?>
Here is the html source:
<p id='channels'>BBC One South E</p><p id='streams'>http://www.example.com/live/35vlz78As8/191</p>
Can you please help me with how I can be able to fetch the elements from the id of the html tags?
Remove the # at the begin of $domdc so you can see if some error happen and
You should use textContent
$domdoc->loadHTML($baseUrl);
$links = $domdoc->getElementById('streams');
echo $links->textContent;
I'm using Simple HTML DOM to try and extract a div and all of it's contents from a target URL, here is my code:
<?php
require 'simple_html_dom.php';
$html = file_get_html('http://mozilla.org');
foreach($html->find('.accordion') as $element)
echo $element . '<br>';
?>
The problem I have is that the above code only extracts the plain text of the div. There are also images in the div that I need to extract. If I use this following code, then all images are extracted but so is everything else in the page.
<?php
require 'simple_html_dom.php';
$html = file_get_html('http://mozilla.org');
echo $html;
?>
So my question is, how can I use the first bit of code to extract the contents + images from .accordion?
Thanks
You could always try;
$imgs = array();
foreach($html->find('.accordion',0)->find('img') as $img){
$imgs[] = $img->src;
}
print_r($imgs);
This should populate the $imgs variable with all of the image links from the .accordion div.
:)
I know this topic was posted everywhere, but their question is not I want. I want to insert some HTML codes before the page is loaded without touching the original code in the page.
Suppose my header was rendered by a function called render_header():
function render_body() {
return "<body>
<div class='container'>
<div class='a'>A</div>
<div class='b'>B</div>
</div>
</body>";
}
From now, I want to insert HTML codes using PHP without editing the render_body(). I want a function that insert some divs to container'div.
render_body();
<?php *//Insert '<div class="c" inside div container* ?>
Just as an alternative using XPath - this should load in the output from render_body() to an XML (DOMDocument) object and create an XPath object to query your HTML so you can easily work out where you want to insert the new HTML.
This will probably only work if you're using XML well formed HTML though.
//read in the document
$xml = new DOMDocument();
$xml->loadHTML(render_body());
//create an XPath query object
$xpath = new DOMXpath($xml);
//create the HTML nodes you want to insert
// using $xml->createElement() ...
//find the node to which you want to attach the new content
$xmlDivClassA = $xpath->query('//body/div[#class="a"]')->item(0);
$xmlDivClassA->appendChild( /* the HTML nodes you've previously created */ );
//output
echo $xml->saveHTML();
Took a little while as I had to refer to the documentation ... too much JQuery lately it's ruining my ability to manipulate the DOM without looking things up :\
The only thing I can think of is to turn on output buffering and then use the DOMDocument class to read in the entire buffer and then make changes to it. It is worth doing some reading of the documentation (http://www.php.net/manual/en/book.dom.php) provided in the script...
ie.:
<?php
function render_body() {
return "<body>
<div class='container'>
<div class='a'>A</div>
<div class='b'>B</div>
</div>
</body>";
}
$dom = new DOMDocument();
$dom->loadHTML(render_body());
// get body tag
$body = $dom->getElementsByTagName('body')->item(0);
// add a new element at the end of the body
$element = $dom->createElement('div', 'My new element at the end!');
$body->appendChild($element);
echo $dom->saveHTML(); // echo what is in the dom
?>
EDIT:
As per CD001's suggestions, I have tested this code and it works.
For a college project, I am creating a website with some back end algorithms and to test these in a demo environment I require a lot of fake data. To get this data I intend to scrape some sites. One of these sites is freelance.com.To extract the data I am using the Simple HTML DOM Parser but so far I have been unsuccessful in my efforts to actually get the data I need.
Here is an example of the HTML layout of the page I intend to scrape. The red boxes mark the required data.
Here is the code I have written so far after following some tutorials.
<?php
include "simple_html_dom.php";
// Create DOM from URL
$html = file_get_html('http://www.freelancer.com/jobs/Website-Design/1/');
//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table[id=project_table] tr') as $tr) {
foreach($tr->find('td[class=title-col]') as $t) {
//get the inner HTML
$data = $t->outertext;
echo $data;
}
}
?>
Hopefully someone can point me in the right direction as to how I can get this working.
Thanks.
The raw source code is different, that's why you're not getting the expected results...
You can check the raw source code using ctrl+u, the data are in table[id=project_table_static], and the cells td have no attributes, so, here's a working code to get all the URLs from the table:
$url = 'http://www.freelancer.com/jobs/Website-Design/1/';
// Create DOM from URL
$html = file_get_html($url);
//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table#project_table_static tbody tr') as $i=>$tr) {
// Skip the first empty element
if ($i==0) {
continue;
}
echo "<br/>\$i=".$i;
// get the first anchor
$anchor = $tr->find('a', 0);
echo " => ".$anchor->href;
}
// Clear dom object
$html->clear();
unset($html);
Demo
Folks,
I am using SIMPLEHTMLPARSER.
I am not able to parse HTML, When i var_dump the html document, it just shows the DOM structure and no HTML content.
$produrl = 'http://wap.ebay.com/Pages/ViewItem.aspx?aid=160586179890&sv=160586179890/';
var_dump(file_get_html($produrl));
$html = file_get_html($produrl);
var_dump($html->find('div[id=Teaser_Item] img[src]', 0));
Actually, what i want to extract is the IMG SRC which is:
http://wap.ebay.com/Pages/RbHttpHandler.ashx?width=51&height=240&fsize=999000&format=jpg&url=http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C!jEE2n%28iTLozBNwBPG0bUg~~0_1.JPG%3Fset_id%3D8800005007
can someone help me debugging this, please?
Cheers
Natasha Thomas
<?php
require_once('simple_html_dom.php');
$produrl = 'http://wap.ebay.com/Pages/ViewItem.aspx?aid=160586179890&sv=160586179890/';
// Grab the document
$html = file_get_html($produrl);
// Find the img tag in the Teaser_Item div
$a = $html->find('div[id=Teaser_Item] img', 0);
// Display the src
echo($a->attr['src']);
?>