Get all HTML comments on website with html simple dom - php

I've tried to grab all the comments from a website (The text between <!-- and -->), but without luck.
Here is my current code:
include('simple_html_dom.php');
$html = file_get_html('THE URL');
foreach($html->find('comment') as $element)
echo $element->plaintext;
Does anyone have any idea how to grab the comments? At the moment it only gives me a blank page.

I know regex is not supposed to be used to parse HTML, but you can use a regex similar to <!--(.*?)--> to find and fetch the comments.
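For what it's worth, here is a minimal sketch of that idea, shown next to a DOM-based alternative using PHP's built-in DOMDocument and DOMXPath. The $page string is made-up sample markup; for a live site you would presumably fill it with file_get_contents() on your URL. The "s" modifier on the regex is my addition so that comments spanning several lines still match.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$page = "<html><body><!-- first note --><p>Hello</p><!-- second\nnote --></body></html>";

// Option 1: regex. The "s" modifier lets "." match newlines,
// so multi-line comments are captured too.
preg_match_all('/<!--(.*?)-->/s', $page, $matches);
$regexComments = array_map('trim', $matches[1]);

// Option 2: DOMDocument + XPath, selecting comment nodes directly.
libxml_use_internal_errors(true); // real-world HTML is rarely valid
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$domComments = [];
foreach ($xpath->query('//comment()') as $node) {
    $domComments[] = trim($node->nodeValue);
}

print_r($regexComments);
print_r($domComments);
```

The regex version is shorter, but it will also match comment-like text inside scripts or attribute values, so the XPath version is probably the safer choice on messy pages.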

Related

how to parse this to get title in php

I want to parse the HTML code in $raw to get the title and save it to MySQL. I have tried to do it with PHP DOM and the Ganon HTML parser, but when I run it I get a 500 error. It would be great if you could solve this problem with Ganon.
function store($raw)
{
include_once('ganon.php');
$html = file_get_dom($raw);
echo $html('title', 0)->parent->getPlainText();
}
store ('<html> all html code </html>');
There are a few problems with your code.
Firstly, you use file_get_dom(), which expects to be passed a file name, so use str_get_dom() instead.
Secondly, the example HTML doesn't contain a title, so this won't work.
Then, when you find the title, you go to the parent element and output from there. You just need to use that node's content.
include_once('ganon.php');
function store($raw)
{
$html = str_get_dom($raw);
echo $html('title', 0)->getPlainText();
}
store ('<html><title>Title</title> all html code </html>');
outputs...
Title

How to get the desired innertext from an HTML tag in Simple HTML DOM

I have some text that contains HTML code. I want to get the text of the last link. Here is an example:
Some textBeezfeed.cu.ma<br>
another textGoogle.com<br>
I want to get the Google.com text from the above code. I have tried using Simple HTML DOM. Here is my code:
<?PHP
require_once('simple_html_dom.php');
$html = new simple_html_dom();
function tags($ddd){
$bbb=$ddd->find('a',1);
foreach($bbb as $bs){
echo $bs->innertext;
}
}
$html = str_get_html('Some textBeezfeed.cu.ma<br>
another textGoogle.com<br>');
echo tags($html);
?>
How can I get Google.com? Please help me.
I strongly recommend you use an external library to parse HTML, whatever HTML you need to handle, today or in the future.
Some very good tools are named in this Stack Overflow post.
I have personally used simplehtmldom.sourceforge.net for ages with very good results.
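As a rough illustration of the parser approach, here is a sketch that reads the text of the last link using PHP's built-in DOMDocument. The anchor tags and their href values are my assumption, since the sample in the question lost its markup; only the link texts come from the question.

```php
<?php
// Hypothetical markup: the <a> tags and href values are placeholders.
$snippet = 'Some text<a href="#">Beezfeed.cu.ma</a><br>'
         . 'another text<a href="#">Google.com</a><br>';

libxml_use_internal_errors(true); // tolerate fragment-style HTML
$doc = new DOMDocument();
$doc->loadHTML($snippet);
libxml_clear_errors();

// Collect every <a> element and take the last one, if there is any.
$links = $doc->getElementsByTagName('a');
$lastLinkText = $links->length > 0
    ? $links->item($links->length - 1)->textContent
    : null;

echo $lastLinkText, PHP_EOL;
```

If you stay with Simple HTML DOM, note that find('a', 1) returns a single element rather than a list, so there is nothing to loop over; as far as I know, passing a negative index such as find('a', -1) selects the last match.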

How do I find a tag with simple_html_dom

I'm trying to use simple_html_dom with PHP to parse a webpage with this tag:
<div class=" row result" id="p_a8a968e2788dad48" data-jk="a8a968e2788dad48" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">
where data-tn-component="organicJob" is the identifier I want to parse on. I can't seem to specify the text in a way that simple_html_dom recognizes.
I've tried a few things along these lines:
<?PHP
include 'simple_html_dom.php';
$f="http://www.indeed.com/jobs?q=Electrician&l=maine";
$html->load_file($f);
foreach($html->find('div[data-tn-component="organicJob"]') as $div)
{
echo $div->innertext ;
}
?>
but the parser doesn't find any of the results, even though I know they are there. I'm probably not specifying the thing I want to find correctly.
I'm looking at the API, but I still don't understand how to format the find string.
What am I doing wrong?
Your selector is correct, but I see other problems in your code.
1) You are missing .php in your include (include 'simple_html_dom';). It should be:
include '/absolute_path/simple_html_dom.php';
2) To load content from a URL, use the file_get_html() function instead of $html->load_file($f);, which fails because PHP doesn't know that $html is a simple_html_dom object:
$html = file_get_html('http://www.google.com/');
// then only call
$html->find( ...
3) In the page at your link, http://www.indeed.com/jobs?q=Electrician+Helper&l=maine, there is no element with a data-tn-component attribute.
So the final code should be:
include '/absolute_path/simple_html_dom.php';
$html = file_get_html('http://www.indeed.com/jobs?q=Electrician&l=maine');
foreach($html->find('div[data-tn-component="organicJob"]') as $div)
{
echo $div->innertext ;
}
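If you want to check the attribute selector without hitting the live site, something like the following sketch might help. It uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom, and the $page markup is a made-up sample modelled on the tag in the question.

```php
<?php
// Hypothetical sample markup modelled on the tag from the question.
$page = '<div class="row result" data-tn-component="organicJob">Job A</div>'
      . '<div class="row">Not a job</div>'
      . '<div class="row result" data-tn-component="organicJob">Job B</div>';

libxml_use_internal_errors(true); // suppress warnings for imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

// XPath equivalent of the CSS selector div[data-tn-component="organicJob"].
$xpath = new DOMXPath($doc);
$jobs = [];
foreach ($xpath->query('//div[@data-tn-component="organicJob"]') as $div) {
    $jobs[] = trim($div->textContent);
}

print_r($jobs);
```

If this finds the divs in a saved copy of the page but not in the fetched one, the attribute is most likely missing from the HTML the server actually sends you, which matches point 3.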

Using cURL or DOM to webscrape

I've been working on this for about four hours and have been all over the internet trying to understand it, so please be gentle.
I'd like to display a div from an external source on my PHP page. I've tried using file_get_dom, simplexml_load_file, and file_get_contents with preg_match_all, then printing the results on my page, but they don't work. cURL is over my head from what I have seen and I can't understand any of it, but I've been told it is the best way to do it. They all result in various errors when all I want is to grab the contents of an external div. What should I do?
An example would be scraping the div id='hmenus' on this page, then displaying it on my local page.
Thanks!
If cURL is over your head, then perhaps try Simple HTML DOM:
$html = file_get_html($url);
echo $html->find('div[id=hmenus]', 0);

Php file_get_contents() issue

With PHP's file_get_contents() I want only the post and image, but it gets the whole page. (I know there are other ways to do this.)
Example:
$homepage = file_get_contents('http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5',
true);
echo $homepage;
It shows the full page. Is there any way to show only the post for cid=2&id=221107&hb=5?
Thanks a lot.
Use PHP's DomDocument to parse the page. You can filter it more if you wish, but this is the general idea.
$url = 'http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5';
// Create new DomDocument
$doc = new DomDocument();
$doc->loadHTMLFile($url);
// Get the post
$post = $doc->getElementById('opage_mid_left');
var_dump($post);
Update:
Unless the image is a requirement, I'd use the printer-friendly version: http://www.bdnews24.com/pdetails.php?id=221107, it's much cleaner.
You will need to parse the resulting HTML using a DOM parser to get the HTML of only the part you want. I like PHP Simple HTML DOM Parser, but as Paul pointed out, PHP also has its own.
You can extract the
<div id="page">
//POST AND IMAGE EXIST HERE
</div>
part from the fetched contents using a regex and output it on your page.
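As an alternative to a regex, here is a small sketch that pulls out just that div with DOMDocument and turns it back into HTML with saveHTML(). The $page string is made-up sample markup; the id "page" is taken from the snippet above, and the real page may well use a different id.

```php
<?php
// Hypothetical sample markup; a real page would come from file_get_contents().
$page = '<html><body><div id="header">Menu</div>'
      . '<div id="page"><p>Post text</p><img src="photo.jpg" alt="photo"></div>'
      . '</body></html>';

libxml_use_internal_errors(true); // suppress warnings for imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

// Look up the wrapper div and serialize only that node.
$post = $doc->getElementById('page');
$fragment = $post !== null ? $doc->saveHTML($post) : '';

echo $fragment, PHP_EOL;
```

Passing the node to saveHTML() returns the markup for that element and its children only, so the rest of the page is left out.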
