I can't work out how to get the contents of the second span 'cantfindme' using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/manual.htm). Using the code below I can get the contents of the first span 'dontneedme', but I can't seem to get anything from the second span at all.
$html = str_get_html('<html><body><table><tr><td class="bar">bar</td><td><div class="foo"><span class="dontneedme">Hello</span></div></td></tr><tr><td class="bar">bar</td><td><div class="foo"><span class="cantfindme">Goodbye</span></div></td></tr></body></html>');
foreach($html->find('.foo', 0) as $article)
{
echo "++</br>";
echo $article->plaintext;
echo "--</br>";
}
Can anyone see where I'm going wrong?
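Passing 0 as the second argument makes find() return a single element instead of an array, so the foreach is not looping over the matched elements. Try using this selector instead.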
$html->find('div.foo .cantfindme');
Check out the documentation for more examples.
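For comparison, here is a minimal sketch of the same lookup using PHP's built-in DOM extension rather than Simple HTML DOM. This is a swapped-in technique, chosen only because it runs without any extra files. The XPath expression and the variable names are assumptions of mine, not something from the question.

```php
<?php
// Same markup as in the question, with the missing </table> added.
$markup = '<html><body><table>'
    . '<tr><td class="bar">bar</td><td><div class="foo"><span class="dontneedme">Hello</span></div></td></tr>'
    . '<tr><td class="bar">bar</td><td><div class="foo"><span class="cantfindme">Goodbye</span></div></td></tr>'
    . '</table></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($markup);

$xpath = new DOMXPath($doc);
// Rough XPath counterpart of the CSS selector 'div.foo .cantfindme'.
// It matches the class attribute exactly, which is enough here because
// each element carries a single class.
$nodes = $xpath->query('//div[@class="foo"]//span[@class="cantfindme"]');

// Collect the text of every match.
$texts = [];
foreach ($nodes as $node) {
    $texts[] = $node->textContent;
}

echo implode("\n", $texts), "\n";
```

With Simple HTML DOM itself, the selector from the answer plays the role of the XPath query, and the loop over the returned array stays the same.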
I want to parse the HTML in $raw to get the title and save it to MySQL. I have tried to do it with PHP DOM and the Ganon HTML parser, but when I run it I get a 500 error. It would be great if you could solve this problem with Ganon.
function store($raw)
{
include_once('ganon.php');
$html = file_get_dom($raw);
echo $html('title', 0)->parent->getPlainText();
}
store ('<html> all html code </html>');
There are a few problems with your code.
Firstly, you use file_get_dom(), which expects to be passed a file name, so use str_get_dom() instead.
Secondly, the example HTML doesn't contain a title, so this won't work.
Then, once you find the title, you go to its parent element and output from there. You just need to use that node's content.
include_once('ganon.php');
function store($raw)
{
$html = str_get_dom($raw);
echo $html('title', 0)->getPlainText();
}
store ('<html><title>Title</title> all html code </html>');
outputs...
Title
I'm using PHP and simple HTML DOM Parser to try and grab song lyrics from a website. The song lyrics are held in a div with the class "lyrics". Here's the code I'm using to try and grab the div and display it. Currently it only returns "Array" onto my webpage. When I jsonify the array I can see that the array is empty.
<?php
include('simple_html_dom.php');
$data = file_get_contents("https://example.com/songlyrics");
$html = str_get_html($data);
$lyr = $html->find('div.lyrics');
echo $lyr;
?>
I know that the Simple HTML Dom Parser is being included correctly, and I have no problem displaying the full webpage when I echo $html with some small changes to the code, however I can't seem to echo just this div. Is there something wrong with my code? Why is $lyr returning an array?
There's nothing wrong with your code.
Why is $lyr returning an array?
It's because a class can be used multiple times on a page, so find() returns an array of every matching element. If you var_dump($lyr) instead, you should see all the div elements found with that class name.
You can either echo $lyr[0], or call $html->find('div.lyrics', 0) to select a specific div element.
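The difference between "all matches" and "one match" can be sketched without the library, using PHP's built-in DOM extension as a stand-in. The sample markup below is hypothetical, since the real page's HTML is not shown in the question, and the variable names are assumptions too.

```php
<?php
// Hypothetical stand-in for the downloaded page.
$page = '<html><body>'
    . '<div class="lyrics">First verse</div>'
    . '<div class="lyrics">Second verse</div>'
    . '</body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($page);

$xpath = new DOMXPath($doc);
$all = $xpath->query('//div[@class="lyrics"]'); // a list of every match
$first = $all->item(0);                         // a single element, or null

// A list cannot be echoed directly; pick one element and echo its text.
echo $all->length, "\n";
echo $first !== null ? $first->textContent : '(no match)', "\n";
```

The null check matters for the same reason as in the question: if nothing matches, there is no first element to print.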
I have some text that contains HTML code, and I want to get the text of the last link. Here is an example:
Some textBeezfeed.cu.ma<br>
another textGoogle.com<br>
I want to get the Google.com text from the above code. I have tried using Simple HTML DOM. Here is my code:
<?PHP
require_once('simple_html_dom.php');
$html = new simple_html_dom();
function tags($ddd){
$bbb=$ddd->find('a',1);
foreach($bbb as $bs){
echo $bs->innertext;
}
}
$html = str_get_html('Some textBeezfeed.cu.ma<br>
another textGoogle.com<br>');
echo tags($html);
?>
How can I get Google.com? Please help me.
I strongly recommend using an external library to parse HTML, whatever HTML you need to handle now or in the future.
Some very good tools are named in this Stack Overflow post.
I have personally used simplehtmldom.sourceforge.net for ages with very good results.
I'm trying to use simple_html_dom with PHP to parse a webpage with this tag:
<div class=" row result" id="p_a8a968e2788dad48" data-jk="a8a968e2788dad48" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">
where data-tn-component="organicJob" is the identifier I want to match on. I can't seem to specify the selector in a way that simple_html_dom recognizes.
I've tried a few things along these lines:
<?PHP
include 'simple_html_dom.php';
$f="http://www.indeed.com/jobs?q=Electrician&l=maine";
$html->load_file($f);
foreach($html->find('div[data-tn-component="organicJob"]') as $div)
{
echo $div->innertext ;
}
?>
but the parser doesn't find any of the results, even though I know they are there. I'm probably not specifying what I want to find correctly.
I'm looking at the API, but I still don't understand how to format the find string.
What am I doing wrong?
Your selector is correct, but I see other problems in your code:
1) You are missing .php in your include: include 'simple_html_dom'; should be
include '/absolute_path/simple_html_dom.php';
2) To load content from a URL, use the file_get_html() function instead of $html->load_file($f);, which is wrong because PHP doesn't know that $html is a simple_html_dom object:
$html = file_get_html('http://www.google.com/');
// then only call
$html->find( ...
3) In the link you provided (http://www.indeed.com/jobs?q=Electrician+Helper&l=maine) there is no element with a data-tn-component attribute.
So the final code should be:
include '/absolute_path/simple_html_dom.php';
// file_get_html() fetches the URL and returns a simple_html_dom object,
// so no separate load_file() call is needed
$html = file_get_html('http://www.indeed.com/jobs?q=Electrician&l=maine');
foreach($html->find('div[data-tn-component="organicJob"]') as $div)
{
echo $div->innertext;
}
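To check an attribute selector like this without depending on the live site, it can help to run it against a small local string first. The sketch below does that with PHP's built-in DOM extension instead of simple_html_dom (a swapped-in technique). The opening tag is the one from the question; the inner text and the second div are made up for illustration.

```php
<?php
// The first <div> reuses the tag from the question; its content and the
// second <div> are hypothetical.
$sample = '<html><body>'
    . '<div class=" row result" id="p_a8a968e2788dad48" data-jk="a8a968e2788dad48" '
    . 'itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">Electrician job</div>'
    . '<div class="row result" id="other">Sponsored job</div>'
    . '</body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($sample);

$xpath = new DOMXPath($doc);
// XPath counterpart of the CSS selector div[data-tn-component="organicJob"].
$divs = $xpath->query('//div[@data-tn-component="organicJob"]');

$ids = [];
foreach ($divs as $div) {
    $ids[] = $div->getAttribute('id');
    echo $div->textContent, "\n";
}
```

If the selector works on the local string but not on the fetched page, the downloaded HTML most likely does not contain the attribute, which is the situation point 3 describes.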
I've tried to grab all the comments from a website (The text between <!-- and -->), but without luck.
Here is my current code:
include('simple_html_dom.php');
$html = file_get_html('THE URL');
foreach($html->find('comment') as $element)
echo $element->plaintext;
Does anyone have any ideas how to grab the comments? At the moment it's only giving me a blank page.
I know regex is not supposed to be used to parse HTML, but you can use a regex similar to <!--(.*?)--> to find and fetch the comments. Add the s modifier if the comments can span multiple lines.
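A minimal sketch of that regex approach with preg_match_all() follows. The sample string is hypothetical, and the s modifier is my addition so that the dot also matches newlines inside multi-line comments. Like any regex over HTML, this can misfire on edge cases, such as a comment-like sequence inside a script block.

```php
<?php
// Hypothetical page content with a one-line and a multi-line comment.
$page = "<html><!-- first note --><body><p>Hi</p><!--\nsecond\nnote\n--></body></html>";

// (.*?) is non-greedy, so each match stops at the nearest closing "-->".
// The s modifier lets "." match newlines as well.
preg_match_all('/<!--(.*?)-->/s', $page, $found);

// $found[1] holds the captured comment bodies; trim surrounding whitespace.
$comments = array_map('trim', $found[1]);

print_r($comments);
```

For a real page, $page would come from file_get_contents() or a similar call instead of a literal string.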