I want to parse the HTML in $raw to get the title and save it to MySQL. I have tried PHP DOM and the Ganon HTML parser, but when I run the code below I get a 500 error. It would be great if you could solve this with Ganon.
function store($raw)
{
include_once('ganon.php');
$html = file_get_dom($raw);
echo $html('title', 0)->parent->getPlainText();
}
store ('<html> all html code </html>');
There are a few problems with your code.
Firstly, you use file_get_dom(), which expects to be passed a file name, so use str_get_dom() instead.
Secondly, the example HTML doesn't contain a title, so this won't work.
Then, when you find the title, you go to the parent element and output from there. You just need to use that node's content.
include_once('ganon.php');
function store($raw)
{
$html = str_get_dom($raw);
echo $html('title', 0)->getPlainText();
}
store ('<html><title>Title</title> all html code </html>');
outputs...
Title
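Since the question also mentions trying PHP's DOM extension, here is a minimal sketch of the same title lookup using the built-in DOMDocument class instead of Ganon. The function name extract_title is hypothetical, and the MySQL step is left out.

```php
<?php
// Hypothetical helper: returns the text of the first <title> element, or null if none exists.
function extract_title(string $raw): ?string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($raw);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    return $titles->length > 0 ? trim($titles->item(0)->textContent) : null;
}

echo extract_title('<html><head><title>Title of page</title></head><body>all html code</body></html>');
```

Returning null when no title exists lets the caller decide what to store, rather than failing on a missing node as the original example input would.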
I'm using PHP and Simple HTML DOM Parser to try and grab song lyrics from a website. The song lyrics are held in a div with the class "lyrics". Here's the code I'm using to try and grab the div and display it. Currently it only prints "Array" on my webpage. When I JSON-encode the array I can see that it is empty.
<?php
include('simple_html_dom.php');
$data = file_get_contents("https://example.com/songlyrics");
$html = str_get_html($data);
$lyr = $html->find('div.lyrics');
echo $lyr;
?>
I know that the Simple HTML Dom Parser is being included correctly, and I have no problem displaying the full webpage when I echo $html with some small changes to the code, however I can't seem to echo just this div. Is there something wrong with my code? Why is $lyr returning an array?
There's nothing wrong with your code.
Why is $lyr returning an array?
It's because find() returns an array of every match when no index is given, since a class can be used multiple times on a page. If you var_dump($lyr) instead, you should see all the div elements found with that class name.
You can either echo $lyr[0] or call $html->find('div.lyrics', 0) to select a specific div element.
I'm trying to use simple_html_dom with PHP to parse a webpage with this tag:
<div class=" row result" id="p_a8a968e2788dad48" data-jk="a8a968e2788dad48" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">
where data-tn-component="organicJob" is the identifier I want to parse based on. I can't seem to specify the selector in a way that simple_html_dom recognizes.
I've tried a few things along these lines:
<?PHP
include 'simple_html_dom.php';
$f="http://www.indeed.com/jobs?q=Electrician&l=maine";
$html->load_file($f);
foreach($html->find('div[data-tn-component="organicJob"]') as $div)
{
echo $div->innertext ;
}
?>
but the parser doesn't find any of the results, even though I know they are there. I'm probably not specifying the thing I want to find correctly.
I'm looking at the API, but I still don't understand how to format the find string.
What am I doing wrong?
Your selector is correct, but I see other problems in your code.
1) You are missing .php in your include: include 'simple_html_dom'; should be
include '/absolute_path/simple_html_dom.php';
2) To load content from a URL, use the file_get_html() function instead of $html->load_file($f);, which fails because $html has not been created as a simple_html_dom object at that point.
$html = file_get_html('http://www.google.com/');
// then only call
$html->find( ...
3) In the link you provided, http://www.indeed.com/jobs?q=Electrician+Helper&l=maine, there is no element with a data-tn-component attribute.
So the final code should be
include '/absolute_path/simple_html_dom.php';
$html = file_get_html('http://www.indeed.com/jobs?q=Electrician&l=maine');
foreach($html->find('div[data-tn-component="organicJob"]') as $div)
{
echo $div->innertext ;
}
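To check an attribute selector like this without depending on a live page, one option is to run it against a small inline document. The sketch below uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom, so the XPath expression is the counterpart of the CSS-style selector, not the same syntax. The sample markup, including the "sponsoredJob" value, is made up for illustration.

```php
<?php
// Made-up sample markup; only the first div carries the attribute value we want.
$markup = '<html><body>'
    . '<div class="row result" data-tn-component="organicJob">Electrician</div>'
    . '<div class="row result" data-tn-component="sponsoredJob">Advert</div>'
    . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Collect the text of every div whose data-tn-component attribute equals "organicJob".
$jobs = [];
foreach ($xpath->query('//div[@data-tn-component="organicJob"]') as $div) {
    $jobs[] = $div->textContent;
}

print_r($jobs);
```

If the same selector matches here but not on the real page, the attribute is probably absent from the HTML the server actually returns.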
I am studying HTML parsing in PHP and I am using DOM for this.
I wrote this code in my PHP file:
<?php
$site = new DOMDocument();
$div = $site->createElement("div");
$class = $site->createAttribute("class");
$class->nodeValue = "wrapper";
$div->appendChild($class);
$site->appendChild($div);
$html = $site->saveHTML();
echo $html;
?>
When I run this in the browser and view the page source, only this code comes out:
<div class="wrapper"></div>
I don't know why it is not showing the whole HTML document that it is supposed to produce. I am using XAMPP v3.2.1.
Please tell me where I went wrong. Thanks.
It's showing the whole HTML you created: a div node with a wrapper class attribute.
See the example in the docs. There the html, head, etc. nodes are explicitly created.
PHP only adds missing DOCTYPE, html and body elements when loading HTML, not when saving.
Adding $site->loadHTML($site->saveHTML()); before $html = $site->saveHTML(); will demonstrate this.
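The round trip described above can be sketched as follows. The variable names $before and $after are only for illustration, and the exact DOCTYPE text that appears may depend on the libxml version in use.

```php
<?php
$site = new DOMDocument();
$div = $site->createElement('div');
$div->setAttribute('class', 'wrapper');
$site->appendChild($div);

// Saving directly serializes only the nodes that were created by hand.
$before = $site->saveHTML();

// Loading that markup back runs it through the HTML parser,
// which wraps the fragment in the missing html and body elements.
$site->loadHTML($before);
$after = $site->saveHTML();

echo $before, "\n", $after;
```

Comparing the two strings shows that the wrapping elements come from the parsing step, not from saveHTML() itself.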
I've tried to grab all the comments from a website (the text between <!-- and -->), but without luck.
Here is my current code:
include('simple_html_dom.php');
$html = file_get_html('THE URL');
foreach($html->find('comment') as $element)
echo $element->plaintext;
Does anyone have any idea how to grab the comments? At the moment it's only giving me a blank page.
I know regex is not supposed to be used to parse HTML, but you can use a regex similar to <!--(.*?)--> to find and fetch the comments.
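A minimal sketch of that regex approach with preg_match_all is shown here. The sample page string is made up, and the s modifier is an addition to the pattern above so that comments spanning several lines are matched as well.

```php
<?php
// Made-up sample page containing a one-line and a multi-line comment.
$page = "<html><body><!-- first comment --><p>Text</p><!--\nsecond\ncomment\n--></body></html>";

// The s modifier lets . match newlines; the lazy .*? stops at the nearest -->.
preg_match_all('/<!--(.*?)-->/s', $page, $matches);

// $matches[1] holds the captured text between the comment delimiters.
$comments = array_map('trim', $matches[1]);

print_r($comments);
```

This treats the page as plain text, so it would also pick up comment-like sequences inside script blocks or attribute values.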
I can't work out how to get the contents of the second span, 'cantfindme', using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/manual.htm). Using the code below I can get the contents of the first span, 'dontneedme'. I can't seem to get anything from the second span at all.
$html = str_get_html('<html><body><table><tr><td class="bar">bar</td><td><div class="foo"><span class="dontneedme">Hello</span></div></td></tr><tr><td class="bar">bar</td><td><div class="foo"><span class="cantfindme">Goodbye</span></div></td></tr></body></html>');
foreach($html->find('.foo', 0) as $article)
{
echo "++</br>";
echo $article->plaintext;
echo "--</br>";
}
Can anyone see where I'm going wrong?
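Try using this selector. Your call find('.foo', 0) returns only the first matching element, so the second div is never visited.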
$html->find('div.foo .cantfindme');
Check out the documentation for more examples.
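For comparison, the same nested lookup can be sketched with PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom. The XPath expression below is a rough counterpart of the descendant selector, assuming each element carries exactly one class as in the question's markup.

```php
<?php
// The markup from the question, with the table closed properly.
$markup = '<html><body><table>'
    . '<tr><td class="bar">bar</td><td><div class="foo"><span class="dontneedme">Hello</span></div></td></tr>'
    . '<tr><td class="bar">bar</td><td><div class="foo"><span class="cantfindme">Goodbye</span></div></td></tr>'
    . '</table></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Any span with class "cantfindme" nested anywhere inside a div with class "foo".
$texts = [];
foreach ($xpath->query('//div[@class="foo"]//span[@class="cantfindme"]') as $span) {
    $texts[] = $span->textContent;
}

print_r($texts);
```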