I'm using the simple HTML DOM parser for my own template system and found a problem.
Here's my markup:
<div class=content>
<div class=navigation></div>
</div>
I'm replacing the div.navigation with my own content like this:
$navi= $dom->find("div.navigation",0);
$navi->outertext = "<a class=aNavi>click me!</a>";
This works nicely and I can echo the result. The problem is that before echoing, I still want to access and manipulate that link with the parser, but the parser won't find it.
$link = $dom->find("a.aNavi");
This returns null :(
It seems the parser needs to be refreshed or updated after changing the outertext. Any ideas whether that's possible?
I don't see any createElement-like method in the API reference, which means either the documentation is incomplete or you're using the wrong tool for the job.
I suggest using DOMDocument, and the DOMDocument::createElement() method. However, if you're dead set on using Simple HTML DOM Parser, you could try this hack:
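$navi = $dom->find('div.navigation', 0);
$navi->outertext = '<a class="aNavi">click me!</a>';
// Serialize the modified document and parse it again so the new link becomes part of the tree
$html = $dom->save();
$dom = str_get_html($html);
// Without an index, find() returns an array of matching elements
$link = $dom->find('a.aNavi');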
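For comparison, here is a minimal sketch of the DOMDocument route suggested above. It assumes the same markup as in the question (with quoted attributes), and the href value is only a hypothetical example of a later manipulation. The new node is part of the live tree, so no re-parsing is needed:

```php
<?php
// Sketch using PHP's built-in DOM extension instead of Simple HTML DOM.
$markup = '<div class="content"><div class="navigation"></div></div>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Locate the placeholder div and build its replacement.
$navi = $xpath->query('//div[@class="navigation"]')->item(0);
$link = $doc->createElement('a', 'click me!');
$link->setAttribute('class', 'aNavi');
$navi->parentNode->replaceChild($link, $navi);

// The replacement can be queried and changed right away.
$found = $xpath->query('//a[@class="aNavi"]')->item(0);
$found->setAttribute('href', '#top'); // hypothetical follow-up change

// Serialize only the outer div, not the html/body wrapper added by loadHTML().
$result = $doc->saveHTML($doc->getElementsByTagName('div')->item(0));
echo $result, PHP_EOL;
```

Passing a node to saveHTML() keeps the DOCTYPE and the html/body wrapper out of the output.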
Related
I have some text that contains HTML. I want to get the text of the last link. Here is an example:
Some textBeezfeed.cu.ma<br>
another textGoogle.com<br>
I want to get the text Google.com from the above code. I have tried using Simple HTML DOM. Here is my code:
<?PHP
require_once('simple_html_dom.php');
$html = new simple_html_dom();
function tags($ddd){
$bbb=$ddd->find('a',1);
foreach($bbb as $bs){
echo $bs->innertext;
}
}
$html = str_get_html('Some textBeezfeed.cu.ma<br>
another textGoogle.com<br>');
echo tags($html);
?>
How can I get Google.com? Please help.
I strongly recommend using an external library to parse HTML, whatever HTML you need to handle now or in the future.
Some very good tools are named in this Stack Overflow post.
I have personally used simplehtmldom.sourceforge.net for ages with very good results.
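As a rough sketch of one way to get the last link's text, here is a version using PHP's built-in DOMDocument. The sample in the question seems to have lost its <a> tags, so the markup below is my assumption of what it looked like, and the href values are placeholders:

```php
<?php
// Assumed markup: the <a> tags and "#" hrefs are placeholders, not from the question.
$markup = 'Some text<a href="#">Beezfeed.cu.ma</a><br>'
        . 'another text<a href="#">Google.com</a><br>';

// Suppress warnings that libxml raises for fragments and sloppy HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

// Take the last <a> element, if there is one.
$anchors = $doc->getElementsByTagName('a');
$lastText = $anchors->length > 0
    ? $anchors->item($anchors->length - 1)->textContent
    : null;

echo $lastText, PHP_EOL;
```

As far as I know, Simple HTML DOM can do the same with $html->find('a', -1). Also note that when an index is passed, find() returns a single element rather than an array, so the foreach in the question has nothing suitable to loop over.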
I'm trying to use simple_html_dom with PHP to parse a webpage with this tag:
<div class=" row result" id="p_a8a968e2788dad48" data-jk="a8a968e2788dad48" itemscope itemtype="http://schema.org/JobPosting" data-tn-component="organicJob">
where data-tn-component="organicJob" is the identifier I want to select on. I can't seem to specify the selector in a way that simple_html_dom recognizes.
I've tried a few things along these lines:
<?PHP
include 'simple_html_dom.php';
$f="http://www.indeed.com/jobs?q=Electrician&l=maine";
$html->load_file($f);
foreach($html->find('div[data-tn-component="organicJob"]') as $div)
{
echo $div->innertext ;
}
?>
but the parser doesn't find any of the results, even though I know they are there. I'm probably not specifying the selector correctly.
I'm looking at the API, but I still don't understand how to format the find string.
What am I doing wrong?
Your selector is correct, but I see other problems in your code:
1) You are missing .php in your include: include 'simple_html_dom'; should be
include '/absolute_path/simple_html_dom.php';
2) To load content from a URL, use the file_get_html function instead of $html->load_file($f);, which fails because $html was never created as a simple_html_dom object:
$html = file_get_html('http://www.google.com/');
// then only call
$html->find( ...
3) In the link you provided, http://www.indeed.com/jobs?q=Electrician+Helper&l=maine, there is no element with a data-tn-component attribute.
So the final code should be:
include '/absolute_path/simple_html_dom.php';
// file_get_html() fetches the URL and returns the parsed document
$html = file_get_html('http://www.indeed.com/jobs?q=Electrician&l=maine');
foreach($html->find('div[data-tn-component="organicJob"]') as $div)
{
echo $div->innertext ;
}
I am studying HTML parsing in PHP, and I am using DOM for this.
I wrote this code in my PHP file:
<?php
$site = new DOMDocument();
$div = $site->createElement("div");
$class = $site->createAttribute("class");
$class->nodeValue = "wrapper";
$div->appendChild($class);
$site->appendChild($div);
$html = $site->saveHTML();
echo $html;
?>
When I run this in the browser and view the page source, only this comes out:
<div class="wrapper"></div>
I don't know why it isn't showing the whole HTML document that it is supposed to. I am using XAMPP v3.2.1.
Please tell me where I went wrong. Thanks.
It's showing the whole HTML you created: a div node with a wrapper class attribute.
See the example in the docs. There the html, head, etc. nodes are explicitly created.
PHP only adds missing DOCTYPE, html and body elements when loading HTML, not when saving.
Adding $site->loadHTML($site->saveHTML()); before $html = $site->saveHTML(); will demonstrate this.
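Here is a small sketch of that demonstration. It builds the same div as in the question (using setAttribute for brevity, which is my substitution) and compares the output before and after a round trip through loadHTML():

```php
<?php
// Build the document exactly as created by hand: a single div.
$site = new DOMDocument();
$div = $site->createElement('div');
$div->setAttribute('class', 'wrapper');
$site->appendChild($div);

// Saving does not add anything that was not created explicitly.
$before = $site->saveHTML();

// Loading the same markup lets the HTML parser add the missing structure.
$site->loadHTML($before);
$after = $site->saveHTML();

echo $before, "----\n", $after;
```

The first output should contain only the div, while the second should also contain the DOCTYPE and the html and body wrappers that the parser added on load.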
With PHP's file_get_contents() I want to get only the post and image, but it gets the whole page. (I know there are other ways to do this.)
Example:
$homepage = file_get_contents('http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5',
true);
echo $homepage;
It shows the full page. Is there any way to show only the post with cid=2&id=221107&hb=5?
Thanks a lot.
Use PHP's DomDocument to parse the page. You can filter it more if you wish, but this is the general idea.
$url = 'http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5';
// Create new DomDocument
$doc = new DomDocument();
$doc->loadHTMLFile($url);
// Get the post
$post = $doc->getElementById('opage_mid_left');
var_dump($post);
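To print just the markup of that element rather than dumping the object, the node can be passed to saveHTML(). The sketch below uses a hypothetical inline page in place of the fetched one. Only the opage_mid_left id comes from the code above, the rest of the markup is made up for illustration:

```php
<?php
// Hypothetical stand-in for the fetched page; only the id is taken from the answer.
$page = '<html><body><div id="header">Menu</div>'
      . '<div id="opage_mid_left"><h1>Title</h1>'
      . '<img src="photo.jpg" alt=""><p>Post text</p></div>'
      . '</body></html>';

// Real pages are rarely valid, so collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

// Serialize only the matched element and its children.
$post = $doc->getElementById('opage_mid_left');
$postHtml = $post !== null ? $doc->saveHTML($post) : '';

echo $postHtml, PHP_EOL;
```

The null check matters in practice, because getElementById() returns null when the id is not present on the page.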
Update:
Unless the image is a requirement, I'd use the printer-friendly version: http://www.bdnews24.com/pdetails.php?id=221107, it's much cleaner.
You will need to parse the resulting HTML using a DOM parser to get the HTML of only the part you want. I like PHP Simple HTML DOM Parser, but as Paul pointed out, PHP also has its own.
You can extract the
<div id="page">
//POST AND IMAGE EXIST HERE
</div>
part from the fetched contents using a regex and put it on your page.
First, I know that I can get the HTML of a webpage with:
file_get_contents($url);
What I am trying to do is get a specific link element in the page (found in the head).
e.g:
<link type="text/plain" rel="service" href="/service.txt" /> (the element could close with just >)
My question is: How can I get that specific element with the "rel" attribute equal to "service" so I can get the href?
My second question is: Should I also get the "base" element? Does it apply to the "link" element? I am trying to follow the standard.
Also, the HTML might have errors. I don't have control over how my users code their stuff.
Using PHP's DOMDocument, this should do it (untested):
$doc = new DOMDocument();
// $file is the HTML string returned by file_get_contents($url)
$doc->loadHTML($file);
$head = $doc->getElementsByTagName('head')->item(0);
$links = $head->getElementsByTagName("link");
foreach($links as $l) {
if($l->getAttribute("rel") == "service") {
echo $l->getAttribute("href");
}
}
You should get the base element too, but make sure you know how it works and what its scope is.
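Here is a slightly more defensive sketch of the same idea. It tolerates broken markup, treats rel as a space-separated list, and also reads the base element. The sample HTML and the example.com URL are hypothetical:

```php
<?php
// Hypothetical page with a base element and deliberately sloppy markup.
$file = '<html><head><base href="http://example.com/app/">'
      . '<link type="text/plain" rel="service" href="/service.txt">'
      . '</head><body><p>unclosed</body></html>';

// Collect parser warnings silently, since user-supplied HTML may contain errors.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($file);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// rel may hold several space-separated tokens, so compare token by token.
$serviceHref = null;
foreach ($xpath->query('//head/link[@rel and @href]') as $link) {
    $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
    if (in_array('service', $rels, true)) {
        $serviceHref = $link->getAttribute('href');
        break;
    }
}

// The base href, if present, is what relative URLs in the page resolve against.
$base = $xpath->query('//head/base[@href]')->item(0);
$baseHref = $base !== null ? $base->getAttribute('href') : null;

echo $serviceHref, ' (base: ', $baseHref, ')', PHP_EOL;
```

As I understand the HTML standard, base does apply to link, so the href should be resolved against the base URL when one is present and against the page URL otherwise. In this sample the root-relative /service.txt would resolve to http://example.com/service.txt. The resolution step itself is left out of the sketch.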
In truth, when I have to screen-scrape, I use phpquery. This is an older PHP port of jQuery, and while that may sound like something of a dumb concept, it is awesome for document traversal and doesn't require well-formed XHTML.
http://code.google.com/p/phpquery/
I'm working with Selenium under Java for Web-Application-Testing. It provides very nice features for document traversal using CSS-Selectors.
Have a look at How to use Selenium with PHP.
But this setup might be too complex for your needs if you only want to extract this one link.