I have used this script, which I found on the official Simple HTML DOM site, to find hyperlinks on a website:
foreach($html->find('a') as $element)
echo $element->href . '<br>';
It returned all the links found on the website, but I want only specific links.
Is there a way of doing this in Simple HTML DOM? This is the HTML code for those specific links:
<a class="z" href="http://www.bbc.co.uk/news/world-middle-east-16893609" target="_blank" rel="follow">middle east</a>
where this is the part of the tag that differs from the other hyperlinks:
<a class="z"
Also, is there any way I can get the link text ("middle east") together with the link?
I understand you'd like all a elements with the class z? You can do that like this:
foreach($html->find('a.z') as $element)
You can get an element's value (which for links will be the link text) with the plaintext property:
$element->plaintext
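For example, a minimal sketch putting the two together (the URL below is a placeholder):
<?php
include_once('simple_html_dom.php');
// Print the href and the link text of every <a class="z"> element on the page
$html = file_get_html('http://www.example.com/');
foreach ($html->find('a.z') as $element) {
    echo $element->href . ' - ' . $element->plaintext . '<br>';
}
?>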
Please note that this can all be found in the manual.
I have no idea what to do about this, and I'm probably going to get some downvotes.
I have a web page similar to this:
<li class="specific-class">
    <a href="...">Unknown Link</a>
</li>
I want to crawl a page filled with several other elements I'm not interested in retrieving.
I want to retrieve only the href attribute of the anchor tag within the li element, and nothing else. After that, I will follow the link and get another web page that has something like this:
<h1 class="specific-class">Blah Blah Blah</h1>
So at the end of it all, I'll get whatever is in the h1 element:
Blah Blah Blah
If you could help me get around this, I'd greatly appreciate it. Also, any APIs will do nicely.
I have this piece of code that gets attributes from an element, but I've not been able to get it to crawl elements found within a specific element.
<?php
include_once('simple_html_dom.php');
$target_url = "https://www.google.com/";
$html = new simple_html_dom();
$html->load_file($target_url);
foreach($html->find('a') as $link){
echo $link->href."<br>";
}
?>
Please read about DOMDocument. You can use methods such as getElementsByTagName, getElementById, etc.
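For instance, a minimal DOMDocument sketch along those lines (the target URL is a placeholder, the class names are taken from your snippets, and the href is assumed to be an absolute URL):
<?php
// Grab the href inside <li class="specific-class">, follow it, then read the <h1>.
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.example.com/'); // @ suppresses warnings from imperfect HTML

$target = null;
foreach ($doc->getElementsByTagName('li') as $li) {
    if ($li->getAttribute('class') === 'specific-class') {
        $anchors = $li->getElementsByTagName('a');
        if ($anchors->length > 0) {
            $target = $anchors->item(0)->getAttribute('href');
        }
        break;
    }
}

if ($target !== null) {
    $next = new DOMDocument();
    @$next->loadHTMLFile($target); // assumes $target is an absolute URL
    foreach ($next->getElementsByTagName('h1') as $h1) {
        if ($h1->getAttribute('class') === 'specific-class') {
            echo $h1->nodeValue; // "Blah Blah Blah"
            break;
        }
    }
}
?>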
How do I grab all of a site's links recursively by entering a domain name in PHP? Please give me some ideas.
To grab all the links of a site, you can use Simple HTML DOM. Here is the documentation link:
http://simplehtmldom.sourceforge.net/manual.htm
Example: if you want to get all the links of the website:
$html = file_get_html('http://www.example.com/'); // Create DOM from URL or file
// Find all links
foreach($html->find('a') as $element){
echo $element->href . '<br>';
}
Don't grab all links; just grab "useful" links by designing an algorithm to evaluate them, and set the depth of the recursion.
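As a rough illustration (the domain, the depth limit, and the is_useful() filter are all placeholders you would replace with your own logic), a depth-limited recursive crawl could look like this:
<?php
include_once('simple_html_dom.php');

// Recursively collect links up to $max_depth, skipping URLs already visited.
function crawl($url, $depth, $max_depth, &$visited) {
    if ($depth > $max_depth || isset($visited[$url])) {
        return;
    }
    $visited[$url] = true;

    $html = file_get_html($url);
    if (!$html) {
        return;
    }

    foreach ($html->find('a') as $element) {
        $link = $element->href;
        if (is_useful($link)) { // your own "useful link" test
            echo $link . '<br>';
            crawl($link, $depth + 1, $max_depth, $visited);
        }
    }
    $html->clear(); // free memory between pages
}

// Example filter (an assumption): only follow absolute links on the same domain.
function is_useful($link) {
    return strpos($link, 'http://www.example.com/') === 0;
}

$visited = array();
crawl('http://www.example.com/', 0, 2, $visited);
?>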
I've tried to grab all the comments from a website (the text between <!-- and -->), but without luck.
Here is my current code:
include('simple_html_dom.php');
$html = file_get_html('THE URL');
foreach($html->find('comment') as $element)
echo $element->plaintext;
Does anyone have any ideas how to grab the comments? At the moment it's only giving me a blank page.
I know regex is not supposed to be used to parse HTML, but you can use a regex like <!--(.*?)--> to find and fetch the comments...
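For example, a minimal sketch with preg_match_all (keeping the URL placeholder from your snippet; the /s modifier lets comments span multiple lines):
<?php
$source = file_get_contents('THE URL');

// Capture everything between <!-- and --> and print it escaped,
// so the comments are visible instead of being hidden again.
if (preg_match_all('/<!--(.*?)-->/s', $source, $matches)) {
    foreach ($matches[1] as $comment) {
        echo htmlspecialchars($comment) . '<br>';
    }
}
?>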
I want to use a scripting language (JavaScript, PHP) to achieve the following task.
1) I need to open a new web page, given a URL, in a different window.
2) Find a specific link in its contents and open it in the same window.
Is this possible with JavaScript? If yes, how?
PS: The first link is dynamic, so I can only hit it once in order to open and read it. I have noticed that if I open it and then read it using file_get_contents in PHP, there are some differences in the content.
You can use PHP Simple HTML DOM Parser to open the page and find the link you need.
An example of finding all links:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
The 'find' method has a jQuery-like syntax.
PHP Simple HTML DOM Parser also has good documentation and examples.
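If the specific link can be identified by part of its href, a minimal sketch (both the URL and the href fragment are placeholders) could look like this:
<?php
include_once('simple_html_dom.php');

// Find the first link whose href contains a given fragment.
$html = file_get_html('http://www.example.com/');
$link = $html->find('a[href*=some-fragment]', 0); // 0 = first match only

if ($link) {
    echo $link->href; // you could then load this URL with file_get_html() as well
}
?>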
Hope this helps!
I would like to make a simple but non-trivial manipulation of DOM elements with PHP, but I am lost.
Assume a page like Wikipedia where you have paragraphs and titles (<p>, <h2>). They are siblings. I would like to take both elements, in sequential order.
I have tried getElementsByTagName, but then you have no way to organize the information.
I have tried DOMXPath->query() but I found it really confusing.
Just parsing something like:
<html>
<head></head>
<body>
<h2>Title1</h2>
<p>Paragraph1</p>
<p>Paragraph2</p>
<h2>Title2</h2>
<p>Paragraph3</p>
</body>
</html>
into:
Title1
Paragraph1
Paragraph2
Title2
Paragraph3
With a few bits of HTML code I do not need in between.
Thank you. I hope the question does not look like homework.
I think DOMXPath->query() is the right approach. This XPath expression will return all nodes that are either an <h2> or a <p> on the same level (since you said they were siblings):
/html/body/*[name() = 'p' or name() = 'h2']
The nodes will be returned as a node list in the right order (document order). You can then construct a foreach loop over the result.
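For example, a minimal sketch with DOMDocument and DOMXPath, using your sample HTML as the input string:
<?php
$source = '<html><head></head><body>
    <h2>Title1</h2><p>Paragraph1</p><p>Paragraph2</p>
    <h2>Title2</h2><p>Paragraph3</p>
</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($source);

$xpath = new DOMXPath($doc);
$nodes = $xpath->query("/html/body/*[name() = 'p' or name() = 'h2']");

// Nodes come back in document order: Title1, Paragraph1, Paragraph2, Title2, Paragraph3
foreach ($nodes as $node) {
    echo $node->nodeValue . "\n";
}
?>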
I have used Simple HTML DOM by S.C. Chen a few times.
It is a perfect class for accessing DOM elements.
Example:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
Check it out here: simplehtmldom.
It may help with future projects.
Try having a look at this library and corresponding project:
Simple HTML DOM
This allows you to open an online web page or an HTML page from the filesystem and access its items via class names, tag names, and IDs. If you are familiar with jQuery and its syntax, it will take you no time to get used to this library.
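For example, a minimal sketch (the file name and selectors are placeholders) showing access by tag name, class name, and ID:
<?php
include_once('simple_html_dom.php');

$html = file_get_html('page.html'); // local file or URL

$first_heading = $html->find('h1', 0);      // by tag name (first match)
$menu_items    = $html->find('.menu-item'); // by class name
$footer        = $html->find('#footer', 0); // by ID

if ($first_heading) {
    echo $first_heading->plaintext;
}
?>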