How can I recursively grab all of a site's links by entering a domain name in PHP? Please give me some ideas.
To grab all the links of a site, you can use Simple HTML DOM. Here is the manual:
http://simplehtmldom.sourceforge.net/manual.htm
Example: if you want to get all the links of a website:
include 'simple_html_dom.php'; // the Simple HTML DOM library

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}
Don't grab all the links; grab only "useful" links by designing an algorithm to evaluate them, and set a limit on the recursion depth.
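A minimal sketch of that recursive crawl with a depth limit (the function name and the same-host rule are my own illustrative choices; it assumes the Simple HTML DOM library is available):
include 'simple_html_dom.php';

function crawl($url, $depth, array &$visited = array()) {
    // Stop at the depth limit or if this page was already seen
    if ($depth <= 0 || isset($visited[$url])) {
        return;
    }
    $visited[$url] = true;
    echo $url . "\n";

    $html = file_get_html($url);
    if ($html === false) {
        return; // page could not be fetched
    }

    $host = parse_url($url, PHP_URL_HOST);
    foreach ($html->find('a') as $element) {
        $link = $element->href;
        // Crude "useful" filter: absolute links on the same host only
        if (strpos($link, 'http') === 0 && parse_url($link, PHP_URL_HOST) === $host) {
            crawl($link, $depth - 1, $visited);
        }
    }
}

crawl('http://www.example.com/', 2); // recurse two levels deep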
Related
I need to fetch all the URLs from a given string that are not linked (URLs without an anchor tag).
I know the regex (http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])? to fetch all the URLs from a given string.
Input:
<div class='test'>
<p>Heading</p>
<a href='http://www.google.com'>google</a>
www.yahoo.com
http://www.rediff.com
<a href='http://www.overflow.com'>www.overflow.com</a>
</div>
Output:
www.yahoo.com
http://www.rediff.com
Kindly advise.
Use a library to parse the HTML into a DOM tree, then get all the links.
For example, you can use Simple HTML DOM: http://simplehtmldom.sourceforge.net/
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}
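To get only the unlinked URLs the question asks for, one approach is to remove the <a> elements from the DOM first and then run a URL regex over the remaining text. A hedged sketch, assuming the Simple HTML DOM library; the regex is a simplified variant of the one in the question so it also matches scheme-less URLs like www.yahoo.com:
include 'simple_html_dom.php';

$input = "<div class='test'>
<p>Heading</p>
<a href='http://www.google.com'>google</a>
www.yahoo.com
http://www.rediff.com
<a href='http://www.overflow.com'>www.overflow.com</a>
</div>";

// Parse the fragment and drop every <a> element,
// so only the unlinked text remains
$html = str_get_html($input);
foreach ($html->find('a') as $a) {
    $a->outertext = '';
}
$text = $html->plaintext;

// Match bare URLs, with or without a scheme, in the leftover text
preg_match_all('#(?:(?:http|https|ftp)://)?(?:[\w-]+\.)+[a-z]{2,}[\w.,@?^=%&:/~+\#-]*#i', $text, $matches);
foreach ($matches[0] as $url) {
    echo $url . "\n"; // www.yahoo.com and http://www.rediff.com
}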
Or simply use this pattern, which will capture the href value:
href='(.+?)'
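Used with preg_match_all it looks like this (note it only matches single-quoted href attributes, as in the sample input, and captures the linked URLs, which is the complement of what the question asks for):
preg_match_all("/href='(.+?)'/", $input, $matches);
print_r($matches[1]); // http://www.google.com and http://www.overflow.com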
I want to use a scripting language (JavaScript, PHP) to achieve the following task.
1) I need to open a new webpage, given a URL, in a different window.
2) Find a specific link in its contents and open it in the same window.
Is this possible with JavaScript? If yes, how?
PS: The first link is dynamic, so I can only hit it once in order to open and read it. I have noticed that if I open it and then read it using file_get_contents() in PHP, there are some differences in the content.
You can use PHP Simple HTML DOM Parser to open the page and find the link you need.
An example that finds all the links:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}
The find method has a jQuery-like syntax.
PHP Simple HTML DOM Parser also has good documentation and examples.
Hope this helps!
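If you need one specific link rather than all of them, Simple HTML DOM also supports attribute selectors. A small sketch (the URL and the href fragment are placeholders):
include 'simple_html_dom.php';

$html = file_get_html('http://www.example.com/');

// First anchor whose href contains a known fragment
$link = $html->find('a[href*=target-page]', 0);
if ($link !== null) {
    echo $link->href;
}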
I have used this script, which I found on the official Simple HTML DOM site, to find the hyperlinks on a website:
foreach($html->find('a') as $element)
echo $element->href . '<br>';
It returned all the links found on the website, but I want only specific links. Is there a way of doing this in Simple HTML DOM? This is the HTML code for those specific links:
<a class="z" href="http://www.bbc.co.uk/news/world-middle-east-16893609" target="_blank" rel="follow">middle east</a>
where this is the part of the tag that differs from the other hyperlinks:
<a class="z"
Also, is there any way I can get the link text ("middle east") together with the link?
I understand you'd like all a elements with the class z? You can do that like this:
foreach($html->find('a.z') as $element)
You can get an element's value (which for links will be the link text) with the plaintext property:
$element->plaintext
Please note that this can all be found in the manual.
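Putting both together, a short sketch (the URL is a placeholder) that prints each matching link together with its text:
include 'simple_html_dom.php';

$html = file_get_html('http://www.example.com/');

// Only anchors with class "z": print the link text and the target
foreach ($html->find('a.z') as $element) {
    echo $element->plaintext . ' => ' . $element->href . '<br>';
}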
I'm using AJAX to call a PHP file which will effectively edit particular bits of content within another HTML file. My problem is that I'm not sure of the best way of targeting these particular areas.
I figured some sort of unique identifier would need to be attached to the tag that needs to be edited, or placed in a comment perhaps, and then PHP simply searches for this before doing the replacing?
Use Simple HTML DOM for this.
You can change the content of all <h1> elements to foo like this:
$html = file_get_html('http://www.google.com/');
foreach($html->find('h1') as $element)
{
$element->innertext = 'foo';
}
echo $html;
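Since the question mentions unique identifiers, here is a hedged sketch that targets a single element by id in a local file and writes the change back; the file name and the id are made up for illustration:
include 'simple_html_dom.php';

// Load the HTML file that the AJAX call should edit (hypothetical path)
$html = file_get_html('content.html');

// Replace the content of the one element carrying the unique id
$element = $html->find('#editable-region', 0);
if ($element !== null) {
    $element->innertext = 'New content from the AJAX request';
}

// Write the modified markup back to disk
file_put_contents('content.html', $html->save());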
The simplehtmldom framework allows you to search and modify the DOM of an HTML file or URL.
http://simplehtmldom.sourceforge.net/
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links
foreach ($html->find('a') as $element)
    echo $element->href . '<br>';
Another nice library is QueryPath. It is very similar to jQuery:
qp($html_code)->find('body')->text('Hello World')->writeHTML();
https://fedorahosted.org/querypath/wiki/QueryPathTutorial
I need to create a PHP script.
The idea is very simple:
when I send a link to a blog post to this PHP script, the webpage is crawled and the first image and the page title are saved on my server.
Which PHP functions do I have to use for this crawler?
Use PHP Simple HTML DOM Parser
// Create DOM from URL
$html = file_get_html('http://www.example.com/');
// Find all images
$images = array();
foreach($html->find('img') as $element) {
$images[] = $element->src;
}
Now the $images array holds the image URLs of the given webpage, and you can store the desired image in your database.
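To match the question more closely, a minimal sketch that grabs the page title and saves the first image to the server; the URL and the images/ directory are illustrative assumptions:
include 'simple_html_dom.php';

$html = file_get_html('http://www.example.com/blogpost');

// Page title
$titleElement = $html->find('title', 0);
$title = ($titleElement !== null) ? $titleElement->plaintext : '';

// First image on the page, if any
$img = $html->find('img', 0);
if ($img !== null) {
    // Copy the image to the server (assumes allow_url_fopen is enabled)
    file_put_contents('images/' . basename($img->src), file_get_contents($img->src));
}

echo $title;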
HTML parser: HTMLSQL.
Features: it can fetch an external HTML file over an HTTP or FTP link and parse the content.
Well, you'll have to use quite a few functions :)
But I'm going to assume that you're asking specifically about finding the image, and say that you should use a DOM parser like Simple HTML DOM Parser, then cURL to grab the src of the first img element.
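A sketch of that cURL step, assuming $src already holds the image URL found by the parser:
// Download the image bytes with cURL
$ch = curl_init($src);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
curl_close($ch);

if ($data !== false) {
    file_put_contents('images/' . basename($src), $data); // save a local copy
}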
I would use file_get_contents() and a regular expression to extract the first image tag's src attribute.
cURL or an HTML parser seems like overkill in this case, but you are welcome to check them out.
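A sketch of that approach (the URL is a placeholder; a DOM parser is more robust against unusual markup):
// Fetch the raw HTML of the page
$source = file_get_contents('http://www.example.com/blogpost');

// Capture the src attribute of the first <img> tag
if (preg_match('/<img[^>]+src=["\']([^"\']+)["\']/i', $source, $match)) {
    echo $match[1]; // URL of the first image
}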